An intentional extensible language?

Mar 18 2009March 18, 2009

Ever since I first saw Lisp I have had a sort of hate love for it.
The syntax is horrible to look at, but at the same time it is pure genius in the sense that you can add new language constructs since everything are functions.
I have never seen any other language that can do this.

For those of you who don’t know what I’m talking about;
Pseudo lisp:

 (foreach
      (item list
            (print item)))

“foreach” is not a keyword here, it is a function that takes a variable name as it’s first argument, a list as it’s 2nd argument and a list of statements as it’s 3rd argument.
So even if it looks ugly as sin, _everything_ will look exactly the same, you can add a new concepts that looks as if they were part of the language due to the simplicity of the Lisp grammar.

There have also been countless of attempts to try to de-parenthesis Lisp, so that you get something that is actually readable.
But as far as I know, none of those attempts have been successful.

I always like a good challenge, so the last few days I have been sketching on a grammar that would actually allow you to extend a language and still be able to read it.

My attempt is based on what I call chain-expressions.

A chain expression consists of;
* one or more symbols
* an optional parenthesis group that contains other chain expressions
* an optional bracket group that contains other chain expressions

This allows you to write expressions like this:

//single symbol expression
break; 

//symbol + parenthesis group 
print ("hello");  

//symbol + parenthesis group + bracket group
if(foo)
{
 ...
};

That looks fairly decent, doesn’t it?
But what about that semi colon at the end of the “if” block?

The semi colon represents the end of a chain of chain expressions.
My grammar allows me to link multiple chain expressions together, like this;

if(foo)
{
 print ("foo is true");
}
else if (bar)
{
 ...
}
else
{
 ...
};

What you see above is a chain of three chain expressions.
If , else if and else, and the chain is terminated with the final semi colon.

The “if” expression would be implemented as a function in my language where the function takes a boolean expression as it’s first argument, and a list of statements (the bracket group) as it’s 2nd argument.
and a reference to the “next” expression, that is, the “else if” expression.

If we try to express the Lisp example at the top of the post using this grammar it would look something like this:

foreach item in list
{
       print (item);
};

So it would in practise work pretty much like Lisp, where functions or code is passed around to other functions, but with a far more readable syntax, at least in my own opinion ;-)

This would allow you to create true DSL’s without knowing how to write a grammar, which you have to know if you use parser frameworks like M Grammar or Gold Parser.
It would also enable you to step into the world of intentional programming using text based code instead of using a structured editor like most other intentional programming tools forces you to do.

Currently I’ve only got a working grammar for this, so the next step will be to create a compiler or engine so that I can run real programs in it.

//Roger

Published by Roger Johansson

View all posts by Roger Johansson

14 Comments

pascal says:

March 19, 2009 at 4:06 am

something like that has been on my mind for quite some time now: a very simple parser, that’s basically just a tokenizing and grouping blocks. starting with a very simple language (postscript like) one defines functions that are called when certain blocks/tokens/patterns come up.
there would be no special grammars, everyting would be in the library, as in lisp, for example escape sequences in strings would work like: “in a string block, bind these functions: append the string value of all tokens to the value of the string block, except when the token is backslash, then call the resolver function with the following tokens” etc.

of course this would be way too slow at runtime, since everything is basically being parsed anew every time — but my dream-compiler would execute *everyting* that has no side-effects at compile-time. (that could win every benchmark, since that program which calculates the 1000ths digit of PI would just end up as a single printf()-call)

Reply
yallie says:

March 19, 2009 at 9:49 am

This is a very nice idea :)

I think this approach gives powerful syntax extensibility facilities to the language, yet allows to keep things under control.

There were a few attempts to create languages with extensible syntax based on BNF grammar back in 60-80s: Imp, Lithe, Smalltalk-72, to name a few. All these languages have a common problem: BNF notation gives too much freedom, leading to poor readability of the extended language dialects. As far as I know, syntax-level extension facilities have been removed from Smalltalk-76 standard for that reason.

Your approach is free from this drawback, and that’s very cool. At least, any language extensions allowed by your grammar should look pretty consistent.

Reply
Torkel says:

March 19, 2009 at 10:12 am

Very interesting.

Reply
guenthernoack says:

March 19, 2009 at 8:07 pm

Are literals symbols, too? What about +, -, *, /, :=? Are they symbols or do you rather express them as add(a; b) or setq(a; b)?

Have you had a look at the FONC project?

Reply
Mark says:

March 19, 2009 at 8:52 pm

I think what you are proposing is very similar to Ioke, found at – http://ioke.org/

Reply
Erik says:

March 19, 2009 at 11:36 pm

R, the open source version of the statistics program S-plus, works precisely like you’re suggesting. It’s also far more open-ended than is suggested by being a statistical language.

http://www.r-project.org/

Reply
addmoreice says:

March 20, 2009 at 5:31 pm

any chance we can see the working grammar?

Reply
foo says:

March 21, 2009 at 9:23 am

The problem is the Lisp form is not what you have described in your text. It would be written (foreach item list (print item)). In Lisp this can’t also be a function, since the evaluation rules don’t allow to write such functions. The evaluation rule for functions say that arguments are evaluated and then passed to the function. ITEM would be evaluated. A FOREACH like this needs to be a built-in special operator with special evaluation rules or needs to be a macro that transforms FOREACH expressions into more primitive expressions. The way in Lisp expressions are formed is either based on the plain function syntax (fname arg*) or based on macros. The FOREACH construct you describe can’t be a function, because it follows neither the syntactic requirements for functions nor it supports the evaluation rule for functions. There is already a macro similar to this in the base language: (dolist (item list) (print item)). The function version looks like this: (mapcar (quote print) list) .

Not every s-expression is a valid Lisp program. Lisp has syntax based on s-expressions. There are also lots of conventions how the syntax for Lisp constructs typically looks like. Applying these conventions helps the Lisp programmer to understand what the syntactic rules for an expression are. For example the DOLIST macro has the syntax (dolist (item list [result]) body*) where the result is optional and body can be zero or more forms. The list around item, list and result allows that the result can be optional. If it would not be grouped, there would be no way to identify the optional result clause.

What looks as a simple s-expression syntax is in reality missing that Lisp syntax is two stage: the simple data syntax for s-expressions and then on top of s-expressions there is syntax for valid Lisp programs.

Reply
Roger Alsing says:

March 21, 2009 at 11:08 am

There is a reason why i wrote “pseudo” lisp.
And the intent of the post is not aimed at wheter Lisp can do foreach or not.
It’s aimed at the fact that Lisp has a genius way to incorporate new elements (be it functions or macros) where built in constructs looks the same as new constructs.

There are aslo multiple dialects based on the same concept, Common Lisp, Clojure, Iron Lisp etc etc.

So depending on implementation, different constructs are allowed.

eg. I have my own open source Lisp engine that _is_ able to process things like the foreach loop above w/o making a macro out of it.

The actual implementation of the different Lisp versions is pretty irrelevant here though.

The beauty of Lisp is the 1-1 mapping from the code to the AST, which at the same time is what makes it look so bad.
(That may ofcourse be a matter of opinion, but there are a few more than I who thinks so)

//Roger

Reply
Menno says:

March 23, 2009 at 3:56 pm

This made me think of the blocks you can pass to functions in ruby. It isn’t the same as you propose, but it accomplishes some of the same tasks. Basically it allows you to pass a code block to a function. (Yield is the keyword in the called function to execute the block, passing parameters is possible):

def do_twice
yield
yield
end

do_twice {puts “Hola”}

This allows (custom) implementation of functions normally seen in functional programming languages, such as fold and map.

Example taken from: http://eli.thegreenplace.net/2006/04/18/understanding-ruby-blocks-procs-and-methods/

Wikipedia has some nice examples on it:
http://en.wikipedia.org/wiki/Ruby_(programming_language)#Blocks_and_iterators

One of the downsides is that you can only pass 1 block to a function, so implementing if wouldn’t be possible.

Reply
Tarski says:

March 25, 2009 at 3:47 pm

What tool are you using to construct the grammar?

Reply
Roger Alsing says:

March 25, 2009 at 4:34 pm

I’m using Gold Parser ATM.
I might move it to M Grammar if I can find the motivation to actually implement a compiler for it.

Reply
Pingback: Generic DSL Grammar and Parser « Roger Alsing Weblog
Pingback: Playing with Plastic « Roger Alsing Weblog