Generic DSL Grammar and Parser

Mar 23 2010March 24, 2010

About a year ago I blogged about an idea of an extensible language; http://rogeralsing.com/2009/03/18/an-intentional-extensible-language/
Since then, I have been experimenting with this concept quite a bit.

I have now created a more complete grammar which contains hard-coded support for some constructs, e.g. assignments, lambdas, binary and unary operations.
What I ended up with is not a language, but rather a parser for a structured document.
You can think about it like XML, XML gives you structured documents which you can traverse and transform via code.

This is pretty much exactly the same, but the syntax looks like a C derivate rather than a mark up language.

So what’s the point of it?

The idea is to make it possible for developers to create their own textual DSL’s without having to care about any of the gory details of parsing and grammar construction.

A lot of people are currently using XML to define different types of rules or entire scripts, e.g. build scripts or custom business rules.
This works great since it is easy to deal with XML documents via code.

The downside of XML based DSLs is that they are extremely hard to read and edit!
(I know that a lot of people disagree with me here)

Let’s say for the sake of the argument that we want to define some custom business pricing rules.
Using XML that could look something like:

<rules>
   <rule product="50050" >
      <condition>
          <greaterthan>
              <leftoperand>
                    <property value="quantity" />
              </leftoperand>
              <rightoperand>
  <integerliteral value="200" />
              </rightoperand>
          </greaterthan>
      </condition>
      <price>
            <add>
  ... etc etc ...
            </add>
      </price>
   </rule>
</rules>

Such XML rule definition can easily bloat from something fairly simple to something completely horrible when more requirements are thrown at it.

Using the generic DSL grammar that I have created, the above rules would look something like this:

product 50050
{
    when (quantity > 200) then (20 - quantity / bonus);

    when ... //more pricing rules for same product
};

Or you could use it to define Google protobuffer messages:

message OrderPlaced
{
  1 Guid MessageId;
  2 Guid CustomerId;
  3 int OrderId;
  4 List(OrderDetail) Details;
};

Or what about defining some CQRS entity definitions?

public entity Customer
{
    private string Name;

   command Rename(string newName)
    {
       require (newName != null);
       Renamed (newName);
    };

    event Renamed(string newName)
    {
        this.Name = newName;
    };
};

How it works, in short:
The grammar is based on a few simple constructs;
Every line is an “statement” and statements are an expression terminated with “;”.

An expression can be things like an assignment,a lambda, a binary operation or a “Chain”.
A “Chain” consists of zero or more constructs that occurs in a whitespace separated sequence.
Members of a chain can be primitive constructs such as primitives, identifiers, bodies {a;b;c;} and tuples (a,b,c).
Since there is only a handful of constructs, it is very easy to traverse and analyze the parsed documents.

If we take the above “customer entity” definition, we would get a structure like:
A statement (chain) containing: public, entity, customer, body.
Where the first 3 elements are identifiers and the last item is a body.
A body is a structure containing zero or more statements.
So the body of the customer entity would consist of 3 statements, the member variable “name”, the command “Rename” and the event “Renamed”.
Each of those statements are chains containing their own sub items.

I am currently cleaning up the parser and I will also add some convenience methods to make it easier to match structures in the resulting document.
So code and more examples will hopefully be up in a few days.

//Roger

Published by Roger Johansson

View all posts by Roger Johansson

11 Comments

Torkel says:

March 23, 2010 at 10:04 am

interesting, show me more :)

Reply
rogpeppe says:

March 23, 2010 at 2:46 pm

So the body of the customer entity would consist of 3 statements, the member variable “name” […]

does that mean that “private”, “command”, etc are keywords? or is it just that in this particular DSL, the identifier “private” is treated as a keyword?

Reply
Roger Alsing says:

March 23, 2010 at 3:29 pm

@rogpeppe

There are no keywords in the grammar, they would be treated like “identifier” nodes by the document.
They would be keywords in this specific DSL, that is, the developer would have to apply conditions and evaluate if the contents of a “chain” makes sense.

Much like specific tags and attributes are present in a specific XML schema while XML itself does not treat them as keywords.
My intention is to make it possible to define patterns that can be matched in a “chain”, but I’m not sure how I want the API for that to look yet.

Reply
1. rogpeppe says:
  
  March 23, 2010 at 4:15 pm
  
  ah, that’s what i’d hoped.
  
  what about operator precedence & associativity? if you’d said, for example this.Name = x.Name, would that be parsed as ((this “.” name) “=” (x “.” Name))
  or (((this “.” name) “=” x) “.” Name) ?
  
  including binary operators seems potentially problematic.
  
  you say there’s hard coded support for assignments and lambdas. if there are assignments, presumably there are identifier declarations and scope rules; and if there are lambdas, then presumably you can invoke the lambdas too. all of that seems to me to be moving towards a fully fledged language rather than an XML equivalent…
  
  Reply
Roger Alsing says:

March 23, 2010 at 6:08 pm

@rogpeppe

Precedence is based on the standard operators.

Your example will result in:
((this “.” name) “=” (x “.” Name))

So there is special support for binary and unary operations, they have their own node type in the AST.

>>. if there are assignments, presumably there are identifier declarations and scope rules

No, no such thing exist in the grammar or parser, I only mean that there is support for infix operations and precedence in the syntax.

And the hard-coded support for lambdas does not imply any infrastructure for execution, simply that they have their own special syntax in the grammar:

(a,b,c) -> expression
or:
(a,b,c) -> { bar(); }
or arg-less:
-> expression

Although, I might remove that support since the same can be expressed as:

(a,b,c) expression

Reply
1. rogpeppe says:
  
  March 23, 2010 at 6:39 pm
  
  Precedence is based on the standard operators.
  
  the standard C operators?
  
  Although, I might remove that support since the same can be expressed as:
  
  (a,b,c) expression
  
  it looks to me like -> is just another binary operator. but what about
  (a) -> (b) -> expr ? would it associate right-to-left as in haskell?
  
  Reply
FallenGameR says:

March 23, 2010 at 7:02 pm

Have you read http://www.manning.com/rahien/ ?
There is а language that allows to work with it’s AST in run time. No need to write a parser – it’s already exists.

Reply
Roger Alsing says:

March 23, 2010 at 8:09 pm

@rogpeppe

>>The standard C operators?

Yes

>>but what about
(a) -> (b) -> expr ? would it associate right-to-left as in haskell?

Currently, the -> is not a binop, the grammar requires a (..) then -> and an expression.

Reply
Roger Alsing says:

March 23, 2010 at 8:16 pm

@FallenGameR

I’m well aware of Boo and have used it many times.
The thing is that in order to bend Boo to your will you have to have alot of knowledge of the inner workings of Boo and the compiler pipeline.

Boo brings a lot of nifty stuff to the table, but it also comes with alot of baggage.
You have to cut off the parts that you don’t need.

Reply
Pingback: Generic DSL using F# « Roger Alsing Weblog
Pingback: Playing with Plastic « Roger Alsing Weblog

Share this:

Related

Published by Roger Johansson

11 Comments

Leave a Comment Cancel reply