IN THIS ARTICLE
Welcome back! In our first Ohm post, we introduced Ohm and built our very own parser to handle numbers in various formats. In our second post, we turned our parser into a real arithmetic language with binary operations and symbols. Today we will give our language code blocks and conditionals.
Code Cleanup
The first thing our language needs is a name. I’m a fan of cats, but Cat and Kitten are already used by existing languages so I’m going to call our new language Meow. However, before we dive into new features features, let’s do a little refactoring.
Move all of the AST classes into a file called ast.js
. NodeJS doesn’t support the new module loading syntax of ES6 yet, so we’ll have to manually export the classes like this:
"use strict"; class MNumber { constructor(val) { this.val = val; } resolve(scope) { return this; } jsEquals(jsval) { return this.val == jsval; } } ... rest of the classes module.exports = { MNumber:MNumber, MSymbol:MSymbol, BinOp:BinOp, Assignment:Assignment, Scope:Scope };
Now move the semantic actions into a semantics.js
file.
"use strict"; var AST = require('./ast'); module.exports.make = function(semantics) { var Calculator = semantics.addOperation('calc', { ... the rest of the Calculator semantics rules } var ASTBuilder = semantics.addOperation('toAST', { ... the rest of the ASTBuilder semantics rules }); return { Calculator:Calculator, ASTBuilder:ASTBuilder } };
Finally, the test harness in test.js
can become a lot simpler:
"use strict"; var ohm = require('ohm-js'); var fs = require('fs'); var assert = require('assert'); var Scope = require('./ast').Scope; var grammar = ohm.grammar(fs.readFileSync('blog3/grammar.ohm').toString()); var semantics = grammar.createSemantics(); var ASTBuilder = require('./semantics').make(semantics).ASTBuilder; var GLOBAL = new Scope(null); function test(input, answer) { var match = grammar.match(input); if(match.failed()) return console.log("input failed to match " + input + match.message); var ast = ASTBuilder(match).toAST(); var result = ast.resolve(GLOBAL); console.log('result = ', result); assert.deepEqual(result.jsEquals(answer),true); console.log('success = ', result, answer); } test('4+5*10',4+5*10); test('10',10); test('x = 10',10); test('x',10); test('x * 2',20);
To further clean up the code, we will go back to Smalltalk precedence. I’ve always found it a pain to remember operator precedence. From now on all expressions will be evaluated left to right. If you want higher precedence you have to use parenthesis. We do this by putting the recursive expression on the left and the terminal expression on the right, such as: Add = Expr "+" Term
. This will make our implementation far simpler, and also be easier for programmers to reason about. All arithmetic is now grouped under MathOp
.
Now the grammar looks like this:
CoolNums { // just a basic integer Expr = Assign | MathOp | Term MathOp = Mul | Div | Add | Sub Add = Expr "+" Term Sub = Expr "-" Term Mul = Expr "*" Term Div = Expr "/" Term Group = "(" Expr ")" Term = Group | identifier | Number Assign = identifier "=" Expr identifier = letter (letter|digit)* Number = oct | hex | float | int int = digit+ float = digit+ "." digit+ exp? exp = "e" "-"? digit+ hex = "0x" hexDigit+ oct = "0o" octDigit+ octDigit = "0".."7" //hexDigit := "0".."9" | "a".."f" | "A".."F" //already defined by Ohm }
One more slight detail, I made the identifier
rule be lower case. This is a very subtle change but it’s worth mentioning. Rules beginning with an upper case letter are syntactic, meaning there is an implicit space*
match. For identifiers we don’t want that, or else you could have an identifier with spaces in the middle of it, like a b c
. Rules beginning with a lower case letter are lexical, meaning it will match what you specify in the rule and nothing else. There is no implicit whitespace stripping.
Boolean Expressions
For conditionals we need two things: boolean operations that evaluate to true or false, and an if statement which will execute code when the condition is met. Boolean conditions are easy because they are binary operations (BinOps) just like the addition and other math functions. To support the equality operator add this to the grammar:
MathOp = Mul | Div | Add | Sub | Eq Eq = Expr "==" Term
Update the toAST
semantic operation in semantics.js
with:
Eq: (a, _, b) => new AST.BinOp('eq', a.toAST(), b.toAST()),
And update BinOp
to support the eq
operator by adding a new line at the bottom of resolve
.
class BinOp { constructor(op, A, B) { this.op = op; this.A = A; this.B = B; } resolve(scope) { var a = this.A.resolve(scope).val; var b = this.B.resolve(scope).val; if(this.op == 'add') return new MNumber(a+b); if(this.op == 'sub') return new MNumber(a-b); if(this.op == 'mul') return new MNumber(a*b); if(this.op == 'div') return new MNumber(a/b); if(this.op == 'eq') return new MNumber(a==b); } }
Now this test case will evaluate correctly: test('4==4',true);
.
In fact, every boolean operation can be implemented this way. Here’s the final grammar with support for all of the common boolean operations:
MathOp = Mul | Div | Add | Sub | Eq | Neq | Lt | Lte | Gt | Gte Add = Expr "+" Term Sub = Expr "-" Term Mul = Expr "*" Term Div = Expr "/" Term Eq = Expr "==" Term Neq = Expr "!=" Term Lt = Expr "<" Term Lte = Expr "<=" Term Gt = Expr ">" Term Gte = Expr ">=" Term
And extra semantic rules:
Eq: (a, _, b) => new AST.BinOp('eq', a.toAST(), b.toAST()), Neq: (a, _, b) => new AST.BinOp('neq', a.toAST(), b.toAST()), Gt: (a, _, b) => new AST.BinOp('gt', a.toAST(), b.toAST()), Lt: (a, _, b) => new AST.BinOp('lt', a.toAST(), b.toAST()), Gte: (a, _, b) => new AST.BinOp('gte', a.toAST(), b.toAST()), Lte: (a, _, b) => new AST.BinOp('lte', a.toAST(), b.toAST()),
Extra lines in the BinOp
class:
if(this.op == 'eq') return new MNumber(a==b); if(this.op == 'neq') return new MNumber(a!=b); if(this.op == 'gt') return new MNumber(a>b); if(this.op == 'lt') return new MNumber(a<b); if(this.op == 'gte') return new MNumber(a>=b); if(this.op == 'lte') return new MNumber(a<=b);
And of course we need some more tests in test.js
for the new features:
test('4==4',true); test('4!=5',true); test('4<5',true); test('4>5',false); test('4<=5',true); test('4>=5',false);
Thanks to the BinOp work we did last time adding boolean conditionals is trivial.
Code Blocks
Now that we have an expression which can resolve to true or false, we need blocks of code to evaluate.
So far our little language can do math and set variables, but it can’t really do more than one thing at time. It only supports one expression per-parse. Later on we’ll want to have functions which do multiple calculations and return the final result. To do this we need a way to combine expressions. Let’s define what we want:
- An Expression is something which evaluates to a single value.
4
is an expression.2*(4*5+myVar)
is also an expression. And once we support function calls5 * doStuff(3,4))
will also be an expression. - A Statement is an expression which doesn’t return a value. For simplicity sake we’ll say that all statements return
null
. - A Block is a sequence of expressions or statements which are evaluated in order, and then returns the value of the final expression.
Now create a Block
class in ast.js
. When resolve
is called it resolves all of the statements and returns the value of the last one.
class Block { constructor(block) { this.statements = block; } resolve(scope) { var vals = this.statements.map((expr) => expr.resolve(scope)); //only return the last one return vals.pop(); } }
Now we can parse blocks by adding this to the grammar grammar.ohm
:
Block = "{" Expr* "}"
Then change the definition of Expr
to include Block:
Expr = Block | Assign | MathOp | Term
Then add this to the semantics in semantics.js:
Block: (_, body, _1) => new AST.Block(body.toAST()),
And of course a test case in test.js
.
test('{4+5}',9);
Now we can parse code like this, which will return the value of the last line (23).
{ x=4*5 y=x+6 y-3 }
If Statement
With blocks and conditions, we can finally build an if
statement. For Meow we’ll use an if with three blocks. The first block is the expression that must evaluate to true or false. The second block will be evaluated if the first block is true. The else clause is optional, and will be evaluated if the first block is false. With everything else in place implementing if
will be easy.
The IfCondition
class in ast.js
:
class IfCondition { constructor(cond, thenBody, elseBody) { this.cond = cond; this.thenBody = thenBody; this.elseBody = elseBody; } resolve(scope) { var val = this.cond.resolve(scope); if (val.val == true) { return this.thenBody.resolve(scope); } if(this.elseBody) return this.elseBody.resolve(scope); return new MBoolean(false); } }
A new rule in the grammar:
Expr = IfExpr | Block | Assign | MathOp | Term IfExpr = "if" Block Block ("else" Block)?
A new action in the semantics:
IfExpr: (_, cond, thenBlock, __, elseBlock) => { var thenBody = thenBlock.toAST(); var elseBody = elseBlock ? elseBlock.toAST()[0] : null; return new AST.IfCondition(cond.toAST(), thenBody, elseBody); },
And finally the new test cases:
//if then else test('if{4==2+2}{1}',1); test('if{4==2+2}{1}else{2}',1); test('if{4==2+3}{1}else{2}',2);
Next Time
Now all of our tests can pass and we have a real language with conditionals! Next time, we will add a while loop and functions, which are the last components required to make Meow a fully usable programming language.
I know this feels like a lot to take in. We added boolean expressions, blocks, and the if statement, but hopefully it feels straight forward. Each time we added a new language feature, we implemented them in the same way: define what we want, add a new grammar rule, create a new AST class, add an action to the semantics, and add a test case. Languages can be complex, so this approach of carefully adding code and tests keeps the project organized and understandable.