Writing grammar.lalrpop

Now we've got some types to work with we can write a grammar which lalrpop will use when generating the parser.

The top of the grammar.lalrpop will be inserted into the generated file as-is, making it the perfect place to insert the import statements we'll need.

At the moment we only need to put a single line here.


# #![allow(unused_variables)]
#fn main() {
use syntax::ast::{Expr, Atom, BinaryOp, FunctionCall};
#}

Next we tell lalrpop that the grammar section has started


# #![allow(unused_variables)]
#fn main() {
grammar;
#}

Our grammar is composed of expressions which are built up from a bunch of factors and terms. This lets us quite naturally break the grammar up into three rules.


# #![allow(unused_variables)]

#fn main() {
pub Expr: Expr = {
    <l:Expr> "+" <r:Factor> => BinaryOp::add(l, r).into(),
    <l:Expr> "-" <r:Factor> => BinaryOp::sub(l, r).into(),
    Factor,
};

Factor: Expr = {
    <l:Factor> "*" <r:Term> => BinaryOp::mult(l, r).into(),
    <l:Factor> "/" <r:Term> => BinaryOp::div(l, r).into(),
    Term,
};

Term: Expr = {
    "(" <e:Expr> ")" => e,
    Atom => Expr::Atom(<>),
    <f:FunctionCall> => f.into(),
};
#}

There's also the rule for function calls:


# #![allow(unused_variables)]
#fn main() {
pub FunctionCall: FunctionCall = {
    <i:ident> "(" <a:CommaSeparated<Expr>> ")" => FunctionCall::new(i, a),
};

CommaSeparated<T>: Vec<T> = { 
    <v:(<T> ",")*> <e:T?> => match e {
        None => v,
        Some(e) => {
            let mut v = v;
            v.push(e);
            v
        }
    }
};
#}

And finally, we define the rules for parsing an Atom.


# #![allow(unused_variables)]
#fn main() {
pub Atom: Atom = {
    num => Atom::Number(<>),
  ident => Atom::Ident(<>),
};

num: f64 = {
  <s:r"[0-9]+(\.[0-9]+)?"> => s.parse().unwrap(),
};

ident: String = {
  <i:r"[a-zA-Z][a-zA-Z0-9_-]*"> => i.to_string(),
};
#}

As a sanity check, we should add some tests to make sure the language's grammar parses correctly.

First up, we'll test for parsing atoms.


# #![allow(unused_variables)]
#fn main() {
#[test]
fn parse_a_number_atom() {
    let src = "3.14";
    let should_be = Atom::Number(3.14);

    let got = grammar::parse_Atom(src).unwrap();
    assert_eq!(got, should_be);
}

#[test]
fn parse_an_identifier() {
    let src = "x";
    let should_be = Atom::Ident(String::from(src));

    let got = grammar::parse_Atom(src).unwrap();
    assert_eq!(got, should_be);
}
#}

Then a binary op,


# #![allow(unused_variables)]
#fn main() {
#[test]
fn parse_a_multiply() {
    let src = "a * 5";
    let should_be = BinaryOp::mult(Atom::Ident(String::from("a")).into(), Atom::Number(5.0).into());
    let should_be = Expr::from(should_be);

    let got = grammar::parse_Expr(src).unwrap();
    assert_eq!(got, should_be);
}
#}

And we should also add a test for function calls.


# #![allow(unused_variables)]
#fn main() {
#[test]
fn parse_a_function_call() {
    let src = "sin(90.0)";
    let should_be = FunctionCall::new("sin", vec![Expr::Atom(Atom::Number(90.0))]);

    let got = grammar::parse_FunctionCall(src).unwrap();
    assert_eq!(got, should_be);
}
#}

So far our tests have checked individual grammar rules in isolation. To ensure operator precedence is encoded correctly in the language's grammar we'll need to create a non-trivial example and make sure it gives us exactly the parse tree we'd expect.


# #![allow(unused_variables)]
#fn main() {
#[test]
fn complex_parse_tree() {
    let src = "5 + (3-2) * x - sin(90.0)";
    let should_be = BinaryOp::sub(
        BinaryOp::add(Atom::from(5).into(),
            BinaryOp::mult(
                BinaryOp::sub(Atom::from(3).into(), Atom::from(2).into()).into(),
                Atom::from("x").into(),
            ).into()).into(),
            FunctionCall::new("sin", vec![Atom::Number(90.0).into()]).into()
    );
    let should_be = Expr::from(should_be);

    let got = grammar::parse_Expr(src).unwrap();

    assert_eq!(got, should_be);
}
#}