Lexing and parsing - Possible optimizations

3.7 Possible optimizations

4.1.2 Lexing and parsing

When we had to make the lexer and parser for the project, we considered hand-writing them ourselves, using a tool to create them or to take a JavaScript lexer and parser from another project. The advantage of handwriting the lexer and parser would be that we would know the code better and we could implement the features we think are important, whether it be performance, good error messages or solving the problem of automatic semicolon insertion in an elegant way.

The advantage of using a tool to generate them is that we estimate that it takes much less time to make, because they are written in a syntax that is very close to the Backus-Naur Form grammar notation, that is also used in the ECMAScript Language Specification. There are many well tested tools for doing this, and many of them have build in error reporting.

It could also be a good idea to use the parser and lexer from another project, such as RingoJs which is also written in Java. The advantage is that if the code is easy to understand and use for another project it would require the minimal amount of work.

Since a compiler needs to implement the language specification correctly to be useful, we estimated that the best approach would be to make the lexer and parser with a tool directly from the Backus-Naur Form of the standard, so this is the approach we went with. We also thought that this was better than using the parser of RingoJs, because it would give us a good overview of the constructs of the language and not force us to use a pre-defined AST.

Because we had experience with the ANTLR lexer and parser generator from a course in computer science modelling, we chose to use that tool.

4.1.2.1 Building the AST

When we were done writing the grammar, and had made the lexer and parser with ANTLR, we had to write the Java code that would build the abstract syntax tree. The authors of ANTLR suggest the approach of letting ANTLR build a parsetree, and then make the abstract syntax tree builder as a Java class that extends the parse tree visitor, that the tool provides [Par12, 2.2].

The visitor pattern The visitor pattern is a design pattern that is used for adding behavior to tree traversals [GHJV95][pp. 330-344]. The pattern consist of an abstract class, that contains a method for each of the classes of nodes that the tree can have. So, if the tree can contain nodes of the type X, the visitor will have a method called visitX that takes an object of the type X as input.

Each node type will have a method that takes the visitor as an argument and calls the correct method on the visitor.

Tree traversal can then be implemented by simply creating a concrete imple-mentation of the visitor class and running the visitor on the root node. The idea is then that the function visitX does the computations for the nodes of the X class. If the computation depends on the subtrees that an X object can have, visitNode can be called recursively on these subtrees. visitNode will then call the correct visit function for the subtrees according to their type.

The abstract visitor class contains some default implementations for the meth-ods. The default implementation for the visitX functions is that they should visit the child nodes recursively, and then aggregate the result, that is calculate

a result from these values. The aggregation is made by the method aggregateRe-sult which takes two reaggregateRe-sults and give an aggregated reaggregateRe-sult.

The defaultValue method gives the element that the aggregation should start from.

The default implementation of aggregateResult is to just give the right argument of aggregateResult, and the default implementation of defaultValue is to return null.

AST A parse-tree shows the syntactical structure of a program. The goal is to build the abstract syntax tree, which is very similar to the parse tree, but does not include syntactic details that are not important for the meaning of the program.

The parse tree of a simple JavaScript expression "true || false" is:

The reason that the parse tree is so deep is that the grammar takes into account the precedence of the different operators. The whole expression is first tried to be parsed as an assignment, because assignment should be done after all other operations, then as a conditional expression and lastly as a logical or expression, which is what it is. The same process is repeated for each of the literals. While

it is practical that the parsing takes care of the operator precedence, the parse tree is obviously much too deep, and most of the nodes in the tree provide no semantical information. To make the tree more shallow, we have made the AST builder such that it, when it visits an Expression nodes, for instance a logicalANDExpression, checks if the node has one child or two children, and if it only has one child, the builder will simply return the result of visiting that child note. Therefore the AST for the above example is much shallower:

The illustration also shows that in the AST, we have a single class, Operator-Expression, that represents all types of binary operators, by containing a string that represents the operator.

In document Compiling Dynamic Languages (Sider 36-39)