Intermediate code generation - Possible optimizations

3.7 Possible optimizations

4.1.4 Intermediate code generation

We have firstly made an intermediate representation that we call IR, which is very close to the syntax of the C programming language, albeit with some simplifications. We have chosen to have such a representation, because it makes it easy to make the C code that should be generated. When we want to translate from the AST to the IR, there are then two approaches:

Doing a direct translation or adding another (or more) intermediate language.

The direct translation can be done relatively easy, because many of the con-structs of JavaScript are directly available in C, such as if-then-else, for-loops and while-loops. This also has the advantage that the outputted code will be very similar to the structure in the JavaScript code. Function calls can also be translated more or less directly once closures have been taken in to account.

This has the problem, however, that the evaluation order of the arguments is defined to be from left to right in JavaScript, but in the C language the eval-uation order of the arguments is not defined [Int11, p. 4] and is often right to left. If any of the arguments are the results of functions that have side-effects the compiled program might be incorrect due to this difference.

The operators in C, of course, do not work directly on the JavaScript values.

This can be solved by implementing them as functions, but then they might also have the wrong evaluation order.

Another approach would be to make another intermediate language. Specifically we considered using an intermediate language, that we called TAC, because it used a three address code structure. The three address code structure is that the statements of the language are all on one of the three forms:

• a1=a2op a3

• a1=opa2

• a₁=a₂

Here op is an operator, and the as are addresses. The addresses that can be used as addresses in a TAC-statement represent variables, constants, properties and function calls.

This means that the equivalent to JavaScripta=2+2;is simplya= 2 + 2, while a=2+4*5can be translated as

t1= 2 t2= 4∗5 a=t1+t2

With this structure it is easy to make sure that the evaluation order is correct, and the order is also very clear from the representation. To be able to do loops, and conditionals, the TAC also has a jump statement, a conditional jump statement, and labels.

The TAC approach also has advantages over the direct translation when we want to do optimizations. Because there are so few types of statements, and they all are very simple, there are only few cases to consider when doing the optimizations. This makes it is easier to argue that a transformation that the optimizer will do does not change the produced result. The fact that the eval-uation order is expressed explicitly in the representation also makes this much easier.

Furthermore, the classical dataflow algorithms uses control flow graphs to do their computations, and control flow graphs are easy to construct from three address code compared to IR.

During the development, we tried using both approaches. In the end we chose to use the TAC, however, because we wanted to get the evaluation order right, and because we wanted to do optimizations.

4.1.4.1 Our choice of TAC

We had to choose how close the TAC should be to the source and target lan-guages. Since we already had a language, IR, that was close to the target language, we chose to make the TAC closer to the source language. We esti-mated that this would be a good idea, because it would mean that we could for instance infer JavaScript type information on variables, before translating the code to the IR.

Our TAC has statements of the following types:

• Assignments:

– a1=a2

– a₁=op a₂ – a1=a1op a2

• Jumps:

– goto l1

– if a₁goto l₁ – if F alse a1goto l1

• Labels:

– l1

• Returns:

– return a₁

The meaning of the operators is the same as in JavaScript, that is, they do the necessary type coercion to produce the result. For the conditional jumps, the conversion of the condition address to a boolean is also left implicit. That is, if a₁goto l₁ will only jump to thel₁ label if the value represented bya₁can be coerced to true and if F alse a₁ goto l₁ will only jump if it can be coersed to f alse.The coersion is done in the same way as in JavaScript.

The addresses that can be used in the statements are:

• Variables

• Temporary Values

• Property Addresses (e.g. ao[ap])

• Constants

• Function calls (a_f(a₁...a_n)for a call withnparameters)

• Constructor calls (new af(a1...an)for a call withnparameters)

Constants are used for two puporses: Constants, that represent the literals of JavaScript and Object literals / Function expressions. The function expressions and object literals are not really constants as they represent objects, but are modelled as such to simplify the translation. The function constant consists of

its name, its parameters and its body as a list of TAC statements. Furthermore it contains all the annotations from the context handling.

The variables have the same meaning as in JavaScript, such as the semantics about being defined in closures, that is, they are available to be used in function declarations on the same or a deeper level.

The temporary values are used for storing the temporary results that are used when evaluating an expression. These values are therefore not part of the clo-sures.

The PropertyAddresses are used for writing to and reading from properties.

The semantics are the same as when reading from and writing to properties in Javascript. The conversion of the index to a string is also left implicit in the TAC code.

The values that the variables and temporary values represent have the same meaning as in JavaScript.

4.1.4.2 Translation to TAC

When a node from the AST is translated in the compiler, the translation is returned as an object of the type TacResult. The TacResult consists of a field,

"statements", which consists of the statements that the node is translated to, and a field, "result", that contains the TAC address of the result that the statements compute.

We will not explain how every single statement and expression of JavaScript can be translated to TAC, but will show some of the most interesting transfor-mations.

In the following,si will represent the statements of the TAC translation of the AST node i, andai will represent the address of the result.

Binary operation When we translate a binary expression such as e₁+e₂, wheree1 ande2 are expressions themselves, we do it as follows:

First we translatee1 ande2.

The TAC statements for entire operation are then:

s₂

t=a₁+a₂

and the result is the address t, which is a fresh temporary value, that is a temporary value that has not yet been used in the translation of the program.

From this it follows clearly, that the left operand is evaluated before the right, because the statements of the left operand are added before the right hand side.

This will also be the case when we have an expression with several operators, and perhaps parenthesises, because the left operand is always written first in a depth first fashion. Writing the nodes in this way is called a pre-order depth first traversal.

If then else-statement The if then else structure can be written relatively simply in the three address code. An if then else statement consists of a condi-tion, an if-branch and an else-branch.

We translate the structure as follows:

First we translate the condition,c, theif-branch andelse-branch.

The TAC statements for entire operation are then:

s_c

if F alse a_c goto l_else s_if

goto l_{af ter} lelse

selse

laf ter , where the labels are fresh. For the result address we give null, because JavaScript statements do not give values back.

Short circuited logic The ECMAScript Language Specification specifies [ecm11][11.11] that when the the result of the logical or operator is evaluated, the right operand should only be evaluated if the lef t operand evaluates to false. The effect can be illustrated by:

function getTrue(){console.log("true"); return true ;}

function getFalse(){console.log("false"); return false;}

getFalse() || getTrue();

Where the result should be that "false" and "true" should be written in the console, whereas for

getTrue() || getFalse();

would only write "true" in the console. The semantics for the logical and func-tion and for the :?operator are similar.

In the TAC this is easy to translate, because the evaluation order can be con-trolled. It is therefore translated as follows:

slef t

tresult=alef t

if tresultgoto lshortcircuit

sright

tresult=aright

lshortcircuit

wheretresultis the result address.

Constants and functions The literals of the JavaScript language can be translated to the TAC constants described above. For the object literals, we also add property assingments to add the properties.

Function call We have added an address, FunctionCall, for doing function calls. The address contains the address of the Function that is called, and the addresses of the parameters. When we translate a JavaScript function call, we first write out the translations of the parameters, in order from left to right.

Afterwards, we make the FunctionCall address, and sets its parameters to the result addresses from the translated Javascript parameters. Lastly we assign the FunctionCall address to a fresh temporary variable. The variable is returned as the result address of the functionCall.

s₁ ...

t=sf(a1...an)

The reason that we do not simply return the function call as the result ad-dress is that then a JavaScript expression like f(g(), h()) would be translated as f(g(), h()) in the TAC, and then the evaluation order is not explicit. The

reason is also that the call of the function has to be represented by a statement in the TAC if it is called as a statement in JavaScript. This is often done with functions that do not return a value, but change the state of the program. In TAC this can only be done as an assignment because that is the only type of statement, that does not make a jump.

In document Compiling Dynamic Languages (Sider 44-50)