Interpreters, computers, and compilers Course 02131, week 10, SW

(1)

Interpreters, computers, and compilers

Course 02131, week 10, SW

Jørgen Steensgaard-Madsen November 16, 2005

Abstract

Shells, graphical user interfaces, pocket calculators, and domain specific sys- tems (e.g. Mathematica) can all be described as interpreters. Also computers can be described that way, and so can even compilers.

These notes describe the characteristics of interpreters, their internal structure, and how to construct them, possibly using some kind of tool support.

They are intended just as an appetiser for courses that go into the details, e.g. compiler construction, language design, or formal semantics.

Interpreter characteristics

An interpreter is a tool to perform computations according to given directions.

Often an interpreter iterates the actions of accepting an instruction and performing the requested computation, but some just terminates after the reaction to its first instruction (e.g. the Gezel interpreter).

Use of interpreters

• General programming languages (e.g. ML, Scheme, LISP, Prolog, Matlab)

• Transformation (of text)

e.g. L^ATEX, HTML browsers, tth, program ⇒ basic blocks + control

• Scripting languages (e.g. PERL, Tcl/Tk, shell command languages, make)

• Virtual machine implementation (e.g. JVM, .NET, P-code)

• Microprogrammed computers (e.g. Mic-1, Pentium, GIER)

• Preprocessors and ‘macro languages’

• Debuggers (e.g. gdb)

With interpreters, use of compilers is avoided

A compiler does translation — possibly as an interpreter

Organisation of an interpreter

• Any interpreter decodes input and finds amost significant operation It then dispatches control to the definition of the operation

The dispatching can be considered similar to a routine call An interpreter can perform a finite number of operations

Example: matching a while(...){...}-construct there is an operation defined in terms of the construct’s condition and body

• A common code structure for interpreters is while (more) {

switch (get_operation()) { /* analyse and dispatch */

case ... : ...; break; /* perform operation */

...

case exit: more = 0; break;

} }

In simple interpreters get operation()may be a parser routine.

(3)

An FSMD is an interpreter

Decode Dispatch

while (1) {

}

switch(action) {

case 0: x = a; y = b; x != y; break;

case 1: x < y; break;

case 2: y = y − x; break;

case 3: x = x − y; break;

case 4: x != y; break;

case 5: write(x); break;

} c

action

c = datapath(action);

action = A[S];

S = N[c,S];

Looking ahead: Coupled interpreters ∼ coupled FSMDs Possible opcodes could be: (FSMD number, Action number)

Claim: this fits nicely with control structures of high-level languages notincluding the notion of general subroutines

(4)

Recursive descent parsing

An interpreter, like a compiler, handles a formal language, i.e. a set of texts that are called expressions (or programs) of the language. Both typically depend on a scanner to collect characters into tokenswhich are presented to a parser.

The syntax of a language requires that tokens come in certain sequences, and the parser checks a seen sequence against the requirements. Error reporting and error recovery (skipping seen or subsuming forgotten tokens) is also done in a parser.

A grammar states language syntax in a suitable notation, of which several exist and are in actual use. The relation between grammars and parsers for their languages has been extensively studied. Ideally, one wants to generate a parser automatically from a grammar, and several tools of this kind exist.

A semantics for a language is a machine independent description of the computation represented by an expression. An implementation encodes the semantics in some implementation language, often the same as used for the language parser, e.g. C. Developers typically write an implementation such that the encoded semantics is integrated with the parser.

This section presents a simple way to develop an interpreter from a grammar expressed in a notation known as EBNF (Extended Baccus-Naur Form) — where BNF refers to a similar language used to describe an early programming language called Algol 60.

An expression interpreter

An example formal grammar

expression = product { (’+’ | ’-’) product } product = factor{(’*’ | ’/’ | ’%’) factor}

factor = numeral | ’(’ expression ’)’

A language is a set of strings

e.g. C programs are elements of the set known as the C language.

A syntax category ∼ a sublanguage ∼ a subset

e.g. product, factor, and numeral are sublanguages of expression.

EBNF: a language of formal grammers is a meta language The names of sublanguages are sometimes callednon-terminals.

Symbols (terminals) of the target language are quoted, e.g. ’(’ and ’+’.

Unquoted symbols, like =| { and ), are special symbols of the meta language

(5)

A recursive descent parser has a routine for each syntax cat- egory

A grammar rule like: expression = product { (’+’ | ’-’) product } can be transliterated to a corresponding routine in a C program:

myint expr(symset follow) { // never skips a token in follow myint tmp, res; // prepares for use of XLong?

token_t op;

res = product(follow | adder);

while (op = get_present(adder,follow)) { // { adder product } tmp = product(follow | adder);

switch (op) {

case ADD: res += tmp; break;

case SUB: res -= tmp; break;

} };

return res;

}

Error detection and -recovery

Recursive descent parsers can behave fairly good w.r.t. error handling:

• Eachrecognizerhas a parameter representing a set of possible follower tokens.

• Error handling depends onexpected - andacceptable sets of symbols:

static token_t get_present(symset expected,symset follow);

/* Returns current token if in expected, and advances.

* Otherwise NONE==0 is returned perhaps after an error message

******************************************************************/

static int skip_absent(symset start, symset follow);

/* Any token not in ‘start | follow’ will be skipped.

* Returns 1 when resulting current token is in start, 0 otherwise.

*******************************************************************/

Users must consider if a token should be read when get present returns 0.

Pure parsers (with no semantics) may be generated from a grammar, and appropriate sets may be (automatically?) calculated.

(6)

Using a library for long arithmetic of unbounded precision integers

Gezel uses a public mp-library for long arithmetic. We could use it also:

void expr(myint res,symset follow) { // product { adder product } myint tmp;

token_t op;

mpz_init(tmp); // required discipline

product(res,follow | adder);

while (op = get_present(adder,follow)) { product(tmp,follow | adder);

switch (op) {

case ADD: mpz_add(res,res,tmp); break; // result parameters!

case SUB: mpz_sub(res,res,tmp); break;

} };

mpz_clear(tmp); // releases memory

}

expr has been adjusted to use result parameters as do the library routines.

The interpreter routines get marginally longer, but no more complex.

The product routine

myint product(symset follow) { // never skips a token in follow myint res,tmp;

token_t op; // product = factor { multiplier factor } res = factor(follow | multiplier);

while (op = get_present(multiplier,follow)) { tmp = factor(follow | multiplier);

switch(op) {

case MUL: res *= tmp; break;

case DIV:

if (tmp) res /= tmp; else bye("Division by zero\n");

break;

case MOD:

if (tmp) res %= tmp; else bye("Division by zero\n");

break;

} }

return res;

}

(7)

The factor routine

myint factor(symset follow) { myint res=0;

if (skip_absent(primer,follow)) { // current token is in follow fprintf(stderr,"Missing factor at %s\n",symbol_string[token]);

parse_err++;

} else // current token is in primer

switch (token) { case LPAR:

get_token();

res = expr(follow | 1<<RPAR);

get_present(1<<RPAR,follow);

return res;

case NUM:

res = atoi(numeral);

get_token();

return res;

} }

... and its mp-counterpart

void factor(myint res,symset follow) { // constant or (expression) if (skip_absent(primer,follow)) {

fprintf(stderr,"Missing factor at %s\n",symbol_string[token]);

parse_err++;

mpz_init(res); // similar to a zero return

} else

switch (token) { case LPAR:

get_token();

expr(res,follow | 1<<RPAR);

get_present(1<<RPAR,follow);

break;

case NUM:

mpz_init_set_str(res,numeral,10);

get_token();

break;

} }

(8)

The interpreter’s main routine

int main(){

myint res;

printf("Ready for expressions -- Use Ctr-d to exit\n>\n");

get_token();

do {

skip_absent(primer,1<<EOFTOK);

if (token == EOFTOK) break;

parse_err = 0;

res = expr(1<<STOP);

if (parse_err == 0) printf("%i\n",res);

printf(">\n"); get_present(1<<STOP,primer | 1<<EOFTOK);

} while (1);

return 0;

}

Illustration: use of the mp-based interpreter

Ready for expressions -- Use Ctr-d to exit

>

9999999999999999999+8888888888;

10000000008888888887

>

-14;

Skipping unexpected symbols: - 14

>

0-14;

-14

>

8888888888888888 % 777777777;

342222221

>

...

The interpreter needs improvements for actual use, but this is easy

— first modify syntax, then the corresponding routines

— use another library readline to support line-editing.

(9)

Interpreters vs. compilers

A compiler translates programs of a programming language into some other language, in principle into (binary) machine language program, but in practice into some intermediate representation, e.g. assembly code- or object code programs. A binary machine language program can be executed by a computer, and should have an effect as stated by the semantics of the programming language.

An interpreter likewise accepts programs of a programming language, but has the effects stated by the semantics of the language. This means that the concepts of interpreters and compilers are intimately related, and this can be used in tools to construct compilers.

Types play an important rˆole in some programming languages and help programmers be aware of mistakes at an early stage. They can play a similar rˆole in definition of concepts. In physics it is common to check equations by formally com- puting ’dimensions’ (i.e. units of measurements) of both sides, which should give the same, and this is closely related to use of types.

One kind of type is somewhat special in this context: the type of functions.

In programming languages such types correspond to interfaces (or C-prototypes), and some languages like ML has a type sublanguage that includes function types.

Whether to include function types in a type sublanguage of a programming language is an important design issue, but for discussions about programming languages the notion of function types is of great value.

Types of interpreters and compilers

• The type of interpreters: I ≡ E → V

Where E is the type ofexpressions and V of values

• An expression may be: p(d) (of typeE of course) with da constant of type D Given an appropriate interpreter I :I we can obtain I(p(d)) of type V

We may strive for semantic compositionality such that I(p(d)) = I(p)(I(d)), i.e. I(p) : D → V and I(d) : D, so that V =D → V | D, which is more complicated that it may look. Often one is sloppy and says p : D → V and d:D

• A simplified type of compilersis: C ≡ E → D → V with programs p∈ E Where D → V is the type ofobject programs (and D is the type of inputs) The complexity associated with separate compilation is disregarded

• A physical machine is essentially an interpreter for (object programs, inputs):

M(C(p), d) =I(p(d))

(10)

Conceptual parts of interpreters

• Typical tasks performed by an interpreter can be broken down into:

– Scanner: lexical analysis (identify tokens like numerals and names) – Parser: syntax analysis (build an AST: an abstract syntax tree)

– Static semantic analysis (e.g. type checking)

– Code generation (possibly just use the AST)

– Execution: dynamic semantics (decode and dispatch over the ’code’)

• Construction

– Tools may be too complex for simple parsers and scanners.

– Recursive descent parsers are easy to write.

Anabstract syntax makes it easy to build an AST.

An abstract syntax is typically a union that match theconcrete syntax.

– A scanner is essentially an FSM and may be easy to write directly.

– Type checking can be quite complex, but should not be dropped.

Tools for construction of interpreters

• lex and yacc, or flex and bison, are quite complex – These require internal actions to be written in C

– Similar tools may exist if C is not your language of choice (Java, ML,...) – Error reporting and error recovery is often problematic.

– Invariably interpreters and semantics depend on another language – Compiled languages likewise depend on assembly/machine code

• A locally developed tool, called dulce, uses a different, new approach:

– A scanner, parser, and type checking exist for a suitablepre-language i.e. a language with little semantics, but a fixed pattern of syntax.

– Actual languages are defined relatively as sublanguages with semantics so sublanguages share syntax and type checking, but vary in operations.

– Semantics consists of highly independentcomponentswritten as C-routines all with substantial support by use of make.

– Semantics may also be defined by combination of known operations.

– One might use Lisp in a similar way for both prelanguage and semantics.

(11)

Breakpoint interpretation

Programmers often use a debugger during program development. Such a program depends onbreakpoinsinserted in a program by the developper. With an appropriate compiler, the meaning of a breakpoint is an interpreter executing in the context of the position of the breakpoint, i.e. variable and operation names are recognised by the interpreter and associated with their meaning in the context.

The language accepted by the interpreter at a breakpoint is most likely the same as the language accepted by the compiler, or at least a substantial subset of it. If the notion of a breakpoint is combined with an interpreter for a programming language, the interpreter at a breakpoint will appear as being an extension of the original language. It implies that the notion of a breakpoint is not necessarily associated with debugging of programs, but can be seen as a means to implement language extensions. Users of a language extended in this way may perceive the interpreter as any other interpreter for a language.

A tool, dulce, exists for construction of interpreters that accept languages with breakpoints as described. It has been developed to demonstrate that interpreters (and compilers) may be constructed without the need for developers to care about scanners and parsers, as well as a number of other aspects of language implementation. It has been used in applications that have been presented in the course, and will be detailed in the next section.

A very small dulce interpreter

Example: extend a predefined set of operations and allow all to be used.

{ DEF auxdiv(x:[int,int,int]):int { ... };

DEF fibdiv(Dd:int,Dr:int):int { auxdiv((Dd,Dr,1)) };

DEF fibmod(Dd:int,Dr:int):int { auxdiv((Dd,Dr,0)) };

loop{INTERPRET}; # semantics of INTERPRET is like a breakpoint };The above is itself a program to be interpreted by a dulce interpreter. It defines the semantics of three new operations in terms of some predefined ones. The form of auxdiv prepares for a later version, which is able to translate into IJVM assembly code. Semantics of predefined operations is ultimately written as C routines. The tool provides extensive help to write them. A possible user’s session is:

fib> fibdiv(109,22);

4

fib> fibmod(109,22);

21

fib> auxdiv((109,22,1));

4

(12)

Translation into FSM-like control

A compiler can be broken down into essentially the same components as an interpreter. The difference is that code generation has to be perceived in a more concrete way: it may be text that an assembler (which is just another kind of compiler) has to manipulate further. Eventually, the output will have a form that can be interpreted by a computer considered as an interpreter of its machine operations (i.e. the specific bit patterns that cause the computer to perform one of its native operations).

Programs are composed of computationsandcontrol, i.e. arithmetic and branch- ing. The split is reflected in the FSMD model of hardware: computations are performed by the Datapath (D) component, the control by a Finite State Machine (FSM) component. In assembly code the computations are expressed in terms of a subset of operations that contains no jumps other than routine calls, whereas control is expressed in terms of jumps that may depend on a recorded status (from comparisons, for instance).

Companion notes describe issues related to translation of computations into various kinds of ‘code’: stack operations, RISC-type assembly instructions, and spe- cialised hardware described as data flow. Here the focus is on control, and for illustration a simple language will be used that abstracts away from the details of computations.

A dulce interpreter to extract control and basic blocks

A more complex interpreter exists that can be used to extract basic blocks from programs written in a language with just if- and while statements. Although more complex, it is still described as a program in a language for a dulce interpreter.

‘Syntax’

[ cmd(S:string):Cmd ] [ test(S:string):Test ]

[ if (Cond:Test) [True:Cmd] [False:Cmd] ] [ while (Cond:Test) [Body:Cmd] ]

Sample program text(GCD) { cmd("x := a"); cmd("y := b");

while (test("x != y")) { if (test("x < y")) then

{ cmd("y := y-x"); } else

{ cmd("x := x-y"); } }

};

The syntax suffices to describe if and while asoperations.

The general syntax pattern allows the them to be used as in the program.

Expressions appear as strings, because their internals are uninteresting here.

The operations cmdand test just serve to classify expressions.

(13)

The interpreter in action

The GCD example above can in a first step be translated into FSMD(

ring([(0,ring(["x := a","y := b","x != y"])), (1,ring(["x < y"])),

(3,ring(["y := y-x"])), (4,ring(["x := x-y"])), (5,ring(["x != y"])), (2,ring(["exit"]))]),

ring([(0,1,2),(5,1,2),(1,3,4),(4,5,-1),(3,5,-1)]));

This can be read as one composite value consisting of two parts:

1. An enumeration of basic blocks represented as lists of strings (organised to allow easy reversal).

2. A representation of the control graph as a list of nodes,

each connected to two others (with -1 denoting absense of a node).

Basic blocks presented as in assembly code

A final step can then produce a more conventional description of an FSMD:

’Code’ for action A_Sk (’invoked’ in state Sk) A_S0:

x := a;

y := b;

x != y;

A_S1:

x < y;

A_S3:

y := y-x;

A_S4:

x := x-y;

A_S5:

x != y;

A_S2:

exit;

Control jumps

# entry point

je A_S2 jgt A_S4 j A_S5 jne A_S1

halt Labelled basic blocks are presented as in assembly code,

to which jumps of various kinds have been added manually to express control.

(14)

Control expressed as a FSM

Operation: (A_Sk) ==> (A_Si)

with Si = NextState(Sk,Status(A_Sk)) starting from S0

NextState:

S0 ==> S1, when Status(A_S0)=1, else S2 S5 ==> S1, when Status(A_S5)=1, else S2 S1 ==> S3, when Status(A_S1)=1, else S4 S4 ==> S5

S3 ==> S5 S2 ==> halt

================================

This is presented as a Moore-type FSM, with actions associated with nodes. How- ever, the same result obtained from the first step can also be translated into a Mealy-type machine, which is the case in a later, and more elaborate, version of the interpreter.

Simple semantics

The interpreter maintains a state that holds acurrent basic block,bb.r. The current basic block is saved and a new initialised when savebb is used.

Control structure information is saved in graph.r, which holds a list of elements and also belongs to the state,

(basic block, destination_true, destination_{f alse})

each corresponding to a graph node with two outgoing edges.

A current.r holds an identification of a label (or node) in graph.r The dispatcher-operation that matches cmd (and test)

# cmd [ cmd(S:string):Cmd ]

{ S :: bb.r } # extend the current basic block

# test [ test(S:string):Test ]

{ S :: bb.r } # extend the current basic block In both cases the operations are defined in terms of S which denotes a string. The representation of a basic block is thus simply a list of strings.

(15)

The extractor’s while-operation

# while [ while (Cond:Test) [Body:Cmd] ]

{ new (2) S_w; # reserves two integers for labels Cond;

with (savebb) init;

curlab.l := S_w.first;

Body; Cond;

savebb;

(curlab.r,S_w.first,S_w.first+1) :: graph.r;

(init.val,S_w.first,S_w.first+1) :: graph.r;

curlab.l := S_w.first+1;

};

Two labels (or node identifiers) are needed to represent the control structure of a while, so two are reserved ’at entry’ i.e. each time a while-operation is seen. This corresponds to a marking of the syntax as proposed in another note, i.e.

while (Cond _/ ) { _. Body }; _.

The code of the condition is included in whatever basic block is current at entry, but then saved and a new started. Saving a basic block by savebb implies that the value of a current label is returned, so that in this case it can be referenced as init.val. Please remember, that the marking patterns for basic blocks to some de- gree expresses a persons choice about for instance whether to consider the condition of a while as a basic block by itself.

The program for while refers to the condition and the computation to iterate by the namesCondandBody, respectively. Both could be perceived as routines that require no arguments, i.e. a reference to Cond might in an equivalent C program have to be expressed as Cond(). How to write such calls in a language is a minor language decision.

Each reference toCond, and likewise toBody, will contribute to an internal state and eventually to the resulting code.

Subsequently, the Bodyand theCond are analysed, i.e. the state is updated with information about their basic blocks and control structures. Before that, however, one of the two reserved labels is assigned as the current label. After the analysis, the current basic block is saved and graph.r is updated. Finally the current label is set to the other of the two reserved labels.

Updating graph.r adds a tripple of labels to a list, which represents a state

(16)

’next state’. In this case the updates tell that after either evaluation of Cond the next state can be the entry to theBody or one that will be determined later.

The association of a label with a basic block is actually done by savebb, which saves the current basic block and associates it with the value of curlab.r when it was initialised by a previous call use of savebb.

Note that savebb like Cond and Body behaves like a call to a C-routine with no arguments.

The if-operation

This is made slightly more complex to illustrate a first approach at optimisation.

A more extensive optimisation has been used in a successor (Zebra) to the present version, so the optimisation is not really justified by practice. Anyway, it is retained and its purpose is to avoid excessive use of labels when at least one of the branches in an application of if is empty.

The optimisation is realised by a variant of savebb, called save nonempty which requires two label arguments. Sometimes there is no need to save the current basic block, and one of the two labels given as arguments will be returned accordingly.

The ordinary case is to save and return the label given as first argument, so the unoptimised version would just replace the two uses ofsave_nonempty by savebb.

# if

{ new(3) S_if;

Cond;

with (savebb) init;

curlab.l := S_if.first;

True;

with (save_nonempty(S_if.first,S_if.first+2)) outT;

curlab.l := S_if.first+1;

False;

with (save_nonempty(S_if.first+1,S_if.first+2)) outF;

curlab.l := S_if.first+2; # save start of current BB (init.val,outT.val,outF.val) :: graph.r; # add node to FSM }

Overall organisation of the interpreter

The interpreter for the basic block extractor is defined in terms of operations for which semantics already exists, so it does not differ much from an ordinary program.

Some details, especially about semantics have beed described, but of course there are

(17)

such as savebb, has only been described informally. But the structure of the entire program can now be outlined.

Descriptions of if andwhileoperations differs slightly from those shown above.

The T in their descriptions denote some unknown type, so every if operation must have two branches that results in the same type. In other words: this if-operation combines the qualities of C’s if-statement and its ?-expression. The rˆole of T in the description of while is to let the type of the iterated part by any type — it will be discarded anyway.

This can be an advantage sometimes, but is not so for the LIJVM language, for instance. It all depends on the rˆole a language designer decides that types should have.

program{ # ...

DEF basicblocks

[ Syntax OF Cmd, Test [ cmd(S:string):Cmd ] [ test(S:string):Test ]

[ if OF T(Cond:Test)[True:T][False:T] ] [ while OF T(Cond:Test)[Body:T] ] ]

{ ####### (... initialisation); definition of semantics #############

var bb; bb.l := ring([]); # the basic block being built var graph; graph.l := ring([]); # branch nodes: (id,left,right) var curlab; curlab.l := S.first; # label of current BB

Syntax # NB: decode and dispatch

cmd: { S :: bb.r } # extend the current BB

test: { S :: bb.r } # ditto

# if ...

# while ...

################ output of the ‘state’ ############################

# ...

};

######################### application ##################################

loop{ basicblocks(INTERPRET); };

};

This interpreter illustrates actions of a traditional compiler. The interpreted language with sufficiently expressive, predefined operations is used to write a program that builds an internal representation (an abstract syntax tree) of the program and generates code for some target language from it.

Gezel code can be generated from a language that similarly abstracts from the details of computations. This is done by a more elaborate version of the interpreter, called Zebra.

Abstraction from computations can be convenient, but is in general unrealistic.

The next section tells about an interpreter that translates programs into IJVM assembly code without depending on such an abstraction.

(18)

LIJVM and its compiler

The IJVM computer does not provide usual support for multiplication, division, and shift operations, which makes it hard to write multiplication and division routines.

As shown in a companion note, it can be done by use of Fibonacci numbers. A special high-level language, LIJVM, allows these to be expressed, and a compiler for it can translate into assembly code for IJVM.

The compiler for LIJVM has been written to illustrate yet another implementation method for compilers. It relates to a concept from the theory of compilers, called abstract interpretation, but it is rarely used in actual construction. It is introduced to illustrate the similarities and differences between compilers and interpreters as well as to stimulate interests in theoretical aspects of compiler construction.

Interpretation vs. compilation

Simple interpretation of constants and expressions demo> 44;

44

demo> 67+22;

89

Semicolon terminates an expression.

A constant denote its value A composite expression is evaluated to a value Compilation of constants and expressions

lijvm> { 44; };

.constant objref 0xCAFE .end-constant .main

BIPUSH 44 POP

.end-main

Complication to be justified by the next example Assembly code is the value of an expression!

This is the actual value of 44

and this is caused by the semicolon!

Types may represent many properties lijvm> 44;

--- Error: Programs must have type Void

The semicolon in this case just indicates the end of an expression, which might otherwise have continued on the next line. Type semantics can be more complex than just a simple categorisation of values. In LIJVM the type Void is used to ensure stack consistency through a compile-time check. It implies that constants are not accepted as complete programs, but have to be wrapped as illustrated above.

(19)

Operators combine assembly code snippets lijvm> {67+22;};

BIPUSH 67 BIPUSH 22 IADD POP .end-main

An expression to translate The required opening

Value of 67 Value of 22

Combined by the semantics of +

A semicolon to consume a value from the stack

If we disregard the wrapping of the essential part of the assembly code, we find that the code agrees completely with the expectations according to Section 5.4.8 in Tanenbaum’s Structured Computer Organization. The association to expression interpretation has been emphasised in the lectureHardware oriented programming.

Details of compilation

Below is a small section of program text from an LIJVM compiler, focusing on translation of expressions as illustrated. The parts labelledadd: andsub: expresses how operators + and -, respectively, are translated. They both refer to their operands as leftand right.

Members

# --- Conversion of integer constants to code int: { if(X<255) { enc(" BIPUSH " X "\n") }

else {

obj_count.l += 1; objects.l ::= X;

enc(" LDC_W c" obj_count.r "\n") }

}

# --- Arithmetic, infix operators: + -

add: { enc(((dec_1(left)) dec_1(right) " IADD\n")) } sub: { enc(((dec_1(left)) dec_1(right) " ISUB\n")) } The enc routine maps strings into the type of assembly code.

dec 1maps assembly code to strings.

(20)

Corresponding description of syntax

[ Members OF Void, Cond

[ int(X:int): int Class ] # repr. of constants [ +(int Class,int Class): int Class ] # yes: addition [ -(int Class,int Class): int Class ] # subtraction ...

[ while (C: Cond) [ Body ]: Void ] # A usual while loop ...

[ ‘;’ (int Class,NONE): Void ] # pops a value [ ‘;’ (Void,NONE): Void ] # no action ...

]

Syntax is defined like an ‘interface’ of ‘member operations’.

Assembly code that leaves an int on the stack has typeint Class.

Assembly code that consumes an intfrom the stack has type Void.

Hack: int(X:int) is invoked by default on integer constants in the source text!

Hack: semicolon may be defined as an explicit operation.

Semantics of while

The interpretation of a while statement refers to the condition and the part to iterate as C and Body, respectively. Otherwise it is just slightly more complex than addition. It appears as just another labelled part under Members.

while: { with (lab.r) fst; lab.l += 2;

TFstack.l(TFtop.r) := (fst.val,fst.val+1);

TFtop.l += 1;

enc(

dec_1(C) # Condition snippet (see below)

"L" fst.val "\n" # Iteration label dec_1{Body; Nop.val} # Body snippet

" GOTO L" fst.val

"\nL" (fst.val+1) "\n" # Termination label )

}

An assembly code snippet for a condition ends with a conditional jump.

A false condition jumps to the second of a pair of labels on TFstack.

(21)

Translation example

lijvm 1> while (1 != 0) {1+2; 4;};

// while BIPUSH 1

BIPUSH 0 IF_ICMPEQ L1

L0: // do

BIPUSH 1 BIPUSH 2 IADD POP BIPUSH 4 POP GOTO L0

L1: // end-while

.end-main

Abstract interpretation in general

• Constants (and computed values) belong to particular ’abstract domains’.

• Type checking is abstract interpretation with a one-value domain for every type.

e.g. 4 + 6 is v_int+v_int which is interpreted as v_int.

• Values type correct ortype incorrect are program analysis results.

• Other kinds of program analysis results may be obtained by abstract interpretation.

Interpreters, computers, and compilers Course 02131, week 10, SW