A Denotational Framework for Data Flow Analysis

(1)

Acta Informatica 18, 265-287 (1982)

9 Springer-Verlag 1982

A Denotational Framework for Data Flow Analysis

Flemming Nielson

Department of Computer Science, James Clerk Maxwell Building, The King's Buildings Mayfield Road, Edinburgh EH9 3JZ, GB

Summary. It is shown how to express data flow analysis in a denotational framework by means of abstract interpretation. A continuation style formulation naturally leads to the M O P (Meet Over all Paths) solution, whereas a direct style formulation leads to the M F P (Maximal Fixed Point) solution.

O. Introduction

In this paper data flow analysis is treated from a semantic point of view. Data flow analysis is formulated in a denotational framework and in doing so the method of abstract interpretation is used.

Data flow analysis ([1, 7, 15]) associates properties (data flow information) with program points. The intention is that a property associated with some point must be satisfied in every execution of the program. This is because data flow information usually is used for applications like transforming a program to improve its efficiency. Therefore semantic considerations are needed to ensure the safeness of using the data flow information, e.g. to assure that the program transformation is meaning preserving. Data flow analysis is usually treated in an operational approach that is not syntax-directed, although there are exceptions (e.g. [13]). In some papers, e.g. [3] and [2], the semantics of data flow analysis is considered in this setting.

Here a denotational approach will be used where semantic functions are

" h o m o m o r p h i s m s " from syntax to denotations. The motivation for doing so is two-fold. First, the relative merits of an operational versus a denotational approach are not obvious. Previous formulations of data flow analysis in a denotational setting ([5, 4]) have been more ad hoc than the existing operational methods. Also the connection with the usual solutions ( M F P and MOP) has not been established. In this paper these two issues are addressed. A second motivation for the denotational approach is to use the data flow information to validate program transformations. The literature on data flow analysis and

" p r o g r a m optimization" hardly considers the issue at all. It turns out ([12, 11])

0001-5903/82/0018/0265/$04.60

(2)

that the a p p r o a c h to data flow analysis developed in this p a p e r forms a suitable platform for doing so.

In Sect. 2 it is described how to systematically develop non-standard denotational semantics specifying data flow information. This is achieved by a series of non-standard semantics leading to a formulation in continuation style. In Sect. 3 it is shown that this formulation yields the M O P (" meet over all paths") solution.

Also the M F P (" maximal fixed point") solution is considered and it is shown how it can be obtained using a direct style formulation. Section 4 contains the con- clusions.

1. Preliminaries

This paper builds on concepts from denotational semantics and data flow analysis (including abstract interpretation). T h e presupposed knowledge of data flow analysis is modest and some of the essential concepts will be reviewed before they are used. In defining semantic equations the notation of [14] and [10] is used, although the domains are not complete lattices but cpo's (as in [9]). Complete lattices are used when data flow analysis is specified. Below some fundamental notions a n d n o n - s t a n d a r d notation (__> __>, - - , - t > , - m > , - c > , - a > ) are explained.

A partially ordered set (S, _ ) is a set S with partial order ___, i.e. _ is a reflexive, antisymmetric and transitive relation on S. F o r S' _c S there m a y exist a (necessarily unique) least upper bound U S ' in S such that V s ~ S : ( s ~ l l S ' ~ V s ' ~ S ' : s ~ _ s ' ) . When S ' = {s 1, s2} one often writes s l u s 2 instead of [AS'. Dually a greatest lower bound R S ' m a y exist. A n o n - e m p t y subset S'_~ S is a chain if S' is countable and sl, s2~S' =~ (sl ~_s 2 v s 2 E s O. An element s ~ S is maximal if V s' ~S: (s'~_s =~ s'=s).

A partially ordered set is a cpo if it has a least element ( l = J-IS) and any chain has a least upper bound. It is a complete lattice if all U S ' and r-Is' exist; then there also is a greatest element (T = IIS). The word domain will be used b o t h for cpo's and complete lattices, and elements of some d o m a i n S are denoted s, s', sl etc.

A domain is f l a t if any chain contains at m o s t 2 elements, and is of f i n i t e height if any chain is finite.

D o m a i n s N, Q and T are flat cpo's of natural numbers, quotations and truth values. F r o m cpo's $1 . . . S, one can construct the separated sum $1 + ... + S , . This is a cpo with a new least element and injection functions in S i, enquiry functions E S / a n d projection functions [S i. The cartesian product S 1 x ... x S, is a cpo with selection functions ~,i. The cpo S* of lists is {( )} + S + ( S x S ) + ....

Function :~ yields the length of a list, function -~i removes the first i elements and w concatenates lists. The complete lattice S O is obtained from a set S by adjoining least and greatest elements. Byr is m e a n t the power-set of S with set inclusion as partial order. Sometimes a set is regarded as a partially ordered set whose partial order is equality.

All functions are assumed to be total. F o r partially ordered sets S and S' the set of (total) functions from S to S' is denoted S - t > S'. The set of monotone (isotone) functions from S to S' is denoted S - m > S ' a n d consists of those f ~S - t > S' that satisfy s 1 _ s 2 ~ f (sl)-- f (s2). A function f ~S - t > S' is continuous if f ( I IS")=I I { f ( s ) ] s ~ S " } holds for any chain S"c_S whose least upper bound

(3)

A Denotational Framework for Data Flow Analysis 267

exists. The set of continuous functions from S to S' is denoted by S - c > S'. A function f ~ S - t > S' is additive (a complete-u-morphism) if f (l lS") = U { f (s) l s~S"}

for any subset S" _ S whose least upper b o u n d exists. The set of additive functions from S to S' is denoted by S - a > S'. A n y subset of S - t > S' is partially ordered by f l ~ - j 2 ~ V s ~ S : fl(s)EJ2(s), and if S' is a cpo or complete lattice the same holds for S - t > S', S - m > S', S - c > S', S - a > S'.

An element s ~S is a fixed point of f s S - t > S if f ( s ) = s. When S is partially ordered it is the least fixed point provided it is a fixed point and s' = f(s') ~ s'~_ s.

F o r S a complete lattice and f ~ S - m > S the least fixed point always exists and is given by L F P ( f ) = i--]{s[f(s)r-s}. If f ~ S - c > S then L F P ( f ) = F I X ( f ) where F I X ( f ) = l l { f " ( l ) l n > O }. If S is only a cpo but f e S - c > S then F I X ( f ) still is the least fixed point of f.

T h e symbol = = denotes a continuous equality predicate (S x S - c > T), whereas = is reserved for true equality. So true = = • is • whereas true = • is false. When S is of finite height it is assumed that s x = = s 2 is _1_ if one of s 1, s 2 is n o n - m a x i m a l and equals sl = s 2 otherwise. Similarly > > is the continuous ex- tension of > (the predicate 'greater than or equal t o ' on the integers). The condi- tional t --*s~, s 2 is sl, s 2 or l depending on whether t is true, false or • By f l y ~ x ] is meant 2 z . z = = x ~ y , f ( z ) . Braces { and } are used to construct sets and to enclose continuations but the context should m a k e clear which is intended.

2. The Framework

In this section it is shown h o w data flow analysis can be expressed in a denotational framework. The development is performed for a toy language and consists of defining a series of non-standard semantics:

7W6-] ... ~ ~ 7 7 n ~ Fig. 1.

All the semantics are in continuation style. Semantics ind is the desired formulation of data flow analysis. Semantics sto is a store semantics [10] that is taken to be the canonical definition of the language considered. It will later be motivated why store semantics and continuation style are used.

The toy language consists of c o m m a n d s (syntactic category Cmd), expressions (Exp), identifiers (Ide), operators (Ope) and basic values (Bas). The syntax of c o m m a n d s and expressions can be deduced from Table 1, that will be explained shortly. F o r the remaining syntactic categories it is left unspecified. The syntactic categories are viewed as being sets or partially ordered sets whose partial order is equality.

Store Semantics

The store semantics (sto) is given by Tables 1, 2 and 3 together. The semantic functions cg (for commands) a n d 8 (for expressions) are defined in Table 1. The domains and auxiliary functions needed in Table 1 are defined in Tables 2 a n d 3.

(4)

Table 1. S e m a n t i c f u n c t i o n s

~ 9 c > C - c > C cg llcmdt; cmd2]] c =

<gllcmdl]] {~fllcmd2]] {c}}

cgllide: = e x p ] c =

gllexpll {assign []-ide]l {c}}

cs exp then c m d l else cmd2 fill c =

g~expll {cond(~ll-cmdl]l {c}, cgllcmd2]l {c})}

llwhile exp do c m d odll c =

F I X (2 c'. g llexpll {cond( cg llcmdll {c'}, c)}) :g llwrite exp]l c =

g llexp]l {write {c}}

cg llread idell c =

read {assign llidell {c}}

o ~ 9 C - c > C [exp~ ope exp211 c =

811expl]l {d~llexp2]l {apply llope]l {c}}}

d ~ llide]] c =

c o n t e n t llide]] {c}

g [[basll c = p u s h llbas]J {c}

Table 2. S o m e d o m a i n s a n d auxiliary f u n c t i o n s D o m a i n s

Val = T + N + . . . + {" nil"}

E n v = Ide - c > Val Inp = Val*

O u t = Val*

Tern = Val*

Sta = E n v x I n p x O u t x T e m Occ = N*

Pla = O c c x Q A u x i l i a r y f u n c t i o n s

apply [lope[[ 9 C - c > C

a p p l y [lope[[ = do (V apply llope], B apply llope]l)

V apply llope]] 9 Sta - c > T ("verify") V apply llope]l = 2 ( e n v , inp, out, t e m ) . # t e r n > > 2 B apply llope]] 9 Sta - c > Sta (" b o d y " ) B apply llope]] = 2 ( e n v , inp, out, t e r n ) .

( e n v , inp, out, @) llope]] (tern,L2, t e m + l ) ) w assign llide]], c o n t e n t llide]], p u s h llbas]], read, write

defined similarly Auxiliary f u n c t i o n s u s e d in the c o n d i t i o n a l

V c o n d 9 Sta - c > T (" verify")

V c o n d = 2 ( e n v , inp, out, t e m ) - ~ t e m > > 1 --* tem~ 1 E T, false

S c o n d 9 Sta - c > T ("select")

S c o n d = 2 ( e n v , inp, out, t e r n ) . ( t e m l 1) I T

B c o n d 9 Sta - c > Sta (" b o d y " ) B c o n d = 2 ( e n v , inp, out, t e r n ) . ( e n v , inp, out, tern ? 1)

values e n v i r o n m e n t s i n p u t s o u t p u t s

t e m p o r a r y result stacks states

occurrences places

(5)

A D e n o t a t i o n a l F r a m e w o r k for D a t a F l o w A n a l y s i s

T a b l e 3. I n t e r p r e t a t i o n sto

269

D o m a i n s

S = S t a s t a t e s

A = O u t + {" e r r o r " } a n s w e r s C = S - c > A c o n t i n u a t i o n s C o n s t a n t

fin ~ C fin = ). ( e n v , i n p , o u t , t e m ) - o u t in A A u x i l i a r y f u n c t i o n s

c o n d c C • C - c > C

c o n d (cl, c2) = 2 s t a . F c o n d (sta) ~ IS c o n d ( s t a ) ~ c l , c2] (B c o n d sta) , " e r r o r " in A

a t t a c h ~ P l a - c > C - c > C a t t a c h (pla) c = c

d o ~ ( S t a - c > T ) • (Sta - c > S t a ) - c > C - c > C d o ( V g , B g ) c = 2 s t a . V g ( s t a ) ~ c ( B g sta), " e r r o r " i n A

(For the present ignore domains Occ and Pla and function attach.) The definition of apply (and push) makes use of the undefined semantic function (9 for operators (and ~ for basic values). Domain Sta is the domain of states. The connection between identifiers and their values is established without using locations but this is not important for the development to go through. The component Tern is used to hold temporary results arising during evaluation of expressions. This is done by the ordinary method of evaluating expressions on a stack. As an example consider apply[ope]](c)(env, inp, out, tem). If two temporary results are on top of tern they are replaced by their result and the new state is supplied to c. If tern is not of the expected form an error occurs. The reader acquainted with store semantics [10] should find it straight-forward to supply the omitted definitions.

Table 3 is labelled "interpretation sto" because it is essentially by supplying replacements for Table 3 that the remaining semantics (col, sts and ind) are defined. To indicate which semantics is meant, a suffix may be used, e.g. cg sto.

So Cg sto[[cmd~ (fin-sto) (2 ide 9 "nil" in Val, inp, ( ) , ( ) ) specifies the effect of executing program cmd with input inp. For Tables 1, 2 and 3 it is presumably obvious that the functionalities shown are correct. For the remaining semantics the proofs of correctness of functionalities are straight-forward and therefore omitted.

Collecting Semantics

For data flow analysis purposes an important concept is that of associating information with a program point. This concept is not expressed in the store semantics and it is therefore convenient to add it to sto, yielding a collecting semantics (col).

(6)

A program point can be specified by a tuple (occ, q ) e P l a . All the tuples to be considered will be maximal (with respect to the partial ordering). Occurrences (like occ) are used to label nodes of the parse tree. The root is labelled ( ) and the i'th son of a node labelled occ is labelled occ w (i). The quotation q is useful for specifying whether the program point is to the left or to the right of the node:

o c c

t' t

Fig. 2.

In the usual view of parse-trees the nodes are not labelled by occurrences.

To be able to let the semantic equations associate information with program points (represented by tuples like (occ, q)) it is necessary to supply the appropriate occurrence as an additional argument to the semantic functions ~ and ~. In this way occurrences are used much like the positions of [6]. Furthermore the semantic equations must be augmented with functions associating information with the program points. The function attach (occ, q) is used for associating information with the program point specified by (occ, q). The result of performing these changes is sketched in Table 4. The proofs of Theorems 4 and 5 benefit from the chosen placement of attach. Because of attach in Table 3 the store semantics can be defined using Table 4 instead of Table 1.

In the collecting semantics (Tables 4, 2 and 5) the information to be associated with a program point is the set of states the program can be in whenever control reaches that point. Domain A - col = P l a - c >~(Sta) is used for that and attach -col(pla) is defined accordingly. This choice of A is not the only possibility, e.g. A =(Pla x Sta)* of [5] can be used instead. While this may give more information it is not needed for a large class of data flow analyses (including those usually handled by means of abstract interpretation). It will later be discussed why the continuations ( C - col) are not continuous.

Intuitively sto and col are closely related, because essentially only domain A is different. A connection is formally expressed by:

Theorem 1. For any staeSta, c m d e C m d and o u t 6 O u t :

<g sto ~cmd]] ( ) fin-sto sta = out in A

3 sta'eSta: sta'J,4 = o u t A cg CO1 [[cmd]] ( ) fin-col sta ( ( ) , " R " ) = {sta'}

(7)

A Denotational F r a m e w o r k for D a t a Flow Analysis

Table 4. Modified semantic functions

271

c g e C m d - c > O c c - c > C - c > C

cr Rif exp then c m d l else c m d 2 ti]l occ c = attach (occ, " L " ) {

~ [[exp]l occw ( 1 ) {

cond(~g I[cmdl] occ w ( 2 ) {attach (occ, " R " ) {c}}

, cg [[cmd2]l occ w ( 3 ) {attach (occ, " R " ) {c}})}}

~while exp do c m d od] ^{occ c =}

attach (occ, " L " ) { FIX(2c' 9 d" I[exp]] occw ( 1 ) {

cond (~ I]-cmd]l occ w ( 2 ) {c'}

, attach (occ, " R " ) {c})})}

the remaining clauses are changed similarly to the one for expl o p e exp2 g ~ E x p - c > O c c - c > C - c > C

8 nexpx ope exp2]] occ c = attach (occ, " L " ) { 8 [[expl] occ w ( 1 ) { 8 I[exp2]l occ w ( 3 ) { apply [[ ope]l {

attach (occ, " R " ) {c}}}}}

the remaining clauses are changed similarly to the one for exp~ ope exp2

Table 5. Interpretation col D o m a i n s

S = Sta

A = P l a - c > ~ ( S t a ) C = S - t > A C o n s t a n t

f i n e C f i n = Z Auxiliary functions

c o n d e C x C - c > C

cond (cl, c2) = 2 s t a . V cond (sta) -~ IS cond (sta) -~ cl, c2] (B cond sta), 3_

a t t a c h e P l a - c > C - c > C

attach (pla) c = 2 s t a . c (sta) ~ 3_ [{sta}/pla]

do e(Sta - c > T) x (Sta - c > S t a ) - t > C - c > C do(Vg, Bg) c = 2 s t a . Vg(sta)-~c(Bg sta), l

Proof.

T h e p r o o f is by structural i n d u c t i o n [14] m a k i n g use o f the predicate P - C s C - s t o x C - c o l - t > {true, false} defined by P - C ( c - s t o , c - c o l ) = V s t a e S t a : V o u t e O u t :

[c - sto (sta) = o u t in A r

sta' e Sta: sta' $4 = out ^ c - col(sta) ( ( ( ) , " R " ) ) = {sta'}].

(8)

By structural induction on cmd (and likewise exp) it is shown that P - C ( c - sto, c - c o l ) / x occ #:( )

=> P - C(Cs sto [[cmd]l occ c - sto, cg col I[cmd]] occ c - c o l ) . Similarly, by case analysis of cmd it can be shown that

P - C(Cg sto [[cmd]] ( ) fin - s t o , cg col I[cmd]] ( ) f i n - c o l ) using that

P - C (attach - sto ( ( ( ) , "R ")) {fin - sto}, attach - c o l ( ( ( ) , " R " ) ) {fin - col}).

The interesting case of the p r o o f is that of the while loop. Abbreviate a g [cl] = 2 c 2 . 8 [[exp]] occw ( 1 ) {cond(Cg [[cmd]] occw ( 2 ) {c2}, cl) } and assume P - C ( c 1 - s t o , c a - c o l ) . It is to be shown that

P - C ( F I X ( g - s t o [ c 1 - sto]), FIX (g - col [cl - col])).

It is easy to establish Y n > 0 : P - C ( ( g - s t o [c 1 -sto])"_t_, ( g - c o l [c 1 - c o l ] ) " / ) but this does not immediately yield the result. The result follows from

V sta 3 no V n > n o : (g - s t o [ c I - sto])" • sta = (IA {(g - sto [-C 1 - - sto])" • I n > 0}) sta and a similar equation for col. To establish the above equation it suffices to show (dropping suffix sto)

V sta: [(g [cl])" • sta # • ~ r e 2 : (g [cl])" c2 sta = (g [cl])" l sta].

This is because c 2 = g [cl] l implies that (g [cl])" c 2 is (g [cl]) "+ 1 L. P r o o f is by induction in n and the case n = 0 is trivial so consider the induction step. It is straight-forward to see that (g [ c l ] ) " + l ( c z ) ( s t a ) independently of c2 is either •

" e r r o r " in A, cl(sta' ) or (g [ct])" (c2)(sta") where sta' and sta" are independent o f c z . In the first 3 cases the equation is immediate and in the latter case it follows from the induction hypothesis for n. []

E x a m p l e . Consider analysing the program x.'= 1 with some input inp~Inp. Abbre- viate env = 2 ide. " n i l " inVal and a - col = ~ col[[x." = 1]] ( ) fin - col (env, inp, ( ) , ( ) ) . The result of analysing x.'= 1 is a - col, and

a - c o l = a t t a c h ( ( ) , " L " ) {

attach ( ( 1 ) , " L " ) {push [1]] {attach ( ( 1 ) , " R " ) { assign [I-x~ {attach ( ( ) , " R " ) {fin}}}}}} (env, inp, ( ) , ( ) ) This means that

a - col ( ( ) , " L " ) = a - col ( ( 1 ) , " L " ) = {(env, inp, ( ) , ( ) ) } a - c o l ( ( 1 ) , " R " ) = {(env, inp, ( ) , (1 inVal))}

a - c o l ( ( ) , " R " ) = {(env [1 inVal/x], inp, ( ) , ( ) ) } and a - col(pla) = 0 otherwise. []

For typographical reasons go, is written g [cl]

(9)

Static Semantics

T h e r e a r e t w o w a y s in w h i c h t h e c o l l e c t i n g s e m a n t i c s is n o t the d e s i r e d f o r m u l a t i o n o f d a t a flow analysis. O n e is t h a t t h e p r o g r a m is o n l y e x e c u t e d for o n e p a r t i c u l a r i n p u t r a t h e r t h a n a set of i n p u t s (e.g. Inp). T h i s is r e m e d i e d b y the s t a t i c s e m a n t i c s (sts) to b e c o n s i d e r e d below. A n o t h e r is t h a t sets o f states are c o n s i d e r e d r a t h e r t h a n a p p r o x i m a t e d e s c r i p t i o n s o f sets o f states. T h e l a t t e r will be r e m e d i e d b y the i n d u c e d semantics.

I n t h e s t a t i c s e m a n t i c s (Tables 4, 2 a n d 6) d o m a i n S a n d f u n c t i o n s d o a n d c o n d h a v e b e e n c h a n g e d . T h e i n t e n t i o n w i t h d o ( V g, Bg) (c) is to s u p p l y t o c a set o f t r a n s f o r m e d states. E x c l u d e d f r o m c o n s i d e r a t i o n a r e states t h a t w o u l d n o t be s u p p l i e d to the c o n t i n u a t i o n in t h e c o l l e c t i n g semantics. T h e c o n d i t i o n a l c o n d is m o d i f i e d so as to " t r a v e r s e " b o t h t h e t r u e a n d false b r a n c h w i t h a n a p p r o p r i a t e set of states. T h e p a r t i a l a n s w e r s f r o m the t w o b r a n c h e s a r e t h e n c o m b i n e d . T h e c o n n e c t i o n b e t w e e n col a n d sts is e x p r e s s e d b y t h e t h e o r e m below. A m o r e o r less s i m i l a r r e s u l t a p p e a r s in I-2] b u t t h e r e for a n o p e r a t i o n a l semantics.

T h e o r e m 2. For all c m d ~ C m d and s~r

cg sts [[cmd]l ( ) f i n - s t s s - - I I {~ col ~cmd]] ( ) f i n - c o l s t a l s t a ~ s } Proof. T h e p r o o f is b y s t r u c t u r a l i n d u c t i o n m a k i n g use o f t h e p r e d i c a t e P - C s C

- col x C - s t s - t > {true, false} d e f i n e d b y P - C ( c - col, c - sts) =VsEC~(Sta):

c - s t s ( s ) = U { c - c o l ( s t a ) l s t a s s } . By s t r u c t u r a l i n d u c t i o n o n c m d ( a n d likewise exp) it is s h o w n t h a t P - C ( c - c o l , c - s t s ) ~ P - C ( C g c o l l [ c m d ] l o c c c - c o l ,

sts I[cmd]] occ c - sts). T h e p r o o f o f the while c a s e m a k e s use o f the fact t h a t for a c o m p l e t e lattice I I {U {xi, j l i~I} I j e J } = LJ {El {x M I j ~ J } l i~I} w h i c h follows f r o m t h e fact [-14] t h a t [ _ J { l l { x i a I i e I } I j e J } = LJ(u { { x i a l i ~ I } [jeJ}).

A m o r e d e t a i l e d p r o o f o f this t h e o r e m , as well as T h e o r e m s 3 a n d 4 later, c a n be f o u n d in [11] in a slightly different n o t a t i o n . [ ]

Table 6. Interpretation sts Domains

S = ~(Sta) A = Pla - c > ~(Sta) C = S - a > A Constant

fineC fin=&

Auxiliary functions condeC• C - c > C

cond(q, c2)=2s, c I {B cond(sta)[ V cond(sta) =true ^ S cond(sta) = true ^ staes}

u c 2 {B cond(sta) l V cond(sta) = true ^ S cond(sta) =false ^ staes}

attache P l a - c> C - c > C attach(pla) c = ~ s . c(s)u/[s/pla]

do e(Sta - c > T) x (Sta- c > Sta) - t > C - c > C do (Vg, B g) c = ~. s. c {B g (sta) I Vg (sta) = true ^ sta e s}

(10)

It appears to be m a n d a t o r y to work from a store semantics in order for this theorem to hold. (In an approach based on a standard semantics presumably only _~ holds [11].) The techniques of [10] can be used to transform a standard semantics into a store semantics.

The Method of Abstract Interpretation

Data flow analysis is almost always expressed in terms of approximate descriptions of sets (of states or values) rather than the sets themselves. It is necessary to work with approximate information if data flow analysis is to be carried out automati- cally, because otherwise the data flow information might not be computable. The approximate data flow information must be "safe", i.e. describe a set that is not smaller than the precise set.

Example (preparing for constant propagation)

Define the complete lattice Val = {valeVallval is maximal} ~ An element of Val is to describe a set of values, i.e. an element of ~(Val). The method of abstract interpretation ([3, 2]) makes use of a concretization function 7 r E V a l - m > ~ ( V a l ) to express the subset of Val that some element of Val describes. A natural choice is yv(_l_)=r y v ( T ) = Val and 7v(val)= {val} otherwise.

The abstraction function e v e ~ ( V a l ) - m > V a l can be used to approximate sets of values. It is natural to define ev (v) = [-] {val I ~'v (val) ~ v } because then ev (v) is the " b e s t " approximation of the set v of values. As an example ev({true})

= t r u e and ev({true, false})=-r. []

It is useful when the concretization function (y) and the abstraction function (e) are related as described by one of the two concepts defined below.

Definition. (~, 7) is a pair of semi-adjoined functions between partially ordered sets L and M iff

(i) e e L - m > M and y e M - m > L, (ii) 7 o ~ 2 l . I.

Furthermore, (~, y) is a pair of adjoined functions [3] between L and M iff in addition to (i) and (ii) also

(iii) ~ o 7 ~ - 2 m . m . [ ]

The pair (~v, Yv) of the example is a pair of adjoined functions between complete lattices ~(Val) and Val. Condition (ii) expresses the intention that a set of values is approximated by a not smaller set of values so that one can only make

"errors on the conservative side" [1]. Condition (i) relates the partial orderings so that obtaining more information in ~(Val) = L corresponds to obtaining more information in Val = M and vice versa. Condition (iii) is usually satisfied but it is not needed for the substance of this development. An analysis of the concepts semi-adjoined and adjoined is given in [11]. Whenever y ~ M - m > L satisfies VM' _ M : y(RM') = rq {y(m)lmEM'} there is a unique o ~ L - m > M so that ( ~ , y ) is a pair of adjoined functions. If (~, y) is a pair of adjoined functions (between complete lattices L and M) then ~ = 2 I. [-I {m 17 (m)-7 l}, c~ is additive and y satisfies the condition displayed above [3].

(11)

T h e r e is a n i m p o r t a n t difference b e t w e e n V a l o n t h e o n e h a n d a n d V a l a n d N ( V a l ) o n t h e other. T h e p a r t i a l o r d e r o f V a l m e a n s " l e s s defined t h a n " (in the sense o f Scott), so a n a t u r a l c o n d i t i o n to i m p o s e is t h a t t h e d o m a i n s m u s t b e cpo's.

T h e p a r t i a l o r d e r s o f V a l a n d ~ ( V a l ) m e a n s o m e t h i n g like " l o g i c a l l y i m p l i e s " , e.g. val 1 --- val2 m e a n s t h a t if a set o f v a l u e s is a p p r o x i m a t e l y d e s c r i b e d b y v a l l t h e n val 2 is also a safe d e s c r i p t i o n o f the set. T h e c o n d i t i o n t h a t will be i m p o s e d is t h a t the d o m a i n s a r e c o m p l e t e lattices. W h i l e this c a n be w e a k e n e d it is i m p o r t a n t t h a t a g r e a t e s t e l e m e n t is present. T h i s is c o n t r a r y to V a l w h e r e a g r e a t e s t e l e m e n t w o u l d be artificial. I t is a l s o this difference in p a r t i a l o r d e r s t h a t a c c o u n t s for w h y the c o n t i n u a t i o n s of col a r e n o t c o n t i n u o u s .

Induced Semantics

T h e s t a t i c s e m a n t i c s is n o t the d e s i r e d f o r m u l a t i o n o f d a t a flow a n a l y s i s b e c a u s e it d o e s n o t w o r k w i t h a p p r o x i m a t e d e s c r i p t i o n elements. L e t S t a b e a c o m p l e t e l a t t i c e d e e m e d to be m o r e a p p r o x i m a t e t h a n r A l s o let t h e r e b e a c o n c r e t i - z a t i o n f u n c t i o n 7 ~ S t a - m > ~ ( S t a ) a n d a n a b s t r a c t i o n f u n c t i o n a E ~ ( S t a ) - m > S t a such t h a t ( a , ? ) is a p a i r o f s e m i - a d j o i n e d functions. T h e i n d u c e d s e m a n t i c s (ind ( ~ , ? ) ) is o b t a i n e d b y m o d i f y i n g the s t a t i c s e m a n t i c s in e s s e n t i a l l y t w o ways.

O n e is to use S t a i n s t e a d o f ~ ( S t a ) . T h e o t h e r is to redefine the a u x i l i a r y f u n c t i o n s so t h a t their effect u p o n s o m e sta is o b t a i n e d b y first a p p l y i n g t h e a n a l o g o u s m a p p i n g o f t h e s t a t i c s e m a n t i c s to 7(sta) a n d t h e n a p p l y i n g a to the result.

Table 7. Interpretation ind (~, 7) Domains

S = Sta (~, ? ) is a pair of semi-adjoined functions A = Pla - c > S between complete lattices ~(Sta) and Sta C = S - m > A

Constant

fin~C fin=•

Auxiliary functions cond~Cx C - c > C

cond (c 1, c2) = Z s. cl (~ {B cond (sta) I V cond (sta) = true A S cond (sta) = true ^ staE?(s)})

,t c2 (~ {B cond (sta) I V cond (sta) = true A S cond (sta) = false A staE7(s)})

attach~ P l a - c > C - c > C attach(pla) c = ~t s. c (s) u / [s/pla]

do e (Sta - c > T) • (Sta - c > Sta) - t > C - c > C

do(V g, Bg) c =2 s. c(cr {Bg(sta)l V g(sta)=true ^ sta~7(s)})

T a b l e s 4, 2 a n d 7 c o n t a i n t h e details. T h a t t h e i n d u c e d s e m a n t i c s is " s a f e " follows f r o m :

T h e o r e m 3. For all c m d 6 C m d and s t a ~ S t a "

sts [[cmd]] ( ) fin - sts (7 (sta))___ ? o (cg i n d Ircmd]l ( ) fin - i n d sta)

(12)

whenever (a, Y) is a pair of semi-adjoined functions (between complete lattices

~(Sta) and Sta) and ind is ind (a, y).

Proof. The p r o o f is by structural induction making use of the predicate P - C~ C - s t s x C - i n d - t > {true, false} defined by P - C ( c - s t s , c - i n d ) = [ c - s t s o y ~ 2 s t a . 7o(c-ind(sta))]. By structural induction on cmd (and likewise exp) it is shown that P - C (c - sts, c - ind) ~ P - C (cg sts [[cmd] occ c - sts, cg ind [[cmd]

occ c - ind). The p r o o f makes use of 2 f . y o f being monotone, that 7 o a___ 2 states 9 states and that continuations of sts are monotone. []

It is also possible to specify data flow analysis by means of approximate semantics other than ind (a, 7). Then an analogue of Theorem 3 can be used to relate such semantics [11].

Example (constant propagation). One way to specify the data flow analysis "constant p r o p a g a t i o n " is by the induced semantics ind (as, Ys). Domain Sta is ( I d e - c > Val) x Va]| where Va] | is ~ 1 " augmented with a greatest element (T).

The concretization function ~s s Sta - m > N(Sta) is given by:

7s (env, tern) = {(env, inp, out, tern) e Stal

V ide e Ide: env l[ide]] e 7v (env [ide])/x (i) [ t e m = Y v Item r {_L, T } ^ # tern = # tem/x (ii) V j ~ { 1 .. . . . ~ tern} : tern Sj s 7v (tern Sj)]] } (iii) F o r a state (env, inp, out, tern) to be one of those described by (env, tern) the environment env must be one of those described by env (condition (i)) and the temporary result stack tern must be one of those described by tem (conditions (ii) and (iii)). Conditions (i) and (iii) are reasonably straightforward. Condition (ii) is somewhat more "technical": The greatest element of V ~ | describes any stack in Val*, the least element of V ~ | describes no stacks in Val* and ( ~ 1 .... , val,) eVa] | describes some stacks of n elements. The abstraction function ase~(Sta)

- m > Sta is defined by as(States)= [-] {sta [ 7s(Sta ) ~ states}. Then (as, Ys) is a pair

of adjoined functions between complete lattices ~(Sta) and Sta.

In data flow analysis it is usually assumed that the description lattices are of finite height (to ensure computability of the data flow information). This is the case when Ide is finite. F o r an example constant propagation analysis consider the program x." = 1. Abbreviate env = 2 ide. " n i l " inVal and a - i n d =Cgind [Ix: = 1~

( ) f i n - i n d (env, ( ) ) . The result of analysing x: = 1 then is a - i n d , and a - i n d ( ( ) , " L " ) = a - i n d ( ( 1 ) , " L " ) = (env, ( ) )

a - ind ( ( 1 ) , " R " ) = (env, (1 i n V a l ) ) a - i n d ( ( ) , " R " ) = ( e n v [1 inVal/x], ( ) )

and a - i n d ( p l a ) = _ L otherwise. Note that information is obtained about the evaluation of expressions (here 1) without the need for converting the program to a sequence of one-operator assignments. This is because attach has been placed in the semantic clauses for expressions. []

In summary, the development performed has succeeded in formally validating the data flow information (specified by the induced semantics) with respect to

(13)

A Denotational Framework for Data Flow Analysis 277 the collecting semantics (Theorems 2 and 3). This is essentially similar to what is done in [5], although the present development is more systematic: a data flow analysis (ind(~, 7)) is obtained merely by specifying an abstraction function (e) and a concretization function (7), such that the pair ((~, 7)) of functions is a pair of semi-adjoined functions. However, the semantics of a program is really given by the store semantics. None of the theorems (including Theorem 1) can be used to validate data flow information with respect to the store semantics.

(A similar defect holds for [5].) It seems impossible to do so directly because two programs may have the same denotation in the store semantics and yet different denotations in the induced semantics. An indirect way of relating the collecting semantics (and by Theorems 2 and 3 also the induced semantics) to the store semantics is considered in [12] and [11] where the collecting semantics is used to validate program transformations.

3. The MOP and MFP Solutions

In this section the data flow information specified in section 2 is related to the traditionally considered M O P and M F P solutions. " I t appears generally true that for data flow analysis problem, we search for the [MOP] solution" [7], and Theorem 4 will show that U ind n-cmd~ ( ( ) ) ( f i n - i n d ) ( s ) yields the MOP solution. By using a direct style formulation instead of the continuation style formulation it is possible to obtain the M F P solution (Theorem 5).

To compare the traditional (operational) approach to data flow analysis with that of Sect. 2 it is necessary to super impose a flow chart view upon programs.

The flowcharts to be considered will have arcs to correspond to "places". The flowchart constructed from program cmd has unique entry arc ( ( ) , " E " ) , unique exit arc ( ( ) , " R " ) and is represented by a set of tuples of the form (pla 1, pla z, tf). Such a tuple is intended to express that when " g o i n g " from pla 1 to pla2 the data flow information is changed as specified by t f e T F = S - m > S . Thus tf is a transfer function [1] associated with the basic block (node) that has pla 1 leading in and pla 2 leading out. The basic blocks are smaller than is usual (in fact many are "empty") because attach functions have been placed as they have; while this can be remedied, the placement used makes the proofs of Theorems 4 and 5 less involved and simplifies the presentation below.

To construct a flowchart from a program the semantic functions FU and F ~ are used (Tables 8, 2, 7 and 9). As an example consider the flowchart in Fig. 3 (ignoring the dotted rectangle). It has arcs labelled pla o . . . pla4 and basic blocks with transfer functions fro, ..., tf3 associated and is represented by f c = {(pla i, plai+l, tfi)10<i<3}. This representation is obtained from the program x : = l using the function FU. Since FUI[x: = 1]] ( ) is

F attach (pla 1) * FN n- 1 ] ( 1 ) * F do ( V assign I[x]], B assign [[xl]) * F attach (pla4) and F S [ 1 ] l ( 1 ) is

F attach(pla2) 9 F do(Vpush[[1]], B push[Jill) 9 F attach(pla3) the result of FU ind [Ix: = 1]] ( ) (plao, fro) is ((pla4, tf4), fc).

(14)

Table 8. Semantic functions specifying the flowchart F c g ~ C m d - t > O c c - t > F G

Fog [ i f exp then cmd 1 else cmd z fi]] o c t = F attach (occ, " L " ) *

F g [[exp] occ w ( 1 ) *

F cond(F~C[cmda]] occ w ( 2 ) * F attach ( o c t , " R ' ) , Fog [[cmd2]] occw ( 3 ) * F attach (occ, " R ' ) ) Fcg [while exp do cmd od]] o c c =

F attach (occ, " L ' ) *

[ F o v iexp~ occ w ( 1 ) * F cond(FCg [cmdll occw ( 2 ) , F attach (occ, " R ' ) ) F fix ( ( o c t w ( 2 ) , " R ' ) , ( o c t w ( 1 ) , " L ' ) , ,t s. s ) ]

the remaining clauses are similar to that of exp~ ope exp2 F g ~ E x p - t > O c c - t > FG

F g Iiexp 1 ope exp2]] occ = F attach (occ, " L " ) * F d' gexp 1]] occ w ( 1 ) * f 8 I]'exp2]] occ w ( 3 ) *

F do (V apply fiope]l, B apply l o p e ] ) * F attach (occ, " R ' )

the remaining clauses are similar to that of expt ope exp2

Table 9. Additions to interpretation ind (for F :4 and F ~) Domains

T F = S - m > S

FC = ~'(Pla x Pla • TF) PI = Pla x TF F G = P I - t > P I x FC Combinator

transfer functions

(representation of) flowcharts partial information

flowchart generators

fgl *fg2 = 2 pi. (fg2(fgx(pi)~l),H, fg2 (fg,(pi)$1)+2ufgl(pi)J.2) Auxiliary functions

F c o n d ~ F G • F G - t > F G

F cond(fgl, f g / ) = 2 (pla, t f ) . (fg2 (pla, tffotf)~ 1

, fgl (pla, tftotf)+2~fgz (pla, tffotf)+2) tft = 2 s. ~ {B cond (sta) [ V cond (sta) = true ^ S cond (sta) = true ^ sta~ y (s)}

tff = 2 s. c~ {B cond (sta) I V cond (sta) = true A S cond (sta) = false ^ sta ~ 7 (s)}

F attache Pla - t > F G

F attach(pla) = 2 (pla', t f ) . ( ( p l a , 2 s . s), {(pla', pla, tf)}) F d o ~ ( S t a - c > T) • ( S t a - c > S t a ) - t > F G

F do(V g, Bg) = 2 (pla, t f ) - ( ( p l a , 2 s. ~ {Bg(sta)[ V g(sta) = true ^ sta Ey (tf(s))}) , r

F f i x s F G x (Pla x Pla x T F ) - t > F G (infixed)

fg F fix (pla~, pla2, tf) = 2 pi- (fg(pi)~ 1, fg(pi)$2 ~ {(pla 1, pla2, tf)})

H e r e t h e ( ) is t h e o c c u r r e n c e t h a t is n e e d e d a s u s u a l . T h e ( p l a o , f r o ) is n e e d e d b e c a u s e t h e r e p r e s e n t a t i o n o f t h e b a s i c b l o c k t o w h i c h t f o is a s s o c i a t e d c a n o n l y b e c o n s t r u c t e d o n c e p l a l is k n o w n . T h e r e f o r e F o v is d e f i n e d s o a s t o t a k e ( p l a o , f r o )

(15)

J, plao

plat

pla2

plas

pla~

- A

pla0 = ( ( ) , " E " ) plal = ( ( ) , " L " ) pla: = ( ( 1 ) , " L " ) pla 3 = ( ( 1 ) , " R " ) pla4 = ( ( ) , " R " ) tfo = tft = tf4 = 2 s . s

tf 2 =As. ct {B push [1] stal V p u s h [ l ] sta = t r u e ^ staeT(S)}

tfa = ~ s. ct {B assign ~x] sta{V assign I x ] sta = true a s t a 6y (s)}

F i g . 3.

as a parameter and construct the relevant tuple ((pla o, plal, fro)), although it is only the part of the flowchart enclosed in the dotted rectangle that intuitively

"corresponds" to x." = 1. In order for sequencing (by means of *) to work it is necessary to let FU produce not only a flowchart but also a tuple like the (pla4,tf4) above; indeed (plao, fro) could in principle have been produced by some FU [[...~ if x: = 1 had occurred in some context. One way to read (plao, fro) and similarly (pla4, tf4), is that since arc plao was traversed the (as yet unrecorded) transfer function tf 0 has been encountered.

In general the flowchart associated with program cmd is specified by F U i n d [ [ c m d ~ ( )(plao,tfo)~2. As a further explanation of FU consider the clause for while exp do cmd od that is illustrated in Fig. 4.

... i

/ -

[

I

^PIQo

tf~ I . . .

l plo~

Xs.s }

exp

[ pta6

cmd

t f t

pLa~ I

Fig. 4

pla 1 = ( o c c , " L " ) pla 2 = (occ w ( 1 ) , " L " ) pla 3 = ( o c c w ( 1 ) , " R " ) pla4 = (occ w ( 2 ) , " L " ) pla 5 = (occ w ( 2 ) , " R " ) pla 6 = (oec, " R " ) tff, tft as in Table 9

(16)

The dotted rectangles correspond to syntactic subphrases. The flowcharts for exp and cmd have not been shown in detail. The entire flowchart

FOg indl[while exp do cmd od]] occ<plao, tfo>+2 is {<plao, plal, fro)}

F ~ indl[exp] occw ( p l a l , 2s. s>$2 u {(pla3, pla6, tff>} u

F ~ ind [[cmd]] occw (pla3, tft>+2 {<plas, p l a 2 , 2 s , s>}

It is the tuple (pla 5, pla2,2s, s> that accounts for the "iterative nature" of the while construct.

Some properties of F ~ and F g are mentioned below. They are needed in the proofs of Theorems 4 and 5 and may help in giving a better understanding of FOg and F & The proofs of these properties are omitted since they are straight- forward (e.g. by structural induction). The formulation uses the phrase "pla is a descendant of occ", which means that pla+l =occw occ' for some maximal occ', and that pla$2~ {"L", "R"}. Consider FOg ind[cmd]] occ<pla o, fro> = ( p i , fc> or similarly F g ind [[exp~ occ <plao, tf o > = (pi, fc>. Let (pla, pla', tf) be an arbitrary element offc, and assume (as will always be the case) that occ and pla o are maximal and that pla o is not a descendant of occ. Then,

9 pi = ( ( o c c , "R">, 2 s . s>

9 pla' is maximal and is a descendant of occ

9 pla is maximal and is either pla o or a descendant of occ, but it is not ( o c c , " R " > . Intuitively this means that the flowchart fc is immediately left after traversing <occ, "R">.

9 <plao, (occ, "L">, fro> is the only tuple offc with pla o as the first component.

This means that the first arc of fc to be traversed is ( o c c , " L " > and also that fc\{<plao, <occ, "L">, fro)} is independent of (plao, fro> (i.e. independent of " h o w the flowchart is entered").

The M O P Solution

For a flowchart fc with entry arc pla and description element s holding there let MOPS (fc, pla, s)e P l a - t > S denote the M O P ("meet over all paths") solution.

It is defined by MOPS (fc, pla, s) = 2pla'. II {tf,(... (tfl(s)))[<pla o, pla 1, tfl), ...,

<pla,_ 1, pla,, tf,) Efc A pla o = pla A pla, = pla'}. This formulation essentially is that of [7]. One difference is that 1-7] uses [7 instead of Ll, i.e. uses the dual lattice, but this is not crucial [13]. Another is that MOPS (fc, pla, s) (pla) need not be s but is likely to be • The similarity between the approach of Sect. 2 and the traditional approach is expressed by:

Theorem 4. For all c m d e C m d , s ~ S and plaePla:

c~ indl[cmd]] ( ) f i n - i n d s p l a =

MOPS (FOg indl[cmd~ (< >)(<<< ), " E " ) , ~.s. s))+2, << ), " E " ) , s)(pla)

(17)

A Denotational Framework for Data Flow Analysis 281 where ind = ind (c~, 7) for ( ~, 7) a pair of semi-adjoined functions between complete lattices r and S.

Proof See Appendix 1. []

Note that both sides of the equality sign yield _1_ when pla is not maximal or not a descendant of occ.

The MFP Solution

Denote by M F P S (fc, pla, s)e P l a - t > S the M F P ("maximal fixed point") solution for flowchart fc with entry arc pla and description element s holding there.

It is defined by MFPS(fc, pla, s)=LFP(step(fc, pla, s)) where step(fc, pla, s)~

( P l a - t > S ) - m > ( P l a - t > S ) is the " d a t a flow equations", i.e. specifies the effect of advancing data flow information along one basic block. It is defined by step(fc, pla, s)= 2 a . 2 pla'. II {tf(a [-s/pla] pla")l(pla", pla', tf)~fc}. Here pla, pla' and pla" are going to be maximal so that a[s/pla] pla is s and a[s/pla] pla" is a(pla") otherwise. Again, this formulation is essentially the one given in [-7]

(where greatest fixed points and [-] are used instead of least fixed points and II).

To obtain a denotational semantics specifying the M F P solution, new semantic functions DOg ind and Dg ind are defined (Tables 10, 2, 7, and 11). They are in

"direct style" and the functionality for DCgind~cmd]] (occ) is S - m > S x A.

From description element seS is obtained the tuple (s', a ) where the component a specifies partial data flow information for cmd. When occ is maximal it is easy to show that s ' = a ( o c c , " R " ) (see Appendix 2, Lemma 3).

In the clause for Dog ind[[if exp then cmd 1 else cmd 2 fi]] use is made of the function Dcond. The approximate description element supplied to "the rest of the program" is s t u s 2 , where s t is produced along one branch and s 2 along the other. This differs from cg ind where "the rest of the program" is analysed with s a and s 2 separately and only the resulting data flow information is combined.

Note in the clause for Dog ind [[while exp do cmd od~ that 2s'. Dg(s'~s) rather than e.g. 2s'- Dg(s') is used. Consider some iteration of the while construct where s holds at pla 2 (see Fig. 4). Then some s' will be computed to hold at pla 5. It is then s'us (not s') that holds at pla 2 and must be used in the next iteration if the M F P solution is to be obtained.

The connection between the M F P solution and Dog ind is given by:

Theorem 5. For all c m d e C m d , seS and plaePla:

DC~ ind[[cmd]] ( ) s~2 p l a =

MFPS(FC~ ind[cmd]](( ) ) ( ( ( ( ) , " E ' ) , 2s.s))$2, ( ( ) , " E " ) , s) pla where ind=ind(ct, 7) for (ct, 7) a pair of adjoined functions between complete lattices ~(Sta) and S, where S is of finite height.

Proof When (e, Y) is as above then e is additive and hence continuous. Also 7 is continuous when S is of finite height. The only properties of (e, 7) to be used in the proof is that (e, 7) is a pair of semi-adjoined functions between complete

(18)

Table 10. Semantic functions in direct style D c g e C m d - c > O c c - c > S - m > S x A

DOg[if exp then cmd 1 else cmd 2 fi] o c c = D attach (occ, " L " ) *

O ~f[exp]] occ w ( 1 ) *

D cond(D cr ~[cmdl ] occ w ( 2 ) * D attach (occ, "R " ) , O cr i[cmd2 ~ occ w ( 3 ) * D attach (occ, "R ")) D cr [while exp do cmd od]l occ =

D attach (occ, " L " ) *

FIX (2 D g. 2 s. [D ~ I[exp]] occw ( 1 ) 9

D cond(D ~ [cmd]l occ w ( 2 ) * (2s'. D g ( s ' u s ) ) , D attach (occ, "R "))] s)

the remaining clauses are similar to that of expl ope exp2 D ~ E x p - c > O c c - c > S - m > S x A

Dgflexpl ope exp2]l o c c = D attach (occ, " L " ) *

Dr occw ( 1 ) *

D 8 I[exp2]] occ w ( 3 ) *

D do(V apply I[ope]l, B apply lope]l) 9 D attach (occ, "R " )

the remaining clauses are similar to that of expi ope exp2

Table I 1. Additions to interpretation ind (for D cr and D ~) Auxiliary functions

D c o n d ~ ( S - m > S • A ) • S • A ) - c > ( S - m > S • A) D cond(dgl, dg2) = 2 s. dg I (tft(s))udg2 (tff(s))

fit, tff as in Table 9

D attach ~ P l a - c > (S - rn > S x A) D attach (pla) = 2 s. (s, Z [ s / p l a ] )

D do e(Sta - c > T) x (Sta - c > Sta) - t > (S - rn > S x A) D do (V g, B g) = 2 s. (ct {Bg(sta) I V g(sta) = true ^ staey(s)}, • C o m b i n a t o r , as in Table 9.

lattices ~(Sta) and S such that a and y are continuous. For the proof proper see Appendix 2. []

Both sides of the equality are _L when pla is not maximal. In the literature it is usually assumed that S is of finite height because then the M F P solution is computable (provided the transfer functions are). This holds even if S is infinite (as in the constant propagation analysis of Sect. 2). In contrast, the M O P solution need not be computable even when S is of finite height [8]. It follows from a theorem of J.B. Kam I-7] that the constant propagation analysis is an example of this.

That ~r ind [cmd]] ( ( ) ) ( f i n - i n d ) ( s ) _ _ D ~ ind I[cmd]] (())(s)~2 follows from Theorems 4 and 5 when (e, 7) satisfies the conditions of(the p r o o f o 0 Theorem 5.

This is because MOPS(fc, pla, s)__MFPS(fc, pla, s) (see e.g. [3]). Since sts is a

(19)

A Denotational F r a m e w o r k for D a t a Flow Analysis 283

special case of ind (with S = ~(Sta) and 0t = 7 = Z states, states) it is meaningful to consider D~f sts. The transfer functions of sts are additive and it therefore follows from [3] that cg sts [[cmd]] ( ( ) ) ( f i n - sts) (s) = Dcg sts [[cmd~ ( ( ) ) (s) ~ 2.

4. Conclusion

It has been shown how data flow analysis can be specified in a denotational approach, by systematically transforming a store semantics to an "induced semantics" parameterized by a pair of semi-adjoined functions. It is claimed that the approach is no less systematic than existing operational methods.

Semantic characterizations are with respect to the collecting semantics. To give a "semantic characterization" of the collecting semantics in terms of a store semantics it seems to be necessary to consider program transformations (as is done in [12, 11]).

The data flow analyses considered in this paper could be called "history- insensitive". In [11] a similar development yields a semantic characterization of "available expressions" (a "history-sensitive" analysis). Unfortunately the machinery required to establish the semantic characterization is somewhat complex. More research is needed to handle "live variables" (a "future-sensitive"

analysis) and languages with arbitrary jumps and procedures.

The relationship between continuation-style and M O P and between direct- style and M F P shows that for data flow analysis purposes continuation-style is inherently more accurate than direct style. However, it should be kept in mind that the M O P solution can always be specified as the M F P solution to a different data flow analysis problem (whose M F P solution need not be computable) [3].

Acknowledgement. I should like to t h a n k Neil Jones for his continuing interest in and helpful c o m m e n t s u p o n this work, and Patrick Cousot, G o r d o n Plotkin and H a n n e Riis for helpful comments.

Appendix 1 : Proof of Theorem 4

In this appendix suffix ind is omitted. For the proof the function Close e F C • Pla x ~ ( S ) - t > P l a - t

> ~ ( S ) is useful. It is defined by Close(fc, pla', S') (pla)=

{tf,(...(tfl(s)))lseS' ^ (plao, plal, t f t ) . . . (plan- 1, plan, t f , ) e fc A pla o = p l a ' A plan = pla}

T h e n MOPS(re, plao, so) ='1 pla. I II-Close(fc, plao, {So} ) (pla)] = I Io Close(fc, plao, {So} ).

The proof of the theorem is by a structural induction showing P - C m d ( c r n d ) and P - E x p ( e x p ) . Predicate P - C m d e C m d - t > {true, false} is defined by P - C m d (cmd) = V c e C: V S' _~ S:

pla' m a x i m a l and occ maximal and pla' not a descendant of occ ==~

II {~ I[cmdl] (occ) (c) (s) lseS'} = II {c(s) lsecl (occ, " R " ) } ~ (ll o cl) where cl = C l o s e ( F f f [[cmd]] occ (pla', ,t s . s)$2, pla', S').

Predicate P - Exp is defined similarly. Clearly the theorem follows from P - C m d (cmd) because fin = Z.

Since this proof is long, only the most difficult case will be considered. It a m o u n t s to showing P - C m d (while exp do crnd od) a s s u m i n g P - C m d ( c m d ) and P - E x p ( e x p ) . So let occ be maximal

(20)

and abbreviate plao, ..., pla 6 and fit, tff as in Fig. 4. Further abbreviate g [c] = 2 c'- gl {cond(gz c', g3 c)}

where g l = S n - e x p ] o c c w and gz=(g[fcmdll occw ( 2 ) and g3=attach(pla6). Similarly f g = f g l

* F cond (fg2, fg3) where fgl = Fd~ [[exp]l occ w ( 1 ) and fgz = F ~ [[cmd]l occ w ( 2 ) and fg3 = Fattach (pla6).

Lemma 1. For arbitrary c, c' ~ C and S' ~ S :

I_1 {g [c] c' sis ES'} = II {e(s)[s~cl(pla6) } t~ II {c'(s)[s~cl(plas) }

~(Uocl) where cl =Close(fg ( p l a l , 2 s. s),L2, pla 1, S').

Proof. Abbreviate c12 IS"] = Close(fg2 (pla 3 , 2 s . s)~,2, pl%, S').

Then by P - C m d ( c m d ) , [..l{g2(c')(s)ls~S"} = U {c'(s)[sEcl2 [ S ' ] plas}u(l_Jocl 2 [S"]). Abbreviate cl 3 IS"] = Close(fg 3 (pla3, 2 s. s)~2, pla3, S") so that kl {ga (c) (s)lseS"} = II {c(s)ls~cl a IS"] pla6}

t~(IAo cl 3 [S']). Then

IA {cond(ga c', g3 c) slseS"}

= IA{c'(s)lsecl 2 [ { t f t ( s ) l s e S " } ] p l a s } , , ( I A o c l 2 [ { t f t ( s ) l s e S " } ] ) , , II {c(s) lsecl 3 [{tff(s) l seS"}] pla6} u ( l l o cl 3 [{tff(s) l seS"}])

= IA {c'(s)l s~(cl~ [S"] ~, cl; [-S'-I) plas} u IA {c(s) lse(cr2 [S"] ~,cl; IS"]) pla6} u (U o(cll [s"] ucl; [s"]))

where cl~ IS"] = Close (fgz (pla3, fit)J,2, plaa, S")= cl z [{tft(s) ls eS'}]

and cl; IS"] =Close(fg3 (pla~, tff)~,2, pla3, S " ) = c l 3 [{tff(s)lseS'}].

The last step is by the definition of Close and the properties of FOg mentioned in Sect. 3.

It will now be shown that cl; IS"] ~ cl; IS"] = cl' IS"] where cl' IS"] = Close(Fcond (fg2, fg3) ( p l a 3, 2 s . s),~2, plaa, S")=Close(fg z ( p l a 3, tft),[2 ~ fg 3 (pla3, tff)J,2, pla3, S"). Inequality r- is immediate since Close is monotone in its first argument. To establish the converse inequality consider any sequence ( p l a ; , pla't, tf'l) ... (pla;_ 1, pla'., tf,) that is in fg2 (pla3, tft)J,2ufg3 (pla3, tff)12 and has p l a ; = p l a 3 . Such a sequence is said to be " m e n t i o n e d " in cl' IS"] (pla',). It is not difficult to see (using properties of F ~ ) that the sequence is entirely either in fgz (pla3, tft)$2 (hence " m e n t i o n e d "

in cl~ [S"](pla~,)) or in fg3(pla3, tff)J,2 (hence " m e n t i o n e d " in c l ; [ S " ] (pla'~)). Thus c l ' [ S ' ] ( p l a ; ) cll IS"] (pla;) ~ cl; IS"] (pla;).

Next, abbreviate cl~ [S'] = Close(fg I ( p l a l , 2 s. s)J,2, pla~, S'), Then P - E x p ( e x p ) asserts (with S"

above being cll [S'] (pla3)) that kl {g [c] c'(s)lseS'}

= I I { c ' ( s ) l s e ( C l l [ S ' ] u c l ' [ c l l [ S ' ] p l a 3 ] ) p l a s } , , If {c(s) lse(cll [S'] u c l ' [cll [S'] pla3]) pla6} u (I /o(cl~ [S'] t~cl' [cl~ [S'] (pla3)]))

where properties of F g and Close assure cl~ [S'] (plas)= r = cl~ [S'] (pla6).

Finally, it must be shown that ellIS'], ~cl'[cl~ IS'] pla3] = c l [ S ' ] . Consider the inequality ~ . Again the key to the proof is to consider a sequence (pla;, pla~, t q ) .... ,(pla'._~, pla;, ft.) that is

" m e n t i o n e d " in cl [S'] (pla',) and has plab =plaa. If pla; is a descendant of occw ( 1 ) the properties of F d o and F ~f assert that all of pla[ ... pla', are, so the entire sequence is " m e n t i o n e d " in cl~ IS'] (pla',).

Otherwise there is some m < n so pla[ .... , pla~ are all the descendants ofocc w ( 1 ) (again by properties of F g and FOg). Then (pla~, plal, tf~ ) ... (pla;._ ~, pla~,, tf,.) is " m e n t i o n e d " in cl~ [S'] (pla'). Also pla" =pla3 (because (pla~,, pla~,+ ~, tf~, + ~) is in fg2 (pla3, tft),~2 ~ fg3 (pla3, tff)~,2) so exp is left through pla~ and not entered again. Also (plato, p l a ' . ~, tf',,+ ~) ... (pla'._ x, pla',, tf,) is " m e n t i o n e d "

in cl' [cl 1 IS'] (pla3)] (pla'.). F r o m this cla [S'] (pla',)wcl' [cl~ [S'] (pla3) ] (pla'.)_cl [S'] (pla',), The converse inequality is similar. []

In the next lemma the iterative nature of while is dealt with. The proof makes use of L u k ~ F C x Pla x ~(S) x Integer x FC - t > P l a - t > ~ ( S ) that is like Close but constrains the n u m b e r of times some part of the flowchart is traversed. It is defined by Luk(fc t, pla', S', k, fc~)(pla)=

{tf'. (... (tf'~ (s))) [ s~S' A (pla~, pla'~, tf~) ...

(pla'._ l, pla'., tf.)efc~ ~ fc 2 A pla~) = pla'/x pla'. = pla /x [{il (pla'~_ ~, pla'~, tf~)6 fc2}[ =k}.