• Ingen resultater fundet

Static Analysis Techniques

4.3 Flow Logic

Control Flow Analysisproblems emerge when one tries to compute how focus of control moves through a program. The initial conceptualisation of static Control Flow Analysis dates back to work on interprocedural analysis and obtained momentum with Shivers’ work on functional languages [Shi88], and was further refined by Jagannathan [JW95]. In this context, and based on the view that, in terms of flow, data and control are two sides of the same thing [NNH99],Flow Logic was pioneered in the late 1990’s, by Nielson and Nielson [NN98, NN97, NN02] as a unifying specification oriented approach to constraint based static analysis.

Basic Concepts. The Flow Logic framework makes a clear distinction be-tween the specification of an analysis and the computation of corresponding analysis results. This approach allows the designer to focus on the specification of analyses without making compromises dictated by implementation consid-erations. The implementation phase is also simplified and improved, as the implementor is always free to choose the best available tool — no particular

tool or formalism is prescribed by the framework.

Analysis Domain. As for Monotone Frameworks, a Flow Logic specification is based on a suitable universe of discourse, L. Here it is customary to follow the approach of Monotone Frameworks and demand that L is a complete lattice.

However, the domain of a Flow Logic specification should be designed such that elements correspond toglobal Control Flow Analysis information of the program of interest. This is in contrast to the property space of a monotone framework, which is designed such that the elements correspond to the Data Flow Analysis information of individual program points (or states if you will) of the program of interest,

Acceptability Judgement. Fundamentally, a Flow Logic specification is concerned with the relationship between programsP∈langand static analysis estimates A ∈L. This connection is captured by an acceptability judgement

A |=P

intended to hold precisely when Aconstitutes an acceptable analysis estimate forP.

The judgement is defined by clauses; typically there is one clause for each syn-tactic constructφoflangand they take the form

A |=φ(· · ·Pi· · ·) iff (some formulaϕwithA |=P for various sub-programsP)

whereϕis usually a formula in a suitable fragment of First Order Logic,fol. Flow Logic. These ingredients formalise a given Control Flow Analysis problem as a Flow Logic. Obviously the involved judgement relation “|=” has function-ality

|=: (L×lang)→ {true,false}

Generally, as we do not demand the specification to besyntax directed, i.e., for each of the defining clauses that eachPoccurring inϕis one of thePioccurring in φ(· · ·Pi· · ·), we have to define the meaning of “|=” by co-induction rather than induction.

Thus, when formally assigning meaning to a Flow Logic specification, we do so by regarding it as defining a functional

Q: ((L×lang)→ {true,false})→((L×lang)→ {true,false}) thegreatest fixed point of which we take as the formal meaning.

However, when the specificationis syntax directed the least and greatest fixed points coincide, and hence “|=” is adequately defined by ordinary induction.

In the terminology of Flow Logic, and by analogy to Structural Operational Semantics [Plo04], we call such specificationscompositional. Specifications that are not syntax directed we callabstract.

Desirable Properties. For a setup like this to qualify as a Flow Logic it must posses a number of desirable properties that we shall elaborate in the following:

Well-definedness. The analysis must be well-defined, i.e., for every combination ofP ∈langandA ∈L, the acceptability of Aas an analysis estimate for P is unambiguously defined. This amounts to showing that “|=”, with functionality as outlined above, constitutes a total function. In the case of a compositional specification this is immediate to show by structural induction inP. In the case of an abstract specification, however, the judgement is well-defined only if Q constitutes a monotone functional over the complete lattice

((L×lang)→ {true,false},⊑) where the ordering⊑is given by:

Q1⊑Q2 iff∀(A, P) : (Q1(A, P) = true)⇒(Q2(A, P) = true)

When this is the case the existence of a suitable greatest fixed point is guaranteed by Tarski’s fixed point theorem.

Semantic Correctness. Intuitively, an analysis issemantically correct if analysis estimates that are acceptable for a process P contain information about every possible evolution of P that is allowed by the semantics. As Flow Logic is a semantics based approach to static analysis this notion of correctness is usually captured by asubject reductionresult:

ifA |=P andP −→QthenA |=Q

which expresses that the acceptability of analysis estimates is preserved by the reaction relation. That is, ifAis an acceptable estimate forP, and the seman-tics allowsP to evolve intoQ, then Ais also acceptable forQ. This relies on an auxiliary result for structural congruence:

ifP≡QthenA |=P iffA |=Q

asserting that the acceptability of analysis estimates is invariant under the struc-tural congruence.

Remark 4.22 Due to the directedness of ⇛the latter becomes ifA |=P and P⇛QthenA |=Q

in our setup.

Clearly, full semantic correctness, sometimes calledsemantic soundness follows by the transitive closure of the subject reduction result, i.e.,

ifA |=P andP −→QthenA |=Q

Occasionally, it is convenient to use a collecting semantics and formalise the relationship,R⊆L×Trace, between analysis estimates and traces in order to formulate correctness:

ifA |=P,ARtr, andhP, tri −→hQ, tri, thenA |=QandARtr

Moore Family Property. Finally, it is desirable for every program,P, to actually have an acceptable analysis estimate and, indeed, a unique least such. This is the case if:

{A | A |=P} constitutes a Moore Family for allP

Recall that, trivially, a Moore family is never empty as it always contains a greatest element d

∅ = ⊤L, which is the trivial worst (i.e., least informative) acceptable analysis estimate. In contrast it is also guaranteed to have a least element A = d

{A | A |= P}, which is the least admissible result under the ordering⊑L ofLand, hence, thebest (most informative) acceptable estimate.

Implementation. The implementation of a Flow Logic specification is en-abled by a simple change of viewpoint: Intuitively, the acceptability judgement

A |=P iff ϕ

associates each program,P, with a formula,ϕ, such that an analysis estimate, A, is acceptable forP if, and only if,Aconstitutes a model ofϕ. Thus, as long as we are able to compute the appropriate ϕ the remaining task of finding a suitable model, A, can be left to an auxiliary logical solver.

In the presence of an abstract Flow Logic specification this idea clearly breaks down because ϕ is not necessarily finite for every given P. In the case of a compositional Flow Logic specification, however, the finiteness ofϕfollows di-rectly from the finiteness of the syntactic representation ofP and the inductive nature of the specification. If, furthermore, the specification is well-defined then it defines a total function from programsP ∈langto formulaeϕ∈fol. And when this is the case we shall refer to the Flow Logic specification of some global Control Flow Analysis problem, A, as a binary predicate

A=FL(lang,fol).

The ALFP Logic. Clearly, the theoretical as well as practical properties of an

term ::= c|x|f(term1, . . . ,termk)

pre ::= R(term1, . . . ,termk)| ¬R(term1, . . . ,termk)

| term1=term2|term16=term2

| pre1∧pre2|pre1∨pre2| ∀x:pre| ∃x:pre clause ::= R(x1, . . . , xk)|1|clause1∧clause2

| pre=⇒clause| ∀x:clause

Table 4.2: Syntax of the Alternation-free Least Fixed Point Logic.

analysis implementation depend on the formulae,ϕ, derived from programs,P, as well as the solver used for the computation of models.

In order to ensure nice properties, the Flow Logic specifications presented in this dissertation are expressed using an extension of Horn clauses, known as Alternation-free Least Fixed Point Logic (alfp) [NNS02]. This logic is presented in Table 4.2, where we write x for variables, c for constants, f for function symbols, R for predicates,term for terms,pre for preconditions, andclause for clauses.

The clauses are interpreted over a universeU of ground terms. The semantics is given in terms of satisfaction relations

(ρ, σ)|=preand (ρ, σ)|=clause

that are defined in the standard way. Thus,ρis an interpretation of predicate symbols, σ is an interpretation of terms, and the clause, e.g., R(x1, . . . , xk is interpreted according to the following rule:

(ρ, σ)|=R(x1, . . . , xk) iff (σx1, . . . , σx2)∈ρR

The logic has the nice feature that clauses with no free variables satisfy the model intersection property, i.e., given an interpretationσ0of the free variables in a clause cl, the set of interpretations {ρ|(ρ, σ0)|=cl} constitutes a Moore family [NNS02].

Note that in the presence of negation in preconditions the solvability of clauses is subject to a notion of stratification, which is not relevant, however, in the context of this dissertation.

As a matter of convenience we shall often use other mathematical notation to write certainalfpconstructs:

INPUT: a Flow LogicFL(lang,alfp) and a programP.

OUTPUT: analfpformulaϕsuch thatA |=ϕ⇔ A |=P. METHOD: Setϕ:=A |=P

whileϕcontainsA |=P

and there is a ruleαiffβinFL and a substitutionθ such thatθα=A |=P

do replaceA |=Pwithθβinϕ.

Table 4.3: Constraint generation algorithm for Flow Logics.

• The predicates of alfp represent relations. For a such a relation R we shall often write (x1, . . . , xk)∈Rto denote thealfpatomR(x1, . . . , xk).

• When testing for intersection we often writeR(x1, . . . , xi)∩R(y1, . . . , yi)6=

∅to denote∃zj, . . . , zk :R(x1, . . . , xi, zj, . . . , zk)∧R(y1, . . . , yi, zj, . . . , zk).

• When expressing subset relationships we writeR(x1, . . . , xi)⊆R(y1, . . . , yi) to denote∀zj, . . . , zk:R(x1, . . . , xi, zj, . . . , zk)⇒R(y1, . . . , yi, zj, . . . , zk).

Clause Generation. Algorithmically we capture the required change of view-point in a clause generator that works largely as the chaotically iterative al-gorithm outlined in Table 4.3. This procedure takes as input a Flow Logic FL(lang,alfp) and a programP. And, from this, it produces analfpclause, the models of which are exactly the acceptable analysis estimates.

The best (i.e., least under the ordering ofL) acceptable analysis estimateAfor Pis the minimal model satisfying the generated clause. The model intersection property ofalfpguarantees that this can be unambiguously computed using a fixed point engine like those ofThe Succinct Solver Suite [NNS+04].

Remark 4.23 Generally analyses implemented using the succinct solver ex-hibit an asymptotic worst-case complexity that is of the order ofO(nk+1), where nis the size of the term universe, which is fixed for any given process, andkis the maximal nesting depth of quantifiers in the generated clause.

4.4 Concluding Remarks

In this chapter we have introduced two common approaches to Static Program Analysis. These approaches originate from the analytic needs of two very dif-ferent programming paradigms and thus differ in some respects.

Classically, Monotone Frameworks constitute a generalised approach to the Data Flow Analysis problems of imperative languages – a setting where programs are normally considered as flow-graphs. However, we have deliberately presented the approach in a setting where programs are considered as labelled transition systems. This sets the scene for Chapter 8, where we shall use the approach to approximate the temporal structure, i.e., the underlying transition systems defined by the semantics, of BioAmbients models.

Similarly, Flow Logic constitutes a generalised approach to the Control Flow Analysis problems of functional languages – a setting where continuations are data and the flow of control is predominantly decided at run-time, Here it has traditionally been used to approximate the control flow structures of programs in the context of dynamic dispatch. In Chapters 6 and 7 we shall similarly use Flow Logic to approximate the spatial structure, i.e., the set of ambient induced nesting hierarchies, that can arise dynamically from a BioAmbients model.

Throughout the technical developments we shall consistently take the approach that is inherent in semantics based program analysis: In order to study proper-ties of a programP we analyse the static snapshot given by the direct syntac-tical representation ofP. However, we carefully define the analyses such that the properties are “⊒-preserved” under heating and reaction and use this to es-tablish the semantic soundness. In this manner we ensure that the information learnt from the study ofP inP

L˜

−→Qis also valid forQ.

Analysing for Structural