Analysis of Control-Flow Constructs

To finish the analysis that determines the sign of variables, we must define how the analysis processes commands. For each command c, I want to execute c in some abstract store before c and determine an abstract store after the command. At each point during the analysis of c, I keep track of the current abstraction at that point, but I have no information about the concrete store (i.e., the concrete integer values of variables). I express the analysis of a command c using an abstract denotation C'[[c]]:

    C'[[c]] : AbsStore -> AbsStore

Note that C'[[c]] is a total map, in contrast to the concrete semantics C[[c]], which is a partial map. (Why?)

The analysis of skip and assignments is straightforward. For assignments, we invoke the abstract interpretation A' for expressions to determine the sign of the expression being assigned:

    C'[[skip]] S = S
    C'[[x := a]] S = S[x -> A'[[a]]S]

For sequences:

    C'[[c1;c2]] S = C'[[c2]] (C'[[c1]] S)

In other words, C'[[c1;c2]] = C'[[c2]] o C'[[c1]], since we now deal with total maps.

More interesting is the analysis of an if command. If we can determine which branch is taken given the current sign information, then we only need to analyze the appropriate branch. However, if we cannot precisely determine the truth value of the condition, then we must conservatively analyze each branch in turn and then combine the results:

    C'[[if b then c1 else c2]] S = C'[[c1]] S                      if B'[[b]] S = true
                                 = C'[[c2]] S                      if B'[[b]] S = false
                                 = merge(C'[[c1]] S, C'[[c2]] S)   if B'[[b]] S = anybool

Note that the last case is the one we usually expect in real programs -- the case when we cannot statically tell which branch is going to be executed (usually because the test depends on some input to the program). The first two cases show that one branch is always taken; we can then optimize the other branch away and replace the if command with the branch being taken.
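The equations above can be transcribed almost directly into code. Below is a minimal sketch in Python; the encoding of commands as tuples, signs as strings, and the helpers abs_expr, abs_plus, and abs_cmd are my own representation choices (standing in for A'[[a]], abstract addition, and C'[[c]] respectively), and the condition b is modeled as a function that plays the role of B'[[b]]:

```python
# Sign abstraction: the four abstract values of AbsInt.
NEG, ZERO, POS, ANYINT = "neg", "zero", "pos", "anyint"

def sign(n):
    """Abstraction of a concrete integer."""
    return NEG if n < 0 else POS if n > 0 else ZERO

def abs_plus(x, y):
    """Abstract addition of two signs."""
    if ZERO in (x, y):
        return y if x == ZERO else x
    return x if x == y else ANYINT   # e.g. pos + neg -> anyint

def abs_expr(a, S):
    """A'[[a]]S for expressions: int literals, variable names, ('+', a1, a2)."""
    if isinstance(a, int):
        return sign(a)
    if isinstance(a, str):
        return S[a]
    op, a1, a2 = a
    assert op == '+'
    return abs_plus(abs_expr(a1, S), abs_expr(a2, S))

def merge(S1, S2):
    """Pointwise combination of two abstract stores (defined in the text below)."""
    return {v: (S1[v] if S1[v] == S2[v] else ANYINT) for v in S1}

def abs_cmd(c, S):
    """C'[[c]]S -- a total map on abstract stores."""
    kind = c[0]
    if kind == 'skip':
        return S
    if kind == ':=':
        _, x, a = c
        return {**S, x: abs_expr(a, S)}      # S[x -> A'[[a]]S]
    if kind == ';':
        _, c1, c2 = c
        return abs_cmd(c2, abs_cmd(c1, S))   # C'[[c2]] o C'[[c1]]
    if kind == 'if':
        _, b, c1, c2 = c
        t = b(S)                             # B'[[b]]S: True, False, or "anybool"
        if t is True:
            return abs_cmd(c1, S)
        if t is False:
            return abs_cmd(c2, S)
        return merge(abs_cmd(c1, S), abs_cmd(c2, S))
```

For instance, analyzing ('if', b, c1, c2) with a condition whose truth value is unknown merges the stores produced by the two branches, exactly as in the anybool case of the equation.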
The merge operation takes two abstract stores and builds a new store that conservatively approximates both. For each variable, it combines the two abstract signs for that variable in the given stores:

    merge : (AbsStore x AbsStore) -> AbsStore
    merge(S1, S2) = { (x, merge_var(S1(x), S2(x))) | x in Var }

    merge_var(X, Y) = X        if X = Y
                    = anyint   if X != Y
    where X, Y in AbsInt

Note that merge_var yields the most precise sign of a variable given its signs on the true and false branches of the if. If a variable is positive on both branches, merge_var yields pos; but if a variable has unknown sign on one branch, or if it has different signs on the two branches, merge_var yields anyint. With respect to the lattice structure of our abstraction, merge_var(X, Y) yields the lowest element in the lattice diagram that approximates (is higher up than) both X and Y; such an operator is known as the _least upper bound_ operator (abbreviated l.u.b.). Hence, merge_var is essentially the l.u.b. operator.

Example programs

Let's consider a few examples. First, try the following sequence of assignments:

    x := 4; y := -3; z := x + y

If I start with a concrete store where all variables are initialized to zero, s = {x=0, y=0, z=0}, then the concrete execution of c yields:

    C[[c]] s = {x=4, y=-3, z=1}

Now if I start with the abstract store S = {x=zero, y=zero, z=zero} and analyze the program using my sign abstraction, I get:

    C'[[c]] S = {x=pos, y=neg, z=anyint}

Note that the analysis result C'[[c]] S is faithful to the concrete result C[[c]] s: the sign of each variable correctly describes the actual value of that variable. However, the analysis may yield results that, despite being correct, are not very accurate -- in this case for z. This is the price to pay for restricting ourselves to the abstract domain during the analysis.
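The merge_var and merge definitions can be written down directly. A minimal sketch in Python (representing signs as strings is my own encoding choice):

```python
# merge_var is the least-upper-bound (l.u.b.) operator on the sign lattice:
# it keeps a sign that both branches agree on, and goes to anyint otherwise.
ANYINT = "anyint"

def merge_var(x, y):
    """l.u.b. of two abstract signs X, Y in AbsInt."""
    return x if x == y else ANYINT

def merge(S1, S2):
    """Pointwise merge of two abstract stores over the same variables."""
    return {v: merge_var(S1[v], S2[v]) for v in S1}
```

Note that merge_var behaves like an l.u.b. should: it is idempotent and commutative, and both X and Y are approximated by (are less than or equal to) merge_var(X, Y).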
Consider now a program with an if statement:

    x := 4; y := 3; if (x > y) then x := x + y else y := y - x

If I start with the same concrete store s, the program takes the true branch and yields:

    C[[c]] s = {x=7, y=3}

Starting with an abstract store S = {x=zero, y=zero}, I get the following. The first two assignments yield a store S' = {x=pos, y=pos}. Based on these signs, the analysis cannot determine whether the test x > y succeeds. Therefore, it analyzes each branch. On the true branch, the analysis can determine that x + y is positive and maintain a positive sign for x:

    C'[[x := x + y]] S' = {x=pos, y=pos}

On the false branch, however, the analysis cannot determine the sign of y - x, since both x and y are positive but their magnitudes are not known. Therefore, the analysis assigns an unknown sign to y:

    C'[[y := y - x]] S' = {x=pos, y=anyint}

Finally, the analysis combines the results from the two branches to get the final result:

    C'[[c]] S = {x=pos, y=anyint}

So the analysis could successfully determine that x is positive at the end of the program, regardless of the branch being taken; but it could not precisely determine the sign of y. As in the previous example, the analysis result correctly models the signs of the values computed in the actual execution of the program.

Analysis of while loops

Now suppose I want to analyze a loop construct while b do c. Essentially, I want to find an invariant about the variable signs. Hence the loop invariant is an abstract store S_I. The analysis result will then be this invariant:

    C'[[while b do c]] S = S_I

Let's first write down what it means for some abstract store to be invariant. If I look for an invariant abstract store S_I, then I want S_I to be maintained after executing each loop iteration in the abstract domain.
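The if example above can be traced by hand as follows: compute the branch effects from S' and merge them. A small sketch in Python (the string encoding of signs and the merge helper are my own transcription of the definitions in the text):

```python
# Hand-tracing the branches of the if example. After the two assignments
# the abstract store is S' = {x: pos, y: pos}. The true branch keeps x
# positive (pos + pos = pos); the false branch loses the sign of y
# (pos - pos = anyint, since the magnitudes are unknown).
POS, ANYINT = "pos", "anyint"

def merge(S1, S2):
    """Pointwise l.u.b. of two abstract stores."""
    return {v: (S1[v] if S1[v] == S2[v] else ANYINT) for v in S1}

S_prime = {"x": POS, "y": POS}
true_branch  = {**S_prime, "x": POS}     # effect of x := x + y
false_branch = {**S_prime, "y": ANYINT}  # effect of y := y - x
result = merge(true_branch, false_branch)
# result: {"x": "pos", "y": "anyint"}, matching C'[[c]] S in the text
```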
That is, if I start with S_I and analyze the loop body c, then I should get some abstraction that is at least as precise as S_I:

    C'[[c]] S_I lessthan S_I

(Here I have extended the ordering relation "lessthan" to abstract stores: store S1 is less than S2, written S1 lessthan S2, if S1(x) is less than S2(x) for each variable x. This is known as the pointwise ordering for functions.)

There is one more thing to ensure. If S is the abstract store before the while loop, then S must be at least as precise as S_I, to make sure that S_I actually holds before the first iteration (similar to the weakening rule that we use in Hoare-style proofs):

    S lessthan S_I

I can then write these two conditions together in one single equation:

    merge(S, C'[[c]] S_I) lessthan S_I

If I look more carefully, I see that S_I = {(x, anyint) | x in Var} satisfies this relation. It is an invariant saying that I don't know the sign of any variable -- an invariant, but a useless one. It turns out that there are multiple possible solutions to the above equation; we are looking for the most precise solution S_I, i.e., the least solution with respect to the ordering in our abstract domain.

It turns out that there is a simple algorithm that computes the least (most precise) solution of the above equation. The algorithm derives S_I by successive approximations, starting from the store S before the loop, and works as follows:

    while (true) do
        S' = merge(S, C'[[c]] S)
        if (S' = S) then return S'
        else S = S'

Now I can show that this algorithm computes a solution of our recursive equation, because of two facts:

    - the initial value of S is less than all subsequent values of S and S' (intuitively, the merge operation "accumulates" new information but keeps the original information); in particular, the initial S is always less than S';
    - when the loop finishes, S = S', so S' = merge(S', C'[[c]] S'). Hence C'[[c]] S' is less than S'. So S' is greater than both the initial S and C'[[c]] S', hence it satisfies the desired equation.
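The successive-approximation algorithm above can be sketched in a few lines of Python. Here abs_body stands in for C'[[c]] on the loop body; as an illustration I use a hypothetical loop whose body is the single assignment x := x + 1 (the string encoding of signs and the helper names are my own choices, not from the text):

```python
# Successive approximation: iterate S' = merge(S, C'[[c]] S) until S' = S.
NEG, ZERO, POS, ANYINT = "neg", "zero", "pos", "anyint"

def merge(S1, S2):
    """Pointwise l.u.b. of two abstract stores."""
    return {v: (S1[v] if S1[v] == S2[v] else ANYINT) for v in S1}

def abs_plus(a, b):
    """Abstract addition of two signs."""
    if ZERO in (a, b):
        return b if a == ZERO else a
    return a if a == b else ANYINT

def loop_invariant(S, abs_body):
    """Compute S_I for `while b do c` from the store S before the loop;
    abs_body plays the role of C'[[c]] on the loop body."""
    while True:
        S_new = merge(S, abs_body(S))
        if S_new == S:
            return S_new
        S = S_new

# Hypothetical loop body: x := x + 1, i.e. sign(x) becomes sign(x) "+" pos.
body = lambda S: {**S, "x": abs_plus(S["x"], POS)}

# Starting from {x: zero}: one iteration merges zero with pos, giving
# anyint; a second iteration confirms anyint is stable, so S_I = {x: anyint}.
S_I = loop_invariant({"x": ZERO}, body)
```

Starting instead from {x: pos}, the very first merge already yields {x: pos} again, so the iteration stops immediately: the invariant "x stays positive" is found in one step.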