Analysis of Control-Flow Constructs

To finish the analysis that determines the sign of variables, we must define how the analysis processes commands. For each command c, I want to execute c in some abstract store before c and determine an abstract store after the command. At each point during the analysis of c, I keep track of the current abstraction at that point, but I have no information about the concrete store (i.e., the concrete integer values of variables). I express the analysis of a command c using an abstract denotation C'[[c]]:

    C'[[c]] : AbsStore -> AbsStore

Note that C'[[c]] is a total map, in contrast to the concrete semantics C[[c]], which is a partial map. (Why?)

The analysis of skip and assignments is straightforward. For assignments, we invoke the abstract interpretation A' for expressions to determine the sign of the expression being assigned:

    C'[[skip]] S = S
    C'[[x := a]] S = S[x -> A'[[a]]S]

For sequences:

    C'[[c1;c2]] S = C'[[c2]] (C'[[c1]] S)

In other words, C'[[c1;c2]] = C'[[c2]] o C'[[c1]], since we now deal with total maps.

More interesting is the analysis of an if command. If we can determine which branch is taken given the current sign information, then we only need to analyze the appropriate branch. However, if we cannot precisely determine the truth value of the condition, then we must conservatively analyze each branch in turn and then combine the results:

    C'[[if b then c1 else c2]] S = C'[[c1]] S                      if B'[[b]] S = true
                                 = C'[[c2]] S                      if B'[[b]] S = false
                                 = merge(C'[[c1]] S, C'[[c2]] S)   if B'[[b]] S = anybool

Note that the last case is the one we usually expect in real programs -- the case when we cannot statically tell which branch is going to be executed (usually because the test depends on some input to the program). The first two cases show that one branch is always taken; we can then optimize the other branch away and replace the if command with the branch being taken.
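The equations above can be transcribed almost directly into code. Below is a minimal sketch in Python; the encoding of commands as tuples, signs as strings, and the helpers abs_expr, abs_plus, and abs_cmd are my own representation choices (standing in for A'[[a]], abstract addition, and C'[[c]] respectively), and the condition b is modeled as a function that plays the role of B'[[b]]:

```python
# Sign abstraction: the four abstract values of AbsInt.
NEG, ZERO, POS, ANYINT = "neg", "zero", "pos", "anyint"

def sign(n):
    """Abstraction of a concrete integer."""
    return NEG if n < 0 else POS if n > 0 else ZERO

def abs_plus(x, y):
    """Abstract addition of two signs."""
    if ZERO in (x, y):
        return y if x == ZERO else x
    return x if x == y else ANYINT   # e.g. pos + neg -> anyint

def abs_expr(a, S):
    """A'[[a]]S for expressions: int literals, variable names, ('+', a1, a2)."""
    if isinstance(a, int):
        return sign(a)
    if isinstance(a, str):
        return S[a]
    op, a1, a2 = a
    assert op == '+'
    return abs_plus(abs_expr(a1, S), abs_expr(a2, S))

def merge(S1, S2):
    """Pointwise combination of two abstract stores (defined in the text below)."""
    return {v: (S1[v] if S1[v] == S2[v] else ANYINT) for v in S1}

def abs_cmd(c, S):
    """C'[[c]]S -- a total map on abstract stores."""
    kind = c[0]
    if kind == 'skip':
        return S
    if kind == ':=':
        _, x, a = c
        return {**S, x: abs_expr(a, S)}      # S[x -> A'[[a]]S]
    if kind == ';':
        _, c1, c2 = c
        return abs_cmd(c2, abs_cmd(c1, S))   # C'[[c2]] o C'[[c1]]
    if kind == 'if':
        _, b, c1, c2 = c
        t = b(S)                             # B'[[b]]S: True, False, or "anybool"
        if t is True:
            return abs_cmd(c1, S)
        if t is False:
            return abs_cmd(c2, S)
        return merge(abs_cmd(c1, S), abs_cmd(c2, S))
```

For instance, analyzing ('if', b, c1, c2) with a condition whose truth value is unknown merges the stores produced by the two branches, exactly as in the anybool case of the equation.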
The merge operation takes two abstract stores and builds a new store that conservatively approximates both. For each variable, it combines the two abstract signs for that variable in the given stores:

    merge : (AbsStore x AbsStore) -> AbsStore
    merge(S1, S2) = { (x, merge_var(S1(x), S2(x))) | x in Var }

    merge_var(X, Y) = X        if X = Y
                    = anyint   if X != Y
    where X, Y in AbsInt

Note that merge_var yields the most precise sign of a variable given its signs on the true and false branches of the if. If a variable is positive on both branches, merge_var yields pos; but if a variable has unknown sign on one branch, or if it has different signs on the two branches, merge_var yields anyint. With respect to the lattice structure of our abstraction, merge_var(X, Y) yields the lowest element in the lattice diagram that approximates (is higher up than) both X and Y; such an operator is known as the _least upper bound_ operator (abbreviated l.u.b.). Hence, merge_var is essentially the l.u.b. operator.

Example programs

Let's consider a few examples. First, try the following sequence of assignments:

    x := 4; y := -3; z := x + y

If I start with a concrete store where all variables are initialized to zero, s = {x=0, y=0, z=0}, then the concrete execution of c yields:

    C[[c]] s = {x=4, y=-3, z=1}

Now if I start with the abstract store S = {x=zero, y=zero, z=zero} and analyze the program using my sign abstraction, I get:

    C'[[c]] S = {x=pos, y=neg, z=anyint}

Note that the analysis result C'[[c]] S is faithful to the concrete result C[[c]] s: the sign of each variable correctly describes the actual value of that variable. However, the analysis may yield results that, despite being correct, are not very accurate -- in this case for z. This is the price to pay for restricting ourselves to the abstract domain during the analysis.
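The merge_var and merge definitions can be written down directly. A minimal sketch in Python (representing signs as strings is my own encoding choice):

```python
# merge_var is the least-upper-bound (l.u.b.) operator on the sign lattice:
# it keeps a sign that both branches agree on, and goes to anyint otherwise.
ANYINT = "anyint"

def merge_var(x, y):
    """l.u.b. of two abstract signs X, Y in AbsInt."""
    return x if x == y else ANYINT

def merge(S1, S2):
    """Pointwise merge of two abstract stores over the same variables."""
    return {v: merge_var(S1[v], S2[v]) for v in S1}
```

Note that merge_var behaves like an l.u.b. should: it is idempotent and commutative, and both X and Y are approximated by (are less than or equal to) merge_var(X, Y).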
Consider now a program with an if statement:

    x := 4; y := 3; if (x > y) then x := x + y else y := y - x

If I start with the same concrete store s, the program takes the true branch and yields:

    C[[c]] s = {x=7, y=3}

Starting with an abstract store S = {x=zero, y=zero}, I get the following. The first two assignments yield a store S' = {x=pos, y=pos}. Based on these signs, the analysis cannot determine whether the test x > y succeeds. Therefore, it analyzes each branch. On the true branch, the analysis can determine that x + y is positive and maintain a positive sign for x:

    C'[[x := x + y]] S' = {x=pos, y=pos}

On the false branch, however, the analysis cannot determine the sign of y - x, since both x and y are positive but their magnitudes are not known. Therefore, the analysis assigns an unknown sign to y:

    C'[[y := y - x]] S' = {x=pos, y=anyint}

Finally, the analysis combines the results from the two branches to get the final result:

    C'[[c]] S = {x=pos, y=anyint}

So the analysis could successfully determine that x is positive at the end of the program, regardless of the branch being taken; but it could not precisely determine the sign of y. As in the previous example, the analysis result correctly models the signs of the values computed in the actual execution of the program.

Analysis of while loops

Now suppose I want to analyze a loop construct while b do c. Essentially, I want to find an invariant about the variable signs. Hence the loop invariant is an abstract store S_I. The analysis result will then be this invariant:

    C'[[while b do c]] S = S_I

Let's first write down what it means for some abstract store to be invariant. If I look for an invariant abstract store S_I, then I want S_I to be maintained after executing each loop iteration in the abstract domain.
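The if example above can be traced by hand as follows: compute the branch effects from S' and merge them. A small sketch in Python (the string encoding of signs and the merge helper are my own transcription of the definitions in the text):

```python
# Hand-tracing the branches of the if example. After the two assignments
# the abstract store is S' = {x: pos, y: pos}. The true branch keeps x
# positive (pos + pos = pos); the false branch loses the sign of y
# (pos - pos = anyint, since the magnitudes are unknown).
POS, ANYINT = "pos", "anyint"

def merge(S1, S2):
    """Pointwise l.u.b. of two abstract stores."""
    return {v: (S1[v] if S1[v] == S2[v] else ANYINT) for v in S1}

S_prime = {"x": POS, "y": POS}
true_branch  = {**S_prime, "x": POS}     # effect of x := x + y
false_branch = {**S_prime, "y": ANYINT}  # effect of y := y - x
result = merge(true_branch, false_branch)
# result: {"x": "pos", "y": "anyint"}, matching C'[[c]] S in the text
```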
That is, if I start with S_I and analyze the loop body c, then I should get some abstraction that is at least as precise as S_I:

    C'[[c]] S_I lessthan S_I

(Here I have extended the ordering relation "lessthan" to abstract stores: store S1 is less than S2, written S1 lessthan S2, if S1(x) is less than S2(x) for each variable x. This is known as the pointwise ordering for functions.)

There is one more thing to ensure. If S is the abstract store before the while loop, then S must be at least as precise as S_I, to make sure that S_I actually holds before the first iteration (similar to the weakening rule that we use in Hoare-style proofs):

    S lessthan S_I

I can then write these two conditions together in one single equation:

    merge(S, C'[[c]] S_I) lessthan S_I

If I look more carefully, I see that S_I = {(x, anyint) | x in Var} satisfies this relation. It is an invariant saying that I don't know the sign of any variable -- an invariant, but a useless one. It turns out that there are multiple possible solutions to the above equation; we are looking for the most precise solution S_I, i.e., the least solution with respect to the ordering in our abstract domain.

It turns out that there is a simple algorithm that computes the least (most precise) solution of the above equation. The algorithm derives S_I by successive approximations, starting from the store S before the loop, and works as follows:

    while (true) do
        S' = merge(S, C'[[c]] S)
        if (S' = S) then return S'
        else S = S'

Now I can show that this algorithm computes a solution of our recursive equation, because of two facts:

    - the initial value of S is less than all subsequent values of S and S' (intuitively, the merge operation "accumulates" new information but keeps the original information); in particular, the initial S is always less than S';
    - when the loop finishes, S = S', so S' = merge(S', C'[[c]] S'). Hence C'[[c]] S' is less than S'. So S' is greater than both the initial S and C'[[c]] S', hence it satisfies the desired equation.
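The successive-approximation algorithm above can be sketched in a few lines of Python. Here abs_body stands in for C'[[c]] on the loop body; as an illustration I use a hypothetical loop whose body is the single assignment x := x + 1 (the string encoding of signs and the helper names are my own choices, not from the text):

```python
# Successive approximation: iterate S' = merge(S, C'[[c]] S) until S' = S.
NEG, ZERO, POS, ANYINT = "neg", "zero", "pos", "anyint"

def merge(S1, S2):
    """Pointwise l.u.b. of two abstract stores."""
    return {v: (S1[v] if S1[v] == S2[v] else ANYINT) for v in S1}

def abs_plus(a, b):
    """Abstract addition of two signs."""
    if ZERO in (a, b):
        return b if a == ZERO else a
    return a if a == b else ANYINT

def loop_invariant(S, abs_body):
    """Compute S_I for `while b do c` from the store S before the loop;
    abs_body plays the role of C'[[c]] on the loop body."""
    while True:
        S_new = merge(S, abs_body(S))
        if S_new == S:
            return S_new
        S = S_new

# Hypothetical loop body: x := x + 1, i.e. sign(x) becomes sign(x) "+" pos.
body = lambda S: {**S, "x": abs_plus(S["x"], POS)}

# Starting from {x: zero}: one iteration merges zero with pos, giving
# anyint; a second iteration confirms anyint is stable, so S_I = {x: anyint}.
S_I = loop_invariant({"x": ZERO}, body)
```

Starting instead from {x: pos}, the very first merge already yields {x: pos} again, so the iteration stops immediately: the invariant "x stays positive" is found in one step.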