Control flow


The control in the execution of a program refers to the point that the
computation has reached at a certain moment. Control flow refers to
the order of execution points, i.e., what the program is supposed
to execute next.

All of the programming constructs that we've studied so far have
_structured control flow_. That is, for each construct, the control
flows in from the previous program construct, flows locally according
to some particular pattern, and then flows out to the next program
construct. The control never flows in or out from other points in the
program. Such control behavior is also referred to as _local control
flow_.

In imperative languages, several constructs make control flow
explicit. Typically, the control constructs in such languages are:
sequences of commands, loops (while, do-until, for), and conditional
statements (if's and switches). Take a command "if (b) then c1 else
c2". The control comes from the previous command in the program; then
the program evaluates b; based on the value of b, the control flows to
the true or false branch; and after the execution of that branch, the
control flows out to the command that follows after the if.

In functional languages, control is represented less explicitly. For
instance, the control in a function application "e1 e2" is as follows:
first evaluate function e1; then pass the control to the evaluation of
e2 (assuming call-by-value semantics); further evaluate the body of e1
with the value of e2; and finally pass the control and the resulting
value to the point that uses this value.

In contrast, a number of other constructs have _unstructured_ or
_non-local_ control-flow. These include goto, break, continue,
exceptions, and others. The semantics of these constructs is
significantly different from the semantics of structured constructs
because we need to explicitly describe the control in the program.


Exceptions

Exceptions represent a language mechanism that makes it easier for the
program to deal with unexpected, exceptional situations. These are
typically error cases -- either standard execution errors such as
division by zero or null pointer dereferences; or application-specific
errors, for instance syntax errors in a compiler.

The exception mechanism uses two primitives:

 - a construct that installs an exception handler;

 - a construct that issues an exception.

Whenever the program executes the construct that issues the exception,
control is transferred to the most recently installed handler that is
still active and can handle that exception. We say that the exception
is _caught_ by the exception handler. In general, the control point
where an exception is issued and the control point where it is handled
need not belong to the same function. Furthermore, the
particular handler for an exception is determined dynamically, during
the run of the program.

Exceptions and handlers can be identified by names, e.g. "issue
exception X" or "handle exception X". The handler that catches an
exception must match the exception name. Furthermore, program values
can be passed along with the exception to the exception handler.


ML provides three constructs for exceptions. First, an exception
declaration, which specifies the exception name. Optionally, one can
declare the type of the value that will be passed along with the
exception. For instance:

   exception X of int;

declares an exception X that carries an integer value when it is
raised.

Second, a construct "handle" defines a new exception handler:

   e1 handle X(n) => e2

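This will evaluate e1. If e1 evaluates normally, its value is the
result of the whole expression. If an exception X occurs during the
evaluation of e1, it is handled by evaluating e2 with n bound to the
value carried by the exception. In that case, the result of this
expression is the result of e2.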

Third, a construct "raise" will issue an exception:

  raise X(m)


For instance, the expression:

   ((1 + (raise X(2)) + 3) handle X(n) => (n + 1)) + 4 

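evaluates to 7: the raise abandons the inner additions, the handler
binds n to 2 and yields 3, and the outer addition then adds 4.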


In Java, exceptions are simply objects. The fields of these objects
represent the data being passed with the exception. For instance:

  class X extends Exception { 
    int n;
    X(int n) { this.n = n; }
  }

Exception handlers are defined with a "try-catch" construct, and
exceptions are issued with "throw":

  try { s1 } catch (X obj) { s2 }

  throw new X(m)
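
As a rough illustration, the ML expression above that evaluates to 7
can be mirrored in Java. This is only a sketch: the class name
HandleDemo and the helper raiseX are made up for this example, and X
is redeclared as a nested class so that the block stands alone.

```java
// Sketch only: HandleDemo and raiseX are hypothetical names.
class HandleDemo {
    static class X extends Exception {
        final int n;               // value passed along with the exception
        X(int n) { this.n = n; }
    }

    // Plays the role of "raise X(m)" used in an expression position.
    static int raiseX(int m) throws X {
        throw new X(m);
    }

    // Mirrors ((1 + (raise X(2)) + 3) handle X(n) => (n + 1)) + 4
    static int eval() {
        int inner;
        try {
            inner = 1 + raiseX(2) + 3;   // the additions are abandoned
        } catch (X obj) {
            inner = obj.n + 1;           // handler body, with n = 2
        }
        return inner + 4;
    }

    public static void main(String[] args) {
        System.out.println(eval());      // prints 7
    }
}
```

Because "throw" is a statement in Java, the raise has to be wrapped in
a method before it can appear inside an arithmetic expression.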
 

Exceptions and dynamic scoping

Exceptions have dynamic scoping behavior. That is, the exception
handler that catches an exception is the most recent handler that the
dynamic execution of the program has installed, not the handler that
lexically encloses the point where the exception has been thrown.  For
instance, the following code fragment:

let fun f x = if (x = 0) then raise X(10) 
              else x 
in
   f(0) handle X(n) => n + 1
end handle X(n) => n + 2

yields the value 11. The handler that lexically encloses the raise
command is "handle X(n) => n + 2". The handler that dynamically
encloses the execution of raise is "handle X(n) => n + 1". The latter
is the one used, with n bound to 10.
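
The same behavior can be observed in Java. The sketch below uses
made-up names (DynamicHandlerDemo, callerA, callerB, nested). It
raises the exception from a single point in f and shows that the
result depends on which handler is active along the call chain.

```java
// Sketch only: all names here are hypothetical.
class DynamicHandlerDemo {
    static class X extends Exception {
        final int n;
        X(int n) { this.n = n; }
    }

    // The single raise point shared by all callers.
    static int f(int x) throws X {
        if (x == 0) throw new X(10);
        return x;
    }

    // Installs a handler that computes n + 1.
    static int callerA() {
        try { return f(0); } catch (X e) { return e.n + 1; }
    }

    // Installs a handler that computes n + 2.
    static int callerB() {
        try { return f(0); } catch (X e) { return e.n + 2; }
    }

    // Outer handler computes n + 2, but callerA's handler is installed
    // more recently, so the outer one is never reached.
    static int nested() {
        try { return callerA(); } catch (X e) { return e.n + 2; }
    }

    public static void main(String[] args) {
        System.out.println(callerA());   // prints 11
        System.out.println(callerB());   // prints 12
        System.out.println(nested());    // prints 11
    }
}
```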

Conceptually, think of the command "e1 handle X(n) => e2" as
introducing a scope for the exception handler "X(n) => e2". Before
executing e1, the program adds this handler to the current scope
(i.e. activation record); once e1 has finished, it removes the handler
(note: this is not necessarily the way exceptions are implemented, but
it is a simple way to think about them). Then, when an exception is
raised during the execution of e1, the program follows the control
links (not the access links) and searches for the first occurrence of
a handler for X, popping activation records off the stack along the
way. In other words, handling an exception requires the program to
"walk up the stack". If no handler is found when walking up the stack,
the exception is uncaught and the program terminates.
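
The stack walk can be simulated directly. The following is a
conceptual sketch, not how a real runtime works. It assumes that each
activation record holds at most one handler, identified by the name
of the exception it catches. Frame, raise and sampleStack are
hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: a toy model of activation records with handlers.
class StackWalkDemo {
    // Hypothetical activation record: the function name, plus the name
    // of the exception its installed handler catches (null if none).
    static final class Frame {
        final String function;
        final String handles;
        Frame(String function, String handles) {
            this.function = function;
            this.handles = handles;
        }
    }

    // Walks up the stack looking for a handler for exn. Records above
    // the handler are popped. Returns the function owning the handler,
    // or null if the exception is uncaught (the stack is then empty).
    static String raise(Deque<Frame> stack, String exn) {
        while (!stack.isEmpty()) {
            Frame top = stack.peek();
            if (exn.equals(top.handles)) return top.function;
            stack.pop();
        }
        return null;
    }

    // main called g, which called f; main handles X and g handles Y.
    static Deque<Frame> sampleStack() {
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame("main", "X"));
        stack.push(new Frame("g", "Y"));
        stack.push(new Frame("f", null));
        return stack;
    }

    public static void main(String[] args) {
        Deque<Frame> s = sampleStack();
        // f and g are popped, main's handler catches X.
        System.out.println(raise(s, "X") + " " + s.size());  // main 1
        // Nobody handles Z: the walk empties the stack.
        System.out.println(raise(sampleStack(), "Z"));       // null
    }
}
```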


Source-to-source translation for exceptions

One way to describe the behavior of exceptions is by translating
programs with exceptions into programs without exceptions. In general,
take two languages A and B. If the semantics for A are defined, then I
can define the semantics of B by showing how I can reduce programs in
B to programs in A. Hence, the semantics of B are given by the
translation from B to A, and the semantics of A. Essentially, each
construct in language B is "syntactic sugar" for some small program
fragment in A. The translation from B to A is usually referred to as
"de-sugaring".

Consider that language B is some little functional language, extended
with ML-style exceptions. For simplicity, I will ignore types:

e ::= n | x | \x . e | e1 e2 | e1 + e2
    | e1 handle X => e2 | raise X

X in Exceptions

I will now translate this language into a language A with options
(sums) and case constructs:

e' ::= n | x | \x . e' | e1' e2' | e1' + e2'
     | inl(e') | inr(e') | case e1' of inl(x) => e2' | inr(y) => e3'
     | X | if (X = Y) e2' e3'

Intuitively, after the translation, each value in language A is either
a standard value (number, abstraction), or it is an exceptional
value. Inr expressions represent standard values; and inl expressions
represent exceptional values. The values that language A manipulates
roughly correspond to an ML datatype of the form:

  datatype valueA = inr of value | inl of exception

The translation function is T[[e]] = e', showing that expression e in
language B is de-sugared to expression e' in language A. Function T is
recursively defined as follows:

T[[n]] = inr(n)
T[[x]] = inr(x)

T[[\x.e]] = inr(\x . T[[e]])

T[[e1 e2]] = case T[[e1]] of 
                  inl(X) => inl(X)
                | inr(f) => case T[[e2]] of
                            inl(Y) => inl(Y)
                          | inr(v) => f v

T[[e1 + e2]] = case T[[e1]] of 
                    inl(X) => inl(X)
                  | inr(v1) => case T[[e2]] of
                               inl(Y) => inl(Y)
                             | inr(v2) => inr(v1 + v2) 

In other words, the translated program checks, at each operation,
whether an exception has been raised by one of its operands. If so, it
aborts the computation and passes the exception on as the result. The
translation for raise and handle is as follows:

T[[raise X]] = inl(X)

T[[e1 handle X => e2]] = 
      case T[[e1]] of 
           inl(Y) => if (Y = X) T[[e2]] inl(Y)
         | inr(v) => inr(v)


The above provides a simple way of implementing exceptions as a
source-to-source translation. You can use a similar translation to add
exceptions to any language: just tag each value with one bit
indicating whether it is an exception or not; and then test the
arguments of each operation to determine if an exception has been
raised during their evaluation.
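
The tagging idea can be tried out in Java. The sketch below is
restricted to integer values (no abstractions or applications) and
follows the convention stated earlier that inl marks exceptional
values and inr marks standard ones. Result, plus, raise, handle and
example are names chosen for this illustration. Operands are passed
as suppliers so that the second operand is evaluated only when the
first one did not raise an exception, as in the translation.

```java
import java.util.function.Supplier;

// Sketch only: tagged results for integers, names are hypothetical.
class TaggedDemo {
    // Either inr(value) or inl(exception name).
    static final class Result {
        final boolean isExn;
        final String exn;    // meaningful only when isExn is true
        final int value;     // meaningful only when isExn is false
        private Result(boolean isExn, String exn, int value) {
            this.isExn = isExn; this.exn = exn; this.value = value;
        }
        static Result inr(int v) { return new Result(false, null, v); }
        static Result inl(String x) { return new Result(true, x, 0); }
    }

    // T[[e1 + e2]]: test each operand, propagate an exception if any.
    static Result plus(Supplier<Result> e1, Supplier<Result> e2) {
        Result r1 = e1.get();
        if (r1.isExn) return r1;
        Result r2 = e2.get();
        if (r2.isExn) return r2;
        return Result.inr(r1.value + r2.value);
    }

    // T[[raise X]]: produce an exceptional value.
    static Result raise(String x) { return Result.inl(x); }

    // T[[e1 handle X => e2]]: run e2 only if e1 raised exactly X.
    static Result handle(Supplier<Result> e1, String x,
                         Supplier<Result> e2) {
        Result r1 = e1.get();
        if (r1.isExn && r1.exn.equals(x)) return e2.get();
        return r1;
    }

    // (1 + raise X) handle <handled> => 5
    static Result example(String handled) {
        return handle(
            () -> plus(() -> Result.inr(1), () -> raise("X")),
            handled,
            () -> Result.inr(5));
    }

    public static void main(String[] args) {
        System.out.println(example("X").value);  // prints 5
        System.out.println(example("Y").exn);    // prints X (uncaught)
    }
}
```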

In fact, this can be implemented more efficiently: you don't need to
check each operation, but just function calls. The reason is that, if
the handler lies in the current function, the compiler can statically
look up the enclosing lexical scopes, identify the address of the
handler code, and issue a jump to that code. If the exception is not
caught in the current function, the compiler translates the raise into
a return statement that returns an exceptional value.

Each function call is wrapped into a test to determine if the call
yielded an exceptional value. If so, and if a matching exception
handler lexically encloses the function call in the caller, then a
jump to that handler is generated.
Otherwise, the caller also returns with an exceptional value. Wrapping
up function calls with tests makes the program automatically walk up
the stack.
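
A hand-written approximation of what such compiled code might look
like is sketched below. The functions f, g and h, the counter checks,
and the handler result -1 are all invented for the example. Java has
no goto, so the "jump" to the handler is rendered as an ordinary
branch. The counter records how many post-call tests are executed.

```java
// Sketch only: hand-compiled calls with post-call tests.
class CallCheckDemo {
    static final class Result {
        final boolean isExn;
        final String exn;
        final int value;
        private Result(boolean isExn, String exn, int value) {
            this.isExn = isExn; this.exn = exn; this.value = value;
        }
        static Result inr(int v) { return new Result(false, null, v); }
        static Result inl(String x) { return new Result(true, x, 0); }
    }

    static int checks = 0;   // number of post-call tests executed

    // "raise X" with no local handler becomes a return of an
    // exceptional value.
    static Result f(int x) {
        return x == 0 ? Result.inl("X") : Result.inr(x);
    }

    // No handler in g: test the call and propagate the exception.
    static Result g(int x) {
        Result r = f(x);
        checks++;
        if (r.isExn) return r;
        return Result.inr(r.value + 1);
    }

    // h wraps its call to g in "handle X => -1".
    static Result h(int x) {
        Result r = g(x);
        checks++;
        if (r.isExn && r.exn.equals("X")) {
            return Result.inr(-1);   // handler code
        }
        return r;                    // normal value or other exception
    }

    public static void main(String[] args) {
        System.out.println(h(0).value);  // prints -1
        System.out.println(h(5).value);  // prints 6
        System.out.println(checks);      // prints 4
    }
}
```

Two of the four tests counted here belong to the call h(5), in which
no exception occurs at all.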

Note that with this kind of implementation, exception handlers need
not be added to the environment (in the activation records). The
compiler models them as known code addresses within their enclosing
functions.

The main concern in the implementation of exceptions is efficiency.
You don't want your program to slow down when the program encounters
no exceptions (but it is okay to be inefficient when exceptions are
being handled). We must answer the following:

 - Are the expressions "e1 handle X => e2" slowing down the program if
   no exceptions occur in e1? The answer is no, because we don't
   actually add the handlers to the environment, as mentioned
   above. Expression "e1 handle ..." is as fast as "e1" in the absence
   of exceptions.

 - Are any other constructs being slowed down? The answer here is yes,
   function calls are slowed down even if no exceptions occur, because
   of the tests that wrap each call. This may significantly hurt
   performance, and is regarded as a big minus for this approach.

To summarize, the source-to-source translation of exceptions is a
simple and portable way of implementing such constructs;
unfortunately, it is inefficient because it leads to run-time overhead
on function calls.