Control flow

The control in the execution of a program refers to the point that the computation has reached at a given moment. Control flow refers to the order of execution points, i.e., what the program is supposed to execute next.

All of the programming constructs that we've studied so far have _structured control flow_. That is, for each construct, the control flows in from the previous program construct, flows locally according to some particular pattern, and then flows out to the next program construct. The control never flows in or out from other points in the program. Such control behavior is also referred to as _local control flow_.

In imperative languages, several constructs make control flow explicit. Typically, the control constructs in such languages are: sequences of commands, loops (while, do-until, for), and conditional statements (if's and switches). Take a command "if (b) then c1 else c2". The control comes from the previous command in the program; then the program evaluates b; based on the value of b, the control flows to the true or false branch; and after the execution of that branch, the control flows out to the command that follows the if.

In functional languages, control is represented less explicitly. For instance, the control in a function application "e1 e2" is as follows: first evaluate e1 to a function; then pass the control to the evaluation of e2 (assuming call-by-value semantics); then evaluate the body of that function with its parameter bound to the value of e2; and finally pass the control and the resulting value to the point that uses this value.

In contrast, a number of other constructs have _unstructured_ or _non-local_ control flow. These include goto, break, continue, exceptions, and others. The semantics of these constructs differs significantly from the semantics of structured constructs, because we need to describe explicitly where the control goes next.
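To make the contrast concrete, here is a minimal Java sketch. The class and method names (ControlFlowDemo, abs, firstRowWith) are made up for illustration and do not come from these notes. The first method uses only structured control flow; the second uses a labeled break, which leaves two loops at once instead of flowing out through the end of the inner loop.

```java
class ControlFlowDemo {
    // Structured: control enters the if, takes exactly one branch,
    // and flows out to the statement that follows the if.
    static int abs(int x) {
        int r;
        if (x < 0) {
            r = -x;
        } else {
            r = x;
        }
        return r;
    }

    // Non-local: "break search" jumps out of both loops at once.
    // Returns the index of the first row containing target, or -1.
    static int firstRowWith(int[][] grid, int target) {
        int found = -1;
        search:
        for (int i = 0; i < grid.length; i++) {
            for (int j = 0; j < grid[i].length; j++) {
                if (grid[i][j] == target) {
                    found = i;
                    break search;
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(abs(-5));
        System.out.println(firstRowWith(new int[][] {{1, 2}, {3, 4}}, 3));
    }
}
```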
Exceptions

Exceptions are a language mechanism that makes it easier for the program to deal with unexpected, exceptional situations. These are typically error cases: either standard execution errors such as division by zero or null pointer dereferences, or application-specific errors, for instance syntax errors in a compiler.

The exception mechanism uses two primitives:
- a construct that installs an exception handler;
- a construct that issues an exception.

Whenever the program executes the construct that issues the exception, the control is transferred to the last handler that has been installed and can handle that exception. We say that the exception is _caught_ by the exception handler. In general, the control point where an exception is issued and the control point where it is handled do not necessarily belong to the same function. Furthermore, the particular handler for an exception is determined dynamically, during the run of the program.

Exceptions and handlers can be identified by names, e.g. "issue exception X" or "handle exception X". The handler that catches an exception must match the exception name. Furthermore, program values can be passed along with the exception to the exception handler.

ML provides three constructs for exceptions. First, an exception declaration, which specifies the exception name. Optionally, one can declare the type of the value that will be passed along with the exception. For instance:

    exception X of int;

declares an exception X that carries an integer value when it is raised.

Second, a construct "handle" defines a new exception handler:

    e1 handle X(n) => e2

This evaluates e1. If no exception occurs, the result is the value of e1. If an exception X occurs during the evaluation of e1, it is handled by evaluating the handler e2, with n bound to the value passed along with the exception. In that case, the result of this expression is the result of e2.
Third, a construct "raise" issues an exception:

    raise X(m)

For instance, the expression:

    ((1 + (raise X(2)) + 3) handle X(n) => (n + 1)) + 4

evaluates to 7.

In Java, exceptions are simply objects. The fields of these objects represent the data being passed with the exception. For instance:

    class X extends Exception {
        int n;
        X(int n) { this.n = n; }
    }

Exception handlers are defined with a "try-catch" construct, and exceptions are issued with "throw":

    try { s1 } catch (X obj) { s2 }

    throw new X(m)

Exceptions and dynamic scoping

Exceptions have dynamic scoping behavior. That is, the exception handler that catches an exception is the most recent handler that the dynamic execution of the program has installed, not the handler that lexically encloses the point where the exception has been thrown. For instance, the following code fragment:

    let fun f x = if (x = 0) then raise X(10) else x
    in f(0) handle X(n) => n + 1
    end handle X(n) => n + 2

yields the value 11. The handler that lexically encloses the raise command is "handle X(n) => n + 2". The handler that dynamically encloses the execution of raise is "handle X(n) => n + 1". The latter is the one that is used.

Conceptually, think of the command "e1 handle X(n) => e2" as introducing a scope for the exception handler "X(n) => e2". Before executing e1, the program adds this handler to the current scope (i.e. activation record); after e1 finishes, it removes the handler (note: this is not necessarily the way exceptions are implemented, but it is a simple way to think about them). Then, when an exception is raised during the execution of e1, the program follows the control links (not the access links) and searches for the first handler for X, popping activation records off the stack along the way. In other words, handling an exception requires the program to "walk up the stack". If no handler is found when walking up the stack, the exception is uncaught and the program terminates.
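The same dynamic behavior can be sketched in Java. This is a rough analogue of the ML fragment above, not a literal translation: the name DynamicScopeDemo is made up, and X is declared here as an unchecked exception (extending RuntimeException rather than Exception) only because Java rejects a catch clause for a checked exception that the enclosed block cannot throw. The function f is a lambda defined inside the outer try, so the outer catch lexically encloses the throw, while the inner catch is the handler most recently installed when f is called.

```java
// Assumption: unchecked, so that both nested handlers compile.
class X extends RuntimeException {
    final int n;                       // value passed along with the exception
    X(int n) { this.n = n; }
}

class DynamicScopeDemo {
    static int run() {
        try {
            // f is defined here, so this outer try lexically encloses the throw.
            java.util.function.IntUnaryOperator f = x -> {
                if (x == 0) throw new X(10);
                return x;
            };
            try {
                return f.applyAsInt(0);    // the throw happens during this call
            } catch (X e) {
                return e.n + 1;            // most recently installed handler: used
            }
        } catch (X e) {
            return e.n + 2;                // lexically enclosing handler: not used here
        }
    }

    public static void main(String[] args) {
        System.out.println(run());         // prints 11
    }
}
```

As in the ML fragment, the result is 11 rather than 12, because the handler is chosen by walking up the call stack from the point of the throw.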
Source-to-source translation for exceptions

One way to describe the behavior of exceptions is by translating programs with exceptions into programs without exceptions.

In general, take two languages A and B. If the semantics of A are defined, then I can define the semantics of B by showing how to reduce programs in B to programs in A. Hence, the semantics of B are given by the translation from B to A, together with the semantics of A. Essentially, each construct in language B is "syntactic sugar" for some small program fragment in A. The translation from B to A is usually referred to as "de-sugaring".

Consider that language B is some little functional language, extended with ML-style exceptions. For simplicity, I will ignore types, and exceptions carry no values:

    e ::= n | x | \x . e | e1 e2 | e1 + e2
        | e1 handle X => e2 | raise X            (X in Exceptions)

I will now translate this language into a language A with options (sums) and case constructs:

    e' ::= n | x | \x . e' | e1' e2' | e1' + e2'
         | inl(e') | inr(e')
         | case e1' of inl(x) => e2' | inr(y) => e3'
         | X | if (X = Y) e2' e3'

Intuitively, after the translation, each expression in language A evaluates either to a standard value (number, abstraction) or to an exceptional value. Inr expressions represent standard values, and inl expressions represent exceptional values. The values that language A manipulates roughly correspond to an ML datatype of the form:

    datatype valueA = inr of value | inl of exception

The translation function is T[[e]] = e', showing that expression e in language B is de-sugared to expression e' in language A. Function T is recursively defined as follows:

    T[[n]]       = inr(n)
    T[[x]]       = inr(x)
    T[[\x.e]]    = inr(\x . T[[e]])
    T[[e1 e2]]   = case T[[e1]] of
                     inl(X) => inl(X)
                   | inr(f) => case T[[e2]] of
                                 inl(Y) => inl(Y)
                               | inr(v) => f v
    T[[e1 + e2]] = case T[[e1]] of
                     inl(X) => inl(X)
                   | inr(v1) => case T[[e2]] of
                                  inl(Y) => inl(Y)
                                | inr(v2) => inr(v1 + v2)

An abstraction is a standard value, so it is tagged with inr. Its body is itself translated, so the application "f v" already yields a tagged result and must not be wrapped in inr again; otherwise an exception raised inside the body would be mistaken for a standard value. Variables are bound to untagged values, which is why T[[x]] re-tags them.

In other words, the translated program checks, at each operation, whether an exception has been raised by one of its operands.
If so, it aborts the computation and passes the exception along as the result.

The translation for raise and handle is as follows:

    T[[raise X]]           = inl(X)
    T[[e1 handle X => e2]] = case T[[e1]] of
                               inl(Y) => if (Y = X) T[[e2]] inl(Y)
                             | inr(v) => inr(v)

Raise produces an inl value, since inl marks exceptional values. A handler whose name does not match passes the exception along unchanged.

The above provides a simple way of implementing exceptions as a source-to-source translation. You can use a similar translation to add exceptions to any language: just tag each value with one bit indicating whether it is an exception or not, and then test the arguments of each operation to determine whether an exception has been raised during their evaluation.

In fact, this can be implemented more efficiently: you don't need to check each operation, just function calls. The reason is that, if the handler lies in the current function, the compiler can statically look up the enclosing lexical scopes, identify the address of the handler code, and issue a jump to that code. If the exception is not caught in the current function, the compiler translates the raise into a return statement that returns an exceptional value. Each function call is wrapped in a test to determine whether the call yielded an exceptional value. If so, and if a matching handler lexically encloses the function call in the caller, then a jump to that handler is generated. Otherwise, the caller also returns with an exceptional value. Wrapping function calls with tests makes the program automatically walk up the stack.

Note that with this kind of implementation, exception handlers need not be added to the environment (in the activation records). The compiler models them as known code addresses within their enclosing functions.

The main concern in the implementation of exceptions is efficiency. You don't want your program to slow down when it encounters no exceptions (but it is okay to be inefficient when exceptions are being handled). We must answer the following:

- Are the expressions "e1 handle X => e2" slowing down the program if no exceptions occur in e1?
  The answer is no, because we don't actually add the handlers to the environment, as mentioned above. Expression "e1 handle ..." is as fast as "e1" in the absence of exceptions.

- Are any other constructs being slowed down?
  The answer here is yes: function calls are slowed down even if no exceptions occur, because of the tests that wrap each call. This may significantly hurt performance, and is regarded as a big minus for this approach.

To summarize, the source-to-source translation of exceptions is a simple and portable way of implementing such constructs; unfortunately, it is inefficient because it leads to run-time overhead on function calls.
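As an illustration of the tagging idea, the following Java sketch writes out by hand what a translated program might look like for the arithmetic, raise, and handle cases. It is only a sketch under stated assumptions: the names Result, Desugared, add, raise, and handle are made up for this example, values are restricted to integers, and exceptions are identified by strings. Note also that Java evaluates both arguments of add before the call, whereas the translation evaluates e2 only when e1 is a standard value; for terminating, side-effect-free operands the result is the same.

```java
// Tagged result: inr carries a standard value, inl carries an exception name.
// All names in this sketch are hypothetical.
final class Result {
    final boolean isExn;   // the one-bit tag
    final int value;       // meaningful only when isExn is false
    final String exn;      // meaningful only when isExn is true

    private Result(boolean isExn, int value, String exn) {
        this.isExn = isExn;
        this.value = value;
        this.exn = exn;
    }
    static Result inr(int v)       { return new Result(false, v, null); }
    static Result inl(String name) { return new Result(true, 0, name); }
}

class Desugared {
    // Addition: propagate the first exceptional operand, otherwise add.
    static Result add(Result a, Result b) {
        if (a.isExn) return a;
        if (b.isExn) return b;
        return Result.inr(a.value + b.value);
    }

    // raise X: produce an exceptional value instead of transferring control.
    static Result raise(String x) {
        return Result.inl(x);
    }

    // e1 handle X => e2: run the handler only if e1 produced exception X.
    // The handler is passed as a thunk so that e2 is evaluated only when needed.
    static Result handle(Result e1, String x, java.util.function.Supplier<Result> e2) {
        if (e1.isExn && e1.exn.equals(x)) return e2.get();
        return e1;   // standard value, or an exception this handler does not match
    }

    // ((1 + raise X) handle X => 2) + 4
    static Result example() {
        Result inner = add(Result.inr(1), raise("X"));
        Result handled = handle(inner, "X", () -> Result.inr(2));
        return add(handled, Result.inr(4));
    }

    public static void main(String[] args) {
        Result r = example();
        System.out.println(r.isExn ? "uncaught " + r.exn : String.valueOf(r.value)); // prints 6
    }
}
```

Every call to add tests the tags of its operands even when no exception occurs, which is the run-time overhead discussed above.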