Introduction to Semantics

The goal of semantics is to formalize the meaning of programming languages. Without semantics, programs are just pieces of syntax, that is, sequences of characters. What we need is to define what the program text means.

One way to find out about the meaning of constructs in a language is to read the language specification manual, which usually gives an informal description of those constructs. The alternative is to give a formal, mathematical definition of the language semantics. Formal descriptions are less ambiguous, more concise, and, more importantly, they allow us to write mathematical proofs of the program properties that we are interested in. The drawback of formal semantics is that it can lead to fairly complex mathematical models, especially if one attempts to describe all the details of a full-featured modern language.

There are three main approaches to specifying the semantics of programming languages:

- operational semantics: describes how a program would execute on an abstract machine;
- denotational semantics: models programs as mathematical functions;
- axiomatic semantics: describes program behavior using preconditions and postconditions.

The three approaches involve different tradeoffs in terms of how much math they require, how easy it is to use them in proofs, and how easy it is to use them as the basis for an implementation.

A Simple Language: Expressions

To make it easier to understand semantics, we will start with a minimal programming language, that of arithmetic expressions. A program in this language is an expression. Executing a program means evaluating the expression to a number. To describe the structure of this language we will use the following domains:

    x, y, z in Var
    n, m    in Int
    e       in Expr

where Var is the set of program variables (i.e., strings of characters); Int is the set of constant integers; and Expr is the domain of expressions.
The latter can be specified using a BNF (Backus-Naur Form) grammar:

    e ::= x | n | e1 + e2 | e1 * e2

This grammar specifies the syntax of the language. However, an immediate problem is that this grammar is ambiguous. Take the expression 1 + 2 * 3. One can build two abstract syntax trees:

      +             *
     / \           / \
    1   *         +   3
       / \       / \
      2   3     1   2

There are several ways to deal with this problem. One is to rewrite the grammar for the same language to make it unambiguous, but that makes the grammar more obscure. Another possibility is to extend the syntax with parentheses around expressions:

    e ::= x | n | (e1 + e2) | (e1 * e2)

We can regard this grammar as the "concrete syntax" of the language, which specifies how to unambiguously parse a string into program phrases, as opposed to the original grammar, the "abstract syntax", which is ambiguous but describes the same language of expressions. In this course we will use the abstract syntax and assume that the abstract syntax tree is known (for details on parsing, grammars, and ambiguity elimination see the compiler course).

As an aside, note that the syntactic structure of expressions in this language can be compactly expressed in ML using datatypes:

    datatype exp = Var of string
                 | Int of int
                 | Add of exp * exp
                 | Mul of exp * exp

In a language like Java, expressing this structure would require a more complex declaration consisting of a class hierarchy:

    abstract class Expr { }
    class Var extends Expr { String name; .. }
    class Int extends Expr { int val; .. }
    class Add extends Expr { Expr left, right; .. }
    class Mul extends Expr { Expr left, right; .. }

At this point we have defined the language syntax. We would now like to define the semantics, that is, the meaning of this syntax. In operational semantics we can do so by describing how a program would execute on an abstract machine. Such an execution consists of successive reductions of an expression until we reach a number, which represents the result of the computation.
The state of the abstract machine is usually referred to as a configuration, and for our language it must include two pieces of information:

- a store (also known as an environment or state), which assigns integer values to variables, so that the execution of programs can look up these values;
- the expression left to evaluate.

Consider an example. Suppose we want to evaluate the expression (x+2)*(y+1) in a store where x has value 4 and y has value 3. Then the execution of the program can be seen as the following sequence of steps, where boxes represent configurations (i.e., states of the abstract machine):

    +-------------+     +-------------+     +---------+
    |   x=4,y=3   |     |   x=4,y=3   |     | x=4,y=3 |
    +-------------+ --> +-------------+ --> +---------+ -->
    | (x+2)*(y+1) |     | (4+2)*(y+1) |     | 6*(y+1) |
    +-------------+     +-------------+     +---------+

        +---------+     +---------+     +---------+
        | x=4,y=3 |     | x=4,y=3 |     | x=4,y=3 |
    --> +---------+ --> +---------+ --> +---------+
        | 6*(3+1) |     |   6*4   |     |   24    |
        +---------+     +---------+     +---------+

We want to formalize this evaluation, which consists of successive one-step reductions.
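
Before formalizing the reductions, it may help to see the example run mechanically. The following Java program is only an illustrative sketch, not the formal definition: it reuses the class hierarchy from above, represents the store as a Map, and assumes a left-to-right reduction order (the left operand is fully reduced before the right one), which is the order the example happens to follow. The names SmallStep, step, and trace are hypothetical helpers introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Abstract syntax, mirroring the class hierarchy shown earlier.
abstract class Expr { }

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

class Int extends Expr {
    final int val;
    Int(int val) { this.val = val; }
    @Override public String toString() { return Integer.toString(val); }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + "+" + right + ")"; }
}

class Mul extends Expr {
    final Expr left, right;
    Mul(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + "*" + right + ")"; }
}

class SmallStep {
    // Performs exactly one reduction step on e in the given store.
    // Assumption: operands are reduced left to right.
    static Expr step(Map<String, Integer> store, Expr e) {
        if (e instanceof Var) {
            // A variable reduces to the value the store assigns to it.
            Integer v = store.get(((Var) e).name);
            if (v == null) throw new IllegalStateException("unbound variable: " + e);
            return new Int(v);
        }
        if (e instanceof Add) {
            Add a = (Add) e;
            if (!(a.left instanceof Int)) return new Add(step(store, a.left), a.right);
            if (!(a.right instanceof Int)) return new Add(a.left, step(store, a.right));
            return new Int(((Int) a.left).val + ((Int) a.right).val);
        }
        if (e instanceof Mul) {
            Mul m = (Mul) e;
            if (!(m.left instanceof Int)) return new Mul(step(store, m.left), m.right);
            if (!(m.right instanceof Int)) return new Mul(m.left, step(store, m.right));
            return new Int(((Int) m.left).val * ((Int) m.right).val);
        }
        // A number is a final result; no further step is possible.
        throw new IllegalArgumentException("no step from a number: " + e);
    }

    // Collects the printed form of every expression reached, from the
    // initial expression down to the final number.
    static List<String> trace(Map<String, Integer> store, Expr e) {
        List<String> steps = new ArrayList<>();
        steps.add(e.toString());
        while (!(e instanceof Int)) {
            e = step(store, e);
            steps.add(e.toString());
        }
        return steps;
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new HashMap<>();
        store.put("x", 4);
        store.put("y", 3);
        Expr e = new Mul(new Add(new Var("x"), new Int(2)),
                         new Add(new Var("y"), new Int(1)));
        for (String s : trace(store, e)) System.out.println(s);
    }
}
```

Running main on (x+2)*(y+1) with x=4 and y=3 visits six expressions, one per configuration in the example, and ends at 24. The store never changes during evaluation because this language has no assignment; only the expression component of the configuration is rewritten at each step.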