Lambda calculus is a notation for describing mathematical functions and programs. It is a mathematical system for studying the interaction of functional abstraction and functional application. It captures some of the essential, common features of a wide variety of programming languages. Because it directly supports abstraction, it is a more natural model of universal computation than a Turing machine is.
For example, where in SML we write the squaring function as fn x => x*x, in the λ-calculus we write λx.x².
We use the metavariable e to represent a λ-calculus term. In BNF notation,
e ::= x | λx.e | e0 e1
Parentheses are used just for grouping; they have no meaning on their own. Like other familiar binding constructs from mathematics (e.g., sums, integrals), lambda terms are greedy, extending as far to the right as they can. Therefore, the term λx.x λy.y is the same as λx.(x (λy.y)), not (λx.x) (λy.y). As in SML, application terms are left-associative, so x y z is the same thing as (x y) z.
For simplicity, multiple variables may be placed after the lambda, and this is considered shorthand for having a lambda in front of each variable. For example, we write λxy.e as shorthand for λx.λy.e. This shorthand is an example of syntactic sugar. The process of removing this shorthand is currying, just as in SML, and adding it is called uncurrying.
We can apply a curried function like λx.λy.x one argument at a time. Applying it to one argument results in a function that takes in a value for x and returns a constant function, one that returns the value of x no matter what argument it is applied to. As this suggests, functions are just ordinary values, and can be the results of functions or passed as arguments to functions (even to themselves!). Thus, in the lambda calculus, functions are first-class values: lambda terms serve both as functions and data.
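Partial application of a curried function is easy to observe in any language with first-class functions. Here is a quick Python sketch (Python lambdas standing in for λ-terms; the names are ours):

```python
# K = λx.λy.x, a curried function of two arguments.
K = lambda x: lambda y: x

# Applying K to one argument yields a constant function:
const7 = K(7)
print(const7(0))        # 7
print(const7("other"))  # 7
```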
Just as in SML, occurrences of variables in a term can be bound or free. The term λx.e binds all the free occurrences of x in e. The scope of x is e. This is called lexical scoping, because the variable's scope is defined by the text of the program. It is “lexical” because it is possible to determine its scope before the program runs, just by inspecting the program text. A term is closed if all variables are bound. A term is open if it is not closed.
How do we run a λ-calculus program? The main computational rule is β-reduction, which we are already familiar with from SML. This rule applies whenever there is a subterm of the form (λx.e) e' representing the application of a function λx.e to an argument e'. To perform β-reduction, we substitute the argument e' for all free occurrences of the formal parameter x in the body e, obtaining e{e'/x}. This agrees with SML and corresponds to our intuition for what the function λx.e means.
We have to be a little careful; we cannot always substitute e' blindly for x in e, because bad things could happen which could alter the meaning of expressions in undesirable ways. We only want to replace the free occurrences of x within e, because any other occurrences are bound to a different binding; they are really different variables. There are some additional subtleties to substitution when e' is not closed, in particular the problem of variable capture.
The β reduction (λx.e) e' → e{e'/x} is the basic computational step of the λ-calculus. In the pure λ-calculus, we can start with a term and perform β-reductions on subterms in any order. However, for modeling programming languages, it is useful to restrict which β reductions are allowed and in what order they can be performed.
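To make substitution and β-reduction concrete, here is a small Python sketch of capture-avoiding substitution and a single leftmost reduction step. The tuple representation and the names free_vars, subst, and beta_step are our own, not standard:

```python
import itertools

# Terms: ('var', x) | ('lam', x, body) | ('app', fun, arg)
fresh = (f"v{i}" for i in itertools.count())  # supply of fresh variable names

def free_vars(t):
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'lam':
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, s):
    """Capture-avoiding substitution t{s/x}."""
    if t[0] == 'var':
        return s if t[1] == x else t
    if t[0] == 'app':
        return ('app', subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                 # x is shadowed: no free occurrences of x in t
        return t
    if y in free_vars(s):      # rename the binder to avoid capturing y
        z = next(fresh)
        body, y = subst(body, y, ('var', z)), z
    return ('lam', y, subst(body, x, s))

def beta_step(t):
    """Perform one leftmost β-step, or return None if t has no redex."""
    if t[0] == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam':
            return subst(f[2], f[1], a)
        r = beta_step(f)
        if r is not None:
            return ('app', r, a)
        r = beta_step(a)
        if r is not None:
            return ('app', f, r)
    elif t[0] == 'lam':
        r = beta_step(t[2])
        if r is not None:
            return ('lam', t[1], r)
    return None

# (λx.λy.x) y → λv0.y : the binder y is renamed, avoiding capture.
print(beta_step(('app', ('lam', 'x', ('lam', 'y', ('var', 'x'))), ('var', 'y'))))
```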
The lambda calculus is all about functions. It's easy to define higher-order functions that operate on other functions. For example, we can define a function COMPOSE that operates on two functions f and g to produce the composition of f and g, written f∘g. The definition is that (f∘g)(x) = f(g(x)), which we can then directly abstract:
COMPOSE = λfg. λx.f (g x)
We can use COMPOSE to define a higher-order function TWICE that operates on a function f to produce a new function that applies f twice:
TWICE = λf. COMPOSE f f
= λf. (λfg. λx.f (g x)) f f
= λf. λx.f (f x) (by β reduction inside the lambda abstraction)
If we want to define a function that applies another function four times, we can just apply TWICE to itself: FOURTIMES = TWICE TWICE.
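Transcribed into Python lambdas, we can check that TWICE TWICE really applies its argument four times:

```python
compose = lambda f: lambda g: lambda x: f(g(x))  # COMPOSE
twice = lambda f: compose(f)(f)                  # TWICE = λf. COMPOSE f f
fourtimes = twice(twice)                         # FOURTIMES = TWICE TWICE

inc = lambda n: n + 1
print(twice(inc)(0))      # 2
print(fourtimes(inc)(0))  # 4
```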
In general there may be many possible β-reductions that can be performed on a given term. How do we choose which beta reductions to perform next? Does it matter?
A specification of which β-reduction to perform next is called a reduction strategy. Under a given reduction strategy, a value is a term on which no β-reductions are permitted by that strategy. For example, λx.x would always be a value, whereas (λx.x) 1 would not be under the usual reduction strategies.
Most real programming languages based on the λ-calculus, such as SML, use a reduction strategy known as call by value (CBV), in which functions may only be applied to (called on) values. Thus (λx.e) e' only β-reduces if e' is a value v. Here is an example of a CBV evaluation sequence, assuming 3, 4, and S (the successor function) are appropriately defined:
((λx.λy.y x) 3) S
→ (λy.y 3) S
→ S 3
→ 4
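Python also evaluates arguments before applying functions, so the same CBV sequence can be mirrored directly (with a Python successor standing in for S):

```python
S = lambda n: n + 1                    # successor, standing in for S
step1 = (lambda x: lambda y: y(x))(3)  # → λy.y 3
print(step1(S))                        # → S 3 → 4
```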
Another important strategy is call by name (CBN). We defer evaluation of arguments until as late as possible, applying reductions from left to right within the expression. In other words, we can pass an incomplete computation to a function as an argument. Terms are evaluated only once their value is really needed.
Let us define an expression we call Ω:
Ω = (λx.x x) (λx.x x)
What happens when we try to evaluate it?
Ω = (λx.x x) (λx.x x) → (x x){(λx.x x)/x} = Ω
We have just coded an infinite loop! When an expression e can go through an infinite sequence of evaluation steps, we say e diverges. This corresponds to nontermination.
What happens if we try using Ω as a parameter? It depends on the reduction strategy. Consider this example:
e = (λx.(λy.y)) Ω
Using the CBV evaluation strategy, we must first reduce Ω. This puts the evaluator into an infinite loop, so e diverges. On the other hand, CBN reduces the term above to λy.y, the identity function. So CBN avoids divergence because it is lazy about evaluating arguments. CBN has an important property: CBN will not loop infinitely unless every other semantics would also loop infinitely, yet it agrees with CBV whenever CBV terminates successfully.
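Python is call-by-value, but passing Ω unevaluated, as a zero-argument thunk, simulates the CBN behavior; the names omega and const_id are ours:

```python
# Ω as a Python thunk: calling omega() never returns.
omega = lambda: omega()

# e = (λx.λy.y) Ω. Under CBV we would have to evaluate the argument
# first -- const_id(omega()) loops forever. Passing omega itself,
# unevaluated, plays the role of the CBN argument: it is never forced.
const_id = lambda x: (lambda y: y)
print(const_id(omega)(42))  # 42 -- Ω was never evaluated
```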
One feature we seem to be missing is the ability to declare local variables. For example, in SML we introduce a new local variable with the "let" expression:
let val x = e1 in e2 end
We expect this expression to evaluate e1 to some value v1 and then to substitute v1 for occurrences of x, obtaining e2{v1/x}. But we already know another term that evaluates in exactly the same way:
(λx.e2) e1 → ... →
(λx.e2) v1 →
e2{v1/x}
Therefore, we can view a let expression as simply syntactic sugar for an application of a lambda abstraction. We don't really need to have let in the language.
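The desugaring is easy to check in Python, where the let becomes an immediately applied lambda:

```python
# let val x = 3 in x * x end, desugared to (λx. x*x) 3:
print((lambda x: x * x)(3))  # 9
```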
Perhaps the simplest interesting kind of value is a Boolean. We would like to define terms that act like the Boolean constants TRUE and FALSE and the Boolean operators IF, AND, OR, NOT, so that all these terms behave in the expected way, obeying the boolean abstraction. There are many reasonable encodings into lambda calculus. The standard approach is to define TRUE and FALSE functions that return the first and second of their two arguments, respectively:
TRUE = λx.λy.x
FALSE = λx.λy.y
It's important to realize that the tokens TRUE and FALSE are not part of the lambda calculus. We are just using them as abbreviations for the terms λx.λy.x and λx.λy.y.
Now, what about the conditional test IF? We would like IF to take three arguments b, t, f, where b is a Boolean value and t, f are arbitrary terms. The function should return t if b=TRUE and f if b=FALSE. Now the reason for defining TRUE and FALSE the way we did becomes clear. Since TRUE t f → t and FALSE t f → f, all IF has to do is apply its Boolean argument to the other two arguments:
IF = λb.λt.λf.b t f
The other Boolean operators can be defined using IF, TRUE, and FALSE:
AND = λbb'. IF b b' FALSE
OR = λbb'. IF b TRUE b'
NOT = λb.IF b FALSE TRUE
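These encodings transcribe directly into Python lambdas; to_bool is our own helper for decoding results back into Python Booleans:

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
IF    = lambda b: lambda t: lambda f: b(t)(f)
AND   = lambda b: lambda c: IF(b)(c)(FALSE)
OR    = lambda b: lambda c: IF(b)(TRUE)(c)
NOT   = lambda b: IF(b)(FALSE)(TRUE)

to_bool = lambda b: b(True)(False)  # decode a Church Boolean
print(to_bool(AND(TRUE)(FALSE)))    # False
print(to_bool(NOT(FALSE)))          # True
```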
These operators work correctly when given Boolean values as we have defined them, but all bets are off if they are applied to any other term. There is no guarantee of any kind of reasonable behavior. Evaluation will proceed and reductions will be applied, but there is no way to usefully interpret the result. With the untyped λ-calculus, it is garbage in, garbage out.
We can encode natural numbers using Church numerals, introduced by Alonzo Church. The Church numeral for the number n is denoted n. It is the function that takes in a function f and produces a new function that is the n-fold composition of f with itself:
n = λf.fⁿ
Of course, we can't write the exponential directly in lambda calculus. But we can build it up inductively:
0 = λf.λx.x
1 = λf.λx.f x
2 = λf.λx.f (f x)   (= TWICE)
3 = λf.λx.f (f (f x))
...
n = λf.λx.f (f (... (f x)))
The most basic operation on a number n is to find its successor S(n), which is n+1. We can define the successor function S as
S = λn.λf.λx.f((n f) x)
In words, S on input n returns a function that takes a function f as input, applies n to it to get the n-fold composition of f with itself, then composes that with one more f to get the (n+1)-fold composition of f with itself. Then,
S n = (λn.λf.λx.f ((n f) x)) n
→ λf.λx.f ((n f) x)
= n+1.
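Church numerals and the successor transcribe the same way; to_int (our own helper) decodes a numeral by applying "add one" n times to 0:

```python
ZERO = lambda f: lambda x: x
TWO  = lambda f: lambda x: f(f(x))
S    = lambda n: lambda f: lambda x: f(n(f)(x))  # successor

to_int = lambda n: n(lambda k: k + 1)(0)  # decode: n-fold "add one" to 0
print(to_int(ZERO))       # 0
print(to_int(S(S(TWO))))  # 4
```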
We can perform more interesting arithmetic with Church numerals. We might define addition as follows:
ADD = λmn.λfx.(m f) ((n f) x)
If applied to arguments m and n, ADD returns a number that, when applied to f, returns the function that applies f the desired number of times: m+n. This works because we are composing fᵐ with fⁿ to get fᵐ⁺ⁿ. Therefore, we could have defined ADD = λmn.λf. COMPOSE (m f) (n f).
As another alternative, recall that Church numerals act on a function to apply that function repeatedly, and addition is equivalent to repeated application of the successor function S, so we could define ADD as:
ADD = λmn.(m S) n
If we apply ADD in curried fashion to one argument n, we will get a function that adds n to whatever it is applied to.
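Both definitions of ADD behave identically when transcribed into Python:

```python
S    = lambda n: lambda f: lambda x: f(n(f)(x))               # successor
ADD1 = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # compose f^m with f^n
ADD2 = lambda m: lambda n: m(S)(n)                            # apply S m times to n

TWO    = lambda f: lambda x: f(f(x))
THREE  = lambda f: lambda x: f(f(f(x)))
to_int = lambda n: n(lambda k: k + 1)(0)
print(to_int(ADD1(TWO)(THREE)))  # 5
print(to_int(ADD2(TWO)(THREE)))  # 5
```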
Similarly, multiplication is just iterated addition, and exponentiation is iterated multiplication:
MUL = λmn.m (ADD n) 0
EXP = λmn.m (MUL n) 1
These last two functions can be defined even more compactly. COMPOSE will act like MUL on numbers. And EXP can be defined as:
EXP = λmn.n m
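The compact definitions check out in the same Python transcription:

```python
COMPOSE = lambda f: lambda g: lambda x: f(g(x))
MUL = COMPOSE                   # m ∘ n applies f exactly m*n times
EXP = lambda m: lambda n: n(m)  # n-fold self-composition of m is m^n

TWO    = lambda f: lambda x: f(f(x))
THREE  = lambda f: lambda x: f(f(f(x)))
to_int = lambda n: n(lambda k: k + 1)(0)
print(to_int(MUL(TWO)(THREE)))  # 6
print(to_int(EXP(TWO)(THREE)))  # 8
```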
Defining subtraction, division, and other arithmetic operations is possible but more tricky. A good exercise is to figure out how to construct a predecessor function on the natural numbers.
Church numerals only encode natural numbers, but they can be used to build more complex kinds of numbers such as integers and floating point numbers.
Logic and arithmetic are good places to start, but we still have no useful data structures. For example, consider ordered pairs. We expect to have a pairing constructor PAIR with projection operations FIRST and SECOND that obey the following equations for any e1, e2, p:
FIRST (PAIR e1 e2) = e1
SECOND (PAIR e1 e2) = e2
PAIR (FIRST p) (SECOND p) = p
provided p is a pair. We can take a hint from IF. Recall that IF selects one of its two branch options depending on its Boolean argument. PAIR can do something similar, wrapping its two arguments for later extraction by some function f:
PAIR = λab. λf. f a b
Thus PAIR e1 e2 = λf. f e1 e2. To get e1 back out, we can just apply this to TRUE, which will pick out its first argument: (λf.f e1 e2) TRUE → TRUE e1 e2 → e1, and similarly, applying it to FALSE extracts e2. Thus we can define two projection functions FIRST and SECOND corresponding to SML #1 and #2:
FIRST = λp.p TRUE
SECOND = λp.p FALSE
Again, if p isn't a term constructed using PAIR, expect the unexpected.
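The pair encoding also transcribes directly into Python:

```python
TRUE   = lambda x: lambda y: x
FALSE  = lambda x: lambda y: y
PAIR   = lambda a: lambda b: lambda f: f(a)(b)
FIRST  = lambda p: p(TRUE)
SECOND = lambda p: p(FALSE)

p = PAIR(1)(2)
print(FIRST(p))   # 1
print(SECOND(p))  # 2
```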
With an encoding for IF, we have some control over the flow of a program. We can also write simple for loops using the Church numerals n, by applying the Church numeral to the loop body expressed as a function. However, we do not yet have the ability to write an unbounded while loop or a recursive function.
In ML, we can write the factorial function recursively as
fun fact n = if n <= 1 then 1 else n * fact (n - 1)
But how can we write this in the lambda calculus, where all the functions are anonymous? We must somehow construct a term FACT that satisfies the following equation, assuming appropriate definitions for SUB and a less-than-or-equal comparison LEQ:
FACT = λn. IF (LEQ n 1) 1 (MUL n (FACT (SUB n 1)))
We can't just define FACT with this equation, because FACT appears on the right-hand side. Therefore it doesn't make sense as a shorthand for the term on the right. But it turns out we can define a term that acts in the way we want.
Suppose we break up the recursion into two steps. Since FACT can't refer to itself directly, we need some way to pass FACT into itself so it can be used for the recursion. We start by defining a function F that adds an extra argument f for passing in something similar to FACT:
F = λf. λn. IF (LEQ n 1) 1 (MUL n (f (SUB n 1)))
Now, if we just had FACT available, we could pass it to this function and get the FACT we want. Of course, that's begging the question. But we do have something rather similar to FACT: the function F itself. That won't quite work either, because F expects to receive the extra argument f. So we change the definition so that where it would call f, it instead calls f applied to itself, (f f). Call this new function FACT':
FACT' = λf. λn. IF (LEQ n 1) 1 (MUL n ((f f)(SUB n 1)))
FACT' has the property that if a function just like it is passed to it, it will do what FACT should. Since (FACT' FACT') should behave as we want FACT to, we define FACT as exactly this self-application:
FACT = FACT' FACT'
If we try this definition of FACT, we can see the recursion working:
FACT(4) = (FACT' FACT') 4
= (λn.IF (LEQ n 1) 1 (MUL n ((FACT' FACT') (SUB n 1)))) 4
= IF (LEQ 4 1) 1 (MUL 4 (FACT (SUB 4 1)))
= (MUL 4 (FACT (SUB 4 1)))
= (MUL 4 (FACT 3))
...
= (MUL 4 (MUL 3 (MUL 2 1))) = 24
This self-application trick can be used to implement all recursive functions. It even works in SML, though you have to use a recursive datatype to convince the type system that it is type-safe.
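In an untyped language such as Python, which (like the λ-calculus) places no restriction on self-application, the trick transcribes directly:

```python
# FACT' takes itself as the argument f and recurses through (f f).
fact_ = lambda f: lambda n: 1 if n <= 1 else n * f(f)(n - 1)
FACT = fact_(fact_)  # FACT = FACT' FACT'
print(FACT(4))  # 24
```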
We've seen that many interesting features of SML can be encoded using just lambda calculus functions. The lambda calculus is in fact a universal model of computation that is just as powerful as other realizable models such as Turing machines (which we can easily prove by implementing a Turing machine using lambda calculus).