Interacting with OCaml
The toplevel is like a calculator or command-line interface to OCaml. It's similar to DrJava, if you used that in CS 2110, or to the interactive Python interpreter, if you used that in CS 1110. It's handy for trying out small pieces of code without going to the trouble of launching the OCaml compiler. But don't get too reliant on it, because creating, compiling, and testing large programs will require more powerful tools. Some other languages would call the toplevel a REPL, which stands for read-eval-print loop: it reads programmer input, evaluates it, prints the result, and then repeats.
In a terminal window, type utop to start the toplevel. Press Control-D to exit the toplevel. You can also enter #quit;; and press return. Note that you must type the # there: it is in addition to the # prompt you already see.
Types and values
You can enter expressions into the OCaml toplevel. End an expression with a double semicolon ;; and press the return key. OCaml will then evaluate the expression and tell you the resulting value and the value's type. For example:
# 42;;
- : int = 42
Let's dissect that response from utop, reading right to left:
- 42 is the value.
- int is the type of the value.
- The value was not given a name, hence the symbol -.
You can bind values to names with a let definition, as follows:
# let x = 42;;
val x : int = 42
Again, let's dissect that response, this time reading left to right:
- A value was bound to a name, hence the val keyword.
- x is the name to which the value was bound.
- int is the type of the value.
- 42 is the value.
You can pronounce the entire output as "x has type int and equals 42."
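The same reading applies to values of other types. The following is a small sketch, not part of the original exercise: the names pi_approx, greeting, is_ready, and y are made up for illustration, and each comment shows the response the toplevel would be expected to print if that definition were entered followed by ;;.

```ocaml
(* Each definition binds a value to a name. The comment on each line
   shows the toplevel's expected response to that definition. *)
let x = 42                 (* val x : int = 42 *)
let pi_approx = 3.14       (* val pi_approx : float = 3.14 *)
let greeting = "hello"     (* val greeting : string = "hello" *)
let is_ready = true        (* val is_ready : bool = true *)

(* A name that is already bound can be used in a later expression. *)
let y = x + 1              (* val y : int = 43 *)
```

In every case the response has the same three parts: the name, the type, and the value.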
Functions
A function can be defined at the toplevel using syntax like this:
# let increment x = x+1;;
val increment : int -> int = <fun>
Let's dissect that response:
- increment is the identifier to which the value was bound.
- int -> int is the type of the value. This is the type of functions that take an int as input and produce an int as output. Think of the arrow -> as a kind of visual metaphor for the transformation of one value into another value, which is what functions do.
- The value is a function, which the toplevel chooses not to print (because it has now been compiled and has a representation in memory that isn't easily amenable to pretty printing). Instead, the toplevel prints <fun>, which is just a placeholder to indicate that there is some unprintable function value. Important note: <fun> itself is not a value.
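To see how the arrow type changes with the function, here is a brief sketch. The functions square, is_positive, and next are hypothetical names chosen for illustration; the comments show the responses the toplevel would be expected to give.

```ocaml
let increment x = x + 1      (* val increment : int -> int = <fun> *)
let square x = x * x         (* val square : int -> int = <fun> *)

(* The output type need not match the input type. *)
let is_positive x = x > 0    (* val is_positive : int -> bool = <fun> *)

(* A function is a value, so it can be bound to another name
   just like 42 was bound to x. *)
let next = increment         (* val next : int -> int = <fun> *)
```

Note that the toplevel prints <fun> for all four definitions, even though they are different functions.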
You can "call" functions with syntax like this:
# increment 0;;
- : int = 1
# increment(21);;
- : int = 22
# increment (increment 5);;
- : int = 7
But in OCaml the usual vocabulary is that we "apply" the function rather than "call" it.
Note how OCaml is flexible about whether you write the parentheses or not, and whether you write whitespace or not. One of the challenges of first learning OCaml can be figuring out when parentheses are actually required. So if you find yourself having problems with syntax errors, one strategy is to try adding some parentheses.
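As a rough guide to when parentheses matter, the sketch below assumes the increment function defined earlier. The names a, b, c, and d are placeholders used only to hold the results.

```ocaml
let increment x = x + 1

(* Function application binds more tightly than arithmetic operators. *)
let a = increment 2 * 3          (* parsed as (increment 2) * 3, which is 9 *)
let b = increment (2 * 3)        (* the argument is 6, so the result is 7 *)

(* A negative argument needs parentheses. Without them, increment -1
   is parsed as the subtraction increment - 1 and rejected as a type error. *)
let c = increment (-1)           (* 0 *)

(* An application used as an argument also needs parentheses. *)
let d = increment (increment 5)  (* 7 *)
```

In short, parentheses are optional around a simple argument such as 0 or 21, but required whenever the argument is itself a compound expression.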
Storing code in files
Using OCaml as a kind of interactive calculator can be fun, but we won't get very far with writing large programs that way. We need to store code in files instead.
Open a terminal and use a text editor to create a file called hello.ml. Enter the following code into the file:
let _ = print_endline "Hello world!"
Important note: there is no double semicolon ;; at the end of that line of code. The double semicolon is strictly for interactive sessions in the toplevel, so that the toplevel knows you are done entering a piece of code. There's no reason to write it in a .ml file, and we consider it mildly bad style to do so.
The let _ = above means that we don't care to give a name (hence the "blank" or underscore) to the value of the code on the right-hand side of the =.
Save the file and return to the command line. Compile the code:
$ ocamlc -o hello.byte hello.ml
The compiler is named ocamlc. The -o hello.byte option says to name the output executable hello.byte. The executable contains compiled OCaml bytecode. In addition, two other files are produced, hello.cmi and hello.cmo. We don't need to be concerned with those files for now.
Run the executable:
$ ./hello.byte
It should print Hello world! and terminate.
Now change the string that is printed to something of your choice. Save the file, recompile, and rerun. Try making the code print multiple lines.
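If you get stuck on printing multiple lines, here are two possible approaches, sketched as the contents of hello.ml. The strings and the name two_lines are arbitrary choices for illustration.

```ocaml
(* Option 1: one print_endline per line of output. *)
let _ = print_endline "Hello world!"
let _ = print_endline "Goodbye world!"

(* Option 2: a single string containing the newline escape \n. *)
let two_lines = "Hello again!\nGoodbye again!"
let _ = print_endline two_lines
```

Either way, the definitions are evaluated in order from the top of the file to the bottom.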
This edit-compile-run cycle between the editor and the command line is something that might feel unfamiliar if you're used to working inside IDEs like Eclipse. Don't worry; it will soon become second nature.
It is good to know how to run the compiler directly, but in larger projects we want to use the OCaml build system to automatically find and link in libraries. Let's try using it:
$ ocamlbuild hello.byte
You will get an error from that command. Don't worry; just keep reading this exercise.
The build system is named ocamlbuild. The file we are asking it to build is the compiled bytecode hello.byte. The build system will automatically figure out that hello.ml is the source code for that desired bytecode.
However, the build system likes to be in charge of the whole compilation process. When it sees leftover files generated by a direct call to the compiler, such as the ones we produced above, it rightly gets nervous and refuses to proceed. If you look at the error message, it says that a script has been generated to clean up from the old compilation. Run that script, and also remove the compiled file:
$ _build/sanitize.sh
$ rm hello.byte
After that, try building again:
$ ocamlbuild hello.byte
That should now succeed. There will be a directory _build that is created; it contains all the compiled code. That's one benefit of the build system over directly running the compiler: instead of polluting your source directory with a bunch of generated files, they get cleanly created in a separate directory. There's also a file hello.byte that is created, and it is actually just a link to the "real" file of that name, which is in the _build directory.
Now run the executable:
$ ./hello.byte
You can now easily clean up all the compiled code:
$ ocamlbuild -clean
That removes the _build directory and hello.byte link, leaving just your source code.
What about Main?
Unlike C or Java, OCaml programs do not need to have a special function named main that is invoked to start the program. The usual idiom is just to have the very last definition in a file serve as the main function that kicks off whatever computation is to be done.
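A minimal sketch of that idiom follows. The names greeting and main are hypothetical; nothing about the name main is special to OCaml, and any other name would work equally well.

```ocaml
(* Helper definitions come first. *)
let greeting name = "Hello, " ^ name ^ "!"

(* An ordinary function that happens to be called main. *)
let main () = print_endline (greeting "world")

(* The very last definition kicks off the computation when the program runs. *)
let _ = main ()
```

Compiled and run in the same way as hello.ml, this would print the greeting once and terminate.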
Loading code in the toplevel
In addition to allowing you to define functions, the toplevel will also accept directives that are not OCaml code but rather tell the toplevel itself to do something. All directives begin with the # character. Perhaps the most common directive is #use, which loads all the code from a file into the toplevel, just as if you had typed the code from that file into the toplevel.
For example, suppose you create a file named mycode.ml. In that file put the following code:
let inc x = x + 1
Start the toplevel. Try entering the following expression, and observe the error:
# inc 3;;
Error: Unbound value inc
Hint: Did you mean incr?
The error occurs because the toplevel does not yet know anything about a function named inc. Now issue the following directive to the toplevel:
# #use "mycode.ml";;
Note that the first # character above indicates the toplevel prompt to you. The second # character is one that you type to tell the toplevel that you are issuing a directive. Without that character, the toplevel would think that you are trying to apply a function named use.
Now try again:
# inc 3;;
- : int = 4
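Because #use behaves as if you had typed the file's contents yourself, the toplevel should respond to the directive by echoing each definition it loads. As a sketch, suppose mycode.ml were extended with two more functions; dec and inc_twice are hypothetical additions, not part of the original file. The comments show the response expected for each definition.

```ocaml
(* mycode.ml: definitions are loaded in order, top to bottom. *)
let inc x = x + 1              (* val inc : int -> int = <fun> *)
let dec x = x - 1              (* val dec : int -> int = <fun> *)

(* Later definitions may use earlier ones from the same file. *)
let inc_twice x = inc (inc x)  (* val inc_twice : int -> int = <fun> *)
```

After loading such a file, all three names are available at the prompt, just as inc was above.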
Workflow in the toplevel
The best workflow when using the toplevel with code stored in files is:
- Edit the code in the file.
- Load the code in the toplevel with #use.
- Interactively test the code.
- Exit the toplevel. Warning: do not skip this step.
Suppose you wanted to fix a bug in your code: it's tempting to not exit the toplevel, edit the file, and re-issue the #use directive into the same toplevel session. Resist that temptation. The "stale code" that was loaded from an earlier #use directive in the same session can cause surprising things to happen (surprising when you're first learning the language, anyway). So always exit the toplevel before loading a file with #use again.
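Here is one way stale code can surprise you, sketched under an assumption: two definitions, helper and compute (both hypothetical names), were loaded earlier in the session, and afterwards only the definition of helper is loaded again. The effect is simulated below within a single file by defining helper a second time.

```ocaml
(* Definitions as first loaded into the session. *)
let helper x = x + 1
let compute x = helper (helper x)

(* Later, only helper is redefined. The new definition shadows the
   old name, but it does not change functions that were already defined. *)
let helper x = x + 10

let from_compute = compute 0   (* 2: compute still uses the original helper *)
let from_helper = helper 0     (* 10: the name now refers to the new helper *)
```

A fresh toplevel session has no earlier definitions to fall back on, so what you test is exactly what is in your files.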