We have discussed how to implement type checking for object-oriented languages, by viewing objects as records containing data fields and methods. We now look at how objects are represented in memory, and how method calls are implemented while supporting inheritance and overriding (late binding).
A class defines a namespace that maps names of fields and methods to corresponding information about them, such as their types and implementations. As discussed earlier, this means classes are a module mechanism, and when building a compiler for a language with modules, it makes sense for the compiler to build a symbol table for each class so that names can be resolved efficiently.
Fields and methods typically come with visibility annotations that describe which parts of the code are allowed to see each such component. Implementing visibility correctly can be challenging in languages with nested classes.
Since objects are essentially souped-up versions of records (C: structs), it is useful to first consider how to implement records. In most languages, the order and size of the fields in a record is fixed and known, so that the record can be laid out sequentially in memory, and location of each field is at a fixed offset from the start of the record in memory.
An important consideration for field layout is that many processors have worse performance on misaligned accesses, or disallow them entirely. A misaligned access is one in which a read or write of \(n\) bytes occurs at an address that is not a multiple of \(n\). For example, most RISC processors make misaligned accesses illegal; x86 processors generally allow misaligned mov instructions (while sometimes imposing a performance penalty), but disallow misaligned push and pop instructions.
To ensure aligned accesses, extra space is often inserted between fields in record types. For example, consider the following C struct declaration on a typical 64-bit machine:
struct str {
    char c;          // 1 byte
    short s;         // 2 bytes
    int i;           // 4 bytes
    struct str *n;   // 8 bytes
};
On typical platforms, each field is aligned on a byte address that is a multiple of its size (the C standard permits such padding but leaves the exact rules to the ABI), so the resulting structure leaves a padding byte unused between c and s.
The fields can be accessed using memory operands indexed from the base address. If we are targeting x86-64 code and t is a temporary holding the base address, then field c can be accessed with the operand byte ptr [t], field s can be accessed as word ptr [t+2], and field i can be accessed as dword ptr [t+4]. The C language leaves it up to the programmer to order fields so that space is not wasted; other languages, like ML, statically reorder fields.
In languages that do not provide static information about the order of fields (e.g., JavaScript), or where permutation subtyping is supported, the compiler cannot know at what offset to find a field, and a more expensive mechanism must be used, such as looking up the field name in a hash table.
Since methods are shared across all instances of a class, we would like to avoid taking up space in objects to store methods or pointers to methods. At the IR level, a method is essentially a function with one extra, implicit argument: the receiver object, usually named this or self in the method code.
To see how objects and methods are implemented, consider the following classic OO example:
class Point {
    double x, y;
    void setX(double x) { this.x = x; }
    void moveX(double dx) { setX(x + dx); }
}
class ColoredPoint extends Point {
    Color c;
    Color getCol() { return c; }
    @Override void setX(double x) {
        c = Color.redder(c);
        this.x = x;
    }
}
Now, suppose we execute the following lines of code:
Point p = new ColoredPoint();
p.setX(42);
p.moveX(1);
What happens? When we call the method setX directly, it should run the ColoredPoint version of setX, because the actual class of the receiver object is ColoredPoint. The point's color will change to be redder. That the static type of p is Point is immaterial; what matters for an ordinary method call is the actual run-time class of the object. Now consider the call p.moveX(1). Does this make the point even redder? It might seem that it does not, since moveX is defined in Point, but it gets its work done by calling setX. The actual class of the receiver is still ColoredPoint, so the same overridden method will be called: the point does become redder. This ability to inherit code from a superclass but customize its behavior is called late binding, and it is an essential feature of OO programming.
The behavior we want is that for ordinary method calls, the method definition that should be used is the one from the actual run-time class, or the nearest ancestor that defines that method. On the other hand, for methods that are declared static, final, or nonvirtual (depending on the language), the compiler can select the correct method call at compile time.
Given a method call o.m(...), in which method m is invoked on receiver object o, how should it be implemented? The problem of picking the right method definition to run is called method dispatch. We have three main goals in mind for this mechanism: dispatch should be fast, inherited code should be shared rather than duplicated, and subclass objects should be usable wherever superclass objects are expected.
A straightforward approach that has been used in dynamically typed OO languages like Smalltalk is to simply search for the right method definition. Each object has a hidden field that points to its class object; a method call o.m() is implemented by walking up the class hierarchy, starting from o's class, and stopping once an implementation for method m is found. This implementation allows sharing inherited code, so it avoids code bloat, but it is much slower than a simple function call.
For faster method dispatch, we need to precompute a data structure—a dispatch table. The key idea is to map different method names onto small integer indices, and to use those indices to look up the appropriate method code pointer in an array of code pointers. These arrays of code pointers go by various names. In C++, this array is called a vtbl (pronounced “v-table”, standing for “virtual table”). In other literature it is known as a selector table or dispatch vector.
The figure below shows how dispatch vectors are constructed for the example classes Point and ColoredPoint. The orange ovals on the right side represent the actual machine code of the various method implementations. As shown, objects of class Point and ColoredPoint have a hidden field DV that points to the dispatch vectors of their respective classes (blue). These dispatch vectors are shared by all objects of the class.
Notice that the object layout for ColoredPoint is completely compatible with the layout for Point: the only difference between the layouts of the objects and their dispatch vectors is that the layouts are extended with new fields. Therefore, code expecting to receive a pointer to a Point will be perfectly happy to get a ColoredPoint instead. This compatibility is important, since ColoredPoint is a subtype of Point: ColoredPoint ≤ Point.
The methods setX, moveX, and getCol are associated with indices into the dispatch vectors: 0, 1, and 2, respectively. The two dispatch vectors agree on the indices of the shared methods, which is again crucial so that a ColoredPoint can be used where a Point is expected.
Notice that there is only one copy of the code for method Point.moveX, which is pointed to by both dispatch vectors. Thus, this object layout supports code-sharing inheritance as well as subtype polymorphism.
With the object layout described above, we can generate efficient method dispatch code. For example,
let us suppose we are targeting the x86-64 architecture and want to compile an expression like
p.moveX(...)
using our running example. An (abstract assembly) instruction sequence
that calls the appropriate method code would be the following:
mov rdi, p          // pass receiver as first argument
mov tDV, [p]        // immutable memory location
call [tDV + 8]      // note: moveX is at index 1
This is a reasonably fast instruction sequence, though not as fast as a direct call. Note that the second instruction accesses an immutable memory location, so common subexpression elimination can be used to avoid loading the dispatch vector address more than once for a given object.
Accesses to fields are even faster, since fields are at known offsets. For example, if we want to compile the expression p.y, it simply requires an indexed load from memory:
mov t, [p + 16]
Interfaces and abstract classes have an object layout, in some sense, since code needs to be able to handle arguments with these types. However, they do not have instances and so they do not need dispatch vectors. One exception is C++ abstract classes, which do need dispatch information, because C++ objects effectively change class as they are being constructed. The dispatch information in a C++ object is overwritten as its constructors are called.
Static methods do not require any dispatching: the compiler can determine which code to call
at compile time. Therefore, they do not need to be placed in the dispatch table. Similarly,
final methods, constructors, and calls via super
do not require method dispatch.
One wrinkle is that we want to be able to compile classes separately from each other.
Unfortunately, we can't generate code for subclasses without knowing details of their
superclasses' implementations that we would otherwise prefer to keep hidden. For example,
suppose that cp is a variable of type ColoredPoint and we'd like to compile the field access cp.c. By looking at the above diagrams, we can see that the desired abstract assembly memory operand is [cp + 24]. But if the compiler can't see the code of Point, how can it know that 24 is the correct offset? It needs to know the size of Point; that is, it needs to know about possibly private fields that the implementation of ColoredPoint should not depend on.
For this reason, C++ demands that programmers declare private fields, typically in .h
files, so that subclass implementations can know how big their superclasses are.
This requirement is a bit of an abstraction violation.
Java uses a different strategy for separate compilation that preserves abstraction better. The Java Virtual Machine computes field and method offsets as the program is executing, and caches them. The advantage of this lazy approach is that new fields and methods can be added to libraries without needing to recompile client code that inherits from library classes.
For example, a source-language call to ColoredPoint.getCol() would result in a JVM bytecode instruction of the form invokevirtual "ColoredPoint.getCol". (The actual string representing the method is uglier and includes an encoding of the parameter types.) When the JVM interpreter first encounters this bytecode instruction, it resolves the offset of the method getCol() and rewrites the instruction to a “quick” version: invokevirtual_quick 2.
If the code is run enough times by the interpreter, the JIT compiler is invoked to convert this bytecode instruction to the standard method dispatch code above:
mov tDV, [rdi]      // load the receiver's dispatch vector
call [tDV + 16]     // getCol is at index 2
If the JIT compiler can determine that only one possible implementation of the method could be called at this point in the program, it can avoid method dispatch. It may insert a direct function call or even inline the code of the called method. This optimization is crucial for good performance in Java.