Modules and Data Abstraction

Key to building large software systems is the ability to organize the
program into _modules_, with clean and small interfaces between
them. A module can be used to identify an abstraction in the program.
For instance, a module can describe a logical component of the program
(e.g., a parser in a compiler), or it can describe an abstract data
type (e.g., a stack or queue).

Modularity has a range of advantages from a software engineering
perspective. These include:

1) Encapsulation, aggregation: it provides a higher-level organization
   of the program and makes it easier to keep track of what various
   parts of the program are supposed to do.

2) Independence: inter-module dependencies are minimized and reduced
   to interfaces.  This makes it possible to develop one part of the
   program without having another part (as long as the interface for
   the missing part is available), or to have several people working
   concurrently on different parts of the program.

3) Information hiding: the implementation details are not provided in
   the interfaces, and not visible to other modules. This allows
   re-implementing a module without affecting the others.

4) Separate compilation: modules can be compiled separately. If the
   implementation of one module changes but its interface does not,
   only that module needs to be re-compiled.

5) Code reuse: modules can be reused in other programs, if they
   provide general-purpose functionality. Typical examples are library
   modules.

The key idea behind modules is the separation between _specification_
(provided by interfaces) and _implementation_ (provided by modules). A
specification describes what a piece of code is supposed to do; an
implementation shows how the program does it.

Typically, interfaces contain a collection of types, data
declarations, and procedure signatures. Modules provide
implementations for procedures, as well as additional data and
procedures, not present in the interface. These additional
declarations are _private_ and not available to other modules. A
module may also provide initialization code.

A number of languages have support for modules, although they use
different names for them. Some extensions of Pascal have "units", Ada
and Java have "packages", Modula-3 has "modules", and SML has
"structures". We'll use Modula-3 to explore the key concepts behind
modules.

Consider a simple example: a module "Util" that provides utility
functions for arrays of integers. The interface may look like this:

  INTERFACE Util;

  TYPE A = ARRAY OF INTEGER;

  (* VAR mode: Sort rearranges the caller's array in place. *)
  PROCEDURE Sort(VAR a : A);

  PROCEDURE Max(a : A) : INTEGER;
  END Util.

The interface defines a type A and provides the signatures of two
procedures, Sort and Max. The module then provides the implementation
for these procedures:

  MODULE Util EXPORTS Util;

  PROCEDURE Sort(VAR a : A) = 
     BEGIN
        (* code for Sort *)
     END Sort;

  PROCEDURE Max(a : A) : INTEGER =
     VAR x := FIRST(INTEGER); 
     BEGIN
        (* code for Max *)
     END Max;
  BEGIN
     (* no initialization code *)
  END Util.

The clause "EXPORTS Util" indicates that this module implements the
interface Util. Now I want to use this module in a program. I can do
that by importing the interface, thus gaining access to all its
declarations. However, every access to a declaration in an interface
must use a qualified name of the form I.x, where I is
the interface and x is a declaration in I. For instance, a client of
the above Util module may look like this:

  MODULE Main;
  IMPORT Util;

  VAR arr : ARRAY [1..10] OF INTEGER;
  
  BEGIN
     (* initialize arr *)
     Util.Sort(arr);
  END Main.

The "IMPORT Util" declaration allows this module to refer to the
declarations in Util, e.g. Util.A or Util.Sort or Util.Max. Modula-3
also has another import construct. If you write "FROM Util IMPORT
Sort", then only Sort is imported, and you can use it directly,
without a qualified name. Importing only a few components of an
interface helps avoid conflicts between names in different
interfaces.
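
As a sketch of this second form, here is a variant of the client
above that imports Sort and Max by name (the loop that fills the
array is only an illustration):

  MODULE Main;
  FROM Util IMPORT Sort, Max;

  VAR arr : ARRAY [1..10] OF INTEGER;
      m   : INTEGER;

  BEGIN
     FOR i := 1 TO 10 DO arr[i] := 10 - i END;
     Sort(arr);          (* unqualified name *)
     m := Max(arr);      (* unqualified name *)
     (* Util.Sort(arr) would be rejected here: the name Util itself
        is not introduced by a FROM ... IMPORT declaration. *)
  END Main.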

A module may also have initialization code. This code is specified at
the end of the module, between BEGIN and END ModuleName. This piece of
code may refer to components of other modules. For instance:

  INTERFACE A;
  VAR x : INTEGER;
  END A.


  MODULE A EXPORTS A;
  IMPORT B;

  BEGIN          \
     x := B.y;   | initialization code.
  END A.         /

Clearly, a problem arises in the case of cyclic dependences between
modules. The Modula-3 manual gives only partial guarantees about the
order of execution of initialization code:

"If module M depends on module N and N does not depend on M, then N's
body will be executed before M's body, where:

    * A module M depends on a module N if M uses an interface that N
      exports or if M depends on a module that depends on N.

    * A module M uses an interface I if M imports or exports I or if M
      uses an interface that (directly or indirectly) imports I.

Except for this constraint, the order of execution is
implementation-dependent."

To rule out the implementation-dependent behavior allowed by the last
sentence, one can either take a more restrictive approach and forbid
cyclic module dependences via imports/exports, or else use a more
complex analysis of the initialization code to rule out cyclic
assignments of data between different modules.
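
As an illustration of the problem, suppose B (which the example above
imports but does not show) were written as follows. This is a
hypothetical sketch, not code from the manual:

  INTERFACE B;
  VAR y : INTEGER;
  END B.

  MODULE B;
  IMPORT A;

  BEGIN
     y := A.x + 1;   (* reads A.x, which A's body sets from B.y *)
  END B.

Now A depends on B and B depends on A, so the rule quoted above does
not say which body runs first. Whichever one runs first reads a
variable that the other body has not yet assigned.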


One other feature of modules in Modula-3 is the ability to hide the
structure of certain types in the interface. Such types are referred
to as "opaque types". The actual type is described only in the module
implementation part, using a "revelation" construct. Here is an
example with a module that implements an abstraction for rational
numbers:

  INTERFACE Rational;
  TYPE T <: REFANY;  
  EXCEPTION Err;
  PROCEDURE Create(n, d: INTEGER) : T RAISES {Err};
  PROCEDURE Add(r1, r2: T) : T;
  ...
  END Rational.

The declaration "TYPE T <: REFANY" says that T is some (unknown)
subtype of REFANY. In other words, T is some reference type. The
actual type that T references is only provided in the module, using
the "REVEAL" keyword:

  MODULE Rational;
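  (* A full revelation must give a branded reference type. *)
  REVEAL T = BRANDED REF RECORD
                n, d : INTEGER
             END;

  (* Err is declared in the interface, so it is not redeclared here. *)

  PROCEDURE Create(num, denom: INTEGER) : T RAISES {Err} = 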
     BEGIN
        IF denom = 0 THEN RAISE Err; END;
        RETURN NEW(T, n:=num, d:=denom);
     END Create;

  PROCEDURE Add (r1, r2: T) : T = 
     ...

  BEGIN
  END Rational.

Then, a client of Rational can refer to the type T without knowing its
structure! Here's an example:

  MODULE UseRational;
  IMPORT Rational;

  VAR r : Rational.T;

  BEGIN 
     r := Rational.Create(3,2); ... 
  END UseRational.
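
To see what the opaque type buys us, here is a sketch of a client
(the module name Peek is made up) that tries to look inside a
rational:

  MODULE Peek;
  IMPORT Rational;

  VAR r : Rational.T;

  BEGIN
     r := Rational.Create(3, 2);   (* fine: uses only the interface *)
     (* r.n := 0; *)               (* if uncommented: compile-time error *)
  END Peek.

Outside the module Rational, all that is known about Rational.T is
that it is a subtype of REFANY, so the field selection r.n should not
type-check.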


In type theory, opaque types are modeled by _existential
types_. The idea is that there exists an actual type for Rational.T,
but that type is hidden from the clients of Rational. Existential types
express information hiding; in contrast, universal types express
parametric polymorphism.

In fact, parametric polymorphism is also useful in the context of
modules. Consider the module Util shown earlier. This code can only
deal with arrays of integers. How do I make it polymorphic, so that it
applies to any kind of array? Modula-3 provides a solution using
_generic_ modules and interfaces. A generic module/interface takes
one or more interfaces as parameters. For instance, a generic version
of Util may look like this:

  GENERIC INTERFACE GUtil(Elem);
  TYPE A = ARRAY OF Elem.T;
  PROCEDURE Sort(VAR a : A);    (* VAR: sorts the caller's array in place *)
  PROCEDURE Max(a : A) : Elem.T;
  END GUtil.


  GENERIC MODULE GUtil(Elem);
  PROCEDURE Sort(VAR a : A) =
    BEGIN
       ... Elem.LessThan(a[i], a[j]) ...
    END Sort;

  PROCEDURE Max(a : A) : Elem.T =
    VAR x := Elem.Smallest();
    BEGIN 
       ... Elem.LessThan(a[i], a[j]) ...
    END Max;
  BEGIN
  END GUtil.
 

Here Elem is an unknown interface that provides a type T and two
procedures, LessThan and Smallest. The following interface IntElem
provides these features for integers:

  INTERFACE IntElem;
  TYPE T = INTEGER;
  PROCEDURE LessThan(a, b : T) : BOOLEAN;
  PROCEDURE Smallest() : T;
  END IntElem.

Then I can instantiate the generic module and interface with the
concrete interface:

  INTERFACE IntUtil = GUtil(IntElem)
  END IntUtil.

  MODULE IntUtil = GUtil(IntElem)
  END IntUtil.

Of course, if I want to run the program, I also need to provide an
actual implementation of the IntElem interface.
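
A minimal sketch of such an implementation, assuming the ordering on
elements is simply the usual order on integers:

  MODULE IntElem;

  PROCEDURE LessThan(a, b : T) : BOOLEAN =
     BEGIN
        RETURN a < b;
     END LessThan;

  PROCEDURE Smallest() : T =
     BEGIN
        RETURN FIRST(INTEGER);   (* least value of type INTEGER *)
     END Smallest;

  BEGIN
  END IntElem.

With these pieces in place, a client can import IntUtil and call
IntUtil.Sort and IntUtil.Max just as it did with Util.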

The module system of SML is fairly similar to the one
above. Structures represent modules, signatures represent interfaces,
and functors represent generic modules (modules parameterized by
structures that match a given signature). Components of a module are
accessed with qualified names; to access all of the components of a
module S without qualified names, one can use "open S;".