Modules and Data Abstraction

Key to building large software systems is the ability to organize the program into _modules_, with clean and small interfaces between them. A module can be used to identify an abstraction in the program. For instance, a module can describe a logical component of the program (e.g., a parser in a compiler), or it can describe an abstract data type (e.g., a stack or queue).

Modularity has a range of advantages from a software engineering perspective. These include:

1) Encapsulation, aggregation: it provides a higher-level organization of the program and makes it easier to keep track of what the various parts of the program are supposed to do.

2) Independence: inter-module dependencies are minimized and reduced to interfaces. This makes it possible to develop one part of the program without having another part (as long as the interface for the missing part is available), or to have several people working concurrently on different parts of the program.

3) Information hiding: the implementation details are not provided in the interfaces, and are not visible to other modules. This allows re-implementing a module without affecting the others.

4) Separate compilation: each module can be compiled separately. If a module's implementation changes but its interface does not, only that module needs to be recompiled.

5) Code reuse: modules can be reused in other programs if they provide general-purpose functionality. Typical examples are library modules.

The key idea behind modules is the separation between _specification_ (provided by interfaces) and _implementation_ (provided by modules). A specification describes what a piece of code is supposed to do; an implementation shows how the program does it. Typically, interfaces contain a collection of types, data declarations, and procedure signatures. Modules provide implementations for the procedures, as well as additional data and procedures not present in the interface.
These additional declarations are _private_ and not available to other modules. A module may also provide initialization code.

A number of languages have support for modules, although they use different names for them. Some extensions of Pascal have "units", Ada and Java have "packages", Modula-3 has "modules", and SML has "structures". We'll use Modula-3 to explore the key concepts behind modules.

Consider a simple example: a module "Util" that provides utility functions for arrays of integers. The interface may look like this:

    INTERFACE Util;
      TYPE A = ARRAY OF INTEGER;
      PROCEDURE Sort(VAR a : A);
      PROCEDURE Max(a : A) : INTEGER;
    END Util.

The interface defines a type A and provides the signatures of two procedures, Sort and Max. Sort takes its argument as a VAR parameter so that it sorts the caller's array in place; a value parameter would be copied on each call. The module then provides the implementation for these procedures:

    MODULE Util EXPORTS Util;

      PROCEDURE Sort(VAR a : A) =
        BEGIN
          (* code for Sort *)
        END Sort;

      PROCEDURE Max(a : A) : INTEGER =
        VAR x := FIRST(INTEGER);
        BEGIN
          (* code for Max *)
        END Max;

    BEGIN
    END Util.

The clause "EXPORTS Util" indicates that this module implements the interface Util.

Now I want to use this module in a program. I can do that by importing the interface, thus gaining access to all its declarations. However, every access to a declaration in an interface must use a qualified name of the form I.x, where I is the interface and x is a declaration in I. For instance, a client of the above Util module may look like this:

    MODULE Main;
    IMPORT Util;
    VAR arr : ARRAY [1..10] OF INTEGER;
    BEGIN
      (* initialize arr *)
      Util.Sort(arr);
    END Main.

The "IMPORT Util" declaration allows this module to refer to the declarations in Util, e.g. Util.A, Util.Sort, or Util.Max. Modula-3 also has another import construct. If you write "FROM Util IMPORT Sort", then only Sort is imported, and you can use it directly, without a qualified name. Importing only a few components of an interface helps avoid conflicts between names in different interfaces.

A module may also have initialization code.
This code is specified at the end of the module, between BEGIN and END ModuleName. This piece of code may refer to components of other modules. For instance:

    INTERFACE A;
      VAR x : INTEGER;
    END A.

    MODULE A EXPORTS A;
    IMPORT B;
    BEGIN
      x := B.y;    (* initialization code *)
    END A.

Clearly, a problem arises in the case of cyclic dependences between modules. For example, if the body of B in turn reads A.x, the values that x and y end up with depend on which body runs first. The Modula-3 manual gives only partial guarantees about the order of execution of initialization code:

    "If module M depends on module N and N does not depend on M, then N's body will be executed before M's body, where:

    * A module M depends on a module N if M uses an interface that N exports or if M depends on a module that depends on N.
    * A module M uses an interface I if M imports or exports I or if M uses an interface that (directly or indirectly) imports I.

    Except for this constraint, the order of execution is implementation-dependent."

To rule out the non-deterministic behavior expressed in the last sentence, one can either take a more restrictive approach and rule out cyclic module dependences via imports/exports, or else use a more complex analysis of the initialization code to rule out cyclic assignments of data between different modules.

One other feature of modules in Modula-3 is the ability to hide the structure of certain types in the interface. Such types are referred to as "opaque types". The actual type is described only in the module implementation part, using a "revelation" construct. Here is an example with a module that implements an abstraction for rational numbers:

    INTERFACE Rational;
      TYPE T <: REFANY;
      EXCEPTION Err;
      PROCEDURE Create(n, d: INTEGER) : T RAISES {Err};
      PROCEDURE Add(r1, r2: T) : T;
      ...
    END Rational.

The declaration "TYPE T <: REFANY" says that T is some (unknown) subtype of REFANY. In other words, T is some reference type.
The actual type that T references is only provided in the module, using the "REVEAL" keyword (Modula-3 requires the revealed type to be a branded reference type):

    MODULE Rational;

      REVEAL T = BRANDED REF RECORD n, d : INTEGER END;

      PROCEDURE Create(num, denom: INTEGER) : T RAISES {Err} =
        BEGIN
          IF denom = 0 THEN RAISE Err; END;
          RETURN NEW(T, n := num, d := denom);
        END Create;

      PROCEDURE Add(r1, r2: T) : T = ...

    BEGIN
    END Rational.

The exception Err is declared in the interface, so the module does not declare it again. Then, a client of Rational can refer to the type T without knowing its structure! Here's an example:

    MODULE UseRational;
    IMPORT Rational;
    VAR r : Rational.T;
    BEGIN
      r := Rational.Create(3, 2);
      ...
    END UseRational.

In type theory, opaque types are referred to as _existential types_. The idea is that there exists an actual type for Rational.T, but that type is hidden from the clients of Rational. Existential types express information hiding; in contrast, universal types express parametric polymorphism.

In fact, parametric polymorphism is also useful in the context of modules. Consider the module Util shown earlier. This code can only deal with arrays of integers. How do I make it polymorphic, so that it applies to any kind of array? Modula-3 provides a solution using _generic_ modules and interfaces. A generic module/interface takes one or more interfaces as parameters. For instance, a generic version of Util may look like this:

    GENERIC INTERFACE GUtil(Elem);
      TYPE A = ARRAY OF Elem.T;
      PROCEDURE Sort(VAR a : A);    (* VAR: sorts the caller's array in place *)
      PROCEDURE Max(a : A) : Elem.T;
    END GUtil.

    GENERIC MODULE GUtil(Elem);

      PROCEDURE Sort(VAR a : A) =
        BEGIN
          ... Elem.LessThan(a[i], a[j]) ...
        END Sort;

      PROCEDURE Max(a : A) : Elem.T =
        VAR x := Elem.Smallest();
        BEGIN
          ... Elem.LessThan(a[i], a[j]) ...
        END Max;

    BEGIN
    END GUtil.

Here Elem is an unknown interface that provides a type T and two procedures, LessThan and Smallest. The following interface IntElem provides these features for integers:

    INTERFACE IntElem;
      TYPE T = INTEGER;
      PROCEDURE LessThan(a, b : T) : BOOLEAN;
      PROCEDURE Smallest() : T;
    END IntElem.
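
The generic code only ever sees the names declared in this interface, so a module implementing it can be very small. The following is a minimal sketch, not part of the original notes: the two bodies are assumptions, with LessThan taken to be the built-in < on INTEGER and Smallest returning FIRST(INTEGER), mirroring the FIRST(INTEGER) used in the non-generic Max.

```modula3
MODULE IntElem;
(* With no EXPORTS clause, this module implements the interface IntElem. *)

PROCEDURE LessThan(a, b : T) : BOOLEAN =
  BEGIN
    (* Assumption: elements are ordered by the built-in integer ordering. *)
    RETURN a < b;
  END LessThan;

PROCEDURE Smallest() : T =
  BEGIN
    (* Assumption: the smallest element is the least INTEGER value. *)
    RETURN FIRST(INTEGER);
  END Smallest;

BEGIN
  (* no initialization code needed *)
END IntElem.
```

Any other interface that supplies T, LessThan, and Smallest could be used in the same way.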

Then I can instantiate the generic module and interface with the concrete interface:

    INTERFACE IntUtil = GUtil(IntElem) END IntUtil.
    MODULE IntUtil = GUtil(IntElem) END IntUtil.

Of course, if I want to run the program, I also need to provide an actual implementation of the IntElem interface.

The module system of SML is fairly similar to the one above. Structures represent modules, signatures represent interfaces, and functors represent generic modules (modules parameterized by structures that match a given signature). Components of a module are accessed with qualified names; to access all of the components of a module S without qualified names, one can use "open S;".
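
Returning to the Modula-3 example, a client uses the instantiated module IntUtil exactly as it used Util earlier. The sketch below is an illustration under stated assumptions, not code from the notes: it assumes the elided bodies of Sort and Max in GUtil have been filled in and that an implementation of IntElem is linked in. The array contents are made up, and IO and Fmt are the standard library interfaces used here only to print the result.

```modula3
MODULE Main;

IMPORT IntUtil, IO, Fmt;

(* Sample data, chosen only for illustration. *)
VAR arr := ARRAY [1..5] OF INTEGER {3, 1, 4, 1, 5};

BEGIN
  (* IntUtil.A is ARRAY OF IntElem.T, i.e. ARRAY OF INTEGER,
     so a fixed-size integer array can be passed directly. *)
  IntUtil.Sort(arr);
  IO.Put("max = " & Fmt.Int(IntUtil.Max(arr)) & "\n");
END Main.
```

The client never mentions GUtil or IntElem. Instantiating GUtil with a different element interface (say, a hypothetical TextElem) would yield another module with the same shape, without touching the generic code.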