12.2. Quick Tour

12.2.1. Expressions

Yes, we will start with our favorite message again by entering "Hello World" and finishing the expression with a semicolon.

- "Hello World";
> val it = "Hello World" : string

This time, we get a much more sophisticated answer. The line tells us that the symbol it has been bound to the value "Hello World", and that this value is of type string. The symbol it always represents the result of the last expression evaluated in the interactive shell. As usual, let's try simple arithmetic next.

- 4 + 5*6;
> val it = 34 : int
- 5.0 / 2.5 + 3.0;
> val it = 5.0 : real

At least we have found a functional language which can handle arithmetic expressions with infix operators including the right preferences. The next example shows a less convenient aspect of ML's type system.

- 2.5 + 1;
! Toplevel input:
! 2.5 + 1;
!       ^
! Type clash: expression of type
!   int
! cannot have type
!   real

It looks like ML is very strict about types, since it does not even coerce an integer into a real number. Instead, we have to tell the explicitly what we want using one of the built-in conversion functions between integers and floating point numbers.

- real(1)
> val it = 1.0 : real
- floor(2.5)
> val it = 2 : int
- 2.5 + real(1);
> val it = 3.5 : real

Another unusual syntactical element is the tilde ~ as the unary minus operator instead of the simple minus.

- ~1.5 - 2.0;
> val it = ~3.5 : real

12.2.2. Functions

Since ML is all about functions, defining them is literally "fun".

- fun f(x) = 2*x;
> val f = fn : int -> int

The function is defined by writing down the mathematical definition after the fun keyword. As a result, ML tells us that we have defined a function taking a single integer argument and returning another integer. But how does ML determine that x has to be an integer without us declaring any type in the function definition? This is of the characteristic features of ML. It tries to deduct types from the context. Since the literal "2" is an integer and the multiplication takes two arguments of the same numeric type, the argument x has to be an integer and so does the result of the function. Note that the function is just another value, just of type fn : int -> int. We could have defined the function as well by assigning an anonymous function to the symbol f as follows:

- val f = fn(x) => 2*x;
> val f = fn : int -> int

The anonymous function starts with the keyword fn (which plays the role of lambda in other languages) followed by the parameter list, and arrow => and the expression returned by the function.

We expect a functional language to be able to deal with functions as easily as with any other value. As an example, it should be easy to define the composition of two functions.

- fun compose(f, g) = fn x => f(g(x));
> val ('a, 'b, 'c) compose = fn : ('a -> 'b) * ('c -> 'a) -> 'c -> 'b

This looks tricky. First, how did we define the function compose? The result of applying compose to the functions f and g is an anonymous function which maps an argument x to f(g(x)) as required. The more difficult part is understanding ML's response. The first thing ML can infer is that f and g have to be functions. Otherwise, g could not be applied to x, and g could not be applied to g(x). Moreover, the function f must be applicable to the return type of g. But that's all. We know neither the argument type of g nor the return type of f. How does ML display this information? A symbol preceded by a quote ' denotes a type variable. The answer to our definition tells us that compose is a function depending on three types 'a, 'b, and 'c. [1] The function can be applied to two function arguments with the constraint that the return type 'a of the second function is identical to the argument type of the first function. The multiplication of types on the left hand side can be interpreted as the mathematical cross product of the two sets of functions. Alternatively, we could have defined the compose function using the basic value syntax:

- val compose = fn (f, g) => fn x => f(g(x));
> val ('a, 'b, 'c) compose = fn : ('a -> 'b) * ('c -> 'a) -> 'c -> 'b

And finally, ML also supports the more readible derived form:

- fun compose(f, g) x = f(g(x));
> val ('a, 'b, 'c) compose = fn : ('a -> 'b) * ('c -> 'a) -> 'c -> 'b

Before moving on, we should test if the compose function does what it is supposed to do.

- fun g(x) = x + 5;
> val g = fn : int -> int
- fun f(x) = 2 * x;
> val f = fn : int -> int
- val h = compose(f, g);
> val h = fn : int -> int
- h(3);
> val it = 16 : int

And, by the way, we could have saved all this work, since the composition is already predefined as the compose operator "o":

- (f o g)(3);
> val it = 16 : int

As another typical feature for a functional language, let's try recursion. This is also a good opportunity to introduce another ML specialty: pattern matching. ML functions can be defined in a piecemeal fashion with the alternatives separated using a vertical bar "|".

- fun fac(0) = 1
    | fac(n) = n * fac(n-1);
> val fac = fn : int -> int
- fac(5);
> val it = 120 : int

ML will try to match a given argument with the pattern on the left hand side of each definition in the order in which they appear and evaluate the right hand side of the matching rule. Using the basic syntax, we have to tell the ML system explicitly with the keyword "rec" that we are about to define a recursive function.

- val rec fac = fn 0 => 1
                 | n => n * fac(n-1);
> val fac = fn : int -> int

12.2.3. Collections

For now we have seen that it is easy to define complex functions in ML and that ML has a strong type system which gives us compile time type checking and probably also good performance. In fact, the ML dialect Ocaml which we will tackle in the next chapter is fighting with plain old C for the performance crown in the great computer language shootout. What we have not seen yet are more complex data types. Because of the math look and feel, we expect at least built-in tuples, and indeed here they are.

- val t = (1, "bla");
> val t = (1, "bla") : int * string
- #1(t);
> val it = 1 : int
- #2(t);
> val it = "bla" : string

Knowing Python, we possibly would have preferred a simple index notation to access the element of a tuple (starting with index zero), but ML instead defines the special functions #n which fetch the n'th element of the tuple. This will become more natural as we see that tuple are just a convenient shortcut notation for a record.

- val joe = {name="joe", age=33};
> val joe = {age = 33, name = "joe"} : {age : int, name : string}
- #name(joe);
> val it = "joe" : string
- #age(joe);
> val it = 33 : int
- val t = {1=1, 2="bla"};
> val t = (1, "bla") : int * string

Besides tuples and records, we also expect sequences to be part of the standard types in a functional language. ML supports two kinds of built-in sequence types: lists are optimized for sequential access (similar to Lisp's lists or Java's LinkedList), and Vectors allow for fast random access. We will first treat lists, and then have a quick look at vectors. List literals look like Python lists.

- val l = [1, 2, 3];
> val l = [1, 2, 3] : int list
- val l = ["Joe", "John", "Mary"];
> val l = ["Joe", "John", "Mary"] : string list
- val l = [(1, "Joe"), (2, "John")];
> val l = [(1, "Joe"), (2, "John")] : (int * string) list

As you can tell from the interpreters type notifications, ML's strong typing requires the elements in a list to be of the same type.

- val l = [1, "Joe"];
! Toplevel input:
! val l = [1, "Joe"];
!             ^^^^^
! Type clash: expression of type
!   string
! cannot have type
!   int

Since ML's lists are implemented as linked structures, they share many of Lisp's list functions, although with a more expressive syntax. You get the head element and the tail of the list using the functions "hd" and "tl" (corresponding to car and cdr in Lisp).

- val l = [1, 2, 3];
> val l = [1, 2, 3] : int list
- hd(l);
> val it = 1 : int
- tl(l);
> val it = [2, 3] : int list

You can also concatenate lists using the "@" operator or prepend an element in front of a list using a double colon (corresponding to Lisp's cons function) which complements the head and tail functions.

- [1, 2, 3] @ [4, 5, 6];
> val it = [1, 2, 3, 4, 5, 6] : int list
- 1 :: [2, 3, 4];
> val it = [1, 2, 3, 4] : int list

The head and tail functions are hardly needed in ML, since we can exploit ML's pattern matching and match a list with the pattern "x::xs", that is, it head and tail. The variable names in the pattern "x::xs" are a typical naming convention and should be read "ex" and "exes". To demonstrate that these apparently simple operations turn out to be powerful when combined with recursion, we first define a function which reverses a list.

- fun reverse([]) = []
    | reverse(x::xs) = reverse(xs) @ [x];
> val 'a reverse = fn : 'a list -> 'a list
- reverse l;
> val it = [3, 2, 1] : int list

The first match takes care of the empty list. A non-empty list we can always split into head and tail, and reversing is equivalent to appending the head at the reversed tail. Inserting an element into a sorted list is not much more difficult than this:

- fun insert(x, [])   = [x]
    | insert(x, h::t) = if x <= h
                        then x::h::t
                        else h::insert(x, t);
> val insert = fn : int * int list -> int list

Here, the non-trivial second case compares the element to be inserted with the head of the list. If the new element is smaller or equal, we can put it in front of the list. Otherwise, we need to insert the element into the tail. Once we can insert, we can sort a list (although not very efficiently).

- fun sort []    = []
    | sort(h::t) = insert(h, sort t);
> val sort = fn : int list -> int list
- sort([2, 5, 1, 3, 7, 2]);
> val it = [1, 2, 2, 3, 5, 7] : int list

ML also has the typical higher order list processiong functions map and reduce.

- map (fn x => 2*x) [1, 2, 3];
> val it = [2, 4, 6] : int list

Note that in contrast to Python, map takes only one argument (the function applied to all the elements in the list) and returns the function which modifies any list. You can see this more clearly, when checking the types of the partial expressions.

- map;
> val ('a, 'b) it = fn : ('a -> 'b) -> 'a list -> 'b list
- map (fn x => 2*x);
> val it = fn : int list -> int list

In functional jargon this is called a curried function (after Haskell B. Curry who also gave the purely functional language Haskell its name). When defining a function, we can eiter pass the arguments as a tuple (uncurried form) or pass just the first argument and return a function taking the second argument and so forth (curried form).

ML's equivalent of Python's reduce function is called foldr (for "fold right").

- foldr (fn (x, y) => x + y) 5 [1, 2, 3];
> val it = 11 : int

The first argument is the two argument function used to combine the values in the list, the second (curried) argument the initial value, and the last argument the list to reduce. Formally, foldr's type is:

- foldr;
> val ('a, 'b) it = fn : ('a * 'b -> 'b) -> 'b -> 'a list -> 'b

Let's turn to ML's random access sequence, the vector. Vectors literals are similar to lists, but start with a hash sign "#".

- val v = #[1, 2, 3];
> val v = #[1, 2, 3] : int vector

Again, we see that the collection is strongly typed. Access to vector elements or parts of vectors is index oriented.

- open Vector;
- sub(v, 0);
> val it = 1 : int
- extract(v, 1, SOME 2);
> val it = #[2, 3] : int vector

The function sub retrieves a single element, the function extract a slice of the vector. Indexes start at zero and the optional upper bound of the slice is inclusive.

The concat function concatenates a list of vectors into one big vector and the map and reduce (foldr) functions work like their list counterparts.

- concat([#[1, 2], #[3, 4]]);
> val it = #[1, 2, 3, 4] : int vector
- map (fn x => 2*x) #[1, 2, 3];
> val it = #[2, 4, 6] : int vector
- foldr (fn (x, y) => x+y) 5 #[1, 2, 3];
> val it = 11 : int
- app (fn x => print("x=" ^ Int.toString(x) ^ "\n")) #[1, 2, 3];
x=1
x=2
x=3
> val it = () : unit

You can also apply these functions to slices of a vector. In this case, the function has access to the iteration index as well.

- fun printElem(i, x) = (
    print(Int.toString(i)); print(": ");
    print(Int.toString(x)); print("\n"));
> val printElem = fn : int * int -> unit
- appi printElem (#[1, 2, 3, 4, 5], 1, SOME 3);
1: 2
2: 3
3: 4
> val it = () : unit

12.2.4. Data Types

Up to now we have defined a lot of values including functions. Even the records were defined directly as values. The type of these value was always inferred by the language without our declaring it. But how can we define types ourselves? The simplest way is to define a shortcut for an existing type using the "type" command. Here is the definition of the type associated with the simple record structure introduced above.

- type person = {name: string, age: int};
> type person = {age : int, name : string}

The type definition just binds a symbol to a type expression. We have seen the type expressions all along in the responses of the ML system. How can we use this type? Sometimes, we have to help ML to determine the type of a function. We do this by appending a type declaration (Pascal style) to a function argument.

- fun name(p : person) = #name(p);
> val name = fn : {age : int, name : string} -> string
- name(joe);
> val it = "joe" : string

In this example, ML is not able to derive the type of the function "name". We have to give ML a hint using an explicit type declarations. The definition is equivalent to using the explicit type expression; it only saves us some typing when using the type more than once.

- fun name(p: {name: string, age: int}) = #name(p);
> val name = fn : {age : int, name : string} -> string

Combining built-in types in type expressions is useful, but it does not allow us to create new types. As an example, think about an enumeration type representing colors or the type of a tree structure. These types require alternatives. A color is either red or green or blue (or some other color from a finite set of colors we want to deal with). A tree is either empty or a node whose branches are again tree structures. We can't express those types by combining existing types in type expressions. That's where ML's so-called data types come in. They generalize the concepts of enumerations and variants (like in Pascal or C's unions). An data type is defined as a sequence of alternatives, each alternative being some type. A value belonging to the new data type belongs to one of the sub types. To tell the alternatives apart, one has to put a so-called constructor in front of the value of the sub type. Let's look at a few examples. The first one defines a new data type called "color" consisting of three alternatives.

- datatype color = Red | Green | Blue ;
> New type names: =color
  datatype color =
  (color,{con Blue : color, con Green : color, con Red : color})
  con Blue = Blue : color
  con Green = Green : color
  con Red = Red : color
- val c = Red;
> val c = Red : color

This is the one extreme of a data type corresponding to an enumeration. The sub types are empty or, better, they only contain the single value "nothing". Still, there are three different kinds of nothing (red, green, blue) and we can define color values by using the associated three constructors Red, Green, and Blue. The constructors can also be used in patterns:

- val colorName = fn
      Red   => "red"
    | Green => "green"
    | Blue  => "blue" ;
> val colorName = fn : color -> string
- colorName c ;
> val it = "red" : string

The next example demonstrates another extreme: a data type with just a single sub type (which we can't call alternative anymore).

- datatype person = Person of string * int ;
> New type names: =person
  datatype person = (person,{con Person : string * int -> person})
  con Person = fn : string * int -> person
- val joe = Person("Joe", 25);
> val joe = Person("Joe", 25) : person
- fun name(Person(name, age)) = name;
> val name = fn : person -> string
- name(joe);
> val it = "Joe" : string

The new data type "person" has a single constructor "Person" taking two arguments, a string and an integer (representing the person's name and age, but this is not clear from the definition of the data type). The sub type (here the tuples consisting of a string and an integer) follows the constructor and the keyword "of". Defining new values of type person now really looks like a constructor call. The name function uses pattern matching again, albeit in a simple form since we have to alternative constructors.

The next example demonstrates a data type used in the sense of a variant. Suppose we would like to store integers and strings in a list. We can't do this directly like in the dynamically typed languages. But we can define a data type which contains strings and integers as sub types.

- datatype stringOrInt = I of int | S of string;
> New type names: =stringOrInt
  datatype stringOrInt =
  (stringOrInt, { con I : int -> stringOrInt,
                  con S : string -> stringOrInt})
  con I = fn : int -> stringOrInt
  con S = fn : string -> stringOrInt
- val l = [I(1), S("blah"), I(55)];
> val l = [I 1, S "blah", I 55] : stringOrInt list
- map (fn I(i) => Int.toString(i) | S(s) => s) l;
> val it = ["1", "blah", "55"] : string list

Data types can also be parametrized using type variables. We can generalize the previous example to construct lists containing one of two possible types.

- datatype ('a, 'b) alt = A of 'a | B of 'b;
> New type names: =alt
  datatype ('a, 'b) alt =
  (('a, 'b) alt,
   {con ('a, 'b) A : 'a -> ('a, 'b) alt,
    con ('a, 'b) B : 'b -> ('a, 'b) alt})
  con ('a, 'b) A = fn : 'a -> ('a, 'b) alt
  con ('a, 'b) B = fn : 'b -> ('a, 'b) alt
- [A(1.2), B(4), A(2.34), B(5)];
> val it = [A 1.2, B 4, A 2.34, B 5] : (real, int) alt list

If there is one element of ML which convinced me immediately, it is the handling of optional results of functions combined with pattern matching. Think about a lookup function which either returns the value you look for or indicates in some way that it could not find any value. How can we implement this result in the mainstream languages? Either we return a null pointer (or reference) in case the value was not found, or we throw an exception. Both solutions have their pros and cons. Null pointers are too often not checked leading to nasty crashes (or not much better runtime exceptions). Exceptions should be "exceptional" and cause clumsy code when used for normal program logic. ML has a built-in data type called "option" which is a perfect fit for a return value of a lookup function. We define it (but we don't have to since it is a standard data type) as

- datatype 'a option = NONE | SOME of 'a;
> New type names: =option
  datatype 'a option =
  ('a option,{con 'a NONE : 'a option,
                con 'a SOME : 'a -> 'a option})
  con 'a NONE = NONE : 'a option
  con 'a SOME = fn : 'a -> 'a option

A value of type "int option", for example, is either the value NONE or the value SOME x, where x is some integer. The List.find function return values of type "'a option" where 'a is the type of the list elements.

- List.find;
> val 'a it = fn : ('a -> bool) -> 'a list -> 'a option
- List.find (fn x => x > 10) [1, 3, 15, 3, 17];
> val it = SOME 15 : int option
- List.find (fn x => x > 10) [1, 3];
> val it = NONE : int option

To handle this type, we have to use pattern matching. There is no way to silently ignore a null pointer or exception.

- fun g(NONE)   = print("nothing found")
    | g(SOME x) = print("found " ^ Int.toString(x));
> val g = fn : int option -> unit
- g(SOME 5);
found 5
> val it = () : unit
- g(NONE);
nothing found
> val it = () : unit
- g(List.find (fn x => x > 10) [1, 3, 15, 3, 17]);
found 15
> val it = () : unit

Defining this kind of function is as easy as handling the option return value. The pattern matching in the calling function correponds to an if expression in the called function.

- fun f(x) = if x<0 then NONE else SOME x;
> val f = fn : int -> int option
- g(f(10));
found 10
> val it = () : unit
- g(f(0));
found 0
> val it = () : unit

The last example of a data type has to occur in any description of ML: a binary tree fined as a recursive, parametrized data type. Sounds complicated? But it fits in one line (for clarity we use three).

- datatype 'a btree =
      Empty
    | Node of 'a * 'a btree * 'a btree;
> New type names: =btree
  datatype 'a btree =
  ('a btree,
   {con 'a Empty : 'a btree,
    con 'a Node : 'a * 'a btree * 'a btree -> 'a btree})
  con 'a Empty = Empty : 'a btree
  con 'a Node = fn : 'a * 'a btree * 'a btree -> 'a btree

A binary tree is either empty, or it is a node consisting of a value of type 'a, a left branch, and a right branch. Both branches are binary trees of type 'a again. Here are some examples:

- val p = Node(100, Empty, Empty);
> val p = Node(100, Empty, Empty) : int btree
- val q = Node(200, Empty, Empty);
> val q = Node(200, Empty, Empty) : int btree
- val r = Node(150, p, q);
> val r = Node(150, Node(100, Empty, Empty), Node(200, Empty, Empty)) :
  int btree

To make our binary tree more useful, let us define lookup and insertion (the slightly more complicated deletion you find in ML book such as [ULLMAN98]>).

- val rec lookup = fn
      (x, Empty) => false
    | (x, Node(y, left, right)) =>
         if      x < y then lookup(x, left)
         else if y < x then lookup(x, right)
         else true;
> val lookup = fn : int * int btree -> bool
- lookup(200, r);
> val it = true : bool
- lookup(5, r);
> val it = false : bool

Note that the algorithm works only for sorted trees, and in fact only for integer trees, since we use the "less than" directly (we'll fix this later when talking about structures).

- val rec insert = fn
      (Empty, x) => Node(x, Empty, Empty)
    | (T as Node(y, left, right), x) =>
         if      x < y then Node(y, insert(left, x), right)
         else if y < x then Node(y, left, insert(right, x))
         else T;
> val insert = fn : int btree * int -> int btree
- insert(r, 400);
> val it =
    Node(150, Node(100, Empty, Empty),
         Node(200, Empty, Node(400, Empty, Empty))) : int btree
- insert(r, 120);
> val it =
    Node(150, Node(100, Empty, Node(120, Empty, Empty)),
         Node(200, Empty, Empty)) : int btree

12.2.5. Module System

When discussing Python, we complained about the lack of well-defined interfaces caused by the dynamic typing. With ML, we have a very strong type system. Where are the interfaces? ML includes a so-called module system consisting of structures, signatures, and functors. A structure is a combination of types and functions, a signature is a "type" of a structure, and functors are functions mapping structures to structures. In other words, we go from the ordinary level of values, types, and function to the meta level of structures, signatures, and functors.

Since the examples become a little bit longer in this section, you are better off entering them in a file which you can then load with the use command as if you had typed all the code on the command line. If the file containing the code is c:\sml\sample.sml, type:

- use "c:\sml\sample.sml";
[opening file "c:\sml\sample.sml"]
...
[closing file "c:\sml\sample.sml"]

At first sight, structures provide a namespace for types and values (including functions). We have already used some standard structures such as Int and List. As an example of our own, let us wrap the color type together with the colorName function into a structure.

- structure Color = struct
    datatype color = Red | Green | Blue;
    val name = fn
        Red   => "red"
      | Green => "green"
      | Blue  => "blue";
  end;
> ...
- Color.name(Color.Red);
> val it = "red" : string

Since all identifiers defined in the structure have to be qualified with the name of the structure, we can keep the names short and crisp. One of the most popular examples is a stack.

structure Stack = struct
  exception Empty;
  type 'a stack = 'a list;
  val create = [];
  fun push(s, x) = x::s;
  fun pop([])   = raise Empty
    | pop(x::s) = (s, x);
end;

We implement the stack using a list where push is equivalent the basic list construction operator, and pop takes the head of the list (unless it is empty). Since the implementation is functional, the pop function returns the popped element as well as the new stack (the tail of the origianal one).

- val s = Stack.create;
> val 'a s = [] : 'a list
- val s = Stack.push(s, 55);
> val s = [55] : int list
- val s = Stack.push(s, 66);
> val s = [66, 55] : int list
- Stack.pop(s);
> val it = ([55], 66) : int list * int

When using the stack, we can see the internal data structure. We can even directly pass a list to the structure's function although the list was created by the other functions of the structure.

- Stack.pop(["blah", "blub"]);
> val it = (["blub"], "blah") : string list * string

Besides the namespace, a structure does not offer us any benefit such as encapsulation or information hiding. How shall we define an interface for the structure which hides the internals of the implementation? All a client needs to use the structure in a strongly typed environment such as ML is the type information of the components in our structure. In ML, this is the provided by a signature.

signature STACK = sig
  exception Empty;
  type 'a stack;
  val create : 'a stack;
  val push : 'a stack * 'a -> 'a stack;
  val pop : 'a stack -> 'a stack * 'a;
end;

A signature is a structure lifted to the type level. The two keywords signature and sig correspond to the ones used to define a structure, structure and struct. The shorter name is used to define the object, the longer one to bind a symbol to this object.

We can tell the compiler that a structure implements a given signature by adding a so-called signature constraint to the structure. To this end, we place the signature behind the structure's name using a colon (like a superclass in C++).

structure Stack : STACK = struct
  exception Empty;
  type 'a stack = 'a list;
  val create = [];
  fun push(s, x) = x::s;
  fun pop([])   = raise Empty
    | pop(x::s) = (s, x);
end;

The relationship between structures and signatures is comparable to the relationship between classes and interface in objected-oriented languages. The signature describes the interface of all the structures which implement the signature. Now the compiler makes sure that the structure is compliant with the signature. Trying to compile the following definition of our stack causes a signature mismatch

structure Stack : STACK = struct
  exception Empty;
  type 'a stack = 'a list;
  val create = [];
  fun push(s, x) = x::s;
end;
...
! Signature mismatch: the module does not match the signature ...
! Missing declaration: value pop
! is specified in the signature as 
!   val 'a pop : 'a list -> 'a list * 'a
! but not declared in the module

Because this kind of signature constraint does not hide the implementation of the stack, it is called a transparent signature constraint.

> val it = () : unit
- val s = Stack.create;
> val 'a s = [] : 'a list
- Stack.pop([1, 2, 3]);
> val it = ([2, 3], 1) : int list * int

Fortunately, ML97 introduces an opaque signature constraint which hides the data types of the structure. To use this kind of constraint we only have to replace the colon by the symbol :>.

structure Stack :> STACK = struct
  exception Empty;
  type 'a stack = 'a list;
  val create = [];
  fun push(s, x) = x::s;
  fun pop([])   = raise Empty
    | pop(x::s) = (s, x);
end;

- val s = Stack.create;
> val 'a s = <stack> : 'a stack/2
- val s = Stack.push(Stack.push(s, 55), 66);
> val s = <stack> : int stack/2
- val (s, x) = Stack.pop(s);
> val s = <stack> : int stack/2
  val x = 66 : int
- Stack.pop([1, 2, 3]);
! Toplevel input:
! Stack.pop([1, 2, 3]);
!            ^^^^^^^^
! Type clash: expression of type
!   'a list
! cannot have type
!   'b stack/2

ML's first means of encapsulation is the concept of an abstract data type. It combines a datatype with functions while hiding the actual definition of the datatype itself.

abstype 'a stack = S of 'a list with
  exception Empty;
  val create = S [];
  fun push(S s, x) = S (x::s);
  fun pop(S [])   = raise Empty
    | pop(S(x::s)) = (S s, x);
end;

The first part of the definition until the with keyword is a datatype definition with the datatype keyword replaced by abstype. The essential declaration follows between the with and end keywords. From the outside, the type can only be used through the components defined in this section. Note that we had to use a datatype S of 'a list although 'a list is already a well defined type.

- val s = create;
> val 'a s = <stack> : 'a stack
- val s = push(s, 55);
> val s = <stack> : int stack
- pop(s);
> val it = (<stack>, 55) : int stack * int
- val s = push(s, 66);
> val s = <stack> : int stack
- pop(s);
> val it = (<stack>, 66) : int stack * int
- val (s, x) = pop(s);
> val s = <stack> : int stack
  val x = 66 : int
- val (s, x) = pop(s);
> val s = <stack> : int stack
  val x = 55 : int
- val (s, x) = pop(s);
! Uncaught exception: 
! Empty

The datatype constructor S can only be used within the declaration part of the abstract data type. It is not visible from the outside.

- val s = S [];
! Toplevel input:
! val s = S [];
!         ^
! Unbound value identifier: S

Now we have approached encapsulation and data hiding from two angles, structures and abstract data types, the former providing namespaces and the latter hiding a datatype.

To hide the internals of an implementation, we have to start with a signature. The relationship between structures and signatures is comparable to the relationship between classes and interface in objected-oriented languages. The signature describes the interface of all the structures which implement the signature. The interface contains all the information a client needs to use the structure and hides the rest. In a strongly typed language such as ML, a client needs to know the names and types of all the values defined in the structure (values again including functions). It also may need to know the names of types and exceptions required for the values. As an example, consider a functional interface (i.e., a signature) of a stack. To keep it simple, we start with a stack of integers.

signature IntStack = sig
  exception Empty;
  type stack;
  val create : stack;
  val push : stack * int -> stack;
  val pop : stack -> stack * int;
end;

A signature starts with the keyword sig and ends with the keyword end. It can be bound to a symbol using the keyword signature. Exceptions and types are just declared with their names. For values we also give their types in the notation we know from ML's answers. Let's turn to the implementation of this interface, a structure providing all the types and values declared in the signature.

structure ListIntStack : IntStack = struct
  exception Empty;
  type stack = int list;
  val create = [];
  fun push(s, x) = x::s;
  fun pop([])   = raise Empty
    | pop(x::s) = (s, x);
end;

The structure ListIntStack implements the IntStack with a simple list. The compiler makes sure that the structure really implements all the required types and values. If not, we get a signature mismatch error. We can use the stack operations with any structure implementing the signature.

- val s = ListIntStack.create;
> val s = [] : int list
- val s = ListIntStack.push(s, 55);
> val s = [55] : int list
- val s = ListIntStack.push(s, 66);
> val s = [66, 55] : int list
- val (s, x) = ListIntStack.pop(s);
> val s = [55] : int list
  val x = 66 : int
- val (s, x) = ListIntStack.pop([2, 3]);
> val s = [3] : int list
  val x = 2 : int

However, this is not exactly what we wanted. We can still see the internal structure of the stack, and, even worse, we can apply the functions to values defined outside like in the last pop call.

With ML97, opaque signatures were introduced which hide all the internals. A structure implementing a signature in this manner uses the operator :> instead of the simple colon.

structure IntStack :> IntStack = struct
  exception Empty;
  type stack = int list;
  val create = [];
  fun push(s, x) = x::s;
  fun pop([])   = raise Empty
    | pop(x::s) = (s, x);
end;

We also change the name of the structure to the name of the signature (which is not required, but uncommon either). When now using the structure, we don't see anymore that it is implemented with a list.

- val s = IntStack.create;
> val s = <stack> : stack/3
- val s = IntStack.push(s, 55);
> val s = <stack> : stack/3
- val (s, x) = IntStack.pop(s);
> val s = <stack> : stack/3
  val x = 55 : int
- val (s, x) = IntStack.pop([2, 3]);
! Toplevel input:
! val (s, x) = IntStack.pop([2, 3]);
!                            ^^^^^
! Type clash: expression of type
!   'a list
! cannot have type
!   stack/3

And consequently, we can not apply the IntStack functions to integer lists directly.

Next we would like to overcome the restriction of our stack to integers. As a first step, we generalize the signature by introducing a type for the elements contained in the stack.

signature Stack = sig
  exception Empty;
  type elem;
  type stack;
  val create : stack;
  val push : stack * elem -> stack;
  val pop : stack -> stack * elem;
end;

Now we need some kind of parametrization similar to C++ templates. ML's solution is the third element of the module system: functors. As mentioned at the beginning of this section, a functor maps a structure to another structure. In other words, is allows us to parametrize a structure with another one.

signature Type = sig type elem end;

functor MakeStack(T: Type) : Stack =
  struct
    exception Empty;
    type elem = T.elem;
    type stack = T.elem list;
    val create = [];
    fun push(s : stack, x) = x::s;
    fun pop([])   = raise Empty
      | pop(x::s) = (s, x);
  end;

Now we can apply the functor to a structure just like we would apply a function to its arguments. The result is a new structure, in our case a stack for elements of a given type.

- structure RealStack = MakeStack(struct type elem = real end);
> ...
- val s = RealStack.create;
> val s = [] : real list
- val s = RealStack.push(s, 1.5);
> val s = [1.5] : real list

You may wonder why we have not used an opaque signature here to hide the list implementation. The problem is that attaching Stack as an opaque signature also hide the elem type with the result that we can't push any element on the stack. To get out of this dilemma, we need to declare the signature as part of the functor definition.

functor MakeStack(T: Type) :>
  sig
    exception Empty;
    type stack;
    val create : stack;
    val push : stack * T.elem -> stack;
    val pop : stack -> stack * T.elem;
  end
=
  struct
    exception Empty;
    type stack = T.elem list;
    val create = [];
    fun push(s : stack, x) = x::s;
    fun pop([])   = raise Empty
      | pop(x::s) = (s, x);
  end;

This way we an create stacks for arbitrary types with hidden implementation.

- structure RealStack = MakeStack(struct type elem = real end);
> ...
- val s = RealStack.create;
> val s = <stack> : stack/5
- val s = RealStack.push(s, 1.5);
> val s = <stack> : stack/5
- RealStack.pop(s);
> val it = (<stack>, 1.5) : stack/5 * real

12.2.6. Procedural Features

I hope you agree by now that ML is a powerful language for functions and expressions, but can we also do some "normal" processing such as file I/O? Yes, we can. Like Lisp, ML is not a pure functional language. There are objects with state, mutable arrays, and even an equivalent to conventional variables which can change their value.

Let's start with references. Up to now we have worked without variables. The var statements bind symbols to value, but the value can't be changed. Using the same symbol again only means that we create a new binding with a new scope. Introducing variables so late in the game clearly indicates that they are overrated, but there are situations where they make sense. In ML, variables are implemented by binding a symbol to a reference with some additional syntax to read and set the value of the reference.

- val i = ref 0;
> val i = ref 0 : int ref
- !i;
> val it = 0 : int
- i := 5;
> val it = () : unit
- !i;
> val it = 5 : int

We create a reference pointing to some initial value with the ref function. The exclamation mark gives us the current value of the reference, and the assignment operator := (also used by the Pascal family) allows us to change the value of the reference. Setting the value is a pure non-functional operation. It returns nothing, but does all its work as a side-effect.

While a reference contains just a single value, an array is a mutable version of a vector. Applications for this kind of data structure are obviously numerical computations on vectors and matrices, because they call for an efficient in-place implementation.

- open Array;
> ...
- val a = array(5, 0.0);
> val a = <array> : real array
- length(a);
> val it = 5 : int
- sub(a, 3);
> val it = 0.0 : real
- update(a, 0, 1.5);
> val it = () : unit
- sub(a, 0);
> val it = 1.5 : real

All array functions are contained in the Array module which we open first. We create an array with the array function supplying the array's length and an initial value. This value determines the type of the array. The functions sub and update fetch and set an element in the array, respectively.

There are multiple functions in the Array module which apply a function to all elements in an array. We can, for example, print the array using a combination of app and print.

- app (fn x => print (Real.toString(x) ^ " ")) a;
1.5 0.0 0.0 0.0 0.0 > val it = () : unit

We can also modify the array (or parts of it) with the modify function.

- modify (fn x => 2.0 * x) a;
> val it = () : unit
- app (fn x => print (Real.toString(x) ^ " ")) a;
3.0 0.0 0.0 0.0 0.0 > val it = () : unit

Again, the update and modify functions are completely non-functional and therefore should be handled with care.

Now that we have references at our disposal, other typical elements of procedural languages start making sense. The while loop executes an expression while a condition is satisfied. Such a loop make sense only if the result of the condition can be changed through some side-effect. Here is a classical procedural loop using a reference for the loop variable.

- val i = ref 0;
> val i = ref 0 : int ref
- while !i < 10 do (
    print(Int.toString(!i) ^ " ");
    i := !i + 1);
0 1 2 3 4 5 6 7 8 9 > val it = () : unit

This really looks like procedural code. While ML focuses on functional programming, it allows for procedural code as well in order to solve real-life problems.

The following example reads just like procedural code.

- load "TextIO";
> val it = () : unit
- val out = TextIO.openOut("test.txt");
> val out = <outstream> : outstream
- TextIO.output(out, "this is Joe\n");
> val it = () : unit
- TextIO.output(out, "this is John\n");
> val it = () : unit
- TextIO.closeOut(out);
> val it = () : unit
- val f = TextIO.openIn("test.txt");
> val f = <instream> : instream
- while not(TextIO.endOfStream(f)) do
    print(TextIO.inputLine(f));
this is Joe
this is John
> val it = () : unit

This is also the first time we use a so-called structure. Structures in ML correspond to modules in Python (or Modula). To use a structure we have to load it (unless we define it ourselves, but this is a different story covered later) and then precede the function calls with the module qualifier just like in Python. In the example, we use the standard structure TextIO which offers text-oriented input and output streams. These streams have state and can be used like streams in a procedural language. We also notice the procedural control statement, the while loop. Another way to implement the same iteration through the lines in a file uses a loop function.

- fun loop(hasNext, next, f) =
    while (hasNext(f)) do next(f);
> val ('a, 'b) loop = fn : ('a -> bool) * ('a -> 'b) * 'a -> unit
- val f = TextIO.openIn("test.txt");
> val f = <instream> : instream
- loop(not o TextIO.endOfStream, print o TextIO.inputLine, f);
this is Joe
this is John
> val it = () : unit

Notes

[1]

According to [GILMORE03]>, the prime is supposed to indicate a greek letter, that is, 'a stands for the greek alpha.