Archive for December 2014

## Lampsort

In support of his view of software methodology, Leslie Lamport likes to use the example of non-recursive Quicksort. Independently of the methodological arguments, his version of the algorithm should be better known. In fact, if I were teaching “data structures and algorithms” I would consider introducing it first.

As far as I know he has not written down his version in an article, but he has presented it in lectures; see [1]. His trick is to ask the audience to give a non-recursive version of Quicksort, and of course everyone starts trying to remove the recursion, for example by making the stack explicit or looking for invertible functions in calls. But his point is that recursion is not at all fundamental in Quicksort. The recursive version is a specific implementation of a more general idea.

Lamport’s version — let us call it Lampsort —is easy to express in Eiffel. We may assume the following context:

*a*: *ARRAY* [*G* -> *COMPARABLE*] — The array to be sorted.

*pivot*: *INTEGER* — Set by *partition*.

*picked*: *INTEGER_INTERVAL *— Used by the sorting algorithm, see below.

*partition* (*i*, *j*: *INTEGER*)

……..**require** — *i*..*j* is a sub-interval of the array’s legal indexes:

……..……*..**i* < *j*

……..……..*i* >= *a*.*lower*

……..…….*.**j* <= *a.upper*

……..**do**

……..……..… Usual implementation of *partition* …

……..**ensure** — The expected effect of *partition*:

……..……..*pivot* >=* i*

……..……..*pivot* <* j*

……..……..— *a* [*i*..*j*] has been reshuffled so that elements in *i*..*pivot* are less than

……..……..— or equal to those in *pivot*+1 .. *j*.

……..**end**

We do not write the implementation of *partition* since the point of the present discussion is the overall algorithm. In the usual understanding, that algorithm consists of doing nothing if the array has no more than one element, otherwise performing a *partition *and then recursively calling itself on the two resulting intervals. The implementation can take advantage of parallelism by forking the recursive calls out to different processors. That presentation, says Lamport, describes only a possible implementation. The true Quicksort is more general. The algorithm works on a set of integer intervals; in Eiffel:

*intervals*: *SET *[*INTEGER_INTERVAL*]

It initializes *intervals *to contain a single element, the entire interval; at each iteration, it removes an interval from the set, *partition*s it if that makes sense (i.e. the interval has more than one element), and inserts the resulting two intervals into the set. It ends when the set is empty. Here it is:

……..**from** — Initialize interval set to contain a single interval, the array’s entire index range:

……..…..**create** *intervals*.*make_one* (*a*.*lower* |..| *a*.*upper*)…. ..……..

……..**invariant**

……..…..— See below

……..**until**

……..…..*intervals*.*is_empty* — Stop when there are no more intervals in set

……..**loop**

……..…..*picked* := *intervals*.*item* — Pick an interval from (non-empty) interval set.

……..……**if** *picked*.*count* > 1 **then** — (The precondition of *partition* holds, see below.)

……..……..…..*partition* (*picked*.*lower*, *picked*.*upper*) — Split it, moving small items before and large ones after *pivot*.

……..……..…..*intervals*.*extend* (*picked*.*lower* |..| *pivot*) — Insert new intervals into interval set: first

……..……....…*intervals*.*extend* (*pivot* + 1 |..| *picked*.*upper*) — and second.

……..……**end**

……..…...*intervals*.*remove* (*picked*) — Remove interval that was just *partition*ed.

…….**end**

Eiffel note: the function yielding an integer interval is declared in the library class *INTEGER* using the operator |..| (rather than just ..).

The query *item *from *SET*, with the precondition** not*** is_empty*, returns an element of the set. It does not matter which element. In accordance with the Command-Query Separation principle, calling *item *does not modify the set; to remove the element you have to use the command *remove*. The command *extend* adds an element to the set.

The abstract idea behind Lampsort, explaining why it works at all, is the following loop invariant (see [2] for a more general discussion of how invariants provide the basis for understanding loop algorithms). We call “slice” of an array a non-empty contiguous sub-array; for adjacent slices we may talk of concatenation; also, for slices* s* and *t*, *s* <= *t* means that every element of *s* is less than or equal to every element of *t*. The invariant is:

*a* is the concatenation of the members of a set *slices* of disjoint slices, such that:

– The elements of *a* are a permutation of its original elements.

– The index range of any member of *slices* having more than one element is in *intervals*.

– For any adjacent slices *s* and *t* (with *s* before *t*), *s* <= *t*.

The first condition (conservation of the elements modulo permutation) is a property of *partition*, the only operation that can modify the array. The rest of the invariant is true after initialization (**from** clause) *with **slices* made of a single slice, the full array. The loop body maintains it since it either removes a one-element interval from *intervals* (*slices* loses the corresponding slice) or performs *partition* with the effect of partitioning one slice into two adjacent ones satisfying *s* <= *t*, whose intervals replace the original one in *intervals*. On exit, *intervals* is empty, so *slices* is a set of one-element slices, each less than or equal to the next, ensuring that the array is sorted.

The invariant also ensures that the call to *partition *satisfies that routine’s precondition.

The Lampsort algorithm is a simple loop; it does not use recursion, but relies on an interesting data structure, a set of intervals. It is not significantly longer or more difficult to understand than the traditional recursive version

*sort *(*i*, *j*: *INTEGER*)

……..**require**

……..……*..**i* <= *j*

……..……..*i* >= *a*.*lower*

……..…….*.**j* <= *a.upper*

……..**do**

……..……**if** *j* > *i* **then** — Note that precondition of *partition* holds.

……..……..…..*partition* (*i*,* j*) — Split into two slices *s* and* t* such that* s* <= *t*.

……..……..…..*sort *(*i, pivot*) — Recursively sort first slice.

……..……..…..*sort *(*pivot*+1,* j*) — Recursively sort second slice.

……..……**end**……..…..

……..**end**

Lampsort, in its author’s view, captures the true idea of Quicksort; the recursive version, and its parallelized variants, are only examples of possible implementations.

I wrote at the start that the focus of this article is Lampsort as an algorithm, not issues of methodology. Let me, however, give an idea of the underlying methodological debate. Lamport uses this example to emphasize the difference between algorithms and programs, and to criticize the undue attention being devoted to programming languages. He presents Lampsort in a notation which he considers to be at a higher level than programming languages, and it is for him an algorithm rather than a program. *Programs* will be specific implementations guided in particular by efficiency considerations. One can derive them from higher-level versions (algorithms) through refinement. A refinement process may in particular remove or restrict *non-determinism*, present in the above version of Lampsort through the query *item* (whose only official property is that it returns an element of the set).

The worldview underlying the Eiffel method is almost the reverse: treating the whole process of software development as a continuum; unifying the concepts behind activities such as requirements, specification, design, implementation, verification, maintenance and evolution; and working to resolve the remaining differences, rather than magnifying them. Anyone who has worked in both specification and programming knows how similar the issues are. Formal specification languages look remarkably like programming languages; to be usable for significant applications they must meet the same challenges: defining a coherent type system, supporting abstraction, providing good syntax (clear to human readers and parsable by tools), specifying the semantics, offering modular structures, allowing evolution while ensuring compatibility. The same kinds of ideas, such as an object-oriented structure, help on both sides. Eiffel as a language is the notation that attempts to support this seamless, continuous process, providing tools to express both abstract specifications and detailed implementations. One of the principal arguments for this approach is that it supports *change* and *reuse*. If everything could be fixed from the start, maybe it could be acceptable to switch notations between specification and implementation. But in practice specifications change and programs change, and a seamless process relying on a single notation makes it possible to go back and forth between levels of abstraction without having to perform repeated translations between levels. (This problem of change is, in my experience, the biggest obstacle to refinement-based approaches. I have never seen a convincing description of how one can accommodate specification changes in such a framework without repeating the whole process. Inheritance, by the way, addresses this matter much better.)

The example of Lampsort in Eiffel suggests that a good language, equipped with the right abstraction mechanisms, can be effective at describing not only final implementations but also abstract algorithms. It does not hurt, of course, that these abstract descriptions can also be executable, at the possible price of non-optimal performance. The transformation to an optimal version can happen entirely within the same method and language.

Quite apart from these discussions of software engineering methodology, Lamport’s elegant version of Quicksort deserves to be known widely.

#### References

[1] Lamport video here, segment starting at 0:32:34.

[2] Carlo Furia, Bertrand Meyer and Sergey Velder: *Loop invariants: Analysis, Classification and Examples*, in ACM *Computing Surveys*, September 2014, preliminary text here.