Archive for December 2014
Lampsort
In support of his view of software methodology, Leslie Lamport likes to use the example of non-recursive Quicksort. Independently of the methodological arguments, his version of the algorithm should be better known. In fact, if I were teaching “data structures and algorithms” I would consider introducing it first.
As far as I know he has not written down his version in an article, but he has presented it in lectures; see [1]. His trick is to ask the audience to give a non-recursive version of Quicksort, and of course everyone starts trying to remove the recursion, for example by making the stack explicit or looking for invertible functions in calls. But his point is that recursion is not at all fundamental in Quicksort. The recursive version is a specific implementation of a more general idea.
Lamport’s version — let us call it Lampsort —is easy to express in Eiffel. We may assume the following context:
a: ARRAY [G -> COMPARABLE] — The array to be sorted.
pivot: INTEGER — Set by partition.
picked: INTEGER_INTERVAL — Used by the sorting algorithm, see below.
partition (i, j: INTEGER)
……..require — i..j is a sub-interval of the array’s legal indexes:
……..……..i < j
……..……..i >= a.lower
……..……..j <= a.upper
……..do
……..……..… Usual implementation of partition …
……..ensure — The expected effect of partition:
……..……..pivot >= i
……..……..pivot < j
……..……..— a [i..j] has been reshuffled so that elements in i..pivot are less than
……..……..— or equal to those in pivot+1 .. j.
……..end
We do not write the implementation of partition since the point of the present discussion is the overall algorithm. In the usual understanding, that algorithm consists of doing nothing if the array has no more than one element, otherwise performing a partition and then recursively calling itself on the two resulting intervals. The implementation can take advantage of parallelism by forking the recursive calls out to different processors. That presentation, says Lamport, describes only a possible implementation. The true Quicksort is more general. The algorithm works on a set not_sorted of integer intervals i..j such that the corresponding array slices a [i..j] are the only ones possibly not sorted; the goal of the algorithm is to make not_sorted empty, since then we know the entire array is sorted. In Eiffel we declare this set as:
not_sorted: SET [INTEGER_INTERVAL]
The algorithm initializes not_sorted to contain a single element, the entire interval; at each iteration, it removes an interval from the set, partitions it if that makes sense (i.e. the interval has more than one element), and inserts the resulting two intervals into the set. It ends when not_sorted is empty. Here it is:
……..from — Initialize interval set to contain a single interval, the array’s entire index range:
……..…..create not_sorted.make_one (a.lower |..| a.upper)…. ..……..
……..invariant
……..…..— See below
……..until
……..…..not_sorted.is_empty — Stop when there are no more intervals in set
……..loop
……..…..picked := not_sorted.item — Pick an interval from (non-empty) interval set.
……..……if picked.count > 1 then — (The precondition of partition holds, see below.)
……..……..…..partition (picked.lower, picked.upper) — Split, moving small items before & large ones after pivot.
……..……..…..not_sorted.extend (picked.lower |..| pivot) — Insert new intervals into the set of intervals: first
……..……....…not_sorted.extend (pivot + 1 |..| picked.upper) — and second.
……..……end
……..…...not_sorted.remove (picked) — Remove interval that was just partitioned.
…….end
Eiffel note: the function yielding an integer interval is declared in the library class INTEGER using the operator |..| (rather than just ..).
The query item from SET, with the precondition not is_empty, returns an element of the set. It does not matter which element. In accordance with the Command-Query Separation principle, calling item does not modify the set; to remove the element you have to use the command remove. The command extend adds an element to the set.
The abstract idea behind Lampsort, explaining why it works at all, is the following loop invariant (see [2] for a more general discussion of how invariants provide the basis for understanding loop algorithms). We call “slice” of an array a non-empty contiguous sub-array; for adjacent slices we may talk of concatenation; also, for slices s and t, s <= t means that every element of s is less than or equal to every element of t. The invariant is:
a is the concatenation of the members of a set slices of disjoint slices, such that:
– The elements of a are a permutation of its original elements.
– The index range of any member of slices having more than one element is in not_sorted.
– For any adjacent slices s and t (with s before t), s <= t.
The first condition (conservation of the elements modulo permutation) is a property of partition, the only operation that can modify the array. The rest of the invariant is true after initialization (from clause) with slices made of a single slice, the full array. The loop body maintains it since it either removes a one-element interval from not_sorted (slices loses the corresponding slice) or performs partition with the effect of partitioning one slice into two adjacent ones satisfying s <= t, whose intervals replace the original one in not_sorted. On exit, not_sorted is empty, so slices is a set of one-element slices, each less than or equal to the next, ensuring that the array is sorted.
The invariant also ensures that the call to partition satisfies that routine’s precondition.
The Lampsort algorithm is a simple loop; it does not use recursion, but relies on an interesting data structure, a set of intervals. It is not significantly longer or more difficult to understand than the traditional recursive version
sort (i, j: INTEGER)
……..require
……..……..i <= j
……..……..i >= a.lower
……..……..j <= a.upper
……..do
……..……if j > i then — Note that precondition of partition holds.
……..……..…..partition (i, j) — Split into two slices s and t such that s <= t.
……..……..…..sort (i, pivot) — Recursively sort first slice.
……..……..…..sort (pivot+1, j) — Recursively sort second slice.
……..……end……..…..
……..end
Lampsort, in its author’s view, captures the true idea of Quicksort; the recursive version, and its parallelized variants, are only examples of possible implementations.
I wrote at the start that the focus of this article is Lampsort as an algorithm, not issues of methodology. Let me, however, give an idea of the underlying methodological debate. Lamport uses this example to emphasize the difference between algorithms and programs, and to criticize the undue attention being devoted to programming languages. He presents Lampsort in a notation which he considers to be at a higher level than programming languages, and it is for him an algorithm rather than a program. Programs will be specific implementations guided in particular by efficiency considerations. One can derive them from higher-level versions (algorithms) through refinement. A refinement process may in particular remove or restrict non-determinism, present in the above version of Lampsort through the query item (whose only official property is that it returns an element of the set).
The worldview underlying the Eiffel method is almost the reverse: treating the whole process of software development as a continuum; unifying the concepts behind activities such as requirements, specification, design, implementation, verification, maintenance and evolution; and working to resolve the remaining differences, rather than magnifying them. Anyone who has worked in both specification and programming knows how similar the issues are. Formal specification languages look remarkably like programming languages; to be usable for significant applications they must meet the same challenges: defining a coherent type system, supporting abstraction, providing good syntax (clear to human readers and parsable by tools), specifying the semantics, offering modular structures, allowing evolution while ensuring compatibility. The same kinds of ideas, such as an object-oriented structure, help on both sides. Eiffel as a language is the notation that attempts to support this seamless, continuous process, providing tools to express both abstract specifications and detailed implementations. One of the principal arguments for this approach is that it supports change and reuse. If everything could be fixed from the start, maybe it could be acceptable to switch notations between specification and implementation. But in practice specifications change and programs change, and a seamless process relying on a single notation makes it possible to go back and forth between levels of abstraction without having to perform repeated translations between levels. (This problem of change is, in my experience, the biggest obstacle to refinement-based approaches. I have never seen a convincing description of how one can accommodate specification changes in such a framework without repeating the whole process. Inheritance, by the way, addresses this matter much better.)
The example of Lampsort in Eiffel suggests that a good language, equipped with the right abstraction mechanisms, can be effective at describing not only final implementations but also abstract algorithms. It does not hurt, of course, that these abstract descriptions can also be executable, at the possible price of non-optimal performance. The transformation to an optimal version can happen entirely within the same method and language.
Quite apart from these discussions of software engineering methodology, Lamport’s elegant version of Quicksort deserves to be known widely.
References
[1] Lamport video here, segment starting at 0:32:34.
[2] Carlo Furia, Bertrand Meyer and Sergey Velder: Loop invariants: Analysis, Classification and Examples, in ACM Computing Surveys, September 2014, preliminary text here.