Archive for the ‘Software engineering’ Category.

Domain Theory: the forgotten step in program verification

 

Program verification is making considerable progress but is hampered by a lack of abstraction in specifications. A crucial step is, almost always, absent from the process; this omission is the principal obstacle to making verification a standard component of everyday software development.

1. Steps in software verification

In the first few minutes of any introduction to program verification, you will be told that the task requires two artifacts: a program, and a specification. The program describes what executions will do; the specification, what they are supposed to do. To verify software is to ascertain that the program matches the specification: that it does is what it should.

The consequence usually drawn is that verification consists of three steps: write a specification, write a program, prove that the program satisfies the specification. The practical process is of course messier, if only because the first two steps may occur in the reverse order and, more generally, all three steps are often intertwined: the specification and the program influence each other, in particular through the introduction of “verification conditions” into the program; and initial proof attempts will often lead to changes in both the specification and the program. But by and large these are the three accepted steps.

Such a description misses a fourth step, a prerequisite to specification that is essential to a scalable verification process: Domain Theory. Any program addresses a specific domain of discourse, be it the domain of network access and communication for a mobile phone system, the domain of air travel for a flight control system, of companies and shares for a stock exchange system and so on. Even simple programs with a limited scope, such as the computation of the maximum of an array, use a specific domain beyond elementary mathematics. In this example, it is the domain of arrays, with their specific properties: an array has a range, a minimum and maximum indexes in that range, an associated sequence of values; we may define a slice a [i..j], ask for the value associated with a given index, replace an element at a given index and so on. The Domain Theory provides a formal model for any such domain, with the appropriate mathematical operations and their properties. In the example the operations are the ones just mentioned, and the properties will include the axiom that if we replace an element at a certain index i with a value v then access the element at an index j, the value we get is v if i = j, and otherwise the earlier value at j.

2. The role of a Domain Theory

The task of devising a Domain Theory is to describe such a domain of reference, in the spirit of abstract data types: by listing the applicable operations and their properties. If we do not treat this task as a separate step, we end up with the kind of specification that works for toy examples but quickly becomes unmanageable for real-life applications. Most of the verification literature, unfortunately, relies on such specifications. They lack abstraction since they keep using the lowest-level mathematical objects and constructs, such as numbers and quantified expressions. They are to specification what assembly language is to modern programming.

Dines Bjørner has for a long time advocated a closely related idea, domain engineering; see for example his book in progress [1]. Unfortunately, he does not take advantage of modularization through abstract data types; the book is an example of always-back-to-the-basics specification, resorting time and again to fully explicit specifications based on a small number of mathematical primitives, and as a consequence making formal specification look difficult.

3. Maximum computed from both ends

As a simple example of modeling through an abstract theory consider an algorithm for computing the maximum of an array. We could use the standard technique that goes through the array one-way, but for variety let us take the algorithm that works from both ends, moving two integer cursors towards each other until they meet.  (This example was used in a verification competition at a recent conference, I forgot which one.) The code looks like this:

Two-way maximum

The specification, expressed by the postcondition (ensure) should state that Result is the maximum of the array; the loop invariant will be closely related to it. How do we express these properties? The obvious way is not the right way. It states the postcondition as something like

k: Z | (ka.lowerka.upper) ⇒ a [k] ≤ Result

k: Z | ka.lowerka.upper a [k] = Result

In words, Result is at least as large as every element of the array, and is equal to at least one of the elements of the array. The invariant can also be expressed in this style (try it).

The preceding specification expresses the desired property, but it is of an outrageously lower level than called for. The notion of maximum is a general one for arrays over an ordered type. It can be computed through many different algorithms in addition to the one shown above, and exists independently of these algorithms. The detailed, assembly-language-like definition of its properties should not have to be repeated in every case. It should be part of the Domain Theory for the underlying notion, arrays.

4. A specification at the right level of abstraction

In a Domain Theory for arrays of elements from an ordered set, one of the principal operations is maximum, satisfying the above properties. The definition of maximum through these properties belongs at the Domain Theory level. The Domain Theory should include that definition, independent of any particular computational technique such as two_way_max. Then the routine’s postcondition, relying on this notion from the Domain Theory, becomes simply

Result = a.maximum

The application of this approach to the loop invariant is particularly interesting. If you tried to write it at the lowest level, as suggested above, you should have produced something like this:

a.lowerija.upper

k: Z | kikj ∧ (∀ l: Z | l a.lowerl a.upper a [l] ≤ a [k])

The first clause is appropriate but the rest is horrible! With its nested quantified expressions it gives an impression of great complexity for a property that is in fact straightforward, simple enough in fact to be explained to a 10-year-old: the maximum of the entire array can be found between indexes i and j. In other words, it is also the maximum of the array slice going from i to j. The Domain Theory will define the notion of slice and enable us to express the invariant as just

a.lowerij a.upper — This bounding clause remains

a.maximum = (a [i..j ]).maximum

(where we will write the slice a [i..j ] as a.slice (i, j ) if we do not have mechanisms for defining special syntax). To verify the routine becomes trivial: on loop exit the invariant still holds and i = j, so the maximum of the entire array is given by the maximum of the single-element slice a [i..i ], which is the value of its single element a [i ]. This last property — the maximum of a single-element array is its single value — is independent of the verification of any particular program and should be proved as a little theorem of the Domain Theory for arrays.

The comparison between the two versions is striking: without Domain Theory, we are back to the most tedious mathematical manipulations again and again; simple, clear properties look complicated and obscure. This just for a small example on basic data structures; now think what it will be for a complex application domain. Without a first step of formal modeling to develop a Domain Theory, no realistic specification and verification process is realistic.

Although the idea is illustrated here through examples of individual routines, the construction of a Domain Theory should usually occur, in an object-oriented development process, at the level of a class: the embodiment of an abstract data type, which is at the appropriate level of granularity. The theory applies to objects of a given type, and hence will be used for the verification of all operations of that type. This observation justifies the effort of devising a Domain Theory, since it will benefit a whole set of software elements.

5. Components of a Domain Theory

The Domain Theory should include the three ingredients illustrated in the example:

  • Operations, modeled as mathematical functions (no side effects of course, we are in the world of specification).
  • Axioms characterizing the defining properties of these operations.
  • Theorems, characterizing other important properties.

This approach is of course nothing else than abstract data types (the same thing, although few people realize it, as object-oriented analysis). Even though ADTs are a widely popularized notion, supported for example by tools such as CafeOBJ [2] and Maude [3], it is generally not taken to its full conclusions; in particular there is too often a tendency to define every new ADT from scratch, rather than building up libraries of reusable high-level mathematical components in the O-O spirit of reuse.

6. Results, not just definitions

In devising a Domain Theory with the three kinds of ingredient listed above, we should not forget the last one, the theorems! The most depressing characteristic of much of the work on formal specification is that it is long on definitions and short on results, while good mathematics is supposed to be the reverse. I think people who have seriously looked at formal methods and do not adopt them are turned off not so much by the need to use mathematics but by the impression they get little value for it.

That is why Eiffel contracts do get adopted: even if it’s just for testing and debugging, people see immediate returns. It suffices for a programmer to have caught one bug as the violation of a simple postcondition to be convinced for life and lose any initial math-phobia.

7. Quantifiers are evil

As we go beyond simple contract properties — this argument must be positive, this reference will not be void — the math needs to be at the same level of abstraction to which, as modern programmers, we are accustomed. For example, one should always be wary of program specifications relying directly on quantified expressions, as in the low-level variants of the postcondition and loop invariant of the two_way_max routine.

This is not just a matter of taste, as in the choice in logic [4] between lambda expressions (more low-level but also more immediately understandable) and combinators (more abstract but, for many, more abstruse). We are talking here about the fundamental software engineering problem of scalability; more generally, of the understandability, extendibility and reusability of programs, and the same criteria for their specification and verification. Quantifiers are of course needed to express fundamental properties of a structure but in general should not directly appear in program assertions: as the example illustrated, their level of abstraction is lower than the level of discourse of a modern object-oriented program. If the rule — Quantifiers Considered Harmful — is not absolute, it must be pretty close.

Quantified expressions, “All elements of this structure possess this property” and “Some element of this structure possesses this property” — belong in the description of the structure and not in the program. They should appear in the Domain Theory, not in the verification. If you want to express that a hash table search found an element of key K, you should not write

(Result = Void ∧ (∀ i: Z | i a.loweri a.upper a.item (i).key ≠ K))

(ResultVoid ∧ (∀ i: Z | i a.loweri a.upper a.item (i).key = K ∧ Result = a.item (i))

but

Result /= Void     (Result a.elements_of_key (K))

The quantified expressions will appear in the Domain Theory for the corresponding structure, in the definition of such domain properties as elements_of_key. Then the program’s specification — the contracts to be verified — can rely on concepts that make sense to the programmer; the verification will take advantage of theorems that have been proved independently since they belong to the Domain Theory and do not depend on individual programs.

8. Even the simplest examples…

Practical software verification requires Domain Theory even in the simplest cases, including those often used as purely academic examples. Perhaps the most common (and convenient) way to explain the notion of loop invariant is Euclid’s algorithm to compute the greatest common divisor (gcd) of two numbers (with a structure remarkably similar to that of two_way_max):
Euclid

I have expressed the postcondition using a concept from an assumed Domain Theory for the underlying problem: gcd, the mathematical function that yields the greatest common divisor of two integers. Many specifications I have seen go back to the basics, with something like this (using \\ for integer remainder):

a \\ Result = 0 b \\ Result = 0   ∀ i: N | (a \\ i = 0) ∧ (b \\ i = 0)  i Result

This is indeed the definition of what it means for Result to be the gcd of a and b (it divides a, it divides b, and is greater than any other integer that also has these two properties). But it makes no sense to include such a detailed mathematical property in the specification of a program element. It belongs in the domain theory, where it will serve as the definition of a function gcd, which we can then use directly in the specification of the program.

Note how the invariant makes the necessity of the Domain Theory approach even more clear: try to express it in the basic mathematical form, not using the function gcd, It can be done, but the result is typical of the high complexity to usefulness ratio of traditional formal specifications mentioned above. Instead, the invariant that I have included in the program text above says exactly what there is to say, clearly and concisely: at each iteration, the gcd of our two temporary values, i and j, is the result that we are seeking, the gcd of the original values a and b. On exit from the loop, when i and j are equal, their common value is that result.

It is also thanks to the Domain Theory modeling that the verification of the program — consisting of proving that the stated property is indeed invariant — will be so simple: as part of the theory, we should have the two little theorems

i > j > 0 gcd (i, j) = gcd (ij, j)
gcd
(i, i) = i

which immediately show the implementation to be correct.

Inside of any big, fat, messy, quantifier-ridden specification there is a simple, elegant and clear Domain-Theory-based specification desperately trying to get out. Find it and use it.

9. From Domain Theory to domain library

One of the reasons most people working on program verification have not used the division into levels of discourse described here, with a clear role for developing a Domain Theory, is that they lack the appropriate notational support. Mathematical notation is of course available, but we are talking about programs a general verification framework cannot resort to a new special notation for every new application domain.

This is one of the places where Eiffel provides a consistent solution, through its seamless approach to integrating programs and specifications in a single notation. Thanks to mechanisms such as deferred classes (classes that describe concepts through detailed specifications without committing to an implementation), Eiffel is as much for specification as for design and implementation; a Domain Theory can be expressed though a set of deferred Eiffel classes, which we may call a domain library. The classes in a domain library should not just be deferred, meaning devoid of implementation; they should in addition describe stateless operations only — queries, not commands — since they are modeling purely mathematical concepts.

An earlier article in this blog [5] outlined the context of our verification work: the EVE project (Eiffel Verification Environment), a practical approach to integrating software verification in the day-to-day practice of modern software development, with the slogan ““Verification As a Matter Of Course”. In this project we have applied the idea of Domain Theory by building a domain library covering fundamental concepts of set theory, including functions and relations. This is the Mathematical Model Library (MML) [6, 7], which we use to verify the new data structure library EiffelBase 2 using specifications at the appropriate level of abstraction.

MML is in fact useful for the specification of a wide variety of programs, since almost every application area can benefit from the general concepts of set, subset, relation and such. But to cover a specific application domain, say flight traffic control, MML will generally not suffice; you will need to devise a Domain Theory that mathematically models the target domain, and may express it in the form of a domain library written in the same general spirit as MML: all deferred, stateless, focused on high-level abstractions.

It is one of the attractions of Eiffel that you can express such a theory and library in the same notation as the programs that will use it — more precisely in a subset of that notation, since the specification classes do not need the imperative constructs of the language such as instructions and attributes. Then both the development process and the verification use a seamlessly integrated set of notations and techniques, and all use the same tools from a modern IDE, in our case EiffelStudio, for browsing, editing, working with graphical repreentation, metrics etc.

10. DSL libraries for specifications

A mechanism to express Domain Theories is to a general specification mechanism essentially like a Domain Specific Language (DSL) is to a general programming language: a specialization for a particular domain. Domain libraries make the approach practical by:

  • Embedding the specification language in the programming language.
  • Fundamentally relying on reuse, in the best spirit of object technology.

This approach is in line with the one I presented for handling DSLs in an earlier article of this blog [8] (thanks, by the way, for the many comments received, some of them posted here and some on Facebook and LinkedIn where the post triggered long discussions). It is usually a bad idea to invent a new language for a new application domain. A better solution is to rely on libraries, by taking advantage of the power of object-oriented mechanisms to model (in domain libraries) and implement (for DSLs) the defining features of such a domain, and to make the result widely reusable. The resulting libraries are purely descriptive in the case of a domain library expressing a Domain Theory, and directly usable by programs in the case of a library embodying a DSL, but the goal is the same.

11. A sound and necessary engineering practice

Many ideas superficially look similar to Domain Theory: domain engineering as mentioned above, “domain analysis” as widely discussed in the requirements literature, model-driven development, abstract data type specification… They all start from some of the same observations, but  Domain Theory as described in this article is something different: a systematic approach to modeling an arbitrary application domain mathematically, which:

  • Describes the concepts through applicable operations, axioms and (most importantly) theorems.
  • Expresses these elements in an applicative (side-effect free, i.e. equivalent to pure mathematics) subset of the programming language, for direct embedding in program specifications.
  • Relies on the class mechanism to structure the results.
  • Collects the specifications into specification libraries and promotes the reuse of specifications in the same way we promote software reuse.
  • Uses the combination of these techniques to ensure that program specifications are at a high level of abstraction, compatible with the programmers’ view of their software.
  • Promotes a clear and effective verification process.

The core idea is in line with standard engineering practices in disciplines other than software: to build a bridge, a car or a chip you need first to develop a sound model of the future system and its environment, using any useful models developed previously rather than always going back to elementary textbook mathematics.

It seems in fact easier to justify doing Domain Analysis than to justify not doing it. The power of expression and abstraction of our programs has grown by leaps and bounds; it’s time for our specifications to catch up.

References

[1] Dines Bjørner: From Domains to Requirements —The Triptych Approach to Software Engineering, “to be submitted to Springer”, available here.

[2] Kokichi Futatsugi and others: CafeObj page, here.

[3] José Meseguer and others: Maude publication page, here.

[4] J. Roger Hindley, J. P. Seldin: Introduction to Combinators and l-calculus, Cambridge University Press, 1986.

[5] Verification As a Matter Of Course, earlier article on this blog (March 2010), available here.

[6] Bernd Schoeller, Tobias Widmer and Bertrand Meyer. Making specifications complete through models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, pages 48-70, available here.

[7] Nadia Polikarpova, Carlo A. Furia and Bertrand Meyer: Specifying Reusable Components, in VSTTE’10: Verified Software: Theories, Tools and Experiments, Edinburgh, August 2010, Lecture Notes in Computer Science, Springer-Verlag, available here.

[8] Never Design a Language, earlier article on this blog (January 2012), available here.

VN:F [1.9.10_1130]
Rating: 9.3/10 (12 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

Aliasing and framing: Saint Petersburg seminar next week

In  last Thursday’s session of the seminar, Kokichi Futatsugi’s talk took longer than planned (and it would have been a pity to stop him), so I postponed my own talk on Automatic inference of frame conditions through the alias calculus to next week (Thursday local date). As usual it will be broadcast live.

Seminar page: here, including the link to follow the webcast.

Time and date: 5 April 2012, 18 Saint Petersburg time; you can see the local time at your location here.

Abstract:

Frame specifications, the description of what does not change in a routine call, are one of the most annoying components of verification, in particular for object-oriented software. Ideally frame conditions should be inferred automatically. I will present how the alias calculus, described in recent papers, can address this need.

There may be a second talk, on hybrid systems, by Sergey Velder.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

A carefully designed Result

 

In the Eiffel user discussion group [1], Ian Joyner recently asked:

A lot of people are now using Result as a variable name for the return value in many languages. I believe this first came from Eiffel, but can’t find proof. Or was it adopted from an earlier language?

Proof I cannot offer, but certainly my recollection is that the mechanism was an original design and not based on any previous language. (Many of Eiffel’s mechanisms were inspired by other languages, which I have always acknowledged as precisely as I could, but this is not one of them. If there is any earlier language with this convention — in which case a reader will certainly tell me — I was and so far am not aware of it.)

The competing conventions are a return instruction, as in C and languages based on it (C++, Java, C#), and Fortran’s practice, also used in Pascal, of using the function name as a variable within the function body. Neither is satisfactory. The return instruction suffers from two deficiencies:

  • It is an extreme form of goto, jumping out of a function from anywhere in its control structure. The rest of the language sticks to one-entry, one-exit structures, as I think all languages should.
  • In most non-trivial cases the return value is not just a simple formula but has to be computed through some algorithm, requiring the declaration of a local variable just to denote that result. In every case the programmer must invent a name for that variable and, in a typed language, include a declaration. This is tedious and suggests that the language should take care of the declaration for the programmer.

The Fortran-Pascal convention does not combine well with recursion (which Fortran for a long time did not support). In the body of the function, an occurrence of the function’s name can denote the result, or it can denote a recursive call; conventions can be defined to remove the ambiguity, but they are messy, especially for a function without arguments: in function f, does the instruction

f := f + 1

add one to the value of the function’s result as computed so far, as it would if f were an ordinary variable, or to the result of calling f recursively?

Another problem with the Fortran-Pascal approach is that in the absence of a language-defined rule for variable initialization a function can return an undefined result, if some path has failed to initialize the corresponding variable.

The Eiffel design addresses these problems. It combines several ideas:

  • No nesting of routines. This condition is essential because without it the name Result would be ambiguous. In all Algol- and Pascal-like languages it was considered really cool to be able to declare routines within routines, without limitation on the depth of recursion. I realized that in an object-oriented language such a mechanism was useless and in fact harmful: a class should be a collection of features — services offered to the rest of the world — and it would be confusing to define features within features. Simula 67 offered such a facility; I wrote an analysis of inter-module relations in Simula, including inheritance and all the mechanisms retained from Algol such as nesting (I am trying to find that document, and if I do I will post it in this blog); my conclusion was the result was too complicated and that the main culprit was nesting. Requiring classes to be flat structures was, in my opinion, one of the most effective design decisions for Eiffel.
  • Language-defined initialization. Even a passing experience with C and C++ shows that uninitialized variables are one of the major sources of bugs. Eiffel introduced a systematic rule for all variables, including Result, and it is good to see that some subsequent languages such as Java have retained that convention. For a function result, it is common to ignore the default case, relying on the standard initialization, as in if “interesting case” then Result:= “interesting value” end without an else clause (I like this convention, but some people prefer to make all cases explicit).
  • One-entry, one-exit blocks; no goto in overt or covert form (break, continue etc.).
  • Design by Contract mechanisms: postconditions usually need to refer to the result computed by a function.

The convention is then simple: in any function, you can use a language-defined local variable Result for you, of the type that you declared for the function result; you can use it as a normal variable, and the result returned by any particular call will be the final value of the variable on exit from the function body.

The convention has been widely imitated, starting with Delphi and most recently in Microsoft’s “code contracts”, a kind of poor-man’s Design by Contract emulation, achieved through libraries; it requires a Result notation to denote the function result in a postcondition, although this notation is unrelated to the mechanisms in the target languages such as C#. As the example of Eiffel’s design illustrates, a programming language is a delicate construction where all elements should fit together; the Result convention relies on many other essential concepts of the language, and in turn makes them possible.

Reference

[1] Eiffel Software discussion group, here.

VN:F [1.9.10_1130]
Rating: 8.3/10 (12 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 9 votes)

New LASER proceedings

Springer has just published in the tutorial sub-series of Lecture Notes in Computer Science a new proceedings volume for the LASER summer school [1]. The five chapters are notes from the 2008, 2009 and 2010 schools (a previous volume [2] covered earlier schools). The themes range over search-based software engineering (Mark Harman and colleagues), replication of software engineering experiments (Natalia Juristo and Omar Gómez), integration of testing and formal analysis (Mauro Pezzè and colleagues), and, in two papers by our ETH group, Is branch coverage a good measure of testing effectiveness (with Yi Wei and Manuel Oriol — answer: not really!) and a formal reference for SCOOP (with Benjamin Morandi and Sebastian Nanz).

The idea of these LASER tutorial books — which are now a tradition, with the volume from the 2011 school currently in preparation — is to collect material from the presentations at the summer school, prepared by the lecturers themselves, sometimes in collaboration with some of the participants. Reading them is not quite as fun as attending the school, but it gives an idea.

The 2012 school is in full preparation, on the theme of “Advanced Languages for Software Engineering” and with once again an exceptional roster of speakers, or should I say an exceptional roster of exceptional speakers: Guido van Rossum (Python), Ivar Jacobson (from UML to Semat), Simon Peyton-Jones (Haskell), Roberto Ierusalimschy (Lua), Martin Odersky (Scala), Andrei Alexandrescu (C++ and D),Erik Meijer (C# and LINQ), plus me on the design and evolution of Eiffel.

The preparation of LASER 2012 is under way, with registration now open [3]; the school will take place from Sept. 2 to Sept. 8 and, like its predecessors, in the wonderful setting on the island of Elba, off the coast of Tuscany, with a very dense technical program but time for enjoying the beach, the amenities of a 4-star hotel and the many treasures of the island. On the other hand not everyone likes Italy, the sun, the Mediterranean etc.; that’s fine too, you can wait for the 2013 proceedings.

References

[1] Bertrand Meyer and Martin Nordio (eds): Empirical Software Engineering and Verification, International Summer Schools LASER 2008-2010, Elba Island, Italy, Revised Tutorial Lectures, Springer Verlag, Lecture Notes in Computer Science 7007, Springer-Verlag, 2012, see here.

[2] Peter Müller (ed.): Advanced Lectures on Software Engineering, LASER Summer School 2007-2008, Springer Verlag, Lecture Notes in Computer Science 7007, Springer-Verlag, 2012, see here.

[3] LASER summer school information and registration form, http://se.ethz.ch/laser.

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

ERC Advanced Investigator Grant: Concurrency Made Easy

In April we will be starting the  “Concurrency Made Easy” research project, the result of a just announced Advanced Investigator Grant from the European Research Council. Such ERC grants are awarded to a specific person, rather than a consortium of research organizations as in the usual EU funding scheme. The usual amount, which applies in my case, is 2.5 million euros (currently almost 3 .3 million dollars) over five years, on a specific theme. According to the ERC’s own description [1],

ERC Advanced Grants allow exceptional established research leaders of any nationality and any age to pursue ground-breaking, high-risk projects that open new directions in their respective research fields or other domains.

This is the most sought-after research funding instrument of the EU, with a success rate of about 12% [2], out of a group already preselected by the host institutions. What makes ERC Advanced Investigator Grants so coveted is the flexibility of the scheme (no constraints on the topic, light administrative baggage) and the trust that an award implies in a particular researcher and his ability to carry out advanced research.

The name of the CME project clearly signals its ambition: to turn concurrent programming into a normal, unheroic part of programming. Today adding concurrency to a program, usually in the form of multithreading, is very hard, complexity and risk of all kinds. Everyone is telling us that we must rethink programming, retrain programmers and revamp curricula to put the specific reasoning modes of concurrent programming at the center. I don’t think this can work; thinking concurrently is just too hard to become the default mode. Instead, we should adapt programming languages, theories and tools so that programmers can continue to apply the reasoning schemes that have proved so successful in classical programming, especially object-oriented programming with the benefit of Design by Contract.

The starting point is the SCOOP model, to which I started an introduction in an earlier article of this blog [3], with a sequel yet to come. SCOOP is a minimal extension to the O-O framework to support concurrency, yielding very simple (the S in the acronym) solutions to concurrent programming problems. As part of the CME project we plan to develop it in many different directions and establish a sound and effective formal basis.

I have put the project description — the scientific part of the actual proposal text accepted by the ERC — online [4].

In the next few weeks I will be publishing here specific announcements for the positions we are seeking to fill very quickly; they include postdocs, PhD students, and one research engineer. We are looking for candidates with excellent knowledge and practice of concurrency, Eiffel, formal techniques etc. The formal application procedure will be Web-based and is not in place yet but you can contact me if you fit the profile and are interested.

We can defeat the curse: concurrent programming (an obligatory condition of any path towards a successful future for information technology) does not have to be black magic. It can be made simple and efficient. Such is the challenge of the CME project.

References

[1] European Research Council: Advanced Grants, available here.

[2] European Research Council: Press release on 2011 Advanced Investigator Grants, 24 January 2012, available here.

[3] Concurrent Programming is Easy, article from this blog, available here.

[4] CME Advanced Investigator Grant project description, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

R&D positions in software engineering

In the past couple of weeks, three colleagues separately sent me announcements of PhD and postdoc positions, asking me to circulate them. This blog is as good a place as any other (as a matter of fact when I asked Oge de Moor if it was appropriate  he replied: “Readers of your blog are just the kind of applicants we look for…” — I hope you appreciate the compliment). So here they are:

  • Daniel Jackson, for work on Alloy, see here. (“Our plan now is to take Alloy in new directions, and dramatically improve the usability and scalability of the analyzer. We are looking for someone who is excited by these possibilities and will be deeply involved not only in design and implementation but also in strategic planning. There are also opportunities to co-advise students and participate in research proposals.”)
  • University College London (Anthony Finkelstein), a position as lecturer, similar to assistant professor), see here. The title is “Lecturer in software testing” (which, by the way, seems a bit restrictive, as I am more familiar with an academic culture in which the general academic discipline is set, say software engineering or possibly software verification, rather than a narrow specialty, but I hope the scope can be broadened).
  • Semmle, Oge de Moor’s company, see here.

These are all excellent environments, so if you are very, very good, please consider applying. If you are very, very, very good don’t, as I am soon going to post a whole set of announcements for PhD and postdoc positions at ETH in connection with a large new grant.

 

VN:F [1.9.10_1130]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

TOOLS 2012, “The Triumph of Objects”, Prague in May: Call for Workshops

Workshop proposals are invited for TOOLS 2012, The Triumph of Objectstools.ethz.ch, to be held in Prague May 28 to June 1. TOOLS is a federated set of conferences:

  • TOOLS EUROPE 2012: 50th International Conference on Objects, Models, Components, Patterns.
  • ICMT 2012: 5th International Conference on Model Transformation.
  • Software Composition 2012: 10th International Conference.
  • TAP 2012: 6th International Conference on Tests And Proofs.
  • MSEPT 2012: International Conference on Multicore Software Engineering, Performance, and Tools.

Workshops, which are normally one- or two-day long, provide organizers and participants with an opportunity to exchange opinions, advance ideas, and discuss preliminary results on current topics. The focus can be on in-depth research topics related to the themes of the TOOLS conferences, on best practices, on applications and industrial issues, or on some combination of these.

SUBMISSION GUIDELINES

Submission proposal implies the organizers’ commitment to organize and lead the workshop personally if it is accepted. The proposal should include:

  •  Workshop title.
  • Names and short bio of organizers .
  • Proposed duration.
  •  Summary of the topics, goals and contents (guideline: 500 words).
  •  Brief description of the audience and community to which the workshop is targeted.
  • Plans for publication if any.
  • Tentative Call for Papers.

Acceptance criteria are:

  • Organizers’ track record and ability to lead a successful workshop.
  •  Potential to advance the state of the art.
  • Relevance of topics and contents to the topics of the TOOLS federated conferences.
  •  Timeliness and interest to a sufficiently large community.

Please send the proposals to me (Bertrand.Meyer AT inf.ethz.ch), with a Subject header including the words “TOOLS WORKSHOP“. Feel free to contact me if you have any question.

DATES

  •  Workshop proposal submission deadline: 17 February 2012.
  • Notification of acceptance or rejection: as promptly as possible and no later than February 24.
  • Workshops: 28 May to 1 June 2012.

 

VN:F [1.9.10_1130]
Rating: 7.3/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Never design a language

It is a common occurrence in software development. Someone says: “We should design a language”. The usual context is that some part of the development requires a rich functionality set, and it appears appropriate to provide a flexible solution through a specialized language. As an example, in the development of an airline’s frequent flyer program on which I once worked the suggestion came to design a “Flyer Award Language” , with instructions appropriate for that application domain: record a trip, redeem an award, provide a statement of available miles and so on. A common term for such notations is DSL, for Domain-Specific Language.

Designing a language in such a context is almost always a bad idea (and I am not sure why I wrote “almost”). Languages are endless objects of discussion, usually on the least important aspects, which are also the most visible and those on which everyone has a strong opinion: concrete syntactic properties. People might pretend otherwise (“let’s not get bogged down on syntax, this is just one possible form”) but syntax is what the discussions will get bogged down to — keywords or symbols, this order or that order of operands, one instruction with several variants vs. several instructions… — at the expense of discussing the fundamental issues of functionality.

Worse yet, even if a language will be part of the solution it is usually just one facet to the solution. As was already explained in detail in [1], any useful functionality set will naturally be useful through several interfaces: a textual notation with concrete syntax may be one of them, but other possible ones include an API (Abstract Program Interface) for use from other software elements, a Graphical User Interface, a web user interface, yet another for web services (typically WSDL or some other XML or JSON format).

In such cases, starting with a concrete textual language is pretty silly, since it cannot yield the others directly (it would have to be parsed and further analyzed, which does not make sense). Of all the kinds of interface listed, the most fundamental one is the API: it describes the raw functionality, excluding any choice of syntax but including, thanks to contracts, elements of semantics. For example, a class AWARD in our frequent flyer application might include the feature


             redeem_for_upgrade (c: CUSTOMER; f : FLIGHT)
                                     — Upgrade c to next class of service on f.
                       require
                                    c /= holder
implies holder.allowed_substitute (c)
                                    f.permitted_for_upgrade
(Current)
                                    c.booked
( f )
                       
ensure
                                    c.class_of_service
( f ) =  old c.class_of_service ( f ) + 1

There is of course no implementation as this declaration only specifies an interface, but it says what needs to be said: to redeem the award for an upgrade, the intended customer must be either the holder of the award or an allowed substitute; the flight must be available for an upgrade with the current award (including the availability of enough miles); the intended customer must already be booked on the flight; and the upgrade will be for the next class of service.

These details are the kind of things that need to be discussed and agreed before the API is finalized. Then one can start discussing about a textual form (a DSL), a graphical interface, a web services interface. They all consist of relatively simple layers to be superimposed on a solidly defined and precisely specified basis. Once you have that basis, you can have all the fun you like arguing over everyone’s favorite forms of concrete syntax; it cannot hurt the project any more. Having these discussions early, at the expense of the more fundamental issues, is a great danger.

One of the key rules for successful software construction — as for many other ventures of course, especially in science and technology — is to distinguish the essential from the auxiliary, and consequently to devote proper attention to the essential issues while avoiding disputations of auxiliary issues. To define functionality, API is essential; language is auxiliary.

So when should you design a language? Never. Well, hardly ever.

Reference

[1] Bertrand Meyer: Introduction to the Theory of Programming Languages, Prentice Hall, 1990.

VN:F [1.9.10_1130]
Rating: 7.9/10 (18 votes cast)
VN:F [1.9.10_1130]
Rating: +8 (from 16 votes)

Concurrency seminars

Ever more people are realizing that concurrency is at the center of IT challenges for the next decades. Concurrent programming remains as hard as ever; we have put together a one-day seminar that helps understand the concepts and build successful concurrent applications. The sessions for the first few months of 2012 are:

  • Palo Alto (February 15)
  • Zurich, (March 2)
  • London (March 22)
  • Paris (May 10)
  • Stockholm (June 15)
  • Seattle (July 20)

and the seminar program is available here.

 

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Webinar today: the Varieties of Loop Invariants

I did not have time to complete my Monday post this week; it will be for next Monday (title: Never design a language). In the meantime, here is the announcement for today’s Saint Petersburg Software Engineering seminar , which can be followed live at http://sel.ifmo.ru/seminar/live (19:30 Saint Petersburg time, meaning 16:30 Zurich/Paris, 7:30 PDT on 12 January 2012), duration about one hour.

I will be talking today; the topic is “The varieties of loop invariants”, reporting on joint work with Carlo Furia and Sergey Velder. The abstract appears below.

A recording of previous talks, starting from those of last week, will soon be available on the seminar page.

 

Abstract

The key practical issue in verifying software is to come up with the right loop invariants. We are performing an extensive analysis of loop invariants in important algorithms across all major areas of computer science, and have developed a taxonomy. I will present some of the results of this ongoing work, performed with Sergey Velder (ITMO) and Carlo Furia (ETH).

VN:F [1.9.10_1130]
Rating: 9.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Webinars Dec. 29: (1) Model-based contracts (2) Assessing agile methods

The Saint Petersburg Software Engineering seminar (organized jointly by ITMO and SPbSPU universities) takes place every Thursday, normally 18-21. You can find the program at

 

http://sel.ifmo.ru/seminar/ .

Starting with the Dec. 29 seminar, the talks can now be attended remotely. You can follow them live (i.e. starting at 18:00 SP time, 15:00 Zurich/Paris, 9 AM PDT) at

http://sel.ifmo.ru/seminar/live

Warning: this is an experimental setup and it may not work perfectly the first time around.

The talks on Dec. 29 are the following (see the seminar page for the abstracts):

Nadia Polikarpova (ETH): API design with strong specifications 18-19
Bertrand Meyer: Agile Methods: The Good, The Hype and The Ugly 19-20

For the abstracts, see the seminar page referenced above.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Guest article: funding great research

In a blog article posted in its original version on this blog [1] and in a revised version on the Communications of the ACM blog [2], I emphasized the relevance of incremental research. Recently Mikkel Thorup sent me some interesting comments, which I am publishing here as the first Guest Column of this blog.

References

[1] Bertrand Meyer: One Cheer for Incremental Research, in the present blog, 10 August 2009, available here

[2] Bertrand Meyer: Long Live Incremental Research, in Communications of the ACM Blog, 13 June 2011, available here.

Guest article by Mikkel Thorup: Funding Great Research

Research foundations want great research projects. However, a while back Bertrand Meyer wrote an interesting blog post: Long Live Incremental Research [2]. With examples he showed that many of the greatest results of research could not possibly be the projected results of great sounding project descriptions. His conclusion is that we should drop the high-flying ambitions from project descriptions, and instead support more incremental research proposals, hoping that great stuff will happen on the way. Indeed incremental research is perfect for research projects with predictable deliverables. However, I suggest the opposite conclusion; namely that we for some of the funding drop the project description.

The basic idea is that foundations should encourage researchers to look for results far better than those that can reasonably be projected. In particular, researchers should be free to follow their inspiration when they see new exiting opportunities. This is not done by tying researchers to incremental projects. Instead we can sometimes switch to result based funding, that is, funding based on results already achieved (with emphasis on the more recent past). Such result based funding is more like rewards for great results, and it offers researchers the perfect incentive to do their very best so as to secure future funding.

Consider a researcher with a history of brilliant ideas taking research in surprising new directions. If we try casting this as a project, the referees will rightly complain: “It is not clear how the applicant will come up with a brilliant idea, nor is it clear what the surprise will be”. With such lack of focus and feasibility, a low project score is expected.  If the project description has a predefined weight of, say, 40%, then the overall score will be too low for funding, regardless of the researcher’s established track record of succeeding in unlikely situations.  However, research needs great new ideas. Therefore we need some result based funding so that we can support researchers with a proven talent for generating great new ideas even if we do not quite understand how it will happen.

The above problem is often very real in my field of theoretical computer science. Like in other fields, theoretical research is only interesting if it contains surprises (otherwise it is more like development). A project plan would make sense if the starting point was a surprising idea or approach that it would take years to develop, but in theory, the most exciting ideas are often strikingly simple. When first you have such an idea, you are typically close to done, ready to start writing a paper. Thus, if you have a great idea when you apply for a grant, you will typically be done long before you get the grant. The essence of the research is thus the unpredictable search for powerful ideas and insights. The most appropriate project description is therefore just a description of the importance of the area to be researched and the type of results aimed for. The track record shows which researchers have the talent to succeed.

Dropping the how-part of the project description will greatly increase methodological diversity, allowing researchers to use the strategy that has proved most suitable for their area and their own talent and skills.  As a simple example, Bertrand suggested funding incremental research, hoping that great surprising things would turn up on the way. My strategy is the opposite. I try to spend as much time as possible on overly ambitious targets. Most of the time I fail, but I rarely come home empty-handed, for by studying the unknown I nearly always discover something new, sometimes even more interesting than the original target. From the perspective of ambition, I see it as an advantage that I minimize time spend on easy targets, but foundations seem to prefer that you take a planned path with some guaranteed targets on the way. The point here is not to argue whether one strategy is superior to the other, but rather to embrace the diversity of strategies that may work depending on the area and the individual researcher.

Perhaps more seriously, if a target is hard to achieve, it may be because it requires a crazy approach that would not look reasonable to anyone else, but which may work for a researcher thanks to his special talents and intuition. Indeed I have often been positively surprised seeing how others succeeded using an approach I had myself dismissed.  As a project, such crazy approaches would fail on perceived feasibility, but the point in result based funding is that researchers are free to use whatever approach they find most efficient. Funding is given to those who prove successful. This gives the perfect incentive to do great work, securing future funding.

Result-based funding would also reduce resources needed to evaluate applications. It is very hard for a general panel to evaluate the methodology and success probability of a project.  Moreover, it requires an intimate knowledge of a field to evaluate how big a difference a result would make relative to what is already known. However, handling published results, we know what happened and we can rely on peer-review for the difference it made to the field. All the panel has to do is to evaluate how the successes meet with the objectives of the foundation.

Let us, as an example, take something like the ERC Advanced Investigator Grant which welcomes high risk high gain research. It would seem that aiming for surprising breakthroughs in an important area would fall well within this scope. Having researchers with proven skills explore the area and follow their inspiration may be the optimal strategy. Uncertainty about what they would find should not be worse than high risk. In fact, based on past performance, it may be safe to assume that they will discover something interesting if not ground-breaking. However, when projects are scored on focused feasibility, such projects will fail even if their expected return is very high. It has to be possible to get a high overall score for promising research even if standard project parameters like focus and feasibility would be counterproductive.  At the end of the day, what we want are results, not project descriptions, so what should determine the overall score is which proposal is expected to yield the greatest results.

Long live great research!

Mikkel Thorup

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

How to design software

 

 

I think I recently understood how software should be designed — or at least, since I have informally practiced the method for some time, how to explain it. Maybe not absolutely all types of software, but the most important kind: APIs (abstract program interfaces). The key task in software design is to define proper interfaces; if it is done right, everything else will fall into place, and if it is done wrong, there will be no end of problems everywhere else.

To put a Swiss theme to the description I may call this approach the Gotthard method. You are building a tunnel, starting from both sides at the same time, the northern, rainy, German-speaking part, and the southern, sunny, Italian-speaking part. (Actual languages and climates may vary.)

Tunnel through a mountain

For the method to work it is really important that the two crews should meet somewhere in the middle:

Gotthard crews meeting

In software design we are typically confronted with two views:

  • The client view: application software needs certain abstractions and functionalities that will make it easy to produce clear, simple, extendible, reusable client programs.
  • The supplier view: the necessary mechanisms are usually available in a raw form directly reflecting the underlying platform, a combination of hardware and software facilities.

Good design is a negotiation and iteration process that tries to reconcile the two views, working top-down from the client side and bottom-up from the supplier side, just as you would work when digging a tunnel between Unterwald and the Tessin.

As an example, consider a Web-oriented API. On the supplier side, we have a stateless protocol with essentially one mechanism: processing a request and sending a response. On the client side, we want to enable the building of applications, such as an e-commerce site, which need to pretend that they are working with stateful sessions, just as with a classical client-server GUI setup. The task of building software is to provide what the client application needs, in terms that make sense to the client and with all the abstractions that it needs — in our example, SESSION, STATE, USER and so on.

Since these higher-level abstractions are not directly provided by the supplier side, they need to be implemented or, to use a more appropriate term, faked. After all, everything in computer science is about faking: pretending that we have machines that we really don’t, simply by building them conceptually, in the form of APIs, in terms of machines that we have already built (bottom-up approach) or hope to build (top-down approach). “Building” a machine here means— except for the bottom-most machines, down at the level of the hardware, which very few programmers ever use directly anyway —faking them again, in terms of simpler ones. Fakes all the way down.

The process of software design then consists of developing intermediate levels of abstraction until we reach a compromise: a set of abstractions that satisfy the needs of application programmers and are efficiently implementable (or better yet, already implemented as part of this negotiation process) on the basis of what was available in the first place.

A poorly functioning software process will be more like yoyo design: trying something too abstract, then something too low-level and so on, converging too late if at all. Effective design is like boring a tunnel using modern engineering techniques, which rely on a clear understanding of where the crews start on both sides and make sure they end up meeting in the right place.

 

Photo reference: Herrenknecht AG, www.herrenknecht.com.

VN:F [1.9.10_1130]
Rating: 7.4/10 (10 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

Ado About The Resource That Was (Not)

 

After a few weeks of use, Microsoft Outlook tends in my experience to go into a kind of thrashing mode where the user interface no longer quite functions as it should, although to the tool’s credit it does not lose information. Recently I have been getting pop-up warnings such as

 

A required resource was

 

A required resource was what? The message reminded me of an episode in a long-ago game of Scrabble, in which I proposed ADOABOUT as a word. “Ado about what? ”, the other players asked, and were not placated by my answer.

The message must have been trying to say  that a required resource was missing, or not found, but at the time of getting the final detail Outlook must have run out of UI resources and hence could not summon the needed text string. Not surprising, since running out of resources is precisely what caused the message to appear, in a valiant attempt to tell the user what is going on. (Valiant but not that useful: if you are not a programmer on the Outlook development team but just a customer trying to read email, it is not absolutely obvious how the message, even with the missing part, helps you.) The irony in the example is that the title bar suggests the problem arose in connection with trying to display the “Social Connector” area, a recent Outlook feature which I have never used. (Social connector? Wasn’t the deal about getting into computer science in the first place that for the rest of your life you’d be spared the nuisance of social connections? One can no longer trust anything nowadays.)

We can sympathize with whoever wrote the code. The Case Of The Resource That Was (Not) is an example of a general programming problem which we may call Space Between Your Back And Wall  or SBYBAW:  when you have your back against the wall, there is not much maneuvering space left.

A fairly difficult case of the SBYBAW problem arises in garbage collection, for example for object-oriented languages. A typical mark-and-sweep garbage collector must traverse the entire object structure to remove all the objects that have not been marked as reachable from the stack. The natural way to write a graph traversal algorithm is recursive: visit the roots; then recursively traverse their successors, flagging visited objects in some way to avoid cycling. Yes, but the implementation of a recursive routine relies on a stack of unpredictable size (the longest path length). If we got into  garbage collection, most likely it’s that we ran out of memory, precisely the kind of situation in which we cannot afford room for unpredictable stack growth.

In one of the early Eiffel garbage collectors, someone not aware of better techniques had actually written the traversal recursively; had the mistake not been caught early enough, it would no doubt have inflicted unbearable pain on humankind. Fortunately there is a solution: the Deutsch-Schorr-Waite algorithm [1], which avoids recursion on the program side by perverting the data structure to  replace some of the object links by recursion-control links; when the traversal’s execution proceeds along an edge, it reverses that edge to permit eventual return to the source. Strictly speaking, Deutsch-Schorr-Waite still requires a stack of booleans — to distinguish original edges from perverted ones — but we can avoid a separate stack (even just  a stack of booleans, which can be compactly represented in a few integers) by storing these booleans in the mark field of the objects themselves. The resulting traversal algorithm is a beauty — although it is fairly tricky, presents a challenge for verification tools, and raises new difficulties in a multi-threaded environment.

Deutsch-Schorr-Waite is a good example of “Small Memory Software” as studied in a useful book of the same title [2]. The need for Small Memory Software does not just arise for embedded programs running on small devices, but also in mainstream programming whenever we face the SBYBAW issue.

The SBYBAW lesson for the programmer is tough but simple. The resources we have at our disposal on a computing system may be huge, but they are always finite, and our programs’ appetite for resources will eventually exhaust them. At that stage, we have to deal with the SBYBAW rule, which sounds like a tautology but is an encouragement to look for clever algorithms:  techniques for freeing resources when no resources remain must not request new resources.

References

[1] Deutsch-Schorr-Waite is described in Knuth and also in [2]. Someone should start a Wikipedia entry.

[2] James Noble and Charles Weir: Small Memory Software: Patterns for Systems with Limited Memory, Addison-Wesley, 2001.

VN:F [1.9.10_1130]
Rating: 8.8/10 (15 votes cast)
VN:F [1.9.10_1130]
Rating: +8 (from 8 votes)

John McCarthy

John McCarthyJohn McCarthy, who died last week at the age of 84, was one of the true giants of computer science. Most remarkable about his contributions are their diversity, their depth, and how they span both theory and practice.

To talk about him it is necessary first to dispel an unjustly negative connotation. McCarthy was one of the founders of the discipline of artificial intelligence, its most forceful advocate and the inventor of its very name. In the “AI Winter” episode of the late 1970s and 1980s, that name suffered some disrepute as a result of a scathing report by James Lighthill blaming AI researchers for over-promising. In fact the promoters of AI may not have delivered exactly what they announced (who can accurately predict science?); but what they delivered is astounding. Many breakthroughs in computer science, both in theory (advances in lambda calculus and the theory of computation) and in the practice of programming (garbage collection, functional programming languages), can directly be traced to work in AI. Part of the problem is a phenomenon that I heard John McCarthy himself describe:  “As soon as it works, no one calls it AI any more.” Automatic garbage collection was once advanced artificial intelligence; now it is just an algorithm that makes sure your smartphone does not freeze up. In a different field, we have become used to computers routinely beating chess champions, a feat that critics of AI once deemed unthinkable.

The worst over-promises came not from researchers in the field such as McCarthy, who understood the difficulties, but from people like Herbert Simon, more of a philosopher, who in 1965 wrote that “machines will be capable, within twenty years, of doing any work a man can do.” McCarthy’s own best-known over-promise was to take up David Levy on his 1968 bet that no computer would be able to beat him within ten years. But McCarthy was only mistaken in under-estimating the time span: Deep Blue eventually proved him right.

The word that comes most naturally to mind when thinking about McCarthy is “brilliant.” He belonged to that category of scientists who produce the fundamental insights before anyone else, even if they do not always have the patience to finalize the details. The breathtaking paper that introduced Lisp [1] is labeled “Part 1”; there was never a “Part 2.” (Of course we have a celebrated example in computer science, this one from a famously meticulous author, of a seven-volume treaty which never materialized in full.) It was imprudent to announce a second part, but the first was enough to create a whole new school of programming. The Lisp 1.5 manual [2], published in 1962, was another masterpiece; as early as page 13 it introduces — an unbelievable feat, especially considering that the program takes hardly more than half a page — an interpreter for the language being defined, written in that very language! The more recent reader can only experience here the kind of visceral, poignant and inextinguishable jealously that overwhelms us the first time we realize that we will never be able to attend the première of Don Giovanni at the Estates Theater in Prague on 29 October, 1787 (exactly 224 years ago yesterday — did you remember to celebrate?). What may have been the reaction of someone in “Data Processing,” such as it was in 1962, suddenly coming across such a language manual?

These years, 1959-1963, will remain as McCarthy’s Anni Mirabiles. 1961 and 1962 saw the publication of two visionary papers [3, 4] which started the road to modern program verification (and where with the benefit of hindsight it seems that he came remarkably close to denotational semantics). In [4] he wrote

Instead of debugging a program, one should prove that it meets its specifications, and this proof should be checked by a computer program. For this to be possible, formal systems are required in which it is easy to write proofs. There is a good prospect of doing this, because we can require the computer to do much more work in checking each step than a human is willing to do. Therefore, the steps can be bigger than with present formal systems.

Words both precise and prophetic. The conclusion of [3] reads:

It is reasonable to hope that the relationship between computation and mathematical logic will be as fruitful in the next century as that between analysis and physics in the last. The development of this relationship demands a concern for both applications and for mathematical elegance.

“A concern for both applications and mathematical elegance” is an apt characterization of McCarthy’s own work. When he was not busy designing Lisp, inventing the notion of meta-circular interpreter and developing the mathematical basis of programming, he was building the Lisp garbage collector and proposing the concept of time-sharing. He also played, again in the same period, a significant role in another milestone development, Algol 60 — yet another sign of his intellectual openness and versatility, since Algol is (in spite of the presence of recursion, which McCarthy championed) an imperative language at the antipodes of Lisp.

McCarthy was in the 1960s and 70s the head of the Artificial Intelligence Laboratory at Stanford. For some reason the Stanford AI Lab has not become as legendary as Xerox PARC, but it was also the home to early versions of  revolutionary technologies that have now become commonplace. Email, which hardly anyone outside of the community had heard about, was already the normal way of communicating, whether with a coworker next door or with a researcher at MIT; the Internet was taken for granted; everyone was using graphical displays and full-screen user interfaces; outside, robots were playing volley-ball (not very successfully, it must be said); the vending machines took no coins, but you entered your login name and received a bill at the end of the month, a setup which never failed to astonish visitors; papers were printed with sophisticated fonts on a laser printer (I remember a whole group reading the successive pages of Marvin Minsky’s  frames paper [5] directly on the lab’s XGP, Xerox Graphics Printer, as  they were coming out, one by one, straight from MIT). Arthur Samuel was perfecting his checkers program. Those who were not programming in Lisp were hooked to SAIL, “Stanford Artificial Intelligence Language,” an amazing design which among other insights convinced me once and for all that one cannot seriously deal with data structures without the benefit of an automatic serialization mechanism. The building itself, improbably set up amid the pastures of the Santa Cruz foothills, was razed in the eighties and the lab moved to the main campus, but the spirit of these early years lives on.

McCarthy ran the laboratory in an open and almost debonair way; he was a legend and somewhat intimidating, but never arrogant and in fact remarkably approachable. I took the Lisp course from him; in my second or third week at Stanford, I raised my hand and with the unflappable assurance of the fully ignorant slowly asked a long question: “In all the recursive function definitions that you have shown so far, termination was obvious because there is some ‘n’ that decreases for every recursive call, and we treat the case ‘n = 0’ or ‘n = 1’ in a special, non-recursive way. But things won’t always be so simple. Is there some kind of grammatical criterion on Lisp programs that we could use to ascertain whether a recursive definition will always lead to a terminating computation?” There was a collective gasp from the older graduate students in the audience, amazed that a greenhorn would have the audacity to interrupt the course with such an incompetent query. But instead of dismissing me, McCarthy proceeded, with a smile, to explain the basics of undecidability. He had the same attitude in the many seminars that he taught, often on topics straddling computer science and philosophy, in a Socratic style where every opinion was welcome and no one was above criticism.

He also had a facetious side. At the end of a talk by McCarthy at SRI, Tony Hoare, who was visiting for a few days, asked a question; McCarthy immediately rejoined that he had expected that question, summoned to the stage a guitar-carrying researcher from the AI Lab, and proceeded with the answer in the form of a prepared song.

The progress of science and technology is a collective effort; it takes many people to turn new insights into everyday reality. The insights themselves come from a few individuals, a handful in every generation. McCarthy was one of these undisputed pioneers.

 

References

[1] John McCarthy: Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I, in Communications of the ACM, vol. 3, no. 4, 1960, pages 184-195.

[2] John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, Michael I. Levin, LISP 1.5 Programmer’s Manual, MIT, 1962. Available at Amazon  External Linkand also as a PDF External Link.

[3] John McCarthy: A Basis for a Mathematical Theory of Computation, first version in Proc. Western Joint Computer Conference, 1961, revised version in Computer Programming and Formal Systems, eds. P. Braffort and D. Hirschberg, North Holland, 1963. Available in various places on the Web, e.g. here External Link.

[4] John McCarthy: Towards a Mathematical Science of Computation, in IFIP Congress 1962, pages 21-28, available in various places on the Web, e.g. here External Link.

[5] Marvin Minsky:  A Framework for Representing Knowledge, MIT-AI Laboratory Memo 306, June 1974, available here External Link.

 

(This article was first published on my ACM blog.  I am resuming regular Monday publication.)

VN:F [1.9.10_1130]
Rating: 9.7/10 (41 votes cast)
VN:F [1.9.10_1130]
Rating: +32 (from 32 votes)

The story of our field, in a few short words

 

(With all dues to [1], but going up from four to five as it is good to be brief yet not curt.)

At the start there was Alan. He was the best of all: built the right math model (years ahead of the real thing in any shape, color or form); was able to prove that no one among us can know for sure if his or her loops — or their code as a whole — will ever stop; got to crack the Nazis’ codes; and in so doing kind of saved the world. Once the war was over he got to build his own CPUs, among the very first two or three of any sort. But after the Brits had used him, they hated him, let him down, broke him (for the sole crime that he was too gay for the time or at least for their taste), and soon he died.

There was Ed. Once upon a time he was Dutch, but one day he got on a plane and — voilà! — the next day he was a Texan. Yet he never got the twang. The first topic that had put him on  the map was the graph (how to find a path, as short as can be, from a start to a sink); he also wrote an Algol tool (the first I think to deal with all of Algol 60), and built an OS made of many a layer, which he named THE in honor of his alma mater [2]. He soon got known for his harsh views, spoke of the GOTO and its users in terms akin to libel, and wrote words, not at all kind, about BASIC and PL/I. All this he aired in the form of his famed “EWD”s, notes that he would xerox and send by post along the globe (there was no Web, no Net and no Email back then) to pals and foes alike. He could be kind, but often he stung. In work whose value will last more, he said that all we must care about is to prove our stuff right; or (to be more close to his own words) to build it so that it is sure to be right, and keep it so from start to end, the proof and the code going hand in hand. One of the keys, for him, was to use as a basis for ifs and loops the idea of a “guard”, which does imply that the very same code can in one case print a value A and in some other case print a value B, under the watch of an angel or a demon; but he said this does not have to be a cause for worry.

At about that time there was Wirth, whom some call Nick, and Hoare, whom all call Tony. (“Tony” is short for a list of no less than three long first names, which makes for a good quiz at a party of nerds — can you cite them all from rote?) Nick had a nice coda to Algol, which he named “W”; what came after Algol W was also much noted, but the onset of Unix and hence of C cast some shade over its later life. Tony too did much to help the field grow. Early on, he had shown a good way to sort an array real quick. Later he wrote that for every type of unit there must be an axiom or a rule, which gives it an exact sense and lets you know for sure what will hold after every run of your code. His fame also comes from work (based in part on Ed’s idea of the guard, noted above) on the topic of more than one run at once, a field that is very hot today as the law of Moore nears its end and every maker of chips has moved to  a mode where each wafer holds more than one — and often many — cores.

Dave (from the US, but then at work under the clime of the North) must not be left out of this list. In a paper pair, both from the same year and both much cited ever since,  he told the world that what we say about a piece of code must only be a part, often a very small part, of what we could say if we cared about every trait and every quirk. In other words, we must draw a clear line: on one side, what the rest of the code must know of that one piece; on the other, what it may avoid to know of it, and even not care about. Dave also spent much time to argue that our specs must not rely so much on logic, and more on a form of table.  In a later paper, short and sweet, he told us that it may not be so bad that you do not apply full rigor when you chart your road to code, as long as you can “fake” such rigor (his own word) after the fact.

Of UML, MDA and other such TLAs, the less be said, the more happy we all fare.

A big step came from the cold: not just one Norse but two, Ole-J (Dahl) and Kris, came up with the idea of the class; not just that, but all that makes the basis of what today we call “O-O”. For a long time few would heed their view, but then came Alan (Kay), Adele and their gang at PARC, who tied it all to the mouse and icons and menus and all the other cool stuff that makes up a good GUI. It still took a while, and a lot of hit and miss, but in the end O-O came to rule the world.

As to the math basis, it came in part from MIT — think Barb and John — and the idea, known as the ADT (not all TLAs are bad!), that a data type must be known at a high level, not from the nuts and bolts.

There also is a guy with a long first name (he hates it when they call him Bert) but a short last name. I feel a great urge to tell you all that he did, all that he does and all that he will do, but much of it uses long words that would seem hard to fit here; and he is, in any case, far too shy.

It is not all about code and we must not fail to note Barry (Boehm), Watts, Vic and all those to whom we owe that the human side (dear to Tom and Tim) also came to light. Barry has a great model that lets you find out, while it is not yet too late, how much your tasks will cost; its name fails me right now, but I think it is all in upper case.  At some point the agile guys — Kent (Beck) and so on — came in and said we had got it all wrong: we must work in pairs, set our goals to no more than a week away, stand up for a while at the start of each day (a feat known by the cool name of Scrum), and dump specs in favor of tests. Some of this, to be fair, is very much like what comes out of the less noble part of the male of the cow; but in truth not all of it is bad, and we must not yield to the urge to throw away the baby along with the water of the bath.

I could go on (and on, and on); who knows, I might even come back at some point and add to this. On the other hand I take it that by now you got the idea, and even on this last day of the week I have other work to do, so ciao.

Notes

[1] Al’s Famed Model Of the World, In Words Of Four Signs Or Fewer (not quite the exact title, but very close): find it on line here.

[2] If not quite his alma mater in the exact sense of the term, at least the place where he had a post at the time. (If we can trust this entry, his true alma mater would have been Leyde, but he did not stay long.)

VN:F [1.9.10_1130]
Rating: 10.0/10 (14 votes cast)
VN:F [1.9.10_1130]
Rating: +11 (from 11 votes)

PhD position: concurrent programming (SCOOP) for robotics

The ETH Chair of Software Engineering has won a grant from the Hasler foundation, in a joint project with the Technical University of Lucerne and the Autonomous Systems Lab of ETH, to develop a robotics framework involving concurrent computation. The project, called Roboscoop,  will produce a demonstrator system: a “SmartWalker” robot — a robotic version of  “walkers” used by elderly people and others with reduced mobility. The research proposal is available [2].

One of the major goals of the Roboscoop framework is to provide robotics applications with the full possibilities of concurrent programming. Many robotics developments use no or little concurrency because of the tricky programming involved in using threads, and the difficulty of getting applications right. With SCOOP [1] programmers have a simple, high-level mechanism that removes the risk of data races and other plagues of concurrent programming.

We are looking for someone with a strong background in both software engineering and robotics. If you are interested, please see the position announcement [3].

References

[1] On SCOOP see here and here. See also a YouTube video of a small robot programmed with  SCOOP as part of an earlier student project (by Ganesh Ramanathan).

[2] Roboscoop research proposal, here.

[2] Position announcement: here

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

The charming naïveté of an IEEE standard

The IEEE Standard for Requirements Specifications [1], a short and readable text providing concrete and useful advice, is a valuable guide for anyone writing requirements. In our course projects we always require students to follow its recommended structure.

Re-reading it recently, I noticed the following extract  in the section that argues that a  requirements specification should be verifiable (sentence labels in brackets are my addition):

[A] Nonverifiable requirements include statements such as “works well,” “good human interface,” and “shall usually happen.” [B] These requirements cannot be verified because it is impossible to define the terms “good,” “well,” or “usually.”

[C] The statement that “the program shall never enter an infinite loop” is nonverifiable because the testing of this quality is theoretically impossible.

[D] An example of a verifiable statement is
      [E] “Output of the program shall be produced within 20 s of event 60% of the time; and shall be produced within 30 s of event 100% of the time.”
[F] This statement can be verified because it uses concrete terms and measurable quantities.

[A] and [B] are good advice, deserving to be repeated in every software engineering course and to anyone writing requirements. [C], however, is puzzling.

One might initially understand that the authors are telling us that it is impossible to devise a finite set of tests guaranteeing that a program terminates. But on closer examination this cannot be what they mean. Such a statement, although correct, would be uninteresting since it can be applied to any functional requirement: if I say “the program shall accept an integer as input and print out that same integer on the output”, I also cannot test that (trivial) requirement finitely since I would have to try all integers. The same observation applies to the example given in [D, E, F]: the property [D] they laud as an example of a  “verifiable” requirement is just as impossible to test exhaustively [2].

Since the literal interpretation of [C] is trivial and applies to essentially all possible requirements, whether bad or good in the authors’ eyes, they must mean something else when they cite loop termination as their example of a nonverifiable requirement. The word “theoretically” suggests what they have in mind: the undecidability results of computation theory, specifically the undecidability of the Halting Problem. It is well known that no general mechanism exists to determine whether an arbitrary program, or even just an arbitrary loop, will terminate. This must be what they are referring to.

Except, of course, that they are wrong. And a very good thing too that they are wrong, since “The program shall never enter an infinite loop” is a pretty reasonable requirement for any system [3].

If we were to accept [C], we would also accept that it is OK for any program to enter an infinite loop every once in a while, because the authors of its requirements were not permitted to specify otherwise! Fortunately for users of software systems, this particular sentence of the standard is balderdash.

What the halting property states, of course, is that no general mechanism exists that could examine an arbitrary program or loop and tell us whether it will always terminate. This result in no way excludes the possibility of verifying (although not through “testing”) that a particular program or loop will terminate. If the text of a program shows that it will print “Hello World” and do nothing else, we can safely determine that it will terminate. If a loop is of the form

from i := 1 until i > 10 loop
…..print (i)
…..i := i + 1
end

there is also no doubt about its termination. More complex examples require the techniques of modern program verification, such as exhibiting a loop variant in the sense of Hoare logic, but they can still be practically tractable.

Like many fundamental results of modern science (think of Heisenberg’s uncertainty principle), Turing’s demonstration of the undecidability of the Halting Problem is at the same time simple to state, striking, deep, and easy to misunderstand. It is touchingly refreshing to find such a misunderstanding in an IEEE standard.

Do not let it discourage you from applying the excellent advice of the rest of IEEE 830-1998, ; but when you write a program, do make sure — whether or not the requirements specify this property explicitly — that all its loops terminate.

Reference and notes

[1] IEEE Computer Society: IEEE Recommended Practice for Software Requirements SpeciÞcations, IEEE Standard 830-1998, revised 1998; available here (with subscription).

[2] The property [E] is actually more difficult to test, even non-exhaustively, than the authors acknowledge, if only because it is a probabilistic requirement, which can only be tested after one has defined appropriate probabilistic hypotheses.

[3] In requesting that all programs must terminate we must of course take note of the special case of systems that are non-terminating by design, such as most embedded systems. Such systems, however, are still made out of components representing individual steps that must terminate. The operating system on your smartphone may need to run forever (or until the next reboot), but the processing of an incoming text message is still, like a traditional program, required to terminate in finite time.

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (11 votes cast)
VN:F [1.9.10_1130]
Rating: +6 (from 6 votes)

All Bugs Great and Small

(Acknowledgment: this article came out of a discussion with Manuel Oriol, Carlo Furia and Yi Wei. The material is largely theirs but the opinions are mine.)

A paper on automatic testing, submitted some time ago, received the following referee comment:

The case study seems unrealistic and biased toward the proposed technique. 736 unique faults found in 92 classes means at least 8 unique faults per class at the same time. I have never seen in all my life a published library with so many faults …

This would be a good start for a discussion of what is wrong with refereeing in computer science today (on the negativism of our field see [1]); we have a referee who mistakes experience for expertise, prejudice for truth, and refuses to accept carefully documented evidence because “in all his life”, presumably a rich and rewarding life, he has never seen anything of the sort. That is not the focus of the present article, however; arrogant referees eventually retire and good papers eventually get published. The technical problems are what matters. The technical point here is about testing.

Specifically, what bugs are worth finding, and are high bug rates extraordinary?

The paper under review was a step in the work around the automatic testing tool AutoTest (see [2] for a slightly older overall description and [3] for the precise documentation). AutoTest applies a fully automatic strategy, exercising classes and their routines without the need to provide test cases or test oracles. What makes such automation possible is the combination of  random generation of tests and reliance on contracts to determine the success of tests.

For several years we have regularly subjected libraries, in particular the EiffelBase data structure library, to long AutoTest sessions, and we keep finding bugs (the better term is faults). The fault counts are significant; here they caught the referee’s eye. In fact we have had such comments before: I don’t believe your fault counts for production software; your software must be terrible!

Well, maybe.

My guess is that in fact EiffelBase has no more bugs, and possibly far fewer bugs, than other “production” code. The difference is that the  AutoTest framework performs far more exhaustive tests than usually practiced.

This is only a conjecture; unlike the referee I do not claim any special powers that make my guesses self-evident. Until we get test harnesses comparable to AutoTest for environments other than Eiffel and, just as importantly, libraries that are fully equipped with contracts, enabling the detection of bugs that otherwise might not come to light, we will not know whether the explanation is the badness of EiffelBase or the goodness of AutoTest.

What concrete, incontrovertible evidence demonstrates is that systematic random testing does find faults that human testers typically do not. In a 2008 paper [4] with Ilinca Ciupa, Manuel Oriol and Alexander Pretschner, we ran AutoTest on some classes and compared the results with those of human testers (as well as actual bug reports from the field, since this was released software). We found that the two categories are complementary: human testers find faults that are still beyond the reach of automated tools, but they typically never find certain faults that AutoTest, with its stubborn dedication to leaving no stone unturned, routinely uncovers. We keep getting surprised at bugs that AutoTest detects and which no one had sought to test before.

A typical set of cases that human programmers seldom test, but which frequently lead to uncovering bugs, involves boundary values. AutoTest, in its “random-plus” strategy, always exercises special values of every type, such as MAXINT, the maximum representable integer. Programmers don’t. They should — all testing textbooks tell them so — but they just don’t, and perhaps they can’t, as the task is often too tedious for a manual process. It is remarkable how many routines using integers go bezerk when you feed them MAXINT or its negative counterpart. Some of the fault counts that seem so outrageous to our referee directly come from trying such values.

Some would say the cases are so extreme as to be insignificant. Wrong. Many documented software failures and catastrophes are due to untested extreme values. Perhaps the saddest is the case of the Patriot anti-missile system, which at the beginning of the first Gulf war was failing to catch Scud missiles, resulting in one case in the killing of twenty-eight American soldiers in an army barrack. It was traced to a software error [5]. To predict the position of the incoming missile, the computation multiplied time by velocity. The time computation used multiples of the time unit, a tenth of a second, stored in a 24-bit register and hence approximated. After enough time, long enough to elapse on the battlefield, but longer than what the tests had exercised, the accumulated error became so large as to cause a significant — and in the event catastrophic — deviation. The unique poser of automatic testing is that unlike human testers it is not encumbered by a priori notions of a situation being extreme or unlikely. It tries all the possibilities it can.

The following example, less portentous in its consequences but just as instructive, is directly related to AutoTest. For his work on model-based contracts [6] performed as part of his PhD completed in 2008 at ETH, Bernd Schoeller developed classes representing the mathematical notion of set. There were two implementations; it turned out that one of them, say SET1, uses data structures that make the subset operation easy to program efficiently; in the corresponding class, the superset operation, ab, is then simply implemented as ba. In the other implementation, say SET2, it is the other way around: is directly implemented, and ab, is implemented as ba. This all uses a nice object-oriented structure, with a general class SET defining the abstract notion and the two implementations inheriting from it.

Now you may see (if you have developed a hunch for automated testing) where this is heading: AutoTest knows about polymorphism and dynamic binding, and tries all the type combinations that make sense. One of the generated test cases has two variables s1 and s2 of type SET, and tries out s2s1; in one of the combinations that AutoTest tries, s1 is dynamically and polymorphically of type SET1 and s2 of type SET2. The version of that it will use is from SET2, so it actually calls s1s2; but this tests the SET1 version of , which goes back to SET2. The process would go on forever, were it not for a timeout in AutoTest that uncovers the fault. Bernd Schoeller had tried AutoTest on these classes not in the particular expectation of finding bugs, but more as a favor to the then incipient development of AutoTest, to see how well the tool could handle model-based contracts. The uncovering of the fault, testament to the power of relentless, systematic automatic testing, surprised us all.

In this case no contract was violated; the problem was infinite recursion, due to a use of O-O techniques that for all its elegance had failed to notice a pitfall. In most cases, AutoTest finds the faults through violated postconditions or class invariants. This is one more reason to be cautious about sweeping generalizations of the kind “I do not believe these bug rates, no serious software that I have seen shows anything of the sort!”. Contracts express semantic properties of the software, which the designer takes care of stating explicitly. In run-of-the-mill code that does not benefit from such care, lots of things can go wrong but remain undetected during testing, only to cause havoc much later during some actual execution.

When you find such a fault, it is irrelevant that the case is extreme, or special, or rare, or trivial. When a failure happens it no longer matter that the fault was supposed to be rare; and you will only know how harmful it is when you deal with the consequences. Testing, single-mindedly  devoted to the uncovering of faults [7], knows no such distinction: it hunts all bugs large and small.

References

[1] The nastiness problem in computer science, article on the CACM blog, 22 August 2011, available here.

[2] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, also available here.

[3] Online AutoTest documentation, available here at docs.eiffel.com.

[4] Ilinca Ciupa, Bertrand Meyer, Manuel Oriol and Alexander Pretschner: Finding Faults: Manual Testing vs. Random+ Testing vs. User Reports, in ISSRE ’08, Proceedings of the 19th IEEE International Symposium on Software Reliability Engineering, Redmond, November 2008, available here.

[5] US General Accounting Office: GAO Report: Patriot Missile Defense– Software Problem Led to System Failure at Dhahran, Saudi Arabia, February 4, 1992, available here.

[6] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[7] Bertrand Meyer: Seven Principles of Software testing, in IEEE Computer, vol. 41, no. 10, pages 99-101, August 2008available here.

VN:F [1.9.10_1130]
Rating: 9.1/10 (9 votes cast)
VN:F [1.9.10_1130]
Rating: +6 (from 6 votes)

A safe and stable solution

Reading about the latest hullabaloo around Android’s usage of Java, and more generally following the incessant flow of news about X suing Y in the software industry (with many combinations of X and Y) over Java and other object-oriented technologies, someone with an Eiffel perspective can only smile. Throughout its history, suggestions to use Eiffel have often been met initially — along with “Will Eiffel still be around next year?”, becoming truly riotous after 25 years — with objections of proprietariness, apparently because Eiffel initially came from a startup company. In contrast, many other approaches, from C++ to Smalltalk and Java, somehow managed to get favorable vibes from the media; the respective institutions, from AT&T to Xerox and Sun, must be disinterested benefactors of humanity.

Now many who believed this are experiencing a next-morning surprise, discovering under daylight that the person next to whom they wake up is covered with patents and lawsuits.

For their part, people who adopted Eiffel over the years and went on to develop project after project  do not have to stay awake worrying about legal issues and the effects of corporate takeovers; they can instead devote their time to building the best software possible with adequate methods, notations and tools.

This is a good time to recall the regulatory situation of Eiffel. First, the Eiffel Software implementation (EiffelStudio): the product can be used through either an open-source and a proprietary licenses. With both licenses the software is exactly the same; what differs is the status of the code users generate: with the open-source license, they are requested to make their own programs open-source; to keep their code proprietary, they need the commercial license. This is a fair and symmetric requirement. It is made even more attractive by the absence of any run-time fees or royalties of the kind typically charged by database vendors.

The open-source availability of the entire environment, over 2.5 millions line of (mostly Eiffel) code, has spurred the development of countless community contributions, with many more in progress.

Now for the general picture on the language, separate from any particular implementation. Java’s evolution has always been tightly controlled by Sun and now its successor Oracle. There may actually be technical arguments in favor of the designers retaining a strong say in the evolution of a language, but they no longer seem to apply any more now that most of the Java creators have left the company. Contrast this with Eiffel, which is entirely under the control of an international standards committee at ECMA International, the oldest and arguably the most prestigious international standards body for information technology. The standard is freely available online from the ECMA site [1]. It is also an ISO standard [2].

The standardization process is the usual ECMA setup, enabling any interested party to participate. This is not just a statement of principle but the reality, to which I can personally testify since, in spite of being the language’s original designer and author of the reference book, I lost countless battles in the discussions that led to the current standard and continue in preparation of the next version. While I was not always pleased on the moment, the committee’s collegial approach has led to a much more solid result than any single person could have achieved.

The work of ECMA TC49-TG4 (the Eiffel standard committee) has disproved the conventional view that committees can only design camels. In fact TC49-TG4 has constantly worked to keep the language simple and manageable, not hesitating to remove features deemed obsolete or problematic, while extending the range of the language and increasing the Eiffel programmer’s power of expression. As a result, Eiffel today is an immensely better language than when we started our work in 2002. Without a strong community-based process we would never, for example, have made Eiffel the first widespread language to guarantee void-safety (the compile-time removal of null-pointer-dereferencing errors), a breakthrough for software reliability.

Open, fair, free from lawsuits and commercial fights, supported by an enthusiastic community: for projects that need a modern quality-focused software framework, Eiffel is a safe and stable solution.

References

[1] ECMA International: Standard ECMA-367: Eiffel: Analysis, Design and Programming Language, 2nd edition (June 2006), available here (free download).

[2] International Organization for Standardization: ISO/IEC 25436:2006: Information technology — Eiffel: Analysis, Design and Programming Language, available here (for a fee; same text as [1], different formatting).

VN:F [1.9.10_1130]
Rating: 5.0/10 (33 votes cast)
VN:F [1.9.10_1130]
Rating: -2 (from 28 votes)

Specification explosion

To verify software, we must specify it; otherwise there is nothing to verify against. People often cite the burden of specification as the major obstacle toward making verification practical. At issue are not only the effort required to express the goals of software elements (their contracts) but also intermediate assertions, or “verification conditions”, including loop invariants, required by the machinery of verification.

At a Microsoft Software Verification summer school [1] in Moscow on July 18 — the reason why there was no article on this blog last week — Stefan Tobies, one of the lecturers, made the following observation about the specification effort needed to produce fully verified software. In his experience, he said, the ratio of specification lines to program lines is three to one.

Such a specification explosion, to coin a phrase, has to be addressed by any practical approach to verification. It would be interesting to get estimates from others with verification experience.

Reducing specification explosion  is crucial to the Eiffel effort to provide “Verification As a Matter Of Course” [2]. The following three techniques should go a long way:

  • Loop invariant inference. Programmers can be expected to write contracts expressing the purpose of routines (preconditions, postconditions) and classes (class invariants), but often balk at writing the intermediate assertions necessary to prove the correctness of loops. An earlier article [3] mentioned some ongoing work on this problem and I hope to come back to the topic.
  • Frame conventions. As another recent article has discussed [4], a simple language convention can dramatically reduce the number of assertions by making frame conditions explicit.
  • Model-based contracts. This technique calls for a separate article; the basic idea is to express the effect of operations through high-level mathematical models relying on a library that describe such mathematical abstractions as sets, relations, functions and graphs.

The risk of specification explosion is serious enough to merit a concerted defense.

 

References

[1] Summer School in Software Engineering and Verification, details here.

[2] Verification As a Matter Of Course, slides of a March 2010 talk, see an earlier article on this blog.

[3] Contracts written by people, contracts written by machines, an earlier article on this blog.

[4] If I’m not pure, at least my functions are, an earlier article on this blog.

VN:F [1.9.10_1130]
Rating: 9.3/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

Towards a Calculus of Object Programs

I posted here a draft of a new article, Towards a Calculus of Object Programs.

Here is the abstract:

Verifying properties of object-oriented software requires a method for handling references in a simple and intuitive way, closely related to how O-O programmers reason about their programs. The method presented here, a Calculus of Object Programs, combines four components: compositional logic, a framework for describing program semantics and proving program properties; negative variables to address the specifics of O-O programming, in particular qualified calls; the alias calculus, which determines whether reference expressions can ever have the same value; and the calculus of object structures, a specification technique for the structures that arise during the execution of an object-oriented program.
The article illustrates the Calculus by proving the standard algorithm for reversing a linked list.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Testing insights

Lionel Briand and his group at the Simula Research Laboratory in Oslo have helped raise the standard for empirical research in testing and other software engineering practices by criticizing work that in their opinion relies on wrong assumptions or insufficiently supported evidence. In one of their latest papers [1] they take aim at “Adaptive Random Testing” (ART); one of the papers they criticize is from our group at ETH, on the ARTOO extension [2] to this testing method. Let’s examine the criticism!

We need a bit of background on random testing, ART, and ARTOO:

  • Random testing tries inputs based on a random process rather than attempting a more sophisticated strategy; it was once derided as silly [3], but has emerged in recent years as a useful technique. Our AutoTest tool [4], now integrated in EiffelStudio, has shown it to be particularly effective when applied to code equipped with contracts, which provide built-in test oracles. As a result of this combination, testing can be truly automatic: the two most tedious tasks of traditional testing, test case preparation and test oracle definition, can be performed without human intervention.
  • ART, developed by Chen and others [5], makes random testing not entirely random by ensuring that the inputs are spread reasonably evenly in the input domain.
  • ARTOO, part of Ilinca Ciupa’s PhD thesis on testing defended in 2008,   generalized ART to object-oriented programs, by defining a notion of distance between objects; the ARTOO strategy  avoids choosing objects that are too close to each other. The distance formula, which you can find in[2], combines three elementary distances: between the types of the objects involved,  the values in their primitive fields (integers etc.), and, recursively, the objects to which they have references.

Arcuri and Briand dispute the effectiveness of ART and criticize arguments that various papers have used to show its effectiveness. About the ARTOO paper they write

The authors concluded that ART was better than random testing since it needed to sample less test cases before finding the first failure. However, ART was also reported as taking on average 1.6 times longer due to the distance calculations!

To someone not having read our paper the comment and the exclamation mark would seem to suggest that the paper somehow downplays this property of random testing, but in fact it stresses it repeatedly. The property appears for example in boldface as part of the caption to Table 2: In most cases ARTOO requires significantly less tests to find a fault, but entails a time overhead, and again in boldface in the caption to Table 3: The overhead that the distance calculations introduce in the testing process causes ARTOO to require on average 1.6 times more time than RAND to find the first fault.

There is no reason, then, to criticize the paper on this point. It reports the results clearly and fairly.

If we move the focus from the paper to the method, however, Arcuri and Briand have a point. As they correctly indicate, the number of tests to first fault is not a particularly useful criterion. In fact I argued against it in another paper on testing [6]

The number of tests is not that useful to managers, who need help deciding when to stop testing and ship, or to customers, who need an estimate of fault densities. More relevant is the testing time needed to uncover the faults. Otherwise we risk favoring strategies that uncover a failure quickly but only after a lengthy process of devising the test; what counts is total time. This is why, just as flies get out faster than bees, a seemingly dumb strategy such as random testing might be better overall.

(To understand the mention of flies and bees you need to read [6].) The same article states, as its final principle:

Principle 7: Assessment criteria A testing strategy’s most important property is the number of faults it uncovers as a function of time.

The ARTOO paper, which appeared early in our testing work, used “time to first failure” because it has long been a standard criterion in the testing literature, but it should have applied our own advice and focused on more important properties of testing strategies.

The “principles” paper [6] also warned against a risk awaiting anyone looking for new test strategies:

Testing research is vulnerable to a risky thought process: You hit upon an idea that seemingly promises improvements and follow your intuition. Testing is tricky; not all clever ideas prove helpful when submitted to objective evaluation.

The danger is that the clever ideas may result in so much strategy setup time that any benefit on the rest of the testing process is lost. This danger threatens testing researchers, including those who are aware of it.

The idea of ARTOO and object distance remains attractive, but more work is needed to make it an effective contributor to automated random testing and demonstrate that effectiveness. We can be grateful to Arcuri and Briand for their criticism, and I hope they continue to apply their iconoclastic zeal to empirical software engineering work, ours included.

I have objections of my own to their method. They write that “all the work in the literature is based either on simulations or case studies with unreasonably high failure rates”. This is incorrect for our work, which does not use simulations, relying instead on actual, delivered software, where AutoTest routinely finds faults in an automatic manner.

In contrast, however, Arcuri and Briand rely on fault seeding (also known as fault introduction or fault injection):

To obtain more information on how shapes appear in actual SUT, we carried out a large empirical analysis on 11 programs. For each program, a series of mutants were generated to introduce faults in these programs in a systematic way. Faults generated through mutation [allow] us to generate a large number of faults, in an unbiased and varied manner. We generated 3727 mutants and selected the 780 of them with lower detection probabilities to carry out our empirical analysis of faulty region shapes.

In the absence of objective evidence attesting to the realism of fault seeding, I do not believe any insights into testing obtained from such a methodology. In fact we adopted, from the start of our testing work, the principle that we would never rely on fault seeding. The problem with seeded faults is that there is no guarantee they reflect the true faults that programmers make, especially the significant ones. Techniques for fault seeding are understandably good at introducing typographical mistakes, such as a misspelling or the replacement of a “+” by a “-”; but these are not interesting kinds of fault, as they are easily caught by the compiler, by inspection, by low-tech static tools, or by simple tests. Interesting faults are those resulting from a logical error in the programmer’s mind, and in my experience (I do not know of good empirical studies on this topic) seeding techniques do not generate them.

For these reasons, all our testing research has worked on real software, and all the faults that AutoTest has found were real faults, resulting from a programmer’s mistake.

We can only apply this principle because we work with software equipped with contracts, where faults will be detected through the automatic oracle of a violated assertion clause. It is essential, however, to the credibility and practicality of any testing strategy; until I see evidence to the contrary, I will continue to disbelieve any testing insights resulting from studies based on artificial fault injection.

References

[1] Andrea Arcuri and Lionel Briand: Adaptive Random Testing: An Illusion of Effectiveness, in ISSTA 2011 (International Symposium on Software Testing and Analysis), available here.

[2] Ilinca Ciupa, Andreas Leitner, Manuel Oriol and Bertrand Meyer: ARTOO: Adaptive Random Testing for Object-Oriented Software, in ICSE 2008: Proceedings of 30th International Conference on Software Engineering, Leipzig, 10-18 May 2008, IEEE Computer Society Press, 2008, also available here.

[3] Glenford J. Myers. The Art of Software Testing. Wiley, New York, 1979. Citation:

Probably the poorest methodology of all is random-input testing: the process of testing a program by selecting, at random, some subset of all possible input values. In terms of the probability of detecting the most errors, a randomly selected collection of test cases has little chance of being an optimal, or close to optimal, subset. What we look for is a set of thought processes that allow one to select a set of test data more intelligently. Exhaustive black-box and white-box testing are, in general, impossible, but a reasonable testing strategy might use elements of both. One can develop a reasonably rigorous test by using certain black-box-oriented test-case-design methodologies and then supplementing these test cases by examining the logic of the program (i.e., using white-box methods).

[4] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, available here. For practical uses of AutoTest within EiffelStudio see here.

[5] T. Y. Chen, H Leung and I K Mak: Adaptive Random Testing, in  Advances in Computer Science, ASIAN 2004, Higher-Level Decision Making,  ed. M.J. Maher,  Lecture Notes in Computer Science 3321, Springer-Verlag, pages 320-329, 2004, available here.

[6] Bertrand Meyer: Seven Principles of Software testing, in IEEE Computer, vol. 41, no. 10, pages 99-101, August 2008, also available here.

VN:F [1.9.10_1130]
Rating: 8.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 7 votes)

If I’m not pure, at least my functions are

..

If I’m not pure, at least my jewels are [1].

..

We often need to be reassured that a routine, usually a function, is “pure”, meaning that it does not change the state of the computation. For example, a function used in a contract element (precondition, postcondition, class invariant, loop invariant) should be purely descriptive, since it is a specification element; evaluating it, typically for testing and debugging, should not create a change of behavior — a “Heisenberg effect” — in the very computation that it is intended to assess. Another application is in a concurrency context, particularly in SCOOP (see earlier posts and forthcoming ones): if one or more functions are pure, several of their executions can run  concurrently on the same object.

The notion of purity admits variants. The usual notion is what  [2] calls weak purity, which precludes changes to previously existing objects but allow creating new objects. In the EiffelBase library we also encounter routines that have another form of purity, which we may call “relative” purity: they can leave the same state on exit as they found on entry, but in-between they might change the state.  For the rest of this discussion we will rely on the standard notion of weak purity: no changes permitted on existing objects.

It is often suggested that the programming language should support specifying that a routine is pure; many people have indeed proposed the addition of a keyword such as pure to Eiffel. One of the reasons this is not — in my opinion — such a great idea is that purity is just a special case of the more general problem of framing: specifying and verifying what a routine does not change. If we can specify an arbitrary frame property, then we can, as a special case covered by the general mechanism, specify that a routine changes nothing.

To see why framing is so important, consider a class ACCOUNT with a routine deposit that has the postcondition

balance = old balance + sum………..— Where sum is the argument of deposit

Framing here means stating that nothing else than balance changes; for example the account’s owner and its number should remain the same. It is not practical to write all individual postcondition clauses such as

owner= old owner
number=
old number

and so on. But we do need to specify these properties and enforce them, if only to avoid that a descendant class (maybe MAFIA_ACCOUNT) distort the rules defined by the original.

One technique is to add a so-called “modifies clause”, introduced by verification tools such as ESC-Java [3] and JML [4]. Modifies clauses raise some theoretical issues; in particular, the list of modified expressions is often infinite, so we must restrict ourselves to an abstract-data-type view where we characterize a class by commands and queries and the modifies clause only involves queries of the class. Many people find this hard to accept at first, since anything that is not talked about can change, but it is the right approach. A modifies clause of sorts, included in the postcondition, appeared in an earlier iteration of the Eiffel specification, with the keyword only (which is indeed preferable to modifies, which in the Eiffel style favoring grammatically simple keywords would be modify, since what we want to express is not that the routine must change anything at all  but that it may only change certain specified results!). The convention worked well with inheritance, since it included the rule that a clause such as only balance, in class  ACCOUNT, prescribes that the routine may not, in its modifies version as well as in any version redefined in descendants, change any other query known at the level of ACCOUNT; but a descendant version may change, subject to its own ACCOUNT clauses, any new query introduced by a descendant.

To declare a routine as pure, it would suffice to use an empty only clause (not very elegant syntactically — “only” what? — but one can get used to it).

This construct has been discarded, as it places too heavy a burden on the programmer-specifier. Here the key observation resulted from a non-scientific but pretty extensive survey I made of all the JML code I could get my hands on. I found that every time a query appeared in a modifies clause it was also listed in the postcondition! On reflection, this seems reasonable: if you are serious about specification, as anyone bothering to write such a clause surely is, you will not just express that something changes and stop there; you will write something about how it may change. Not necessarily the exact result, as in

my_query = precise_final_value

but at least some property of that result, as in

some_property (my_query)

This observation has, however, an inescapable consequence for language design: modifies or only clauses should be inferred by the compiler from the postcondition, not imposed on the programmer as an extra burden. The convention, which we may call the Implicit Framing Rule, is simple:

A routine may change the value of a query only if the query is specified in the routine’s postcondition

(or, if you like double negation, “no routine may change the value of a query other than those specified in its postcondition”). Here we say that a feature is “specified” in a postcondition if it appears there outside of the scope of an old expression. (Clearly, an occurrence such as old balance does not imply that balance can be modified, hence this restriction to occurrences outside of an old.)

With this convention the only clause becomes implicit: it would simply list all the queries specified in the postcondition, so there is no need for the programmer to write it. For the rare case of wanting to specify that a query q may change, but not wanting to specify how, it is easy to provide a library function, say involved, that always return true and can be used in postconditions, as in involved (q).

The convention is of course not just a matter of programming methodology but, in an IDE supporting verification, such as the EVE “Verification As a Matter Of Course” environment which we are building for Eiffel [5], the compiler will enforce the definition above — no change permitted to anything not specified in the postcondition — and produce an error in case of a violation. This also means that we can easily specify that a routine is pure: it must simply not specify any query in its postcondition. It may still list it in an old clause, as happens often in practice, e.g.

Result = old balance – Minimum_balance………..— In the postcondition of a function available_funds

Note the need to use old here. Apart from this addition of old to some postconditions, a considerable advantage of the convention is that most existing code using pure functions will be suitable to the new purity enforcement without any need to provide new annotations.

I believe that this is the only sustainable convention. It does not, of course, solve the frame problem by itself (for attempts in this direction see [6, 7]), but it is a necessary condition for a solution that is simple, easily taught, fairly easily implemented, and effective. It goes well with model-based specifications [8, 9], which I believe are the technique of most promise for usable  specifications of object-oriented software. And it provides a straightforward, no-frills way to enforce purity where prescribed by the Command-Query Separation principle [10, 11]: if I’m not pure, at least my functions must be.

References

[1] From the lyrics of the aria Glitter and Be Gay in Leonard Bernstein’s Candide, text by Lillian Hellman and others. Youtube offers several performances, including  by Diana Damrau (here) and Natalie Dessay (here) . For the text see e.g. here.

[2] Adam Darvas and Peter Müller: Reasoning About Method Calls in Interface Specifications, in Journal of Object Technology, Volume 5, no. 5, jUNE 2006, pages 59-85, doi:10.5381/jot.2006.5.5.a3, available here.

[3] C. Flanagan, K.R.M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe and R. Stata: Extended static checking for Java, in PLDI 2002 (Programming Language Design and Implementation), pages 234–245, 2002.

[4] Gary Leavens et al.: Java Modeling Language, see references here.

[5] Julian Tschannen, Carlo A. Furia, Martin Nordio, and Bertrand Meyer: Verifying Eiffel Programs with Boogie, to appear in Boogie 2011, First International Workshop on Intermediate Verification Languages, Wroclaw, August 2011. See documentation about the EVE project on the project page.

[6] Ioannis Kassios: Dynamic Frames: Support for Framing, Dependencies and Sharing Without Restrictions, in Formal Methods 2006, eds. J. Misra, T. Nipkow and E. Sekerinski, Lecture Notes in Computer Science 4085, Springer Verlag, 2006, pages 268-283.

[7] Bernd Schoeller: Making Classes Provable through Contracts, Models and Frames, PhD thesis, ETH Zurich, 2007, available here

[8] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[9] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ’10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[10] Bertrand Meyer: Object-Oriented Software Construction, first (1988) and second (1997) editions, Prentice Hall.

[11] Bertrand Meyer: Touch of Class: An Introduction to Programming Well, Using Objects and Contracts, Springer Verlag, 2009.

VN:F [1.9.10_1130]
Rating: 10.0/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

Agile methods: the good, the bad and the ugly

It was a bit imprudent last Monday to announce the continuation of the SCOOP discussion for this week; with the TOOLS conference happening now, with many satellite events such as the Eiffel Design Feast of the past week-end and today’s “New Eiffel Technology Community” workshop, there is not enough time for a full article. Next week might also be problematic. The SCOOP series will resume, but in the meantime I will report on other matters.

As something that can be conveniently typed in while sitting in the back of the TOOLS room during fascinating presentations, here is a bit of publicity for the next round of one-day seminars for industry — “Compact Course” is the official terminology — that I will be teaching at ETH in Zurich next November (one in October), some of them with colleagues. It’s the most extensive session that we have ever done; you can see the full programs and registration information here.

  • Software Engineering for Outsourced and Distributed Development, 27 October 2011
    Taught with Peter Kolb and Martin Nordio
  • Requirements Engineering, 17 November
  • Software Testing and Verification: state of the art, 18 November
    With Carlo Furia and Sebastian Nanz
  • Agile Methods: the Good, the Bad and the Ugly, 23 November
  • Concepts and Constructs of Concurrent Computation, 24 November
    With Sebastian Nanz
  • Design by Contract, 25 November

The agile methods course is new; its summary reads almost like a little blog article, so here it is.

Agile methods: the Good, the Bad and the Ugly

Agile methods are wonderful. They’ll give you software in no time at all, turn your customers and users into friends, catch bugs before they catch you, change the world, and boost your love life. Do you believe these claims (even excluding the last two)? It’s really difficult to form an informed opinion, since most of the presentations of eXtreme Programming and other agile practices are intended to promote them (and the consultants to whom they provide a living), not to deliver an objective assessment.

If you are looking for a guru-style initiation to the agile religion, this is not the course for you. What it does is to describe in detail the corpus of techniques covered by the “agile” umbrella (so that you can apply them effectively to your developments), and assess their contribution to software engineering. It is neither “for” nor “against” agile methods but fundamentally descriptive, pedagogical, objective and practical. The truth is that agile methods include some demonstrably good ideas along with some whose benefits are at best dubious. In addition (and this should not be a surprise) they cannot make the fundamental laws of software engineering go away.

Agile methods have now been around for more than a decade, during which many research teams, applying proven methods of experimental science, have performed credible empirical studies of how well the methods really work and how they compare to more traditional software engineering practices. This important body of research results, although not widely known, is critical to managers and developers in industry for deciding whether and how to use agile development. The course surveys these results, emphasizing the ones most directly relevant to practitioners.

A short discussion session will enable participants with experience in agile methods to share their results.

Taking this course will give you a strong understanding of agile development, and a clear view of when, where and how to apply them.

Schedule

Morning session: A presentation of agile methods

  • eXtreme Programming, pair programming, Scrum, Test-Driven Development, continuous integration, refactoring, stakeholder involvement, feature-driven development etc.
  • The agile lifecycle.
  • Variants: lean programming etc.

Afternoon session (I): Assessment of agile methods

  • The empirical software engineering literature: review of available studies. Assessment of their value. Principles of empirical software engineering.
  • Agile methods under the scrutiny of empirical research: what helps, what harms, and what has no effect? How do agile methods fare against traditional techniques?
  • Examples: pair programming versus code reviews; tests versus specifications; iterative development versus “Big Upfront Everything”.

Afternoon session (II): Discussion and conclusion

This final part of the course will present, after a discussion session involving participants with experience in agile methods, a summary of the contribution of agile methods to software engineering.

It will conclude with advice for organizations involved in software development and interested in applying agile methods in their own environment.

Target groups

CIOs; software project leaders; software developers; software testers and QA engineers.

VN:F [1.9.10_1130]
Rating: 8.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

Concurrent programming is easy

EiffelStudio 6.8, released last month, contains the first official implementation of the SCOOP programming model for concurrent programming. This is an important milestone; let me try to explain why.

Concurrency challenging us

Concurrency is the principal stumbling block in the progress of programming. Do not take just my word for it:

  • Intel: “Multi-core processing is taking the industry on a fast-moving and exciting ride into profoundly new territory. The defining paradigm in computing performance has shifted inexorably from raw clock speed to parallel operations and energy efficiency” [1].
  • Rick Rashid (head of Microsoft Research):  “Multicore processors represent one of the largest technology transitions in the computing industry today, with deep implications for how we develop software.” [2].
  • Bill Gates: “Multicore: This is the one which will have the biggest impact on us. We have never had a problem to solve like this. A breakthrough is needed in how applications are done on multicore devices.” [3]
  • David Patterson: “Industry has basically thrown a Hail Mary. The whole industry is betting on parallel computing. They’ve thrown it, but the big problem is catching it.” [4]
  • Gordon Bell: “I’m skeptical until I see something that gives me some hope…  the machines are here and we haven’t got it right.” [4].

What has happened? Concurrency  used to be a highly specialized domain of interest to a small minority of programmers building operating systems and networking systems and database engines. Just about everyone else could live comfortably pretending that the world was sequential. And then suddenly we all need to be aware of concurrency. The principal reason is the end of Moore’s law as we know it [5].

The end of Moore's law as we know it

This chart show that we can no longer rely on the automatic and regular improvement to our programs’ performance, roughly by a factor of two every two years, thanks to faster chips. The free lunch is over; continued performance increases require taking advantage of concurrency, in particular through multithreading.

Performance is not the only reason for getting into concurrency. Another one is user convenience: ever since the first browser showed that one could write an email and load a Web page in the same window, users have been clamoring for multithreaded applications. Yet another source of concurrency requirements is the need to produce Internet and Web applications.

How do programmers write these applications? The almost universal answer relies on threading mechanisms, typically offered through some combination of language and library mechanisms: Java Threads, .NET threading, POSIX threads, EiffelThreads. The underlying techniques are semaphores and mutexes: nineteen-sixties vintage concepts, rife with risks of data races (access conflicts to a variable or resource, leading to crashes or incorrect computations) and deadlocks (where the system hangs). These risks are worse than the classical bugs of sequential programs because they are very difficult to detect through testing.

Ways to tame the beast

Because the need is so critical, the race is on — a “frantic” race in the words of a memorable New York Times article by John Markoff [4] — to devise a modern programming framework that will bring concurrent programming under control. SCOOP is a contender in this battle. In this post and the next I will try to explain why we think it is exactly what the world needs to tame concurrency.

The usual view, from which SCOOP departs, is that concurrent programming is intrinsically hard and requires a fundamental change in the way programmers think. Indeed some of the other approaches that have attracted attention imply radical departures from accepted programming paradigm:

  • Concurrency calculi such as CSP [6, 7], CCS [8] and the π-Calculus [9] define  high-level mathematical frameworks addressing concurrency, but they are very far from the practical concerns of programmers. An even more serious problem is that they focus on only some aspects of programming, but being concurrent is only one property of a program, among many others (needing a database, relying on graphical user interface, using certain data structures, perform certain computations…). We need mechanisms that integrate concurrency with all the other mechanisms that a program uses.
  • Functional programming languages have also offered interesting idioms for concurrency, taking advantage of the non-imperative nature of functional programming. Advocacy papers have argued for Haskell [10 and Erlang [11] in this role. But should the world renounce other advances of modern software engineering, in particular object-oriented programming, for the sake of these mechanisms? Few people are prepared to take that step, and (as I have discussed in a detailed article [12]) the advantages of functional programming are counter-balanced by the superiority of the object-oriented model in its support for the modular construction of realistic systems.

What if we did not have to throw away everything and relearn programming from the ground up for concurrency? What if we could retain the benefits of five decades of software progress, as crystallized in modern object-oriented programming? This is the conjecture behind SCOOP: that we can benefit from all the techniques we have learned to make our software reliable, extendible and reusable, and add concurrency to the picture in an incremental way.

From sequential to concurrent

A detailed presentation of SCOOP will be for next Monday, but let me give you a hint and I hope whet your appetite by describing how to move a typical example from sequential to concurrent. Here is a routine for transferring money between two accounts:

transfer (amount: INTEGER ; source, target: ACCOUNT)
               -- Transfer amount dollars from source to target.
        require
               enough: source·balance >= amount
        do
         source·withdraw (amount)
         target·deposit (amount)
        ensure
               removed: source·balance = old source·balance – amount
               added: target·balance = old target·balance + amount
        end

The caller must satisfy the precondition, requiring the source account to have enough money to withdraw the requested amount; the postcondition states that the source account will then be debited, and the target account credited, by that amount.

Now assume that we naïvely apply this routine in a concurrent context, with concurrent calls

        if acc1·balance >= 100 then transfer (acc1, acc2, 100) end

and

        if acc1·balance >= 100 then transfer (acc1, acc3, 100) end

If the original balance on acc1 is 100, it would be perfectly possible in the absence of a proper concurrency mechanism that both calls, as they reach the test acc1·balance >= 100, find the property to be true and proceed to do the transfer — but incorrectly since they cannot both happen without bringing the balance of acc1 below zero, a situation that the precondition of transfer and the tests were precisely designed to rule out. This is the classic data race. To avoid it in the traditional approaches, you need complicated and error-prone applications of semaphores or conditional critical regions (the latter with their “wait-and-signal” mechanism, just as clumsy and low-level as the operations on semaphores).

In SCOOP, such data races, and data races of any other kind, cannot occur. If the various objects involved are to run in separate threads of control, the declaration of the routine will be of the form

transfer (amount: INTEGER ; source, target: separate ACCOUNT)
               -- The rest of the routine exactly as before.

where separate is the only specific language keyword of SCOOP. This addition of the separate marker does the trick. will result in the following behavior:

  • Every call to transfer is guaranteed exclusive access to both separate arguments (the two accounts).
  • This simultaneous reservation of multiple objects (a particularly tricky task when programmers must take care of it through their own programs, as they must in traditional approaches) is automatically guaranteed by the SCOOP scheduler. The calls wait as needed.
  • As a consequence, the conditional instructions (if then) are no longer needed. Just call transfer and rely on SCOOP to do the synchronization and guarantee correctness.
  • As part of this correctness guarantee, the calls may have to wait until the preconditions hold, in other words until there is enough money on the account.

This is the desired behavior in the transition from sequential to concurrent. It is achieved here not by peppering the code with low-level concurrent operations, not by moving to a completely different programming scheme, but by simply declaring which objects are “separate” (potentially running elsewhere.

The idea of SCOOP is indeed that we reuse all that we have come to enjoy in modern object-oriented programming, and simply declare what needs to be parallel, expecting things to work (“principle of least surprise”).

This is not how most of the world sees concurrency. It’s supposed to be hard. Indeed it is; very hard, in fact. But the view of the people who built SCOOP is that as much of the difficulty should be for the implementers. Hence the title of this article: for programmers, concurrency should be easy. And we think SCOOP demonstrates that it can be.

SCOOP in practice

A few words of caution: we are not saying that SCOOP as provided in EiffelStudio 6.8 is the last word. (Otherwise it would be called 7.0.) In fact, precisely because implementation is very hard, a number of details are still not properly handled; for example, as discussed in recent exchanges on the EiffelStudio user group [13], just printing out the contents of a separate string is non-trivial. We are working to provide all the machinery that will make everything work well, the ambitious goals and the practical details. But the basics of the mechanism are there, with a solid implementation designed to scale properly for large applications and in distributed settings.

In next week’s article I will describe in a bit more detail what makes up the SCOOP mechanisms. To get a preview, you are welcome to look at the documentation [14, 15]; I hope it will convince you that despite what everyone else says concurrent programming can be easy.

References

[1] Official Intel statement, see e.g. here.

[2] Rich Rashid, Microsoft Faculty Summit, 2008.

[3] This statement was cited at the Microsoft Faculty Summit in 2008 and is part of the official transcript; hence it can be assumed to be authentic, although I do not know the original source.

[4] Patterson and Bell citations from John Markoff, Faster Chips Are Leaving Programmers in Their Dust, New York Times, 17 December 2007, available here.

[5] The chart is from the course material of Tryggve Fossum at the LASER summer school in 2008.

[6] C.A.R. Hoare: em>Communicating Sequential Processes, Prentice Hall, 1985, also available online.

[7] Bill Roscoe: The Theory and Practice of Concurrency, revised edition, Prentice Hall, 2005, also available online.

[8] Robin Milner: Communication and Concurrency, Prentice Hall, 1989.

[9] Robin Milner: Communicating and Mobile Systems: The π-calculus, Cambridge University Press, 1999.

[10] Simon Peyton-Jones: Beautiful Concurrency, in Beautiful Code, ed. Greg Wilson, O’Reilly, 2007, also available online.

[11] Joe Armstrong: Erlang, in Communications of the ACM, vol. 53, no. 9, September 2010, pages 68-75.

[12] Bertrand Meyer: Software Architecture: Functional vs. Object-Oriented Design, in Beautiful Architecture, eds. Diomidis Spinellis and Georgios Gousios, O’Reilly, 2009, pages 315-348, available online.

[13] EiffelStudio user group; see here for a link to current discussions and to join the group.

[14] SCOOP project documentation at ETH, available here.

VN:F [1.9.10_1130]
Rating: 8.9/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +6 (from 6 votes)

Assessing concurrency models

By describing a  poorly conceived hypothetical experiment, last week’s article described the “Professor Smith syndrome” consisting of four risks that threaten the validity of empirical software engineering experiments relying on students in a course:

  • Professor Smith Risk 1: possible bias if the evaluator has a stake in the ideas or tools under assessment.
  • Professor Smith Risk 2: creating different levels of motivation in the different groups (Hawthorne effect).
  • Professor Smith Risk 3: extrapolating from students to professionals.
  • Professor Smith Risk 4: violation of educational ethics if the experiment may cause some students to learn better than others.

If you have developed a great new method or tool and would like to assess it, the best way to address Risk 1 is to find someone else to do the assessment. What if  this solution is not practical? Recently we wanted to get some empirical evidence on the merits of the SCOOP (Simple Concurrent Object-Oriented Programming) approach to concurrency [1, 2], on which I have worked for a long time and which is now part of EiffelStudio since the release of 6.8 a couple of weeks ago. We wanted to see if, despite the Professor Smith risks, we could do a credible study ourselves.

The ETH Software Architecture course[3], into which we introduced some introductory material on concurrency last year (as part of a general effort to push more concurrency into software courses at ETH), looked like a good place to try an evaluation; it is a second-year course, where students, or so we thought, would have little prior experience in concurrent software design.

The study’s authors — Sebastian Nanz, Faraz Torshizi and Michela Pedroni — paid special attention to the methodological issues. To judge for yourself whether we addressed them properly, you can read the current version of our paper to be presented at ESEM 2011 [4]. Do note that it is a draft and that we will improve the paper for final publication.

Here is some of what we did. I will not address the Professor Smith Risk 3, the use of students, which (as Lionel Briand has pointed out in a comment on the previous article) published work has studied; in a later article I will give  references to some of that work. But we were determined to tackle the other risks explicitly, so as to obtain credible results.

The basic experiment was a session in which the students were exposed to two different design methods for concurrent software: multithreaded programming in Java, which I’ll call “Java Threads”, and SCOOP. We wanted to explore whether it is easier to program in SCOOP than in Java. This is too general a hypothesis, so it was refined into three concrete hypotheses: is it easier to understand a SCOOP program? Is it easier to find errors in SCOOP programs? Do programmers using SCOOP make fewer errors?

A first step towards reducing the effect — Professor Smith Risk 1 — of any emotional attachment of the experimenters  to one of the approaches, SCOOP in our case, was to generalize the study. Although what directly interested us was to compare SCOOP against Java Threads, we designed the study as a general scheme to compare concurrency approaches; SCOOP and Java Threads are just an illustration, but anyone else interested in assessing concurrency techniques — say Erlang versus C# concurrency — can apply the same methodology. This decision had two benefits: it freed the study from dependency on the particular techniques, hence, we hope, reducing bias; and as side attraction of the kind that is hard for researchers to resist, it increased the publishability of the results.

Circumstances unexpectedly afforded us another protection against any for-SCOOP bias: unbeknownst to us at the time of the study’s design, a first-year course had newly added (in 2009, whereas our study was performed in 2010) an introduction to concurrent programming — using Java Threads! While we had thought that concurrency in any form would be new to most students, in fact almost all of them had now seen Java Threads before. (The new material in the first-year course was taken by ETH students only, but many transfer students had also already had an exposure to Java Threads.) On the other hand, students had not had any prior introduction to SCOOP. So any advantage that one of the approaches may have had because of students’ prior experience would work against our hypotheses. This unexpected development would not help if the study’s results heavily favored Java Threads, but if they favored SCOOP it would reinforce their credibility.

A particular pedagogical decision was made regarding the teaching of our concurrency material: it started with a self-study rather than a traditional lecture. One of the reasons for this decision was purely pedagogical: we felt (and the course evaluations confirmed) that at that stage of the semester the students would enjoy a break in the rhythm of the course. But another reason was to avoid any bias that might have arisen from any difference in the lecturers’ levels of enthusiasm and effectiveness in teaching the two approaches. In the first course session devoted to concurrency, students were handed study materials presenting Java Threads and SCOOP and containing a test to be taken; the study’s results are entirely based on their answers to these tests. The second session was a traditional lecture presenting both approaches again and comparing them. The purpose of this lecture was to make sure the students got the full picture with the benefit of a teacher’s verbal explanations.

The study material was written carefully and with a tone as descriptive and neutral as possible. To make comparisons meaningful, it does not follow a structure specific to Java Threads or  SCOOP  (as we might have used had we taught only one of these approaches); instead it relies in both cases on the same overall plan  (figure 2 of the paper), based on a neutral analysis of concurrency concepts and issues: threads, mutual exclusion, deadlock etc. Each section then presents, for one such general concurrency question, the solution proposed by Java Threads or SCOOP.

This self-study material, as well as everything else about the study, is freely available on the Web; see the paper for the links.

In the self-study, all students studied both the Java Threads and SCOOP materials. They were randomly assigned to two groups, for which the only difference was the order of studying the approaches. We feel that this decision addresses the ethical issue (Professor Smith Risk 4): any pedagogical effect of reading about A before B rather than the reverse, in the course of a few hours, has to be minimal if you end up reading about the two of them, and on the next day follow a lecture that also covers both.

Having all students study both approaches — a crossover approach in the terminology of [5] — should also address the Hawthorne effect (Professor Smith Risk 2): students have no particular incentive to feel that one of the approaches is more hip than the other. While they are not told that SCOOP is partly the work of the instructors, some of them may know or guess this information; the consequences, positive or negative, are limited, since they are asked in both cases to do as well as they can in answering the assessment questions.

The design of that evaluation is another crucial element in trying to avoid bias. We tried, to the extent possible, to base the assessment on objective criteria. For the first hypothesis (program understanding) the technique was to ask the students to predict the output of some simple concurrent programs. To address the risk of a binary correct/incorrect assessment, and get a more fine-grained view, we devised the programs so that they would produce output strings and measured the Levenshtein (edit) distance to the correct result. For the second hypothesis (ease of program debugging), we gave students programs exhibiting typical errors in both approaches and asked them to tell us both the line number of any error they found and an explanation. Assessing the explanation required human analysis; the idea of also assigning partial credit for pointing out a line number without providing a good explanation is that it may be meaningful that a student found that something is amiss even without being quite able to define what it is. The procedure for the third hypothesis (producing programs with fewer errors) was more complex and required two passes over the result; it requires some human analysis, as you will see in the article, but we hope that the two-pass process removes any bias.

This description of the study is only partial and you should read the article [4] for the full details of the procedure.

So what did we find in the end? Does SCOOP really makes concurrency easier to learn, concurrent programs easier to debug, and concurrent programmers less error-prone? Here too  I will refer you to the article. Let me simply mention that the results held some surprises.

In obtaining these results we tried very hard to address the Professor Smith syndrome and its four risks. Since all of our materials, procedures and data are publicly accessible, described in some detail in the paper, you can determine for yourself how well we met this objective, and whether it is possible to perform credible assessments even of one’s own work.

References

Further reading: for general guidelines on how to conduct empirical research see [5]; for ethical guidelines, applied to psychological research but generalizable, see [6].

[1] SCOOP Eiffel documentation, available here.

[2] SCOOP project documentation at ETH, available here.

[3] Software Architecture course at ETH, course page (2011).

[4] Sebastian Nanz, Faraz Torshizi, Michela Pedroni and Bertrand Meyer: Design of an Empirical Study for Comparing the Usability of Concurrent Programming Languages, to appear in ESEM 2011 (ACM/IEEE International Symposium on Empirical Software Engineering and Measurement), 22-23 September 2011. Draft available here.

[5] Barbara A. Kitchenham, Shari L. Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El-Emam and Jarrett Rosenberg: Preliminary Guidelines for Empirical Research in Software Engineering, national Research Council Canada (NRC-CNRC), Report ERB-1082, 2001, available here.

[6] Robert Rosenthal: Science and ethics in conducting, analyzing, and reporting psychological research, in  Psychological Science, 5, 1994, p127-134. I found a copy cached by a search engine here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

The Professor Smith syndrome: Part 2

As stated in the Quiz of a few days ago (“Part 1 ”), we consider the following hypothetical report in experimental software engineering ([1], [2]):

Professor Smith has developed a new programming technique, “Suspect-Oriented Programming” (SOP). To evaluate SOP, he directs half of the students in his “Software Methodology” class to do the project using traditional techniques, and the others to use SOP.

He finds that projects by the students using SOP have, on the average, 15% fewer bugs than the others, and reports that SOP increases software reliability.

What’s wrong with this story?

Professor Smith’s attempt at empirical software engineering is problematic for at least four reasons. Others could arise, but we do not need to consider them if Professor Smith has applied the expected precautions: the number of students should be large enough (standard statistical theory will tell us how much to trust the result for various sample sizes); the students should be assigned to one of the two groups on a truly random basis; the problem should be amenable to both SOP and non-SOP techniques; and the assessment of the number of bugs should in the results should be based on fair and if possible automated evaluation. Some respondents to the quiz cited these problems, but they would apply to any empirical study and we can assume they are being taken care of.

The first problem to consider is that the evaluator and the author of the concept under evaluation are the same person. This is an approach fraught with danger. We have no reason to doubt Professor Smith’s integrity, but he is human. Deep down, he wants SOP to be better than the alternative. That is bound to affect the study. It would be much more credible if someone else, with no personal stake in SOP, had performed it.

The second problem mirrors the first on the students’ side. The students from group 1 were told that they used Professor Smith’s great idea, those from group 2 that they had to use old, conventional, boring stuff. Did both groups apply the same zeal to their work? After all, the students know that Professor Smith created SOP, and maybe he is an convincing advocate, so group 1 students will (consciously or not) do their best; those from group 2 have less incentive to go the extra mile. What we may have at play here is a phenomenon known as the Hawthorne effect [3]: if you know you are being tested for a new technique, you typically work harder and better — and may produce better results even if the technique is worthless! Experiments dedicated to studying this effect show that even  a group that is in reality using the same technique as another does better, at least at the beginning, if it is told that it is using a new, sexy technique.

The first and second problems arise in all empirical studies, software-related or not. They are the reason why medical experiments use placebos and double-blind techniques (where neither the subjects nor the experimenters themselves know who is using which variant). These techniques often do not directly transpose to software experiments, but we should all the same be careful about empirical studies of assessments of one’s own work and about possible Hawthorne effects.

The third problem, less critical, is the validity of a study relying on students. To what extent can we extrapolate from the results to a situation in industry? Software engineering students are on their way to becoming software professionals, but they are not professionals yet. This is a difficult issue because universities, rather than industry, are usually and understandably the place where experiments take place, an sometimes there is no other choice than using students. But then one can question the validity of the results. It depends on the nature of the questions being asked: if the question under study is whether a certain idea is easy to learn, using students is reasonable. But if it is, for example, whether a technique produces less buggy programs, the results can depend significantly on the subjects’ experience, which is different for students and professionals.

The last problem does not by itself affect the validity of the results, but it is a show-stopper nonetheless: Professor Smith’s experiment is unethical! If is is indeed true that SOP is better than the alternative, he is harming students from group 2; in the reverse case, he is harming students from group 1. Only in the case of the null hypothesis (using SOP makes no statistically significant difference) is the experiment ethical, but then it is also profoundly uninteresting. The rule in course-related experiments is a variant of the Hippocratic oath: before all, do not harm. The first purpose of a course is to enrich the students’ knowledge and skills; secondary aims, such as helping the professor’s research, are definitely acceptable, but must never impede the first. The setup described above is all the less acceptable that the project results presumably count towards the course grade, so the students who were forced to use the less good technique, if there demonstrably was one, have grounds to complain.

Note that Professor Smith could partially address this fairness problem by letting students choose their group, instead of assigning them randomly to group 1 or group 2 (based for example on the first letter of their names). But then the results would lose credibility, because this technique introduces self-selection and hence bias: the students who choose SOP may be the more intellectually curious students, and hence possibly the ones who do better anyway.

If Professor Smith cannot ensure fairness, he can still use students for his experiment, but has to run it outside of a course, for example by paying students, or running the experiment as a competition with some prizes for those who produce the programs with fewest bugs. This technique can work, although it introduces further dangers of self-selection. As part of a course, however, you just cannot assign students, on your own authority, to different techniques that might have an different effect on the core goal of the course: the learning experience.

So Professor Smith has a long way to go before he can run experiments that will convey a significant argument in favor of SOP.

Over the years I have seen, as a reader and sometimes as a referee, many Professor Smith papers: “empirical” evaluation of a technique by its own authors, using questionable techniques and not applying the necessary methodological precautions.

A first step is, whenever possible, to use experimenters who are from a completely different group from the developers of the ideas, as in two studies [4] [5] about the effectiveness of pair programming.

And yet! Sometimes no one else is available, and you do want to obtain objective empirical evidence about the merits of your own ideas. You are aware of the risk, and ready to face the cold reality, including if the results are unfavorable. Can you do it?

A recent attempt of ours seems to suggest that this is possible if you exert great care. It will presented in a paper at the next ESEM (Empirical Software Engineering and Measurement) and even though it discusses assessing some aspects of our own designs, using students, as part of the course project which counts for grading, and separating them into groups, we feel it was fair and ethical, and </modesty_filter_on>an ESEM referee wrote: “this is one of the best designed, conducted, and presented empirical studies I have read about recently”<modesty_filter_on>.

How did we proceed? How would you have proceeded? Think about it; feel free to express your ideas as comments to this post. In the next installment of this blog (The Professor Smith Syndrome: Part 3), I will describe our work, and let you be the judge.

References

[1] Bertrand Meyer: The rise of empirical software engineering (I): the good news, this blog, 30 July 2010, available here.
[2] Bertrand Meyer: The rise of empirical software engineering (II): what we are still missing, this blog, 31 July 2010, available here.

[3] On the Hawthorne effect, there is a good Wikipedia entry. Acknowledgment: I first heard about the Hawthorne effect from Barry Boehm.

[4] Jerzy R. Nawrocki, Michal Jasinski, Lukasz Olek and Barbara Lange: Pair Programming vs. Side-by-Side Programming, in EuroSPI 2005, pages 28-38. I do not have a URL for this article.

[5] Matthias Müller: Two controlled Experiments concerning the Comparison of Pair Programming to Peer Review, in  Journal of Systems and Software, vol. 78, no. 2, pages 166-179, November 2005; and Are Reviews an Alternative to Pair Programming ?, in  Journal of Empirical Software Engineering, vol. 9, no. 4, December 2004. I don’t have a URL for either version. I am grateful to Walter Tichy for directing me to this excellent article.

VN:F [1.9.10_1130]
Rating: 10.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

The Professor Smith syndrome: Part 1 – a quiz

[As a reminder, this blog is now on a regular schedule, appearing every Monday. Sometimes in mid-week there will be a lighter piece or, as here, a preparation for the following Monday’s entry.]

Consider the following hypothetical report in experimental software engineering (see earlier posts: [1], [2]):

Professor Smith has developed a new programming technique, “Suspect-Oriented Programming” (SOP). To evaluate SOP, he directs half of the students in his “Software Methodology” class to do the project using traditional techniques, and the others to use SOP.

He finds that projects by the students using SOP have, on the average, 15% fewer bugs than the others, and reports that SOP increases software reliability.

Quiz, in advance of next Monday’s post: what’s wrong with this story?

References

[1] Bertrand Meyer: The rise of empirical software engineering (I): the good news, this blog, 30 July 2010, available here.
[2] Bertrand Meyer: The rise of empirical software engineering (II): what we are still missing, this blog, 31 July 2010, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Stendhal on abstraction

This week we step away from our usual sources of quotations — the Hoares and Dijkstras and Knuths — in favor an author who might seem like an unlikely inspiration for a technology blog: Stendhal. A scientist may like anyone else be fascinated by Balzac, Flaubert, Tolstoy or Dostoevsky, but they live in an entirely different realm; Stendhal is the mathematician’s novelist. Not particularly through the themes of his works (as could be the case with  Borges or Eco), but because of their clear structure and elegant style,  impeccable in its conciseness and razor-like in its precision. Undoubtedly his writing was shaped by his initial education; he prepared for the entrance exam of the then very young École Polytechnique, although at the last moment he yielded instead to the call of the clarion.

The scientific way of thinking was not just an influence on his writing; he understood the principles of scientific reasoning and knew how to explain them. Witness the following text, which explains just about as well as anything I know the importance of abstraction. In software engineering (see for example [1]), abstraction is the key talent, a talent of a paradoxical nature: the basic ideas take a few minutes to explain, and a lifetime to master. In this effort, going back to the childhood memories of Henri Beyle (Stendhal’s real name) is not a bad start.

Stendhal’s Life of Henri Brulard is an autobiography, with only the thinnest of disguises into a novel (compare the hero’s name with the author’s). In telling the story of his morose childhood in Grenoble, the narrator grumbles about the incompetence of his first mathematics teacher, a Mr. Dupuy, who taught mathematics “as a set of recipes to make vinegar” (comme une suite de recettes pour faire du vinaigre) and tells how his father found a slightly better one, Mr. Chabert. Here is the rest of the story, already cited in [2]. The translation is mine; you can read the original below, as well as a German version. Instead of stacks and circles  — or a university’s commencement day, see last week’s posting — the examples invoke eggs and cheese, but wouldn’t you agree that this paragraph is as good a definition of abstraction, directly applicable to software abstractions, and specifically to abstract data types and object abstractions (yes, it does discuss “objects”!), as any other?

So I went to see Mr. Chabert. Mr. Chabert was indeed less ignorant than Mr. Dupuy. Through him I discovered Euler and his problems on the number of eggs that a peasant woman brings to the market where a scoundrel steals a fifth of them, then she leaves behind the entire half of the remainder and so forth. This opened my mind, I glimpsed what it means to use the tool called algebra. I’ll be damned if anyone had ever explained it to me; endlessly Mr. Dupuy spun pompous sentences on the topic, but never did he say this one simple thing: it is a division of labor, and like every division of labor it creates wonders by allowing the mind to concentrate all its forces on just one side of objects, on just one of their qualities. What difference it would have made if Mr. Dupuy had told us: This cheese is soft or is it hard; it is white, it is blue; it is old, it is young; it is mine, it is yours; it is light or it is heavy. Of so many qualities, let us only consider the weight. Whatever that weight is, let us call it A. And now, no longer thinking of cheese, let us apply to A everything we know about quantities. Such a simple thing; and yet no one was explaining it to us in that far-away province [3]. Since that time, however, the influence of the École Polytechnique and Lagrange’s ideas may have trickled down to the provinces.

References

[1] Jeff Kramer: Is abstraction the key to computing?, in Communications of The ACM, vol. 50, 2007, pages 36-42.
[2] Bertrand Meyer and Claude Baudoin: Méthodes de Programmation, Eyrolles, 1978, third edition, 1982.
[3] No doubt readers from Grenoble, site of great universities and specifically one of the shrines of French computer science, will appreciate how Stendhal calls it  “that backwater” (cette province reculée).

Original French text

J’allai donc chez M. Chabert. M. Chabert était dans le fait moins ignare que M. Dupuy. Je trouvai chez lui Euler et ses problèmes sur le nombre d’œufs qu’une paysanne apportait au marché lorsqu’un méchant lui en vole un cinquième, puis elle laisse toute la moitié du reste, etc., etc. Cela m’ouvrit l’esprit, j’entrevis ce que c’était que se servir de l’instrument nommé algèbre. Du diable si personne me l’avait jamais dit ; sans cesse M. Dupuy faisait des phrases emphatiques sur ce sujet, mais jamais ce mot simple : c’est une division du travail qui produit des prodiges comme toutes les divisions du travail et permet à l’esprit de réunir toutes ses forces sur un seul côté des objets, sur une seule de leurs qualités. Quelle différence pour nous si M. Dupuy nous eût dit : Ce fromage est mou ou il est dur ; il est blanc, il est bleu ; il est vieux, il est jeune ; il est à moi, il est à toi ; il est léger ou il est lourd. De tant de qualités ne considérons absolument que le poids. Quel que soit ce poids, appelons-le A. Maintenant, sans plus penser absolument au fromage, appliquons à A tout ce que nous savons des quantités. Cette chose si simple, personne ne nous la disait dans cette province reculée ; depuis cette époque, l’École polytechnique et les idées de Lagrange auront reflué vers la province.

German translation (by Benjamin Morandi)

Deshalb ging ich zu Herrn Chabert. In der Tat war Herr Chabert weniger ignorant als Herr Dupuy. Bei ihm fand ich Euler und seine Probleme über die Zahl von Eiern, die eine Bäuerin zum Markt brachte, als ein Schurke ihr ein Fünftel stahl, sie dann die Hälfte des Restes hinterliest u.s.w. Es hat mir die Augen geöffnet. Ich sah was es bedeutet, das Algebra genannte Werkzeug zu benutzen. Unaufhörlich machte Herr Dupuy emphatische Sätze über dieses Thema, aber niemals dieses einfache Wort: Es ist eine Arbeitsteilung, die wie alle Arbeitsteilungen Wunder herstellt und dem Geist ermöglicht seine Kraft ganz auf eine einzige Seite von Objekten zu konzentrieren, auf eine Einzige ihrer Qualitäten. Welch Unterschied für uns, wenn uns Herr Dupuy gesagt hätte: Dieser Käse ist weich oder er ist hart; er ist weiss, er ist blau; er ist alt, er ist jung; er gehört dir, er gehört mir; er ist leicht oder er ist schwer. Bei so vielen Qualitäten betrachten wir unbedingt nur das Gewicht. Was dieses Gewicht auch sei, nennen wir es A. Jetzt, ohne unbedingt weiterhin an Käse denken zu wollen, wenden wir auf A alles an, was wir über Mengen wissen. Diese einfach Sache sagte uns niemand in dieser zurückgezogenen Provinz; von dieser Epoche an werden die École Polytechnique und die Ideen von Lagrange in die Provinz zurückgeflossen sein.

VN:F [1.9.10_1130]
Rating: 9.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)