Archive for the ‘Eiffel’ Category.

Detecting deadlock automatically? (New paper)

To verify sequential programs, we have to prove that they do the right thing, but also that they do it within our lifetime — that they terminate. The termination problem is considerably harder with concurrent programs, since they add a new form of non-termination: deadlock. A set of concurrent processes or threads will deadlock if they end up each holding a resource that another wants and wanting a resource that another holds.

There is no general solution to the deadlock problem, even a good enough general solution. (“Good enough” is the best we can hope for, since like many important problems deadlock is undecidable.) It is already hard enough to provide run-time deadlock detection, to be able at least to cancel execution when deadlock happens. The research reported in this new paper [1] pursues the harder goal of static detection. It applies to an object-oriented context (specifically the SCOOP model of concurrent OO computation) and relies fundamentally on the alias calculus, a static alias analysis technique developed in previous publications.

The approach is at its inception and considerable work remains to be done. Still, the example handled by the paper is encouraging: analyzing two versions of the dining philosophers problem and proving — manually — that one can deadlock and the other cannot.

References

[1] Bertrand Meyer: An automatic technique for static deadlock prevention, in PSI 2014 (Ershov Informatics Conference), eds. Irina Virbitskaite and Andrei Voronkov, Lecture Notes in Computer Science, Springer, 2015, to appear.; draft available here.

VN:F [1.9.10_1130]
Rating: 6.0/10 (19 votes cast)
VN:F [1.9.10_1130]
Rating: -2 (from 12 votes)

Lampsort

 

In support of his view of software methodology, Leslie Lamport likes to use the example of non-recursive Quicksort. Independently of the methodological arguments, his version of the algorithm should be better known. In fact, if I were teaching “data structures and algorithms” I would consider introducing it first.

As far as I know he has not written down his version in an article, but he has presented it in lectures; see [1]. His trick is to ask the audience to give a non-recursive version of Quicksort, and of course everyone starts trying to remove the recursion, for example by making the stack explicit or looking for invertible functions in calls. But his point is that recursion is not at all fundamental in Quicksort. The recursive version is a specific implementation of a more general idea.

Lamport’s version — let us call it Lampsort —is easy to express in Eiffel. We may assume the following context:

a: ARRAY [G -> COMPARABLE]        — The array to be sorted.
pivot: INTEGER                                      —  Set by partition.
picked: INTEGER_INTERVAL            — Used by the sorting algorithm, see below.
partition (i, j: INTEGER)
……..require      — i..j is a sub-interval of the array’s legal indexes:
……..……..i < j
……..……..i >= a.lower
……..……..j <= a.upper
……..do
……..……..… Usual implementation of partition
……..ensure     — The expected effect of partition:
……..……..pivot >= i
……..……..pivot < j
……..……..a [i..j] has been reshuffled so that elements in i..pivot are less than
……..……..or equal to those in pivot+1 .. j.
……..end

We do not write the implementation of partition since the point of the present discussion is the overall algorithm. In the usual understanding, that algorithm consists of doing nothing if the array has no more than one element, otherwise performing a partition and then recursively calling itself on the two resulting intervals. The implementation can take advantage of parallelism by forking the recursive calls out to different processors. That presentation, says Lamport, describes only a possible implementation. The true Quicksort is more general. The algorithm works on a set not_sorted of integer intervals i..j such that the corresponding array slices a [i..j] are the only ones possibly not sorted; the goal of the algorithm is to make not_sorted empty, since then we know the entire array is sorted. In Eiffel we declare this set as:

not_sorted: SET [INTEGER_INTERVAL]

The algorithm initializes not_sorted to contain a single element, the entire interval; at each iteration, it removes an interval from the set, partitions it if that makes sense (i.e. the interval has more than one element), and inserts the resulting two intervals into the set. It ends when not_sorted is empty. Here it is:

……..from                                 — Initialize interval set to contain a single interval, the array’s entire index range:
……..…..create not_sorted.make_one (a.lower |..| a.upper)….         ..……..
……..invariant
……..…..— See below
……..until
……..…..not_sorted.is_empty                                                            — Stop when there are no more intervals in set
……..loop
……..…..picked := not_sorted.item                                                     — Pick an interval from (non-empty) interval set.
……..……if picked.count > 1 then                                                      — (The precondition of partition holds, see below.)
……..……..…..partition (picked.lower, picked.upper)                 — Split, moving small items before & large ones after pivot.
……..……..…..not_sorted.extend (picked.lower |..| pivot)            — Insert new intervals into the set of intervals: first
……..……....not_sorted.extend (pivot + 1 |..| picked.upper)     — and second.
……..……end
……..…...not_sorted.remove (picked)                                               — Remove interval that was just partitioned.
…….end

Eiffel note: the function yielding an integer interval is declared in the library class INTEGER using the operator |..| (rather than just  ..).

The query item from SET, with the precondition not is_empty,  returns an element of the set. It does not matter which element. In accordance with the Command-Query Separation principle, calling item does not modify the set; to remove the element you have to use the command remove. The command extend adds an element to the set.

The abstract idea behind Lampsort, explaining why it works at all, is the following loop invariant (see [2] for a more general discussion of how invariants provide the basis for understanding loop algorithms). We call “slice” of an array a non-empty contiguous sub-array; for adjacent slices we may talk of concatenation; also, for slices s and t s <= t means that every element of s is less than or equal to every element of t. The invariant is:

a is the concatenation of the members of a set slices of disjoint slices, such that:
– The elements of a are a permutation of its original elements.
– The index range of any member  of slices having more than one element is in not_sorted.
– For any adjacent slices s and t (with s before t), s <= t.

The first condition (conservation of the elements modulo permutation) is a property of partition, the only operation that can modify the array. The rest of the invariant is true after initialization (from clause) with slices made of a single slice, the full array. The loop body maintains it since it either removes a one-element interval from not_sorted (slices loses the corresponding slice) or performs partition with the effect of partitioning one slice into two adjacent ones satisfying s <= t, whose intervals replace the original one in not_sorted. On exit, not_sorted is empty, so slices is a set of one-element slices, each less than or equal to the next, ensuring that the array is sorted.

The invariant also ensures that the call to partition satisfies that routine’s precondition.

The Lampsort algorithm is a simple loop; it does not use recursion, but relies on an interesting data structure, a set of intervals. It is not significantly longer or more difficult to understand than the traditional recursive version

sort (i, j: INTEGER)
……..require
……..……..i <= j
……..……..i >= a.lower
……..……..j <= a.upper
……..do
……..……if j > i then                    — Note that precondition of partition holds.
……..……..…..partition (i, j)         — Split into two slices s and t such that s <= t.
……..……..…..sort (i, pivot)          — Recursively sort first slice.
……..……..…..sort (pivot+1, j)      — Recursively sort second slice.
……..……end……..…..
……..end

Lampsort, in its author’s view, captures the true idea of Quicksort; the recursive version, and its parallelized variants, are only examples of possible implementations.

I wrote at the start that the focus of this article is Lampsort as an algorithm, not issues of methodology. Let me, however, give an idea of the underlying methodological debate. Lamport uses this example to emphasize the difference between algorithms and programs, and to criticize the undue attention being devoted to programming languages. He presents Lampsort in a notation which he considers to be at a higher level than programming languages, and it is for him an algorithm rather than a program. Programs will be specific implementations guided in particular by efficiency considerations. One can derive them from higher-level versions (algorithms) through refinement. A refinement process may in particular remove or restrict non-determinism, present in the above version of Lampsort through the query item (whose only official property is that it returns an element of the set).

The worldview underlying the Eiffel method is almost the reverse: treating the whole process of software development as a continuum; unifying the concepts behind activities such as requirements, specification, design, implementation, verification, maintenance and evolution; and working to resolve the remaining differences, rather than magnifying them. Anyone who has worked in both specification and programming knows how similar the issues are. Formal specification languages look remarkably like programming languages; to be usable for significant applications they must meet the same challenges: defining a coherent type system, supporting abstraction, providing good syntax (clear to human readers and parsable by tools), specifying the semantics, offering modular structures, allowing evolution while ensuring compatibility. The same kinds of ideas, such as an object-oriented structure, help on both sides. Eiffel as a language is the notation that attempts to support this seamless, continuous process, providing tools to express both abstract specifications and detailed implementations. One of the principal arguments for this approach is that it supports change and reuse. If everything could be fixed from the start, maybe it could be acceptable to switch notations between specification and implementation. But in practice specifications change and programs change, and a seamless process relying on a single notation makes it possible to go back and forth between levels of abstraction without having to perform repeated translations between levels. (This problem of change is, in my experience, the biggest obstacle to refinement-based approaches. I have never seen a convincing description of how one can accommodate specification changes in such a framework without repeating the whole process. Inheritance, by the way, addresses this matter much better.)

The example of Lampsort in Eiffel suggests that a good language, equipped with the right abstraction mechanisms, can be effective at describing not only final implementations but also abstract algorithms. It does not hurt, of course, that these abstract descriptions can also be executable, at the possible price of non-optimal performance. The transformation to an optimal version can happen entirely within the same method and language.

Quite apart from these discussions of software engineering methodology, Lamport’s elegant version of Quicksort deserves to be known widely.

References

[1] Lamport video here, segment starting at 0:32:34.
[2] Carlo Furia, Bertrand Meyer and Sergey Velder: Loop invariants: Analysis, Classification and Examples, in ACM Computing Surveys, September 2014, preliminary text here.

VN:F [1.9.10_1130]
Rating: 7.0/10 (27 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 11 votes)

New MOOC opens Tuesday

Our online course Computing: Art, Magic, Science, available from EdX, opens this Tuesday (tomorrow, 30 September) at 9 AM Zurich time (and at this time in your area).

An earlier article on this blog described the course, which integrates ten years of experience teaching introductory programming at ETH, and takes advantage of remote-compilation and remote-execution technology from our distributed development research.

You can find the course here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

The Eiffel Documentation Drive

EiffelStudio releases are semi-annual, end of May and end of November. Release 14-05 just came out. The next release (14-11) is entirely devoted to documentation. We are hoping for extensive community involvement in this first-time Eiffel Documentation Drive.

Many people regularly comment that there is not enough Eiffel and EiffelStudio documentation, and some of what exists is not good enough. We have decided to tackle the problem seriously, hence the dedication of an entire release cycle to documentation. The term is taken here in a broad sense: “documentation” means what is at http://docs.eiffel.com, but also everything else that can help understand Eiffel, for example updating Wikipedia entries on topics for which Eiffel has something to offer.

Anyone with an understanding of an Eiffel-related topic can help. We particularly need help from two (non-disjoint) categories of contributors

  • Those with a good understanding of one or more Eiffel-related topics.
  • Those with good writing skills.

The process will involve reviewing, so if you are an Eiffelist with moderate taste for writing, or a good writer with incomplete knowledge of Eiffel, we need your help anyway; someone else will compensate for the missing side. In particular, a common criticism is that some of the documentation was written by developers who do not have English as their mother tongue; if you can help improve it everyone will benefit. Of course if you are good at both technology and writing it’s even better.

We are mentioning English because it is the first target, but documentation in other languages, either original or a translation of existing English pages, is needed too.

Here is how the Eiffel Documentation Drive works:

  • Here you will find a form to report missing or unsatisfactory documentation. Please fill it on every applicable occasion.
  • The entries will be read by a member of the Eiffel Software team, who in applicable cases will add a row to the Eiffel Documentation Drive spreadsheet here. You can not only read that spreadsheet but also edit it yourself, so as to keep it as accurate and up-to-date as possible.
  • An email will be sent to the user list, with “Eiffel Documentation Drive” in the header (so that people not interested in the topic can filter them out), requesting help.
  • Those willing to help can enter their names in the corresponding row, indicating a planned date of completion.

Each row includes among its fields the following: topic, link to existing documentation, volunteer writer(s), planned completion, volunteer reviewer(s).

The full Eiffel Software team will participate – as noted above, improving the documentation is the strategic goal for the release – but we hope for considerable community participation. Please help make EiffelStudio documentation shine as much as the environment itself.

VN:F [1.9.10_1130]
Rating: 9.0/10 (9 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Programming language features

 

InfoWorld is currently publishing a series of programming language assessments:

  • 9 Things We Hate About Objective-C, 4 June.
  • 15 Things We Hate About Java, 6 March.
  • 10 Features Apple Stole for the Swift Programming Language, 9 June.

Notable in these articles is what they do not mention: Eiffel has most of what the author misses in Objective-C and Java; and most of what Swift “stole” it stole from Eiffel.

In this article let us concentrate on the nine Objective-C complaints, by Peter Wayner [1]; subsequent articles will examine the Java “hates” and the Swift “steals”.

Criticism 1: “It is a little too different

“Objective-C lovers tout that Objective-C is a strict superset of C: If you can do it in C, you should be able to do it in Objective-C. But it doesn’t go the other way, so you’re stuck wondering, “Should I use an Objective-C method description or a C one?” Achieving portability to C programs requires constant vigilance and forethought.”

This is what happens when you mix language paradigms. Eiffel has a close relationship with C, but the two sides are clearly separated. You can call C from Eiffel, and the other way around. You can declare an Eiffel routine as “external C” and even include the C code inline: in other words an Eiffel “method description” can have a C implementation. The structure is always object-oriented (no need to fear that a novice programmer will revert to a C style for the design) but for access to low-level system mechanisms and small functions that should be optimized to the byte and microsecond you use C directly, in its ideal role.

Criticism 2: “It’s still mostly just plain old C

“For all its object-oriented coolness, you don’t get much else from Objective-C. It’s more of a way to organize your code for large systems than a way to write better code. You’re still responsible for pointers. You’re still responsible for keeping track of memory.

Eiffel is object-oriented all the way. You are not “responsible for pointers“. References are tame: no pointer arithmetic. You are not “responsible for keeping track of memory“:  objects are garbage-collected

“The C programmers loved to call their software a ‘portable assembly code’, and the same is true for Objective-C … except it’s only portable from the Mac to the iPad.”

“Portable assembly code” is exactly what C provides, and hence an excellent target for an Eiffel compiler. As to Eiffel, it runs on all platforms, from Windows to Linux to Solaris to VMS to the Mac.

Criticism 3: Stuck in the 80’s

Criticism 3: “Stuck in the ’80s

“Parachute pants, big hair, ‘The Breakfast Club’ — and the NeXT machine: Objective-C is like a time machine in programming-language land.”

Eiffel has undergone constant evolution, innovating on all fronts of programming constructs and integrating the best of known techniques.

“The primitives aren’t first-class citizens. Garbage collection, that wonderful idea that sustained Lisp, was adopted by Java ages ago. Objective-C got it in 2006. The same goes for properties and closures.”

All this has been in Eiffel forever. Agents (closures) were introduced in 1999, long before Java, C# and other OO languages had anything of the sort. Eiffel’s assigner commands are vastly superior to properties (no need to write all these boring getter functions).

 Criticism 4: “Punctuation

“The cool modern kids writing Python, Ruby, and CoffeeScript can craft billion-dollar companies without using brackets, braces, and parentheses. You’ll be wearing out your punctuation keys writing Objective-C. Colons, at-signs, asterisks? Is there any character that the language doesn’t use?”

Come on. How can one be so misinformed? The semicolon has been optional in Eiffel for fifteen years. The high-priest style of C, Objective-C, Java, C# and so many others, with its piling up of strange symbols, is something that Eiffel users never had to suffer.

Criticism 5: “Modern syntax

Not modern syntax, that is:

“Objective-C”s syntax is like Coke: They tried to modernize it in the ’90s, but it never stuck.”

Eiffel’s syntax is clear and simple. Total beginners, including high-school students, pick it up just as easily and naturally as advanced programmers, and as application experts who want to concentrate on their problem, not on learning strange language conventions going back to the nineteen-sixties.

Criticism 6: “No namespaces

Here Eiffel does not provide what the journalist wants: it is “post-namespaces” (as in “postmodern”). The Eiffel community has decided that the complexity of namespaces was not worth the trouble (what happens when you move packages around?) and prefers simple mechanisms for resolving class name clashes.

Criticism 7: “It only runs in Apple’s corner of the universe

” Variety is the spice of life. It’s even more important in a world where not everything is an iPhone. If a Windows or Linux shop recruits you, you can forget all of those extra Objective-C extensions you learned because they’ll be of no use.”

Eiffel is not tied to any manufacturer, computer architecture or operating system. If a new processor comes out, or a user needs an exotic platform, a port can usually be produced in a matter of hours. The compiler and the entire environment to which it belongs, EiffelStudio, are written in Eiffel; the supporting runtime is in a highly portable form of C, which requires very little customization, if any, for a new platform. (Here “the compiler” means the Eiffel Software implementation, but other implementations also put a strong emphasis on portability.)

Criticism 8: “XCode is your only choice

“In the Objective-C world, you get really only one choice. Why do you need to be different, comrade?”

Besides EiffelStudio other compilers and tools are available for Eiffel.

Criticism 9: “Apple’s benevolent dictatorship

“Do you want to give out more than 100 copies of your iPhone app? Forget it. Do you want to “think different” with your UI? Please go back and read the user interface guidelines. You can’t do anything without Apple’s permission because Apple uses strong crypto to lock down everything — and fanatically tyrannical policies to lock down the rest.”

The Eiffel language definition is steered by a standards committee under Ecma (the organization behind many of the major standards in IT), which anyone can join. EiffelStudio itself is available in open source. The Eiffel world knows nothing like the close control Apple exerts over its product; it welcomes all contributors.

Maybe someone should talk to Mr. Wayner and help him broaden his scope of programming language knowledge.

References

[1] Peter Wayner, 9 Things We Hate About Objective-C, InfoWorld, 4 June 2014, available here.

VN:F [1.9.10_1130]
Rating: 6.7/10 (17 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 8 votes)

Attached by default?

 

Opinions requested! See at end.

A void call, during the execution of an object-oriented program, is a call of the standard OO form

x·some_routine (…)                                                /CALL/

where x, a reference, happens to be void (null) instead of denoting, as expected, an object. The operation is not possible; it leads to an exception and, usually, a crash of the program. Void calls are also called “null pointer dereferencing”.

One of the major advances in Eiffel over the past years has been the introduction of attached types, entirely removing the risk of void calls. The language mechanisms, extending the type system, make void-call avoidance a static property, part of type checking: just as the compiler will prevent you from assigning a boolean value to an integer variable, so will it flag your program if it sees a risk of void call. Put the other way around, if your program passes compilation, you have the guarantee that its executions will never produce a void call. Attached types thus remove one of the major headaches of programming, what Tony Hoare [1] called his “one-billion-dollar mistake”:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W) [2]. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty year

Thanks to attached types, Eiffel programmers can sleep at night: their programs will not encounter void calls.

To benefit from this advance, you must declare variables accordingly, as either attached (never void after initialization) or detachable (possibly void). You must also write the program properly:

  • If you declare x attached, you must ensure in the rest of the program that before its first use x will have been attached to an object, for example through a creation instruction create x.
  • If you declare x detachable, you must make sure that any call of the above form /CALL/ happens in a context where x is guaranteed to be non-void; for example, you could protect it by a test if x /= Void then or, better, an “object test”.

Code satisfying these properties is called void-safe.

Void safety is the way to go: who wants to worry about programs, even after they have been thoroughly tested and have seemingly worked for a while, crashing at unpredictable times? The absence of null-pointer-dereferencing can be a statically  enforced property, as the experience of Eiffel now demonstrates; and that what it should be. One day, children will think void-safely from the most tender age, and their great-grandparents will tell them, around the fireplace during long and scary winter nights, about the old days when not everyone was programming in Eiffel and even those who did were worried about the sudden null-pointer-derefencing syndrome. To get void safety through ordinary x: PERSON declarations, you had (children, hold your breath) to turn on a compiler option!

The transition to void safety was neither fast nor easy; in fact, it has taken almost ten years. Not everyone was convinced from the beginning, and we have had to improve and simplify the mechanism along the way to make void-safe programming practical. Compatibility has been a key issue throughout: older classes are generally not void-safe, but in a language that has been around for many years and has a large code base of operational software it is essential to ensure a smooth transition. Void safety has, from its introduction, been controlled by a compiler option:

  • With the option off, old code will compile as it used to do, but you do not get any guarantee of void safety. At execution time, a void call can still cause your program to go berserk.
  • With the option on, you get the guarantee: no void calls. To achieve this goal, you have to make sure the classes obey the void safety rules; if they do not, the compiler will reject them until you fix the problem.

In the effort to reconcile the compatibility imperative with the inexorable evolution to void safety, the key decisions have affected default values for compiler options and language conventions. Three separate decisions, in fact. Two of the defaults have already been switched; the question asked at the end of this article addresses the switching of the last remaining one.

The first default governed the void-safety compiler option. On its introduction, void-safety was off by default; the mechanism had to be turned on explicitly, part of the “experimental” option that most EiffelStudio releases offer for new, tentative mechanisms. That particular decision changed a year ago, with version 7.3 (May 2013): now void safety is the default. To include non-void-safe code you must mark  it explicitly.

The second default affects a language convention: the meaning of a standard declaration. A typical declaration, such as

x: PERSON                                                                                      /A/

says that at run time x denotes a reference which, if not void, will be attached to an object of type PERSON.  In pre-void-safety Eiffel, as in today’s other typed OO languages,  the reference could occasionally become void at run time; in other words, x was detachable. With the introduction of void safety, you could emphasize this property by specifying it explicitly:

x: detachable PERSON                                                             /B/

You could also specify that x would never be void by declaring it attached, asking the compiler to guarantee this property for you (through its application of the void-safety rules to all operations involving x). The explicit form in this case is

x: attached PERSON                                                               /C/

In practical programming, of course, you do not want to specify attached or detachable all the time: you want to use the simple form /A/ as often as possible. Originally, since we were starting from a non-void-safe language, compatibility required /A/ to mean /B/ by default. But it turns out that “attached” really is the dominant case: most references should remain attached at all times and Void values should be reserved for important but highly specialized cases such as terminating linked data structures. So the simple form should, in the final state of the language, mean /C/. That particular default was indeed switched early (version 7.0, November 2011) for people using the void-safety compiler option. As a result, the attached keyword is no longer necessary for declarations such as the above, although it remains available. Everything is attached by default; when you want a reference that could be void (and are prepared to bear the responsibility for convincing the compiler that it won’t when you actually use it in a call), you declare it as detachable; that keyword remains necessary.

There remains one last step in the march to all-aboard-for-void-safety: removing the “detachable by default” option, that is to say, the compiler option that will make /A/ mean /B/ (rather than /C/). It is only an option, and not the default; but still it remains available. Do we truly need it? The argument for removing it  is that it simplifies the specification (the fewer options the better) and encourages everyone, even more than before, to move to the new world. The argument against is to avoid disturbing existing projects, including their compiler control files (ECFs).

The question looms: when do we switch the defaults? Some of us think the time is now; specifically, the November release (14.11) [4].

Do you think the option should go? We would like your opinion. Please participate in the Eiffelroom poll [5].

 

References and note

[1] C.A.R. Hoare: Null References: The Billion Dollar Mistake , abstract of talk at QCon London, 9-12 March 2009, available here.

[2] (BM note) As a consolation, before Algol W, LISP already had NIL, which is the null pointer.

[3] Bertrand Meyer, Alexander Kogtenkov and Emmanuel Stapf: Avoid a Void: The Eradication of Null Dereferencing, in Reflections on the Work of C.A.R. Hoare, eds. C. B. Jones, A.W. Roscoe and K.R. Wood, Springer-Verlag, 2010, pages 189-211, available here.

[4] EiffelStudio version numbering changed in 2014: from a classic major_number.minor_number to a plain year.month, with two principal releases, 5 and 11 (May and November).

[5] Poll on switching the attachment defaults: at the bottom of the Eiffelroom page here (direct access here).

VN:F [1.9.10_1130]
Rating: 8.6/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

New article: contracts in practice

For almost anyone programming in Eiffel, contracts are just a standard part of daily life; Patrice Chalin’s pioneering study of a few years ago [1] confirmed this impression. A larger empirical study is now available to understand how developers actually use contracts when available. The study, to published at FM 2014 [2] covers 21 programs, not just in Eiffel but also in JML and in Code Contracts for C#, totaling 830,000 lines of code, and following the program’s revision history for a grand total of 260 million lines of code over 7700 revisions. It analyzes in detail whether programmers use contracts, how they use them (in particular, which kinds, among preconditions, postconditions and invariants), how contracts evolve over time, and how inheritance interacts with contracts.

The paper is easy to read so I will refer you to it for the detailed conclusions, but one thing is clear: anyone who thinks contracts are for special development or special developers is completely off-track. In an environment supporting contracts, especially as a native part of the language, programmers understand their benefits and apply them as a matter of course.

References

[1] Patrice Chalin: Are practitioners writing contracts?, in Fault-Tolerant System, eds. Butler, Jones, Romanovsky, Troubitsyna, Springer LNCS, vol. 4157, pp. 100–113, 2006.

[2] H.-Christian Estler, Carlo A. Furia, Martin Nordio, Marco Piccioni and Bertrand Meyer: Contracts in Practice, to appear in proceedings of 19th International Symposium on Formal Methods (FM 2014), Singapore, May 2014, draft available here.

VN:F [1.9.10_1130]
Rating: 8.4/10 (11 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

Eiffel as an expression language

A functional-programming style, or more generally a style involving more expressions and fewer instructions, is possible in Eiffel. In particular, Eiffel’s agent mechanism embeds a full functional-programming mechanism in the object-oriented framework of the language.

To make the notations simpler, we are discussing and tentatively implementing a number of proposed extensions. They involve no fundamental new language mechanisms, but provide new, more concise notations for existing mechanisms. Examples are:

  • Conditional expressions.
  • Implicit tuple, a rule allowing the omission of brackets for an actual argument when it is a tuple and the last argument, e.g. f (x, y, z) as an abbreviation for f ([x, y, z]) (an example involving just one argument). Tuples already provided the equivalent of a variable-argument (“varargs”) facility, but it is made simpler to use with this convention.
  • Parenthesis alias, making it possible to write just f (x, y) when f is an agent (closure, lambda expression, delegate etc. in other terminologies), i.e. treating f as if it were a function; the notation is simply an abbreviation for f.item ([x, y]) (an example that also takes advantage of implicit tuples). It has many other applications since a “parenthesis alias” can be defined for a feature of any class.
  • Avoiding explicit assignments to Result.
  • Type inference (to avoid explicitly specifying the type when it can be deduced from the context). This is a facility for the programmer, useful in particular for local variables, but does not affect the type system: Eiffel remains strongly typed, it is just that you can be lazy about writing the type when there is no ambiguity.
  • In the same vein, omitting the entire list of generic parameters when it can be inferred.

The description of the mechanism (see the link in [1]) is in the form of a set of slides explaining the concepts and presenting example. This is a working document and feedback is welcome.

References

[1] Eiffel as an expression language, Eiffel Software working document, 2012-2014, see here.

VN:F [1.9.10_1130]
Rating: 7.8/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)

Negative variables: new version

I have mentioned this paper before (see the earlier blog entry here) but it is now going to be published [1] and has been significantly revised, both to take referee comments into account and because we found better ways to present the concepts.

We have  endeavored to explain better than in the draft why the concept of negative variable is necessary and why the usual techniques for modeling object-oriented programs do not work properly for the fundamental OO operation, qualified call x.r (…). These techniques are based on substitution and are simply unable to express certain properties (let alone verify them). The affected properties are those involving properties of the calling context or the global project structure.

The basic idea (repeated in part from the earlier post) is as follows. In modeling OO programs, we have to take into account the unique “general relativity” property of OO programming: all the operations you write are expressed relative to a “current object” which changes repeatedly during execution. More precisely at the start of a call x.r (…) and for the duration of that call the current object changes to whatever x denotes — but to determine that object we must again interpret x in the context of the previous current object. This raises a challenge for reasoning about programs; for example in a routine the notation f.some_reference, if f is a formal argument, refers to objects in the context of the calling object, and we cannot apply standard rules of substitution as in the non-OO style of handling calls.

We introduced a notion of negative variable to deal with this issue. During the execution of a call x.r (…) the negation of x , written x’, represents a back pointer to the calling object; negative variables are characterized by axiomatic properties such as x.x’= Current and x’.(old x)= Current.

Negative variable as back pointer

The paper explains why this concept is necessary, describes the associated formal rules, and presents applications.

Reference

[1] Bertrand Meyer and Alexander Kogtenkov: Negative Variables and the Essence of Object-Oriented Programming, to appear in Specification, Algebra, and Software, eds. Shusaku Iida, Jose Meseguer and Kazuhiro Ogata, Springer Lecture Notes in Computer Science, 2014, to appear. See text here.

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

New paper: alias calculus and frame inference

For a while now I have  been engaged in  a core problem of software verification: the aliasing problem. As with many difficult problems in science, it is easy to state the basic question: can we determine automatically whether at a program point p the values of two reference expressions e and f can ever denote the same object?

Alias analysis lies at the core of many problems in software analysis and verification.

Earlier work [2] I introduced an “alias calculus”. The calculus is a set of rules, attached to the constructs of the programming language, to compute the “alias relation”: the set of possibly aliased expression pairs. A new paper [1] with Sergey Velder and Alexander Kogtenkov improves the model (correcting in particular an error in the axiom for assignment, whose new version has been proved sound using Coq) and applies it to the inference of frame properties. Here the abstract:

Alias analysis, which determines whether two expressions in a program may reference to the same object, has many potential applications in program construction and verification. We have developed a theory for alias analysis, the “alias calculus”, implemented its application to an object-oriented language, and integrated the result into a modern IDE. The calculus has a higher level of precision than many existing alias analysis techniques. One of the principal applications is to allow automatic change analysis, which leads to inferring “modifies clauses”, providing a significant advance towards addressing the Frame Problem. Experiments were able to infer the “modifies” clauses of an existing formally specied library. Other applications, in particular to concurrent programming, also appear possible. The article presents the calculus, the application to frame inference including experimental results, and other projected applications. The ongoing work includes building more efficient model capturing aliasing properties and soundness proof for its essential elements.

This is not the end of the work, as better models and implementations are needed, but an important step.

References

[1] Sergey Velder, Alexander Kogtenkovand Bertrand Meyer: Alias Calculus, Frame Calculus and Frame Inference, in Science of Computer Programming, to appear in 2014 (appeared online 26 November 2013); draft available here, published version here.
[2] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, Chinese Academy of Sciences, 2011, pages 77-116, available here.

 

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

The laws of branching (part 1)

 

The first law of branching is: don’t. There is no other law.

The only sane way to develop software in a group, whether collocated or distributed, is to have a single branch (“trunk”) to which everyone commits changes, with constant running of the regression test suite to make sure that any breaking change is detected and corrected right away.

To allow branching, that is to say the emergence of separate lines of development with the expectation that they will be merged back later on, is to guarantee disaster. It is easy to diverge, but hard to converge; not only hard, but unpredictable. It can take days, weeks or more to reconcile changes and resolve conflicts, when the reason for the changes is no longer fresh in the developers’ memories, and the developers themselves may even no longer be there. Conflicts should be detected right away, and corrected immediately.

The EiffelStudio development never uses branches. A related development, EVE (Eiffel Verification Environment), maintained at ETH, includes all research tools, and is reconciled every Friday with the EiffelStudio trunk, so it is never allowed to diverge into a separate branch. Most other successful teams I know apply the first law of branching too. Solve conflicts before they have had the time to become conflicts.

VN:F [1.9.10_1130]
Rating: 5.1/10 (28 votes cast)
VN:F [1.9.10_1130]
Rating: -10 (from 20 votes)

Concurrency video

Our Concurrency Made Easy project, the result of an ERC Advanced Investigator Grant, is trying to solve the problem of making concurrent programming simple, reliable and effective. It has spurred related efforts, in particular the Roboscoop project applying concurrency to robotics software.

Sebastian Nanz and other members of the CME project at ETH have just produced a video that describes the aims of the project and presents some of the current achievements. The video is available on the CME project page [1] (also directly on YouTube [2]).

References

[1] Concurrency Made Easy project, here.

[2] YouTube CME video, here.

VN:F [1.9.10_1130]
Rating: 9.5/10 (17 votes cast)
VN:F [1.9.10_1130]
Rating: +13 (from 15 votes)

Reading notes: strong specifications are well worth the effort

 

This report continues the series of ICSE 2013 article previews (see the posts of these last few days, other than the DOSE announcement), but is different from its predecessors since it talks about a paper from our group at ETH, so you should not expect any dangerously delusional,  disingenuously dubious or downright deceptive declaration or display of dispassionate, disinterested, disengaged describer’s detachment.

The paper [1] (mentioned on this blog some time ago) is entitled How good are software specifications? and will be presented on Wednesday by Nadia Polikarpova. The basic result: stronger specifications, which capture a more complete part of program functionality, cause only a modest increase in specification effort, but the benefits are huge; in particular, automatic testing finds twice as many faults (“bugs” as recently reviewed papers call them).

Strong specifications are specifications that go beyond simple contracts. A straightforward example is a specification of a push operation for stacks; in EiffelBase, the basic Eiffel data structure library, the contract’s postcondition will read

item =                                          /A/
count = old count + 1

where x is the element being pushed, item the top of the stack and count the number of elements. It is of course sound, since it states that the element just pushed is now the new top of the stack, and that there is one more element; but it is also  incomplete since it says nothing about the other elements remaining as they were; an implementation could satisfy the contract and mess up with these elements. Using “complete” or “strong” preconditions, we associate with the underlying domain a theory [2], or “model”, represented by a specification-only feature in the class, model, denoting a sequence of elements; then it suffices (with the convention that the top is the first element of the model sequence, and that “+” denotes concatenation of sequences) to use the postcondition

model = <x> + old model         /B/

which says all there is to say and implies the original postconditions /A/.

Clearly, the strong contracts, in the  /B/ style, are more expressive [3, 4], but they also require more specification effort. Are they worth the trouble?

The paper explores this question empirically, and the answer, at least according to the criteria used in the study, is yes.  The work takes advantage of AutoTest [5], an automatic testing framework which relies on the contracts already present in the software to serve as test oracles, and generates test cases automatically. AutoTest was applied to both to the classic EiffelBase, with classic partial contracts in the /A/ style, and to the more recent EiffelBase+ library, with strong contracts in the /B/ style. AutoTest is for Eiffel programs; to check for any language-specificity in the results the work also included testing a smaller set of classes from a C# library, DSA, for which a student developed a version (DSA+) equipped with strong model-based contracts. In that case the testing tool was Microsoft Research’s Pex [7]. The results are similar for both languages: citing from the paper, “the fault rates are comparable in the C# experiments, respectively 6 . 10-3 and 3 . 10-3 . The fault complexity is also qualitatively similar.

The verdict on the effect of strong specifications as captured by automated testing is clear: the same automatic testing tools applied to the versions with strong contracts yield twice as many real faults. The term “real fault” comes from excluding spurious cases, such as specification faults (wrong specification, right implementation), which are a phenomenon worth studying but should not count as a benefit of the strong specification approach. The paper contains a detailed analysis of the various kinds of faults and the corresponding empirically determined measures. This particular analysis is for the Eiffel code, since in the C#/Pex case “it was not possible to get an evaluation of the faults by the original developers“.

In our experience the strong specifications are not that much harder to write. The paper contains a precise measure: about five person-weeks to create EiffelBase+, yielding an “overall benefit/effort ratio of about four defects detected per person-day“. Such a benefit more than justifies the effort. More study of that effort is needed, however, because the “person” in the person-weeks was not just an ordinary programmer. True, Eiffel experience has shown that most programmers quickly get the notion of contract and start applying it; as the saying goes in the community, “if you can write an if-then-else, you can write a contract”. But we do not yet have significant evidence of whether that observation extends to model-based contracts.

Model-based contracts (I prefer to call them “theory-based” because “model” means so many other things, but I do not think I will win that particular battle) are, in my opinion, a required component of the march towards program verification. They are the right compromise between simple contracts, which have proved to be attractive to many practicing programmers but suffer from incompleteness, and full formal specification à la Z, which say everything but require too much machinery. They are not an all-or-nothing specification technique but a progressive one: programmers can start with simple contracts, then extend and refine them as desired to yield exactly the right amount of precision and completeness appropriate in any particular context. The article shows that the benefits are well worth the incremental effort.

According to the ICSE program the talk will be presented in the formal specification session, Wednesday, May 22, 13:30-15:30, Grand Ballroom C.

References

[1] Nadia Polikarpova, Carlo A. Furia, Yu Pei, Yi Wei and Bertrand Meyer: What Good Are Strong Specifications?, to appear in ICSE 2013 (Proceedings of 35th International Conference on Software Engineering), San Francisco, May 2013, draft available here.

[2] Bertrand Meyer: Domain Theory: the forgotten step in program verification, article on this blog, see here.

[3] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[4] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ‘ 10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[5] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, also available here.

[6] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, in IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, also available here.

[7] Nikolai Tillman and Peli de Halleux, Pex: White-Box Generation for .NET, in Tests And Proofs (TAP 2008), pp. 134-153.

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Adult entertainment

 

I should occasionally present examples of the strange reasons people sometimes invoke for not using Eiffel. In an earlier article [1] I gave the basic idea common to all these reasons, but there are many variants, in the general style “I am responsible for IT policy and purchases for IBM, the US Department of Defense and Nikke, and was about to sign the PO for the triple site license when I noticed that an article about Eiffel was published in 1997. How dare you! I had a tooth removed that year and it hurt a lot. I would really have liked to use Eiffel but you just made it impossible“.

While going through old email I found one of these carefully motivated strategic policy decisions: a missing “L” in a class name. Below is, verbatim [2], a message posted on the EiffelStudio developers list in 2006, and my answer. Also provides an interesting glimpse of what supposedly grown-up people find it worthwhile to spend their days on.

Original message

From: es-devel-bounces@origo.ethz.ch [mailto:es-devel-bounces@origo.ethz.ch] On Behalf Of Peter Gummer
Sent: Tuesday, 29 August, 2006 14:01
To: es-devel@origo.ethz.ch
Subject: [Es-devel] Misspelling as a naming convention
From: es-devel-bounces@origo.ethz.ch [mailto:es-devel-bounces@origo.ethz.ch] On Behalf Of Peter Gummer

Today I submitted a problem report that one of the EiffelVision classes has misspelt “tabbable” as “tabable“. Manu replied that the EiffelVision naming convention is that class or feature names ending in “able” will not double the preceding consonant, regardless of whether this results in wrong spelling.

Looking at the latest Es-changes Digest email, I see various changes implementing this naming convention. For example, the comment for revision 63043 is, “Changed from controllable to controlable to meet naming convention‘.

This is lunacy! “Controlable” (implying the existence of some verb “to controle“) might look quite ok to French eyes, but it looks utterly unprofessional to me. It does have a sort of Chaucerian, Middle English, pre-Gutenberg charm I suppose. Is this part of a plot for a Seconde Invasion Normande of the Langue Anglaise?

We are about to embark on some GUI work. Although we are probably going to use .NET WinForms, EiffelVision was a possible choice. But bad spelling puts me in a bad mood. I’d be very reluctant to work with EiffelVision because of this ridiculous naming convention.

– Peter Gummer

Answer

From: Bertrand Meyer
Sent: Wednesday, 30 August, 2006 00:52
To: Peter Gummer
Cc: es-devel@origo.ethz.ch
Subject: Re: [Es-devel] Misspelling as a naming convention

This has nothing to do with French. If anything, French practices the doubling of consonants before a suffix more than English does; an example (extracted from reports of users’ attitudes towards EiffelVision) is English “passionate“, French “passionné“. For the record, there’s no particular French dominance in the Eiffel development team, either at Eiffel Software or elsewhere. The recent discussion on EiffelVision’s “-able” class names involved one native English speaker out of three people, invalidating at the 33% level Kristen Nygaard’s observation that the language of science is English as spoken by foreigners.

The problem in English is that the rules defining which consonants should be doubled before a suffix such as “able” are not obvious. See for example this page from the University of Ottawa:

Double the final consonant before a suffix beginning with a vowel if both of the following are true: the consonant ends a stressed syllable or a one-syllable word, and the consonant is preceded by a single vowel.

Now close your eyes and repeat this from memory. I am sure that won’t be hard because you knew the rule all along, but can we expect this from all programmers using EiffelVision?

Another Web page , from a school in Oxfordshire, England, says:

Rule: Double the last consonant when adding a vowel suffix to a single syllable word ending in one vowel and one consonant.

Note that this is not quite the same rule; it doesn’t cover multi-syllable words with the stress (tonic accent) on the last syllable; and it would suggest “GROUPPABLE” (“group” is a one-syllable word ending in one vowel and one consonant), whereas the first rule correctly prescribes “GROUPABLE“. But apparently this is what is being taught to Oxfordshire pupils, whom we should stand ready to welcome as Eiffel programmers in a few years.

Both rules yield “TRANSFERABLE” because the stress is on the first syllable of “transfer“. But various dictionaries we have consulted also list “TRANSFERRABLE” and “TRANSFERRIBLE“.

As another example consider “FORMATING“. Both rules suggest a single “t“. The Solaris spell checker indeed rejects the form with two “t“s and accepts the version with one; but — a case of Unix-Windows incompatibility that seems so far to have escaped the attention of textbook authors — Microsoft Word does the reverse! In fact in default mode if you type “FORMATING” it silently corrects it to “FORMATTING“. It’s interesting in this example to note a change of tonic accent between the original and derived words: “fórmat” (both noun and verb) but “formáting“. Maybe the Word convention follows the “Ottawa” rule but by considering the stress in the derivation rather than the root? There might be a master’s thesis topic in this somewhere.

Both rules imply “MIXXABLE” and “FIXXABLE“, but we haven’t found a dictionary that accepts either of these forms.

Such rules cannot cover all cases anyway (they are “UNFATHOMMABLE“) because “consonant” vs “vowel” is a lexical distinction that doesn’t reflect the subtleties of English pronunciation. For example either rule would lead to “DRAWWABLE” because the word “draw” ends with “w“, a letter that you’ll find everywhere characterized as a consonant. Lexically it is a consonant, but phonetically it is sometimes a consonant and sometimes not, in particular at the end of a word. In “Wow“, the first “w” is a consonant, but not the second one. A valid rule would need to take into account not only spelling but also pronunciation. This is probably the reason behind the examples involving words ending in “x“: phonetically “X” can be considered two consonants, “KS“. But then the rule becomes more tricky, forcing the inquirer, who is understandably getting quite “PERPLEXXED” at this stage, to combine lexical and phonetic reasoning in appropriate doses.

No wonder then a page from the Oxford Dictionaries site states:

One of the most common types of spelling error is a mistake over whether a word is spelled with a double or a single consonant.

and goes on to list many examples.

You can find a list of of words ending in “ablehere . Here are a few cases involving derivations from a word ending with “p“:

Single consonant
DEVELOPABLE
GRASPABLE
GROUPABLE
HELPABLE
KEEPABLE
REAPABLE
RECOUPABLE

Doubled consonant
DIPPABLE
DROPPABLE (but: DRAPABLE)
FLOPPABLE
MAPPABLE
RECAPPABLE (but: CAPABLE)
RIPPABLE (but: ROPABLE)
SHIPPABLE
SKIPPABLE
STOPPABLE
STRIPPABLE
TIPPABLE

There are also differences between British and American usage.

True, the “Ottawa” rule could be amended to take into account words ending in “w“, “x“, “h” and a few other letters, and come reasonably close to matching dictionary practice. But should programmers have to remember all this? Will they?

Since we are dealing in part with artificial words, there is also some doubt as to what constitutes a “misspelt” word as you call it (or a “misspelled” one as Eiffel conventions — based on American English — would have it). Applying the rule yields “MAPPABLE“, which is indeed found in dictionaries. But in the world of graphics we have the term “bitmap“, where the stress is on the first syllable. The rule then yields “BITMAPABLE“. That’s suspicious but “GOOGLABLE“; a search produces 31 “BITMAPPABLE” and two “BITMAPABLE“, one of which qualified by “(Is that a word?)”. So either EiffelVision uses something that looks inconsistent (“BITMAPABLE” vs “MAPPABLE“) but follows the rule; or we decide for consistency.

In our view this case can be generalized. The best convention is the one that doesn’t require programmers to remember delicate and sometimes fuzzy rules of English spelling, but standardizes on a naming convention that will be as easy to remember as possible. To achieve this goal the key is consistency. A simple rule for EiffelVision classes is:

  • For an “-able” name derived from a word ending with “e“, drop the “e“: REUSABLE. There seems to be no case of words ending with another vowel.
  • If the name is derived from a word ending with a consonant, just add “able“: CONTROLABLE, TOOLTIPABLE, GROUPABLE.

Some of these might look strange the first couple of times but from then on you will remember the convention.

While we are flattered that EiffelVision should be treated as literature, we must admit that there are better recommendations for beach reading, and that Eiffel is not English (or French).

The above rule is just a convention and someone may have a better suggestion.

With best regards,

— Bertrand Meyer, Ian King, Emmanuel Stapf

Reference and note

[1] Habit, happiness and programming languages, article in this blog, 22 October 2012, see here.

[2] I checked the URLs, found that two pages have disappeared since 2006, and replaced them with others having the same content. The formatting (fonts, some of the indentation) is added. Peter Gummer asked me to make sure that his name always appears with two “m“.

VN:F [1.9.10_1130]
Rating: 10.0/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Bringing C code to the modern world

The C2Eif translator developed by Marco Trudel takes C code and translates it into Eiffel; it produces not just a literal translation but a re-engineering version exhibiting object-oriented properties. Trudel defended his PhD thesis last Friday at ETH (the examiners were Hausi Muller from Victoria University, Manuel Oriol from ABB, Richard Paige from the University of York,  and me as the advisor). The thesis is not yet available online but earlier papers describing C2Eif are, all reachable from the project’s home page [1].

At issue is what we do with legacy code. “J’ai plus de souvenirs que si j’avais mille ans”, wrote Charles Baudelaire in Les Fleurs du Mal (“Spleen de Paris”). The software industry is not a thousand years old, but has accumulated even more “souvenirs” than

A heavy chest of drawers cluttered with balance-sheets,
Poems, love letters, lawsuits, romances
And heavy locks of hair wrapped in invoices
.

We are suffocating under layers of legacy code heaped up by previous generations of programmers using languages that no longer meet our scientific and engineering standards. We cannot get rid of this heritage; how do we bring it to the modern world? We need automatic tools to wrap it in contemporary code, or, better, translate it into contemporary code. The thesis and the system offer a way out through translation to a modern object-oriented language. It took courage to choose such a topic, since there have been many attempts in the past, leading to conventional wisdom consisting of two strongly established opinions:

  • Plain translation: it has been tried, and it works. Not interesting for a thesis.
  • Object-oriented reengineering: it has been tried, and it does not work. Not realistic for a thesis.

Both are wrong. For translation, many of the proposed solutions “almost work”: they are good enough to translate simple programs, or even some large programs but on the condition that the code avoids murky areas of C programming such as signals, exceptions (setjmp/longjmp) and library mechanisms. In practice, however, most useful C programs need these facilities, so any tool that ignores them is bound to be of conceptual value only. The basis for Trudel’s work has been to tackle C to OO translation “beyond the easy stuff” (as stated in the title of one of the published papers). This effort has been largely successful, as demonstrated by the translation of close to a million lines of actual C code, including some well-known and representative tools such as the Vim editor.

As to OO reengineering, C2Eif makes a serious effort to derive code that exhibits a true object-oriented design and hence resembles, in its structure at least, what a programmer in the target language might produce. The key is to identify the right data abstractions, yielding classes, and specialization properties, yielding inheritance. In this area too, many people have tried to come up with solutions, with little success. Trudel has had the good sense of avoiding grandiose goals and sticking to a number of heuristics that work, such as looking at the signatures of a set of functions to see if they all involve a common argument type. Clearly there is more to be done in this direction but the result is already significant.

Since Eiffel has a sophisticated C interface it is also possible to wrap existing code; some tools are available for that purpose, such as Andreas Leitner’s EWG (Eiffel Wrapper Generator). Wrapping and translating each have their advantages and limitations; wrapping may be more appropriate for C libraries that someone else is still actively updating  (so that you do not have to redo a translation with every new release), and translation for legacy code that you want to take over and bring up to par with the rest of your software. C2Eif is engineered to support both. More generally, this is a practitioner’s tool, devoting considerable attention to the many details that make all the difference between a nice idea and a tool that really works. The emphasis is on full automation, although more parametrization has been added in recent months.

C2Eif will make a big mark on the Eiffel developer community. Try it yourself — and don’t be shy about telling its author about the future directions in which you think the tool should evolve.

Reference

[1] C2Eif project page, here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (13 votes cast)
VN:F [1.9.10_1130]
Rating: +8 (from 8 votes)

How good are strong specifications? (New paper, ICSE 2013)

 

A core aspect of our verification work is the use of “strong” contracts, which express sophisticated specification properties without requiring a separate specification language: even for advanced properties, there is no need for a separate specification language, with special notations such as those of first-order logic; instead, one can continue to rely, in the tradition of Design by Contract, on the built-in notations of the programming language, Eiffel.

This is the idea of domain theory, as discussed in earlier posts on this blog, in particular [1]. An early description of the approach, part of Bernd Schoeller’s PhD thesis work, was [2]; the next step was [3], presented at VSTTE in 2010.

A new paper to be presented at ICSE in May [3], part of an effort led by Nadia Polikarpova for her own thesis in progress, shows new advances in using strong specifications, demonstrating their expressive power and submitting them to empirical evaluation. The results show in particular that strong specifications justify the extra effort; in particular they enable automatic tests to find significantly more bugs.

A byproduct of this work is to show again the complementarity between various forms of verification, including not only proofs but (particularly in the contribution of two of the co-authors, Yi Wei and Yu Pei, as well as Carlo Furia) tests.

References

[1] Bertrand Meyer: Domain Theory: the forgotten step in program verification, article on this blog, see here.

[2] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[3] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ‘ 10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[4] Nadia Polikarpova, Carlo A. Furia, Yu Pei, Yi Wei and Bertrand Meyer: What Good Are Strong Specifications?, to appear in ICSE 2013 (Proceedings of 35th International Conference on Software Engineering), San Francisco, May 2013, draft available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Multirequirements (new paper)

 

As part of a Festschrift volume for Martin Glinz of the university of Zurich I wrote a paper [1] describing a general approach to requirements that I have been practicing and developing for a while, and presented in a couple of talks. The basic idea is to rely on object-oriented techniques, including contracts for the semantics, and to weave several levels of discourse: natural-language, formal and graphical.

Reference

[1] Bertrand Meyer: Multirequirements, to appear in Martin Glinz Festschrift, eds. Anne Koziolek and Norbert Scheyff, 2013, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Negative variables and the essence of object-oriented programming (new paper)

In modeling object-oriented programs, for purposes of verification (proofs) or merely for a better understanding, we are faced with the unique “general relativity” property of OO programming: all the operations you write (excluding non-OO mechanisms such as static functions) are expressed relative to a “current object” which changes repeatedly during execution. More precisely at the start of a call x.r (…) and for the duration of that call the current object changes to whatever x denotes — but to determine that object we must again interpret x in the context of the previous current object. This raises a challenge for reasoning about programs; for example in a routine the notation f.some_reference, if f is a formal argument, refers to objects in the context of the calling object, and we cannot apply standard rules of substitution as in the non-OO style of handling calls.

In earlier work [1, 2] initially motivated by the development of the Alias Calculus, I introduced a notion of negative variable to deal with this issue. During the execution of a call x.r (…) the negation of x , written x’, represents a back pointer to the calling object; negative variables are characterized by axiomatic properties such as x.x’= Current and x’.(old x)= Current. Alexander Kogtenkov has implemented these ideas and refined them.

Negative variable as back pointer

In a recent paper under submission [3], we review the concepts and applications of negative variables.

References

[1] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, 2011, available here.

[2] Bertrand Meyer: Towards a Calculus of Object Programs, in Patterns, Programming and Everything, Judith Bishop Festschrift, eds. Karin Breitman and Nigel Horspool, Springer-Verlag, 2012, pages 91-128, available here.

[3] Bertrand Meyer and Alexander Kogtenkov: Negative Variables and the Essence of Object-Oriented Programming, submitted for publication, 2012. [Updated 13 January 2014: I have removed the link to the draft mentioned in this post since it is now superseded by the new version, soon to be published, and available here.]

VN:F [1.9.10_1130]
Rating: 9.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

Loop invariants: the musical

 

Actually it is not a musical but an extensive survey. I have long been fascinated by the notion of loop invariant, which describes the essence of a loop. Considering a loop without its invariant is like conducting an orchestra without a score.

In this submitted survey paper written with Sergey Velder and Carlo Furia [1], we study loop invariants in depth and describe many algorithms from diverse areas of computer science through their invariants. For simplicity and clarity, the specification technique uses the Domain Theory technique described in an earlier article on this blog [2] (see also [3]). The invariants were verified mechanically using Boogie, a sign of how much more realistic verification technology has become in recent years.

The survey was a major effort (we worked on it for a year and a half); it is not perfect but we hope it will prove useful in the understanding, teaching and verification of important algorithms.

Here is the article’s abstract:

At the heart of every loop, and hence of all significant algorithms, lies a loop invariant: a property ensured by the initialization and maintained by every iteration so that, when combined with the exit condition, it yields the loop’s final effect. Identifying the invariant of every loop is not only a required step for software verification, but also a key requirement for understanding the loop and the program to which it belongs. The systematic study of loop invariants of important algorithms can, as a consequence, yield insights into the nature of software.

We performed this study over a wide range of fundamental algorithms from diverse areas of computer science. We analyze the patterns according to which invariants are derived from postconditions, propose a classification of invariants according to these patterns, and present its application to the algorithms reviewed. The discussion also shows the need for high-level specification and invariants based on “domain theory”. The included invariants and the corresponding algorithms have been mechanically verified using an automatic program prover. Along with the classification and applications, the conclusions include suggestions for automatic invariant inference and general techniques for model-based specification.

 

References

[1] Carlo Furia, Bertrand Meyer and Sergey Velder: Loop invariants: analysis, classification, and examples, submitted for publication, December 2012, draft available here.

[2] Domain Theory: the Forgotten Step in Program Verification, article from this blog, 11 April 2012, available here.

[3] Domain Theory: Precedents, article from this blog, 11 April 2012, available here

VN:F [1.9.10_1130]
Rating: 9.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Habit, happiness, and programming languages

One of the occupational hazards of spreading the word about Eiffel is the frequent answer “yes, it’s much better than the language I use now, I would like to switch, but…“, followed by some sheepish excuse.

Last night I went to see Eugene Onegin once more (I still hope some day to land the part of Monsieur Triquet). Towards the beginning of the first act [1], Tatiana’s mother (Larina), reflecting in a melancholic tone on the vicissitudes of her (long ago) arranged marriage (and (amazingly) anticipating the very fate (as sketched in the last act) of her own daughter (talking about (amazing) anticipation, is there any other similarly hair-raising case of an author (here of the text behind the libretto) so presciently staging the (exact (down to the very last details)) story of his own future tragic death) but enough digressions (sorry (this is supposed (after all) to be (although it is not the first time (and probably not the last either) it strays from the script) a technology blog))), sings

From above, we were given habit:
It is a substitute for happiness

Is this not exactly the excuse?

Reference

[1] Libretto of Onegin, in English here, in the original there.

 

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

EIS: Putting into Practice the Single Model Principle

Since release 6.2 (November 2008) EiffelStudio has included the EIS system, Eiffel Information System. It has been regularly revised, and significantly improved for the recent 7.1 release.

For us EIS is a key contribution with far-reaching software engineering implications, but many users seem unaware of it, perhaps because we have not been explicit enough about why we think it is important. We would love to have more people try it and give us their feedback. (Please make sure to use the 7.1 version.) Information on EIS can be found in the documentation [1] and also in a blog entry by Tao Feng [2].

EIS connects an Eiffel system with external documents in arbitrary formats; examples of formats currently supported are Microsoft Word and PDF, but you can easily add protocols. Such a connection links an element of the Eiffel text, such as a feature, with an element of the external document, such as a paragraph. Then clicking the Eiffel element in EiffelStudio will open the document at the corresponding place in the external tool (Word, Acrobat etc.); this is the EIS “outgoing” mechanism. Conversely the external element has a back link: clicking in the external tool will open EiffelStudio at the right place; this is the EIS “incoming” mechanism.

For the outgoing mechanism, the link will appear as part of a note clause (with attributes filled by default, you need only edit the URL and any option that you wish to change):

EIS incoming note

The fundamental idea behind EIS is to support the seamless form of software development promoted and permitted by Eiffel, where all phases of a project’s lifecycle are closely linked and the code provides the ultimate reference. Since other documents are often involved, in particular a requirements document (SRS, Software Requirements Specification), it is essential to record their precise associations with elements of the software text. For example a paragraph in the SRS could state that “Whenever the tank temperature reaches 50 degrees, the valve shall be closed”. In the software text, there will be some feature, for example monitor_temperature in the class TANK, reflecting this requirement. The two elements should be linked, in particular to ensure that dependencies appear clearly and that any change in either the requirements or the code triggers the corresponding update to the other side. This is what EIS provides.

We envision further tools to track dependencies and in particular to warn users if an element of a connection (e.g. requirement or code) changes, alerting them to the need to check the linked elements on the other side. One of the key goals here is traceability: effective project management, particular during the evolution of a system, requires that all dependencies between the project’s artifact are properly recorded so that it is possible to find out the consequences of any change, proposed or carried out.

The general approach reflects the essential nature of Eiffel development, with its Single Product Principle linking all elements of a software system and minimizing, rather than exaggerating, the inevitable differences of levels of abstraction between requirements, design, code, test plans, test logs, schedules and all the other products of a software project. The core problem of software engineering is change: if we use different tools and notations at each step, and keep the documents separate, we constantly run the risk of divergence between intent and reality. Eiffel by itself offers a good part of the solution by providing a single method (with all its principles, from Design by Contract to open-closed etc.), a single notation (the Eiffel language itself) and a single integrated set of tools (the EiffelStudio IDE) supporting the entire lifecycle; the language, in particular is meant for requirements and design as much as for implementation. The graphical forms (BON and UML, as produced by the Diagram Tool of EiffelStudio in a roundtrip style, i.e. changes to the diagram immediately generate code and changes to the code are reflected in the diagram) directly support these ideas. Of course documents in other formalisms, for example SRS, remain necessary for human consumption; but they should be closely linked to the core project asset, the Eiffel code; hence the need for EIS and its connection mechanisms.

This approach, as I have often noted when presenting it in public, is hard to convey to people steeped in the mindset of the past (UML as separate from code, model-driven development) which magnify the differences between software levels, hence introducing the risk of divergence and making change painful. The Eiffel approach is innovative enough to cause incomprehension or even rejection. (“What, you are not model-driven, but everyone says model-driven is good!” – well, models are bad if they are inaccurate. In the Eiffel approach the model and the program are the same thing, or more precisely the model is the abstract view of the program, obtained through abstraction mechanisms such as deferred classes with contracts and the “contract view” tool of EiffelStudio.)

To be effective, these ideas require proper tool support, for which EIS is a start. But we would like to know if we are on the right track and hence need feedback. We would be grateful if you could try out EIS and tell us what you think, both about the current state of the mechanism and its long-term prospects in the general framework of high-quality, sustainable software development.

References

[1] EIS documentation, here.

[2] Tao Feng, Start using Eiffel Information System, Eiffelroom blog entry of 17 April 2008, available here.

VN:F [1.9.10_1130]
Rating: 9.8/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Positions open at ETH in concurrency and verification

We have positions open at both the PhD and Postdoc levels at the Chair of Software Engineering at ETH Zurich, the Chair of Software Engineering. As noted in an earlier article, I recently
received an Advanced Investigator Grant from the European Research Council (5 years, 2.5 million euros) on the theme “Concurrency Made Easy”; see [1] for the project description. We are also recruiting for our effort of building an advanced verification environment around Eiffel (EVE, Eiffel Verification Environment), under the slogan “Verification As a Matter of Course”; see e.g. the slides at [2], although the details are no longer up to date.

Requirements:

  • Excellent background in verification.
  • For the concurrency project, excellent background in concurrency.
  • Good publications on relevant topics (particularly for the postdoc positions).
  • Excellent mastery of object-oriented programming, including Eiffel concepts & Design by Contract.
  • Mix of theoretical and practical (software development) talents.
  • Passion for research and determination to advance the state of software engineering.

Please send a CV to this address; also include a position statement describing (briefly, half a page to two pages) on what topics you would like to work and what you think you can contribute. Obviously you should take some time to become familiar with our work , starting from the research pages at [3].

References

[1] Concurrency Made Easy project description, here.

[2] Slides of a talk at SAC, here.

[3] Home page of ETH Chair of Software Engineering, here.

VN:F [1.9.10_1130]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

A carefully designed Result

 

In the Eiffel user discussion group [1], Ian Joyner recently asked:

A lot of people are now using Result as a variable name for the return value in many languages. I believe this first came from Eiffel, but can’t find proof. Or was it adopted from an earlier language?

Proof I cannot offer, but certainly my recollection is that the mechanism was an original design and not based on any previous language. (Many of Eiffel’s mechanisms were inspired by other languages, which I have always acknowledged as precisely as I could, but this is not one of them. If there is any earlier language with this convention — in which case a reader will certainly tell me — I was and so far am not aware of it.)

The competing conventions are a return instruction, as in C and languages based on it (C++, Java, C#), and Fortran’s practice, also used in Pascal, of using the function name as a variable within the function body. Neither is satisfactory. The return instruction suffers from two deficiencies:

  • It is an extreme form of goto, jumping out of a function from anywhere in its control structure. The rest of the language sticks to one-entry, one-exit structures, as I think all languages should.
  • In most non-trivial cases the return value is not just a simple formula but has to be computed through some algorithm, requiring the declaration of a local variable just to denote that result. In every case the programmer must invent a name for that variable and, in a typed language, include a declaration. This is tedious and suggests that the language should take care of the declaration for the programmer.

The Fortran-Pascal convention does not combine well with recursion (which Fortran for a long time did not support). In the body of the function, an occurrence of the function’s name can denote the result, or it can denote a recursive call; conventions can be defined to remove the ambiguity, but they are messy, especially for a function without arguments: in function f, does the instruction

f := f + 1

add one to the value of the function’s result as computed so far, as it would if f were an ordinary variable, or to the result of calling f recursively?

Another problem with the Fortran-Pascal approach is that in the absence of a language-defined rule for variable initialization a function can return an undefined result, if some path has failed to initialize the corresponding variable.

The Eiffel design addresses these problems. It combines several ideas:

  • No nesting of routines. This condition is essential because without it the name Result would be ambiguous. In all Algol- and Pascal-like languages it was considered really cool to be able to declare routines within routines, without limitation on the depth of recursion. I realized that in an object-oriented language such a mechanism was useless and in fact harmful: a class should be a collection of features — services offered to the rest of the world — and it would be confusing to define features within features. Simula 67 offered such a facility; I wrote an analysis of inter-module relations in Simula, including inheritance and all the mechanisms retained from Algol such as nesting (I am trying to find that document, and if I do I will post it in this blog); my conclusion was the result was too complicated and that the main culprit was nesting. Requiring classes to be flat structures was, in my opinion, one of the most effective design decisions for Eiffel.
  • Language-defined initialization. Even a passing experience with C and C++ shows that uninitialized variables are one of the major sources of bugs. Eiffel introduced a systematic rule for all variables, including Result, and it is good to see that some subsequent languages such as Java have retained that convention. For a function result, it is common to ignore the default case, relying on the standard initialization, as in if “interesting case” then Result:= “interesting value” end without an else clause (I like this convention, but some people prefer to make all cases explicit).
  • One-entry, one-exit blocks; no goto in overt or covert form (break, continue etc.).
  • Design by Contract mechanisms: postconditions usually need to refer to the result computed by a function.

The convention is then simple: in any function, you can use a language-defined local variable Result for you, of the type that you declared for the function result; you can use it as a normal variable, and the result returned by any particular call will be the final value of the variable on exit from the function body.

The convention has been widely imitated, starting with Delphi and most recently in Microsoft’s “code contracts”, a kind of poor-man’s Design by Contract emulation, achieved through libraries; it requires a Result notation to denote the function result in a postcondition, although this notation is unrelated to the mechanisms in the target languages such as C#. As the example of Eiffel’s design illustrates, a programming language is a delicate construction where all elements should fit together; the Result convention relies on many other essential concepts of the language, and in turn makes them possible.

Reference

[1] Eiffel Software discussion group, here.

VN:F [1.9.10_1130]
Rating: 8.3/10 (12 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 9 votes)

New LASER proceedings

Springer has just published in the tutorial sub-series of Lecture Notes in Computer Science a new proceedings volume for the LASER summer school [1]. The five chapters are notes from the 2008, 2009 and 2010 schools (a previous volume [2] covered earlier schools). The themes range over search-based software engineering (Mark Harman and colleagues), replication of software engineering experiments (Natalia Juristo and Omar Gómez), integration of testing and formal analysis (Mauro Pezzè and colleagues), and, in two papers by our ETH group, Is branch coverage a good measure of testing effectiveness (with Yi Wei and Manuel Oriol — answer: not really!) and a formal reference for SCOOP (with Benjamin Morandi and Sebastian Nanz).

The idea of these LASER tutorial books — which are now a tradition, with the volume from the 2011 school currently in preparation — is to collect material from the presentations at the summer school, prepared by the lecturers themselves, sometimes in collaboration with some of the participants. Reading them is not quite as fun as attending the school, but it gives an idea.

The 2012 school is in full preparation, on the theme of “Advanced Languages for Software Engineering” and with once again an exceptional roster of speakers, or should I say an exceptional roster of exceptional speakers: Guido van Rossum (Python), Ivar Jacobson (from UML to Semat), Simon Peyton-Jones (Haskell), Roberto Ierusalimschy (Lua), Martin Odersky (Scala), Andrei Alexandrescu (C++ and D),Erik Meijer (C# and LINQ), plus me on the design and evolution of Eiffel.

The preparation of LASER 2012 is under way, with registration now open [3]; the school will take place from Sept. 2 to Sept. 8 and, like its predecessors, in the wonderful setting on the island of Elba, off the coast of Tuscany, with a very dense technical program but time for enjoying the beach, the amenities of a 4-star hotel and the many treasures of the island. On the other hand not everyone likes Italy, the sun, the Mediterranean etc.; that’s fine too, you can wait for the 2013 proceedings.

References

[1] Bertrand Meyer and Martin Nordio (eds): Empirical Software Engineering and Verification, International Summer Schools LASER 2008-2010, Elba Island, Italy, Revised Tutorial Lectures, Springer Verlag, Lecture Notes in Computer Science 7007, Springer-Verlag, 2012, see here.

[2] Peter Müller (ed.): Advanced Lectures on Software Engineering, LASER Summer School 2007-2008, Springer Verlag, Lecture Notes in Computer Science 7007, Springer-Verlag, 2012, see here.

[3] LASER summer school information and registration form, http://se.ethz.ch/laser.

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Never design a language

It is a common occurrence in software development. Someone says: “We should design a language”. The usual context is that some part of the development requires a rich functionality set, and it appears appropriate to provide a flexible solution through a specialized language. As an example, in the development of an airline’s frequent flyer program on which I once worked the suggestion came to design a “Flyer Award Language” , with instructions appropriate for that application domain: record a trip, redeem an award, provide a statement of available miles and so on. A common term for such notations is DSL, for Domain-Specific Language.

Designing a language in such a context is almost always a bad idea (and I am not sure why I wrote “almost”). Languages are endless objects of discussion, usually on the least important aspects, which are also the most visible and those on which everyone has a strong opinion: concrete syntactic properties. People might pretend otherwise (“let’s not get bogged down on syntax, this is just one possible form”) but syntax is what the discussions will get bogged down to — keywords or symbols, this order or that order of operands, one instruction with several variants vs. several instructions… — at the expense of discussing the fundamental issues of functionality.

Worse yet, even if a language will be part of the solution it is usually just one facet to the solution. As was already explained in detail in [1], any useful functionality set will naturally be useful through several interfaces: a textual notation with concrete syntax may be one of them, but other possible ones include an API (Abstract Program Interface) for use from other software elements, a Graphical User Interface, a web user interface, yet another for web services (typically WSDL or some other XML or JSON format).

In such cases, starting with a concrete textual language is pretty silly, since it cannot yield the others directly (it would have to be parsed and further analyzed, which does not make sense). Of all the kinds of interface listed, the most fundamental one is the API: it describes the raw functionality, excluding any choice of syntax but including, thanks to contracts, elements of semantics. For example, a class AWARD in our frequent flyer application might include the feature


             redeem_for_upgrade (c: CUSTOMER; f : FLIGHT)
                                     — Upgrade c to next class of service on f.
                       require
                                    c /= holder
implies holder.allowed_substitute (c)
                                    f.permitted_for_upgrade
(Current)
                                    c.booked
( f )
                       
ensure
                                    c.class_of_service
( f ) =  old c.class_of_service ( f ) + 1

There is of course no implementation as this declaration only specifies an interface, but it says what needs to be said: to redeem the award for an upgrade, the intended customer must be either the holder of the award or an allowed substitute; the flight must be available for an upgrade with the current award (including the availability of enough miles); the intended customer must already be booked on the flight; and the upgrade will be for the next class of service.

These details are the kind of things that need to be discussed and agreed before the API is finalized. Then one can start discussing about a textual form (a DSL), a graphical interface, a web services interface. They all consist of relatively simple layers to be superimposed on a solidly defined and precisely specified basis. Once you have that basis, you can have all the fun you like arguing over everyone’s favorite forms of concrete syntax; it cannot hurt the project any more. Having these discussions early, at the expense of the more fundamental issues, is a great danger.

One of the key rules for successful software construction — as for many other ventures of course, especially in science and technology — is to distinguish the essential from the auxiliary, and consequently to devote proper attention to the essential issues while avoiding disputations of auxiliary issues. To define functionality, API is essential; language is auxiliary.

So when should you design a language? Never. Well, hardly ever.

Reference

[1] Bertrand Meyer: Introduction to the Theory of Programming Languages, Prentice Hall, 1990.

VN:F [1.9.10_1130]
Rating: 7.9/10 (18 votes cast)
VN:F [1.9.10_1130]
Rating: +8 (from 16 votes)

Ado About The Resource That Was (Not)

 

After a few weeks of use, Microsoft Outlook tends in my experience to go into a kind of thrashing mode where the user interface no longer quite functions as it should, although to the tool’s credit it does not lose information. Recently I have been getting pop-up warnings such as

 

A required resource was

 

A required resource was what? The message reminded me of an episode in a long-ago game of Scrabble, in which I proposed ADOABOUT as a word. “Ado about what? ”, the other players asked, and were not placated by my answer.

The message must have been trying to say  that a required resource was missing, or not found, but at the time of getting the final detail Outlook must have run out of UI resources and hence could not summon the needed text string. Not surprising, since running out of resources is precisely what caused the message to appear, in a valiant attempt to tell the user what is going on. (Valiant but not that useful: if you are not a programmer on the Outlook development team but just a customer trying to read email, it is not absolutely obvious how the message, even with the missing part, helps you.) The irony in the example is that the title bar suggests the problem arose in connection with trying to display the “Social Connector” area, a recent Outlook feature which I have never used. (Social connector? Wasn’t the deal about getting into computer science in the first place that for the rest of your life you’d be spared the nuisance of social connections? One can no longer trust anything nowadays.)

We can sympathize with whoever wrote the code. The Case Of The Resource That Was (Not) is an example of a general programming problem which we may call Space Between Your Back And Wall  or SBYBAW:  when you have your back against the wall, there is not much maneuvering space left.

A fairly difficult case of the SBYBAW problem arises in garbage collection, for example for object-oriented languages. A typical mark-and-sweep garbage collector must traverse the entire object structure to remove all the objects that have not been marked as reachable from the stack. The natural way to write a graph traversal algorithm is recursive: visit the roots; then recursively traverse their successors, flagging visited objects in some way to avoid cycling. Yes, but the implementation of a recursive routine relies on a stack of unpredictable size (the longest path length). If we got into  garbage collection, most likely it’s that we ran out of memory, precisely the kind of situation in which we cannot afford room for unpredictable stack growth.

In one of the early Eiffel garbage collectors, someone not aware of better techniques had actually written the traversal recursively; had the mistake not been caught early enough, it would no doubt have inflicted unbearable pain on humankind. Fortunately there is a solution: the Deutsch-Schorr-Waite algorithm [1], which avoids recursion on the program side by perverting the data structure to  replace some of the object links by recursion-control links; when the traversal’s execution proceeds along an edge, it reverses that edge to permit eventual return to the source. Strictly speaking, Deutsch-Schorr-Waite still requires a stack of booleans — to distinguish original edges from perverted ones — but we can avoid a separate stack (even just  a stack of booleans, which can be compactly represented in a few integers) by storing these booleans in the mark field of the objects themselves. The resulting traversal algorithm is a beauty — although it is fairly tricky, presents a challenge for verification tools, and raises new difficulties in a multi-threaded environment.

Deutsch-Schorr-Waite is a good example of “Small Memory Software” as studied in a useful book of the same title [2]. The need for Small Memory Software does not just arise for embedded programs running on small devices, but also in mainstream programming whenever we face the SBYBAW issue.

The SBYBAW lesson for the programmer is tough but simple. The resources we have at our disposal on a computing system may be huge, but they are always finite, and our programs’ appetite for resources will eventually exhaust them. At that stage, we have to deal with the SBYBAW rule, which sounds like a tautology but is an encouragement to look for clever algorithms:  techniques for freeing resources when no resources remain must not request new resources.

References

[1] Deutsch-Schorr-Waite is described in Knuth and also in [2]. Someone should start a Wikipedia entry.

[2] James Noble and Charles Weir: Small Memory Software: Patterns for Systems with Limited Memory, Addison-Wesley, 2001.

VN:F [1.9.10_1130]
Rating: 8.7/10 (13 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

A safe and stable solution

Reading about the latest hullabaloo around Android’s usage of Java, and more generally following the incessant flow of news about X suing Y in the software industry (with many combinations of X and Y) over Java and other object-oriented technologies, someone with an Eiffel perspective can only smile. Throughout its history, suggestions to use Eiffel have often been met initially — along with “Will Eiffel still be around next year?”, becoming truly riotous after 25 years — with objections of proprietariness, apparently because Eiffel initially came from a startup company. In contrast, many other approaches, from C++ to Smalltalk and Java, somehow managed to get favorable vibes from the media; the respective institutions, from AT&T to Xerox and Sun, must be disinterested benefactors of humanity.

Now many who believed this are experiencing a next-morning surprise, discovering under daylight that the person next to whom they wake up is covered with patents and lawsuits.

For their part, people who adopted Eiffel over the years and went on to develop project after project  do not have to stay awake worrying about legal issues and the effects of corporate takeovers; they can instead devote their time to building the best software possible with adequate methods, notations and tools.

This is a good time to recall the regulatory situation of Eiffel. First, the Eiffel Software implementation (EiffelStudio): the product can be used through either an open-source and a proprietary licenses. With both licenses the software is exactly the same; what differs is the status of the code users generate: with the open-source license, they are requested to make their own programs open-source; to keep their code proprietary, they need the commercial license. This is a fair and symmetric requirement. It is made even more attractive by the absence of any run-time fees or royalties of the kind typically charged by database vendors.

The open-source availability of the entire environment, over 2.5 millions line of (mostly Eiffel) code, has spurred the development of countless community contributions, with many more in progress.

Now for the general picture on the language, separate from any particular implementation. Java’s evolution has always been tightly controlled by Sun and now its successor Oracle. There may actually be technical arguments in favor of the designers retaining a strong say in the evolution of a language, but they no longer seem to apply any more now that most of the Java creators have left the company. Contrast this with Eiffel, which is entirely under the control of an international standards committee at ECMA International, the oldest and arguably the most prestigious international standards body for information technology. The standard is freely available online from the ECMA site [1]. It is also an ISO standard [2].

The standardization process is the usual ECMA setup, enabling any interested party to participate. This is not just a statement of principle but the reality, to which I can personally testify since, in spite of being the language’s original designer and author of the reference book, I lost countless battles in the discussions that led to the current standard and continue in preparation of the next version. While I was not always pleased on the moment, the committee’s collegial approach has led to a much more solid result than any single person could have achieved.

The work of ECMA TC49-TG4 (the Eiffel standard committee) has disproved the conventional view that committees can only design camels. In fact TC49-TG4 has constantly worked to keep the language simple and manageable, not hesitating to remove features deemed obsolete or problematic, while extending the range of the language and increasing the Eiffel programmer’s power of expression. As a result, Eiffel today is an immensely better language than when we started our work in 2002. Without a strong community-based process we would never, for example, have made Eiffel the first widespread language to guarantee void-safety (the compile-time removal of null-pointer-dereferencing errors), a breakthrough for software reliability.

Open, fair, free from lawsuits and commercial fights, supported by an enthusiastic community: for projects that need a modern quality-focused software framework, Eiffel is a safe and stable solution.

References

[1] ECMA International: Standard ECMA-367: Eiffel: Analysis, Design and Programming Language, 2nd edition (June 2006), available here (free download).

[2] International Organization for Standardization: ISO/IEC 25436:2006: Information technology — Eiffel: Analysis, Design and Programming Language, available here (for a fee; same text as [1], different formatting).

VN:F [1.9.10_1130]
Rating: 5.0/10 (33 votes cast)
VN:F [1.9.10_1130]
Rating: -2 (from 28 votes)

Specification explosion

To verify software, we must specify it; otherwise there is nothing to verify against. People often cite the burden of specification as the major obstacle toward making verification practical. At issue are not only the effort required to express the goals of software elements (their contracts) but also intermediate assertions, or “verification conditions”, including loop invariants, required by the machinery of verification.

At a Microsoft Software Verification summer school [1] in Moscow on July 18 — the reason why there was no article on this blog last week — Stefan Tobies, one of the lecturers, made the following observation about the specification effort needed to produce fully verified software. In his experience, he said, the ratio of specification lines to program lines is three to one.

Such a specification explosion, to coin a phrase, has to be addressed by any practical approach to verification. It would be interesting to get estimates from others with verification experience.

Reducing specification explosion  is crucial to the Eiffel effort to provide “Verification As a Matter Of Course” [2]. The following three techniques should go a long way:

  • Loop invariant inference. Programmers can be expected to write contracts expressing the purpose of routines (preconditions, postconditions) and classes (class invariants), but often balk at writing the intermediate assertions necessary to prove the correctness of loops. An earlier article [3] mentioned some ongoing work on this problem and I hope to come back to the topic.
  • Frame conventions. As another recent article has discussed [4], a simple language convention can dramatically reduce the number of assertions by making frame conditions explicit.
  • Model-based contracts. This technique calls for a separate article; the basic idea is to express the effect of operations through high-level mathematical models relying on a library that describe such mathematical abstractions as sets, relations, functions and graphs.

The risk of specification explosion is serious enough to merit a concerted defense.

 

References

[1] Summer School in Software Engineering and Verification, details here.

[2] Verification As a Matter Of Course, slides of a March 2010 talk, see an earlier article on this blog.

[3] Contracts written by people, contracts written by machines, an earlier article on this blog.

[4] If I’m not pure, at least my functions are, an earlier article on this blog.

VN:F [1.9.10_1130]
Rating: 9.3/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

Towards a Calculus of Object Programs

I posted here a draft of a new article, Towards a Calculus of Object Programs.

Here is the abstract:

Verifying properties of object-oriented software requires a method for handling references in a simple and intuitive way, closely related to how O-O programmers reason about their programs. The method presented here, a Calculus of Object Programs, combines four components: compositional logic, a framework for describing program semantics and proving program properties; negative variables to address the specifics of O-O programming, in particular qualified calls; the alias calculus, which determines whether reference expressions can ever have the same value; and the calculus of object structures, a specification technique for the structures that arise during the execution of an object-oriented program.
The article illustrates the Calculus by proving the standard algorithm for reversing a linked list.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

If I’m not pure, at least my functions are

..

If I’m not pure, at least my jewels are [1].

..

We often need to be reassured that a routine, usually a function, is “pure”, meaning that it does not change the state of the computation. For example, a function used in a contract element (precondition, postcondition, class invariant, loop invariant) should be purely descriptive, since it is a specification element; evaluating it, typically for testing and debugging, should not create a change of behavior — a “Heisenberg effect” — in the very computation that it is intended to assess. Another application is in a concurrency context, particularly in SCOOP (see earlier posts and forthcoming ones): if one or more functions are pure, several of their executions can run  concurrently on the same object.

The notion of purity admits variants. The usual notion is what  [2] calls weak purity, which precludes changes to previously existing objects but allow creating new objects. In the EiffelBase library we also encounter routines that have another form of purity, which we may call “relative” purity: they can leave the same state on exit as they found on entry, but in-between they might change the state.  For the rest of this discussion we will rely on the standard notion of weak purity: no changes permitted on existing objects.

It is often suggested that the programming language should support specifying that a routine is pure; many people have indeed proposed the addition of a keyword such as pure to Eiffel. One of the reasons this is not — in my opinion — such a great idea is that purity is just a special case of the more general problem of framing: specifying and verifying what a routine does not change. If we can specify an arbitrary frame property, then we can, as a special case covered by the general mechanism, specify that a routine changes nothing.

To see why framing is so important, consider a class ACCOUNT with a routine deposit that has the postcondition

balance = old balance + sum………..— Where sum is the argument of deposit

Framing here means stating that nothing else than balance changes; for example the account’s owner and its number should remain the same. It is not practical to write all individual postcondition clauses such as

owner= old owner
number=
old number

and so on. But we do need to specify these properties and enforce them, if only to avoid that a descendant class (maybe MAFIA_ACCOUNT) distort the rules defined by the original.

One technique is to add a so-called “modifies clause”, introduced by verification tools such as ESC-Java [3] and JML [4]. Modifies clauses raise some theoretical issues; in particular, the list of modified expressions is often infinite, so we must restrict ourselves to an abstract-data-type view where we characterize a class by commands and queries and the modifies clause only involves queries of the class. Many people find this hard to accept at first, since anything that is not talked about can change, but it is the right approach. A modifies clause of sorts, included in the postcondition, appeared in an earlier iteration of the Eiffel specification, with the keyword only (which is indeed preferable to modifies, which in the Eiffel style favoring grammatically simple keywords would be modify, since what we want to express is not that the routine must change anything at all  but that it may only change certain specified results!). The convention worked well with inheritance, since it included the rule that a clause such as only balance, in class  ACCOUNT, prescribes that the routine may not, in its modifies version as well as in any version redefined in descendants, change any other query known at the level of ACCOUNT; but a descendant version may change, subject to its own ACCOUNT clauses, any new query introduced by a descendant.

To declare a routine as pure, it would suffice to use an empty only clause (not very elegant syntactically — “only” what? — but one can get used to it).

This construct has been discarded, as it places too heavy a burden on the programmer-specifier. Here the key observation resulted from a non-scientific but pretty extensive survey I made of all the JML code I could get my hands on. I found that every time a query appeared in a modifies clause it was also listed in the postcondition! On reflection, this seems reasonable: if you are serious about specification, as anyone bothering to write such a clause surely is, you will not just express that something changes and stop there; you will write something about how it may change. Not necessarily the exact result, as in

my_query = precise_final_value

but at least some property of that result, as in

some_property (my_query)

This observation has, however, an inescapable consequence for language design: modifies or only clauses should be inferred by the compiler from the postcondition, not imposed on the programmer as an extra burden. The convention, which we may call the Implicit Framing Rule, is simple:

A routine may change the value of a query only if the query is specified in the routine’s postcondition

(or, if you like double negation, “no routine may change the value of a query other than those specified in its postcondition”). Here we say that a feature is “specified” in a postcondition if it appears there outside of the scope of an old expression. (Clearly, an occurrence such as old balance does not imply that balance can be modified, hence this restriction to occurrences outside of an old.)

With this convention the only clause becomes implicit: it would simply list all the queries specified in the postcondition, so there is no need for the programmer to write it. For the rare case of wanting to specify that a query q may change, but not wanting to specify how, it is easy to provide a library function, say involved, that always return true and can be used in postconditions, as in involved (q).

The convention is of course not just a matter of programming methodology but, in an IDE supporting verification, such as the EVE “Verification As a Matter Of Course” environment which we are building for Eiffel [5], the compiler will enforce the definition above — no change permitted to anything not specified in the postcondition — and produce an error in case of a violation. This also means that we can easily specify that a routine is pure: it must simply not specify any query in its postcondition. It may still list it in an old clause, as happens often in practice, e.g.

Result = old balance – Minimum_balance………..— In the postcondition of a function available_funds

Note the need to use old here. Apart from this addition of old to some postconditions, a considerable advantage of the convention is that most existing code using pure functions will be suitable to the new purity enforcement without any need to provide new annotations.

I believe that this is the only sustainable convention. It does not, of course, solve the frame problem by itself (for attempts in this direction see [6, 7]), but it is a necessary condition for a solution that is simple, easily taught, fairly easily implemented, and effective. It goes well with model-based specifications [8, 9], which I believe are the technique of most promise for usable  specifications of object-oriented software. And it provides a straightforward, no-frills way to enforce purity where prescribed by the Command-Query Separation principle [10, 11]: if I’m not pure, at least my functions must be.

References

[1] From the lyrics of the aria Glitter and Be Gay in Leonard Bernstein’s Candide, text by Lillian Hellman and others. Youtube offers several performances, including  by Diana Damrau (here) and Natalie Dessay (here) . For the text see e.g. here.

[2] Adam Darvas and Peter Müller: Reasoning About Method Calls in Interface Specifications, in Journal of Object Technology, Volume 5, no. 5, jUNE 2006, pages 59-85, doi:10.5381/jot.2006.5.5.a3, available here.

[3] C. Flanagan, K.R.M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe and R. Stata: Extended static checking for Java, in PLDI 2002 (Programming Language Design and Implementation), pages 234–245, 2002.

[4] Gary Leavens et al.: Java Modeling Language, see references here.

[5] Julian Tschannen, Carlo A. Furia, Martin Nordio, and Bertrand Meyer: Verifying Eiffel Programs with Boogie, to appear in Boogie 2011, First International Workshop on Intermediate Verification Languages, Wroclaw, August 2011. See documentation about the EVE project on the project page.

[6] Ioannis Kassios: Dynamic Frames: Support for Framing, Dependencies and Sharing Without Restrictions, in Formal Methods 2006, eds. J. Misra, T. Nipkow and E. Sekerinski, Lecture Notes in Computer Science 4085, Springer Verlag, 2006, pages 268-283.

[7] Bernd Schoeller: Making Classes Provable through Contracts, Models and Frames, PhD thesis, ETH Zurich, 2007, available here

[8] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[9] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ’10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[10] Bertrand Meyer: Object-Oriented Software Construction, first (1988) and second (1997) editions, Prentice Hall.

[11] Bertrand Meyer: Touch of Class: An Introduction to Programming Well, Using Objects and Contracts, Springer Verlag, 2009.

VN:F [1.9.10_1130]
Rating: 10.0/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)