Archive for the ‘Language design’ Category.

On beauty and software (online talk on Wednesday, 17 CET / 11 EDT / 8 PDT)

This Wednesday (still “tomorrow” as I am writing this), 10 March 2021, I am giving a talk on “The Beauty of Software” on the occasion of the graduation ceremony of the first students of the Schaffhausen Institute of Technology. The event starts at 17 Schaffhausen/Zurich/Paris etc. time (11 AM New York, 8 AM San Francisco) and my own talk, starting half an hour later, will take about one hour.

The talk is (surprise!) given online. Registration is free but required: you can find the registration form on the announcement page here.

The abstract appears below. It is rather ambitious-sounding and I cannot promise the talk will live up to the promise, but I feel it necessary at least to attempt some initial steps towards a better understanding of beauty in software, which might help understand beauty in general.

The Beauty of Software

Software runs the world and delivers riches. Every passionate software engineer or computer scientist is also attuned to another of its features: the study and practice of software construction reveal gems of utter beauty.

While the concept of beauty is most naturally associated with art, scientists and engineers of all fields often invoke it. Beauty is a strong guiding principle in searching for solutions to scientific and technical problems, and arbitrating between rival candidate solutions. The reaction is often instinctive: “What an elegant theory!” “This technique is too ugly to be a viable approach”.

What do such appeals to beauty really mean? Do they pertain to the same concept of beauty as found in nature and art? Is beauty only “in the eye of the beholder”, is it conditioned by cultural prejudices, or does it submit to an objective definition?

In this talk, an initial step towards a more extensive study of what beauty means for software, I will present a few artifacts from software engineering and computer science which I find strikingly beautiful and – at the risk of breaking the charm – analyze what might make them so. This analysis will lead to a tentative definition of a notion both alluring and elusive: beauty.

VN:F [1.9.10_1130]
Rating: 3.6/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: -4 (from 4 votes)

Some contributions

Science progresses through people taking advantage of others’ insights and inventions. One of the conditions that makes the game possible is that you acknowledge what you take. For the originator, it is rewarding to see one’s ideas reused, but frustrating when that happens without acknowledgment, especially when you are yourself punctilious about citing your own sources of inspiration.

I have started to record some concepts that are widely known and applied today and which I believe I originated in whole or in part, whether or not their origin is cited by those who took them. The list below is not complete and I may update it in the future. It is not a list of ideas I contributed, only of those fulfilling two criteria:

  • Others have built upon them.  (If there is an idea that I think is great but no one paid attention to it, the list does not include it.)
  • They have gained wide visibility.

There is a narcissistic aspect to this exercise and if people want to dismiss it as just showing I am full of myself so be it. I am just a little tired of being given papers to referee that state that genericity was invented by Java, that no one ever thought of refactoring before agile methods, and so on. It is finally time to state some facts.

Facts indeed: I back every assertion by precise references. So if I am wrong — i.e. someone preceded me — the claims of precedence can be refuted; if so I will update or remove them. All articles by me cited in this note are available (as downloadable PDFs) on my publication page. (The page is up to date until 2018; I am in the process of adding newer publications.)

Post-publication note: I have started to receive some comments and added them in a Notes section at the end; references to those notes are in the format [A].

Final disclaimer (about the narcissistic aspect): the exercise of collecting such of that information was new for me, as I do not usually spend time reflecting on the past. I am much more interested in the future and definitely hope that my next contributions will eclipse any of the ones listed below.

Programming concepts: substitution principle

Far from me any wish to under-represent the seminal contributions of Barbara Liskov, particularly her invention of the concept of abstract data type on which so much relies. As far as I can tell, however, what has come to be known as the “Liskov Substitution Principle” is essentially contained in the discussion of polymorphism in section 10.1 of in the first edition (Prentice Hall, 1988) of my book Object-Oriented Software Construction (hereafter OOSC1); for example, “the type compatibility rule implies that the dynamic type is always a descendant of the static type” (10.1.7) and “if B inherits from A, the set of objects that can be associated at run time with an entity [generalization of variable] includes instances of B and its descendants”.

Perhaps most tellingly, a key aspect of the substitution principle, as listed for example in the Wikipedia entry, is the rule on assertions: in a proper descendant, keep the invariant, keep or weaken the precondition, keep or strengthen the postcondition. This rule was introduced in OOSC1, over several pages in section 11.1. There is also an extensive discussion in the article Eiffel: Applying the Principles of Object-Oriented Design published in the Journal of Systems and Software, May 1986.

The original 1988 Liskov article cited (for example) in the Wikipedia entry on the substitution principle says nothing about this and does not in fact include any of the terms “assertion”, “precondition”, “postcondition” or “invariant”. To me this absence means that the article misses a key property of substitution: that the abstract semantics remain the same. (Also cited is a 1994 Liskov article in TOPLAS, but that was many years after OOSC1 and other articles explaining substitution and the assertion rules.)

Liskov’s original paper states that “if for each object o1 of type S there is an object o2 of type T such that for all programs P defined in terms of T, the behavior of P is unchanged when o1 is substituted for oz, then S is a subtype of T.” As stated, this property is impossible to satisfy: if the behavior is identical, then the implementations are the same, and the two types are identical (or differ only by name). Of course the concrete behaviors are different: applying the operation rotate to two different figures o1 and o2, whose types are subtypes of FIGURE and in some cases of each other, will trigger different algorithms — different behaviors. Only with assertions (contracts) does the substitution idea make sense: the abstract behavior, as characterized by preconditions, postconditions and the class invariants, is the same (modulo respective weakening and strengthening to preserve the flexibility of the different version). Realizing this was a major step in understanding inheritance and typing.

I do not know of any earlier (or contemporary) exposition of this principle and it would be normal to get the appropriate recognition.

Software design: design patterns

Two of the important patterns in the “Gang of Four” Design Patterns book (GoF) by Gamma et al. (1995) are the Command Pattern and the Bridge Pattern. I introduced them (under different names) in the following publications:

  • The command pattern appears in OOSC1 under the name “Undo-Redo” in section 12.2. The solution is essentially the same as in GoF. I do not know of any earlier exposition of the technique. See also notes [B] and [C].
  • The bridge pattern appears under the name “handle technique” in my book Reusable Software: The Base Component Libraries (Prentice Hall, 1994). It had been described several years earlier in manuals for Eiffel libraries. I do not know of an earlier reference. (The second edition of Object-Oriented Software Construction — Prentice Hall, 1997, “OOSC2” –, which also describes it, states that a similar technique is described in an article by Josef Gil and Ricardo Szmit at the TOOLS USA conference in the summer of 1994, i.e. after the publication of Reusable Software.)

Note that it is pointless to claim precedence over GoF since that book explicitly states that it is collecting known “best practices”, not introducing new ones. The relevant questions are: who, pre-GoF, introduced each of these techniques first; and which publications does the GoF cites as “prior art”  for each pattern. In the cases at hand, Command and Bridge, it does not cite OOSC1.

To be concrete: unless someone can point to an earlier reference, then anytime anyone anywhere using an interactive system enters a few “CTRL-Z” to undo commands, possibly followed by some “CTRL-Y” to redo them (or uses other UI conventions to achieve these goals), the software most likely relying on a technique that I first described in the place mentioned above.

Software design: Open-Closed Principle

Another contribution of OOSC1 (1988), section 2.3, reinforced in OOSC2 (1997) is the Open-Closed principle, which explained one of the key aspects of inheritance: the ability to keep a module both closed (immediately usable as is) and open to extension (through inheritance, preserving the basic semantics. I am mentioning this idea only in passing since in this case my contribution is usually recognized, for example in the Wikipedia entry.

Software design: OO for reuse

Reusability: the Case for Object-Oriented Design (1987) is, I believe, the first publication that clearly explained why object-oriented concepts were (and still are today — in Grady Booch’s words, “there is no other game in town”) the best answer to realize the goal of software construction from software components. In particular, the article:

  • Explains the relationship between abstract data types and OO programming, showing the former as the theoretical basis for the latter. (The CLU language at MIT originated from Liskov’s pioneering work on abstract data types, but was not OO in the full sense of the term, missing in particular a concept of inheritance.)
  • Shows that reusability implies bottom-up development. (Top-down refinement was the mantra at the time, and promoting bottom-up was quite a shock for many people.)
  • Explains the role of inheritance for reuse, as a complement to Parnas’s interface-based modular construction with information hiding.

Software design: Design by Contract

The contribution of Design by Contract is one that is widely acknowledged so I don’t have any point to establish here — I will just recall the essentials. The notion of assertion goes back to the work of Floyd, Hoare and Dijkstra in the sixties and seventies, and correctness-by-construction to Dijktra, Gries and Wirth, but Design by Contract is a comprehensive framework providing:

  • The use of assertions in an object-oriented context. (The notion of class invariant was mentioned in a paper by Tony Hoare published back in 1972.)
  • The connection of inheritance with assertions (as sketched above). That part as far as I know was entirely new.
  • A design methodology for quality software: the core of DbC.
  • Language constructs carefully seamed into the fabric of the language. (There were precedents there, but in the form of research languages such as Alphard, a paper design only, not implemented, and Euclid.)
  • A documentation methodology.
  • Support for testing.
  • Support for a consistent theory of exception handling (see next).

Design by Contract is sometimes taken to mean simply the addition of a few assertions here and there. What the term actually denotes is a comprehensive methodology with all the above components, tightly integrated into the programming language. Note in particular that preconditions and postconditions are not sufficient; in an OO context class invariants are essential.

Software design: exceptions

Prior to the Design by Contract work, exceptions were defined very vaguely, as something special you do outside of “normal” cases, but without defining “normal”. Design by Contract brings a proper perspective by defining these concepts precisely. This was explained in a 1987 article, Disciplined Exceptions ([86] in the list), rejected by ECOOP but circulated as a technical report; they appear again in detail in OOSC1 (sections 7.10.3 to 7.10.5).

Other important foundational work on exceptions, to which I know no real precursor (as usual I would be happy to correct any omission), addressed what happens to the outcome of an exception in a concurrent or distributed context. This work was done at ETH, in particular in the PhD theses  of B. Morandi and A. Kolesnichenko, co-supervised with S. Nanz. See the co-authored papers [345] and [363].

On the verification aspect of exceptions, see below.

Software design: refactoring

I have never seen a discussion of refactoring that refers to the detailed discussion of generalization in both of the books Reusable Software (1994, chapter 3) and Object Success (Prentice Hall, 1995, from page 122 to the end of chapter 6). These discussions describe in detail how, once a program has been shown to work, it should be subject to a posteriori design improvements. It presents several of the refactoring techniques (as they were called when the idea gained traction several years later), such as moving common elements up in the class hierarchy, and adding an abstract class as parent to concrete classes ex post facto.

These ideas are an integral part of the design methodology presented in these books (and again in OOSC2 a few later). It is beyond me why people would present refactoring (or its history, as in the Wikipedia entry on the topic) without referring to these publications, which were widely circulated and are available for anyone to inspect.

Software design: built-in documentation and Single-Product principle

Another original contribution was the idea of including documentation in the code itself and relying on tools to extract the documentation-only information (leaving implementation elements aside). The idea, described in detail in OOSC1 in 1988 (sections 9.4 and 9.5) and already mentioned in the earlier Eiffel papers, is that code should be self-complete, containing elements of various levels of abstraction; some of them describe implementation, but the higher-level elements describe specification, and are distinguished syntactically in such a way that tools can extract them to produce documentation at any desired level of abstraction.

The ideas were later applied through such mechanisms as JavaDoc (with no credit as far as I know). They were present in Eiffel from the start and the underlying principles, in particular the “Single Product principle” (sometimes “Self-Documentation principle”, and also generalized by J. Ostroff and R. Paige as “Single-Model principle”). Eiffel is the best realization of these principles thanks to:

  • Contracts (as mentioned above): the “contract view” of a class (called “short form” in earlier descriptions) removes the implementations but shows the relevant preconditions, postconditions and class invariants, given a precise and abstract specification of the class.
  • Eiffel syntax has a special place for “header comments”, which describe high-level properties and remain in the contract view.
  • Eiffel library class documentation has always been based on specifications automatically extracted from the actual text of the classes, guaranteeing adequacy of the documentation. Several formats are supported (including, from 1995 on, HTML, so that documentation can be automatically deployed on the Web).
  • Starting with the EiffelCase tool in the early 90s, and today with the Diagram Tool of EiffelStudio, class structures (inheritance and client relationships) are displayed graphically, again in an automatically extracted form, using either the BON or UML conventions.

One of the core benefits of the Single-Product principle is to guard against what some of my publications called the “Dorian Gray” syndrome: divergence of an implementation from its description, a critical problem in software because of the ease of modifying stuff. Having the documentation as an integral part of the code helps ensure that when information at some level of abstraction (specification, design, implementation) changes, the other levels will be updated as well.

Crucial in the approach is the “roundtripping” requirement: specifiers or implementers can make changes in any of the views, and have them reflected automatically in the other views. For example, you can graphically draw an arrow between two bubbles representing classes B and A in the Diagram Tool, and the code of B will be updated with “inherit A”; or you can add this Inheritance clause textually in the code of class B, and the diagram will be automatically updated with an arrow.

It is important to note how contrarian and subversive these ideas were at the time of their introduction (and still to some extent today). The wisdom was that you do requirements then design then implementation, and that code is a lowly product entirely separate from specification and documentation. Model-Driven Development perpetuates this idea (you are not supposed to modify the code, and if you do there is generally no easy way to propagate the change to the model.) Rehabilitating the code (a precursor idea to agile methods, see below) was a complete change of perspective.

I am aware of no precedent for this Single Product approach. The closest earlier ideas I can think of are in Knuth’s introduction of Literate Programming in the early eighties (with a book in 1984). As in the Single-product approach, documentation is interspersed with code. But the literate programming approach is (as presented) top-down, with English-like explanations progressively being extended with implementation elements. The Single Product approach emphasizes the primacy of code and, in terms of the design process, is very much yoyo, alternating top-down (from the specification to the implementation) and bottom-up (from the implementation to the abstraction) steps. In addition, a large part of the documentation, and often the most important one, is not informal English but formal assertions. I knew about Literate Programming, of course, and learned from it, but Single-Product is something else.

Software design: from patterns to components

Karine Arnout’s thesis at ETH Zurich, resulting in two co-authored articles ([255] and [257], showed that contrary to conventional wisdom a good proportion of the classical design patterns, including some of the most sophisticated, can be transformed into reusable components (indeed part of an Eiffel library). The agent mechanism (see below) was instrumental in achieving that result.

Programming, design and specification concepts: abstract data types

Liskov’s and Zilles’s ground-breaking 1974 abstract data types paper presented the concepts without a mathematical specification, using programming language constructs instead. A 1976 paper (number [3] in my publication list, La Description des Structures de Données, i.e. the description of data structures) was as far as I know one of the first to present a mathematical formalism, as  used today in presentations of ADTs. John Guttag was taking a similar approach in his PhD thesis at about the same time, and went further in providing a sound mathematical foundation, introducing in particular (in a 1978 paper with Jim Horning) the notion of sufficient completeness, to which I devoted a full article in this blog  (Are My Requirements Complete?) about a year ago. My own article was published in a not very well known journal and in French, so I don’t think it had much direct influence. (My later books reused some of the material.)

The three-level description approach of that article (later presented in English for an ACM workshop in the US in 1981, Pingree Park, reference [28]) is not well known but still applicable, and would be useful to avoid frequent confusions between ADT specifications and more explicit descriptions.

When I wrote my 1976 paper, I was not aware of Guttag’s ongoing work (only of the Liskov and Zilles paper), so the use of a mathematical framework with functions and predicates on them was devised independently. (I remember being quite happy when I saw what the axioms should be for a queue.) Guttag and I both gave talks at a workshop organized by the French programming language interest group in 1977 and it was fun to see that our presentations were almost identical. I think my paper still reads well today (well, if you read French). Whether or not it exerted direct influence, I am proud that it independently introduced the modern way of thinking of abstract data types as characterized by mathematical functions and their formal (predicate calculus) properties.

Language mechanisms: genericity with inheritance

Every once in a while I get to referee a paper that starts “Generics, as introduced in Java…” Well, let’s get some perspective here. Eiffel from its introduction in 1985 combined genericity and inheritance. Initially, C++ users and designers claimed that genericity was not needed in an OO context and the language did not have it; then they introduced template. Initially, the designers of Java claimed (around 1995) that genericity was not needed, and the language did not have it; a few years later Java got generics. Initially, the designers of C# (around 1999) claimed that genericity was not needed, and the language did not have it; a few years later C# and .NET got generics.

Genericity existed before Eiffel of course; what was new was the combination with inheritance. I had been influenced by work on generic modules by a French researcher, Didier Bert, which I believe influenced the design of Ada as well; Ada was the language that brought genericity to a much broader audience than the somewhat confidential languages that had such a mechanism before. But Ada was not object-oriented (it only had modules, not classes). I was passionate about object-oriented programming (at a time when it was generally considered, by the few people who had heard of it as an esoteric, academic pursuit). I started — in the context of an advanced course I was teaching at UC Santa Barbara — an investigation of how the two mechanisms relate to each other. The results were a paper at the first OOPSLA in 1986, Genericity versus Inheritance, and the design of the Eiffel type system, with a class mechanism, inheritance (single and multiple), and genericity, carefully crafted to complement each other.

With the exception of a Trellis-Owl, a  design from Digital Equipment Corporation also presented at the same OOPSLA (which never gained significant usage), there were no other OO languages with both mechanisms for several years after the Genericity versus Inheritance paper and the implementation of genericity with inheritance in Eiffel available from 1986 on. Eiffel also introduced, as far as I know, the concept of constrained genericity, the second basic mechanism for combining genericity with inheritance, described in Eiffel: The Language (Prentice Hall, 1992, section 10.8) and discussed again in OOSC2 (section 16.4 and throughout). Similar mechanisms are present in many languages today.

It was not always so. I distinctly remember people bringing their friends to our booth at some conference in the early nineties, for the sole purpose of having a good laugh with them at our poster advertising genericity with inheritance. (“What is this thing they have and no one else does? Generi-sissy-tee? Hahaha.”). A few years later, proponents of Java were pontificating that no serious language needs generics.

It is undoubtedly part of of the cycle of invention (there is a Schopenhauer citation on this, actually the only thing from Schopenhauer’s philosophy that I ever understood [D]) that people at some point will laugh at you; if it did brighten their day, why would the inventor deny them one of the little pleasures of life? But in terms of who laughs last, along the way C++ got templates, Java got generics, C# finally did too, and nowadays all typed OO languages have something of the sort.

Language mechanisms: multiple inheritance

Some readers will probably have been told that multiple inheritance is a bad thing, and hence will not count it as a contribution, but if done properly it provides a major abstraction mechanism, useful in many circumstances. Eiffel showed how to do multiple inheritance right by clearly distinguishing between features (operations) and their names, defining a class as a finite mapping between names and features, and using renaming to resolve any name clashes.

Multiple inheritance was made possible by an implementation innovation: discovering a technique (widely imitated since, including in single-inheritance contexts) to implement dynamic binding in constant time. It was universally believed at the time that multiple inheritance had a strong impact on performance, because dynamic binding implied a run-time traversal of the class inheritance structure, already bad enough for single inheritance where the structure is a tree, but prohibitive with multiple inheritance for which it is a directed acyclic graph. From its very first implementation in 1986 Eiffel used what is today known as a virtual table technique which guarantees constant-time execution of routine (method) calls with dynamic binding.

Language mechanisms: safe GC through strong static typing

Simula 67 implementations did not have automatic garbage collection, and neither had implementations of C++. The official excuse in the C++ case was methodological: C programmers are used to exerting manual control of memory usage. But the real reason was a technical impossibility resulting from the design of the language: compatibility with C precludes the provision of a good GC.

More precisely, of a sound and complete GC. A GC is sound if it will only reclaim unreachable objects; it is complete if it will reclaim all unreachable objects. With a C-based language supporting casts (e.g. between integers and pointers) and pointer arithmetic, it is impossible to achieve soundness if we aim at a reasonable level of completeness: a pointer can masquerade as an integer, only to be cast back into a pointer later on, but in the meantime the garbage collector, not recognizing it as a pointer, may have wrongly reclaimed the corresponding object. Catastrophe.

It is only possible in such a language to have a conservative GC, meaning that it renounces completeness. A conservative GC will treat as a pointer any integer whose value could possibly be a pointer (because it lies between the bounds of the program’s data addresses in memory). Then, out of precaution, the GC will refrain from reclaiming the objects at these addresses even if they appear unreachable.

This approach makes the GC sound but it is only a heuristics, and it inevitably loses completeness: every once in a while it will fail to reclaim some dead (unreachable) objects around. The result is a program with memory leaks — usually unacceptable in practice, particularly for long-running or continuously running programs where the leaks inexorably accumulate until the program starts thrashing then runs out of memory.

Smalltalk, like Lisp, made garbage collection possible, but was not a typed language and missed on the performance benefits of treating simple values like integers as a non-OO language would. Although in this case I do not at the moment have a specific bibliographic reference, I believe that it is in the context of Eiffel that the close connection between strong static typing (avoiding mechanisms such as casts and pointer arithmetic) and the possibility of sound and complete garbage collection was first clearly explained. Explained in particular around 1990 in a meeting with some of the future designers of Java, which uses a similar approach, also taken over later on by C#.

By the way, no one will laugh at you today for considering garbage collection as a kind of basic human right for programmers, but for a long time the very idea was quite sulfurous, and advocating it subjected you to a lot of scorn. Here is an extract of the review I got when I submitted the first Eiffel paper to IEEE Transactions on Software Engineering:

Systems that do automatic garbage collection and prevent the designer from doing his own memory management are not good systems for industrial-strength software engineering.

Famous last words. Another gem from another reviewer of the same paper:

I think time will show that inheritance (section 1.5.3) is a terrible idea.

Wow! I wish the anonymous reviewers would tell us what they think today. Needless to say, the paper was summarily rejected. (It later appeared in the Journal of Systems and Software — as [82] in the publication list — thanks to the enlightened views of Robert Glass, the founding editor.)

Language mechanisms: void safety

Void safety is a property of a language design that guarantees the absence of the plague of null pointer dereferencing.

The original idea came (as far as I know) from work at Microsoft Research that led to the design of a research language called C-omega; the techniques were not transferred to a full-fledged programming language. Benefiting from the existence of this proof of concept, the Eiffel design was reworked to guarantee void safety, starting from my 2005 ECOOP keynote paper (Attached Types) and reaching full type safety a few years later. This property of the language was mechanically proved in a 2016 ETH thesis by A. Kogtenkov.

Today all significant Eiffel development produces void-safe code. As far as I know this was a first among production programming languages and Eiffel remains the only production language to provide a guarantee of full void-safety.

This mechanism, carefully crafted (hint: the difficult part is initialization), is among those of which I am proudest, because in the rest of the programming world null pointer dereferencing is a major plague, threatening at any moment to crash the execution of any program that uses pointers of references. For Eiffel users it is gone.

Language mechanisms: agents/delegates/lambdas

For a long time, OO programming languages did not have a mechanism for defining objects wrapping individual operations. Eiffel’s agent facility was the first such mechanism or among the very first together the roughly contemporaneous but initially much more limited delegates of C#. The 1999 paper From calls to agents (with P. Dubois, M. Howard, M. Schweitzer and E. Stapf, [196] in the list) was as far as I know the first description of such a construct in the scientific literature.

Language mechanisms: concurrency

The 1993 Communications of the ACM paper on Systematic Concurrent Object-Oriented Programming [136] was certainly not the first concurrency proposal for OO programming (there had been pioneering work reported in particular in the 1987 book edited by Tokoro and Yonezawa), but it innovated in offering a completely data-race-free model, still a rarity today (think for example of the multi-threading mechanisms of dominant OO languages).

SCOOP, as it came to be called, was implemented a few years later and is today a standard part of Eiffel.

Language mechanisms: selective exports

Information hiding, as introduced by Parnas in his two seminal 1972 articles, distinguishes between public and secret features of a module. The first OO programming language, Simula 67, had only these two possibilities for classes and so did Ada for modules.

In building libraries of reusable components I realized early on that we need a more fine-grained mechanism. For example if class LINKED_LIST uses an auxiliary class LINKABLE to represent individual cells of a linked list (each with a value field and a “right” field containing a reference to another LINKABLE), the features of LINKABLE (such as the operation to reattach the “right” field) should not be secret, since LINKED_LIST needs them; but they should also not be generally public, since we do not want arbitrary client objects to mess around with the internal structure of the list. They should be exported selectively to LINKED_LIST only. The Eiffel syntax is simple: declare these operations in a clause of the class labeled “feature {LINKED_LIST}”.

This mechanism, known as selective exports, was introduced around 1989 (it is specified in full in Eiffel: The Language, from 1992, but was in the Eiffel manuals earlier). I think it predated the C++ “friends” mechanism which serves a similar purpose (maybe someone with knowledge of the history of C++ has the exact date). Selective exports are more general than the friends facility and similar ones in other OO languages: specifying a class as a friend means it has access to all your internals. This solution is too coarse-grained. Eiffel’s selective exports make it possible to define the specific export rights of individual operations (including attributes/fields) individually.

Language mechanisms and implementation: serialization and schema evolution

I did not invent serialization. As a student at Stanford in 1974 I had the privilege, at the AI lab, of using SAIL (Stanford Artificial Intelligence Language). SAIL was not object-oriented but included many innovative ideas; it was far ahead of its time, especially in terms of the integration of the language with (what was not yet called) its IDE. One feature of SAIL with which one could fall in love at first sight was the possibility of selecting an object and having its full dependent data structure (the entire subgraph of the object graph reached by following references from the object, recursively) stored into a file, for retrieval at the next section. After that, I never wanted again to live without such a facility, but no other language and environment had it.

Serialization was almost the first thing we implemented for Eiffel: the ability to write object.store (file) to have the entire structure from object stored into file, and the corresponding retrieval operation. OOSC1 (section 15.5) presents these mechanisms. Simula and (I think) C++ did not have anything of the sort; I am not sure about Smalltalk. Later on, of course, serialization mechanisms became a frequent component of OO environments.

Eiffel remained innovative by tackling the difficult problems: what happens when you try to retrieve an object structure and some classes have changed? Only with a coherent theoretical framework as provided in Eiffel by Design by Contract can one devise a meaningful solution. The problem and our solutions are described in detail in OOSC2 (the whole of chapter 31, particularly the section entitled “Schema evolution”). Further advances were made by Marco Piccioni in his PhD thesis at ETH and published in joint papers with him and M. Oriol, particularly [352].

Language mechanisms and implementation: safe GC through strong static typing

Simula 67 (if I remember right) did not have automatic garbage collection, and neither had C++ implementations. The official justification in the case of C++ was methodological: C programmers are used to exerting manual control of memory usage. But the real obstacle was technical: compatibility with C makes it impossible to have a good GC. More precisely, to have a sound and complete GC. A GC is sound if it will only reclaim unreachable objects; it is complete if it will reclaim all unreachable objects. With a C-based language supporting casts (e.g. between integers and pointers) and pointer arithmetic, it is impossible to achieve soundness if we aim at a reasonable level of completeness: a pointer can masquerade as an integer, only to be cast back into a pointer later on, but in the meantime the garbage collector, not recognizing it as a pointer, may have wrongly reclaimed the corresponding object. Catastrophe. It is only possible in such a language to have a conservative GC, which will treat as a pointer any integer whose value could possibly be a pointer (because its value lies between the bounds of the program’s data addresses in memory). Then, out of precaution, it will not reclaim the objects at the corresponding address. This approach makes the GC sound but it is only a heuristics, and it may be over-conservative at times, wrongly leaving dead (i.e. unreachable) objects around. The result is, inevitably, a program with memory leaks — usually unacceptable in practice.

Smalltalk, like Lisp, made garbage collection possible, but was not a typed language and missed on the performance benefits of treating simple values like integers as a non-OO language would. Although in this case I do not at the moment have a specific bibliographic reference, I believe that it is in the context of Eiffel that the close connection between strong static typing (avoiding mechanisms such as casts and pointer arithmetic) and the possibility of sound and complete garbage collection was first clearly explained. Explained in particular to some of the future designers of Java, which uses a similar approach, also taken over later on by C#.

By the way, no one will laugh at you today for considering garbage collection as a kind of basic human right for programmers, but for a long time it was quite sulfurous. Here is an extract of the review I got when I submitted the first Eiffel paper to IEEE <em>Transactions on Software Engineering:

Software engineering: primacy of code

Agile methods are widely and properly lauded for emphasizing the central role of code, against designs and other non-executable artifacts. By reading the agile literature you might be forgiven for believing that no one brought up that point before.

Object Success (1995) makes the argument very clearly. For example, chapter 3, page 43:

Code is to our industry what bread is to a baker and books to a writer. But with the waterfall code only appears late in the process; for a manager this is an unacceptable risk factor. Anyone with practical experience in software development knows how many things can go wrong once you get down to code: a brilliant design idea whose implementation turns out to require tens of megabytes of space or minutes of response time; beautiful bubbles and arrows that cannot be implemented; an operating system update, crucial to the project which comes five weeks late; an obscure bug that takes ages to be fixed. Unless you start coding early in the process, you will not be able to control your project.

Such discourse was subversive at the time; the wisdom in software engineering was that you need to specify and design a system to death before you even start coding (otherwise you are just a messy “hacker” in the sense this word had at the time). No one else in respectable software engineering circles was, as far as I know, pushing for putting code at the center, the way the above extract does.

Several years later, agile authors started making similar arguments, but I don’t know why they never referenced this earlier exposition, which still today I find not too bad. (Maybe they decided it was more effective to have a foil, the scorned Waterfall, and to claim that everyone else before was downplaying the importance of code, but that was not in fact everyone.)

Just to be clear, Agile brought many important ideas that my publications did not anticipate; but this particular one I did.

Software engineering: the roles of managers

Extreme Programming and Scrum have brought new light on the role of managers in software development. Their contributions have been important and influential, but here too they were for a significant part prefigured by a long discussion, altogether two chapters, in Object Success (1995).

To realize this, it is enough to read the titles of some of the sections in those chapters, describing roles for managers (some universal, some for a technical manager): “risk manager”, “interface with the rest of the world” (very scrummy!), “protector of the team’s sanity”, “method enforcer” (think Scrum Master), “mentor and critic”. Again, as far as I know, these were original thoughts at the time; the software engineering literature for the most part did not talk about these issues.

Software engineering: outsourcing

As far as I know the 2006 paper Offshore Development: The Unspoken Revolution in Software Engineering was the first to draw attention, in the software engineering community, to the peculiar software engineering challenges of distributed and outsourced development.

Software engineering: automatic testing

The AutoTest project (with many publications, involving I. Ciupa, A. Leitner, Y. Wei, M. Oriol, Y. Pei, M. Nordio and others) was not the first to generate tests automatically by creating numerous instances of objects and calling applicable operations (it was preceded by Korat at MIT), but it was the first one to apply this concept with Design by Contract mechanisms (without which it is of little practical value, since one must still produce test oracles manually) and the first to be integrated in a production environment (EiffelStudio).

Software engineering: make-less system building

One of the very first decisions in the design of Eiffel was to get rid of Make files.

Feldman’s Make had of course been a great innovation. Before Make, programmers had to produce executable systems manually by executing sequences of commands to compile and link the various source components. Make enabled them to instead  to define dependencies between components in a declarative way, resulting in a partial order, and then performed a topological sort to produce the sequence of comments. But preparing the list of dependencies remains a tedious task, particularly error-prone for large systems.

I decided right away in the design of Eiffel that we would never force programmers to write such dependencies: they would be automatically extracted from the code, through an exhaustive analysis of the dependencies between modules. This idea was present from the very the first Eiffel report in 1985 (reference [55] in the publication list): Eiffel programmers never need to write a Make file or equivalent (other than for non-Eiffel code, e.g. C or C++, that they want to integrate); they just click a Compile button and the compiler figures out the steps.

Behind this approach was a detailed theoretical analysis of possible relations between modules in software development (in many programming languages), published as the “Software Knowledge Base” at ICSE in 1985. That analysis was also quite instructive and I would like to return to this work and expand it.

Educational techniques: objects first

Towards an Object-Oriented Curriculum ( TOOLS conference, August 1993, see also the shorter JOOP paper in May of the same year) makes a carefully argued case for what was later called the Objects First approach to teaching programming. I would be interested to know if there are earlier publications advocating starting programming education with an OO language.

The article also advocated for the “inverted curriculum”, a term borrowed from work by Bernie Cohen about teaching electrical engineering. It was the first transposition of this concept to software education. In the article’s approach, students are given program components to use, then little by little discover how they are made. This technique met with some skepticism and resistance since the standard approach was to start from the very basics (write trivial programs), then move up. Today, of course, many introductory programming courses similarly provide students from day one with a full-fledged set of components enabling them to produce significant programs.

More recent articles on similar topics, taking advantage of actual teaching experience, are The Outside-In Method of Teaching Programming (2003) and The Inverted Curriculum in Practice (at ICSE 2006, with Michela Pedroni). The culmination of that experience is the textbook Touch of Class from 2009.

Educational techniques: Distributed Software Projects

I believe our team at ETH Zurich (including among others M. Nordio, J. Tschannen, P. Kolb and C. Estler and in collaboration with C. Ghezzi, E. Di Nitto and G. Tamburrelli at Politecnico di Milano, N. Aguirre at Rio Cuarto and many others in various universities) was the first to devise,  practice and document on a large scale (see publications and other details here) the idea of an educational software project conducted in common by student groups from different universities. It yielded a wealth of information on distributed software development and educational issues.

Educational techniques: Web-based programming exercises

There are today a number of cloud-based environments supporting the teaching of programming by enabling students to compile and test their programs on the Web, benefiting from a prepared environment (so that they don’t have to download any tools or prepare control files) and providing feedback. One of the first — I am not sure about absolute precedence — and still a leading one, used by many universities and applicable to many programming languages, is Codeboard.

The main developer, in my chair at ETH Zurich, was Christian Estler, supported in particular by M. Nordio and M. Piccioni, so I am only claiming a supporting role here.

Educational techniques: key CS/SE concepts

The 2001 paper Software Engineering in the Academy did a good job, I think, of defining the essential concepts to teach in a proper curriculum (part of what Jeannette Wing’s 2006 paper called Computational Thinking).

Program verification: agents (delegates etc.)

Reasoning about Function Objects (ICSE 2010, with M. Nordio, P. Müller and J. Tschannen) introduced verification techniques for objects representing functions (such as agents, delegates etc., see above) in an OO language. Not sure whether there were any such techniques before.

Specification languages: Z

The Z specification language has been widely used for formal development, particularly in the UK. It is the design of J-R Abrial. I may point out that I was a coauthor of the first publication on Z in English (1980),  describing a version that preceded the adaptation to a more graphical-style notation done later at Oxford. The first ever published description of Z, pertaining to an even earlier version, was in French, in my book Méthodes de Programmation (with C. Baudoin), Eyrolles, 1978, running over 15 pages (526-541), with the precise description of a refinement process.

Program verification: exceptions

Largely coming out of the PhD thesis of Martin Nordio, A Sound and Complete Program Logic for Eiffel (TOOLS 2009) introduces rules for dealing with exceptions in a Hoare-style verification framework.

Program verification: full library, and AutoProof

Nadia Polikarpova’s thesis at ETH, aided by the work of Carlo Furia and Julian Tschannen (they were the major contributors and my participation was less important), was as far as I know the first to produce a full functional verification of an actual production-quality reusable library. The library is EiffelBase 2, covering fundamental data structures.

AutoProof — available today, as a still experimental tool, through its Web interface, see here — relied on the AutoProof prover, built by the same team, and itself based on Microsoft Research’s Boogie and Z3 engines.

More

There are more concepts worthy of being included here, but for today I will stop here.

Notes

[A] One point of divergence between usual presentations of the substitution principle and the view in OOSC and my other publications is the covariance versus contravariance of routine argument types. It reflects a difference of views as to what the proper policy (both mathematically sound and practically usable) should be.

[B]  The GoF book does not cite OOSC for the command or bridge patterns. For the command pattern it cites (thanks to Adam Kosmaczewski for digging up the GoF text!) a 1985 SIGGRAPH paper by Henry Lieberman (There’s More to Menu Systems than Meets the Screen). Lieberman’s paper describes the notion of command object and mentions undoing in passing, but does not include the key elements of the command pattern (as explained in full in OOSC1), i.e. an abstract (deferred) command class with deferred procedures called (say) do_it and undo_it, then specific classes for each kind of command, each providing a specific implementation of those procedures, then a history list of commands supporting multiple-level undo and redo as explained in OOSC1. (Reading Lieberman’s paper with a 2021 perspective shows that it came tantalizingly close to the command pattern, but doesn’t get to it. The paper does talk about inheritance between command classes, but only to “define new commands as extensions to old commands”, not in the sense of a general template that can be implemented in many specific ways. And it does mention a list of objects kept around to enable recovery from accidental deletions, and states that the application can control its length, as is the case with a history list; but the objects in the list are not command objects, they are graphical and other objects that have been deleted.)

[C] Additional note on the command pattern: I vaguely remember seeing something similar to the OOSC1 technique in an article from a supplementary volume of the OOPSLA proceedings in the late eighties or early nineties, i.e. at the same time or slightly later, possibly from authors from Xerox PARC, but I have lost the reference.

[D] Correction: I just checked the source and learned that the actual Schopenhauer quote (as opposed to the one that is usually quoted) is different; it does not include the part about laughing. So much for my attempts at understanding philosophy.

 

VN:F [1.9.10_1130]
Rating: 8.8/10 (15 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 4 votes)

Between you and me

I have been conducting interesting conversations with a two-something child who has not quite mastered the speaker-dependent [1] personal pronouns. He says things like “Where is your mom?”, when he actually means to ask about his mother, not mine; from hearing people tell him things like “your mom is coming”, he is clearly taking “your mom” as some kind of paraphrase for “mom”. He approaches people and says “I want to help me”, obviously from having heard, from others, “would you come and help me?”.

I am not concerned about young Tim (let us call him that), who sooner or later will get the point. But let us assume that someone, call him Tom, is determined to explain to Tim the concepts of “you” and “me”. Not so easy! In fact, come to think of it, fairly tricky, treacherous even. We all learned the concept at some point; I wish I remembered how it happened for me [2].

The difficulty is that someone, whether Tom or a third party, has to provide the explanation, a circumstance that messes up the explanation itself since it contains speaker-dependent terms. (Did I hear the word “Heisenberg”? If so I will ignore it.) Tom might try something like the following explanation — let us call it /A/  — but it will not work:

If Tom says me it means Tom, but if Tim says me it means Tim.

If Tim says you it means Tom, but if Tom says you it means Tim.

The reason it does not work is lack of quoting, but before I get to this let me mention that a mathematical model is not hard to come by. From a two-element set Names = {Tim, Tom} to itself, there are four total functions. Of these, two are constant functions. The other two are non-constant; one, me, is the identity function on Names, and you is the other. They are the functions me = {<Tim, Tim>, <Tom, Tom>} and you = {<Tim, Tom>, <Tom, Tim>}. I is just a synonym for me (the accusative in the latter being one of the few remnants of declension in English.)

Let us call this explanation /B/.  It can be illustrated as follows:

 

subjectivity

We can also write these functions (using for conciseness [c: A \ d: B … \ Z] for the conditional expression if c then A elseif d then Belse Z end) as the following version /C/:

me = λ n | [n = Tim: Tim \ Tom]

you = λ n | [n = Tim: Tom \ Tim]

In practice, however, this modeling of the concepts “you” and “me” as functions in Names → Names is not sufficient to define them properly, for example to turn the informal explanation /A/ into one that makes sense. The problem is that we are dealing with a larger set of words than just Names: the set   Words = {Tim, Tom, Me, You} with four elements. The meaning of a sentence that may include any of them is not absolute, but depends on who is uttering the sentence. When I say “me” and you say “me” the meaning is different, although when I say “you” and you say “me” the meaning is the same.

To interpret a sentence, then, we need a second-level function: a function meaning not in Words → Words but in Names → Words → Names, defined as follows:

meaning = λ n  |  λ w  |     [w ∈ Names: w    \    w = Me: me (n)    \    you (n)]

Let us call this definition /D/. Examples of its application include:

  • meaning (Tim) (Tom) = Tom. (First case: no speaker-dependency).
  • meaning (Tim) (Me) = Tim. (Second case, expresses the part of /A/ that said “if Tim says me it means Tim”.)
  • meaning (Tim) (You) = Tom. (Third case, expresses the part of /A/ that said “if Tim says you it means Tom”.)

/D/ is the most accurate definition of speaker-dependent pronouns and I may try it with the real Tim at the next opportunity (as soon as I have taught him lambda calculus).

/D/ also helps us understand what does not quite work in /A/, the first and perhaps most intuitive attempt. With explanations such as  “if Tim says you it means Tom” we are actually trying to express the need for making meanings speaker-dependent, that is to say, explain that the evaluation of a sentence must take into account who utters it. But this attempt fails because we need a general rule which will apply to all sentences, and the explanation itself is a sentence. To evaluate the sentence “if Tim says you it means Tom” we need to evaluate “you” which figures in it; that evaluation would yield either Tim or Tom depending on who is speaking. But because the sentence is itself part a definition of “you”, we do not want in this case to evaluate “you”.

We could write you in straight quotes as in programming languages:

if Tim says “you” it means Tom

but that is not the form of quoting we need. In a programming language, “you” is a character string, made of the characters ‘y‘, ‘o‘ and ‘u‘ in that order. Here, the makeup of this string in terms of its characters is completely irrelevant. The form of quoting that we need must achieve something else: preventing, in the evaluation of a formula, the evaluation of one of its parts.

John McCarthy’s Lisp language, coming out as early as 1959, showed how to handle this issue correctly. (This contribution is one of many from Lisp, including garbage collection and the notion of functional programming — not bad for a single language. See my 2011 obituary of John McCarthy on this blog [3].) One of the reasons Lisp needed a quoting mechanism is that it was designed to be self-describing and specifically self-evaluating, a spectacularly new concept at the time. McCarthy’s original book on Lisp (LISP 1.5 Programmer’s Manual, MIT Press, 1962) contains a dazzling interpreter of Lisp, written in Lisp and taking up no more than half a page. The interpreter is based on eval, a function (in the sense of a Lisp function, defined by a Lisp expression) which evaluates any Lisp program. You cannot define eval unless you have some way to prevent evaluation of part of an expression.

The convention is simple. For any expression x we may define another expression x — which we can read aloud as “QUOTED x”. Then:

  • Evaluating x gives the value of x.
  • Evaluating x yields x.

The original Lisp notation was (QUOTE x), which clearly conveys the idea: QUOTE is a function — in Lisp, function application f (x) is written (f x) — whose value, when applied to an argument, is that argument. (Not the value of the argument: the argument itself.)

With such a quote notation, explanations in the style of /A/ can be made meaningful; for example (version /E/):

If Tom says ‘me it means Tom, but if Tim says ‘me it means Tim.

This should be said with the quote spoken out loud: QUOTED me”.

We’ll see if Tim sees what /E/ means. At least, you see what you mean. I mean, I see what I  mean. Sorry, I meant that I see what you mean. Oh well. We see what we mean.

 

Notes and references

[1] I really want to call them “subjective personal pronouns”, in the ordinary sense of “subjective” opposed to “objective” as in “a subjective assessment”, but in grammatical terminology “subjective pronoun” has a well-accepted but entirely different meaning, that of a pronoun used as the subject of a sentence, like “she” rather than “her”. Hence “speaker-dependent pronoun”. Linguists also use the term “indexical pronouns”, as in the reference cited in note [2].

[2] Linguists investigating language learning have studied the matter. A recent article (2015) is 2-Year-Olds’ Comprehension of Personal Pronouns by M. Moyer, K. Harrigan, V. Hacquard and J. Lidz (in Supplement to the proceedings of the 39th Boston University Conference on Language Development, eds E. Grillo, K. Jepson and M. LaMendola), available here. The article includes a literature review. It acknowledges the you-and-me problem (first- and second-person pronouns) although it suggests that third-person pronouns (“he”, “she”) may be even more of a challenge to infants. Two quotes (with bibliographic references removed):

  • The evidence from previous studies examining two year olds’ competence is mixed, with some studies finding children succeeding with pronouns as early as 22-24 months, others finding success much later, and some with children succeeding at a range of ages.
  • Most [studies] suggest a production/comprehension asymmetry… Production studies suggest that children begin producing pronouns around 15-18 months, starting with first person, then second person, and finally third person pronouns. These early productions typically include some errors where the child reverses first and second person, but on the whole consistent errors are few . On the other hand, comprehension data focusing on speech addressed to the child suggest that children first understand second person pronouns, then first person, and finally third person.

[3] My article on John McCarthy can be found here.

VN:F [1.9.10_1130]
Rating: 7.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

The right forms of expression

If you want to know whether your_string has at least one upper-case character, you will write this in Eiffel:

if  ∃ c: your_string ¦ c.is_upper then

Such predicate-calculus boolean expressions, using a quantifier (“for all”) or (“there exists”) are becoming common in Eiffel code. They are particularly useful in Design by Contract assertions, making it possible to characterize deep semantic properties of the code and its data structures. For example a class invariant clause in a class I wrote recently states

from_lists_exist: ∀ tf: triples_from ¦ tf Void                        — [1]

meaning that all the elements, if any, of the list triples_from  are non-void (non-null). The notation is the exact one from mathematics. (Mathematical notation sometimes uses a dot in place of the bar, but the bar is clearer, particularly in an OO context where the dot has another use.)

Programming languages should support time-honored notations from mathematics. Reaching this goal has been a driving force in the evolution of Eiffel, but not as a concession to “featurism” (the gratuitous piling up of language feature upon feature). The language must remain simple and consistent; any new feature must find its logical place in the overall edifice.

The design of programming languages is a constant search for the right balance between rigor, simplicity, consistency, formal understanding, preservation of existing code, innovation and expressiveness. The design of Eiffel has understood the last of these criteria as implying support for established notations from mathematics, not through feature accumulation but by re-interpreting these notations in terms of the language’s fundamental concepts. A typical example is the re-interpretation of the standard mathematical notation a + b as as simply an operator-based form for the object-oriented call a.plus (b), obtained by declaring “+” as an operator alias for the function plus in the relevant classes. There are many more such cases in today’s Eiffel. Quantifier expressions using and  are the latest example.

 They are not a one-of-a-kind trick but just as a different syntax form for loops. Expressed in a more verbose form, the only one previously available, [1] would be:

across triples_from is tf all tf /= Void end                         — [2]

It is interesting to walk back the history further. [2] is itself a simplification of

across triples_from as tf all tf.item /= Void end               — [3]

where the “.item” has a good reason for being there, but that reason is irrelevant to a beginner. The earlier use of as in [3] is also the reason for the seemingly bizarre use of is in [2], which is only explainable by the backward compatibility criterion (code exists that uses as , which has a slightly different semantics from is), and will go away. But a few years ago the across loop variant did not exist and you would have had to write the above boolean expressions as

all_non_void (triples_from)

after defining a function

all_non_void (l: LIST [T]): BOOLEAN                                    — [4]
                         — Are all the elements of `l’, if any, non-void?
          local
pos: INTEGER
do
from
pos := l.index
l.start
Result := True
until not Result or l.after loop
l.forth
end
go_ith (pos)
end

The road traveled from [4] to [1] is staggering. As we introduced new notations in the history of Eiffel the reaction of the user community has sometimes been between cautious and negative. With the exception of a couple of quickly discarded ideas (such as the infamous and short-lived “!!” for creation), they were generally adopted widely because they simplify people’s life without adding undue complexity to the language. The key has been to avoid featurism and choose instead to provide two kinds of innovation:

  • Major conceptual additions, which elevate the level of abstraction of the language. A typical introduction was the introduction of agents, which provide the full power of functional programming in an object-oriented context; another was the SCOOP concurrency mechanism. There have been only a few such extensions, all essential.
  • Syntactical variants for existing concepts, allowing more concise forms obtained from traditional mathematical notation. The use of quantifier expressions as in [1] is the latest example.

Complaints of featurism still occasionally happen when people first encounter the new facilities, but they fade away quickly as people start using them. After writing a few expressions such as [1], no one wants to go back to any of the other forms.

These quantifier expressions using and , as well as the “” not-equal sign for what used to be (and still commonly is) written “/=”, rely on Unicode. Eiffel started out when ASCII was the law of the land. (Or 8-bit extended ASCII, which does not help much since the extensions are rendered differently in different locales, i.e. the same 8-bit character code may mean something different on French and Swedish texts.) In recent years, Eiffel has made a quiet transition to full Unicode support. (Such support extends to manifest strings and operators, not to identifiers. The decision, which could be revisited, has been to keep the ASCII-only  policy for identifiers to favor compatible use by programmers regardless of their mother tongues.) The use of Unicode considerably extends the expressive power of the language, in particular for scientific software which can — thanks to Eiffel’s mechanism for defining free operators — rely on advanced mathematical notations.

Unicode is great, but I hear the question: how in the world can we enter the corresponding symbols, since our keyboards are still ASCII plus some extensions?

It would be tedious to have to select from a list of special symbols (as you do when inserting a mathematical symbol in Microsoft Word or, for that matter, as I did when inserting the phrase “ and ” in the preceding paragraph using WordPress).

The answer lies in the interplay between the language and the development environment. EiffelStudio, like other modern IDEs, includes an automatic completion mechanism which lets you enter the beginning of a construct and will take care of filling in the rest. Already useful for complex structures (if you type “if” the tools will create the entire “if then else end” conditional structure for you to fill in), automatic completion will take care of inserting the appropriate Unicode symbols for you. Type for example “across”,  then CTRL-Space to trigger completion, and the choices will include the “∀” and “” forms. You can see below how this works:

across_all

Programming languages can be at the same time simple, easy to learn, consistent, and expressive. Start using quantifiers now!

Acknowledgments to the Ecma Technical Committee on Eiffel and the Eiffel Software team, particularly Alexander Kogtenkov (see his blog post here) and (for the completion mechanism and its animated illustration above) Jocelyn Fiat.

VN:F [1.9.10_1130]
Rating: 10.0/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Notations you didn’t even know you could use

Consider the following expression:

∃ c: s   ¦   moisture (c) = soft

This is obviously mathematics. To express such a property in a programming language, you have to write a function containing a loop that iterates through the elements of s. Right?

Wrong. The above construct is valid Eiffel. It’s a consequence of recent syntax extensions that retain all the simplicity and consistency of the language but take full advantage of Unicode. Of course you do not have Unicode characters such as on you keyboard, but EiffelStudio’s completion mechanism inserts them for you.

To see how this works, just read Alexander Kogtenkov’s recent blog post on the topic.

Note added 24 December 2020: you will find a longer exposition in a later article on this blog.

VN:F [1.9.10_1130]
Rating: 10.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

This Wednesday in Nice: survey talk on the Eiffel method

The “Morgenstern Colloquium” at the University of Nice / INRIA Sophia Antipolis invited me to give a talk, next Wednesday (18 December) at 11 in Sophia Antipolis, in the aptly named* “Kahn Building”. The announcement appears here. I proposed various topics but (pleasant surprise) the organizers explicitly asked me to lecture about what I really want to talk about: the Eiffel approach. I will give a general presentation describing not specifically the language but the unified view of software construction embodied in Eiffel, from modeling to requirements to design, implementation and verification. Here is the abstract:

With society’s growing reliance on IT systems, the ability to write high-quality software is ever more critical. While a posteriori verification techniques have their role, there is no substitute for methods and tools that provide built-in quality (“correctness by construction”) and scale up to very large systems. For several decades my colleagues and I have been building such a method, based in particular on the concept of Design by Contract, the associated tools and the supporting language, Eiffel. The scope is wide, encompassing all aspects of the software development process, from requirements and design to implementation and verification. I will present an overview of the approach, show what it can yield, and discuss remaining open issues.

This talk is meant for everyone, whether from industry or academia, with an interest in practical techniques for engineering high-quality software.

No registration is required. The presentation will be in English.

Note

*Gilles Kahn, a brilliant computer scientist who died too young, was for a while director of INRIA.

VN:F [1.9.10_1130]
Rating: 6.3/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)

Adult entertainment

Sign seen in a Singapore shopping center:

 No-Playing

Let us make sure we understand: here children are not allowed, but playing is.

As a consequence such playing must be performed by non-children only. Adults welcome to play!

Maybe it is actually not the intended meaning.  Instead of

(and (not (allowed children)) (allowed playing))

the desired parsing may be

(not (allowed (playing children)))

One hint in favor of this second interpretation is that in practice people seldom put up signs to advertise that something is allowed.

So new-line must be a mere break character equivalent to space, not a semantics-carrying delimiter.

Somewhat reminiscent of eats shoots and leaves. Here it is not even a punctuation mark, just a humble new-line, but its visual effect is strong all the same.

VN:F [1.9.10_1130]
Rating: 10.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Formality in requirements: new publication

The best way to make software requirements precise is to use one of the available “formal” approaches. Many have been proposed; I am not aware of a general survey published so far. Over the past two years, we have been working on a comprehensive survey of the use of formality in requirements, of which we are now releasing a draft. “We” is a joint informal research group from Innopolis University and the University of Toulouse, whose members have been cooperating on requirements issues, resulting in publications listed  under “References” below and in several scientific events.

The survey is still being revised, in particular because it is longer than the page limit of its intended venue (ACM Computing Surveys), and some sections are in need of improvement. We think, however, that the current draft can already provide a solid reference in this fundamental area of software engineering.

The paper covers a broad selection of methods, altogether 22 of them, all the way from completely informal to strictly formal. They are grouped into five categories: natural language, semi-formal, automata- or graph-based, other mathematical frameworks, programming-language based. Examples include SysML, Relax, Statecharts, VDM, Eiffel (as a requirements notation), Event-B, Alloy. For every method, the text proposes a version of a running example (the Landing Gear System, already used in some of our previous publications) expressed in the corresponding notation. It evaluates the methods using a set of carefully defined criteria.

The paper is: Jean-Michel Bruel, Sophie Ébersold, Florian Galinier, Alexandr Naumchev, Manuel Mazzara and Bertrand Meyer: Formality in Software Requirements, draft, November 2019.

The text is available here. Comments on the draft are welcome.

References

Bertrand Meyer, Jean-Michel Bruel, Sophie Ebersold, Florian Galinier and Alexandr Naumchev: Towards an Anatomy of Software Requirements, in TOOLS 2019, pages 10-40, see here (or arXiv version here). I will write a separate blog article about this publication.

Alexandr Naumchev and Bertrand Meyer: Seamless requirements. Computer Languages, Systems & Structures 49, 2017, pages 119-132, available here.

Florian Galinier, Jean-Michel Bruel, Sophie Ebersold and Bertrand Meyer: Seamless Integration of Multirequirements, in Complex Systems, 25th International Requirements Engineering Conference Workshop, IEEE, pages 21-25, 2017, available here.

Alexandr Naumchev, Manuel Mazzara, Bertrand Meyer, Jean-Michel Bruel, Florian Galinier and Sophie Ebersold: A contract-based method to specify stimulus-response requirements, Proceedings of the Institute for System Programming, vol. 29, issue 4, 2017, pp. 39-54. DOI: 10.15514, available here.

Alexandr Naumchev and Bertrand Meyer: Complete Contracts through Specification Drivers., in 10th International Symposium on Theoretical Aspects of Software Engineering (TASE), pages 160-167, 2016, available here.

Alexandr Naumchev, Bertrand Meyer and Víctor Rivera: Unifying Requirements and Code: An Example, in PSI 2015 (Ershov conference, Perspective of System Informatics), pages 233-244, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Why not program right?

recycled-logo (Originally published on CACM blog.)

Most of the world programs in a very strange way. Strange to me. I usually hear the reverse question: people ask us, the Eiffel community, to explain why we program our way. I hardly understand the question, because the only mystery is how anyone can even program in any other way.

The natural reference is the beginning of One Flew Over the Cuckoo’s Nest: when entering an insane asylum and wondering who is an inmate and who a doctor, you may feel at a loss for objective criteria. Maybe the rest of the world is right and we are the nut cases. Common sense suggests it.

But sometimes one can go beyond common sense and examine the evidence. So lend me an ear while I explain my latest class invariant. Here it is, in Figure 1. (Wait, do not just run away yet.)

multigraph_invariant

Figure 1: From the invariant of class MULTIGRAPH

This is a program in progress and by the time you read this note the invariant and enclosing class will have changed. But the ideas will remain.

Context: multigraphs

The class is called MULTIGRAPH and describes a generalized notion of graph, illustrated in Figure 2. The differences are that: there can be more than one edge between two nodes, as long as they have different tags (like the spouse and boss edges between 1 and 2); and there can be more than one edge coming out of a given node and with a given tag (such as the two boss edges out of 1, reflecting that 1’s boss might be 2 in some cases and 3 in others). Some of the nodes, just 1 here, are “roots”.

The class implements the notion of multigraph and provides a wide range of operations on multigraphs.

multigraph_example

Figure 2: A multigraph

Data structures

Now we turn to the programming and software engineering aspects. I am playing with various ways of accessing multigraphs. For the basic representation of a multigraph, I have chosen a table of triples:

                triples_table: HASH_TABLE [TRIPLE, TUPLE [source: INTEGER; tag: INTEGER; target: INTEGER]]  — Table of triples, each retrievable through its `source’, `tag’ and `target’.

where the class TRIPLE describes [source, tag, target] triples, with a few other properties, so they are not just tuples. It is convenient to use a hash table, where the key is such a 3-tuple. (In an earlier version I used just an ARRAY [TRIPLE], but a hash table proved more flexible.)

Sources and targets are nodes, also called “objects”; we represent both objects and tags by integers for efficiency. It is easy to have structures that map symbolic tag names such as “boss” to integers.

triples_table is the core data structure but it turns out that for the many needed operations it is convenient to have others. This technique is standard: for efficiency, provide different structures to access and manipulate the same underlying information, with some redundancy. So I also have:

 triples_from:  ARRAYED_LIST [LIST [TRIPLE]]
               — Triples starting from a given object. Indexed by object numbers.

  triples_with:  HASH_TABLE [LIST [TRIPLE], INTEGER]
               — Triples labeled by a given tag. Key is tag number.

 triples_to:  ARRAYED_LIST [LIST [TRIPLE]]
               — Triples leading into a given object. Indexed by object numbers.

Figure 3 illustrates triples_from and Figures 4 illustrates triples_with. triples_to is similar.

triples_from

Figure 3: The triples_from array of lists and the triples_table

triples_with

Figure 4: The triples_with array of lists and the triples_table

It is also useful to access multigraphs through yet another structure, which gives us the targets associated with a given object and tag:

successors: ARRAY [HASH_TABLE [LIST [TRIPLE], INTEGER]]
               — successors [obj] [t] includes all o such that there is a t- reference from obj to o.

For example in Figure 1 successors [1] [spouse] is {2, 3}, and in Figures 3 and 4 successors [26] [t] is {22, 55, 57}. Of course we can obtain the “successors” information through the previously defined structures, but since this is a frequently needed operation I decided to include a specific data structure (implying that every operation modifying the multigraph must update it). I can change my mind later on and decide to make “successors” a function rather than a data structure; it is part of the beauty of OO programming, particularly in Eiffel, that such changes are smooth and hardly impact client classes.

There is similar redundancy in representing roots:

                roots:  LINKED_SET [INTEGER]
                              — Objects that are roots.

                is_root:  ARRAY [BOOLEAN]
                              — Which objects are roots? Indexed by object numbers.

If o is a root, then it appears in the “roots” set and is_root [o] has value True.

Getting things right

These are my data structures. Providing such a variety of access modes is a common programming technique. From a software engineering perspective ― specification, implementation, verification… ― it courts disaster. How do we maintain their consistency? It is very easy for a small mistake to slip into an operation modifying the graph, causing one of the data structures to be improperly updated, but in a subtle and rare enough way that it will not manifest itself during testing, coming back later to cause strange behavior that will be very hard to debug.

For example, one of the reasons I have a class TRIPLE and not just 3-tuples is that a triple is not exactly  the same as an edge in the multigraph. I have decided that by default the operation that removes and edge would not remove the corresponding triple from the data structure, but leave it in and mark it as “inoperative” (so class TRIPLE has an extra “is_inoperative” boolean field). There is an explicit GC-like mechanism to clean up deleted edges occasionally. This approach brings efficiency but makes the setup more delicate since we have to be extremely careful about what a triple means and what removal means.

This is where I stop understanding how the rest of the world can work at all. Without some rigorous tools I just do not see how one can get such things right. Well, sure, spend weeks of trying out test cases, printing out the structures, manually check everything (in the testing world this is known as writing lots of “oracles”), try at great pains to find out the reason for wrong results, guess what program change will fix the problem, and start again. Stop when things look OK. When, as Tony Hoare once wrote, there are no obvious errors left.

Setting aside the minuscule share of projects (typically in embedded life-critical systems) that use some kind of formal verification, this process is what everyone practices. One can only marvel that systems, including many successful ones, get produced at all. To take an analogy from another discipline, this does not compare to working like an electrical engineer. It amounts to working like an electrician.

For a short time I programmed like that too (one has to start somewhere, and programming methodology was not taught back then). I no longer could today. Continuing with the Hoare citation, the only acceptable situation is to stop when there are obviously no errors left.

How? Certainly not, in my case, by always being right the first time. I make mistakes like everyone else does. But I have the methodology and tools to avoid some, and, for those that do slip through, to spot and fix them quickly.

Help is available

First, the type system. Lots of inconsistencies, some small and some huge, which in an untyped language would only hit during execution, do not make it past compilation. We are not just talking here about using REAL instead of INTEGER. With a sophisticated type system involving multiple inheritance, genericity, information hiding and void safety, a compiler error message can reflect a tricky logical mistake. You are using a SET as if it were a LIST (some operations are common, but others not). You are calling an operation on a reference that may be void (null) at run time. And so on.

By the way, about void-safety: for a decade now, Eiffel has been void-safe, meaning a compile-time guarantee of no run-time null pointer dereferencing. It is beyond my understanding how the rest of the world can still live with programs that run under myriad swords of Damocles: x.op (…) calls that might any minute, without any warning or precedent, hit a null x and crash.

Then there is the guarantee of logical consistency, which is where my class invariant (Figure 1) comes in. Maybe it scared you, but in reality it is all simple concepts, intended to make sure that you know what you are doing, and rely on tools to check that you are right. When you are writing your program, you are positing all kinds, logical assumptions, large and (mostly) small, all the time. Here, for the structure triples_from [o] to make sense, it must be a list such that:

  • It contains all the triples t in the triples_table such that t.source = o.
  •  It contains only those triples!

You know this when you write the program; otherwise you would not be having a “triples_from” structure. Such gems of knowledge should remain an integral part of the program. Individually they may not be rocket science, but accumulated over the lifetime of a class design, a subsystem design or a system design they collect all the intelligence that makes the software possible.  Yet in the standard process they are gone the next minute! (At best, some programmers may write a comment, but that does not happen very often, and a comment has no guarantee of precision and no effect on testing or correctness.)

Anyone who takes software development seriously must record such fundamental properties. Here we need the following invariant clause:

across triples_from as tf all

across tf.item as tp all tp.item.source = tf.cursor_index end

end

(It comes in the class, as shown in Figure 1, with the label “from_list_consistent”. Such labels are important for documentation and debugging purposes. We omit them here for brevity.)

What does that mean? If we could use Unicode (more precisely, if we could type it easily with our keyboards) we would write things like “∀ x: E | P (x) for all x in E, property P holds of x. We need programming-language syntax and write this as across E as x all P (x.item) end. The only subtlety is the .item part, which gives us generality beyond the  notation: x in the across is not an individual element of E but a cursor that moves over E. The actual element at cursor position is x.item, one of the properties of that cursor. The advantage is that the cursor has more properties, for example x.cursor_index, which gives its position in E. You do not get that with the plain of mathematics.

If instead of  you want  (there exists), use some instead of all. That is pretty much all you need to know to understand all the invariant clauses of class MULTIGRAPH as given in Figure 1.

So what the above invariant clause says is: take every position tf in triples_from; its position is tf.cursor_index and its value is tf.item. triples_from is declared as ARRAYED_LIST [LIST [TRIPLE]], so tf.cursor_index is an integer representing an object o, and tf.item is a list of triples. That list should  consist of the triples having tf.cursor_index as their source. This is the very property that we are expressing in this invariant clause, where the innermost across says: for every triple tp.item in the list, the source of that triple is the cursor index (of the outside across). Simple and straightforward, I think (although such English explanations are so much more verbose than formal versions, such as the Eiffel one here, and once you get the hang of it you will not need them any more).

How can one ever include a structure such as triples_from without expressing such a property? To put the question slightly differently: am I inside the asylum looking out, or outside the asylum looking in? Any clue would be greatly appreciated.

More properties

For the tag ( with_) and target lists, the properties are similar:

across triples_with as tw all across tw.item as tp all tp.item.tag = tw.key end end

across triples_to as tt all across tt.item as tp all tp.item.target = tt.cursor_index end end 

We also have some properties of array bounds:

 is_root.lower = 1 and is_root.upper = object_count

triples_from.lower = 1 and triples_from.upper = object_count

triples_to.lower = 1 and triples_to.upper = object_count

where object_count is the number of objects (nodes), and for an array a (whose bounds in Eiffel are arbitrary, not necessarily 0 or 1, and set on array creation), a.lower and a.upper are the bounds. Here we number the arrays from 1.

There are, as noted, two ways to represent rootness. We must express their consistency (or risk trouble). Two clauses of the invariant do the job:

across roots as t all is_root [t.item] end

across is_root as t all (t.item = roots.has (t.cursor_index)) end

The first one says that if we go through the list roots we only find elements whose is_root value is true; the second, that if we go through the array “is_root” we find values that are true where and only where the corresponding object, given by the cursor index, is in the roots set. Note that the = in that second property is between boolean values (if in doubt, check the type instantly in the EIffelStudio IDE!), so it means “if and only if.

Instead of these clauses, a more concise version, covering them both, is just

roots ~ domain (is_root)

with a function domain that gives the domain of a function represented by a boolean array. The ~ operator denotes object equality, redefined in many classes, and in particular in the SET classes (roots is a LINKED_SET) to cover equality between sets, i.e. the property of having the same elements.

The other clauses are all similarly self-explanatory. Let us just go through the most elaborate one, successors_consistent, involving three levels of across:

across successors as httpl all                   — httpl.item: hash table of list of triples

        across httpl.item as tpl all                — tpl.item: list of triples (tpl.key: key (i.e. tag) in hash table (tag)

                  across tpl.item as tp all            — tp.item: triple

                         tp.item.tag = tpl.key

and tp.item.source = httpl.cursor_index

                   end

          end

end

You can see that I struggled a bit with this one and made provisions for not having to struggle again when I would look at the code again 10 minutes, 10 days or 10 months later. I chose (possibly strange but consistent) names such as httpl for hash-table triple, and wrote comments (I do not usually need any in invariant and other contract clauses) to remind me of the type of everything. That was not strictly needed since once again the IDE gives me the types, but it does not cost much and could help.

What this says: go over successors; which as you remember is an ARRAY, indexed by objects, of HASH_TABLE, where each entry of such a hash table has an element of type [LIST [TRIPLE] and a key of type INTEGER, representing the tag of a number of outgoing edges from the given object. Go over each hash table httpl. Go over the associated list of triples tpl. Then for each triple tp in this list: the tag of the triple must be the key in the hash table entry (remember, the key does denote a tag); and the source of the triple must the object under consideration, which is the current iteration index in the array of the outermost iteration.

I hope I am not scaring you at this point. Although the concepts are simple, this invariant is more sophisticated than most of those we typically write. Many invariant clauses (and preconditions, and postconditions) are very simple properties, such as x > 0 or x ≠ y. The reason this one is more elaborate is not that I am trying to be fussy but that without it I would be the one scared to death. What is elaborate here is the data structure and programming technique. Not rocket science, not anything beyond programmers typically do, but elaborate. The only way to get it right is to buttress it by the appropriate logical properties. As noted, these properties are there anyway, in the back of your head, when you write the program. If you want to be more like an electrical engineer than an electrician, you have to write them down.

There is more to contracts

Invariants are not the only kind of such “contract properties. Here for example, from the same class, is a (slightly abbreviated) part of the postcondition (output property) of the operation that tells us, through a boolean Result, if the multigraph has an edge of given components osource, t (the tag) and otarget :

Result =

(across successors [osource] [t] as tp some

not tp.item.is_inoperative and tp.item.target = otarget

end)

In words, this clause expresses the compatibility of the operation with the successors view: it must answer yes if and only if otarget appears in the successor set of osource for t, and the corresponding triple is not marked inoperative.

The concrete benefits

And so? What do we get out of making these logical properties explicit? Just the intellectual satisfaction of doing things right, and the methodological guidance? No! Once you have done this work, it is all downhill. Turn on the run-time assertion monitoring option (tunable separately for preconditions, postconditions, invariants etc., and on by default in development mode), and watch your tests run. If you are like almost all of us, you will have made a few mistakes, some which will seem silly when or rather if you find them in time (but there is nothing funny about a program that crashes during operation) and some more subtle. Sit back, and just watch your contracts be violated. For example if I change <= to < in the invariant property tw.key <= max_tag, I get the result of Figure 5. I see the call stack that I can traverse, the object run-time structure that I can explore, and all the tools of a modern debugger for an OO language. Finding and correcting the logical flaw will be a breeze.

debugger

Figure 5: An invariant violation brings up the debugger

The difference

It will not be a surprise that I did not get all the data structures and algorithms of the class MULTIGRAPH  right the first time. The Design by Contract approach (the discipline of systematically expressing, whenever you write any software element, the associated logical properties) does lead to fewer mistakes, but everyone occasionally messes up. Everyone also looks at initial results to spot and correct mistakes. So what is the difference?

Without the techniques described here, you execute your software and patiently examine the results. In the example, you might output the content of the data structures, e.g.

List of outgoing references for every object:

        1: 1-1->1|D, 1-1->2|D, 1-1->3|D, 1-2->1|D, 1-2->2|D,  1-25->8|D, 1-7->1|D, 1-7->6|D,

1-10->8|D, 1-3->1|D, 1-3->2|D, 1-6->3|D, 1-6->4|D, 1-6->5|D

        3: 3-6->3, 3-6->4, 3-6->5, 3-9->14, 3-9->15,   3-9->16, 3-1->3, 3-1->2, 3-2->3, 3-2->2,

                  3-25->8, 3-7->3, 3-7->6, 3-10->8, 3-3->3,  3-3->2    

List of outgoing references for every object:

        1: 1-1->1|D, 1-1->2|D, 1-1->3|D, 1-2->1|D, 1-2->2|D, 1-25->8|D, 1-7->1|D, 1-7->6|D,

1-10->8|D, 1-3->1|D,  1-3->2|D, 1-6->3|D, 1-6->4|D, 1-6->5|D

        3: 3-6->3, 3-6->4, 3-6->5, 3-9->14, 3-9->15,  3-9->16, 3-1->3, 3-1->2, 3-2->3, 3-2->2,

                                 3-25->8, 3-7->3, 3-7->6, 3-10->8, 3-3->3,  3-3->2

and so on for all the structures. You check the entries one by one to ascertain that they are as expected. The process nowadays has some automated support, with tools such as JUnit, but it is still essentially manual, tedious and partly haphazard: you write individual test oracles for every relevant case. (For a more automated approach to testing, taking advantage of contracts, see [1].) Like the logical properties appearing in contracts, these oracles are called assertions but the level of abstraction is radically different: an oracle describes the desired result of one test, where a class invariant, or routine precondition, or postcondition expresses the properties desired of all executions.

Compared to the cost of writing up such contract properties (simply a matter of formalizing what you are thinking anyway when you write the code), their effect on testing is spectacular. Particularly when you take advantage of across iterators. In the example, think of all the checks and crosschecks automatically happening across all the data structures, including the nested structures as in the 3-level across clause. Even with a small test suite, you immediately get, almost for free, hundreds or thousands of such consistency checks, each decreasing the likelihood that a logical flaw will survive this ruthless process.

Herein lies the key advantage. Not that you will magically stop making mistakes; but that the result of such mistakes, in the form of contract violations, directly points to logical properties, at the level of your thinking about the program. A wrong entry in an output, whether you detect it visually or through a Junit clause, is a symptom, which may be far from the cause. (Remember Dijkstra’s comment, the real point of his famous Goto paper, about the core difficulty of programming being to bridge the gap between the static program text, which is all that we control, and its effect: the myriad possible dynamic executions.) Since the cause of a bug is always a logical mistake, with a contract violation, which expresses a logical inconsistency, you are much close to that cause.

(About those logical mistakes: since a contract violation reflects a discrepancy between intent, expressed by the contract, and reality, expressed by the code, the mistake may be on either side. And yes, sometimes it is the contract that is wrong while the implementation in fact did what is informally expected. There is partial empirical knowledge [1] of how often this is the case. Even then, however, you have learned something. What good is a piece of code of which you are not able to say correctly what it is trying to do?)

The experience of Eiffel programmers reflects these observations. You catch the mistakes through contract violations; much of the time, you find and correct the problem easily. When you do get to producing actual test output (which everyone still does, of course), often it is correct.

This is what has happened to me so far in the development of the example. I had mistakes, but converging to a correct version was a straightforward process of examining violations of invariant violations and other contract elements, and fixing the underlying logical problem each time.

By the way, I believe I do have a correct version (in the sense of the second part of the Hoare quote), on the basis not of gut feeling or wishful thinking but of solid evidence. As already noted it is hard to imagine, if the code contains any inconsistencies, a test suite surviving all the checks.

Tests and proofs

Solid evidence, not perfect; hard to imagine, not impossible. Tests remain only tests; they cannot exercise all cases. The only way to achieve demonstrable correctness is to rely on mathematical proofs performed mechanically. We have this too, with the AutoProof proof system for Eiffel, developed in recent years [1]. I cannot overstate my enthusiasm for this work (look up the Web-based demo), its results (automated proof of correctness of a full-fledged data structures and algorithms library [2]) and its potential, but it is still a research effort. The dynamic approach (meaning test-based rather than proof-based) presented above is production technology, perfected over several decades and used daily for large-scale mission-critical applications. Indeed (I know you may be wondering) it scales up without difficulty:

  • The approach is progressive. Unlike fully formal methods (and proofs), it does not require you to write down every single property down to the last quantifier. You can start with simple stuff like x > 0. The more you write, the more you get, but it is the opposite of an all-or-nothing approach.
  • On the practical side, if you are wondering about the consequences on performance of a delivered system: there is none. Run-time contract monitoring is a compilation option, tunable for different kinds of contracts (invariants, postconditions etc.) and different parts of a system. People use it, as discussed here, for development, testing and debugging. Most of the time, when you deliver a debugged system, you turn it off.
  • It is easy to teach. As a colleague once mentioned, if you can write an if-then-else you can write a precondition. Our invariants in the above example where a bit more sophisticated, but programmers do write loops (in fact, the Eiffel loop for iterating over a structure also uses across, with loop and instructions instead of all or some and boolean expressions). If you can write a loop over an array, you can write a property of the array’s elements.
  • A big system is an accumulation of small things. In a blog article [5] I recounted how I lost a full day of producing a series of technical diagrams of increasing complexity, using one of the major Web-based collaborative development tools. A bug of the system caused all the diagrams to reproduce the first, trivial one. I managed to get through to the developers. My impression (no more than an educated guess resulting from this interaction) is that the data structures involved were far simpler than the ones used in the above discussion. One can surmise that even simple invariants would have uncovered the bug during testing rather than after deployment.
  • Talking about deployment and tools used directly on the cloud: the action in software engineering today is in DevOps, a rapid develop-deploy loop scheme. This is where my perplexity becomes utter cluelessness. How can anyone even consider venturing into that kind of exciting but unforgiving development model without the fundamental conceptual tools outlined above?

We are back then to the core question. These techniques are simple, demonstrably useful, practical, validated by years of use, explained in professional books (e.g. [6]), introductory programming textbooks (e.g. [7]), EdX MOOCs (e.g. [8]), YouTube videos, online tutorials at eiffel.org, and hundreds of articles cited thousands of times. On the other hand, most people reading this article are not using Eiffel. On reflection, a simple quantitative criterion does exist to identify the inmates: there are far more people outside the asylum than inside. So the evidence is incontrovertible.

What, then, is wrong with me?

References

(Nurse to psychiatrist: these are largely self-references. Add narcissism to list of patient’s symptoms.)

1.    Ilinca Ciupa, Andreas Leitner, Bertrand Meyer, Manuel Oriol, Yu Pei, Yi Wei and others: AutoTest articles and other material on the AutoTest page.

2. Bertrand Meyer, Ilinca Ciupa, Lisa (Ling) Liu, Manuel Oriol, Andreas Leitner and Raluca Borca-Muresan: Systematic evaluation of test failure results, in Workshop on Reliability Analysis of System Failure Data (RAF 2007), Cambridge (UK), 1-2 March 2007 available here.

3.    Nadia Polikarpova, Ilinca Ciupa and Bertrand Meyer: A Comparative Study of Programmer-Written and Automatically Inferred Contracts, in ISSTA 2009: International Symposium on Software Testing and Analysis, Chicago, July 2009, available here.

4.    Carlo Furia, Bertrand Meyer, Nadia Polikarpova, Julian Tschannen and others: AutoProof articles and other material on the AutoProof page. See also interactive web-based online tutorial here.

5.    Bertrand Meyer, The Cloud and Its Risks, blog article, October 2010, available here.

6.    Bertrand Meyer: Object-Oriented Software Construction, 2nd edition, Prentice Hall, 1997.

7.    Bertrand Meyer: Touch of Class: Learning to Program Well Using Objects and Contracts, Springer, 2009, see touch.ethz.ch and Amazon page.

8.    MOOCs (online courses) on EdX : Computer: Art, Magic, Science, Part 1 and Part 2. (Go to archived versions to follow the courses.)

VN:F [1.9.10_1130]
Rating: 9.9/10 (12 votes cast)
VN:F [1.9.10_1130]
Rating: +8 (from 10 votes)

Mainstream enough for me

Every couple of weeks or so, I receive a message such as the one below; whenever I give a talk on any computer science topic anywhere in the world, strangers come to me to express similar sentiments. While I enjoy compliments as much as anyone else, I am not the right recipient for such comments. In fact there are 7,599,999,999  more qualified recipients. For me, Eiffel is “mainstream” enough.

What strikes me is why so many commenters, after the compliment, stop at the lament. Eiffel is not some magical dream, it is a concrete technology available for download at eiffel.org. Praising Eiffel will not change the world. Using EiffelStudio might.

When one answers the compliments with “Thanks! Then use it for your work“, the variety of excuses is amusing, or sad depending on the perspective, from “my boss would not allow it” (variant: “my subordinates would not accept it”) to “does it work with [library that does not work with anything else]?”.

Well, you might have some library wrapping to do (EiffelStudio easily interfaces with C, C++ and others). Also, you should not stop at the first hurdle: it might be due to a bug (surprise! The technology is not perfect!), but it might also just be that Eiffel and EiffelStudio are different and you have to shed some long-held assumptions and practices. What matters is that the technology does work; companies large and small use Eiffel all the time for long-running projects, some into the millions of lines and tens of thousands of classes, and refuse to switch to anything else.

What follows is a literal translation of the original message into English (it was written in another language). Since the author, whom I do not know, did not state the email was a public comment, I removed identifying details.

 

Subject:Eiffel is fantastic! But why is it not mainstream?

Dear Professor Meyer:

Greetings from [the capital of a country on another continent].

I graduated from [top European university] in 1996 and completed a master’s in physics from [institute on another continent] in 2006.

I have worked for twenty years in the industry, from application engineer to company head. In my industry career I have been able to be both CEO and CTO at the same time, thanks to the good education I received originally.

Information systems were always a pillar of my business strategy. Unfortunately, I was disappointed every single time I commissioned the development of a new system. This led me to study further and to investigate why the problem is not solved. That’s how I found your book Object-Oriented Software Construction and became enthusiastic about Design by Contract, Eiffel and EiffelStudio. To me your method is the only method for developing “correct” software. The Eiffel programming language is, in my view, the only true object-oriented language.

However it befuddles me — I cannot understand —  why the “big” players in this industry (Apple, Google, Microsoft etc.) do not use Design by Contract. .NET has a Visual Studio extension with the name “Code Contracts” but it is no longer supported in the latest Visual Studio 2017. Big players, why don’t you promote Design by Contract?

Personally, after 20 years in industry, I found out that my true calling is in research. It would be a great pleasure to be able to work in research. My dream job is Data Scientist and I had thought to apply to Google for a job. Studying the job description, I noted that “Python” is one of the desired languages. Python is dynamically typed and does not support good encapsulation. No trace of Design by Contract…

What’s wrong with the software industry?

With best regards,

VN:F [1.9.10_1130]
Rating: 10.0/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Towards empirical answers to important software engineering questions

(Adapted from a two-part article on the Communications of the ACM blog.)

1 The rise of empirical software engineering

One of the success stories of software engineering research in recent decades has been the rise of empirical studies. Visionaries such as Vic Basili, Marvin Zelkowitz and Walter Tichy advocated empirical techniques early [1, 2, 3]; what enabled the field to take off was the availability of software repositories for such long-running projects as Apache, Linux and Eclipse [4], which researchers started mining using modern data analysis techniques.

These studies have yielded many insights including surprises. More experienced developers can produce more buggy code (Schröter, Zimmermann, Premraj, Zeller). To predict whether a module has bugs, intrinsic properties such as complexity seem to matter less than how many changes it went through (Moser, Pedrycz, Succi). Automatic analysis of user reports seems much better at identifying bugs than spotting feature requests (Panichella, Di Sorbo, Guzman, Visaggio, Canfora, Gall). More extensively tested modules tend to have more bugs (Mockus, Nagappan, Dinh-Trong). Eiffel programmers do use contracts (Estler, Furia, Nordio, Piccioni and me). Geographical distance between team members negatively affects the amount of communication in projects (Nordio, Estler, Tschannen, Ghezzi, Di Nitto and me). And so on.

The basic observation behind empirical software engineering is simple: if software products and processes are worthy of discussion, they must be worthy of quantitative discussion just like any natural artifact or human process. Usually at that point the advocacy cites Lord Kelvin:”If you cannot measure it, you cannot improve it” [5].

Not that advocacy is much needed today, at least for publishing research in software engineering and in computer science education. The need for empirical backing of conceptual proposals has achieved consensus.  The so-called a “Marco Polo paper” [6] (I traveled far and saw wonderful things, thank you very much for your attention) no longer suffices for program committees; today they want numbers (and also, thankfully, a “threats to validity” section which protects you against suspicions that the numbers are bogus by stating why they might be). Some think this practice of demanding empirical backing for anything you propose has gone too far; see Jeff Ullman’s complaint [7], pertaining to database research rather than software engineering, but reflecting some of the same discussions. Here we can counter Kelvin with another quote (for more effect attributed to Einstein, albeit falsely): not everything that can be counted counts, and not everything that counts can be counted.

2 Limits of empirical research

There can indeed be too much of a good thing. Still, no one would seriously deny the fundamental role that empirical research has gained in modern software engineering. Which does not prevent us from considering the limits of what it has achieved; not in a spirit of criticism for its own sake, but to help researchers define an effective agenda for the next steps. There are in my opinion two principal limitations of current empirical results in software engineering.

The first has to do with the distinction introduced above between the two kinds of possible targets for empirical assessment: products (artifacts) versus processes.

Both aspects are important, but one is much easier to investigate than the other. For software products, the material of study is available in the form of repositories mentioned above, with their wealth of information about lines of code, control and data structures, commits, editing changes, bug reports and bug fixes. Processes are harder to grasp. You gain some information on processes from the repositories (for example, patterns and delays of bug fixing), but processes deserve studies of their own. For example, agile teams practice iterations (sprints) of widely different durations, from a few days to a few weeks; what is the ideal length? A good empirical answer would help many practitioners. But this example illustrates how difficult empirical studies of processes can be: you would need to try many variations with teams of professional programmers (not students) in different projects, different application areas, different companies; for the results to be believable the projects should be real ones with business results at stake, there should be enough samples in each category to ensure statistical significance, and the companies should agree to publication of some form, possibly anonymized, of the outcomes. The difficulties are formidable.

This issue of how to obtain project-oriented metrics is related to the second principal limitation of some of the initial empirical software engineering work: the risk of indulging in lamppost research. The term refers to the well-known joke about the drunkard who, in the dark of the night, searches for his lost keys next to the lamp post, not because he has lost them there but because it is the only place where one can see anything. To a certain extent all research is lamppost research: by definition, if you succeed in studying something, it will be because it can be studied. But the risk is to choose to work on a problem only, or principally, because it is easy to set up an empirical study — regardless of its actual importance. To cite an example that I have used elsewhere, one may suspect that the reason there are so many studies of pair programming is not that it’s of momentous relevance but that it is not hard to set up an experiment.

3 Beyond the lamppost

As long as empirical software engineering was a young, fledgling discipline, it made good sense to start with problems that naturally lended themselves to empirical investigation. But now that the field has matured, it may be time to reverse the perspective and start from the consumer’s perspective: for practitioners of software engineering, what problems, not yet satisfactorily answered by software engineering theory, could benefit, in the search for answers, from empirical studies?

Indeed, this is what we are entitled to expect from empirical studies: guidance. The slogan of empirical software engineering is that software is worthy of study just like geological strata, photons, and lilies-of-the-valley; OK, sure, but we are talking about human artifacts rather than wonders of the natural world, and the idea should be to help us produce better software and produce software better.

4 A horror story

Whenever we call for guidance from empirical studies, we should immediately include a caveat: every empirical study has its limitations (politely called “threats to validity”) and one must be careful about any generalization. The following horror story serves as caution [9]. The fashion today in programming language design is to use the semicolon not as separator in the Algol tradition (instruction1 ; instruction2) but as a terminator in the C tradition (instruction1; instruction2;). The original justification, particularly in the case of Ada [10], is an empirical paper by Gannon and Horning [11], which purported to show that the terminator convention led to fewer errors. (The authors themselves not only give their experimental results but, departing from the experimenter’s reserve, explicitly jump to the conclusion that terminators are better.) This view defies reason: witness, among others, the ever-recommenced tragedy of if c then a; else; b where the semicolon after else is an error (a natural one, since one gets into the habit of adding semicolons just in case) but the code compiles, with the result that b will be executed in all cases rather than (as intended) just when c is false [12].

How in the world could an empirical study come up with such a bizarre conclusion? Go back to the original Gannon-Horning paper and the explanation becomes clear: the experiments used subjects who were familiar with the PL/I programming language, where semicolons are used generously and an extra semicolon is harmless, as it is in all practical languages (two successive semicolons being simply interpreted as the insertion of an empty instruction, causing no harm); but the experimental separator-based language and compiler used to the experiment treated an extra semicolon as an error! As if this were not enough, checking the details of the article reveals that the terminator language is terminator-based for both declarations and instructions, whereas the example delimiter language is only delimiter-based for instructions, but terminator-based for declarations. Talk about a biased experiment! The experiment was bogus and so are the results.

One should not be too harsh about a paper from 1975, when the very idea of systematic experimental studies of programming was novel, and some of its other results are worthy of consideration. But the sad terminator story, even though it only affected a syntax property, should serve as a reminder that we should not accept a view blindly just because someone invokes some empirical study to justify it. We should assess the study itself, its methods and its credibility.

5 Addressing the issues that matter

With this warning in mind, we should still expect empirical software engineering to help us practitioners. It should help address important software engineering problems.

Ideally, I should now list the open issues of software engineering, but I am in no position even to start such a list. All I can do is to give a few examples. They may not be important to you, but they give an idea:

  • What are the respective values of upfront design and refactoring? How best can we combine these approaches?
  • Specification and testing are complementary techniques. Specifications are in principles superior to testing in general, but testing remains necessary. What combination of specification and testing works best?
  • What is the best commit/release technique, and in particular should we use RTC (Review Then Commit, as with Apache originally then Google) or CTR (Commit To Review, as Apache later) [13]?
  • What measure of code properties best correlates with effort? Many fancy metrics have appeared in the literature over the years, but there is still a nagging feeling among many of us that for all its obvious limitations the vulgar SLOC metrics (Source Lines Of Code) still remains the least bad.
  • When can a manager decide to stop testing? We did some work on the topic [14], but it is only a start.
  • Is test coverage a good measure of test quality [15] (spoiler: it is not, but again we need more studies)?

And so on. These examples may not be the cases that you consider most important; indeed what we need is input from many software engineers to help steer empirical software engineering towards the topics that truly matter to the community.

To provide a venue for that discussion, a workshop will take place 10-12 September 2018 (provisional dates) in the Toulouse area, involving many of the tenors in empirical software engineering, with the same title as these two articles: Empirical Answers to Important Software Engineering Questions. The key idea is to start not from the solutions side (the lamppost) but from the actual challenges facing software engineers. It will not just be a traditional publication-oriented meeting but will also include ample time for discussions and joint work.

If you would like to contribute your example “important questions”, please use any appropriate support (responses to this blog, email to me, Facebook, LinkedIn, anything as long as one can find it). Suggestions will be taken into consideration for the workshop. Empirical software engineering has already established itself as a core area of research; it is time feed that research with problems that actually matter to software developers, managers and users

Acknowledgments

These reflections originated in a keynote that I gave at ESEM in Bolzano in 2010 (I am grateful to Barbara Russo and Giancarlo Succi for the invitation). I never wrote up the talk but I dug up the slides [8] since they might contain a few relevant observations. I used some of these ideas in a short panel statement at ESEC/FSE 2013 in Saint Petersburg, and I am grateful to Moshe Vardi for suggesting I should write them up for Communications of the ACM, which I never did.

References and notes

[1] Victor R. Basili: The role of experimentation in software engineering: past, present and future,  in 18th ICSE (International Conference on Software Engineering), 1996, see here.

[2] Marvin V. Zelkowitz and Dolores Wallace: Experimental validation in software engineering, International Conference on Empirical Assessment and Evaluation in Software Engineering, March 1997, see here.

[3] Walter F. Tichy: Should computer scientists experiment more?, in IEEE Computer, vol. 31, no. 5, pages 32-40, May 1998, see here.

[4] And EiffelStudio, whose repository goes back to the early 90s and has provided a fertile ground for numerous empirical studies, some of which appear in my publication list.

[5] This compact sentence is how the Kelvin statement is usually abridged, but his thinking was more subtle.

[6] Raymond Lister: After the Gold Rush: Toward Sustainable Scholarship in Computing, Proceedings of 10th conference on Australasian Computing Education Conference, pages 3-17, see here.

[7] Jeffrey D. Ullman: Experiments as research validation: have we gone too far?, in Communications of the ACM, vol. 58, no. 9, pages 37-39, 2015, see here.

[8] Bertrand Meyer, slides of a talk at ESEM (Empirical Software Engineering and Measurement), Bozen/Bolzano, 2010, available here. (Provided as background material only, they are  not a paper but just slide support for a 45-minute talk, and from several years ago.)

[9] This matter is analyzed in more detail in section 26.5 of my book Object-Oriented Software Construction, 2nd edition, Prentice Hall. No offense to the memory of Jim Horning, a great computer scientist and a great colleague. Even great computer scientists can be wrong once in a while.

[10] I know this from the source: Jean Ichbiah, the original designer of Ada, told me explicitly that this was the reason for his choice of  the terminator convention for semicolons, a significant decision since it was expected that the language syntax would be based on Pascal, a delimiter language.

[11] Gannon & Horning, Language Design for Programming Reliability, IEEE Transactions on Software Engineering, vol. SE-1, no. 2, June 1975, pages 179-191, see here.

[12] This quirk of C and similar languages is not unlike the source of the Apple SSL/TLS bug discussed earlier in this blog under the title Code matters.

[13] Peter C. Rigby, Daniel M. German, Margaret-Anne Storey: Open Source Software Peer Review Practices: a Case study of the Apache Server, in ICSE (International Conference on Software Engineering) 2008, pages 541-550, see here.

[14] Carlo A. Furia, Bertrand Meyer, Manuel Oriol, Andrey Tikhomirov and  Yi Wei:The Search for the Laws of Automatic Random Testing, in Proceedings of the 28th ACM Symposium on Applied Computing (SAC 2013), Coimbra (Portugal), ACM Press, 2013, see here.

[15] Yi Wei, Bertrand Meyer and Manuel Oriol: Is Coverage a Good Measure of Testing Effectiveness?, in Empirical Software Engineering and Verification (LASER 2008-2010), eds. Bertrand Meyer and Martin Nordio, Lecture Notes in Computer Science 7007, Springer, February 2012, see here.

VN:F [1.9.10_1130]
Rating: 9.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Split the Root: a little design pattern

Many programs take “execution arguments” which the program users provide at the start of execution. In EiffelStudio you can enter them under Execution -> Execution parameters.

The program can access them through the Kernel Library class ARGUMENTS. Typically, the root class of the system inherits from ARGUMENTS and its creation procedure will include something like

if argument_count /= N then
……..print (“XX expects exactly N arguments: AA, BB, …%N”)
else
……..u := argument (1) ; v := argument (2) ; …
……..“Proceed with normal execution, using u, v, …”
end

where N is the number of expected arguments, XX is the name of the program, and AA, …. are the roles of arguments. u, v, … are local variables. The criterion for acceptance could be “at least N” instead of exactly N. The features argument_count and arguments come from class ARGUMENTS.

In all but trivial cases this scheme (which was OK years ago, in a less sophisticated state of the language) does not work! The reason is that the error branch will fail to initialize attributes. Typically, the “Proceed with…” part in the other branch is of the form

               attr1 := u
                attr2 := v
                …
                create obj1.make (attr1, …)
                create obj2.make (attr2, …)
                “Work with obj1, obj2, …”

If you try to compile code of this kind, you will get a compilation error:

Compiler error message

Eiffel is void-safe: it guarantees that no execution will ever produce null-pointer dereference (void call). To achieve this guarantee, the compiler must make sure that all attributes are “properly set” to an object reference (non-void) at the end of the creation procedure. But the error branch fails to initialize obj1 etc.

You might think of replacing the explicit test by a precondition to the creation procedure:

               require
                                argument_count = N

but that does not work; the language definition explicit prohibits preconditions in a root creation procedure. The Ecma-ISO standard (the official definition of the language, available here) explains the reason for the corresponding validity rule (VSRP, page 32):

A routine can impose preconditions on its callers if these callers are other routines; but it makes no sense to impose a precondition on the external agent (person, hardware device, other program…) that triggers an entire system execution, since there is no way to ascertain that such an agent, beyond the system’s control, will observe the precondition.

The solution is to separate the processing of arguments from the rest of the program’s work. Add a class CORE which represents the real core of the application and separate it from the root class, say APPLICATION. In APPLICATION, all the creation procedure does is to check the arguments and, if they are fine, pass them on to an instance of the core class:

                note
                                description: “Root class, processes execution arguments and starts execution”
                class APPLICATION create make feature
                                core: CORE
                                                — Application’s core object
                                make
……..……..……..……..……..……..— Check arguments and proceed if they make sense.
                                                do
                                                             if argument_count /= N then
                                                                                print (“XX expects exactly N arguments: AA, BB, …%N”)
                                                                else
                                                                                create core.make (argument (1), argument (2) ; …)
                                                                                                — By construction the arguments are defined!
                                                                                core.live
                                                                                                — Perform actual work
                                                                                               — (`live’ can instead be integrated with `make’ in CORE.)

                                                                end
                                                end
                 end
 
We may call this little design pattern “Split the Root”. Nothing earth-shattering; it is simply a matter of separating concerns (cutting off the Model from the View). It assumes a system that includes text-based output, whereas many applications are graphical. It is still worth documenting, for two reasons.

First, in its own modest way, the pattern is useful for simple programs; beginners, in particular, may not immediately understand why the seemingly natural way of processing and checking arguments gets rejected by the compiler.

The second reason is that Split the Root illustrates the rules that preside over a carefully designed language meant for carefully designed software. At first it may be surprising and even irritating to see code rejected because, in a first attempt, the system’s root procedure has a precondition, and in a second attempt because some attributes are not initialized — in the branch where they do not need to be initialized. But there is a reason for these rules, and once you understand them you end up writing more solid software.

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

AutoProof workshop: Verification As a Matter of Course

The AutoProof technology pursues the goal of “Verification As a Matter Of Course”, integrated into the EVE development environment. (The AutoProof  project page here; see particularly the online interactive tutorial.) A one-day workshop devoted to the existing AutoProof and current development will take place on October 1 near Toulouse in France. It is an informal event (no proceedings planned at this point, although based on the submissions we might decide to produce a volume), on a small scale, designed to bring together people interested in making the idea of practical verification a reality.

The keynote will be given by Rustan Leino from Microsoft Research, the principal author of the Boogie framework on which the current implementation of AutoProof relies.

For submissions (or to attend without submitting) see the workshop page here. You are also welcome to contact me for more information.

VN:F [1.9.10_1130]
Rating: 5.3/10 (15 votes cast)
VN:F [1.9.10_1130]
Rating: -2 (from 6 votes)

New paper: Theory of Programs

Programming, wrote Dijkstra many years ago, is a branch of applied mathematics. That is only half of the picture: the other half is engineering, and this dual nature of programming is part of its attraction.

Descriptions of the mathematical side are generally, in my view, too complicated. This article [1] presents a mathematical theory of programs and programming based on concepts taught in high school: elementary set theory. The concepts covered include:

  • Programming.
  • Specification.
  • Refinement.
  • Non-determinism.
  • Feasibility.
  • Correctness.
  • Programming languages.
  • Kinds of programs: imperative, functional, object-oriented.
  • Concurrency (small-step and large-step)
  • Control structures (compound, if-then-else and Dijkstra-style conditional, loop).
  • State, store and environment.
  • Invariants.
  • Notational conventions for building specifications and programs incrementally.
  • Loop invariants and variants.

One of the principal ideas is that a program is simply the description of a mathematical relation. The program text is a rendering of that relation. As a consequence, one may construct programming languages simply as notations to express certain kinds of mathematics. This approach is the reverse of the usual one, where the program text and its programming languages are the starting point and the center of attention: theoreticians develop techniques to relate them to mathematical concepts. It is more effective to start from the mathematics (“unparsing” rather than parsing).

All the results (74 properties expressed formally, a number of others in the text) are derived as theorems from rules of elementary set theory; there are no new axioms whatsoever.

The paper also has a short version [2], omitting proofs and many details.

References

[1] Theory of Programs, available here.
[2] Theory of Programs, short version of [1] (meant for quick understanding of the ideas, not for publication), available here.

 

VN:F [1.9.10_1130]
Rating: 5.6/10 (29 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 12 votes)

Framing the frame problem (new paper)

Among the open problems of verification, particularly the verification of object-oriented programs, one of the most vexing is framing: how to specify and verify what programs element do not change. Continuing previous work, this article presents a “double frame inference” method, automatic on both sides the specification and verification sides. There is no need to write frame specifications: they will be inferred from routine postconditions. For verification, the method computes the set of actually changed properties through a “change calculus”, itself based on the previously developed alias calculus.

Some verification techniques, such as Hoare-style proofs, require significant annotation effort and potentially yield full functional verification; others, such as model checking and abstract interpretation, have more limited goals but seek full automation. Framing, in my opinion, should be automatic, freeing the programmer-verifier to devote the annotation effort to truly interesting properties.

Reference

[1] Bertrand Meyer: Framing the Frame Problem, in Dependable Software Systems, Proceedings of August 2014 Marktoberdorf summer school, eds. Alexander Pretschner, Manfred Broy and Maximilian Irlbeck, NATO Science for Peace and Security, Series D: Information and Communication Security, Springer, 2015 (to appear), pages 174-185; preprint available here.

VN:F [1.9.10_1130]
Rating: 5.8/10 (17 votes cast)
VN:F [1.9.10_1130]
Rating: -1 (from 9 votes)

A gold medal

The French National Research Center (CNRS) has just awarded [1] its annual gold medal to Gérard Berry, a great recognition for an outstanding computer scientist. I first discovered Berry’s work through a brilliant 1976 article, Bottom-Up Computation of Recursive Programs [2], which explained recursion using methods of denotational semantics, and should really figure in collections of classic CS articles. One can hardly understand dynamic programming algorithms, for example, without realizing that they are bottom-up versions of recursive algorithms, through transformations described in the article. I relied extensively on its techniques for my own textbooks, from the advanced concepts of Introduction to the Theory of Programming Languages all the way down to the introductory presentation of recursion in Touch of Class.

Berry then went on to play a founding role in synchronous languages, explaining (in a paper whose reference I don’t have right now) that he went through a revelation of sorts after realizing that the automatic-control industry needed languages completely different from those computer scientists typically love. He created the Estérel language and shepherded it all the way to industrialization, serving as Chief Scientist of Estérel Technologies in the 2000 decade, before coming back to research.

He is also well known in France as a popularizer of informatics (computer science) through a number of successful books aiming at a large audience. He was appointed the first professor of informatics at the Collège de France, and in his inaugural lecture series presented a sweeping description of how information technology advances, based on powerful scientific concepts, permeate today’s world and raise ever new challenges. Hardly could the CNRS have chosen to distinguish a better representative of our discipline.

References

[1] CNRS announcement: here.

[2] Gérard Berry, Bottom-Up Computation of Recursive Programs, in RAIRO (Revue Française d’Automatique, Informatique, Recherche Opérationnelle, Informatique Théorique), tome 10, no R1 (1976), pp. 47-82, available here.

VN:F [1.9.10_1130]
Rating: 5.5/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Programming language features

 

InfoWorld is currently publishing a series of programming language assessments:

  • 9 Things We Hate About Objective-C, 4 June.
  • 15 Things We Hate About Java, 6 March.
  • 10 Features Apple Stole for the Swift Programming Language, 9 June.

Notable in these articles is what they do not mention: Eiffel has most of what the author misses in Objective-C and Java; and most of what Swift “stole” it stole from Eiffel.

In this article let us concentrate on the nine Objective-C complaints, by Peter Wayner [1]; subsequent articles will examine the Java “hates” and the Swift “steals”.

Criticism 1: “It is a little too different

“Objective-C lovers tout that Objective-C is a strict superset of C: If you can do it in C, you should be able to do it in Objective-C. But it doesn’t go the other way, so you’re stuck wondering, “Should I use an Objective-C method description or a C one?” Achieving portability to C programs requires constant vigilance and forethought.”

This is what happens when you mix language paradigms. Eiffel has a close relationship with C, but the two sides are clearly separated. You can call C from Eiffel, and the other way around. You can declare an Eiffel routine as “external C” and even include the C code inline: in other words an Eiffel “method description” can have a C implementation. The structure is always object-oriented (no need to fear that a novice programmer will revert to a C style for the design) but for access to low-level system mechanisms and small functions that should be optimized to the byte and microsecond you use C directly, in its ideal role.

Criticism 2: “It’s still mostly just plain old C

“For all its object-oriented coolness, you don’t get much else from Objective-C. It’s more of a way to organize your code for large systems than a way to write better code. You’re still responsible for pointers. You’re still responsible for keeping track of memory.

Eiffel is object-oriented all the way. You are not “responsible for pointers“. References are tame: no pointer arithmetic. You are not “responsible for keeping track of memory“:  objects are garbage-collected

“The C programmers loved to call their software a ‘portable assembly code’, and the same is true for Objective-C … except it’s only portable from the Mac to the iPad.”

“Portable assembly code” is exactly what C provides, and hence an excellent target for an Eiffel compiler. As to Eiffel, it runs on all platforms, from Windows to Linux to Solaris to VMS to the Mac.

Criticism 3: Stuck in the 80’s

Criticism 3: “Stuck in the ’80s

“Parachute pants, big hair, ‘The Breakfast Club’ — and the NeXT machine: Objective-C is like a time machine in programming-language land.”

Eiffel has undergone constant evolution, innovating on all fronts of programming constructs and integrating the best of known techniques.

“The primitives aren’t first-class citizens. Garbage collection, that wonderful idea that sustained Lisp, was adopted by Java ages ago. Objective-C got it in 2006. The same goes for properties and closures.”

All this has been in Eiffel forever. Agents (closures) were introduced in 1999, long before Java, C# and other OO languages had anything of the sort. Eiffel’s assigner commands are vastly superior to properties (no need to write all these boring getter functions).

 Criticism 4: “Punctuation

“The cool modern kids writing Python, Ruby, and CoffeeScript can craft billion-dollar companies without using brackets, braces, and parentheses. You’ll be wearing out your punctuation keys writing Objective-C. Colons, at-signs, asterisks? Is there any character that the language doesn’t use?”

Come on. How can one be so misinformed? The semicolon has been optional in Eiffel for fifteen years. The high-priest style of C, Objective-C, Java, C# and so many others, with its piling up of strange symbols, is something that Eiffel users never had to suffer.

Criticism 5: “Modern syntax

Not modern syntax, that is:

“Objective-C”s syntax is like Coke: They tried to modernize it in the ’90s, but it never stuck.”

Eiffel’s syntax is clear and simple. Total beginners, including high-school students, pick it up just as easily and naturally as advanced programmers, and as application experts who want to concentrate on their problem, not on learning strange language conventions going back to the nineteen-sixties.

Criticism 6: “No namespaces

Here Eiffel does not provide what the journalist wants: it is “post-namespaces” (as in “postmodern”). The Eiffel community has decided that the complexity of namespaces was not worth the trouble (what happens when you move packages around?) and prefers simple mechanisms for resolving class name clashes.

Criticism 7: “It only runs in Apple’s corner of the universe

” Variety is the spice of life. It’s even more important in a world where not everything is an iPhone. If a Windows or Linux shop recruits you, you can forget all of those extra Objective-C extensions you learned because they’ll be of no use.”

Eiffel is not tied to any manufacturer, computer architecture or operating system. If a new processor comes out, or a user needs an exotic platform, a port can usually be produced in a matter of hours. The compiler and the entire environment to which it belongs, EiffelStudio, are written in Eiffel; the supporting runtime is in a highly portable form of C, which requires very little customization, if any, for a new platform. (Here “the compiler” means the Eiffel Software implementation, but other implementations also put a strong emphasis on portability.)

Criticism 8: “XCode is your only choice

“In the Objective-C world, you get really only one choice. Why do you need to be different, comrade?”

Besides EiffelStudio other compilers and tools are available for Eiffel.

Criticism 9: “Apple’s benevolent dictatorship

“Do you want to give out more than 100 copies of your iPhone app? Forget it. Do you want to “think different” with your UI? Please go back and read the user interface guidelines. You can’t do anything without Apple’s permission because Apple uses strong crypto to lock down everything — and fanatically tyrannical policies to lock down the rest.”

The Eiffel language definition is steered by a standards committee under Ecma (the organization behind many of the major standards in IT), which anyone can join. EiffelStudio itself is available in open source. The Eiffel world knows nothing like the close control Apple exerts over its product; it welcomes all contributors.

Maybe someone should talk to Mr. Wayner and help him broaden his scope of programming language knowledge.

References

[1] Peter Wayner, 9 Things We Hate About Objective-C, InfoWorld, 4 June 2014, available here.

VN:F [1.9.10_1130]
Rating: 6.7/10 (17 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 8 votes)

Code matters

(Adapted from an article previously published on the CACM blog.)

Often, you will be told that programming languages do not matter much. What actually matters more is not clear; maybe tools, maybe methodology, maybe process. It is a pretty general rule that people arguing that language does not matter are defending bad languages.

Let us consider the Apple bug of a few weeks ago. Only a few weeks; the world has already moved to Heartbleed (to be discussed in a subsequent article), but that is not a reason to sweep away the memory of the Apple bug and the language design that it reflects.

In late February, users of  iPhones, iPads and iPods were enjoined to upgrade their devices immediately because  “an attacker with a privileged network position may capture or modify data in sessions protected by SSL/TLS.” The bug was traced [1] to code of the following form:

if (error_of_first_kind)
goto fail;
if (error_of_second_kind)
goto fail;
if (error_of_third_kind)
goto fail;
if (error_of_fourth_kind)
goto fail;
if (error_of_fifth_kind)
goto fail;
goto fail;
if (error_of_sixth_kind)
goto fail;
The_truly_important_code_handling_non_erroneous_case

In other words: just a duplicated line! (The extra line is highlighted above.) But the excess “goto” is beyond the scope of the preceding “if“, so it is executed unconditionally: all executions go directly to the “fail” label, so that The_truly_important_code_handling_non_erroneous_case never gets executed.

Critics have focused their ire on the  goto instruction, but it is of little relevance. What matters, language-wise, is the C/C++-Java-C# convention of delimiting the scope of conditional instructions, loops and other kinds of composite structures. Every component of such structures in these languages is syntactically a single instruction, so that:

  • If you want the branch to consist of an atomic instruction, you write that instruction by itself, as in: if (c) a = b;
  • If you want a sequence of instructions, you write it as a compound, enclosed by the ever so beautiful braces: if (c) {a = b; x = y;}

Although elegant in principle (after all, it comes from Algol), this convention is disastrous from a software engineering perspective because software engineering means understanding that programs change. One day, a branch of a conditional or loop has one atomic instruction; sometime later, a maintainer realizes that the corresponding case requires more sophisticated treatment, and adds an instruction, but fails to add the braces.

The proper language solution is to do away with the notion of compound instruction as a separate concept, but simply expect all branches of composite instructions to consist of a sequence, which could consist of several instructions, just one, or none at all. In Eiffel, you will write

if  c then
   x := y
end

or

 if  c then
   a := b
   x := y
else
   u := v
end

or

from i := 1 until c loop
   a := b
   i := i + 1
end

or

across my_list as l loop
   l.add (x)
end

and so on. This syntax also gets rid of all the noise that pollutes programs in languages retaining C’s nineteen-sixties conventions: parentheses around the conditions, semicolons for instructions on different lines; these small distractions accumulate into serious impediments to program readability.

With such a modern language design, the Apple bug could not have arisen. A duplicated line is either:

  • A keyword such as end, immediately caught as a syntax error.
  • An actual instruction such as an assignment, whose duplication causes either no effect or an effect limited to the particular case covered by the branch, rather than catastrophically disrupting all cases, as in the Apple bug.

Some people, however, find it hard to accept the obvious responsibility of language design. Take this comment derisively entitled  “the goto squirrel” by Dennis Hamilton in the ACM Risks forum [2]:

It is amazing to me that, once the specific defect is disclosed (and the diff of the actual change has also been published), the discussion has devolved into one of coding style and whose code is better.  I remember similar distractions around the Ariane 501 defect too, although in that case there was nothing wrong with the code—the error was that it was being run when it wasn’t needed and it was not simulation tested with new launch parameters under the mistaken assumption that if the code worked for Ariane 4, it should work for Ariane 5.

It is not about the code.  It is not about the code.  It is not about goto. It is not about coming up with ways to avoid introducing this particular defect by writing the code differently.

Such certainty! Repeating a wrong statement ( “it is not about the code“) does not make it  right. Of course “it” is about the code! If the code had been different the catastrophe would not have happened, so one needs some gall to state that the code is not the issue — and just as much gall, given that the catastrophe would also not have happened if the programming language had been different, to state that it is not about the programming language.

When Mr. Hamilton dismisses as “distractions” the explanations pointing to programming-related causes for the Ariane-5 disaster, I assume he has in mind the analysis which I published at the time with Jean-Marc Jézéquel [3], which explained in detail how the core issue was the absence of proper specifications (contracts). At that time too, we heard dismissive comments; according to one of the critics, the programming aspects did not count, since the whole thing was really a social problem: the French engineers in Toulouse did not communicate properly with their colleagues in England! What is great with such folk explanations is that they sound just right and please people because they reinforce existing stereotypes. They are by nature as impossible to refute as they are impossible to prove. And they avoid raising the important but disturbing questions: were the teams using the right programming language, the right specification method (contracts, as our article suggested), appropriate tools? In both the Ariane-5 and Apple cases, they were not.

If you want to be considered polite, you are not supposed to point out that the use of programming languages designed for the PDP-8 or some other long-gone machine is an invitation to disaster. The more terrible the programming language people use, and the more they know it is terrible (even if they will not admit it), the more scandalized they will be that you point out that it is, indeed, terrible. It is as if you had said something about their weight or the pimples on their cheeks. Such reactions do not make the comment less true. The expression of outrage is particularly inappropriate when technical choices are not just matters for technical argument, but have catastrophic consequences on society.

The usual excuse, in response to language criticisms, is that better tools, better quality control (the main recommendation of the Ariane-5 inquiry committee back in 1997), better methodology would also have avoided the problem. Indeed, a number of the other comments in the comp.risks discussion that includes Hamilton’s dismissal of code [2] point in this direction, noting for example that static analyzers could have detected code duplication and unreachable instructions. These observations are all true, but change nothing to the role of programming languages and coding issues.  One of the basic lessons from the study of software and other industrial disasters — see for example the work of Nancy Leveson — is that a disaster results from a combination of causes. This property is in fact easy to understand: a disaster coming from a single cause would most likely have been avoided. Consider the hypothetical example of a disastrous flaw in Amazon’s transaction processing. It seems from various sources that Amazon processes something like 300 transactions a second. Now let us assume three independent factors, each occurring with a probability of a thousandth (10-3), which could contribute to a failure. Then:

  • It is impossible that one of the factors could cause failure just by itself: that means it would make a transaction fail after around 3 seconds, and would be caught even in the most trivial unit testing. No one but the developer would ever know about it.
  • If two of the factors together cause failure, they will occur every million transactions, meaning about once an hour. Any reasonable testing will discover the problem before a release is ever deployed.
  • If all three factors are required, the probability is 10-9, meaning that a failure will occur about once a year. Only in that case will a real problem exist: a flaw that goes undetected for a long time, during which everything seems normal, until disaster strikes.

These observations explain why post-mortem examinations of catastrophes always point to a seemingly impossible combination of unfortunate circumstances. The archduke went to Sarajevo and he insisted on seeing the wounded and someone forgot to tell the drivers about the prudent decision to bypass the announced itinerary and the convoy stalled  and the assassin saw it and he hit Franz-Ferdinand right in the neck and there was nationalistic resentment in various countries and the system of alliances required countries to declare war [4]. Same thing for industrial accidents. Same thing for the Apple bug: obviously, there were no good code reviews and no static analysis tools applied and no good management; and, obviously, a programming language that blows out innocent mistakes into disasters of planetary import.

So much for the accepted wisdom, heard again and again in software engineering circles, that code does not matter, syntax does not count, typos are caught right away, and that all we should care about is process or agility or requirements or some other high-sounding concern more respectable than programming. Code? Programming languages? Did we not take care of those years ago? I remember similar distractions.”

There is a  positive conclusion to the “and” nature (in probabilistic terms, the multiplicative nature) of causes necessary to produce a catastrophe in practice: it suffices to get rid of one of the operands of the “and” to falsify its result, hence avoiding the catastrophe. When people tell you that code does not matter or that language does not matter, just understand the comment for what it really means, “I am ashamed of the programming language and techniques I use but do not want to admit it so I prefer to blame problems on the rest of the world“, and make the correct deduction: use a good programming language.

References

[1] Paul Duckline:  Anatomy of a “goto fail” – Apple’s SSL bug explained, plus an unofficial patch for OS X!, Naked Security blog (Sophos), 24 February 2014, available here.

[2] Dennis E. Hamilton: The Goto Squirrel, ACM Risks Forum, 28 February 2014, available here.

[3] Jean-Marc Jézéquel and Bertrand Meyer: Design by Contract: The Lessons of Ariane, in Computer (IEEE), vol. 30, no. 1, January 1997, pages 129-130, available online here and, with reader responses here.

[4] Assassination of Ferdinand of Autria: here.

VN:F [1.9.10_1130]
Rating: 8.1/10 (12 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

New article: passive processors

 

The SCOOP concurrency model has a clear division of objects into “regions”, improving the clarity and reliability of concurrent programs by establishing a close correspondence between the object structure and the process structure. Each region has an associated “processor”, which executes operations on the region’s objects. A literal application of this rule implies, however, a severe performance penalty. As part of the work for his PhD thesis (defended two weeks ago), Benjamin Morandi found out that a mechanism for specifying certain processors as “passive” yields a considerable performance improvement. The paper, to be published at COORDINATION, describes the technique and its applications.

Reference

Benjamin Morandi, Sebastian Nanz and Bertrand Meyer: Safe and Efficient Data Sharing for Message-Passing Concurrency, to appear in proceedings of COORDINATION 2014, 16th International Conference on Coordination Models and Languages, Berlin, 3-6 June 2014, draft available here.
.

VN:F [1.9.10_1130]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

Eiffel as an expression language

A functional-programming style, or more generally a style involving more expressions and fewer instructions, is possible in Eiffel. In particular, Eiffel’s agent mechanism embeds a full functional-programming mechanism in the object-oriented framework of the language.

To make the notations simpler, we are discussing and tentatively implementing a number of proposed extensions. They involve no fundamental new language mechanisms, but provide new, more concise notations for existing mechanisms. Examples are:

  • Conditional expressions.
  • Implicit tuple, a rule allowing the omission of brackets for an actual argument when it is a tuple and the last argument, e.g. f (x, y, z) as an abbreviation for f ([x, y, z]) (an example involving just one argument). Tuples already provided the equivalent of a variable-argument (“varargs”) facility, but it is made simpler to use with this convention.
  • Parenthesis alias, making it possible to write just f (x, y) when f is an agent (closure, lambda expression, delegate etc. in other terminologies), i.e. treating f as if it were a function; the notation is simply an abbreviation for f.item ([x, y]) (an example that also takes advantage of implicit tuples). It has many other applications since a “parenthesis alias” can be defined for a feature of any class.
  • Avoiding explicit assignments to Result.
  • Type inference (to avoid explicitly specifying the type when it can be deduced from the context). This is a facility for the programmer, useful in particular for local variables, but does not affect the type system: Eiffel remains strongly typed, it is just that you can be lazy about writing the type when there is no ambiguity.
  • In the same vein, omitting the entire list of generic parameters when it can be inferred.

The description of the mechanism (see the link in [1]) is in the form of a set of slides explaining the concepts and presenting example. This is a working document and feedback is welcome.

References

[1] Eiffel as an expression language, Eiffel Software working document, 2012-2014, see here.

VN:F [1.9.10_1130]
Rating: 7.8/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)

LASER 2014 (Elba, September)

2014 marks the 10-th anniversary (11th edition) of the LASER summer school. The school will be held September 7-14, 2014, and the detailed information is here.

LASER (the name means Laboratory for Applied Software Engineering Research) is dedicated to practical software engineering. The roster of speakers since we started is a who’s who of innovators in the field. Some of the flavor of the school can gathered from the three proceedings volumes published in Springer LNCS (more on the way) or simply by browsing the pages of the schools from previous years.

Usually we have a theme, but to mark this anniversary we decided to go for speakers first; we do have a title, “Leading-Edge Software Engineering”, but broad enough to encompass a wide variety of a broad range of topics presented by star speakers: Harald Gall, Daniel Jackson, Michael Jackson, Erik Meijer (appearing at LASER for the third time!), Gail Murphy and Moshe Vardi. With such a cast you can expect to learn something important regardless of your own primary specialty.

LASER is unique in its setting: a 5-star hotel in the island paradise of Elba, with outstanding food and countless opportunities for exploring the marvelous land, the beaches, the sea, the geology (since antiquity Elba has been famous for its stones and minerals) and the history, from the Romans to Napoleon, who in the 9 months of his reign changed the island forever. The school is serious stuff (8:30 to 13 and 17 to 20 every day), but with enough time to enjoy the surroundings.

Registration is open now.

VN:F [1.9.10_1130]
Rating: 7.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

Negative variables: new version

I have mentioned this paper before (see the earlier blog entry here) but it is now going to be published [1] and has been significantly revised, both to take referee comments into account and because we found better ways to present the concepts.

We have  endeavored to explain better than in the draft why the concept of negative variable is necessary and why the usual techniques for modeling object-oriented programs do not work properly for the fundamental OO operation, qualified call x.r (…). These techniques are based on substitution and are simply unable to express certain properties (let alone verify them). The affected properties are those involving properties of the calling context or the global project structure.

The basic idea (repeated in part from the earlier post) is as follows. In modeling OO programs, we have to take into account the unique “general relativity” property of OO programming: all the operations you write are expressed relative to a “current object” which changes repeatedly during execution. More precisely at the start of a call x.r (…) and for the duration of that call the current object changes to whatever x denotes — but to determine that object we must again interpret x in the context of the previous current object. This raises a challenge for reasoning about programs; for example in a routine the notation f.some_reference, if f is a formal argument, refers to objects in the context of the calling object, and we cannot apply standard rules of substitution as in the non-OO style of handling calls.

We introduced a notion of negative variable to deal with this issue. During the execution of a call x.r (…) the negation of x , written x’, represents a back pointer to the calling object; negative variables are characterized by axiomatic properties such as x.x’= Current and x’.(old x)= Current.

Negative variable as back pointer

The paper explains why this concept is necessary, describes the associated formal rules, and presents applications.

Reference

[1] Bertrand Meyer and Alexander Kogtenkov: Negative Variables and the Essence of Object-Oriented Programming, to appear in Specification, Algebra, and Software, eds. Shusaku Iida, Jose Meseguer and Kazuhiro Ogata, Springer Lecture Notes in Computer Science, 2014, to appear. See text here.

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

Niklaus Wirth birthday symposium, 20 February, Zurich

In honor of Niklaus Wirth’s 80-th birthday we are organizing a symposium at ETH on February 20, 2014. This is a full-day event with invited talks by:

  • Vint Cerf
  • Hans Eberlé
  • Michael Franz
  • me
  • Carroll Morgan
  • Martin Odersky
  • Clemens Szyperski
  • Niklaus Wirth himself

From the symposium’s web page:

Niklaus Wirth was a Professor of Computer Science at ETH Zürich, Switzerland, from 1968 to 1999. His principal areas of contribution were programming languages and methodology, software engineering, and design of personal workstations. He designed the programming languages Algol W, Pascal, Modula-2, and Oberon, was involved in the methodologies of structured programming and stepwise refinement, and designed and built the workstations Lilith and Ceres. He published several text books for courses on programming, algorithms and data structures, and logical design of digital circuits. He has received various prizes and honorary doctorates, including the Turing Award, the IEEE Computer Pioneer, and the Award for outstanding contributions to Computer Science Education.

Participation is free (including breaks, lunch and the concluding “Apéro”) but space is strictly limited and we expect to run out of seats quickly. So if you are interested (but only if you are certain to attend) please register right away.

Symposium page and access to registration form: here.

VN:F [1.9.10_1130]
Rating: 8.7/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

Multirequirements (new paper)

 

As part of a Festschrift volume for Martin Glinz of the university of Zurich I wrote a paper [1] describing a general approach to requirements that I have been practicing and developing for a while, and presented in a couple of talks. The basic idea is to rely on object-oriented techniques, including contracts for the semantics, and to weave several levels of discourse: natural-language, formal and graphical.

Reference

[1] Bertrand Meyer: Multirequirements, to appear in Martin Glinz Festschrift, eds. Anne Koziolek and Norbert Scheyff, 2013, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Negative variables and the essence of object-oriented programming (new paper)

In modeling object-oriented programs, for purposes of verification (proofs) or merely for a better understanding, we are faced with the unique “general relativity” property of OO programming: all the operations you write (excluding non-OO mechanisms such as static functions) are expressed relative to a “current object” which changes repeatedly during execution. More precisely at the start of a call x.r (…) and for the duration of that call the current object changes to whatever x denotes — but to determine that object we must again interpret x in the context of the previous current object. This raises a challenge for reasoning about programs; for example in a routine the notation f.some_reference, if f is a formal argument, refers to objects in the context of the calling object, and we cannot apply standard rules of substitution as in the non-OO style of handling calls.

In earlier work [1, 2] initially motivated by the development of the Alias Calculus, I introduced a notion of negative variable to deal with this issue. During the execution of a call x.r (…) the negation of x , written x’, represents a back pointer to the calling object; negative variables are characterized by axiomatic properties such as x.x’= Current and x’.(old x)= Current. Alexander Kogtenkov has implemented these ideas and refined them.

Negative variable as back pointer

In a recent paper under submission [3], we review the concepts and applications of negative variables.

References

[1] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, 2011, available here.

[2] Bertrand Meyer: Towards a Calculus of Object Programs, in Patterns, Programming and Everything, Judith Bishop Festschrift, eds. Karin Breitman and Nigel Horspool, Springer-Verlag, 2012, pages 91-128, available here.

[3] Bertrand Meyer and Alexander Kogtenkov: Negative Variables and the Essence of Object-Oriented Programming, submitted for publication, 2012. [Updated 13 January 2014: I have removed the link to the draft mentioned in this post since it is now superseded by the new version, soon to be published, and available here.]

VN:F [1.9.10_1130]
Rating: 9.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

Habit, happiness, and programming languages

One of the occupational hazards of spreading the word about Eiffel is the frequent answer “yes, it’s much better than the language I use now, I would like to switch, but…“, followed by some sheepish excuse.

Last night I went to see Eugene Onegin once more (I still hope some day to land the part of Monsieur Triquet). Towards the beginning of the first act [1], Tatiana’s mother (Larina), reflecting in a melancholic tone on the vicissitudes of her (long ago) arranged marriage (and (amazingly) anticipating the very fate (as sketched in the last act) of her own daughter (talking about (amazing) anticipation, is there any other similarly hair-raising case of an author (here of the text behind the libretto) so presciently staging the (exact (down to the very last details)) story of his own future tragic death) but enough digressions (sorry (this is supposed (after all) to be (although it is not the first time (and probably not the last either) it strays from the script) a technology blog))), sings

From above, we were given habit:
It is a substitute for happiness

Is this not exactly the excuse?

Reference

[1] Libretto of Onegin, in English here, in the original there.

 

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Precedent

Alexander Kogtenkov pointed out to me that precursor work to my papers on the Alias Calculus [1] [2] had been published by John Whaley and Martin Rinard [3]. There are some significant differences; in particular my rules are simpler, and their work is not explicitly presented as a calculus. But many of the basic ideas are the same. The reason I did not cite that paper is simply that I was not aware of it; I am happy to correct the omission.

References

[1] Bertrand Meyer: Towards a Theory and Calculus of Aliasing, in Journal of Object Technology, vol. 9, no. 2, March-April 2010, pages 37-74, available here (superseded by [2])
[2] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, 2011, available here (revised and improved version of [1].)
[3] John Whaley and Martin Rinard: Compositional Pointer and Escape Analysis for Java Programs, in POPL 1999, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

The manhood test

 

I came across an obscure and surprisingly interesting article by Cliff Jones [1], about the history of rely-guarantee but with the following extract:

It was perhaps not fully appreciated at the time of [Hoare’s 1969 axiomatic semantics paper] that the roles of pre and post conditions differ in that a pre condition gives permission to a developer to ignore certain possibilities; the onus is on a user to prove that a component will not be initiated in a state that does not satisfy its pre condition. In contrast a post condition is an obligation on the code that is created according to the specification. This Deontic view carries over [to rely-guarantee reasoning].

I use words more proletarian than “deontic”, but this view is exactly what stands behind the concepts of Design by Contract and has been clearly emphasized in all Eiffel literature ever since the first edition of OOSC. It remains, however, misunderstood outside of the Eiffel community; many people confuse Design by Contract with its opposite, defensive programming. The criterion is simple: if you have a precondition to a routine, are you willing entirely to forsake the corresponding checks (conditionals, exceptions…) in the routine body? If not, you may be using the word “contract” as a marketing device, but that’s all. The courage to remove the checks is the true test of adulthood.

The application of Microsoft’s “Code Contracts” mechanism to the .NET libraries fails that test: a precondition may say “buffer not full” or “insertions allowed”, but the code still checks the condition and triggers an exception. The excuse I have heard is that one cannot trust those unwashed developers. But the methodological discipline is lost. Now let me repeat this using clearer terminology: it’s not deontic.

Reference

[1] Cliff Jones: The role of auxiliary variables in the formal development of concurrent programs, in Reflections on the work of C. A. R. Hoare, eds. Jones, Roscoe and Wood, Springer Lecture Notes in Computer Science,  2009, technical report version available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Domain Theory: precedents

Both Gary Leavens and Jim Horning commented (partly here, partly on Facebook) about my Domain Theory article [1] to mention that Larch had mechanisms for domain modeling and specification reuse. As Horning writes:

The Larch Shared Language was really all about creating reusable domain theories, including theorems about the domains.  See, for example [2] and [3].

I am honored that they found the time to write about the article and happy to acknowledge Larch, one of the most extensive efforts, over several decades, to provide serious notations and tools for specification. Leavens’s and Horning’s messages gave me the opportunity to re-read some Larch papers and discover a couple I did not know.

My article did not try to provide exhaustive references; if it had, Larch would have been among them. I would probably have cited my own paper on M [4], earlier than [3], which introduces a notation for composing specifications; see section 1.4 (“Features of the M method and the associated notation have thus been devised to allow for modular descriptions of systems. A system description may include an interface paragraph that describes the connection of the current specification with others, existing or yet to be written”) and the  presentation of these mechanisms in section 5.

Larch traits, described in [3], pursue a similar aim, but the earlier article cited by Horning [2] is a general, informal discussion of formal specification; it does not mention traits, and in fact does not cite Larch, stating instead “We have experimented with the use of two very different tools, PIE and Affirm, in constructing modest sized algebraic specifications”. Its general observations about the specification task remain useful today, and it does mention reuse in passing.

If we were to look for precedents, the basic source would have to be the Clear specification language of Goguen and Burstall, for which the citations [5, 6, 7] all appear in my M paper [4] and go back further: 1977-1981. Clear made a convincing case for modularizing specifications, and defined supporting language constructs.

Since these early publications, many people have come to realize that reuse and composition can be as useful on the specification side as they are for programming. Typical specification and verification techniques, however, do not take advantage of this idea and tend to make us restart every time from the lowest level. Domain Theory, as outlined in [1], is intended to bring abstraction, which has proved so beneficial in other parts of software engineering, to the world of specification.

References

[1] Domain Theory: The Forgotten step in program verification, an article in this blog, see here.

[2] John V. Guttag, James J. Horning, Jeannette M. Wing: Some Notes on Putting Formal Specifications to Productive Use, in Science of Computer Programming, vol. 2, no. 1, 1982, pages 53-68. (BM note: I found a copy here.)

[3] John V. Guttag, James J. Horning: A Larch Shared Language Handbook, in Science of Computer Programming, vol. 6, no. 2, 1986, pages 135-157. (BM note: I found a copy here, which also has a link to the Larch report.)

[4] Bertrand Meyer: M: A System Description Method, Technical Report TR CS 85-15, University of California, Santa Barbara, 1985, available here.

[5] Rod M. Burstall and Joe A. Goguen: Putting Theories Together to Make Specifications, in Proceedings of 5th International Joint Conference on Artificial Intelligence, Cambridge (Mass.), 1977, pages 1045- 1058.

[6] Rod M. Burstall and Joe A. Goguen: “The Semantics of Clear, a Specification Language,” in Proceedings of Advanced Course on Abstract Software Specifications, Copenhagen, Lecture Notes on Computer Science 86, Copenhagen, Springer-Verlag, 1980, pages 292-332, available here.

[7] Rod M. Burstall and Joe A. Goguen: An Informal Introduction to Specifications using Clear, in The Correctness Problem in Computer Science, eds R. S. Boyer and JJ. S. Moore, Springer-Verlag, 1981, pages 185-213.

VN:F [1.9.10_1130]
Rating: 8.8/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

Domain Theory: the forgotten step in program verification

 

Program verification is making considerable progress but is hampered by a lack of abstraction in specifications. A crucial step is, almost always, absent from the process; this omission is the principal obstacle to making verification a standard component of everyday software development.

Steps in software verification

In the first few minutes of any introduction to program verification, you will be told that the task requires two artifacts: a program, and a specification. The program describes what executions will do; the specification, what they are supposed to do. To verify software is to ascertain that the program matches the specification: that it does is what it should.

The consequence usually drawn is that verification consists of three steps: write a specification, write a program, prove that the program satisfies the specification. The practical process is of course messier, if only because the first two steps may occur in the reverse order and, more generally, all three steps are often intertwined: the specification and the program influence each other, in particular through the introduction of “verification conditions” into the program; and initial proof attempts will often lead to changes in both the specification and the program. But by and large these are the three accepted steps.

Such a description misses a fourth step, a prerequisite to specification that is essential to a scalable verification process: Domain Theory. Any program addresses a specific domain of discourse, be it the domain of network access and communication for a mobile phone system, the domain of air travel for a flight control system, of companies and shares for a stock exchange system and so on. Even simple programs with a limited scope, such as the computation of the maximum of an array, use a specific domain beyond elementary mathematics. In this example, it is the domain of arrays, with their specific properties: an array has a range, a minimum and maximum indexes in that range, an associated sequence of values; we may define a slice a [i..j], ask for the value associated with a given index, replace an element at a given index and so on. The Domain Theory provides a formal model for any such domain, with the appropriate mathematical operations and their properties. In the example the operations are the ones just mentioned, and the properties will include the axiom that if we replace an element at a certain index i with a value v then access the element at an index j, the value we get is v if i = j, and otherwise the earlier value at j.

The role of a Domain Theory

The task of devising a Domain Theory is to describe such a domain of reference, in the spirit of abstract data types: by listing the applicable operations and their properties. If we do not treat this task as a separate step, we end up with the kind of specification that works for toy examples but quickly becomes unmanageable for real-life applications. Most of the verification literature, unfortunately, relies on such specifications. They lack abstraction since they keep using the lowest-level mathematical objects and constructs, such as numbers and quantified expressions. They are to specification what assembly language is to modern programming.

Dines Bjørner has for a long time advocated a closely related idea, domain engineering; see for example his book in progress [1]. Unfortunately, he does not take advantage of modularization through abstract data types; the book is an example of always-back-to-the-basics specification, resorting time and again to fully explicit specifications based on a small number of mathematical primitives, and as a consequence making formal specification look difficult.

Maximum computed from both ends

As a simple example of modeling through an abstract theory consider an algorithm for computing the maximum of an array. We could use the standard technique that goes through the array one-way, but for variety let us take the algorithm that works from both ends, moving two integer cursors towards each other until they meet.  (This example was used in a verification competition at a recent conference, I forgot which one.) The code looks like this:

Two-way maximum

The specification, expressed by the postcondition (ensure) should state that Result is the maximum of the array; the loop invariant will be closely related to it. How do we express these properties? The obvious way is not the right way. It states the postcondition as something like

k: Z | (ka.lowerka.upper) a [k] ≤ Result

k: Z | ka.lowerka.upper a [k] = Result

In words, Result is at least as large as every element of the array, and is equal to at least one of the elements of the array. The invariant can also be expressed in this style (try it).

The preceding specification expresses the desired property, but it is of an outrageously lower level than called for. The notion of maximum is a general one for arrays over an ordered type. It can be computed through many different algorithms in addition to the one shown above, and exists independently of these algorithms. The detailed, assembly-language-like definition of its properties should not have to be repeated in every case. It should be part of the Domain Theory for the underlying notion, arrays.

A specification at the right level of abstraction

In a Domain Theory for arrays of elements from an ordered set, one of the principal operations is maximum, satisfying the above properties. The definition of maximum through these properties belongs at the Domain Theory level. The Domain Theory should include that definition, independent of any particular computational technique such as two_way_max. Then the routine’s postcondition, relying on this notion from the Domain Theory, becomes simply

Result = a.maximum

The application of this approach to the loop invariant is particularly interesting. If you tried to write it at the lowest level, as suggested above, you should have produced something like this:

a.lowerija.upper

k: Z | kikj ∧ (∀ l: Z | l a.lowerl a.upper a [l] ≤ a [k])

The first clause is appropriate but the rest is horrible! With its nested quantified expressions it gives an impression of great complexity for a property that is in fact straightforward, simple enough in fact to be explained to a 10-year-old: the maximum of the entire array can be found between indexes i and j. In other words, it is also the maximum of the array slice going from i to j. The Domain Theory will define the notion of slice and enable us to express the invariant as just

a.lowerij a.upper — This bounding clause remains

a.maximum = (a [i..j ]).maximum

(where we will write the slice a [i..j ] as a.slice (i, j ) if we do not have mechanisms for defining special syntax). To verify the routine becomes trivial: on loop exit the invariant still holds and i = j, so the maximum of the entire array is given by the maximum of the single-element slice a [i..i ], which is the value of its single element a [i ]. This last property — the maximum of a single-element array is its single value — is independent of the verification of any particular program and should be proved as a little theorem of the Domain Theory for arrays.

The comparison between the two versions is striking: without Domain Theory, we are back to the most tedious mathematical manipulations again and again; simple, clear properties look complicated and obscure. This just for a small example on basic data structures; now think what it will be for a complex application domain. Without a first step of formal modeling to develop a Domain Theory, no realistic specification and verification process is realistic.

Although the idea is illustrated here through examples of individual routines, the construction of a Domain Theory should usually occur, in an object-oriented development process, at the level of a class: the embodiment of an abstract data type, which is at the appropriate level of granularity. The theory applies to objects of a given type, and hence will be used for the verification of all operations of that type. This observation justifies the effort of devising a Domain Theory, since it will benefit a whole set of software elements.

Components of a Domain Theory

The Domain Theory should include the three ingredients illustrated in the example:

  • Operations, modeled as mathematical functions (no side effects of course, we are in the world of specification).
  • Axioms characterizing the defining properties of these operations.
  • Theorems, characterizing other important properties.

This approach is of course nothing else than abstract data types (the same thing, although few people realize it, as object-oriented analysis). Even though ADTs are a widely popularized notion, supported for example by tools such as CafeOBJ [2] and Maude [3], it is generally not taken to its full conclusions; in particular there is too often a tendency to define every new ADT from scratch, rather than building up libraries of reusable high-level mathematical components in the O-O spirit of reuse.

Results, not just definitions

In devising a Domain Theory with the three kinds of ingredient listed above, we should not forget the last one, the theorems! The most depressing characteristic of much of the work on formal specification is that it is long on definitions and short on results, while good mathematics is supposed to be the reverse. I think people who have seriously looked at formal methods and do not adopt them are turned off not so much by the need to use mathematics but by the impression they get little value for it.

That is why Eiffel contracts do get adopted: even if it’s just for testing and debugging, people see immediate returns. It suffices for a programmer to have caught one bug as the violation of a simple postcondition to be convinced for life and lose any initial math-phobia.

Quantifiers are evil

As we go beyond simple contract properties — this argument must be positive, this reference will not be void — the math needs to be at the same level of abstraction to which, as modern programmers, we are accustomed. For example, one should always be wary of program specifications relying directly on quantified expressions, as in the low-level variants of the postcondition and loop invariant of the two_way_max routine.

This is not just a matter of taste, as in the choice in logic [4] between lambda expressions (more low-level but also more immediately understandable) and combinators (more abstract but, for many, more abstruse). We are talking here about the fundamental software engineering problem of scalability; more generally, of the understandability, extendibility and reusability of programs, and the same criteria for their specification and verification. Quantifiers are of course needed to express fundamental properties of a structure but in general should not directly appear in program assertions: as the example illustrated, their level of abstraction is lower than the level of discourse of a modern object-oriented program. If the rule — Quantifiers Considered Harmful — is not absolute, it must be pretty close.

Quantified expressions, “All elements of this structure possess this property” and “Some element of this structure possesses this property” — belong in the description of the structure and not in the program. They should appear in the Domain Theory, not in the verification. If you want to express that a hash table search found an element of key K, you should not write

(Result = Void ∧ (∀ i: Z | i a.loweri a.upper a.item (i).key ≠ K))

(ResultVoid ∧ (∀ i: Z | i a.loweri a.upper a.item (i).key = K ∧ Result = a.item (i))

but

Result /= Void     (Result a.elements_of_key (K))

The quantified expressions will appear in the Domain Theory for the corresponding structure, in the definition of such domain properties as elements_of_key. Then the program’s specification — the contracts to be verified — can rely on concepts that make sense to the programmer; the verification will take advantage of theorems that have been proved independently since they belong to the Domain Theory and do not depend on individual programs.

Even the simplest examples…

Practical software verification requires Domain Theory even in the simplest cases, including those often used as purely academic examples. Perhaps the most common (and convenient) way to explain the notion of loop invariant is Euclid’s algorithm to compute the greatest common divisor (gcd) of two numbers (with a structure remarkably similar to that of two_way_max):
Euclid

I have expressed the postcondition using a concept from an assumed Domain Theory for the underlying problem: gcd, the mathematical function that yields the greatest common divisor of two integers. Many specifications I have seen go back to the basics, with something like this (using \\ for integer remainder):

a \\ Result = 0 b \\ Result = 0   ∀ i: N | (a \\ i = 0) ∧ (b \\ i = 0)  i Result

This is indeed the definition of what it means for Result to be the gcd of a and b (it divides a, it divides b, and is greater than any other integer that also has these two properties). But it makes no sense to include such a detailed mathematical property in the specification of a program element. It belongs in the domain theory, where it will serve as the definition of a function gcd, which we can then use directly in the specification of the program.

Note how the invariant makes the necessity of the Domain Theory approach even more clear: try to express it in the basic mathematical form, not using the function gcd, It can be done, but the result is typical of the high complexity to usefulness ratio of traditional formal specifications mentioned above. Instead, the invariant that I have included in the program text above says exactly what there is to say, clearly and concisely: at each iteration, the gcd of our two temporary values, i and j, is the result that we are seeking, the gcd of the original values a and b. On exit from the loop, when i and j are equal, their common value is that result.

It is also thanks to the Domain Theory modeling that the verification of the program — consisting of proving that the stated property is indeed invariant — will be so simple: as part of the theory, we should have the two little theorems

i > j > 0 gcd (i, j) = gcd (ij, j)
gcd
(i, i) = i

which immediately show the implementation to be correct.

Inside of any big, fat, messy, quantifier-ridden specification there is a simple, elegant and clear Domain-Theory-based specification desperately trying to get out. Find it and use it.

From Domain Theory to domain library

One of the reasons most people working on program verification have not used the division into levels of discourse described here, with a clear role for developing a Domain Theory, is that they lack the appropriate notational support. Mathematical notation is of course available, but we are talking about programs a general verification framework cannot resort to a new special notation for every new application domain.

This is one of the places where Eiffel provides a consistent solution, through its seamless approach to integrating programs and specifications in a single notation. Thanks to mechanisms such as deferred classes (classes that describe concepts through detailed specifications without committing to an implementation), Eiffel is as much for specification as for design and implementation; a Domain Theory can be expressed though a set of deferred Eiffel classes, which we may call a domain library. The classes in a domain library should not just be deferred, meaning devoid of implementation; they should in addition describe stateless operations only — queries, not commands — since they are modeling purely mathematical concepts.

An earlier article in this blog [5] outlined the context of our verification work: the EVE project (Eiffel Verification Environment), a practical approach to integrating software verification in the day-to-day practice of modern software development, with the slogan ““Verification As a Matter Of Course”. In this project we have applied the idea of Domain Theory by building a domain library covering fundamental concepts of set theory, including functions and relations. This is the Mathematical Model Library (MML) [6, 7], which we use to verify the new data structure library EiffelBase 2 using specifications at the appropriate level of abstraction.

MML is in fact useful for the specification of a wide variety of programs, since almost every application area can benefit from the general concepts of set, subset, relation and such. But to cover a specific application domain, say flight traffic control, MML will generally not suffice; you will need to devise a Domain Theory that mathematically models the target domain, and may express it in the form of a domain library written in the same general spirit as MML: all deferred, stateless, focused on high-level abstractions.

It is one of the attractions of Eiffel that you can express such a theory and library in the same notation as the programs that will use it — more precisely in a subset of that notation, since the specification classes do not need the imperative constructs of the language such as instructions and attributes. Then both the development process and the verification use a seamlessly integrated set of notations and techniques, and all use the same tools from a modern IDE, in our case EiffelStudio, for browsing, editing, working with graphical repreentation, metrics etc.

DSL libraries for specifications

A mechanism to express Domain Theories is to a general specification mechanism essentially like a Domain Specific Language (DSL) is to a general programming language: a specialization for a particular domain. Domain libraries make the approach practical by:

  • Embedding the specification language in the programming language.
  • Fundamentally relying on reuse, in the best spirit of object technology.

This approach is in line with the one I presented for handling DSLs in an earlier article of this blog [8] (thanks, by the way, for the many comments received, some of them posted here and some on Facebook and LinkedIn where the post triggered long discussions). It is usually a bad idea to invent a new language for a new application domain. A better solution is to rely on libraries, by taking advantage of the power of object-oriented mechanisms to model (in domain libraries) and implement (for DSLs) the defining features of such a domain, and to make the result widely reusable. The resulting libraries are purely descriptive in the case of a domain library expressing a Domain Theory, and directly usable by programs in the case of a library embodying a DSL, but the goal is the same.

A sound and necessary engineering practice

Many ideas superficially look similar to Domain Theory: domain engineering as mentioned above, “domain analysis” as widely discussed in the requirements literature, model-driven development, abstract data type specification… They all start from some of the same observations, but  Domain Theory as described in this article is something different: a systematic approach to modeling an arbitrary application domain mathematically, which:

  • Describes the concepts through applicable operations, axioms and (most importantly) theorems.
  • Expresses these elements in an applicative (side-effect free, i.e. equivalent to pure mathematics) subset of the programming language, for direct embedding in program specifications.
  • Relies on the class mechanism to structure the results.
  • Collects the specifications into specification libraries and promotes the reuse of specifications in the same way we promote software reuse.
  • Uses the combination of these techniques to ensure that program specifications are at a high level of abstraction, compatible with the programmers’ view of their software.
  • Promotes a clear and effective verification process.

The core idea is in line with standard engineering practices in disciplines other than software: to build a bridge, a car or a chip you need first to develop a sound model of the future system and its environment, using any useful models developed previously rather than always going back to elementary textbook mathematics.

It seems in fact easier to justify doing Domain Analysis than to justify not doing it. The power of expression and abstraction of our programs has grown by leaps and bounds; it’s time for our specifications to catch up.

References

[1] Dines Bjørner: From Domains to Requirements —The Triptych Approach to Software Engineering, “to be submitted to Springer”, available here.

[2] Kokichi Futatsugi and others: CafeObj page, here.

[3] José Meseguer and others: Maude publication page, here.

[4] J. Roger Hindley, J. P. Seldin: Introduction to Combinators and l-calculus, Cambridge University Press, 1986.

[5] Verification As a Matter Of Course, earlier article on this blog (March 2010), available here.

[6] Bernd Schoeller, Tobias Widmer and Bertrand Meyer. Making specifications complete through models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, pages 48-70, available here.

[7] Nadia Polikarpova, Carlo A. Furia and Bertrand Meyer: Specifying Reusable Components, in VSTTE’10: Verified Software: Theories, Tools and Experiments, Edinburgh, August 2010, Lecture Notes in Computer Science, Springer-Verlag, available here.

[8] Never Design a Language, earlier article on this blog (January 2012), available here.

VN:F [1.9.10_1130]
Rating: 9.3/10 (12 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)