Archive for the ‘Formal methods and proofs’ Category.

AutoProof workshop: Verification As a Matter of Course

The AutoProof technology pursues the goal of “Verification As a Matter Of Course”, integrated into the EVE development environment. (The AutoProof  project page here; see particularly the online interactive tutorial.) A one-day workshop devoted to the existing AutoProof and current development will take place on October 1 near Toulouse in France. It is an informal event (no proceedings planned at this point, although based on the submissions we might decide to produce a volume), on a small scale, designed to bring together people interested in making the idea of practical verification a reality.

The keynote will be given by Rustan Leino from Microsoft Research, the principal author of the Boogie framework on which the current implementation of AutoProof relies.

For submissions (or to attend without submitting) see the workshop page here. You are also welcome to contact me for more information.

VN:F [1.9.10_1130]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

Design by Contract: ACM Webinar this Thursday

A third ACM webinar this year (after two on agile methods): I will be providing a general introduction to Design by Contract. The date is this coming Thursday, September 17, and the time is noon New York (18 Paris/Zurich, 17 London, 9 Los Angeles, see here for hours elsewhere). Please tune in! The event is free but requires registration here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

New paper: Theory of Programs

Programming, wrote Dijkstra many years ago, is a branch of applied mathematics. That is only half of the picture: the other half is engineering, and this dual nature of programming is part of its attraction.

Descriptions of the mathematical side are generally, in my view, too complicated. This article [1] presents a mathematical theory of programs and programming based on concepts taught in high school: elementary set theory. The concepts covered include:

  • Programming.
  • Specification.
  • Refinement.
  • Non-determinism.
  • Feasibility.
  • Correctness.
  • Programming languages.
  • Kinds of programs: imperative, functional, object-oriented.
  • Concurrency (small-step and large-step)
  • Control structures (compound, if-then-else and Dijkstra-style conditional, loop).
  • State, store and environment.
  • Invariants.
  • Notational conventions for building specifications and programs incrementally.
  • Loop invariants and variants.

One of the principal ideas is that a program is simply the description of a mathematical relation. The program text is a rendering of that relation. As a consequence, one may construct programming languages simply as notations to express certain kinds of mathematics. This approach is the reverse of the usual one, where the program text and its programming languages are the starting point and the center of attention: theoreticians develop techniques to relate them to mathematical concepts. It is more effective to start from the mathematics (“unparsing” rather than parsing).

All the results (74 properties expressed formally, a number of others in the text) are derived as theorems from rules of elementary set theory; there are no new axioms whatsoever.

The paper also has a short version [2], omitting proofs and many details.

References

[1] Theory of Programs, available here.
[2] Theory of Programs, short version of [1] (meant for quick understanding of the ideas, not for publication), available here.

 

VN:F [1.9.10_1130]
Rating: 6.4/10 (18 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 9 votes)

Framing the frame problem (new paper)

Among the open problems of verification, particularly the verification of object-oriented programs, one of the most vexing is framing: how to specify and verify what programs element do not change. Continuing previous work, this article presents a “double frame inference” method, automatic on both sides the specification and verification sides. There is no need to write frame specifications: they will be inferred from routine postconditions. For verification, the method computes the set of actually changed properties through a “change calculus”, itself based on the previously developed alias calculus.

Some verification techniques, such as Hoare-style proofs, require significant annotation effort and potentially yield full functional verification; others, such as model checking and abstract interpretation, have more limited goals but seek full automation. Framing, in my opinion, should be automatic, freeing the programmer-verifier to devote the annotation effort to truly interesting properties.

Reference

[1] Bertrand Meyer: Framing the Frame Problem, in Dependable Software Systems, Proceedings of August 2014 Marktoberdorf summer school, eds. Alexander Pretschner, Manfred Broy and Maximilian Irlbeck, NATO Science for Peace and Security, Series D: Information and Communication Security, Springer, 2015 (to appear), pages 174-185; preprint available here.

VN:F [1.9.10_1130]
Rating: 8.8/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Detecting deadlock automatically? (New paper)

To verify sequential programs, we have to prove that they do the right thing, but also that they do it within our lifetime — that they terminate. The termination problem is considerably harder with concurrent programs, since they add a new form of non-termination: deadlock. A set of concurrent processes or threads will deadlock if they end up each holding a resource that another wants and wanting a resource that another holds.

There is no general solution to the deadlock problem, even a good enough general solution. (“Good enough” is the best we can hope for, since like many important problems deadlock is undecidable.) It is already hard enough to provide run-time deadlock detection, to be able at least to cancel execution when deadlock happens. The research reported in this new paper [1] pursues the harder goal of static detection. It applies to an object-oriented context (specifically the SCOOP model of concurrent OO computation) and relies fundamentally on the alias calculus, a static alias analysis technique developed in previous publications.

The approach is at its inception and considerable work remains to be done. Still, the example handled by the paper is encouraging: analyzing two versions of the dining philosophers problem and proving — manually — that one can deadlock and the other cannot.

References

[1] Bertrand Meyer: An automatic technique for static deadlock prevention, in PSI 2014 (Ershov Informatics Conference), eds. Irina Virbitskaite and Andrei Voronkov, Lecture Notes in Computer Science, Springer, 2015, to appear.; draft available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

Non-morganatic specifications

I have temporarily withdrawn this article because the specific case it used as an example has changed. I will re-publish it as soon as the situation has stabilized.

VN:F [1.9.10_1130]
Rating: 6.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Attached by default?

 

Opinions requested! See at end.

A void call, during the execution of an object-oriented program, is a call of the standard OO form

x·some_routine (…)                                                /CALL/

where x, a reference, happens to be void (null) instead of denoting, as expected, an object. The operation is not possible; it leads to an exception and, usually, a crash of the program. Void calls are also called “null pointer dereferencing”.

One of the major advances in Eiffel over the past years has been the introduction of attached types, entirely removing the risk of void calls. The language mechanisms, extending the type system, make void-call avoidance a static property, part of type checking: just as the compiler will prevent you from assigning a boolean value to an integer variable, so will it flag your program if it sees a risk of void call. Put the other way around, if your program passes compilation, you have the guarantee that its executions will never produce a void call. Attached types thus remove one of the major headaches of programming, what Tony Hoare [1] called his “one-billion-dollar mistake”:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W) [2]. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty year

Thanks to attached types, Eiffel programmers can sleep at night: their programs will not encounter void calls.

To benefit from this advance, you must declare variables accordingly, as either attached (never void after initialization) or detachable (possibly void). You must also write the program properly:

  • If you declare x attached, you must ensure in the rest of the program that before its first use x will have been attached to an object, for example through a creation instruction create x.
  • If you declare x detachable, you must make sure that any call of the above form /CALL/ happens in a context where x is guaranteed to be non-void; for example, you could protect it by a test if x /= Void then or, better, an “object test”.

Code satisfying these properties is called void-safe.

Void safety is the way to go: who wants to worry about programs, even after they have been thoroughly tested and have seemingly worked for a while, crashing at unpredictable times? The absence of null-pointer-dereferencing can be a statically  enforced property, as the experience of Eiffel now demonstrates; and that what it should be. One day, children will think void-safely from the most tender age, and their great-grandparents will tell them, around the fireplace during long and scary winter nights, about the old days when not everyone was programming in Eiffel and even those who did were worried about the sudden null-pointer-derefencing syndrome. To get void safety through ordinary x: PERSON declarations, you had (children, hold your breath) to turn on a compiler option!

The transition to void safety was neither fast nor easy; in fact, it has taken almost ten years. Not everyone was convinced from the beginning, and we have had to improve and simplify the mechanism along the way to make void-safe programming practical. Compatibility has been a key issue throughout: older classes are generally not void-safe, but in a language that has been around for many years and has a large code base of operational software it is essential to ensure a smooth transition. Void safety has, from its introduction, been controlled by a compiler option:

  • With the option off, old code will compile as it used to do, but you do not get any guarantee of void safety. At execution time, a void call can still cause your program to go berserk.
  • With the option on, you get the guarantee: no void calls. To achieve this goal, you have to make sure the classes obey the void safety rules; if they do not, the compiler will reject them until you fix the problem.

In the effort to reconcile the compatibility imperative with the inexorable evolution to void safety, the key decisions have affected default values for compiler options and language conventions. Three separate decisions, in fact. Two of the defaults have already been switched; the question asked at the end of this article addresses the switching of the last remaining one.

The first default governed the void-safety compiler option. On its introduction, void-safety was off by default; the mechanism had to be turned on explicitly, part of the “experimental” option that most EiffelStudio releases offer for new, tentative mechanisms. That particular decision changed a year ago, with version 7.3 (May 2013): now void safety is the default. To include non-void-safe code you must mark  it explicitly.

The second default affects a language convention: the meaning of a standard declaration. A typical declaration, such as

x: PERSON                                                                                      /A/

says that at run time x denotes a reference which, if not void, will be attached to an object of type PERSON.  In pre-void-safety Eiffel, as in today’s other typed OO languages,  the reference could occasionally become void at run time; in other words, x was detachable. With the introduction of void safety, you could emphasize this property by specifying it explicitly:

x: detachable PERSON                                                             /B/

You could also specify that x would never be void by declaring it attached, asking the compiler to guarantee this property for you (through its application of the void-safety rules to all operations involving x). The explicit form in this case is

x: attached PERSON                                                               /C/

In practical programming, of course, you do not want to specify attached or detachable all the time: you want to use the simple form /A/ as often as possible. Originally, since we were starting from a non-void-safe language, compatibility required /A/ to mean /B/ by default. But it turns out that “attached” really is the dominant case: most references should remain attached at all times and Void values should be reserved for important but highly specialized cases such as terminating linked data structures. So the simple form should, in the final state of the language, mean /C/. That particular default was indeed switched early (version 7.0, November 2011) for people using the void-safety compiler option. As a result, the attached keyword is no longer necessary for declarations such as the above, although it remains available. Everything is attached by default; when you want a reference that could be void (and are prepared to bear the responsibility for convincing the compiler that it won’t when you actually use it in a call), you declare it as detachable; that keyword remains necessary.

There remains one last step in the march to all-aboard-for-void-safety: removing the “detachable by default” option, that is to say, the compiler option that will make /A/ mean /B/ (rather than /C/). It is only an option, and not the default; but still it remains available. Do we truly need it? The argument for removing it  is that it simplifies the specification (the fewer options the better) and encourages everyone, even more than before, to move to the new world. The argument against is to avoid disturbing existing projects, including their compiler control files (ECFs).

The question looms: when do we switch the defaults? Some of us think the time is now; specifically, the November release (14.11) [4].

Do you think the option should go? We would like your opinion. Please participate in the Eiffelroom poll [5].

 

References and note

[1] C.A.R. Hoare: Null References: The Billion Dollar Mistake , abstract of talk at QCon London, 9-12 March 2009, available here.

[2] (BM note) As a consolation, before Algol W, LISP already had NIL, which is the null pointer.

[3] Bertrand Meyer, Alexander Kogtenkov and Emmanuel Stapf: Avoid a Void: The Eradication of Null Dereferencing, in Reflections on the Work of C.A.R. Hoare, eds. C. B. Jones, A.W. Roscoe and K.R. Wood, Springer-Verlag, 2010, pages 189-211, available here.

[4] EiffelStudio version numbering changed in 2014: from a classic major_number.minor_number to a plain year.month, with two principal releases, 5 and 11 (May and November).

[5] Poll on switching the attachment defaults: at the bottom of the Eiffelroom page here (direct access here).

VN:F [1.9.10_1130]
Rating: 8.6/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

Code matters

(Adapted from an article previously published on the CACM blog.)

Often, you will be told that programming languages do not matter much. What actually matters more is not clear; maybe tools, maybe methodology, maybe process. It is a pretty general rule that people arguing that language does not matter are partisans of bad languages.

Let us consider the Apple bug of a few weeks ago. Only a few weeks; the world has already moved to Heartbleed (to be discussed in a subsequent article), but that is not a reason to sweep away the memory of the Apple bug and the language design that it reflects.

In late February, users of  iPhones, iPads and iPods were enjoined to upgrade their devices immediately because  “an attacker with a privileged network position may capture or modify data in sessions protected by SSL/TLS.” The bug was traced [1] to code of the following form:

if (error_of_first_kind)
goto fail;
if (error_of_second_kind)
goto fail;
if (error_of_third_kind)
goto fail;
if (error_of_fourth_kind)
goto fail;
if (error_of_fifth_kind)
goto fail;
goto fail;
if (error_of_sixth_kind)
goto fail;
The_truly_important_code_handling_non_erroneous_case

In other words: just a duplicated line! (The extra line is highlighted above.) But the excess “goto” is beyond the scope of the preceding “if“, so it is executed unconditionally: all executions go directly to the “fail” label, so that The_truly_important_code_handling_non_erroneous_case never gets executed.

Critics have focused their ire on the  goto instruction, but it is of little relevance. What matters, language-wise, is the C/C++-Java-C# convention of delimiting the scope of conditional instructions, loops and other kinds of composite structures. Every component of such structures in these languages is syntactically a single instruction, so that:

  • If you want the branch to consist of an atomic instruction, you write that instruction by itself, as in: if (c) a = b;
  • If you want a sequence of instructions, you write it as a compound, enclosed by the ever so beautiful braces: if (c) {a = b; x = y;}

Although elegant in principle (after all, it comes from Algol), this convention is disastrous from a software engineering perspective because software engineering means understanding that programs change. One day, a branch of a conditional or loop has one atomic instruction; sometime later, a maintainer realizes that the corresponding case requires more sophisticated treatment, and adds an instruction, but fails to add the braces.

The proper language solution is to do away with the notion of compound instruction as a separate concept, but simply expect all branches of composite instructions to consist of a sequence, which could consist of several instructions, just one, or none at all. In Eiffel, you will write

if  c then
   x := y
end

or

 if  c then
   a := b
   x := y
else
   u := v
end

or

from i := 1 until c loop
   a := b
   i := i + 1
end

or

across my_list as l loop
   l.add (x)
end

and so on. This syntax also gets rid of all the noise that pollutes programs in languages retaining C’s nineteen-sixties conventions: parentheses around the conditions, semicolons for instructions on different lines; these small distractions accumulate into serious impediments to program readability.

With such a modern language design, the Apple bug could not have arisen. A duplicated line is either:

  • A keyword such as end, immediately caught as a syntax error.
  • An actual instruction such as an assignment, whose duplication causes either no effect or an effect limited to the particular case covered by the branch, rather than catastrophically disrupting all cases, as in the Apple bug.

Some people, however, find it hard to accept the obvious responsibility of language design. Take this comment derisively entitled  “the goto squirrel” by Dennis Hamilton in the ACM Risks forum [2]:

It is amazing to me that, once the specific defect is disclosed (and the diff of the actual change has also been published), the discussion has devolved into one of coding style and whose code is better.  I remember similar distractions around the Ariane 501 defect too, although in that case there was nothing wrong with the code—the error was that it was being run when it wasn’t needed and it was not simulation tested with new launch parameters under the mistaken assumption that if the code worked for Ariane 4, it should work for Ariane 5.

It is not about the code.  It is not about the code.  It is not about goto. It is not about coming up with ways to avoid introducing this particular defect by writing the code differently.

Such certainty! Repeating a wrong statement ( “it is not about the code“) does not make it  right. Of course “it” is about the code! If the code had been different the catastrophe would not have happened, so one needs some gall to state that the code is not the issue — and just as much gall, given that the catastrophe would also not have happened if the programming language had been different, to state that it is not about the programming language.

When Mr. Hamilton dismisses as “distractions” the explanations pointing to programming-related causes for the Ariane-5 disaster, I assume he has in mind the analysis which I published at the time with Jean-Marc Jézéquel [3], which explained in detail how the core issue was the absence of proper specifications (contracts). At that time too, we heard dismissive comments; according to one of the critics, the programming aspects did not count, since the whole thing was really a social problem: the French engineers in Toulouse did not communicate properly with their colleagues in England! What is great with such folk explanations is that they sound just right and please people because they reinforce existing stereotypes. They are by nature as impossible to refute as they are impossible to prove. And they avoid raising the important but disturbing questions: were the teams using the right programming language, the right specification method (contracts, as our article suggested), appropriate tools? In both the Ariane-5 and Apple cases, they were not.

If you want to be considered polite, you are not supposed to point out that the use of programming languages designed for the PDP-8 or some other long-gone machine is an invitation to disaster. The more terrible the programming language people use, and the more they know it is terrible (even if they will not admit it), the more scandalized they will be that you point out that it is, indeed, terrible. It is as if you had said something about their weight or the pimples on their cheeks. Such reactions do not make the comment less true. The expression of outrage is particularly inappropriate when technical choices are not just matters for technical argument, but have catastrophic consequences on society.

The usual excuse, in response to language criticisms, is that better tools, better quality control (the main recommendation of the Ariane-5 inquiry committee back in 1997), better methodology would also have avoided the problem. Indeed, a number of the other comments in the comp.risks discussion that includes Hamilton’s dismissal of code [2] point in this direction, noting for example that static analyzers could have detected code duplication and unreachable instructions. These observations are all true, but change nothing to the role of programming languages and coding issues.  One of the basic lessons from the study of software and other industrial disasters — see for example the work of Nancy Leveson — is that a disaster results from a combination of causes. This property is in fact easy to understand: a disaster coming from a single cause would most likely have been avoided. Consider the hypothetical example of a disastrous flaw in Amazon’s transaction processing. It seems from various sources that Amazon processes something like 300 transactions a second. Now let us assume three independent factors, each occurring with a probability of a thousandth (10-3), which could contribute to a failure. Then:

  • It is impossible that one of the factors could cause failure just by itself: that means it would make a transaction after around 3 seconds, and would be caught even in the most trivial unit testing. No one but the developer would ever know about it.
  • If two of the factors together cause failure, they will occur every million transactions, meaning about once an hour. Any reasonable testing will discover the problem before a release is ever deployed.
  • If all three factors are required, the probability is 10-9, meaning that a failure will occur about once a year. Only in that case will a real problem exist: a flaw that goes undetected for a long time, during which everything seems normal, until disaster strikes.

These observations explain why post-mortem examinations of catastrophes always point to a seemingly impossible combination of unfortunate circumstances. The archduke went to Sarajevo and he insisted on seeing the wounded and someone forgot to tell the drivers about the prudent decision to bypass the announced itinerary and the convoy stalled  and the assassin saw it and he hit Franz-Ferdinand right in the neck and there was nationalistic resentment in various countries and the system of alliances required countries to declare war [4]. Same thing for industrial accidents. Same thing for the Apple bug: obviously, there were no good code reviews and no static analysis tools applied and no good management; and, obviously, a programming language that blows out innocent mistakes into disasters of planetary import.

So much for the accepted wisdom, heard again and again in software engineering circles, that code does not matter, syntax does not count, typos are caught right away, and that all we should care about is process or agility or requirements or some other high-sounding concern more respectable than programming. Code? Programming languages? Did we not take care of those years ago? I remember similar distractions.”

There is a  positive conclusion to the “and” nature (in probabilistic terms, the multiplicative nature) of causes necessary to produce a catastrophe in practice: it suffices to get rid of one of the operands of the “and” to falsify its result, hence avoiding the catastrophe. When people tell you that code does not matter or that language does not matter, just understand the comment for what it really means, “I am ashamed of the programming language and techniques I use but do not want to admit it so I prefer to blame problems on the rest of the world“, and make the correct deduction: use a good programming language.

References

[1] Paul Duckline:  Anatomy of a “goto fail” – Apple’s SSL bug explained, plus an unofficial patch for OS X!, Naked Security blog (Sophos), 24 February 2014, available here.

[2] Dennis E. Hamilton: The Goto Squirrel, ACM Risks Forum, 28 February 2014, available here.

[3] Jean-Marc Jézéquel and Bertrand Meyer: Design by Contract: The Lessons of Ariane, in Computer (IEEE), vol. 30, no. 1, January 1997, pages 129-130, available online here and, with reader responses here.

[4] Assassination of Ferdinand of Autria: here.

VN:F [1.9.10_1130]
Rating: 8.1/10 (12 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

New article: contracts in practice

For almost anyone programming in Eiffel, contracts are just a standard part of daily life; Patrice Chalin’s pioneering study of a few years ago [1] confirmed this impression. A larger empirical study is now available to understand how developers actually use contracts when available. The study, to published at FM 2014 [2] covers 21 programs, not just in Eiffel but also in JML and in Code Contracts for C#, totaling 830,000 lines of code, and following the program’s revision history for a grand total of 260 million lines of code over 7700 revisions. It analyzes in detail whether programmers use contracts, how they use them (in particular, which kinds, among preconditions, postconditions and invariants), how contracts evolve over time, and how inheritance interacts with contracts.

The paper is easy to read so I will refer you to it for the detailed conclusions, but one thing is clear: anyone who thinks contracts are for special development or special developers is completely off-track. In an environment supporting contracts, especially as a native part of the language, programmers understand their benefits and apply them as a matter of course.

References

[1] Patrice Chalin: Are practitioners writing contracts?, in Fault-Tolerant System, eds. Butler, Jones, Romanovsky, Troubitsyna, Springer LNCS, vol. 4157, pp. 100–113, 2006.

[2] H.-Christian Estler, Carlo A. Furia, Martin Nordio, Marco Piccioni and Bertrand Meyer: Contracts in Practice, to appear in proceedings of 19th International Symposium on Formal Methods (FM 2014), Singapore, May 2014, draft available here.

VN:F [1.9.10_1130]
Rating: 8.4/10 (11 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

Eiffel as an expression language

A functional-programming style, or more generally a style involving more expressions and fewer instructions, is possible in Eiffel. In particular, Eiffel’s agent mechanism embeds a full functional-programming mechanism in the object-oriented framework of the language.

To make the notations simpler, we are discussing and tentatively implementing a number of proposed extensions. They involve no fundamental new language mechanisms, but provide new, more concise notations for existing mechanisms. Examples are:

  • Conditional expressions.
  • Implicit tuple, a rule allowing the omission of brackets for an actual argument when it is a tuple and the last argument, e.g. f (x, y, z) as an abbreviation for f ([x, y, z]) (an example involving just one argument). Tuples already provided the equivalent of a variable-argument (“varargs”) facility, but it is made simpler to use with this convention.
  • Parenthesis alias, making it possible to write just f (x, y) when f is an agent (closure, lambda expression, delegate etc. in other terminologies), i.e. treating f as if it were a function; the notation is simply an abbreviation for f.item ([x, y]) (an example that also takes advantage of implicit tuples). It has many other applications since a “parenthesis alias” can be defined for a feature of any class.
  • Avoiding explicit assignments to Result.
  • Type inference (to avoid explicitly specifying the type when it can be deduced from the context). This is a facility for the programmer, useful in particular for local variables, but does not affect the type system: Eiffel remains strongly typed, it is just that you can be lazy about writing the type when there is no ambiguity.
  • In the same vein, omitting the entire list of generic parameters when it can be inferred.

The description of the mechanism (see the link in [1]) is in the form of a set of slides explaining the concepts and presenting example. This is a working document and feedback is welcome.

References

[1] Eiffel as an expression language, Eiffel Software working document, 2012-2014, see here.

VN:F [1.9.10_1130]
Rating: 7.8/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)

LASER 2014 (Elba, September)

2014 marks the 10-th anniversary (11th edition) of the LASER summer school. The school will be held September 7-14, 2014, and the detailed information is here.

LASER (the name means Laboratory for Applied Software Engineering Research) is dedicated to practical software engineering. The roster of speakers since we started is a who’s who of innovators in the field. Some of the flavor of the school can gathered from the three proceedings volumes published in Springer LNCS (more on the way) or simply by browsing the pages of the schools from previous years.

Usually we have a theme, but to mark this anniversary we decided to go for speakers first; we do have a title, “Leading-Edge Software Engineering”, but broad enough to encompass a wide variety of a broad range of topics presented by star speakers: Harald Gall, Daniel Jackson, Michael Jackson, Erik Meijer (appearing at LASER for the third time!), Gail Murphy and Moshe Vardi. With such a cast you can expect to learn something important regardless of your own primary specialty.

LASER is unique in its setting: a 5-star hotel in the island paradise of Elba, with outstanding food and countless opportunities for exploring the marvelous land, the beaches, the sea, the geology (since antiquity Elba has been famous for its stones and minerals) and the history, from the Romans to Napoleon, who in the 9 months of his reign changed the island forever. The school is serious stuff (8:30 to 13 and 17 to 20 every day), but with enough time to enjoy the surroundings.

Registration is open now.

VN:F [1.9.10_1130]
Rating: 7.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

Negative variables: new version

I have mentioned this paper before (see the earlier blog entry here) but it is now going to be published [1] and has been significantly revised, both to take referee comments into account and because we found better ways to present the concepts.

We have  endeavored to explain better than in the draft why the concept of negative variable is necessary and why the usual techniques for modeling object-oriented programs do not work properly for the fundamental OO operation, qualified call x.r (…). These techniques are based on substitution and are simply unable to express certain properties (let alone verify them). The affected properties are those involving properties of the calling context or the global project structure.

The basic idea (repeated in part from the earlier post) is as follows. In modeling OO programs, we have to take into account the unique “general relativity” property of OO programming: all the operations you write are expressed relative to a “current object” which changes repeatedly during execution. More precisely at the start of a call x.r (…) and for the duration of that call the current object changes to whatever x denotes — but to determine that object we must again interpret x in the context of the previous current object. This raises a challenge for reasoning about programs; for example in a routine the notation f.some_reference, if f is a formal argument, refers to objects in the context of the calling object, and we cannot apply standard rules of substitution as in the non-OO style of handling calls.

We introduced a notion of negative variable to deal with this issue. During the execution of a call x.r (…) the negation of x , written x’, represents a back pointer to the calling object; negative variables are characterized by axiomatic properties such as x.x’= Current and x’.(old x)= Current.

Negative variable as back pointer

The paper explains why this concept is necessary, describes the associated formal rules, and presents applications.

Reference

[1] Bertrand Meyer and Alexander Kogtenkov: Negative Variables and the Essence of Object-Oriented Programming, to appear in Specification, Algebra, and Software, eds. Shusaku Iida, Jose Meseguer and Kazuhiro Ogata, Springer Lecture Notes in Computer Science, 2014, to appear. See text here.

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

Saint Petersburg Software Engineering Seminar: 14 January 2014 (6 PM)

There will be two talks in the Software Engineering Seminar at ITMO, 18:00 local time, Tuesday, January 14, 2014. Please arrive 10 minutes early for registration.

Place: ITMO, Sytninskaya Ulitsa, Saint Petersburg.

Andrey Terekhov (SPBGU): Programming crystals

(I do not know whether this talk will be in Russian or English. An abstract follows but the talk is meant as the start of a discussion rather than a formal lecture.)

В течение последних 20-30 лет основными языками программирования кристаллов были VHDL и Verilog. Эти языки изначально проектировались как средства создания проектной документации, потом они стали использоваться в качестве инструмента моделирования и только сравнительно недавно для этих языков появились средства генерации кода уровня RTL (Register Transfer Language). Тексты на  VHDL и Verilog очень громоздки, трудно читаемые, плохо стандартизованы (одна и та же программа может синтезироваться на одном инструменте и не поддаваться синтезу на другом. Лет 10 назад появился язык SystemC – это С++ с огромным набором библиотек. С одной стороны, любая программа на SystemC может транслироваться стандартными трансляторами С++ , есть удобные средства потактного моделирования и приличные средства генгерации RTL, с другой стороны, требование совместимости с С++ не прошло даром – если в базовом языке нет средств описания параллелизма и конвейеризации, их приходится добавлять весьма искусственными приемами через приставные библиотеки. Буквально в прошлом году фирма Xilinx выпустила продукт Vivado, в рекламе которого утверждается, что он способен автоматически транслировать обычные программы на С/C++ в RTL промышленного качества.

Мы выполнили несколько экспериментов по использованию этого продукта, оказалось, что обещанной автоматизации там нет, пользователь должен писать на С, постоянно думая о том, как его код будет выглядеть в финальном RTL,  расставлять огромное количество прагм, причем не всегда очевидных.

Основной тезис доклада – такая важная область, как проектирование кристаллов, нуждается в специализированных языковых и инструментальных средствах, обеспечивающих  создание компактных и  легко читаемых программ, которые могут быть использованы как для симуляции, так и для генерации эффективного RTL. В докладе будут приведены примеры программ на языке HaSCoL (Hardware and Software Codesign Language), разработанном на кафедре системного программирования СПбГУ, и даны некоторые сравнительные характеристики.

Sergey Velder (ITMO): Alias graphs

(My summary – BM.) In the ITMO SEL work on automatic alias analysis, a new model has been developed: alias graphs, an abstraction of the object structure. This short talk will compare it to previously used approaches.

VN:F [1.9.10_1130]
Rating: 5.5/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)

New paper: alias calculus and frame inference

For a while now I have  been engaged in  a core problem of software verification: the aliasing problem. As with many difficult problems in science, it is easy to state the basic question: can we determine automatically whether at a program point p the values of two reference expressions e and f can ever denote the same object?

Alias analysis lies at the core of many problems in software analysis and verification.

Earlier work [2] I introduced an “alias calculus”. The calculus is a set of rules, attached to the constructs of the programming language, to compute the “alias relation”: the set of possibly aliased expression pairs. A new paper [1] with Sergey Velder and Alexander Kogtenkov improves the model (correcting in particular an error in the axiom for assignment, whose new version has been proved sound using Coq) and applies it to the inference of frame properties. Here the abstract:

Alias analysis, which determines whether two expressions in a program may reference to the same object, has many potential applications in program construction and verification. We have developed a theory for alias analysis, the “alias calculus”, implemented its application to an object-oriented language, and integrated the result into a modern IDE. The calculus has a higher level of precision than many existing alias analysis techniques. One of the principal applications is to allow automatic change analysis, which leads to inferring “modifies clauses”, providing a significant advance towards addressing the Frame Problem. Experiments were able to infer the “modifies” clauses of an existing formally specied library. Other applications, in particular to concurrent programming, also appear possible. The article presents the calculus, the application to frame inference including experimental results, and other projected applications. The ongoing work includes building more efficient model capturing aliasing properties and soundness proof for its essential elements.

This is not the end of the work, as better models and implementations are needed, but an important step.

References

[1] Sergey Velder, Alexander Kogtenkovand Bertrand Meyer: Alias Calculus, Frame Calculus and Frame Inference, in Science of Computer Programming, to appear in 2014 (appeared online 26 November 2013); draft available here, published version here.
[2] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, Chinese Academy of Sciences, 2011, pages 77-116, available here.

 

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

Smaller, better textbook

A new version of my Touch of Class [1] programming textbook is available. It is not quite a new edition but more than just a new printing. All the typos that had been reported as of a few months ago have been corrected.

The format is also significantly smaller. This change is more than a trifle. When а  reader told me for the first time “really nice book, pity it is so heavy!”, I commiserated and did not pay much attention. After twenty people said that, and many more after them, including professors looking for textbooks for their introductory programming classes, I realized it was a big deal. The reason the book was big and heavy was not so much the size of the contents (876 is not small, but not outrageous for a textbook introducing all the fundamental concepts of programming). Rather, it is a technical matter: the text is printed in color, and Springer really wanted to do a good job, choosing thick enough paper that the colors would not seep though. In addition I chose a big font to make the text readable, resulting in a large format. In fact I overdid it; the font is bigger than necessary, even for readers who do not all have the good near-reading sight of a typical 19-year-old student.

We kept the color and the good paper,  but reduced the font size and hence the length and width. The result is still very readable, and much more portable. I am happy to make my contribution to reducing energy consumption (at ETH alone, think of the burden on Switzerland’s global energy bid of 200+ students carrying the book — as I hope they do — every morning on the buses, trains and trams crisscrossing the city!).

Springer also provides electronic access.

Touch of Class is the textbook developed on the basis of the Introduction to Programming course [2], which I have taught at ETH Zurich for the last ten years. It provides a broad overview of programming, starting at an elementary level (zeros and ones!) and encompassing topics not usually covered in introductory courses, even a short introduction to lambda calculus. You can get an idea of the style of coverage of such topics by looking up the sample chapter on recursion at touch.ethz.ch. Examples of other topics covered include a general introduction to software engineering and software tools. The presentation uses full-fledged object-oriented concepts (including inheritance, polymorphism, genericity) right from the start, and Design by Contract throughout. Based on the “inverted curriculum” ideas on which I published a number of articles, it presents students with a library of reusable components, the Traffic library for graphical modeling of traffic in a city, and builds on this infrastructure both to teach students abstraction (reusing code through interfaces including contracts) and to provide them models of high-quality code for imitation and inspiration.

For more details see the article on this blog that introduced the book when it was first published [3].

References

[1] Bertrand Meyer, Touch of Class: An Introduction to Programming Well Using Objects and Contracts, Springer Verlag, 2nd printing, 2013. The Amazon page is here. See the book’s own page (with slides and other teaching materials, sample chapter etc.) here. (Also available in Russian, see here.)

[2] Einführung in die Programmierung (Introduction to Programming) course, English course page here.

[3] Touch of Class published, article on this blog, 11 August 2009, see [1] here.

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 7 votes)

The invariants of key algorithms (new paper)

 

I have mentioned this paper before but as a draft. It has now been accepted by ACM’s Computing Surveys and is scheduled to appear in September 2014; the current text, revised from the previous version, is available [1].

Here is the abstract:

Software verification has emerged as a key concern for ensuring the continued progress of information technology. Full verification generally requires, as a crucial step, equipping each loop with a “loop invariant”. Beyond their role in verification, loop invariants help program understanding by providing fundamental insights into the nature of algorithms. In practice, finding sound and useful invariants remains a challenge. Fortunately, many invariants seem intuitively to exhibit a common flavor. Understanding these fundamental invariant patterns could therefore provide help for understanding and verifying a large variety of programs.

We performed a systematic identification, validation, and classification of loop invariants over a range of fundamental algorithms from diverse areas of computer science. This article analyzes the patterns, as uncovered in this study,governing how invariants are derived from postconditions;it proposes a taxonomy of invariants according to these patterns, and presents its application to the algorithms reviewed. The discussion also shows the need for high-level specifications based on “domain theory”. It describes how the invariants and the corresponding algorithms have been mechanically verified using an automated program prover; the proof source files are available. The contributions also include suggestions for invariant inference and for model-based specification.

Reference

[1] Carlo Furia, Bertrand Meyer and Sergey Velder: Loop invariants: Analysis, Classification and Examples, in ACM Computing Surveys, to appear in September 2014, preliminary text available here.

VN:F [1.9.10_1130]
Rating: 7.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

Specify less to prove more

Software verification is progressing slowly but surely. Much of that progress is incremental: making the fundamental results applicable to real programs as they are built every day by programmers working in standard circumstances. A key condition is to minimize the amount of annotations that they have to provide.

The article mentioned in my previous post, “Program Checking With Less Hassle” [1], to be presented at VSTTE in San Francisco on Friday by its lead author, Julian Tschannen, introduces several interesting contributions in this direction. One of the surprising conclusions is that sometimes it pays to specify less. That goes against intuition: usually, the more specification information (correctness annotations) you provide the more you help the prover. But in fact partial specifications can hurt rather than help. Consider for example a swap routine with a partial specification, which actually stands in the way of a proof. If modularity is not a concern, for example if the routine is part of the code being verified rather than of a library, it may be more effective to ignore the specification and use the routine’s implementation. This is particularly appropriate for smallhelper routines such as the swap example.

This inlining technique is applicable in other cases, for example to make up for a missing precondition: assume that a helper routine will only work for x > 0 but does not state that precondition, or maybe states only the weaker one x ≥ 0 ; in the code, however, it is only called with positive arguments. If we try to verify the code modularly we will fail, as indeed we should since the routine is incorrect as a general-purpose primitive. But within the context of the code there is nothing wrong with it. Forgetting the contract of the routine if any, and instead using its actual implementation, we may be able to show that everything is fine.

Another component of the approach is to fill in preconditions that programmers have omitted because they are somehow obvious to them. For example it is tempting and common to write just a [1] > 0 rather than a /= Void and then a [1] > 0 for a detachable array a. The tool takes care of  interpreting the simpler precondition as the more complete one.

The resulting “two-step verification”, integrated into the AutoProof verification tool for Eiffel, should turn out to be an important simplification towards the goal of “Verification As a Matter Of Course” [2].

References

[1] Julian Tschannen, Carlo A. Furia, Martin Nordio and Bertrand Meyer: Program Checking With Less Hassle, in VSTTE 2013, Springer LNCS, to appear, draft available here; presentation on May 17 in the 15:30-16:30  session.

[2] Verification As a Matter Of Course, article in this blog, 29 March 2010, see here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Presentations at ICSE and VSTTE

 

The following presentations from our ETH group in the ICSE week (International Conference on Software Engineering, San Francisco) address important issues of software specification and verification, describing new techniques that we have recently developed as part of our work building EVE, the Eiffel Verification Environment. One is at ICSE proper and the other at VSTTE (Verified Software: Tools, Theories, Experiments). If you are around please attend them.

Julian Tschannen will present Program Checking With Less Hassle, written with Carlo A. Furia, Martin Nordio and me, at VSTTE on May 17 in the 15:30-16:30 session (see here in the VSTTE program. The draft is available here. I will write a blog article about this work in the coming days.

Nadia Polikarpova will present What Good Are Strong Specifications?, written with , Carlo A. Furia, Yu Pei, Yi Wei and me at ICSE on May 22 in the 13:30-15:30 session (see here in the ICSE program). The draft is available here. I wrote about this paper in an earlier post: see here. It describes the systematic application of theory-based modeling to the full specification and verification of advanced software.

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

Multirequirements (new paper)

 

As part of a Festschrift volume for Martin Glinz of the university of Zurich I wrote a paper [1] describing a general approach to requirements that I have been practicing and developing for a while, and presented in a couple of talks. The basic idea is to rely on object-oriented techniques, including contracts for the semantics, and to weave several levels of discourse: natural-language, formal and graphical.

Reference

[1] Bertrand Meyer: Multirequirements, to appear in Martin Glinz Festschrift, eds. Anne Koziolek and Norbert Scheyff, 2013, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Negative variables and the essence of object-oriented programming (new paper)

In modeling object-oriented programs, for purposes of verification (proofs) or merely for a better understanding, we are faced with the unique “general relativity” property of OO programming: all the operations you write (excluding non-OO mechanisms such as static functions) are expressed relative to a “current object” which changes repeatedly during execution. More precisely at the start of a call x.r (…) and for the duration of that call the current object changes to whatever x denotes — but to determine that object we must again interpret x in the context of the previous current object. This raises a challenge for reasoning about programs; for example in a routine the notation f.some_reference, if f is a formal argument, refers to objects in the context of the calling object, and we cannot apply standard rules of substitution as in the non-OO style of handling calls.

In earlier work [1, 2] initially motivated by the development of the Alias Calculus, I introduced a notion of negative variable to deal with this issue. During the execution of a call x.r (…) the negation of x , written x’, represents a back pointer to the calling object; negative variables are characterized by axiomatic properties such as x.x’= Current and x’.(old x)= Current. Alexander Kogtenkov has implemented these ideas and refined them.

Negative variable as back pointer

In a recent paper under submission [3], we review the concepts and applications of negative variables.

References

[1] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, 2011, available here.

[2] Bertrand Meyer: Towards a Calculus of Object Programs, in Patterns, Programming and Everything, Judith Bishop Festschrift, eds. Karin Breitman and Nigel Horspool, Springer-Verlag, 2012, pages 91-128, available here.

[3] Bertrand Meyer and Alexander Kogtenkov: Negative Variables and the Essence of Object-Oriented Programming, submitted for publication, 2012. [Updated 13 January 2014: I have removed the link to the draft mentioned in this post since it is now superseded by the new version, soon to be published, and available here.]

VN:F [1.9.10_1130]
Rating: 9.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

A fundamental duality of software engineering

A couple of weeks ago I proposed a small quiz. (I also stated that the answer would come “on Wednesday” — please understand any such promise as “whenever I find the time”. Sorry.) Here is the answer.

The quiz was:

I have a function:

  • For 0 it yields 0.
  • For 1 it yields 1.
  • For 2 it yields 4.
  • For 3 it yields 9.
  • For 4 it yields 16.

What is the value for 5?

Shortly thereafter I added a hint: the value for 5 is 25, and changed the question to: “What is the value for 6?”. For good measure we can also ask about the value for 1000. Now compare your answer to  what follows.

A good answer for the value at 6 is: 34 . The function in this case is -10 + 5 x + |2 x – 3| + |2 x -7|. It matches the values for the given inputs.

Linear, small values

 

 

 

 

 

 

 

 

 

The value for 1000 is 8980:

Linear function, full range

 

 

 

 

 

 

 

 

 

Another good answer at position 6 is 35.6. It comes up if we assume the function is over reals rather than integers; then a possible formula, which correlates very well (R-square of 0.9997) with the values at the given inputs, is:

869.42645566111 (1 – 0.4325853145802 e-.0467615868913719  (x – 17.7342512233011))2.3116827277657443

Exponential function, initial range

 

 

 

 

 

 

 

 

 

 

with a quite different asymptotic behavior, giving the value 869.4 at position 1000:

Exponential, full range

 

 

 

 

 

 

 

 

 

 

Some readers might have thought of another possibility, the square function x2, which again matches all the given values:

Square function, initial range

 

 

 

 

 

 

 

 

 

 

So which of these answers is right? Each is as good as the others, and as bad. There is in particular no reason to believe that the values given in the quiz’s statement suggest the square function. Any function that fits the given values, exactly (if we stick to integers) or approximately (with reals as simulated on a computer) is an equally worthy candidate. Six inputs, or six thousand, do not resolve the question. At best they are hints.

This difference between a hint and a solution is at the core of software engineering. It is, for example, the difference between a test and a specification. A test tells us that the program works for some values; as Dijkstra famously pointed out, and anyone who has developed a serious program has experienced, it does not tell us that it will work for others. The more successful tests, the more hints; but they are still only hints. I have always wondered whether Dijkstra was explicitly thinking of the Popperian notion of falsifiability: no number of experiments will prove a physical theory (although a careful experiment may boost the confidence in the theory, especially if competing theories fail to explain it, as the famous Eddington expedition did for relativity in 1919 [1]); but a single experiment can disprove a theory. Similarly, being told that our function’s value at 6 is 34 disqualifies the square function and the last one (the exponential), but does not guarantee that the first function (the linear combination) is the solution.

The specification-testing duality is the extension to computer science of the basic duality of logic. It starts with the elementary boolean operators: to prove a or b it suffices to establish a or to establish b; and to disprove a and b it suffices to show that a does not hold or to show that b does not hold. The other way round, to disprove a or b we have to show that a does not hold and to show that b does not hold; to prove that a and b holds, we have to show that a holds and to show that b holds.

Predicate calculus generalizes or to , “there exists”, and and to , “for all”. To prove ∃ x | p (x) (there is an x of which p holds) it suffices to find one value a such that p (a); let’s be pretentious and say we have “skolemized” x. To disprove∀ x | p (x) (p holds of all x) it suffices to find one value for which p does not hold.

In software engineering the corresponding duality is between proofs and tests, or (equivalently) specifications and use cases. A specification is like a “for all”: it tells us what must happen for all envisioned inputs. A test is like a “there exists”: it tells us what happens for a particular input and hence, as in predicate calculus, it is interesting as a disproof mechanism:

  • A successful test brings little information (like learning the value for 5 when trying to figure out what a function is, or finding one true value in trying to prove a or a false value in trying to prove a ).
  • An unsuccessful test brings us decisive information (like a false value for a ): the program is definitely not correct. It skolemizes incorrectness.

A proof, for its part, brings the discussion to an end when it is successful. In practice, testing may still be useful in this case, but only testing that addresses issues not covered by the proof:

  • Correctness of the compiler and platform, if not themselves proved correct.
  • Correctness the proof tools themselves, since most practical proofs require software support.
  • Aspects not covered by the specification such as, typically, performance and usability.

But for the properties it does cover the proof is final.

It is as foolish, then, to use tests in lieu of specifications as it would be to ignore the limitations of a proof. Agile approaches have caused much confusion here; as often happens in the agile literature [2], the powerful insight is mixed up with harmful advice. The insight, which has significantly improved the practice of software development, is that the regression test suite is a key asset of a project and that tests should be run throughout. The bad advice is to ditch upfront requirements and specifications in favor of tests. The property that tests lack and specifications possess is generality. A test is an instance; a thousand tests can never be more than a thousand instances. As I pointed out in a short note in EiffelWorld (the precursor to this blog) a few years ago [3], the relationship is not symmetric: one can generate tests from a specification, but not the other way around.

The same relationship holds between use cases and requirements. It is stunning to see how many people think that use cases (scenarios) are a form of requirements. As requirements they are as useless as one or ten values are to defining a function. Use cases are a way to complement the requirements by describing the system’s behavior in selected important cases. A kind of reality check, to ensure that whatever abstract aims have been defined for the system it still covers the cases known to be of immediate interest. But to rely on use cases as requirements means that you will get a system that will satisfy the use cases — and possibly little else.

When I use systems designed in recent years, in particular Web-based systems, I often find myself in a stranglehold: I am stuck with the cases that the specifiers thought of. Maybe it’s me, but my needs tend, somehow, to fall outside of these cases. Actually it is not just me. Not long ago, I was sitting close to a small-business owner who was trying to find her way through an insurance site. Clearly the site had a planned execution path for employees, and another for administrators. Problem: she was both an employee and the administrator. I do not know how the session ended, but it was a clear case of misdesign: a system built in terms of standard scenarios. Good specification performs an extra step of abstraction (for example using object-oriented techniques and contracts, but this is for another article). Skipping this step means forsaking the principal responsibility of the requirements phase: to generalize from an analysis of the behavior in known cases to a definition of the desired behaviors in all relevant cases.

Once more, as everywhere else in computer science [4], abstraction is the key to solid results that stand the test of time. Definitely better than judging a book by its cover, inferring a function by its first few values, verifying a program by its tests, or specifying a system by its use cases.

References

[1] See e.g. a blog article: Einstein and Eddington, here.

[2] Bertrand Meyer: Agile! The Good, the Hype and the Ugly, 2013, to appear.

[3] Bertrand Meyer: Test or spec? Test and spec? Test from spec!, EiffelWorld column, 2004 available here.

[4] Jeff Kramer: Is Abstraction the Key to Computer Science?, in Communications of the ACM, vol. 50, no. 4, April 2007, pages 36-42,  available from CiteSeer here

VN:F [1.9.10_1130]
Rating: 9.5/10 (32 votes cast)
VN:F [1.9.10_1130]
Rating: +13 (from 13 votes)

Precedent

Alexander Kogtenkov pointed out to me that precursor work to my papers on the Alias Calculus [1] [2] had been published by John Whaley and Martin Rinard [3]. There are some significant differences; in particular my rules are simpler, and their work is not explicitly presented as a calculus. But many of the basic ideas are the same. The reason I did not cite that paper is simply that I was not aware of it; I am happy to correct the omission.

References

[1] Bertrand Meyer: Towards a Theory and Calculus of Aliasing, in Journal of Object Technology, vol. 9, no. 2, March-April 2010, pages 37-74, available here (superseded by [2])
[2] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, 2011, available here (revised and improved version of [1].)
[3] John Whaley and Martin Rinard: Compositional Pointer and Escape Analysis for Java Programs, in POPL 1999, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Domain Theory: precedents

Both Gary Leavens and Jim Horning commented (partly here, partly on Facebook) about my Domain Theory article [1] to mention that Larch had mechanisms for domain modeling and specification reuse. As Horning writes:

The Larch Shared Language was really all about creating reusable domain theories, including theorems about the domains.  See, for example [2] and [3].

I am honored that they found the time to write about the article and happy to acknowledge Larch, one of the most extensive efforts, over several decades, to provide serious notations and tools for specification. Leavens’s and Horning’s messages gave me the opportunity to re-read some Larch papers and discover a couple I did not know.

My article did not try to provide exhaustive references; if it had, Larch would have been among them. I would probably have cited my own paper on M [4], earlier than [3], which introduces a notation for composing specifications; see section 1.4 (“Features of the M method and the associated notation have thus been devised to allow for modular descriptions of systems. A system description may include an interface paragraph that describes the connection of the current specification with others, existing or yet to be written”) and the  presentation of these mechanisms in section 5.

Larch traits, described in [3], pursue a similar aim, but the earlier article cited by Horning [2] is a general, informal discussion of formal specification; it does not mention traits, and in fact does not cite Larch, stating instead “We have experimented with the use of two very different tools, PIE and Affirm, in constructing modest sized algebraic specifications”. Its general observations about the specification task remain useful today, and it does mention reuse in passing.

If we were to look for precedents, the basic source would have to be the Clear specification language of Goguen and Burstall, for which the citations [5, 6, 7] all appear in my M paper [4] and go back further: 1977-1981. Clear made a convincing case for modularizing specifications, and defined supporting language constructs.

Since these early publications, many people have come to realize that reuse and composition can be as useful on the specification side as they are for programming. Typical specification and verification techniques, however, do not take advantage of this idea and tend to make us restart every time from the lowest level. Domain Theory, as outlined in [1], is intended to bring abstraction, which has proved so beneficial in other parts of software engineering, to the world of specification.

References

[1] Domain Theory: The Forgotten step in program verification, an article in this blog, see here.

[2] John V. Guttag, James J. Horning, Jeannette M. Wing: Some Notes on Putting Formal Specifications to Productive Use, in Science of Computer Programming, vol. 2, no. 1, 1982, pages 53-68. (BM note: I found a copy here.)

[3] John V. Guttag, James J. Horning: A Larch Shared Language Handbook, in Science of Computer Programming, vol. 6, no. 2, 1986, pages 135-157. (BM note: I found a copy here, which also has a link to the Larch report.)

[4] Bertrand Meyer: M: A System Description Method, Technical Report TR CS 85-15, University of California, Santa Barbara, 1985, available here.

[5] Rod M. Burstall and Joe A. Goguen: Putting Theories Together to Make Specifications, in Proceedings of 5th International Joint Conference on Artificial Intelligence, Cambridge (Mass.), 1977, pages 1045- 1058.

[6] Rod M. Burstall and Joe A. Goguen: “The Semantics of Clear, a Specification Language,” in Proceedings of Advanced Course on Abstract Software Specifications, Copenhagen, Lecture Notes on Computer Science 86, Copenhagen, Springer-Verlag, 1980, pages 292-332, available here.

[7] Rod M. Burstall and Joe A. Goguen: An Informal Introduction to Specifications using Clear, in The Correctness Problem in Computer Science, eds R. S. Boyer and JJ. S. Moore, Springer-Verlag, 1981, pages 185-213.

VN:F [1.9.10_1130]
Rating: 8.8/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

Aliasing and framing: Saint Petersburg seminar next week

In  last Thursday’s session of the seminar, Kokichi Futatsugi’s talk took longer than planned (and it would have been a pity to stop him), so I postponed my own talk on Automatic inference of frame conditions through the alias calculus to next week (Thursday local date). As usual it will be broadcast live.

Seminar page: here, including the link to follow the webcast.

Time and date: 5 April 2012, 18 Saint Petersburg time; you can see the local time at your location here.

Abstract:

Frame specifications, the description of what does not change in a routine call, are one of the most annoying components of verification, in particular for object-oriented software. Ideally frame conditions should be inferred automatically. I will present how the alias calculus, described in recent papers, can address this need.

There may be a second talk, on hybrid systems, by Sergey Velder.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

New LASER proceedings

Springer has just published in the tutorial sub-series of Lecture Notes in Computer Science a new proceedings volume for the LASER summer school [1]. The five chapters are notes from the 2008, 2009 and 2010 schools (a previous volume [2] covered earlier schools). The themes range over search-based software engineering (Mark Harman and colleagues), replication of software engineering experiments (Natalia Juristo and Omar Gómez), integration of testing and formal analysis (Mauro Pezzè and colleagues), and, in two papers by our ETH group, Is branch coverage a good measure of testing effectiveness (with Yi Wei and Manuel Oriol — answer: not really!) and a formal reference for SCOOP (with Benjamin Morandi and Sebastian Nanz).

The idea of these LASER tutorial books — which are now a tradition, with the volume from the 2011 school currently in preparation — is to collect material from the presentations at the summer school, prepared by the lecturers themselves, sometimes in collaboration with some of the participants. Reading them is not quite as fun as attending the school, but it gives an idea.

The 2012 school is in full preparation, on the theme of “Advanced Languages for Software Engineering” and with once again an exceptional roster of speakers, or should I say an exceptional roster of exceptional speakers: Guido van Rossum (Python), Ivar Jacobson (from UML to Semat), Simon Peyton-Jones (Haskell), Roberto Ierusalimschy (Lua), Martin Odersky (Scala), Andrei Alexandrescu (C++ and D),Erik Meijer (C# and LINQ), plus me on the design and evolution of Eiffel.

The preparation of LASER 2012 is under way, with registration now open [3]; the school will take place from Sept. 2 to Sept. 8 and, like its predecessors, in the wonderful setting on the island of Elba, off the coast of Tuscany, with a very dense technical program but time for enjoying the beach, the amenities of a 4-star hotel and the many treasures of the island. On the other hand not everyone likes Italy, the sun, the Mediterranean etc.; that’s fine too, you can wait for the 2013 proceedings.

References

[1] Bertrand Meyer and Martin Nordio (eds): Empirical Software Engineering and Verification, International Summer Schools LASER 2008-2010, Elba Island, Italy, Revised Tutorial Lectures, Springer Verlag, Lecture Notes in Computer Science 7007, Springer-Verlag, 2012, see here.

[2] Peter Müller (ed.): Advanced Lectures on Software Engineering, LASER Summer School 2007-2008, Springer Verlag, Lecture Notes in Computer Science 7007, Springer-Verlag, 2012, see here.

[3] LASER summer school information and registration form, http://se.ethz.ch/laser.

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

ERC Advanced Investigator Grant: Concurrency Made Easy

In April we will be starting the  “Concurrency Made Easy” research project, the result of a just announced Advanced Investigator Grant from the European Research Council. Such ERC grants are awarded to a specific person, rather than a consortium of research organizations as in the usual EU funding scheme. The usual amount, which applies in my case, is 2.5 million euros (currently almost 3 .3 million dollars) over five years, on a specific theme. According to the ERC’s own description [1],

ERC Advanced Grants allow exceptional established research leaders of any nationality and any age to pursue ground-breaking, high-risk projects that open new directions in their respective research fields or other domains.

This is the most sought-after research funding instrument of the EU, with a success rate of about 12% [2], out of a group already preselected by the host institutions. What makes ERC Advanced Investigator Grants so coveted is the flexibility of the scheme (no constraints on the topic, light administrative baggage) and the trust that an award implies in a particular researcher and his ability to carry out advanced research.

The name of the CME project clearly signals its ambition: to turn concurrent programming into a normal, unheroic part of programming. Today adding concurrency to a program, usually in the form of multithreading, is very hard, complexity and risk of all kinds. Everyone is telling us that we must rethink programming, retrain programmers and revamp curricula to put the specific reasoning modes of concurrent programming at the center. I don’t think this can work; thinking concurrently is just too hard to become the default mode. Instead, we should adapt programming languages, theories and tools so that programmers can continue to apply the reasoning schemes that have proved so successful in classical programming, especially object-oriented programming with the benefit of Design by Contract.

The starting point is the SCOOP model, to which I started an introduction in an earlier article of this blog [3], with a sequel yet to come. SCOOP is a minimal extension to the O-O framework to support concurrency, yielding very simple (the S in the acronym) solutions to concurrent programming problems. As part of the CME project we plan to develop it in many different directions and establish a sound and effective formal basis.

I have put the project description — the scientific part of the actual proposal text accepted by the ERC — online [4].

In the next few weeks I will be publishing here specific announcements for the positions we are seeking to fill very quickly; they include postdocs, PhD students, and one research engineer. We are looking for candidates with excellent knowledge and practice of concurrency, Eiffel, formal techniques etc. The formal application procedure will be Web-based and is not in place yet but you can contact me if you fit the profile and are interested.

We can defeat the curse: concurrent programming (an obligatory condition of any path towards a successful future for information technology) does not have to be black magic. It can be made simple and efficient. Such is the challenge of the CME project.

References

[1] European Research Council: Advanced Grants, available here.

[2] European Research Council: Press release on 2011 Advanced Investigator Grants, 24 January 2012, available here.

[3] Concurrent Programming is Easy, article from this blog, available here.

[4] CME Advanced Investigator Grant project description, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

TOOLS 2012, “The Triumph of Objects”, Prague in May: Call for Workshops

Workshop proposals are invited for TOOLS 2012, The Triumph of Objectstools.ethz.ch, to be held in Prague May 28 to June 1. TOOLS is a federated set of conferences:

  • TOOLS EUROPE 2012: 50th International Conference on Objects, Models, Components, Patterns.
  • ICMT 2012: 5th International Conference on Model Transformation.
  • Software Composition 2012: 10th International Conference.
  • TAP 2012: 6th International Conference on Tests And Proofs.
  • MSEPT 2012: International Conference on Multicore Software Engineering, Performance, and Tools.

Workshops, which are normally one- or two-day long, provide organizers and participants with an opportunity to exchange opinions, advance ideas, and discuss preliminary results on current topics. The focus can be on in-depth research topics related to the themes of the TOOLS conferences, on best practices, on applications and industrial issues, or on some combination of these.

SUBMISSION GUIDELINES

Submission proposal implies the organizers’ commitment to organize and lead the workshop personally if it is accepted. The proposal should include:

  •  Workshop title.
  • Names and short bio of organizers .
  • Proposed duration.
  •  Summary of the topics, goals and contents (guideline: 500 words).
  •  Brief description of the audience and community to which the workshop is targeted.
  • Plans for publication if any.
  • Tentative Call for Papers.

Acceptance criteria are:

  • Organizers’ track record and ability to lead a successful workshop.
  •  Potential to advance the state of the art.
  • Relevance of topics and contents to the topics of the TOOLS federated conferences.
  •  Timeliness and interest to a sufficiently large community.

Please send the proposals to me (Bertrand.Meyer AT inf.ethz.ch), with a Subject header including the words “TOOLS WORKSHOP“. Feel free to contact me if you have any question.

DATES

  •  Workshop proposal submission deadline: 17 February 2012.
  • Notification of acceptance or rejection: as promptly as possible and no later than February 24.
  • Workshops: 28 May to 1 June 2012.

 

VN:F [1.9.10_1130]
Rating: 7.3/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

The story of our field, in a few short words

 

(With all dues to [1], but going up from four to five as it is good to be brief yet not curt.)

At the start there was Alan. He was the best of all: built the right math model (years ahead of the real thing in any shape, color or form); was able to prove that no one among us can know for sure if his or her loops — or their code as a whole — will ever stop; got to crack the Nazis’ codes; and in so doing kind of saved the world. Once the war was over he got to build his own CPUs, among the very first two or three of any sort. But after the Brits had used him, they hated him, let him down, broke him (for the sole crime that he was too gay for the time or at least for their taste), and soon he died.

There was Ed. Once upon a time he was Dutch, but one day he got on a plane and — voilà! — the next day he was a Texan. Yet he never got the twang. The first topic that had put him on  the map was the graph (how to find a path, as short as can be, from a start to a sink); he also wrote an Algol tool (the first I think to deal with all of Algol 60), and built an OS made of many a layer, which he named THE in honor of his alma mater [2]. He soon got known for his harsh views, spoke of the GOTO and its users in terms akin to libel, and wrote words, not at all kind, about BASIC and PL/I. All this he aired in the form of his famed “EWD”s, notes that he would xerox and send by post along the globe (there was no Web, no Net and no Email back then) to pals and foes alike. He could be kind, but often he stung. In work whose value will last more, he said that all we must care about is to prove our stuff right; or (to be more close to his own words) to build it so that it is sure to be right, and keep it so from start to end, the proof and the code going hand in hand. One of the keys, for him, was to use as a basis for ifs and loops the idea of a “guard”, which does imply that the very same code can in one case print a value A and in some other case print a value B, under the watch of an angel or a demon; but he said this does not have to be a cause for worry.

At about that time there was Wirth, whom some call Nick, and Hoare, whom all call Tony. (“Tony” is short for a list of no less than three long first names, which makes for a good quiz at a party of nerds — can you cite them all from rote?) Nick had a nice coda to Algol, which he named “W”; what came after Algol W was also much noted, but the onset of Unix and hence of C cast some shade over its later life. Tony too did much to help the field grow. Early on, he had shown a good way to sort an array real quick. Later he wrote that for every type of unit there must be an axiom or a rule, which gives it an exact sense and lets you know for sure what will hold after every run of your code. His fame also comes from work (based in part on Ed’s idea of the guard, noted above) on the topic of more than one run at once, a field that is very hot today as the law of Moore nears its end and every maker of chips has moved to  a mode where each wafer holds more than one — and often many — cores.

Dave (from the US, but then at work under the clime of the North) must not be left out of this list. In a paper pair, both from the same year and both much cited ever since,  he told the world that what we say about a piece of code must only be a part, often a very small part, of what we could say if we cared about every trait and every quirk. In other words, we must draw a clear line: on one side, what the rest of the code must know of that one piece; on the other, what it may avoid to know of it, and even not care about. Dave also spent much time to argue that our specs must not rely so much on logic, and more on a form of table.  In a later paper, short and sweet, he told us that it may not be so bad that you do not apply full rigor when you chart your road to code, as long as you can “fake” such rigor (his own word) after the fact.

Of UML, MDA and other such TLAs, the less be said, the more happy we all fare.

A big step came from the cold: not just one Norse but two, Ole-J (Dahl) and Kris, came up with the idea of the class; not just that, but all that makes the basis of what today we call “O-O”. For a long time few would heed their view, but then came Alan (Kay), Adele and their gang at PARC, who tied it all to the mouse and icons and menus and all the other cool stuff that makes up a good GUI. It still took a while, and a lot of hit and miss, but in the end O-O came to rule the world.

As to the math basis, it came in part from MIT — think Barb and John — and the idea, known as the ADT (not all TLAs are bad!), that a data type must be known at a high level, not from the nuts and bolts.

There also is a guy with a long first name (he hates it when they call him Bert) but a short last name. I feel a great urge to tell you all that he did, all that he does and all that he will do, but much of it uses long words that would seem hard to fit here; and he is, in any case, far too shy.

It is not all about code and we must not fail to note Barry (Boehm), Watts, Vic and all those to whom we owe that the human side (dear to Tom and Tim) also came to light. Barry has a great model that lets you find out, while it is not yet too late, how much your tasks will cost; its name fails me right now, but I think it is all in upper case.  At some point the agile guys — Kent (Beck) and so on — came in and said we had got it all wrong: we must work in pairs, set our goals to no more than a week away, stand up for a while at the start of each day (a feat known by the cool name of Scrum), and dump specs in favor of tests. Some of this, to be fair, is very much like what comes out of the less noble part of the male of the cow; but in truth not all of it is bad, and we must not yield to the urge to throw away the baby along with the water of the bath.

I could go on (and on, and on); who knows, I might even come back at some point and add to this. On the other hand I take it that by now you got the idea, and even on this last day of the week I have other work to do, so ciao.

Notes

[1] Al’s Famed Model Of the World, In Words Of Four Signs Or Fewer (not quite the exact title, but very close): find it on line here.

[2] If not quite his alma mater in the exact sense of the term, at least the place where he had a post at the time. (If we can trust this entry, his true alma mater would have been Leyde, but he did not stay long.)

VN:F [1.9.10_1130]
Rating: 10.0/10 (14 votes cast)
VN:F [1.9.10_1130]
Rating: +11 (from 11 votes)

The charming naïveté of an IEEE standard

The IEEE Standard for Requirements Specifications [1], a short and readable text providing concrete and useful advice, is a valuable guide for anyone writing requirements. In our course projects we always require students to follow its recommended structure.

Re-reading it recently, I noticed the following extract  in the section that argues that a  requirements specification should be verifiable (sentence labels in brackets are my addition):

[A] Nonverifiable requirements include statements such as “works well,” “good human interface,” and “shall usually happen.” [B] These requirements cannot be verified because it is impossible to define the terms “good,” “well,” or “usually.”

[C] The statement that “the program shall never enter an infinite loop” is nonverifiable because the testing of this quality is theoretically impossible.

[D] An example of a verifiable statement is
      [E] “Output of the program shall be produced within 20 s of event 60% of the time; and shall be produced within 30 s of event 100% of the time.”
[F] This statement can be verified because it uses concrete terms and measurable quantities.

[A] and [B] are good advice, deserving to be repeated in every software engineering course and to anyone writing requirements. [C], however, is puzzling.

One might initially understand that the authors are telling us that it is impossible to devise a finite set of tests guaranteeing that a program terminates. But on closer examination this cannot be what they mean. Such a statement, although correct, would be uninteresting since it can be applied to any functional requirement: if I say “the program shall accept an integer as input and print out that same integer on the output”, I also cannot test that (trivial) requirement finitely since I would have to try all integers. The same observation applies to the example given in [D, E, F]: the property [D] they laud as an example of a  “verifiable” requirement is just as impossible to test exhaustively [2].

Since the literal interpretation of [C] is trivial and applies to essentially all possible requirements, whether bad or good in the authors’ eyes, they must mean something else when they cite loop termination as their example of a nonverifiable requirement. The word “theoretically” suggests what they have in mind: the undecidability results of computation theory, specifically the undecidability of the Halting Problem. It is well known that no general mechanism exists to determine whether an arbitrary program, or even just an arbitrary loop, will terminate. This must be what they are referring to.

Except, of course, that they are wrong. And a very good thing too that they are wrong, since “The program shall never enter an infinite loop” is a pretty reasonable requirement for any system [3].

If we were to accept [C], we would also accept that it is OK for any program to enter an infinite loop every once in a while, because the authors of its requirements were not permitted to specify otherwise! Fortunately for users of software systems, this particular sentence of the standard is balderdash.

What the halting property states, of course, is that no general mechanism exists that could examine an arbitrary program or loop and tell us whether it will always terminate. This result in no way excludes the possibility of verifying (although not through “testing”) that a particular program or loop will terminate. If the text of a program shows that it will print “Hello World” and do nothing else, we can safely determine that it will terminate. If a loop is of the form

from i := 1 until i > 10 loop
…..print (i)
…..i := i + 1
end

there is also no doubt about its termination. More complex examples require the techniques of modern program verification, such as exhibiting a loop variant in the sense of Hoare logic, but they can still be practically tractable.

Like many fundamental results of modern science (think of Heisenberg’s uncertainty principle), Turing’s demonstration of the undecidability of the Halting Problem is at the same time simple to state, striking, deep, and easy to misunderstand. It is touchingly refreshing to find such a misunderstanding in an IEEE standard.

Do not let it discourage you from applying the excellent advice of the rest of IEEE 830-1998, ; but when you write a program, do make sure — whether or not the requirements specify this property explicitly — that all its loops terminate.

Reference and notes

[1] IEEE Computer Society: IEEE Recommended Practice for Software Requirements SpeciÞcations, IEEE Standard 830-1998, revised 1998; available here (with subscription).

[2] The property [E] is actually more difficult to test, even non-exhaustively, than the authors acknowledge, if only because it is a probabilistic requirement, which can only be tested after one has defined appropriate probabilistic hypotheses.

[3] In requesting that all programs must terminate we must of course take note of the special case of systems that are non-terminating by design, such as most embedded systems. Such systems, however, are still made out of components representing individual steps that must terminate. The operating system on your smartphone may need to run forever (or until the next reboot), but the processing of an incoming text message is still, like a traditional program, required to terminate in finite time.

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (11 votes cast)
VN:F [1.9.10_1130]
Rating: +6 (from 6 votes)

Specification explosion

To verify software, we must specify it; otherwise there is nothing to verify against. People often cite the burden of specification as the major obstacle toward making verification practical. At issue are not only the effort required to express the goals of software elements (their contracts) but also intermediate assertions, or “verification conditions”, including loop invariants, required by the machinery of verification.

At a Microsoft Software Verification summer school [1] in Moscow on July 18 — the reason why there was no article on this blog last week — Stefan Tobies, one of the lecturers, made the following observation about the specification effort needed to produce fully verified software. In his experience, he said, the ratio of specification lines to program lines is three to one.

Such a specification explosion, to coin a phrase, has to be addressed by any practical approach to verification. It would be interesting to get estimates from others with verification experience.

Reducing specification explosion  is crucial to the Eiffel effort to provide “Verification As a Matter Of Course” [2]. The following three techniques should go a long way:

  • Loop invariant inference. Programmers can be expected to write contracts expressing the purpose of routines (preconditions, postconditions) and classes (class invariants), but often balk at writing the intermediate assertions necessary to prove the correctness of loops. An earlier article [3] mentioned some ongoing work on this problem and I hope to come back to the topic.
  • Frame conventions. As another recent article has discussed [4], a simple language convention can dramatically reduce the number of assertions by making frame conditions explicit.
  • Model-based contracts. This technique calls for a separate article; the basic idea is to express the effect of operations through high-level mathematical models relying on a library that describe such mathematical abstractions as sets, relations, functions and graphs.

The risk of specification explosion is serious enough to merit a concerted defense.

 

References

[1] Summer School in Software Engineering and Verification, details here.

[2] Verification As a Matter Of Course, slides of a March 2010 talk, see an earlier article on this blog.

[3] Contracts written by people, contracts written by machines, an earlier article on this blog.

[4] If I’m not pure, at least my functions are, an earlier article on this blog.

VN:F [1.9.10_1130]
Rating: 9.3/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)