New LASER proceedings

Springer has just published in the tutorial sub-series of Lecture Notes in Computer Science a new proceedings volume for the LASER summer school [1]. The five chapters are notes from the 2008, 2009 and 2010 schools (a previous volume [2] covered earlier schools). The themes range over search-based software engineering (Mark Harman and colleagues), replication of software engineering experiments (Natalia Juristo and Omar Gómez), integration of testing and formal analysis (Mauro Pezzè and colleagues), and, in two papers by our ETH group, Is branch coverage a good measure of testing effectiveness (with Yi Wei and Manuel Oriol — answer: not really!) and a formal reference for SCOOP (with Benjamin Morandi and Sebastian Nanz).

The idea of these LASER tutorial books — which are now a tradition, with the volume from the 2011 school currently in preparation — is to collect material from the presentations at the summer school, prepared by the lecturers themselves, sometimes in collaboration with some of the participants. Reading them is not quite as fun as attending the school, but it gives an idea.

The 2012 school is in full preparation, on the theme of “Advanced Languages for Software Engineering” and with once again an exceptional roster of speakers, or should I say an exceptional roster of exceptional speakers: Guido van Rossum (Python), Ivar Jacobson (from UML to Semat), Simon Peyton Jones (Haskell), Roberto Ierusalimschy (Lua), Martin Odersky (Scala), Andrei Alexandrescu (C++ and D), Erik Meijer (C# and LINQ), plus me on the design and evolution of Eiffel.

The preparation of LASER 2012 is under way, with registration now open [3]; the school will take place from Sept. 2 to Sept. 8 and, like its predecessors, in the wonderful setting on the island of Elba, off the coast of Tuscany, with a very dense technical program but time for enjoying the beach, the amenities of a 4-star hotel and the many treasures of the island. On the other hand not everyone likes Italy, the sun, the Mediterranean etc.; that’s fine too, you can wait for the 2013 proceedings.

References

[1] Bertrand Meyer and Martin Nordio (eds.): Empirical Software Engineering and Verification, International Summer Schools LASER 2008-2010, Elba Island, Italy, Revised Tutorial Lectures, Lecture Notes in Computer Science 7007, Springer-Verlag, 2012, see here.

[2] Peter Müller (ed.): Advanced Lectures on Software Engineering, LASER Summer School 2007-2008, Lecture Notes in Computer Science 6029, Springer-Verlag, 2010, see here.

[3] LASER summer school information and registration form, http://se.ethz.ch/laser.

Never design a language

It is a common occurrence in software development. Someone says: “We should design a language”. The usual context is that some part of the development requires a rich functionality set, and it appears appropriate to provide a flexible solution through a specialized language. As an example, in the development of an airline’s frequent flyer program on which I once worked, the suggestion came up to design a “Flyer Award Language”, with instructions appropriate for that application domain: record a trip, redeem an award, provide a statement of available miles and so on. A common term for such notations is DSL, for Domain-Specific Language.

Designing a language in such a context is almost always a bad idea (and I am not sure why I wrote “almost”). Languages are endless objects of discussion, usually on the least important aspects, which are also the most visible and those on which everyone has a strong opinion: concrete syntactic properties. People might pretend otherwise (“let’s not get bogged down on syntax, this is just one possible form”) but syntax is what the discussions will get bogged down to — keywords or symbols, this order or that order of operands, one instruction with several variants vs. several instructions… — at the expense of discussing the fundamental issues of functionality.

Worse yet, even if a language will be part of the solution, it is usually just one facet of the solution. As was already explained in detail in [1], any useful functionality set will naturally be useful through several interfaces: a textual notation with concrete syntax may be one of them, but other possible ones include an API (Abstract Program Interface) for use from other software elements, a Graphical User Interface, a web user interface, and yet another interface for web services (typically WSDL or some other XML or JSON format).

In such cases, starting with a concrete textual language is pretty silly, since it cannot yield the others directly (it would have to be parsed and further analyzed, which does not make sense). Of all the kinds of interface listed, the most fundamental one is the API: it describes the raw functionality, excluding any choice of syntax but including, thanks to contracts, elements of semantics. For example, a class AWARD in our frequent flyer application might include the feature


    redeem_for_upgrade (c: CUSTOMER; f: FLIGHT)
            -- Upgrade c to next class of service on f.
        require
            c /= holder implies holder.allowed_substitute (c)
            f.permitted_for_upgrade (Current)
            c.booked (f)
        ensure
            c.class_of_service (f) = old c.class_of_service (f) + 1

There is of course no implementation as this declaration only specifies an interface, but it says what needs to be said: to redeem the award for an upgrade, the intended customer must be either the holder of the award or an allowed substitute; the flight must be available for an upgrade with the current award (including the availability of enough miles); the intended customer must already be booked on the flight; and the upgrade will be for the next class of service.

These details are the kind of things that need to be discussed and agreed upon before the API is finalized. Then one can start discussing a textual form (a DSL), a graphical interface, a web services interface. They all consist of relatively simple layers to be superimposed on a solidly defined and precisely specified basis. Once you have that basis, you can have all the fun you like arguing over everyone’s favorite forms of concrete syntax; it cannot hurt the project any more. Having these discussions early, at the expense of the more fundamental issues, is a great danger.
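
To make the layering point concrete, here is a minimal sketch, in Python rather than Eiffel, of how thin a textual (DSL-like) layer can be once the API exists. Everything in it is invented for the illustration (the command syntax, the lookup dictionaries); the only element taken from the discussion above is the redeem_for_upgrade feature, and all the real semantics stays in the API and its contract.

    def execute_command(line, awards, customers, flights):
        # Interpret one textual command, e.g. "redeem AWARD1 CUST7 FLIGHT42",
        # by delegating to the underlying API objects.
        operation, *arguments = line.split()
        if operation == "redeem":
            award_id, customer_id, flight_id = arguments
            awards[award_id].redeem_for_upgrade(customers[customer_id], flights[flight_id])
        else:
            raise ValueError("unknown command: " + operation)

A GUI button handler or a web-service endpoint would be a similarly thin wrapper around the same underlying call.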

One of the key rules for successful software construction — as for many other ventures of course, especially in science and technology — is to distinguish the essential from the auxiliary, and consequently to devote proper attention to the essential issues while avoiding disputations over auxiliary ones. To define functionality, the API is essential; the language is auxiliary.

So when should you design a language? Never. Well, hardly ever.

Reference

[1] Bertrand Meyer: Introduction to the Theory of Programming Languages, Prentice Hall, 1990.

Ado About The Resource That Was (Not)

 

After a few weeks of use, Microsoft Outlook tends in my experience to go into a kind of thrashing mode where the user interface no longer quite functions as it should, although to the tool’s credit it does not lose information. Recently I have been getting pop-up warnings such as

 

A required resource was

 

A required resource was what? The message reminded me of an episode in a long-ago game of Scrabble, in which I proposed ADOABOUT as a word. “Ado about what?”, the other players asked, and were not placated by my answer.

The message must have been trying to say that a required resource was missing, or not found, but at the moment of producing that final detail Outlook must have run out of UI resources and hence could not summon the needed text string. Not surprising, since running out of resources is precisely what caused the message to appear, in a valiant attempt to tell the user what was going on. (Valiant but not that useful: if you are not a programmer on the Outlook development team but just a customer trying to read email, it is not absolutely obvious how the message, even with the missing part, helps you.) The irony in the example is that the title bar suggests the problem arose in connection with trying to display the “Social Connector” area, a recent Outlook feature which I have never used. (Social connector? Wasn’t the deal about getting into computer science in the first place that for the rest of your life you’d be spared the nuisance of social connections? One can no longer trust anything nowadays.)

We can sympathize with whoever wrote the code. The Case Of The Resource That Was (Not) is an example of a general programming problem which we may call Space Between Your Back And Wall, or SBYBAW: when you have your back against the wall, there is not much maneuvering space left.

A fairly difficult case of the SBYBAW problem arises in garbage collection, for example for object-oriented languages. A typical mark-and-sweep garbage collector must traverse the entire object structure to remove all the objects that have not been marked as reachable from the stack. The natural way to write a graph traversal algorithm is recursive: visit the roots; then recursively traverse their successors, flagging visited objects in some way to avoid cycling. Yes, but the implementation of a recursive routine relies on a stack of unpredictable size (the longest path length). If we got into garbage collection, it is most likely because we ran out of memory, precisely the kind of situation in which we cannot afford room for unpredictable stack growth.
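
For concreteness, here is what the natural recursive traversal looks like, as a minimal Python sketch; it assumes (purely for illustration) objects with at most two successor fields, left and right, and a marked flag.

    def mark_recursive(node):
        # Natural but stack-hungry: the recursion depth can reach the length
        # of the longest path in the object structure.
        if node is None or node.marked:
            return
        node.marked = True
        mark_recursive(node.left)
        mark_recursive(node.right)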

In one of the early Eiffel garbage collectors, someone not aware of better techniques had actually written the traversal recursively; had the mistake not been caught early enough, it would no doubt have inflicted unbearable pain on humankind. Fortunately there is a solution: the Deutsch-Schorr-Waite algorithm [1], which avoids recursion on the program side by perverting the data structure to replace some of the object links by recursion-control links; when the traversal’s execution proceeds along an edge, it reverses that edge to permit eventual return to the source. Strictly speaking, Deutsch-Schorr-Waite still requires a stack of booleans — to distinguish original edges from perverted ones — but we can avoid a separate stack (even just a stack of booleans, which can be compactly represented in a few integers) by storing these booleans in the mark field of the objects themselves. The resulting traversal algorithm is a beauty — although it is fairly tricky, presents a challenge for verification tools, and raises new difficulties in a multi-threaded environment.
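
The following Python sketch illustrates the pointer-reversal idea, again restricted to nodes with two successor fields; a small per-node state field plays the role of the booleans stored in the objects themselves. It is an illustration of the technique under these simplifying assumptions, not the code of any actual Eiffel collector.

    class Node:
        def __init__(self, left=None, right=None):
            self.left, self.right = left, right
            self.marked = False
            self.state = 0      # 0: untouched, 1: left edge reversed, 2: right edge reversed

    def mark_schorr_waite(root):
        # Marks every node reachable from root using O(1) extra space:
        # instead of a stack, it temporarily reverses the edge it came through.
        if root is None or root.marked:
            return
        previous, current = None, root
        current.marked = True
        while current is not None:
            if current.state == 0:                      # try to go down the left edge
                current.state = 1
                down = current.left
                if down is not None and not down.marked:
                    current.left = previous             # pervert the edge into a back link
                    previous, current = current, down
                    current.marked = True
            elif current.state == 1:                    # left side done; try the right edge
                current.state = 2
                down = current.right
                if down is not None and not down.marked:
                    current.right = previous
                    previous, current = current, down
                    current.marked = True
            else:                                       # both sides done; climb back up
                if previous is None:
                    current = None                      # back at the root: finished
                elif previous.state == 1:               # the left edge of 'previous' is reversed
                    up = previous.left                  # saved link to the node above
                    previous.left = current             # restore the original edge
                    current, previous = previous, up
                else:                                   # the right edge of 'previous' is reversed
                    up = previous.right
                    previous.right = current
                    current, previous = previous, up
        # (The state fields are left at 2; a real collector would reset them or
        # fold them into the mark bits.)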

Deutsch-Schorr-Waite is a good example of “Small Memory Software” as studied in a useful book of the same title [2]. The need for Small Memory Software does not just arise for embedded programs running on small devices, but also in mainstream programming whenever we face the SBYBAW issue.

The SBYBAW lesson for the programmer is tough but simple. The resources we have at our disposal on a computing system may be huge, but they are always finite, and our programs’ appetite for resources will eventually exhaust them. At that stage, we have to deal with the SBYBAW rule, which sounds like a tautology but is an encouragement to look for clever algorithms:  techniques for freeing resources when no resources remain must not request new resources.

References

[1] Deutsch-Schorr-Waite is described in Knuth and also in [2]. Someone should start a Wikipedia entry.

[2] James Noble and Charles Weir: Small Memory Software: Patterns for Systems with Limited Memory, Addison-Wesley, 2001.

A safe and stable solution

Reading about the latest hullabaloo around Android’s usage of Java, and more generally following the incessant flow of news about X suing Y in the software industry (with many combinations of X and Y) over Java and other object-oriented technologies, someone with an Eiffel perspective can only smile. Throughout its history, suggestions to use Eiffel have often been met initially with objections of proprietariness — along with “Will Eiffel still be around next year?”, a question that becomes truly riotous after 25 years — apparently because Eiffel initially came from a startup company. In contrast, many other approaches, from C++ to Smalltalk and Java, somehow managed to get favorable vibes from the media; the respective institutions, from AT&T to Xerox and Sun, must be disinterested benefactors of humanity.

Now many who believed this are experiencing a next-morning surprise, discovering under daylight that the person next to whom they wake up is covered with patents and lawsuits.

For their part, people who adopted Eiffel over the years and went on to develop project after project  do not have to stay awake worrying about legal issues and the effects of corporate takeovers; they can instead devote their time to building the best software possible with adequate methods, notations and tools.

This is a good time to recall the regulatory situation of Eiffel. First, the Eiffel Software implementation (EiffelStudio): the product can be used under either an open-source or a proprietary license. With both licenses the software is exactly the same; what differs is the status of the code users generate: with the open-source license, they are requested to make their own programs open-source; to keep their code proprietary, they need the commercial license. This is a fair and symmetric requirement. It is made even more attractive by the absence of any run-time fees or royalties of the kind typically charged by database vendors.

The open-source availability of the entire environment, over 2.5 million lines of (mostly Eiffel) code, has spurred the development of countless community contributions, with many more in progress.

Now for the general picture on the language, separate from any particular implementation. Java’s evolution has always been tightly controlled by Sun and now by its successor Oracle. There may actually be technical arguments in favor of the designers retaining a strong say in the evolution of a language, but they hardly seem to apply now that most of the Java creators have left the company. Contrast this with Eiffel, which is entirely under the control of an international standards committee at ECMA International, the oldest and arguably the most prestigious international standards body for information technology. The standard is freely available online from the ECMA site [1]. It is also an ISO standard [2].

The standardization process is the usual ECMA setup, enabling any interested party to participate. This is not just a statement of principle but the reality, to which I can personally testify: in spite of being the language’s original designer and the author of the reference book, I lost countless battles in the discussions that led to the current standard and that continue in the preparation of the next version. While I was not always pleased at the time, the committee’s collegial approach has led to a much more solid result than any single person could have achieved.

The work of ECMA TC49-TG4 (the Eiffel standard committee) has disproved the conventional view that committees can only design camels. In fact TC49-TG4 has constantly worked to keep the language simple and manageable, not hesitating to remove features deemed obsolete or problematic, while extending the range of the language and increasing the Eiffel programmer’s power of expression. As a result, Eiffel today is an immensely better language than when we started our work in 2002. Without a strong community-based process we would never, for example, have made Eiffel the first widespread language to guarantee void-safety (the compile-time removal of null-pointer-dereferencing errors), a breakthrough for software reliability.

Open, fair, free from lawsuits and commercial fights, supported by an enthusiastic community: for projects that need a modern quality-focused software framework, Eiffel is a safe and stable solution.

References

[1] ECMA International: Standard ECMA-367: Eiffel: Analysis, Design and Programming Language, 2nd edition (June 2006), available here (free download).

[2] International Organization for Standardization: ISO/IEC 25436:2006: Information technology — Eiffel: Analysis, Design and Programming Language, available here (for a fee; same text as [1], different formatting).

Specification explosion

To verify software, we must specify it; otherwise there is nothing to verify against. People often cite the burden of specification as the major obstacle toward making verification practical. At issue are not only the effort required to express the goals of software elements (their contracts) but also intermediate assertions, or “verification conditions”, including loop invariants, required by the machinery of verification.

At a Microsoft Software Verification summer school [1] in Moscow on July 18 — the reason why there was no article on this blog last week — Stefan Tobies, one of the lecturers, made the following observation about the specification effort needed to produce fully verified software. In his experience, he said, the ratio of specification lines to program lines is three to one.

Such a specification explosion, to coin a phrase, has to be addressed by any practical approach to verification. It would be interesting to get estimates from others with verification experience.

Reducing specification explosion is crucial to the Eiffel effort to provide “Verification As a Matter Of Course” [2]. The following three techniques should go a long way:

  • Loop invariant inference. Programmers can be expected to write contracts expressing the purpose of routines (preconditions, postconditions) and classes (class invariants), but often balk at writing the intermediate assertions necessary to prove the correctness of loops. An earlier article [3] mentioned some ongoing work on this problem and I hope to come back to the topic.
  • Frame conventions. As another recent article has discussed [4], a simple language convention can dramatically reduce the number of assertions by making frame conditions explicit.
  • Model-based contracts. This technique calls for a separate article; the basic idea, sketched just below, is to express the effect of operations through high-level mathematical models, relying on a library that describes such mathematical abstractions as sets, relations, functions and graphs.
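
As a minimal illustration of the model-based idea, here is a Python sketch with an invented class (the actual Eiffel technique relies on contract libraries of mathematical models, not on run-time assertions): the postcondition talks about an abstract model of the object, here a sequence, rather than about the implementation.

    class Dispenser:
        # Hypothetical example class. Its abstract model is a mathematical
        # sequence, exposed by 'model'; the contract is stated against that
        # model only, never against the internal representation.
        def __init__(self):
            self._items = []                  # internal representation

        def model(self):
            return tuple(self._items)         # the abstract model: a sequence

        def put(self, x):
            old_model = self.model()
            self._items.append(x)
            # Model-based postcondition: the new sequence is the old one extended by x.
            assert self.model() == old_model + (x,)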

The risk of specification explosion is serious enough to merit a concerted defense.

 

References

[1] Summer School in Software Engineering and Verification, details here.

[2] Verification As a Matter Of Course, slides of a March 2010 talk, see an earlier article on this blog.

[3] Contracts written by people, contracts written by machines, an earlier article on this blog.

[4] If I’m not pure, at least my functions are, an earlier article on this blog.

Towards a Calculus of Object Programs

I posted here a draft of a new article, Towards a Calculus of Object Programs.

Here is the abstract:

Verifying properties of object-oriented software requires a method for handling references in a simple and intuitive way, closely related to how O-O programmers reason about their programs. The method presented here, a Calculus of Object Programs, combines four components: compositional logic, a framework for describing program semantics and proving program properties; negative variables to address the specifics of O-O programming, in particular qualified calls; the alias calculus, which determines whether reference expressions can ever have the same value; and the calculus of object structures, a specification technique for the structures that arise during the execution of an object-oriented program.
The article illustrates the Calculus by proving the standard algorithm for reversing a linked list.

If I’m not pure, at least my functions are


If I’m not pure, at least my jewels are [1].


We often need to be reassured that a routine, usually a function, is “pure”, meaning that it does not change the state of the computation. For example, a function used in a contract element (precondition, postcondition, class invariant, loop invariant) should be purely descriptive, since it is a specification element; evaluating it, typically for testing and debugging, should not create a change of behavior — a “Heisenberg effect” — in the very computation that it is intended to assess. Another application is in a concurrency context, particularly in SCOOP (see earlier posts and forthcoming ones): if one or more functions are pure, several of their executions can run concurrently on the same object.

The notion of purity admits variants. The usual notion is what [2] calls weak purity, which precludes changes to previously existing objects but allows creating new objects. In the EiffelBase library we also encounter routines that have another form of purity, which we may call “relative” purity: they leave the state on exit the same as they found it on entry, but in between they might change it. For the rest of this discussion we will rely on the standard notion of weak purity: no changes permitted on existing objects.

It is often suggested that the programming language should support specifying that a routine is pure; many people have indeed proposed the addition of a keyword such as pure to Eiffel. One of the reasons this is not — in my opinion — such a great idea is that purity is just a special case of the more general problem of framing: specifying and verifying what a routine does not change. If we can specify an arbitrary frame property, then we can, as a special case covered by the general mechanism, specify that a routine changes nothing.

To see why framing is so important, consider a class ACCOUNT with a routine deposit that has the postcondition

    balance = old balance + sum        -- where sum is the argument of deposit

Framing here means stating that nothing other than balance changes; for example the account’s owner and its number should remain the same. It is not practical to write all the individual postcondition clauses such as

    owner = old owner
    number = old number

and so on. But we do need to specify these properties and enforce them, if only to prevent a descendant class (maybe MAFIA_ACCOUNT) from distorting the rules defined by the original.

One technique is to add a so-called “modifies clause”, introduced by verification tools such as ESC/Java [3] and JML [4]. Modifies clauses raise some theoretical issues; in particular, the list of modified expressions is often infinite, so we must restrict ourselves to an abstract-data-type view where we characterize a class by commands and queries, and the modifies clause only involves queries of the class. Many people find this hard to accept at first, since anything that is not talked about can change, but it is the right approach. A modifies clause of sorts, included in the postcondition, appeared in an earlier iteration of the Eiffel specification, with the keyword only (which is indeed preferable to modifies, or rather modify as it would be in the Eiffel style favoring grammatically simple keywords, since what we want to express is not that the routine must change anything at all but that it may only change certain specified results!). The convention worked well with inheritance, since it included the rule that a clause such as only balance, in class ACCOUNT, prescribes that the routine may not, in its original version as well as in any version redefined in descendants, change any other query known at the level of ACCOUNT; but a descendant version may change, subject to its own only clauses, any new query introduced by a descendant.

To declare a routine as pure, it would suffice to use an empty only clause (not very elegant syntactically — “only” what? — but one can get used to it).

This construct has been discarded, as it places too heavy a burden on the programmer-specifier. Here the key observation resulted from a non-scientific but pretty extensive survey I made of all the JML code I could get my hands on. I found that every time a query appeared in a modifies clause it was also listed in the postcondition! On reflection, this seems reasonable: if you are serious about specification, as anyone bothering to write such a clause surely is, you will not just express that something changes and stop there; you will write something about how it may change. Not necessarily the exact result, as in

my_query = precise_final_value

but at least some property of that result, as in

some_property (my_query)

This observation has, however, an inescapable consequence for language design: modifies or only clauses should be inferred by the compiler from the postcondition, not imposed on the programmer as an extra burden. The convention, which we may call the Implicit Framing Rule, is simple:

A routine may change the value of a query only if the query is specified in the routine’s postcondition

(or, if you like double negation, “no routine may change the value of any query other than those specified in its postcondition”). Here we say that a feature is “specified” in a postcondition if it appears there outside of the scope of an old expression. (Clearly, an occurrence such as old balance does not imply that balance can be modified, hence this restriction to occurrences outside of an old.)

With this convention the only clause becomes implicit: it would simply list all the queries specified in the postcondition, so there is no need for the programmer to write it. For the rare case of wanting to specify that a query q may change, but not wanting to specify how, it is easy to provide a library function, say involved, that always returns true and can be used in postconditions, as in involved (q).
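
As an illustration only, here is a minimal Python sketch of what the Implicit Framing Rule amounts to at run time; the class and the names in it are hypothetical, and the real mechanism described next is a static check performed by the compiler, not a set of assertions.

    class Account:
        # Hypothetical example class; QUERIES lists its public queries.
        QUERIES = ("balance", "owner", "number")

        def __init__(self, owner, number):
            self.balance, self.owner, self.number = 0, owner, number

        def deposit(self, sum):
            old = {q: getattr(self, q) for q in self.QUERIES}   # snapshot for 'old'
            specified = {"balance"}       # queries appearing in the postcondition

            self.balance = self.balance + sum                   # body

            # Postcondition: balance = old balance + sum
            assert self.balance == old["balance"] + sum
            # Implicit frame condition: every query not specified is unchanged.
            for q in set(self.QUERIES) - specified:
                assert getattr(self, q) == old[q], q + " changed but is not specified in the postcondition"

A call such as Account("Alice", 1).deposit(100) passes both checks; a rogue redefinition that also changed owner would trip the frame check.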

The convention is of course not just a matter of programming methodology: in an IDE supporting verification, such as the EVE “Verification As a Matter Of Course” environment which we are building for Eiffel [5], the compiler will enforce the definition above — no change permitted to anything not specified in the postcondition — and produce an error in case of a violation. This also means that we can easily specify that a routine is pure: it must simply not specify any query in its postcondition. It may still mention queries in old expressions, as happens often in practice, e.g.

    Result = old balance - Minimum_balance        -- in the postcondition of a function available_funds

Note the need for old here. Apart from this addition of old to some postconditions, a considerable advantage of the convention is that most existing code using pure functions will be suitable for the new purity enforcement without any need to provide new annotations.

I believe that this is the only sustainable convention. It does not, of course, solve the frame problem by itself (for attempts in this direction see [6, 7]), but it is a necessary condition for a solution that is simple, easily taught, fairly easily implemented, and effective. It goes well with model-based specifications [8, 9], which I believe are the most promising technique for usable specifications of object-oriented software. And it provides a straightforward, no-frills way to enforce purity where prescribed by the Command-Query Separation principle [10, 11]: if I’m not pure, at least my functions must be.

References

[1] From the lyrics of the aria Glitter and Be Gay in Leonard Bernstein’s Candide, text by Lillian Hellman and others. Youtube offers several performances, including  by Diana Damrau (here) and Natalie Dessay (here) . For the text see e.g. here.

[2] Adam Darvas and Peter Müller: Reasoning About Method Calls in Interface Specifications, in Journal of Object Technology, Volume 5, no. 5, June 2006, pages 59-85, doi:10.5381/jot.2006.5.5.a3, available here.

[3] C. Flanagan, K.R.M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe and R. Stata: Extended static checking for Java, in PLDI 2002 (Programming Language Design and Implementation), pages 234–245, 2002.

[4] Gary Leavens et al.: Java Modeling Language, see references here.

[5] Julian Tschannen, Carlo A. Furia, Martin Nordio, and Bertrand Meyer: Verifying Eiffel Programs with Boogie, to appear in Boogie 2011, First International Workshop on Intermediate Verification Languages, Wroclaw, August 2011. See documentation about the EVE project on the project page.

[6] Ioannis Kassios: Dynamic Frames: Support for Framing, Dependencies and Sharing Without Restrictions, in Formal Methods 2006, eds. J. Misra, T. Nipkow and E. Sekerinski, Lecture Notes in Computer Science 4085, Springer Verlag, 2006, pages 268-283.

[7] Bernd Schoeller: Making Classes Provable through Contracts, Models and Frames, PhD thesis, ETH Zurich, 2007, available here

[8] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[9] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ’10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[10] Bertrand Meyer: Object-Oriented Software Construction, first (1988) and second (1997) editions, Prentice Hall.

[11] Bertrand Meyer: Touch of Class: Learning to Program Well with Objects and Contracts, Springer-Verlag, 2009.

Agile methods: the good, the bad and the ugly

It was a bit imprudent last Monday to announce the continuation of the SCOOP discussion for this week; with the TOOLS conference happening now, with many satellite events such as the Eiffel Design Feast of the past week-end and today’s “New Eiffel Technology Community” workshop, there is not enough time for a full article. Next week might also be problematic. The SCOOP series will resume, but in the meantime I will report on other matters.

As something that can be conveniently typed in while sitting in the back of the TOOLS room during fascinating presentations, here is a bit of publicity for the next round of one-day seminars for industry — “Compact Course” is the official terminology — that I will be teaching at ETH in Zurich next November (one in October), some of them with colleagues. It’s the most extensive session that we have ever done; you can see the full programs and registration information here.

  • Software Engineering for Outsourced and Distributed Development, 27 October 2011
    Taught with Peter Kolb and Martin Nordio
  • Requirements Engineering, 17 November
  • Software Testing and Verification: state of the art, 18 November
    With Carlo Furia and Sebastian Nanz
  • Agile Methods: the Good, the Bad and the Ugly, 23 November
  • Concepts and Constructs of Concurrent Computation, 24 November
    With Sebastian Nanz
  • Design by Contract, 25 November

The agile methods course is new; its summary reads almost like a little blog article, so here it is.

Agile methods: the Good, the Bad and the Ugly

Agile methods are wonderful. They’ll give you software in no time at all, turn your customers and users into friends, catch bugs before they catch you, change the world, and boost your love life. Do you believe these claims (even excluding the last two)? It’s really difficult to form an informed opinion, since most of the presentations of eXtreme Programming and other agile practices are intended to promote them (and the consultants to whom they provide a living), not to deliver an objective assessment.

If you are looking for a guru-style initiation to the agile religion, this is not the course for you. What it does is to describe in detail the corpus of techniques covered by the “agile” umbrella (so that you can apply them effectively to your developments), and assess their contribution to software engineering. It is neither “for” nor “against” agile methods but fundamentally descriptive, pedagogical, objective and practical. The truth is that agile methods include some demonstrably good ideas along with some whose benefits are at best dubious. In addition (and this should not be a surprise) they cannot make the fundamental laws of software engineering go away.

Agile methods have now been around for more than a decade, during which many research teams, applying proven methods of experimental science, have performed credible empirical studies of how well the methods really work and how they compare to more traditional software engineering practices. This important body of research results, although not widely known, is critical to managers and developers in industry for deciding whether and how to use agile development. The course surveys these results, emphasizing the ones most directly relevant to practitioners.

A short discussion session will enable participants with experience in agile methods to share their results.

Taking this course will give you a strong understanding of agile methods, and a clear view of when, where and how to apply them.

Schedule

Morning session: A presentation of agile methods

  • eXtreme Programming, pair programming, Scrum, Test-Driven Development, continuous integration, refactoring, stakeholder involvement, feature-driven development etc.
  • The agile lifecycle.
  • Variants: lean programming etc.

Afternoon session (I): Assessment of agile methods

  • The empirical software engineering literature: review of available studies. Assessment of their value. Principles of empirical software engineering.
  • Agile methods under the scrutiny of empirical research: what helps, what harms, and what has no effect? How do agile methods fare against traditional techniques?
  • Examples: pair programming versus code reviews; tests versus specifications; iterative development versus “Big Upfront Everything”.

Afternoon session (II): Discussion and conclusion

This final part of the course will present, after a discussion session involving participants with experience in agile methods, a summary of the contribution of agile methods to software engineering.

It will conclude with advice for organizations involved in software development and interested in applying agile methods in their own environment.

Target groups

CIOs; software project leaders; software developers; software testers and QA engineers.

Assessing concurrency models

By describing a poorly conceived hypothetical experiment, last week’s article presented the “Professor Smith syndrome”, consisting of four risks that threaten the validity of empirical software engineering experiments relying on students in a course:

  • Professor Smith Risk 1: possible bias if the evaluator has a stake in the ideas or tools under assessment.
  • Professor Smith Risk 2: creating different levels of motivation in the different groups (Hawthorne effect).
  • Professor Smith Risk 3: extrapolating from students to professionals.
  • Professor Smith Risk 4: violation of educational ethics if the experiment may cause some students to learn better than others.

If you have developed a great new method or tool and would like to assess it, the best way to address Risk 1 is to find someone else to do the assessment. What if this solution is not practical? Recently we wanted to get some empirical evidence on the merits of the SCOOP (Simple Concurrent Object-Oriented Programming) approach to concurrency [1, 2], on which I have worked for a long time and which has been part of EiffelStudio since the release of version 6.8 a couple of weeks ago. We wanted to see if, despite the Professor Smith risks, we could do a credible study ourselves.

The ETH Software Architecture course [3], into which we introduced some introductory material on concurrency last year (as part of a general effort to push more concurrency into software courses at ETH), looked like a good place to try an evaluation; it is a second-year course, where students, or so we thought, would have little prior experience in concurrent software design.

The study’s authors — Sebastian Nanz, Faraz Torshizi and Michela Pedroni — paid special attention to the methodological issues. To judge for yourself whether we addressed them properly, you can read the current version of our paper to be presented at ESEM 2011 [4]. Do note that it is a draft and that we will improve the paper for final publication.

Here is some of what we did. I will not address Professor Smith Risk 3, the use of students, which (as Lionel Briand has pointed out in a comment on the previous article) published work has studied; in a later article I will give references to some of that work. But we were determined to tackle the other risks explicitly, so as to obtain credible results.

The basic experiment was a session in which the students were exposed to two different design methods for concurrent software: multithreaded programming in Java, which I’ll call “Java Threads”, and SCOOP. We wanted to explore whether it is easier to program in SCOOP than in Java. This is too general a hypothesis, so it was refined into three concrete hypotheses: is it easier to understand a SCOOP program? Is it easier to find errors in SCOOP programs? Do programmers using SCOOP make fewer errors?

A first step towards reducing the effect — Professor Smith Risk 1 — of any emotional attachment of the experimenters to one of the approaches, SCOOP in our case, was to generalize the study. Although what directly interested us was to compare SCOOP against Java Threads, we designed the study as a general scheme to compare concurrency approaches; SCOOP and Java Threads are just an illustration, but anyone else interested in assessing concurrency techniques — say Erlang versus C# concurrency — can apply the same methodology. This decision had two benefits: it freed the study from dependency on the particular techniques, hence, we hope, reducing bias; and, as a side attraction of the kind that is hard for researchers to resist, it increased the publishability of the results.

Circumstances unexpectedly afforded us another protection against any for-SCOOP bias: unbeknownst to us at the time of the study’s design, a first-year course had newly added (in 2009, whereas our study was performed in 2010) an introduction to concurrent programming — using Java Threads! While we had thought that concurrency in any form would be new to most students, in fact almost all of them had now seen Java Threads before. (The new material in the first-year course was taken by ETH students only, but many transfer students had also already had an exposure to Java Threads.) On the other hand, students had not had any prior introduction to SCOOP. So any advantage that one of the approaches may have had because of students’ prior experience would work against our hypotheses. This unexpected development would not help if the study’s results heavily favored Java Threads, but if they favored SCOOP it would reinforce their credibility.

A particular pedagogical decision was made regarding the teaching of our concurrency material: it started with a self-study rather than a traditional lecture. One of the reasons for this decision was purely pedagogical: we felt (and the course evaluations confirmed) that at that stage of the semester the students would enjoy a break in the rhythm of the course. But another reason was to avoid any bias that might have arisen from any difference in the lecturers’ levels of enthusiasm and effectiveness in teaching the two approaches. In the first course session devoted to concurrency, students were handed study materials presenting Java Threads and SCOOP and containing a test to be taken; the study’s results are entirely based on their answers to these tests. The second session was a traditional lecture presenting both approaches again and comparing them. The purpose of this lecture was to make sure the students got the full picture with the benefit of a teacher’s verbal explanations.

The study material was written carefully and with a tone as descriptive and neutral as possible. To make comparisons meaningful, it does not follow a structure specific to Java Threads or SCOOP (as we might have used had we taught only one of these approaches); instead it relies in both cases on the same overall plan (figure 2 of the paper), based on a neutral analysis of concurrency concepts and issues: threads, mutual exclusion, deadlock etc. Each section then presents, for one such general concurrency question, the solution proposed by Java Threads or SCOOP.

This self-study material, as well as everything else about the study, is freely available on the Web; see the paper for the links.

In the self-study, all students studied both the Java Threads and SCOOP materials. They were randomly assigned to two groups, for which the only difference was the order of studying the approaches. We feel that this decision addresses the ethical issue (Professor Smith Risk 4): any pedagogical effect of reading about A before B rather than the reverse, in the course of a few hours, has to be minimal if you end up reading about the two of them, and on the next day follow a lecture that also covers both.

Having all students study both approaches — a crossover approach in the terminology of [5] — should also address the Hawthorne effect (Professor Smith Risk 2): students have no particular incentive to feel that one of the approaches is more hip than the other. While they are not told that SCOOP is partly the work of the instructors, some of them may know or guess this information; the consequences, positive or negative, are limited, since they are asked in both cases to do as well as they can in answering the assessment questions.

The design of that evaluation is another crucial element in trying to avoid bias. We tried, to the extent possible, to base the assessment on objective criteria. For the first hypothesis (program understanding) the technique was to ask the students to predict the output of some simple concurrent programs. To address the risk of a binary correct/incorrect assessment, and get a more fine-grained view, we devised the programs so that they would produce output strings and measured the Levenshtein (edit) distance to the correct result. For the second hypothesis (ease of program debugging), we gave students programs exhibiting typical errors in both approaches and asked them to tell us both the line number of any error they found and an explanation. Assessing the explanation required human analysis; the idea of also assigning partial credit for pointing out a line number without providing a good explanation is that it may be meaningful that a student found that something is amiss even without being quite able to define what it is. The procedure for the third hypothesis (producing programs with fewer errors) was more complex and required two passes over the result; it requires some human analysis, as you will see in the article, but we hope that the two-pass process removes any bias.

This description of the study is only partial and you should read the article [4] for the full details of the procedure.

So what did we find in the end? Does SCOOP really make concurrency easier to learn, concurrent programs easier to debug, and concurrent programmers less error-prone? Here too I will refer you to the article. Let me simply mention that the results held some surprises.

In obtaining these results we tried very hard to address the Professor Smith syndrome and its four risks. Since all of our materials, procedures and data are publicly accessible, described in some detail in the paper, you can determine for yourself how well we met this objective, and whether it is possible to perform credible assessments even of one’s own work.

References

Further reading: for general guidelines on how to conduct empirical research see [5]; for ethical guidelines, applied to psychological research but generalizable, see [6].

[1] SCOOP Eiffel documentation, available here.

[2] SCOOP project documentation at ETH, available here.

[3] Software Architecture course at ETH, course page (2011).

[4] Sebastian Nanz, Faraz Torshizi, Michela Pedroni and Bertrand Meyer: Design of an Empirical Study for Comparing the Usability of Concurrent Programming Languages, to appear in ESEM 2011 (ACM/IEEE International Symposium on Empirical Software Engineering and Measurement), 22-23 September 2011. Draft available here.

[5] Barbara A. Kitchenham, Shari L. Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El-Emam and Jarrett Rosenberg: Preliminary Guidelines for Empirical Research in Software Engineering, National Research Council Canada (NRC-CNRC), Report ERB-1082, 2001, available here.

[6] Robert Rosenthal: Science and ethics in conducting, analyzing, and reporting psychological research, in Psychological Science, vol. 5, 1994, pages 127-134. I found a copy cached by a search engine here.

Publish no loop without its invariant

 

There may be no more blatant example of the disconnect between the software engineering community and the practice of programming than the lack of widespread recognition for the fundamental role of loop invariants.

Let’s recall the basics, as they are taught in the fourth week or so of the ETH introductory programming course [1], from the very moment the course introduces loops. A loop is a mechanism to compute a result by successive approximations. To describe the current approximation, there is a loop invariant. The invariant must be:

  1. Weak enough that we can easily ensure it on a subset, possibly trivial, of our data set. (“Easily” means that this task is substantially easier than the full problem we are trying to solve.)
  2. Versatile enough that if it holds on some subset of the data we can easily (in the same sense) make it hold on a larger subset — even if only slightly larger.
  3. Strong enough that, when it covers the entire data, it yields the result we seek.

As a simple example, assume we seek the maximum of an array a of numbers, indexed from 1. The invariant states that Result is the maximum of the array slice from 1 to i. Indeed:

  1. We can trivially obtain the invariant by setting Result to be a [1]. (It is then the maximum of the slice a [1..1].)
  2. If the invariant holds, we can extend it to a slightly larger slice — larger by just one element — by increasing i by 1 and updating Result to be the greater of the previous Result and the element a [i] (for the new i).
  3. When the slice covers the entire array — that is, i = n — the invariant tells us that Result is the maximum of the slice a [1..n], giving us the result we seek.

You cannot understand the corresponding program text

    from
        i := 1; Result := a [1]
    until i = n loop
        i := i + 1
        if Result < a [i] then Result := a [i] end
    end

without understanding the loop invariant. That is true even of people who have never heard the term: they will somehow form a mental image of the intermediate situation that justifies the algorithm. With the formal notion, the reasoning becomes precise and checkable. The difference is the same as between a builder who has no notion of theory, and one who has learned the laws of mechanics and construction engineering.
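
For readers who like to see the claim executable, here is a sketch in Python rather than Eiffel, with the invariant turned into a run-time assertion (in Eiffel it would go into the loop’s invariant clause); indices are shifted to the 0-based convention and the function assumes a non-empty list.

    def array_max(a):
        i = 1
        result = a[0]                          # invariant established on the trivial slice a[1..1]
        while i != len(a):
            assert result == max(a[:i])        # invariant: result is the maximum of a[1..i]
            i = i + 1
            if result < a[i - 1]:
                result = a[i - 1]
        assert result == max(a)                # i = n: the invariant yields the sought result
        return result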

As another example, take Levenshtein distance (also known as edit distance): the minimum number of operations (insert, delete or replace a character) needed to transform one string into another. The algorithm (a form of dynamic programming) fills in a matrix top to bottom and left to right, each entry being one plus the minimum of the three neighboring entries to the top and left, except if the corresponding characters in the two strings are the same, in which case it keeps the top-left neighbor’s value. The basic operation in the loop body reads

      if source [i] = target [j] then
           dist [i, j] := dist [i - 1, j - 1]
      else
           dist [i, j] := min (dist [i, j - 1], dist [i - 1, j - 1], dist [i - 1, j]) + 1
      end

You can run this and see it work, filling the array cell after cell, then delivering the result in dist [M, N] (the bottom-right entry, M and N being the lengths of the source and target strings). Or just watch the animation on page 60 of [2]. It works, but why it works remains a total mystery until someone tells you the invariant:

Every value of dist filled so far, say at position [p, q], is the minimum distance from the initial substring of the source, containing the characters at positions 1 to p, to the initial substring of the target, containing the characters at positions 1 to q.

This is the rationale for the above code: we want to compute the next value, at position [i, j]; if the corresponding characters in the source and target are the same, no operation is needed to extend the result we had in the top-left neighbor (position [i-1, j-1]); if not, the best we can do is the minimum we can get by extending the results obtained for our three neighbors: through the insertion of target [j] if the minimum comes from the neighbor to the left, [i, j-1]; through the deletion of source [i] if it comes from the neighbor above, [i-1, j]; or through a replacement if it comes from the top-left neighbor, [i-1, j-1].

With this explanation, a mysterious, almost hermetic algorithm instantly becomes crystal-clear. 
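
Here, as a sketch only (Python, 0-based indexing, with the invariant recorded as a comment since checking it at run time would mean recomputing distances), is the full algorithm around the loop body shown above:

    def levenshtein(source, target):
        m, n = len(source), len(target)
        dist = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dist[i][0] = i                     # delete the first i characters of source
        for j in range(n + 1):
            dist[0][j] = j                     # insert the first j characters of target
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                # Invariant: every dist[p][q] filled so far is the edit distance
                # between source[0..p-1] and target[0..q-1].
                if source[i - 1] == target[j - 1]:
                    dist[i][j] = dist[i - 1][j - 1]
                else:
                    dist[i][j] = min(dist[i][j - 1], dist[i - 1][j - 1], dist[i - 1][j]) + 1
        return dist[m][n]

For example, levenshtein("kitten", "sitting") yields 3: two replacements and one insertion.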

Yet another example is in-place linked list reversal. The body of the loop is a pointer ballet:

    temp := previous
    previous := next
    next := next.right
    previous.put_right (temp)

with proper initialization (set next to the value of first and previous to Void) and finalization (set first to the value of previous). This is not the only possible implementation, but all variants of the algorithm use a very similar scheme.

The code looks again pretty abstruse, and hard to get right if you do not remember it exactly. As in the other examples, the only way to understand it is to see the invariant, describing the intermediate assumption after a typical loop iteration. If the original situation was this:

[Figure: List reversal, initial state]

then after a few iterations the algorithm yields this intermediate situation: 

[Figure: List reversal, intermediate state]

 The figure illustrates the invariant:

Starting from previous and repeatedly following right links yields the elements of some initial part of the list, but in the reverse of their original order; starting from next and following right links yields the remaining elements, in their original order. 

Then it is clear what the loop body with its pointer ballet is about: it moves by one position to the right the boundary between the two parts, making sure that the invariant holds again in the new state, with one more element in the first (yellow) part and one fewer in the second (pink) part. At the end the second part will be empty and the first part will encompass all elements, so that (after resetting first to the value of previous) we get the desired result.
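
The same ballet as an illustrative Python sketch (a hypothetical Cell class stands in for the list cells, right plays the role of the Eiffel right link, and returning previous corresponds to the finalization step that resets first):

    class Cell:
        def __init__(self, item, right=None):
            self.item, self.right = item, right

    def reverse(first):
        # 'previous' heads the already reversed (yellow) part, 'nxt' the part
        # still in original order (pink): exactly the invariant pictured above.
        previous, nxt = None, first
        while nxt is not None:
            temp = previous
            previous = nxt
            nxt = nxt.right
            previous.right = temp          # the put_right (temp) of the pointer ballet
        return previous                    # finalization: first := previous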

This example is particularly interesting because list reversal is a standard interview question for programmers seeking a job; as a result, dozens of pages around the Web helpfully present algorithms for the benefit of job candidates. I ran a search on “List reversal algorithm” [3], which yields many such pages. It is astounding to see that among the first fifteen hits or so, which include pages from programming courses at both Stanford and MIT, not a single one mentions invariants, or (even without using the word) gives the above explanation. The situation is all the more bizarre in that many of these pages — read them for yourself! — go into intricate details about variants of the pointer manipulations. There are essentially no correctness arguments.

If you go a bit further down the search results, you will find some papers that do reference invariants, but here is the catch: rather than programming or algorithms papers, they are papers about software verification, such as one by Richard Bornat which uses a low-level (C) version of the example to illustrate separation logic [4]. These are good papers but they are completely distinct from those directed at ordinary programmers, who simply wish to learn a basic algorithm, understand it in depth, and remember it on the day of the interview and beyond.

This chasm is wrong. Software verification techniques are not just good for the small phalanx of experts interested in formal proofs. The basic ideas have potential applications to the daily business of programming, as the practice of Eiffel has shown (this is the concept of “Verification As a Matter Of Course” briefly discussed in an earlier post [5]). Absurdly, the majority of programmers do not know them.

It’s not that they cannot do their job: somehow they eke out good enough results, most of the time. After all, the European cathedrals of the Middle Ages were built without the benefit of sophisticated mathematical models, and they still stand. But today we would not hire a construction engineer who had not studied the appropriate mathematical techniques. Why should we make things different for software engineering, and deprive practitioners of the benefits of solid, well-accepted theory?

As a modest first step, there is no excuse, ever, for publishing a loop without the basic evidence of its adequacy: the loop invariant.

References

[1] Bertrand Meyer: Touch of Class: Learning to Program Well with Objects and Contracts, Springer-Verlag, 2009. See course page (English version) here.

[2] Course slides on control structures,  here in PowerPoint (or here in PDF, without the animation); see example starting on page 51, particularly the animation on page 54. More recent version in German here (and in PDF here), animation on page 60.

[3] For balance I ran the search using Qrobe, which combines results from Ask, Bing and Google.

[4] Richard Bornat, Proving Pointer Programs in Hoare Logic, in  MPC ’00, 5th International Conference on Mathematics of Program Construction, 2000, available here.

[5] Bertrand Meyer, Verification as a Matter of Course, a post on this blog.