ERC Advanced Investigator Grant: Concurrency Made Easy

In April we will be starting the  “Concurrency Made Easy” research project, the result of a just announced Advanced Investigator Grant from the European Research Council. Such ERC grants are awarded to a specific person, rather than a consortium of research organizations as in the usual EU funding scheme. The usual amount, which applies in my case, is 2.5 million euros (currently almost 3 .3 million dollars) over five years, on a specific theme. According to the ERC’s own description [1],

ERC Advanced Grants allow exceptional established research leaders of any nationality and any age to pursue ground-breaking, high-risk projects that open new directions in their respective research fields or other domains.

This is the most sought-after research funding instrument of the EU, with a success rate of about 12% [2], out of a group already preselected by the host institutions. What makes ERC Advanced Investigator Grants so coveted is the flexibility of the scheme (no constraints on the topic, light administrative baggage) and the trust that an award implies in a particular researcher and his ability to carry out advanced research.

The name of the CME project clearly signals its ambition: to turn concurrent programming into a normal, unheroic part of programming. Today adding concurrency to a program, usually in the form of multithreading, is very hard, complexity and risk of all kinds. Everyone is telling us that we must rethink programming, retrain programmers and revamp curricula to put the specific reasoning modes of concurrent programming at the center. I don’t think this can work; thinking concurrently is just too hard to become the default mode. Instead, we should adapt programming languages, theories and tools so that programmers can continue to apply the reasoning schemes that have proved so successful in classical programming, especially object-oriented programming with the benefit of Design by Contract.

The starting point is the SCOOP model, to which I started an introduction in an earlier article of this blog [3], with a sequel yet to come. SCOOP is a minimal extension to the O-O framework to support concurrency, yielding very simple (the S in the acronym) solutions to concurrent programming problems. As part of the CME project we plan to develop it in many different directions and establish a sound and effective formal basis.

I have put the project description — the scientific part of the actual proposal text accepted by the ERC — online [4].

In the next few weeks I will be publishing here specific announcements for the positions we are seeking to fill very quickly; they include postdocs, PhD students, and one research engineer. We are looking for candidates with excellent knowledge and practice of concurrency, Eiffel, formal techniques etc. The formal application procedure will be Web-based and is not in place yet but you can contact me if you fit the profile and are interested.

We can defeat the curse: concurrent programming (an obligatory condition of any path towards a successful future for information technology) does not have to be black magic. It can be made simple and efficient. Such is the challenge of the CME project.

References

[1] European Research Council: Advanced Grants, available here.

[2] European Research Council: Press release on 2011 Advanced Investigator Grants, 24 January 2012, available here.

[3] Concurrent Programming is Easy, article from this blog, available here.

[4] CME Advanced Investigator Grant project description, available here.

R&D positions in software engineering

In the past couple of weeks, three colleagues separately sent me announcements of PhD and postdoc positions, asking me to circulate them. This blog is as good a place as any other (as a matter of fact when I asked Oge de Moor if it was appropriate  he replied: “Readers of your blog are just the kind of applicants we look for…” — I hope you appreciate the compliment). So here they are:

  • Daniel Jackson, for work on Alloy, see here. (“Our plan now is to take Alloy in new directions, and dramatically improve the usability and scalability of the analyzer. We are looking for someone who is excited by these possibilities and will be deeply involved not only in design and implementation but also in strategic planning. There are also opportunities to co-advise students and participate in research proposals.”)
  • University College London (Anthony Finkelstein), a position as lecturer, similar to assistant professor), see here. The title is “Lecturer in software testing” (which, by the way, seems a bit restrictive, as I am more familiar with an academic culture in which the general academic discipline is set, say software engineering or possibly software verification, rather than a narrow specialty, but I hope the scope can be broadened).
  • Semmle, Oge de Moor’s company, see here.

These are all excellent environments, so if you are very, very good, please consider applying. If you are very, very, very good don’t, as I am soon going to post a whole set of announcements for PhD and postdoc positions at ETH in connection with a large new grant.

 

TOOLS 2012, “The Triumph of Objects”, Prague in May: Call for Workshops

Workshop proposals are invited for TOOLS 2012, The Triumph of Objectstools.ethz.ch, to be held in Prague May 28 to June 1. TOOLS is a federated set of conferences:

  • TOOLS EUROPE 2012: 50th International Conference on Objects, Models, Components, Patterns.
  • ICMT 2012: 5th International Conference on Model Transformation.
  • Software Composition 2012: 10th International Conference.
  • TAP 2012: 6th International Conference on Tests And Proofs.
  • MSEPT 2012: International Conference on Multicore Software Engineering, Performance, and Tools.

Workshops, which are normally one- or two-day long, provide organizers and participants with an opportunity to exchange opinions, advance ideas, and discuss preliminary results on current topics. The focus can be on in-depth research topics related to the themes of the TOOLS conferences, on best practices, on applications and industrial issues, or on some combination of these.

SUBMISSION GUIDELINES

Submission proposal implies the organizers’ commitment to organize and lead the workshop personally if it is accepted. The proposal should include:

  •  Workshop title.
  • Names and short bio of organizers .
  • Proposed duration.
  •  Summary of the topics, goals and contents (guideline: 500 words).
  •  Brief description of the audience and community to which the workshop is targeted.
  • Plans for publication if any.
  • Tentative Call for Papers.

Acceptance criteria are:

  • Organizers’ track record and ability to lead a successful workshop.
  •  Potential to advance the state of the art.
  • Relevance of topics and contents to the topics of the TOOLS federated conferences.
  •  Timeliness and interest to a sufficiently large community.

Please send the proposals to me (Bertrand.Meyer AT inf.ethz.ch), with a Subject header including the words “TOOLS WORKSHOP“. Feel free to contact me if you have any question.

DATES

  •  Workshop proposal submission deadline: 17 February 2012.
  • Notification of acceptance or rejection: as promptly as possible and no later than February 24.
  • Workshops: 28 May to 1 June 2012.

 

John McCarthy

John McCarthyJohn McCarthy, who died last week at the age of 84, was one of the true giants of computer science. Most remarkable about his contributions are their diversity, their depth, and how they span both theory and practice.

To talk about him it is necessary first to dispel an unjustly negative connotation. McCarthy was one of the founders of the discipline of artificial intelligence, its most forceful advocate and the inventor of its very name. In the “AI Winter” episode of the late 1970s and 1980s, that name suffered some disrepute as a result of a scathing report by James Lighthill blaming AI researchers for over-promising. In fact the promoters of AI may not have delivered exactly what they announced (who can accurately predict science?); but what they delivered is astounding. Many breakthroughs in computer science, both in theory (advances in lambda calculus and the theory of computation) and in the practice of programming (garbage collection, functional programming languages), can directly be traced to work in AI. Part of the problem is a phenomenon that I heard John McCarthy himself describe:  “As soon as it works, no one calls it AI any more.” Automatic garbage collection was once advanced artificial intelligence; now it is just an algorithm that makes sure your smartphone does not freeze up. In a different field, we have become used to computers routinely beating chess champions, a feat that critics of AI once deemed unthinkable.

The worst over-promises came not from researchers in the field such as McCarthy, who understood the difficulties, but from people like Herbert Simon, more of a philosopher, who in 1965 wrote that “machines will be capable, within twenty years, of doing any work a man can do.” McCarthy’s own best-known over-promise was to take up David Levy on his 1968 bet that no computer would be able to beat him within ten years. But McCarthy was only mistaken in under-estimating the time span: Deep Blue eventually proved him right.

The word that comes most naturally to mind when thinking about McCarthy is “brilliant.” He belonged to that category of scientists who produce the fundamental insights before anyone else, even if they do not always have the patience to finalize the details. The breathtaking paper that introduced Lisp [1] is labeled “Part 1”; there was never a “Part 2.” (Of course we have a celebrated example in computer science, this one from a famously meticulous author, of a seven-volume treaty which never materialized in full.) It was imprudent to announce a second part, but the first was enough to create a whole new school of programming. The Lisp 1.5 manual [2], published in 1962, was another masterpiece; as early as page 13 it introduces — an unbelievable feat, especially considering that the program takes hardly more than half a page — an interpreter for the language being defined, written in that very language! The more recent reader can only experience here the kind of visceral, poignant and inextinguishable jealously that overwhelms us the first time we realize that we will never be able to attend the première of Don Giovanni at the Estates Theater in Prague on 29 October, 1787 (exactly 224 years ago yesterday — did you remember to celebrate?). What may have been the reaction of someone in “Data Processing,” such as it was in 1962, suddenly coming across such a language manual?

These years, 1959-1963, will remain as McCarthy’s Anni Mirabiles. 1961 and 1962 saw the publication of two visionary papers [3, 4] which started the road to modern program verification (and where with the benefit of hindsight it seems that he came remarkably close to denotational semantics). In [4] he wrote

Instead of debugging a program, one should prove that it meets its specifications, and this proof should be checked by a computer program. For this to be possible, formal systems are required in which it is easy to write proofs. There is a good prospect of doing this, because we can require the computer to do much more work in checking each step than a human is willing to do. Therefore, the steps can be bigger than with present formal systems.

Words both precise and prophetic. The conclusion of [3] reads:

It is reasonable to hope that the relationship between computation and mathematical logic will be as fruitful in the next century as that between analysis and physics in the last. The development of this relationship demands a concern for both applications and for mathematical elegance.

“A concern for both applications and mathematical elegance” is an apt characterization of McCarthy’s own work. When he was not busy designing Lisp, inventing the notion of meta-circular interpreter and developing the mathematical basis of programming, he was building the Lisp garbage collector and proposing the concept of time-sharing. He also played, again in the same period, a significant role in another milestone development, Algol 60 — yet another sign of his intellectual openness and versatility, since Algol is (in spite of the presence of recursion, which McCarthy championed) an imperative language at the antipodes of Lisp.

McCarthy was in the 1960s and 70s the head of the Artificial Intelligence Laboratory at Stanford. For some reason the Stanford AI Lab has not become as legendary as Xerox PARC, but it was also the home to early versions of  revolutionary technologies that have now become commonplace. Email, which hardly anyone outside of the community had heard about, was already the normal way of communicating, whether with a coworker next door or with a researcher at MIT; the Internet was taken for granted; everyone was using graphical displays and full-screen user interfaces; outside, robots were playing volley-ball (not very successfully, it must be said); the vending machines took no coins, but you entered your login name and received a bill at the end of the month, a setup which never failed to astonish visitors; papers were printed with sophisticated fonts on a laser printer (I remember a whole group reading the successive pages of Marvin Minsky’s  frames paper [5] directly on the lab’s XGP, Xerox Graphics Printer, as  they were coming out, one by one, straight from MIT). Arthur Samuel was perfecting his checkers program. Those who were not programming in Lisp were hooked to SAIL, “Stanford Artificial Intelligence Language,” an amazing design which among other insights convinced me once and for all that one cannot seriously deal with data structures without the benefit of an automatic serialization mechanism. The building itself, improbably set up amid the pastures of the Santa Cruz foothills, was razed in the eighties and the lab moved to the main campus, but the spirit of these early years lives on.

McCarthy ran the laboratory in an open and almost debonair way; he was a legend and somewhat intimidating, but never arrogant and in fact remarkably approachable. I took the Lisp course from him; in my second or third week at Stanford, I raised my hand and with the unflappable assurance of the fully ignorant slowly asked a long question: “In all the recursive function definitions that you have shown so far, termination was obvious because there is some ‘n’ that decreases for every recursive call, and we treat the case ‘n = 0’ or ‘n = 1’ in a special, non-recursive way. But things won’t always be so simple. Is there some kind of grammatical criterion on Lisp programs that we could use to ascertain whether a recursive definition will always lead to a terminating computation?” There was a collective gasp from the older graduate students in the audience, amazed that a greenhorn would have the audacity to interrupt the course with such an incompetent query. But instead of dismissing me, McCarthy proceeded, with a smile, to explain the basics of undecidability. He had the same attitude in the many seminars that he taught, often on topics straddling computer science and philosophy, in a Socratic style where every opinion was welcome and no one was above criticism.

He also had a facetious side. At the end of a talk by McCarthy at SRI, Tony Hoare, who was visiting for a few days, asked a question; McCarthy immediately rejoined that he had expected that question, summoned to the stage a guitar-carrying researcher from the AI Lab, and proceeded with the answer in the form of a prepared song.

The progress of science and technology is a collective effort; it takes many people to turn new insights into everyday reality. The insights themselves come from a few individuals, a handful in every generation. McCarthy was one of these undisputed pioneers.

 

References

[1] John McCarthy: Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I, in Communications of the ACM, vol. 3, no. 4, 1960, pages 184-195.

[2] John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, Michael I. Levin, LISP 1.5 Programmer’s Manual, MIT, 1962. Available at Amazon  External Linkand also as a PDF External Link.

[3] John McCarthy: A Basis for a Mathematical Theory of Computation, first version in Proc. Western Joint Computer Conference, 1961, revised version in Computer Programming and Formal Systems, eds. P. Braffort and D. Hirschberg, North Holland, 1963. Available in various places on the Web, e.g. here External Link.

[4] John McCarthy: Towards a Mathematical Science of Computation, in IFIP Congress 1962, pages 21-28, available in various places on the Web, e.g. here External Link.

[5] Marvin Minsky:  A Framework for Representing Knowledge, MIT-AI Laboratory Memo 306, June 1974, available here External Link.

 

(This article was first published on my ACM blog.  I am resuming regular Monday publication.)

The story of our field, in a few short words

 

(With all dues to [1], but going up from four to five as it is good to be brief yet not curt.)

At the start there was Alan. He was the best of all: built the right math model (years ahead of the real thing in any shape, color or form); was able to prove that no one among us can know for sure if his or her loops — or their code as a whole — will ever stop; got to crack the Nazis’ codes; and in so doing kind of saved the world. Once the war was over he got to build his own CPUs, among the very first two or three of any sort. But after the Brits had used him, they hated him, let him down, broke him (for the sole crime that he was too gay for the time or at least for their taste), and soon he died.

There was Ed. Once upon a time he was Dutch, but one day he got on a plane and — voilà! — the next day he was a Texan. Yet he never got the twang. The first topic that had put him on  the map was the graph (how to find a path, as short as can be, from a start to a sink); he also wrote an Algol tool (the first I think to deal with all of Algol 60), and built an OS made of many a layer, which he named THE in honor of his alma mater [2]. He soon got known for his harsh views, spoke of the GOTO and its users in terms akin to libel, and wrote words, not at all kind, about BASIC and PL/I. All this he aired in the form of his famed “EWD”s, notes that he would xerox and send by post along the globe (there was no Web, no Net and no Email back then) to pals and foes alike. He could be kind, but often he stung. In work whose value will last more, he said that all we must care about is to prove our stuff right; or (to be more close to his own words) to build it so that it is sure to be right, and keep it so from start to end, the proof and the code going hand in hand. One of the keys, for him, was to use as a basis for ifs and loops the idea of a “guard”, which does imply that the very same code can in one case print a value A and in some other case print a value B, under the watch of an angel or a demon; but he said this does not have to be a cause for worry.

At about that time there was Wirth, whom some call Nick, and Hoare, whom all call Tony. (“Tony” is short for a list of no less than three long first names, which makes for a good quiz at a party of nerds — can you cite them all from rote?) Nick had a nice coda to Algol, which he named “W”; what came after Algol W was also much noted, but the onset of Unix and hence of C cast some shade over its later life. Tony too did much to help the field grow. Early on, he had shown a good way to sort an array real quick. Later he wrote that for every type of unit there must be an axiom or a rule, which gives it an exact sense and lets you know for sure what will hold after every run of your code. His fame also comes from work (based in part on Ed’s idea of the guard, noted above) on the topic of more than one run at once, a field that is very hot today as the law of Moore nears its end and every maker of chips has moved to  a mode where each wafer holds more than one — and often many — cores.

Dave (from the US, but then at work under the clime of the North) must not be left out of this list. In a paper pair, both from the same year and both much cited ever since,  he told the world that what we say about a piece of code must only be a part, often a very small part, of what we could say if we cared about every trait and every quirk. In other words, we must draw a clear line: on one side, what the rest of the code must know of that one piece; on the other, what it may avoid to know of it, and even not care about. Dave also spent much time to argue that our specs must not rely so much on logic, and more on a form of table.  In a later paper, short and sweet, he told us that it may not be so bad that you do not apply full rigor when you chart your road to code, as long as you can “fake” such rigor (his own word) after the fact.

Of UML, MDA and other such TLAs, the less be said, the more happy we all fare.

A big step came from the cold: not just one Norse but two, Ole-J (Dahl) and Kris, came up with the idea of the class; not just that, but all that makes the basis of what today we call “O-O”. For a long time few would heed their view, but then came Alan (Kay), Adele and their gang at PARC, who tied it all to the mouse and icons and menus and all the other cool stuff that makes up a good GUI. It still took a while, and a lot of hit and miss, but in the end O-O came to rule the world.

As to the math basis, it came in part from MIT — think Barb and John — and the idea, known as the ADT (not all TLAs are bad!), that a data type must be known at a high level, not from the nuts and bolts.

There also is a guy with a long first name (he hates it when they call him Bert) but a short last name. I feel a great urge to tell you all that he did, all that he does and all that he will do, but much of it uses long words that would seem hard to fit here; and he is, in any case, far too shy.

It is not all about code and we must not fail to note Barry (Boehm), Watts, Vic and all those to whom we owe that the human side (dear to Tom and Tim) also came to light. Barry has a great model that lets you find out, while it is not yet too late, how much your tasks will cost; its name fails me right now, but I think it is all in upper case.  At some point the agile guys — Kent (Beck) and so on — came in and said we had got it all wrong: we must work in pairs, set our goals to no more than a week away, stand up for a while at the start of each day (a feat known by the cool name of Scrum), and dump specs in favor of tests. Some of this, to be fair, is very much like what comes out of the less noble part of the male of the cow; but in truth not all of it is bad, and we must not yield to the urge to throw away the baby along with the water of the bath.

I could go on (and on, and on); who knows, I might even come back at some point and add to this. On the other hand I take it that by now you got the idea, and even on this last day of the week I have other work to do, so ciao.

Notes

[1] Al’s Famed Model Of the World, In Words Of Four Signs Or Fewer (not quite the exact title, but very close): find it on line here.

[2] If not quite his alma mater in the exact sense of the term, at least the place where he had a post at the time. (If we can trust this entry, his true alma mater would have been Leyde, but he did not stay long.)

PhD position: concurrent programming (SCOOP) for robotics

The ETH Chair of Software Engineering has won a grant from the Hasler foundation, in a joint project with the Technical University of Lucerne and the Autonomous Systems Lab of ETH, to develop a robotics framework involving concurrent computation. The project, called Roboscoop,  will produce a demonstrator system: a “SmartWalker” robot — a robotic version of  “walkers” used by elderly people and others with reduced mobility. The research proposal is available [2].

One of the major goals of the Roboscoop framework is to provide robotics applications with the full possibilities of concurrent programming. Many robotics developments use no or little concurrency because of the tricky programming involved in using threads, and the difficulty of getting applications right. With SCOOP [1] programmers have a simple, high-level mechanism that removes the risk of data races and other plagues of concurrent programming.

We are looking for someone with a strong background in both software engineering and robotics. If you are interested, please see the position announcement [3].

References

[1] On SCOOP see here and here. See also a YouTube video of a small robot programmed with  SCOOP as part of an earlier student project (by Ganesh Ramanathan).

[2] Roboscoop research proposal, here.

[2] Position announcement: here

Specification explosion

To verify software, we must specify it; otherwise there is nothing to verify against. People often cite the burden of specification as the major obstacle toward making verification practical. At issue are not only the effort required to express the goals of software elements (their contracts) but also intermediate assertions, or “verification conditions”, including loop invariants, required by the machinery of verification.

At a Microsoft Software Verification summer school [1] in Moscow on July 18 — the reason why there was no article on this blog last week — Stefan Tobies, one of the lecturers, made the following observation about the specification effort needed to produce fully verified software. In his experience, he said, the ratio of specification lines to program lines is three to one.

Such a specification explosion, to coin a phrase, has to be addressed by any practical approach to verification. It would be interesting to get estimates from others with verification experience.

Reducing specification explosion  is crucial to the Eiffel effort to provide “Verification As a Matter Of Course” [2]. The following three techniques should go a long way:

  • Loop invariant inference. Programmers can be expected to write contracts expressing the purpose of routines (preconditions, postconditions) and classes (class invariants), but often balk at writing the intermediate assertions necessary to prove the correctness of loops. An earlier article [3] mentioned some ongoing work on this problem and I hope to come back to the topic.
  • Frame conventions. As another recent article has discussed [4], a simple language convention can dramatically reduce the number of assertions by making frame conditions explicit.
  • Model-based contracts. This technique calls for a separate article; the basic idea is to express the effect of operations through high-level mathematical models relying on a library that describe such mathematical abstractions as sets, relations, functions and graphs.

The risk of specification explosion is serious enough to merit a concerted defense.

 

References

[1] Summer School in Software Engineering and Verification, details here.

[2] Verification As a Matter Of Course, slides of a March 2010 talk, see an earlier article on this blog.

[3] Contracts written by people, contracts written by machines, an earlier article on this blog.

[4] If I’m not pure, at least my functions are, an earlier article on this blog.

Towards a Calculus of Object Programs

I posted here a draft of a new article, Towards a Calculus of Object Programs.

Here is the abstract:

Verifying properties of object-oriented software requires a method for handling references in a simple and intuitive way, closely related to how O-O programmers reason about their programs. The method presented here, a Calculus of Object Programs, combines four components: compositional logic, a framework for describing program semantics and proving program properties; negative variables to address the specifics of O-O programming, in particular qualified calls; the alias calculus, which determines whether reference expressions can ever have the same value; and the calculus of object structures, a specification technique for the structures that arise during the execution of an object-oriented program.
The article illustrates the Calculus by proving the standard algorithm for reversing a linked list.

Testing insights

Lionel Briand and his group at the Simula Research Laboratory in Oslo have helped raise the standard for empirical research in testing and other software engineering practices by criticizing work that in their opinion relies on wrong assumptions or insufficiently supported evidence. In one of their latest papers [1] they take aim at “Adaptive Random Testing” (ART); one of the papers they criticize is from our group at ETH, on the ARTOO extension [2] to this testing method. Let’s examine the criticism!

We need a bit of background on random testing, ART, and ARTOO:

  • Random testing tries inputs based on a random process rather than attempting a more sophisticated strategy; it was once derided as silly [3], but has emerged in recent years as a useful technique. Our AutoTest tool [4], now integrated in EiffelStudio, has shown it to be particularly effective when applied to code equipped with contracts, which provide built-in test oracles. As a result of this combination, testing can be truly automatic: the two most tedious tasks of traditional testing, test case preparation and test oracle definition, can be performed without human intervention.
  • ART, developed by Chen and others [5], makes random testing not entirely random by ensuring that the inputs are spread reasonably evenly in the input domain.
  • ARTOO, part of Ilinca Ciupa’s PhD thesis on testing defended in 2008,   generalized ART to object-oriented programs, by defining a notion of distance between objects; the ARTOO strategy  avoids choosing objects that are too close to each other. The distance formula, which you can find in[2], combines three elementary distances: between the types of the objects involved,  the values in their primitive fields (integers etc.), and, recursively, the objects to which they have references.

Arcuri and Briand dispute the effectiveness of ART and criticize arguments that various papers have used to show its effectiveness. About the ARTOO paper they write

The authors concluded that ART was better than random testing since it needed to sample less test cases before finding the first failure. However, ART was also reported as taking on average 1.6 times longer due to the distance calculations!

To someone not having read our paper the comment and the exclamation mark would seem to suggest that the paper somehow downplays this property of random testing, but in fact it stresses it repeatedly. The property appears for example in boldface as part of the caption to Table 2: In most cases ARTOO requires significantly less tests to find a fault, but entails a time overhead, and again in boldface in the caption to Table 3: The overhead that the distance calculations introduce in the testing process causes ARTOO to require on average 1.6 times more time than RAND to find the first fault.

There is no reason, then, to criticize the paper on this point. It reports the results clearly and fairly.

If we move the focus from the paper to the method, however, Arcuri and Briand have a point. As they correctly indicate, the number of tests to first fault is not a particularly useful criterion. In fact I argued against it in another paper on testing [6]

The number of tests is not that useful to managers, who need help deciding when to stop testing and ship, or to customers, who need an estimate of fault densities. More relevant is the testing time needed to uncover the faults. Otherwise we risk favoring strategies that uncover a failure quickly but only after a lengthy process of devising the test; what counts is total time. This is why, just as flies get out faster than bees, a seemingly dumb strategy such as random testing might be better overall.

(To understand the mention of flies and bees you need to read [6].) The same article states, as its final principle:

Principle 7: Assessment criteria A testing strategy’s most important property is the number of faults it uncovers as a function of time.

The ARTOO paper, which appeared early in our testing work, used “time to first failure” because it has long been a standard criterion in the testing literature, but it should have applied our own advice and focused on more important properties of testing strategies.

The “principles” paper [6] also warned against a risk awaiting anyone looking for new test strategies:

Testing research is vulnerable to a risky thought process: You hit upon an idea that seemingly promises improvements and follow your intuition. Testing is tricky; not all clever ideas prove helpful when submitted to objective evaluation.

The danger is that the clever ideas may result in so much strategy setup time that any benefit on the rest of the testing process is lost. This danger threatens testing researchers, including those who are aware of it.

The idea of ARTOO and object distance remains attractive, but more work is needed to make it an effective contributor to automated random testing and demonstrate that effectiveness. We can be grateful to Arcuri and Briand for their criticism, and I hope they continue to apply their iconoclastic zeal to empirical software engineering work, ours included.

I have objections of my own to their method. They write that “all the work in the literature is based either on simulations or case studies with unreasonably high failure rates”. This is incorrect for our work, which does not use simulations, relying instead on actual, delivered software, where AutoTest routinely finds faults in an automatic manner.

In contrast, however, Arcuri and Briand rely on fault seeding (also known as fault introduction or fault injection):

To obtain more information on how shapes appear in actual SUT, we carried out a large empirical analysis on 11 programs. For each program, a series of mutants were generated to introduce faults in these programs in a systematic way. Faults generated through mutation [allow] us to generate a large number of faults, in an unbiased and varied manner. We generated 3727 mutants and selected the 780 of them with lower detection probabilities to carry out our empirical analysis of faulty region shapes.

In the absence of objective evidence attesting to the realism of fault seeding, I do not believe any insights into testing obtained from such a methodology. In fact we adopted, from the start of our testing work, the principle that we would never rely on fault seeding. The problem with seeded faults is that there is no guarantee they reflect the true faults that programmers make, especially the significant ones. Techniques for fault seeding are understandably good at introducing typographical mistakes, such as a misspelling or the replacement of a “+” by a “-”; but these are not interesting kinds of fault, as they are easily caught by the compiler, by inspection, by low-tech static tools, or by simple tests. Interesting faults are those resulting from a logical error in the programmer’s mind, and in my experience (I do not know of good empirical studies on this topic) seeding techniques do not generate them.

For these reasons, all our testing research has worked on real software, and all the faults that AutoTest has found were real faults, resulting from a programmer’s mistake.

We can only apply this principle because we work with software equipped with contracts, where faults will be detected through the automatic oracle of a violated assertion clause. It is essential, however, to the credibility and practicality of any testing strategy; until I see evidence to the contrary, I will continue to disbelieve any testing insights resulting from studies based on artificial fault injection.

References

[1] Andrea Arcuri and Lionel Briand: Adaptive Random Testing: An Illusion of Effectiveness, in ISSTA 2011 (International Symposium on Software Testing and Analysis), available here.

[2] Ilinca Ciupa, Andreas Leitner, Manuel Oriol and Bertrand Meyer: ARTOO: Adaptive Random Testing for Object-Oriented Software, in ICSE 2008: Proceedings of 30th International Conference on Software Engineering, Leipzig, 10-18 May 2008, IEEE Computer Society Press, 2008, also available here.

[3] Glenford J. Myers. The Art of Software Testing. Wiley, New York, 1979. Citation:

Probably the poorest methodology of all is random-input testing: the process of testing a program by selecting, at random, some subset of all possible input values. In terms of the probability of detecting the most errors, a randomly selected collection of test cases has little chance of being an optimal, or close to optimal, subset. What we look for is a set of thought processes that allow one to select a set of test data more intelligently. Exhaustive black-box and white-box testing are, in general, impossible, but a reasonable testing strategy might use elements of both. One can develop a reasonably rigorous test by using certain black-box-oriented test-case-design methodologies and then supplementing these test cases by examining the logic of the program (i.e., using white-box methods).

[4] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, available here. For practical uses of AutoTest within EiffelStudio see here.

[5] T. Y. Chen, H Leung and I K Mak: Adaptive Random Testing, in  Advances in Computer Science, ASIAN 2004, Higher-Level Decision Making,  ed. M.J. Maher,  Lecture Notes in Computer Science 3321, Springer-Verlag, pages 320-329, 2004, available here.

[6] Bertrand Meyer: Seven Principles of Software testing, in IEEE Computer, vol. 41, no. 10, pages 99-101, August 2008, also available here.

If I’m not pure, at least my functions are

..

If I’m not pure, at least my jewels are [1].

..

We often need to be reassured that a routine, usually a function, is “pure”, meaning that it does not change the state of the computation. For example, a function used in a contract element (precondition, postcondition, class invariant, loop invariant) should be purely descriptive, since it is a specification element; evaluating it, typically for testing and debugging, should not create a change of behavior — a “Heisenberg effect” — in the very computation that it is intended to assess. Another application is in a concurrency context, particularly in SCOOP (see earlier posts and forthcoming ones): if one or more functions are pure, several of their executions can run  concurrently on the same object.

The notion of purity admits variants. The usual notion is what  [2] calls weak purity, which precludes changes to previously existing objects but allow creating new objects. In the EiffelBase library we also encounter routines that have another form of purity, which we may call “relative” purity: they can leave the same state on exit as they found on entry, but in-between they might change the state.  For the rest of this discussion we will rely on the standard notion of weak purity: no changes permitted on existing objects.

It is often suggested that the programming language should support specifying that a routine is pure; many people have indeed proposed the addition of a keyword such as pure to Eiffel. One of the reasons this is not — in my opinion — such a great idea is that purity is just a special case of the more general problem of framing: specifying and verifying what a routine does not change. If we can specify an arbitrary frame property, then we can, as a special case covered by the general mechanism, specify that a routine changes nothing.

To see why framing is so important, consider a class ACCOUNT with a routine deposit that has the postcondition

balance = old balance + sum………..— Where sum is the argument of deposit

Framing here means stating that nothing else than balance changes; for example the account’s owner and its number should remain the same. It is not practical to write all individual postcondition clauses such as

owner= old owner
number=
old number

and so on. But we do need to specify these properties and enforce them, if only to avoid that a descendant class (maybe MAFIA_ACCOUNT) distort the rules defined by the original.

One technique is to add a so-called “modifies clause”, introduced by verification tools such as ESC-Java [3] and JML [4]. Modifies clauses raise some theoretical issues; in particular, the list of modified expressions is often infinite, so we must restrict ourselves to an abstract-data-type view where we characterize a class by commands and queries and the modifies clause only involves queries of the class. Many people find this hard to accept at first, since anything that is not talked about can change, but it is the right approach. A modifies clause of sorts, included in the postcondition, appeared in an earlier iteration of the Eiffel specification, with the keyword only (which is indeed preferable to modifies, which in the Eiffel style favoring grammatically simple keywords would be modify, since what we want to express is not that the routine must change anything at all  but that it may only change certain specified results!). The convention worked well with inheritance, since it included the rule that a clause such as only balance, in class  ACCOUNT, prescribes that the routine may not, in its modifies version as well as in any version redefined in descendants, change any other query known at the level of ACCOUNT; but a descendant version may change, subject to its own ACCOUNT clauses, any new query introduced by a descendant.

To declare a routine as pure, it would suffice to use an empty only clause (not very elegant syntactically — “only” what? — but one can get used to it).

This construct has been discarded, as it places too heavy a burden on the programmer-specifier. Here the key observation resulted from a non-scientific but pretty extensive survey I made of all the JML code I could get my hands on. I found that every time a query appeared in a modifies clause it was also listed in the postcondition! On reflection, this seems reasonable: if you are serious about specification, as anyone bothering to write such a clause surely is, you will not just express that something changes and stop there; you will write something about how it may change. Not necessarily the exact result, as in

my_query = precise_final_value

but at least some property of that result, as in

some_property (my_query)

This observation has, however, an inescapable consequence for language design: modifies or only clauses should be inferred by the compiler from the postcondition, not imposed on the programmer as an extra burden. The convention, which we may call the Implicit Framing Rule, is simple:

A routine may change the value of a query only if the query is specified in the routine’s postcondition

(or, if you like double negation, “no routine may change the value of a query other than those specified in its postcondition”). Here we say that a feature is “specified” in a postcondition if it appears there outside of the scope of an old expression. (Clearly, an occurrence such as old balance does not imply that balance can be modified, hence this restriction to occurrences outside of an old.)

With this convention the only clause becomes implicit: it would simply list all the queries specified in the postcondition, so there is no need for the programmer to write it. For the rare case of wanting to specify that a query q may change, but not wanting to specify how, it is easy to provide a library function, say involved, that always return true and can be used in postconditions, as in involved (q).

The convention is of course not just a matter of programming methodology but, in an IDE supporting verification, such as the EVE “Verification As a Matter Of Course” environment which we are building for Eiffel [5], the compiler will enforce the definition above — no change permitted to anything not specified in the postcondition — and produce an error in case of a violation. This also means that we can easily specify that a routine is pure: it must simply not specify any query in its postcondition. It may still list it in an old clause, as happens often in practice, e.g.

Result = old balance – Minimum_balance………..— In the postcondition of a function available_funds

Note the need to use old here. Apart from this addition of old to some postconditions, a considerable advantage of the convention is that most existing code using pure functions will be suitable to the new purity enforcement without any need to provide new annotations.

I believe that this is the only sustainable convention. It does not, of course, solve the frame problem by itself (for attempts in this direction see [6, 7]), but it is a necessary condition for a solution that is simple, easily taught, fairly easily implemented, and effective. It goes well with model-based specifications [8, 9], which I believe are the technique of most promise for usable  specifications of object-oriented software. And it provides a straightforward, no-frills way to enforce purity where prescribed by the Command-Query Separation principle [10, 11]: if I’m not pure, at least my functions must be.

References

[1] From the lyrics of the aria Glitter and Be Gay in Leonard Bernstein’s Candide, text by Lillian Hellman and others. Youtube offers several performances, including  by Diana Damrau (here) and Natalie Dessay (here) . For the text see e.g. here.

[2] Adam Darvas and Peter Müller: Reasoning About Method Calls in Interface Specifications, in Journal of Object Technology, Volume 5, no. 5, jUNE 2006, pages 59-85, doi:10.5381/jot.2006.5.5.a3, available here.

[3] C. Flanagan, K.R.M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe and R. Stata: Extended static checking for Java, in PLDI 2002 (Programming Language Design and Implementation), pages 234–245, 2002.

[4] Gary Leavens et al.: Java Modeling Language, see references here.

[5] Julian Tschannen, Carlo A. Furia, Martin Nordio, and Bertrand Meyer: Verifying Eiffel Programs with Boogie, to appear in Boogie 2011, First International Workshop on Intermediate Verification Languages, Wroclaw, August 2011. See documentation about the EVE project on the project page.

[6] Ioannis Kassios: Dynamic Frames: Support for Framing, Dependencies and Sharing Without Restrictions, in Formal Methods 2006, eds. J. Misra, T. Nipkow and E. Sekerinski, Lecture Notes in Computer Science 4085, Springer Verlag, 2006, pages 268-283.

[7] Bernd Schoeller: Making Classes Provable through Contracts, Models and Frames, PhD thesis, ETH Zurich, 2007, available here

[8] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[9] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ’10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[10] Bertrand Meyer: Object-Oriented Software Construction, first (1988) and second (1997) editions, Prentice Hall.

[11] Bertrand Meyer: Touch of Class: An Introduction to Programming Well, Using Objects and Contracts, Springer Verlag, 2009.