Archive for the ‘Computer science’ Category.

Towards a Calculus of Object Programs

I posted here a draft of a new article, Towards a Calculus of Object Programs.

Here is the abstract:

Verifying properties of object-oriented software requires a method for handling references in a simple and intuitive way, closely related to how O-O programmers reason about their programs. The method presented here, a Calculus of Object Programs, combines four components: compositional logic, a framework for describing program semantics and proving program properties; negative variables to address the specifics of O-O programming, in particular qualified calls; the alias calculus, which determines whether reference expressions can ever have the same value; and the calculus of object structures, a specification technique for the structures that arise during the execution of an object-oriented program.
The article illustrates the Calculus by proving the standard algorithm for reversing a linked list.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Testing insights

Lionel Briand and his group at the Simula Research Laboratory in Oslo have helped raise the standard for empirical research in testing and other software engineering practices by criticizing work that in their opinion relies on wrong assumptions or insufficiently supported evidence. In one of their latest papers [1] they take aim at “Adaptive Random Testing” (ART); one of the papers they criticize is from our group at ETH, on the ARTOO extension [2] to this testing method. Let’s examine the criticism!

We need a bit of background on random testing, ART, and ARTOO:

  • Random testing tries inputs based on a random process rather than attempting a more sophisticated strategy; it was once derided as silly [3], but has emerged in recent years as a useful technique. Our AutoTest tool [4], now integrated in EiffelStudio, has shown it to be particularly effective when applied to code equipped with contracts, which provide built-in test oracles. As a result of this combination, testing can be truly automatic: the two most tedious tasks of traditional testing, test case preparation and test oracle definition, can be performed without human intervention.
  • ART, developed by Chen and others [5], makes random testing not entirely random by ensuring that the inputs are spread reasonably evenly in the input domain.
  • ARTOO, part of Ilinca Ciupa’s PhD thesis on testing defended in 2008,   generalized ART to object-oriented programs, by defining a notion of distance between objects; the ARTOO strategy  avoids choosing objects that are too close to each other. The distance formula, which you can find in[2], combines three elementary distances: between the types of the objects involved,  the values in their primitive fields (integers etc.), and, recursively, the objects to which they have references.

Arcuri and Briand dispute the effectiveness of ART and criticize arguments that various papers have used to show its effectiveness. About the ARTOO paper they write

The authors concluded that ART was better than random testing since it needed to sample less test cases before finding the first failure. However, ART was also reported as taking on average 1.6 times longer due to the distance calculations!

To someone not having read our paper the comment and the exclamation mark would seem to suggest that the paper somehow downplays this property of random testing, but in fact it stresses it repeatedly. The property appears for example in boldface as part of the caption to Table 2: In most cases ARTOO requires significantly less tests to find a fault, but entails a time overhead, and again in boldface in the caption to Table 3: The overhead that the distance calculations introduce in the testing process causes ARTOO to require on average 1.6 times more time than RAND to find the first fault.

There is no reason, then, to criticize the paper on this point. It reports the results clearly and fairly.

If we move the focus from the paper to the method, however, Arcuri and Briand have a point. As they correctly indicate, the number of tests to first fault is not a particularly useful criterion. In fact I argued against it in another paper on testing [6]

The number of tests is not that useful to managers, who need help deciding when to stop testing and ship, or to customers, who need an estimate of fault densities. More relevant is the testing time needed to uncover the faults. Otherwise we risk favoring strategies that uncover a failure quickly but only after a lengthy process of devising the test; what counts is total time. This is why, just as flies get out faster than bees, a seemingly dumb strategy such as random testing might be better overall.

(To understand the mention of flies and bees you need to read [6].) The same article states, as its final principle:

Principle 7: Assessment criteria A testing strategy’s most important property is the number of faults it uncovers as a function of time.

The ARTOO paper, which appeared early in our testing work, used “time to first failure” because it has long been a standard criterion in the testing literature, but it should have applied our own advice and focused on more important properties of testing strategies.

The “principles” paper [6] also warned against a risk awaiting anyone looking for new test strategies:

Testing research is vulnerable to a risky thought process: You hit upon an idea that seemingly promises improvements and follow your intuition. Testing is tricky; not all clever ideas prove helpful when submitted to objective evaluation.

The danger is that the clever ideas may result in so much strategy setup time that any benefit on the rest of the testing process is lost. This danger threatens testing researchers, including those who are aware of it.

The idea of ARTOO and object distance remains attractive, but more work is needed to make it an effective contributor to automated random testing and demonstrate that effectiveness. We can be grateful to Arcuri and Briand for their criticism, and I hope they continue to apply their iconoclastic zeal to empirical software engineering work, ours included.

I have objections of my own to their method. They write that “all the work in the literature is based either on simulations or case studies with unreasonably high failure rates”. This is incorrect for our work, which does not use simulations, relying instead on actual, delivered software, where AutoTest routinely finds faults in an automatic manner.

In contrast, however, Arcuri and Briand rely on fault seeding (also known as fault introduction or fault injection):

To obtain more information on how shapes appear in actual SUT, we carried out a large empirical analysis on 11 programs. For each program, a series of mutants were generated to introduce faults in these programs in a systematic way. Faults generated through mutation [allow] us to generate a large number of faults, in an unbiased and varied manner. We generated 3727 mutants and selected the 780 of them with lower detection probabilities to carry out our empirical analysis of faulty region shapes.

In the absence of objective evidence attesting to the realism of fault seeding, I do not believe any insights into testing obtained from such a methodology. In fact we adopted, from the start of our testing work, the principle that we would never rely on fault seeding. The problem with seeded faults is that there is no guarantee they reflect the true faults that programmers make, especially the significant ones. Techniques for fault seeding are understandably good at introducing typographical mistakes, such as a misspelling or the replacement of a “+” by a “-”; but these are not interesting kinds of fault, as they are easily caught by the compiler, by inspection, by low-tech static tools, or by simple tests. Interesting faults are those resulting from a logical error in the programmer’s mind, and in my experience (I do not know of good empirical studies on this topic) seeding techniques do not generate them.

For these reasons, all our testing research has worked on real software, and all the faults that AutoTest has found were real faults, resulting from a programmer’s mistake.

We can only apply this principle because we work with software equipped with contracts, where faults will be detected through the automatic oracle of a violated assertion clause. It is essential, however, to the credibility and practicality of any testing strategy; until I see evidence to the contrary, I will continue to disbelieve any testing insights resulting from studies based on artificial fault injection.

References

[1] Andrea Arcuri and Lionel Briand: Adaptive Random Testing: An Illusion of Effectiveness, in ISSTA 2011 (International Symposium on Software Testing and Analysis), available here.

[2] Ilinca Ciupa, Andreas Leitner, Manuel Oriol and Bertrand Meyer: ARTOO: Adaptive Random Testing for Object-Oriented Software, in ICSE 2008: Proceedings of 30th International Conference on Software Engineering, Leipzig, 10-18 May 2008, IEEE Computer Society Press, 2008, also available here.

[3] Glenford J. Myers. The Art of Software Testing. Wiley, New York, 1979. Citation:

Probably the poorest methodology of all is random-input testing: the process of testing a program by selecting, at random, some subset of all possible input values. In terms of the probability of detecting the most errors, a randomly selected collection of test cases has little chance of being an optimal, or close to optimal, subset. What we look for is a set of thought processes that allow one to select a set of test data more intelligently. Exhaustive black-box and white-box testing are, in general, impossible, but a reasonable testing strategy might use elements of both. One can develop a reasonably rigorous test by using certain black-box-oriented test-case-design methodologies and then supplementing these test cases by examining the logic of the program (i.e., using white-box methods).

[4] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, available here. For practical uses of AutoTest within EiffelStudio see here.

[5] T. Y. Chen, H Leung and I K Mak: Adaptive Random Testing, in  Advances in Computer Science, ASIAN 2004, Higher-Level Decision Making,  ed. M.J. Maher,  Lecture Notes in Computer Science 3321, Springer-Verlag, pages 320-329, 2004, available here.

[6] Bertrand Meyer: Seven Principles of Software testing, in IEEE Computer, vol. 41, no. 10, pages 99-101, August 2008, also available here.

VN:F [1.9.10_1130]
Rating: 8.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 7 votes)

Agile methods: the good, the bad and the ugly

It was a bit imprudent last Monday to announce the continuation of the SCOOP discussion for this week; with the TOOLS conference happening now, with many satellite events such as the Eiffel Design Feast of the past week-end and today’s “New Eiffel Technology Community” workshop, there is not enough time for a full article. Next week might also be problematic. The SCOOP series will resume, but in the meantime I will report on other matters.

As something that can be conveniently typed in while sitting in the back of the TOOLS room during fascinating presentations, here is a bit of publicity for the next round of one-day seminars for industry — “Compact Course” is the official terminology — that I will be teaching at ETH in Zurich next November (one in October), some of them with colleagues. It’s the most extensive session that we have ever done; you can see the full programs and registration information here.

  • Software Engineering for Outsourced and Distributed Development, 27 October 2011
    Taught with Peter Kolb and Martin Nordio
  • Requirements Engineering, 17 November
  • Software Testing and Verification: state of the art, 18 November
    With Carlo Furia and Sebastian Nanz
  • Agile Methods: the Good, the Bad and the Ugly, 23 November
  • Concepts and Constructs of Concurrent Computation, 24 November
    With Sebastian Nanz
  • Design by Contract, 25 November

The agile methods course is new; its summary reads almost like a little blog article, so here it is.

Agile methods: the Good, the Bad and the Ugly

Agile methods are wonderful. They’ll give you software in no time at all, turn your customers and users into friends, catch bugs before they catch you, change the world, and boost your love life. Do you believe these claims (even excluding the last two)? It’s really difficult to form an informed opinion, since most of the presentations of eXtreme Programming and other agile practices are intended to promote them (and the consultants to whom they provide a living), not to deliver an objective assessment.

If you are looking for a guru-style initiation to the agile religion, this is not the course for you. What it does is to describe in detail the corpus of techniques covered by the “agile” umbrella (so that you can apply them effectively to your developments), and assess their contribution to software engineering. It is neither “for” nor “against” agile methods but fundamentally descriptive, pedagogical, objective and practical. The truth is that agile methods include some demonstrably good ideas along with some whose benefits are at best dubious. In addition (and this should not be a surprise) they cannot make the fundamental laws of software engineering go away.

Agile methods have now been around for more than a decade, during which many research teams, applying proven methods of experimental science, have performed credible empirical studies of how well the methods really work and how they compare to more traditional software engineering practices. This important body of research results, although not widely known, is critical to managers and developers in industry for deciding whether and how to use agile development. The course surveys these results, emphasizing the ones most directly relevant to practitioners.

A short discussion session will enable participants with experience in agile methods to share their results.

Taking this course will give you a strong understanding of agile development, and a clear view of when, where and how to apply them.

Schedule

Morning session: A presentation of agile methods

  • eXtreme Programming, pair programming, Scrum, Test-Driven Development, continuous integration, refactoring, stakeholder involvement, feature-driven development etc.
  • The agile lifecycle.
  • Variants: lean programming etc.

Afternoon session (I): Assessment of agile methods

  • The empirical software engineering literature: review of available studies. Assessment of their value. Principles of empirical software engineering.
  • Agile methods under the scrutiny of empirical research: what helps, what harms, and what has no effect? How do agile methods fare against traditional techniques?
  • Examples: pair programming versus code reviews; tests versus specifications; iterative development versus “Big Upfront Everything”.

Afternoon session (II): Discussion and conclusion

This final part of the course will present, after a discussion session involving participants with experience in agile methods, a summary of the contribution of agile methods to software engineering.

It will conclude with advice for organizations involved in software development and interested in applying agile methods in their own environment.

Target groups

CIOs; software project leaders; software developers; software testers and QA engineers.

VN:F [1.9.10_1130]
Rating: 8.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

Incremental research again

After some updating, I republished in the Communications of the ACM blog my discussion of the value of incremental research, which first appeared as an article here .

VN:F [1.9.10_1130]
Rating: 9.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Assessing concurrency models

By describing a  poorly conceived hypothetical experiment, last week’s article described the “Professor Smith syndrome” consisting of four risks that threaten the validity of empirical software engineering experiments relying on students in a course:

  • Professor Smith Risk 1: possible bias if the evaluator has a stake in the ideas or tools under assessment.
  • Professor Smith Risk 2: creating different levels of motivation in the different groups (Hawthorne effect).
  • Professor Smith Risk 3: extrapolating from students to professionals.
  • Professor Smith Risk 4: violation of educational ethics if the experiment may cause some students to learn better than others.

If you have developed a great new method or tool and would like to assess it, the best way to address Risk 1 is to find someone else to do the assessment. What if  this solution is not practical? Recently we wanted to get some empirical evidence on the merits of the SCOOP (Simple Concurrent Object-Oriented Programming) approach to concurrency [1, 2], on which I have worked for a long time and which is now part of EiffelStudio since the release of 6.8 a couple of weeks ago. We wanted to see if, despite the Professor Smith risks, we could do a credible study ourselves.

The ETH Software Architecture course[3], into which we introduced some introductory material on concurrency last year (as part of a general effort to push more concurrency into software courses at ETH), looked like a good place to try an evaluation; it is a second-year course, where students, or so we thought, would have little prior experience in concurrent software design.

The study’s authors — Sebastian Nanz, Faraz Torshizi and Michela Pedroni — paid special attention to the methodological issues. To judge for yourself whether we addressed them properly, you can read the current version of our paper to be presented at ESEM 2011 [4]. Do note that it is a draft and that we will improve the paper for final publication.

Here is some of what we did. I will not address the Professor Smith Risk 3, the use of students, which (as Lionel Briand has pointed out in a comment on the previous article) published work has studied; in a later article I will give  references to some of that work. But we were determined to tackle the other risks explicitly, so as to obtain credible results.

The basic experiment was a session in which the students were exposed to two different design methods for concurrent software: multithreaded programming in Java, which I’ll call “Java Threads”, and SCOOP. We wanted to explore whether it is easier to program in SCOOP than in Java. This is too general a hypothesis, so it was refined into three concrete hypotheses: is it easier to understand a SCOOP program? Is it easier to find errors in SCOOP programs? Do programmers using SCOOP make fewer errors?

A first step towards reducing the effect — Professor Smith Risk 1 — of any emotional attachment of the experimenters  to one of the approaches, SCOOP in our case, was to generalize the study. Although what directly interested us was to compare SCOOP against Java Threads, we designed the study as a general scheme to compare concurrency approaches; SCOOP and Java Threads are just an illustration, but anyone else interested in assessing concurrency techniques — say Erlang versus C# concurrency — can apply the same methodology. This decision had two benefits: it freed the study from dependency on the particular techniques, hence, we hope, reducing bias; and as side attraction of the kind that is hard for researchers to resist, it increased the publishability of the results.

Circumstances unexpectedly afforded us another protection against any for-SCOOP bias: unbeknownst to us at the time of the study’s design, a first-year course had newly added (in 2009, whereas our study was performed in 2010) an introduction to concurrent programming — using Java Threads! While we had thought that concurrency in any form would be new to most students, in fact almost all of them had now seen Java Threads before. (The new material in the first-year course was taken by ETH students only, but many transfer students had also already had an exposure to Java Threads.) On the other hand, students had not had any prior introduction to SCOOP. So any advantage that one of the approaches may have had because of students’ prior experience would work against our hypotheses. This unexpected development would not help if the study’s results heavily favored Java Threads, but if they favored SCOOP it would reinforce their credibility.

A particular pedagogical decision was made regarding the teaching of our concurrency material: it started with a self-study rather than a traditional lecture. One of the reasons for this decision was purely pedagogical: we felt (and the course evaluations confirmed) that at that stage of the semester the students would enjoy a break in the rhythm of the course. But another reason was to avoid any bias that might have arisen from any difference in the lecturers’ levels of enthusiasm and effectiveness in teaching the two approaches. In the first course session devoted to concurrency, students were handed study materials presenting Java Threads and SCOOP and containing a test to be taken; the study’s results are entirely based on their answers to these tests. The second session was a traditional lecture presenting both approaches again and comparing them. The purpose of this lecture was to make sure the students got the full picture with the benefit of a teacher’s verbal explanations.

The study material was written carefully and with a tone as descriptive and neutral as possible. To make comparisons meaningful, it does not follow a structure specific to Java Threads or  SCOOP  (as we might have used had we taught only one of these approaches); instead it relies in both cases on the same overall plan  (figure 2 of the paper), based on a neutral analysis of concurrency concepts and issues: threads, mutual exclusion, deadlock etc. Each section then presents, for one such general concurrency question, the solution proposed by Java Threads or SCOOP.

This self-study material, as well as everything else about the study, is freely available on the Web; see the paper for the links.

In the self-study, all students studied both the Java Threads and SCOOP materials. They were randomly assigned to two groups, for which the only difference was the order of studying the approaches. We feel that this decision addresses the ethical issue (Professor Smith Risk 4): any pedagogical effect of reading about A before B rather than the reverse, in the course of a few hours, has to be minimal if you end up reading about the two of them, and on the next day follow a lecture that also covers both.

Having all students study both approaches — a crossover approach in the terminology of [5] — should also address the Hawthorne effect (Professor Smith Risk 2): students have no particular incentive to feel that one of the approaches is more hip than the other. While they are not told that SCOOP is partly the work of the instructors, some of them may know or guess this information; the consequences, positive or negative, are limited, since they are asked in both cases to do as well as they can in answering the assessment questions.

The design of that evaluation is another crucial element in trying to avoid bias. We tried, to the extent possible, to base the assessment on objective criteria. For the first hypothesis (program understanding) the technique was to ask the students to predict the output of some simple concurrent programs. To address the risk of a binary correct/incorrect assessment, and get a more fine-grained view, we devised the programs so that they would produce output strings and measured the Levenshtein (edit) distance to the correct result. For the second hypothesis (ease of program debugging), we gave students programs exhibiting typical errors in both approaches and asked them to tell us both the line number of any error they found and an explanation. Assessing the explanation required human analysis; the idea of also assigning partial credit for pointing out a line number without providing a good explanation is that it may be meaningful that a student found that something is amiss even without being quite able to define what it is. The procedure for the third hypothesis (producing programs with fewer errors) was more complex and required two passes over the result; it requires some human analysis, as you will see in the article, but we hope that the two-pass process removes any bias.

This description of the study is only partial and you should read the article [4] for the full details of the procedure.

So what did we find in the end? Does SCOOP really makes concurrency easier to learn, concurrent programs easier to debug, and concurrent programmers less error-prone? Here too  I will refer you to the article. Let me simply mention that the results held some surprises.

In obtaining these results we tried very hard to address the Professor Smith syndrome and its four risks. Since all of our materials, procedures and data are publicly accessible, described in some detail in the paper, you can determine for yourself how well we met this objective, and whether it is possible to perform credible assessments even of one’s own work.

References

Further reading: for general guidelines on how to conduct empirical research see [5]; for ethical guidelines, applied to psychological research but generalizable, see [6].

[1] SCOOP Eiffel documentation, available here.

[2] SCOOP project documentation at ETH, available here.

[3] Software Architecture course at ETH, course page (2011).

[4] Sebastian Nanz, Faraz Torshizi, Michela Pedroni and Bertrand Meyer: Design of an Empirical Study for Comparing the Usability of Concurrent Programming Languages, to appear in ESEM 2011 (ACM/IEEE International Symposium on Empirical Software Engineering and Measurement), 22-23 September 2011. Draft available here.

[5] Barbara A. Kitchenham, Shari L. Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El-Emam and Jarrett Rosenberg: Preliminary Guidelines for Empirical Research in Software Engineering, national Research Council Canada (NRC-CNRC), Report ERB-1082, 2001, available here.

[6] Robert Rosenthal: Science and ethics in conducting, analyzing, and reporting psychological research, in  Psychological Science, 5, 1994, p127-134. I found a copy cached by a search engine here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

The Professor Smith syndrome: Part 2

As stated in the Quiz of a few days ago (“Part 1 ”), we consider the following hypothetical report in experimental software engineering ([1], [2]):

Professor Smith has developed a new programming technique, “Suspect-Oriented Programming” (SOP). To evaluate SOP, he directs half of the students in his “Software Methodology” class to do the project using traditional techniques, and the others to use SOP.

He finds that projects by the students using SOP have, on the average, 15% fewer bugs than the others, and reports that SOP increases software reliability.

What’s wrong with this story?

Professor Smith’s attempt at empirical software engineering is problematic for at least four reasons. Others could arise, but we do not need to consider them if Professor Smith has applied the expected precautions: the number of students should be large enough (standard statistical theory will tell us how much to trust the result for various sample sizes); the students should be assigned to one of the two groups on a truly random basis; the problem should be amenable to both SOP and non-SOP techniques; and the assessment of the number of bugs should in the results should be based on fair and if possible automated evaluation. Some respondents to the quiz cited these problems, but they would apply to any empirical study and we can assume they are being taken care of.

The first problem to consider is that the evaluator and the author of the concept under evaluation are the same person. This is an approach fraught with danger. We have no reason to doubt Professor Smith’s integrity, but he is human. Deep down, he wants SOP to be better than the alternative. That is bound to affect the study. It would be much more credible if someone else, with no personal stake in SOP, had performed it.

The second problem mirrors the first on the students’ side. The students from group 1 were told that they used Professor Smith’s great idea, those from group 2 that they had to use old, conventional, boring stuff. Did both groups apply the same zeal to their work? After all, the students know that Professor Smith created SOP, and maybe he is an convincing advocate, so group 1 students will (consciously or not) do their best; those from group 2 have less incentive to go the extra mile. What we may have at play here is a phenomenon known as the Hawthorne effect [3]: if you know you are being tested for a new technique, you typically work harder and better — and may produce better results even if the technique is worthless! Experiments dedicated to studying this effect show that even  a group that is in reality using the same technique as another does better, at least at the beginning, if it is told that it is using a new, sexy technique.

The first and second problems arise in all empirical studies, software-related or not. They are the reason why medical experiments use placebos and double-blind techniques (where neither the subjects nor the experimenters themselves know who is using which variant). These techniques often do not directly transpose to software experiments, but we should all the same be careful about empirical studies of assessments of one’s own work and about possible Hawthorne effects.

The third problem, less critical, is the validity of a study relying on students. To what extent can we extrapolate from the results to a situation in industry? Software engineering students are on their way to becoming software professionals, but they are not professionals yet. This is a difficult issue because universities, rather than industry, are usually and understandably the place where experiments take place, an sometimes there is no other choice than using students. But then one can question the validity of the results. It depends on the nature of the questions being asked: if the question under study is whether a certain idea is easy to learn, using students is reasonable. But if it is, for example, whether a technique produces less buggy programs, the results can depend significantly on the subjects’ experience, which is different for students and professionals.

The last problem does not by itself affect the validity of the results, but it is a show-stopper nonetheless: Professor Smith’s experiment is unethical! If is is indeed true that SOP is better than the alternative, he is harming students from group 2; in the reverse case, he is harming students from group 1. Only in the case of the null hypothesis (using SOP makes no statistically significant difference) is the experiment ethical, but then it is also profoundly uninteresting. The rule in course-related experiments is a variant of the Hippocratic oath: before all, do not harm. The first purpose of a course is to enrich the students’ knowledge and skills; secondary aims, such as helping the professor’s research, are definitely acceptable, but must never impede the first. The setup described above is all the less acceptable that the project results presumably count towards the course grade, so the students who were forced to use the less good technique, if there demonstrably was one, have grounds to complain.

Note that Professor Smith could partially address this fairness problem by letting students choose their group, instead of assigning them randomly to group 1 or group 2 (based for example on the first letter of their names). But then the results would lose credibility, because this technique introduces self-selection and hence bias: the students who choose SOP may be the more intellectually curious students, and hence possibly the ones who do better anyway.

If Professor Smith cannot ensure fairness, he can still use students for his experiment, but has to run it outside of a course, for example by paying students, or running the experiment as a competition with some prizes for those who produce the programs with fewest bugs. This technique can work, although it introduces further dangers of self-selection. As part of a course, however, you just cannot assign students, on your own authority, to different techniques that might have an different effect on the core goal of the course: the learning experience.

So Professor Smith has a long way to go before he can run experiments that will convey a significant argument in favor of SOP.

Over the years I have seen, as a reader and sometimes as a referee, many Professor Smith papers: “empirical” evaluation of a technique by its own authors, using questionable techniques and not applying the necessary methodological precautions.

A first step is, whenever possible, to use experimenters who are from a completely different group from the developers of the ideas, as in two studies [4] [5] about the effectiveness of pair programming.

And yet! Sometimes no one else is available, and you do want to obtain objective empirical evidence about the merits of your own ideas. You are aware of the risk, and ready to face the cold reality, including if the results are unfavorable. Can you do it?

A recent attempt of ours seems to suggest that this is possible if you exert great care. It will presented in a paper at the next ESEM (Empirical Software Engineering and Measurement) and even though it discusses assessing some aspects of our own designs, using students, as part of the course project which counts for grading, and separating them into groups, we feel it was fair and ethical, and </modesty_filter_on>an ESEM referee wrote: “this is one of the best designed, conducted, and presented empirical studies I have read about recently”<modesty_filter_on>.

How did we proceed? How would you have proceeded? Think about it; feel free to express your ideas as comments to this post. In the next installment of this blog (The Professor Smith Syndrome: Part 3), I will describe our work, and let you be the judge.

References

[1] Bertrand Meyer: The rise of empirical software engineering (I): the good news, this blog, 30 July 2010, available here.
[2] Bertrand Meyer: The rise of empirical software engineering (II): what we are still missing, this blog, 31 July 2010, available here.

[3] On the Hawthorne effect, there is a good Wikipedia entry. Acknowledgment: I first heard about the Hawthorne effect from Barry Boehm.

[4] Jerzy R. Nawrocki, Michal Jasinski, Lukasz Olek and Barbara Lange: Pair Programming vs. Side-by-Side Programming, in EuroSPI 2005, pages 28-38. I do not have a URL for this article.

[5] Matthias Müller: Two controlled Experiments concerning the Comparison of Pair Programming to Peer Review, in  Journal of Systems and Software, vol. 78, no. 2, pages 166-179, November 2005; and Are Reviews an Alternative to Pair Programming ?, in  Journal of Empirical Software Engineering, vol. 9, no. 4, December 2004. I don’t have a URL for either version. I am grateful to Walter Tichy for directing me to this excellent article.

VN:F [1.9.10_1130]
Rating: 10.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

In praise of Knuth and Liskov

In November of 2005, as part of the festivities of its 150-th anniversary, the ETH Zurich bestowed honorary doctorates on Don Knuth and Barbara Liskov. I gave the speech (the “laudatio”). It was published in Informatik Spektrum, the journal of Gesellschaft für Informatik, the German Computer Society, vo. 29, no. 1, February 2006, pages 74-76; I came across it recently and thought others might be interested in this homage to two great computer scientists.  The beginning was in German; I translated it into English. I also replaced a couple of German expressions by their translations: “ETH commencement” for ETH-Tag (the official name of the annual ceremony) and “main building” for Hauptgebäude.

I took this picture of Wirth, Liskov and Knuth (part of my gallery of computer scientists)  later that same day.

 

Laudatio

 In an institution, Ladies and Gentlement, which so proudly celebrates its hundred-and-fiftieth anniversary, a relatively young disciplines sometimes has cause for envy. We computer scientists are still the babies, or at least the newest kids on the block. Outside of this building, for example, you will see streets bearing such names as Clausius, yet there is neither a Von Neumann Lane nor a a Wirth Square. Youth, however,  also has its advantages; perhaps the most striking is that we still can, in our own lifetime, meet in person some of the very founders of our discipline. No living physicist has seen Newton; no chemist has heard Lavoisier. For us, it works. Today, Ladies and Gentlemen, we have the honor of introducing two of the undisputed pioneers of informatics.

Barbara Liskov

The first of our honorees today is Professor Barbara Liskov. To understand her contributions it is essential to realize the unfair competition in which the so-called Moore’s law pits computer software against computing hardware. To match the astounding progress of computing speed and memory over the past five decades, all that we have on the software side is our own intelligence which, it is safe to say, doesn’t double every eighteen months at constant price. The key to scaling up is abstraction; all advances in programming methodology have relied on new abstraction techniques. Perhaps the most significant is data abstraction, which enables us to organize complex systems on the basis of the types of objects they manipulate, defined in completely abstract terms. This is the notion of abstract data type, a staple component today of every software curriculum, including in the very first programming course here ETH. it was introduced barely thirty years ago in a seemingly modest article in SIGPLAN Notices — the kind of publication that hardly registers a ripple in science indexes — by Barbara Liskov and Stephen Zilles. Few papers have had a more profound impact on the theory and practice of software development than this contribution, “Programming with Abstract Data Types”.

The idea of abstract data types, or ADTs, is one of those Egg of Christopher Columbus moments; a seemingly simple intuition that changes the course of things. An ADT is a class of objects described in terms not of their internal properties, but of the operations applicable to them, and the abstract properties of these operations. Not by what they are, but by what they have. A rather capitalistic view of the world, but well suited to the description of complex systems where each part knows as little as possible about the others to protect itself about their future changes.

An abstraction such as ETH-Commencement could be described in a very concrete way: it happens in a certain place, consists of one event after another, gathers so many people. This is what we computer scientists call an implementation-oriented view, and relying on it means that we can’t change any detail without endangering the consistency of other processes, such as the daily planning of room allocation in the Main Building, which use it. In an ADT view, the abstraction “ETH Commencement” is characterized not by what it is but by what it has: a start, an end, an audience, and operations such as “Schedule the ETH Commencement”, “ Reschedule it”, “Start it”, “End it”. They provide to the rest of the world a clean, precisely specified interface which enables every ADT to use every other based on the minimum properties it requires, thus isolating them from irrelevant internal changes, and providing an irreplaceable weapon in the incessant task of software engineering: battling complexity.

Barbara Liskov didn’t stay with the theoretical concepts but implemented the ideas in the CLU language, one of the most influential of the set of programming languages that in the nineteen-seventies changed our perspective of how to develop good software.

She went on to seminal work on operating systems and distributed computing, introducing several widely applied concepts such as guardians, and always backing her theoretical innovations by building practical systems, from the CLU language and compiler to the Argus and Mercury distributed operating systems. Distributed systems, such as those which banks, airlines and other global enterprises run on multiple machines across multiple networks, raise particularly challenging issues. To quote from the introduction of her article on Argus:

A centralized system is either running or crashed, but a distributed system may be partly running and partly crashed. Distributed programs must cope with failures of the underlying hardware. Both the nodes and the network may fail. The goal of Argus is to provide mechanisms that make it easier for programmers to cope with these problems.

Barbara Liskov’s work introduced seminal concepts to deal with these extremely difficult problems.

Now Ford professor of engineering at MIT, she received not long ago the prestigious John von Neumann award of the IEEE; she has been one of the most influential people in software engineering. We are grateful for how Professor Barbara Liskov has helped shape the field are honored to have her at ETH today.

 Donald Knuth

In computer science and beyond, the name of Donald Knuth carries a unique aura. A professor at Stanford since 1968, now emeritus, he is the only person on record whose job title is the title of his own book: Professor of the Art of Computer Programming. This is for his eponymous multi-volume treatise, which established the discipline of algorithm analysis, and has had more effect than any other computer science publication. The Art of Computer Programming is a marvel of breadth, depth, completeness, mathematical rigor and clarity, not to forget humor. In that legendary book you will find exposed in detail the algorithms and data structures that lie at the basis of all software applications today. A Monte Carlo simulation, as a physicists may use, requires a number sequence that is both very long and very random-looking, even though the computer is a deterministic machine; if the simulation is any good, it almost certainly relies on the devious techniques which The Art of Computer Programming presents for making a perfectly deterministic sequence appear to have no order or other recognizable property. If you are running complex programs on your laptop, and they keep creating millions of software objects without clogging up gigabytes of memory, chances are the author of the garbage collector program is using techniques he learned from Knuth, with such delightful names as “the Buddy System”. If your search engine can at the blink of an eye find a needle of useful information in a haystack of tens of billions of Web pages, it’s most likely because they’ve been indexed using finely tuned data structures, such as hash tables, for which Knuth has been the reference for three decades through volume three, Searching and Sorting.

Knuth is famous for his precision and attention to detail, going so far as to offer a financial reward for every error found in his books, although one suspects this doesn’t cost him too much since people are so proud that instead of cashing the check they have it framed for display. The other immediately striking characteristic of Knuth is how profoundly he is driven by esthetics. This applies to performing arts, as anyone who was in the Fraumünster this morning and found out who the organist was can testify, but even more to his scientific work. The very title “the Art of computer programming” betrays this. Algorithms and data structures for Knuth are never dull codes for computers, but objects of intense esthetic pleasure and friendly discussion. This concern with beauty led to a major turn in his career, which delayed the continuation of the book series by many years but resulted in a development that has affected anyone who publishes scientific text. As he received the page proofs of the second edition of one of the volumes in the late seventies he was so repelled by its physical appearance, resulting from newly introduced computer typesetting technology, that he decided to build a revolutionary font design and text processing system, all by himself, from the ground up. This resulted in a number of publications such as a long and fascinating paper in the Bulletin of the American Mathematical Society entitled “The Letter S”, but even more importantly in widely successful and practical software programs which he wrote himself, TeX and Metafont, which have today become standards for scientific publishing. Here too he has shown the way in quality and rigor, being one of the very few people in the world who promise their software to be free of bugs, and backs that promise by giving a small financial reward for any counter-example.

His numerous other contributions are far too diverse to allow even a partial mention here; they have ranged across wide areas of computer science and mathematics.

To tell the truth, we are a little embarrassed that by bringing Professor Knuth here we are delaying by a bit more the long awaited release of volume 4. But we overcome this embarrassment in time to express our pride for having Donald Erwin Knuth at ETH for this anniversary celebration.

VN:F [1.9.10_1130]
Rating: 10.0/10 (9 votes cast)
VN:F [1.9.10_1130]
Rating: +9 (from 9 votes)

Publish no loop without its invariant

 

There may be no more blatant example of  the disconnect between the software engineering community and the practice of programming than the lack of widespread recognition for the fundamental role of loop invariants. 

Let’s recall the basics, as they are taught in the fourth week or so of the ETH introductory programming course [1], from the very moment the course introduces loops. A loop is a mechanism to compute a result by successive approximations. To describe the current approximation, there is a loop invariant. The invariant must be:

  1. Weak enough that we can easily ensure it on a subset, possibly trivial, of our data set. (“Easily” means than this task is substantially easier than the full problem we are trying to solve.)
  2. Versatile enough that if it holds on some subset of the data we can easily (in the same sense) make it hold on a larger subset — even if only slightly larger.
  3. Strong enough that, when it covers the entire data, it yields the result we seek.

As a simple example, assume we seek the maximum of an array a of numbers, indexed from 1. The invariant states that Result is the maximum of the array slice from 1 to i. Indeed:

  1. We can trivially obtain the invariant by setting Result to be a [1]. (It is then the maximum of the slice a [1..1].)
  2. If the invariant holds, we can extend it to a slightly larger slice — larger by just one element — by increasing i by 1 and updating Result to be the greater of the previous Result and the element a [i] (for the new  i).
  3. When the slice covers the entire array — that is, i = n — the invariant tells us that Result is the maximum of the slice a [1..n], giving us the result we seek.

You cannot understand the corresponding program text

    from
        i := 1; Result := a [1]
    until i = n loop
        i := i + 1
        if Result < a [i] then Result := a [i] end
    end

without understanding the loop invariant. That is true even of people who have never heard the term: they will somehow form a mental image of the intermediate situation that justifies the algorithm. With the formal notion, the reasoning becomes precise and checkable. The difference is the same as between a builder who has no notion of theory, and one who has learned the laws of mechanics and construction engineering.

As another example, take Levenshtein distance (also known as edit distance). It is the shortest sequence of operations (insert, delete or replace a character) that will transform a string into another. The algorithm (a form of dynamic programming) fills in a matrix top to bottom and left to right, each entry being one plus the maximum of the three neighboring ones to the top and left, except if the corresponding characters in the strings are the same, in which case it keeps the top-left neighbor’s value. The basic operation in the loop body reads

      if source [i] = target [j] then
           dist [i, j] := dist [i -1, j -1]
      else
           dist [i, j] := min (dist [i, j-1], dist [i-1, j-1], dist [i-1, j]) + 1
      end

You can run this and see it work, filling the array cell after cell, then delivering the result at (dist [M, N] (the bottom-right entry, M and i being the lengths of the source and target strings. Or just watch the animation on page 60 of [2]. It works, but why it works remains a total mystery until someone tells you the invariant:

Every value of dist filled so far is the minimum distance from the initial substrings of the source, containing characters at position 1 to p, to the initial substring of the target, positions 1 to q.

This is the rationale for the above code: we want to compute the next value, at position [i, j]; if the corresponding characters in the source and target are the same, no operation is needed to extend the result we had in the top-left neighbor (position [i-1, j-1]); if not, the best we can do is the minimum we can get by extending the results obtained for our three neighbors: through the insertion of source [i] if the minimum comes from the neighbor to the left, [i-1, j]; through the deletion of target [j] if it comes from the neighbor above; or through a replacement if from the top-left neighbor.

With this explanation, a mysterious, almost hermetic algorithm instantly becomes crystal-clear. 

Yet another example is in-place linked list reversal. The body of the loop is a pointer ballet:

temp := previous
previous
:= next
next
:= next.right
previous.put_right
(temp)

with proper initialization (set next to the value of first and previous to Void) and finalization (set first to the value of previous). This is not the only possible implementation, but all variants of the algorithm use a very similar scheme.

The code looks again pretty abstruse, and hard to get right if you do not remember it exactly. As in the other examples, the only way to understand it is to see the invariant, describing the intermediate assumption after a typical loop iteration. If the original situation was this:

List reversal: initial state

List reversal: initial state

then after a few iterations the algorithm yields this intermediate situation: 

List reversal: intermediate state

List reversal: intermediate state

 The figure illustrates the invariant:

Starting from previous and repeatedly following right links yields the elements of some initial part of the list, but in the reverse of their original order; starting from next and following right links yields the remaining elements, in their original order. 

Then it is clear what the loop body with its pointer ballet is about: it moves by one position to the right the boundary between the two parts, making sure that the invariant holds again in the new state, with one more element in the first (yellow) part and one fewer in the second (pink) part. At the end the second part will be empty and the first part will encompass all elements, so that (after resetting first to the value of previous) we get the desired result.

This example is particularly interesting because list reversal is a standard interview questions for programmers seeking a job; as a result, dozens of  pages around the Web helpfully present algorithms for the benefit of job candidates. I ran a search  on “List reversal algorithm” [3], which yields many such pages. It is astounding to see that from the first fifteen hits or so, which include pages from programming courses at both Stanford and MIT, not a single one mentions invariants, or (even without using the word) gives the above explanation. The situation is all the more bizarre that many of these pages — read them for yourself! — go into intricate details about variants of the pointer manipulations. There are essentially no correctness arguments.

If you go a bit further down the search results, you will find some papers that do reference invariants, but here is the catch: rather than programming or algorithms papers, they are papers about software verification, such as one by Richard Bornat which uses a low-level (C) version of the example to illustrate separation logic [4]. These are good papers but they are completely distinct from those directed at ordinary programmers, who simply wish to learn a basic algorithm, understand it in depth, and remember it on the day of the interview and beyond.

This chasm is wrong. Software verification techniques are not just good for the small phalanx of experts interested in formal proofs. The basic ideas have potential applications to the daily business of programming, as the practice of Eiffel has shown (this is the concept of  “Verification As a Matter Of Course” briefly discussed in an earlier post [5]). Absurdly, the majority of programmers do not know them.

It’s not that they cannot do their job: somehow they eke out good enough results, most of the time. After all, the European cathedrals of the middle ages were built without the benefit of sophisticated mathematical models, and they still stand. But today we would not hire a construction engineer who had not studied the appropriate mathematical techniques. Why should we make things different for software engineering, and deprive practitioners from the benefits of solid, well-accepted theory?  

As a modest first step, there is no excuse, ever, for publishing a loop without the basic evidence of its adequacy: the loop invariant.

References

[1] Bertrand Meyer: Touch of Class: Learning to Program Well, Using Objects and Contracts, Springer, 2009. See course page (English version) here.

[2] Course slides on control structures,  here in PowerPoint (or here in PDF, without the animation); see example starting on page 51, particularly the animation on page 54. More recent version in German here (and in PDF here), animation on page 60.

[3] For balance I ran the search using Qrobe, which combines results from Ask, Bing and Google.

[4] Richard Bornat, Proving Pointer Programs in Hoare Logic, in  MPC ’00, 5th International Conference on Mathematics of Program Construction, 2000, available here.

[5] Bertrand Meyer, Verification as a Matter of Course, a post on this blog.

VN:F [1.9.10_1130]
Rating: 10.0/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

The rise of empirical software engineering (I): the good news

 

RecycledIn the next few days I will post a few comments about a topic of particular relevance to the future of our field: empirical software engineering. I am starting by reposting two entries originally posted in the CACM blog. Here is the first. Let me use this opportunity to mention the LASER summer school [1] on this very topic — it is still possible to register.

Empirical software engineering papers, at places like ICSE (the International Conference on Software Engineering), used to be terrible.

There were exceptions, of course, most famously papers by Basili, Zelkowitz, Rombach, Tichy, Berry, Humphrey, Gilb, Boehm, Lehmann, Belady and a few others, who kept hectoring the community about the need to base our opinions and practices on evidence rather than belief. But outside of these cases the typical ICSE empirical paper — I sat through a number of them — was depressing: we made these measurements in our company, found these results, just believe us. A question here in the back? Can you reproduce our results? Access our code? We’d love you to, but unfortunately we work for a company — the Call for Papers said industry contributions were welcome, didn’t it? — and we can’t give you the details. So sorry. But trust us, we checked our results.

Actually, there was another kind of empirical paper, which did not suffer from such secrecy: the university study. Hi, I am professor Bright, the well-known author of the Bright method of software development. Everyone knows it’s the best, but we wanted to assess it scientifically through a rigorous empirical study. I gave the same programming problem to two groups of third-year undergraduates; one group was told to use the Bright method, the other not. Guess what? The Bright group performed 67.94% better! I see the session chair wanting to move to the next speaker; see the details in the paper.

For years, this was most of what we had: unverifiable industry reports and unconvincing student experiments.

And suddenly the scene has changed. Empirical software engineering studies are in full bloom; the papers are flowing, and many are good!

What triggered this radical change is the availability of open-source repositories. Projects such as Linux, Eclipse, Apache, EiffelStudio and many others have records going back 10, 15, sometimes 20 years. These records contain the true history of the project: commits (into the configuration management system), bug reports, bug fixes, test runs and their results, developers involved, and many more elements of project data. All of a sudden empirical research has what any empirical science needs: a large corpus of objects to analyze.

Open-source projects have given the decisive jolt, but now we can rely on industrial data as well: Microsoft and other companies have started making their own records selectively available to researchers. In the work of authors such as Zeller from Sarrebruck, Gall from Uni. Zurich or Nagappan from Microsoft, systematic statistical techniques yield answers, sometimes surprising, to questions on which we could only speculate. Do novices or experts cause more bugs? Does test coverage correlate with software quality, and if so, positively or negatively? Little by little, we are learning about the true properties of software products and processes, based not on fantasies but on quantitative analysis of meaningful samples.

The trend is unmistakable, and irreversible.

Not all is right yet; in the second installment of this post I will describe some of what still needs to be improved for empirical software engineering to achieve full scientific rigor.

Reference

[1] LASER summer school 2010, at http://se.ethz.ch/laser.

VN:F [1.9.10_1130]
Rating: 4.5/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 4 votes)

Another DOSE of distributed software development

The software world is not flat; it is multipolar. Gone are the days of one-site, one-team developments. The increasingly dominant model today is a distributed team; the place where the job gets done is the place where the appropriate people reside, even if it means that different parts of the job get done in different places.

This new setup, possibly the most important change to have affected the practice of software engineering in this early part of the millennium,  has received little attention in the literature; and even less in teaching techniques. I got interested in the topic several years ago, initially by looking at the phenomenon of outsourcing from a software engineering perspective [1]. At ETH, since 2004, Peter Kolb and I, aided by Martin Nordio and Roman Mitin, have taught a course on the topic [2], initially called “software engineering for outsourcing”. As far as I know it was the first course of its kind anywhere; not the first course about outsourcing, but the first to explore the software engineering implications, rather than business or political issues. We also teach an industry course on the same issues [3], attended since 2005 by several hundred participants, and started, with Mathai Joseph from Tata Consulting Services, the SEAFOOD conference [4], Software Engineering Advances For Outsourced and Offshore Development, whose fourth edition starts tomorrow in Saint Petersburg.

After a few sessions of the ETH course we realized that the most important property of the mode of software development explored in the course is not that it involves outsourcing but that it is distributed. In parallel I became directly involved with highly distributed development in the practice of Eiffel Software’s development. In 2007 we renamed the ETH course “Distributed and Outsourced Software Engineering” (DOSE) to acknowledge the broadened scope. The topic is still new; each year we learn a little more about what to teach and how to teach it.

The 2007 session saw another important addition. We felt it was no longer sufficient to talk about distributed development, but that students should practice it. Collaboration between groups in Zurich and other groups in Zurich was not good enough. So we contacted colleagues around the world interested in similar issues, and received an enthusiastic response. The DOSE project is itself distributed: teams from students in different universities collaborate in a single development. Typically, we have two or three geographically distributed locations in each project group. The participating universities have been Politecnico di Milano (where our colleagues Carlo Ghezzi and Elisabetta di Nitto have played a major role in the current version of the project), University of Nijny-Novgorod in Russia, University of Debrecen in Hungary, Hanoi University of Technology in Vietnam, Odessa National Polytechnic in the Ukraine and (across town for us) University of Zurich. For the first time in 2010 a university from the Western hemisphere will join: University of Rio Cuarto in Argentina.

We have extensively studied how the projects actually fare (see publications [4-8]). For students, the job is hard. Often, after a couple of weeks, many want to give up: they have trouble reaching their partner teams, understanding their accents on Skype calls, agreeing on modes of collaboration, finalizing APIs, devising a proper test plan. Yet they hang on and, in most cases, succeed. At the end of the course they tell us how much they have learned about software engineering. For example I know few better way of teaching the importance of carefully documented program interfaces — including contracts — than to ask the students to integrate their modules with code from another team halfway around the globe. This is exactly what happens in industrial software development, when you can no longer rely on informal contacts at the coffee machine or in the parking lot to smooth out misunderstandings: software engineering principles and techniques come in full swing. With DOSE, students learn and practice these fundamental techniques in the controlled environment of a university project.

An example project topic, used last year, was based on an idea by Martin Nordio. He pointed out that in most countries there are some card games played in that country only. The project was to program such a game, where the team in charge of the game logic (what would be the “business model” in an industrial project) had to explain enough of their country’s game, and abstractly enough, to enable the other team to produce the user interface, based on a common game engine started by Martin. It was tough, but some of the results were spectacular, and these are students who will not need more preaching on the importance of specifications.

We are currently preparing the next session of DOSE, in collaboration with our partner universities. The more the merrier: we’d love to have other universities participate, including from the US. Adding extra spice to the project, the topic will be chosen among those from the ICSE SCORE competition [9], so that winning students have the opportunity to attend ICSE in Hawaii. If you are teaching a suitable course, or can organize a student group that will fit, please read the project description [10] and contact me or one of the other organizers listed on the page. There is a DOSE of madness in the idea, but no one, teacher or student,  ever leaves the course bored.

References

[1] Bertrand Meyer: Offshore Development: The Unspoken Revolution in Software Engineering, in Computer (IEEE), January 2006, pages 124, 122-123. Available here.

[2] ETH course page: see here for last year’s session (description of Fall 2010 session will be added soon).

[3] Industry course page: see here for latest (June 2010( session (description of November 2010 session will be added soon).

[4] SEAFOOD 2010 home page.

[5] Bertrand Meyer and Marco Piccioni: The Allure and Risks of a Deployable Software Engineering Project: Experiences with Both Local and Distributed Development, in Proceedings of IEEE Conference on Software Engineering & Training (CSEE&T), Charleston (South Carolina), 14-17 April 2008, ed. H. Saiedian, pages 3-16. Preprint version  available online.

[6] Bertrand Meyer:  Design and Code Reviews in the Age of the Internet, in Communications of the ACM, vol. 51, no. 9, September 2008, pages 66-71. (Original version in Proceedings of SEAFOOD 2008 (Software Engineering Advances For Offshore and Outsourced Development,  Lecture Notes in Business Information Processing 16, Springer Verlag, 2009.) Available online.

[7] Martin Nordio, Roman Mitin, Bertrand Meyer, Carlo Ghezzi, Elisabetta Di Nitto and Giordano Tamburelli: The Role of Contracts in Distributed Development, in Proceedings of SEAFOOD 2009 (Software Engineering Advances For Offshore and Outsourced Development), Zurich, June-July 2009, Lecture Notes in Business Information Processing 35, Springer Verlag, 2009. Available online.

[8] Martin Nordio, Roman Mitin and Bertrand Meyer: Advanced Hands-on Training for Distributed and Outsourced Software Engineering, in ICSE 2010: Proceedings of 32th International Conference on Software Engineering, Cape Town, May 2010, IEEE Computer Society Press, 2010. Available online.

[9] ICSE SCORE 2011 competition home page.

[10] DOSE project course page.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

From programming to software engineering: ICSE keynote slides available

In response to many requests, I have made available [1] the slides of my education keynote at ICSE earlier this month. The theme was “From programming to software engineering: notes of an accidental teacher”. Some of the material has been presented before, notably at the Informatics Education Europe conference in Venice in 2009. (In research you can give a new talk every month, but in education things move at a more senatorial pace.) Still, part of the content is new. The talk is a summary of my experience teaching programming and software engineering at ETH.

The usual caveats apply: these are only slides (I did not write a paper), and not all may be understandable independently of the actual talk.

Reference

[1] From programming to software engineering: notes of an accidental teacher, slides from a keynote talk at ICSE 2010.

VN:F [1.9.10_1130]
Rating: 7.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

The other impediment to software engineering research

In the decades since structured programming, many of the advances in software engineering have come out of non-university sources, mostly of four kinds:

  • Start-up technology companies  (who played a large role, for example, in the development of object technology).
  • Industrial research labs, starting with Xerox PARC and Bell Labs.
  • Independent (non-university-based) author-consultants. 
  • Independent programmer-innovators, who start open-source communities (and often start their own businesses after a while, joining the first category).

 Academic research has had its part, honorable but limited.

Why? In earlier posts [1] [2] I analyzed one major obstacle to software engineering research: the absence of any obligation of review after major software disasters. I will come back to that theme, because the irresponsible attitude of politicial authorities hinders progress by depriving researchers of some of their most important potential working examples. But for university researchers there is another impediment: the near-impossibility of developing serious software.

If you work in theory-oriented parts of computer science, the problem is less significant: as part of a PhD thesis or in preparation of a paper you can develop a software prototype that will support your research all the way to the defense or the publication, and can be left to wither gracefully afterwards. But software engineering studies issues that arise for large systems, where  “large” encompasses not only physical size but also project duration, number of users, number of changes. A software engineering researcher who only ever works on prototypes will be denied the opportunity to study the most significant and challenging problems of the field. The occasional consulting job is not a substitute for this hands-on experience of building and maintaining large software, which is, or should be, at the core of research in our field.

The bodies that fund research in other sciences understood this long ago for physics and chemistry with their huge labs, for mechanical engineering, for electrical engineering. But in computer science or any part of it (and software engineering is generally viewed as a subset of computer science) the idea that we would actually do something , rather than talk about someone else’s artifacts, is alien to the funding process.

The result is an absurd situation that blocks progress. Researchers in experimental physics or mechanical engineering employ technicians: often highly qualified personnel who help researchers set up experiments and process results. In software engineering the equivalent would be programmers, software engineers, testers, technical writers; in the environments that I have seen, getting financing for such positions from a research agency is impossible. If you have requested a programmer position as part of a successful grant request, you can be sure that this item will be the first to go. Researchers quickly understand the situation and learn not even to bother including such requests. (I have personally never seen a counter-example. If you have a different experience, I will be interested to learn who the enlightened agency is. )

The result of this attitude of funding bodies is a catastrophe for software engineering research: the only software we can produce, if we limit ourselves to official guidelines, is demo software. The meaningful products of software engineering (large, significant, usable and useful open-source software systems) are theoretically beyond our reach. Of course many of us work around the restrictions and do manage to produce working software, but only by spending considerable time away from research on programming and maintenance tasks that would be far more efficiently handled by specialized personnel.

The question indeed is efficiency. Software engineering researchers should program as part of their normal work:  only by writing programs and confronting the reality of software development can we hope to make relevant contributions. But in the same way that an experimental physicist is helped by professionals for the parts of experimental work that do not carry a research value, a software engineering researcher should not have to spend time on porting the software to other architectures, performing configuration management, upgrading to new releases of the operating system, adapting to new versions of the libraries, building standard user interfaces, and all the other tasks, largely devoid of research potential, that software-based innovation requires.

Until  research funding mechanisms integrate the practical needs of software engineering research, we will continue to be stymied in our efforts to produce a substantial effect on the quality of the world’s software.

References

[1] The one sure way to advance software engineering: this blog, see here.
[2] Dwelling on the point: this blog, see here.

VN:F [1.9.10_1130]
Rating: 8.3/10 (18 votes cast)
VN:F [1.9.10_1130]
Rating: +6 (from 8 votes)

Verification As a Matter Of Course

At the ACM Symposium on Applied Computing (SAC) in Sierre last week, I gave a talk entitled “How you will be programming in 10 years”, describing a number of efforts by various people, with a special emphasis on our work at both ETH and Eiffel Software, which I think point to the future of software development. Several people have asked me for the slides, so I am making them available [1].

It occurred to me after the talk that the slogan “Verification As a Matter Of Course” (VAMOC) characterizes the general idea well. The world needs verified software, but the software development community is reluctant  to use traditional heavy-duty verification techniques. While some of the excuses are unacceptable, others sources of resistance are justified and it is our job to make verification part of the very fabric of everyday software development.

My bet, and the basis of large part of both Eiffel and the ETH verification work, is that it is possible to bring verification to practicing developers as a natural, unobtrusive component of the software development process, through the tools they use.

The talk also broaches on concurrency, where many of the same ideas apply; CAMOC is the obvious next slogan.

Reference

[1] Slides of “How you will be programming in 10 years” talk (PDF).

VN:F [1.9.10_1130]
Rating: 8.8/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

The theory and calculus of aliasing

In a previous post I briefly mentioned some work that I am doing on aliasing. There is a draft paper [1], describing the theory, calculus, graphical notation (alias diagrams) and implementation. Here I will try to give an idea of what it’s about, with the hope that you will be intrigued enough to read the article. Even before you read it you might try out the implementation[2], a simple interactive interface with all the examples of the article .

What the article does not describe in detail — that will be for a companion paper— is how the calculus will be used as part of a general framework for developing object-oriented software proved correct from the start, the focus of our overall  “Trusted Components” project’ [3]. Let me simply state that the computation of aliases is the key missing step in the effort to make correctness proofs a standard part of software development. This is a strong claim which requires some elaboration, but not here.

The alias calculus asks a simple question: given two expressions representing references (pointers),  can their values, at a given point in the program, ever reference the same object during execution?

As an example  application, consider two linked lists x and y, which can be manipulated with operations such as extend, which creates a new cell and adds it at the end of the list:

lists

 The calculus makes it possible to prove that if  x and y are not aliased to each other, then none of the pointers in any of the cells in either of the lists can point to  (be aliased to) any cell of  the other. If  x is initially aliased to y, the property no longer holds. You can run the proof (examples 18 and 19) in the downloadable implementation.

The calculus gives a set of rules, each applying to a particular construct of the language, and listed below.

The rule for a construct p is of the form

          a |= p      =   a’
 

where a and a’ are alias relations; this states that executing p  in a state where the alias relation is a will yield the alias relation a’ in the resulting state. An alias relation is a symmetric, irreflexive relation; it indicates which expressions and variables can be aliased to each other in a given state.

The constructs p considered in the discussion are those of a simplified programming language; a modern object-oriented language such as Eiffel can easily be translated into that language. Some precision will be lost in the process, meaning that the alias calculus (itself precise) can find aliases that would not exist in the original program; if this prevents proofs of desired properties, the cut instruction discussed below serves to correct the problem.

The first rule is for the forget instruction (Eiffel: x := Void):

          a |= forget x       =   a \- {x}

where the \- operator removes from a relation all the elements belonging to a given set A. In the case of object-oriented programming, with multidot expressions x.y.z, the application of this rule must remove all elements whose first component, here x, belongs to A.

The rule for creation is the same as for forget:

         a |= create x          =   a \- {x}

The two instructions have different semantics, but the same effect on aliasing.

The calculus has a rule for the cut instruction, which removes the connection between two expressions:

        a |= cut x, y       =   a — <x, y>

where is set difference and <x, y> includes the pairs [x, y] and [y, x] (this is a special case of a general notation defined in the article, using the overline symbol). The cut   instruction corresponds, in Eiffel, to cut   x /=end:  a hint given to the alias calculus (and proved through some other means, such as standard axiomatic semantics) that some references will not be aliased.

The rule for assignment is

      a |= (x := y)      =   given  b = a \- {x}   then   <b È {x} x (b / y)}> end

where b /y (“quotient”), similar to an equivalence class in an equivalence relation, is the set of elements aliased to y in b, plus y itself (remember that the relation is irreflexive). This rule works well for object-oriented programming thanks to the definition of the \- operator: in x := x.y, we must not alias x to x.y, although we must alias it to any z that was aliased to x.y.

The paper introduces a graphical notation, alias diagrams, which makes it possible to reason effectively about such situations. Here for example is a diagram illustrating the last comment:

Alias diagram for a multidot assignment

Alias diagram for a multidot assignment

(The grayed elements are for explanation and not part of the final alias relation.)

For the compound instruction, the rule is:

           a |= (p ;  q)      =   (a |= p) |= q)

For the conditional instruction, we get:

           a |= (then p else  q end)      =   (a |= p) È  (a |= q)

Note the form of the instruction: the alias calculus ignores information from the then clause present in the source language. The union operator is the reason why  alias relations,  irreflexive and symmetric, are not  necessarily transitive.

The loop instruction, which also ignores the test (exit or continuation condition), is governed by the following rule:

           a |= (loop p end)       =   tN

where span style=”color: #0000ff;”>N is the first value such that tN = tN+1 (in other words, tN is the fixpoint) in the following sequence:

            t0          =    a
           tn+1       =   (tn È (tn |= p))     

The existence of a fixpoint and the correctness of this formula to compute it are the result of a theorem in the paper, the “loop aliasing theorem”; the proof is surprisingly elaborate (maybe I missed a simpler one).

For procedures, the rule is

         a |= call p        =   a |= p.body

where p.body is the body of the procedure. In the presence of recursion, this gives rise to a set of equations, whose solution is the fixpoint; again a theorem is needed to demonstrate that the fixpoint exists. The implementation directly applies fixpoint computation (see examples 11 to 13 in the paper and implementation).

The calculus does not directly consider routine arguments but treats them as attributes of the corresponding class; so a call is considered to start with assignments of the form f : = a for every pair of formal and actual arguments f and a. Like the omission of conditions in loops and conditionals, this is a source of possible imprecision in translating from an actual programming language into the calculus, since values passed to recursive activations of the same routine will be conflated.

Particularly interesting is the last rule, which generalizes the previous one to qualified calls of the form x. f (…)  as they exist in object-oriented programming. The rule involves the new notion of inverse variable, written x’ where x is a variable. Laws of the calculus (with Current denoting the current object, one of the fundamental notions of object-oriented programming) are

        Current.x            = x   
        x.Current            = x
        x.x’                      = Current
        x’.x                      = Current

In other words, Current plays the role of zero element for the dot operator, and variable inversion is the inverse operation. In a call x.f, where x denotes the supplier object (the target of the call), the inverse variable provides a back reference to the client object (the caller), indispensable to interpret references in the original context. This property is reflected by the qualified client rule, which uses  the auxiliary operator n (where x n a, for a relation a and a variable x, is the set of pairs [x.u, y.v] such that the pair [u, v] is in a). The rule is:

         a |= call x.r       =   x n ((x’ n a ) |= call r)

You need to read the article for the full explanation, but let me offer the following quote from the corresponding section (maybe you will note a familiar turn of phrase):

Thus we are permitted to prove that the unqualified call creates certain aliasings, on the assumption that it starts in its own alias environment but has access to the caller’s environment through the inverted variable, and then to assert categorically that the qualified call has the same aliasings transposed back to the original environment. This change of environment to prove the unqualified property, followed by a change back to the original environment to prove the qualified property, explains well the aura of magic which attends a programmer’s first introduction to object-oriented programming.

I hope you will enjoy the calculus and try the examples in the implementation. It is fun to apply, and there seems to be no end to the potential applications.

Reference

[1] Bertrand Meyer: The Theory and Calculus of Aliasing, draft paper, first published 12 January 2009 (revised 21 January 2010), available here and also at arxiv.org.
[2] Implementation (interactive version to try all the examples of the paper): downloadable Windows executable here.
[3] Bertrand Meyer: The Grand Challenge of Trusted Components, in 2003 International Conference on Software Engineering, available here.

VN:F [1.9.10_1130]
Rating: 7.9/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

Just another day at the office

In the past few weeks I wrote a program to compute the aliases of variables and expressions in an object-oriented program (based on a new theory [1]).

For one of the data structures, I needed a specific notion of equality, so I did the standard thing in Eiffel: redefine the is_equal function inherited from the top class ANY, to implement the desired variant.

The first time I ran the result, I got a postcondition violation. The violated postcondition clauses was not even any that I wrote: it was an original postcondition of is_equal (other: like Current)  in ANY, which my redefinition inherited as per the rules of Design by Contract; it reads

symmetric: Result implies other ~ Current

meaning: equality is symmetric, so if Result is true, i.e. the Current object is equal to other, then other must also be equal to Current. (~ is object equality, which applies the local version is is_equal).  What was I doing wrong? The structure is a list, so the code iterates on both the current list and the other list:

from
    start ; other.start ; Result := True
until (not Result) or after loop
        if other.after then Result := False else
              Result := (item ~ other.item)
              forth ; other.forth
        end
end

Simple enough: at each position check whether the item in the current list is equal to the item in the other list, and if so move forth in both the current list and the other one; stop whenever we find two unequal elements, or we exhaust either list as told by after list. (Since is_equal is a function and not produce any side effect, the actual code saves the cursors before the iteration and restores them afterwards. Thanks to Ian Warrington for asking about this point in a comment to this post. The new across loop variant described in  two later postings uses external cursors and manages them automatically, so this business of maintaining the cursor manually goes away.)

The problem is that with this algorithm it is possible to return True if the first list was exhausted but not the second, so that the first list is a subset of the other rather than identical. The correction is immediate: add

Result and other_list.after

after the loop; alternatively, enclose the loop in a conditional so that it is only executed if count = other.count (this solution is  better since it saves much computation in cases of lists of different sizes, which cannot be equal).

The lesson (other than that I need to be more careful) is that the error was caught immediately, thanks to a postcondition violation — and one that I did not even have to write. Just another day at the office; and let us shed a tear for the poor folks who still program without this kind of capability in their language and development environment.

Reference

[1] Bertrand Meyer: The Theory and Calculus of Aliasing, draft paper, available here.

VN:F [1.9.10_1130]
Rating: 9.6/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Touch of Class book page available

The book page for Touch of Class (my introductory programming textbook), announced in the book, is finally available, courtesy Vladimir Tochilin:

touch.ethz.ch

It includes some book extracts (prefaces, table of contents, an entire sample chapter, for which I chose the Recursion chapter), a list of known errata and a wiki page to report new errata, a discussion forum, links to the full set of slides (PowerPoint, PDF) for the associated course, video recordings of that course at ETH, and a special “instructor’s corner” for those having adopted the textbook for their courses.

VN:F [1.9.10_1130]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Dwelling on the point

Once again, and we are not learning!

La Repubblica of last Thursday [1] and other Italian newspapers have reported on a “computer” error that temporarily brought thousands of accounts at the national postal service bank into the red. It is a software error, due to a misplacement of the decimal points in some transactions.

As usual the technical details are hazy; La Repubblica writes that:

Because of a software change that did not succeed, the computer system did not always read the decimal point during transactions”.

As a result, it could for example happen that a 15.00-euro withdrawal was understood as 1500 euros.
I have no idea what “reading the decimal point ” means. (There is no mention of OCR, and the affected transactions seem purely electronic.) Only some of the 12 million checking or “Postamat” accounts were affected; the article cites a number of customers who could not withdraw money from ATMs because the system wrongly treated their accounts as over-drawn. It says that this was the only damage and that the postal service will send a letter of apology. The account leaves many questions unanswered, for example whether the error could actually have favored some customers, by allowing them to withdraw money they did not have, and if so what will happen.

The most important unanswered question is the usual one: what was the software error? As usual, we will probably never know. The news items will soon be forgotten, the postal service will somehow fix its code, life will go on. Nothing will be learned; the next time around similar causes will produce similar effects.

I criticized this lackadaisical attitude in an earlier column [2] and have to hammer its conclusion again: any organization using public money should be required, when it encounters a significant software malfunction, to let experts investigate the incident in depth and report the results publicly. As long as we keep forgetting our errors we will keep repeating them. Where would airline safety be in the absence of thorough post-accident reports? That a software error did not kill anyone is not a reason to ignore it. Whether it is the Italian post messing up, a US agency’s space vehicle crashing on the moon or any other software fault causing systems to fail, it is not enough to fix the symptoms: we must have a professional report and draw the lessons for the future.

Reference

[1] Luisa Grion: Poste in tilt per una virgola — conti gonfiati, stop ai prelievi. In La Repubblica, 26 November 2009, page 18 of the print version. (At the time of writing it does not appear at repubblica.it,  but see  the TV segment also titled “Poste in tilt per una virgola” on Primocanale Web TV here, and other press articles e.g. in Il Tempo here.)

[2] On this blog: The one sure way to advance software engineering (post of 21 August 2009).

VN:F [1.9.10_1130]
Rating: 9.5/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

Knuth & company

Remember Stacey’s in Palo Alto and San Francisco? Only a decade ago these and other technical bookstores were the mecca of the tech industry, where developers would swarm at any time of day to catch up on the latest releases. One well-known Valley entrepreneur even told me she did her hiring there, spotting customers who looked like developers and picked the right books.

Times have changed. After all the other branches, Stacey’s venerable San Francisco’s store finally closed earlier this year [1], the victim of the bubble’s burst, of Amazon, and of the crisis. Many others in the US and elsewhere met the same fate.

But wait… In Gaul, a little village is resisting the onslaught. A couple of streets away from the Shakespeare and company bookstore immortalized by Hemingway, Scott Fitzgerald and so many others, the Paris technical bookstore Le Monde en Tique [3] is a true legend of its own. Le Monde en Tique has, for the past twenty years, carried the best selection of computer and other technology books within a good thousand kilometers. I was there on Oct. 10, after the European Computer Science Summit (more on ECSS soon) for a book signing of Touch of Class:

Holding books

The store’s name, “the World in Tics”, is a wink to the many “tics” served by its books, from informatics (computer science and information technology) to “telematics” (computer  communication). Housed in a beautiful stone edifice in the heart of medieval Paris, on a little winding street close to the river, Le Monde en Tique is more than a bookstore: it is a haven for the local developer community, a place to stop by on a Saturday afternoon for a passionate discussion on agility, Ajax or Agitar. There is even a small garden:

The garden

There is a secret to Le Monde en Tique’s continued success: the quality of its offerings. The unique skill of  Jean Demétreau and his team is their availability to locate hot new books, including those from the US and elsewhere, before anyone else does; the large Paris bookstores like FNAC, and the Web sites such as fnac.fr and amazon.fr are often several weeks behind. Not just the French sites, as a matter of fact: in the case of Touch of Class, Le Monde en Tique had the book about a month ahead of amazon.com USA. This is why people come from very far away to get both the latest offerings and the classics.

So here’s my plug: whether you have just heard about a great new book, or haven’t heard anything and want to find out what is the latest great new book, this is the place to go. It might already be late when you finally take yourself away from browsing the shelves; but as you go out of the store and walk a few steps, the sight will not be too bad:

Notre Dame on an October evening

References

[1] Matthai Kuruvila, Stacey’s Bookstore closing down in S.F., in SFGate (San Francisco Chronicle), 5 January 2009, available here.

[2] Shakespeare and company bookstore in Paris: see here.

[3] Le Monde en Tique bookstore, 6 rue Maître Albert, Paris: see www.lmet.fr.

VN:F [1.9.10_1130]
Rating: 9.5/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

The great programming haiku competition

In a few weeks I will be teaching again my Introductory Programming course at ETH, based for the first time on the published “Touch of Class” textbook [1]. For fun (mine if no one else’s) every lecture will conclude with a haiku summarizing the topic.

I made up a few, given below, and am opening a competition for more. Every proposal should be submitted in the form of a comment to this post. Every winner’s haiku and name will appear in the course slides, and in the special Programming Haiku page which will be added to the book’s site. There are four rules:

  • The contribution has to be a proper haiku: “three unrhymed lines of five, seven, and five syllables”.
  • It must summarize the principal concept of a chapter or main section of the textbook or, better yet, of one of the course’s lectures; see [2] for the lecture plan.
  • It must give the book reference (chapter or section) or lecture number or both.
  • The prize committee’s members are secret and its judgments final.

Here, for a start, are my own examples.

Proof of the undecidability of the halting problem

Section 7.5 of the book; lecture 5.2.

If it stops, it loops,
Yet if it looped, it would stop.
Sad contradiction.

Recursion

Chapter 14, especially section 14.3; lecture 9.1.


Often, I call you.
But when the going gets tough,
I will call myself.

Topological sort

Chapter 15; lectures 11.1 and 11.2.


Partial to total?
With the right data structures,
O of m plus n.

Dynamic binding

Section 16.3; lecture 8.1.


O-O programmers:
How many to screw a bulb?
None whatsoever.

Deferred classes

Section 16.5; lecture 8.1.


Do not implement!
Though for a truly Zen spec
You need a contract.

References

[1] Touch of Class: An Introduction to Programming Well Using Objects and Contracts, Springer Verlag, 2009. See Amazon page (still wrongly says the book is not yet published).

[2] “Introduction to Programming” course at ETH Zurich, Fall 2009: course page. This does not have the slides yet, but you can see last year’s slides in last year’s page.

VN:F [1.9.10_1130]
Rating: 10.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Rejection letter classic

Part of the experience of being a scientist, in the industrial age of publication, is the rejection letter; especially the damning review whose author, anonymous of course, does not appear particularly competent. I have my own treasured collection, which I will publish one day. For a fiction so artfully designed as to be almost as good as the real thing, you can check  Simone Santini’s hilarious parody [1], a true classic.

Although there are a few references to it around the Web, I do not think it is as well known as it deserves to be. What Santini did was to imagine rejection letters for famous papers. He stated [2] that:

The reviews are a collage of reviews that I have seen of some papers (mine and of other people) that have been rejected because, I thought, the reviewer had completely misunderstood the paper. After a rejection at a database conference for what I thought were completely preposterous reasons, I had the idle thought that today even Codd’s paper on relational data bases (the foundation of the whole field) would never make it into a major data base conference…Many of the sentences that I use in the article are from actual reviews.

A sample from the imaginary Codd rejection letter:

E.F. CODD “A Relational Model of Data for Large Shared Data Banks.”  … The formalism is needlessly complex and mathematical, using concepts and notation with which the average data bank practitioner is unfamiliar. The paper doesn’t tell us how to translate its arcane operations into executable block access.

Adding together the lack of any real-world example, performance experiment, and implementation indication or detail, we are left with an obscure exercise using unfamiliar mathematics and of little or no practical consequence. It can be safely rejected.

All the others are gems too: Turing’s Entscheidungsproblem paper (“If the article is accepted, Turing should remember that the language of this journal is English and change the title accordingly”); Dijstra’s Goto considered harmful; Hoare’s 1969 axiomatic semantics paper (the author “should also extend the method to be applicable to a standard programming language such as COBOL or PL/I and provide the details of his implementation, possibly with a few graphics to show how the system works in practice”) etc.

To avoid a spoiler I will  cite no more;  you should read the paper if you do not know it yet. It rings so true.

References

[1] Simone Santini: We Are Sorry to Inform You …, in IEEE Computer, vol. 38, no. 12, pp. 128, 126-127, December 2005,  online on the IEEE site. There is also a copy here.

 [2] http://www.omlettesoft.com/newjournal.php3?who=Lord+Omlette&id=1134629858.

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

“Touch of Class” published

My textbook Touch of Class: An Introduction to Programming Well Using Objects and Contracts [1] is now available from Springer Verlag [2]. I have been told of many bookstores in Europe that have it by now; for example Amazon Germany [3] offers immediate delivery. Amazon US still lists the book as not yet published [4], but I think this will be corrected very soon.

touch_of_class

The book results from six years of teaching introductory programming at ETH Zurich. It is richly illustrated in full color (not only with technical illustrations but with numerous photographs of people and artefacts). It is pretty big, but designed so that a typical one-semester introductory course can cover most of the material.

Many topics are addressed (see table of contents below), including quite a few that are seldom seen at the introductory level. Some examples, listed here in random order: a fairly extensive introduction to software engineering including things like requirements engineering (not usually mentioned in programming courses, with results for everyone to see!) and CMMI, a detailed discussion of how to implement recursion, polymorphism and dynamic binding and their role for software architecture, multiple inheritance, lambda calculus (at an introductory level of course), a detailed analysis of the Observer and Visitor patterns, event-driven programming, the lure and dangers of references and aliasing, topological sort as an example of both algorithm and API design, high-level function closures, software tools, properties of computer hardware relevant for programmers, undecidability etc.

The progression uses an object-oriented approach throughout; the examples are in Eiffel, and four appendices present the details of Java, C#, C++ and C. Concepts of Design by Contract and rigorous development are central to the approach; for example, loops are presented as a technique for computing a result by successive approximation, with a central role for the concept of loop invariant. This is not a “formal methods” book in the sense of inflicting on the students a heavy mathematical apparatus, but it uses preconditions, postconditions and invariants throughout to alert them to the importance of reasoning rigorously about programs. The discussion introduces many principles of sound design, in line with the book’s subtitle, “Learning to Program Well”.

The general approach is “Outside-In” (also known as “Inverted Curriculum” and described at some length in some of my articles, see e.g. [5]): students have, right from the start, the possibility of working with real software, a large (150,000-line) library that has been designed specifically for that purpose. Called Traffic, this library simulates traffic in a city; it is graphical and of good enough visual quality to be attractive to today’s “Wii generation” students, something that traditional beginners’ exercises, like computing the 7-th Fibonacci number, cannot do (although we have these too as well). Using the Traffic software through its API, students can right from the first couple of weeks produce powerful applications, without understanding the internals of the library. But they do not stop there: since the whole thing is available in open source, students learn little by little how the software is made internally. Hence the name “Outside-In”: understand the interface first, then dig into the internals. Two advantages of the approach are particularly worth noting:

  • It emphasizes the value of abstraction, and particular contracts, not by preaching but by showing to students that abstraction helps them master a large body of professional-level software, doing things that would otherwise be unthinkable at an introductory level.
  • It addresses what is probably today the biggest obstacle to teaching introductory programming: the wide diversity of initial student backgrounds. The risk with traditional approaches is either to fly too high and lose the novices, or stay too low and bore those who already have programming experience. With the Outside-In method the novices can follow the exact path charted from them, from external API to internal implementation; those who already know something about programming can move ahead of the lectures and start digging into the code by themselves for information and inspiration.

(We have pretty amazing data on students’ prior programming knowledge, as  we have been surveying students for the past six years, initially at ETH and more recently at the University of York in England thanks to our colleague Manuel Oriol; some day I will post a blog entry about this specific topic.)

The book has been field-tested in its successive drafts since 2003 at ETH, for the Introduction to Programming course (which starts again in a few weeks, for the first time with the benefit of the full text in printed form). Our material, such as a full set of slides, plus exercises, video recordings of the lectures etc. is available to any instructor selecting the text. I must say that Springer did an outstanding job with the quality of the printing and I hope that instructors, students, and even some practitioners already in industry will like both form and content.

Table of contents

Front matter: Community resource, Dedication (to Tony Hoare and Niklaus Wirth), Prefaces, Student_preface, Instructor_preface, Note to instructors: what to cover?, Contents

PART I: Basics
1 The industry of pure ideas
2 Dealing with objects
3 Program structure basics
4 The interface of a class
5 Just Enough Logic
6 Creating objects and executing systems
7 Control structures
8 Routines, functional abstraction and information hiding
9 Variables, assignment and references
PART II: How things work
10 Just enough hardware
11 Describing syntax
12 Programming languages and tools
PART III: Algorithms and data structures
13 Fundamental data structures, genericity, and algorithm complexity
14 Recursion and trees
15 Devising and engineering an algorithm: Topological Sort
PART IV: Object-Oriented Techniques
16 Inheritance
17 Operations as objects: agents and lambda calculus
18 Event-driven design
PART V: Towards software engineering
19 Introduction to software engineering
PART VI: Appendices
A An introduction to Java (from material by Marco Piccioni)
B An introduction to C# (from material by Benjamin Morandi)
C An introduction to C++ (from material by Nadia Polikarpova)
D From C++ to C
E Using the EiffelStudio environment
Picture credits
Index

References

[1] Bertrand Meyer, Touch of Class: An Introduction to Programming Well Using Objects and Contracts, Springer Verlag, 2009, 876+lxiv pages, Hardcover, ISBN: 978-3-540-92144-8.

[2] Publisher page for [1]: see  here. List price: $79.95. (The page says “Ships in 3 to 4 weeks” but I think this is incorrect as the book is available; I’ll try to get the mention corrected.)

[3] Amazon.de page: see here. List price: EUR 53.45 (with offers starting at EUR 41.67).

[4] Amazon.com page: see here. List price: $63.96.

[5] Michela Pedroni and Bertrand Meyer: The Inverted Curriculum in Practice, in Proceedings of SIGCSE 2006, ACM, Houston (Texas), 1-5 March 2006, pages 481-485; available online.

VN:F [1.9.10_1130]
Rating: 7.4/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

One cheer for incremental research

[Note: an updated version of this article (June 2011) appears in the Communications of the ACM blog.]

The world of research funding, always a little strange, has of late been prey to a new craze: paradigm-shift mania. We will only fund twenty curly-haired cranky-sounding visionaries in the hope that one of them will invent relativity. The rest of you — bit-players! Petty functionaries! Slaves toiling at incremental research!  — should be ashamed of even asking.

Take this from the US National Science Foundation’s current description of funding for Computer Systems Research [1]:

CSR-funded projects will enable significant progress on challenging high-impact problems, as opposed to incremental progress on familiar problems.

 The European Research Council is not to be left behind [2]:

Projects being highly ambitious, pioneering and unconventional

Research proposed for funding to the ERC should aim high, both with regard to the ambition of the envisaged scientific achievements as well as to the creativity and originality of proposed approaches, including unconventional methodologies and investigations at the interface between established disciplines. Proposals should rise to pioneering and far-reaching challenges at the frontiers of the field(s) addressed, and involve new, ground-breaking or unconventional methodologies, whose risky outlook is justified by the possibility of a major breakthrough with an impact beyond a specific research domain/discipline.

Frontiers! Breakthrough! Rise! Aim high! Creativity! Risk! Impact! Pass me the adjective bottle. Ground-breaking! Unconventional! Highly ambitious! Major! Far-reaching! Pioneering! Galileo and Pasteur only please — others need not apply.

As everyone knows including the people who write such calls, this is balderdash. First, 99.97% of all research (precise statistic derived from my own ground-breaking research, further funding welcome) is incremental. Second, when a “breakthrough” does happen — the remaining 0.03%  — it was often not planned as a breakthrough.

Incremental research is a most glorious (I have my own supply of adjectives) mode of doing science. Beginning PhD students can be forgiven for believing the myth of the lone genius who penetrates the secrets of time and space by thinking aloud during long walks with his best friend [3]; we all, at some stage, shared that delightful delusion. But every researcher, presumably including those who go on to lead research agencies,  quickly grows up and learns that it is not how things happen. You read someone else’s solution to a problem, and you improve on it. Any history of science will tell you that for every teenager who from getting hit by a falling apple intuits the structure of the universe there are hundreds of great researchers who look at the state of the art and decide they can do a trifle better.

Here is a still recent example, particularly telling because we have the account from the scientist himself. It would not be much of an exaggeration to characterize the entire field of program proving over the past four decades as a long series of variations on Tony Hoare’s 1969 Axiomatic Semantics paper [4]. Here Hoare’s recollection, from his Turing Award lecture [5]:

In October 1968, as I unpacked my papers in my new home in Belfast, I came across an obscure preprint of an article by Bob Floyd entitled “Assigning Meanings to Programs.” What a stroke of luck! At last I could see a way to achieve my hopes for my research. Thus I wrote my first paper on the axiomatic approach to computer programming, published in the Communications of the ACM in October 1969.

(See also note [6].) Had the research been submitted for funding, we can imagine the reaction: “Dear Sir, as you yourself admit, Floyd has had the basic idea [7] and you are just trying to present the result better. This is incremental research; we are in the paradigm-shift business.” And yet if Floyd had the core concepts right it is Hoare’s paper that reworked and extended them into a form that makes practical semantic specifications and proofs possible. Incremental research at its best.

The people in charge of research programs at the NSF and ERC are themselves scientists and know all this. How come they publish such absurd pronouncements? There are two reasons. One is the typical academic’s fascination with industry and its models. Having heard that venture capitalists routinely fund ten projects and expect one to succeed, they want to transpose that model to science funding; hence the emphasis on “risk”. But the transposition is doubtful because venture capitalists assess their wards all the time and, as soon as they decide a venture is not going to break out, they cut the funding overnight, often causing the company to go under. This does not happen in the world of science: most projects, and certainly any project that is supposed to break new ground, gets funded for a minimum of three to five years. If the project peters out, the purse-holder will only realize it after spending all the money.

The second reason is a sincere desire to avoid mediocrity. Here we can sympathize with the funding executives: they have seen too many “here is my epsilon addition to the latest buzzword” proposals. The last time I was at ECOOP, in 2005, it seemed every paper was about bringing some little twist to aspect-oriented programming. This kind of research benefits no one and it is understandable that the research funders want people to innovate. But telling submitters that every project has to be epochal (surprisingly, “epochal” is missing from the adjectives in the descriptions above  — I am sure this will soon be corrected) will not achieve this result.

It achieves something else, good neither for research nor for research funding: promise inflation. Being told that they have to be Darwin or nothing, researchers learn the game and promise the moon; they also get the part about “risk” and emphasize how uncertain the whole thing is and how high the likelihood it will fail. (Indeed, since — if it works — it will let cars run from water seamlessly extracted from the ambient air, and with the excedent produce free afternoon tea.)

By itself this is mostly entertainment, as no one believes the hyped promises. The real harm, however, is to honest scientists who work in the normal way, proposing to bring an important contribution to the  solution of an important problem. They risk being dismissed as small-timers with no vision.

Some funding agencies have kept their heads cool. How refreshing, after the above quotes, to read the general description of funding by the Swiss National Science Foundation [8]:

The central criteria for evaluation are the scientific quality, originality and project methodology as well as qualifications and track record of the applicants. Grants are awarded on a competitive basis.

In a few words, it says all there is to say. Quality, originality, methodology, and track record. Will the research be “ground-breaking” or “incremental”? We will know when it is done.

I am convinced that the other agencies will come to their senses and stop the paradigm-shift nonsense. One reason for hope is in the very excesses of the currently fashionable style. The European Research Council quote includes, by my count, nineteen ways of saying that proposals must be daring. Now it is a pretty universal rule of life that someone who finds it necessary to say the same thing nineteen times in a single paragraph does not feel sure about it. He is trying to convince himself. At some point the people in charge will realize that such hype does not breed breakthroughs; it breeds more hype.

Until that happens there is something that some of us can do: refuse to play the game. Of course we are all convinced that our latest idea is the most important one ever conceived by humankind, and we want to picture it in the most favorable light. But we should resist the promise inflation. Such honesty comes at a risk. (I still remember a project proposal, many years ago, which came back with glowing reviews: the topic was important, the ideas right, the team competent. The agency officer’s verdict: reject. The proposers are certain to succeed, so it’s not research.) For some people, there is really no choice but to follow the lead: if your entire career depends on getting external funding, no amount of exhortation will prevent you from saying what the purse-holders want to hear. But those of us who do have a choice (that is to say, will survive even if a project is rejected) should refuse the compromission. We should present our research ideas for what they are.

So: one cheer for incremental research.

Wait, isn’t the phrase supposed to be “two cheers” [9]?

All right, but let’s go at it incrementally. One and one-tenth cheer for incremental research. 

References

 

[1]  National Science Foundation, Division of Computer and Network Systems: Computer Systems Research  (CSR), at http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=13385.

[2] European Research Council: Advanced Investigators Grant, at http://erc.europa.eu/index.cfmfuseaction=page.display&topicID=66.

[3] The Berne years; see any biography of Albert Einstein.

[4] C.A.R. Hoare: An axiomatic basis for computer programming, in Communications of the ACM, vol. 12, no 10, pages 576–580,583, October 1969.

[5] C.A.R. Hoare: The Emperor’s Old Clothes, in Communications of the ACM, vol. 24, no.  2, pages 75 – 83, February 1981.

[6] In the first version of this essay I wrote “Someone should celebrate the anniversary!”. Moshe Vardi, editor of Communications of the ACM, has informed me that the October 2009 issue will include a retrospective by Hoare on the 1969 paper. I cannot wait to see it.

[7] Robert W. Floyd: Assigning meanings to programs, in Proceedings of the American Mathematical Society Symposia on Applied Mathematics, vol. 19, pp. 19–31, 1967.

[8] Swiss National Science Foundation:  Projects – Investigator-Driven Research, at http://www.snf.ch/E/funding/projects/Pages/default.aspx. Disclosure: The SNSF kindly funds some of my research.

[9] E.M. Forster: Two Cheers for Democracy, Edward Arnold, 1951.

VN:F [1.9.10_1130]
Rating: 8.3/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 3 votes)

The good and the ugly

Once in a while one hits a tool that is just right. An example worth publicizing is the EasyChair system for conference management [1], which  — after a first experience as reviewer —  I have selected whenever I was in a position to make the choice for a new conference in recent years.

At first sight, a conference management system does not seem so hard to put together; it is in fact a traditional project topic for software engineering courses. But this apparent simplicity is deceptive, as a usable system must accommodate countless small and large needs. To take just one example, you can be a member of a program committee for a conference and also submit a paper to it; this implies strict rules about what you can see, for example reviews of other people’s papers with the referees’ names, and what you should not see. Taking care of myriad such rules and requirements requires in-depth domain knowledge about conferences, and a thorough analysis.

EasyChair is based on such an analysis. It knows what a conference is, and understands what its users need. Here for example is my login screen on EasyChair:

easychair

EasyChair knows about me: I only have one user name and one password. It knows the conferences in which I have been involved (and found them by itself). It knows about my various roles: chair, author etc., and will let me do different things depending on the role I choose.

The rest of the tool is up to the standards set by this initial screen. Granted, the Web design is very much vintage 1994; a couple of hours on the site by a professional graphics designer would not hurt, but, really, who cares? What matters is the functionality, and it is not by accident that EasyChair’s author is a brilliant logician [2]. Here is someone who truly understands the business of organizing and refereeing a conference, has translated this understanding into a solid logical model, and has at every step put himself in the shoes of the participants in the process. As a user you feel that everything has been done to make you feel comfortable  and perform efficiently, while protecting you from hassle.

Because this is all so simple and natural, you might forget that the system required extensive design. If you need proof, it suffices to consider, by contrast, the ScholarOne system, which as punishment for our sins both ACM and IEEE use for their journals.

Even after the last user still alive has walked away, ScholarOne will remain in the annals of software engineering, as a textbook illustration of how not to design a system and its user interface. Not the visuals; no doubt that site had a graphics designer. But everything is designed to make the system as repellent as possible for its users. You keep being asked for information that you have already entered. If you are a reviewer for Communications of the ACM and submit a paper to an IEEE Computer Society journal, the system does not remember you, since CACM has its own sub-site; you must re-enter everything. Since your identifier is your email address, you will have two passwords with the same id, which confuses the browser. (I keep forgetting the appropriate password, which the site obligingly emails me, in clear.) IEEE publications have a common page, but here is how it looks:

scholarone-detail

See the menu on the right? It is impossible  to see the full names of each of the “Transactio…”. (No tooltips, of course.) Assume you just want to know what one of them is, for example “th-cs”: if you select it you are prompted to provide all kinds of information (which you have entered before for other publications), before you can even proceed.

This user interface design (the minuscule menu, an example of what Scott Meyers calls the “Keyhole problem” [3]) is only a small part of usability flaws that plague the system. The matter is one of design: the prevailing viewpoint is that of the  designers and administrators, not the users. I was not really surprised when I found out that the system comes from the same source as the ISI Web of Science system (which should never be used for computer science, see [4]).

It is such a pleasure in contrast to see a system like EasyChair  — for all I know a one-man effort — with its attention to user needs, its profound understanding of the problem domain, and its constant improvements over the years.

References

[1] EasyChair system, at http://www.easychair.org.

[2] Andrei Voronkov, http://www.voronkov.com/.

[3] Scott Meyers, The Keyhole Problem, at http://www.aristeia.com/TKP/draftPaper.pdf; see also slides at http://se.ethz.ch/~meyer/publications/OTHERS/scott_meyers/keyhole.pdf

[4]  Bertrand Meyer, Christine Choppy, Jan van Leeuwen, Jørgen Staunstrup: Research Evaluation for Computer Science, in  Communications  of the ACM, vol. 52, no. 4, pages 131-134, online at http://portal.acm.org/citation.cfm?id=1498765.1498780 (requires subscription). Longer version, available at http://www.informatics-europe.org/docs/research_evaluation.pdf (free access).

VN:F [1.9.10_1130]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)