Archive for August 2013

Empirical answers to fundamental software engineering questions

This is a slightly reworked version of an article in the CACM blog, which also served as the introduction to a panel which I moderated at ESEC/FSE 2013 last week; the panelists were Harald Gall, Mark Harman, Giancarlo Succi (position paper only) and Tony Wasserman.

For all the books on software engineering, and the articles, and the conferences, a remarkable number of fundamental questions, so fundamental indeed that just about every software project runs into them, remain open. At best we have folksy rules, some possibly true, others doubtful, and others — such as “adding people to a late software project delays it further” [1] — wrong to the point of absurdity. Researchers in software engineering should, as a duty to the community of software practitioners, try to help provide credible answers to such essential everyday questions.

The purpose of this panel discussion is to assess what answers are already known through empirical software engineering, and to define what should be done to get more.

“Empirical software engineering” applies the quantitative methods of the natural sciences to the study of software phenomena. One of its tasks is to subject new methods — whose authors sometimes make extravagant and unsupported claims — to objective scrutiny. But the benefits are more general: empirical software engineering helps us understand software construction better.

There are two kinds of target for empirical software studies: products and processes. Product studies assess actual software artifacts, as found in code repositories, bug databases and documentation, to infer general insights. Process studies assess how software projects proceed and how their participants work; as a consequence, they can share some properties with studies in other fields that involve human behavior, such as sociology and psychology. (A common reaction among computer scientists is to express doubts: “Do you really want to bring us down to the standards of psychology and sociology?” Such arrogance is not justified. These sciences have obtained many results that are both useful and sound.)

Empirical software engineering has been on a roll for the past decade, thanks to the availability of large repositories, mostly from open-source projects, which hold information about long-running software projects and can be subjected to data mining techniques to identify important properties and trends. Such studies have already yielded considerable and often surprising insights about such fundamental matters as the typology of program faults (bugs), the effectiveness of tests and the value of certain programming language features.
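Repository mining of this kind can be sketched in a few lines. The example below is purely illustrative — the commit messages and file names are invented, and real studies work from the actual version-control history with far more careful fault classification — but it shows the basic move: count how often each file is touched by commits whose messages suggest a bug fix, a crude proxy for fault-proneness.

```python
import re
from collections import Counter

# Hypothetical commit history (message, files touched), of the kind one
# might extract from "git log --name-only" in a real mining study.
commits = [
    ("fix: null pointer in parser", ["parser.c"]),
    ("add feature X", ["ui.c", "parser.c"]),
    ("bugfix for overflow in parser", ["parser.c"]),
    ("refactor build scripts", ["Makefile"]),
]

# Naive keyword heuristic for fault-fixing commits; real studies link
# commits to entries in the bug database instead.
FIX_PATTERN = re.compile(r"\b(fix|bug|defect|fault)\w*", re.IGNORECASE)

def fault_prone_files(history):
    """Count how often each file appears in fault-fixing commits."""
    counts = Counter()
    for message, files in history:
        if FIX_PATTERN.search(message):
            counts.update(files)
    return counts

print(fault_prone_files(commits).most_common())  # → [('parser.c', 2)]
```

Even this toy version illustrates why product studies reproduce well: anyone with access to the same repository can rerun the count and check the result.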

Most of the uncontested successes, however, have been from the product variant of empirical software engineering. This situation is understandable: when analyzing a software repository, an empirical study is dealing with a tangible and well-defined artifact; if any of the results seems doubtful, it is possible and sometimes even easy for others to reproduce the study, a key condition of empirical science. With processes, the object of study is more elusive. If I follow a software project working with Scrum and another using a more traditional lifecycle, and find that one does better than the other, how do I know what other factors may have influenced the outcome? And even if I bring external factors under control, how do I compare my results with those of another researcher following other teams in other companies? Worse, in a more realistic scenario I do not always have the luxury of tracking actual industry projects, since few companies are enlightened enough to let researchers into their developments; how do I know that I can generalize to industry the conclusions of experiments made with student groups?

Such obstacles do not imply that sound results are impossible; studies involving human behavior in psychology and sociology face many of the same difficulties and yet do occasionally yield insights. But these obstacles explain why there are still few incontrovertible results on process aspects of software engineering. This situation is regrettable since it means that projects large and small embark on specific methods, tools and languages on the basis of hearsay, opinions and sometimes hype rather than solid knowledge.

No empirical study is going to give us all-encompassing results of the form “Agile methods yield better products” or “Object-oriented programming is better than functional programming”. We are entitled to expect, however, that they help practitioners assess some of the issues that await every project. They should also provide a perspective on the conventional wisdom, justified or not, that pervades the culture of software engineering. Here are some examples of general statements and questions on which many people in the field have opinions, often reinforced by the literature, but crying for empirical backing:

  • The effect of requirements faults: the famous curve by Boehm is buttressed by old studies on special kinds of software (large mission-critical defense projects). What do we really lose by not finding an error early enough?
  • The cone of uncertainty: is that idea just folklore?
  • What are the successful techniques for shortening delivery time by adding manpower?
  • The maximum compressibility factor: is there a nominal project delivery time, and how much can a project decrease it by throwing in money and people?
  • Pair programming: when does it help, when does it hurt? If it has any benefits, are they in quality or in productivity (delivery time)?
  • In iterative approaches, what is the ideal time for a sprint under various circumstances?
  • How much requirements analysis should be done at the beginning of a project, and how much deferred to the rest of the cycle?
  • What predictors of size correlate best with observed development effort?
  • What predictors of quality correlate best with observed quality?
  • What is the maximum team size, if any, beyond which a team should be split?
  • Is it better to use built-in contracts or just to code assertions in tests?

When asking these and other similar questions relating to core aspects of practical software development, I sometimes hear “Oh, but we know the answer conclusively, thanks to so-and-so’s study”. This may be true in some cases, but in many others one finds, on looking closer, that the study is just one particular experiment, fraught with the same limitations as any other.

The principal aim of the present panel is to find out, through the contributions of the panelists, which questions have useful and credible empirical answers already available, whether widely known or not. The answers must indeed be:

  • Empirical: obtained through objective quantitative studies of projects.
  • Useful: providing answers to questions of interest to practitioners.
  • Credible: while not necessarily absolute (a goal difficult to reach in any matter involving human behavior), they must be backed by enough solid evidence and confirmation to be taken as a serious input to software project decisions.

An auxiliary outcome of the panel should be to identify fundamental questions on which credible, useful empirical answers do not exist but seem possible, providing fuel for researchers in the field.

To mature, software engineering must shed the folkloric advice and anecdotal evidence that still pervade the field and replace them with convincing results, established with all the limitations but also the benefits of quantitative, scientific empirical methods.


[1] From Brooks’s Mythical Man-Month.


Concurrency video

Our Concurrency Made Easy project, the result of an ERC Advanced Investigator Grant, is trying to solve the problem of making concurrent programming simple, reliable and effective. It has spurred related efforts, in particular the Roboscoop project applying concurrency to robotics software.

Sebastian Nanz and other members of the CME project at ETH have just produced a video that describes the aims of the project and presents some of the current achievements. The video is available on the CME project page [1] (also directly on YouTube [2]).


[1] Concurrency Made Easy project, here.

[2] YouTube CME video, here.


Smaller, better textbook

A new version of my Touch of Class [1] programming textbook is available. It is not quite a new edition but more than just a new printing. All the typos that had been reported as of a few months ago have been corrected.

The format is also significantly smaller. This change is more than a trifle. When a reader told me for the first time “really nice book, pity it is so heavy!”, I commiserated and did not pay much attention. After twenty people said that, and many more after them, including professors looking for textbooks for their introductory programming classes, I realized it was a big deal. The reason the book was big and heavy was not so much the size of the contents (876 pages is not small, but not outrageous for a textbook introducing all the fundamental concepts of programming). Rather, it is a technical matter: the text is printed in color, and Springer really wanted to do a good job, choosing paper thick enough that the colors would not seep through. In addition I chose a big font to make the text readable, resulting in a large format. In fact I overdid it; the font is bigger than necessary, even for readers who do not have the good near-reading sight of a typical 19-year-old student.

We kept the color and the good paper, but reduced the font size and hence the length and width. The result is still very readable, and much more portable. I am happy to make my contribution to reducing energy consumption (at ETH alone, think of the burden on Switzerland’s overall energy bill of 200+ students carrying the book — as I hope they do — every morning on the buses, trains and trams crisscrossing the city!).

Springer also provides electronic access.

Touch of Class is the textbook developed on the basis of the Introduction to Programming course [2], which I have taught at ETH Zurich for the last ten years. It provides a broad overview of programming, starting at an elementary level (zeros and ones!) and encompassing topics not usually covered in introductory courses, even a short introduction to lambda calculus. You can get an idea of the style of coverage of such topics by looking up the sample chapter on recursion [1]. Examples of other topics covered include a general introduction to software engineering and software tools. The presentation uses full-fledged object-oriented concepts (including inheritance, polymorphism, genericity) right from the start, and Design by Contract throughout. Based on the “inverted curriculum” ideas on which I published a number of articles, it presents students with a library of reusable components, the Traffic library for graphical modeling of traffic in a city, and builds on this infrastructure both to teach students abstraction (reusing code through interfaces including contracts) and to provide them with models of high-quality code for imitation and inspiration.

For more details see the article on this blog that introduced the book when it was first published [3].


[1] Bertrand Meyer, Touch of Class: An Introduction to Programming Well Using Objects and Contracts, Springer Verlag, 2nd printing, 2013. The Amazon page is here. See the book’s own page (with slides and other teaching materials, sample chapter etc.) here. (Also available in Russian, see here.)

[2] Einführung in die Programmierung (Introduction to Programming) course, English course page here.

[3] Touch of Class published, article on this blog, 11 August 2009, see [1] here.


Barenboim = Rubinstein?


I have always admired Daniel Barenboim, both as a pianist and as a conductor — and not just because years ago, from pictures on disk covers, we looked strikingly alike; see e.g. [1], which could almost be me at that time. (Then I went to see him in concert and realized that he was a good 15 centimeters shorter; the pictures were only head-and-shoulders. Since that time the difference in our physical appearances has considerably increased, not compensated, regrettably, by any decrease in the difference of our musical abilities.)

Nowadays you can find lots of good music, an unbelievable quantity in fact, on YouTube. Like this excellent performance of Beethoven’s Emperor Concerto [2] by Barenboim with a Danish orchestra.

If you go to that page and expand the “about” tab an interesting story unfolds. If I parse it right (I have no direct information) it is a record of a discussion between the person who uploaded the video and the YouTube copyright police. It seems YouTube initially rejected the upload on the basis that it violated no fewer than three different copyrights, all apparently for recordings of the concerto: one by the Berlin Philharmonic (pianist not named), one by Arthur Rubinstein (orchestra not named), and one by Ivan Szekely (orchestra not named). The uploader contested these copyright claims, pointing out that the performers are different in all four cases. It took a little more than a month before YouTube accepted the explanation and released the video on 22 April 2012.

Since the page clearly listed the performers’ names and contained a full video, the initial copyright complaints must have been made on the basis of the audio track alone. Further, the detection must have been automatic, as it is hard to imagine that either YouTube or the copyright owners employ a full staff of music experts to listen all day to recordings on the web and, once in a while, write an email of the form “I just heard something at http://musicsite.somewhere that sounds suspiciously close to bars 37-52 of what I remember from the ‘Adagio un poco mosso’ in the 1964 Rubinstein performance, or possibly his 1975 performance, of Beethoven’s Emperor”. (The conductor in Rubinstein’s 1975 recording, by the way, is… Daniel Barenboim.) Almost certainly, the check is done by a program which scours the Web for clones.

It seems, then, that the algorithm used by YouTube or whoever runs these checks can, reasonably enough, detect that a recording is from a certain piece of music, but — now the real scandal — cannot distinguish between Rubinstein and Barenboim.
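For readers curious about the mechanics: the usual approach, acoustic fingerprinting, reduces a recording to a compact signature built from its spectral content, so that the same recording can be recognized despite volume changes or re-encoding — which also explains why such a system matches recordings rather than performers. The toy sketch below is my own illustration, not YouTube’s actual algorithm (real systems hash constellations of spectral peaks and are far more robust): it fingerprints a signal by the dominant frequency of each frame, and the fingerprint survives a change of loudness.

```python
import math

def dominant_bin(frame):
    """Return the frequency bin with the largest magnitude in a naive DFT."""
    n = len(frame)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):  # skip the DC component
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin

def fingerprint(samples, frame_size=64):
    """Chop the signal into frames and keep each frame's dominant bin."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return tuple(dominant_bin(f) for f in frames)

# Two "performances" of the same two-note sequence, one slightly quieter:
def tone(freq, n=64):
    return [math.sin(2 * math.pi * freq * t / n) for t in range(n)]

performance_a = tone(5) + tone(9)
performance_b = [0.8 * s for s in performance_a]
assert fingerprint(performance_a) == fingerprint(performance_b)  # loudness-invariant
```

The invariance that makes the scheme useful is exactly what blinds it to interpretation: two pianists playing the same notes at the same tempo yield very similar signatures.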

If this understanding is correct one would like to think that some more research can solve the problem. That would assume that humans can always distinguish performers. On French radio there used to be a famous program, the “Tribune of Record Critics”, where for several hours every Sunday the moderator would play excerpts of a given piece in various interpretations, and the highly opinionated star experts on the panel would praise some to the sky and excoriate others (“This Karajan guy — does he even know what music is about?”). One day, probably on an April 1st, they broadcast a parody of themselves, pretending to fight over renditions of Beethoven’s The Ruins of Athens overture while all were actually the same recording being played again and again. After that I always wondered whether in normal instances of the program the technicians were not tempted once in a while to switch recordings to fool the experts. (The version of the program that runs today, which is much less fun, relies on blind tasting, if I may call it that.) Presumably no professional listener would ever confuse the playing of Barenboim (the pianist) with that of Rubinstein. Presumably… and yet reading about the very recent Joyce Hatto scandal [3], in which a clever fraudster tricked the whole profession for a decade about more than a hundred recordings, is disturbing.

If my understanding of the situation regarding the Barenboim video is correct, then it is remarkable that any classical music recordings can appear at all on YouTube without triggering constant claims of copyright infringement; specifically, any multiple recordings of the same piece. In classical music, interpretation is crucial, and one never tires of comparing performances of the same piece by different artists, with differences that can be subtle at times and striking at others. Otherwise, why would we go hear Mahler’s 9th or see Così fan tutte after having been there, done that so many times? And now we can perform even more comparisons without leaving home, just by browsing YouTube. Try for example Schumann’s Papillons by Arrau, Kempff, Argerich and — my absolute favorite for many years — Richter. Perhaps a reader with expertise on the topic can tell us about the current state of plagiarism detection for music: how finely can it detect genuine differences of interpretation, without being fooled by simple tricks such as were used in the Hatto case?

Still. To confuse Barenboim with Rubinstein!

References and notes

[1] A photograph of the young Barenboim: see here.

[2] Video recording of the performance of Beethoven’s Fifth Piano Concerto by Daniel Barenboim and Det kongelige kapel conducted by Michael Schønvandt on the occasion of the Sonning Prize award, 2009, uploaded to YouTube by “mugge62” and available here.

[3] Wikipedia entry on Joyce Hatto and the Barrington-Coupe fraud, here.
