LASER 2014 (Elba, September)

2014 marks the 10th anniversary (11th edition) of the LASER summer school. The school will be held September 7-14, 2014, and the detailed information is here.

LASER (the name means Laboratory for Applied Software Engineering Research) is dedicated to practical software engineering. The roster of speakers since we started is a who’s who of innovators in the field. Some of the flavor of the school can be gathered from the three proceedings volumes published in Springer LNCS (more on the way) or simply by browsing the pages of the schools from previous years.

Usually we have a theme, but to mark this anniversary we decided to go for speakers first; we do have a title, “Leading-Edge Software Engineering”, but one broad enough to encompass a wide range of topics presented by star speakers: Harald Gall, Daniel Jackson, Michael Jackson, Erik Meijer (appearing at LASER for the third time!), Gail Murphy and Moshe Vardi. With such a cast you can expect to learn something important regardless of your own primary specialty.

LASER is unique in its setting: a 5-star hotel in the island paradise of Elba, with outstanding food and countless opportunities for exploring the marvelous land, the beaches, the sea, the geology (since antiquity Elba has been famous for its stones and minerals) and the history, from the Romans to Napoleon, who in the 9 months of his reign changed the island forever. The school is serious stuff (8:30 to 13 and 17 to 20 every day), but with enough time to enjoy the surroundings.

Registration is open now.

Niklaus Wirth birthday symposium, 20 February, Zurich

In honor of Niklaus Wirth’s 80th birthday we are organizing a symposium at ETH on February 20, 2014. This is a full-day event with invited talks by:

  • Vint Cerf
  • Hans Eberlé
  • Michael Franz
  • me
  • Carroll Morgan
  • Martin Odersky
  • Clemens Szyperski
  • Niklaus Wirth himself

From the symposium’s web page:

Niklaus Wirth was a Professor of Computer Science at ETH Zürich, Switzerland, from 1968 to 1999. His principal areas of contribution were programming languages and methodology, software engineering, and design of personal workstations. He designed the programming languages Algol W, Pascal, Modula-2, and Oberon, was involved in the methodologies of structured programming and stepwise refinement, and designed and built the workstations Lilith and Ceres. He published several text books for courses on programming, algorithms and data structures, and logical design of digital circuits. He has received various prizes and honorary doctorates, including the Turing Award, the IEEE Computer Pioneer, and the Award for outstanding contributions to Computer Science Education.

Participation is free (including breaks, lunch and the concluding “Apéro”) but space is strictly limited and we expect to run out of seats quickly. So if you are interested (but only if you are certain to attend) please register right away.

Symposium page and access to registration form: here.

Informatics education in Europe: Just the facts

 

In 2005 a number of us started Informatics Europe [1], the association of university departments and industrial research labs in computer science in Europe. The association has now grown to 80 members across the entire continent; it organizes the annual European Computer Science Summit and has published a number of influential reports. The last one just came out: Informatics Education in Europe: Institutions, Degrees, Students, Positions, Salaries — Key Data 2008-2012 [2]. The principal author is Cristina Pereira, who collected and organized the relevant data over more than a year; I helped with the preparation of the final text.

At the beginning of Informatics Europe we considered with particular attention the model of the Computing Research Association [3], which played a crucial role in giving computer science (informatics) its due place in the US academic landscape. Several past and current officers of the CRA, such as Willy Zwaenepoel, Ed Lazowska, Bob Constable, Andy Bernat, Jeannette Wing, Moshe Vardi and J Strother Moore gave keynotes at our early conferences and we of course asked them for the secrets of their organization’s success. One answer that struck us was the central role played by data collection. Just gathering the facts, such as degrees and salaries, established for the first time a solid basis for serious discussions. We took this advice to heart and the report is the first result.

Gathering the information is particularly difficult for Europe given the national variations and the absence of centralized statistical data. Even the list of names under which institutions teach informatics in Europe fills a large table in the report. Cristina’s decision was, from the start, to favor quality over quantity: to focus on impeccable data for countries for which we could get it, rather than trying to cover the whole continent with data of variable credibility.

The result is the first systematic repository of basic information on informatics education in Europe: institutions, degrees offered and numbers awarded, student numbers, position titles and definitions, and (a section which will not please everyone) salaries for PhD students, postdocs and professors of various ranks.

The report is a first step; it only makes sense if we continue to update it regularly and, in particular, extend it to more countries. But even in its current form (and with the obvious caveat that my opinion is not neutral) I see it as a major step forward for the discipline in Europe. We need an impeccable factual basis to convince the public at large and political decision-makers to give informatics the place it deserves in today’s educational systems.

References

[1] Informatics Europe site, see here.

[2] Cristina Pereira and Bertrand Meyer: Informatics Education in Europe: Institutions, Degrees, Students, Positions, Salaries — Key Data 2008-2012, Informatics Europe report, 30 September 2013, available here.

[3] Computing Research Association (US), see here.


Reading notes: misclassified bugs

 

(Please note the general disclaimer [1].)

How Misclassification Impacts Bug Prediction [2], an article to be presented on Thursday at ICSE, is the archetype of today’s successful empirical software engineering research, deriving significant results from the mining of publicly available software project repositories — in this case Tomcat5 and three others from Apache, as well as Rhino from Mozilla. The results are in some sense meta-results, because many studies have already mined the bug records of such repositories to draw general lessons about bugs in software development; what Herzig, Just and Zeller now tell us is that the mined data is highly questionable: many problems classified as bugs are not bugs.

The most striking results (announced in a style a bit stentorian for my taste, but indeed striking) are that every third bug report does not describe a bug but a request for a new feature, an improvement, better documentation or tests, code cleanup or refactoring; and that out of five program files marked as defective, two do not in fact contain any bug.

Both of these results concern false positives. The repositories signal very few misclassifications the other way: only a small subset of enhancement and improvement requests (around 5%) should have been classified as bugs, and even fewer faulty files are missed (8%, but in fact less than 1% if one excludes an outlier, Tomcat5 with 38%, a discrepancy that the paper does not discuss).

In the light of this analysis, the authors have a field day questioning the validity of the many studies of recent years — including some, courageously cited, by Zeller himself and coauthors — that start from bug repositories to derive general lessons about bugs and their properties.

The methodology is interesting if a bit scary. The authors (actually, just the two non-tenured authors, probably just a coincidence) analyzed 7401 issue reports manually; more precisely, one of the two analyzed all the reports, then the second took a fresh look at those flagged as misclassified in the first step, without knowing what the proposed reclassification was, and the results were merged. At 4 minutes per report this truly Stakhanovite effort took 90 working days. I sympathize, but I wonder what the rules are in Saarland for experiments involving living beings, particularly graduate students.

Precise criteria were used for the reclassification; for example a report describes a bug, in the authors’ view, if it mentions a null pointer exception (I will skip the opportunity of a pitch for Eiffel’s void safety mechanism), says that the code has to be corrected to fix the semantics, or reports a “memory issue” or infinite loop. These criteria are reasonable if a bit puzzling (why null pointer exceptions and not other crashes such as arithmetic overflows?); but more worryingly there is no justification for them. I wonder how much of the huge discrepancy found by the authors — a third of reported bugs are not bugs, and 40% of supposedly defective program files are not defective — can be simply explained by different classification criteria applied by the software projects under examination. The authors give no indication that they interacted with the people in charge of these projects. To me this is the major question hovering over this paper and its spectacular results. If you are in the room and get the chance, don’t hesitate to ask this question on my behalf or yours!

Another obvious question is how much the results depend on the five projects selected. If there ever was room for replicating a study (a practice whose rarity in software engineering we lament, but whose growth prospects are limited by the near-impossibility of convincing selective software engineering venues to publish confirmatory empirical studies), this would be it. In particular it would be good to see some of the results for commercial products.

The article offers an explanation for the phenomena it uncovered: in its view, the reason why so many bug reports end up misclassified is the difference of perspective between users of the software, who complain about the problems they encounter, and the software professionals who prepare the actual bug reports. The explanation is plausible but I was surprised not to see any concrete evidence that supports it. It is also surprising that the referees did not ask the authors to provide more solid arguments to buttress that explanation. Yet another opportunity to raise your hand and ask a question.

This (impressive) paper will call everyone’s attention to the critical problem of data quality in empirical studies. It is very professionally prepared, and could, in addition to its specific contributions, serve as a guide on how to get an empirical software engineering paper accepted at ICSE: take a critical look at an important research area; study it from a viewpoint that has not been considered much so far; perform an extensive study, with reasonable methodological assumptions; derive a couple of striking results, making sure they are both visibly stated and backed by the evidence; and include exactly one boxplot.

Notes and references

[1] This article review is part of the “Reading Notes” series. General disclaimer here.

[2] Kim Herzig, Sascha Just and Andreas Zeller: It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction, in ICSE 2013, available here. According to the ICSE program the paper will be presented on May 23 in the Bug Prediction session, 16 to 17:30.

Reading notes: the design of bug fixes

 

To inaugurate the “Reading Notes” series [1] I will take articles from the forthcoming International Conference on Software Engineering. Since I am not going to ICSE this year I am instead spending a little time browsing through the papers, obligingly available on the conference site. I’ll try whenever possible to describe a paper before it is presented at the conference, to alert readers to interesting sessions. I hope in July and August to be able to do the same for some of the papers to be presented at ESEC/FSE [2].

Please note the general disclaimer [1].

The Design of Bug Fixes [3] caught my attention partly for selfish reasons, since we are working, through the AutoFix project [4], on automatic bug fixing, but also out of sheer interest and because I have seen previous work by some of the authors. There have been articles about bug patterns before, but not so much is known with credible empirical evidence about bug fixes (corrections of faults). When a programmer encounters a fault, what strategies does he use to correct it? Does he always produce the best fix he can, and if not, why not? What is the influence of the project phase on such decisions (e.g. will you fix a bug the same way early in the process and close to shipping)? These are some of the questions addressed by the paper.

The most interesting concrete result is a list of properties of bug fixes, classified along two criteria: nature of a fix (the paper calls it “design space”), and reasoning behind the choice of a fix. Here are a few examples of the “nature” classification:

  • Data propagation: the bug arises in one component but the fix goes into another, for example a library class.
  • More or less accuracy: are we fixing the symptom or the cause?
  • Behavioral alternatives: rather than directly correcting the reported problem, change the user-experienced behavior (evoking the famous quip that “it’s not a bug, it’s a feature”). The authors were surprised to see that developers (belying their geek image) seem to devote a lot of effort trying to understand how users actually use the products, but also found that even so developers do not necessarily gain a solid, objective understanding of these usage patterns. It would be interesting to know if the picture is different for traditional locally-installed products and for cloud-based offerings, since in the latter case it is possible to gather more complete, accurate and timely usage data.

On the “reasoning” side, the issue is why and how programmers decide to adopt a particular approach. For example, bug fixes tend to be more audacious (implying redesign if appropriate) at the beginning of a project, and more conservative as delivery nears and everyone is scared of breaking something. Another object of the study is how deeply developers understand the cause rather than just the symptom; the paper reports that 18% “did not have time to figure out why the bug occurred”. Surprising or not, I don’t know, but scary! Yet another dimension is consistency: there is a tension between providing what might ideally be the best fix and remaining consistent with the design decisions that underlie a software system throughout its architecture.

I was more impressed by the individual categories of the classification than by that classification as a whole; some of the categories appear redundant (“interface breakage”, “data propagation” and “internal vs external”, for example, seem to be pretty much the same; ditto for “cause understanding” and “accuracy”). On the other hand the paper does not explicitly claim that the categories are orthogonal. If they turn this conference presentation into a journal article I am pretty sure they will rework the classification and make it more robust. It does not matter that it is a bit shaky at the moment since the main insights are in the individual kinds of fix and fix-reasoning uncovered by the study.

The authors are from Microsoft Research (one of them was visiting faculty) and interviewed numerous programmers from various Microsoft product groups to find out how they fix bugs.

The paper is nicely written and reads easily. It includes some audacious syntax, as in “this dimension” [internal vs external] “describes how much internal code is changed versus external code is changed as part of a fix”. It has a discreet amount of humor, some of which may escape non-US readers; for example the authors explain that when approaching programmers out of the blue for the survey they tried to reassure them through the words “we are from Microsoft Research, and we are here to help”, a wry reference to the celebrated comment by Ronald Reagan (or his speechwriter) [5] that the most dangerous words in the English language are “I am from the government, and I am here to help”. To my taste the authors include too many details about the data collection process; I would have preferred the space to be used for a more detailed discussion of the findings on bug fixes. On the other hand we all know that papers to selective conferences are written for referees, not readers, and this amount of methodological detail was probably the minimum needed to get past the reviewers (by avoiding the typical criticism, for empirical software engineering research, that the sample is too small, the questions biased etc.). Thankfully, however, there is no pedantic discussion of statistical significance; the authors openly present the results as dependent on the particular population surveyed and on the interview technique. Still, these results seem generalizable in their basic form to a large subset of the industry. I hope their publication will spawn more detailed studies.

According to the ICSE program the paper will be presented on May 23 in the Debugging session, 13:30 to 15:30.

Notes and references

[1] This article review is part of the “Reading Notes” series. General disclaimer here.

[2] European Software Engineering Conference 2013, Saint Petersburg, Russia, 18-24 August, see here.

[3] Emerson Murphy-Hill, Thomas Zimmermann, Christian Bird and Nachiappan Nagappan: The Design of Bug Fixes, in ICSE 2013, available here.

[4] AutoFix project at ETH Zurich, see project page here.

[5] Ronald Reagan speech extract on YouTube.

Specify less to prove more

Software verification is progressing slowly but surely. Much of that progress is incremental: making the fundamental results applicable to real programs as they are built every day by programmers working in standard circumstances. A key condition is to minimize the amount of annotations that they have to provide.

The article mentioned in my previous post, “Program Checking With Less Hassle” [1], to be presented at VSTTE in San Francisco on Friday by its lead author, Julian Tschannen, introduces several interesting contributions in this direction. One of the surprising conclusions is that sometimes it pays to specify less. That goes against intuition: usually, the more specification information (correctness annotations) you provide the more you help the prover. But in fact partial specifications can hurt rather than help. Consider for example a swap routine with a partial specification, which actually stands in the way of a proof. If modularity is not a concern, for example if the routine is part of the code being verified rather than of a library, it may be more effective to ignore the specification and use the routine’s implementation. This is particularly appropriate for small helper routines such as the swap example.
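To make the idea concrete, here is a minimal Eiffel sketch of my own (the routine and its names are illustrative assumptions, not the example used in the paper): a swap routine whose postcondition states only one of the two expected effects. A client relying on the missing clause cannot be verified modularly against this contract, whereas a proof that uses the body directly can succeed.

    swap (a: ARRAY [INTEGER]; i, j: INTEGER)
            -- Exchange the values of `a' at positions `i' and `j'.
        require
            i_valid: a.valid_index (i)
            j_valid: a.valid_index (j)
        local
            t: INTEGER
        do
            t := a [i]
            a [i] := a [j]
            a [j] := t
        ensure
            -- Partial postcondition: only one of the two effects is stated.
            i_updated: a [i] = old (a [j])
            -- The symmetric clause, a [j] = old (a [i]), is missing; a client
            -- that relies on it cannot be proved against this contract, but a
            -- proof that looks at the body instead goes through.
        end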

This inlining technique is applicable in other cases, for example to make up for a missing precondition: assume that a helper routine will only work for x > 0 but does not state that precondition, or maybe states only the weaker one x ≥ 0; in the code, however, it is only called with positive arguments. If we try to verify the code modularly we will fail, as indeed we should since the routine is incorrect as a general-purpose primitive. But within the context of the code there is nothing wrong with it. Forgetting the contract of the routine, if any, and instead using its actual implementation, we may be able to show that everything is fine.
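A similarly hedged sketch of this case (again my own illustration, with made-up feature names): a helper that divides by its argument without requiring x > 0, and a caller that only ever passes positive values.

    scaled_inverse (x: INTEGER): INTEGER
            -- 100 divided by `x'; only meaningful for x > 0,
            -- but no precondition says so.
        do
            Result := 100 // x
        end

    report
            -- Print a few scaled inverses, with positive arguments only.
        do
            print (scaled_inverse (4))
            print (scaled_inverse (25))
            -- A modular proof of `report' against the (absent) contract of
            -- `scaled_inverse' cannot rule out a division by zero; inlining
            -- the body at each call site shows that both calls are safe.
        end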

Another component of the approach is to fill in preconditions that programmers have omitted because they are somehow obvious to them. For example it is tempting and common to write just a [1] > 0 rather than a /= Void and then a [1] > 0 for a detachable array a. The tool takes care of interpreting the simpler precondition as the more complete one.
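In Eiffel notation, the shorthand and the completed form look as follows (a sketch of the idea, not the tool’s actual output):

    -- What the programmer is tempted to write for a detachable array `a':
    require
        first_positive: a [1] > 0

    -- The complete precondition the tool interprets it as:
    require
        first_positive: a /= Void and then a [1] > 0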

The resulting “two-step verification”, integrated into the AutoProof verification tool for Eiffel, should turn out to be an important simplification towards the goal of “Verification As a Matter Of Course” [2].

References

[1] Julian Tschannen, Carlo A. Furia, Martin Nordio and Bertrand Meyer: Program Checking With Less Hassle, in VSTTE 2013, Springer LNCS, to appear, draft available here; presentation on May 17 in the 15:30-16:30 session.

[2] Verification As a Matter Of Course, article in this blog, 29 March 2010, see here.

Presentations at ICSE and VSTTE

 

The following presentations from our ETH group in the ICSE week (International Conference on Software Engineering, San Francisco) address important issues of software specification and verification, describing new techniques that we have recently developed as part of our work building EVE, the Eiffel Verification Environment. One is at ICSE proper and the other at VSTTE (Verified Software: Tools, Theories, Experiments). If you are around please attend them.

Julian Tschannen will present Program Checking With Less Hassle, written with Carlo A. Furia, Martin Nordio and me, at VSTTE on May 17 in the 15:30-16:30 session (see here in the VSTTE program). The draft is available here. I will write a blog article about this work in the coming days.

Nadia Polikarpova will present What Good Are Strong Specifications?, written with Carlo A. Furia, Yu Pei, Yi Wei and me, at ICSE on May 22 in the 13:30-15:30 session (see here in the ICSE program). The draft is available here. I wrote about this paper in an earlier post: see here. It describes the systematic application of theory-based modeling to the full specification and verification of advanced software.

LASER summer school: Software for the Cloud and Big Data

The 2013 LASER summer school, organized by our chair at ETH, will take place September 8-14, once more in the idyllic setting of the Hotel del Golfo in Procchio, on the island of Elba in Italy. This is already the 10th conference; the roster of speakers so far reads like a who’s who of software engineering.

The theme this year is Software for the Cloud and Big Data and the speakers are Roger Barga from Microsoft, Karin Breitman from EMC, Sebastian Burckhardt from Microsoft, Adrian Cockcroft from Netflix, Carlo Ghezzi from Politecnico di Milano, Anthony Joseph from Berkeley, Pere Mato Vila from CERN and I.

LASER always has a strong practical bent, but this year it is particularly pronounced as you can see from the list of speakers and their affiliations. The topic is particularly timely: exploring the software aspects of game-changing developments currently redefining the IT scene.

The LASER formula is by now well-tuned: lectures over seven days (Sunday to Saturday), about five hours in the morning and three in the early evening, by world-class speakers; free time in the afternoon to enjoy the magnificent surroundings; 5-star accommodation and food in the best hotel of Elba, made affordable as we come towards the end of the season (and are valued long-term customers). The group picture below is from last year’s school.

Participants are from both industry and academia and have ample opportunities for interaction with the speakers, who typically attend each other’s lectures and engage in in-depth discussions. There is also time for some participant presentations; a free afternoon to discover Elba and brush up on your Napoleonic knowledge; and a boat trip on the final day.

Information about the 2013 school can be found here.

LASER 2012, Procchio, Hotel del Golfo

Conferences: Publication, Communication, Sanction

(This article was first published in the Communications of the ACM blog.)

A healthy discussion is taking place in the computer science community on our publication culture. It was spurred by Lance Fortnow’s 2009 article [1]; now Moshe Vardi has taken the lead to prepare a report on the topic, following a workshop in Dagstuhl in November [2]. The present article and one that follows (“The Waves of Publication”) are intended as contributions to the debate.

One of the central issues is what to do with conferences. Fortnow had strong words for the computer science practice of using conferences as its selective publication venues, instead of relying on journals as traditional scientific disciplines do. The criticism is correct, but if we look at the problem from a practical perspective it is unlikely that top conferences will lose their role as certifiers of quality. This is not a scientific matter but one of power. People in charge of POPL or OOPSLA have decisive sway over the careers (one is tempted to say the lives) of academics, particularly young academics, and it is a rare situation in human affairs that people who have critical power voluntarily renounce it. Maybe the POPL committee will see the light: maybe starting in 2014 it will accept all reasonable papers somehow related to “principles of programming languages”, turn the event itself into a pleasant multi-track community affair where everyone in the field can network, and hand over the selection and stamp-of-approval job to a journal such as TOPLAS. Dream on; it is not going to happen.

We should not, however, remain stuck with the status quo and all its drawbacks. That situation is unsustainable. As a single illustration, consider the requirement, imposed by all conferences, that having a paper pass the refereeing process is not enough: you must also register. A couple of months before the conference, authors of accepted papers (at least, they thought their paper was accepted) receive a threatening email telling them that unless they register and pay, their paper will not be published after all. Now assume an author, in a field where a conference is the top token of recognition, has his visa application rejected by the country of the conference — a not so uncommon situation — and does not register. (Maybe he does not mind paying the fee, but he does not want to lie by pretending he is going to attend whereas he knows he will not.) He has lost his opportunity for publication and perhaps severely harmed his career. What have such requirements to do with science?

To understand what can be done, we need to analyze the role of conferences. In an earlier article  [3] I described four “modes and uses” of publication: Publication, Exam, Business and Ritual. From the organizers’ viewpoint, ignoring the Business and Ritual aspects although they do play a significant role, a conference has three roles: Publication, Communication and Sanction. The publication part corresponds to the proceedings of the conference, which makes articles available to the community at large, not just the conference attendees. The communication part only addresses the attendees: it includes the presentation of papers as well as all other interactions made possible by being present at a conference. The sanction part (corresponding to the “exam” part of the more general classification) is the role of a renowned conference as a stamp of approval for the best work of the moment.

What we should do is separate these roles. A conference can play all three roles, but it can also select two of them, or even just one. A well-established, prestigious conference will want to retain its sanctioning role: accepted papers get the stamp of approval. It will also remain an event, where people meet. And it may distribute proceedings. But the three roles can also be untied:

  • Publication is the least critical, and can easily be removed from the other two, since everything will be available on the Web. In fact the very notion of proceedings is quickly becoming fuzzy: more and more conferences save money by not distributing printed proceedings to attendees, sometimes not printing any proceedings at all; and some even spare themselves the production of a proceedings-on-a-stick, putting the material on the Web instead. A conference may still decide to have its own proceedings, or it might outsource that part to a journal. Each conference will make these decisions based on its own culture, tradition, ambition and constraints. For authors, the decision does not particularly matter: what counts are the sanction, which is provided by the refereeing process, and the availability of their material to the world, which will be provided in any scenario (at least in computer science where we have, thankfully, the permission to put our papers on our own web sites, an acquired right that our colleagues from other disciplines do not all enjoy).
  • Separating sanction from communication is a natural step. Acceptance and participation are two different things.

Conference organizers should not be concerned about lost revenue: most authors will still want to participate in the conference, and will get the funding since institutions are used to paying for travel to present accepted papers; some new participants might come, attracted by more interaction-oriented conference styles; and organizers can replace the requirement to register by a choice between registering and paying a publication fee.

Separating the three roles does not mean that any established conference renounces its sanctioning status, acquired through the hard work of building the conference’s reputation, often over decades. But everyone gets more flexibility. Several combinations are possible, such as:

  • Sanction without communication or publication: papers are submitted for certification through peer-review, they are available on the Web anyway, and there is no need for a conference.
  • Publication without sanction or communication: an author puts a paper on his web page or on a self-publication site such as ArXiv.
  • Sanction and communication without publication: a traditional selective conference, which does not bother to produce proceedings.
  • Communication without sanction: a working conference whose sole aim is to advance the field through presentations and discussions, and accepts any reasonable submission. It may be by invitation (a kind of advance sanction). It may have proceedings (publication) or not.

Once we understand that the three roles are not inextricably tied, the stage is set for removing some of the impediments to a more effective publication culture. Some, not all. The more general problem is the rapidly changing nature of scientific publication, what may be called the concentric waves of publication. That will be the topic of the next article.

References

[1] Lance Fortnow: Time for Computer Science to Grow Up, in Communications of the ACM, Vol. 52, no. 8, pages 33-35, 2009, available here.

[2] Dagstuhl: Perspectives Workshop: Publication Culture in Computing Research, see here.

[3] Bertrand Meyer: The Modes and Uses of Scientific Publication, article on this blog, 22 November 2011, see here.