Agile methods: the good, the bad and the ugly

It was a bit imprudent last Monday to announce the continuation of the SCOOP discussion for this week; with the TOOLS conference happening now, with many satellite events such as the Eiffel Design Feast of the past week-end and today’s “New Eiffel Technology Community” workshop, there is not enough time for a full article. Next week might also be problematic. The SCOOP series will resume, but in the meantime I will report on other matters.

As something that can be conveniently typed in while sitting in the back of the TOOLS room during fascinating presentations, here is a bit of publicity for the next round of one-day seminars for industry — “Compact Course” is the official terminology — that I will be teaching at ETH in Zurich next November (one in October), some of them with colleagues. It’s the most extensive session that we have ever done; you can see the full programs and registration information here.

  • Software Engineering for Outsourced and Distributed Development, 27 October 2011
    Taught with Peter Kolb and Martin Nordio
  • Requirements Engineering, 17 November
  • Software Testing and Verification: state of the art, 18 November
    With Carlo Furia and Sebastian Nanz
  • Agile Methods: the Good, the Bad and the Ugly, 23 November
  • Concepts and Constructs of Concurrent Computation, 24 November
    With Sebastian Nanz
  • Design by Contract, 25 November

The agile methods course is new; its summary reads almost like a little blog article, so here it is.

Agile methods: the Good, the Bad and the Ugly

Agile methods are wonderful. They’ll give you software in no time at all, turn your customers and users into friends, catch bugs before they catch you, change the world, and boost your love life. Do you believe these claims (even excluding the last two)? It’s really difficult to form an informed opinion, since most of the presentations of eXtreme Programming and other agile practices are intended to promote them (and the consultants to whom they provide a living), not to deliver an objective assessment.

If you are looking for a guru-style initiation to the agile religion, this is not the course for you. What it does is to describe in detail the corpus of techniques covered by the “agile” umbrella (so that you can apply them effectively to your developments), and assess their contribution to software engineering. It is neither “for” nor “against” agile methods but fundamentally descriptive, pedagogical, objective and practical. The truth is that agile methods include some demonstrably good ideas along with some whose benefits are at best dubious. In addition (and this should not be a surprise) they cannot make the fundamental laws of software engineering go away.

Agile methods have now been around for more than a decade, during which many research teams, applying proven methods of experimental science, have performed credible empirical studies of how well the methods really work and how they compare to more traditional software engineering practices. This important body of research results, although not widely known, is critical to managers and developers in industry for deciding whether and how to use agile development. The course surveys these results, emphasizing the ones most directly relevant to practitioners.

A short discussion session will enable participants with experience in agile methods to share their results.

Taking this course will give you a strong understanding of agile development, and a clear view of when, where and how to apply them.

Schedule

Morning session: A presentation of agile methods

  • eXtreme Programming, pair programming, Scrum, Test-Driven Development, continuous integration, refactoring, stakeholder involvement, feature-driven development etc.
  • The agile lifecycle.
  • Variants: lean programming etc.

Afternoon session (I): Assessment of agile methods

  • The empirical software engineering literature: review of available studies. Assessment of their value. Principles of empirical software engineering.
  • Agile methods under the scrutiny of empirical research: what helps, what harms, and what has no effect? How do agile methods fare against traditional techniques?
  • Examples: pair programming versus code reviews; tests versus specifications; iterative development versus “Big Upfront Everything”.

Afternoon session (II): Discussion and conclusion

This final part of the course will present, after a discussion session involving participants with experience in agile methods, a summary of the contribution of agile methods to software engineering.

It will conclude with advice for organizations involved in software development and interested in applying agile methods in their own environment.

Target groups

CIOs; software project leaders; software developers; software testers and QA engineers.

The Professor Smith syndrome: Part 2

As stated in the Quiz of a few days ago (“Part 1 ”), we consider the following hypothetical report in experimental software engineering ([1], [2]):

Professor Smith has developed a new programming technique, “Suspect-Oriented Programming” (SOP). To evaluate SOP, he directs half of the students in his “Software Methodology” class to do the project using traditional techniques, and the others to use SOP.

He finds that projects by the students using SOP have, on the average, 15% fewer bugs than the others, and reports that SOP increases software reliability.

What’s wrong with this story?

Professor Smith’s attempt at empirical software engineering is problematic for at least four reasons. Others could arise, but we do not need to consider them if Professor Smith has applied the expected precautions: the number of students should be large enough (standard statistical theory will tell us how much to trust the result for various sample sizes); the students should be assigned to one of the two groups on a truly random basis; the problem should be amenable to both SOP and non-SOP techniques; and the assessment of the number of bugs should in the results should be based on fair and if possible automated evaluation. Some respondents to the quiz cited these problems, but they would apply to any empirical study and we can assume they are being taken care of.

The first problem to consider is that the evaluator and the author of the concept under evaluation are the same person. This is an approach fraught with danger. We have no reason to doubt Professor Smith’s integrity, but he is human. Deep down, he wants SOP to be better than the alternative. That is bound to affect the study. It would be much more credible if someone else, with no personal stake in SOP, had performed it.

The second problem mirrors the first on the students’ side. The students from group 1 were told that they used Professor Smith’s great idea, those from group 2 that they had to use old, conventional, boring stuff. Did both groups apply the same zeal to their work? After all, the students know that Professor Smith created SOP, and maybe he is an convincing advocate, so group 1 students will (consciously or not) do their best; those from group 2 have less incentive to go the extra mile. What we may have at play here is a phenomenon known as the Hawthorne effect [3]: if you know you are being tested for a new technique, you typically work harder and better — and may produce better results even if the technique is worthless! Experiments dedicated to studying this effect show that even  a group that is in reality using the same technique as another does better, at least at the beginning, if it is told that it is using a new, sexy technique.

The first and second problems arise in all empirical studies, software-related or not. They are the reason why medical experiments use placebos and double-blind techniques (where neither the subjects nor the experimenters themselves know who is using which variant). These techniques often do not directly transpose to software experiments, but we should all the same be careful about empirical studies of assessments of one’s own work and about possible Hawthorne effects.

The third problem, less critical, is the validity of a study relying on students. To what extent can we extrapolate from the results to a situation in industry? Software engineering students are on their way to becoming software professionals, but they are not professionals yet. This is a difficult issue because universities, rather than industry, are usually and understandably the place where experiments take place, an sometimes there is no other choice than using students. But then one can question the validity of the results. It depends on the nature of the questions being asked: if the question under study is whether a certain idea is easy to learn, using students is reasonable. But if it is, for example, whether a technique produces less buggy programs, the results can depend significantly on the subjects’ experience, which is different for students and professionals.

The last problem does not by itself affect the validity of the results, but it is a show-stopper nonetheless: Professor Smith’s experiment is unethical! If is is indeed true that SOP is better than the alternative, he is harming students from group 2; in the reverse case, he is harming students from group 1. Only in the case of the null hypothesis (using SOP makes no statistically significant difference) is the experiment ethical, but then it is also profoundly uninteresting. The rule in course-related experiments is a variant of the Hippocratic oath: before all, do not harm. The first purpose of a course is to enrich the students’ knowledge and skills; secondary aims, such as helping the professor’s research, are definitely acceptable, but must never impede the first. The setup described above is all the less acceptable that the project results presumably count towards the course grade, so the students who were forced to use the less good technique, if there demonstrably was one, have grounds to complain.

Note that Professor Smith could partially address this fairness problem by letting students choose their group, instead of assigning them randomly to group 1 or group 2 (based for example on the first letter of their names). But then the results would lose credibility, because this technique introduces self-selection and hence bias: the students who choose SOP may be the more intellectually curious students, and hence possibly the ones who do better anyway.

If Professor Smith cannot ensure fairness, he can still use students for his experiment, but has to run it outside of a course, for example by paying students, or running the experiment as a competition with some prizes for those who produce the programs with fewest bugs. This technique can work, although it introduces further dangers of self-selection. As part of a course, however, you just cannot assign students, on your own authority, to different techniques that might have an different effect on the core goal of the course: the learning experience.

So Professor Smith has a long way to go before he can run experiments that will convey a significant argument in favor of SOP.

Over the years I have seen, as a reader and sometimes as a referee, many Professor Smith papers: “empirical” evaluation of a technique by its own authors, using questionable techniques and not applying the necessary methodological precautions.

A first step is, whenever possible, to use experimenters who are from a completely different group from the developers of the ideas, as in two studies [4] [5] about the effectiveness of pair programming.

And yet! Sometimes no one else is available, and you do want to obtain objective empirical evidence about the merits of your own ideas. You are aware of the risk, and ready to face the cold reality, including if the results are unfavorable. Can you do it?

A recent attempt of ours seems to suggest that this is possible if you exert great care. It will presented in a paper at the next ESEM (Empirical Software Engineering and Measurement) and even though it discusses assessing some aspects of our own designs, using students, as part of the course project which counts for grading, and separating them into groups, we feel it was fair and ethical, and </modesty_filter_on>an ESEM referee wrote: “this is one of the best designed, conducted, and presented empirical studies I have read about recently”<modesty_filter_on>.

How did we proceed? How would you have proceeded? Think about it; feel free to express your ideas as comments to this post. In the next installment of this blog (The Professor Smith Syndrome: Part 3), I will describe our work, and let you be the judge.

References

[1] Bertrand Meyer: The rise of empirical software engineering (I): the good news, this blog, 30 July 2010, available here.
[2] Bertrand Meyer: The rise of empirical software engineering (II): what we are still missing, this blog, 31 July 2010, available here.

[3] On the Hawthorne effect, there is a good Wikipedia entry. Acknowledgment: I first heard about the Hawthorne effect from Barry Boehm.

[4] Jerzy R. Nawrocki, Michal Jasinski, Lukasz Olek and Barbara Lange: Pair Programming vs. Side-by-Side Programming, in EuroSPI 2005, pages 28-38. I do not have a URL for this article.

[5] Matthias Müller: Two controlled Experiments concerning the Comparison of Pair Programming to Peer Review, in  Journal of Systems and Software, vol. 78, no. 2, pages 166-179, November 2005; and Are Reviews an Alternative to Pair Programming ?, in  Journal of Empirical Software Engineering, vol. 9, no. 4, December 2004. I don’t have a URL for either version. I am grateful to Walter Tichy for directing me to this excellent article.

About Watts Humphrey

Watts Humphrey, 2007

At FOSE (see previous post [1]) we will honor the memory of Watts Humphrey, the pioneer of disciplined software engineering, who left us in October. A blog entry on my Communications of the ACM blog [2] briefly recalls some of Humphrey’s main contributions.

References

[1] The Future Of Software Engineering: previous entry of this blog.
[2] Watts Humphrey: In Honor of a Pioneer, in CACM blog.

The rise of empirical software engineering (I): the good news

 

RecycledIn the next few days I will post a few comments about a topic of particular relevance to the future of our field: empirical software engineering. I am starting by reposting two entries originally posted in the CACM blog. Here is the first. Let me use this opportunity to mention the LASER summer school [1] on this very topic — it is still possible to register.

Empirical software engineering papers, at places like ICSE (the International Conference on Software Engineering), used to be terrible.

There were exceptions, of course, most famously papers by Basili, Zelkowitz, Rombach, Tichy, Berry, Humphrey, Gilb, Boehm, Lehmann, Belady and a few others, who kept hectoring the community about the need to base our opinions and practices on evidence rather than belief. But outside of these cases the typical ICSE empirical paper — I sat through a number of them — was depressing: we made these measurements in our company, found these results, just believe us. A question here in the back? Can you reproduce our results? Access our code? We’d love you to, but unfortunately we work for a company — the Call for Papers said industry contributions were welcome, didn’t it? — and we can’t give you the details. So sorry. But trust us, we checked our results.

Actually, there was another kind of empirical paper, which did not suffer from such secrecy: the university study. Hi, I am professor Bright, the well-known author of the Bright method of software development. Everyone knows it’s the best, but we wanted to assess it scientifically through a rigorous empirical study. I gave the same programming problem to two groups of third-year undergraduates; one group was told to use the Bright method, the other not. Guess what? The Bright group performed 67.94% better! I see the session chair wanting to move to the next speaker; see the details in the paper.

For years, this was most of what we had: unverifiable industry reports and unconvincing student experiments.

And suddenly the scene has changed. Empirical software engineering studies are in full bloom; the papers are flowing, and many are good!

What triggered this radical change is the availability of open-source repositories. Projects such as Linux, Eclipse, Apache, EiffelStudio and many others have records going back 10, 15, sometimes 20 years. These records contain the true history of the project: commits (into the configuration management system), bug reports, bug fixes, test runs and their results, developers involved, and many more elements of project data. All of a sudden empirical research has what any empirical science needs: a large corpus of objects to analyze.

Open-source projects have given the decisive jolt, but now we can rely on industrial data as well: Microsoft and other companies have started making their own records selectively available to researchers. In the work of authors such as Zeller from Sarrebruck, Gall from Uni. Zurich or Nagappan from Microsoft, systematic statistical techniques yield answers, sometimes surprising, to questions on which we could only speculate. Do novices or experts cause more bugs? Does test coverage correlate with software quality, and if so, positively or negatively? Little by little, we are learning about the true properties of software products and processes, based not on fantasies but on quantitative analysis of meaningful samples.

The trend is unmistakable, and irreversible.

Not all is right yet; in the second installment of this post I will describe some of what still needs to be improved for empirical software engineering to achieve full scientific rigor.

Reference

[1] LASER summer school 2010, at http://se.ethz.ch/laser.

Another DOSE of distributed software development

The software world is not flat; it is multipolar. Gone are the days of one-site, one-team developments. The increasingly dominant model today is a distributed team; the place where the job gets done is the place where the appropriate people reside, even if it means that different parts of the job get done in different places.

This new setup, possibly the most important change to have affected the practice of software engineering in this early part of the millennium,  has received little attention in the literature; and even less in teaching techniques. I got interested in the topic several years ago, initially by looking at the phenomenon of outsourcing from a software engineering perspective [1]. At ETH, since 2004, Peter Kolb and I, aided by Martin Nordio and Roman Mitin, have taught a course on the topic [2], initially called “software engineering for outsourcing”. As far as I know it was the first course of its kind anywhere; not the first course about outsourcing, but the first to explore the software engineering implications, rather than business or political issues. We also teach an industry course on the same issues [3], attended since 2005 by several hundred participants, and started, with Mathai Joseph from Tata Consulting Services, the SEAFOOD conference [4], Software Engineering Advances For Outsourced and Offshore Development, whose fourth edition starts tomorrow in Saint Petersburg.

After a few sessions of the ETH course we realized that the most important property of the mode of software development explored in the course is not that it involves outsourcing but that it is distributed. In parallel I became directly involved with highly distributed development in the practice of Eiffel Software’s development. In 2007 we renamed the ETH course “Distributed and Outsourced Software Engineering” (DOSE) to acknowledge the broadened scope. The topic is still new; each year we learn a little more about what to teach and how to teach it.

The 2007 session saw another important addition. We felt it was no longer sufficient to talk about distributed development, but that students should practice it. Collaboration between groups in Zurich and other groups in Zurich was not good enough. So we contacted colleagues around the world interested in similar issues, and received an enthusiastic response. The DOSE project is itself distributed: teams from students in different universities collaborate in a single development. Typically, we have two or three geographically distributed locations in each project group. The participating universities have been Politecnico di Milano (where our colleagues Carlo Ghezzi and Elisabetta di Nitto have played a major role in the current version of the project), University of Nijny-Novgorod in Russia, University of Debrecen in Hungary, Hanoi University of Technology in Vietnam, Odessa National Polytechnic in the Ukraine and (across town for us) University of Zurich. For the first time in 2010 a university from the Western hemisphere will join: University of Rio Cuarto in Argentina.

We have extensively studied how the projects actually fare (see publications [4-8]). For students, the job is hard. Often, after a couple of weeks, many want to give up: they have trouble reaching their partner teams, understanding their accents on Skype calls, agreeing on modes of collaboration, finalizing APIs, devising a proper test plan. Yet they hang on and, in most cases, succeed. At the end of the course they tell us how much they have learned about software engineering. For example I know few better way of teaching the importance of carefully documented program interfaces — including contracts — than to ask the students to integrate their modules with code from another team halfway around the globe. This is exactly what happens in industrial software development, when you can no longer rely on informal contacts at the coffee machine or in the parking lot to smooth out misunderstandings: software engineering principles and techniques come in full swing. With DOSE, students learn and practice these fundamental techniques in the controlled environment of a university project.

An example project topic, used last year, was based on an idea by Martin Nordio. He pointed out that in most countries there are some card games played in that country only. The project was to program such a game, where the team in charge of the game logic (what would be the “business model” in an industrial project) had to explain enough of their country’s game, and abstractly enough, to enable the other team to produce the user interface, based on a common game engine started by Martin. It was tough, but some of the results were spectacular, and these are students who will not need more preaching on the importance of specifications.

We are currently preparing the next session of DOSE, in collaboration with our partner universities. The more the merrier: we’d love to have other universities participate, including from the US. Adding extra spice to the project, the topic will be chosen among those from the ICSE SCORE competition [9], so that winning students have the opportunity to attend ICSE in Hawaii. If you are teaching a suitable course, or can organize a student group that will fit, please read the project description [10] and contact me or one of the other organizers listed on the page. There is a DOSE of madness in the idea, but no one, teacher or student,  ever leaves the course bored.

References

[1] Bertrand Meyer: Offshore Development: The Unspoken Revolution in Software Engineering, in Computer (IEEE), January 2006, pages 124, 122-123. Available here.

[2] ETH course page: see here for last year’s session (description of Fall 2010 session will be added soon).

[3] Industry course page: see here for latest (June 2010( session (description of November 2010 session will be added soon).

[4] SEAFOOD 2010 home page.

[5] Bertrand Meyer and Marco Piccioni: The Allure and Risks of a Deployable Software Engineering Project: Experiences with Both Local and Distributed Development, in Proceedings of IEEE Conference on Software Engineering & Training (CSEE&T), Charleston (South Carolina), 14-17 April 2008, ed. H. Saiedian, pages 3-16. Preprint version  available online.

[6] Bertrand Meyer:  Design and Code Reviews in the Age of the Internet, in Communications of the ACM, vol. 51, no. 9, September 2008, pages 66-71. (Original version in Proceedings of SEAFOOD 2008 (Software Engineering Advances For Offshore and Outsourced Development,  Lecture Notes in Business Information Processing 16, Springer Verlag, 2009.) Available online.

[7] Martin Nordio, Roman Mitin, Bertrand Meyer, Carlo Ghezzi, Elisabetta Di Nitto and Giordano Tamburelli: The Role of Contracts in Distributed Development, in Proceedings of SEAFOOD 2009 (Software Engineering Advances For Offshore and Outsourced Development), Zurich, June-July 2009, Lecture Notes in Business Information Processing 35, Springer Verlag, 2009. Available online.

[8] Martin Nordio, Roman Mitin and Bertrand Meyer: Advanced Hands-on Training for Distributed and Outsourced Software Engineering, in ICSE 2010: Proceedings of 32th International Conference on Software Engineering, Cape Town, May 2010, IEEE Computer Society Press, 2010. Available online.

[9] ICSE SCORE 2011 competition home page.

[10] DOSE project course page.

The other impediment to software engineering research

In the decades since structured programming, many of the advances in software engineering have come out of non-university sources, mostly of four kinds:

  • Start-up technology companies  (who played a large role, for example, in the development of object technology).
  • Industrial research labs, starting with Xerox PARC and Bell Labs.
  • Independent (non-university-based) author-consultants. 
  • Independent programmer-innovators, who start open-source communities (and often start their own businesses after a while, joining the first category).

 Academic research has had its part, honorable but limited.

Why? In earlier posts [1] [2] I analyzed one major obstacle to software engineering research: the absence of any obligation of review after major software disasters. I will come back to that theme, because the irresponsible attitude of politicial authorities hinders progress by depriving researchers of some of their most important potential working examples. But for university researchers there is another impediment: the near-impossibility of developing serious software.

If you work in theory-oriented parts of computer science, the problem is less significant: as part of a PhD thesis or in preparation of a paper you can develop a software prototype that will support your research all the way to the defense or the publication, and can be left to wither gracefully afterwards. But software engineering studies issues that arise for large systems, where  “large” encompasses not only physical size but also project duration, number of users, number of changes. A software engineering researcher who only ever works on prototypes will be denied the opportunity to study the most significant and challenging problems of the field. The occasional consulting job is not a substitute for this hands-on experience of building and maintaining large software, which is, or should be, at the core of research in our field.

The bodies that fund research in other sciences understood this long ago for physics and chemistry with their huge labs, for mechanical engineering, for electrical engineering. But in computer science or any part of it (and software engineering is generally viewed as a subset of computer science) the idea that we would actually do something , rather than talk about someone else’s artifacts, is alien to the funding process.

The result is an absurd situation that blocks progress. Researchers in experimental physics or mechanical engineering employ technicians: often highly qualified personnel who help researchers set up experiments and process results. In software engineering the equivalent would be programmers, software engineers, testers, technical writers; in the environments that I have seen, getting financing for such positions from a research agency is impossible. If you have requested a programmer position as part of a successful grant request, you can be sure that this item will be the first to go. Researchers quickly understand the situation and learn not even to bother including such requests. (I have personally never seen a counter-example. If you have a different experience, I will be interested to learn who the enlightened agency is. )

The result of this attitude of funding bodies is a catastrophe for software engineering research: the only software we can produce, if we limit ourselves to official guidelines, is demo software. The meaningful products of software engineering (large, significant, usable and useful open-source software systems) are theoretically beyond our reach. Of course many of us work around the restrictions and do manage to produce working software, but only by spending considerable time away from research on programming and maintenance tasks that would be far more efficiently handled by specialized personnel.

The question indeed is efficiency. Software engineering researchers should program as part of their normal work:  only by writing programs and confronting the reality of software development can we hope to make relevant contributions. But in the same way that an experimental physicist is helped by professionals for the parts of experimental work that do not carry a research value, a software engineering researcher should not have to spend time on porting the software to other architectures, performing configuration management, upgrading to new releases of the operating system, adapting to new versions of the libraries, building standard user interfaces, and all the other tasks, largely devoid of research potential, that software-based innovation requires.

Until  research funding mechanisms integrate the practical needs of software engineering research, we will continue to be stymied in our efforts to produce a substantial effect on the quality of the world’s software.

References

[1] The one sure way to advance software engineering: this blog, see here.
[2] Dwelling on the point: this blog, see here.

Verification As a Matter Of Course

At the ACM Symposium on Applied Computing (SAC) in Sierre last week, I gave a talk entitled “How you will be programming in 10 years”, describing a number of efforts by various people, with a special emphasis on our work at both ETH and Eiffel Software, which I think point to the future of software development. Several people have asked me for the slides, so I am making them available [1].

It occurred to me after the talk that the slogan “Verification As a Matter Of Course” (VAMOC) characterizes the general idea well. The world needs verified software, but the software development community is reluctant  to use traditional heavy-duty verification techniques. While some of the excuses are unacceptable, others sources of resistance are justified and it is our job to make verification part of the very fabric of everyday software development.

My bet, and the basis of large part of both Eiffel and the ETH verification work, is that it is possible to bring verification to practicing developers as a natural, unobtrusive component of the software development process, through the tools they use.

The talk also broaches on concurrency, where many of the same ideas apply; CAMOC is the obvious next slogan.

Reference

[1] Slides of “How you will be programming in 10 years” talk (PDF).

Dwelling on the point

Once again, and we are not learning!

La Repubblica of last Thursday [1] and other Italian newspapers have reported on a “computer” error that temporarily brought thousands of accounts at the national postal service bank into the red. It is a software error, due to a misplacement of the decimal points in some transactions.

As usual the technical details are hazy; La Repubblica writes that:

Because of a software change that did not succeed, the computer system did not always read the decimal point during transactions”.

As a result, it could for example happen that a 15.00-euro withdrawal was understood as 1500 euros.
I have no idea what “reading the decimal point ” means. (There is no mention of OCR, and the affected transactions seem purely electronic.) Only some of the 12 million checking or “Postamat” accounts were affected; the article cites a number of customers who could not withdraw money from ATMs because the system wrongly treated their accounts as over-drawn. It says that this was the only damage and that the postal service will send a letter of apology. The account leaves many questions unanswered, for example whether the error could actually have favored some customers, by allowing them to withdraw money they did not have, and if so what will happen.

The most important unanswered question is the usual one: what was the software error? As usual, we will probably never know. The news items will soon be forgotten, the postal service will somehow fix its code, life will go on. Nothing will be learned; the next time around similar causes will produce similar effects.

I criticized this lackadaisical attitude in an earlier column [2] and have to hammer its conclusion again: any organization using public money should be required, when it encounters a significant software malfunction, to let experts investigate the incident in depth and report the results publicly. As long as we keep forgetting our errors we will keep repeating them. Where would airline safety be in the absence of thorough post-accident reports? That a software error did not kill anyone is not a reason to ignore it. Whether it is the Italian post messing up, a US agency’s space vehicle crashing on the moon or any other software fault causing systems to fail, it is not enough to fix the symptoms: we must have a professional report and draw the lessons for the future.

Reference

[1] Luisa Grion: Poste in tilt per una virgola — conti gonfiati, stop ai prelievi. In La Repubblica, 26 November 2009, page 18 of the print version. (At the time of writing it does not appear at repubblica.it,  but see  the TV segment also titled “Poste in tilt per una virgola” on Primocanale Web TV here, and other press articles e.g. in Il Tempo here.)

[2] On this blog: The one sure way to advance software engineering (post of 21 August 2009).

Specifying user interfaces

Many blogs including this one rely on the WordPress software. In previous states of the present page you may have noticed a small WordPress bug, which I find interesting.

“Tags” are a nifty WordPress feature. When you post a message, you can specify one or more informative “tags”. The tags of all messages appear in the right sidebar, each with a smaller or bigger font size depending on the number of messages that specified it. You can click such a tag in the sidebar and get, on the left, a page containing all the associated messages.

Now assume that many posts use a particular tag; in our example it is “Design by Contract”, not unexpected for this blog. Assume further that the tag name is long. It is indeed in this case: 18 characters. As a side note, no problem would arise if I used normal spaces in the name, which would then appear on two or three lines; precisely to avoid this  I use HTML “non-breaking spaces”. This is probably not in the WordPress spirit, but any other long tag without spaces would create the same problem. That problem is a garbled display:

dbc_overflows

The long tag overflows the bluish browser area assigned to tags, producing an ugly effect. This behavior is hard to defend: either the tag should have been rejected as too long when the poster specified it or it should fit in its zone, whether by truncation or by applying a smaller font.

I quickly found a workaround, not nice but good enough: make sure that some short tag  (such as “Hoare”) appears much more often than the trouble-making tag. Since font size indicates the relative frequency of tags, the long one will be scaled down to a smaller font which fits.

Minor as it is, this WordPress glitch raises some general questions. First, is it really a bug? Assume, by a wild stretch of imagination, that a jury had to resolve this question; it could easily find an expert to answer positively, by stating that the behavior does not satisfy reasonable user expectations, and another who notes that it is not buggy behavior since it does not appear to violate any expressly stated property of the specification. (At least I did not find in the WordPress documentation any mention of either the display size of tags or a limit on tag length; if I missed it please indicate the reference.)

Is it a serious matter? Not in this particular example; uncomely Web display does not kill.   But the distinction between “small matter of esthetics” and software fault can be tenuous. We may note in particular that the possibility for large data to overflow its assigned area is a fundamental source of security risks; and even pure user interface issues can become life-threatening in the case of critical applications such as air-traffic control.

Our second putative expert is right, however: no behavior is buggy unless it contradicts a specification. Where will the spec be in such an example? There are three possibilities, each with its limitations.

The first solution is to expect that in a carefully developed system every such property will have to be specified. This is conceivable, but hard, and the question arises of how to make sure nothing has been forgotten. Past  some threshold of criticality and effort, the only specifications that count are formal; there is not that much literature on specifying user interfaces formally, since much of the work on formal specifications has understandably concentrated on issues thought to be more critical.

Because of the tediousness of specifying such general properties again and again for each case, it might be better  — this is the second solution — to specify them globally, for an entire system, or for the user interfaces of an entire class of systems. Like any serious effort at specification, if it is worth doing, it is worth doing formally.

In either of these approaches the question remains of how we know we have specified everything of interest. This question, specification completeness, is not as hopeless as most people think; I plan to write an entry about it sometime (hint:  bing for “guttag horning”). Still, it is hard to be sure you did not miss anything relevant. Remember this the next time formal methods advocates — who should know better — tell you that with their techniques there “no longer is a need to test”, or when you read about the latest OS kernel that is “guaranteed correct and secure”. However important formal methods and proofs are, they can only guarantee satisfaction of the properties that the specifier has considered and stated. To paraphrase someone [1], I would venture that

Proofs can only show the absence of envisaged bugs, never rule out the presence of unimagined ones.

This is one of the reasons why tests will always, regardless of the progress of proofs, remain an indispensable part of the software development landscape [2]. Whatever you have specified and proved, you will still want to run the system (or, for certain classes of embedded software, some simulation of it) and see the results for yourself.

What then if we do not explicitly specify the desired property (as we did in the two approaches considered so far) but testing or actual operation still reveals some behavior that is clearly unsatisfactory? On what basis do we complain to the software’s producer? A solution here, the third in our list, might be to rely on generally accepted standards of professional development. This is common in other kinds of engineering: if you commission someone to build you a house, the contract may not explicitly state that the roof should not fall on your head while you are asleep; when this happens, you will still sue and accuse the builder of malpractice. Such remedies can work for software too, but the rules are murkier because we have not accepted, or even just codified, a set of general professional practices that would cover such details as “no display of information should overflow its assigned area”.

Until then I will remember to use one short tag a lot.

References

[1] Edsger W. Dijksra, Notes on Structured Programming, in Dahl, Dijkstra, Hoare, Structured Programming, Academic Press, 1972.

[2]  See Tests And Proofs (TAP) conference series, since 2007. The next conference, program-chaired by Angelo Gargantini and Gordon Fraser, will take place in the week of the TOOLS Federated Conferences in Málaga, Spain, in the week of June 28, 2010.