The Future Of Software Engineering

In case you haven’t heard about it yet, let me point you to FOSE, the Future Of Software Engineering [1] symposium in Zurich next week, organized by Sebastian Nanz. It consists entirely of invited talks; it is hard to think of any previous gathering of so many software engineering innovators, with the possible exception of the pioneers’ conference [2]:

  • Barry Boehm
  • Manfred Broy
  • Patrick Cousot
  • Erich Gamma
  • Yuri Gurevich
  • Michael Jackson
  • Rustan Leino
  • David Parnas
  • Dieter Rombach
  • Joseph Sifakis
  • Niklaus Wirth
  • Pamela Zave
  • Andreas Zeller

The symposium spans two days. It is followed by a special event on “Eiffel at 25” which, like the rest of FOSE, is resolutely forward-looking, presenting a number of talks on current Eiffel developments, particularly in the areas of verification integrated into the development cycle (see “Verification As a Matter Of Course” [3]) and concurrent programming.

References

[1] Future Of Software Engineering (FOSE): symposium home page.
[2] Broy and Denert, editors: Software Pioneers, Springer, 2002. See publisher’s page.
[3] Verification As a Matter Of Course (VAMOC): an earlier entry of this blog.

Every bilingual dictionary should be a Galois connection

A Galois connection (for anyone not familiar with the concept, the Wikipedia entry is decent) between two partially ordered sets A and B consists of two total functions f: A → B and g: B → A such that for all a: A and b: B

(f (a)  ≤  b)    ⇔    (g (b)  ≥  a)

The simplest and most common example uses powersets and inclusion: for some sets X and Y, A is ℙ (X), the set of subsets of X, and B is ℙ (Y); the ≤ order relation is simply ⊆, inclusion between subsets. So the condition is that for arbitrary subsets a and b of X and Y:

(f (a)  ⊆  b)    ⇔    (g (b)  ⊇  a)

Pictorially:

[Figure: a Galois connection between powersets]

(Instead of starting with total functions f and g between ℙ (X) and ℙ (Y) you may also use possibly partial functions f’: A ⇸ B and g’: B ⇸ A, and use for f and g the associated image functions, which are total.)

Now you might think that this post continues with abstract interpretation or some such topic, but what I really want to talk about is dictionaries. Bilingual dictionaries. You need them if you are learning a language, and they would seem to be the ideal application for computers, including shirt-pocket computers (more commonly known as smartphones). Hyperlinking frees us from the tyranny of page turning and makes dictionary browsing an exciting and entirely new experience: you can type partial words and see them completed, make mistakes and see them corrected, discover a new word and see it memorized into the interactive equivalent of flashcards. If in the definition of a word you see another that catches your attention, in either the source or the target language, you can click it and see its own definition. You can travel back and forth, retain your browsing history, and test yourself repeatedly.

Unfortunately, what I have described is only the theory. Current electronic bilingual dictionaries — at least those I tried, but I tried quite a few, involving a variety of languages — fall short of this ideal. In addition, they are typically of rather bad linguistic quality as compared to their print competitors.

An example of a seemingly fundamental requirement that every bilingual dictionary should satisfy (and that dictionaries on the market fail to meet) is that the relationship it defines between two languages must be a Galois connection, both ways. If you are looking for the translation of a word or group of words a in X, and obtain a set b of equivalents in Y, it is pretty hard to justify that when you go back, the translations for b do not include a!

I have yet, however, to find a Galois dictionary. As an example among hundreds that I encountered in recent months, take the Pons ($30) French-German dictionary. As the names suggest, f will be the function yielding the French translation of a set of German words and g the German translation of a set of French words. Now g ({“approximation”}) includes “Näherungswert”; but then f ({“Näherungswert”}) only lists “valeur approchée”!

The Galois requirement is not just a matter of principle; it makes the dictionary useful for native speakers of either language. If, as here, g (b) ⊇ a holds but f (a) ⊆ b does not, and b includes the most common words in Y for the concepts at hand, the native Y speaker may find the right translation (Näherungswert is indeed pretty good for approximation in the mathematical usage of this word), but the native speaker of X will be misled. Indeed valeur approchée is not the best term for the concept of mathematical approximation in French.
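To see what an automated check of this requirement could look like, here is a minimal sketch in Python (the mini-dictionaries, function name and data are mine, invented to mimic the Pons example above; this is not how any actual dictionary product is implemented). It tests the round-trip property on single words, which is the practical consequence of the Galois condition: every listed translation must list the original word among its own translations.

# Hypothetical mini-dictionaries: each maps a word to the set of its translations.
# Data invented to mimic the Pons example above.
de_to_fr = {
    "Näherungswert": {"valeur approchée"}        # misses "approximation"
}
fr_to_de = {
    "approximation": {"Näherungswert"},
    "valeur approchée": {"Näherungswert"}
}

def round_trip_violations(forward, backward):
    # Return the pairs (x, y) such that y is listed as a translation of x,
    # but x is not listed back as a translation of y.
    return [(x, y)
            for x, ys in forward.items()
            for y in ys
            if x not in backward.get(y, set())]

print(round_trip_violations(fr_to_de, de_to_fr))
# [('approximation', 'Näherungswert')]  -- the defect described above
print(round_trip_violations(de_to_fr, fr_to_de))
# []  -- this direction is fine

A dictionary satisfying the Galois requirement would produce an empty list in both directions.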

More generally, the reader who is trying to master both of the dictionary’s languages will be cheated. Such a reader wants to use the dictionary not just to get quick translations (there’s Google and Bing Translate for that), but to gain deep insights into the languages and their correspondence. How can one learn without the ability to check translations back and forth?

I wrote to Pons to report this problem (and others). To their great credit they took the trouble to answer my message in detail; but here is their tack on the issue:

As far as the choice of headwords for the source and the target language is concerned, PONS always is doing this choice for each language volume separatedly as we the dictionaries are made for special target groups and in different sizes we have to make a choice of words and this is done with regard to the importance a word has in each language – source and target language – and not by simply changing source and target language. This special elaboration of headword lists for each language can imply that a word which can be found in one volume of the dictionary is not necessarily part of the other volume.

I am not sure I understand what this means, but I am much too kind to wish upon dictionary authors, if they do not fix their systems, the sad fate of Évariste Galois.

Email and its perils

Email is fast and convenient. It is also risky. Here are three common sources of incidents with email. They are not new, but they keep biting even the most experienced email users.

1. Risks of using Bcc

Alice writes to Bob. She wants Carol to know what she wrote, but she does not want Bob to know that she is keeping Carol informed. So she copies Carol in the form of a “Blind carbon copy” (Bcc). The Bcc mechanism is meant exactly for such situations: while Bcc recipients see the list of other recipients (To and Cc), these other recipients see no mention of Bcc recipients.

Now Carol sees the message and responds to Alice. But to respond she uses, perhaps inadvertently, “Reply all”. The reply goes to both Alice and Bob. All of Alice’s efforts to keep mum about Carol’s involvement are lost!

The risk in this situation is that Alice has no way to control what Carol may do. At issue here is not a conscious effort by Carol to break confidentiality: in that case Alice could do nothing anyway as soon as she has sent Carol the information. The worrying possibility is that Carol may use “Reply all” by mistake.

Rule 1 (temporary): never use Bcc. Alice should send the message to Bob only, and forward a copy separately to Carol alone.

I start with this form of the rule because it is easy to remember and usually appropriate, but in one case it is too strong. Remove Bob from the picture. Then if Carol is a  person, it is pointless for Alice to use Bcc for her, rather than To (or Cc), since Carol knows everything there is to know. But now assume that Carol is actually the name of a mailing list. Members of the mailing list should know the originator, Alice, but they should not know about each other, or even about the name of the list. If these are the constraints, there is no risk in using Bcc. Hence the revised version of the rule:

Rule 1 (final): never use Bcc except for all recipients of a message.

Indeed what was wrong in the first example was not the use of Bcc as such, but the mix with To (or Cc). Bcc must be used only for either all the recipients of a message or none of them.
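For readers who want to see the mechanics, Bcc is typically implemented by putting the blind recipients into the delivery envelope while leaving them out of the message headers. A minimal sketch using Python’s standard smtplib (the addresses and server name are, of course, made up):

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"          # visible to every recipient
msg.set_content("Hello Bob")

bcc = ["carol@example.org"]            # never written into a header

with smtplib.SMTP("smtp.example.org") as server:
    # Envelope recipients include Carol; the headers never mention her.
    server.send_message(msg, to_addrs=["bob@example.org"] + bcc)

Carol receives a copy whose headers list only Alice and Bob; if she then hits “Reply all”, her client addresses exactly those headers, which is the mishap behind Rule 1.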

2. Risks of not using Bcc

Very recent case (today, actually) from the refereeing process of a prestigious computer science conference. The program chair sends to all authors of submitted papers,  say <submitters@famous_computer_science_conference.org>, a general message about the refereeing process. But he uses that address in the “To” or “Cc” field, not “bcc”.

One author, say Alice, has a question about the process and responds to the message, inadvertently using “Reply all”. Then:

  • Everyone knows that Alice has submitted a paper.
  • Many of the other authors are away and have “automatic reply” set up. So Alice herself now knows the names of quite a few other members of the community who have submitted papers!

“famous_computer_science_conference.org” is the most important conference in its field, and essentially every researcher in that community submits at least one paper every year. Knowing who has submitted is confidential information. That information becomes interesting a few months later, when the conference program is published and you find out that Bob tried and was not accepted. All the more juicy information, of course, if Bob is a senior (and arrogant) researcher.

Rule 2: when a mailing list includes people who should not know who else is on the list, either set up the list so that only the administrators can post to it, or use Bcc (respecting Rule 1, in its final form).

3. The importance of not being Allison

Another common risk, which strikes all the time, is automatic address completion by email clients (Outlook, Thunderbird etc.). You type part of the name or address of a frequent correspondent, and the system completes the email address.  So convenient! Except when the completion is wrong and you do not check it.

I frequently receive email that was misaddressed because of unchecked completion. (The latest case was last week.) Here is an example of completion that was expected but did not happen. A few years ago, in an institution of which I was then a member, a high-level executive wanted to send to her secretary the results of a job candidate’s evaluation. Highly confidential stuff. The secretary was called Allison and had an address of the form <allison@our_department.com>. The executive was used to typing just “all” and relying on address completion. Somehow, that particular time the completion did not occur; perhaps she was not using her familiar email setup. As a result the message went to <all@our_department.com>, that is to say, everyone in the organization. The recipients were, of course, delighted to get the inside story on the candidate.

Rule 3:

  • Always check the recipient addresses visually and carefully.
  • Never hire a secretary called Allen, Allison or Allistair.

The cloud and its risks

There is so much fervor around cloud computing that no one seems to note the downsides. Here is a little cautionary story.

Like many people, I have come to rely increasingly, for collaborative work, on Google Docs. As a text editor it is a kind of primitive Word. But it is web-based, so that several people can share a document. They will not just have read access, but can all write into the document, even at the same time. The system is indeed pretty good at incorporating changes by different users, as long as they do not affect the exact same place in the document. It also uses a modern approach to version management, originally popularized by wikis: a built-in history mechanism saves your successive revisions, enabling you to go back to any previous version. That mechanism is not ideal (the level of granularity is too small, so that it takes a long time to locate an earlier version), but it does provide safety.

In principle, indeed, safety is one of the great advantages of a cloud-based approach: you don’t have — so the mantra goes — to worry about backups, or even about saving your document; you don’t even need any local storage; everything is recorded on the server. If you make a mistake, or simply want to recover some piece of your text that you had discarded, just go back far enough in the history. “Control-S” (save the document) still exists, presumably for the sake of users born in the twentieth century, but it is just redundant.  With the cloud, we are told, you no longer need to save. Just write your stuff, the theory goes: the infrastructure is taking care of remembering your steps.

Thus indeed does the theory go. Reality does not always follow theory. For example, the reality  will not follow the theory if the implementation has bugs.

One beautiful evening some weeks ago I was busy polishing a technical note, and enjoying the Google Docs diagramming facilities that I had just discovered — basic, but good enough to prepare figures to illustrate a technical document. In fact the core of my document was a complex technical diagram, which I had spent several hours to develop. Then I prepared another, much simpler diagram, basically a couple of rectangles and an arrow between them. At that point I wanted to make sure my valuable efforts were not lost and, somewhat instinctively (I was born in the 20th century) I saved; if I had not, the system would have done it for me anyway. Under my very eyes, the page redisplayed, with the figures — there were 5 or 6 of them altogether — all turned into an identical one, the trivial little diagram I had entered last. After a moment of panic I realized that the history would be there, so I could at least go back to a recent version with the appropriate figures; relax, this is the cloud, the server is keeping my history for me! No such luck, though: all the figures in all the earlier versions had been overwritten in the same way. Gone forever.

At that point I realized I must have hit a major bug. Of course I am a programmer and could more or less guess what kind of bug this could be: a reference assignment instead of a clone, or maybe, in Eiffel terms, a clone rather than a deep_clone. (As far as I know Google is not using Eiffel; I’ll let the reader decide whether to jump from correlation to causation.) As to the history, any decent implementation stores “diffs” rather than full copies; so if an object has changed but is still at the same place, the reference is the same and there is no “diff” to store.
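To illustrate the suspected kind of error (this is only my guess at the failure mode, sketched in Python, not Google’s actual code): if every figure slot in the document ends up referring to the same object, editing one figure “changes” them all, and a history that records diffs of references sees nothing to store.

import copy

figure = {"shapes": ["big box", "small box", "arrow", "cluster"]}

by_reference = [figure, figure]                        # reference assignment: aliasing
shallow = [copy.copy(figure), copy.copy(figure)]       # clone: new dicts, shared inner list
deep = [copy.deepcopy(figure), copy.deepcopy(figure)]  # deep_clone: fully independent

# "Edit" the original figure, turning it into the trivial diagram.
figure["shapes"][:] = ["box", "box", "arrow"]

print(by_reference[0]["shapes"])   # ['box', 'box', 'arrow']  -- every figure changed
print(shallow[0]["shapes"])        # ['box', 'box', 'arrow']  -- still shared underneath
print(deep[0]["shapes"])           # ['big box', 'small box', 'arrow', 'cluster']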

Guessing the programming error provided little consolation: my brilliant diagram was lost for humankind. Just to check that the bug was real I tried a couple of times to modify one of the figures again, in a small way; sure enough, all figures in the document immediately redisplayed to an identical version, the one I had just produced. The software was obviously broken.

I decided not to take any more risks and recreated the figures in a Word document, then prepared screen shots and included them in the Google Doc. It was rather painful to redo everything but at least I knew that I would have to redo it just once. In the process I did not forget to type Control-S every once in a while, with the same feeling of warmth and safety as if revisiting a treasured childhood home, carefully  maintained by the grandparents away from the fads and bustle of the big city.

I do not know if there is customer support for Google Docs; if there is, it is not obvious. In any case it is a free service, so one has little ground to complain. I happen, however, to have friends in high places, and through them was able in just a few hours to reach the Google engineers in charge. They reacted very promptly, confirmed it was a bug,  and corrected it. They were very kind and valiantly tried to recover my figures; but at least they had corrected the problem, sparing many other users from an experience that, indeed, I do not recommend.

While preparing this note I had some further contacts with Google engineers, who commented:

“At Google we use logging, on disk snapshots, replicated storage, tape backups, and other systems to deliver massively redundant data protection. In the rare cases of bugs like this one, where user data is impacted, we continue to have a consistent and impressive track-record in successfully recovering that data.”

There is an uplifting moral to the story: the bug was fixed in a matter of hours. Most Google Docs users probably did not notice it. A bug of similar severity, in a traditional product that gets released and sent out to users, would have required a new release and a new download. On the cloud it is enough to fix the server copy.

Other lessons, however, are less encouraging.

First, one may surmise that the very process of an official release in a traditional product mode, and the resulting heightened impact of bugs, lead to a more careful Quality Assurance process than the  “forever beta” culture of cloud-based deployment. I have no evidence that my bug story was due to an unsatisfactory QA process. It may just be a one-time blip. But the temptation definitely exists in a cloud-based project to lower one’s guard because of the expectation, conscious or not, that any error will be found by some user and promptly corrected for all users.

An even scarier observation is that on the cloud you are trusting everything to a provider and its software, correct or not, robust or not, secure or not. The recurring leaks of customer information from Web sites are a constant reminder of that risk. In the text processing example, any user of a text editor or document processing system knows that if he saves his work once in a while, the possible damage if the tool crashes or misbehaves is circumscribed: at worst, you will lose the changes since the last save. When working on the cloud, you typically do not make local copies: if a tool messes up and loses the record, you have lost everything. It is gone for good.

In technology we have hype, buzzwords, fads, and successful advances. These are not necessarily disjoint categories, but often successive steps in the life of a new idea. What characterizes the transition from a fad to a successful technology is that in the latter case one knows clearly both the advantages and the drawbacks. Cloud computing is advanced enough to reach that stage.

Speech synthesis technology

Rereading an article from last year’s August New Yorker, a discussion of e-paper devices and especially the Kindle by Nicholson Baker [1]:

Reading some of “Max,” a James Patterson novel, I experimented with the text-to-speech feature. The robo-reader had a polite, halting, Middle European intonation, like Tom Hanks in “The Terminal,” and it was sometimes confused by periods. Once it thought “miss.” was the abbreviation of a state name: “He loved the chase, the hunt, the split-second intersection of luck and skill that allowed him to exercise his perfection, his inability to Mississippi.” I turned the machine off.

Reference

[1] Nicholson Baker: A New Page, in The New Yorker, August 3, 2009, pages 24-30, also available at http://www.newyorker.com/reporting/2009/08/03/090803fa_fact_baker?currentPage=all.

The rise of empirical software engineering (II): what we are still missing

(This article was initially published in the CACM blog.)

The previous post under  the heading of empirical software engineering hailed the remarkable recent progress of this field, made possible in particular by the availability of large-scale open-source repositories and by the opening up of some commercial code bases.

Has the empirical side of software engineering become a full member of the empirical sciences? One component of the experimental method is still not quite there: reproducibility. It is essential to the soundness of the natural sciences; when you publish a result there, the expectation is that others will be able to replicate it. Perhaps such duplication does not happen as often as physicists and biologists would have us believe, but it does happen, and the mere possibility that someone could check your results (and make a name for himself, especially if you are famous, by disproving them) keeps experimenters on their toes.

If we had the same norms in empirical software engineering, empirical papers would all contain a clause such as

Hampi’s source code and documentation, experimental data, and additional results are available at http://people.csail.mit.edu/akiezun/hampi

This example is, in fact, a real quote, from a paper [1] at the 2009 ISSTA conference. It shows exactly what we expect from an experimental software engineering publication: here are my results; if you want to rerun the experiments, here is the URL where you will find the code (source and binary) and the data.

Unfortunately, such professionalism is the exception rather than the rule. I performed a quick check — entirely informal, as this is a blog post, not an empirical research paper! — in the ISSTA ’09 proceedings. ISSTA, an ACM conference, is a good sample point, since it covers testing (plus other approaches to program analysis) and almost every paper has an “experiment” section. I found only a very small number that, like the one cited above, give explicit reproducibility information. (Disclosure: one of those papers is ours [2].)

I believe that the situation will change dramatically and that in a few years it will be impossible to submit an empirical paper without including such information. Computer science, or at least some areas of software engineering, should actually consider themselves privileged when it comes to allowing reproducibility: all that we have to do to reproduce a result, in testing for example, is to run a program. That is easier than for a zoologist — wishing to reproduce a colleague’s experiment precisely — to gather in his lab the appropriate number of flies, chimpanzees or killer whales.

In some types of empirical software research, such as the assessment of process models or design techniques, reproducing an experiment’s setup is harder than when all you have to do is rerun a program. But regardless of the area, we must develop a true culture of reproducibility. It is not yet there. I have personally come to take experimental results with a grain of salt; not that I particularly suspect foul play, but I simply know how easy it is, in the absence of external validation, to make a mistake in the experiments and, unwittingly, publish a paper with wrong results.

Developing a culture of reproducibility also has an effect on the refereeing process. In submitting papers with precise instructions for reproducing our results, we have sometimes noticed that referees never contact us. I hope this means they always succeed; I suspect, however, that in many cases they just do not try. If you think further about the implications, providing reproducibility instructions for a submitted paper is scary: after all, a program may fail to run for marginal reasons, such as a wrong hardware configuration or a misunderstanding of the instructions. You do not want to perform all the extra work (of making your results reproducible) just to have the paper summarily rejected because the referee is running Windows 95. Ideally, then, referees should be able to ask technical questions — but anonymously, since that is the way most refereeing works. Conferences and journals generally do not support such a process.

These obstacles are implementation issues, however, and will go away. What matters for the growth of the discipline is that it needs, like experimental sciences before it, to embrace a true culture of reproducibility.

References

[1] Adam Kiezun, Vijay Ganesh, Philip J. Guo, Pieter Hooimeijer, Michael D. Ernst: HAMPI: A Solver for String Constraints, Proceedings of the 2009 ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’09), July 19-23, 2009, Chicago.

[2] Nadia Polikarpova, Ilinca Ciupa  and Bertrand Meyer: A Comparative Study of Programmer-Written and Automatically Inferred Contracts, Proceedings of the 2009 ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’09), July 19-23, 2009, Chicago.

The rise of empirical software engineering (I): the good news

 

In the next few days I will post a few comments about a topic of particular relevance to the future of our field: empirical software engineering. I am starting by reposting two entries originally posted in the CACM blog. Here is the first. Let me use this opportunity to mention the LASER summer school [1] on this very topic — it is still possible to register.

Empirical software engineering papers, at places like ICSE (the International Conference on Software Engineering), used to be terrible.

There were exceptions, of course, most famously papers by Basili, Zelkowitz, Rombach, Tichy, Berry, Humphrey, Gilb, Boehm, Lehman, Belady and a few others, who kept hectoring the community about the need to base our opinions and practices on evidence rather than belief. But outside of these cases the typical ICSE empirical paper — I sat through a number of them — was depressing: we made these measurements in our company, found these results, just believe us. A question here in the back? Can you reproduce our results? Access our code? We’d love you to, but unfortunately we work for a company — the Call for Papers said industry contributions were welcome, didn’t it? — and we can’t give you the details. So sorry. But trust us, we checked our results.

Actually, there was another kind of empirical paper, which did not suffer from such secrecy: the university study. Hi, I am professor Bright, the well-known author of the Bright method of software development. Everyone knows it’s the best, but we wanted to assess it scientifically through a rigorous empirical study. I gave the same programming problem to two groups of third-year undergraduates; one group was told to use the Bright method, the other not. Guess what? The Bright group performed 67.94% better! I see the session chair wanting to move to the next speaker; see the details in the paper.

For years, this was most of what we had: unverifiable industry reports and unconvincing student experiments.

And suddenly the scene has changed. Empirical software engineering studies are in full bloom; the papers are flowing, and many are good!

What triggered this radical change is the availability of open-source repositories. Projects such as Linux, Eclipse, Apache, EiffelStudio and many others have records going back 10, 15, sometimes 20 years. These records contain the true history of the project: commits (into the configuration management system), bug reports, bug fixes, test runs and their results, developers involved, and many more elements of project data. All of a sudden empirical research has what any empirical science needs: a large corpus of objects to analyze.

Open-source projects have given the decisive jolt, but now we can rely on industrial data as well: Microsoft and other companies have started making their own records selectively available to researchers. In the work of authors such as Zeller from Sarrebruck, Gall from Uni. Zurich or Nagappan from Microsoft, systematic statistical techniques yield answers, sometimes surprising, to questions on which we could only speculate. Do novices or experts cause more bugs? Does test coverage correlate with software quality, and if so, positively or negatively? Little by little, we are learning about the true properties of software products and processes, based not on fantasies but on quantitative analysis of meaningful samples.
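As a flavor of the kind of analysis involved (a toy sketch with invented numbers, not a reproduction of any of the studies mentioned), here is how one might ask, in Python, whether test coverage and post-release defects are correlated across the modules of a project, once those figures have been extracted from the version and bug-tracking history:

from statistics import correlation   # available in Python 3.10 and later

# Invented per-module figures, as one might mine from commits and bug reports.
modules = {
    "parser":    {"coverage": 0.91, "post_release_bugs": 2},
    "ui":        {"coverage": 0.45, "post_release_bugs": 11},
    "network":   {"coverage": 0.78, "post_release_bugs": 5},
    "storage":   {"coverage": 0.60, "post_release_bugs": 9},
    "scheduler": {"coverage": 0.83, "post_release_bugs": 3},
}

coverage = [m["coverage"] for m in modules.values()]
bugs = [m["post_release_bugs"] for m in modules.values()]

# Pearson correlation; a value close to -1 would suggest, for this toy data only,
# that better-covered modules tend to have fewer post-release defects.
print(f"correlation(coverage, bugs) = {correlation(coverage, bugs):.2f}")

The real studies, of course, involve thousands of modules, careful controls and significance tests; the point is simply that the raw material now exists to compute such answers rather than guess them.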

The trend is unmistakable, and irreversible.

Not all is right yet; in the second installment of this post I will describe some of what still needs to be improved for empirical software engineering to achieve full scientific rigor.

Reference

[1] LASER summer school 2010, at http://se.ethz.ch/laser.

Another DOSE of distributed software development

The software world is not flat; it is multipolar. Gone are the days of one-site, one-team developments. The increasingly dominant model today is a distributed team; the place where the job gets done is the place where the appropriate people reside, even if it means that different parts of the job get done in different places.

This new setup, possibly the most important change to have affected the practice of software engineering in this early part of the millennium, has received little attention in the literature, and even less in teaching techniques. I got interested in the topic several years ago, initially by looking at the phenomenon of outsourcing from a software engineering perspective [1]. At ETH, since 2004, Peter Kolb and I, aided by Martin Nordio and Roman Mitin, have taught a course on the topic [2], initially called “software engineering for outsourcing”. As far as I know it was the first course of its kind anywhere; not the first course about outsourcing, but the first to explore the software engineering implications, rather than business or political issues. We also teach an industry course on the same issues [3], attended since 2005 by several hundred participants, and started, with Mathai Joseph from Tata Consultancy Services, the SEAFOOD conference [4], Software Engineering Advances For Outsourced and Offshore Development, whose fourth edition starts tomorrow in Saint Petersburg.

After a few sessions of the ETH course we realized that the most important property of the mode of software development explored in the course is not that it involves outsourcing but that it is distributed. In parallel, I became directly involved in highly distributed development through Eiffel Software’s own practice. In 2007 we renamed the ETH course “Distributed and Outsourced Software Engineering” (DOSE) to acknowledge the broadened scope. The topic is still new; each year we learn a little more about what to teach and how to teach it.

The 2007 session saw another important addition. We felt it was no longer sufficient to talk about distributed development: students should practice it. Collaboration between groups in Zurich and other groups in Zurich was not good enough. So we contacted colleagues around the world interested in similar issues, and received an enthusiastic response. The DOSE project is itself distributed: teams of students in different universities collaborate in a single development. Typically, we have two or three geographically distributed locations in each project group. The participating universities have been Politecnico di Milano (where our colleagues Carlo Ghezzi and Elisabetta Di Nitto have played a major role in the current version of the project), University of Nizhny Novgorod in Russia, University of Debrecen in Hungary, Hanoi University of Technology in Vietnam, Odessa National Polytechnic in Ukraine and (across town for us) University of Zurich. For the first time, in 2010, a university from the Western hemisphere will join: University of Rio Cuarto in Argentina.

We have extensively studied how the projects actually fare (see publications [4-8]). For students, the job is hard. Often, after a couple of weeks, many want to give up: they have trouble reaching their partner teams, understanding their accents on Skype calls, agreeing on modes of collaboration, finalizing APIs, devising a proper test plan. Yet they hang on and, in most cases, succeed. At the end of the course they tell us how much they have learned about software engineering. For example, I know few better ways of teaching the importance of carefully documented program interfaces — including contracts — than to ask the students to integrate their modules with code from another team halfway around the globe. This is exactly what happens in industrial software development, when you can no longer rely on informal contacts at the coffee machine or in the parking lot to smooth out misunderstandings: software engineering principles and techniques come into play in full force. With DOSE, students learn and practice these fundamental techniques in the controlled environment of a university project.

An example project topic, used last year, was based on an idea by Martin Nordio. He pointed out that in most countries there are some card games played in that country only. The project was to program such a game, where the team in charge of the game logic (what would be the “business model” in an industrial project) had to explain enough of their country’s game, and abstractly enough, to enable the other team to produce the user interface, based on a common game engine started by Martin. It was tough, but some of the results were spectacular, and these are students who will not need more preaching on the importance of specifications.
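To give a flavor of the kind of interface, with contracts, that such a game-logic team might publish to its remote user-interface partners, here is a hypothetical sketch in Python (the course itself uses Eiffel, where preconditions and postconditions are native; the class and its routines are invented for illustration, not taken from any actual student project):

class Deck:
    """Card deck shared between the game-logic team and the user-interface team.
    (Hypothetical interface, for illustration only.)"""

    def __init__(self, cards: list):
        self._cards = list(cards)

    def draw(self, n: int) -> list:
        """Remove and return the top n cards.

        Precondition:  0 <= n <= len(self)
        Postcondition: exactly n cards are returned and removed from the deck
        """
        assert 0 <= n <= len(self._cards), "precondition violated by caller"
        before = len(self._cards)
        result, self._cards = self._cards[:n], self._cards[n:]
        assert len(result) == n and len(self._cards) == before - n, "postcondition"
        return result

    def __len__(self) -> int:
        return len(self._cards)

The assertions make the contract checkable at integration time, which is precisely when the assumptions of two teams on different continents first meet.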

We are currently preparing the next session of DOSE, in collaboration with our partner universities. The more the merrier: we’d love to have other universities participate, including from the US. Adding extra spice to the project, the topic will be chosen among those from the ICSE SCORE competition [9], so that winning students have the opportunity to attend ICSE in Hawaii. If you are teaching a suitable course, or can organize a student group that will fit, please read the project description [10] and contact me or one of the other organizers listed on the page. There is a DOSE of madness in the idea, but no one, teacher or student,  ever leaves the course bored.

References

[1] Bertrand Meyer: Offshore Development: The Unspoken Revolution in Software Engineering, in Computer (IEEE), January 2006, pages 124, 122-123. Available here.

[2] ETH course page: see here for last year’s session (description of Fall 2010 session will be added soon).

[3] Industry course page: see here for the latest (June 2010) session (description of the November 2010 session will be added soon).

[4] SEAFOOD 2010 home page.

[5] Bertrand Meyer and Marco Piccioni: The Allure and Risks of a Deployable Software Engineering Project: Experiences with Both Local and Distributed Development, in Proceedings of IEEE Conference on Software Engineering & Training (CSEE&T), Charleston (South Carolina), 14-17 April 2008, ed. H. Saiedian, pages 3-16. Preprint version  available online.

[6] Bertrand Meyer: Design and Code Reviews in the Age of the Internet, in Communications of the ACM, vol. 51, no. 9, September 2008, pages 66-71. (Original version in Proceedings of SEAFOOD 2008, Software Engineering Advances For Offshore and Outsourced Development, Lecture Notes in Business Information Processing 16, Springer Verlag, 2009.) Available online.

[7] Martin Nordio, Roman Mitin, Bertrand Meyer, Carlo Ghezzi, Elisabetta Di Nitto and Giordano Tamburelli: The Role of Contracts in Distributed Development, in Proceedings of SEAFOOD 2009 (Software Engineering Advances For Offshore and Outsourced Development), Zurich, June-July 2009, Lecture Notes in Business Information Processing 35, Springer Verlag, 2009. Available online.

[8] Martin Nordio, Roman Mitin and Bertrand Meyer: Advanced Hands-on Training for Distributed and Outsourced Software Engineering, in ICSE 2010: Proceedings of the 32nd International Conference on Software Engineering, Cape Town, May 2010, IEEE Computer Society Press, 2010. Available online.

[9] ICSE SCORE 2011 competition home page.

[10] DOSE project course page.

Analyzing a software failure

More than once I have emphasized here [1] [2] the urgency of rules requiring systematic a posteriori analysis of software mishaps that have led to disasters. I have a feeling that many more posts will be necessary before the idea registers.

Some researchers are showing the way. In a June 2009 article [4], Tetsuo Tamai from the University of Tokyo published a fascinating dissection of the 2005 Mizuho Securities incident at the Tokyo Stock Exchange, where market havoc resulted from a software fault that prevented proper execution of the cancel command after an employee, who wanted to sell one share at 610,000 yen, mistakenly switched the two numbers and entered an order to sell 610,000 shares at one yen each.

I found out only recently about the article while browsing Dines Bjørner’s page and hitting on an unpublished paper [3] where Bjørner proposes a mathematical model for the trading rules. Tamai’s article deserves to be widely read.

References

[1] The one sure way to advance software engineering: this blog, see here.
[2] Dwelling on the point: this blog, see here.
[3] Dines Bjørner: The TSE Trading Rules, version 2, unpublished report, 22 February 2010, available online.
[4] Tetsuo Tamai: Social Impact of Information System Failures, in IEEE Computer, vol. 42, no. 6, June 2009, pages 58-65, available online (with registration); the article’s text is also included in [3].