From duplication to duplicity: a short history of the technology of knowledge transfer

1440:

Gutenberg discovers the secret of producing, out of one text, many books for the benefit of many people.

2007:

Von und Zu Guttenberg discovers the secret of producing, out of many texts, one book for the benefit of one person.

 

 

Email and its perils

Email is fast and convenient. It is also risky. Here are three common sources of incidents with email. They are not new, but they keep biting even the most experienced email users.

1. Risks of using Bcc

Alice writes to Bob. She wants Carol to know what she wrote, but she does not want Bob to know that she is keeping Carol informed. So she copies Carol in the form of a “Blind carbon copy” (Bcc). The Bcc mechanism is meant exactly for such situations: while Bcc recipients see the list of other recipients (To and Cc), these other recipients see no mention of Bcc recipients.

Now Carol sees the messages and responds to Alice. But to respond she uses, perhaps inadvertently, “Reply all”. The reply goes to both Alice and Bob. All of Alice’s efforts to keep mum about Carol’s involvement are lost!

The risk in this situation is that Alice has no way to control what Carol may do. At issue here is not a conscious effort by Carol to break confidentiality: in that case Alice could do nothing anyway as soon as she has sent Carol the information. The worrying possibility is that Carol may use “Reply all” by mistake.

Rule 1 (temporary): never use Bcc. Alice should send the message to Bob only, and forward a copy separately to Carol alone.

I start with this form of the rule because it is easy to remember and usually appropriate, but in one case it is too strong. Remove Bob from the picture. Then if Carol is a  person, it is pointless for Alice to use Bcc for her, rather than To (or Cc), since Carol knows everything there is to know. But now assume that Carol is actually the name of a mailing list. Members of the mailing list should know the originator, Alice, but they should not know about each other, or even about the name of the list. If these are the constraints, there is no risk in using Bcc. Hence the revised version of the rule:

Rule 1 (final): never use Bcc except for all recipients of a message.

Indeed what was wrong in the first example was not the use of Bcc as such, but the mix with To (or Cc). Bcc must be used only for either all the recipients of a message or none of them.

2. Risks of not using Bcc

Very recent case (today, actually) from the refereeing process of a prestigious computer science conference. The program chair sends to all authors of submitted papers,  say <submitters@famous_computer_science_conference.org>, a general message about the refereeing process. But he uses that address in the “To” or “Cc” field, not “bcc”.

One author, say Alice, has a question about the process and responds to the message, inadvertently using “Reply all”. Then:

  • Everyone knows that Alice has submitted a paper.
  • Many of the other authors are away and have “automatic reply” set up. So Alice herself now knows the names of quite a few other members of the community who have submitted papers!

famous_computer_science_conference.org” is the most important conference in its field, and essentially every researcher in that community submits at least one paper every year. Knowing who has submitted is confidential information. That information becomes interesting a few months later, when the conference program is published and you find out that Bob tried and was not accepted. All the more juicy information, of course, if Bob is a senior (and arrogant) researcher.

Rule 2: when a mailing list includes people who should not know who else is on the list, either set up the list so that only the administrators can post to it, or use Bcc (respecting Rule 1, in its final form).

3. The importance of not being Allison

Another common risk, which strikes all the time, is automatic address completion by email clients (Outlook, Thunderbird etc.). You type part of the name or address of a frequent correspondent, and the system completes the email address.  So convenient! Except when the completion is wrong and you do not check it.

I frequently receive email that was misaddressed because of unchecked completion. (The latest case was last week.) Here is an example of completion that was expected but did not happen. A few years ago, in an institution of which I was then a member, a high-level executive wanted to send to her secretary the results of a job candidate’s evaluation. Highly confidential stuff. The secretary was called Allison and had an address of the form <allison@our_department.com>. The executive was used to typing just all and relying on address completion. Somehow, that particular time the completion did not occur; perhaps she was not using her familiar email setup. As a result the message went to <all@our_department.com>, that is to say, everyone in the organization. The recipients were, of course, delighted to get the inside story on the candidate.

Rule 3:

  • Always check the recipient addresses visually and carefully.
  • Never hire a secretary called Allen, Allison or Allistair.

Speech synthesis technology

Rereading an article from last year’s August New Yorker, a discussion of e-paper devices and especially the Kindle by Nicholson Baker [1]:

Reading some of “Max,” a James Patterson novel, I experimented with the text-to-speech feature. The robo-reader had a polite, halting, Middle European intonation, like Tom Hanks in “The Terminal,” and it was sometimes confused by periods. Once it thought “miss.” was the abbreviation of a state name: “He loved the chase, the hunt, the split-second intersection of luck and skill that allowed him to exercise his perfection, his inability to Mississippi.” I turned the machine off.

Reference

Nicholson Baker: A New Page, in The New Yorker, August 3, 2009, pages 24-30, also available at http://www.newyorker.com/reporting/2009/08/03/090803fa_fact_baker?currentPage=all.

Another DOSE of distributed software development

The software world is not flat; it is multipolar. Gone are the days of one-site, one-team developments. The increasingly dominant model today is a distributed team; the place where the job gets done is the place where the appropriate people reside, even if it means that different parts of the job get done in different places.

This new setup, possibly the most important change to have affected the practice of software engineering in this early part of the millennium,  has received little attention in the literature; and even less in teaching techniques. I got interested in the topic several years ago, initially by looking at the phenomenon of outsourcing from a software engineering perspective [1]. At ETH, since 2004, Peter Kolb and I, aided by Martin Nordio and Roman Mitin, have taught a course on the topic [2], initially called “software engineering for outsourcing”. As far as I know it was the first course of its kind anywhere; not the first course about outsourcing, but the first to explore the software engineering implications, rather than business or political issues. We also teach an industry course on the same issues [3], attended since 2005 by several hundred participants, and started, with Mathai Joseph from Tata Consulting Services, the SEAFOOD conference [4], Software Engineering Advances For Outsourced and Offshore Development, whose fourth edition starts tomorrow in Saint Petersburg.

After a few sessions of the ETH course we realized that the most important property of the mode of software development explored in the course is not that it involves outsourcing but that it is distributed. In parallel I became directly involved with highly distributed development in the practice of Eiffel Software’s development. In 2007 we renamed the ETH course “Distributed and Outsourced Software Engineering” (DOSE) to acknowledge the broadened scope. The topic is still new; each year we learn a little more about what to teach and how to teach it.

The 2007 session saw another important addition. We felt it was no longer sufficient to talk about distributed development, but that students should practice it. Collaboration between groups in Zurich and other groups in Zurich was not good enough. So we contacted colleagues around the world interested in similar issues, and received an enthusiastic response. The DOSE project is itself distributed: teams from students in different universities collaborate in a single development. Typically, we have two or three geographically distributed locations in each project group. The participating universities have been Politecnico di Milano (where our colleagues Carlo Ghezzi and Elisabetta di Nitto have played a major role in the current version of the project), University of Nijny-Novgorod in Russia, University of Debrecen in Hungary, Hanoi University of Technology in Vietnam, Odessa National Polytechnic in the Ukraine and (across town for us) University of Zurich. For the first time in 2010 a university from the Western hemisphere will join: University of Rio Cuarto in Argentina.

We have extensively studied how the projects actually fare (see publications [4-8]). For students, the job is hard. Often, after a couple of weeks, many want to give up: they have trouble reaching their partner teams, understanding their accents on Skype calls, agreeing on modes of collaboration, finalizing APIs, devising a proper test plan. Yet they hang on and, in most cases, succeed. At the end of the course they tell us how much they have learned about software engineering. For example I know few better way of teaching the importance of carefully documented program interfaces — including contracts — than to ask the students to integrate their modules with code from another team halfway around the globe. This is exactly what happens in industrial software development, when you can no longer rely on informal contacts at the coffee machine or in the parking lot to smooth out misunderstandings: software engineering principles and techniques come in full swing. With DOSE, students learn and practice these fundamental techniques in the controlled environment of a university project.

An example project topic, used last year, was based on an idea by Martin Nordio. He pointed out that in most countries there are some card games played in that country only. The project was to program such a game, where the team in charge of the game logic (what would be the “business model” in an industrial project) had to explain enough of their country’s game, and abstractly enough, to enable the other team to produce the user interface, based on a common game engine started by Martin. It was tough, but some of the results were spectacular, and these are students who will not need more preaching on the importance of specifications.

We are currently preparing the next session of DOSE, in collaboration with our partner universities. The more the merrier: we’d love to have other universities participate, including from the US. Adding extra spice to the project, the topic will be chosen among those from the ICSE SCORE competition [9], so that winning students have the opportunity to attend ICSE in Hawaii. If you are teaching a suitable course, or can organize a student group that will fit, please read the project description [10] and contact me or one of the other organizers listed on the page. There is a DOSE of madness in the idea, but no one, teacher or student,  ever leaves the course bored.

References

[1] Bertrand Meyer: Offshore Development: The Unspoken Revolution in Software Engineering, in Computer (IEEE), January 2006, pages 124, 122-123. Available here.

[2] ETH course page: see here for last year’s session (description of Fall 2010 session will be added soon).

[3] Industry course page: see here for latest (June 2010( session (description of November 2010 session will be added soon).

[4] SEAFOOD 2010 home page.

[5] Bertrand Meyer and Marco Piccioni: The Allure and Risks of a Deployable Software Engineering Project: Experiences with Both Local and Distributed Development, in Proceedings of IEEE Conference on Software Engineering & Training (CSEE&T), Charleston (South Carolina), 14-17 April 2008, ed. H. Saiedian, pages 3-16. Preprint version  available online.

[6] Bertrand Meyer:  Design and Code Reviews in the Age of the Internet, in Communications of the ACM, vol. 51, no. 9, September 2008, pages 66-71. (Original version in Proceedings of SEAFOOD 2008 (Software Engineering Advances For Offshore and Outsourced Development,  Lecture Notes in Business Information Processing 16, Springer Verlag, 2009.) Available online.

[7] Martin Nordio, Roman Mitin, Bertrand Meyer, Carlo Ghezzi, Elisabetta Di Nitto and Giordano Tamburelli: The Role of Contracts in Distributed Development, in Proceedings of SEAFOOD 2009 (Software Engineering Advances For Offshore and Outsourced Development), Zurich, June-July 2009, Lecture Notes in Business Information Processing 35, Springer Verlag, 2009. Available online.

[8] Martin Nordio, Roman Mitin and Bertrand Meyer: Advanced Hands-on Training for Distributed and Outsourced Software Engineering, in ICSE 2010: Proceedings of 32th International Conference on Software Engineering, Cape Town, May 2010, IEEE Computer Society Press, 2010. Available online.

[9] ICSE SCORE 2011 competition home page.

[10] DOSE project course page.

Analyzing a software failure

More than once I have emphasized here [1] [2] the urgency of rules requiring systematic a posteriori analysis of software mishaps that have led to disasters. I have a feeling that many more posts will be necessary before the idea registers.

Some researchers are showing the way. In a June 2009 article [4], Tetsuo Tamai from the University of Tokyo published a fascinating dissection of the 2005 Mizuo Securities incident at the Tokyo Stock Exchange, where market havoc resulted from a software fault that prevented proper execution of the cancel command after an employee who wanted to sell one share at 610,000 yen mistakenly switched the two numbers.

I found out only recently about the article while browsing Dines Bjørner’s page and hitting on an unpublished paper [3] where Bjørner proposes a mathematical model for the trading rules. Tamai’s article deserves to be widely read.

References

[1] The one sure way to advance software engineering: this blog, see here.
[2] Dwelling on the point: this blog, see here.
[3] Dines Bjørner: The TSE Trading Rules, version 2, unpublished report, 22 February 2010, available online.
[4] Tetsuo Tamai: Social Impact of Information System Failures, in IEEE Computer, vol. 42, no. 6, June 2009, pages 58-65, available online (with registration); the article’s text is also included in [3].

PGT-PPP (Pretty Good Translation, Pretty Poor Privacy)

What’s in a URL? To someone who gains access to your computer, your browsing history will provide interesting information. More interesting, in some cases, than you might think.

Try Google Translate, for example. Say you want to translate “To be or not to be, that is the question” into the language of your choice. Go to http://translate.google.com, type your text, select the source language (actually you can skip this step, the tool will detect the language automatically) and the target language (that you will have to do, Google does not read into your mind yet). You get the translation; rather, a Pretty Good Translation, almost never quite right in my experience, but sufficient to give you a Pretty Good Idea. This is the result of modern work in computational linguistics, based on statistics and large-scale data mining rather than a traditional syntax-directed attempt at perfection.

Now look into the URL:

    http://translate.google.com/#auto|sv|To%20be%20or%20not%20to%20be%2C%20that%20is%20the%20question

Your text is encoded in it! This is true even for very long texts. Building up such complex URLs is one of the time-honored techniques for simulating state in the stateless HTTP protocol. But now anyone who sees your browsing history will know the precise texts that interested you enough to make you want to translate them. A Pretty Good Window on your personal interests! And not necessarily something that you want automatically archived.

Google Translate and other translation sites are great tools to facilitate our life, especially when dealing with languages we know superficially or not at all. But maybe there is a way to provide the service without opening such a large window on the detailed questions that occupy our minds?

Programming on the cloud?

I am blogging live from the “Cloud Futures” conference organized by Microsoft in Redmond [1]. We had two excellent keynotes today, by Ed Lazowska [1] and David Patterson.

Lazowska emphasized the emergence of a new kind of science — eScience — based on analysis of enormous amounts of data. His key point was that this approach is a radical departure from “computational science” as we know it, based mostly on large simulations. With the eScience paradigm, the challenge is to handle the zillions of bytes of data that are available, often through continuous streams, in such fields as astronomy, oceanography or biology. It is unthinkable in his view to process such data through super-computing architectures specific to an institution; the Cloud is the only solution. One of the reasons (developed more explicitly in Patterson’s talk) is that cloud computing supports scaling down as well as scaling up. If your site experiences sudden bursts of popularity — say you get slashdotted — followed by downturns, you just cannot size the hardware right.

Lazowska also noted that it is impossible to convince your average  university president that Cloud is the way to go, as he will get his advice from the science-by-simulation  types. I don’t know who the president is at U. of Washington, but I wonder if the comment would apply to Stanford?

The overall argument for cloud computing is compelling. Of course the history of IT is a succession of swings of the pendulum between centralization and delocalization: mainframes, minis, PCs, client-server, “thin clients”, “The Network Is The Computer” (Sun’s slogan in the late eighties), smart clients, Web services and so on. But this latest swing seems destined to define much of the direction of computing for a while.

Interestingly, no speaker so far has addressed issues of how to program reliably for the cloud, even though cloud computing seems only to add orders of magnitude to the classical opportunities for messing up. Eiffel and contracts have a major role to play here.

More generally the opportunity to improve quality should not be lost. There is a widespread feeling (I don’t know of any systematic studies) that a non-negligible share of results generated by computational science are just bogus, the product of old Fortran programs built by generations of graduate students with little understanding of software principles. At the very least, moving to cloud computing should encourage the use of 21-th century tools, languages and methods. Availability on the cloud should also enhance a critical property of good scientific research: reproducibility.

Software engineering is remarkably absent from the list of scientific application areas that speaker after speaker listed for cloud computing. Maybe software engineering researchers are timid, and do not think of themselves as deserving large computing resources; consider, however, all the potential applications, for example in program verification and empirical software engineering. The cloud is a big part of our own research in verification; in particular the automated testing paradigm pioneered by AutoTest [3] fits ideally with the cloud and we are actively working in this direction.

Lazowska mentioned that development environments are the ultimate application of cloud computing. Martin Nordio at ETH has developed, with the help of Le Minh Duc, a Master’s student at Hanoi University of Technology, a cloud-based version of EiffelStudio: CloudStudio, which I will present in my talk at the conference tomorrow. I’ll write more about it in later posts; just one note for the moment: no one should ever be forced again to update or commit.

References

[1] Program of the Cloud Futures conference.

[2] Keynote by Ed Lazowska. You can see his slides here.

[3] Bertrand Meyer, Arno Fiva, Ilinca Ciupa, Andreas Leitner, Yi Wei, Emmanuel Stapf: Programs That Test Themselves. IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009; online version here.

Dwelling on the point

Once again, and we are not learning!

La Repubblica of last Thursday [1] and other Italian newspapers have reported on a “computer” error that temporarily brought thousands of accounts at the national postal service bank into the red. It is a software error, due to a misplacement of the decimal points in some transactions.

As usual the technical details are hazy; La Repubblica writes that:

Because of a software change that did not succeed, the computer system did not always read the decimal point during transactions”.

As a result, it could for example happen that a 15.00-euro withdrawal was understood as 1500 euros.
I have no idea what “reading the decimal point ” means. (There is no mention of OCR, and the affected transactions seem purely electronic.) Only some of the 12 million checking or “Postamat” accounts were affected; the article cites a number of customers who could not withdraw money from ATMs because the system wrongly treated their accounts as over-drawn. It says that this was the only damage and that the postal service will send a letter of apology. The account leaves many questions unanswered, for example whether the error could actually have favored some customers, by allowing them to withdraw money they did not have, and if so what will happen.

The most important unanswered question is the usual one: what was the software error? As usual, we will probably never know. The news items will soon be forgotten, the postal service will somehow fix its code, life will go on. Nothing will be learned; the next time around similar causes will produce similar effects.

I criticized this lackadaisical attitude in an earlier column [2] and have to hammer its conclusion again: any organization using public money should be required, when it encounters a significant software malfunction, to let experts investigate the incident in depth and report the results publicly. As long as we keep forgetting our errors we will keep repeating them. Where would airline safety be in the absence of thorough post-accident reports? That a software error did not kill anyone is not a reason to ignore it. Whether it is the Italian post messing up, a US agency’s space vehicle crashing on the moon or any other software fault causing systems to fail, it is not enough to fix the symptoms: we must have a professional report and draw the lessons for the future.

Reference

[1] Luisa Grion: Poste in tilt per una virgola — conti gonfiati, stop ai prelievi. In La Repubblica, 26 November 2009, page 18 of the print version. (At the time of writing it does not appear at repubblica.it,  but see  the TV segment also titled “Poste in tilt per una virgola” on Primocanale Web TV here, and other press articles e.g. in Il Tempo here.)

[2] On this blog: The one sure way to advance software engineering (post of 21 August 2009).

Methods need theory

For someone in search of a software development method, the problem is not to find answers; it’s to find out how good the proposed answers are. We have lots of methods — every year brings its new harvest — but the poor practitioner is left wondering why last year’s recipe is not good enough after all, and why he or she has to embrace this year’s buzz instead. Anyone looking for serious conceptual arguments has to break through the hype and find the precious few jewels of applicable wisdom.

This is the start of an article that Ivar Jacobson and I just wrote for Dr. Dobb’s Journal;  it is available in the online edition [1] and will appear (as I understand) in the next paper edition. The article is a plea for a rational, science-based approach to software development methodology, and a call for others to join us in establishing a sound basis.

Reference

[1] Ivar Jacobson and Bertrand Meyer, Methods Need Theory, Dr. Dobb’s Journal, August 2009, available online.

Computer technology: making mozzies out of betties

Are you a Beethoven or a Mozart? If you’ll pardon the familarity, are you more of a betty or more of a mozzy? I am a betty. I am not referring to my musical abilities but to my writing style; actually, not the style of my writings (I haven’t completed any choral fantasies yet) but the style of my writing process. Mozart is famous for impeccable manuscripts; he could be writing in a stagecoach bumping its way through the Black Forest, on the kitchen table in the miserable lodgings of his second, ill-fated Paris trip, or in the antechamber of Archbishop Colloredo — no matter: the score comes out immaculate, not reflecting any of the doubts, hesitations and remorse that torment mere mortals. 

 Mozart

Beethoven’s music, note-perfect in its final form, came out of a very different process. Manuscripts show notes overwritten, lines struck out in rage, pages torn apart. He wrote and rewrote and gave up and tried again and despaired and came back until he got it the way it had to be.

Beethoven

How I sympathize! I seldom get things right the first time, and when I had to use a pen and paper I  almost never could produce a clean result; there always was one last detail to change. As soon as I could, I got my hands on typewriters, which removed the effects of ugly handwriting, but did not solve the problem of second thoughts followed by third thoughts and many more. Only with computers did it become possible to work sensibly. Even with a primitive text editor, the ability to try out ideas then correct and correct and correct is a profound change of the creation process. Once you have become used to the electronic medium, using a pen and paper seems as awkard and insufferable as, for someone accustomed to driving a car, being forced to travel in an oxen cart.

This liberating effect, the ability to work on your creations as a sculptor kneading an infinitely malleable material, is one of the greatest contributions of computer technology. Here we are talking about text, but the effect is just as profound on other media, as any architect or graphic artist will testify.

The electronic medium does not just give us more convenience; it changes the nature of writing (or composing, or designing). With paper, for example, there is a great practical difference between introducing new material at the end of the existing text,  which is easy, and inserting it at some unforeseen position, which is cumbersome and sometimes impossible. With computerized tools, it doesn’t matter. The change of medium changes the writing process and ultimately the writing: with paper the author ends up censoring himself to avoid practically painful revisions; with software tools, you work in whatever order suits you.

Technical texts, with their numbered sections and subsections, are another illustration of the change: with a text processor you do not need to come up with the full plan first, in an effort to avoid tedious renumbering later. You will use such a top-down scheme if it fits your natural way of working, but you can use any other  one you like, and renumber the existing sections at the press of a key. And just think of the pain it must have been to produce an index in the old days: add a page (or, worse, a paragraph, since it moves the following ones in different ways) and you would  have to recheck every single entry.

Recent Web tools have taken this evolution one step further, by letting several people revise a text collaboratively and concurrently (and, thanks to the marvels of  longest-common-subsequence algorithms and the resulting diff tools, retreat to an earlier version if in our enthusiasm to change our design we messed it up) . Wikis and Google Docs are the most impressive examples of these new techniques for collective revision.

Whether used by a single writer or in a collaborative development, computer tools have changed the very process of creation by freeing us from the tyranny of physical media and driving to zero the logistic cost of  one or a million changes of mind. For the betties among us, not blessed with an inborn ability to start at A, smoothly continue step by step, and end at Z, this is a life-changer. We can start where we like, continue where we like, and cover up our mistakes when we discover them. It does not matter how messy the process is, how many virtual pages we tore away, how much scribbling it took to bring a paragraph to a state that we like: to the rest of the world, we can present a result as pristine as the manuscript of a Mozart concerto.

These advances are not appreciated enough; more importantly, we do not take take enough advantage of them. It is striking, for example, to see that blogs and other Web pages too often remain riddled with typos and easily repairable mistakes. This is undoubtedly because the power of computer technology tempts us to produce ever more documents and in the euphoria to neglect the old ones. But just as importantly that technology empowers  us to go back and improve. The old schoolmaster’s advice — revise and revise again [1] — can no longer  be dismissed as an invitation to fruitless perfectionism; it is right, it is fun to apply, and at long last it is feasible.

Reference

 

[1] “Vingt fois sur le métier remettez votre ouvrage” (Twenty times back to the loom shall you bring your design), Nicolas Boileau