I am giving my “inaugural lecture” at ITMO in Saint Petersburg tomorrow (Thursday, 28 February 2013) at 14 (2 PM) local time, meaning e.g. 11 AM in Western Europe and 2 AM (ouch!) in California. See here for the announcement. The title is “Programming: Magic, Art, Routine or Science?“. The talk will be streamed live: see here.
Archive for February 2013
(Adapted from an article in the Communications of the ACM blog.)
I have become interested in agile methods because they are all the rage now in industry and, upon dispassionate examination, they appear to be a pretty amazing mix of good and bad ideas. I am finishing a book that tries to sort out the nuggets from the gravel .
An interesting example is the emphasis on developing a system by successive increments covering expanding slices of user functionality. This urge to deliver something that can actually be shown — “Are we shipping yet?” — is excellent. It is effective in focusing the work of a team, especially once the foundations of the software have been laid. But does it have to be the only way of working? Does it have to exclude the time-honored engineering practice of building the infrastructure first? After all, when a building gets constructed, it takes many months before any “user functionality” becomes visible.
In a typical exhortation , the Poppendiecks argue that:
The “right the first time“ approach may work for well-structured problems, but the “try-it, test-it, fix-it“ approach is usually the better approach for ill-structured problems.
Very strange. It is precisely ill-structured problems that require deeper analysis before you jump in into wrong architectural decisions which may require complete rework later on. Doing prototypes to try possible solutions can be a great way to evaluate potential solutions, but a prototype is an experiment, something quite different from an increment (an early version of the future system).
One of the problems with the agile literature is that its enthusiastic admonitions to renounce standard software engineering practices are largely based on triumphant anecdotes of successful projects. I am willing to believe all these anecdotes, but they are only anecdotes. In the present case systematic empirical evidence does not seem to support the agile view. Boehm and Turner  write:
Experience to date indicates that low-cost refactoring cannot be depended upon as projects scale up.
The only sources of empirical data we have encountered come from less-expert early adopters who found that even for small applications the percentage of refactoring and defect-correction effort increases with [the size of requirements].
They do not cite references here, and I am not aware of any empirical study that definitely answers the question. But their argument certainly fits my experience. In software as in engineering of any kind, experimenting with various solutions is good, but it is critical to engage in the appropriate Big Upfront Thinking to avoid starting out with the wrong decisions. Some of the worst project catastrophes I have seen were those in which the customer or manager was demanding to see something that worked right away — “it doesn’t matter if it’s not the whole thing, just demonstrate a piece of it!“ — and criticized the developers who worked on infrastructure that did not produce immediately visible results (in other words, were doing their job of responsible software professionals). The inevitable result: feel-good demos throughout the project, reassured customer, and nothing to deliver at the end because the difficult problems have been left to rot. System shelved and never to be heard of again.
When the basis has been devised right, perhaps with nothing much to show for months, then it becomes critical to insist on regular visible releases. Doing it prematurely is just sloppy engineering.
The problem here is extremism. Software engineering is a difficult balance between conflicting criteria. The agile literature’s criticism of teams that spend all their time on design or on foundations and never deliver any usable functionality is unfortunately justified. It does not mean that we have to fall into the other extreme and discard upfront thinking.
In the agile tradition of argument by anecdote, here is an extract from James Surowiecki’s “Financial Page” article in last month’s New Yorker. It’s not about software but about the current Boeing 787 “Dreamliner” debacle:
Determined to get the Dreamliners to customers quickly, Boeing built many of them while still waiting for the Federal Aviation Administration to certify the plane to fly; then it had to go back and retrofit the planes in line with the FAA’s requirements. “If the saying is check twice and build once, this was more like build twice and check once”, [an industry analyst] said to me. “With all the time and cost pressures, it was an alchemist’s recipe for trouble.”
(Actually, the result is “build twice and check twice”, or more, since every time you rebuild you must also recheck.) Does that ring a bell?
Erich Kästner’s ditty about reaching America, cited in a previous article , is once again the proper commentary here.
 Bertrand Meyer: Agile! The Good, the Hype and the Ugly, Springer, 2013, to appear.
 Mary and Tom Poppendieck: Lean Software Development — An Agile Toolkit, Addison-Wesley, 2003.
 Barry W. Boehm and Richard Turner: Balancing Agility with Discipline — A Guide for the Perplexed, Addison-Wesley, 2004. (Second citation slightly abridged.)
 James Surowiecki, in the New Yorker, 4 February 2013, available here.
 Hitting on America, article from this blog, 5 December 2012, available here.
(This article first appeared in the Communications of the ACM blog.)
The very concept of publication has changed, half of its traditional meaning having disappeared in hardly more than a decade. Or to put it differently (if you will accept the metaphor, explained below), how it has lost its duality: no longer particle, just wave.
Process and product
Some words ending with ation (atio in Latin) describe a change of state: restoration, dilatation. Others describe the state itself, or one of its artifacts: domination, fascination. And yet others play both roles: decoration can denote either the process of embellishing (she works in interior decoration), or an element of the resulting embellishment (Christmas tree decoration).
Since at least Gutenberg, publication has belonged to that last category: both process and artifact. A publication is an artifact, such as an article or a book accessible to a community of readers. We are referring to that view when we say “she has a long publication list or “Communications of the ACM is a prestigious publication.” But the word also denotes a process, built from the verb “publish” the same way “restoration” is built from “restore” and “insemination” from “inseminate”: the publication of her latest book took six months.
The thesis of this article is that the second view of publication will soon be gone, and its purpose is to discuss the consequences for scientists.
Let me restrict the scope: I am only discussing scientific publication, and more specifically the scientific article. The situation for books is less clear; for all the attraction of the Kindle and other tablets, the traditional paper book still has many advantages and it would be risky to talk about its demise. For the standard scholarly article, however, electronic media and the web are quickly destroying the traditional setup.
That was then . . .
Let us step back a bit to what publication, the process, was a couple of decades ago. When you wrote something, you could send it by post to your friends (Edsger Dijkstra famously turned this idea into his modus operandi, regularly xeroxing his “EWD” memos  to a few dozen people) , but if you wanted to make it known to the world you had to go through the intermediation of a PUBLISHER — the mere word was enough to overwhelm you with awe. That publisher, either a non-profit organization or a commercial house, was in charge not only of selecting papers for a conference or journal but of bringing the accepted ones to light. Once you got the paper accepted began a long and tedious process of preparing the text to the publisher’s specifications and correcting successive versions of “galley proofs.” That step could be painful for papers having to do with programming, since in the early days typesetters had no idea how to lay out code. A few months or a couple of years later, you received a package in the mail and proudly opened the journal or proceedings at the page where YOUR article appeared. You would also, usually for a fee, receive fifty or so separately printed (tirés à part) reprints of just your article, typeset the same way but more modestly bound. Ah, the discrete charm of 20-th century publication!
. . . and this is now
Cut to today. Publishers stopped long ago to do the typesetting for you. They impose the format, obligingly give you LaTex, Word or FrameMaker templates, and you take care of everything. We have moved to WYSIWYG publishing: the version you write is the version you submit through a site such as EasyChair or CyberChair and the version that, after correction, will be published. The middlemen have been cut out.
We moved to this system because technology made it possible, and also because of the irresistible lure, for publishers, of saving money (even if, in the long term, they may have removed some of the very reasons for their existence). The consequences of this change go, however, far beyond money.
To understand how fundamentally the stage has changed, let us go back for a moment to the old system. It has many advantages, but also limitations. Some are obvious, such as the amount of work required, involving several people, and the delay from paper completion to paper publication. But in my view the most significant drawback has to do with managing change. If after publication you find a mistake, you must convince the journal to include an erratum: a new mini-article, subject to the same process. That requirement is reasonable enough but the scheme does not support a significant mode of scientific writing: working repeatedly on a single article and progressively refining it. This is not the “LPU” (Least Publishable Unit) style of publishing, but a process of studying an important idea or research project and aiming towards the ideal paper about it by successive approximation. If six months after the original publication of an article you have learned more about the topic and how to present it, the publication strategy is not obvious: resubmit it and risk being accused of self-plagiarism; avoid repetition of basic elements, making the article harder to read independently; artificially increase differences. This conundrum is one of the legitimate sources of the LPU phenomenon: faced with the choice between freezing material and repeating it, people end up publishing it bit by bit.
Now back again to today. If you are a researcher, you want the world to know about your ideas as soon as they are in a clean form. Today you can do this easily: no need to photocopy page after page and lick postage stamps on envelopes the way Dijkstra did; just generate a PDF and put it on your Web page or (to help establish a record if a question of precedence later comes up) on ArXiv. Just to make sure no one misses the information, tweet about it and announce it on your Facebook and LinkedIn pages. Some authors do this once the paper has been accepted, but many start earlier, at the time of submission or even before. I should say here that not all disciplines allow such author behavior; in biology and medicine in particular publishers appear to limit authors’ rights to distribute their own texts. Computer scientists would not tolerate such restrictions, and publishers, whether nonprofit or commercial, largely leave us alone when we make our work available on the Web.
But we are talking about far more than copyright and permissions (in this article I am in any case staying out of these emotionally and politically charged issues, open access and the like, and concentrating on the effect of technology changes on the process of publishing and the publication culture). The very notion of publication has changed. The process part is gone; only the result remains, and that result can be an evolving product, not a frozen artifact.
Particle, or wave?
Another way to describe the difference is that a traditional publication, for example an article published in a journal, is like a particle: an identifiable material object. With the ease of modification, a publication becomes more like a wave, which allows an initial presentation to propagate to successively wider groups of readers:
Maybe you start with a blog entry, then you register the first version of the work as a technical report in your institution or on ArXiv, then you submit it to a workshop, then to a conference, then a version of record in a journal.
In the traditional world of publication each of these would have to be made sufficiently different to avoid the accusation of plagiarism. (There is some tolerance, for example a technical report is usually not considered prior publication, and it is common to submit an extended version of a conference paper to a journal — but the journal will require that you include enough new material, typically “at least 30%.”)
For people who like to polish their work repeatedly, that traditional model is increasingly hard to accept. If you find an error, or a better way to express something, or a complementary result, you just itch to make the change here and now. And you can. Not on a publisher’s site, but on your own, or on ArXiv. After all, one of the epochal contributions of computer technology, not heralded loudly enough, is, as I argued in another blog article , the ease with which we can change, extend and refine our creations, developing like a Beethoven and releasing like a Mozart.
The “publication as product” becomes an evolving product, available at every step as a snapshot of the current state. This does not mean that you can cover up your mistakes with impunity: archival sites use “diff” techniques to maintain a dated record of successive versions, so that in case of doubt, or of a dispute over precedence, one can assess beyond doubt who released what statement when. But you can make sure that at any time the current version is the one you like best. Often, it is better than the official version on the conference or journal site, which remains frozen forever in the form it had on the day of its release.
What then remains of “publication as process”? Not much; in the end, a mere drag-and-drop from the work folder to the publication folder.
Well, there is an aspect I have not mentioned yet.
Apart from its material side, now gone or soon to be gone, the traditional publication process has another role: what a recent article in this blog  called sanction. You want to publish your latest scientific article in Communications of the ACM not just because it will end up being printed and mailed, but because acceptance is a mark of recognition by experts. There is a whole gradation of prestige, well known to researchers in every particular field: conferences are better than workshops, some journals are as good as conferences or higher, some conferences are far more prestigious than others, and so on.
That sanction, that need for an independent stamp of approval, will remain (and, for academics, young academics in particular, is of ever growing importance). But now it can be completely separated from the publication process and largely separated (in computer science, where conferences are so important today) from the conference process.
Here then is what I think scientific publication will become. The researcher (the author) will largely be in control of his or her own text as it goes through the successive waves described above. A certified record will be available to verify that at time t the document d had the content c. Then at specific stages the author will submit the paper. Submit in the sense of appraisal and, if the appraisal is succesful, certification. The submission may be to a conference: you submit your paper for presentation at this year’s ICSE, POPL or SIGGRAPH. (At the recent Dagstuhl publication culture workshop, Nicolas Holzschuch mentioned that some graphics conferences accept for presentation work that has already been published; isn’t this scheme more reasonable than the currently dominant practice of conference-as-publication?) You may also submit your work, once it reaches full maturity, to a journal. Acceptance does not have to mean that any trees get cut, that any ink gets spread, or even that any bits get moved: it simply enables the journal’s site to point to the article, and your site to add this mark of recognition.
There may also be other forms of recognition, social-network or Trip Advisor style: the community gets to pitch in, comment and assess. Don’t laugh too soon. Sure, scientific publication has higher standards than Wikipedia, and will not let the wisdom of the crowds replace the judgment of experts. But sometimes you want to publish for communication, not sanction; especially if you have the privilege of no longer being trapped in the publish-or-perish race you may simply want to make your research known, and you have little patience for navigating the meanders of conventional publication, genuflecting to the publications of PC members, and following the idiosyncratic conventional structure of the chosen conference community. Then you just publish and let the world decide.
In most cases, of course, we do need the sanction, but there is no absolute reason it should be tied to the traditional structures of journal publication and conference participation. There will be resistance, if only because of the economic interests involved; some of what we know today will remain, albeit with a different focus: conferences, as a place where the best work of the moment is presented (independently of its publication); printed books, as noted; and printed journals that bring real added value in the form of high-quality printing, layout and copy editing (and might still insist that you put on their site a copy of your paper rather than, or in addition to, a reference to your working version).
The trend, however, is irresistible. Publication is no longer a process, it is a product, increasingly under the control of the authors. As a product it is no longer a defined particle but a wave, progressively improving as it reaches successive classes of readership, undergoes successive steps of refinement and receives, informally from the community and officially from more or less prestigious sources, successive stamps of approval.
 Dijkstra archive at the University of Texas at Austin, here.
 Bertrand Meyer: Computer Technology: Making Mozzies out of Betties, article on this blog, 2 August 2009, available here.
 Bertrand Meyer: Conferences: Publication, Communication, Sanction, article on this blog, 6 February 2013, available here.
A healthy discussion is taking place in the computer science community on our publication culture. It was spurred by Lance Fortnow’s 2009 article ; now Moshe Vardi has taken the lead to prepare a report on the topic, following a workshop in Dagstuhl in November . The present article and one that follows (“The Waves of Publication”) are intended as contributions to the debate.
One of the central issues is what to do with conferences. Fortnow had strong words for the computer science practice of using conferences as the selective publication venues, instead of relying on journals as traditional scientific disciplines do. The criticism is correct, but if we look at the problem from a practical perspective it is unlikely that top conferences will lose their role as certifiers of quality. This is not a scientific matter but one of power. People in charge of POPL or OOPSLA have decisive sway over the careers (one is tempted to say the lives) of academics, particularly young academics, and it is a rare situation in human affairs that people who have critical power voluntarily renounce it. Maybe the POPL committee will see the light: maybe starting in 2014 it will accept all reasonable papers somehow related to “principles of programming languages”, turn the event itself into a pleasant multi-track community affair where everyone in the field can network, and hand over the selection and stamp-of-approval job to a journal such as TOPLAS. Dream on; it is not going to happen.
We should not, however, remain stuck with the status quo and all its drawbacks. That situation is unsustainable. As a single illustration, consider the requirement, imposed by all conferences, that having a paper pass the refereeing process is not enough: you must also register. A couple of months before the conference, authors of accepted papers (at least, they thought their paper was accepted) receive a threatening email telling them that unless they register and pay their paper will not be published after all. Now assume an author, in a field where a conference is the top token of recognition, has his visa application rejected by the country of the conference — a not so uncommon situation — and does not register. (Maybe he does not mind paying the fee, but he does not want to lie by pretending he is going to attend whereas he knows he will not.) He has lost his opportunity for publication and perhaps severely harmed this career. What have such requirements to do with science?
To understand what can be done, we need to analyze the role of conferences. In an earlier article  I described four “modes and uses” of publication: Publication, Exam, Business and Ritual. From the organizers’ viewpoint, ignoring the Business and Ritual aspects although they do play a significant role, a conference has three roles: Publication, Communication and Sanction. The publication part corresponds to the proceedings of the conference, which makes articles available to the community at large, not just the conference attendees. The communication part only addresses the attendees: it includes the presentation of papers as well as all other interactions made possible by being present at a conference. The sanction part (corresponding to the “exam” part of the more general classification) is the role of a renowned conference as a stamp of approval for the best work of the moment.
What we should do is separate these roles. A conference can play all three roles, but it can also select two of them, or even just one. A well-established, prestigious conference will want to retain its sanctioning role: accepted papers get the stamp of approval. It will also remain an event, where people meet. And it may distribute proceedings. But the three roles can also be untied:
- Publication is the least critical, and can easily be removed from the other two, since everything will be available on the Web. In fact the very notion of proceedings is quickly becoming fuzzy: more and more conferences save money by not distributing printed proceedings to attendees, sometimes not printing any proceedings at all; and some even spare themselves the production of a proceedings-on-a-stick, putting the material on the Web instead. A conference may still decide to have its own proceedings, or it might outsource that part to a journal. Each conference will make these decisions based on its own culture, tradition, ambition and constraints. For authors, the decision does not particularly matter: what counts are the sanction, which is provided by the refereeing process, and the availability of their material to the world, which will be provided in any scenario (at least in computer science where we have, thankfully, the permission to put our papers on our own web sites, an acquired right that our colleagues from other disciplines do not all enjoy).
- Separating sanction from communication is a natural step. Acceptance and participation are two different things.
Conference organizers should not be concerned about lost revenue: most authors will still want to participate in the conference, and will get the funding since institutions are used to pay for travel to present accepted papers; some new participants might come, attracted by more interaction-oriented conference styles; and organizers can replace the requirement to register by a choice between registering and paying a publication fee.
Separating the three roles does not mean that any established conference renounces its sanctioning status, acquired through the hard work of building the conference’s reputation, often over decades. But everyone gets more flexibility. Several combinations are possible, such as:
- Sanction without communication or publication: papers are submitted for certification through peer-review, they are available on the Web anyway, and there is no need for a conference.
- Publication without sanction or communication: an author puts a paper on his web page or on a self-publication site such as ArXiv.
- Sanction and communication without publication: a traditional selective conference, which does not bother to produce proceedings.
- Communication without sanction: a working conference whose sole aim is to advance the field through presentations and discussions, and accepts any reasonable submission. It may be by invitation (a kind of advance sanction). It may have proceedings (publication) or not.
Once we understand that the three roles are not inextricably tied, the stage is clear for removal for some impediments to a more effective publication culture. Some, not all. The more general problem is the rapidly changing nature of scientific publication, what may be called the concentric waves of publication. That will be the topic of the next article.
 Lance Fortnow: Time for Computer Science to Grow Up, in Communications of the ACM, Vol. 52, no. 8, pages 33-35, 2009, available here.
 Dagstuhl: Perspectives Workshop: Publication Culture in Computing Research, see here.
 Bertrand Meyer: The Modes and Uses of Scientific Publication, article on this blog, 22 November 2011, see here.