Archive for the ‘Recycled’ Category.

The end of software engineering and the last methodologist

(Reposted from the CACM blog [*].)

Software engineering was never a popular subject. It started out as “programming methodology”, evoking the image of bearded middle-aged men telling you with a Dutch, Swiss-German or Oxford accent to repent and mend your ways. Consumed (to paraphrase Mark Twain) by the haunting fear that someone, somewhere, might actually enjoy coding.

That was long ago. With a few exceptions, including one mentioned below, to the extent that anyone still studies programming methodology, it’s in the agile world, where the decisive argument is often “I always say…”. (Example from a consultant’s page: “I always tell teams: ‘I’d like a [user] story to be small, to fit in one iteration, but that isn’t always the way.’”) Dijkstra did appeal to gut feeling, but he backed it up with strong conceptual arguments.

The field of software engineering, of which programming methodology is today just a small part, has expanded enormously in both depth and breadth. Conferences such as ICSE and ESEC still attract a good crowd, the journals are buzzing, the researchers are as enthusiastic as ever about their work, but… am I the only one to sense frustration? It is not clear that anyone outside the community is interested. The world seems to view software engineering as something that everyone in IT knows, because we all develop software or manage people who develop software. In the 2017 survey of CS faculty hiring in the U.S., software engineering accounted, in top-100 Ph.D.-granting universities, for 3% of hires! (In schools that stop at the master’s level, the figure is 6%; not insignificant, but not impressive either, given that these institutions largely train future software engineers.) From an academic career perspective, the place to go is obviously “Artificial Intelligence, Data Mining, and Machine Learning”, which in those top-100 universities got 23% of hires.

Nothing against our AI colleagues; I always felt “AI winter” was an over-reaction [1], and they are entitled to their spring. Does it mean software engineering now has to go into a winter of its own? That is crazy. Software engineering is more important than ever. The recent Atlantic “software apocalypse” article (stronger on problems than solutions) is just the latest alarm-sounding survey. Or, for just one recent example, see the satellite loss in Russia [2] (juicy quote, which you can use the next time you teach a class about the challenges of software testing: “this revealed a hidden problem in the algorithm, which was not uncovered in decades of successful launches of the Soyuz-Frigate bundle”).

Such cases, by the way, illustrate what I would call the software professor’s dilemma, much more interesting in my opinion than the bizarre ethical brain-teasers (you see what I mean, trolley levers and the like) on which people in philosophy departments spend their days: is it ethical for a professor of software engineering, every morning upon waking up, to go to cnn.com in the hope that a major software-induced disaster has occurred, finally legitimizing the profession? The answer is simple: no, that is not ethical. Still, if you have witnessed the actual state of ordinary software development, it is scary to think about (although not to wish for) all the catastrophes-in-waiting that you suspect are lying out there, just biding their time until the right circumstances arise.

So yes, software engineering is more relevant than ever, and so is programming methodology. (Personal disclosure: I think of myself as the very model of a modern methodologist [3], without a beard or a Dutch accent, but trying to carry, on today’s IT scene, the torch of the seminal work of the 1970s and 80s.)

What counts, though, is not what the world needs; it is what the world believes it needs. The world does not seem to think it needs much software engineering. Even when software causes a catastrophe, we see headlines for a day or two, and then nothing. Radio silence. I have argued to the point of nausea, including at least four times in this blog (five now), for a simple rule that would require a public auditing of any such event; to quote myself: airline transportation did not become safer by accident but by accidents. Such admonitions fall on deaf ears. As another sign of waning interest, many people including me learned much of what they understand of software engineering through the ACM Risks Forum, long a unique source of technical information on software troubles. The Forum still thrives, and still occasionally reports about software engineering issues, but most of the traffic is about privacy and security (with a particular fondness for libertarian rants against any reasonable privacy rule that the EU passes). Important topics indeed, but where do we go for in-depth information about what goes wrong with software?

Yet another case in point is the evolution of programming languages. Language creation is abuzz again with all kinds of fancy new entrants. I can think of one example (TypeScript) in which the driving force is a software engineering goal: making Web programs safer, more scalable and more manageable by bringing some discipline into the JavaScript world. But that is the exception. The arguments for many of the new languages tend to be how clever they are and what expressive new constructs they introduce. Great. We need new ideas. They would be even more convincing if they addressed the old, boring problems of software engineering: correctness, robustness, extendibility, reusability.
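
To make the contrast concrete, here is a minimal sketch (my own illustration, with hypothetical names, not taken from any particular project) of the kind of discipline TypeScript brings to the JavaScript world: a type annotation lets the compiler reject, before the program ever runs, a mistake that plain JavaScript would only surface at run time, if at all.

```typescript
// Minimal illustration (hypothetical example): a type annotation turns a
// silent JavaScript run-time bug into a TypeScript compile-time error.

interface Invoice {
  amount: number;    // amount in cents
  currency: string;
}

function total(invoices: Invoice[]): number {
  // The compiler guarantees every element has a numeric `amount`.
  return invoices.reduce((sum, inv) => sum + inv.amount, 0);
}

const ok = total([{ amount: 1250, currency: "EUR" }]);   // 1250

// total([{ amount: "1250", currency: "EUR" }]);
// ^ Rejected at compile time: Type 'string' is not assignable to type 'number'.
// In plain JavaScript the same call runs, coerces 0 + "1250" into the string
// "01250", and the error surfaces (if ever) far from its cause.
```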

None of this makes software engineering less important, or diminishes in the least the passion of those of us who have devoted our careers to the field. But it is time to don our coats and hats: winter is upon us.

Notes

[1] AI was my first love, thanks to Jean-Claude Simon at Polytechnique/Paris VI and John McCarthy at Stanford.

[2] Thanks to Nikolay Shilov for alerting me to this information. The text is in Russian but running it through a Web translation engine (maybe this link will work) will give the essentials.

[3] This time borrowing a phrase from James Noble.

[*] I am reposting these CACM blog articles rather than just putting a link, even though as a software engineer I do not like copy-paste. This is my practice so far, and it might change since it raises obvious criticism, but here are the reasons: (A) The audiences for the two blogs are, as experience shows, largely disjoint. (B) I like this site to contain a record of all my blog articles, regardless of what happens to other sites. (C) I can use my preferred style conventions.


Informatics: catch them early

[I occasionally post on the Communications of the ACM blog. It seems that there is little overlap between readers of that blog and the present one, so — much as I know, from software engineering, about the drawbacks of duplication — I will continue to repost articles here when relevant.]

Some call it computer science, others informatics, but they face the same question: when do we start teaching the subject? Many countries whose high schools began to introduce it in the seventies have actually retreated since then; sure, students are shown how to use word processors and spreadsheets, but that is not the point.

Should we teach computer science in secondary (and primary) school? In a debate at SIGCSE a few years ago, Bruce Weide said in strong words that we should not: better give students the strong grounding in mathematics and especially logic that they will need to become good at programming and CS in general. I found the argument convincing: I teach first-semester introductory programming to 200 entering CS students every year, and since many have programming experience, of highly diverse nature but usually without much of a conceptual basis, I find myself unteaching a lot. In a simple world, high-school teachers would teach students to reason, and we would teach them to program. The world, however, is not simple. The arguments for introducing informatics earlier are piling up:

  • What about students who do not enter CS programs?
  • Many students will do some programming anyway. We might just as well teach them to do it properly, rather than let them develop bad practices and try to correct them at the university level.
  • Informatics is not just a technique but an original scientific discipline, with its own insights and paradigms (see [2] and, if I may include a self-citation, [3]). Its intellectual value is significant for all educated citizens, not just computer scientists.
  • Countries that want to be ahead in the race rather than just consumers of IT products need their population to understand the basic concepts, just as they want everyone, and not just future mathematicians, to master the basics of arithmetic, algebra and geometry.

These and other observations led Informatics Europe and ACM Europe, two years ago, to undertake the writing of a joint report, which has now appeared [1]. The report is concise and makes strong points, emphasizing in particular the need to distinguish education in informatics from a mere training in digital literacy (the mastery of basic IT tools, the Web etc.). The distinction is often lost on the general public and decision-makers (and we will surely have to emphasize it again and again).

The report proposes general principles for both kinds of programs, emphasizing in particular:

  • For digital literacy, the need to teach not just how-tos but also safe, ethical and effective use of IT resources and tools.
  • For informatics, the role of this discipline as a cross-specialty subject, like mathematics.

The last point is particularly important since we should make it clear that we are not just pushing (out of self-interest, as members of any discipline could) for schools to give our specialty a share, but that informatics is a key educational, scientific and economic resource for the citizens of any modern country.

The report is written from a European perspective, but the analysis and conclusions will, I think, be useful in any country.

It does not include any detailed curriculum recommendation, first because of the wide variety of educational contexts, but also because that next task is really work for another committee, which ACM Europe and Informatics Europe are in the process of setting up. The report also does not offer a magic solution to the key issue of bootstrapping the process — by finding teachers to make the courses possible, and courses to justify training the teachers — but points to successful experiences in various countries that show a way to break the deadlock.

The introduction of informatics as a full-fledged discipline in the K-12 curriculum is clearly where the winds of history are blowing. Just as the report was being finalized, the UK announced that computer science would become one of the scientific subjects that secondary-school students can choose for their exams, on a par with the traditional sciences. The French Academy of Sciences recently published its own report on the topic, and many other countries have similar recommendations in progress. The ACM/IE report is a major milestone which should provide a common basis for all these ongoing efforts.

References

[1] Informatics Education: Europe Cannot Afford to Miss the Boat, Report of the joint Informatics Europe & ACM Europe Working Group on Informatics Education, April 2013, available here.

[2] Jeannette Wing: Computational Thinking, in Communications of the ACM, vol. 49, no. 3, March 2006, pages 33-35, available here.

[3] Bertrand Meyer: Software Engineering in the Academy, in Computer (IEEE), vol. 34, no. 5, May 2001, pages 28-35, available here.


The Modes and Uses of Scientific Publication

(This article was initially published in the CACM blog.)

Publication is about helping the advancement of humankind. Of course.

Let us take this basis for granted and look at the other, possibly less glamorous aspects.

Publication has four modes: Publicity; Exam; Business; and Ritual.

1. Publication as Publicity

The first goal of publication is to tell the world that you have discovered something: “See how smart I am!” (and how much smarter than all the others out there!). In a world devoid of material constraints for science, or where the material constraints are handled separately, as in 19th-century German universities where professors were expected to fund their own labs, this would be the only mode and use of publication. Science today is a more complex edifice.

A good sign that Publication as Publicity is only one of the modes is that with today’s technology we could easily skip all the others. If all we cared about were to make our ideas and results known, we would simply post our papers on arXiv or on our own Web pages. But almost no one stops there; researchers submit to conferences and journals, demonstrating how crucial the other three modes are to the modern culture of science.

2. Publication as Exam

Academic careers depend on a publication record. Actually this is not supposed to be the case; search and tenure committees are officially interested in “impact,” but any candidate is scared of showing a short publication list where competitors have tens or (commonly) hundreds of items.

We do not just publish; we want to be chosen for publication. Authors are proud of the low acceptance rates of conferences at which their papers have been accepted; in the past few years it has in fact become common practice, in publication lists attached to CVs, to list this percentage next to each accepted article. Acceptance rates are carefully tracked; see for example [2] for software engineering.

As Jeff Naughton has pointed out [1], this mode of working amounts to giving researchers the status of students forced to take exams again and again. Maybe that part is inevitable; the need to justify ourselves anew every morning may be an integral part of being a scientist, especially one funded by other people’s money. Two other consequences of this phenomenon are, I believe, more damaging.

The first risk directly affects the primary purpose of publication (remember the advancement of humankind?): a time-limited review process with low acceptance rates implies that some good papers get rejected and some flawed ones accepted. Everyone in software engineering knows (and recent PC chairs have admitted) that getting a paper accepted at the International Conference on Software Engineering is in part a lottery; with an acceptance rate hovering around 13%, this is inevitable. The mistakes occur both ways: papers accepted or even getting awards, then shown a few months later to be inaccurate; and innovative papers getting rejected because some sentence rubbed the referees the wrong way, or some paper was not cited. With a 4-month review cycle, and the next deadline coming several months later, the publication of a truly important result can be delayed significantly.

The second visible damage is publication inflation. Today’s research environment channels productive research teams towards an LPU (Least Publishable Unit) publication practice, causing an explosion of small contributions and the continuous decrease of the ratio of readers to writers. When submitting a paper I have always had, as my personal goal, to be read; but looking at the overall situation of computer science publication today suggests that this is not the dominant view: the overwhelming goal of publication is publication.

3. Publication as Business

Publishing requires an infrastructure, and money plays a role. Conferences in particular are a business. They have a budget to balance, not always an easy task, although a truly successful conference can be a big money-maker for its sponsor, commercial or non-profit. The financial side of conference publication has its consequences on authors: if you do not pay your fees, not only will you be unable to participate, but your paper will not be published.

One can deplore these practices, in particular their effect on authors from less well-endowed institutions, but they result from today’s computer science publication culture with its focus on the conference, what Lance Fortnow has called “A Journal in a Hotel”.

Sometimes the consequences border on the absurd. The ASE conference (Automated Software Engineering) accepts some contributions as “short papers”. Fair enough. At ASE 2009, “short paper” did not mean a shorter conference presentation but the permission to put up a poster, stand next to it for a while and answer passersby’s questions. For that privilege — and the real one: a publication in the conference volume — one had to register for the conference. ASE 2009 was in New Zealand, the other end of the world for a majority of authors. I yielded to the injunction: who was I to tell the PhD student whose work was the core of the submission, and who was so happy to have a paper accepted at a well-ranked conference, that he was not going to be published after all? But such practices are dubious. It would be more transparent to set up an explicit pay-for-play system, with page charges: at least the money would go to a scientific society or a university. Instead we ended up funding (in addition to the conference, which from what I heard was an excellent experience) airlines and hotels.

What makes such an example remarkable is that a reasonable justification exists for every one of its components: a highly selective refereeing process to maintain the value of the publication venue; limiting the number of papers selected for full presentation, to avoid a conference with multiple parallel tracks (and the all too frequent phenomenon of conference sessions whose audience consists of the three presenters plus the session chair); making sure that authors of published papers actually attend the event, so that it is a real conference with personal encounters, not just an opportunity to increment one’s publication count. The concrete result, however, is that authors of short papers have the impression of being ransomed without getting the opportunity to present their work in a serious way. Literally seconds before I was going to hit the “publish” button for the present article, an author of an accepted short paper for ASE 2012 (where the process appears similar) sent an email to complain, triggering a new discussion. We clearly need to find better solutions to resolve the conflicting criteria.

4. Publication as Ritual

Many of the seminal papers in science, including some of the most influential in computer science, defy classification and used a distinctive, one-of-a-kind style. Would they stand a chance in one of today’s highly ranked conferences, such as ICSE in software or VLDB in databases? It’s hard to guess. Each community has developed its own standard look-and-feel, so that after a while all papers start looking the same. They are like a classical mass with its Te Deum, Agnus Dei and Kyrie Eleison. (The “Te Deum” part is, in a conference submission, spread throughout the paper, in the form of adoring citations of the program committee members’ own divinely inspired articles, good for their H-indexes if they bless your own offering.)

All empirical software engineering papers, for example, have the obligatory “Threats to Validity” section, which has developed into a true art form. The trick is the same as in the standard interview question “What can you say about your own deficiencies?”, to which every applicant knows the key: describe a personality trait so that you superficially appear self-critical but in reality continue boasting, as in “sometimes I take my work too much to heart” [3]. The “Threats to Validity” section follows the same pattern: you try to think of all possible referee objections, the better to refute them.

Another part of the ritual is the “related work” section, treacherous because you have to make sure not to omit anything that a PC member finds important; also, you must walk a fine line between criticizing existing research too much, which could offend someone, and not criticizing it enough, which enables the referee to say that you are not bringing anything significantly new. I often wonder who, besides the referees, reads those sections. But here too it is easier to lament than to fault the basic idea or propose better solutions. We do want to avoid wasting our time on papers whose authors are not aware of previous work. The related work section allows referees to perform this check. Its importance in the selection process has, however, grown out of proportion. It is one thing to make sure that a paper is state-of-the-art, but another to reject it (as often happens) because it fails to cite a particular contribution whose results would not directly affect its own. Here we move from the world of the rational to the world of the ritual. An extreme and funny recent example — funny to me, not necessarily to the coauthors — is a rejection from APSEC 2011, the Asia-Pacific Software Engineering Conference, based on one review (the others were positive) that stated: “How novel is this? Are [there] not any cloud-based IDEs out there that have [a] similar awareness model integrated into their CM? This is something the related work [section] fails to describe precisely.” [4] The ritual here becomes bizarre: as far as we know, no existing system offers a similar model; the reviewer does not know of any either; but he blasts the paper all the same for not citing work that he thinks must have been done by someone, somehow, somewhere. APSEC is a fine conference — it has to be, by the totally unbiased criterion that it accepted another one of our submissions this year! — and this particular paper may or may not have been ready for publication; judge it for yourself [5]. Such examples suggest, however, that the ritual of computer science publication has its limits.

Publicity, Exam, Business, Ritual: to which one of the four modes of publication are you most attuned? Oh, sorry, I forgot: in your case, it is solely for the advancement of humankind.

References and notes

[1] Jeffrey F. Naughton, DBMS Research: First 50 Years, Next 50 Years, slides of keynote at 26th IEEE International Conference on Data Engineering, 2010, available at lazowska.cs.washington.edu/naughtonicde.pdf .

[2] Tao Xie, Software Engineering Conferences, at people.engr.ncsu.edu/txie/seconferences.htm .

[3] I once saw on French TV a hilarious interview of an entrepreneur who had started a software company in Vietnam; job candidates there just did not know “the code” and, in response to such a question, would go on to tell the interviewer about being rude to their mothers and all the other horrible things they had done in their lives.

[4] The words in brackets were not in the review but I added them for clarity.

[5] Martin Nordio, H.-Christian Estler, Carlo A. Furia and Bertrand Meyer: Collaborative Software Development on the Web, available at arxiv.org/abs/1105.0768 .

(This article was first published on the CACM blog in September 2011.)


Long AND clear?

(Originally a Risks forum posting, 1998.)

Although complaints about Microsoft Word’s eagerness to correct what it sees as mistakes are not new in the Risks forum, I think it is still useful to protest vehemently the way recent versions of Word promote the dumbing down of English writing by flagging (at least when you use their default options) any sentence that, according to some mysterious criterion, it deems too long, even if the sentence is made of several comma- or semicolon-separated clauses, and even though it is perfectly obvious to anyone, fan of Proust or not, that clarity is not a direct function of length, since it is just as easy to write obscurely with short sentences as with longish ones and, conversely, quite possible to produce an absolutely limpid sentence that is very, very long.
