Archive for the ‘General technology’ Category.

Software for Robotics: 2016 LASER summer school, 10-18 September, Elba

The 2016 session of the LASER summer school, now in its 13th edition, has just been announced. The theme is new for the school, and timely: software for robotics. Below is the announcement.

School site: here

The 2016 LASER summer school will be devoted to Software for Robotics. It takes place from 10 to 18 September in the magnificent setting of the Hotel del Golfo in Procchio, Elba Island, Italy.

Robotics is progressing at an amazing pace, bringing improvements to almost areas of human activity. Today’s robotics systems rely ever more fundamentally on complex software, raising difficult issues. The LASER 2016 summer school both covers the current state of robotics software technology and open problems. The lecturers are top international experts with both theoretical contributions and major practical achievements in developing robotics systems.
The LASER school is intended for professionals from the industry (engineers and managers) as well as university researchers, including PhD students. Participants learn about the most important software technology advances from the pioneers in the field. The school’s focus is applied, although theory is welcome to establish solid foundations. The format of the school favors extensive interaction between participants and speakers.
The speakers include:

  • Joydeep Biswas, University of Massachussetts, on Development, debugging, and maintenance of deployed robots
  • Davide Brugali, University of Bergamo, on Managing software variability in robotic control systems
  • Nenad Medvidovic, University of Southern California, on Software Architectures of Robotics Systems
  • Bertrand Meyer, Politecnico di Milano and Innopolis University, with Jiwon Shin, on Concurrent Object-Oriented Robotics Software: Concepts, Framework and Applications
  • Issa Nesnas, NASA Jet Propulsion Laboratory, on Experiences from robotic software development for research and planetary flight robots
  • Richard Vaughan, Simon Fraser University

Organized by Politecnico di Milano, the school takes place at the magnificent Hotel del Golfo (http://www.hoteldelgolfo.it/) in Golfo di Procchio, Elba. Along with an intensive scientific program, participants will have time to enjoy the natural and cultural riches of this history-laden jewel of the Mediterranean.

For more information about the school, the speakers and registration see here.

.

— Bertrand Meyer

VN:F [1.9.10_1130]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

Design by Contract: ACM Webinar this Thursday

A third ACM webinar this year (after two on agile methods): I will be providing a general introduction to Design by Contract. The date is this coming Thursday, September 17, and the time is noon New York (18 Paris/Zurich, 17 London, 9 Los Angeles, see here for hours elsewhere). Please tune in! The event is free but requires registration here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

How to learn languages

Most people in technology, trade, research or education work in an international environment and need to use a foreign language which they learned at some earlier stage [1]. It is striking to see how awfully most of us perform. International conferences are a particular pain; many speakers are impossible to understand. You just want to go home and read the paper — or, often, not.

Teachers — English teachers in the case of the most commonly used international language — often get the blame, as in “They teach us Shakespeare’s English instead of what we need for today’s life“, but such complaints are  unfounded: look at any contemporary language textbook and you will see that it is all about some Svetlanas or Ulriches or Natsukos meeting or tweeting their friends Cathy and Bill.

It is true, though, that everyone teaches languages the wrong way.

There is only one way to teach languages right: start with the phonetics. Languages were spoken [2] millennia before they ever got written down. The basis of all natural languages is vocal. If you do not pronounce a language right you do not speak that language. It is unconscionable for example that most of us non-native speakers, when using English, still have an accent. We should have got rid of its last traces by age 12.

I cannot understand why people who are otherwise at the vanguard of intellectual achievement make a mess of their verbal expression, seemingly not even realizing there might be a problem. Some mistakes seem to be handed out from generation to generation. Most French speakers of English, for example, pronounce the “ow” of “allow” as in “low”, not “cow” (it took a long time before a compassionate colleague finally rid me of that particular mistake, and I don’t know how many more I may still be making); Italians seem to have a particular fondness for pronouncing as a “v” the “w” of “write”; an so on.

The only place I ever saw that taught languages right was a Soviet school for interpreters. Graduating students of French, having had no exposure to the language before their studies, spoke it like someone coming out of a Métro station. (Actually they spoke more like a grandmother coming out of the Métro, since they had little access to contemporary materials, but that would have been easy to fix.) The trick: they spent their entire first year doing phonetics, getting the “r”and the “u” and so on right, shedding the intonation of their native tongue. That year was entirely devoted to audio practice in a phonetics lab. At the end of it they did not know the meaning of what they were saying,but they said it perfectly. Then came a year of grammar, then a year of conversation. Then came the Métro result. (This is not an apology of the Soviet Union. Someone there just happened to get that particular thing right.)

We should teach everyone this way. There is no reason to tolerate phonetic deviations. If you do not get the sounds exactly as they should be, everything else will be flawed. Take, for example, the “r”. If, like me, you cannot roll your “r”s, then when you try to speak Russian or Italian, even if you think you can get the other sounds right you don’t because  your tongue or palate or teeth are in the wrong place. Another example is the “th” sound in English (two distinct sounds in fact) which I never got right. I can fake it but then something else comes out wrong and I still sound foreign. My high-school teachers — to whom I owe gratitude for so much else — should have tortured me until my “th”s were perfect. True, teaching time is a fixed-pie problem, but I am sure something else could have been sacrificed. Since, for example, I can answer in a blink that seven times nine is sixty-three, I must at some stage have taken the time to memorize it. In retrospect I would gladly sacrifice that element of knowledge, which I can reconstruct when needed, for the ability to roll my “r”s.

Age is indeed critical. While we humans can learn anything at any time, it is a well-known fact (although the reasons behind it remain mysterious) that until puberty we are malleable and can learn languages perfectly.  Witness bilingual and trilingual children; they do not have any accent. But around the time we develop new abilities and desires our brain shuts itself off to that particular skill; from then on we can only learn languages at great pain, with only remote hopes of reaching the proficiency of natives. The time to learn the phonetics of a foreign language, and learn it perfectly, is around the age of nine or ten at the latest. Then, at the age of reason, we should learn the structures — the grammar. Declensions in German, the use of tenses in English. Conversation — Svetlana greeting Cathy — can come later if there is time left. Once you have the basics wired into your head, the rest is trivial.

Focusing children on phonetics as the crucial part of learning a language will also help them shine. Like physical appearance, verbal clarity is an enormous advantage. I must not be the only one in conferences who pays far more attention to the content of an article if the speaker has impeccable pronunciation, innate or learned. Syntax and choice of words come next. Of course substance matters; we have all heard top scientists with accents thicker than a Humvee tire and grammar thinner than a summer dress.  Everyone else needs fluency.

Conceivably, someone might object that a year of phonetic drilling is not the most amusing pastime for a 10-year-old. Without even noting that it’s not worse than having to learn to play the violin — where did we ever get the idea that learning should be fun?

As to me, like those who before they die want to get into space, visit the capitals of all countries on earth or reach the top of Mount Everest, I have my dream; it has lesser impact on the environment and depends on me, not on the help of others: just once, I’d like to roll an “r” like a Polish plumber.

Notes

[1] Many native English speakers provide the exception to this observation, since they often do not learn any foreign language beyond “Buon giorno” and “melanzane alla parmigiana”, and hence will probably not see the point of this note.

[2] And motioned. Sign language as practiced by deaf people (informally before it was codified starting from the 17th century on) is also a potential teaching start.

VN:F [1.9.10_1130]
Rating: 9.8/10 (14 votes cast)
VN:F [1.9.10_1130]
Rating: +8 (from 8 votes)

When pictures lie

 

One of the most improvable characteristics of scientific papers is the graphical presentation of numerical data. It is sad to see that thirty years after Tufte published the first edition of his masterpiece [1] many authors are still including grossly inaccurate graphics. Sadder still when the authors are professional graphists, who should know better. Take this chart [2] from the last newsletter of Migros, Switzerland’s largest supermarket chain. To convince Swiss people that they should not worry about their food bills, it displays the ratio of food expenses to revenue in various countries. There would be many good ways to represent this information graphically, but someone thought it clever to draw variable-size coins of the respective currencies. According to the text, “the bigger the circle, the larger the income’s share devoted to food“.

Just a minor problem: the visual effect is utterly misleading. Taking three examples from the numbers given, the ratio is roughly 10% for Switzerland, 30% for Russia, 40% for Morocco. And, sure enough,  compared to the Swiss coin in the figure, the Russian coin is about three times bigger and the Moroccan coin four times… in diameter! What the eye sees, of course, is the area. Since the area varies as the square of the diameter, one gets the impression that Russians spend nine times, not three, and Moroccans sixteen times, not four, as much as the Swiss.

To convey the correct suggestion, the diameter of the Russian coin should have been about 73% larger than the Swiss coin’s diameter (the square root of three is about 1.73) , and the diameter of the Moroccan coin twice larger, that is to say half of what it is.

The impression is particularly misleading for countries where the ratio, unlike in Russia or Morocco, is close to Switzerland’s. Most interestingly, although no doubt by accident, for neighboring countries, where Swiss people are prone to go shopping in search of a bargain, a practice that possibly does not enthuse Migros. The extra percentage devoted to food (using this time  no longer rough approximations but precise values from the figures given in the Migros page) is 4% for Austria (10.9 ratio vs Switzerland’s 10.2), 8% for Germany (11.1), 30% for France (13.3) and  43% for Italy (14.6). But if you look at the picture the circles suggest much bigger differences; for example the Italian circle is obviously computed from the ratio of the squares, 14.62 / 10.62, showing an increase of 104%. In other words, Italians proportionally devote to food a little over two-fifths more  than the Swiss, but the graph suggests they spend twice as much.

On the premise that one should not ascribe to malevolence what can be explained by ignorance, I hope the Migros graphists will get a copy of Tufte’s book for their future endeavors.

Read Tufte too if you want the pictures in your papers to be not just attractive but accurate.

References

[1] Edward R. Tufte: The Visual Display of Quantitative Information, Graphics Press, second edition, 2001. See his site here.

[2] “How much do we spend to feed ourselves?” on the Migros site, available here for the French version (replace “fr” in the URL by “de” for German and “it” for Italian, I did not see an English version). Click on the figure for a readable version.

VN:F [1.9.10_1130]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Crossing the Is and doting on the Ts

 

Last week at the the CSEE&T conference in Klagenfurt (the conference page is here, I gave a keynote), a panel discussed how universities should prepare students for software engineering. Barry Boehm, one of the panelists, stated the following principle, which afterwards he said he had learned from Simon Ramo, co-founder of TRW. In hiring people, he stated, it is better to avoid candidates with an I-shaped profile: narrowly specialized in one topic that they have explored to exhaustion. Better look for a T: someone who has mastered an area in depth and then branched out to learn about many others.

I started playing with the variants. One should avoid the hyphens, or em-dashes, ““: people with a smattering of everything but no detailed knowledge of anything. Boehm said that this is the reason he always argued against establishing such undergraduate majors as systems engineering. A variant of the hyphen is the overline ““: graduates who supposedly are so smart that they can learn anything, but whose actual knowledge is limited to abstract notions.

Along with the T we should consider the “bottom” symbol of denotational semantics: . It corresponds to people who have a broad educational base, for example in mathematics, and have deepened it by focusing on a particular topic. The T and can be combined into an H turned on its side, H on the side: acquiring a solid foundation, specializing, then using that experience to become familiar with new areas.

Extending the permutation group, I am not sure what a “+” profile would be, but in a discussion last night Rustan Leino and Peter Müller suggested the “O”, ability to to circle around topics, and the umlaut, knowing a thing or two; in fact, exactly two.

 

VN:F [1.9.10_1130]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

Reuse of another kind

This is a plug for a family member, but for a good cause: reuse and the environment. There are many reasons for promoting reuse in software; in other fields some of those reasons apply too, plus many others, economic and environmental. The core concern is to reduce waste of all kinds.

One of the most appalling sources of waste is the growing use of throw-away food containers. It is also avoidable.

A new company, bizeebox [1], has devised an ingenious scheme to let restaurants use containers that can be washed and reused many times. It is easy to use by both restaurants and customers, saves money, and avoids heaping tons of waste on landfills. Their slogan: “Take the Waste Out of Takeout“.

The founders of bizeebox have run successful pilot projects and are now starting for real with restaurants in the Ann Arbor area. They have launched a campaign on Indiegogo, with a $30,000 funding goal. Reuse deserves a chance; so do they.

References

[1] bizeebox page, at bizeebox.com.

[2] bizeebox Indiegogo campaign: here.

VN:F [1.9.10_1130]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

The laws of branching (part 2): Tichy and Joy

Recently I mentioned the first law of branching (see earlier article) to Walter Tichy, famed creator of RCS, the system that established modern configuration management. He replied with the following anecdote, which is worth reproducing in its entirety (in his own words):

I started work on RCS in 1980, because I needed an alternative for SCCS, for which the license cost would have been prohibitive. Also, I wanted to experiment with reverse deltas. With reverse deltas, checking out the latest version is fast, because it is stored intact. For older ones, RCS applied backward deltas. So the older revisions took longer to extract, but that was OK, because most accesses are to the newest revision anyway.

At first, I didn’t know how to handle branches in this scheme. Storing each branch tip in full seemed like a waste. So I simply left out the branches.

It didn’t take long an people were using RCS. Bill Joy, who was at Berkeley at the time and working on Berkeley Unix, got interested. He gave me several hints about unpleasant features of SCCS that I should correct. For instance, SCCS didn’t handle identification keywords properly under certain circumstances, the locking scheme was awkward, and the commands too. I figured out a way to solve these issue. Bill was actually my toughest critic! When I was done with all the modifications, Bill cam back and said that he was not going to use RCS unless I put in branches. So I figured out a way. In order to reconstruct a branch tip, you start with the latest version on the main trunk, apply backwards deltas up to the branch point, and then apply forward deltas out to the branch tip. I also implemented a numbering scheme for branches that is extensible.

When discussing the solution, Bill asked me whether this scheme meant that it would take longer to check in and out on branches. I had to admit that this was true. With the machines at that time (VAXen) efficiency was important. He thought about this for a moment and then said that that was actually great. It would discourage programmers from using branches! He felt they were a necessary evil.

VN:F [1.9.10_1130]
Rating: 5.8/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: -2 (from 4 votes)

Another displaced business

Front-page notice in yesterday’s Tages Anzeiger (one of the principal Swiss newspapers):

Dear Readers:

From today the employment-ads section will no longer appear as a separate supplement, but directly as a section of the Tuesdays and Thursday paper. The reason is the ever smaller number of position offerings.

It seems clear that what has decreased is not the number of positions offered — the job market in Switzerland is healthy — but the number of those offered through newspapers. End of an era.

VN:F [1.9.10_1130]
Rating: 7.5/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: -1 (from 1 vote)

The laws of branching (part 1)

 

The first law of branching is: don’t. There is no other law.

The only sane way to develop software in a group, whether collocated or distributed, is to have a single branch (“trunk”) to which everyone commits changes, with constant running of the regression test suite to make sure that any breaking change is detected and corrected right away.

To allow branching, that is to say the emergence of separate lines of development with the expectation that they will be merged back later on, is to guarantee disaster. It is easy to diverge, but hard to converge; not only hard, but unpredictable. It can take days, weeks or more to reconcile changes and resolve conflicts, when the reason for the changes is no longer fresh in the developers’ memories, and the developers themselves may even no longer be there. Conflicts should be detected right away, and corrected immediately.

The EiffelStudio development never uses branches. A related development, EVE (Eiffel Verification Environment), maintained at ETH, includes all research tools, and is reconciled every Friday with the EiffelStudio trunk, so it is never allowed to diverge into a separate branch. Most other successful teams I know apply the first law of branching too. Solve conflicts before they have had the time to become conflicts.

VN:F [1.9.10_1130]
Rating: 5.3/10 (27 votes cast)
VN:F [1.9.10_1130]
Rating: -9 (from 19 votes)

Empirical answers to fundamental software engineering questions

This is a slightly reworked version of an article in the CACM blog, which also served as the introduction to a panel which I moderated at ESEC/FSE 2013 last week; the panelists were Harald Gall, Mark Harman, Giancarlo Succi (position paper only) and Tony Wasserman.

For all the books on software engineering, and the articles, and the conferences, a remarkable number of fundamental questions, so fundamental indeed that just about every software project runs into them, remain open. At best we have folksy rules, some possibly true, others doubtful, and others — such as “adding people to a late software project delays it further” [1] — wrong to the point of absurdity. Researchers in software engineering should, as their duty to the community of practicing software practitioners, try to help provide credible answers to such essential everyday questions.

The purpose of this panel discussion is to assess what answers are already known through empirical software engineering, and to define what should be done to get more.

Empirical software engineering” applies the quantitative methods of the natural sciences to the study of software phenomena. One of its tasks is to subject new methods — whose authors sometimes make extravagant and unsupported claims — to objective scrutiny. But the benefits are more general: empirical software engineering helps us understand software construction better.

There are two kinds of target for empirical software studies: products and processes. Product studies assess actual software artifacts, as found in code repositories, bug databases and documentation, to infer general insights. Project studies assess how software projects proceed and how their participants work; as a consequence, they can share some properties with studies in other fields that involve human behavior, such as sociology and psychology. (It is a common attitude among computer scientists to express doubts: “Do you really want to bring us down to the standards of psychology and sociology?” Such arrogance is not justified. These sciences have obtained many results that are both useful and sound.)

Empirical software engineering has been on a roll for the past decade, thanks to the availability of large repositories, mostly from open-source projects, which hold information about long-running software projects and can be subjected to data mining techniques to identify important properties and trends. Such studies have already yielded considerable and often surprising insights about such fundamental matters as the typology of program faults (bugs), the effectiveness of tests and the value of certain programming language features.

Most of the uncontested successes, however, have been from the product variant of empirical software engineering. This situation is understandable: when analyzing a software repository, an empirical study is dealing with a tangible and well-defined artifact; if any of the results seems doubtful, it is possible and sometimes even easy for others to reproduce the study, a key condition of empirical science. With processes, the object of study is more elusive. If I follow a software project working with Scrum and another using a more traditional lifecycle, and find that one does better than the other, how do I know what other factors may have influenced the outcome? And even if I bring external factors under control how do I compare my results with those of another researcher following other teams in other companies? Worse, in a more realistic scenario I do not always have the luxury of tracking actual industry projects since few companies are enlightened enough to let researchers into their developments; how do I know that I can generalize to industry the conclusions of experiments made with student groups?

Such obstacles do not imply that sound results are impossible; studies involving human behavior in psychology and sociology face many of the same difficulties and yet do occasionally yield insights. But these obstacles explain why there are still few incontrovertible results on process aspects of software engineering. This situation is regrettable since it means that projects large and small embark on specific methods, tools and languages on the basis of hearsay, opinions and sometimes hype rather than solid knowledge.

No empirical study is going to give us all-encompassing results of the form “Agile methods yield better products” or “Object-oriented programming is better than functional programming”. We are entitled to expect, however, that they help practitioners assess some of the issues that await every project. They should also provide a perspective on the conventional wisdom, justified or not, that pervades the culture of software engineering. Here are some examples of general statements and questions on which many people in the field have opinions, often reinforced by the literature, but crying for empirical backing:

  • The effect of requirements faults: the famous curve by Boehm is buttressed by old studies on special kinds of software (large mission-critical defense projects). What do we really lose by not finding an error early enough?
  • The cone of uncertainty: is that idea just folklore?
  • What are the successful techniques for shortening delivery time by adding manpower?
  • The maximum compressibility factor: is there a nominal project delivery time, and how much can a project decrease it by throwing in money and people?
  • Pair programming: when does it help, when does it hurt? If it has any benefits, are there in quality or in productivity (delivery time)?
  • In iterative approaches, what is the ideal time for a sprint under various circumstances?
  • How much requirements analysis should be done at the beginning of a project, and how much deferred to the rest of the cycle?
  • What predictors of size correlate best with observed development effort?
  • What predictors of quality correlate best with observed quality?
  • What is the maximum team size, if any, beyond which a team should be split?
  • Is it better to use built-in contracts or just to code assertions in tests?

When asking these and other similar questions relating to core aspects of practical software development, I sometimes hear “Oh, but we know the answer conclusively, thanks to so-and-so’s study“. This may be true in some cases, but in many others one finds, in looking closer, that the study is just one particular experiment, fraught with the same limitations as any other.

The principal aim of the present panel is to find out, through the contributions of the panelists which questions have useful and credible empirical answers already available, whether or not widely known. The answers must indeed be:

  • Empirical: obtained through objective quantitative studies of projects.
  • Useful: providing answers to questions of interest to practitioners.
  • Credible: while not necessarily absolute (a goal difficult to reach in any matter involving human behavior), they must be backed by enough solid evidence and confirmation to be taken as a serious input to software project decisions.

An auxiliary outcome of the panel should be to identify fundamental questions on which credible, useful empirical answers do not exist but seem possible, providing fuel for researchers in the field.

To mature, software engineering must shed the folkloric advice and anecdotal evidence that still pervade the field and replace them with convincing results, established with all the limitations but also the benefits of quantitative, scientific empirical methods.

Note

[1] From Brooks’s Mythical Man-Month.

VN:F [1.9.10_1130]
Rating: 8.7/10 (10 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 5 votes)

Informatics: catch them early

 

Recycled[I occasionally post on the Communications of the ACM blog. It seems that there is little overlap between readers of that blog and the present one, so — much as I know, from software engineering, about the drawbacks of duplication — I will continue to repost articles here when relevant.]

Some call it computer science, others informatics, but they face the same question: when do we start teaching the subject? In many countries where high schools began to introduce it in the seventies, they actually retreated since then; sure, students are shown how to use word processors and spreadsheets, but that’s not the point.

Should we teach computer science in secondary (and primary) school? In a debate at SIGCSE a few years ago, Bruce Weide said in strong words that we should not: better give students the strong grounding in mathematics and especially logic that they will need to become good at programming and CS in general. I found the argument convincing: I teach first-semester introductory programming to 200 entering CS students every year, and since many have programming experience, of highly diverse nature but usually without much of a conceptual basis, I find myself unteaching a lot. In a simple world, high-school teachers would teach students to reason, and we would teach them to program. The world, however, is not simple. The arguments for introducing informatics earlier are piling up:

  • What about students who do not enter CS programs?
  • Many students will do some programming anyway. We might just as well teach them to do it properly, rather than let them develop bad practices and try to correct them at the university level.
  • Informatics is not just a technique but an original scientific discipline, with its own insights and paradigms (see [2] and, if I may include a self-citation, [3]). Its intellectual value is significant for all educated citizens, not just computer scientists.
  • Countries that want to be ahead of the race rather than just consumers of IT products need their population to understand the basic concepts, just as they want everyone, and not just future mathematicians, to master the basics of arithmetics, algebra and geometry.

These and other observations led Informatics Europe and ACM Europe, two years ago, to undertake the writing of a joint report, which has now appeared [1]. The report is concise and makes strong points, emphasizing in particular the need to distinguish education in informatics from a mere training in digital literacy (the mastery of basic IT tools, the Web etc.). The distinction is often lost on the general public and decision-makers (and we will surely have to emphasize it again and again).

The report proposes general principles for both kinds of programs, emphasizing in particular:

  • For digital literacy, the need to teach not just how-tos but also safe, ethical and effective use of IT resources and tools.
  • For informatics, the role of this discipline as a cross-specialty subject, like mathematics.

The last point is particularly important since we should make it clear that we are not just pushing (out of self-interest, as members of any discipline could) for schools to give our specialty a share, but that informatics is a key educational, scientific and economic resource for the citizens of any modern country.

The report is written from a European perspective, but the analysis and conclusions will, I think, be useful in any country.

It does not include any detailed curriculum recommendation, first because of the wide variety of educational contexts, but also because that next task is really work for another committee, which ACM Europe and Informatics Europe are in the process of setting up. The report also does not offer a magic solution to the key issue of bootstrapping the process — by finding teachers to make the courses possible, and courses to justify training the teachers — but points to successful experiences in various countries that show a way to break the deadlock.

The introduction of informatics as a full-fledged discipline in the K-12 curriculum is clearly where the winds of history are blowing. Just as the report was being finalized, the UK announced that it was making CS one of the choices of required scientific topics would become a topic in the secondary school exam on a par with traditional sciences. The French Academy of Sciences recently published its own report on the topic, and many other countries have similar recommendations in progress. The ACM/IE report is a major milestone which should provide a common basis for all these ongoing efforts.

References

[1] Informatics education: Europe Cannot Afford to Miss the Boat,  Report of the joint Informatics Europe & ACM Europe Working Group on Informatics Education,  April 2013,  available here.

[2] Jeannette Wing: Computational Thinking, in Communications of the ACM, vol. 49, no. 3, March 2006, pages 33-35, available here.

[3] Bertrand Meyer: Software Engineering in the Academy, in Computer (IEEE), vol. 34, no. 5, May 2001, pages 28-35, available here.

VN:F [1.9.10_1130]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)

Apocalypse no! (Part 2)

 

Recycled(Revised from an article originally published in the CACM blog. Part 2 of a two-part article.)

Part 1 of this article (to be found here, please read it first) made fun of authors who claim that software engineering is a total failure — and, like everyone else, benefit from powerful software at every step of their daily lives.

Catastrophism in talking about software has a long history. Software engineering started around 1966 [1] with the recognition of a “software crisis“. For many years it was customary to start any article on any software topic by a lament about the terrible situation of the field, leaving in your reader’s mind the implicit suggestion that the solution to the “crisis” lay in your article’s little language or tool.

After the field had matured, this lugubrious style went out of fashion. In fact, it is hard to sustain: in a world where every device we use, every move we make and every service we receive is powered by software, it seems a bit silly to claim that in software development everyone is wrong and everything is broken.

The apocalyptic mode has, however, made a comeback of late in the agile literature, which is fond in particular of citing the so-called “Chaos” reports. These reports, emanating from Standish, a consulting firm, purport to show that a large percentage of projects either do not produce anything or do not meet their objectives. It was fashionable to cite Standish (I even included a citation in a 2003 article), until the methodology and the results were thoroughly debunked starting in 2006 [2, 3, 4]. The Chaos findings are not replicated by other studies, and the data is not available to the public. Capers Jones, for one, publishes his sources and has much more credible results.

Yet the Chaos results continue to be reverently cited as justification for agile processes, including, at length, in the most recent book by the creators of Scrum [5].

Not long ago, I raised the issue with a well-known software engineering author who was using the Standish findings in a talk. Wasn’t he aware of the shakiness of these results? His answer was that we don’t have anything better. It did not sound like the kind of justification we should use in a mature discipline. Either the results are sound, or we should not rely on them.

Software engineering is hard enough and faces enough obstacles, so obvious to everyone in the industry and to every user of software products, that we do not need to conjure up imaginary scandals and paint a picture of general desolation and universal (except for us, that is) incompetence. Take Schwaber and Sutherland, in their introductory chapter:

You have been ill served by the software industry for 40 years—not purposely, but inextricably. We want to restore the partnership.

No less!

Pretending that the whole field is a disaster and no one else as a clue may be a good way to attract attention (for a while), but it is infantile as well as dishonest. Such gross exaggerations discredit their authors, and beyond them, the ideas they promote, good ones included.

As software engineers, we can in fact feel some pride when we look at the world around us and see how much our profession has contributed to it. Not just the profession as a whole, but, crucially, software engineering research: advances in programming methodology, software architecture, programming languages, specification techniques, software tools, user interfaces, formal modeling of software concepts, process management, empirical analysis and human aspects of computing have, step after step, belied the dismal picture that may have been painfully accurate in the early days.

Yes, challenges and unsolved problems face us at every corner of software engineering. Yes, we are still at the beginning, and on many topics we do not even know how to proceed. Yes, there are lots of things to criticize in current practices (and I am not the least vocal of the critics, including in this blog). But we need a sense of measure. Software engineering has made tremendous progress over the last five decades; neither the magnitude of the remaining problems nor the urge to sell one’s products and services justifies slandering the rest of the discipline.

Notes and references

[1] Not in 1968 with the NATO conference, as everyone seems to believe. See an earlier article in this blog.

[2] Robert L. Glass: The Standish report: does it really describe a software crisis?, in Communications of the ACM, vol. 49, no. 8, pages 15-16, August 2006; see here.

[3] J. Laurens Eveleens and Chris Verhoef: The Rise and Fall of the Chaos Report Figures, in IEEE Software, vol. 27, no. 1, Jan-Feb 2010, pages 30-36; see here.

[4] S. Aidane, The “Chaos Report” Myth Busters, 26 March 2010, see here.

[5] Ken Schwaber and Jeff Sutherland: Software in 30 Days: How Agile Managers Beat the Odds, Delight Their Customers, And Leave Competitors In the Dust, Wiley, 2012.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

LASER summer school: Software for the Cloud and Big Data

The 2013 LASER summer school, organized by our chair at ETH, will take place September 8-14, once more in the idyllic setting of the Hotel del Golfo in Procchio, on the island of Elba in Italy. This is already the 10th conference; the roster of speakers so far reads like a who’s who of software engineering.

The theme this year is Software for the Cloud and Big Data and the speakers are Roger Barga from Microsoft, Karin Breitman from EMC,  Sebastian Burckhardt  from Microsoft,  Adrian Cockcroft from Netflix,  Carlo Ghezzi from Politecnico di Milano,  Anthony Joseph from Berkeley,  Pere Mato Vila from CERN and I.

LASER always has a strong practical bent, but this year it is particularly pronounced as you can see from the list of speakers and their affiliations. The topic is particularly timely: exploring the software aspects of game-changing developments currently redefining the IT scene.

The LASER formula is by now well-tuned: lectures over seven days (Sunday to Saturday), about five hours in the morning and three in the early evening, by world-class speakers; free time in the afternoon to enjoy the magnificent surroundings; 5-star accommodation and food in the best hotel of Elba, made affordable as we come towards the end of the season (and are valued long-term customers). The group picture below is from last year’s school.

Participants are from both industry and academia and have ample opportunities for interaction with the speakers, who typically attend each others’ lectures and engage in in-depth discussions. There is also time for some participant presentations; a free afternoon to discover Elba and brush up on your Napoleonic knowledge; and a boat trip on the final day.

Information about the 2013 school can be found here.

LASER 2012, Procchio, Hotel del Golvo

VN:F [1.9.10_1130]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

Doing it right or doing it over?

(Adapted from an article in the Communications of the ACM blog.)

I have become interested in agile methods because they are all the rage now in industry and, upon dispassionate examination, they appear to be a pretty amazing mix of good and bad ideas. I am finishing a book that tries to sort out the nuggets from the gravel [1].

An interesting example is the emphasis on developing a system by successive increments covering expanding slices of user functionality. This urge to deliver something that can actually be shown — “Are we shipping yet?” — is excellent. It is effective in focusing the work of a team, especially once the foundations of the software have been laid. But does it have to be the only way of working? Does it have to exclude the time-honored engineering practice of building the infrastructure first? After all, when a building gets constructed, it takes many months before any  “user functionality” becomes visible.

In a typical exhortation [2], the Poppendiecks argue that:

The right the first time approach may work for well-structured problems, but the try-it, test-it, fix-it approach is usually the better approach for ill-structured problems.

Very strange. It is precisely ill-structured problems that require deeper analysis before you jump in into wrong architectural decisions which may require complete rework later on. Doing prototypes to try possible solutions can be a great way to evaluate potential solutions, but a prototype is an experiment, something quite different from an increment (an early version of the future system).

One of the problems with the agile literature is that its enthusiastic admonitions to renounce standard software engineering practices are largely based on triumphant anecdotes of successful projects. I am willing to believe all these anecdotes, but they are only anecdotes. In the present case systematic empirical evidence does not seem to support the agile view. Boehm and Turner [3] write:

Experience to date indicates that low-cost refactoring cannot be depended upon as projects scale up.

and

The only sources of empirical data we have encountered come from less-expert early adopters who found that even for small applications the percentage of refactoring and defect-correction effort increases with [the size of requirements].

They do not cite references here, and I am not aware of any empirical study that definitely answers the question. But their argument certainly fits my experience. In software as in engineering of any kind, experimenting with various solutions is good, but it is critical to engage in the appropriate Big Upfront Thinking to avoid starting out with the wrong decisions. Some of the worst project catastrophes I have seen were those in which the customer or manager was demanding to see something that worked right away — “it doesn’t matter if it’s not the whole thing, just demonstrate a piece of it! — and criticized the developers who worked on infrastructure that did not produce immediately visible results (in other words, were doing their job of responsible software professionals). The inevitable result: feel-good demos throughout the project, reassured customer, and nothing to deliver at the end because the difficult problems have been left to rot. System shelved and never to be heard of again.

When the basis has been devised right, perhaps with nothing much to show for months, then it becomes critical to insist on regular visible releases. Doing it prematurely is just sloppy engineering.

The problem here is extremism. Software engineering is a difficult balance between conflicting criteria. The agile literature’s criticism of teams that spend all their time on design or on foundations and never deliver any usable functionality is unfortunately justified. It does not mean that we have to fall into the other extreme and discard upfront thinking.

In the agile tradition of argument by anecdote, here is an extract from James Surowiecki’s  “Financial Page” article in last month’s New Yorker. It’s not about software but about the current Boeing 787 “Dreamliner” debacle:

Determined to get the Dreamliners to customers quickly, Boeing built many of them while still waiting for the Federal Aviation Administration to certify the plane to fly; then it had to go back and retrofit the planes in line with the FAA’s requirements. “If the saying is check twice and build once, this was more like build twice and check once”, [an industry analyst] said to me. “With all the time and cost pressures, it was an alchemist’s recipe for trouble.”

(Actually, the result is “build twice and check twice”, or more, since every time you rebuild you must also recheck.) Does that ring a bell?

Erich Kästner’s ditty about reaching America, cited in a previous article [5], is once again the proper commentary here.

References

[1] Bertrand Meyer: Agile! The Good, the Hype and the Ugly, Springer, 2013, to appear.

[2] Mary and Tom Poppendieck: Lean Software Development — An Agile Toolkit, Addison-Wesley, 2003.

[3] Barry W. Boehm and Richard Turner: Balancing Agility with Discipline — A Guide for the Perplexed, Addison-Wesley, 2004. (Second citation slightly abridged.)

[4] James Surowiecki, in the New Yorker, 4 February 2013, available here.

[5] Hitting on America, article from this blog, 5 December 2012, available here.

VN:F [1.9.10_1130]
Rating: 8.9/10 (9 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

Your IP: does Google care?

 

A search for my name on Google Scholar [1] shows, at the top of the resulting list, my book Object-Oriented Software Construction [2], with over 7800 citations in the scientific literature. Very nice (thanks, and keep those citations coming!).

That top result is a link to a pirated version [3] of the full content — 1350 pages or so — at an organization in Indonesia, “Institut Teknologi Telkom”, whose logo bears the slogan “Center of Excellence in ICT”. The text has been made available, along with the entire contents of several other software engineering textbooks, in a directory helpfully called “ebooks”, apparently by a user with the initials “kms”. I think I know his full name but attempts at emailing him failed. I wrote a couple of times to the site’s webmaster, who does not respond.

Needless to say, the work is copyrighted and that online copy is not authorized. (I realize that to some people the very idea of protecting intellectual property is anathema, but I, not they, wrote the book, and for the time being it is not public property.)

At least Google could avoid directing people to a pirated text as the first answer to a query about my publications. I was able to to bring the issue to the attention of someone at Google; that result is already something of a miracle, as anyone who tries to interact with a human being regarding a Google-related problem can testify. The history of that interaction, which was initially about something else, might serve as the subject for another article. The person refused to do anything and pointed me to an online tool [4] for removing search results.

Navigating the tool proved to be an obstacle course, starting with the absence of Google Scholar among the Google products listed (I inquired and was told to use “Web Search”). Interestingly, to use this service, you have to be logged in as a Gmail user; I do have a gmail account, but I know several people, including a famous computer scientist, who refuse to open one out of fear for their privacy. Think of the plight of someone who has a complaint against Google results affecting his privacy, and to lodge that complaint must first register as a Google user! I did not have that problem but had to navigate the obstacle course. (It includes one of those “Captchas” that are so good at preventing automatic tools from deciphering the words that humans can’t read them either — I have pretty good eyesight and still I had to try five times. Fodder for yet another article.) But I succeeded, sent my request, and got an automatic acknowledgement. Then…

Then nothing. No answer. The search results remain the same. No one seems to care.

Here is a little thought experiment. Imagine you violated Google’s IP, for example by posting some Google proprietary code on your Web page. Now I have a hunch that they would respond faster. Much faster. This is all pure speculation of course, and I am not advising anyone to try the experiment for real. Pure speculation.

In the meantime, maybe I can at least use the opportunity for some self-promotion. The book is actually pretty good, I think. You can buy it at Amazon [5] for $97.40, a bit less for a used copy. But why pay? Google invites you to read it for free. Just follow the link they obligingly provide at [1].

References

[1] Result of a search for author:”b meyer” on Google Scholar: see here.

[2] Bertrand Meyer: Object-Oriented Software Construction, 2nd edition, Prentice Hall, 1997. See the book’s page at Eiffel Software here and the Wikipedia entry here. Note that either would be appropriate for Google Scholar to identify the book.

[3] Bootlegged version of [1] here.

[4] Google: “Removing content from Google”, page available here.

[5] Amazon book page for [1]: here.

VN:F [1.9.10_1130]
Rating: 8.2/10 (23 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 8 votes)

Interesting functionalities

Admittedly the number of people who will find the following funny, or of any interest at all, is pretty limited, so it is a good bet that about 99.86% of the faithful readers of this blog can safely wait for the next post. To appreciate the present one you must be a French-speaking nerd with an interest in both Pushkin’s poetry and Bayesian language translation. Worse, you must have a sense of the comic that somehow matches mine. (Then let’s make that 99.98%.)

Anyway, if you are still with me: there is a famous Pushkin love poem entitled Я помню чудное мгновенье…: “I remember the magic moment…” (the moment when he first saw her). I have no idea why I fed it into Google Translate; sheer idleness I suppose. In the line

И снились милые черты

the poet states that the delicate features of his beloved came to him in his sleep. Literally: “And [your] gentle [face’s] features came to my dreams”.

Google renders this into French as “Et rêvé de fonctionnalités intéressantes”: And dreamed of interesting functionalities. (The direct translation to English is less fun.)

Kudos to Google for capturing the subtle mood of 21st-century  passion. When the true nerd — I know what I am talking about — feels romantic, falls asleep, and starts dreaming, we now know what he dreams of: interesting functionalities.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

Who gets what

 

The following rules are not all that’s needed to understand life, but they go a long way.

1. It’s the tenor who gets the girl. (The baritone never stood a chance1.)

2. It’s the physicists who get the money.

 

Note

1Actually I can think of a couple of exceptions (both Russian, I wonder what that means): Ruslan and Ludmilla (bass), Prince Igor (he is old though).

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Hitting on America

 

The study of agile methods is good for your skeptical bones.

“Build the simplest thing that works, then refactor if needed.”

Maybe. Maybe. But what about getting it right the first time around?

Erich Kästner wrote an apposite ditty on this topic [1]:

They tell you it’s OK if first you fail;
OK perhaps — but not so practical.
Not all who for India set sail
Hit on America.

Note

[1] My translation. The original reads:

Irrtümer haben ihren Wert;
Jedoch noch hie und da.
Nicht jeder, der nach Indien fährt,
Endeckt Amerika.

VN:F [1.9.10_1130]
Rating: 6.1/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 7 votes)

Why so many features?

 

It is a frequent complaint that production software contains too many features: “I use only  maybe 5% of Microsoft Word!” , with the implication that the other 95% are useless, and apparently without the consideration that maybe someone else needs them; how do you know that what is good enough for you is good enough for everyone?

The agile literature frequently makes this complaint against “software bloat“, and has turned it into a principle: build minimal software.

Is software really bloated? Rather than trying to answer this question it is useful to analyze where features come from. In my experience there are three sources: internal ideas; suggestions from the field; needs of key customers.

1. Internal ideas

A software system is always devised by a person or group, who have their own views of what it should offer. Many of the more interesting features come from these inventors and developers, not from the market. A competent group does not wait for users or prospects to propose features, but comes up with its own suggestions all the time.

This is usually the source of the most innovative ideas. Major breakthroughs do not arise from collecting customer wishes but from imagining a new product that starts from a new basis and proposing it to the market without waiting for the market to request it.

2. Suggestions from the field

Customers’ and prospects’ wishes do have a crucial role, especially for improvements to an existing product. A good marketing department will serve as the relay between the field’s wishes and the development team. Many such suggestions are of the “Check that box!” kind: customers and particularly prospects look at the competition and want to make sure that your product does everything that the others do. These suggestions push towards me-too features; they are necessary to keep up with the times, but must be balanced with suggestions from the other two sources, since if they were the only inspiration they would lead to a product that has the same functionality as everyone else’s, only delivered a few months later, not the best recipe for success.

3. Key customers

Every company has its key customers, those who give you so much business that you have to listen to them very carefully. If it’s Boeing calling, you pay more attention than to an unknown individual who has just acquired a copy. I suspect that many of the supposedly strange features, of products the ones that trigger “why would anyone ever need this?” reactions, simply come from a large customer who, at some point in the product’s history, asked for a really, truly, absolutely indispensable facility. And who are we — this includes Microsoft and Adobe and just about everyone else — to say that it is not required or not important?

It is easy to complain about software bloat, and examples of needlessly complex system abound. But your bloat may be my lifeline, and what I dismiss as superfluous may for you be essential. To paraphrase a comment by Ichbiah, the designer of Ada, small systems solve small problems. Outside of academic prototypes it is inevitable that  a successful software system will grow in complexity if it is to address the variety of users’ needs and circumstances. What matters is not size but consistency: maintaining a well-defined architecture that can sustain that growth without imperiling the system’s fundamental solidity and elegance.

VN:F [1.9.10_1130]
Rating: 8.5/10 (11 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Salad requirements, requirements salad

 

You know what salad is.

Salad is made of green leaves. Actually no, there are lots of other colors, lots of other kinds; and many, such as rice salad, pasta salad, potato salad, include no leaves at all.

In any case, salad is made of vegetables. Actually no: fruit salad.

I meant vegetal, as in non-animal. Actually no: salads often contain cheese, meat, fish, seafood.

In any case, salad is a cold dish. Actually no: did you never try a warm goat cheese salad?

Salad has dressing. Actually no: I know quite a few people who shun dressings.

Salads are consumed at the beginning of a meal. Actually no: in France, the normal place of a salad is after the main course.

At least they are only part of a meal. Actually no: have you not heard of the dinner salad?

Salads have something to do with salt. Actually no: although you are right etymologically, as the word comes through the French salade from the Latin saleta, salty, in our blood-pressure-conscious world the cook often does not put any salt.

Salads are only consumed at lunch. At dinner too. And maybe… I take that back.

I know a salad when I see one. Or maybe when I taste one. Although I have never tried blindfolded.

Then explain to us what it is.

Well, if it says “salad” on the menu it must be a salad.

Can you do better?

I will have to come back to you on that one.

If it is so hard to come up with a convincing definition for such a banal notion (and it is real fun to look at good dictionaries and see the contortions they go through in trying to make some sense of it), no wonder software requirements specifications (SRS) are so hard. One of the obligatory steps in a requirements process —  “agile” or not — is to build up a glossary for the project [1]: a set of definitions for the terms of the trade, those words from the problem domain that the stakeholders throw in assuredly all the time in discussions, with the assumption that everyone else understands, except that when you try to understand too you realize there is no clear definition and even, in some cases, different people understand them in different ways.

If definitions are so hard, are requirements then impossible? The trick is that we often do not need a dictionary-style definition of what things are; we only need to know what they have, in other words what are their properties and operations. This is the abstract data type approach, also known as object technology. But it is still hard to convince the stakeholders to explain what they mean.

The German language has one more use of salads: the affectionate term to describe the jumble of wires that mars the back of your desk (I am guessing) and also the front of mine (in this case I know) is Kabelsalat, cable salad [2]. More than a few SRS are like that too: requirements salads.

References

[1] IEEE: Standard 830-1998, Recommended Practice for Software Requirements Specifications, available (for a fee) here.

[2] German Wikipedia: Kabelsalat entry, available here.

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (10 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 9 votes)

TOOLS 2012, “The Triumph of Objects”, Prague in May: Call for Workshops

Workshop proposals are invited for TOOLS 2012, The Triumph of Objectstools.ethz.ch, to be held in Prague May 28 to June 1. TOOLS is a federated set of conferences:

  • TOOLS EUROPE 2012: 50th International Conference on Objects, Models, Components, Patterns.
  • ICMT 2012: 5th International Conference on Model Transformation.
  • Software Composition 2012: 10th International Conference.
  • TAP 2012: 6th International Conference on Tests And Proofs.
  • MSEPT 2012: International Conference on Multicore Software Engineering, Performance, and Tools.

Workshops, which are normally one- or two-day long, provide organizers and participants with an opportunity to exchange opinions, advance ideas, and discuss preliminary results on current topics. The focus can be on in-depth research topics related to the themes of the TOOLS conferences, on best practices, on applications and industrial issues, or on some combination of these.

SUBMISSION GUIDELINES

Submission proposal implies the organizers’ commitment to organize and lead the workshop personally if it is accepted. The proposal should include:

  •  Workshop title.
  • Names and short bio of organizers .
  • Proposed duration.
  •  Summary of the topics, goals and contents (guideline: 500 words).
  •  Brief description of the audience and community to which the workshop is targeted.
  • Plans for publication if any.
  • Tentative Call for Papers.

Acceptance criteria are:

  • Organizers’ track record and ability to lead a successful workshop.
  •  Potential to advance the state of the art.
  • Relevance of topics and contents to the topics of the TOOLS federated conferences.
  •  Timeliness and interest to a sufficiently large community.

Please send the proposals to me (Bertrand.Meyer AT inf.ethz.ch), with a Subject header including the words “TOOLS WORKSHOP“. Feel free to contact me if you have any question.

DATES

  •  Workshop proposal submission deadline: 17 February 2012.
  • Notification of acceptance or rejection: as promptly as possible and no later than February 24.
  • Workshops: 28 May to 1 June 2012.

 

VN:F [1.9.10_1130]
Rating: 7.3/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Guest article: funding great research

In a blog article posted in its original version on this blog [1] and in a revised version on the Communications of the ACM blog [2], I emphasized the relevance of incremental research. Recently Mikkel Thorup sent me some interesting comments, which I am publishing here as the first Guest Column of this blog.

References

[1] Bertrand Meyer: One Cheer for Incremental Research, in the present blog, 10 August 2009, available here

[2] Bertrand Meyer: Long Live Incremental Research, in Communications of the ACM Blog, 13 June 2011, available here.

Guest article by Mikkel Thorup: Funding Great Research

Research foundations want great research projects. However, a while back Bertrand Meyer wrote an interesting blog post: Long Live Incremental Research [2]. With examples he showed that many of the greatest results of research could not possibly be the projected results of great sounding project descriptions. His conclusion is that we should drop the high-flying ambitions from project descriptions, and instead support more incremental research proposals, hoping that great stuff will happen on the way. Indeed incremental research is perfect for research projects with predictable deliverables. However, I suggest the opposite conclusion; namely that we for some of the funding drop the project description.

The basic idea is that foundations should encourage researchers to look for results far better than those that can reasonably be projected. In particular, researchers should be free to follow their inspiration when they see new exiting opportunities. This is not done by tying researchers to incremental projects. Instead we can sometimes switch to result based funding, that is, funding based on results already achieved (with emphasis on the more recent past). Such result based funding is more like rewards for great results, and it offers researchers the perfect incentive to do their very best so as to secure future funding.

Consider a researcher with a history of brilliant ideas taking research in surprising new directions. If we try casting this as a project, the referees will rightly complain: “It is not clear how the applicant will come up with a brilliant idea, nor is it clear what the surprise will be”. With such lack of focus and feasibility, a low project score is expected.  If the project description has a predefined weight of, say, 40%, then the overall score will be too low for funding, regardless of the researcher’s established track record of succeeding in unlikely situations.  However, research needs great new ideas. Therefore we need some result based funding so that we can support researchers with a proven talent for generating great new ideas even if we do not quite understand how it will happen.

The above problem is often very real in my field of theoretical computer science. Like in other fields, theoretical research is only interesting if it contains surprises (otherwise it is more like development). A project plan would make sense if the starting point was a surprising idea or approach that it would take years to develop, but in theory, the most exciting ideas are often strikingly simple. When first you have such an idea, you are typically close to done, ready to start writing a paper. Thus, if you have a great idea when you apply for a grant, you will typically be done long before you get the grant. The essence of the research is thus the unpredictable search for powerful ideas and insights. The most appropriate project description is therefore just a description of the importance of the area to be researched and the type of results aimed for. The track record shows which researchers have the talent to succeed.

Dropping the how-part of the project description will greatly increase methodological diversity, allowing researchers to use the strategy that has proved most suitable for their area and their own talent and skills.  As a simple example, Bertrand suggested funding incremental research, hoping that great surprising things would turn up on the way. My strategy is the opposite. I try to spend as much time as possible on overly ambitious targets. Most of the time I fail, but I rarely come home empty-handed, for by studying the unknown I nearly always discover something new, sometimes even more interesting than the original target. From the perspective of ambition, I see it as an advantage that I minimize time spend on easy targets, but foundations seem to prefer that you take a planned path with some guaranteed targets on the way. The point here is not to argue whether one strategy is superior to the other, but rather to embrace the diversity of strategies that may work depending on the area and the individual researcher.

Perhaps more seriously, if a target is hard to achieve, it may be because it requires a crazy approach that would not look reasonable to anyone else, but which may work for a researcher thanks to his special talents and intuition. Indeed I have often been positively surprised seeing how others succeeded using an approach I had myself dismissed.  As a project, such crazy approaches would fail on perceived feasibility, but the point in result based funding is that researchers are free to use whatever approach they find most efficient. Funding is given to those who prove successful. This gives the perfect incentive to do great work, securing future funding.

Result-based funding would also reduce resources needed to evaluate applications. It is very hard for a general panel to evaluate the methodology and success probability of a project.  Moreover, it requires an intimate knowledge of a field to evaluate how big a difference a result would make relative to what is already known. However, handling published results, we know what happened and we can rely on peer-review for the difference it made to the field. All the panel has to do is to evaluate how the successes meet with the objectives of the foundation.

Let us, as an example, take something like the ERC Advanced Investigator Grant which welcomes high risk high gain research. It would seem that aiming for surprising breakthroughs in an important area would fall well within this scope. Having researchers with proven skills explore the area and follow their inspiration may be the optimal strategy. Uncertainty about what they would find should not be worse than high risk. In fact, based on past performance, it may be safe to assume that they will discover something interesting if not ground-breaking. However, when projects are scored on focused feasibility, such projects will fail even if their expected return is very high. It has to be possible to get a high overall score for promising research even if standard project parameters like focus and feasibility would be counterproductive.  At the end of the day, what we want are results, not project descriptions, so what should determine the overall score is which proposal is expected to yield the greatest results.

Long live great research!

Mikkel Thorup

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Averaging

 

A statistical textbook [1] contains this gem of wisdom:

Only a fool would conclude that a painting that was judged as excellent by one person and contemptible by another ought therefore to be classified as mediocre.

Common sense indeed; but does the procedure not recall how the typical conference program committee works, with averages obligingly computed by the supporting web-based systems?

Reference

[1] David C. Howell: Statistical Methods for Psychology, sixth edition, Thomson-Wadsworth, 2007

VN:F [1.9.10_1130]
Rating: 10.0/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

Fun with Bayes

 

Try this:  go to translate.google.com, choose Russian as the source and English as the target languages. In the input field, type“Андрей Иванович мне писал” or, if you do not have a Cyrillic keyboard, the transliteration into the Latin alphabet, “Andrej Ivanovich mne pisal”, which (unless you uncheck the default option) will be automatically transcribed into Cyrillic as you go. The correct English translation appears: “Andrey wrote to me”.

Correct yes, but partial: the input did not read “Andrey” but “Andrey Ivanovich”. Russians have a first name, a last name and also a “patronymic”  based on the father’s name: our Andrey’s father is or was called Ivan. Following the characters in, say, War and Peace, would be next to impossible without patronymics (it’s hard enough with them). On the other hand, English usually omits the patronymic, so if you are translating something simpler than a Tolstoy novel it is reasonable for an automated tool to yield “Andrey” as the translation of “Andrey Ivanovich”. In some cases, depending on the context, it gives “Andrei”, and in some others the anglicized “Andrew”.

Google Translate has yet another translation for “Andrey Ivanovich”. Assume that you want to be specific; maybe you know two people called Andrey and use the patronymic to distinguish between them. You want to say, for example,

       I have in mind not Andrey Nikolaevich, but Andrey Petrovich.

You can enter this as “Ja imeju v vidu ne Andreja Nikolaevicha, no Andreja Petrovicha”, or copy-paste the Cyrillic: Я имею в виду не Андрея Николаевича, но Андрея Петровича. (Note for Russian speakers: the word expected after the comma is of course а, but Google Translate, knowing that а can mean “and” in other contexts, translates it here with the opposite of the intended meaning. This is why I use но, which sounds strange but understandable, and is correctly translated.). Try it now and see what comes out on the English side:

      I do not mean Kolmogorov, but Andrei Petrovich.

Google Translate, in other words, has another translation for “Andrey Nikolaevich”:

       Kolmogorov

The great mathematician A.N. Kolmogorov (1903-1987) indeed had this first name and this patronymic; to conclude that anyone with these names is also called Kolmogorov is not, however, a step that most of us are prepared to take, especially those of us with a (living) friend called Andrey Nikolaevich.

A favorite of Russians is “Dobroje Utro, Dmitri Anatolevich”, meaning “Good morning, Dimitri Anatolevich”, which Google translates into “Good morning, Mr. President”. I will let you figure this one out.

All the translations cited were, by the way, obtained on the date of this post; algorithms can change. For other examples, see this page, in Russian (thanks to Sergey Velder for bringing it to my attention).

What happened? Automatic translation has made great progress in recent years, largely as a result of the switch from structural, precise techniques based on linguistic theory to approximate methods based on statistics. These methods rely on an immense corpus of existing human translations, accessible on the Internet, and apply Bayesian techniques to match every text element to the most frequently encountered translation of a similar phrase in existing translations. This switch has caused a revolution in translation, making it possible to get approximate equivalents. Personally I find them most useful for a language I do not know at all: if I want to read a Web page in Korean, I can get its general idea, which I could not have done fifteen years ago without finding a native speaker. For a language that I know imperfectly, the help is less clear, because the translations are almost never entirely right; in fact they are almost always, beyond the level of simple phrases, grammatically incorrect.

With Bayesian techniques it is understandable why “Andrey Nikolaevich” sometimes comes out in English as “Kolmogorov”: he is probably the most famous of all Andrey Nikolaevichs in Google’s database of Russian-English translation pairs. If you do not know the database, the behavior is mysterious, as you cannot usually guess whether the translation in a particular context will be “Andrey, “Andrei”, “Andrew”, “Andrey N.” or “Kolmogorov”, the five variants that I have seen (try your own experiments!). Some cases are predictable once you know that the techniques are statistical: if you include the word “Teorema” (theorem) anywhere close, you are sure to get “Kolmogorov”. But usually there is no obvious clue.

Statistical techniques are great but such examples, beyond the fun, show their limits. I truly hope that in the future they can be combined with more exact techniques based on sound linguistics.

Postscript: are you bytypal?

Perhaps I should explain why I use Google Translate with Russian as the source language. I do not use it for translation, but I do need it to type texts in Russian. I could use a Cyrillic keyboard, but I don’t because I am a very fast touch typist on the English (QWERTY) keyboard. (Learning to type at a professional level early in life was one of the most useful skills I ever acquired — not as useful as grammar, set theory or axiomatic semantics, but far more useful than separation logic.) So it is convenient for me to type in Latin letters, say “Dostojeksky”, and rely on a tool that immediately transliterates into the Cyrillic equivalent, here Достоевский . Then I can copy-paste the result into, for example, an email to a Russian-speaking recipient.

I used to rely on a tool that does exactly this, Translit (www.translit.ru); I have of late found Google Translate generally more convenient because it does not just transliterate but relies on its database to correct some typos. I do not need the translation (except possibly to check that what I wrote makes sense), but I see it anyway; that is how I ran into the Bayesian fun described above.

As a matter of fact the transliteration tool is good but, as often with software from Google, only  “almost” right. Sometimes it simply refuses to transliterate what I wrote, because it insists on its own misguided idea of what I meant. The Auto Correct option of Microsoft Office has the good sense, when it wrongly corrects your input and  you retype it, to obey you the second time around; but Google Translate’s transliteration facility does not seem to have any such policy: it sticks to its own view, right or wrong. As a consequence it has occasionally taken me a good five minutes of fighting the tool to enter a single word. Such glitches might be removed over time, but at the moment they are sufficiently annoying that I am thinking of teaching myself to touch-type in Cyrillic.

Is this possible? Initially I learned to type on the French (AZERTY) keyboard and I had to unlearn it, since otherwise a Q would occasionally come out as an A, a Z as a W and a semicolon as an M. I know bilingual people, but none who have programmed themselves to touch-type on different keyboards. Anyone out there willing to comment on the experience of bytypalism?

VN:F [1.9.10_1130]
Rating: 8.4/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 6 votes)

Nastiness in computer science

 

Recycled(This article was originally published in the CACM blog.)
 

Are we malevolent grumps? Nothing personal, but as a community computer scientists sometimes seem to succumb to negativism.

They admit it themselves. A common complaint in the profession (at least in academia) is that instead of taking a cue from our colleagues in more cogently organized fields such as physics, who band together for funds, promotion, and recognition, we are incurably fractious. In committees, for example, we damage everyone’s chances by badmouthing colleagues with approaches other than ours. At least this is a widely perceived view (“Circling the wagons and shooting inward,” as Greg Andrews put it in a recent discussion). Is it accurate?

One statistic that I have heard cited is that in 1-to-5 evaluations of projects submitted to the U.S. National Science Foundation the average grade of computer science projects is one full point lower than the average for other disciplines. This is secondhand information, however, and I would be interested to know if readers with direct knowledge of the situation can confirm or disprove it.

More such examples can be found in the material from a recent keynote by Jeffrey Naughton, full of fascinating insights (see his Powerpoint slides External Link). Naughton, a database expert, mentions that only one paper out of 350 submissions to SIGMOD 2010 received a unanimous “accept” from its referees, and only four had an average accept recommendation. As he writes, “either we all suck or something is broken!

Much of the other evidence I have seen and heard is anecdotal, but persistent enough to make one wonder if there is something special with us. I am reminded of a committee for a generously funded CS award some time ago, where we came close to not giving the prize at all because we only had “good” proposals, and none that a committee member was willing to die for. The committee did come to its senses, and afterwards several members wondered aloud what was the reason for this perfectionism that almost made us waste a great opportunity to reward successful initiatives and promote the discipline.

We come across such cases so often—the research project review that gratuitously but lethally states that you have “less than a 10% chance” of reaching your goals, the killer argument  “I didn’t hear anything that surprised me” after a candidate’s talk—that we consider such nastiness normal without asking any more whether it is ethical or helpful. (The “surprise” comment is particularly vicious. Its real purpose is to make its author look smart and knowledgeable about the ways of the world, since he is so hard to surprise; and few people are ready to contradict it: Who wants to admit that he is naïve enough to have been surprised?)

A particular source of evidence is refereeing, as in the SIGMOD example.  I keep wondering at the sheer nastiness of referees in CS venues.

We should note that the large number of rejected submissions is not by itself the problem. Naughton complains that researchers spend their entire careers being graded, as if passing exams again and again. Well, I too like acceptance better than rejection, but we have to consider the reality: with acceptance rates in the 8%-20% range at good conferences, much refereeing is bound to be negative. Nor can we angelically hope for higher acceptance rates overall; research is a competitive business, and we are evaluated at every step of our careers, whether we like it or not. One could argue that most papers submitted to ICSE and ESEC are pretty reasonable contributions to software engineering, and hence that these conferences should accept four out of five submissions; but the only practical consequence would be that some other venue would soon replace ICSE and ESEC as the publication place that matters in software engineering. In reality, rejection remains a frequent occurrence even for established authors.

Rejecting a paper, however, is not the same thing as insulting the author under the convenient cover of anonymity.

The particular combination of incompetence and arrogance that characterizes much of what Naughton calls “bad refereeing” always stings when you are on the receiving end, although after a while it can be retrospectively funny; one day I will publish some of my own inventory, collected over the years. As a preview, here are two comments on the first paper I wrote on Eiffel, rejected in 1987 by the IEEE Transactions on Software Engineering (it was later published, thanks to a more enlightened editor, Robert Glass, in the Journal of Systems and Software, 8, 1988, pp. 199-246 External Link). The IEEE rejection was on the basis of such review gems as:

  • I think time will show that inheritance (section 1.5.3) is a terrible idea.
  • Systems that do automatic garbage collection and prevent the designer from doing his own memory management are not good systems for industrial-strength software engineering.

One of the reviewers also wrote: “But of course, the bulk of the paper is contained in Part 2, where we are given code fragments showing how well things can be done in Eiffel. I only read 2.1 arrays. After that I could not bring myself to waste the time to read the others.” This is sheer boorishness passing itself off as refereeing. I wonder if editors in other, more established disciplines tolerate such attitudes. I also have the impression that in non-CS journals the editor has more personal leverage. How can the editor of IEEE-TSE have based his decision on such a biased an unprofessional review? Quis custodiet ipsoes custodes?

“More established disciplines”: Indeed, the usual excuse is that we are still a young field, suffering from adolescent aggressiveness. If so, it may be, as Lance Fortnow has argued in a more general context, “time for computer science to grow up.” After some 60 or 70 years we are not so young any more.

What is your experience? Is the grass greener elsewhere? Are we just like everyone else, or do we truly have a nastiness problem in computer science?

VN:F [1.9.10_1130]
Rating: 9.5/10 (31 votes cast)
VN:F [1.9.10_1130]
Rating: +19 (from 19 votes)

A safe and stable solution

Reading about the latest hullabaloo around Android’s usage of Java, and more generally following the incessant flow of news about X suing Y in the software industry (with many combinations of X and Y) over Java and other object-oriented technologies, someone with an Eiffel perspective can only smile. Throughout its history, suggestions to use Eiffel have often been met initially — along with “Will Eiffel still be around next year?”, becoming truly riotous after 25 years — with objections of proprietariness, apparently because Eiffel initially came from a startup company. In contrast, many other approaches, from C++ to Smalltalk and Java, somehow managed to get favorable vibes from the media; the respective institutions, from AT&T to Xerox and Sun, must be disinterested benefactors of humanity.

Now many who believed this are experiencing a next-morning surprise, discovering under daylight that the person next to whom they wake up is covered with patents and lawsuits.

For their part, people who adopted Eiffel over the years and went on to develop project after project  do not have to stay awake worrying about legal issues and the effects of corporate takeovers; they can instead devote their time to building the best software possible with adequate methods, notations and tools.

This is a good time to recall the regulatory situation of Eiffel. First, the Eiffel Software implementation (EiffelStudio): the product can be used through either an open-source and a proprietary licenses. With both licenses the software is exactly the same; what differs is the status of the code users generate: with the open-source license, they are requested to make their own programs open-source; to keep their code proprietary, they need the commercial license. This is a fair and symmetric requirement. It is made even more attractive by the absence of any run-time fees or royalties of the kind typically charged by database vendors.

The open-source availability of the entire environment, over 2.5 millions line of (mostly Eiffel) code, has spurred the development of countless community contributions, with many more in progress.

Now for the general picture on the language, separate from any particular implementation. Java’s evolution has always been tightly controlled by Sun and now its successor Oracle. There may actually be technical arguments in favor of the designers retaining a strong say in the evolution of a language, but they no longer seem to apply any more now that most of the Java creators have left the company. Contrast this with Eiffel, which is entirely under the control of an international standards committee at ECMA International, the oldest and arguably the most prestigious international standards body for information technology. The standard is freely available online from the ECMA site [1]. It is also an ISO standard [2].

The standardization process is the usual ECMA setup, enabling any interested party to participate. This is not just a statement of principle but the reality, to which I can personally testify since, in spite of being the language’s original designer and author of the reference book, I lost countless battles in the discussions that led to the current standard and continue in preparation of the next version. While I was not always pleased on the moment, the committee’s collegial approach has led to a much more solid result than any single person could have achieved.

The work of ECMA TC49-TG4 (the Eiffel standard committee) has disproved the conventional view that committees can only design camels. In fact TC49-TG4 has constantly worked to keep the language simple and manageable, not hesitating to remove features deemed obsolete or problematic, while extending the range of the language and increasing the Eiffel programmer’s power of expression. As a result, Eiffel today is an immensely better language than when we started our work in 2002. Without a strong community-based process we would never, for example, have made Eiffel the first widespread language to guarantee void-safety (the compile-time removal of null-pointer-dereferencing errors), a breakthrough for software reliability.

Open, fair, free from lawsuits and commercial fights, supported by an enthusiastic community: for projects that need a modern quality-focused software framework, Eiffel is a safe and stable solution.

References

[1] ECMA International: Standard ECMA-367: Eiffel: Analysis, Design and Programming Language, 2nd edition (June 2006), available here (free download).

[2] International Organization for Standardization: ISO/IEC 25436:2006: Information technology — Eiffel: Analysis, Design and Programming Language, available here (for a fee; same text as [1], different formatting).

VN:F [1.9.10_1130]
Rating: 5.0/10 (33 votes cast)
VN:F [1.9.10_1130]
Rating: -1 (from 27 votes)

Stendhal on abstraction

This week we step away from our usual sources of quotations — the Hoares and Dijkstras and Knuths — in favor an author who might seem like an unlikely inspiration for a technology blog: Stendhal. A scientist may like anyone else be fascinated by Balzac, Flaubert, Tolstoy or Dostoevsky, but they live in an entirely different realm; Stendhal is the mathematician’s novelist. Not particularly through the themes of his works (as could be the case with  Borges or Eco), but because of their clear structure and elegant style,  impeccable in its conciseness and razor-like in its precision. Undoubtedly his writing was shaped by his initial education; he prepared for the entrance exam of the then very young École Polytechnique, although at the last moment he yielded instead to the call of the clarion.

The scientific way of thinking was not just an influence on his writing; he understood the principles of scientific reasoning and knew how to explain them. Witness the following text, which explains just about as well as anything I know the importance of abstraction. In software engineering (see for example [1]), abstraction is the key talent, a talent of a paradoxical nature: the basic ideas take a few minutes to explain, and a lifetime to master. In this effort, going back to the childhood memories of Henri Beyle (Stendhal’s real name) is not a bad start.

Stendhal’s Life of Henri Brulard is an autobiography, with only the thinnest of disguises into a novel (compare the hero’s name with the author’s). In telling the story of his morose childhood in Grenoble, the narrator grumbles about the incompetence of his first mathematics teacher, a Mr. Dupuy, who taught mathematics “as a set of recipes to make vinegar” (comme une suite de recettes pour faire du vinaigre) and tells how his father found a slightly better one, Mr. Chabert. Here is the rest of the story, already cited in [2]. The translation is mine; you can read the original below, as well as a German version. Instead of stacks and circles  — or a university’s commencement day, see last week’s posting — the examples invoke eggs and cheese, but wouldn’t you agree that this paragraph is as good a definition of abstraction, directly applicable to software abstractions, and specifically to abstract data types and object abstractions (yes, it does discuss “objects”!), as any other?

So I went to see Mr. Chabert. Mr. Chabert was indeed less ignorant than Mr. Dupuy. Through him I discovered Euler and his problems on the number of eggs that a peasant woman brings to the market where a scoundrel steals a fifth of them, then she leaves behind the entire half of the remainder and so forth. This opened my mind, I glimpsed what it means to use the tool called algebra. I’ll be damned if anyone had ever explained it to me; endlessly Mr. Dupuy spun pompous sentences on the topic, but never did he say this one simple thing: it is a division of labor, and like every division of labor it creates wonders by allowing the mind to concentrate all its forces on just one side of objects, on just one of their qualities. What difference it would have made if Mr. Dupuy had told us: This cheese is soft or is it hard; it is white, it is blue; it is old, it is young; it is mine, it is yours; it is light or it is heavy. Of so many qualities, let us only consider the weight. Whatever that weight is, let us call it A. And now, no longer thinking of cheese, let us apply to A everything we know about quantities. Such a simple thing; and yet no one was explaining it to us in that far-away province [3]. Since that time, however, the influence of the École Polytechnique and Lagrange’s ideas may have trickled down to the provinces.

References

[1] Jeff Kramer: Is abstraction the key to computing?, in Communications of The ACM, vol. 50, 2007, pages 36-42.
[2] Bertrand Meyer and Claude Baudoin: Méthodes de Programmation, Eyrolles, 1978, third edition, 1982.
[3] No doubt readers from Grenoble, site of great universities and specifically one of the shrines of French computer science, will appreciate how Stendhal calls it  “that backwater” (cette province reculée).

Original French text

J’allai donc chez M. Chabert. M. Chabert était dans le fait moins ignare que M. Dupuy. Je trouvai chez lui Euler et ses problèmes sur le nombre d’œufs qu’une paysanne apportait au marché lorsqu’un méchant lui en vole un cinquième, puis elle laisse toute la moitié du reste, etc., etc. Cela m’ouvrit l’esprit, j’entrevis ce que c’était que se servir de l’instrument nommé algèbre. Du diable si personne me l’avait jamais dit ; sans cesse M. Dupuy faisait des phrases emphatiques sur ce sujet, mais jamais ce mot simple : c’est une division du travail qui produit des prodiges comme toutes les divisions du travail et permet à l’esprit de réunir toutes ses forces sur un seul côté des objets, sur une seule de leurs qualités. Quelle différence pour nous si M. Dupuy nous eût dit : Ce fromage est mou ou il est dur ; il est blanc, il est bleu ; il est vieux, il est jeune ; il est à moi, il est à toi ; il est léger ou il est lourd. De tant de qualités ne considérons absolument que le poids. Quel que soit ce poids, appelons-le A. Maintenant, sans plus penser absolument au fromage, appliquons à A tout ce que nous savons des quantités. Cette chose si simple, personne ne nous la disait dans cette province reculée ; depuis cette époque, l’École polytechnique et les idées de Lagrange auront reflué vers la province.

German translation (by Benjamin Morandi)

Deshalb ging ich zu Herrn Chabert. In der Tat war Herr Chabert weniger ignorant als Herr Dupuy. Bei ihm fand ich Euler und seine Probleme über die Zahl von Eiern, die eine Bäuerin zum Markt brachte, als ein Schurke ihr ein Fünftel stahl, sie dann die Hälfte des Restes hinterliest u.s.w. Es hat mir die Augen geöffnet. Ich sah was es bedeutet, das Algebra genannte Werkzeug zu benutzen. Unaufhörlich machte Herr Dupuy emphatische Sätze über dieses Thema, aber niemals dieses einfache Wort: Es ist eine Arbeitsteilung, die wie alle Arbeitsteilungen Wunder herstellt und dem Geist ermöglicht seine Kraft ganz auf eine einzige Seite von Objekten zu konzentrieren, auf eine Einzige ihrer Qualitäten. Welch Unterschied für uns, wenn uns Herr Dupuy gesagt hätte: Dieser Käse ist weich oder er ist hart; er ist weiss, er ist blau; er ist alt, er ist jung; er gehört dir, er gehört mir; er ist leicht oder er ist schwer. Bei so vielen Qualitäten betrachten wir unbedingt nur das Gewicht. Was dieses Gewicht auch sei, nennen wir es A. Jetzt, ohne unbedingt weiterhin an Käse denken zu wollen, wenden wir auf A alles an, was wir über Mengen wissen. Diese einfach Sache sagte uns niemand in dieser zurückgezogenen Provinz; von dieser Epoche an werden die École Polytechnique und die Ideen von Lagrange in die Provinz zurückgeflossen sein.

VN:F [1.9.10_1130]
Rating: 9.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

From duplication to duplicity: a short history of the technology of knowledge transfer

1440:

Gutenberg discovers the secret of producing, out of one text, many books for the benefit of many people.

2007:

Von und Zu Guttenberg discovers the secret of producing, out of many texts, one book for the benefit of one person.

 

 

VN:F [1.9.10_1130]
Rating: 9.9/10 (16 votes cast)
VN:F [1.9.10_1130]
Rating: +10 (from 12 votes)

Email and its perils

Email is fast and convenient. It is also risky. Here are three common sources of incidents with email. They are not new, but they keep biting even the most experienced email users.

1. Risks of using Bcc

Alice writes to Bob. She wants Carol to know what she wrote, but she does not want Bob to know that she is keeping Carol informed. So she copies Carol in the form of a “Blind carbon copy” (Bcc). The Bcc mechanism is meant exactly for such situations: while Bcc recipients see the list of other recipients (To and Cc), these other recipients see no mention of Bcc recipients.

Now Carol sees the messages and responds to Alice. But to respond she uses, perhaps inadvertently, “Reply all”. The reply goes to both Alice and Bob. All of Alice’s efforts to keep mum about Carol’s involvement are lost!

The risk in this situation is that Alice has no way to control what Carol may do. At issue here is not a conscious effort by Carol to break confidentiality: in that case Alice could do nothing anyway as soon as she has sent Carol the information. The worrying possibility is that Carol may use “Reply all” by mistake.

Rule 1 (temporary): never use Bcc. Alice should send the message to Bob only, and forward a copy separately to Carol alone.

I start with this form of the rule because it is easy to remember and usually appropriate, but in one case it is too strong. Remove Bob from the picture. Then if Carol is a  person, it is pointless for Alice to use Bcc for her, rather than To (or Cc), since Carol knows everything there is to know. But now assume that Carol is actually the name of a mailing list. Members of the mailing list should know the originator, Alice, but they should not know about each other, or even about the name of the list. If these are the constraints, there is no risk in using Bcc. Hence the revised version of the rule:

Rule 1 (final): never use Bcc except for all recipients of a message.

Indeed what was wrong in the first example was not the use of Bcc as such, but the mix with To (or Cc). Bcc must be used only for either all the recipients of a message or none of them.

2. Risks of not using Bcc

Very recent case (today, actually) from the refereeing process of a prestigious computer science conference. The program chair sends to all authors of submitted papers,  say <submitters@famous_computer_science_conference.org>, a general message about the refereeing process. But he uses that address in the “To” or “Cc” field, not “bcc”.

One author, say Alice, has a question about the process and responds to the message, inadvertently using “Reply all”. Then:

  • Everyone knows that Alice has submitted a paper.
  • Many of the other authors are away and have “automatic reply” set up. So Alice herself now knows the names of quite a few other members of the community who have submitted papers!

famous_computer_science_conference.org” is the most important conference in its field, and essentially every researcher in that community submits at least one paper every year. Knowing who has submitted is confidential information. That information becomes interesting a few months later, when the conference program is published and you find out that Bob tried and was not accepted. All the more juicy information, of course, if Bob is a senior (and arrogant) researcher.

Rule 2: when a mailing list includes people who should not know who else is on the list, either set up the list so that only the administrators can post to it, or use Bcc (respecting Rule 1, in its final form).

3. The importance of not being Allison

Another common risk, which strikes all the time, is automatic address completion by email clients (Outlook, Thunderbird etc.). You type part of the name or address of a frequent correspondent, and the system completes the email address.  So convenient! Except when the completion is wrong and you do not check it.

I frequently receive email that was misaddressed because of unchecked completion. (The latest case was last week.) Here is an example of completion that was expected but did not happen. A few years ago, in an institution of which I was then a member, a high-level executive wanted to send to her secretary the results of a job candidate’s evaluation. Highly confidential stuff. The secretary was called Allison and had an address of the form <allison@our_department.com>. The executive was used to typing just all and relying on address completion. Somehow, that particular time the completion did not occur; perhaps she was not using her familiar email setup. As a result the message went to <all@our_department.com>, that is to say, everyone in the organization. The recipients were, of course, delighted to get the inside story on the candidate.

Rule 3:

  • Always check the recipient addresses visually and carefully.
  • Never hire a secretary called Allen, Allison or Allistair.
VN:F [1.9.10_1130]
Rating: 6.9/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

Speech synthesis technology

Rereading an article from last year’s August New Yorker, a discussion of e-paper devices and especially the Kindle by Nicholson Baker [1]:

Reading some of “Max,” a James Patterson novel, I experimented with the text-to-speech feature. The robo-reader had a polite, halting, Middle European intonation, like Tom Hanks in “The Terminal,” and it was sometimes confused by periods. Once it thought “miss.” was the abbreviation of a state name: “He loved the chase, the hunt, the split-second intersection of luck and skill that allowed him to exercise his perfection, his inability to Mississippi.” I turned the machine off.

Reference

Nicholson Baker: A New Page, in The New Yorker, August 3, 2009, pages 24-30, also available at http://www.newyorker.com/reporting/2009/08/03/090803fa_fact_baker?currentPage=all.

VN:F [1.9.10_1130]
Rating: 5.3/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 2 votes)