Would they?

If you use the Swiss lounge at Zurich airport and have the effrontery of asking for Internet access, you are given a voucher, valid 24 hours. It works (well, if you are ready to watch a Mercedes ad for a full thirty seconds, no early escape unless you want to restart from scratch). Otherwise I could not be writing this article.

The voucher shows the code you musts enter into the browser. It also shows your name, your flight, your seat number. These pieces of information and the connection between then are, then, in the system. Anyone with access to that system can precisely track what you did online.

Of course I am fantasizing. No one would ever do this. Why would they?

Predictably, many people leave their vouchers lying around when they leave for their flights. So you could use someone else’s voucher to engage in some dubious or downright illicit Internet practice, and shift the blame to the other person. But no one would ever do this. Why would they?

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Lamport, Turing

 

Theories abound (I have my own) about why it did not happen long ago, but at last Lamport did receive the Turing Award. For me what most characterizes him is a stream [1] of elegant, original, insightful articles providing solutions to one important and thorny problem after another. Some of these articles are well known but many gems are not; see for example his take on Buridan’s Ass, not even a computer science paper, offering a convincing treatment of a centuries-old riddle.

References

[1] Leslie Lamport: annotated publication list, here.

[2] Leslie Lamport: Buridan’s Principle, “to appear in Foundations of Physics“, dated 24 February 2012, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

New article: contracts in practice

For almost anyone programming in Eiffel, contracts are just a standard part of daily life; Patrice Chalin’s pioneering study of a few years ago [1] confirmed this impression. A larger empirical study is now available to understand how developers actually use contracts when available. The study, to published at FM 2014 [2] covers 21 programs, not just in Eiffel but also in JML and in Code Contracts for C#, totaling 830,000 lines of code, and following the program’s revision history for a grand total of 260 million lines of code over 7700 revisions. It analyzes in detail whether programmers use contracts, how they use them (in particular, which kinds, among preconditions, postconditions and invariants), how contracts evolve over time, and how inheritance interacts with contracts.

The paper is easy to read so I will refer you to it for the detailed conclusions, but one thing is clear: anyone who thinks contracts are for special development or special developers is completely off-track. In an environment supporting contracts, especially as a native part of the language, programmers understand their benefits and apply them as a matter of course.

References

[1] Patrice Chalin: Are practitioners writing contracts?, in Fault-Tolerant System, eds. Butler, Jones, Romanovsky, Troubitsyna, Springer LNCS, vol. 4157, pp. 100–113, 2006.

[2] H.-Christian Estler, Carlo A. Furia, Martin Nordio, Marco Piccioni and Bertrand Meyer: Contracts in Practice, to appear in proceedings of 19th International Symposium on Formal Methods (FM 2014), Singapore, May 2014, draft available here.

VN:F [1.9.10_1130]
Rating: 9.1/10 (10 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

New article: passive processors

 

The SCOOP concurrency model has a clear division of objects into “regions”, improving the clarity and reliability of concurrent programs by establishing a close correspondence between the object structure and the process structure. Each region has an associated “processor”, which executes operations on the region’s objects. A literal application of this rule implies, however, a severe performance penalty. As part of the work for his PhD thesis (defended two weeks ago), Benjamin Morandi found out that a mechanism for specifying certain processors as “passive” yields a considerable performance improvement. The paper, to be published at COORDINATION, describes the technique and its applications.

Reference

Benjamin Morandi, Sebastian Nanz and Bertrand Meyer: Safe and Efficient Data Sharing for Message-Passing Concurrency, to appear in proceedings of COORDINATION 2014, 16th International Conference on Coordination Models and Languages, Berlin, 3-6 June 2014, draft available here.
.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Agile book announced

My book “Agile! The Good, the Hype and the Ugly” will be published in a few weeks by Springer. The announced date is April 30 and there is a preview Amazon page: here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Learning to program, online

Introduction to Programming MOOCThe ETH introductory programming course, which I have taught since 2003 and used as the basis for the Springer Touch of Class textbook, is now available as a MOOC: an online course, open to anyone interested [1]. The project was started and led by Marco Piccioni.

The MOOC was released in September; although it was “open” (the other “O”) from the start, we have not publicized it widely until now, since we used it first for the benefit of students taking the course at ETH, and took advantage of this experience to polish it. If you follow the acronym buzz you may say it was first a “SPOC” (Small Personal Online Course”). The experience with our students has been extremely encouraging: they took it as a supplement to the lectures and widely praised its value.We hope that many others will find it useful as well.

MOOCs are hot but they have attracted as much criticism as hype. We have seen the objections: low completion rates, lack of direct human contact, threats to traditional higher education. Two things are clear, though: MOOCs are here to stay; and they have their own pedagogical advantages.

MOOCs are here to stay; one ignores them at one’s own peril. For courses on a popular topic, I believe that in a few years almost everyone will be teaching from a MOOC. Not in the sense of telling students “just follow this course on the Web and come back for the exam“, but as a basis for individual institutions’ courses. For example the students might be told to take the lessons online, then come to in-person lectures (“Flip The Classroom“) or discussion sessions. For any given topic, such as introduction to programming, only a handful of MOOCs will emerge in this role. We would like the ETH course to be one of them.

The question is not just to replace courses and textbooks with an electronic version. MOOCs enrich the learning experience. Introduction to Programming is a particularly fertile application area for taking advantage of technology: the presentation of programming methods and techniques becomes even more effective if students can immediately try out the ideas, compile the result, run it, and see the results on predefined tests. Our course offers many such interactive exercises, thanks to the E4Mooc (Eiffel for MOOCs) framework developed by Christian Estler [2]. This feature has proved to be a key attraction of the course for ETH students. Here for example is an exercise asking you to write a function that determines whether a string is a palindrome (reads identically in both directions):

The program area is pre-filled with a class skeleton where all that remains for you to enter is the algorithm for the relevant function. Then you can click “Compile” and, if there are no compilation errors, “Run” to test your candidate code against a set of predefined test values. One of the benefits for users taking the course is that there is no software to install: everything runs in the cloud, accessible from the browser. Here we see the MOOC not just as a technique for presenting standard material but as an innovative learning tool, opening up pedagogical techniques that were not previously possible.

Besides E4Mooc, our course relies on the Moodle learning platform. Our experience with Moodle has been pleasant; we noticed, for example, that students really liked the Moodle feature enabling them to gain virtual “badges” for good answers, to the point of repeating exercises until they got the badges. For instructors preparing the course, building a MOOC is a huge amount of work (that was not a surprise, people had told us); but it is worth the effort.

We noted that attendance at the lectures increased as compared to previous years. The matter is a natural concern: other than the cold November mornings in Zurich (one of the lectures is at 8 AM) there are many reasons not to show up in class:  the textbook covers much of the material; all the slides are online; so are the slides for exercise sessions (tutorials) and texts of exercises and some earlier exams; lectures were video-recorded in previous years, and students can access the old recordings. Our feeling is that the MOOC makes the course more exciting and gives students want to come to class and hear more.

The MOOC course is not tied to a particular period; you can take it whenever you like. (The current practice of offering MOOCs at fixed times is disappointing: what is the point of putting a course online if participants are forced to fit to a fixed schedule?)

Marco Piccioni and I are now off to our second MOOC, which will be a generalization of the first, covering the basics not just of programming but of computer science and IT overall, and will be available on one of the major MOOC platforms. We will continue to develop the existing MOOC, which directly supports our in-person course, and which we hope will be of use to many other people.

Take the MOOC, or tell a beginner near you to take it, and tell us what you think.

References

[1] Bertrand Meyer, Marco Piccioni and other members of the ETH Chair of Software Engineering: ETH Introduction to Programming MOOC, available here.

[2] H-Christian Estler, E4Mooc demo, available on YouTube: here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (11 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

Eiffel as an expression language

A functional-programming style, or more generally a style involving more expressions and fewer instructions, is possible in Eiffel. In particular, Eiffel’s agent mechanism embeds a full functional-programming mechanism in the object-oriented framework of the language.

To make the notations simpler, we are discussing and tentatively implementing a number of proposed extensions. They involve no fundamental new language mechanisms, but provide new, more concise notations for existing mechanisms. Examples are:

  • Conditional expressions.
  • Implicit tuple, a rule allowing the omission of brackets for an actual argument when it is a tuple and the last argument, e.g. f (x, y, z) as an abbreviation for f ([x, y, z]) (an example involving just one argument). Tuples already provided the equivalent of a variable-argument (“varargs”) facility, but it is made simpler to use with this convention.
  • Parenthesis alias, making it possible to write just f (x, y) when f is an agent (closure, lambda expression, delegate etc. in other terminologies), i.e. treating f as if it were a function; the notation is simply an abbreviation for f.item ([x, y]) (an example that also takes advantage of implicit tuples). It has many other applications since a “parenthesis alias” can be defined for a feature of any class.
  • Avoiding explicit assignments to Result.
  • Type inference (to avoid explicitly specifying the type when it can be deduced from the context). This is a facility for the programmer, useful in particular for local variables, but does not affect the type system: Eiffel remains strongly typed, it is just that you can be lazy about writing the type when there is no ambiguity.
  • In the same vein, omitting the entire list of generic parameters when it can be inferred.

The description of the mechanism (see the link in [1]) is in the form of a set of slides explaining the concepts and presenting example. This is a working document and feedback is welcome.

References

[1] Eiffel as an expression language, Eiffel Software working document, 2012-2014, see here.

VN:F [1.9.10_1130]
Rating: 9.8/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

LASER 2014 (Elba, September)

2014 marks the 10-th anniversary (11th edition) of the LASER summer school. The school will be held September 7-14, 2014, and the detailed information is here.

LASER (the name means Laboratory for Applied Software Engineering Research) is dedicated to practical software engineering. The roster of speakers since we started is a who’s who of innovators in the field. Some of the flavor of the school can gathered from the three proceedings volumes published in Springer LNCS (more on the way) or simply by browsing the pages of the schools from previous years.

Usually we have a theme, but to mark this anniversary we decided to go for speakers first; we do have a title, “Leading-Edge Software Engineering”, but broad enough to encompass a wide variety of a broad range of topics presented by star speakers: Harald Gall, Daniel Jackson, Michael Jackson, Erik Meijer (appearing at LASER for the third time!), Gail Murphy and Moshe Vardi. With such a cast you can expect to learn something important regardless of your own primary specialty.

LASER is unique in its setting: a 5-star hotel in the island paradise of Elba, with outstanding food and countless opportunities for exploring the marvelous land, the beaches, the sea, the geology (since antiquity Elba has been famous for its stones and minerals) and the history, from the Romans to Napoleon, who in the 9 months of his reign changed the island forever. The school is serious stuff (8:30 to 13 and 17 to 20 every day), but with enough time to enjoy the surroundings.

Registration is open now.

VN:F [1.9.10_1130]
Rating: 10.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

PhD positions in concurrency/distribution/verification at ETH

As part of our “Concurrency Made Easy” ERC Advanced Investigator Grant project (2012-2017), we are offering PhD positions at the Chair of Software Engineering of ETH Zurich. The goal of the project is to build a sophisticated programming and verification architecture to make concurrent and distributed programming simple and reliable, based on the ideas of Eiffel and particularly the SCOOP concurrency model. Concurrency in its various forms (particularly multithreading) as well as distributed computing are required for most of today’s serious programs, but programming concurrent applications remains a challenge. The CME project is determined to break this complexity barrier.  Inevitably, achieving simplicity for users (in this case, application programmers) requires, under the hood, a sophisticated infrastructure, both conceptual (theoretical models) and practical (the implementation). We are building that infrastructure.

ETH offers an outstanding research and education environment and competitive salaries for “assistants” (PhD students), who are generally expected in addition to their research to participate in teaching, in particular introductory programming, and other activities of the Chair.  The candidates we seek have: a master’s degree in computer science or related field from a recognized institution (as required by ETH); a strong software engineering background, both practical and theoretical, and more generally a strong computer science and mathematical culture; a good knowledge of verification techniques (e.g. Hoare-style, model-checking, abstract interpretation); some background in concurrency or distribution; and a passion for high-quality software development. Prior publications, and experience with Eiffel, are pluses. In line with ETH policy, particular attention will be given to female candidates.

Before applying, you should become familiar with our work; see in particular the research pages at se.ethz.ch including the full description of the CME project at cme.ethz.ch.

Candidates should send (in PDF or text ) to se-open-positions@lists.inf.ethz.ch a CV and a short cover letter describing their view of the CME project and ideas about their possible contribution.

VN:F [1.9.10_1130]
Rating: 5.7/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Negative variables: new version

I have mentioned this paper before (see the earlier blog entry here) but it is now going to be published [1] and has been significantly revised, both to take referee comments into account and because we found better ways to present the concepts.

We have  endeavored to explain better than in the draft why the concept of negative variable is necessary and why the usual techniques for modeling object-oriented programs do not work properly for the fundamental OO operation, qualified call x.r (…). These techniques are based on substitution and are simply unable to express certain properties (let alone verify them). The affected properties are those involving properties of the calling context or the global project structure.

The basic idea (repeated in part from the earlier post) is as follows. In modeling OO programs, we have to take into account the unique “general relativity” property of OO programming: all the operations you write are expressed relative to a “current object” which changes repeatedly during execution. More precisely at the start of a call x.r (…) and for the duration of that call the current object changes to whatever x denotes — but to determine that object we must again interpret x in the context of the previous current object. This raises a challenge for reasoning about programs; for example in a routine the notation f.some_reference, if f is a formal argument, refers to objects in the context of the calling object, and we cannot apply standard rules of substitution as in the non-OO style of handling calls.

We introduced a notion of negative variable to deal with this issue. During the execution of a call x.r (…) the negation of x , written x’, represents a back pointer to the calling object; negative variables are characterized by axiomatic properties such as x.x’= Current and x’.(old x)= Current.

Negative variable as back pointer

The paper explains why this concept is necessary, describes the associated formal rules, and presents applications.

Reference

[1] Bertrand Meyer and Alexander Kogtenkov: Negative Variables and the Essence of Object-Oriented Programming, to appear in Specification, Algebra, and Software, eds. Shusaku Iida, Jose Meseguer and Kazuhiro Ogata, Springer Lecture Notes in Computer Science, 2014, to appear. See text here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Saint Petersburg Software Engineering Seminar: 14 January 2014 (6 PM)

There will be two talks in the Software Engineering Seminar at ITMO, 18:00 local time, Tuesday, January 14, 2014. Please arrive 10 minutes early for registration.

Place: ITMO, Sytninskaya Ulitsa, Saint Petersburg.

Andrey Terekhov (SPBGU): Programming crystals

(I do not know whether this talk will be in Russian or English. An abstract follows but the talk is meant as the start of a discussion rather than a formal lecture.)

В течение последних 20-30 лет основными языками программирования кристаллов были VHDL и Verilog. Эти языки изначально проектировались как средства создания проектной документации, потом они стали использоваться в качестве инструмента моделирования и только сравнительно недавно для этих языков появились средства генерации кода уровня RTL (Register Transfer Language). Тексты на  VHDL и Verilog очень громоздки, трудно читаемые, плохо стандартизованы (одна и та же программа может синтезироваться на одном инструменте и не поддаваться синтезу на другом. Лет 10 назад появился язык SystemC – это С++ с огромным набором библиотек. С одной стороны, любая программа на SystemC может транслироваться стандартными трансляторами С++ , есть удобные средства потактного моделирования и приличные средства генгерации RTL, с другой стороны, требование совместимости с С++ не прошло даром – если в базовом языке нет средств описания параллелизма и конвейеризации, их приходится добавлять весьма искусственными приемами через приставные библиотеки. Буквально в прошлом году фирма Xilinx выпустила продукт Vivado, в рекламе которого утверждается, что он способен автоматически транслировать обычные программы на С/C++ в RTL промышленного качества.

Мы выполнили несколько экспериментов по использованию этого продукта, оказалось, что обещанной автоматизации там нет, пользователь должен писать на С, постоянно думая о том, как его код будет выглядеть в финальном RTL,  расставлять огромное количество прагм, причем не всегда очевидных.

Основной тезис доклада – такая важная область, как проектирование кристаллов, нуждается в специализированных языковых и инструментальных средствах, обеспечивающих  создание компактных и  легко читаемых программ, которые могут быть использованы как для симуляции, так и для генерации эффективного RTL. В докладе будут приведены примеры программ на языке HaSCoL (Hardware and Software Codesign Language), разработанном на кафедре системного программирования СПбГУ, и даны некоторые сравнительные характеристики.

Sergey Velder (ITMO): Alias graphs

(My summary – BM.) In the ITMO SEL work on automatic alias analysis, a new model has been developed: alias graphs, an abstraction of the object structure. This short talk will compare it to previously used approaches.

VN:F [1.9.10_1130]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

Niklaus Wirth birthday symposium, 20 February, Zurich

In honor of Niklaus Wirth’s 80-th birthday we are organizing a symposium at ETH on February 20, 2014. This is a full-day event with invited talks by:

  • Vint Cerf
  • Hans Eberlé
  • Michael Franz
  • me
  • Carroll Morgan
  • Martin Odersky
  • Clemens Szyperski
  • Niklaus Wirth himself

From the symposium’s web page:

Niklaus Wirth was a Professor of Computer Science at ETH Zürich, Switzerland, from 1968 to 1999. His principal areas of contribution were programming languages and methodology, software engineering, and design of personal workstations. He designed the programming languages Algol W, Pascal, Modula-2, and Oberon, was involved in the methodologies of structured programming and stepwise refinement, and designed and built the workstations Lilith and Ceres. He published several text books for courses on programming, algorithms and data structures, and logical design of digital circuits. He has received various prizes and honorary doctorates, including the Turing Award, the IEEE Computer Pioneer, and the Award for outstanding contributions to Computer Science Education.

Participation is free (including breaks, lunch and the concluding “Apéro”) but space is strictly limited and we expect to run out of seats quickly. So if you are interested (but only if you are certain to attend) please register right away.

Symposium page and access to registration form: here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +5 (from 5 votes)

New paper: alias calculus and frame inference

For a while now I have  been engaged in  a core problem of software verification: the aliasing problem. As with many difficult problems in science, it is easy to state the basic question: can we determine automatically whether at a program point p the values of two reference expressions e and f can ever denote the same object?

Alias analysis lies at the core of many problems in software analysis and verification.

Earlier work [2] I introduced an “alias calculus”. The calculus is a set of rules, attached to the constructs of the programming language, to compute the “alias relation”: the set of possibly aliased expression pairs. A new paper [1] with Sergey Velder and Alexander Kogtenkov improves the model (correcting in particular an error in the axiom for assignment, whose new version has been proved sound using Coq) and applies it to the inference of frame properties. Here the abstract:

Alias analysis, which determines whether two expressions in a program may reference to the same object, has many potential applications in program construction and verification. We have developed a theory for alias analysis, the “alias calculus”, implemented its application to an object-oriented language, and integrated the result into a modern IDE. The calculus has a higher level of precision than many existing alias analysis techniques. One of the principal applications is to allow automatic change analysis, which leads to inferring “modifies clauses”, providing a significant advance towards addressing the Frame Problem. Experiments were able to infer the “modifies” clauses of an existing formally specied library. Other applications, in particular to concurrent programming, also appear possible. The article presents the calculus, the application to frame inference including experimental results, and other projected applications. The ongoing work includes building more efficient model capturing aliasing properties and soundness proof for its essential elements.

This is not the end of the work, as better models and implementations are needed, but an important step.

References

[1] Sergey Velder, Alexander Kogtenkovand Bertrand Meyer: Alias Calculus, Frame Calculus and Frame Inference, in Science of Computer Programming, to appear in 2014 (appeared online 26 November 2013); draft available here, published version here.
[2] Bertrand Meyer: Steps Towards a Theory and Calculus of Aliasing, in International Journal of Software and Informatics, Chinese Academy of Sciences, 2011, pages 77-116, available here.

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

The biggest software-induced disaster ever

 

In spite of the brouhaha surrounding the Affordable Care Act, the US administration and its partisans seem convinced that “the Web site problems will be fixed”.

That is doubtful. All reports suggest that the problem is not to replace a checkbox by a menu, or buy a few more servers. The analysis, design and implementation are wrong, and the sites will not work properly any time soon.

Barring sabotage (for which we have seen no evidence), this can only be the result of incompetence. An insurance exchange? Come on. Any half-awake group of developers could program it over breakfast.

Who chose the contractors?

When the problems first surfaced a few weeks ago, anyone with experience and guts would have done the right thing: fire all the companies responsible for  the mess and start from scratch with a dedicated, competent and well-managed team.

The latest promises published are that by the end of the month “four out of five” of the people trying to register will manage to do it. Nice. Imagine that when trying to make a purchase at Amazon you would succeed 80% of the time.

And that is only an optimistic goal.

The people building the site do not have infinite time. In fact, the process is crucially time-driven: if people do not get health coverage in time, they will be fined. But what if they cannot get coverage because the Web sites do not respond, or mess up?

Consider for a second another example of another strictly time-driven project: on January 1, 2002, twelve countries switched to a common currency, with the provision that their current legal tender would lose its status only a bare two months later. The IT infrastructure had to work on the appointed day. It did. How come Europe could implement the Euro in time and the US cannot get a basic health exchange to work?

Here is a possible scenario: the sites do not work (cannot handle the load, give inconsistent results). A massive wave of protests ensues, boosted by those who were against universal health coverage in the first place. Faced with popular revolt and with the evidence, the administration announces that the implementation of the universal mandate — the enforcement of the fines — is delayed by a year. In a year much can happen; opposition grows and the first exchanges are an economic disaster since the “young healthy adults” feel no pressure to enroll. The law fades into oblivion. Americans do not get universal health care for another generation. Show me it is not going to happen.

The software engineering lessons here are clear: hire competent companies; faced with a complicated system, implement the essential functions first, but stress-test them; deploy step by step, with the assurance that whatever is deployed works.

The exact reverse strategy was applied. As a result, we face the prospect of a software disaster that will dwarf Y2K and other famous mishaps; a disaster that software engineering textbooks will feature for decades to come.

 

VN:F [1.9.10_1130]
Rating: 9.6/10 (16 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 9 votes)

Informatics education in Europe: Just the facts

 

In 2005 a number of us started Informatics Europe [1], the association of university departments and industrial research labs in computer science in Europe. The association has now grown to 80 members across the entire continent; it organizes the annual European Computer Science Summit and has published a number of influential reports. The last one just came out: Informatics Education in Europe: Institutions, Degrees, Students, Positions, Salaries — Key Data 2008-2012 [2]. The principal author is Cristina Pereira, who collected and organized the relevant data over more than a year; I helped with the preparation of the final text.

At the beginning of Informatics Europe we considered with particular attention  the model of the Computing Research Association [3], which played a crucial role in giving computer science (informatics) its due place in the US academic landscape. Several past and current officers of the CRA,  such as Willy Zwaenepoel, Ed Lazowska, Bob Constable, Andy Bernat, Jeannette Wing, Moshe Vardi and J Strother Moore gave keynotes at our early conferences and we of course asked them for the secrets of their organization’s success. One answer that struck us was the central role played by data collection. Just gathering the  facts, such as degrees and salaries, established for the first time a solid basis for serious discussions. We took this advice to heart and the report is the first result.

Gathering the information is particularly difficult for Europe given the national variations and the absence of centralized statistical data. Even the list of names under which institutions teach informatics in Europe fills a large table in the report. Cristina’s decision was, from the start, to favor quality over quantity: to focus on impeccable data for countries for which we could get it, rather than trying to cover the whole continent with data of variable credibility.

The result is the first systematic repository of basic information on informatics education in Europe: institutions, degrees offered and numbers awarded, student numbers, position titles and definitions, and (a section which will not please everyone) salaries for PhD students, postdocs and professors of various ranks.

The report is a first step; it only makes sense if we can regularly continue to update it and particularly extend it to other countries. But even in its current form (and with the obvious observation that my opinion is not neutral)  I see it as a major step forward for the discipline in Europe. We need an impeccable factual basis to convince the public at large and political decision-makers to give informatics the place it deserves in today’s educational systems.

References

[1] Informatics Europe site, see here.

[2] Cristina Pereira and Bertrand Meyer: Informatics Education in Europe: Institutions, Degrees, Students, Positions, Salaries — Key Data 2008-2012, Informatics Europe report, 30 September 2013, available here.

[3] Computing Research Association (US), see here.

.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Hungarian rotation

The 2013 Informatics Europe “Best Practices in Education” award was devoted, this year, to initiatives for teaching informatics in schools [1].  It was given out last week at the European Computer Science Summit in Amsterdam [2]. Two teams shared it, one from Poland and the other from Romania. Both teams showed excellent projects, but the second was beyond anything I expected.

The project comes from the Hungarian-speaking Sapientia University in Transylvania and is devoted to teaching algorithms visually and “at the same time enhancing intercultural communication” in the region. It illustrates the classical sorting algorithms through folk dances. Quicksort is Hungarian, selection sort is gypsy, merge sort is “Transylvanian-saxon”. I think my favorite is Shell sort [3]. For more, see their YouTube channel [4].

Now  if only they could act the loop invariants [5].

References

[1] 2013 Best Practices in Education award, see here.

[2] 2013 European Computer Science Summit, here.

[3] “Shell sort with Hungarian (Székely) folk dance”, see here.

[4] YouTube Algorythmics channel, here.

[5] Carlo Furia, Bertrand Meyer and Sergey Velder: Loop invariants: Analysis, Classification and Examples, in ACM Computing Surveys, Septembre 2014, to appear, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (9 votes cast)
VN:F [1.9.10_1130]
Rating: +6 (from 6 votes)

The secret of success

In the process of finishing a book right now, it occurred to me that writing is really a simple matter. There are only three issues to address:

  1. How to start.
  2. How to finish.
  3. How to take care of the stuff in-between.

In the early stages  (1) you face the anguish of the “empty paper, defended by its whiteness” (Mallarmé) and do not know where to begin. If you overcome it, you have all the grunt work to do (3), step after step. At the end (2), while deep down you feel you have done enough and should be permitted to  click “Publish”, you still have to fill the holes, which may be small in number but have remained holes for a reason: you have again and again delayed filling them because in reality you do not know what to write there.

All things considered, then, it is not such a big deal.

VN:F [1.9.10_1130]
Rating: 9.9/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

Reuse of another kind

This is a plug for a family member, but for a good cause: reuse and the environment. There are many reasons for promoting reuse in software; in other fields some of those reasons apply too, plus many others, economic and environmental. The core concern is to reduce waste of all kinds.

One of the most appalling sources of waste is the growing use of throw-away food containers. It is also avoidable.

A new company, bizeebox [1], has devised an ingenious scheme to let restaurants use containers that can be washed and reused many times. It is easy to use by both restaurants and customers, saves money, and avoids heaping tons of waste on landfills. Their slogan: “Take the Waste Out of Takeout“.

The founders of bizeebox have run successful pilot projects and are now starting for real with restaurants in the Ann Arbor area. They have launched a campaign on Indiegogo, with a $30,000 funding goal. Reuse deserves a chance; so do they.

References

[1] bizeebox page, at bizeebox.com.

[2] bizeebox Indiegogo campaign: here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

The laws of branching (part 2): Tichy and Joy

Recently I mentioned the first law of branching (see earlier article) to Walter Tichy, famed creator of RCS, the system that established modern configuration management. He replied with the following anecdote, which is worth reproducing in its entirety (in his own words):

I started work on RCS in 1980, because I needed an alternative for SCCS, for which the license cost would have been prohibitive. Also, I wanted to experiment with reverse deltas. With reverse deltas, checking out the latest version is fast, because it is stored intact. For older ones, RCS applied backward deltas. So the older revisions took longer to extract, but that was OK, because most accesses are to the newest revision anyway.

At first, I didn’t know how to handle branches in this scheme. Storing each branch tip in full seemed like a waste. So I simply left out the branches.

It didn’t take long an people were using RCS. Bill Joy, who was at Berkeley at the time and working on Berkeley Unix, got interested. He gave me several hints about unpleasant features of SCCS that I should correct. For instance, SCCS didn’t handle identification keywords properly under certain circumstances, the locking scheme was awkward, and the commands too. I figured out a way to solve these issue. Bill was actually my toughest critic! When I was done with all the modifications, Bill cam back and said that he was not going to use RCS unless I put in branches. So I figured out a way. In order to reconstruct a branch tip, you start with the latest version on the main trunk, apply backwards deltas up to the branch point, and then apply forward deltas out to the branch tip. I also implemented a numbering scheme for branches that is extensible.

When discussing the solution, Bill asked me whether this scheme meant that it would take longer to check in and out on branches. I had to admit that this was true. With the machines at that time (VAXen) efficiency was important. He thought about this for a moment and then said that that was actually great. It would discourage programmers from using branches! He felt they were a necessary evil.

VN:F [1.9.10_1130]
Rating: 6.4/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: -1 (from 3 votes)

Another displaced business

Front-page notice in yesterday’s Tages Anzeiger (one of the principal Swiss newspapers):

Dear Readers:

From today the employment-ads section will no longer appear as a separate supplement, but directly as a section of the Tuesdays and Thursday paper. The reason is the ever smaller number of position offerings.

It seems clear that what has decreased is not the number of positions offered — the job market in Switzerland is healthy — but the number of those offered through newspapers. End of an era.

VN:F [1.9.10_1130]
Rating: 9.5/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)

The laws of branching (part 1)

 

The first law of branching is: don’t. There is no other law.

The only sane way to develop software in a group, whether collocated or distributed, is to have a single branch (“trunk”) to which everyone commits changes, with constant running of the regression test suite to make sure that any breaking change is detected and corrected right away.

To allow branching, that is to say the emergence of separate lines of development with the expectation that they will be merged back later on, is to guarantee disaster. It is easy to diverge, but hard to converge; not only hard, but unpredictable. It can take days, weeks or more to reconcile changes and resolve conflicts, when the reason for the changes is no longer fresh in the developers’ memories, and the developers themselves may even no longer be there. Conflicts should be detected right away, and corrected immediately.

The EiffelStudio development never uses branches. A related development, EVE (Eiffel Verification Environment), maintained at ETH, includes all research tools, and is reconciled every Friday with the EiffelStudio trunk, so it is never allowed to diverge into a separate branch. Most other successful teams I know apply the first law of branching too. Solve conflicts before they have had the time to become conflicts.

VN:F [1.9.10_1130]
Rating: 5.4/10 (26 votes cast)
VN:F [1.9.10_1130]
Rating: -8 (from 18 votes)

Swiss railway prowess

Yesterday, wanting to find out the platform of an arriving  train (stations are good at displaying departures, but don’t seem to consider arrival information worth showing), I fired up the “SBB Business” mobile application of the Swiss railways which I purchased some time ago:

SBB schedule

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The results are impressive: one can, they tell us, ride from Paris to Zurich — about 500 kilometers or 300 miles — in 56 minutes by leaving at 17:02. Wow! No wonder the entry displays (on the right) the graphical symbol for the highest possible expected passenger load, in both first and second class. With such a speed I would flock to that train too!

Nonsense of course. The TGV is fast, at least in the French part (from Basel to Zurich they seem to make it as slow as possible to prove some point), but it still takes four hours and three minutes. I have no idea where the program got the information it displays.

Trying to tap the “earlier” or “later” buttons does not help much, since what you get (consistently repeated if you keep trying, and confirmed again one day later) is this screen:

SBB error message

 

 

 

 

 

 

 

 

 

 

 

 

 

 

No clue what “F1″ means.

Whatever software solutions the SBB uses, I am not impressed. Of course, they are welcome to ask for my suggestions.

VN:F [1.9.10_1130]
Rating: 9.7/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Empirical answers to fundamental software engineering questions

This is a slightly reworked version of an article in the CACM blog, which also served as the introduction to a panel which I moderated at ESEC/FSE 2013 last week; the panelists were Harald Gall, Mark Harman, Giancarlo Succi (position paper only) and Tony Wasserman.

For all the books on software engineering, and the articles, and the conferences, a remarkable number of fundamental questions, so fundamental indeed that just about every software project runs into them, remain open. At best we have folksy rules, some possibly true, others doubtful, and others — such as “adding people to a late software project delays it further” [1] — wrong to the point of absurdity. Researchers in software engineering should, as their duty to the community of practicing software practitioners, try to help provide credible answers to such essential everyday questions.

The purpose of this panel discussion is to assess what answers are already known through empirical software engineering, and to define what should be done to get more.

Empirical software engineering” applies the quantitative methods of the natural sciences to the study of software phenomena. One of its tasks is to subject new methods — whose authors sometimes make extravagant and unsupported claims — to objective scrutiny. But the benefits are more general: empirical software engineering helps us understand software construction better.

There are two kinds of target for empirical software studies: products and processes. Product studies assess actual software artifacts, as found in code repositories, bug databases and documentation, to infer general insights. Project studies assess how software projects proceed and how their participants work; as a consequence, they can share some properties with studies in other fields that involve human behavior, such as sociology and psychology. (It is a common attitude among computer scientists to express doubts: “Do you really want to bring us down to the standards of psychology and sociology?” Such arrogance is not justified. These sciences have obtained many results that are both useful and sound.)

Empirical software engineering has been on a roll for the past decade, thanks to the availability of large repositories, mostly from open-source projects, which hold information about long-running software projects and can be subjected to data mining techniques to identify important properties and trends. Such studies have already yielded considerable and often surprising insights about such fundamental matters as the typology of program faults (bugs), the effectiveness of tests and the value of certain programming language features.

Most of the uncontested successes, however, have been from the product variant of empirical software engineering. This situation is understandable: when analyzing a software repository, an empirical study is dealing with a tangible and well-defined artifact; if any of the results seems doubtful, it is possible and sometimes even easy for others to reproduce the study, a key condition of empirical science. With processes, the object of study is more elusive. If I follow a software project working with Scrum and another using a more traditional lifecycle, and find that one does better than the other, how do I know what other factors may have influenced the outcome? And even if I bring external factors under control how do I compare my results with those of another researcher following other teams in other companies? Worse, in a more realistic scenario I do not always have the luxury of tracking actual industry projects since few companies are enlightened enough to let researchers into their developments; how do I know that I can generalize to industry the conclusions of experiments made with student groups?

Such obstacles do not imply that sound results are impossible; studies involving human behavior in psychology and sociology face many of the same difficulties and yet do occasionally yield insights. But these obstacles explain why there are still few incontrovertible results on process aspects of software engineering. This situation is regrettable since it means that projects large and small embark on specific methods, tools and languages on the basis of hearsay, opinions and sometimes hype rather than solid knowledge.

No empirical study is going to give us all-encompassing results of the form “Agile methods yield better products” or “Object-oriented programming is better than functional programming”. We are entitled to expect, however, that they help practitioners assess some of the issues that await every project. They should also provide a perspective on the conventional wisdom, justified or not, that pervades the culture of software engineering. Here are some examples of general statements and questions on which many people in the field have opinions, often reinforced by the literature, but crying for empirical backing:

  • The effect of requirements faults: the famous curve by Boehm is buttressed by old studies on special kinds of software (large mission-critical defense projects). What do we really lose by not finding an error early enough?
  • The cone of uncertainty: is that idea just folklore?
  • What are the successful techniques for shortening delivery time by adding manpower?
  • The maximum compressibility factor: is there a nominal project delivery time, and how much can a project decrease it by throwing in money and people?
  • Pair programming: when does it help, when does it hurt? If it has any benefits, are there in quality or in productivity (delivery time)?
  • In iterative approaches, what is the ideal time for a sprint under various circumstances?
  • How much requirements analysis should be done at the beginning of a project, and how much deferred to the rest of the cycle?
  • What predictors of size correlate best with observed development effort?
  • What predictors of quality correlate best with observed quality?
  • What is the maximum team size, if any, beyond which a team should be split?
  • Is it better to use built-in contracts or just to code assertions in tests?

When asking these and other similar questions relating to core aspects of practical software development, I sometimes hear “Oh, but we know the answer conclusively, thanks to so-and-so’s study“. This may be true in some cases, but in many others one finds, in looking closer, that the study is just one particular experiment, fraught with the same limitations as any other.

The principal aim of the present panel is to find out, through the contributions of the panelists which questions have useful and credible empirical answers already available, whether or not widely known. The answers must indeed be:

  • Empirical: obtained through objective quantitative studies of projects.
  • Useful: providing answers to questions of interest to practitioners.
  • Credible: while not necessarily absolute (a goal difficult to reach in any matter involving human behavior), they must be backed by enough solid evidence and confirmation to be taken as a serious input to software project decisions.

An auxiliary outcome of the panel should be to identify fundamental questions on which credible, useful empirical answers do not exist but seem possible, providing fuel for researchers in the field.

To mature, software engineering must shed the folkloric advice and anecdotal evidence that still pervade the field and replace them with convincing results, established with all the limitations but also the benefits of quantitative, scientific empirical methods.

Note

[1] From Brooks’s Mythical Man-Month.

VN:F [1.9.10_1130]
Rating: 9.6/10 (9 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

Concurrency video

Our Concurrency Made Easy project, the result of an ERC Advanced Investigator Grant, is trying to solve the problem of making concurrent programming simple, reliable and effective. It has spurred related efforts, in particular the Roboscoop project applying concurrency to robotics software.

Sebastian Nanz and other members of the CME project at ETH have just produced a video that describes the aims of the project and presents some of the current achievements. The video is available on the CME project page [1] (also directly on YouTube [2]).

References

[1] Concurrency Made Easy project, here.

[2] YouTube CME video, here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (16 votes cast)
VN:F [1.9.10_1130]
Rating: +14 (from 14 votes)

Smaller, better textbook

A new version of my Touch of Class [1] programming textbook is available. It is not quite a new edition but more than just a new printing. All the typos that had been reported as of a few months ago have been corrected.

The format is also significantly smaller. This change is more than a trifle. When а  reader told me for the first time “really nice book, pity it is so heavy!”, I commiserated and did not pay much attention. After twenty people said that, and many more after them, including professors looking for textbooks for their introductory programming classes, I realized it was a big deal. The reason the book was big and heavy was not so much the size of the contents (876 is not small, but not outrageous for a textbook introducing all the fundamental concepts of programming). Rather, it is a technical matter: the text is printed in color, and Springer really wanted to do a good job, choosing thick enough paper that the colors would not seep though. In addition I chose a big font to make the text readable, resulting in a large format. In fact I overdid it; the font is bigger than necessary, even for readers who do not all have the good near-reading sight of a typical 19-year-old student.

We kept the color and the good paper,  but reduced the font size and hence the length and width. The result is still very readable, and much more portable. I am happy to make my contribution to reducing energy consumption (at ETH alone, think of the burden on Switzerland’s global energy bid of 200+ students carrying the book — as I hope they do — every morning on the buses, trains and trams crisscrossing the city!).

Springer also provides electronic access.

Touch of Class is the textbook developed on the basis of the Introduction to Programming course [2], which I have taught at ETH Zurich for the last ten years. It provides a broad overview of programming, starting at an elementary level (zeros and ones!) and encompassing topics not usually covered in introductory courses, even a short introduction to lambda calculus. You can get an idea of the style of coverage of such topics by looking up the sample chapter on recursion at touch.ethz.ch. Examples of other topics covered include a general introduction to software engineering and software tools. The presentation uses full-fledged object-oriented concepts (including inheritance, polymorphism, genericity) right from the start, and Design by Contract throughout. Based on the “inverted curriculum” ideas on which I published a number of articles, it presents students with a library of reusable components, the Traffic library for graphical modeling of traffic in a city, and builds on this infrastructure both to teach students abstraction (reusing code through interfaces including contracts) and to provide them models of high-quality code for imitation and inspiration.

For more details see the article on this blog that introduced the book when it was first published [3].

References

[1] Bertrand Meyer, Touch of Class: An Introduction to Programming Well Using Objects and Contracts, Springer Verlag, 2nd printing, 2013. The Amazon page is here. See the book’s own page (with slides and other teaching materials, sample chapter etc.) here. (Also available in Russian, see here.)

[2] Einführung in die Programmierung (Introduction to Programming) course, English course page here.

[3] Touch of Class published, article on this blog, 11 August 2009, see [1] here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +6 (from 6 votes)

Barenboim = Rubinstein?

 

I have always admired Daniel Barenboim, both as a pianist and as a conductor — and not just because years ago, from pictures on disk covers, we looked strikingly alike, see e.g. [1] which could almost be me at that time (Then I went to see him in concert and realized that he was a good 15 centimeters shorter; the pictures were only head-and-shoulders. Since that time the difference of our physical appearances has considerably increased, not compensated, regrettably, by any decrease of the difference of our musical abilities.)

Nowadays you can find lots of good music, an unbelievable quantity in fact, on YouTube. Like this excellent performance of Beethoven’s Emperor Concerto [2] by Barenboim with a Danish orchestra.

If you go to that page and expand the “about” tab an interesting story unfolds. If I parse it right (I have no direct information) it is a record of a discussion between the person who uploaded the video and the YouTube copyright police. It seems YouTube initially rejected the upload on the basis that it violated no fewer than three different copyrights, all apparently for recordings of the concerto: one by the Berlin Philharmonic (pianist not named), one by Arthur Rubinstein (orchestra not named), and one by Ivan Szekely (orchestra not named). The uploader contested these copyright claims, pointing out that the performers are different in all four cases. It took a little more than a month before YouTube accepted the explanation and released the video on 22 April 2012.

Since the page clearly listed  the performers’ names and contained a full video, the initial copyright complaints must have been made on the basis of the audio track alone. Further, the detection must have been automatic, as it is hard to imagine that either YouTube or the copyright owners employ a full staff of music experts to listen all day to recordings on the web and,  once in a while, write an email of the form “I just heard something at http://musicsite.somewhere that sounds suspiciously close to bars 37-52  of what I remember from the `Adagio un poco mosso’ in the 1964 Rubinstein performance, or possibly his 1975 performance, of Beethoven’s Emperor“. (The conductor in Rubinstein’s 1975 recording, by the way, is… Daniel Barenboim.) Almost certainly, the check is done by a program which scours the Web for clones.

It seems, then, that the algorithm used by YouTube or whoever runs these checks can, reasonably enough, detect that a recording is from a certain piece of music, but — now the real scandal — cannot distinguish between Rubinstein and Barenboim.

If this understanding is correct one would like to think that some more research can solve the problem. That would assume that humans can always distinguish performers. On French radio there used to be a famous program, the “Tribune of Record Critics“, where for several hours on every Sunday the moderator would play excerpts of a given piece in various interpretations, and the highly opinionated star experts on the panel would praise some to the sky and excoriate others (“This Karajan guy — does he even know what music is about?“).  One day, probably an April 1st,  they broadcast a parody of themselves, pretending to fight over renditions of Beethoven’s The Ruins of Athens overture while all were actually the same recording being played again and again. After that I always wondered whether in normal instances of the program the technicians were not tempted once in a while to switch recordings to fool the experts. (The version of the program that runs today, which is much less fun, relies on blind tasting, if I may call it that way.) Presumably no professional listener would ever confuse the playing of Barenboim (the pianist) with that of Rubinstein. Presumably… and yet reading about the very recent Joyce Batto scandal [3], in which a clever fraudster  tricked the whole profession  for a decade about more than a hundred recordings, is disturbing.

If my understanding of the situation regarding the Barenboim video is correct, then it remarkable that any classical music recordings can appear at all on YouTube without triggering constant claims of copyright infringement; specifically, any multiple recordings of the same piece. In classical music, interpretation is crucial, and one never tires of comparing performances of the same piece by different artists, with differences that can be subtle at times and striking at others. Otherwise, why would we go hear Mahler’s 9th or see Cosi Fan Tutte after having been there, done that so many times? And now we can perform even more comparisons without leaving home, just by browsing YouTube. Try for example Schumann’s Papillons by Arrau, Kempf, Argerich and — my absolute favorite for many years — Richter. Perhaps a reader with expertise on the topic can tell us about the current state of plagiarism detection for music: how finely can it detect genuine differences of interpretation, without being fooled by simple tricks as were used in the Batto case?

Still. To confuse Barenboim with Rubinstein!

References and notes

[1] A photograph of the young Barenboim: see here.

[2] Video recording of performance of Beethoven’s 5th piano concert by Daniel Barenboim and Det kongelige kapel conducted by Michael Schønvandt on the occasion of the Sonning Prize award, 2009, uploaded to YouTube by “mugge62″ and available here.

[3] Wikipedia entry on Joyce Hatto and the Barrington-Coupe fraud, here.

VN:F [1.9.10_1130]
Rating: 9.9/10 (7 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 4 votes)

The invariants of key algorithms (new paper)

 

I have mentioned this paper before but as a draft. It has now been accepted by ACM’s Computing Surveys and is scheduled to appear in September 2014; the current text, revised from the previous version, is available [1].

Here is the abstract:

Software verification has emerged as a key concern for ensuring the continued progress of information technology. Full verification generally requires, as a crucial step, equipping each loop with a “loop invariant”. Beyond their role in verification, loop invariants help program understanding by providing fundamental insights into the nature of algorithms. In practice, finding sound and useful invariants remains a challenge. Fortunately, many invariants seem intuitively to exhibit a common flavor. Understanding these fundamental invariant patterns could therefore provide help for understanding and verifying a large variety of programs.

We performed a systematic identification, validation, and classification of loop invariants over a range of fundamental algorithms from diverse areas of computer science. This article analyzes the patterns, as uncovered in this study,governing how invariants are derived from postconditions;it proposes a taxonomy of invariants according to these patterns, and presents its application to the algorithms reviewed. The discussion also shows the need for high-level specifications based on “domain theory”. It describes how the invariants and the corresponding algorithms have been mechanically verified using an automated program prover; the proof source files are available. The contributions also include suggestions for invariant inference and for model-based specification.

Reference

[1] Carlo Furia, Bertrand Meyer and Sergey Velder: Loop invariants: Analysis, Classification and Examples, in ACM Computing Surveys, to appear in September 2014, preliminary text available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

Informatics: catch them early

 

Recycled[I occasionally post on the Communications of the ACM blog. It seems that there is little overlap between readers of that blog and the present one, so — much as I know, from software engineering, about the drawbacks of duplication — I will continue to repost articles here when relevant.]

Some call it computer science, others informatics, but they face the same question: when do we start teaching the subject? In many countries where high schools began to introduce it in the seventies, they actually retreated since then; sure, students are shown how to use word processors and spreadsheets, but that’s not the point.

Should we teach computer science in secondary (and primary) school? In a debate at SIGCSE a few years ago, Bruce Weide said in strong words that we should not: better give students the strong grounding in mathematics and especially logic that they will need to become good at programming and CS in general. I found the argument convincing: I teach first-semester introductory programming to 200 entering CS students every year, and since many have programming experience, of highly diverse nature but usually without much of a conceptual basis, I find myself unteaching a lot. In a simple world, high-school teachers would teach students to reason, and we would teach them to program. The world, however, is not simple. The arguments for introducing informatics earlier are piling up:

  • What about students who do not enter CS programs?
  • Many students will do some programming anyway. We might just as well teach them to do it properly, rather than let them develop bad practices and try to correct them at the university level.
  • Informatics is not just a technique but an original scientific discipline, with its own insights and paradigms (see [2] and, if I may include a self-citation, [3]). Its intellectual value is significant for all educated citizens, not just computer scientists.
  • Countries that want to be ahead of the race rather than just consumers of IT products need their population to understand the basic concepts, just as they want everyone, and not just future mathematicians, to master the basics of arithmetics, algebra and geometry.

These and other observations led Informatics Europe and ACM Europe, two years ago, to undertake the writing of a joint report, which has now appeared [1]. The report is concise and makes strong points, emphasizing in particular the need to distinguish education in informatics from a mere training in digital literacy (the mastery of basic IT tools, the Web etc.). The distinction is often lost on the general public and decision-makers (and we will surely have to emphasize it again and again).

The report proposes general principles for both kinds of programs, emphasizing in particular:

  • For digital literacy, the need to teach not just how-tos but also safe, ethical and effective use of IT resources and tools.
  • For informatics, the role of this discipline as a cross-specialty subject, like mathematics.

The last point is particularly important since we should make it clear that we are not just pushing (out of self-interest, as members of any discipline could) for schools to give our specialty a share, but that informatics is a key educational, scientific and economic resource for the citizens of any modern country.

The report is written from a European perspective, but the analysis and conclusions will, I think, be useful in any country.

It does not include any detailed curriculum recommendation, first because of the wide variety of educational contexts, but also because that next task is really work for another committee, which ACM Europe and Informatics Europe are in the process of setting up. The report also does not offer a magic solution to the key issue of bootstrapping the process — by finding teachers to make the courses possible, and courses to justify training the teachers — but points to successful experiences in various countries that show a way to break the deadlock.

The introduction of informatics as a full-fledged discipline in the K-12 curriculum is clearly where the winds of history are blowing. Just as the report was being finalized, the UK announced that it was making CS one of the choices of required scientific topics would become a topic in the secondary school exam on a par with traditional sciences. The French Academy of Sciences recently published its own report on the topic, and many other countries have similar recommendations in progress. The ACM/IE report is a major milestone which should provide a common basis for all these ongoing efforts.

References

[1] Informatics education: Europe Cannot Afford to Miss the Boat,  Report of the joint Informatics Europe & ACM Europe Working Group on Informatics Education,  April 2013,  available here.

[2] Jeannette Wing: Computational Thinking, in Communications of the ACM, vol. 49, no. 3, March 2006, pages 33-35, available here.

[3] Bertrand Meyer: Software Engineering in the Academy, in Computer (IEEE), vol. 34, no. 5, May 2001, pages 28-35, available here.

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)

Reading notes: strong specifications are well worth the effort

 

This report continues the series of ICSE 2013 article previews (see the posts of these last few days, other than the DOSE announcement), but is different from its predecessors since it talks about a paper from our group at ETH, so you should not expect any dangerously delusional,  disingenuously dubious or downright deceptive declaration or display of dispassionate, disinterested, disengaged describer’s detachment.

The paper [1] (mentioned on this blog some time ago) is entitled How good are software specifications? and will be presented on Wednesday by Nadia Polikarpova. The basic result: stronger specifications, which capture a more complete part of program functionality, cause only a modest increase in specification effort, but the benefits are huge; in particular, automatic testing finds twice as many faults (“bugs” as recently reviewed papers call them).

Strong specifications are specifications that go beyond simple contracts. A straightforward example is a specification of a push operation for stacks; in EiffelBase, the basic Eiffel data structure library, the contract’s postcondition will read

item =                                          /A/
count = old count + 1

where x is the element being pushed, item the top of the stack and count the number of elements. It is of course sound, since it states that the element just pushed is now the new top of the stack, and that there is one more element; but it is also  incomplete since it says nothing about the other elements remaining as they were; an implementation could satisfy the contract and mess up with these elements. Using “complete” or “strong” preconditions, we associate with the underlying domain a theory [2], or “model”, represented by a specification-only feature in the class, model, denoting a sequence of elements; then it suffices (with the convention that the top is the first element of the model sequence, and that “+” denotes concatenation of sequences) to use the postcondition

model = <x> + old model         /B/

which says all there is to say and implies the original postconditions /A/.

Clearly, the strong contracts, in the  /B/ style, are more expressive [3, 4], but they also require more specification effort. Are they worth the trouble?

The paper explores this question empirically, and the answer, at least according to the criteria used in the study, is yes.  The work takes advantage of AutoTest [5], an automatic testing framework which relies on the contracts already present in the software to serve as test oracles, and generates test cases automatically. AutoTest was applied to both to the classic EiffelBase, with classic partial contracts in the /A/ style, and to the more recent EiffelBase+ library, with strong contracts in the /B/ style. AutoTest is for Eiffel programs; to check for any language-specificity in the results the work also included testing a smaller set of classes from a C# library, DSA, for which a student developed a version (DSA+) equipped with strong model-based contracts. In that case the testing tool was Microsoft Research’s Pex [7]. The results are similar for both languages: citing from the paper, “the fault rates are comparable in the C# experiments, respectively 6 . 10-3 and 3 . 10-3 . The fault complexity is also qualitatively similar.

The verdict on the effect of strong specifications as captured by automated testing is clear: the same automatic testing tools applied to the versions with strong contracts yield twice as many real faults. The term “real fault” comes from excluding spurious cases, such as specification faults (wrong specification, right implementation), which are a phenomenon worth studying but should not count as a benefit of the strong specification approach. The paper contains a detailed analysis of the various kinds of faults and the corresponding empirically determined measures. This particular analysis is for the Eiffel code, since in the C#/Pex case “it was not possible to get an evaluation of the faults by the original developers“.

In our experience the strong specifications are not that much harder to write. The paper contains a precise measure: about five person-weeks to create EiffelBase+, yielding an “overall benefit/effort ratio of about four defects detected per person-day“. Such a benefit more than justifies the effort. More study of that effort is needed, however, because the “person” in the person-weeks was not just an ordinary programmer. True, Eiffel experience has shown that most programmers quickly get the notion of contract and start applying it; as the saying goes in the community, “if you can write an if-then-else, you can write a contract”. But we do not yet have significant evidence of whether that observation extends to model-based contracts.

Model-based contracts (I prefer to call them “theory-based” because “model” means so many other things, but I do not think I will win that particular battle) are, in my opinion, a required component of the march towards program verification. They are the right compromise between simple contracts, which have proved to be attractive to many practicing programmers but suffer from incompleteness, and full formal specification à la Z, which say everything but require too much machinery. They are not an all-or-nothing specification technique but a progressive one: programmers can start with simple contracts, then extend and refine them as desired to yield exactly the right amount of precision and completeness appropriate in any particular context. The article shows that the benefits are well worth the incremental effort.

According to the ICSE program the talk will be presented in the formal specification session, Wednesday, May 22, 13:30-15:30, Grand Ballroom C.

References

[1] Nadia Polikarpova, Carlo A. Furia, Yu Pei, Yi Wei and Bertrand Meyer: What Good Are Strong Specifications?, to appear in ICSE 2013 (Proceedings of 35th International Conference on Software Engineering), San Francisco, May 2013, draft available here.

[2] Bertrand Meyer: Domain Theory: the forgotten step in program verification, article on this blog, see here.

[3] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[4] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ‘ 10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[5] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, also available here.

[6] Bertrand Meyer, Ilinca Ciupa, Andreas Leitner, Arno Fiva, Yi Wei and Emmanuel Stapf: Programs that Test Themselves, in IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009, also available here.

[7] Nikolai Tillman and Peli de Halleux, Pex: White-Box Generation for .NET, in Tests And Proofs (TAP 2008), pp. 134-153.

 

VN:F [1.9.10_1130]
Rating: 10.0/10 (4 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 2 votes)

New course partners sought: a DOSE of software engineering education

 

Since 2007 we have conducted, as part of a course at ETH, the DOSE project, Distributed and Outsourced Software Engineering, developed by cooperating student teams from a dozen universities around the world. We are finalizing the plans for the next edition, October to December 2013, and will be happy to welcome a few more universities.

The project consists of building a significant software system collaboratively, using techniques of distributed software development. Each university contributes a number of “teams”, typically of two or three students each; then “groups”, each made up of three teams from different universities, produce a version of the project.

The project’s theme has varied from year to year, often involving games. We make sure that the development naturally divides into three subsystems or “clusters”, so that each group can quickly distribute the work among its teams. An example of division into clusters, for a game project, is: game logic; database and player management; user interface. The page that describes the setup in more detail [1] has links enabling you to see the results of some of the best systems developed by students in recent years.

The project is a challenge. Students are in different time zones, have various backgrounds (although there are minimum common requirements [1]), different mother tongues (English is the working language of the project). Distributed development is always hard, and is harder in the time-constrained context of a university course. (In industry, while we do not like that a project’s schedule slips, we can often survive if it does; in a university, when the semester ends, we have to give students a grade and they go away!) It is typical, after the initial elation of meeting new student colleagues from exotic places has subsided and the reality of interaction sets in, that some groups will after a month, just before the first or second deadline, start to panic — then take matters into their own hands and produce an impressive result. Students invariably tell us that they learn a lot through the course; it is a great opportunity to practice the principles of modern software engineering and to get prepared for the realities of today’s developments in industry, which are in general distributed.

For instructors interested in software engineering research, the project is also a great way to study issues of distributed development in  a controlled setting; the already long list of publications arising from studies performed in earlier iterations [3-9] suggests the wealth of available possibilities.

Although the 2013 project already has about as many participating universities as in previous years, we are always happy to consider new partners. In particular it would be great to include some from North America. Please read the requirements on participating universities given in [1]; managing such a complex process is a challenge in itself (as one can easily guess) and all teaching teams must share goals and commitment.

References

[1] General description of DOSE, available here.

[2] Bertrand Meyer: Offshore Development: The Unspoken Revolution in Software Engineering, in Computer (IEEE), January 2006, pages 124, 122-123, available here.

[3] Bertrand Meyer and Marco Piccioni: The Allure and Risks of a Deployable Software Engineering Project: Experiences with Both Local and Distributed Development, in Proceedings of IEEE Conference on Software Engineering & Training (CSEE&T), Charleston (South Carolina), 14-17 April 2008, available here.

[4] Martin Nordio, Roman Mitin, Bertrand Meyer, Carlo Ghezzi, Elisabetta Di Nitto and Giordano Tamburelli: The Role of Contracts in Distributed Development, in Proceedings of Software Engineering Advances For Offshore and Outsourced Development, Lecture Notes in Business Information Processing 35, Springer-Verlag, 2009, available here.

[5] Martin Nordio, Roman Mitin and Bertrand Meyer: Advanced Hands-on Training for Distributed and Outsourced Software Engineering, in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering – Volume 1, ACM. 2010 available here.

[6] Martin Nordio, Carlo Ghezzi, Bertrand Meyer, Elisabetta Di Nitto, Giordano Tamburrelli, Julian Tschannen, Nazareno Aguirre and Vidya Kulkarni: Teaching Software Engineering using Globally Distributed Projects: the DOSE course, in Collaborative Teaching of Globally Distributed Software Development – Community Building Workshop (CTGDSD — an ICSE workshop), ACM, 2011, available here.

[7] Martin Nordio, H.-Christian Estler, Bertrand Meyer, Julian Tschannen, Carlo Ghezzi, and Elisabetta Di Nitto: How do Distribution and Time Zones affect Software Development? A Case Study on Communication, in Proceedings of the 6th International Conference on Global Software Engineering (ICGSE), IEEE, pages 176–184, 2011, available here.

[8] H.-Christian Estler, Martin Nordio, Carlo A. Furia, and Bertrand Meyer: Distributed Collaborative Debugging, to appear in Proceedings of 7th International Conference on Global Software Engineering (ICGSE), 2013.

[9] H.-Christian Estler, Martin Nordio, Carlo A. Furia, and Bertrand Meyer: Unifying Configuration Management with Awareness Systems and Merge Conflict Detection, to appear in Proceedings of the 22nd Australasian Software Engineering Conference (ASWEC), 2013.

 

VN:F [1.9.10_1130]
Rating: 0.0/10 (0 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 0 votes)