## Free tutoring for children

“*We’re a group of cousins aged 8-14 who*” got “*the idea to help others, since we know we are not alone*”. They are providing mostly free tutoring to other kids. Details here.

Software engineering, programming methodology, languages, verification, general technology, publication culture, and more

“*We’re a group of cousins aged 8-14 who*” got “*the idea to help others, since we know we are not alone*”. They are providing mostly free tutoring to other kids. Details here.

_{About this article: it originated as a series of posts on the Communications of the ACM blog. I normally repost such articles here. (Even though copy-paste is usually not good, there are three reasons for this duplication: the readership seems to be largely disjoint; I can use better formatting, since their blog software is more restrictive than WordPress; and it is good to have a single repository for all my articles, including both those who originated on CACM and those who did not.) The series took for the form of nine articles, where each of the first few ended with a quiz, to which the next one, published a couple of days later, provided an answer. Since all these answers are now available it would make no sense to use the same scheme, so I am instead publishing the whole thing as a single article with nine sections, slightly adapted from the original.}

_{I was too lazy so far to collect all the references into a single list, so numbers such as [1] refer to the list at the end of the current section.}

A colleague recently asked me to present a short overview of axiomatic semantics as a guest lecture in one of his courses. I have been teaching courses on software verification for a long time (see e.g. here), so I have plenty of material; but instead of just reusing it, I decided to spend a bit of time on explaining why it is good to have a systematic approach to software verification. Here is the resulting tutorial.

Say “software verification” to software professionals, or computer science students outside of a few elite departments, and most of them will think “testing”. In a job interview, for example, show a loop-based algorithm to a programmer and ask “how would you verify it?”: most will start talking about devising clever test cases.

Far from me to berate testing [1]; in fact, I have always thought that the inevitable Dijkstra quote about testing — that it can only show the presence of errors, not their absence [2] — which everyone seems to take as an indictment and dismissal of testing (and which its author probably intended that way) is actually a fantastic advertisement *for *testing: a way to find bugs? Yes! Great! Where do I get it? But that is not the same as verifying the software, which means attempting to ascertain that it has *no *bugs.

Until listeners realize that verification cannot just mean testing, the best course material on axiomatic semantics or other proof techniques will not attract any interest. In fact, there is somewhere a video of a talk by the great testing and public-speaking guru James Whittaker where he starts by telling his audience not to worry, this won’t be a standard boring lecture, he will not start talking about loop invariants [3]! (Loop invariants are coming in this article, in fact they are one of its central concepts, but in later sections only, so don’t bring the sleeping bags yet.) I decided to start my lecture by giving an example of what happens when you do *not *use proper verification. More than one example, in fact, as you will see.

A warning about this article: there is nothing new here. I am using an example from my 1990 book *Introduction to the Theory of Programming Languages *(exercise 9.12). Going even further back, a 1983 “Programming Pearls” *Communications of the ACM* article by Jon Bentley [4] addresses the same example with the same basic ideas. Yet almost forty years later these ideas are still not widely known among practitioners. So consider these articles as yet another tutorial on fundamental software engineering stuff.

The tutorial is a quiz. We start with a program text:

**from**

i := 1 ; j := n — **Result **initialized to 0.

**until **i = j **loop**

m := (i + j) // 2 — Integer division

**if **t [m] ≤ x **then **i := m **else **j := m **end**

**end**

**if **x = t [i] **then **Result := i **end**

All variables are of integer type. t is an up-sorted array of integers, indexed from 1 to n . We do not let any notation get between friends. A loop **from **p **until **e **loop **q **end **executes p then, repeatedly: stops if e (the exit condition) is true, otherwise executes q. (Like {p ; **while not **e **do** {q}} in some other notations.) “:=” is assignment, “=” equality testing. “//” is integer division, e.g. 6 //3 = 7 //3 = 2. **Result **is the name of a special variable whose final value will be returned by this computation (as part of a function, but we only look at the body). **Result **is automatically initialized to zero like all integer variables, so if execution does not assign anything to **Result **the function will return zero.

First question: what is this program trying to do?

OK, this is not the real quiz. I assume you know the answer: it is an attempt at “binary search”, which finds an element in the array, or determines its absence, in a sequence of about log_{2} (n) steps, rather than n if we were use sequential search. (Remember we assume the array is sorted.) **Result** should give us a position where x appears in the array, if it does, and otherwise be zero.

Now for the real quiz: does this program meet this goal?

The answer should be either yes or no. (If no, I am not asking for a correct version, at least not yet, and in any case you can find some in the literature.) The situation is very non-symmetric, we might say Popperian:

- To justify a no answer it suffices of a
*single*example, a particular array t and a particular value x, for which the program fails to set**Result**as it should. - To justify a yes answer we need to provide a credible argument that for
*every*t and x the program sets**Result**as it should.

[1] The TAP conference series (Tests And Proofs), which Yuri Gurevich and I started, explores the complementarity between the two approaches.

[2] Dijkstra first published his observation in 1969. He did not need consider the case of infinite input sets: even for a trivial finite program that multiplies two 32-bit integers, the number of cases to be examined, 2^{64}, is beyond human reach. More so today with 64-bit integers. Looking at this from a 2020 perspective, we may note that exhaustive testing of a finite set of cases, which Dijkstra dismissed as impossible in practice, is in fact exactly what the respected *model checking* verification technique does; not on the original program, but on a simplified — *abstracted —* version precisely designed to keep the number of cases tractable. Dijkstra’s argument remains valid, of course, for the original program if non-trivial. And model-checking does not get us out of the woods: while we are safe if its “testing” finds no bug, if it does find one we have to ensure that the bug is a property of the original program rather than an artifact of the abstraction process.

[3] It is somewhere on YouTube, although I cannot find it right now.

[4] Jon Bentley:* Programming Pearls: Writing Correct Programs*, in *Communications of the ACM*, vol. 26, no. 12, pp. 1040-1045, December 1983, available for example here.

Was program #1 correct? If so it should yield the correct answer. (An answer is correct if either **Result **is the index in t of an element equal to x, or **Result **= 0 and x does not appear in t.)

This program is *not* correct. To prove that it is not correct it suffices of a single example (test case) for which the program does not “*yield the correct answer*”. Assume x = 1 and the array t has two elements both equal to zero (n = 2, remember that arrays are indexed from 1):

t = [0 0]

The successive values of the variables and expressions are:

m i j i + j + 1

After initialization: 1 2 3

i ≠ j, so enter loop: 1 1 2 6 — First branch of “if” since t [1] ≤ x

— so i gets assigned the value of m

But then neither of the values of i and j has changed, so the loop will repeat its body identically (taking the first branch) forever. It is not even that the program yields an incorrect answer: it does not yield an answer at all!

Note (in reference to the famous Dijkstra quote mentioned in the first article), that while it is common to pit tests against proofs, *a test can actually be a proof*: a test that fails is a proof that the program is incorrect. As valid as the most complex mathematical proof. It may not be the kind of proof we like most (our customers tend to prefer a guarantee that the program is correct), but it is a proof all right.

We are now ready for the second attempt:

— Program attempt #2.

**from**

i := 1 ; j := n

**until **i = j **or** **Result** > 0** loop**

m := (i + j) // 2 — Integer division

**if **t [m] ≤ x **then**

i := m + 1

**elseif **t [m] = x **then**

**Result **:= m

**else**** ** — In this case t [m] > x

j := m – 1

**end**

**end**

Unlike the previous one this version always changes i or j, so we may hope it does not loop forever. It has a nice symmetry between i and j.

Same question as before: does this program meet its goal?

The question about program #2, as about program #1: was: it right?

Again no. A trivial example disproves it: n = 1, the array t contains a single element t [1] = 0, x = 0. Then the initialization sets both i and j to 1, i = j holds on entry to the loop which stops immediately, but **Result **is zero whereas it should be 1 (the place where x appears).

Here now is attempt #3, let us see it if fares better:

— Program attempt #3.

**from**

i := 1 ; j := n

**until **i = j **loop**

m := (i + j + 1) // 2

**if **t [m] ≤ x **then**

i := m + 1

**else**

j := m

**end**

**end**

**if **1 ≤ i **and **i ≤ n **then Result **:= i **end
** — If not,

What about this one?

The first two program attempts were wrong. What about the third?

I know, you have every right to be upset at me, but the answer is no once more.

Consider a two-element array t = [0 0] (so n = 2, remember that our arrays are indexed from 1 by convention) and a search value x = 1. The successive values of the variables and expressions are:

m i j i + j + 1

After initialization: 1 2 4

i ≠ j, so enter loop: 2 3 2 6 — First branch of “if” since t [2] < x

i ≠ j, enter loop again: 3 ⚠ — Out-of-bounds memory access!

— (trying to access non-existent t [3])

Oops!

Note that we could hope to get rid of the array overflow by initializing i to 0 rather than 1. This variant (version #3′) is left as a bonus question to the patient reader. (*Hint*: it is also not correct. Find a counter-example.)

OK, this has to end at some point. What about the following version (#4): is it right?

— Program attempt #4.

**from**

i := 0 ; j := n + 1

**until **i = j **loop**

m := (i + j) // 2

**if **t [m] ≤ x **then**

i := m + 1

**else**

j := m

**end**

**end**

**if **1 ≤ i **and **i ≤ n **then Result **:= i **end**

Yes, I know, this is dragging on. But that’s part of the idea: witnessing how hard it is to get a program right if you just judging by the seat of your pants. Maybe we can get it right this time?

Are we there yet? Is program attempt #4 finally correct?

Sorry to disappoint, but no. Consider a two-element array t = [0 0], so n = 2, and a search value x = 1 (yes, same counter-example as last time, although here we could also use x = 0). The successive values of the variables and expressions are:

m i j i + j

After initialization: 0 3 3

i ≠ j, so enter loop: 1 2 3 5 — First branch of “**if**”

i ≠ j, enter loop again: 2 3 3 6 — First branch again

i = j, exit loop

The condition of the final “**if**” is true, so **Result **gets the value 3. This is quite wrong, since there is no element at position 3, and in any case x does not appear in t.

But we are so close! Something *like *this *should *work, should it not?

So patience, patience, let us tweak it just one trifle more, OK?

— Program attempt #5.

**from**

i := 0 ; j := n

**until **i ≥ j **or Result** > 0 **loop**

m := (i + j) // 2

**if **t [m] < x **then**

i := m + 1

**elseif ** t [m] > x** then**

j := m

**else**

**Result **:= m

**end**

**end**

Does it work now?

The question about program #5 was the same as before: is it right, is it wrong?

Well, I know you are growing more upset at me with each section, but the answer is still that this program is wrong. But the way it is wrong is somewhat specific; and it applies, in fact, to all previous variants as well.

This particular wrongness (fancy word for “bug”) has a history. As I pointed out in the first article, there is a long tradition of using binary search to illustrate software correctness issues. A number of versions were published and proved correct, including one in the justly admired *Programming Pearls* series by Jon Bentley. Then in 2006 Joshua Bloch, then at Google, published a now legendary blog article [2] which showed that all these versions suffered from a major flaw: to obtain m, the approximate mid-point between i and j, they compute

(i + j) // 2

which, working on computer integers rather than mathematical integers, might overflow! This in a situation in which both i and j, and hence m as well, are well within the range of the computer’s representable integers, 2^{-n} to 2^{n} (give or take 1) where n is typically 31 or, these days, 63, so that there is no conceptual justification for the overflow.

In the specification that I have used for this article, i starts at 1, so the problem will only arise for an array that occupies half of the memory or more, which is a rather extreme case (but still should be handled properly). In the general case, it is often useful to use arrays with arbitrary bounds (as in Eiffel), so we can have even a small array, with high indices, for which the computation will produce an overflow and bad results.

The Bloch gotcha is a stark reminder that in considering the correctness of programs we must include all relevant aspects and consider programs as they are executed on a real computer, not as we wish they were executed in an ideal model world.

(Note that Jon Bentley alluded to this requirement in his original article: while he did not explicitly mention integer overflow, he felt it necessary to complement his proof by the comment that that “*As laborious as our proof of binary search was, it is still unfinished by some standards. How would you prove that the program is free of runtime errors (such as division by zero, word overflow, or array indices out of bounds)?*” Prescient words!)

It is easy to correct the potential arithmetic overflow bug: instead of (i + j) // 2, Bloch suggested we compute the average as

i + (j – i) // 2

which is the same from a mathematician’s viewpoint, and indeed will compute the same value if both variants compute one, but will not overflow if both i and j are within range.

So we are ready for version 6, which is the same as version 5 save for that single change:

— Program attempt #6.

**from**

i := 0 ; j := n

**until **i ≥ j **or Result** > 0 **loop**

m :=** i + ****(****j – i****) // 2**

**if **t [m] < x **then**

i := m + 1

**elseif ** t [m] > x** then**

j := m

**else**

**Result **:= m

**end**

**end**

Now is probably the right time to recall the words by which Donald Knuth introduces binary search in the original 1973 tome on *Sorting and Searching* of his seminal book series* The Art of Computer Programming*:

Although the basic idea of binary search is comparatively straightforward, the details can be somewhat tricky, and many good programmers have done it wrong the first few times they tried.

Do you need more convincing? Be careful what you answer, I have more variants up my sleeve and can come up with many more almost-right-but-actually-wrong program attempts if you nudge me. But OK, even the best things have an end. This is not the last section yet, but that was the last program attempt. To the naturally following next question in this running quiz, “*is version 6 right or wrong*”, I can provide the answer: it is, to the best of my knowledge, a correct program. Yes! [3].

But the quiz continues. Since answers to the previous questions were all that the programs ** were not** correct, it sufficed in each case to find one case for which the program did not behave as expected. Our next question is of a different nature: can you find an argument why version #6

[1] (In particular) Jon Bentley: *Programming Pearls — Writing Correct Programs*, in *Communications of the ACM*, vol. 26, no. 12, December 1983, pages 1040-1045, available here.

[2] Joshua Bloch:* Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken*, blog post, on the Google AI Blog, 2 June 2006, available here.

[3] A caveat: the program is correct barring any typos or copy-paste errors — I am starting from rigorously verified programs (see the next posts), but the blogging system’s UI and text processing facilities are not the best possible for entering precise technical text such as code. However carefully I check, I cannot rule out a clerical mistake, which of course would be corrected as soon as it is identified.

Preceding sections presented candidate binary search algorithms and asked whether they are correct. “Correct” means something quite precise: that for an array t and a value x, the final value of the variable **Result **is a valid index of t (that is to say, is between 1 and n, the size of t) if and only if x appears at that index in t.

The last section boldly stated that program attempt #6 was correct. The question was: why?

In the case of the preceding versions, which were incorrect, you could prove that property, and I do mean prove, simply by exhibiting a single counter-example: a single t and x for which the program does not correctly set **Result**. Now that I asserting the program to be correct, one example, or a million examples, do not suffice. In fact they are almost irrelevant. Test as much as you like and get correct results every time, you cannot get rid of the gnawing fear that if you had just tested one more time after the millionth test you would have produced a failure. Since the set of possible tests is infinite there is no solution in sight [1].

We need a proof.

I am going to explain that proof in the next section, but before that I would like to give you an opportunity to look at the proof by yourself. I wrote in one of the earlier articles that most of what I have to say was already present in Jon Bentley’s 1983 *Programming Pearls* contribution [2], but a dramatic change did occur in the four decades since: the appearance of automated proof system that can handle significant, realistic programs. One such system, AutoProof, was developed at the Chair of Software engineering at ETH Zurich [3] (key project members were Carlo Furia, Martin Nordio, Nadia Polikarpova and Julian Tschannen, with initial contributions by Bernd Schoeller) on the basis of the Boogie proof technology from Microsoft Research).

AutoProof is available for online use, and it turns out that one of the basic tutorial examples is binary search. You can go to the corresponding page and run the proof.

I am going to let you try this out (and, if you are curious, other online AutoProof examples as well) without too many explanations; those will come in the next section. Let me simply name the basic proof technique: **loop invariant**. A loop invariant is a property INV associated with a loop, such that:

**A**. After the loop’s initialization, INV will hold.**B**.*One*execution of the loop’s body, if started with INV satisfied (and the loop’s exit condition*not*satisfied, otherwise we wouldn’t be executing the body!), satisfies INV again when it terminates.

This idea is of course the same as that of a proof by induction in mathematics: the initialization corresponds to the base step (proving that P (0) holds) and the body property to the induction step (proving that from P (n) follows P (n + 1). With a traditional induction proof we deduce that the property (P (n)) holds for all integers. For the loop, we deduce that when the loop finishes its execution:

- The invariant still holds, since executing the loop means executing the initialization once then the loop body zero or more times.
- And of course the exit condition also holds, since otherwise we would still be looping.

That is how we prove the correctness of a loop: the conjunction of the invariant and the exit condition must yield the property that we seek (in the example, the property, stated above of **Result **relative to t and x).

We also need to prove that the loop does terminate. This part involves another concept, the loop’s *variant*, which I will explain in the next section.

For the moment I will not say anything more and let you look at the AutoProof example page (again, you will find it here), run the verification, and read the invariant and other formal elements in the code.

To “run the verification” just click the **Verify **button on the page. Let me emphasize (and emphasize again and again and again) that clicking **Verify** will not run the code. There is no execution engine in AutoProof, and the verification does not use any test cases. It processes the text of the program as it appears on the page and below. It applies mathematical techniques to perform the proof; the core property to be proved is that the proposed loop invariant is indeed invariant (i.e. satisfies properties** A** and **B** above).

The program being proved on the AutoProof example page is version #6 from the last section, with different variable names. So far for brevity I have used short names such as i, j and m but the program on the AutoProof site applies good naming practices with variables called low, up, middle and the like. So here is that version again with the new variable names:

— Program attempt #7 (identical to #6 with different variable names) .

**from**

low := 0 ; up := n

**until **low ≥ up **or Result** > 0 **loop**

middle := low + ((up – low) // 2)

**if **a [middle] < value **then**** **— The array is now called a rather than t

low := middle + 1

**elseif **** a** [middle] > value** then**

up := middle

**else**

**Result **:= middle

**end**

**end**

This is exactly the algorithm text on the AutoProof page, the one that you are invited to let AutoProof verify for you. I wrote “algorithm text” rather than “program text” because the actual program text (in Eiffel) includes **variant** and invariant clauses which do not affect the program’s execution but make the proof possible.

Whether or not these concepts (invariant, variant, program proof) are completely new to you, do try the prover and take a look at the proof-supporting clauses. In the next article I will remove any remaining mystery.

[1] Technically the set of possible [array, value] pairs is finite, but of a size defying human abilities. As I pointed out in the first section, the “model checking” and “abstract interpretation” verification techniques actually attempt to perform an exhaustive test anyway, after drastically reducing the size of the search space. That will be for some other article.

[2] Jon Bentley:* Programming Pearls: Writing Correct Programs*, in *Communications of the ACM*, vol. 26, no. 12, pp. 1040-1045, December 1983, available for example here.

[3] The AutoProof page contains documentations and numerous article references.

The previous section invited you to run the verification on the AutoProof tutorial page dedicated to the example. AutoProof is an automated proof system for programs. This is just a matter of clicking “Verify”, but more importantly, you should read the annotations added to the program text, particularly the loop invariant, which make the verification possible. (To avoid any confusion let me emphasize once more that clicking “Verify” does not run the program, and that no test cases are used; the effect is to run the verifier, which attempts to prove the correctness of the program by working solely on the program text.)

Here is the program text again, reverting for brevity to the shorter identifiers (the version on the AutoProof page has more expressive ones):

**from**

i := 0 ; j := n

**until **i ≥ j **or Result** > 0 **loop**

m := i + (j – i) // 2

**if **t [m] < x **then**

i := m + 1

**elseif ** t [m] > x** then**

j := m

**else**

**Result** := m

**end**

**end**

Let us now see what makes the proof possible. The key property is the loop invariant, which reads

A: 1 ≤ i ≤ j ≤ n + 1

B: 0 ≤ **Result **≤ n

C: ∀ k: 1 .. i –1 | t [k] < x

D: ∀ k: j .. n | t [k] > x

E: (**Result **> 0) ⇒ (t [**Result**] = x)

The notation is slightly different on the Web page to adapt to the Eiffel language as it existed at the time it was produced; in today’s Eiffel you can write the invariant almost as shown above. Long live Unicode, allowing us to use symbols such as ∀ (obtained not by typing them but by using smart completion, e.g. you start typing “forall” and you can select the ∀ symbol that pops up), ⇒ for “**implies**” and many others

Remember that the invariant has to be established by the loop’s initialization and preserved by every iteration. The role of each of its clauses is as follows:

- A: keep the indices in range.
- B: keep the variable
**Result**, whose final value will be returned by the function, in range. - C and D: eliminate index intervals in which we have determined that the sought value, x, does not appear. Before i, array values are smaller; starting at j, they are greater. So these two intervals, 1..i and j..n, cannot contain the sought value. The overall idea of the algorithm (and most other search algorithms) is to extend one of these two intervals, so as to narrow down the remaining part of 1..n where x may appear.
- E: express that as soon as we find a positive (non-zero)
**Result**, its value is an index in the array (see B) where x does appear.

Why is this invariant useful? The answer is that on exit it gives us what we want from the algorithm. The exit condition, recalled above, is

i ≥ j **or Result** > 0

Combined with the invariant, it tells us that on exit one of the following will hold:

**Result**> 0, but then because of E we know that x appears at position**Result**.- i < j, but then A, C and D imply that x does not appear
*anywhere*in t. In that case it cannot be true that**Result**> 0, but then because of B**Result**must be zero.

What AutoProof proves, mechanically, is that under the function’s precondition (that the array is sorted):

- The initialization ensures the invariant.
- The loop body, assuming that the invariant is satisfied but the exit condition is not, ensures the loop invariant again after it executes.
- The combination of the invariant and the exit condition ensures, as just explained, the postcondition of the function (the property that
**Result**will either be positive and the index of an element equal to x, or zero with the guarantee that x appears nowhere in t).

Such a proof guarantees the correctness of the program *if it terminates*. We (and AutoProof) must prove separately that it *does *terminate. The technique is simple: find a “loop ** variant**”, an integer quantity v which remains non-negative throughout the loop (in other words, the loop invariant includes or implies v ≥ 0) and decreases on each iteration, so that the loop cannot continue executing forever. An obvious variant here is j – i + 1 (where the + 1 is needed because j – i may go down to -1 on the last iteration if x does not appear in the array). It reflects the informal idea of the algorithm: repeatedly decrease an interval i .. j – 1 (initially, 1 .. n) guaranteed to be such that x appears in t if and only if it appears at an index in that interval. At the end, either we already found x or the interval is empty, implying that x does not appear at all.

A great reference on variants and the techniques for proving program termination is a *Communications of the ACM* article of 2011: [3].

The variant gives an upper bound on the number of iterations that remain at any time. In sequential search, j – i + 1 would be our best bet; but for binary search it is easy to show that log_{2 }(j – i + 1) is also a variant, extending the proof of correctness with a proof of performance (the key goal of binary search being to ensure a logarithmic rather than linear execution time).

This example is, I hope, enough to highlight the crucial role of loop invariants and loop variants in reasoning about loops. How did we get the invariant? It looks like I pulled it out of a hat. But in fact if we go the other way round (as advocated in classic books [1] [2]) and develop the invariant and the loop together the process unfolds itself naturally and there is nothing mysterious about the invariant.

Here I cannot resist quoting (thirty years on!) from my own book *Introduction to the Theory of Programming Languages* [4]. It has a chapter on axiomatic semantics (also known as *Hoare logic*, the basis for the ideas used in this discussion), which I just made available: see here [5]. Its exercise 9.12 is the starting point for this series of articles. Here is how the book explains how to design the program and the invariant [6]:

In the general case [

of search, binary or not] we aim for a loop body of the formm := ‘‘Some value in 1.. n such that i ≤ m < j’’;

ift [m] ≤ xtheni := m + 1

elsej := m

endIt is essential to get all the details right (and easy to get some wrong):

- The instruction must always decrease the variant j – i, by increasing i or decreasing j. If the the definition of m specified just m ≤ j rather than m < j, the second branch would not meet this goal.
- This does not transpose directly to i: requiring i < m < j would lead to an impossibility when j – i is equal to 1. So we accept i ≤ m but then we must take m + 1, not m, as the new value of i in the first branch.
- The conditional’s guards are tests on t [m], so m must always be in the interval 1 . . n. This follows from the clause 0 ≤ i ≤ j ≤ n + 1 which is part of the invariant.
- If this clause is satisfied, then m ≤ n and m > 0, so the conditional instruction indeed leaves this clause invariant.
- You are invited to check that both branches of the conditional also preserve the rest of the invariant.
- Any policy for choosing m is acceptable if it conforms to the above scheme. Two simple choices are i and j – 1; they lead to variants of the sequential search algorithm [
which the book discussed just before binary search].For binary search, m will be roughly equal to the average of i and j.

“Roughly” because we need an integer, hence the // (integer division).

In the last section, I will reflect further on the lessons we can draw from this example, and the practical significance of the key concept of invariant.

[1] E.W. Dijkstra:* A Discipline of Programming*, Prentice Hall, 1976.

[2] David Gries: *The Science of Programming*, Springer, 1989.

[3] Byron Cook, Andreas Podelski and Andrey Rybalchenko: *Proving program termination*, in *Communications of the ACM*, vol. 54, no. 11, May 2011, pages 88-98, available here.

[4] Bertrand Meyer, *Introduction to the Theory of Programming Languages*, Prentice Hall, 1990. The book is out of print but can be found used, e.g. on Amazon. See the next entry for an electronic version of two chapters.

[5] Bertrand Meyer *Axiomatic semantics*, chapter 9 from [3], available here. Note that the PDF was reconstructed from an old text-processing system (troff); the figures could not be recreated and are missing. (One of these days I might have the patience of scanning them from a book copy and adding them. Unless someone wants to help.) I also put online, with the same caveat, chapter 2 on notations and mathematical basis: see here.

[6] Page 383 of [4] and [5]. The text is verbatim except a slight adaptation of the programming notation and a replacement of the variables: i in the book corresponds to i – 1 here, and j to j – 1. As a matter of fact I prefer the original conventions from the book (purely as a matter of taste, since the two are rigorously equivalent), but I changed here to the conventions of the program as it appears in the AutoProof page, with the obvious advantage that you can verify it mechanically. The text extract is otherwise exactly as in the 1990 book.

What was this journey about?

We started with a succession of attempts that might have “felt right” but were in fact all wrong, each in its own way: giving the wrong answer in some cases, crashing (by trying to access an array outside of its index interval) in some cases, looping forever in some cases. Always “in some cases”, evidencing the limits of testing, which can never guarantee that it exercises all the problem cases. A correct program is one that works in *all *cases. The final version was correct; you were able to prove its correctness with an online tool and then to understand (I hope) what lies behind that proof.

To show how to prove such correctness properties, I have referred throughout the series to publications from the 1990s (my own *Introduction to The Theory of Programming Languages*), the 1980s (Jon Bentley’s *Programming Pearls* columns, Gries’s *Science of Programming*), and even the 1970s (Dijkstra’s *Discipline of Programming*). I noted that the essence of my argument appeared in a different form in one of Bentley’s *Communications* articles. What is the same and what has changed?

The core concepts have been known for a long time and remain applicable: assertion, invariant, variant and a few others, although they are much better understood today thanks to decades of theoretical work to solidify the foundation. Termination also has a more satisfactory theory.

On the practical side, however, the progress has been momentous. Considerable engineering has gone into making sure that the techniques scaled up. At the time of Bentley’s article, binary search was typical of the kind of programs that could be proved correct, and the proof had to proceed manually. Today, we can tackle much bigger programs, and use tools to perform the verification.

Choosing binary search again as an example today has the obvious advantage that everyone can understand all the details, but should not be construed as representative of the state of the art. Today’s proof systems are far more sophisticated. Entire operating systems, for example, have been mechanically (that is to say, through a software tool) proved correct. In the AutoProof case, a major achievement was the proof of correctness [1] of an entire data structure (collections) library, EiffelBase 2. In that case, the challenge was not so much size (about 8,000 source lines of code), but the complexity of both:

- The scope of the verification, involving the full range of mechanisms of a modern object-oriented programming language, with classes, inheritance (single and multiple), polymorphism, dynamic binding, generics, exception handling etc.
- The code itself, using sophisticated data structures and algorithms, involving in particular advanced pointer manipulations.

In both cases, progress has required advances on both the science and engineering sides. For example, the early work on program verification assumed a bare-bones programming language, with assignments, conditionals, loops, routines, and not much more. But real programs use many other constructs, growing ever richer as programming languages develop. To cover exception handling in AutoProof required both theoretical modeling of this construct (which appeared in [2]) and implementation work.

More generally, scaling up verification capabilities from the small examples of 30 years ago to the sophisticated software that can be verified today required the considerable effort of an entire community. AutoProof, for example, sits at the top of a tool stack relying on the Boogie environment from Microsoft Research, itself relying on the Z3 theorem prover. Many person-decades of work make the result possible.

Beyond the tools, the concepts are esssential. One of them, loop invariants, has been illustrated in the final version of our program. I noted in the first article the example of a well-known expert and speaker on testing who found no better way to announce that a video would not be boring than “relax, we are not going to talk about loop invariants.” Funny perhaps, but unfair. Loop invariants are one of the most beautiful concepts of computer science. Not so surprisingly, because loop invariants are the application to programming of the concept of mathematical induction. According to the great mathematician Henri Poincaré, all of mathematics rests on induction; maybe he exaggerated, maybe not, but who would think of teaching mathematics without explaining induction? Teaching programming without explaining loop invariants is no better.

Below is an illustration (if you will accept my psychedelic diagram) of what a loop is about, as a problem-solving technique. Sometimes we can get the solution directly. Sometimes we identify several steps to the solution; then we use a sequence (A ; B; C). Sometimes we can find two (or more) different ways of solving the problem in different cases; then we use a conditional (**if **c **then **A **else **B **end**). And sometimes we can only get a solution by getting closer repeatedly, not necessarily knowing in advance how many times we will have to advance towards it; then, we use a loop.

We identify an often large (i.e. very general) area where we know the solution will lie; we call that area the loop invariant. The solution or solutions (there may be more than one) will have to satisfy a certain condition; we call it the exit condition. From wherever we are, we shoot into the invariant region, using an appropriate operation; we call it the initialization. Then we execute as many times as needed (maybe zero if our first shot was lucky) an operation that gets us closer to that goal; we call it the loop body. To guarantee termination, we must have some kind of upper bound of the distance to the goal, decreasing each time discretely; we call it the loop variant.

This explanation is only an illustration, but I hope it makes the ideas intuitive. The key to a loop is its invariant. As the figure suggests, the invariant is always a generalization of the goal. For example, in binary search (and many other search algorithms, such as sequential search), our goal is to find a * position *where either x appears or, if it does not, we can be sure that it appears nowhere. The invariant says that we have an

An invariant should be:

- Strong enough that we can devise an exit condition which in the end, combined with the invariant, gives us the goal we seek (a solution).
- Weak enough that we can devise an initialization that ensures it (by shooting into the yellow area) easily.
- Tuned so that we can devise a loop body that, from a state satifying the invariant, gets us to a new one that is closer to the goal.

In the example:

- The exit condition is simply that the interval’s length is 1. (Technically, that we have computed
**Result**as the single interval element.) Then from the invariant and the exit condition, we get the goal we want. - Initialization is easy, since we can just take the initial interval to be the whole index range of the array, which trivially satisfies the invariant.
- The loop body simply decreases the length of the interval (which can serve as loop variant to ensure termination). How we decrease the length depends on the search strategy; in sequential search, each iteration decreases the length by 1, correct although not fast, and binary search decreases it by about half.

The general scheme always applies. *Every * loop algorithm is characterized by an invariant. The invariant may be called the DNA of the algorithm.

To demonstrate the relevance of this principle, my colleagues Furia, Velder, and I published a survey paper [6] in *ACM* *Computing Surveys *describing the invariants of important algorithms in many areas of computer science, from search algorithms to sorting (all major algorithms), arithmetic (long integer addition, squaring), optimization and dynamic programming (Knapsack, Levenshtein/Edit distance), computational geometry (rotating calipers), Web (Page Rank)… I find it pleasurable and rewarding to go deeper into the basis of loop algorithms and understand their invariants; like a geologist who does not stop at admiring the mountain, but gets to understand how it came to be.

Such techniques are inevitable if we want to get our programs right, the topic of this article. Even putting aside the Bloch average-computation overflow issue, I started with 5 program attempts, all kind of friendly-looking but wrong in different ways. I could have continued fiddling with the details, following my gut feeling to fix the flaws and running more and more tests. Such an approach can be reasonable in some cases (if you have an algorithm covering a well-known and small set of cases), but will not work for non-trivial algorithms.

Newcomers to the concept of loop invariant sometimes panic: “this is all fine, you gave me the invariants in your examples, how do I find my own invariants for my own loops?” I do not have a magic recipe (nor does anyone else), but there is no reason to be scared. Once you have understood the concept and examined enough examples (just a few of those in [6] should be enough), writing the invariant at the same time as you are devising a loop will come as a second nature to you.

As the fumbling attempts in the first few sections should show, there is not much of an alternative. Try this approach. If you are reaching these final lines after reading what preceded them, allow me to thank you for your patience, and to hope that this rather long chain of reflections on verification will have brought you some new insights into the fascinating challenge of writing correct programs.

[1] Nadia Polikarpova, Julian Tschannen, and Carlo A. Furia: *A Fully Verified Container Library*, in Proceedings of 20th International Symposium on Formal Methods (FM 15), 2015. (Best paper award.)

[2] Martin Nordio, Cristiano Calcagno, Peter Müller and Bertrand Meyer: *A Sound and Complete Program Logic for Eiffel*, in *Proceedings of TOOLS 2009 (Technology of Object-Oriented Languages and Systems)*, Zurich, June-July 2009, eds. M. Oriol and B. Meyer, Springer LNBIP 33, June 2009.

[3] Boogie page at MSR, see here for publications and other information.

[4] Z3 was also originally from MSR and has been open-sourced, one can get access to publications and other information from its Wikipedia page.

[5] Carlo Furia, Bertrand Meyer and Sergey Velder: *Loop invariants: Analysis, Classification and Examples*, in *ACM Computing Surveys*, vol. 46, no. 3, February 2014. Available here.

[6] Dynamic programming is a form of recursion removal, turning a recursive algorithm into an iterative one by using techniques known as “memoization” and “bottom-up computation” (Berry). In this transformation, the invariant plays a key role. I will try to write this up some day as it is a truly elegant and illuminating explanation.

For some time I have been nudging (yes, I am that kind of person) two young boys in my family, two cousins who are both learning the piano, to try Schubert’s delightful (and not very military) *Marche Militaire *together. With not much success until recently… but now they are stuck in their (neighboring) homes, and here is what comes out:

OK, not quite Horowitz and Rubinstein yet… (When the Carnegie Hall recital comes — will there ever be concerts at Carnegie Hall again? — I promise to post the announcement here with an offer for discounted tickets.) But I hope that in these trying times for all of us it brightens your day as it does not cease to brighten mine.

*(Note added 21 March: there is already a complete and better rehearsed version — I have updated the above link to point to it.)*

Consider the following expression:

∃ c: s ¦ moisture (c) = soft

This is obviously mathematics. To express such a property in a programming language, you have to write a function containing a loop that iterates through the elements of s. Right?

Wrong. The above construct is valid Eiffel. It’s a consequence of recent syntax extensions that retain all the simplicity and consistency of the language but take full advantage of Unicode. Of course you do not have Unicode characters such as `∃`

on you keyboard, but EiffelStudio’s completion mechanism inserts them for you.

To see how this works, just read Alexander Kogtenkov’s recent blog post on the topic.

In my last article, *time to live up to the boasting*, I pointed out how bewildering it is to see that top newspapers around the world, the supposed “papers of reference”, continue both to:

- Extoll their grandiose proclamations of supposed devotion to public service.
- Charge for access to the epidemic that is scaring the world.

In a meeting I recently attended, someone was saying that “*the media has hyped the crisis*”. About the mainstream media, this reproach is incorrect and unfair: articles have generally been measured and informative, explaining the situation and calling on experts.

But such solid content sits behind paywalls! Free sources, particularly on social networks, are where you find the hype, the crazy theories and the lies.

Rightly or wrongly, many people around the world are panicking. They need a serious source of information and they are not all able to pay for it, especially if it comes from many sources to which one must independently subscribe.

Newspaper owners, this is your moment of truth, or of eternal shame. Free Covid-19 content *now* and *without restrictions* until this crisis ends.

We are fed up with your self-professions of sanctity and want you to fulfill your elementary social duty. You should have started to do this weeks ago already.

It’s not even bad for business — it will attract new, grateful, supportive subscribers who will stay with you for a long time.

The simple, obvious, honest thing to do.

I, for one, pledge never in the future to give one cent, peso or kopeck in the future to any publication that continues its current selfish and abhorrent policy of charging for life-and-death information that the world craves and needs.

The decent media is not modest these days. “*Democracy Dies in Darkness*” says the excellent Washington Post, intimating, if I understand it right, that the only way for the US to avoid dictatorship is that I pay subscription fees. Maybe I would if they just stopped devoting every single one of their articles to King Ubu. La Repubblica tells us that it will “*always fight for the defense of freeddom of information, fot its readers and for all those who have in their hearts the principles of democracy and of civil coexistence.*” Beautiful (and behind a paywall).

The epidemic expert Jonathan Quick, interviewed by the* Guardian*, had this remarkable observation, talking about Covid-19: ** news tends to be behind paywalls, while fake news is free**. The

Granted, every company (except maybe the *Washington Post*, since I have a feeling I am ordering enough from Amazon already) is entitled to earn money. But not all companies claim that their business model is about saving the world. My dear self-praising press, if you are really as generously public-minded as you are, here is a good way to demonstrate it. People around the world are genuinely worried about the Coronavirus epidemic and eager for serious information, if only to counter rumors and conspiracy theories. They eagerly seek credible, validated information that has gone through professional vetting, but many of them cannot afford to subscribe to all the relevant sources.

A few days before and after elections, outlets such as the NYT and Wapo generally make their political articles free-access. The current health scare is an even more serious occasion.

This is the time for all serious news media around the world to show that their grand declarations of philanthropy are not just words.

We, the readers, should vociferously demand that as a public service these press organs immediately make all Covid-19 news, reports and analyses free-access.

A chemistry researcher published a paper in *Science* with two junior collaborators and, a few months later, found flaws and retracted the article.

She commented “*I am totally bummed to announce that we have retracted last year’s paper on enzymatic synthesis of beta-lactams*” and “*it is painful to admit, but important to do so*” and “t*he work has not been reproducible*” and “*I apologize to all*” and “*I was a bit busy when this was submitted, and did not do my job well*”.

Not very unusual news; this kind of thing happens all the time as part of the normal process of research and publication. (This just in! Scientists are human! They make mistakes once in a while! Full story at 11!)

Perhaps this one is slightly more worthy of notice because the lead author is a Nobel prize winner. Time for some rejoicing (Schadenfreude as it is called in good English) for anyone who is not a Nobel prize winner: haha, you think you are so smart but you mess up too. It never hurts to have an occasional reminder that we should not deify anyone. But hardly prime-time news.

Well, it is prime-time news for Fox News, which devotes a whole article to the matter. OK, I know, Fox News. And yes, it does pain me to include a hyperlink to a foxnews.com page in this otherwise perfectly decent, civilized, family-safe blog. But in fact that particular article is not by itself outrageous. Suspicious, yes: why such a sudden focus on a minor scientific episode in a news source not particularly famous (I hope you admire my gift for euphemism) for its extensive coverage of the frontlines of scientific research? But whatever its ultimate agenda the article itself is factual, not judgmental.

What is striking is the avalanche of reader comments on that page. If you go and take a look at them, be prepared; put on your parka. Reading these comments will be, for many of us, a peek into a completely different world. A world that we vaguely know exists, but do not actually visit.

It is not a nice world to venture into: full of bile, frustration, resentment, jealousy, conspiracy theories, slander, attacks on anyone trying to take a rational approach to issues, with hardly a pleasant or optimistic note ever. It is not a world one wants to visit often, but reading such a page is an eye-opener for anyone who accepts the premises of rational thinking and might believe that they are universally accepted.

“Striking”, I wrote. *Scary* is a more apposite word. With the kind of nonsense-spouting and science-bashing that appears in countless messages in the comments section of the page, one can fear the worst regarding questions that face our society, for which rational, science-based advice is critical. (Yes, coronavirus, I am looking at you!)

Very few of the comments on the page says the obvious: if is not good to make errors, but errors will occur, and the scientist should be commended for checking further and coming out with the admission that her study had flaws. As far as we know the initiative came from her, spontaneously. It is one of the signs of the healthiness of science that we always question results. We question those of other people (there are plenty of sites, such as pubpeer and forbetterscience, entirely devoted to tracking and debunking flawed research). We also question our own: partly to avoid the humiliation of having someone else report one of our mistakes before we do; but also because of the good scientist’s natural search for intellectual honesty.

Most of the article commenters do not mention this key lesson of the incident; the Nobel prize winner’s integrity. For them, the article retraction demonstrates that… the entire edifice of science is flawed! For example:

*She’s a liberal… I thought her being wrong was understood.*

*Now we need to find an honest Climate Change researcher to admit that their computer models are faulty and much of their “data” is fake.*

*Integrity! Now if the “scientists” who have fabricated Global Warming/ Climate Change, whatever, “research” would come forward with admissions about their flawed, fallacious “research” we would be golden.*

*Now if we could get the climate change “scientists” to do the same maybe some credibility could be restored to the field.*

and so on ad nauseam. (Not a figure of style — reading these comments is truly nauseating.) In reality the retraction demonstrates, or rather illustrates (one example is not a demonstration), the reverse of these assertions: that the scientific process includes its own correction mechanisms. To use a computer scientist’s terminology, it is not fault-free (no scientist ever claimed anything like that) but fault-tolerant.

Of course the reason the Fox News crowd is suddenly so interested in science is not (one imagines) science per se but the science of climate change. Comment after comment uses the article, as illustrated by the above examples, to dismiss the scientific consensus on the reports of the United Nations’ Intergovernmental Panel on Climate Change. In other words: the retraction of one three-author paper on beta-lactams proves that the the work of hundreds of scientists producing thousands of articles on climatology over several decades is flawed? The logic of such a deduction is… shaky.

The modern world is based, through technology, on science. To post on the Web their absurd rejections of scientifically established facts, the Fox News readers couldn’t do without relying on mobile phones, mobile networks, software systems, computers and other extraordinary achievements of human intelligence, the result of centuries of patient cumulative application of the same scientific principles and techniques that these posts ridicule. They are stuck in a pre-scientific mindset, dominated by the kind of magical thinking that the founders of modern thought had to overcome between the 16th and 18th century, as brilliantly analyzed by Gaston Bachelard’s Formation of the Scientific Mind.

Somehow they skipped what the rest of us learn in grade school (that two plus two equals four, cause precedes effect and so on). They are many, they vote, they think they are right and the rest of the world is wrong, hold these beliefs very strongly (Dunning-Kruger effect), and put the world at risk.

In the restrooms of French freeway service stations managed by Total, the soap dispensers partake of pressing advice:

The message reads:

Total wants to save on costs. Soap is money.

Fine. But on the matter of hand-washing one might (perhaps) think, in the current circumstances, of more urgent advice?

On April 29 in the early evening at the Schaffhausen Institute of Technology I will give a talk on “*The Beauty of Software*”, exploring examples of what makes some concepts, algorithms, data structures etc. produce a sense of esthetics. (Full abstract below.) I gave a first version at TOOLS last year but am revising and expanding the talk extensively.

I obviously have my own examples but am interested in more. If you have some that you feel should be considered for inclusion, perhaps because you experienced a “Wow!” effect when you encountered them, please tell me. I am only asking for names or general pointers, not an in-depth analysis (that’s my job). To avoid having my thunder stolen I would prefer that you alert me by email. I will give credit for examples not previously considered.

Thanks!

*Abstract of the talk as published:*

*Scientists often cite the search for beauty as one of their primary guiding forces. Programming and software engineering offer an inexhaustible source of astoundingly beautiful ideas, from strikingly elegant algorithms and data structures to powerful principles of methodology and language design.*

*Defining beauty is elusive, but true beauty imposes itself in such a way as to remove any doubt. Drawing comparisons from art, literature and other endeavours. He will show a sample of ideas from all walks of software, directly understandable to a wide audience of non-software-experts, offering practical applications in technology that we use daily, and awe-inspiring in their simplicity and elegance.*

When you want to contact academic researchers, particularly computer scientists, you often find their email addresses on their Web pages in a mildly obfuscated form such as “albert dot einstein at princeton dot edu”.

If you try to copy-paste such a pseudo-address into an email client so as to fix it there, you often have to spend some time fighting the email client’s knowledge of what an email address looks like. It can result in errors and bounced mail. Not the world’s worst scandal but an annoying waste of time.

An address written out in that form is a way for the page owner to announce to the cognoscenti: “*I am a computer scientist and hence very knowledgeable about the ways of the Internet; I know that spammers run bots to harvest addresses. See how I defeat them.*”

So 1995!

Both spam and defenses against it are no longer what they were back then. Anyone who manages to use email effectively is protected, even without knowing it, by spam blockers, which have become quite good. (According to a specialized site, 14.5 *billion* spam emails are sent every day, so without these protections we would all be be drowning in spam, which for most people is not the case.)

As to any spam harvesters who are really interested in computer science researchers, they are most likely able anyway to write a little regular expression analyzer that captures the vast majority of the supposedly obfuscated addresses.

If you really want strangers to be able to email you without making your address visible to the spammers, and are a CS person, just include in your Web page a few lines of Javascript that, without revealing the email address in the HTML code, will display something like “Here is my email address”, in such a way that a visitor who clicks on Here gets a new-email window with your email address pre-filled. Not very hard — I use this trick on my own home page and am certainly not a Javascript expert.

But I suspect that as long as you are prepared to let people email you, even just letting your email address appear in clear is not going to result in catastrophe. Your organization’s or ISP’s spam filter is protecting you.

Come on. This is 2020. Windows 95 and the OJ Simpson trial are no longer the news of the day. Time to stop worrying about what no longer matters, and stop bothering people who are just trying to reach you.

Down with corny address obfuscation!

The page for the 2020 LASER summer school (31 May to 7 June) now has the basic elements (some additions still forthcoming) and registration at the early price is open. The topic is **DevOps, Microservices and Software Development for the Age of the Web ** with both conceptual lectures and contributions from industry, by technology leaders from Amazon, Facebook and ServiceNow. The confirmed speakers are:

- Fabio Casati, ServiceNow and University of Trento, and Kannan Govindarajan from ServiceNow on
**Taking AI from research to production – at scale**. - Adrian Cockcroft, Amazon Web Services, on
**Building and Operating Modern Applications**. - Elisabetta Di Nitto, Politecnico di Milano.
- Valérie Issarny, INRIA, on
**The Web for the age of the IoT**. - Erik Meijer, Facebook, on
**Software Development At Scale**. - Me, on
**Software from beginning to end: a comprehensive method**.

As always, the setup is the incomparable environment of the Hotel del Golfo in Procchio, Elba Island off the coast of Tuscany, ideal at that time of year (normally good weather, warm but not hot, few tourists). The school is intensive but there is time to enjoy the beach, the hotel’s amenities and the wonderful of environment of Elba (wake up your inner Napoleon). The school has a fairly small size and everyone lives under the same (beautiful) roof, so there is plenty of time for interaction with the speakers and other participants.

About these participants: the school is intended for engineers and managers in industry as well as researchers and PhD student. In fact it’s a mix that one doesn’t find that often, allowing for much cross-learning.

Another way to put it is that this is now the 16th edition of the school (it started in 2004 but we skipped one year), so it cannot be doing everything wrong.

Gilles Brassard, quantum cryptography pioneer (among other achievements), will give two talks this Wednesday (22.01):

- One at the University of Zurich, at 11:15 (session start at 10:30) on “The Art of Secret Communication in a Quantum World””.
- The other at the Schaffhausen Institute of Technology at 18:30 (session start at 17:30, talks followed by Apéro) in Schaffhausen, with the title “What if Einstein was right after all? Once again…”.

In other words, morning talk more technical (quantum cryptography), evening talk more general.

Abstracts and other details at https://sit.org/insights, also registration (not required but recommended).

— Bertrand Meyer

I am giving a “distinguished lecture” at the University of California, Santa Barbara, January 10 (Friday, tomorrow) at 14. The title is A* Comprehensive Approach to Requirements Engineering*.

The abstract and rest of the information are here.

I will spend the last few minutes of the talk discussing other current developments (verification, concurrency).

The “Morgenstern Colloquium” at the University of Nice / INRIA Sophia Antipolis invited me to give a talk, next Wednesday (18 December) at 11 in Sophia Antipolis, in the aptly named* “Kahn Building”. The announcement appears here. I proposed various topics but (pleasant surprise) the organizers explicitly asked me to lecture about what I really want to talk about: the Eiffel approach. I will give a general presentation describing not specifically the language but the unified view of software construction embodied in Eiffel, from modeling to requirements to design, implementation and verification. Here is the abstract:

With society’s growing reliance on IT systems, the ability to write high-quality software is ever more critical. While a posteriori verification techniques have their role, there is no substitute for methods and tools that provide built-in quality (“correctness by construction”)and scale up to very large systems. For several decades my colleagues and I have been building such a method, based in particular on the concept of Design by Contract, the associated tools and the supporting language, Eiffel. The scope is wide, encompassing all aspects of the software development process, from requirements and design to implementation and verification. I will present an overview of the approach, show what it can yield, and discuss remaining open issues.

This talk is meant for everyone, whether from industry or academia, with an interest in practical techniques for engineering high-quality software.

No registration is required. The presentation will be in English.

*Gilles Kahn, a brilliant computer scientist who died too young, was for a while director of INRIA.

Software engineering has improved a lot in the past couple of decades, but there remains an area where the old doomsday style of starting a software engineering paper (software crisis, everything is rotten…) still fits: requirements engineering. Just see the chasm between textbook advice and the practice of most projects.

I have written on requirements in this blog, including very recently, and will continue in forthcoming installments. For today I want to point to a recent article [1], presented at the newly revived TOOLS conference in October. It attempts to bring some order and rigor to the basic definitions in the field.

From the abstract:

Requirements engineering is crucial to software development but lacks a precise definition of its fundamental concepts. Even the basic definitions in the literature and in industry standards are often vague and verbose.

To remedy this situation and provide a solid basis for discussions of requirements, this work provides precise definitions of the fundamental requirements concepts and two systematic classifications: a taxonomy of requirement elements (such as components, goals, constraints…) ; and a taxonomy of possible relations between these elements (such as “extends”, “excepts”, “belongs”…).

The discussion evaluates the taxonomies on published requirements documents; readers can test the concepts in two online quizzes.

The intended result of this work is to spur new advances in the study and practice of software requirements by clarifying the fundamental concepts.

This version is a first step; we are aware of its limitations and are already revising the definitions and taxonomy. The project is aimed at providing a solid foundation for a delicate area of software engineering and it will take some time to get it completely right. Still, I think the paper as it is already introduces important concepts. I will within the next two weeks write a more detailed blog article summarizing some of them.

[1] Bertrand Meyer, Jean-Michel Bruel, Sophie Ebersold, Florian Galinier, Alexandr Naumchev, *The Anatomy of Requirements*, in TOOLS 51, Software Technology: Methods and Tools

Innopolis, Russia, October 15–17, 2019, pages 10-40, available here (Springer site, paywall) and here (arXiv draft).

December 16 (next Monday), the newly created Schaffhausen Institute of Technology organizes an entire day of events around three (no less) talks by the physics Nobel prize winner and MIT professor Wolfgang Ketterle.

The culmination of the day is a talk by Prof. Ketterle in the evening on “*What happened to the kilogram*?”. From the abstract:

For 130 years, a cylinder made of a platinum-iridium alloy stored in Saint-Cloud near Paris was the official definition of a kilogram, the basic unit of mass. This all changed on May 20 of this year: a kilo is now be defined by a fundamental constant of nature known, the Planck constant, which relates the energy of a photon to its frequency: 6.62607015 times 10-34 kilograms times square meters per second. Try that the next time you buy a kilo of asparagus.

Sounds complicated? For MIT’s Wolfgang Ketterle, a Nobel Prize winner, “Conceptually, the definition is very simple”.

Simple? Really? Come to Schaffhausen and hear for yourself whether Prof. Ketterle can make the new kilogram crystal-clear to common mortals.

Earlier in the day, he will give a talk in German on new forms of materials that appear at temperatures near the absolute zero, complete with demonstrations.

More generally, there is a full set of talks throughout the day about various aspects of advanced physics and computer science, and even a “quantum magician”, plus music and food.

Schaffhausen is about 40 minutes from Zurich (or Zurich airport) by train or car.

Attendance is free but registration is recommended. One can register for the full day or for some events only. See further information and registration form here.

Some important concepts of software engineering, established over the years, are not widely known in the community. One use of this blog is to provide tutorials on such overlooked ideas. An earlier article covered one pertaining to project management: the Shortest Possible Schedule property . Here is another, this time in the area of requirements engineering, also based on a publication that I consider to be a classic (it is over 40 years old) but almost unknown to practitioners.

Practitioners are indeed, as in most of my articles, the intended audience. I emphasize this point right at the start because if you glance at the rest of the text you will see that it contains (horror of horrors) some mathematical formulae, and might think “this is not for me”. It is! The mathematics is very simple and my aim is practical: to shed light on an eternal question that faces anyone writing requirements (whatever the style, traditional or agile): how can I be sure that a requirements specification is complete?

To a certain extent you cannot. But there is better answer, a remarkably simple one which, while partial, helps.

The better answer is called “sufficient completeness” and comes from the theory of abstract data types. It was introduced in a 1978 article by Guttag and Horning [1]. It is also implicit in a more down-to-earth document, the 1998 IEEE standard on how to write requirements [2].

There is nothing really new in the present article; in fact my book *Object-Oriented Software Construction *[3] contains an extensive discussion of sufficient completeness (meant to be more broadly accessible than Guttag and Horning’s scholarly article). But few people know the concepts; in particular very few practitioners have heard of sufficient completeness (if they have heard at all of abstract data types). So I hope the present introduction will be useful.

The reason the question of determining completeness of requirements seems hopeless at first is the natural reaction: complete with respect to *what*? To know that the specification is complete we would need a more general description of all that our stakeholders want and all the environment constraints, but this would only push the problem further: how do we know that such description itself is complete?

That objection is correct in principle: we can never be sure that we did not forget something someone wanted, or some property that the environment imposes. But there also exist more concrete and assessable notions of completeness.

The IEEE standard gives three criteria of completeness. The first states that “all requirements” have been included, and is useless, since it runs into the logical paradox mentioned above, and is tautological anyway (the requirements are complete if they include all requirements, thank you for the information!). The second is meaningful but of limited interest (a “bureaucratic” notion of completeness): every element in the requirements document is numbered, every cross-reference is defined and so on. The last criterion is the interesting one: *“Definition of the responses of the software to all realizable classes of input data in all realizable classes of situations*”. Now this is meaningful. To understand this clause we need to step back to sufficient completeness and, even before that, to abstract data types.

Abstract data types will provide our little mathematical excursion (our formal picnic in the words of an earlier article) in our study of requirements and completeness. If you are not familiar with this simple mathematical theory, which every software practitioner should know, I hope you will benefit from the introduction and example. They will enable us to introduce the notion of sufficient completeness formally before we come back to its application to requirements engineering.

Abstract data types are the mathematical basis for object-oriented programming. In fact, OO programming but also OO analysis and OO design are just a realization of this mathematical concept at various levels of abstraction, even if few OO practitioners are aware of it. (Renewed reference to [3] here if you want to know more.)

An ADT (abstract data type) is a set of objects characterized not by their internal properties (what they are) but by the operations applicable to them (what they have), and the properties of these operations. If you are familiar with OO programming you will recognize that this is exactly, at the implementation level, what a *class *is. But here we are talking about mathematical objects and we do not need to consider implementation.

An example of a type defined in this way, as an ADT, is a notion of POINT on a line. We do not say how this object is represented (a concept that is irrelevant at the specification level) but how it appears to the rest of the world: we can create a new point at the origin, ask for the coordinate of a point, or move the point by a certain displacement. The example is the simplest meaningful one possible, but it gives the ideas.

An ADT specification has three part: Functions, Preconditions and Axioms. Let us see them (skipping Preconditions for the moment) for the definition of the POINT abstract data type.

The functions are the operations that characterize the type. There are three kinds of function, defined by where the ADT under definition, here POINT, appears:

- Creators, where the type appears only among the results.
- Queries, where it appears only among the arguments.
- Commands, where it appears on both sides.

There is only one creator here:

*new*: → POINT

*new *is a function that takes no argument, and yields a point (the origin). We will write the result as just *new *(rather than using empty parentheses as in *new *()).

Creators correspond in OO programming to constructors of a class (creation procedures in Eiffel). Like constructors, creators may have arguments: for example instead of always creating a point at the origin we could decide that *new *creates a point with a given coordinate, specifying it as INTEGER → POINT and using it as *new *(i) for some integer i (our points will have integer coordinates). Here for simplicity we choose a creator without arguments. In any case the new type, here POINT, appears only on the side of the results.

Every useful ADT specification needs at least one creator, without which we would never obtain any objects of the type (here any points) to work with.

There is also only one query:

*x*: POINT → INTEGER

which gives us the position of a point, written *x* (p) for a point p. More generally, a query enables us to obtain properties of objects of the new type. These properties must be expressed in terms of types that we have already defined, like INTEGER here. Again there has to be at least one query, otherwise we could never obtain usable information (information expressed in terms of what we already know) about objects of the new type. In OO programming, queries correspond to fields (attributes) of a class and functions without side effects.

And we also have just one command:

*move*: POINT × INTEGER → POINT

a function that for any point p and integer i and yields a new point, *move *(p, i). Again an ADT specification is not interesting unless it has at least one command, representing ways to modify objects. (In mathematics we do not actually modify objects, we get new objects. In imperative programming we will actually update existing objects.) In the classes of object-oriented programming, commands correspond to procedures (methods which may change objects).

You see the idea: define the notion of POINT through the applicable operations.

Listing their names and the types of their arguments types results (as in POINT × INTEGER → POINT) is not quite enough to specify these operations: we must specify their fundamental properties, without of course resorting to a programming implementation. That is the role of the second component of an ADT specification, the axioms.

For example I wrote above that *new *yields the origin, the point for which *x* = 0, but you only had my word for it. My word is good but not good enough. An axiom will give you this property unambiguously:

*x* (*new*) = 0 — A0

The second axiom, which is also the last, tells us what *move* actually does. It applies to any point p and any integer m:

*x* (*move* (p, m)) = *x* (p) + m — A1

In words: the coordinate of the point resulting from moving p by m is the coordinate of p plus m.

That’s it! (Except for the notion of precondition, which will wait a bit.) The example is trivial but this approach can be applied to any number of data types, with any number of applicable operations and any level of complexity. That is what we do, at the design and implementation level, when writing classes in OO programming.

Sufficient completeness is a property that we can assess on such specifications. An ADT specification for a type T (here POINT) is sufficiently complete if the axioms are powerful enough to yield the value of any well-formed query expression in a form not involving T. This definition contains a few new terms but the concepts are very simple; I will explain what it means through an example.

With an ADT specification we can form all kinds of expressions, representing arbitrarily complex specifications. For example:

*x* (*move *(*move* (*move* (*new*, 3), *x* (*move* (*move* (*new*, -2), 4))), -6))

This expression will yield an integer (since function* x* has INTEGER as its result type) describing the result of a computation with points. We can visualize this computation graphically; note that it involves creating two points (since there are two occurrences of *new*) and moving them, using in one case the current coordinate of one of them as displacement for the other. The following figure illustrates the process.

The result, obtained informally by drawing this picture, is the x of P5, that is to say -1. We will derive it mathematically below.

Alternatively, if like most programmers (and many other people) you find it more intuitive to reason operationally than mathematically, you may think of the previous expression as describing the result of the following OO program (with variables of type POINT):

**create **p — In C++/Java syntax: p = new POINT();

**create **q

p.move (3)

q.move (-2)

q.move (4)

p.move (q.x)

p.move (-6)

**Result **:= p.x

You can run this program in your favorite OO programming language, using a class POINT with *new*, *x *and *move,* and print the value of **Result**, which will be -1.

Here, however, we will stay at the mathematical level and simplify the expression using the axioms of the ADT, the same way we would compute any other mathematical formula, applying the rules without needing to rely on intuition or operational reasoning. Here is the expression again (let’s call it i, of type INTEGER):

i* = **x* (*move *(*move* (*move* (*new*, 3), *x* (*move* (*move* (*new*, -2), 4))), -6))

A *query *expression is one in which the outermost function being applied, here x, is a query function. Remember that a query function is one which the new type, here POINT, appears only on the left. This is the case with *x*, so the above expression i is indeed a query expression.

For sufficient completeness, query expressions are the ones of interest because their value is expressed in terms of things we already know, like INTEGERs, so they are the only way we can concretely obtain directly usable information the ADT (to de-abstract it, so to speak).

But we can only get such a value by applying the axioms. So the axioms are “sufficiently complete” if they always give us the answer: the value of any such query expression.

Let us see if the above expression i satisfies this condition of sufficient completeness. To make it more tractable let us write it in terms of simpler expressions (all of type POINT), as illustrated by the figure below:

p1 = *move *(*new*, 3)

p2= *move *(*new*, -2)

p3= *move *(p2, 4)

p4= *move *(p1, *x* (p3))

p5= *move *(p4, -6)

i = *x* (p5)

(You may note that the intermediate expressions roughly correspond to the steps in the above interpretation of the computation as a program. They also appear in the illustrative figure repeated below.)

Now we start applying the axioms to evaluating the expressions. Remember that we have two axioms: A0 tells us that *x* (*new*) = 0 and A1 that *x* (*move* (p, m)) = *x* (p) + m. Applying A1 to the definition the expression i yields

i = *x *(*p4*) – 6

= i4 – 6

if we define

i4 = *x *(*p4*) — Of type INTEGER

We just have to compute i4. Applying A1 to the definion of p4 tells us that

i4 = *x* (p1) + *x* (p3)

To compute the two terms:

- Applying A1 again, we see that the first term
*x*(p1) is*x**(new*) + 3, but then A0 tells us that*x*(*new*) is zero, so*x*(p1) is 3. - As to
*x*(p3), it is, once more from A1,*x*(p2) + 4, and*x*(p2) is (from A1 then A0), just -2, so*x*(p3) is 2.

In the end, then, i4 is 5, and the value of the entire expression i = i4 – 6 is -1. Good job!

The successful computation of i was just a derivation for one example, showing that in that particular case the axioms yield the answer in terms of an INTEGER. How do we go from one example to an entire specification?

The bad news first: like all interesting problems in programming, sufficient completeness of an ADT specification is theoretically undecidable. There is no general automatic procedure that will process an ADT specification and print out ““sufficiently complete” or “not sufficiently complete”.

Now that you have recovered from the shock, you can share the computer scientist’s natural reaction to such an announcement: so what. (In fact we might define the very notion of computer scientist as someone who, even before he brushes his teeth in the morning — if he brushes them at all — has already built the outline of a practical solution to an undecidable problem.) It is enough that we can find a way to determine if a *given *specification is sufficiently complete. Such a proof is, in fact, the computer scientist’s version of dental hygiene: no ADT is ready for prime time unless it is sufficiently complete.

The proof is usually not too hard and will follow the general style illustrated for our simple example.

We note that the definition of sufficient completeness said: “the axioms are powerful enough to yield the value of any *well-formed* query expression in a form not involving the type”. I have not defined “well-formed” yet. It simply means that the expressions are properly structured, with the proper syntax (basically the correct matching of parentheses) and proper number and types of arguments. For example the following are not well-formed (if p is an expression of type POINT):

*move *(p, 55( — Bad use of parentheses.

*move *(p) — Wrong number of arguments.

*move *(p, p) — Wrong type: second argument should be an integer.

Such expressions are nonsense, so we only care about well-formed expressions. Note that in addition to *new*, *x* and *move *, an expression can use integer constants as in the example (although we could generalize to arbitrary integer expressions). We consider an integer constant as a query expression.

We have to prove that with the two axioms A0 and A1 we can determine the value of any query expression i. Note that since the only query functions is* x*, the only possible form for i, other than an integer constant, is *x* (p) for some expression p of type POINT.

The proof proceeds by induction on the number *n *of parenthesis pairs in a query expression i.

There are two base steps:

*n*= 0: in that case i can only be an integer constant. (The only expression with no parentheses built out of the ADT’s functions is*new*, and it is not a query expression.) So the value is known. In all other cases i will be of the form*x*(*p*) as noted.*n*= 1: in that case p can only be*new*, in other words i = x (*new*), since the only function that yields points, other than*new*, is*move*, and any use of it would add parentheses. In this case axiom A0 gives us the value of i: zero.

For the induction step, we consider i with *n* + 1 parenthesis pairs for n > 1. As noted, i is of the form x (p), so p has exactly n parenthesis pairs. p cannot be *new *(which would give 0 parenthesis pairs and was taken care of in the second base step), so p has to be of the form

p = *move *(p’, i’) — For expressions p’ of type POINT and i’ of type INTEGER.

implying (since i = *x* (p)) that by axiom A1, the value of i is

*x* (p’) + i’

So we will be able to determine the value of i if we can determine the value of both *x* (p’) and i’. Since p has *n* parenthesis pairs and p = *move *(p’, i’), both p’ and i’ have at most *n* – 1 parenthesis pairs. (This use of *n* – 1 is legitimate because we have two base steps, enabling us to assume *n* > 1.) As a consequence, both *x* (p’) and i’ have at most *n* parenthesis pairs, enabling us to deduce their values, and hence the value of i, by the induction hypothesis.

Most proofs of sufficient completeness in my experience follow this style: induction on the number of parenthesis pairs (or the maximum nesting level).

I left until now the third component of a general ADT specification: preconditions. The need for preconditions arises because most practical specifications need some of their functions to be *partial*. A partial function from X to Y is a function that may not yield a value for some elements of X. For example, the inverse function on real numbers, which yields 1 / a for x, is partial since it is not defined for a = 0 (or, on a computer, for non-zero but very small a).

Assume that in our examples we only want to accept points that lie in the interval [-4, +4]:

We can simply model this property by turning move into a partial function. It was specified above as

*move*: POINT × INTEGER → POINT

The ordinary arrow → introduces a total (always defined) function. For a partial function we will use a crossed arrow ⇸, specifying the function as

*move*: POINT × INTEGER ⇸ POINT

Other functions remain unchanged. Partial functions cause trouble: for f in X ⇸ Y we can no longer cheerfully use f (x) if f is a partial function, even for x of the appropriate type X. We have to make sure that x belongs to the *domain *of f, meaning the set of values for which f is defined. There is no way around it: if you want your specification to be meaningful and it uses partial functions, you must specify explicitly the domain of each of them. Here is how to do it, in the case of *move*:

*move *(p: POINT; d: INTEGER) **require **|*x* (p) + d | < 5 — where |…| is absolute value

To adapt the definition (and proofs) of sufficient completeness to the possible presence of partial functions:

- We only need to consider (for the rule that axioms must yield the value of query expressions) well-formed expressions that satisfy the associated preconditions.
- The definition must, however, include the property that axioms always enable us to determine whether an expression satisfies the associated preconditions (normally a straightforward part of the proof since preconditions are themselves query expressions).

Updating the preceding proof accordingly is not hard.

The definition of sufficient completeness is of great help to assess the completeness of a requirements document. We must first regretfully note that for many teams today requirements stop at “use cases” (scenarios) or “user stories”. Of course these are not requirements; they only describe individual cases and are to requirements what tests are to programs. They can serve to check requirements, but do not suffice as requirements. I am assuming real requirements, which include descriptions of behavior (along with other elements such as environment properties and project properties). To describe behaviors, you will define operations and their effects. Now we know what the old IEEE standard is telling us by stating that complete requirements should include

definition of the responses of the software to all realizable classes of input data in all realizable classes of situations

Whether or not we have taken the trouble to specify the ADTs, they are there in the background; our system’s operations reflect the commands, and the effects we can observe reflect the queries. To make our specification complete, we should draw as much as possible of the (mental or explicit) matrix of possible effects of all commands on all queries. “As much as possible” because software engineering is engineering and we will seldom be able to reach perfection. But the degree of fullness of the matrix tells us a lot (possible software metric here?) about how close our requirements are to completeness.

I should note that there are other aspects to completeness of requirements. For example the work of Michael Jackson, Pamela Zave and Axel van Lamsweerde (more in some later article, with full references) distinguishes between business goals, environment constraints and system properties, leading to a notion of completeness as how much the system properties meet the goals and obey the constraints [4]. Sufficient completeness operates at the system level and, together with its theoretical basis, is one of those seminal concepts that every practicing software engineer or project manager should master.

[1] John V. Guttag, Jim J. Horning: *The Algebraic Specification of Abstract Data Type*s, in *Acta Informatica*, vol. 10, no. 1, pages 27-52, 1978, available here from the Springer site. This is a classic paper but I note that few people know it today; in Google Scholar I see over 700 citations but less than 100 of them in the past 8 years.

[2] IEEE: *Recommended Practice for Software Requirements Specifications*, IEEE Standard 830-1998, 1998. This standard is supposed to be obsolete and replaced by newer ones, more detailed and verbose, but it remains the better reference: plain, modest and widely applied by the industry. It does need an update, but a good one.

[3] Bertrand Meyer, *Object-Oriented Software Construction*, 2nd edition, Prentice Hall, 1997. The discussion of sufficient completeness was in fact already there in the first edition from 1988.

[4] With thanks to Elisabetta Di Nitto from Politecnico di Milano for bringing up this notion of requirements completeness.

* A version of this article was first published on the Communications of the ACM blog.*

Sign seen in a Singapore shopping center:

Let us make sure we understand: here children are not allowed, but playing is.

As a consequence such playing must be performed by non-children only. Adults welcome to play!

Maybe it is actually not the intended meaning. Instead of

(and (not (allowed children)) (allowed playing))

the desired parsing may be

(not (allowed (playing children)))

One hint in favor of this second interpretation is that in practice people seldom put up signs to advertise that something *is* allowed.

So new-line must be a mere break character equivalent to space, not a semantics-carrying delimiter.

Somewhat reminiscent of eats shoots and leaves. Here it is not even a punctuation mark, just a humble new-line, but its visual effect is strong all the same.

The 2020 LASER summer school has been announced. It will take place June 1 to 6* , as always in Elba Island, this year with the theme **DevOps, Microservices and Software Development for the Age of the Web**. The first five speakers are listed on the conference page, with more to come, from both academia and industry.

This is the 16th edition of the school (already) and, as always, rests on the LASER recipe of “Sea, Sun and Software”: densely packed lectures by top experts with the opportunity to enjoy the extraordinary surroundings of the Island of Elba and the Hotel del Golfo’s unique food, beach and facilities, with lots of time devoted to interactions between speakers and attendees.

This year’s theme is devoted to advances in the newest Web technologies and the corresponding software engineering issues and development models.

*Arrival on May 31st, departure on June 7th.

Earlier this year I was in Sofia for a conference, at the main university (Saint Kliment) which in the entrance hall had an exhibition about its history. There was this student poster and song from I think around 1900:

I like the banner (what do you think?). It even has the correct Latin noun and verb plurals.

Anyone know where to find a university today with that kind of students, that kind of slogan, that kind of attitude and that kind of grammar? Please send me the links.

The best way to make software requirements precise is to use one of the available “formal” approaches. Many have been proposed; I am not aware of a general survey published so far. Over the past two years, we have been working on a comprehensive survey of the use of formality in requirements, of which we are now releasing a draft. “We” is a joint informal research group from Innopolis University and the University of Toulouse, whose members have been cooperating on requirements issues, resulting in publications listed under “References” below and in several scientific events.

The survey is still being revised, in particular because it is longer than the page limit of its intended venue (ACM *Computing Surveys*), and some sections are in need of improvement. We think, however, that the current draft can already provide a solid reference in this fundamental area of software engineering.

The paper covers a broad selection of methods, altogether 22 of them, all the way from completely informal to strictly formal. They are grouped into five categories: natural language, semi-formal, automata- or graph-based, other mathematical frameworks, programming-language based. Examples include SysML, Relax, Statecharts, VDM, Eiffel (as a requirements notation), Event-B, Alloy. For every method, the text proposes a version of a running example (the Landing Gear System, already used in some of our previous publications) expressed in the corresponding notation. It evaluates the methods using a set of carefully defined criteria.

The paper is: Jean-Michel Bruel, Sophie Ébersold, Florian Galinier, Alexandr Naumchev, Manuel Mazzara and Bertrand Meyer: *Formality in Software Requirements*, draft, November 2019.

The text is available here. Comments on the draft are welcome.

Bertrand Meyer, Jean-Michel Bruel, Sophie Ebersold, Florian Galinier and Alexandr Naumchev: *Towards an Anatomy of Software Requirements, in *TOOLS 2019, pages 10-40, see here (or arXiv version here). I will write a separate blog article about this publication.

Alexandr Naumchev and Bertrand Meyer: *Seamless requirements*. Computer Languages, Systems & Structures 49, 2017, pages 119-132, available here.

Florian Galinier, Jean-Michel Bruel, Sophie Ebersold and Bertrand Meyer: *Seamless Integration of Multirequirements*, in Complex Systems, 25th International Requirements Engineering Conference Workshop, IEEE, pages 21-25, 2017, available here.

Alexandr Naumchev, Manuel Mazzara, Bertrand Meyer, Jean-Michel Bruel, Florian Galinier and Sophie Ebersold: *A contract-based method to specify stimulus-response requirements*, Proceedings of the Institute for System Programming, vol. 29, issue 4, 2017, pp. 39-54. DOI: 10.15514, available here.

Alexandr Naumchev and Bertrand Meyer: *Complete Contracts through Specification Drivers*., in 10th International Symposium on Theoretical Aspects of Software Engineering (TASE), pages 160-167, 2016, available here.

Alexandr Naumchev, Bertrand Meyer and Víctor Rivera: *Unifying Requirements and Code: An Example*, in PSI 2015 (Ershov conference, Perspective of System Informatics), pages 233-244, available here.

Recently I had a need to collect my education-related publications, so I went through my publication list and extracted items devoted to issues of learning computer science (informatics) and software engineering. There turned out to be far more than I expected; I did not think of myself as primarily an education researcher but it seems I am that too. (Looking around in my fields and institutions I don’t immediately see other research computer scientists with a comparable record at SIGCSE, ITiCSE and other top CS education publications.)

Without presuming that the list will be of interest I am reproducing it below for the record. All comes from my publication list here, which contains more information, in particular a descriptive paragraph or two for every single publication. (Note: that consolidate list is up to date until 2017 only, because it is produced by a script and one of the supporting system tools — not an Eiffel program, thanks for asking — has stopped functioning on the latest OS version on the server. I have not found the time to fix it yet. There are a good dozen publications missing for 2018 and 2019.)

I have also included PhD theses in education. (Whole list of PhD theses supervised here.)

The topics include among others, in approximate chronological order (although the list below is in the reverse order):

- Early experience teaching modern programming concepts in both industry and universities.
- In the nineties, I was full time at Eiffel Software, the development of a general framework for teaching programming. This was written from the safe position of someone in industry advising academic colleagues on what to do (usually the advice goes the other way). I did have, however, the opportunity to practice my preaching in short stints at University of Technology, Sydney and particularly Monash University. The concept of the Inverted Curriculum (also known as “ Outside-In”) date back to that period, with objects first (actually classes) and contracts first too.
- When I joined ETH, a general paper on the fundamental goals and concepts of software engineering education, “Software Engineering in the Academy”, published in IEEE Computer.
- At ETH, putting the Inverted Curriculum in practice, with 14 consecutive sessions of the introductory programming courses for all computer science students, resulting in the
*Touch of Class*textbook and a number of papers coming out of our observations. An estimated 6000 students took the course. A variant of it has also been given several times at Innopolis University. - A theory of how to structure knowledge for educational purposes, leading to the notion of “Truc” (Teachable, Reusable Unit of Cognition).
- The development by Michela Pedroni of the Trucstudio environment, similar in its form to an IDE but devoted, instead of the development of programs, to the visual development of courses, textbooks, curricula etc.
- Empirical work by Marie-Hélène Ng Cheong Vee (Nienaltowski) and Michela Pedroni on what beginners understand easily, and not, for example according to the phrasing of compiler error messages.
- Other empirical work, by Michela Pedroni and Manuel Oriol, on the prior knowledge of entering computer science students.
- The DOSE course (Distributed and Outsourced Software Engineering) ran for several years a student project done by joint student teams from several cooperating universities, including Politecnico di Milano which played a key role along with us. It enabled many empirical studies on the effect on software development of having geographically distributed teams. People who played a major role in this effort are, at ETH, Martin Nordio, Julian Tschannen and Christian Estler and, at Politecnico, Elisabetta di Nitto, Giordano Tamburrelli and Carlo Ghezzi.
- Several MOOCs, among the first at ETH, on introductory computing and agile methods. They do not appear below because they are not available at the moment on the EdX site (I do not know why and will try to get them reinstated). The key force there was Marco Piccioni. MOOCs are interesting for many reasons; they are a substitute neither for face-to-face teaching nor for textbooks, but an interesting complement offering novel educational possibilities. Our programming MOOCs are particularly innovative since (thanks to codeboard, see below) they provide the opportunity to compile and run program directly from the course exercise pages, compare the run’s result to correct answers for prepared tests, and get immediate feedback .
- A comparative study of teaching effectiveness of two concurrency models, Eiffel SCOOP and JavaThreads (Sebastian Nanz, Michela Pedroni).
- The development (Christian Estler with Martin Nordio) of the Codeboard system and site, still as far as I know the most advanced system for cloud support to teach programming, enabling students to compile, correct and run programs on the web, with support for various languages. Codeboard is used in the programming MOOCs.
- A hint system (Paolo Antonucci, Michela Pedroni) to help students get progressive help, as in video games, when they stumble trying to write a program, e.g. with Codeboard.

The following three theses are devoted to educational topics (although many of the other theses have educational aspects too):

Christian Estler, 2014, *Understanding and Improving Collaboration in Distributed Software Development,* available here.

Michela Pedroni, 2009, *Concepts and Tools for Teaching Programming*, available here.

Markus Brändle, 2006: *GraphBench: Exploring the Limits of Complexity with Educational Software: Exploring the limits of complexity with educational software*, available here. (The main supervisor in this case was Jürg Nievergelt.)

Internal MOOCs, and three courses on EdX (links will be added when available):

- Computing: Art, Magic, Science? Part 1 (CAMS 1), 2013.
- Computing: Art, Magic, Science? Part 1 (CAMS 2), 2014.
- Agile Software Development, 2015.

1. Paolo Antonucci, Christian Estler, Durica Nikolic, Marco Piccioni and Bertrand Meyer:* An Incremental Hint System For Automated Programming Assignments*, in ITiCSE ’15, Proceedings of 2015 ACM Conference on Innovation and Technology in Computer Science Education, 6-8 July 2015, Vilnius, ACM Press, pages 320-325. (The result of a master’s thesis, a system for helping students solve online exercises, through successive hints.) Available here.

2. Jiwon Shin, Andrey Rusakov and Bertrand Meyer: *Concurrent Software Engineering and Robotics Education*, in 37th International Conference on Software Engineering (ICSE 2015), Florence, May 2015, IEEE Press, pages 370-379. (Describes our innovative Robotics Programming Laboratory course, where students from 3 departments, CS, Mechanical Engineering and Electrical Engineering learned how to program robots.) Available here.

3. Cristina Pereira, Hannes Werthner, Enrico Nardelli and Bertrand Meyer: I*nformatics Education in Europe: Institutions, Degrees, Students, Positions, Salaries — Key Data 2008-2013, Informatics Europe report, October 2014.* (Not a scientific publication but a report. I also collaborated in several other editions of this yearly report series, which I started, from 2011 on. A unique source of information about the state of CS education in Europe.) Available here.

4. (One of the authors of) *Informatics education: Europe cannot afford to miss the boat*, edited by Walter Gander, joint Informatics Europe and ACM Europe report, April 2013. An influential report which was instrumental in the introduction of computer science in high schools and primary schools in Europe, particularly Switzerland. Emphasized the distinction between “digital literacy” and computer science. Available here.

5. Sebastian Nanz, Faraz Torshizi, Michela Pedroni and Bertrand Meyer: *Design of an Empirical Study for Comparing the Usability of Concurrent Programming Languages*, in Information and Software Technology Journal Elsevier, volume 55, 2013. (Journal version of conference paper listed next.) Available here.

6. Bertrand Meyer: *Knowledgeable beginners*, in Communications of the ACM, vol. 55, no. 3, March 2012, pages 10-11. (About a survey of prior knowledge of entering ETH CS students, over many years. Material from tech report below.) Available here.

7. Sebastian Nanz, Faraz Torshizi, Michela Pedroni and Bertrand Meyer: *Design of an Empirical Study for Comparing the Usability of Concurrent Programming Languages*, in ESEM 2011 (ACM/IEEE International Symposium on Empirical Software Engineering and Measurement), 22-23 September 2011 (best paper award). Reports on a carefully designed empirical study to assess the teachability of various approaches to concurrent programming. Available here.

8. Martin Nordio, H.-Christian Estler, Julian Tschannen, Carlo Ghezzi, Elisabetta Di Nitto and Bertrand Meyer: *How do Distribution and Time Zones affect Software Development? A Case Study on Communication*, in Proceedings of the 6th International Conference on Global Software Engineering (ICGSE), IEEE Computer Press, 2011, pages 176-184. (A study of the results of our DOSE distributed course, which involved students from different universities in different countries collaborating on a common software development project.) Available here.

9. Martin Nordio, Carlo Ghezzi, Elisabetta Di Nitto, Giordano Tamburrelli, Julian Tschannen, Nazareno Aguirre, Vidya Kulkarni and Bertrand Meyer: *Teaching Software Engineering using Globally Distributed Projects: the DOSE course*, in Collaborative Teaching of Globally Distributed Software Development – Community Building Workshop (CTGDSD), Hawaii (at ICSE), May 2011. (Part of the experience of our Distributed Outsourced Software Engineering course, taught over many years with colleagues from Politecnico di Milano and elsewhere, see paper in previous entry.) Available here.

10. Bertrand Meyer: *From Programming to Software Engineering* (slides only), material for education keynote at International Conference on Software Engineering (ICSE 2010), Cape Town, South Africa, May 2010. Available here.

11. Michela Pedroni and Bertrand Meyer: *Object-Oriented Modeling of Object-Oriented Concepts*, in ISSEP 2010, Fourth International Conference on Informatics in Secondary Schools, Zurich, January 2010, eds. J. Hromkovic, R. Královic, J. Vahrenhold, Lecture Notes in Computer Science 5941, Springer, 2010. Available here.

12. Michela Pedroni, Manuel Oriol and Bertrand Meyer: *What Do Beginning CS Majors Know*?, ETH Technical Report, 2009. (Unpublished report about the background of 1st-year ETH CS students surveyed over many years. See shorter 2012 CACM version above.) Available here.

13. Bertrand Meyer: *Touch of Class: Learning to Program Well Using Object Technology and Design by Contract*, Springer, 2009 (also translated into Russian). (Introductory programming textbook, used for many years at ETH Zurich and Innopolis University for the first programming course. The herecontains a long discussion of pedagogical issues of teaching programming and CS.) Book page and text of several chapters here.

14. Michela Pedroni, Manuel Oriol, Lukas Angerer and Bertrand Meyer: *Automatic Extraction of Notions from Course Material,* in Proceedings of SIGCSE 2008 (39th Technical Symposium on Computer Science Education), Portland (Oregon), 12-15 March 2008, ACM SIGCSE Bulletin, vol. 40, no. 1, ACM Press, 2008, pages 251-255. (As the title indicates, tools for automatic analysis of course material to extract the key pedagogical notions or “Trucs”.) Available here.

15. Marie-Hélène Nienaltowski, Michela Pedroni and Bertrand Meyer: *Compiler Error Messages: What Can Help Novices?*, in Proceedings of SIGCSE 2008 (39th Technical Symposium on Computer Science Education), Portland (Oregon), Texas, 12-15 March 2008, ACM SIGCSE Bulletin, vol. 40, no. 1, ACM Press, 2008, pages 168-172. (Discusses the results of experiments with different styles of compiler error messages, which can be baffling to beginners, to determine what works best.) Available here.

16. Bertrand Meyer and Marco Piccioni:* The Allure and Risks of a Deployable Software Engineering Project: Experiences with Both Local and Distributed Development*, in Proceedings of IEEE Conference on Software Engineering & Training (CSEE&T), Charleston (South Carolina), 14-17 April 2008, ed. H. Saiedian, pages 3-16. (Paper associated with a keynote at an SE education conference. See other papers on the DOSE distributed project experience below.) Available here.

17. Marie-Hélène Ng Cheong Vee (Marie-Hélène Nienaltowski), Keith L. Mannock and Bertrand Meyer: *Empirical study of novice error paths*, Proceedings of workshop on educational data mining at the 8th international conference on intelligent tutoring systems (ITS 2006), 2006, pages 13-20. (An empirical study of the kind of programming mistakes learners make.) Available here.

18. Bertrand Meyer: *Testable, Reusable Units of Cognition*, in Computer (IEEE), vol. 39, no. 4, April 2006, pages 20-24. (Introduced a general approach for structuring knowledge for teaching purposes: “Trucs”. Served as the basis for some other work listed, in particular papers with Michela Pedroni on the topics of her PhD thesis. Available here.

19. Michela Pedroni and Bertrand Meyer: *The Inverted Curriculum in Practice*, in Proceedings of SIGCSE 2006, Houston (Texas), 1-5 March 2006, ACM Press, 2006, pages 481-485. (Develops the idea of inverted curriculum which served as the basis for our teaching of programming at ETH, Innopolis etc. and led to the “Touch of Class” textbook.) Available here.

20. Bertrand Meyer: *The Outside-In Method of Teaching Introductory Programming*, in Perspective of System Informatics, Proceedings of fifth Andrei Ershov Memorial Conference, Akademgorodok, Novosibirsk, 9-12 July 2003, eds. Manfred Broy and Alexandr Zamulin, Lecture Notes in Computer Science 2890, Springer, 2003, pages 66-78. (An early version of the ideas presented in the previous entry.) Available here.

21. Bertrand Meyer: *Software Engineering in the Academy*, in Computer (IEEE), vol. 34, no. 5, May 2001, pages 28-35. Translations: Russian in Otkrytye Systemy (Open Systems Publications), #07-08-2001, October 2001. (A general discussion of the fundamental concepts to be taught in software engineering. Served as a blueprint for my teaching at ETH.) Available here.

22. Bertrand Meyer: *Object-Oriented Software Construction*, second edition, Prentice Hall, 1296 pages, January 1997. Translations: Spanish, French Russian, Serbian, Japanese. (Not a publication on education per se but cited here since it is a textbook that has been widely used for teaching and has many comments on pedagogy.)

23. Bertrand Meyer: *The Choice for Introductory Software Education*, Guest editorial in Journal of Object-Oriented Programming, vol. 7, no. 3, June 1994, page 8. (A discussion of the use of Eiffel for teaching software engineering topics.)

24. Bertrand Meyer, *Towards an Object-Oriented Curriculum*, in Journal of Object-Oriented Programming, vo. 6, number 2, May 1993, pages 76-81. (Journal version of paper cited next.) Available here.

25. Bertrand Meyer: *Towards an Object-Oriented Curriculum*, in TOOLS 11, Technology of Object-Oriented Languages and Systems, Santa Barbara, August 1993, eds. Raimund Ege, Madhu Singh and B. Meyer, Prentice Hall 1993, pages 585-594. (Early advocacy for using OO techniques in teaching programming – while I was not in academia. Much of my subsequent educational work relied on those ideas.) Available here.

26. Bertrand Meyer: Object-Oriented Software Construction, Prentice Hall, 592 pages, 1988. (First edition, translated into German, Italian, French, Dutch, Romanian, Chinese. As noted for second edition above, not about education per se, but widely used textbook with pedagogical implications.)

27. I*nitiation à la programmation en milieu industriel* (Teaching Modern Programming Methodology in an Industrial Environment), in RAIRO, série bleue (informatique), vol. 11, no. 1, pages 21-34 1977. (Early paper on teaching advanced programming techniques in industry.) Available here.

28. Claude Kaiser, Bertrand Meyer and Etienne Pichat, *L’Enseignement de la Programmation à l’IIE* (Teaching Programming at the IIE engineering school), in Zéro-Un Informatique, 1977. (A paper on my first teaching experience barely out of school myself.) Available here.

Some of the folk wisdom going around in software engineering, often cluessly repeated for decades, is just wrong. It can be particularly damaging when it affects key aspects of software development and is contradicted by solid scientific evidence. The present discussion covers a question that meets both of these conditions: whether it makes sense to add staff to a project to shorten its delivery time.

My aim is to popularize a result that is well known in the software engineering literature, going back to the early work of Barry Boehm [1], and explained with great clarity by Steve McConnell in his 2006 book on software cost estimation [2] under the name “Shortest Possible Schedule”. While an empirical rather than a logical result, I believe it deserves to be called a theorem (McConnell stays shy of using the term) because it is as close as we have in the area of software engineering management to a universal property, confirmed by numerous experimental studies.

This article contributes no new concept since McConnell’s chapter 20 says all there is to say about the topic; my aim is simply to make the Shortest Possible Schedule Theorem better known, in particular to practitioners.

The myth about shortening project times begins with an observation that is clearly correct, at least in an extreme form. Everyone understands that if our project has been evaluated, through accepted cost estimation techniques, to require three developers over a year we cannot magically hire 36 people to complete it in one month. Productivity does not always scale up.

But neither does common sense. Too often the conclusion from the preceding trival observation takes the form of an old saw, “Brooks’ Law”: adding people to a late project delays it further. The explanation is that the newcomers cost more through communication overhead than they bring through actual contributions. While a few other sayings of Brooks’ *Mythical Man-Month* have stood the test of time, this one has always struck me as describing, rather than any actual law, a definition of bad management. Of course if you keep haplessly throwing people at deadlines you are just going to add communication problems and make things worse. But if you are a competent manager expanding the team size is one of the tools at your disposal to improve the state of a project, and it would be foolish to deprive yourself of it. A definitive refutation of the supposed law, also by McConnell, was published 20 years ago [3].

For all the criticism it deserves, Brooks’s pronouncement was at least limited in its scope: it addressed addition of staff to a project that is already late. It is even wronger to apply it to the more general issue of cost-estimating and staffing software projects, at any stage of their progress. Forty-year-old platitudes have even less weight here. As McConnell’s book shows, cost estimation is no longer a black art. It is not an exact science either, but techniques exist for producing solid estimates.

The Shortest Possible Schedule theorem is one of the most interesting results. Much more interesting than Brooks’s purported law, because it is backed by empirical studies (rather than asking us to believe one person’s pithy pronouncement), and instead of just a general negative view it provides a positive result complemented by a limitation of that result; and both are expressed quantitatively.

Figure 1 gives the general idea of the SPS theorem. General idea only; Figure 2 will provide a more precise view.

*Figure 1: General view of the Shortest Possible Schedule theorem.*

The “nominal project” is the result of a cost and schedule estimation yielding the optimum point. The figure and the theorem provide project managers with both a reason to rejoice and a reason to despair:

- Rejoice: by putting in more money, i.e. more people (in software engineering, project costs are essentially people costs [4]), you can bring the code to fruition faster.
- Despair: whatever you do, there is a firm limit to the time you can gain: 25%. It seems to be a kind of universal constant of software engineering.

The “despair” part typically gets the most attention at first, since it sets an absolute value on how much money can buy (so to speak) in software: try as hard as you like, you will never get below 75% of the nominal (optimal) value. The “impossible zone” in Figure 1 expresses the fundamental limitation. This negative result is the reasoned and precise modern replacement for the older folk “law”.

The positive part, however, is just as important. A 75%-empty glass is also 25%-full. It may be disappointing for a project manager to realize that no amount of extra manpower will make it possible to guarantee to higher management more than a 25% reduction in time. But it is just as important to know that such a reduction, not at all insignificant, is in fact reachable given the right funding, the right people, the right tools and the right management skills. The last point is critical: money by itself does not suffice, you need management; Brooks’ law, as noted, is mostly an observation of the effects of *bad* management.

Figure 1 only carries the essential idea, and is not meant to provide precise numerical values. Figure 2, the original figure from McConnell’s book, is. It plots effort against time rather than the reverse but, more importantly, it shows several curves, each corresponding to a published empirical study or cost model surveyed by the book.

*Figure 2: Original illustration of the Shortest Possible Schedule
(figure 2-20 of [3], reproduced with the author’s permission)*

On the left of the nominal point, the curves show how, according to each study, increased cost leads to decreased time. They differ on the details: how much the project needs to spend, and which maximal reduction it can achieve. But they all agree on the basic Shortest Possible Schedule result: spending *can *decrease time, and the maximal reduction will not exceed 25%.

The figure also provides an answer, although a disappointing one, to another question that arises naturally. So far this discussion has assumed that time was the critical resource and that we were prepared to spend more to get a product out sooner. But sometimes it is the other way around: the critical resource is cost, or, concretely, the number of developers. Assume that nominal analysis tells us that the project will take four developers for a year and, correspondingly, cost 600K (choose your currency). We only have a budget of 400K. Can we spend less by hiring fewer developers, accepting that it will take longer?

On that side, right of the nominal point in Figure 2, McConnell’s survey of surveys shows no consensus. Some studies and models do lead to decreased costs, others suggest that with the increase in time the cost will actually increase too. (Here is my interpretation, based on my experience rather than on any systematic study: you can indeed achieve the original goal with a somewhat smaller team over a longer period; but the effect on the final cost can vary. If the new time is t’= t + T and the new team size s’= s – S, t and s being the nominal values, the cost difference is proportional to Ts – t’S. It can be positive as well as negative depending on the values of the original t and s and the precise effect of reduced team size on project duration.)

The firm result, however, is the left part of the figure. The Shortest Possible Schedule theorem confirms what good project managers know: you can, within limits, shorten delivery times by bringing all hands on deck. The precise version deserves to be widely known.

[1] Barry W. Boehm: *Software Engineering Economics*, Prentice Hall, 1981.

[2] Steve McConnell: *Software Estimation ― Demystifying the Black Art*, Microsoft Press, 2006.

[3] Steve McConnell: *Brooks’ Law Repealed*, in *IEEE Software*, vol. 16, no. 6, pp. 6–8, November-December 1999, available here.

[4] This is the accepted view, even though one might wish that the industry paid more attention to investment in tools in addition to people.

A version of this article was first published on the Comm. ACM blog under the title *The Shortest Possible Schedule Theorem: Yes, You Can Throw Money at Software Deadlines*

Over the past few days I have come across several people who told me they want to attend the Frontiers In Software Engineering Education (FISEE) workshop in Villebrumier, 11-13 November, but have not registered yet. If that’s your case please register right now because:

- The number of spots is limited (it’s a residential event, everyone is hosted onsite, and there is a set number of rooms).
- We need a preliminary program. The format of the event is flexible, Springer LNCS proceedings come after the meeting, we make room for impromptu presentations and discussions, but still we need a basic framework and we need to finalize it now.

So please go ahead and fill in the registration form.

From the previous posting about FISEE:

The next event at the LASER center in Villebrumier (Toulouse area, Southwest France) is FISEE, Frontiers in Software Engineering Education, see the web site. This small-scale workshop, 11 to 13 November is devoted to what Software Engineering needs, what should be changed, and how new and traditional institutions can adapt to the fast pace of technology.

Workshops at the Villebrumier center favor a friendly, informal and productive interaction between participants, who are all hosted on site. There are no formal submissions, but post-event proceedings will be published as part of the LASER sub-series of Springer Lecture Notes in Computer Science.

Like other events there, FISEE is by invitation; if you are active in the field of software engineering education as an educator, as a potential employer of software engineering graduates, or as a researcher, you can request an invitation by writing to me or one of the other organizers. Attendance is limited to 15-20 participants.

Among already scheduled talks: a keynote by Alexander Tormasov, rector of Innopolis University, and a talk by me on “the 15 concepts of software engineering”.

The next event at the LASER center in Villebrumier (Toulouse area, Southwest France) is FISEE, Frontiers in Software Engineering Education, see the web site. This small-scale workshop, 11 to 13 November is devoted to what Software Engineering needs, what should be changed, and how new and traditional institutions can adapt to the fast pace of technology.

Workshops at the Villebrumier center favor a friendly, informal and productive interaction between participants, who are all hosted on site. There are no formal submissions, but post-event proceedings will be published as part of the LASER sub-series of Springer Lecture Notes in Computer Science.

Like other events there, FISEE is by invitation; if you are active in the field of software engineering education as an educator, as a potential employer of software engineering graduates, or as a researcher, you can request an invitation by writing to me or one of the other organizers. Attendance is limited to 15-20 participants.

Once you have learned the benefits of formally expressing requirements, you keep noticing potential ambiguities and other deficiencies [1] in everyday language. Most such cases are only worth a passing smile, but here’s one that perhaps can serve to illustrate a point with business analysts in your next requirements engineering workshop or with students in your next software engineering lecture.

As a customer of the Swiss telecommunications company Sunrise I receive an occasional “news” email. (As a customer of the Swiss telecommunications company Sunrise I would actually prefer that they spend my money improving bandwidth, but let us not digress.) Rather than raw marketing messages these are tips for everyday life, with the presumed intent of ingratiating the populace. For example, today’s message helpfully advises me on how to move house. The admirable advice starts (my translation):

10.7% of all Swiss people relocate every year. Is that your case too for next Autumn?

Actually no, it’s not my case (neither a case of being one of the “*Swiss people*” nor a case of intending to relocate this Fall). And, ah, the beauty of ridiculously precise statistics! Not 10.8% or 10.6%, mind you, no, 10.7% exactly! But consider the first sentence and think of something similar appearing in a requirements document or user story. Something similar does appear in such documents, all the time, leading to confusions for the programmers interpreting them and to bugs in the resulting systems. Those restless Swiss! Did you know that they include an itchy group, exactly 922,046 people (I will not be out-significant-digited!), who relocate every year?

Do not be silly, I hear you saying. What Sunrise is sharing of its wisdom is that every year a tenth of the Swiss population moves, but not the same tenth every year. Well, OK, maybe I am being silly. But if you think of a programmer reading such a statement about some unfamiliar domain (not one about which we can rely on common sense), the risk of confusion and consequent bugs is serious.

As [1] illustrated in detail, staying within the boundaries of natural language to resolve such possible ambiguities only results in convoluted requirements that make matters worse. The only practical way out is, for delicate system properties, to use precise language, also known technically as “mathematics”.

Here for example a precise formulation of the two possible interpretations removes any doubt. Let *Swiss* denote the set of Swiss people and *E* the number of elements (cardinal) of a finite set *E*, which we can apply to the example because the set of Swiss people is indeed finite. Let us define *slice* as the Sunrise-official number of Swiss people relocating yearly, i.e. *slice* = *Swiss* ∗ 0.107 (the actual value appeared above). Then one interpretation of the fascinating Sunrise-official fact is:

{s:

Swiss| (∀y:Year| s.is_moving (y))} =slice

In words: the cardinal of the set of Swiss people who move every year (i.e., such that for every year y they move during y) is equal to the size of the asserted population subset.

The other possible interpretation, the one we suspect would be officially preferred by the Sunrise powers (any formal-methods fan from Sunrise marketing reading this, please confirm or deny!), is:

∀y: Year | {s:

Swiss| s.is_moving (y)} =slice

In words: for any year y, the cardinal of the set of Swiss people who move during y is equal to the size of the asserted subset.

This example is typical of where and why we need mathematics in software requirements. No absolutist stance here, no decree that *everything* become formal (mathematical). Natural language is not going into retirement any time soon. But whenever one spots a possible ambiguity or imprecision, the immediate reaction should always be to express the concepts mathematically.

To anyone who has had a successful exposure to formal methods this reaction is automatic. But I keep getting astounded not only by the total lack of awareness of these simple ideas among the overwhelming majority of software professionals, but also by their absence from the standard curriculum of even top universities. Most students graduate in computer science without ever having heard such a discussion. Where a formal methods course does exist, it is generally as a specialized topic reserved for a small minority, disconnected (as Leslie Lamport has observed [2]) from the standard teaching of programming and software engineering.

In fact *all* software engineers should possess the ability to go formal when and where needed. That skill is not hard to learn and should be practiced as part of the standard curriculum. Otherwise we keep training the equivalent of electricians rather than electrical engineers, programmers keep making damaging mistakes from misunderstanding ambiguous or inconsistent requirements, and we all keep suffering from buggy programs.

[1] Self-citation appropriate here: Bertrand Meyer: *On Formalism in Specifications*, IEEE Software, vol. 3, no. 1, January 1985, pages 6-25, available here.

[2] Leslie Lamport: *The Future of Computing: Logic or Biology*, text of a talk given at Christian Albrechts University, Kiel on 11 July 2003, available here.

The lecture schedule has now been posted for the 2019 LASER summer school on artificial intelligence, machine learning and software engineering. The speakers are Shai Ben-David (Waterloo), Lionel Briand (Luxembourg), Pascal Fua (EPFL), Erik Meijer (Facebook), Tim Menzies (NC State) and I.

The last deadline for registration is May 20.

The school takes place June 1-9 in the magnificent Hotel del Golfo in Elba Island, Italy.

All details at www.laser-foundation.org/school/2019.

Over breakfast at your hotel you read an article berating banks about the fraudulent credit card transactions they let through. You proceed to check out and bang! Your credit card is rejected because (as you find out later) the bank thought [1] it couldn’t possibly be you in that exotic place. Ah, those banks! They accept too much. Ah, those banks! They reject too much. Finding the right balance is a case of *soundness *versus *precision*.

Similar notions are essential to the design of tools for program analysis, looking for such suspicious cases as dead code (program parts that will never be executed). An analysis can be sound, or not; it can be complete, or not.

These widely used concepts are sometimes misunderstood. The first answer I get when innocently asking people whether the concepts are clear is yes, of course, everyone knows! Then, as I bring up such examples as credit card rejection or dead code detection, assurance quickly yields to confusion. One sign that things are not going well is when people start throwing in terms like “true positive” and “false negative”. By then any prospect of reaching a clear conclusion has vanished. I hope that after reading this article you will never again (in a program analysis context) be tempted to use them.

Now the basic idea is simple. An analysis is *sound *if it reports all errors, and *complete *if it only reports errors. If not complete, it is all the more *precise *that it reports fewer non-errors.

You can stop here and not be too far off [2]. But a more nuanced and precise discussion helps.

As an example of common confusion, one often encounters attempts to help through something like Figure 1, which cannot be right since it implies that all sound methods are complete. (We’ll have better pictures below.)

Figure 1: Naïve (and wrong) illustration

Perhaps this example can be dismissed as just a bad use of illustrations [3] but consider the example of looking for dead code. If the analysis wrongly determines that some reachable code is unreachable, is it unsound or incomplete?

With this statement of the question, the only answer is: *it depends!*

It depends on the analyzer’s mandate:

- If it is a code checker that alerts programmers to cases of bad programming style, it is incomplete: it reports as an error a case that is not. (Reporting that unreachable code is reachable would cause unsoundness, by missing a case that it should have reported.)
- If it is the dead-code-removal algorithm of an optimizing compiler, which will remove unreachable code, it is unsound: the compiler will remove code that it should not. (Reporting that unreachable code is reachable would cause incompleteness, by depriving the compiler of an optimization.)

As another example, consider an analyzer that finds out whether a program will terminate. (If you are thinking “*but that can’t be done!*“, see the section “Appendix: about termination” at the very end of this article.) If it says a program does not terminates when in fact it does, is it unsound or incomplete?

Again, that depends on what the analyzer seeks to establish. If it is about the correctness of a plain input-to-output program (a program that produces results and then is done), we get incompleteness: the analyzer wrongly flags a program that is actually OK. But if it is about verifying that continuously running programs, such as the control system for a factory, will not stop (“liveness”), then the analyzer is unsound.

Examples are not limited to program analysis. A fraud-indentification process that occasionally rejects a legitimate credit card purchase is, from the viewpoint of preserving the bank from fraudulent purchases, incomplete. From the viewpoint of the customer who understands a credit card as an instrument enabling payments as long as you have sufficient credit, it is unsound.

These examples suffice to show that there cannot be absolute definitions of soundness and precision: the determination depends on which version of a boolean property we consider **desirable**. This decision is human and subjective. Dead code is desirable for the optimizing compiler and undesirable (we will say it is a **violation**) for the style checker. Termination is desirable for input-output programs and a violation for continuously running programs.

Once we have decided which cases are desirable and which are violations, we can define the concepts without any ambiguity: soundness means rejecting all violations, and completeness means accepting all desirables.

While this definition is in line with the unpretentious, informal one in the introduction, it makes two critical aspects explicit:

*Relativity*. Everything depends on an explicit decision of what is desirable and what is a violation. Do you want customers always to be able to use their credit cards for legitimate purchases, or do you want to detect all frauds attempts?*Duality*. If you reverse the definitions of desirable and violation (they are the negation of each other), you automatically reverse the concepts of soundness and completeness and the associated properties.

We will now explore the consequences of these observations.

For all sufficiently interesting problems, theoretical limits (known as Rice’s theorem) ensure that it is impossible to obtain both soundness and completeness.

But it is not good enough to say “we must be ready to renounce either soundness or completeness”. After all, it is very easy to obtain soundness if we forsake completeness: *reject every case*. A termination-enforcement analyzer can reject every program as potentially non-terminating. A bank that is concerned with fraud can reject every transaction (this seems to be my bank’s approach when I am traveling) as potentially fraudulent. Dually, it is easy to ensure completeness if we just sacrifice soundness: *accept every case*.

These extreme theoretical solutions are useless in practice; here we need to temper the theory with considerations of an engineering nature.

The practical situation is not as symmetric as the concept of duality theoretically suggests. If we have to sacrifice one of the two goals, it is generally better to accept some incompleteness: getting false alarms (spurious reports about cases that turn out to be harmless) is less damaging than missing errors. Soundness, in other words, is essential.

Even on the soundness side, though, practice tempers principle. We have to take into account the engineering reality of how tools get produced. Take a program analyzer. In principle it should cover the entire programming language. In practice, it will be built step by step: initially, it may not handle advanced features such as exceptions, or dynamic mechanisms such as reflection (a particularly hard nut to crack). So we may have to trade soundness for what has been called “*soundiness*” [4], meaning soundness outside of cases that the technology cannot handle yet.

If practical considerations lead us to more tolerance on the soundness side, on the completeness side they drag us (duality strikes again) in the opposite direction. Authors of analysis tools have much less flexibility than the theory would suggest. Actually, close to none. In principle, as noted, false alarms do not cause catastrophes, as missed violations do; but in practice they can be almost as bad. Anyone who has ever worked on or with a static analyzer, going back to the venerable Lint analyzer for C, knows the golden rule: *false alarms kill an analyzer*. When people discover the tool and run it for the first time, they are thrilled to discover how it spots some harmful pattern in their program. What counts is what happens in subsequent runs. If the useful gems among the analyzer’s diagnostics are lost in a flood of irrelevant warnings, forget about the tool. People just do not have the patience to sift through the results. In practice any analysis tool has to be darn close to completeness if it has to stand any chance of adoption.

Completeness, the absence of false alarms, is an all-or-nothing property. Since in the general case we cannot achieve it if we also want soundness, the engineering approach suggests using a numerical rather than boolean criterion: *precision*. We may define the precision pr as 1 – im where im is the ** im**precision: the proportion of false alarms.

The theory of classification defines precision differently: as pr = tp / (tp + fp), where tp is the number of false positives and fp the number of true positives. (Then im would be fp / (tp + fp).) We will come back to this definition, which requires some tuning for program analyzers.

From classification theory also comes the notion of *recall*: tp / (tp + fn) where fn is the number of false negatives. In the kind of application that we are looking at, recall corresponds to soundness, taken not as a boolean property (“*is my program sound?*“) but a quantitative one (“*how sound is my program?*“). The degree of *unsoundness *un would then be fn / (tp + fn).

With the benefit of the preceding definitions, we can illustrate the concepts, correctly this time. Figure 2 shows two different divisions of the set of U of call cases (universe):

- Some cases are desirable (D) and others are violations (V).
- We would like to know which are which, but we have no way of finding out the exact answer, so instead we run an analysis which passes some cases (P) and rejects some others (R).

Figure 2: All cases, classified

The first classification, left versus right columns in Figure 2, is how things are (the reality). The second classification, top versus bottom rows, is how we try to assess them. Then we get four possible categories:

- In two categories, marked in green, assessment hits reality on the nail: accepted desirables (A), rightly passed, and caught violations (C), rightly rejected.
- In the other two, marked in red, the assessment is off the mark: missed violations (M), wrongly passed; and false alarms (F), wrongly accepted.

The following properties hold, where U (Universe) is the set of all cases and ⊕ is disjoint union [5]:

— Properties applicable to all cases:

U = D ⊕ V

U = P ⊕ R

D = A ⊕ F

V = C ⊕ M

P = A ⊕ M

R = C ⊕ F

U = A ⊕M ⊕ F ⊕ C

We also see how to define the **precision **pr: as the proportion of actual violations to reported violations, that is, the size of C relative to R. With the convention that u is the size of U and so on, then pr = c / r, that is to say:

- pr = c / (c + f) — Precision
- im = f / (c + f) — Imprecision

We can similarly define soundness in its quantitative variant (recall):

- so = a / (a + m) — Soundness (quantitative)
- un = m / (a + m) — Unsoundness

These properties reflect the full duality of soundness and completeness. If we reverse our (subjective) criterion of what makes a case desirable or a violation, everything else gets swapped too, as follows:

Figure 3: Duality

We will say that properties paired this way “dual” each other [6].

It is just as important (perhaps as a symptom that things are not as obvious as sometimes assumed) to note which properties do *not* dual. The most important examples are the concepts of “true” and “false” as used in “true positive” etc. These expressions are all the more confusing that the concepts of True and False do dual each other in the standard duality of Boolean algebra (where True duals False, Or duals And, and an expression duals its negation). In “true positive” or “false negative”, “true” and “false” do not mean True and False: they mean cases in which (see figure 2 again) the assessment respectively *matches *or does not match the reality. Under duality we reverse the criteria in both the reality and the assessment; but matching remains matching! The green areas remain green and the red areas remain red.

The dual of positive is negative, but the dual of true is true and the dual of false is false (in the sense in which those terms are used here: matching or not). So the dual of true positive is true negative, not false negative, and so on. Hereby lies the source of the endless confusions.

The terminology of this article removes these confusions. Desirable duals violation, passed duals rejected, the green areas dual each other and the red areas dual each other.

If we define an ideal world as one in which assessment matches reality [7], then figure 2 would simplify to just two possibilities, the green areas:

Figure 4: Perfect analysis (sound and complete)

This scheme has the following properties:

— Properties of a perfect (sound and complete) analysis as in Figure 4:

M = ∅ — No missed violations

F = ∅ — No false alarms

P = D — Identify desirables exactly

R = V –Identify violations exactly

As we have seen, however, the perfect analysis is usually impossible. We can choose to build a sound solution, potentially incomplete:

Figure 5: Sound desirability analysis, not complete

In this case:

— Properties of a sound analysis (not necessarily complete) as in Figure 5:

M = ∅ — No missed violations

P = A — Accept only desirables

V = C — Catch all violations

P ⊆ D — Under-approximate desirables

R ⊇ V — Over-approximate violations

Note the last two properties. In the perfect solution, the properties P = D and R = V mean that the assessment, yielding P and V, exactly matches the reality, D and V. From now on we settle for assessments that *approximate* the sets of interest: under-approximations, where the assessment is guaranteed to compute no more than the reality, and over-approximations, where it computes no less. In all cases the assessed sets are either subsets or supersets of their counterparts. (Non-strict, i.e. ⊆ and ⊇ rather than ⊂ and ⊃; “approximation” means *possible* approximation. We may on occasion be lucky and capture reality exactly.)

We can go dual and reach for completeness at the price of possible unsoundness:

Figure 6: Complete desirability analysis, not sound

The properties are dualled too:

— Properties of a complete analysis (not necessarily sound), as in Figure 6:

F = ∅ — No false alarms

R = C — Reject only violations

D = A — Accept all desirables

P ⊇ D — Over-approximate desirables

R ⊆ V — Under-approximate violations

We saw above why the terms “true positives”, “false negatives” etc., which do not cause any qualms in classification theory, are deceptive when applied to the kind of pass/fail analysis (desirables versus violations) of interest here. The definition of precision provides further evidence of the damage. Figure 7 takes us back to the general case of Figure 2 (for analysis that is guaranteed neither sound nor complete) but adds these terms to the respective categories.

Figure 7: Desirability analysis (same as fig. 2 with added labeling)

The analyzer checks for a certain desirable property, so if it wrongly reports a violation (F) that is a false negative, and if it misses a violation (M) it is a false positive. In the definition from classification theory (section 2, with abbreviations standing for True/False Positives/Negatives): TP = A, FP = M, FN = F, TN = C, and similarly for the set sizes: tp = a, fp = m, fn = f, tn = c.

The definition of precision from classification theory was pr = tp / (tp + fp), which here gives a / (a + m). This cannot be right! Precision has to do with how close the analysis is to completeness, that is to day, catching all violations.

Is classification theory wrong? Of course not. It is simply that, just as Alice stepped on the wrong side of the mirror, we stepped on the wrong side of duality. Figures 2 and 7 describe *desirability analysis*: checking that a tool does something good. We assess non-fraud from the bank’s viewpoint, not the stranded customer’s; termination of input-to-output programs, not continuously running ones; code reachability for a static checker, not an optimizing compiler. Then, as seen in section 3, a / (a + m) describes not precision but soundness (in its quantitative interpretation, the parameter called “so” above).

To restore the link with classification theory , we simply have to go dual and take the viewpoint of *violation analysis*. If we are looking for possible violations, the picture looks like this:

Figure 8: Violation analysis (same as fig. 7 with different positive/negative labeling)

Then everything falls into place: tp = c, fp = f, fn = m, tn = a, and the classical definition of precision as pr = tp / (tp + fp) yields c / (c + f) as we are entitled to expect.

In truth there should have been no confusion since we always have the same picture, going back to Figure 2, which accurately covers all cases and supports both interpretations: desirability analysis and violation analysis. The confusion, as noted, comes from using the duality-resistant “true”/”false” opposition.

To avoid such needless confusion, we should use the four categories of the present discussion: accepted desirables, false alarms, caught violations and missed violations [8]. Figure 2 and its variants clearly show the duality, given explicitly in Figure 3, and sustains interpretations both for desirability analysis and for violation analysis. Soundness and completeness are simply special cases of the general framework, obtained by ruling out one of the cases of incorrect analysis in each of Figures 4 and 5. The set-theoretical properties listed after Figure 2 express the key concepts and remain applicable in all variants. Precision c / (c + f) and quantitative soundness a / (a + m) have unambiguous definitions matching intuition.

The discussion is, I hope, sound. I have tried to make it complete. Well, at least it is precise.

[1] Actually it’s not your bank that “thinks” so but its wonderful new “Artificial Intelligence” program. **⤣**

[2] For a discussion of these concepts as used in testing see Mauro Pezzè and Michal Young, *Software Testing and Analysis: Process, Principles and Techniques*, Wiley, 2008. **⤣**

[3] Edward E. Tufte: *The Visual Display of Quantitative Information*, 2nd edition, Graphics Press, 2001.**⤣**

[4] Michael Hicks,*What is soundness (in static analysis)?*, blog article available here, October 2017. **⤣**

[5] The disjoint union property X = Y ⊕ Z means that Y ∩ Z = ∅ (Y and Z are disjoint) and X = Y ∪ Z (together, they yield X). **⤣**

[6] I thought this article would mark the introduction into the English language of “*dual*” as a verb, but no, it already exists in the sense of turning a road from one-lane to two-lane (dual). **⤣**

[7] As immortalized in a toast from the cult movie *The Prisoner of the Caucasus*: “*My great-grandfather says: I have the desire to buy a house, but I do not have the possibility. I have the possibility to buy a goat, but I do not have the desire. So let us drink to the matching of our desires with our possibilities.*” See 6:52 in the version with English subtitles. **⤣**

[8] To be fully consistent we should replace the term “false alarm” by *rejected desirable*. I is have retained it because it is so well established and, with the rest of the terminology as presented, does not cause confusion. **⤣**

[9] Byron Cook, Andreas Podelski, Andrey Rybalchenko: *Proving Program Termination*, in *Communications of the ACM*, May 2011, Vol. 54 No. 5, Pages 88-98. **⤣**

This reflection arose from ongoing work on static analysis of OO structures, when I needed to write formal proofs of soundness and completeness and found that the definitions of these concepts are more subtle than commonly assumed. I almost renounced writing the present article when I saw Michael Hicks’s contribution [4]; it is illuminating, but I felt there was still something to add. For example, Hicks’s set-based illustration is correct but still in my opinion too complex; I believe that the simple 2 x 2 pictures used above convey the ideas more clearly. On substance, his presentation and others that I have seen do not explicitly mention duality, which in my view is the key concept at work here.

I am grateful to Carlo Ghezzi for enlightening discussions, and benefited from comments by Alexandr Naumchev and others from the Software Engineering Laboratory at Innopolis University.

With apologies to readers who have known all of the following from kindergarten: a statement such as (section 1): “*consider an analyzer that finds out whether a program will terminate*” can elicit no particular reaction (the enviable bliss of ignorance) or the shocked rejoinder that such an analyzer is impossible because termination (the “halting” problem) is undecidable. This reaction is just as incorrect as the first. The undecidability result for the halting problem says that it is impossible to write a general termination analyzer that will always provide the right answer, in the sense of both soundness and completeness, for any program in a realistic programming language. But that does not preclude writing termination analyzers that answer the question correctly, in finite time, for given programs. After all it is not hard to write an analyzer that will tell us that the program **from **do_nothing **until True** **loop** do_nothing **end **will terminate and that the program **from **do_nothing **until False loop** do_nothing **end **will not terminate. In the practice of software verification today, analyzers can give such sound answers for very large classes of programs, particularly with some help from programmers who can obligingly provide *variants* (loop variants, recursion variants). For a look into the state of the art on termination, see the beautiful survey by Cook, Podelski and Rybalchenko [9].

* Communications of the ACM* blog

The DEVOPS 2019 workshop (6-8 May 2019) follows a first 2018 workshop whose proceedings [1] have just been published in the special LASER-Villebrumier subseries of Springer Lecture notes in Computer Science. It is devoted to *software engineering aspects of continuous development and new paradigms of software production and deployment*, including but not limited to DevOps.

The keynote will be delivered by Gail Murphy, vice-president Research & Innovation at University of British Columbia and one of leaders in the field of empirical software engineering.

The workshop is held at the LASER conference center in Villebrumier near Toulouse. It is by invitation; if you would like to receive an invitation please contact one of the organizers (Jean-Michel Bruel, Manuel Mazzara and me) with a short description of your interest in the field.

Jean-Michel Bruel, Manuel Mazzara and Bertrand Meyer (eds.), *Software Engineering Aspects of Continuous Development and New Paradigms of Software Production and Deployment*, First International Workshop, DEVOPS 2018, Chateau de Villebrumier, France, March 5-6, 2018, Revised Selected Papers, see here..