Archive for the ‘Design by Contract’ Category.

Programming on the cloud?

I am blogging live from the “Cloud Futures” conference organized by Microsoft in Redmond [1]. We had two excellent keynotes today, by Ed Lazowska [2] and David Patterson.

Lazowska emphasized the emergence of a new kind of science — eScience — based on analysis of enormous amounts of data. His key point was that this approach is a radical departure from “computational science” as we know it, based mostly on large simulations. With the eScience paradigm, the challenge is to handle the zillions of bytes of data that are available, often through continuous streams, in such fields as astronomy, oceanography or biology. It is unthinkable in his view to process such data through super-computing architectures specific to an institution; the Cloud is the only solution. One of the reasons (developed more explicitly in Patterson’s talk) is that cloud computing supports scaling down as well as scaling up. If your site experiences sudden bursts of popularity — say you get slashdotted — followed by downturns, you just cannot size the hardware right.

Lazowska also noted that it is impossible to convince your average university president that the cloud is the way to go, as he will get his advice from the science-by-simulation types. I don’t know who the president is at U. of Washington, but I wonder if the comment would apply to Stanford?

The overall argument for cloud computing is compelling. Of course the history of IT is a succession of swings of the pendulum between centralization and delocalization: mainframes, minis, PCs, client-server, “thin clients”, “The Network Is The Computer” (Sun’s slogan in the late eighties), smart clients, Web services and so on. But this latest swing seems destined to define much of the direction of computing for a while.

Interestingly, no speaker so far has addressed issues of how to program reliably for the cloud, even though cloud computing seems only to add orders of magnitude to the classical opportunities for messing up. Eiffel and contracts have a major role to play here.

More generally the opportunity to improve quality should not be lost. There is a widespread feeling (I don’t know of any systematic studies) that a non-negligible share of results generated by computational science are just bogus, the product of old Fortran programs built by generations of graduate students with little understanding of software principles. At the very least, moving to cloud computing should encourage the use of 21st-century tools, languages and methods. Availability on the cloud should also enhance a critical property of good scientific research: reproducibility.

Software engineering is remarkably absent from the list of scientific application areas that speaker after speaker listed for cloud computing. Maybe software engineering researchers are timid, and do not think of themselves as deserving large computing resources; consider, however, all the potential applications, for example in program verification and empirical software engineering. The cloud is a big part of our own research in verification; in particular the automated testing paradigm pioneered by AutoTest [3] fits ideally with the cloud and we are actively working in this direction.

Lazowska mentioned that development environments are the ultimate application of cloud computing. Martin Nordio at ETH has developed, with the help of Le Minh Duc, a Master’s student at Hanoi University of Technology, a cloud-based version of EiffelStudio: CloudStudio, which I will present in my talk at the conference tomorrow. I’ll write more about it in later posts; just one note for the moment: no one should ever be forced again to update or commit.

References

[1] Program of the Cloud Futures conference.

[2] Keynote by Ed Lazowska. You can see his slides here.

[3] Bertrand Meyer, Arno Fiva, Ilinca Ciupa, Andreas Leitner, Yi Wei, Emmanuel Stapf: Programs That Test Themselves. IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009; online version here.


Reflexivity, and other pillars of civilization

Let me start, dear reader of this blog, by probing your view of equality, and also of assignment. Two questions:  

  • Is a value x always equal to itself? (For the seasoned programmers in the audience: I am talking about a value, as in mathematics, not an expression, as in programming, which could have a side effect.)
  • In programming, if we consider an assignment

       x := y

and v is the value of y before that assignment (again, this little detour is to avoid bothering with side effects), is the value of x always equal to v after the assignment?  

Maybe I should include here one of these Web polls that one finds on newspaper sites, so that you can vote and compare your answer to the Wisdom of Crowds. My own vote is clear: yes to both. Equality is reflexive (every value is equal to itself, at any longitude and temperature, no excuses and no exceptions); and the purpose of assignment is to make the value of the target equal to the value of the source. Such properties are some of the last ramparts of civilization. If they go away, what else is left?  

754 enters the picture

Now come floating-point numbers and the famous IEEE “754” floating-point standard [1]. Because not all floating point operations yield a result usable as a floating-point number, the standard introduces a notion of “NaN”, Not a Number; certain operations involving floating-point numbers may yield a NaN. The term NaN does not denote a single value but a set of values, distinguished by their “payloads”.  

Now assume that the value of x is a NaN. If you use a programming language that supports IEEE 754 (as they all do, I think, today) the test in  

        if x = x then …  

is supposed to yield False. Yes, that is specified in the standard: NaN is never equal to NaN (even with the same payload); nor is it equal to anything else; the result of an equality comparison involving NaN will always be False.  

Assignment behavior is consistent with this: if y (a variable, or an expression with no side effect) has a NaN value, then after  

        x := y  

the test x = y will yield False.
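
To make this concrete, here is a minimal sketch of a demonstration routine, assuming a REAL_64 type with IEEE 754 semantics and that the division 0.0 / 0.0 yields a NaN:

        nan_breaks_reflexivity
                -- Show that a NaN compares unequal to everything, itself included.
            local
                x, y: REAL_64
            do
                y := 0.0 / 0.0        -- A NaN (assumption: IEEE 754 semantics).
                x := y
                print (x = x)         -- Prints False: x is not equal to itself.
                print ("%N")
                print (x = y)         -- Prints False, immediately after x := y.
            end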

Before commenting further let me note the obvious: I am by no means a numerics expert; I know that IEEE 754 was a tremendous advance, and that it was designed by some of the best minds in the field, headed by Velvel Kahan who received a Turing Award in part for that success. So it is quite possible that I am about to bury myself in piles of ridicule. On the other hand I have also learned that (1) ridicule does not kill, so I am game; and more importantly (2) experts are often right but not always, and it is always proper to question their reasoning. So without fear let me not stop at the arguments that “the committee must have voted on this point and they obviously knew what they were doing” and “it is the standard and implemented on zillions of machines, you cannot change it now”. Both are true enough, but not an excuse to censor questions.  

What are the criteria?

The question is: compatibility with an existing computer standard is great, but what about compatibility with a few hundred years of mathematics? Reflexivity of equality  is something that we expect for any data type, and it seems hard to justify that a value is not equal to itself. As to assignment, what good can it be if it does not make the target equal to the source value?  

The question of assignment is particularly vivid in Eiffel because we express the expected abstract properties of programs in the form of contracts. For example, the following “setter” procedure may have a postcondition (expressed by the ensure clause):  

        set_x (v: REAL)
                        -- Set the value of x (an attribute, also of type REAL) to the value of v.
                do
                        …
                        x := v  
                ensure
                        x = v
                end  

   
If you call this procedure with a NaN argument for a compiler that applies IEEE 754 semantics, and monitor contracts at run time for testing and debugging, the execution will report a contract violation. This is very difficult for a programmer to accept.  

A typical example arises when you have an assignment to an item of an array of REAL values. Assume you are executing a [i] := x. In an object-oriented view of the world (as in Eiffel), this is considered simplified syntax  for the routine call a.put (x, i). The postcondition is that a [i] = x. It will be violated!  
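
To see the violation appear, consider this minimal sketch of such a wrapper (hypothetical class and feature names, assuming run-time contract monitoring is enabled):

        class REAL_STORE
        create
            make
        feature
            storage: ARRAY [REAL_64]
                    -- Underlying values.

            make (n: INTEGER)
                    -- Allocate n items, initialized to zero.
                do
                    create storage.make_filled (0.0, 1, n)
                end

            put (x: REAL_64; i: INTEGER)
                    -- Replace the item at position i by x.
                do
                    storage [i] := x
                ensure
                    inserted: storage [i] = x   -- Violated when x is a NaN.
                end
        end

Calling put with a NaN argument, for example the result of 0.0 / 0.0, triggers a postcondition violation even though the assignment did exactly what it was asked to do.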

The experts’ view

I queried a number of experts on the topic. (This is the opportunity to express my gratitude to members of the IFIP working group 2.5 on numerical software [2], some of the world’s top experts in the field, for their willingness to respond quickly and with many insights.) A representative answer, from Stuart Feldman, was:  

If I remember the debate correctly (many moons ago), NaN represents an indefinite value, so there is no reason to believe that the result of one calculation with unclear value should match that of another calculation with unclear value. (Different orders of infinity, different asymptotic approaches toward 0/0, etc.)  

Absolutely correct! Only one little detail, though: this is an argument against using the value True as a result of the test; but it is not an argument for using the value False! The exact same argument can be used to assert that the result should not be False:  

… there is no reason to believe that the result of one calculation with unclear value should not match that of another calculation with unclear value.  

Just as convincing! Both arguments complement each other: there is no compelling reason for demanding that the values be equal; and there is no compelling argument either to demand that they be different. If you ignore one of the two sides, you are biased.  

What do we do then?

The conclusion is not that the result should be False. The rational conclusion is that True and False are both unsatisfactory solutions. The reason is very simple: in a proper theory (I will sketch it below) the result of such a comparison should be some special undefined value; in the same way that IEEE 754 extends the set of floating-point numbers with NaN, a satisfactory solution would extend the set of booleans with some NaB (Not a Boolean). But there is no NaB, probably because no one (understandably) wanted to bother, and also because being able to represent a value of type BOOLEAN on a single bit is, if not itself a pillar of civilization, one of the secrets of a happy life.

If both True and False are unsatisfactory solutions, we should use the one that is the “least” bad, according to some convincing criterion. That is not the attitude that 754 takes; it seems to consider (as illustrated by the justification cited above) that False is somehow less committing than True. But it is not! Stating that something is false is just as much of a commitment as stating that it is true. False is no closer to NaB than True is. A better criterion is: which of the two possibilities is going to be least damaging to time-honored assumptions embedded in mathematics? One of these assumptions is the reflexivity of equality: come rain or shine, x is equal to itself. Its counterpart for programming is that after an assignment the target will be equal to the original value of the source. This applies to numbers, and it applies to a NaN as well.

Note that this argument does not address equality between different NaNs. The standard as it is states that a specific NaN, with a specific payload, is not equal to itself.  

What do you think?

A few of us who had to examine the issue recently think that — whatever the standard says at the machine level — a programming language should support the venerable properties that equality is reflexive and that assignment yields equality.

Every programming language should decide this on its own; for Eiffel we think this should be the specification. Do you agree?  
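
For illustration only, here is a sketch of what reflexive value equality could look like if written out by hand, assuming an is_nan query on REAL_64. (Note that it makes any NaN equal to any other NaN, regardless of payload, which goes beyond mere reflexivity; it is a sketch, not the proposed Eiffel specification.)

        reflexive_equal (a, b: REAL_64): BOOLEAN
                -- Value equality under which every value, NaN included, equals itself.
            do
                Result := (a = b) or (a.is_nan and b.is_nan)
            end

With such a definition, reflexive_equal (x, x) holds for every x, and after an assignment x := y the property reflexive_equal (x, y) holds as well.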

Some theory

For readers who like theory, here is a mathematical restatement of the observations above. There is nothing fundamentally new in this section, so if you do not like strange symbols you can stop here.  

The math helps explain the earlier observation that neither True nor False is more “committing” than the other. A standard technique (coming from denotational semantics) for dealing with undefinedness in basic data types is to extend every data type into a lattice, with a partial order relation meaning “less defined than”. The lattice includes a bottom element, traditionally written “⊥” (pronounced “bottom”), and a top element written “⊤” (“top”). ⊥ represents an unknown value (not enough information) and ⊤ an error value (too much information). Pictorially, the lattice for natural numbers would look like this:

[Figure: The lattice of integers]

On basic types, we always use very simple lattices of this form, with three kinds of element: ⊥, less than every other element; ⊤, larger than all other elements; and, in between, all the normal values of the type, which for the partial order of interest are all at the same level. (No, this is not a new math in which all integers are equal. The order in question simply means “is less defined than”. Every integer is as defined as all other integers, more defined than ⊥, and less defined than ⊤.) Such lattices are not very exciting, but they serve as a starting point; lattices with more interesting structures are those applying to functions on such spaces, including functions of functions, which represent programs.
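
In symbols, for such a flat lattice built over a set T of normal values, the “less defined than” relation ≤ boils down to (a sketch in standard notation):

        for all x, y in T ∪ {⊥, ⊤}:   x ≤ y  if and only if  (x = ⊥) or (y = ⊤) or (x = y)

so that ⊥ is below everything, ⊤ is above everything, and two distinct normal values are never comparable: each is exactly as defined as the other.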

The modeling of floating-point numbers with NaN involves such a lattice; introducing NaN means introducing a ⊤ value. (Actually, one might prefer to interpret NaN as ⊥, but the reasoning transposes immediately.) More accurately, since there are many NaN values, the lattice will look more like this:

[Figure: The lattice of floats in IEEE 754]

For simplicity we can ignore the variety of NaNs and consider a single ⊤.

Functions on lattices, as implemented by programs, should satisfy a fundamental property: monotonicity. A function f is monotone (as in ordinary analysis) if, whenever x ≤ y, then f (x) ≤ f (y). Monotonicity is a common-sense requirement: we cannot get more information from less information. If we know less about x than about y, we cannot expect that any function (with a computer, any program) f will, out of nowhere, restore the missing information.

Demanding monotonicity of all floating-point operations reflects this requirement: indeed, in IEEE 754, any arithmetic operation (addition, multiplication and so on) that has a NaN as one of its arguments must yield a NaN as its result. Great. Now for soundness we should also have such a property for boolean-valued operations, such as equality. If we had a NaB as the ⊤ of booleans, just like NaN is the ⊤ of floating-point numbers, then the result of NaN = NaN would be NaB. But the world is not perfect and the IEEE 754 standard does not govern booleans. Somehow (I think) the designers thought of False as less defined than True. But it is not! False is just as defined as True in the very simple lattice of booleans, where True and False are at the same level for the relevant partial order:

[Figure: The lattice of booleans]

Because any solution that cannot use a NaB will violate monotonicity and hence will be imperfect, we must resort to heuristic criteria. A very strong criterion in favor of choosing True is reflexivity — remaining compatible with a fundamental property of equality. I do not know of any argument for choosing False. 

The Spartan approach

There is, by the way, a technique that accepts booleans as we love them (with two values and no NaB) and achieves mathematical rigor. If operations involving NaNs truly give us pimples, we can make any such operation trigger an exception. In the absence of ⊤ values, this is an acceptable programming technique for representing undefinedness. The downside, of course, is that just about everywhere the program must be ready to handle the exception in some way.
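
In Eiffel such an exception can come straight out of Design by Contract: here is a sketch (a hypothetical feature, again assuming an is_nan query on REAL_64) of an equality function whose precondition rules NaNs out, so that with run-time contract monitoring any NaN operand triggers a precondition-violation exception:

        spartan_equal (a, b: REAL_64): BOOLEAN
                -- Equality restricted to proper numbers.
            require
                a_is_a_number: not a.is_nan
                b_is_a_number: not b.is_nan
            do
                Result := (a = b)
            end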

It is unlikely that in practice many people would be comfortable with such a solution. 

Final observations

Let me point out two objections that I have seen. Van Snyder wrote: 

NaN is not part of the set of floating-point numbers, so it doesn’t behave as if “bottom” were added to the set. 

Interesting point, but, in my opinion, not applicable: ⊥ is indeed not part of the mathematical set of floating-point numbers, but in the form of NaN it is part of the corresponding type (float in C, REAL in Eiffel); and the operations of the type are applicable to all values, including NaN if, as noted, we have not taken the extreme step of triggering an exception every time an operation uses a NaN as one of its operands. So we cannot free ourselves from the monotonicity concern by just a sleight of hand.

Another comment, also by Van Snyder (slightly abridged): 

Think of [the status of NaN] as a variety of dynamic run-time typing. With static typing, if x is an integer variable and y [a real], then

        x := y 

does not inevitably lead to 

        x = y

True; and a compelling argument to require that conversions satisfy equality as a postcondition! Such reasoning (reflexivity again) was essential in the design of the Eiffel conversion mechanism [3], which makes it possible to define conversions between various data types (not just integers and reals and the other classical examples, but also any other user types, as long as the conversion does not conflict with inheritance). The same conversion rules apply to assignment and equality, so that yes, whenever the assignment x := y is permitted, including when it involves a conversion, the property x = y is afterwards always guaranteed to hold.
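
As an illustration of the mechanism (a sketch with a hypothetical class, not taken from any library), listing a creation procedure in the convert clause makes assignments from REAL_64 apply the conversion implicitly:

        class TEMPERATURE
        create
            from_celsius
        convert
            from_celsius ({REAL_64})
        feature
            celsius: REAL_64
                    -- Value in degrees Celsius.

            from_celsius (v: REAL_64)
                    -- Initialize from a plain real value.
                do
                    celsius := v
                ensure
                    set: celsius = v
                end
        end

With this declaration, for t of type TEMPERATURE the assignment t := 20.0 is valid and calls from_celsius; and since the same conversion rules govern equality, t = 20.0 holds afterwards.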

It is rather dangerous indeed to depart from the fundamental laws of mathematics. 

References

[1] IEEE floating-point standard, 754-2008; see summary and references in the Wikipedia entry.  

[2] IFIP Working Group 2.5 on numerical software: home page

[3] Eiffel standard (ECMA and ISO), available on the ECMA site.


More expressive loops for Eiffel

New variants of the loop construct have been introduced into Eiffel, allowing a safer, more concise and more abstract style of programming. The challenge was to remain compatible with the basic loop concept, in particular correctness concerns (loop invariant and loop variant), to provide a flexible mechanism that would cover many different cases of iteration, and to keep things simple.

Here are some examples of the new forms. Consider a structure s, such as a list, which can be traversed in sequence. The following loop prints all elements of the list:

      across s as c loop print (c.item) end

(The procedure print is available on all types, and can be adapted to any of them by redefining the out feature; item gives access to individual values of the list.) More about c in just a moment; in the meantime you can just consider that using “as c” and manipulating the structure through c rather than directly through s is a simple idiom to be learned and applied systematically for such across loops.

The above example is an instruction. The all and some variants yield boolean expressions, as in

across s as c all c.item > 0 end

which denotes a boolean value, true if and only if all elements of the list are positive. To find out if at least one is positive, use

across s as c some c.item > 0 end

Such expressions could appear, for example, in a class invariant, but they may be useful in many different contexts.
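
For instance, a class recording sensor measurements might state in its invariant (a sketch with hypothetical names):

        invariant
            all_readings_positive: across readings as c all c.item > 0.0 end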

In some cases, a from clause is useful, as in

        across s as c from sum := 0 loop sum := sum + c.index * c.item end
-- Computes Σ i * s [i]

The original form of loop in Eiffel is more explicit, and remains available; you can achieve the equivalent of the last example, on a suitable structure, as

[Figure: A list and a cursor]

        from
            sum := 0 ; s.start
        until
            s.after
        loop
            sum := sum + s.index * s.item
            s.forth
        end

which directly manipulates the internal cursor of s, using start to move it to the beginning, forth to advance it, and after to test if it is past the last element. The forms with across achieve the same purpose in a more concise manner. More important than concision is abstraction: you do not need to worry about manipulating the cursor through start, forth and after. Under the hood, however, the effect is the same. More precisely, it is the same as in a loop of the form

        from
            sum := 0 ; c.start
        until
            c.after
        loop
            sum := sum + c.index * c.item
            c.forth
        end

where c is a cursor object associated with the loop. The advantage of using a cursor is clear: rather than keeping the state of the iteration in the object itself, you make it external, part of a cursor object that, so to speak, looks at the list. This means in particular that many traversals can be active on the same structure at the same time; with an internal cursor, they would mess each other up, unless you manually took the trouble to save and restore cursor positions. With an external cursor, each traversal has its own cursor object, and so does not interfere with other traversals, at least as long as they don’t change the structure (I’ll come back to that point).

With the across variant, you automatically use a cursor; you do not have to declare or create it: it simply comes as a result of the “as c” part of the construct, which introduces c as the cursor.

On what structures can you perform such iterations? There is no limitation; you simply need a type based on a class that inherits, directly or indirectly, from the library class ITERABLE. All relevant classes from the EiffelBase library have been updated to provide this inheritance, so that you can apply the across scheme to lists of all kinds, hash tables, arrays, intervals etc.
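
Making one of your own classes iterable is just a matter of providing that inheritance; here is a minimal sketch (a hypothetical class, delegating to the cursor of an underlying list):

        class SENSOR_LOG
        inherit
            ITERABLE [REAL_64]
        create
            make
        feature
            readings: ARRAYED_LIST [REAL_64]
                    -- Recorded values.

            make
                    -- Set up an empty log.
                do
                    create readings.make (100)
                end

            new_cursor: ITERATION_CURSOR [REAL_64]
                    -- Fresh external cursor on the recorded values.
                do
                    Result := readings.new_cursor
                end
        end

Instances of SENSOR_LOG can then appear directly as the target of an across loop.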

One of these structures is the integer interval. The notation  m |..| n, for integers m and n, denotes the corresponding integer interval. (This is not a special language notation, simply an operator, |..|, defined with the general operator mechanism as an alias for the feature interval of INTEGER classes.) To iterate on such an interval, use the same form as in the examples above:

        across m |..| n  as c from sum := 0  loop sum := sum + a [c.item] end
— Computes Σ a [i], for i ranging from m to n, for an array (or other structure) a

The key feature in ITERABLE is new_cursor, which returns a freshly created cursor object associated with the current structure. By default it is an ITERATION_CURSOR, the most general cursor type, but every descendant of ITERABLE can redefine the result type to something more specific to the current structure. Using a cursor (c in the above examples) rather than manipulating the structure s directly provides considerable flexibility, thanks to the property that ITERATION_CURSOR itself inherits from ITERABLE and hence has all the iteration mechanisms. For example you may write

across s.new_cursor.reversed as c loop print (c.item) end

to print elements in reverse order. (Using Eiffel’s operator mechanism, you may write -s.new_cursor, with a minus operator, as a synonym for s.new_cursor.reversed.) The function reversed gives you a new cursor on the same target structure, enabling you to iterate backwards. Or you can use

        across s.new_cursor + 3 as c loop print (c.item) end

(using s.new_cursor.incremented (3) rather than s.new_cursor + 3 if you are not too keen on operator syntax) to iterate over every third item. A number of other possibilities are available in the cursor classes.

A high-level iteration mechanism is only safe if you have the guarantee that the structure remains intact throughout the iteration. Assume you are iterating through a structure

across s as c loop some_routine end

and some_routine changes the structure s: the whole process could be messed up. Thanks to Design by Contract mechanisms, the library protects you against such mistakes. Features such as item and index, accessing properties of the structure during the iteration, are equipped with a precondition clause

require
is_valid

and every operation that changes the structure sets is_valid to false. So as soon as any change happens, you cannot continue the iteration; all you can do is start a new one; the command start, used internally to begin an iteration, does not have the above precondition.
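
Concretely, once the data structure classes carry these is_valid preconditions (see the note on the library update below), a loop like the following sketch (assuming s is a list of integers), which modifies its own target, will be stopped by a contract violation at the first access after the change:

        across s as c loop
            s.extend (0)       -- Changes s: the iteration is no longer valid.
            print (c.item)     -- Precondition violation: is_valid is now False.
        end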

Sometimes, of course, you do want to change a structure while traversing it; for example you may want to add an element just to the right of the iteration position. If you know what you are doing that’s fine. (Let me correct this: if you know what you are doing, express it through precise contracts, and you’ll be fine.) But then you should not use the abstract forms of the loop, across; you should control the iteration itself through the basic form from … until with explicit cursor manipulation, protected by appropriate contracts.

The two styles, by the way, are not distinct constructs. Eiffel has always had only one form of loop and this continues to be the case. The across forms are simply new possibilities added to the classical loop construct, with obvious constraints stating for example that you may not have both a some or all form and an explicit loop body. In particular, an across loop can still have an invariant clause, specifying the correctness properties of the loop, as in

        across s as c from sum := 0 invariant sum = sigma (s, c.index) loop sum := sum + c.index * c.item end

EiffelStudio 6.5 already included the language update; the library update (not yet including the is_valid preconditions for data structure classes) will be made available this week.

These new mechanisms should increase the level of abstraction and the reliability of loops, a fundamental element of  almost all programs.


Just another day at the office

In the past few weeks I wrote a program to compute the aliases of variables and expressions in an object-oriented program (based on a new theory [1]).

For one of the data structures, I needed a specific notion of equality, so I did the standard thing in Eiffel: redefine the is_equal function inherited from the top class ANY, to implement the desired variant.

The first time I ran the result, I got a postcondition violation. The violated postcondition clause was not even one that I wrote: it was an original postcondition of is_equal (other: like Current) in ANY, which my redefinition inherited as per the rules of Design by Contract; it reads

symmetric: Result implies other ~ Current

meaning: equality is symmetric, so if Result is true, i.e. the Current object is equal to other, then other must also be equal to Current. (~ is object equality, which applies the local version of is_equal.) What was I doing wrong? The structure is a list, so the code iterates on both the current list and the other list:

from
    start ; other.start ; Result := True
until
    (not Result) or after
loop
    if other.after then
        Result := False
    else
        Result := (item ~ other.item)
        forth ; other.forth
    end
end

Simple enough: at each position check whether the item in the current list is equal to the item in the other list, and if so move forth in both the current list and the other one; stop whenever we find two unequal elements, or we exhaust either list as told by after. (Since is_equal is a function and should not produce any side effect, the actual code saves the cursors before the iteration and restores them afterwards. Thanks to Ian Warrington for asking about this point in a comment to this post. The new across loop variant described in two later postings uses external cursors and manages them automatically, so this business of maintaining the cursor manually goes away.)

The problem is that with this algorithm it is possible to return True if the first list was exhausted but not the second, so that the first list is a prefix of the other rather than equal to it. The correction is immediate: add

Result := Result and other.after

after the loop; alternatively, enclose the loop in a conditional so that it is only executed if count = other.count (this solution is better since it saves much computation in cases of lists of different sizes, which cannot be equal).
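
Putting the correction together (in the conditional variant), the redefinition becomes something like the following sketch, which omits the cursor save-and-restore discussed above:

        is_equal (other: like Current): BOOLEAN
                -- Does other contain the same items as this list, in the same order?
            do
                if count = other.count then
                    from
                        start ; other.start ; Result := True
                    until
                        (not Result) or after
                    loop
                        Result := (item ~ other.item)
                        forth ; other.forth
                    end
                end
            end

Since the two counts are equal, the inner test on other.after is no longer needed: the lists are exhausted together.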

The lesson (other than that I need to be more careful) is that the error was caught immediately, thanks to a postcondition violation — and one that I did not even have to write. Just another day at the office; and let us shed a tear for the poor folks who still program without this kind of capability in their language and development environment.

Reference

[1] Bertrand Meyer: The Theory and Calculus of Aliasing, draft paper, available here.


“Touch of Class” published

My textbook Touch of Class: An Introduction to Programming Well Using Objects and Contracts [1] is now available from Springer Verlag [2]. I have been told of many bookstores in Europe that have it by now; for example Amazon Germany [3] offers immediate delivery. Amazon US still lists the book as not yet published [4], but I think this will be corrected very soon.

[Figure: Book cover]

The book results from six years of teaching introductory programming at ETH Zurich. It is richly illustrated in full color (not only with technical illustrations but with numerous photographs of people and artefacts). It is pretty big, but designed so that a typical one-semester introductory course can cover most of the material.

Many topics are addressed (see table of contents below), including quite a few that are seldom seen at the introductory level. Some examples, listed here in random order: a fairly extensive introduction to software engineering including things like requirements engineering (not usually mentioned in programming courses, with results for everyone to see!) and CMMI, a detailed discussion of how to implement recursion, polymorphism and dynamic binding and their role for software architecture, multiple inheritance, lambda calculus (at an introductory level of course), a detailed analysis of the Observer and Visitor patterns, event-driven programming, the lure and dangers of references and aliasing, topological sort as an example of both algorithm and API design, high-level function closures, software tools, properties of computer hardware relevant for programmers, undecidability etc.

The progression uses an object-oriented approach throughout; the examples are in Eiffel, and four appendices present the details of Java, C#, C++ and C. Concepts of Design by Contract and rigorous development are central to the approach; for example, loops are presented as a technique for computing a result by successive approximation, with a central role for the concept of loop invariant. This is not a “formal methods” book in the sense of inflicting on the students a heavy mathematical apparatus, but it uses preconditions, postconditions and invariants throughout to alert them to the importance of reasoning rigorously about programs. The discussion introduces many principles of sound design, in line with the book’s subtitle, “Learning to Program Well”.

The general approach is “Outside-In” (also known as “Inverted Curriculum” and described at some length in some of my articles, see e.g. [5]): students have, right from the start, the possibility of working with real software, a large (150,000-line) library that has been designed specifically for that purpose. Called Traffic, this library simulates traffic in a city; it is graphical and of good enough visual quality to be attractive to today’s “Wii generation” students, something that traditional beginners’ exercises, like computing the 7th Fibonacci number, cannot do (although we have these too). Using the Traffic software through its API, students can right from the first couple of weeks produce powerful applications, without understanding the internals of the library. But they do not stop there: since the whole thing is available in open source, students learn little by little how the software is made internally. Hence the name “Outside-In”: understand the interface first, then dig into the internals. Two advantages of the approach are particularly worth noting:

  • It emphasizes the value of abstraction, and in particular of contracts, not by preaching but by showing students that abstraction helps them master a large body of professional-level software, doing things that would otherwise be unthinkable at an introductory level.
  • It addresses what is probably today the biggest obstacle to teaching introductory programming: the wide diversity of initial student backgrounds. The risk with traditional approaches is either to fly too high and lose the novices, or to stay too low and bore those who already have programming experience. With the Outside-In method the novices can follow the exact path charted for them, from external API to internal implementation; those who already know something about programming can move ahead of the lectures and start digging into the code by themselves for information and inspiration.

(We have pretty amazing data on students’ prior programming knowledge, as  we have been surveying students for the past six years, initially at ETH and more recently at the University of York in England thanks to our colleague Manuel Oriol; some day I will post a blog entry about this specific topic.)

The book has been field-tested in its successive drafts since 2003 at ETH, for the Introduction to Programming course (which starts again in a few weeks, for the first time with the benefit of the full text in printed form). Our material, such as a full set of slides, plus exercises, video recordings of the lectures etc. is available to any instructor selecting the text. I must say that Springer did an outstanding job with the quality of the printing and I hope that instructors, students, and even some practitioners already in industry will like both form and content.

Table of contents

Front matter: Community resource, Dedication (to Tony Hoare and Niklaus Wirth), Prefaces, Student preface, Instructor preface, Note to instructors: what to cover?, Contents

PART I: Basics
1 The industry of pure ideas
2 Dealing with objects
3 Program structure basics
4 The interface of a class
5 Just Enough Logic
6 Creating objects and executing systems
7 Control structures
8 Routines, functional abstraction and information hiding
9 Variables, assignment and references
PART II: How things work
10 Just enough hardware
11 Describing syntax
12 Programming languages and tools
PART III: Algorithms and data structures
13 Fundamental data structures, genericity, and algorithm complexity
14 Recursion and trees
15 Devising and engineering an algorithm: Topological Sort
PART IV: Object-Oriented Techniques
16 Inheritance
17 Operations as objects: agents and lambda calculus
18 Event-driven design
PART V: Towards software engineering
19 Introduction to software engineering
PART VI: Appendices
A An introduction to Java (from material by Marco Piccioni)
B An introduction to C# (from material by Benjamin Morandi)
C An introduction to C++ (from material by Nadia Polikarpova)
D From C++ to C
E Using the EiffelStudio environment
Picture credits
Index

References

[1] Bertrand Meyer, Touch of Class: An Introduction to Programming Well Using Objects and Contracts, Springer Verlag, 2009, 876+lxiv pages, Hardcover, ISBN: 978-3-540-92144-8.

[2] Publisher page for [1]: see  here. List price: $79.95. (The page says “Ships in 3 to 4 weeks” but I think this is incorrect as the book is available; I’ll try to get the mention corrected.)

[3] Amazon.de page: see here. List price: EUR 53.45 (with offers starting at EUR 41.67).

[4] Amazon.com page: see here. List price: $63.96.

[5] Michela Pedroni and Bertrand Meyer: The Inverted Curriculum in Practice, in Proceedings of SIGCSE 2006, ACM, Houston (Texas), 1-5 March 2006, pages 481-485; available online.


Contracts written by people, contracts written by machines

What kinds of contracts do you write? Could these contracts, or some of them, be produced automatically?

The idea of inferring contracts from programs is intriguing; it also raises serious epistemological issues. In fact, one may question whether it makes any sense at all. I will leave the question of principle to another post, in connection with some of our as yet unpublished work. This is, in any case, an active research field, in particular because of the big stir that Mike Ernst’s Daikon created when it appeared a few years ago.

Daikon [1] infers invariants dynamically: it observes executions; by looking up a repertoire of invariant patterns, it finds out what properties the code maintains. It may sound strange to you (it did to Mike’s PhD thesis supervisor [2] when he first heard about the idea), but it yields remarkable results.

In a recent paper presented at ISSTA [3], we took advantage of Daikon to compare the kinds of contract people write with those that a machine could infer. The work started out as Nadia Polikarpova’s master’s thesis at ITMO in Saint Petersburg [4], in the group of Prof. Anatoly Shalyto and under the supervision of Ilinca Ciupa from ETH. (Ilinca recently completed her PhD thesis on automatic testing [5], and is co-author of the article.) The CITADEL tool (the name is an acronym, but you will have to look up the references to see what it means) applies Daikon to Eiffel programs.

CITADEL is the first application of Daikon to a language in which programmers can write contracts. Previous interfaces were for contract-less languages such as Java, where the tool must synthesize everything. In Eiffel, programmers do write contracts (as confirmed by Chalin’s experimental study [6]). Hence the natural questions: does the tool infer the same contracts as a programmer would naturally write? If not, which kinds of contract is each best at?

To answer these questions, the study looked at three sources of contracts:

  • Contracts already present in the code (in the case of widely used libraries such as EiffelBase, equipped with contracts throughout).
  • Those devised by students, in a small-scale experiment.
  • The contracts inferred by Daikon.

What do you think? Before looking up our study, you might want to make your own guess at the answers. You will not find a spoiler here; for the study’s results, you should read our paper [3]. All right, just a hint: machines and people are (in case you had not noticed this before) good at different things.

References

 

[1] Michael Ernst and others, Daikon bibliography on Ernst’s research page at the University of Washington.

[2] David Notkin, see his web page.

[3] A Comparative Study of Programmer-Written and Automatically Inferred Contracts, by Nadia Polikarpova, Ilinca Ciupa and me, in ISSTA 2009: International Symposium on Software Testing and Analysis, Chicago, July 2009, online copy available.

[4] ITMO (Saint-Petersburg State University of Information Technologies, Mechanics and Optics), see here.

[5] Ilinca Ciupa, Strategies for random contract-based testing; PhD thesis, ETH Zurich, December 2008. For a link to the text and to her other publications see Ilinca’s ETH page.

[6] Patrice Chalin,  Are practitioners writing contracts? In Rigorous Development of Complex Fault-Tolerant Systems, eds. Jones et al.,  Lecture Notes in Computer Science 4157, Springer Verlag, 2006, pages 100-113.
