Archive for January 2010

A couple of loop examples

(This entry originated as a post on the EiffelStudio user group mailing list.) 

Here are a couple of actual examples of the new loop variants discussed in the blog entry immediately preceding this one. They came out of my current work; I started updating a program to take advantage of the new facility.

As a typical example, I replaced

        local
                eht: HASH_TABLE [EXPRESSION, EXPRESSION]
        do
               
        from
                eht := item (e)
                eht.start
         until
                eht.off
        loop
                Result.extend (eht.key_for_iteration)
                eht.forth
        end 

 by

        across item (e) as eht loop Result.extend (eht.key) end

 which also gets rid of the local variable declaration. The second form is syntactic sugar for the first, but I find it justified. 

 Another case, involving nested loops: 

— Previously:

        from
                other.start
        until
                other.off
        loop
                oht := other.item_for_iteration
                e := other.key_for_iteration
                from
                        oht.start
                until
                        oht.off
                loop
                        put (e, oht.item_for_iteration)
                        oht.forth
                end
                other.forth
        end

— Now:

        across other as o loop
                across o.item as oht loop put (o.key, oht.item) end
        end

here getting rid of two local variable declarations (although I might for efficiency reintroduce the variable e  to compute o.key just once). 

It is important to note that these are not your grandmother’s typical loops: they iterate on complex data structures, specifically hash tables where the keys are lists and the items are themselves hash tables, with lists as both items and keys. 

The mechanism is applicable to all the relevant data structures in EiffelBase (in other words, no need for the programmer to modify anything, just apply the across  loop to any such structure), and can easily extended to any new structure that one wishes to define. In the short term at least, I still plan in my introductory teaching to show the explicit variants first, as it is important for beginners to understand how a loop works. (My hunch based on previous cases is that after a while we will feel comfortable introducing the abstract variants from the start, but it takes some time to learn how to do it right.)

VN:F [1.9.10_1130]
Rating: 6.3/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 8 votes)

More expressive loops for Eiffel

New variants of the loop construct have been introduced into Eiffel, allowing a safer, more concise and more abstract style of programming. The challenge was to remain compatible with the basic loop concept, in particular correctness concerns (loop invariant and loop variant), to provide a flexible mechanism that would cover many different cases of iteration, and to keep things simple.

Here are some examples of the new forms. Consider a structure s, such as a list, which can be traversed in sequence. The following loop prints all elements of the list:

      across s as c loop print (c.item) end

(The procedure print is available on all types, and can be adapted to any of them by redefining the out feature; item gives access to individual values of the list.) More about c in just a moment; in the meantime you can just consider consider that using “as c” and manipulating the structure through c rather than directly through s  is a simple idiom to be learned and applied systematically for such across loops.

The above example is an instruction. The all and some variants yield boolean expressions, as in

across s as c all c.item > 0 end

which denotes a boolean value, true if and only if all elements of the list are positive. To find out if at least one is positive, use

across s as c some c.item > 0 end

Such expressions could appear, for example, in a class invariant, but they may be useful in many different contexts.

In some cases, a from clause is useful, as in

        across s as c from sum := 0  loop sum := sum + c.index c.item end
— Computes Σ i * s [i]

The original form of loop in Eiffel is more explicit, and remains available; you can achieve the equivalent of the last example, on a suitable structure, as

A list and a cursor

A list and a cursor

      from
sum := 0 ; s.start
until
s.after
loop
sum := sum + s.index s.item
s.forth

        end

which directly manipulates a cursor through s, using start to move it to the beginning, forth to advance it, and after to test if it is past the last element. The forms with across achieve the same purpose in a more concise manner. More important than concision is abstraction: you do not need to worry about manipulating the cursor through start, forth and after. Under the hood, however, the effect is the same. More precisely, it is the same as in a loop of the form

from
sum := 0 ; c.start
until
c.after
loop
sum := sum + c.index c.item
c.forth

        end

where c is a cursor object associated with the loop. The advantage of using a cursor is clear: rather than keeping the state of the iteration in the object itself, you make it external, part of a cursor object that, so to speak, looks at the list. This means in particular that many traversals can be active on the same structure at the same time; with an internal cursor, they would mess up with each other, unless you manually took the trouble to save and restore cursor positions. With an external cursor, each traversal has its own cursor object, and so does not interfere with other traversals — at least as long as they don’t change the structure (I’ll come back to that point).

With the across variant, you automatically use a cursor; you do not have to declare or create it: it simply comes as a result of the “as c” part of the construct, which introduces c as the cursor.

On what structures can you perform such iterations? There is no limitation; you simply need a type based on a class that inherits, directly or indirectly, from the library class ITERABLE. All relevant classes from the EiffelBase library have been updated to provide this inheritance, so that you can apply the across scheme to lists of all kinds, hash tables, arrays, intervals etc.

One of these structures is the integer interval. The notation  m |..| n, for integers m and n, denotes the corresponding integer interval. (This is not a special language notation, simply an operator, |..|, defined with the general operator mechanism as an alias for the feature interval of INTEGER classes.) To iterate on such an interval, use the same form as in the examples above:

        across m |..| n  as c from sum := 0  loop sum := sum + a [c.item] end
— Computes Σ a [i], for i ranging from m to n, for an array (or other structure) a

The key feature in ITERABLE is new_cursor, which returns a freshly created cursor object associated with the current structure. By default it is an ITERATION_CURSOR, the most general cursor type, but every descendant of ITERABLE can redefine the result type to something more specific to the current structure. Using a cursor — c in the above examples —, rather than manipulating the structure s directly, provides considerable flexibility thanks to the property that ITERATION_CURSOR itself inherits from ITERABLE   and hence has all the iteration mechanisms. For example you may write

across s.new_cursor.reversed as c loop print (c.item) end

to print elements in reverse order. (Using Eiffel’s operator  mechanism, you may write s.new_cursor, with a minus operator, as a synonym for new_cursor.reversed.) The function reversed gives you a new cursor on the same target structure, enabling you to iterate backwards. Or you can use

        across s.new_cursor + 3 as c loop print (c.item) end

(using s.new_cursor.incremented (3) rather than s.new_cursor + 3 if you are not too keen on operator syntax) to iterate over every third item. A number of other possibilities are available in the cursor classes.

A high-level iteration mechanism is only safe if you have the guarantee that the structure remains intact throughout the iteration. Assume you are iterating through a structure

across  as c loop some_routine end

and some_routine changes the structure s: the whole process could be messed up. Thanks to Design by Contract mechanisms, the library protects you against such mistakes. Features such as item and index, accessing properties of the structure during the iteration, are equipped with a precondition clause

require
is_valid

and every operation that changes the structure sets is_valid to false. So as soon as any change happens, you cannot continue the iteration; all you can do is restart a new one; the command start, used internally to start the operation, does not have the above precondition.

Sometimes, of course, you do want to change a structure while traversing it; for example you may want to add an element just to the right of the iteration position. If you know what you are doing that’s fine. (Let me correct this: if you know what you are doing, express it through precise contracts, and you’ll be fine.) But then you should not use the abstract forms of the loop, across; you should control the iteration itself through the basic form from … until with explicit cursor manipulation, protected by appropriate contracts.

The two styles, by the way, are not distinct constructs. Eiffel has always had only one form of loop and this continues the case. The across forms are simply new possibilities added to the classical loop construct, with obvious constraints stating for example that you may not have both a some or all form and an explicit  loop body.  In particular, an across loop can still have an invariant clause , specifying the correctness properties of the loop, as in

        across s as c from sum := 0  invariant sum = sigma (s, index)  loop sum := sum + c.index c.item end

EiffelStudio 6.5 already included the language update; the library update (not yet including the is_valid preconditions for data structure classes) will be made available this week.

These new mechanisms should increase the level of abstraction and the reliability of loops, a fundamental element of  almost all programs.

VN:F [1.9.10_1130]
Rating: 7.5/10 (22 votes cast)
VN:F [1.9.10_1130]
Rating: +4 (from 14 votes)

The theory and calculus of aliasing

In a previous post I briefly mentioned some work that I am doing on aliasing. There is a draft paper [1], describing the theory, calculus, graphical notation (alias diagrams) and implementation. Here I will try to give an idea of what it’s about, with the hope that you will be intrigued enough to read the article. Even before you read it you might try out the implementation[2], a simple interactive interface with all the examples of the article .

What the article does not describe in detail — that will be for a companion paper— is how the calculus will be used as part of a general framework for developing object-oriented software proved correct from the start, the focus of our overall  “Trusted Components” project’ [3]. Let me simply state that the computation of aliases is the key missing step in the effort to make correctness proofs a standard part of software development. This is a strong claim which requires some elaboration, but not here.

The alias calculus asks a simple question: given two expressions representing references (pointers),  can their values, at a given point in the program, ever reference the same object during execution?

As an example  application, consider two linked lists x and y, which can be manipulated with operations such as extend, which creates a new cell and adds it at the end of the list:

lists

 The calculus makes it possible to prove that if  x and y are not aliased to each other, then none of the pointers in any of the cells in either of the lists can point to  (be aliased to) any cell of  the other. If  x is initially aliased to y, the property no longer holds. You can run the proof (examples 18 and 19) in the downloadable implementation.

The calculus gives a set of rules, each applying to a particular construct of the language, and listed below.

The rule for a construct p is of the form

          a |= p      =   a’
 

where a and a’ are alias relations; this states that executing p  in a state where the alias relation is a will yield the alias relation a’ in the resulting state. An alias relation is a symmetric, irreflexive relation; it indicates which expressions and variables can be aliased to each other in a given state.

The constructs p considered in the discussion are those of a simplified programming language; a modern object-oriented language such as Eiffel can easily be translated into that language. Some precision will be lost in the process, meaning that the alias calculus (itself precise) can find aliases that would not exist in the original program; if this prevents proofs of desired properties, the cut instruction discussed below serves to correct the problem.

The first rule is for the forget instruction (Eiffel: x := Void):

          a |= forget x       =   a \- {x}

where the \- operator removes from a relation all the elements belonging to a given set A. In the case of object-oriented programming, with multidot expressions x.y.z, the application of this rule must remove all elements whose first component, here x, belongs to A.

The rule for creation is the same as for forget:

         a |= create x          =   a \- {x}

The two instructions have different semantics, but the same effect on aliasing.

The calculus has a rule for the cut instruction, which removes the connection between two expressions:

        a |= cut x, y       =   a — <x, y>

where is set difference and <x, y> includes the pairs [x, y] and [y, x] (this is a special case of a general notation defined in the article, using the overline symbol). The cut   instruction corresponds, in Eiffel, to cut   x /=end:  a hint given to the alias calculus (and proved through some other means, such as standard axiomatic semantics) that some references will not be aliased.

The rule for assignment is

      a |= (x := y)      =   given  b = a \- {x}   then   <b È {x} x (b / y)}> end

where b /y (“quotient”), similar to an equivalence class in an equivalence relation, is the set of elements aliased to y in b, plus y itself (remember that the relation is irreflexive). This rule works well for object-oriented programming thanks to the definition of the \- operator: in x := x.y, we must not alias x to x.y, although we must alias it to any z that was aliased to x.y.

The paper introduces a graphical notation, alias diagrams, which makes it possible to reason effectively about such situations. Here for example is a diagram illustrating the last comment:

Alias diagram for a multidot assignment

Alias diagram for a multidot assignment

(The grayed elements are for explanation and not part of the final alias relation.)

For the compound instruction, the rule is:

           a |= (p ;  q)      =   (a |= p) |= q)

For the conditional instruction, we get:

           a |= (then p else  q end)      =   (a |= p) È  (a |= q)

Note the form of the instruction: the alias calculus ignores information from the then clause present in the source language. The union operator is the reason why  alias relations,  irreflexive and symmetric, are not  necessarily transitive.

The loop instruction, which also ignores the test (exit or continuation condition), is governed by the following rule:

           a |= (loop p end)       =   tN

where span style=”color: #0000ff;”>N is the first value such that tN = tN+1 (in other words, tN is the fixpoint) in the following sequence:

            t0          =    a
           tn+1       =   (tn È (tn |= p))     

The existence of a fixpoint and the correctness of this formula to compute it are the result of a theorem in the paper, the “loop aliasing theorem”; the proof is surprisingly elaborate (maybe I missed a simpler one).

For procedures, the rule is

         a |= call p        =   a |= p.body

where p.body is the body of the procedure. In the presence of recursion, this gives rise to a set of equations, whose solution is the fixpoint; again a theorem is needed to demonstrate that the fixpoint exists. The implementation directly applies fixpoint computation (see examples 11 to 13 in the paper and implementation).

The calculus does not directly consider routine arguments but treats them as attributes of the corresponding class; so a call is considered to start with assignments of the form f : = a for every pair of formal and actual arguments f and a. Like the omission of conditions in loops and conditionals, this is a source of possible imprecision in translating from an actual programming language into the calculus, since values passed to recursive activations of the same routine will be conflated.

Particularly interesting is the last rule, which generalizes the previous one to qualified calls of the form x. f (…)  as they exist in object-oriented programming. The rule involves the new notion of inverse variable, written x’ where x is a variable. Laws of the calculus (with Current denoting the current object, one of the fundamental notions of object-oriented programming) are

        Current.x            = x   
        x.Current            = x
        x.x’                      = Current
        x’.x                      = Current

In other words, Current plays the role of zero element for the dot operator, and variable inversion is the inverse operation. In a call x.f, where x denotes the supplier object (the target of the call), the inverse variable provides a back reference to the client object (the caller), indispensable to interpret references in the original context. This property is reflected by the qualified client rule, which uses  the auxiliary operator n (where x n a, for a relation a and a variable x, is the set of pairs [x.u, y.v] such that the pair [u, v] is in a). The rule is:

         a |= call x.r       =   x n ((x’ n a ) |= call r)

You need to read the article for the full explanation, but let me offer the following quote from the corresponding section (maybe you will note a familiar turn of phrase):

Thus we are permitted to prove that the unqualified call creates certain aliasings, on the assumption that it starts in its own alias environment but has access to the caller’s environment through the inverted variable, and then to assert categorically that the qualified call has the same aliasings transposed back to the original environment. This change of environment to prove the unqualified property, followed by a change back to the original environment to prove the qualified property, explains well the aura of magic which attends a programmer’s first introduction to object-oriented programming.

I hope you will enjoy the calculus and try the examples in the implementation. It is fun to apply, and there seems to be no end to the potential applications.

Reference

[1] Bertrand Meyer: The Theory and Calculus of Aliasing, draft paper, first published 12 January 2009 (revised 21 January 2010), available here and also at arxiv.org.
[2] Implementation (interactive version to try all the examples of the paper): downloadable Windows executable here.
[3] Bertrand Meyer: The Grand Challenge of Trusted Components, in 2003 International Conference on Software Engineering, available here.

VN:F [1.9.10_1130]
Rating: 7.9/10 (8 votes cast)
VN:F [1.9.10_1130]
Rating: +7 (from 7 votes)

Just another day at the office

In the past few weeks I wrote a program to compute the aliases of variables and expressions in an object-oriented program (based on a new theory [1]).

For one of the data structures, I needed a specific notion of equality, so I did the standard thing in Eiffel: redefine the is_equal function inherited from the top class ANY, to implement the desired variant.

The first time I ran the result, I got a postcondition violation. The violated postcondition clauses was not even any that I wrote: it was an original postcondition of is_equal (other: like Current)  in ANY, which my redefinition inherited as per the rules of Design by Contract; it reads

symmetric: Result implies other ~ Current

meaning: equality is symmetric, so if Result is true, i.e. the Current object is equal to other, then other must also be equal to Current. (~ is object equality, which applies the local version is is_equal).  What was I doing wrong? The structure is a list, so the code iterates on both the current list and the other list:

from
    start ; other.start ; Result := True
until (not Result) or after loop
        if other.after then Result := False else
              Result := (item ~ other.item)
              forth ; other.forth
        end
end

Simple enough: at each position check whether the item in the current list is equal to the item in the other list, and if so move forth in both the current list and the other one; stop whenever we find two unequal elements, or we exhaust either list as told by after list. (Since is_equal is a function and not produce any side effect, the actual code saves the cursors before the iteration and restores them afterwards. Thanks to Ian Warrington for asking about this point in a comment to this post. The new across loop variant described in  two later postings uses external cursors and manages them automatically, so this business of maintaining the cursor manually goes away.)

The problem is that with this algorithm it is possible to return True if the first list was exhausted but not the second, so that the first list is a subset of the other rather than identical. The correction is immediate: add

Result and other_list.after

after the loop; alternatively, enclose the loop in a conditional so that it is only executed if count = other.count (this solution is  better since it saves much computation in cases of lists of different sizes, which cannot be equal).

The lesson (other than that I need to be more careful) is that the error was caught immediately, thanks to a postcondition violation — and one that I did not even have to write. Just another day at the office; and let us shed a tear for the poor folks who still program without this kind of capability in their language and development environment.

Reference

[1] Bertrand Meyer: The Theory and Calculus of Aliasing, draft paper, available here.

VN:F [1.9.10_1130]
Rating: 9.6/10 (5 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)