If I’m not pure, at least my functions are

14 February 20184 July 2011 by Bertrand Meyer

If I’m not pure, at least my jewels are [1].

We often need to be reassured that a routine, usually a function, is “pure”, meaning that it does not change the state of the computation. For example, a function used in a contract element (precondition, postcondition, class invariant, loop invariant) should be purely descriptive, since it is a specification element; evaluating it, typically for testing and debugging, should not create a change of behavior — a “Heisenberg effect” — in the very computation that it is intended to assess. Another application is in a concurrency context, particularly in SCOOP (see earlier posts and forthcoming ones): if one or more functions are pure, several of their executions can run concurrently on the same object.

The notion of purity admits variants. The usual notion is what [2] calls weak purity, which precludes changes to previously existing objects but allow creating new objects. In the EiffelBase library we also encounter routines that have another form of purity, which we may call “relative” purity: they can leave the same state on exit as they found on entry, but in-between they might change the state. For the rest of this discussion we will rely on the standard notion of weak purity: no changes permitted on existing objects.

It is often suggested that the programming language should support specifying that a routine is pure; many people have indeed proposed the addition of a keyword such as pure to Eiffel. One of the reasons this is not — in my opinion — such a great idea is that purity is just a special case of the more general problem of framing: specifying and verifying what a routine does not change. If we can specify an arbitrary frame property, then we can, as a special case covered by the general mechanism, specify that a routine changes nothing.

To see why framing is so important, consider a class ACCOUNT with a routine deposit that has the postcondition

balance = old balance + sum………..— Where sum is the argument of deposit

Framing here means stating that nothing else than balance changes; for example the account’s owner and its number should remain the same. It is not practical to write all individual postcondition clauses such as

owner= old owner
number= old number

and so on. But we do need to specify these properties and enforce them, if only to avoid that a descendant class (maybe MAFIA_ACCOUNT) distort the rules defined by the original.

One technique is to add a so-called “modifies clause”, introduced by verification tools such as ESC-Java [3] and JML [4]. Modifies clauses raise some theoretical issues; in particular, the list of modified expressions is often infinite, so we must restrict ourselves to an abstract-data-type view where we characterize a class by commands and queries and the modifies clause only involves queries of the class. Many people find this hard to accept at first, since anything that is not talked about can change, but it is the right approach. A modifies clause of sorts, included in the postcondition, appeared in an earlier iteration of the Eiffel specification, with the keyword only (which is indeed preferable to modifies, which in the Eiffel style favoring grammatically simple keywords would be modify, since what we want to express is not that the routine must change anything at all but that it may only change certain specified results!). The convention worked well with inheritance, since it included the rule that a clause such as only balance, in class ACCOUNT, prescribes that the routine may not, in its modifies version as well as in any version redefined in descendants, change any other query known at the level of ACCOUNT; but a descendant version may change, subject to its own ACCOUNT clauses, any new query introduced by a descendant.

To declare a routine as pure, it would suffice to use an empty only clause (not very elegant syntactically — “only” what? — but one can get used to it).

This construct has been discarded, as it places too heavy a burden on the programmer-specifier. Here the key observation resulted from a non-scientific but pretty extensive survey I made of all the JML code I could get my hands on. I found that every time a query appeared in a modifies clause it was also listed in the postcondition! On reflection, this seems reasonable: if you are serious about specification, as anyone bothering to write such a clause surely is, you will not just express that something changes and stop there; you will write something about how it may change. Not necessarily the exact result, as in

my_query = precise_final_value

but at least some property of that result, as in

some_property (my_query)

This observation has, however, an inescapable consequence for language design: modifies or only clauses should be inferred by the compiler from the postcondition, not imposed on the programmer as an extra burden. The convention, which we may call the Implicit Framing Rule, is simple:

A routine may change the value of a query only if the query is specified in the routine’s postcondition

(or, if you like double negation, “no routine may change the value of a query other than those specified in its postcondition”). Here we say that a feature is “specified” in a postcondition if it appears there outside of the scope of an old expression. (Clearly, an occurrence such as old balance does not imply that balance can be modified, hence this restriction to occurrences outside of an old.)

With this convention the only clause becomes implicit: it would simply list all the queries specified in the postcondition, so there is no need for the programmer to write it. For the rare case of wanting to specify that a query q may change, but not wanting to specify how, it is easy to provide a library function, say involved, that always return true and can be used in postconditions, as in involved (q).

The convention is of course not just a matter of programming methodology but, in an IDE supporting verification, such as the EVE “Verification As a Matter Of Course” environment which we are building for Eiffel [5], the compiler will enforce the definition above — no change permitted to anything not specified in the postcondition — and produce an error in case of a violation. This also means that we can easily specify that a routine is pure: it must simply not specify any query in its postcondition. It may still list it in an old clause, as happens often in practice, e.g.

Result = old balance – Minimum_balance………..— In the postcondition of a function available_funds

Note the need to use old here. Apart from this addition of old to some postconditions, a considerable advantage of the convention is that most existing code using pure functions will be suitable to the new purity enforcement without any need to provide new annotations.

I believe that this is the only sustainable convention. It does not, of course, solve the frame problem by itself (for attempts in this direction see [6, 7]), but it is a necessary condition for a solution that is simple, easily taught, fairly easily implemented, and effective. It goes well with model-based specifications [8, 9], which I believe are the technique of most promise for usable specifications of object-oriented software. And it provides a straightforward, no-frills way to enforce purity where prescribed by the Command-Query Separation principle [10, 11]: if I’m not pure, at least my functions must be.

References

[1] From the lyrics of the aria Glitter and Be Gay in Leonard Bernstein’s Candide, text by Lillian Hellman and others. Youtube offers several performances, including by Diana Damrau (here) and Natalie Dessay (here) . For the text see e.g. here.

[2] Adam Darvas and Peter Müller: Reasoning About Method Calls in Interface Specifications, in Journal of Object Technology, Volume 5, no. 5, jUNE 2006, pages 59-85, doi:10.5381/jot.2006.5.5.a3, available here.

[3] C. Flanagan, K.R.M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe and R. Stata: Extended static checking for Java, in PLDI 2002 (Programming Language Design and Implementation), pages 234–245, 2002.

[4] Gary Leavens et al.: Java Modeling Language, see references here.

[5] Julian Tschannen, Carlo A. Furia, Martin Nordio, and Bertrand Meyer: Verifying Eiffel Programs with Boogie, to appear in Boogie 2011, First International Workshop on Intermediate Verification Languages, Wroclaw, August 2011. See documentation about the EVE project on the project page.

[6] Ioannis Kassios: Dynamic Frames: Support for Framing, Dependencies and Sharing Without Restrictions, in Formal Methods 2006, eds. J. Misra, T. Nipkow and E. Sekerinski, Lecture Notes in Computer Science 4085, Springer Verlag, 2006, pages 268-283.

[7] Bernd Schoeller: Making Classes Provable through Contracts, Models and Frames, PhD thesis, ETH Zurich, 2007, available here

[8] Bernd Schoeller, Tobias Widmer and Bertrand Meyer: Making Specifications Complete Through Models, in Architecting Systems with Trustworthy Components, eds. Ralf Reussner, Judith Stafford and Clemens Szyperski, Lecture Notes in Computer Science, Springer-Verlag, 2006, available here.

[9] Nadia Polikarpova, Carlo Furia and Bertrand Meyer: Specifying Reusable Components, in Verified Software: Theories, Tools, Experiments (VSTTE ’10), Edinburgh, UK, 16-19 August 2010, Lecture Notes in Computer Science, Springer Verlag, 2010, available here.

[10] Bertrand Meyer: Object-Oriented Software Construction, first (1988) and second (1997) editions, Prentice Hall.

[11] Bertrand Meyer: Touch of Class: An Introduction to Programming Well, Using Objects and Contracts, Springer Verlag, 2009.

Be Sociable, Share!

Please wait...

13 thoughts on “If I’m not pure, at least my functions are”

helmut.brandl

5 July 2011 at 01:01

Nicely expressed. A routine must not change anything which it does not list in its postcondition. I agree basically with that view, although it needs some refinements. The postcondition “balance = old balance + amount” does not express that *only* balance is changed and nothing else, but that all queries which depend on balance might be changed as well (and all queries on which balance depends — if calculated — might change as well). But I assume you just wanted to express the basic idea and left out these details.

But there is another tweak related to the framing problem (or aliasing problem). E.g. we have a “class SORTED_LIST[G->ORDERED]” with the feature “extend(e:G)” and the invariant that asserts that the list is kept sorted.

local s:STRING; l:SORTED_LIST[STRING]
…
l.extend(“hello”)
s := “world!”
l.extend(s)
s[1] := ‘a’ — this statement breaks the invariant of the sorted list!!!

How do you avoid this “impurity”? The last statement should not be allowed as long as the sorted list does not deeply clone its elements before insertion. However it is (at least to the best of my knowledge) perfectly valid Eiffel.

Please wait...

Log in to Reply
Bertrand Meyer

5 July 2011 at 08:41

Thanks for the comments.

On your second point (breaking an invariant through indirect access): this is a real problem but a different one from purity. For existing solutions look at ownership types. I think it can be addressed through better techniques but that is for a future discussion. One thing at a time.

On your first point: “all queries which depend on balance might be changed as well”. No. The article is a bit curt but precise on this point: an abstract data type is defined by commands and queries; the postcondition of any routine (normally a command) must explicitly specify any query whose value it may modify. Except possibly in the implementation, there is no notion of a query “depending on” another.

Please wait...

Log in to Reply
helmut.brandl

5 July 2011 at 14:42

Dependant queries: You can require that a command must specify all the queries which are affected by the command. But then you might get the same “explosion” which you wanted to avoid with not requiring the complete specification of all queries which are not affected. Suppose you have a sequence with a command extend_rear with the queries count, item, last, etc. You might specify

extend_rear(e:G)
…
ensure
count = old count + 1
last = e
item(count) = e
model = old model.rear_extended(e) — if you use models
end

Suppose the postcondition is now complete with respect to the available queries. Then you decide to add the queries tail and subsequence. Introducing these queries requires updating the postcondition of extend_rear. Pushing the argument further you get an m-n complexity. Adding a query to an abstract data type with n queries and m commands might make it necessary to extend the postcondition of all or a significant part of the m commands. Do you agree with this consequence or did I get something wrong in understanding your proposal?

Please wait...

Log in to Reply
- Bertrand Meyer
  
  5 July 2011 at 15:33
  
  I don’t see that you got anything wrong.
  
  If there is a new query it is necessary to express how any existing commands might affect it. It will usually not be m x n since many commands will not have an effect on any particular query, especially if indeed the query is new. The convention proposed in the article (to which I have now given a name, “Implicit Framing Rule”, for ease of reference) is effective in this respect, since not listing a query means it may not change.
  
  The question of model queries is a bit different, as they are intended to describe the entire semantics of the class and all the others — such as `count’, which might be equal to `model.count’ — are conceptually superfluous even if practically useful. Model-based contracts are relatively new and we are still learning how best to use them. I believe that the solution here, to avoid specification explosion, is an idea that has been discussed in other contexts but not yet integrated into the language: theorems. Marking certain assertions as theorems states that they follow logically from others (axioms). The distinction is particularly useful for invariant clauses. Then we might amend the Implicit Framing Rule so that it only considers non-theorem clauses in postconditions. This deserves further thought.
  
  Please wait...
  
  Log in to Reply
  - helmut.brandl
    
    5 July 2011 at 17:04
    
    I am afraid I don’t agree with your optimism. If a query is new and calculated, then it uses other queries to calculate its result. Therefore the new query “depends on” other already existing queries. Following your rule, you have to add a new postcondition clause to all commands which affect the already existing queries on which the new query depends. So adding innocently a new query results in the need to update a lot of postconditions. An abstract class like SEQUENCE is very instructive to see the explosion (you can add head, tail, overridden, subsequence, prepended, appended … and all are affected by extend_rear, extend_front, put, append, prepend …).
    
    Please wait...
    
    Log in to Reply
    - Bertrand Meyer
      
      5 July 2011 at 18:31
      
      I agree this point needs to be refined further. We should resume this discussion on the basis of actual class texts — which I think will show that there is no explosion, but that’s for us to demonstrate.
      
      Please wait...
      
      Log in to Reply
rod_chapman

5 July 2011 at 18:20

Interesting article. Here a question: your justification for the implicit framing rule seems
to infer that the programmer supplying a postcondition must be less burdensome than
asking the same programmer to supply a modifies clause. Do you data to support that?
– Rod

Please wait...

Log in to Reply
- Bertrand Meyer
  
  5 July 2011 at 21:23
  
  Hi Rod,
  
  If you want to perform verification, you need to have specifications. In my experience, programmers have no qualms writing such specifications; in the case of Eiffel this is confirmed by Chalin’s study [12]. But the process must not be tedious. Writing frame conditions (in the form of `modifies’ clauses, or any other) is tedious. Other than tedious, it is annoying and unstable, as any addition of a query may force updating many specifications; see the points raised by Helmut Brandl.
  
  So while I do not have scientific evidence that one is more acceptable than the other, I am convinced that the right approach is to require programmers to write the postconditions and let the tools infer the frame conditions automatically, based on a simple language conventions as described by the Implicit Frame Rule.
  
  [12] Patrice Chalin, Are Practitioners Writing Contracts, in Rigorous Development of Complex Fault-Tolerant Systems
  Lecture Notes in Computer Science, 2006, Volume 4157/2006, 100-113, DOI: 10.1007/11916246_5, see https://springerlink3.metapress.com/content/41rvm764k1k18217/resource-secured/?target=fulltext.pdf&sid=c4c0yq55ep2hm2bmhtldrq45&sh=www.springerlink.com (restricted access), also technical report version at http://users.encs.concordia.ca/~chalin/papers/TR-2005-006.pdf.
  
  Please wait...
  
  Log in to Reply
Pingback: Bertrand Meyer's technology blog » Blog Archive » Specification explosion
Karlheinz Hug

19 March 2012 at 17:12

My problem is similar to Helmut Brandl’s first problem with “dependent queries”. Bertrand says, “except possibly in the implementation, there is no notion of a query ‘depending on’ another”. Why not?

Richard Mitchell, James McKim: Extending a method of devising software contracts. In: C. Mingins, B. Meyer (ed): Proc. TOOLS 32, IEEE (1999), give the guideline
“G2 Separate basic queries from derived queries (derived queries can be specified in terms of basic queries)” I find this guideline very useful.

Let me explain my problem on a simple example that I used in my introductory CS course since 1999. Suppose the Implicit Framing Rule (IFR) holds in the following.

class RECTANGLE

feature
width, height, area : REAL

invariant
width > 0
height > 0
area = width * height

end — class RECTANGLE

width and height are basic, area is derived (or “depending on” width and height). We could substitute the invariant

area = width * height

by the postcondition:

area : REAL
ensure
Result = old width * height

The semantics should be the same. In both variants, area is pure. The difference is with assertion checking on: The invariant is checked before and after each routine call,
the postcondition is only checked when area is called.

First, I add a setter for width:

set_width (new_width : REAL)
require
new_width > 0
ensure
width = new_width

The IFR says that only width changes and nothing else. This violates the invariant or the postcondition of area, because one of height and area must change too. I want set_width to keep height and change area. The IFR forces me to add a condition with area, maybe such:

ensure
width = new_width
area = width * old height

It seems that I have to repeat the invariant or postcondition of area in a modified form.

Second, I add a setter for area:

set_area (new_area : REAL)
require
new_area > 0
ensure
area = new_area

The IFR says that only area changes and nothing else. But if new_area /= area, this violates the invariant or the postcondition of area, because one of width and height or both must change too. I have at least three options. Each may be useful:

set_area_keep_width (new_area : REAL)
require
new_area > 0
ensure
area = new_area
area = (old width) * height

set_area_keep_height — similar

set_area_keep_relation (new_area : REAL)
require
new_area > 0
ensure
area = new_area
width * (old height) = (old width) * height

Conclusion: Given a setter for a derived query, the IFR is okay: It forces the programmer to specify which of the basic queries should change.

Given a setter for a basic query, the IFR seems not to be the last word. We can see that if we add more derived queries like perimeter, diagonal and so on, which depend on width and height. A generalization:

class EXAMPLE

feature
basic_1 : TYPE_B1
…
basic_m : TYPE_Bm

derived_1 : TYPE_D1
…
derived_n : TYPE_Dn

set_basic_i (new_basic_i : TYPE_Bi)
ensure
basic_i = new_basic_i
derived_1 = function_1 (old basic_1,…, basic_i,…, old basic_m)
…
derived_n = function_n (old basic_1,…, basic_i,…, old basic_m)
— All basics with old except basic_i!

invariant
derived_1 = function_1 (basic_1,…, basic_m)
…
derived_n = function_n (basic_1,…, basic_m)

end — class EXAMPLE

The explosion of nochange-postconditions is back again. To me, the invariants do not look like theorems that follow logically from axioms. So I wonder whether the theorem-approach may help. With the discarded “only” construct the setter looks like this:

set_basic_i (new_basic_i : TYPE_Bi)
ensure
basic_i = new_basic_i
only derived_1,…, derived_n

With “only”, there is no need to repeat the whole invariant formulas. So I do not see the advantage of the IFR in this case. I would prefer “only”, although it is not ideal to repeat the names of the derived queries.

Am I missing some points?

What about “distinction of basic and derived queries + IFR-variant” with IFR-variant:

“A routine may change the value of a basic query only if the basic query is specified in the routine’s postcondition,
and it may change the values of derived queries.”

This leads to:

class EXAMPLE

basic query
basic_1 : TYPE_B1
…
basic_m : TYPE_Bm

derived query
derived_1 : TYPE_D1
…
derived_n : TYPE_Dn

command
set_basic_i (new_basic_i : REAL)
ensure
basic_i = new_basic_i

invariant
derived_1 = function_1 (basic_1,…, basic_m)
…
derived_n = function_n (basic_1,…, basic_m)

end — class EXAMPLE

Can “derivedness” of a query be inferred from invariants and postconditions? Then there would be no need to distinguish derived queries at their declaration.
“A query is basic, if it is not fully specified in terms of other queries. A query is derived, if it is fully specified in terms of basic queries.”
Can a tool infer from

diagonal * diagonal = width * width + height * height

that diagonal is derived from width and height?

Please wait...

Log in to Reply
- helmut.brandl
  
  21 May 2012 at 16:21
  
  Hello Karlheinz,
  
  the derivation of the implicit frame contracts can be done automatically by a compiler if some precise language rules are available. The important notion is the dependency graph of the queries. There are two dependency graphs: one for the public view and one for the private view.
  
  Have a look at my articles to see the details of this.
  
  http://tecomp.sourceforge.net/index.php?file=doc/papers/proof/framing.txt
  http://tecomp.sourceforge.net/index.php?file=doc/papers/proof/framing_implement.txt
  
  Helmut
  
  Please wait...
  
  Log in to Reply
George Cowan

22 August 2013 at 21:53

It seems to me that the (implicit) only clause must be derived from the code – any data that the compiler detects possibly changing must be inserted into the only clause. Then everything in the only clause must occur explicitly in the postcondition for security reasons. Otherwise, I can add that extra fraction of a penny to my account.

I am embarrassed to admit that I don’t know enough Eiffel to know if every opportunity to change data can be detected.

George

Please wait...

Log in to Reply
- George S Cowan
  
  22 August 2013 at 23:01
  
  Oh, never mind. I found your statement that “The convention is of course not just a matter of programming methodology but, in an IDE supporting verification, such as the EVE “Verification As a Matter Of Course” environment which we are building for Eiffel [5], the compiler will enforce the definition above — no change permitted to anything not specified in the postcondition — and produce an error in case of a violation.” Which works just as well.
  
  George
  
  Please wait...
  
  Log in to Reply

References

13 thoughts on “If I’m not pure, at least my functions are”

Leave a Comment Cancel reply