The Professor Smith syndrome: Part 1 – a quiz

[As a reminder, this blog is now on a regular schedule, appearing every Monday. Sometimes in mid-week there will be a lighter piece or, as here, a preparation for the following Monday’s entry.]

Consider the following hypothetical report in experimental software engineering (see earlier posts: [1], [2]):

Professor Smith has developed a new programming technique, “Suspect-Oriented Programming” (SOP). To evaluate SOP, he directs half of the students in his “Software Methodology” class to do the project using traditional techniques, and the others to use SOP.

He finds that projects by the students using SOP have, on the average, 15% fewer bugs than the others, and reports that SOP increases software reliability.

Quiz, in advance of next Monday’s post: what’s wrong with this story?

References

[1] Bertrand Meyer: The rise of empirical software engineering (I): the good news, this blog, 30 July 2010, available here.
[2] Bertrand Meyer: The rise of empirical software engineering (II): what we are still missing, this blog, 31 July 2010, available here.


13 Comments

  1. Craig Stuntz says:

    Any kind of control experiment, for starters. The students using SOP might just happen to be more careful programmers. It would make more sense to have the same group of students create applications both with and without SOP and compare those results.

    • Actually, I don’t think there is a particular problem, and I am not sure about the solution you propose to it…

      As to the risk of unequal groups: the groups should be selected at random (there is no obvious reason why people whose names start with A through L should program better or worse than those whose names start with M through Z); and standard statistics will tell us what group sizes to use to get a given degree of confidence in the results.

      As to the proposed solution, it actually introduces bias: if you develop an application twice, your performance the second time is influenced by your experience of the first time. So if the second experiment uses method B and the first used method A, and students generally do better the second time, you cannot deduce that B is better; the difference might just be due to the experience gained in the first iteration. (Only if the students do *worse* the second time might this be a result worth reporting, although here too one should make sure no other factor is involved, such as the students' boredom with having to do something twice.)
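
      To make the statistics point concrete, here is a minimal sketch in Python; the bug counts are invented for illustration, not data from the post. Under random assignment, a permutation test estimates how likely a difference of the observed size is to arise by chance alone.

      import random

      random.seed(1)

      # Hypothetical bug counts per project (values invented for illustration).
      traditional = [12, 9, 14, 11, 10, 13, 8, 12, 15, 11]
      sop         = [10, 8, 12, 9, 11, 10, 7, 11, 12, 9]

      observed = sum(traditional) / len(traditional) - sum(sop) / len(sop)

      # Permutation test: repeatedly reshuffle all counts into two groups of
      # the same sizes and count how often the mean difference is at least as
      # large as the one observed.
      pooled = traditional + sop
      n_trad = len(traditional)
      trials = 10_000
      extreme = 0
      for _ in range(trials):
          random.shuffle(pooled)
          diff = (sum(pooled[:n_trad]) / n_trad
                  - sum(pooled[n_trad:]) / (len(pooled) - n_trad))
          if diff >= observed:
              extreme += 1

      print(f"observed mean difference: {observed:.2f} bugs")
      print(f"approximate one-sided p-value: {extreme / trials:.3f}")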

      • Craig Stuntz says:

        I agree. It would certainly be wrong to always use the same methodologies in the same order!
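
        One simple way to guard against that is to counterbalance the order. A minimal sketch, assuming an invented roster of twenty students:

        import random

        random.seed(1)

        # Invented roster; real student identifiers would be used instead.
        students = [f"student_{i}" for i in range(1, 21)]
        random.shuffle(students)

        half = len(students) // 2
        # Half the class does traditional first, the other half SOP first,
        # so any learning effect from the first project affects both methods.
        assignment = {
            name: ("traditional then SOP" if i < half else "SOP then traditional")
            for i, name in enumerate(students)
        }
        for name, order in sorted(assignment.items()):
            print(name, "->", order)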

  2. JustMe says:

    My first thought was that Professor Smith's experiment lacks an error calculation, and that the population (the total number of students in his software class) is too low to yield a representative result. Secondly, in empirical science one has to take care that what is measured is not disturbed by the measurement itself. A better test would be for someone other than Professor Smith to perform the experiment described, without telling the two groups which technique is supposed to be better.
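
    As a back-of-the-envelope check on the group-size point: the usual normal approximation for a two-sample comparison gives the per-group sample size needed to detect a given standardized effect with a chosen significance level and power. All numbers below are assumptions for illustration, not data from the post.

    from math import ceil
    from statistics import NormalDist

    def per_group_n(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
        """Approximate per-group n for a two-sided, two-sample comparison."""
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
        z_beta = NormalDist().inv_cdf(power)
        return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

    # If the 15% reduction in bug counts corresponds to, say, half a standard
    # deviation (an assumption for illustration), each group needs roughly:
    print(per_group_n(0.5))   # about 63 students per group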

    • JustMe says:

      Or even better, the test should additionally be done blind, without the two groups knowing about each other.

    • Werner Lioen says:

      I have changed my nickname JustMe to my own name Werner Lioen.

    • Werner Lioen says:

      Maybe the Professor Smith syndrome, and the answer to this quiz, is that you have to take care that you are not just measuring what you want to find. Professor Smith is probably proud of his newly invented software methodology, or has a strong belief in it, and will unconsciously influence his results, making them more or less a self-fulfilling prophecy. This matters, of course, when you are measuring people; it does not when you are measuring inanimate matter.

  3. Doradus says:

    15% fewer bugs… but did they write systems of comparable sophistication?

    • Yes, the assumption is that they did the same programming project, using different techniques.

  4. dlebansais says:

    At the very least, someone else should measure the number of bugs. Ideally, students testing the programming technique should come from a different class, and not be acquainted with Prof. Smith whatsoever.

  5. arun says:

    I think Prof. Smith should be careful about the sample of programming projects tried with SOP. Are these projects narrow, i.e. could it be that the projects are exactly the typical use cases that SOP is designed to address? In that case, the claim of reliability improvements is not completely fair. Moreover, if there is a set of benchmarks (can they be identified?) for assessing the quality of a programming paradigm, how does SOP fare with respect to these benchmarks?

    Secondly, factors such as testability, maintainability, performance overhead, the programmer's learning curve and similar quality metrics are often used by engineers when deciding whether to adopt a technique (assuming ease of adoption is one of the goals :)). Hence, in addition to bug counts, Prof. Smith should assess the effects of SOP on these aspects.

  6. thomas.beale says:

    Would it be going too far to invoke the Duhem-Quine ‘theory-ladenness of observation’? The problem with the story is that ‘He finds’, not ‘an independent examiner finds’. Of course we know what he will find! But that’s rarely interesting… (reading back above, I think that is what Werner may have been saying as well).

