The Professor Smith syndrome: Part 1 – a quiz

[As a reminder, this blog is now on a regular schedule, appearing every Monday. Sometimes in mid-week there will be a lighter piece or, as here, a preparation for the following Monday’s entry.]

Consider the following hypothetical report in experimental software engineering (see earlier posts: [1], [2]):

Professor Smith has developed a new programming technique, “Suspect-Oriented Programming” (SOP). To evaluate SOP, he directs half of the students in his “Software Methodology” class to do the project using traditional techniques, and the others to use SOP.

He finds that projects by the students using SOP have, on the average, 15% fewer bugs than the others, and reports that SOP increases software reliability.

Quiz, in advance of next Monday’s post: what’s wrong with this story?

References

[1] Bertrand Meyer: The rise of empirical software engineering (I): the good news, this blog, 30 July 2010, available here.
[2] Bertrand Meyer: The rise of empirical software engineering (II): what we are still missing, this blog, 31 July 2010, available here.


13 Comments

  1. Craig Stuntz says:

    Any kind of control experiment, for starters. The students using SOP might just happen to be more careful programmers. It would make more sense to have the same group of students create applications both with and without SOP and compare those results.

    • Actually, I don’t think there is a particular problem, and I am not sure about the solution you propose to it…

      As to the risk of unequal groups: the groups should be selected at random (there is no obvious reason why people whose names start with A through L should program better or worse than those whose names start with M through Z); and standard statistics will tell us what group sizes to use to get a given degree of confidence in the results.

      As to the proposed solution, it actually introduces bias: if you develop an application twice, your performance the second time is influenced by your experience of the first time. So if the second experiment uses method B and the first used method A, and students generally do better the second time, you cannot deduce that B is better; the difference might just be due to the experience gained in the first iteration. (Only if the students do *worse* the second time might this be a result worth reporting, although here too one should make sure no other factor is involved, such as the students' boredom with having to do something twice.)
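
      To make the statistics point concrete, here is a minimal sketch in Python; the bug counts are invented for illustration, not data from the post. Under random assignment, a permutation test estimates how likely a difference of the observed size is to arise by chance alone.

      import random

      random.seed(1)

      # Hypothetical bug counts per project (values invented for illustration).
      traditional = [12, 9, 14, 11, 10, 13, 8, 12, 15, 11]
      sop         = [10, 8, 12, 9, 11, 10, 7, 11, 12, 9]

      observed = sum(traditional) / len(traditional) - sum(sop) / len(sop)

      # Permutation test: repeatedly reshuffle all counts into two groups of
      # the same sizes and count how often the mean difference is at least as
      # large as the one observed.
      pooled = traditional + sop
      n_trad = len(traditional)
      trials = 10_000
      extreme = 0
      for _ in range(trials):
          random.shuffle(pooled)
          diff = (sum(pooled[:n_trad]) / n_trad
                  - sum(pooled[n_trad:]) / (len(pooled) - n_trad))
          if diff >= observed:
              extreme += 1

      print(f"observed mean difference: {observed:.2f} bugs")
      print(f"approximate one-sided p-value: {extreme / trials:.3f}")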

      • Craig Stuntz says:

        I agree. It would certainly be wrong to always use the same methodologies in the same order!
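
        One simple way to guard against that is to counterbalance the order. A minimal sketch, assuming an invented roster of twenty students:

        import random

        random.seed(1)

        # Invented roster; real student identifiers would be used instead.
        students = [f"student_{i}" for i in range(1, 21)]
        random.shuffle(students)

        half = len(students) // 2
        # Half the class does traditional first, the other half SOP first,
        # so any learning effect from the first project affects both methods.
        assignment = {
            name: ("traditional then SOP" if i < half else "SOP then traditional")
            for i, name in enumerate(students)
        }
        for name, order in sorted(assignment.items()):
            print(name, "->", order)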

  2. JustMe says:

    My first thought was that Professor Smith's experiment lacks an error calculation, and that the population (the total number of students in his software class) is too low to yield a representative result. Secondly, in empirical science one has to take care that what is measured is not disturbed by the measurement itself. A better test would be for someone other than Professor Smith to perform the experiment described, without telling the two groups which technique is supposed to be better.
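
    As a back-of-the-envelope check on the group-size point: the usual normal approximation for a two-sample comparison gives the per-group sample size needed to detect a given standardized effect with a chosen significance level and power. All numbers below are assumptions for illustration, not data from the post.

    from math import ceil
    from statistics import NormalDist

    def per_group_n(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
        """Approximate per-group n for a two-sided, two-sample comparison."""
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
        z_beta = NormalDist().inv_cdf(power)
        return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

    # If the 15% reduction in bug counts corresponds to, say, half a standard
    # deviation (an assumption for illustration), each group needs roughly:
    print(per_group_n(0.5))   # about 63 students per group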

    • JustMe says:

      Or even better, the test should additionally be done blind, without the two groups knowing about each other.

    • Werner Lioen says:

      I have changed my nickname JustMe to my own name Werner Lioen.

    • Werner Lioen says:

      Maybe the Professor Smith syndrome, and the answer to this quiz, is that you have to take care that you are not just measuring what you want to find. Professor Smith is probably proud of his newly invented software methodology, or has a strong belief in it, and will unconsciously influence his results, making them more or less a self-fulfilling prophecy. This matters, of course, when you are measuring people; it does not when you are measuring inanimate matter.

  3. Doradus says:

    15% fewer bugs… but did they write systems of comparable sophistication?

    • Yes, the assumption is that they did the same programming project, using different techniques.

  4. dlebansais says:

    At the very least, someone else should measure the number of bugs. Ideally, students testing the programming technique should come from a different class, and not be acquainted with Prof. Smith whatsoever.

  5. arun says:

    I think Prof. Smith should be careful about the sample of programming projects tried with SOP. Are these projects narrow, i.e. could it be that the projects are exactly the typical use cases that SOP is designed to address? In that case, the claim of reliability improvements is not completely fair. Moreover, if there is a set of benchmarks (can they be identified?) for assessing the quality of a programming paradigm, how does SOP fare with respect to these benchmarks?

    Secondly, factors such as testability, maintainability, performance overhead, the programmer's learning curve and similar quality metrics are often used by engineers when deciding whether to adopt a technique (assuming ease of adoption is one of the goals :)). Hence, in addition to bug counts, Prof. Smith should assess the effects of SOP on these aspects.

  6. thomas.beale says:

    Would it be going too far to invoke the Duhem-Quine ‘theory-ladenness of observation’? The problem with the story is that ‘He finds’, not ‘an independent examiner finds’. Of course we know what he will find! But that’s rarely interesting… (reading back above, I think that is what Werner may have been saying as well).

