Archive for the ‘Conference’ Category.

About Watts Humphrey

Watts Humphrey, 2007

At FOSE (see previous post [1]) we will honor the memory of Watts Humphrey, the pioneer of disciplined software engineering, who left us in October. A blog entry on my Communications of the ACM blog [2] briefly recalls some of Humphrey’s main contributions.

References

[1] The Future Of Software Engineering: previous entry of this blog.
[2] Watts Humphrey: In Honor of a Pioneer, in CACM blog.

VN:F [1.9.10_1130]
Rating: 8.5/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

The rise of empirical software engineering (II): what we are still missing

p> 

Recycled(This article was initially published in the CACM blog.)

The previous post under  the heading of empirical software engineering hailed the remarkable recent progress of this field, made possible in particular by the availability of large-scale open-source repositories and by the opening up of some commercial code bases.

Has the empirical side of software engineering become a full member of empirical sciences? One component of the experimental method is still not quite there: reproducibility. It is essential to the soundness of natural sciences; when you publish a result there, the expectation is that others will be able to replicate it. Perhaps such duplication does not happen as often and physicists and biologists would have us believe, but it does happen, and the mere possibility that someone could check your results (and make a name for himself, especially if you are famous, by disproving them) keeps experimenters on their toes. 

If we had the same norms in empirical software engineering, empirical papers would all contain a clause such as

Hampi’s source code and documentation, experimental data, and additional results are available at http://people.csail.mit.edu/akiezun/hampi

This example is, in fact, a real quote, from a paper [1] at the 2009 ISSTA conference. It shows exactly what we expect for an experimental software engineering publication: below are my results, if you want to rerun the experiments here is the URL where you will find the code (source and binary) and the data.

Unfortunately, such professionalism is the exception rather than the rule. I performed a quick check — entirely informal, as this is a blog post, not an empirical research paper! — in the ISSTA ’09 proceedings. ISSTA, an ACM conference is a good sample point, since it covers testing (plus other approaches to program analysis) and almost every paper has an  “experiment” section. I found only a very small number that, like the one cited above, give explicit reproducibility information. (Disclosure: one of those papers is ours [2].)

I believe that the situation will change dramatically and that in a few years it will be impossible to submit an empirical paper without including such information. Computer science, or at least some areas of software engineering, should actually consider themselves privileged when it comes to allowing reproducibility: all that we have to do to reproduce a result, in testing for example, is to run a program. That is easier than for a zoologist — wishing to reproduce a colleague’s experiment precisely — to gather in his lab the appropriate number of flies, chimpanzees or killer whales.

In some types of empirical software research, such as the assessment of process models or design techniques, reproducing an experiment’s setup is harder than when all you have to do is to rerun a program. But regardless of the area we must develop a true  culture of reproducibility. It is not yet there. I have personally come to take experimental results with a grain of salt; not that I particulary suspect foul play, but I simply know how easy it is, in the absence of external validation, to make a mistake in the experiments and, unwittingly, publish a paper with wrong results.

Developing a culture of reproducibility also has an effect on the refereeing process. In submitting papers with precise instructions to reproduce our results, we have sometimes remarked that referees never contact us. I hope this means they always succeed; I suspect, however, that in many cases they just do not try. If you think further about the implications, providing reproducibility instructions for a submitted paper is scary: after all a software run may fail to run for marginal reasons, such as the wrong hardware configuration or a misunderstanding of the instructions. You do not want to perform all the extra work (of making your results reproducible) just to have the paper summarily rejected because the referee is running Windows 95. Ideally, then, referees should have the possibility to ask technical questions — but anonymously, since this is the way most refereeing works. Conferences and journals generally do not support such a process.

These obstacles are implementation issues, however, and will go away. What matters for the growth of the discipline is that it needs, like experimental sciences before it, to embrace a true culture of reproducibility.

References

[1] Adam Kieun, Vijay Ganesh, Philip J. Guo, Pieter Hooimeijer, Michael D. Ernst: HAMPI: A Solver for String Constraints, Proceedings of the 2009 ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’09), July 19-23, 2009, Chicago.

[2] Nadia Polikarpova, Ilinca Ciupa  and Bertrand Meyer: A Comparative Study of Programmer-Written and Automatically Inferred Contracts, Proceedings of the 2009 ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’09), July 19-23, 2009, Chicago.

VN:F [1.9.10_1130]
Rating: 6.0/10 (3 votes cast)
VN:F [1.9.10_1130]
Rating: +2 (from 4 votes)

The rise of empirical software engineering (I): the good news

 

RecycledIn the next few days I will post a few comments about a topic of particular relevance to the future of our field: empirical software engineering. I am starting by reposting two entries originally posted in the CACM blog. Here is the first. Let me use this opportunity to mention the LASER summer school [1] on this very topic — it is still possible to register.

Empirical software engineering papers, at places like ICSE (the International Conference on Software Engineering), used to be terrible.

There were exceptions, of course, most famously papers by Basili, Zelkowitz, Rombach, Tichy, Berry, Humphrey, Gilb, Boehm, Lehmann, Belady and a few others, who kept hectoring the community about the need to base our opinions and practices on evidence rather than belief. But outside of these cases the typical ICSE empirical paper — I sat through a number of them — was depressing: we made these measurements in our company, found these results, just believe us. A question here in the back? Can you reproduce our results? Access our code? We’d love you to, but unfortunately we work for a company — the Call for Papers said industry contributions were welcome, didn’t it? — and we can’t give you the details. So sorry. But trust us, we checked our results.

Actually, there was another kind of empirical paper, which did not suffer from such secrecy: the university study. Hi, I am professor Bright, the well-known author of the Bright method of software development. Everyone knows it’s the best, but we wanted to assess it scientifically through a rigorous empirical study. I gave the same programming problem to two groups of third-year undergraduates; one group was told to use the Bright method, the other not. Guess what? The Bright group performed 67.94% better! I see the session chair wanting to move to the next speaker; see the details in the paper.

For years, this was most of what we had: unverifiable industry reports and unconvincing student experiments.

And suddenly the scene has changed. Empirical software engineering studies are in full bloom; the papers are flowing, and many are good!

What triggered this radical change is the availability of open-source repositories. Projects such as Linux, Eclipse, Apache, EiffelStudio and many others have records going back 10, 15, sometimes 20 years. These records contain the true history of the project: commits (into the configuration management system), bug reports, bug fixes, test runs and their results, developers involved, and many more elements of project data. All of a sudden empirical research has what any empirical science needs: a large corpus of objects to analyze.

Open-source projects have given the decisive jolt, but now we can rely on industrial data as well: Microsoft and other companies have started making their own records selectively available to researchers. In the work of authors such as Zeller from Sarrebruck, Gall from Uni. Zurich or Nagappan from Microsoft, systematic statistical techniques yield answers, sometimes surprising, to questions on which we could only speculate. Do novices or experts cause more bugs? Does test coverage correlate with software quality, and if so, positively or negatively? Little by little, we are learning about the true properties of software products and processes, based not on fantasies but on quantitative analysis of meaningful samples.

The trend is unmistakable, and irreversible.

Not all is right yet; in the second installment of this post I will describe some of what still needs to be improved for empirical software engineering to achieve full scientific rigor.

Reference

[1] LASER summer school 2010, at http://se.ethz.ch/laser.

VN:F [1.9.10_1130]
Rating: 4.5/10 (2 votes cast)
VN:F [1.9.10_1130]
Rating: 0 (from 4 votes)

Programming on the cloud?

I am blogging live from the “Cloud Futures” conference organized by Microsoft in Redmond [1]. We had two excellent keynotes today, by Ed Lazowska [1] and David Patterson.

Lazowska emphasized the emergence of a new kind of science — eScience — based on analysis of enormous amounts of data. His key point was that this approach is a radical departure from “computational science” as we know it, based mostly on large simulations. With the eScience paradigm, the challenge is to handle the zillions of bytes of data that are available, often through continuous streams, in such fields as astronomy, oceanography or biology. It is unthinkable in his view to process such data through super-computing architectures specific to an institution; the Cloud is the only solution. One of the reasons (developed more explicitly in Patterson’s talk) is that cloud computing supports scaling down as well as scaling up. If your site experiences sudden bursts of popularity — say you get slashdotted — followed by downturns, you just cannot size the hardware right.

Lazowska also noted that it is impossible to convince your average  university president that Cloud is the way to go, as he will get his advice from the science-by-simulation  types. I don’t know who the president is at U. of Washington, but I wonder if the comment would apply to Stanford?

The overall argument for cloud computing is compelling. Of course the history of IT is a succession of swings of the pendulum between centralization and delocalization: mainframes, minis, PCs, client-server, “thin clients”, “The Network Is The Computer” (Sun’s slogan in the late eighties), smart clients, Web services and so on. But this latest swing seems destined to define much of the direction of computing for a while.

Interestingly, no speaker so far has addressed issues of how to program reliably for the cloud, even though cloud computing seems only to add orders of magnitude to the classical opportunities for messing up. Eiffel and contracts have a major role to play here.

More generally the opportunity to improve quality should not be lost. There is a widespread feeling (I don’t know of any systematic studies) that a non-negligible share of results generated by computational science are just bogus, the product of old Fortran programs built by generations of graduate students with little understanding of software principles. At the very least, moving to cloud computing should encourage the use of 21-th century tools, languages and methods. Availability on the cloud should also enhance a critical property of good scientific research: reproducibility.

Software engineering is remarkably absent from the list of scientific application areas that speaker after speaker listed for cloud computing. Maybe software engineering researchers are timid, and do not think of themselves as deserving large computing resources; consider, however, all the potential applications, for example in program verification and empirical software engineering. The cloud is a big part of our own research in verification; in particular the automated testing paradigm pioneered by AutoTest [3] fits ideally with the cloud and we are actively working in this direction.

Lazowska mentioned that development environments are the ultimate application of cloud computing. Martin Nordio at ETH has developed, with the help of Le Minh Duc, a Master’s student at Hanoi University of Technology, a cloud-based version of EiffelStudio: CloudStudio, which I will present in my talk at the conference tomorrow. I’ll write more about it in later posts; just one note for the moment: no one should ever be forced again to update or commit.

References

[1] Program of the Cloud Futures conference.

[2] Keynote by Ed Lazowska. You can see his slides here.

[3] Bertrand Meyer, Arno Fiva, Ilinca Ciupa, Andreas Leitner, Yi Wei, Emmanuel Stapf: Programs That Test Themselves. IEEE Computer, vol. 42, no. 9, pages 46-55, September 2009; online version here.

VN:F [1.9.10_1130]
Rating: 7.7/10 (6 votes cast)
VN:F [1.9.10_1130]
Rating: +3 (from 3 votes)

SEAFOOD 2010

The next SEAFOOD (Software Engineering Advances For Offshore and Outsourced Development) conference will take place in Saint Petersburg, Russia, on 17 and 18 June 2010. The conference co-chairs are Andrey Terekhov from Saint Petersburg State University and Lanit-Tercom, and Martin Nordio from ETH are conference co-chairs. Mathai Joseph from Tata Consulting Services and I will be co-chairing the PC. The Call for Papers will be issued soon; information about this year’s conference at seafood.ethz.ch.

VN:F [1.9.10_1130]
Rating: 10.0/10 (1 vote cast)
VN:F [1.9.10_1130]
Rating: +1 (from 1 vote)