This is a slightly reworked version of an article in the CACM blog, which also served as the introduction to a panel which I moderated at ESEC/FSE 2013 last week; the panelists were Harald Gall, Mark Harman, Giancarlo Succi (position paper only) and Tony Wasserman.
For all the books on software engineering, and the articles, and the conferences, a remarkable number of fundamental questions, so fundamental indeed that just about every software project runs into them, remain open. At best we have folksy rules, some possibly true, others doubtful, and others — such as “adding people to a late software project delays it further”  — wrong to the point of absurdity. Researchers in software engineering should, as their duty to the community of practicing software practitioners, try to help provide credible answers to such essential everyday questions.
The purpose of this panel discussion is to assess what answers are already known through empirical software engineering, and to define what should be done to get more.
“Empirical software engineering” applies the quantitative methods of the natural sciences to the study of software phenomena. One of its tasks is to subject new methods — whose authors sometimes make extravagant and unsupported claims — to objective scrutiny. But the benefits are more general: empirical software engineering helps us understand software construction better.
There are two kinds of target for empirical software studies: products and processes. Product studies assess actual software artifacts, as found in code repositories, bug databases and documentation, to infer general insights. Project studies assess how software projects proceed and how their participants work; as a consequence, they can share some properties with studies in other fields that involve human behavior, such as sociology and psychology. (It is a common attitude among computer scientists to express doubts: “Do you really want to bring us down to the standards of psychology and sociology?” Such arrogance is not justified. These sciences have obtained many results that are both useful and sound.)
Empirical software engineering has been on a roll for the past decade, thanks to the availability of large repositories, mostly from open-source projects, which hold information about long-running software projects and can be subjected to data mining techniques to identify important properties and trends. Such studies have already yielded considerable and often surprising insights about such fundamental matters as the typology of program faults (bugs), the effectiveness of tests and the value of certain programming language features.
Most of the uncontested successes, however, have been from the product variant of empirical software engineering. This situation is understandable: when analyzing a software repository, an empirical study is dealing with a tangible and well-defined artifact; if any of the results seems doubtful, it is possible and sometimes even easy for others to reproduce the study, a key condition of empirical science. With processes, the object of study is more elusive. If I follow a software project working with Scrum and another using a more traditional lifecycle, and find that one does better than the other, how do I know what other factors may have influenced the outcome? And even if I bring external factors under control how do I compare my results with those of another researcher following other teams in other companies? Worse, in a more realistic scenario I do not always have the luxury of tracking actual industry projects since few companies are enlightened enough to let researchers into their developments; how do I know that I can generalize to industry the conclusions of experiments made with student groups?
Such obstacles do not imply that sound results are impossible; studies involving human behavior in psychology and sociology face many of the same difficulties and yet do occasionally yield insights. But these obstacles explain why there are still few incontrovertible results on process aspects of software engineering. This situation is regrettable since it means that projects large and small embark on specific methods, tools and languages on the basis of hearsay, opinions and sometimes hype rather than solid knowledge.
No empirical study is going to give us all-encompassing results of the form “Agile methods yield better products” or “Object-oriented programming is better than functional programming”. We are entitled to expect, however, that they help practitioners assess some of the issues that await every project. They should also provide a perspective on the conventional wisdom, justified or not, that pervades the culture of software engineering. Here are some examples of general statements and questions on which many people in the field have opinions, often reinforced by the literature, but crying for empirical backing:
- The effect of requirements faults: the famous curve by Boehm is buttressed by old studies on special kinds of software (large mission-critical defense projects). What do we really lose by not finding an error early enough?
- The cone of uncertainty: is that idea just folklore?
- What are the successful techniques for shortening delivery time by adding manpower?
- The maximum compressibility factor: is there a nominal project delivery time, and how much can a project decrease it by throwing in money and people?
- Pair programming: when does it help, when does it hurt? If it has any benefits, are there in quality or in productivity (delivery time)?
- In iterative approaches, what is the ideal time for a sprint under various circumstances?
- How much requirements analysis should be done at the beginning of a project, and how much deferred to the rest of the cycle?
- What predictors of size correlate best with observed development effort?
- What predictors of quality correlate best with observed quality?
- What is the maximum team size, if any, beyond which a team should be split?
- Is it better to use built-in contracts or just to code assertions in tests?
When asking these and other similar questions relating to core aspects of practical software development, I sometimes hear “Oh, but we know the answer conclusively, thanks to so-and-so’s study“. This may be true in some cases, but in many others one finds, in looking closer, that the study is just one particular experiment, fraught with the same limitations as any other.
The principal aim of the present panel is to find out, through the contributions of the panelists which questions have useful and credible empirical answers already available, whether or not widely known. The answers must indeed be:
- Empirical: obtained through objective quantitative studies of projects.
- Useful: providing answers to questions of interest to practitioners.
- Credible: while not necessarily absolute (a goal difficult to reach in any matter involving human behavior), they must be backed by enough solid evidence and confirmation to be taken as a serious input to software project decisions.
An auxiliary outcome of the panel should be to identify fundamental questions on which credible, useful empirical answers do not exist but seem possible, providing fuel for researchers in the field.
To mature, software engineering must shed the folkloric advice and anecdotal evidence that still pervade the field and replace them with convincing results, established with all the limitations but also the benefits of quantitative, scientific empirical methods.
 From Brooks’s Mythical Man-Month.