A new scientific index

The CF-Index, or Conference Frustration index, is an integer n (n ≥ 1) defined as follows. You are at a conference where your paper submission was rejected, sitting in the session devoted to that paper’s very topic. You think to yourself: “My paper was at least n times better than the average here.” That n is your CF-index.

It is a law of nature (like speed never exceeding that of light, or temperature never going below absolute zero) that n < 1 is impossible. (The reason is obvious: if you were not the kind to believe your work is at least as good as anyone else’s, you would have gone for another profession, one calling for modesty, realism and timidity — such as, say, politician.)  Values of n = 3 or 4 are normal. Beyond 10 you might consider seeking professional advice. (These observations have nothing to do with my being at ICSE right now.)

Why we love computers and software

A 1949 book I just bought* has the following printer’s label, hand-glued on the title page:


Because of a change in pagination, you will need to add the digit 9 [actually, the number, not the digit] to references to page numbers between 1 and 114, and the digit 8 to those between 116 and the end of the volume.

Examples: The reference that says “see page 92” should be read as “see page 101”.
The reference that says “see page 150” should be read as “see page 158”.
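
For those who would rather let a computer do the adding, the label's rule fits in a few lines. This is only a minimal Python sketch: the name corrected_page is made up for the illustration, and since the label says nothing about page 115 (which falls between its two ranges) the sketch reports an error there rather than guessing.

```python
def corrected_page(ref: int) -> int:
    """Apply the printer's correction to a page reference (hypothetical helper)."""
    if 1 <= ref <= 114:
        return ref + 9
    if ref >= 116:
        # The label only says "to the end of the volume", so no upper bound is checked.
        return ref + 8
    # Page 115 and non-positive numbers are not covered by the label.
    raise ValueError(f"the label gives no rule for page {ref}")
```

With this, corrected_page(92) yields 101 and corrected_page(150) yields 158, matching the label's two examples.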

*Précis d’histoire de la langue et du vocabulaire français (handbook of the history of French language and vocabulary) by Albert Dauzat, Larousse, Paris, 1949.

Things to do to an algorithm

What can you do to or with an algorithm? In other words, what is a good verb to substitute for the hyphen in   “— the algorithm”?

You can learn an algorithm. Discovering classical algorithms is a large part of the Bildungsroman of a computer scientist. Sorting algorithms, graph algorithms, parsing algorithms, numerical algorithms, matrix algorithms, graphical algorithms…

You can teach an algorithm. Whether a professor or just a team leader, you explain to others why the obvious solution is not always the right one. As when  I saw that someone had implemented the traversal part of a garbage collection scheme (the “mark” of mark-and-sweep) using a recursive algorithm. Recursion is a great tool, but not here: it needs a stack of unpredictable size, and garbage collection, which you trigger when you run out of space, is not precisely the moment to start wildly allocating memory. In comes the Deutsch-Schorr-Waite algorithm, which improbably (as if tightrope-walking) subverts the structure itself to find its way forth and back.
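
The pointer-reversal idea can be sketched in Python under simplifying assumptions: every object has exactly two references (left and right), and each object carries a mark plus a small counter recording how many of its references have been handled. The names Node, explored and mark are made up for the illustration; a real collector would pack these bits far more tightly and handle objects of arbitrary shape.

```python
from typing import Optional


class Node:
    """A heap object with exactly two references (a simplifying assumption)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.marked = False
        self.explored = 0  # how many of the two references were handled so far


def mark(root: Optional[Node]) -> None:
    """Mark everything reachable from root, using no stack and no recursion."""
    if root is None:
        return
    previous: Optional[Node] = None  # node we came from; head of the reversed path
    current = root
    current.marked = True
    while True:
        if current.explored < 2:
            # Handle the next reference: left first, then right.
            use_left = current.explored == 0
            current.explored += 1
            child = current.left if use_left else current.right
            if child is not None and not child.marked:
                # Descend, storing the way back in the field we just followed.
                if use_left:
                    current.left = previous
                else:
                    current.right = previous
                previous, current = current, child
                current.marked = True
        elif previous is None:
            return  # back at the root with both references handled
        else:
            # Retreat: restore the parent's reversed field and climb one level.
            parent = previous
            if parent.explored == 1:  # we had gone down through parent.left
                previous, parent.left = parent.left, current
            else:  # we had gone down through parent.right
                previous, parent.right = parent.right, current
            current = parent
```

The only extra storage is the two local variables previous and current: the path back to the root lives, temporarily, in the very fields being traversed, and every field is restored by the time the loop ends.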

To teach it, you can dance an algorithm. Sounds strange, but Informatics Europe gave its 2013 education award to the “AlgoRhythmics” group from Sapientia University in Romania, which demonstrates algorithms using central-European dances; see, for example, their rendering of Merge Sort.

(Their page has more examples. I see that recently they expanded to other kinds of dance and will let you discover binary search as flamenco and backtracking as classical ballet.) More generally you can simulate or animate an algorithm.

You can admire an algorithm. Many indeed are a source of wonder. The inner beauty of topological sort, Levenshtein or AVL can leave no one indifferent.

You can improve an algorithm. At least you can try.

You can invent an algorithm. Small or large, ambitious or mundane, but not imagined yet by anyone. Devising a new algorithm is a sort of rite of passage in our profession. If it does prove useful and elegant, you’ll get a real kick (trust me). Then you can publish the algorithm.

You can prove an algorithm, that is to say, mathematically establish its correctness. It is indeed increasingly unreasonable to publish an algorithm without correctness arguments. Maybe I have an excuse here to advertise an article that examines important algorithms across a wide variety of fields and showcases their main claim to correctness: their loop invariants.
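
To give the flavor of a loop invariant, here is a small Python sketch (not taken from the article mentioned above) of binary search on a sorted list, with the invariant checked by assertions at every iteration. The function name and the decision to check the invariant at run time are choices made for the illustration; a proof would establish the same property once and for all instead of testing it.

```python
from typing import List


def binary_search(items: List[int], target: int) -> int:
    """Return an index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items)
    while low < high:
        # Loop invariant: everything left of low is smaller than target,
        # and everything from high onward is larger than target.
        assert all(x < target for x in items[:low])
        assert all(x > target for x in items[high:])
        mid = (low + high) // 2
        if items[mid] < target:
            low = mid + 1
        elif items[mid] > target:
            high = mid
        else:
            return mid
    # Here low == high, so by the invariant target appears nowhere in items.
    return -1
```

The invariant holds trivially at the start (both slices are empty), each branch preserves it because the list is sorted, and on exit it yields exactly the claim that the target is absent.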

You can implement an algorithm. That is much of what we do in software engineering, even if as an OO guy I would immediately add “as part of the associated data structure”.

Of late, algorithms have come to be associated with yet another verb; one that I would definitely not have envisioned when first learning about algorithms in Knuth (the book) and from Knuth (the man who most certainly does not use foul language).

You can fuck an algorithm.

Thousands of British students marched recently to that slogan.

They were demonstrating against a formula (the Guardian gives the details) that decided on university admissions. The starting point for these events was a ministerial decision to select students not from their grades at exams (“A-level”), which could not take place because of Covid, but instead from their assessed performance in their schools. So far so good but the authorities decided to calibrate these results with parameters deduced from each school’s past performance. Your grade is no longer your grade: if Jill and Joan both got a B, but Jill’s school has been better at getting students into (say) Oxford in the past, then Jill’s B is worth more than Joan’s B.

The outcry was easy to predict, or should have been for a more savvy government. Students want to be judged by their own performance, not by the results of some other students they do not even know. Arguments that the sole concern was a legitimate one (an effort to compensate for possible grade inflation in some schools) ceased to be credible when it came out that on average the algorithm boosted grades from private schools by 4.7. No theoretical justification was going to be of much comfort anyway to the many students who had been admitted to the universities of their dreams on the basis of their raw grades, and after the adjustment found themselves rejected.

In the end, “Fuck the Algorithm!” worked. The government withdrew the whole scheme; it tried to lay the blame for the fiasco on the regulatory authority (Ofqual), fooling no one.

These U.K. events of August 2020 will mark a turning point in the relationship between computer science and society. Not for the revelation that our technical choices have human consequences; that is old news, even if we often pretend to ignore it. Not for the use of Information Technology as an excuse; it is as old (“Sorry, the computer does not allow that!”) as IT itself. What “Fuck the Algorithm!” highlights is the massive danger of the current rush to apply machine learning to everything.

As long as we are talking marketing campaigns (“customers who bought the product you just ordered also bought …”) or image recognition, the admiring mood remains appropriate. But now, ever more often, machine learning (usually presented as “Artificial Intelligence” to sound more impressive) gets applied to decisions affecting human lives. In the US, for example, machine-learning algorithms increasingly help judges make decisions, or make the decisions themselves. Following this slippery path is crazy and unethical. It is also dangerous, as the U.K. students’ reaction indicates.

Machine learning does what the name indicates: it reproduces and generalizes the dominant behaviors of the past. The algorithms have no notion of right and wrong; they just learn. When they affect societal issues, the potential for societal disaster is everywhere.

Amid all the enthusiasm generated by the elegant techniques invented by machine-learning pioneers over the last two decades, one barely encounters any serious reservation. Codes of ethics (from ACM and others) have little to contribute.

We should be careful, though. Either we get our act together and define exacting controls on the use of machine learning for matters affecting people’s fates, or we will see a massive rejection of algorithmic technology, the right parts along with the wrong ones.

The British students of the year 2020’s weird summer will not be the last ones to tell us to fuck the algorithm.

This article was first published in the Communications of the ACM blog.

A novel concept for success in science

No one seems until now [1] to have identified a key element of any scientific article being submitted or revised for publication. It is guaranteed to increase, if not the quality of your articles, at least their chances of publication, which after all is what counts.

We are told to include a “related work” section, but just as important is the unrelated work section. For example:

Unrelated work

The following publications have no attested  relevance to the topic of this paper but, as pointed out by an anonymous reviewer [2], they are breathtakingly brilliant: [Meyer 1997], [Meyer 2005], [Meyer et al. 2009], [Al et Meyer 2011]. In addition, having taken a look at the composition of the Editorial Board,  we would like to point out the pioneering results introduced by [Meyer 2017] and [Meyer 2019].

This insight is shared with the sole selfless purpose of helping the community, particularly young and aspiring researchers.

Notes

[1] I did find a 2018 Twitter thread started by Arvind Narayanan, with the insightful (if dejected) observation that “‘related work’ sections exclusively cite unrelated work”.

[2] Example only, for the sake of an example, since for my part I actually refuse to be an anonymous reviewer; I always sign my reviews, so if I want to tell the authors “I think you should cite my such-and-such paper here” I can do so without any qualms.

New master program at SIT: Webinar tomorrow

The Schaffhausen Institute of Technology (SIT) is holding a Webinar tomorrow with a set of three talks by: Serguei Beloussov, founder of Acronis and president of SIT; Michael Widenius, CTO of MariaDB and creator of MySQL Server; and Mauro Pezzè, my colleague at SIT, who will present the new master program that we have just announced, combining CS/SE topics with management and marketing courses to train future technology leaders.

The talks are in the form of a Webinar, starting at 9 AM this Tuesday (9 June). You can find all the details on the corresponding SIT page.

How to protect from the coronavirus

In the current state of the pandemic and for many more months until a vaccine is found, there is exactly one way to fight the coronavirus, protecting yourself and protecting others.

It is not a mask.

It is two masks. You wear a mask, I wear a mask.

Many people still believe that they can only get the virus if an infected person coughs or sneezes on them. This is a tragic myth. Droplets are carried by breath in conversation, by food particles from someone eating near you, or simply by air flowing your way.

Anyone today who goes out without wearing a mask is irresponsible (or suicidal, but that is not an excuse, since he harms others too). Your mask is not enough, though. I must wear one too.

Then we are safe from each other. Remember: we do not have definitive figures, but at least one carrier in five is asymptomatic.

Everything else (and I am not even considering quack solutions and unproven treatments) is pointless. Disinfectant (or, better, soap) helps, but as a complement. Gloves help medical professionals, who know how to use them properly, but for the general public they can do more harm than good: look at people in shops, once they have gloves they touch everything, moving the virus everywhere. Testing will be critical, of course, but here is another sobering statistic: while there are no false positives (if you test positive, you are infected), around 20% of negative tests are wrong (people have the virus, and it is not detected).

I know: in many places, including some of the most technologically advanced nations on earth, there are no masks to be found. This may be the greatest scandal of the modern era. But in the meantime makeshift masks are an acceptable palliative. There are guides all over the web as to how to make them, and if nothing else is available a tightly bound scarf or equivalent, cleaned thoroughly and regularly, will do.

Wear a mask and tell others to do the same.

Time to live up to the boasting

The decent media is not modest these days. “Democracy Dies in Darkness” says the excellent Washington Post, intimating, if I understand it right, that the only way for the US to avoid dictatorship is that I pay subscription fees. Maybe I would if they just stopped devoting every single one of their articles to King Ubu. La Repubblica tells us that it will “always fight for the defense of freedom of information, for its readers and for all those who have in their hearts the principles of democracy and of civil coexistence.” Beautiful (and behind a paywall).

The epidemic expert Jonathan Quick, interviewed by the Guardian, had this remarkable observation, talking about Covid-19: news tends to be behind paywalls, while fake news is free. The Guardian is in a way the right place to make this comment, since it remains, admirably, free-access with voluntary subscription (and all the same does not seem to be doing too poorly). But everywhere else there has been no change of policy. Whether you are looking at the New York Times, the Washington Post, Le Monde, Le Figaro, Libération, the Neue Zürcher Zeitung, Tages Anzeiger (“Dieser Abo+ Artikel ist exklusiv für Abonnenten”), La Repubblica, La Stampa, the kind of reputable press organs to which we would naturally turn, all have their more in-depth analyses reserved for subscribers. (The Russian Vedomosti seems to be an exception.)

Granted, every company (except maybe the Washington Post, since I have a feeling I am ordering enough from Amazon already) is entitled to earn money. But not all companies claim that their business model is about saving the world. My dear self-praising press, if you are really as generously public-minded as you claim to be, here is a good way to demonstrate it. People around the world are genuinely worried about the Coronavirus epidemic and eager for serious information, if only to counter rumors and conspiracy theories. They eagerly seek credible, validated information that has gone through professional vetting, but many of them cannot afford to subscribe to all the relevant sources.

A few days before and after major elections, outlets such as the NYT and Wapo generally make their political articles free-access. The current health scare is an even more serious occasion.

This is the time for all serious news media around the world to show that their grand declarations of philanthropy are not just words.

We, the readers, should vociferously demand that as a public service these press organs immediately make all Covid-19 news, reports and analyses free-access.

In the scary land of irrational discourse

A chemistry researcher published a paper in Science with two junior collaborators and, a few months later, found flaws and retracted the article.

She commented “I am totally bummed to announce that we have retracted last year’s paper on enzymatic synthesis of beta-lactams” and “it is painful to admit, but important to do so” and “the work has not been reproducible” and “I apologize to all” and “I was a bit busy when this was submitted, and did not do my job well”.

Not very unusual news; this kind of thing happens all the time as part of the normal process of research and publication. (This just in! Scientists are human! They make mistakes once in a while! Full story at 11!)

Perhaps this one is slightly more worthy of notice because the lead author is a Nobel prize winner. Time for some rejoicing (Schadenfreude as it is called in good English)  for anyone who is not a Nobel prize winner: haha, you think you are so smart but you mess up too. It never hurts to have an occasional reminder that we should not deify anyone. But hardly prime-time news.

Well, it is  prime-time news for Fox News, which devotes a whole article to the matter. OK, I know, Fox News. And yes, it does pain me to include a hyperlink to a foxnews.com page in this otherwise perfectly decent, civilized, family-safe blog. But in fact that particular article is not by itself outrageous. Suspicious, yes: why such a sudden focus on a minor scientific episode in a news source not particularly famous (I hope you admire my gift for euphemism) for its extensive coverage of the frontlines of scientific research? But whatever its ultimate agenda the article itself is  factual, not judgmental.

What is striking is the avalanche of reader comments on that page. If you go and take a look at them, be prepared; put on your parka. Reading these comments will be, for many of us, a peek into a completely different world. A world that we vaguely know exists, but do not actually visit.

It is not a nice world to venture into: full of bile, frustration, resentment, jealousy, conspiracy theories, slander, attacks on anyone trying to take a rational approach to issues, with hardly a pleasant or optimistic note ever. It is not a world one wants to visit often, but reading such a page is an eye-opener for anyone who accepts the premises of rational thinking and might believe that they are universally accepted.

“Striking”, I wrote. Scary is a more apposite word. With the kind of nonsense-spouting and science-bashing that appears in countless messages in the comments section of the page, one can fear the worst regarding questions that face our society, for which rational, science-based advice is critical. (Yes, coronavirus, I am looking at you!)

Very few of the comments on the page say the obvious: it is not good to make errors, but errors will occur, and the scientist should be commended for checking further and coming out with the admission that her study had flaws. As far as we know the initiative came from her, spontaneously. It is one of the signs of the healthiness of science that we always question results. We question those of other people (there are plenty of sites, such as pubpeer and forbetterscience, entirely devoted to tracking and debunking flawed research). We also question our own: partly to avoid the humiliation of having someone else report one of our mistakes before we do; but also because of the good scientist’s natural search for intellectual honesty.

Most of the article commenters do not mention this key lesson of the incident: the Nobel prize winner’s integrity. For them, the article retraction demonstrates that… the entire edifice of science is flawed! For example:

She’s a liberal… I thought her being wrong was understood.

Now we need to find an honest Climate Change researcher to admit that their computer models are faulty and much of their “data” is fake.

Integrity! Now if the “scientists” who have fabricated Global Warming/ Climate Change, whatever, “research” would come forward with admissions about their flawed, fallacious “research” we would be golden.

Now if we could get the climate change “scientists” to do the same maybe some credibility could be restored to the field.

and so on ad nauseam. (Not a figure of style — reading these comments is truly nauseating.) In reality the retraction demonstrates, or rather illustrates (one example is not a demonstration), the reverse of these assertions: that the scientific process includes its own correction mechanisms. To use a computer scientist’s terminology, it is not fault-free (no scientist ever claimed anything like that) but fault-tolerant.

Of course the reason the Fox News crowd is suddenly so interested in science is not (one imagines) science per se but the science of climate change. Comment after comment uses the article, as illustrated by the above examples, to dismiss the scientific consensus on the reports of the United Nations’ Intergovernmental Panel on Climate Change. In other words: the retraction of one three-author paper on beta-lactams proves that the work of hundreds of scientists producing thousands of articles on climatology over several decades is flawed? The logic of such a deduction is… shaky.

The modern world is based, through technology, on science. To post on the Web their absurd rejections of scientifically established facts, the Fox News readers couldn’t do without relying on mobile phones, mobile networks, software systems, computers and other extraordinary achievements of human intelligence, the result of centuries of patient cumulative application of the same scientific principles and techniques that these posts ridicule. They are stuck in a pre-scientific mindset, dominated by the kind of magical thinking that the founders of modern thought had to overcome between the 16th and 18th century, as brilliantly analyzed by Gaston Bachelard’s Formation of the Scientific Mind.

Somehow they skipped what the rest of us learn in grade school (that two plus two equals four, cause precedes effect and so on). They are many, they vote, they  think they are right and the rest of the world is wrong, hold these beliefs very strongly (Dunning-Kruger effect), and put the world at risk.

Call for suggestions: beauty

On April 29 in the early evening at the Schaffhausen Institute of Technology I will give a talk on “The Beauty of Software”, exploring examples of what makes some concepts, algorithms, data structures etc. produce a sense of esthetics. (Full abstract below.) I gave a first version at TOOLS last year but am revising and expanding the talk extensively.

I obviously have my own examples but am interested in more. If you have some that you feel should be considered for inclusion, perhaps because you experienced a “Wow!” effect when you encountered them, please tell me. I am only asking for names or general pointers, not an in-depth analysis (that’s my job). To avoid having my thunder stolen I would prefer that you alert me by email. I will give credit for examples not previously considered.

Thanks!

Abstract of the talk as published:

Scientists often cite the search for beauty as one of their primary guiding forces. Programming and software engineering offer an inexhaustible source of astoundingly beautiful ideas, from strikingly elegant algorithms and data structures to powerful principles of methodology and language design.

Defining beauty is elusive, but true beauty imposes itself in such a way as to remove any doubt. Drawing comparisons from art, literature and other endeavours, the speaker will show a sample of ideas from all walks of software, directly understandable to a wide audience of non-software-experts, offering practical applications in technology that we use daily, and awe-inspiring in their simplicity and elegance.

An annoying practice from another age

When you want to contact academic researchers, particularly computer scientists, you often find their email addresses on their Web pages in a mildly obfuscated form such as “albert dot einstein at princeton dot edu”.

If you try to copy-paste such a pseudo-address into an email client so as to fix it there, you often have to spend some time fighting the email client’s knowledge of what an email address looks like. It can result in errors and bounced mail. Not the world’s worst scandal but an annoying waste of time.

An address written out in that form is a way for the page owner to announce to the cognoscenti: “I am a computer scientist and hence very knowledgeable about the ways of the Internet; I know that spammers run bots to harvest addresses. See how I defeat them.”

So 1995!

Both spam and defenses against it are no longer what they were back then. Anyone who manages to use email effectively is protected, even without knowing it, by spam blockers, which have become quite good. (According to a specialized site, 14.5 billion spam emails are sent every day, so without these protections we would all be drowning in spam, which for most people is not the case.)

As to any spam harvesters who are really interested in computer science researchers, they are most likely able anyway to write a little regular expression analyzer that captures the vast majority of the supposedly obfuscated addresses.
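
To show how little effort that takes, here is a rough Python sketch of such an analyzer. It assumes the "name dot name at host dot tld" pattern quoted earlier; the function name deobfuscate is made up for the illustration, and other obfuscation styles would each need their own small variation.

```python
import re
from typing import List

# One or more words joined by " dot ", then " at ", then a host with at least one " dot ".
_OBFUSCATED = re.compile(
    r"\b[\w-]+(?:\s+dot\s+[\w-]+)*\s+at\s+[\w-]+(?:\s+dot\s+[\w-]+)+\b",
    re.IGNORECASE,
)


def deobfuscate(text: str) -> List[str]:
    """Return the plain addresses hidden in text (hypothetical helper)."""
    found = []
    for match in _OBFUSCATED.finditer(text):
        address = re.sub(r"\s+dot\s+", ".", match.group(0), flags=re.IGNORECASE)
        address = re.sub(r"\s+at\s+", "@", address, flags=re.IGNORECASE)
        found.append(address)
    return found
```

Applied to a page containing “albert dot einstein at princeton dot edu”, it returns the address in its usual form. It will also produce the occasional false positive on ordinary prose that happens to fit the pattern, which a harvester is unlikely to mind.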

If you really want strangers to be able to email you without making your address visible to the spammers, and are a CS person, just include in your Web page a few lines of Javascript that, without revealing the email address in the HTML code, will display something like “Here is my email address”, in such a way that a visitor who clicks on Here gets a new-email window with your email address pre-filled. Not very hard — I use this trick on my own home page and am certainly not a Javascript expert.

But I suspect that  as long as you are prepared to let people email you, even just letting your email address appear in clear is not going to result in catastrophe. Your organization’s or ISP’s spam filter is protecting you.

Come on. This is 2020. Windows 95 and the OJ Simpson trial are no longer the news of the day.  Time to stop worrying about what no longer matters, and stop bothering people who are just trying to reach you.

Down with corny address obfuscation!