Bertrand Meyer's technology+ blog

The cloud and its risks

24 October 2010, 18:08

There is so much fervor around cloud computing that no one seems to note the downsides. Here is a little cautionary story.

Like many people, I have come to rely increasingly, for collaborative work, on Google Docs. As a text editor it is a kind of primitive Word. But it is web-based, so that several people can share a document. They will not just have read access, but can all write into the document, even at the same time. The system is indeed pretty good at incorporating changes by different users, as long as they do not affect the exact same place in the document. It also uses a modern approach to version management, originally popularized by wikis: a built-in history mechanism saves your successive revisions, enabling you to go back to any previous version. That mechanism is not ideal (the granularity is too fine, so that it takes a long time to locate an earlier version), but it does provide safety.

In principle, indeed, safety is one of the great advantages of a cloud-based approach: you don’t have — so the mantra goes — to worry about backups, or even about saving your document; you don’t even need any local storage; everything is recorded on the server. If you make a mistake, or simply want to recover some piece of your text that you had discarded, just go back far enough in the history. “Control-S” (save the document) still exists, presumably for the sake of users born in the twentieth century, but it is just redundant. With the cloud, we are told, you no longer need to save. Just write your stuff, the theory goes: the infrastructure is taking care of remembering your steps.

Thus indeed does the theory go. Reality does not always follow theory. For example, the reality will not follow the theory if the implementation has bugs.

One beautiful evening some weeks ago I was busy polishing a technical note, and enjoying the Google Docs diagramming facilities that I had just discovered: basic, but good enough to prepare figures to illustrate a technical document. In fact the core of my document was a complex technical diagram, which I had spent several hours developing. Then I prepared another, much simpler diagram, basically a couple of rectangles and an arrow between them. At that point I wanted to make sure my valuable efforts were not lost and, somewhat instinctively (I was born in the 20th century), I saved; if I had not, the system would have done it for me anyway. Under my very eyes, the page redisplayed, with the figures (there were 5 or 6 of them altogether) all turned into an identical one, the trivial little diagram I had entered last. After a moment of panic I realized that the history would be there, so I could at least go back to a recent version with the appropriate figures; relax, this is the cloud, the server is keeping my history for me! No such luck, though: all the figures in all the earlier versions had been overwritten in the same way. Gone forever.

At that point I realized I must have hit a major bug. Of course I am a programmer and could more or less guess what kind of bug this could be: a reference assignment instead of a clone, or maybe, in Eiffel terms, a clone rather than a deep_clone. (As far as I know Google is not using Eiffel; I’ll let the reader decide whether to jump from correlation to causation.) As to the history, any decent implementation stores “diffs” rather than full copies, so if an object has changed but is still at the same place, the reference is the same: there are no “diffs” to store.
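
For readers who want to see the guessed failure mode concretely, here is a minimal Python sketch. Everything in it is invented for illustration (the dictionary layout of a figure, the names `add_figure` and `identity_diff`); it says nothing about how Google Docs is actually implemented. It only shows that a reference assignment or a shallow copy lets an edit to the new figure overwrite the old one, and that a history comparing references rather than contents sees nothing to record.

```python
import copy

# Hypothetical "simple diagram": two rectangles and an arrow.
SIMPLE = ["rectangle", "arrow", "rectangle"]

def make_document():
    """A document as a list of figures; each figure holds a list of shapes."""
    return [{"shapes": ["box", "cloud", "database", "arrow"]}]

def add_figure(doc, duplicate):
    """Derive a new figure from the first one, redraw it in place, append it."""
    new_fig = duplicate(doc[0])
    new_fig["shapes"][:] = SIMPLE  # in-place edit of the shape list
    doc.append(new_fig)
    return doc

def identity_diff(before, after):
    """Indices whose slot refers to a different object (contents not compared)."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a is not b]

aliased = add_figure(make_document(), lambda fig: fig)  # reference assignment
shallow = add_figure(make_document(), copy.copy)        # new dict, shared list
deep = add_figure(make_document(), copy.deepcopy)       # fully independent copy

print(aliased[0]["shapes"])  # ['rectangle', 'arrow', 'rectangle']: original lost
print(shallow[0]["shapes"])  # ['rectangle', 'arrow', 'rectangle']: original lost
print(deep[0]["shapes"])     # ['box', 'cloud', 'database', 'arrow']: original kept

# A "history" that only keeps references is overwritten along with the figure.
doc = make_document()
snapshot = list(doc)               # earlier version, same figure objects
doc[0]["shapes"][:] = SIMPLE       # later in-place edit
print(identity_diff(snapshot, doc))  # []: nothing appears to have changed
print(snapshot[0]["shapes"])         # the "old" version now shows the new figure
```

Only the deep copy keeps the original diagram intact; the shallow copy fails in the same way as the plain assignment because the inner list of shapes is still shared.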

Guessing the programming error provided little consolation: my brilliant diagram was lost for humankind. Just to check that the bug was real I tried a couple of times to modify one of the figures again, in a small way; sure enough, all figures in the document immediately redisplayed to an identical version, the one I had just produced. The software was obviously broken.

I decided not to take any more risks and recreated the figures in a Word document, then prepared screen shots and included them in the Google Doc. It was rather painful to redo everything but at least I knew that I would have to redo it just once. In the process I did not forget to type Control-S every once in a while, with the same feeling of warmth and safety as if revisiting a treasured childhood home, carefully maintained by the grandparents away from the fads and bustle of the big city.

I do not know if there is customer support for Google Docs; if there is, it is not obvious. In any case it is a free service, so one has little ground to complain. I happen, however, to have friends in high places, and through them was able in just a few hours to reach the Google engineers in charge. They reacted very promptly, confirmed it was a bug, and corrected it. They were very kind and valiantly tried to recover my figures; but at least they had corrected the problem, sparing many other users from an experience that, indeed, I do not recommend.

While preparing this note I had some further contacts with Google engineers, who commented:

“At Google we use logging, on disk snapshots, replicated storage, tape backups, and other systems to deliver massively redundant data protection. In the rare cases of bugs like this one, where user data is impacted, we continue to have a consistent and impressive track-record in successfully recovering that data.”

There is an uplifting moral to the story: the bug was fixed in a matter of hours. Most Google Docs users probably did not notice it. A bug of similar severity, in a traditional product that gets released and sent out to users, would have required a new release and a new download. On the cloud it is enough to fix the server copy.

Other lessons, however, are less encouraging.

First, one may surmise that the very process of an official release in a traditional product mode, and the resulting heightened impact of bugs, lead to a more careful Quality Assurance process than the “forever beta” culture of cloud-based deployment. I have no evidence that my bug story was due to an unsatisfactory QA process. It may just be a one-time blip. But the temptation definitely exists in a cloud-based project to lower one’s guard because of the expectation, conscious or not, that any error will be found by some user and promptly corrected for all users.

An even scarier observation is that on the cloud you are trusting everything to a provider and its software, correct or not, robust or not, secure or not. The recurring leaks of customer information from Web sites are a constant reminder of that risk. In the text processing example, any user of a text editor or document processing system knows that, if you save your work once in a while, the possible damage if the tool crashes or misbehaves is circumscribed: at worst, you will lose the changes since the last save. When working on the cloud, you typically do not make local copies: if a tool messes up and loses the record, you have lost everything. It is gone for good.

In technology we have hype, buzzwords, fads, and successful advances. These are not necessarily disjoint categories, but often successive steps in the life of a new idea. What characterizes the transition from a fad to a successful technology is that in the latter case one knows clearly both the advantages and the drawbacks. Cloud computing is advanced enough to reach that stage.
