on July 8, 2010
I want to be fair here. I bought this book not to read hype about an emerging, albeit massively overhyped, technology but rather to read about the legal and business issues that might moderate its acceptance. To be fair, I will return to give my appraisal after I have finished, but I felt compelled to share this now so as to, perhaps, give pause to others interested in buying this book. I've seen webinars that refer to cloud computing as a 2-10 technology: massively hyped for 2 years, after which the industry will take the next 10 to sort out where it fits (and, maybe more importantly, where it does not).
The first two glaring take-aways I've seen in this book are 1) the mashing of the social web into cloud computing, that is, treating MySpace, Facebook, and other social web sites as examples of cloud computing, which they are not; and 2) the notion that end users will be writing their own programs in the clouds, versus the arrangement in place since the dawn of software development: programmers (or, more recently, developers) writing the programs, tech writers writing the documentation, marketeers hyping the program, and end users buying or using, with embedded ads, the software. Both of these are orthogonal to 'cloud computing'. It may be that someday, in a "Battlestar Galactica" age, end users will speak to their computer in whatever language they speak and tell it what they'd like it to do. For now it takes specialized training, and while the computer languages used are different syntactically from those used in the '60s and '70s, fundamentally they are not different at all. Of course, someday maybe everyone will be flying their cars to work and to play. On your next flight anywhere, tap the pilot and ask him how much specialized training he's had in order to taxi a plane, much less leave the ground and return it in one piece to wherever he said he would land it.
The authors talk about computing being a utility like electricity (or cable), yet they also talk about global compute clouds. Are there global utility companies? They talk about replacing NetBeans, Eclipse, and Microsoft Visual Studio (IDEs) with some Utopian, ephemeral, global software development environment where the tools and end products exist virtually in some ether. None of that has to do with IT Governance and Security, much less Amazon, Terremark, Eucalyptus, RightScale, or CloudSwitch. Since they have another 10-11 chapters I withhold final judgment, but I felt I owed it to others innocently looking for a good source of information, not hand-waving, on this subject. Just as with any emerging technology or software development language, there are plenty of people who emerge from the woodwork to write a book on it, totally independent of their experience with it. Confusing Cloud Computing and Web 2.0 is not going to garner confidence. If unwary readers do not discover this until after they have purchased the book, it will not make any difference.
As a professional software developer I can tell you provisioning an image for execution in the cloud is more intensive than provisioning a bare metal server. End users are not going to be doing anything more than issuing a run command on a pre-existing image.
Here is my take: Running your business at an undisclosed facility managed by Amazon (or others) is no more cost effective than running your business out of a service center was in the 70's or 80's. If you don't physically control the data, you don't physically control access to it either. Nowadays you are under legal obligation to do so. I spent the money on this book hoping there was more substance to the security, privacy, and governance aspects of cloud computing than I just summarized.
Since one of the authors has decided to launch personal attacks on me, I will continue my review with Chapter 3. I didn't really pick up on this in chapters 1 and 2, but I am now concerned about who edited this book. Even at the high school level children are taught to never, ever cite Wikipedia in their references. I noticed the bulk of the footnotes cite Wikipedia. Since the source of information found on Wikipedia is unknown, its validity is also unknown. The professional standard for citations is peer-reviewed sources. Using these gives a level of confidence that a claim, by virtue of its citation, is likely of high quality.
An assertion made, I believe, several times, and characterized on pg 52: "The new mantra of 'the browser is your operating system...browsers have become the ubiquitous operating systems for consuming cloud services". I would call to the reader's attention the definition of an operating system in any legitimate Computer Science source. Internet Explorer is not an example of an operating system. Furthermore, services, clouded or not, where the Internet browser is the user interface (UI, or GUI in this case) are but one type of solution space, often characterized as LAMP: Linux, Apache, MySQL, and PHP. This is totally independent of cloud anything. I contend that whenever one writes a book (or publishes one) there are two axes of importance: first, is the material relevant to the topic, and second, is the material factually accurate. While one might choose to host multiple web containers in the 'cloud' to take advantage of the elasticity of the cloud for scaling up and down with volume, another pervasive class of problem that takes place in a cloud-like environment is compute scaling, such as can be seen in grid computing. In this space a problem may arise where 100 or 1000 processors are required to solve a compute-intensive problem, but only for a few hours. This, as opposed to 24x7x365, is an excellent usage of public cloud (burst mode). To the extent the author is, thus far, focusing on web-based interaction with the cloud, he calls out but never elaborates on why there is any more vulnerability for a web container hosted at an Amazon secure facility, for instance, than there is within one's own perimeter. The threat vector is port 80 or port 8080. Of course, if there really is added vulnerability, the obvious solution is to use off-port, two-way (mutual) SSL, where both the client side and server side are digitally authenticated and the traffic is encrypted, and to host the open (proxy) website(s) within your perimeter.
In either case, a DoS attack on port 80 or 8080 is independent of the location of the web container. Isn't that correct, Tim?
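For readers who want to see what the SSL arrangement described above looks like, where both client and server are authenticated (commonly called mutual TLS), here is a minimal sketch using Python's standard `ssl` module. It is only an illustration under stated assumptions: the function name `make_mutual_tls_server_context`, the port number 8443, and the certificate file parameters are my own placeholders, not anything taken from the book or from a particular vendor.

```python
import ssl

# Hypothetical "off port": anything other than the usual 80/8080.
OFF_PORT = 8443


def make_mutual_tls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a server-side TLS context that insists on a client certificate.

    The file paths are placeholders; when omitted, the context is still
    configured but carries no server certificate or trusted client CA.
    """
    # Purpose.CLIENT_AUTH yields a context meant for the server side.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This is the "two-way" part: the handshake fails unless the client
    # presents a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own identity, presented to clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA(s) allowed to vouch for client certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


if __name__ == "__main__":
    ctx = make_mutual_tls_server_context()
    print("client certificate required:", ctx.verify_mode == ssl.CERT_REQUIRED)
```

A real deployment would pass actual certificate paths and wrap a listening socket bound to the chosen port with `ctx.wrap_socket(sock, server_side=True)`. Nothing in this configuration depends on whether the server sits in your own data center or in someone else's.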
In chapter 3, pg 52, "Using hijacked or exploited cloud accounts, hackers will be able to link together computing resources to achieve massive amounts of computing without any of the capital infrastructure costs". Really? What about the account owner seeing instances running on their account that they aren't using? How long does it take for a credit card owner or provider to realize an account is being misused? There is an easier vector for this: they are called bots, and they have been around for years. One need but Google the program Asphyxia. If you, for any decision, had a choice of hard vs. easy...which do you think a hacker would take?
In chapter 3, the author discusses type 1 and type 2 hypervisors. This is something of an arcane distinction, but he refers to Xen as type 1, bare metal. This actually is incorrect as Xen is hosted by an operating system meaning it is not bare metal [...]. The authors spend much time on Xen, which is relevant from the perspective of security attacks against it, but in that vein not a single mention, that I have found, is made of KVM, which is part and parcel of all remotely recent versions of Linux from, I believe, 2.6.20 and up. Ubuntu Enterprise Cloud is based on KVM, as is Red Hat's virtualization and cloud family. But, this is why they make second editions.
Another assertion the authors make in chapter 3 (pg 59), "Security requirements such as an application firewall, SSL accelerator, cryptography, or rights management... are not supported in a public SaaS, PaaS, or IaaS cloud". Huh???? I refer the reader to Amazon's VPC, Intel's Service Gateway, SELinux, UFW. That is simply a patently false statement. Of course you can host your applications on an instance of an image configured with SELinux in enforcing mode, fully firewalled, with no open connections on unsecured ports, and be quite secure. However, if this book was written in 2008 only to be published in early 2009, this may have been a truer statement then. Few people knew what cloud was in early 2009, and the entire field has rapidly evolved since the authors wrote this book. This is why it is necessary for authors, and publishers, to maintain an errata site, perhaps in the cloud, where corrections and retractions to statements that are, best case, dated and, worst case, patently false can be made. Intel, by the way, is also producing encrypting NICs (network interface cards).
While I still subscribe to my previous comment that if you don't control your data you don't control who has access to it, I do have an addendum to it. Cloud computing is a rapidly evolving field. A book on cloud computing, written by anyone, 2 years or more ago is, almost by definition, wrong or highly questionable. Technology simply moves faster than publishers generally do. If you have data that you don't want to share or, legally, cannot share, it in all likelihood does not belong in a public cloud. If you are risk averse, it does not. If you are risk tolerant, then the decision should depend on talking to vendors, cloud and operating system (no, not web browsers). What is the cloud vendor's SLA? What is the insurance on data breaches? What is the state of the art vis-a-vis SELinux, encrypting NICs, encrypted databases, the cloud vendor's physical security, software security, etc.? Who has physical access to software keys?
We are a long way from the George Jetson world. In our lifetime people won't be flying their cars to work. Provisioning of data centers and provisioning of infrastructure should still, as in the case of airline pilots, be left to trained and technically current professionals whose livelihoods depend on their ability to successfully navigate the issues. If you are somewhat risk tolerant, talk to the vendors (they have no problem telling you what their competition can't do) and make your decisions based on the then-current state of the art. Don't single source anything; seek confirmation on everything.
As I hope we are all telling our children and students, whatever they place on the Internet will be there forever.
Chapter 4 starts to get interesting, although I disagree with some of the author's contentions, perhaps due to temporal decay. In other words, in the non-SaaS world, storing information as opaque encrypted blobs is certainly do-able, and it would be the responsibility of the system designer to, perhaps optionally, persist the data as such and decrypt it for authenticated readers. Consistent with what I've said earlier, if you don't control your data, you don't control who has access to it. What the author contends is that SaaS providers, let's use SalesForce as an example, should do the same with 'your data'. If you don't control the encryption keys used, you can't even control your own access to the data. This is actually part of the value proposition of CloudSwitch. Disclaimer: I have no affiliation with CloudSwitch. I do not even know if they were a gleam in their founder's eyes when this book was written, so their niche would be clearly out of scope for the authors (temporal decay). However, in today's state of the art, protection zones, if you will, provisioned by SELinux and afforded by KVM provide for security when data, stored externally, is read by your program and decrypted within the protected zone of the process you are running in. One merely needs to Google SELinux to see what it provides for today vs. what it provided for 2-3 years ago.
Chapter 5 is good (happy now, Tim?). Technically it is very rich and philosophically, unintentionally, it provides good food for thought. Something I flagged at the beginning of this review gnawed at me, and chapter 5 (Authentication, Authorization, and Auditing) provided closure on it. I mentioned there seemed to be an underlying premise that the 'cloud' should or will evolve into a global entity, pg 33, "For cloud computing to continually evolve into a borderless and global tool..." Why should it? I vaguely recall an episode, I believe, from Star Trek, where there was some impending catastrophe in progress when Spock commanded the computer system, as a high priority task, to solve, to the last digit, the value of pi. Spock then reminds the captain that pi is an endless number the computer(s) cannot solve. Uhura shortly announces to Spock and the captain that, one by one, all computer resources (cloud compute nodes) were being deployed to solve the command Spock gave it. Is that part of the problem space for cloud computing to solve? Frankly, we sort of already have that in the academic world; Google Condor grid and the University of Wisconsin. Oddly, I proposed the same sort of thing to a friend and VP at a large software company, wherein corporate data centers would have the prospect of 'selling' their unused CPU and disk capacity by merely joining a cloud as a resource provider rather than a resource consumer. To that end the authors are now on a solid path to addressing or, at least, articulating a direction CSPs could take, or must take, in order to realize that goal of a 'borderless and global tool'. Where this chapter is equally valid is the use case where you (the reader now) are on a trip to some other part of the country and are in an accident. You are brought to the local hospital and the attending doctor must gain access to your medical records.
In a HIPAA world, what needs to happen, architecturally, for that doctor to ensure your medical privacy, maintain auditability, and gain timely access to your medical history? Oh, and your own doctor is out of town.
Note to authors, I also upped your score. I anxiously await the next 100 pages and your second edition.