- Paperback: 556 pages
- Publisher: O'Reilly Media; 1 edition (September 23, 2011)
- Language: English
- ISBN-10: 1449396100
- ISBN-13: 978-1449396107
- Product Dimensions: 7 x 1.2 x 9.2 inches
- Shipping Weight: 1.7 pounds
- Average Customer Review: 15 customer reviews
- Amazon Best Sellers Rank: #260,727 in Books
HBase: The Definitive Guide: Random Access to Your Planet-Size Data 1st Edition
About the Author
Lars George has been involved with HBase since 2007, and became a full HBase committer in 2009. He has spoken at various Hadoop User Group meetings, as well as large conferences such as FOSDEM in Brussels. He also started the Munich OpenHUG meetings. He now works closely with Cloudera to support Hadoop and HBase in and around Europe through technical support, consulting work, and training.
Top customer reviews
Also, I've found the text loses all its formatting on my Kindle Fire, making it very hard to read, though oddly it comes out fine in the Kindle app on my iPhone 6+. Presumably the print edition does not suffer from this flaw.
Though I can't recommend the book wholeheartedly, some readers may still find it useful for the depth of its coverage of architecture and implementation, from one of the earliest adopters and contributors.
At least one other reviewer here commented that it was hard for them to understand the purpose of this book, because of its broad scope. But in my opinion its broad scope is what makes it so valuable, at least to the architect. The potential reader just needs to keep in mind that they need to understand what they wish to gain from reading a text on HBase. For example, if the potential reader simply wants to understand how to use the APIs that HBase provides, there may be no need to understand HBase architecture and so the chapters that the author provides on related material may not be applicable at the outset. It is difficult to argue for a sequential reading of this text from beginning to end, although the potential reader also needs to keep in mind the importance of a minimal understanding of the programmatic connections with the architecture, in which case a cursory overview of some of the more advanced material is highly recommended. In contrast, as an architect who wants to understand as many of the benefits, drawbacks, and implementation details as possible, especially when considering my extensive experience with relational databases such as Oracle, I appreciate all of the detail that George has to offer here, and would not have it any other way. While I have not yet reviewed his blog to any great extent, based on his street cred and the abundance of material in this book, I do not think it is a stretch to liken him to Tom Kyte of "Ask Tom" fame within the Oracle space.
After an introduction that provides discussions on big data, column-oriented databases, problems with relational database systems, nonrelational database systems, and an HBase architectural overview (all within Chapter 1), George quickly moves forward to a chapter on HBase installation (Chapter 2), followed by discussions of native Java APIs (Chapters 3, 4, and 5), available clients (Chapter 6), and integration with Hadoop's MapReduce framework (Chapter 7). The remainder of the text delves into more advanced topics, beginning with HBase architecture (Chapter 8), followed by design topics critical to taking full advantage of HBase architecture (Chapter 9), as well as discussions on cluster monitoring (Chapter 10), performance tuning (Chapter 11), and cluster administration (Chapter 12). As a general rule, George does a great job of introducing topics in an order that makes sense without frequently referring to chapters later in the book, which is a pet peeve of mine with other technical books. One of the many aspects I especially appreciated about this text is that even in chapters that cover content that might be seen as high-level by other architects, George is not afraid to get down in the trenches at the programmatic level, even in the preface to the book, in which he explains how to build the examples that he provides in his Git repository.
Please be aware that even though the author discusses features such as coprocessors that have been included in subsequent releases (at the time of writing this review, version 0.92.1 has been made available as a stable release), the author does make note in Chapter 2 that "the current version of HBase will only run on Hadoop 0.20.x. It will not run on Hadoop 0.21.x (nor 0.22.x) as of this writing. HBase may lose data in a catastrophic event unless it is running on an HDFS that has durable sync support. Hadoop 0.20.2 and Hadoop 0.20.203.0 do not have this support. Currently, only the branch-0.20-append branch has this attribute. No official releases have been made from this branch up to now, so you will have to build your own Hadoop from the tip of this branch." The author does bring up the fact that Cloudera's CDH3 has already applied the patches associated with durable sync, if building Hadoop is not your cup of tea, but keep in mind that Cloudera released CDH4 Beta 2 less than two weeks ago. Anyone interested in HBase needs to keep in mind that this space is moving along quickly, so plan to get started with this book and move forward with the abundant material that is increasingly being made available online. That said, George provides a wealth of knowledge that I do not expect to be surpassed at any time in the foreseeable future by any other publication.