Customer Reviews


13 Reviews
5 star: (8)
4 star: (2)
3 star: (2)
2 star: (0)
1 star: (1)

The most helpful favorable review
37 of 39 people found the following review helpful:
5.0 out of 5 stars Pigs and Elephants on the road to World Domination
These days, one can't seem to attend technical conferences without hearing marketing-oriented speakers' world domination plans for their products. So imagine this: what if pigs and elephants are involved? Elephants would be Hadoop installations, and Pigs would be one of those animal-themed tools, smarter cousins of the elephants really, riding on top of Hadoops, directing...
Published on July 12, 2009 by Techie Evan

versus

The most helpful critical review
33 of 38 people found the following review helpful:
3.0 out of 5 stars Partly succeeds
Tom White certainly writes very well: this book is very readable. It is also quite comprehensive, falling somewhere between a tutorial and a reference.

That being said, I was ultimately rather disappointed. First, and most importantly, it was not clear to me after reading this book how I might use Hadoop for some of my projects, or if indeed they were good...
Published on September 8, 2009 by BillyJoeBob



37 of 39 people found the following review helpful:
5.0 out of 5 stars Pigs and Elephants on the road to World Domination, July 12, 2009
This review is from: Hadoop: The Definitive Guide (Paperback)
These days, one can't seem to attend technical conferences without hearing marketing-oriented speakers' world domination plans for their products. So imagine this: what if pigs and elephants are involved? Elephants would be Hadoop installations, and Pigs would be one of those animal-themed tools, smarter cousins of the elephants really, riding on top of Hadoops, directing them on how to perform their jobs. Would the world be a better place?

Hadoop is the brainchild of Doug Cutting, who named his creation after his kid's stuffed yellow elephant. Hadoop enables large datasets distributed over a cluster of machines to be processed in parallel. One machine or node in that cluster would usually house a JobTracker and a NameNode. The JobTracker schedules and manages processing jobs to be executed on the other machines, and the NameNode manages the metadata (e.g., file names and locations) of the datasets to be processed. The processing jobs are programmed in the form of Map and Reduce functions. Inputs are usually split into blocks to be processed in parallel by two or more identical mappers. The close-to-final outputs are then fed to one or more identical reducers, whose job is to perform any final transformations on the intermediate data to produce data summaries in the expected format. Several companies are using Hadoop to extract knowledge from their extensive data.
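
To make the mapper and reducer roles described above concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API and does not run on a cluster; the function names (`mapper`, `shuffle`, `reducer`, `run_job`) and the sample year/temperature records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def mapper(block):
    """Turn each 'year,temperature' line of one input block into a (key, value) pair."""
    for line in block:
        year, temp = line.split(",")
        yield year, int(temp)


def shuffle(pairs):
    """Group mapper outputs by key, as happens between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Summarize all values for one key; here, the maximum temperature."""
    return key, max(values)


def run_job(blocks):
    """Run identical mappers over every block, then feed grouped output to the reducer."""
    mapped = chain.from_iterable(mapper(block) for block in blocks)
    groups = shuffle(mapped)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Hypothetical input, already split into two blocks (one per mapper).
blocks = [
    ["1949,111", "1950,0"],
    ["1950,22", "1949,78", "1950,-11"],
]
result = run_job(blocks)
print(result)
```

In a real cluster the blocks would live on different machines and the mappers would run in parallel; the sketch only shows how the data flows from map, through grouping by key, to reduce.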

I've read this book and Jason Venner's Pro Hadoop book. Although I like both, I like this book better for the following reasons: more comprehensive coverage of topics, and more insiders' information on design rationales and how certain Hadoop features really work behind the scenes.

Here's a breakdown of and some commentaries on the book's contents:

Chapter One introduces Hadoop, its history and how it's different from similar tools or frameworks. Kinda dry. Chapter Two introduces the MapReduce programming model and its benefits when compared to, say, the use of Unix tools for achieving parallel processing of text files. This is also where readers are introduced to the concepts of map, combiner, and reduce functions, shuffle and sort, streaming, etc. Chapters Three and Four are all about the Hadoop Distributed Filesystem and Hadoop I/O, and the design decisions that were made to address performance, reliability, and safety concerns.

Chapter Five shows you how to develop, configure, test, run and tune a MapReduce Application. Good chapter but Jason Venner's book has better materials on testing and debugging MapReduce applications.

Chapters Six through Eight discuss how MapReduce really works behind the scenes, including advanced features. This is where you'll learn how flexible Hadoop is when it comes to handling different types of inputs and outputs in terms of numbers, sizes, formats, and usage scenarios. Excellent!

Chapters Nine and Ten are really good. They teach you how to set up and administer Hadoop clusters. There's even a brief but informative section on how to use Hadoop with Amazon EC2 servers.

Chapters 11-13 devote one chapter each to installing and interacting with frameworks built on top of Hadoop: Pig, HBase, and ZooKeeper. Chapter 14 provides case studies (e.g., how Facebook uses Hadoop to analyze ad campaign effectiveness).

Appendices A and B provide instructions on how to install Apache's Hadoop and Cloudera's distribution, respectively, and C gives you a run-through of the steps to take when preparing to use the NCDC Weather Data used in the book.

Very thorough and well written book. 4.5 stars rating.


33 of 38 people found the following review helpful:
3.0 out of 5 stars Partly succeeds, September 8, 2009
Amazon Verified Purchase
Tom White certainly writes very well: this book is very readable. It is also quite comprehensive, falling somewhere between a tutorial and a reference.

That being said, I was ultimately rather disappointed. First, and most importantly, it was not clear to me after reading this book how I might use Hadoop for some of my projects, or if indeed they were good candidates for MapReduce. I feel it should have been possible to provide some generic guidance. Second, some chapters are written by other authors, and these did not uniformly provide the same quality of instruction, reading occasionally like advertisements.

I confess I am puzzled by the number of encapsulating and utility APIs that have grown up around Hadoop. Why do we need Pig, HBase, Hive, ZooKeeper and Cascading? Apparently because (according to what I have read here) bare Hadoop is hard to program with productively. Some indication of how these wrappers interact with each other would have been helpful.

As it is, I feel LESS urge to evangelize for Hadoop having read this book. Surely not the desired effect?


4 of 5 people found the following review helpful:
5.0 out of 5 stars Don't understand all the other negative reviews, July 23, 2009
This is the book to get if you are actually doing something with Hadoop. It's been a lifesaver, and has answered all our questions of, "I wonder if I can do x in Hadoop?"
It gives a lot of information about the internals of Hadoop, which you will want to know when things go wrong or when you just want to get more out of Hadoop.
I normally don't post reviews much, but I think Tom White and this book deserve way more than 5 stars, so I'm not sure why it only has 3 stars on Amazon.


6 of 8 people found the following review helpful:
5.0 out of 5 stars First 25 Pages Have You Up And Running!, August 24, 2009
I picked up this book to catch up on Hadoop, which the rest of my team has been using for several months. Unfortunately I was too busy with other projects to spend any time on MapReduce and thought it'd be a grueling process to be brought up to speed on it. Within the first 25 pages and about 3 hours, Tom had me up and running my first MapReduce job which I successfully adapted for a specific metric we were trying to generate. The book does a great job of breaking down Hadoop's complex pieces into easy to understand components, but doesn't try and pump you full of conceptual BS before it lets you touch real code.

If I were to make any suggestions it would be to start the book off with some simple instructions for installing and getting Hadoop up and running on a local machine, followed by some simple explanations of DFS and Hadoop's commands for managing the file system. I would also explain much earlier how to get your classes recognized by Hadoop for those a bit rusty at Java. Fortunately, the online Wiki was very good about providing instructions to get me going on a Mac, and that took a majority of OS-specific needs off the burden of the book. You will, no doubt, have to be intelligent to read this book, but if you're using Hadoop, there is already a prerequisite for technical proficiency you'll need to satisfy. Overall good job, Tom.


1 of 1 people found the following review helpful:
5.0 out of 5 stars Excellent for a beginner, April 26, 2011
Amazon Verified Purchase
The book is clear and easy to follow, especially for a beginner like me. It has short examples for most of the cases that you might think of. I think of it as a guide to learning Hadoop's functionality and classes in the right order. Still, I cannot be more precise in my review since I haven't read another book about Hadoop. Most of my references are online, especially the Yahoo site. I'm not sure how advanced it is, because I don't have a real cluster, so I'm not sure whether what is covered is enough for a real cluster's problems and configuration issues. The book also discusses other Apache projects like Hive and HBase.


1 of 1 people found the following review helpful:
5.0 out of 5 stars great book, October 5, 2010
Amazon Verified Purchase
What I really liked most about this book was that I could read the vast majority of it straight through and enjoyed the process. It is very well structured, and the example surrounding weather station data was an appropriate choice to give a good perspective on most of the problems. A good mix of practical theory, examples and code snippets.


3 of 4 people found the following review helpful:
4.0 out of 5 stars The elephant is tamed, April 30, 2010
By JUG Lugano (Lugano, Switzerland)
Original review written by Paolo Canesi, JUG Lugano, www.juglugano.ch

Managing and analyzing huge data sets has become a very common problem in various areas of modern information technology, from different types of Web applications (social, financial, trading, ...) to applications for analyzing scientific data.

Distributed systems over a cluster of machines are almost a mandatory choice in such cases, but designing and implementing an effective solution in those areas may be troublesome and become a nightmare.

The Apache Hadoop Project is an infrastructure that helps the construction of reliable, scalable, distributed systems. Mainly known for its MapReduce and distributed file system (HDFS) subprojects, it actually includes other services that complement or extend them.

Tom White's "Hadoop: The Definitive Guide" is an enjoyable book which fully explains these complex technologies. The book is organized in such a way that the reader is gently guided into the Hadoop ecosystem. It begins with a couple of very readable chapters as a general introduction to the problems Hadoop is meant to solve and the main solutions to them (MapReduce and HDFS), then examines closely all its aspects, often describing what really happens behind the scenes, giving useful design suggestions and descriptions of common pitfalls. When reading this book you won't be overwhelmed by tons of lines of code: examples are short and yet effective.

This kind of structure makes it hard to classify the book as a mere tutorial or as a real reference guide; it is better considered a mix of the two. While this turns out to be a positive choice in many ways, it has some drawbacks: the reader is sometimes forced to go back and forth through the chapters and has to read it almost entirely to get a full understanding. But this is perhaps the price to pay for a fluent and pleasant read.

Let's go quickly through the chapters:

The first chapter is a brief history of the Hadoop project, illustrating its main characteristics and comparing them to those of other similar technologies. Chapter two is a pleasant introduction to MapReduce. The third chapter breaks the continuity of the previous one by examining the Hadoop Distributed File System (HDFS subproject) in detail. Chapter four takes a step down in the abstraction layers, discussing the Hadoop I/O fundamentals (data integrity, compression, serialization and data structures) and explaining the design choices.

Chapters five to eight are an excellent source for learning Hadoop MapReduce in depth. They cover all aspects of it, from practical ones, such as how to configure, run, test and debug MapReduce programs, to more advanced and formal ones, like programming models, data formats, sorting and joining tools.

The two following chapters list a few very interesting and useful suggestions for setting up and managing a Hadoop cluster, a precious resource for administrators.

Chapters eleven to thirteen cover the Pig, HBase and ZooKeeper subprojects under the Hadoop umbrella. Despite suffering from brevity, they are still interesting.

Chapter fourteen is there so the reader does not feel alone: it presents important case studies using Hadoop (e.g. Yahoo, and other contributions from the Apache Hadoop community).

My final opinion is that "Hadoop: The Definitive Guide" is a very useful resource for those who want to learn how to ride the "pachydermic" Hadoop (like a "Mahout", perhaps?).


2 of 3 people found the following review helpful:
5.0 out of 5 stars Brilliant book to get started and keep going, May 18, 2010
By Simon Reavely (Boston, MA United States) (REAL NAME)
I really enjoyed the book. It has everything you need to:
a) Get started running your own cluster and writing your own MR jobs
b) Understand how to administer the cluster
c) Troubleshoot your programs
d) Learn about really important side projects like Pig, Hive, ZooKeeper and HBase (of which I think Hive is the most amazing)

One thing I wish I'd done is go through the Cloudera online tutorials BEFORE reading this book. If I'd done that (instead of doing so afterwards) I think I'd have got through certain sections of the book much quicker; basically I would have 'got it' quicker. See [...]


11 of 17 people found the following review helpful:
3.0 out of 5 stars I had a hard time comprehending this book, July 18, 2009
I usually have good experiences with O'Reilly books, but this one left me befuddled. I figured because I knew Java well and understood database theory and distributed computing, that this couldn't be a difficult subject. I was wrong, at least for me. If you already know what MapReduce is and you already know what Hadoop can do, this book might be quite instructive. As a beginner though, I was lost. Afterwards, I did read through Cloud Application Architectures: Building Applications and Infrastructure in the Cloud (Theory in Practice (O'Reilly)), and I found it much more instructive to a novice at the particular technologies involved, like myself. Right now the table of contents is not available for this book in the product information, so I list that next:

Chapter 1. Meet Hadoop
Section 1.1. Data
Section 1.2. Data Storage and Analysis
Section 1.3. Comparison with Other Systems
Section 1.4. A Brief History of Hadoop
Section 1.5. The Apache Hadoop Project
Chapter 2. MapReduce
Section 2.1. A Weather Dataset
Section 2.2. Analyzing the Data with Unix Tools
Section 2.3. Analyzing the Data with Hadoop
Section 2.4. Scaling Out
Section 2.5. Hadoop Streaming
Section 2.6. Hadoop Pipes
Chapter 3. The Hadoop Distributed Filesystem
Section 3.1. The Design of HDFS
Section 3.2. HDFS Concepts
Section 3.3. The Command-Line Interface
Section 3.4. Hadoop Filesystems
Section 3.5. The Java Interface
Section 3.6. Data Flow
Section 3.7. Parallel Copying with distcp
Section 3.8. Hadoop Archives
Chapter 4. Hadoop I/O
Section 4.1. Data Integrity
Section 4.2. Compression
Section 4.3. Serialization
Section 4.4. File-Based Data Structures
Chapter 5. Developing a MapReduce Application
Section 5.1. The Configuration API
Section 5.2. Configuring the Development Environment
Section 5.3. Writing a Unit Test
Section 5.4. Running Locally on Test Data
Section 5.5. Running on a Cluster
Section 5.6. Tuning a Job
Section 5.7. MapReduce Workflows
Chapter 6. How MapReduce Works
Section 6.1. Anatomy of a MapReduce Job Run
Section 6.2. Failures
Section 6.3. Job Scheduling
Section 6.4. Shuffle and Sort
Section 6.5. Task Execution
Chapter 7. MapReduce Types and Formats
Section 7.1. MapReduce Types
Section 7.2. Input Formats
Section 7.3. Output Formats
Chapter 8. MapReduce Features
Section 8.1. Counters
Section 8.2. Sorting
Section 8.3. Joins
Section 8.4. Side Data Distribution
Section 8.5. MapReduce Library Classes
Chapter 9. Setting Up a Hadoop Cluster
Section 9.1. Cluster Specification
Section 9.2. Cluster Setup and Installation
Section 9.3. SSH Configuration
Section 9.4. Hadoop Configuration
Section 9.5. Post Install
Section 9.6. Benchmarking a Hadoop Cluster
Section 9.7. Hadoop in the Cloud
Chapter 10. Administering Hadoop
Section 10.1. HDFS
Section 10.2. Monitoring
Section 10.3. Maintenance
Chapter 11. Pig
Section 11.1. Installing and Running Pig
Section 11.2. An Example
Section 11.3. Comparison with Databases
Section 11.4. Pig Latin
Section 11.5. User-Defined Functions
Section 11.6. Data Processing Operators
Section 11.7. Pig in Practice
Chapter 12. HBase
Section 12.1. HBasics
Section 12.2. Concepts
Section 12.3. Installation
Section 12.4. Clients
Section 12.5. Example
Section 12.6. HBase Versus RDBMS
Section 12.7. Praxis
Chapter 13. ZooKeeper
Section 13.1. Installing and Running ZooKeeper
Section 13.2. An Example
Section 13.3. The ZooKeeper Service
Section 13.4. Building Applications with ZooKeeper
Section 13.5. ZooKeeper in Production
Chapter 14. Case Studies
Section 14.1. Hadoop Usage at Last.fm
Section 14.2. Hadoop and Hive at Facebook
Section 14.3. Nutch Search Engine
Section 14.4. Log Processing at Rackspace
Section 14.5. Cascading
Section 14.6. TeraByte Sort on Apache Hadoop
Appendix A. Installing Apache Hadoop
Section A.1. Prerequisites
Section A.2. Installation
Section A.3. Configuration
Appendix B. Cloudera's Distribution for Hadoop
Section B.1. Prerequisites
Section B.2. Standalone Mode
Section B.3. Pseudo-Distributed Mode
Section B.4. Fully Distributed Mode
Section B.5. Hadoop-Related Packages
Appendix C. Preparing the NCDC Weather Data

I did look through the preface to see who the reader was supposed to be and what your qualifications should be to read the book. I never saw any such instruction. Perhaps that would have been helpful in disqualifying myself as able to tackle this book as a novice.


2 of 4 people found the following review helpful:
5.0 out of 5 stars Excellent book on all aspects of Hadoop, August 4, 2009
Excellent book. Covers a lot of ground on all aspects of Hadoop.

This book was my point of reference for setting up and testing a small cluster. Best detailed description I've found yet on the flow of data through a map and reduce job.

A small negative is that the content is a little scattered: you need to flip back and forth between chapters.

Strongly recommend.



This product

Hadoop: The Definitive Guide
Hadoop: The Definitive Guide by Tom White (Paperback - June 12, 2009)