Hadoop: The Definitive Guide 1st Edition

4.3 out of 5 stars 15 customer reviews
ISBN-13: 978-0596521974
ISBN-10: 0596521979

There is a newer edition of this item.


Editorial Reviews

Book Description

MapReduce for the Cloud

About the Author

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including ApacheCon 2008 on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.

Product Details

  • Paperback: 528 pages
  • Publisher: O'Reilly Media; 1 edition (June 15, 2009)
  • Language: English
  • ISBN-10: 0596521979
  • ISBN-13: 978-0596521974
  • Product Dimensions: 7 x 1 x 9.1 inches
  • Shipping Weight: 1.6 pounds
  • Average Customer Review: 4.3 out of 5 stars (15 customer reviews)
  • Amazon Best Sellers Rank: #698,521 in Books

Customer Reviews

Top Customer Reviews

Format: Paperback
These days, one can't seem to attend technical conferences without hearing marketing-oriented speakers' world domination plans for their products. So imagine this: what if pigs and elephants are involved? Elephants would be Hadoop installations, and Pigs would be one of those animal-themed tools, smarter cousins of the elephants really, riding on top of Hadoops, directing them on how to perform their jobs. Would the world be a better place?

Hadoop is the brainchild of Doug Cutting, who named his creation after his kid's stuffed yellow elephant. Hadoop enables large datasets distributed over a cluster of machines to be processed in parallel. One machine, or node, in that cluster usually houses a JobTracker and a NameNode. The JobTracker schedules and manages processing jobs to be executed on the other machines, and the NameNode manages the metadata (file names, locations, and so on) of the datasets to be processed. The processing jobs are programmed in the form of Map and Reduce functions. Inputs are usually split into blocks to be processed in parallel by two or more identical mappers. The intermediate outputs are then fed to one or more identical reducers, whose job is to perform any final transformations on the intermediate data to produce data summaries in the expected format. Several companies are using Hadoop to extract knowledge from their extensive data.
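
To make the map, group, and reduce flow described above concrete, here is a minimal in-memory sketch in plain Python. It does not use Hadoop or any of its APIs: the function names (map_words, shuffle, reduce_counts, run_job) and the word-count task are assumptions chosen purely for illustration, and the "mappers" run one after another rather than on separate machines.

```python
from collections import defaultdict
from itertools import chain


def map_words(split):
    """Mapper (hypothetical): emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reducer (hypothetical): collapse all values for one key into a single summary."""
    return key, sum(values)


def run_job(lines, num_splits=2):
    # Split the input so that identical mappers could each take one piece.
    splits = [lines[i::num_splits] for i in range(num_splits)]
    # The mappers run sequentially here; a real cluster would run them in parallel.
    intermediate = chain.from_iterable(map_words(s) for s in splits)
    grouped = shuffle(intermediate)
    # Each key and its list of values goes to the reducer exactly once.
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    data = ["the elephant and the pig", "the pig rides the elephant"]
    print(run_job(data))
```

The point of the sketch is only the division of labor: mappers see one split each and know nothing about the others, and reducers see every value for a given key. Scheduling, data placement, and fault handling, which the JobTracker and NameNode take care of in a real installation, are left out entirely.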

I've read this book and Jason Venner's Pro Hadoop book. Although I like both, I like this book better for the following reasons: more comprehensive coverage of topics, and more insiders' information on design rationales and how certain Hadoop features really work behind the scenes.
Format: Paperback Verified Purchase
Tom White certainly writes very well: this book is very readable. It is also quite comprehensive, falling somewhere between a tutorial and a reference.

That being said, I was ultimately rather disappointed. First, and most importantly, it was not clear to me after reading this book how I might use Hadoop for some of my projects, or if indeed they were good candidates for MapReduce. I feel it should have been possible to provide some generic guidance. Second, some chapters are written by other authors, and these did not uniformly provide the same quality of instruction, reading occasionally like advertisements.

I confess I am puzzled by the number of encapsulating and utility APIs that have grown up around Hadoop. Why do we need Pig, HBase, Hive, ZooKeeper and Cascading? Apparently because (according to what I have read here) bare Hadoop is hard to program with productively. Some indication of how these wrappers interact with each other would have been helpful.

As it is, I feel LESS urge to evangelize for Hadoop having read this book. Surely not the desired effect?
Format: Paperback Verified Purchase
The book is clear and easy to follow, especially for a beginner like me. It has short examples for most of the cases you might think of. I think of it as a guide to learning Hadoop's functionality and classes in the right order. I cannot be more precise in my review, since I haven't read another book about Hadoop; most of my references are online, especially the Yahoo site. I'm not sure how advanced it is: I don't have a real cluster, so I can't tell whether what it covers is enough for a real cluster's problems and configuration issues. The book also discusses other Apache projects such as Hive and HBase. Other books do this too, but what's amazing here is that all the code scripts mentioned in the book are also provided, so you can start running them directly without the hassle of writing them from scratch.
Format: Paperback
I picked up this book to catch up on Hadoop, which the rest of my team has been using for several months. Unfortunately I was too busy with other projects to spend any time on MapReduce and thought it'd be a grueling process to be brought up to speed on it. Within the first 25 pages and about 3 hours, Tom had me up and running my first MapReduce job which I successfully adapted for a specific metric we were trying to generate. The book does a great job of breaking down Hadoop's complex pieces into easy to understand components, but doesn't try and pump you full of conceptual BS before it lets you touch real code.

If I were to make any suggestions, it would be to start the book off with some simple instructions for installing Hadoop and getting it up and running on a local machine, followed by some simple explanations of DFS and Hadoop's commands for managing the file system. I would also explain much earlier how to get your classes recognized by Hadoop, for those a bit rusty at Java. Fortunately, the online Wiki was very good about providing instructions to get me going on a Mac, which took most of the OS-specific burden off the book. You will, no doubt, have to be intelligent to read this book, but if you're using Hadoop, there is already a prerequisite for technical proficiency you'll need to satisfy. Overall good job, Tom.
Format: Paperback
Original review written by Paolo Canesi, JUG Lugano, [...]

Managing and analyzing huge data sets has become a very common problem in various areas of modern information technology, from different types of Web applications (social, financial, trading, ...) to applications for analyzing scientific data.

Distributed systems over a cluster of machines are almost a mandatory choice in such cases, but designing and implementing an effective solution in those areas may be troublesome and become a nightmare.

The Apache Hadoop Project is an infrastructure that helps the construction of reliable, scalable, distributed systems. Mainly known for its MapReduce and distributed file system (HDFS) subprojects, it actually includes other services that complement or extend them.

Tom White's "Hadoop: The Definitive Guide" is an enjoyable book which fully explains these complex technologies. The book is organized in such a way that the reader is gently guided into the Hadoop ecosystem. It begins with a couple of very readable chapters as a general introduction to the problems Hadoop is meant to solve and the main solutions to them (MapReduce and HDFS), then examines closely all its aspects, often describing what really happens behind the scenes and giving useful design suggestions and descriptions of common pitfalls. When reading this book you won't be overwhelmed by tons of lines of code: examples are short and yet effective.

This kind of structure makes it hard to classify the book as a mere tutorial or as a real reference guide; it is better considered a mix of the two.

Most Recent Customer Reviews