Hadoop: The Definitive Guide [Paperback]

Tom White (Author)
4.2 out of 5 stars (13 customer reviews)




Formats

Amazon Price New from Used from
Kindle Edition $19.79  
Paperback $31.49  
Paperback, June 12, 2009 --  

Book Description

June 12, 2009

Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop's database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!


Editorial Reviews

About the Author

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net, and IBM's developerWorks, and has spoken at several conferences, including ApacheCon 2008, where he spoke on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.


Product Details

  • Paperback: 528 pages
  • Publisher: O'Reilly Media; Original edition (June 12, 2009)
  • Language: English
  • ISBN-10: 0596521979
  • ISBN-13: 978-0596521974
  • Product Dimensions: 9.1 x 7 x 1 inches
  • Shipping Weight: 1.6 pounds
  • Average Customer Review: 4.2 out of 5 stars (13 customer reviews)
  • Amazon Best Sellers Rank: #203,174 in Books


Customer Reviews

13 Reviews
5 star: (8)
4 star: (2)
3 star: (2)
2 star: (0)
1 star: (1)
 
 
 
 
 
Most Helpful Customer Reviews

37 of 39 people found the following review helpful:
5.0 out of 5 stars Pigs and Elephants on the road to World Domination, July 12, 2009
This review is from: Hadoop: The Definitive Guide (Paperback)
These days, one can't seem to attend technical conferences without hearing marketing-oriented speakers' world domination plans for their products. So imagine this: what if pigs and elephants are involved? Elephants would be Hadoop installations, and Pigs would be one of those animal-themed tools, smarter cousins of the elephants really, riding on top of Hadoops, directing them on how to perform their jobs. Would the world be a better place?

Hadoop is the brainchild of Doug Cutting, who named his creation after his kid's stuffed yellow elephant. Hadoop enables large datasets distributed over a cluster of machines to be processed in parallel. One machine or node in that cluster would usually house a JobTracker and a NameNode. The JobTracker schedules and manages processing jobs to be executed on the other machines, and the NameNode manages the metadata (e.g., file names and locations) of the datasets to be processed. The processing jobs are programmed in the form of Map and Reduce functions. Inputs are usually split into blocks to be processed in parallel by two or more identical mappers. The mappers' outputs are then fed to one or more identical reducers, whose job is to perform any final transformations on this intermediate data to produce data summaries in the expected format. Several companies are using Hadoop to extract knowledge from their extensive data.
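
To make the map and reduce flow described above concrete, here is a minimal single-process sketch in plain Python. It is only an illustration, not Hadoop's API: the names map_fn, reduce_fn, and run_job are hypothetical, the word-count task is an assumed example, and the "parallel" mappers are simulated by slicing the input list.

```python
from collections import defaultdict
from itertools import chain


def map_fn(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key, values):
    """Reducer: collapse all counts for one word into a single total."""
    return key, sum(values)


def run_job(lines, num_splits=2):
    """Simulate a job: split the input, map each split, group by key, reduce."""
    # Split the input so that each simulated mapper gets its own chunk.
    splits = [lines[i::num_splits] for i in range(num_splits)]
    # Map phase: every split is processed independently by the same map_fn.
    mapped = [[pair for line in split for pair in map_fn(line)]
              for split in splits]
    # Shuffle and sort: group the intermediate values by key across mappers.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    # Reduce phase: one reduce_fn call per key, visited in sorted key order.
    return dict(reduce_fn(key, groups[key]) for key in sorted(groups))


counts = run_job(["pig elephant", "elephant hadoop", "pig pig"])
print(counts)
```

In a real cluster the splits would live on different machines and the grouping step would move data over the network; the sketch keeps only the logical order of the phases.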

I've read this book and Jason Venner's Pro Hadoop book. Although I like both, I like this book better for the following reasons: more comprehensive coverage of topics, and more insider information on design rationales and how certain Hadoop features really work behind the scenes.

Here's a breakdown of the book's contents, with some commentary:

Chapter One introduces Hadoop, its history and how it's different from similar tools or frameworks. Kinda dry. Chapter Two introduces the MapReduce programming model and its benefits when compared to, say, the use of Unix tools for achieving parallel processing of text files. This is also where readers are introduced to concepts such as map, combiner, and reduce functions, shuffle and sort, and streaming. Chapters Three and Four are all about the Hadoop Distributed File System and Hadoop I/O, and the design decisions that were made to address performance, reliability, and safety concerns.

Chapter Five shows you how to develop, configure, test, run and tune a MapReduce application. Good chapter, but Jason Venner's book has better material on testing and debugging MapReduce applications.

Chapters Six through Eight discuss how MapReduce really works behind the scenes, including advanced features. This is where you'll learn how flexible Hadoop is when it comes to handling different types of inputs and outputs in terms of numbers, sizes, formats, and usage scenarios. Excellent!

Chapters Nine and Ten are really good. They teach you how to set up and administer Hadoop clusters. There's even a brief but informative section on how to use Hadoop with Amazon EC2 servers.

Chapters 11-13 devote one chapter each to installing and interacting with frameworks built on top of Hadoop: Pig, HBase, and ZooKeeper. Chapter 14 provides case studies (e.g., how Facebook uses Hadoop to analyze ad campaign effectiveness).

Appendices A and B provide instructions on how to install Apache's Hadoop and Cloudera's distribution, respectively, and Appendix C gives you a run-through of the steps to take when preparing to use the NCDC weather data used in the book.

A very thorough and well-written book. 4.5 stars.


33 of 38 people found the following review helpful:
3.0 out of 5 stars Partly succeeds, September 8, 2009
Amazon Verified Purchase
Tom White certainly writes very well: this book is very readable. It is also quite comprehensive, falling somewhere between a tutorial and a reference.

That being said, I was ultimately rather disappointed. First, and most importantly, it was not clear to me after reading this book how I might use Hadoop for some of my projects, or if indeed they were good candidates for MapReduce. I feel it should have been possible to provide some generic guidance. Second, some chapters are written by other authors, and these did not uniformly provide the same quality of instruction, reading occasionally like advertisements.

I confess I am puzzled by the number of encapsulating and utility APIs that have grown up around Hadoop. Why do we need Pig, HBase, Hive, ZooKeeper and Cascading? Apparently because (according to what I have read here) bare Hadoop is hard to program with productively. Some indication of how these wrappers interact with each other would have been helpful.

As it is, I feel LESS urge to evangelize for Hadoop having read this book. Surely not the desired effect?


4 of 5 people found the following review helpful:
5.0 out of 5 stars Don't understand all the other negative reviews, July 23, 2009
This is the book to get if you are actually doing something with Hadoop. It's been a lifesaver, and has answered all our questions of, "I wonder if I can do x in Hadoop?"
It gives a lot of information about the internals of Hadoop, which you will want to know when things go wrong or when you just want to get more out of Hadoop.
I normally don't post reviews much, but I think Tom White and this book deserve way more than 5 stars, so I'm not sure why it only has 3 stars on Amazon.
