Hadoop: The Definitive Guide [Paperback]

Tom White
3.9 out of 5 stars (13 customer reviews)


There is a newer edition of this item:
Hadoop: The Definitive Guide, 3.9 out of 5 stars (25)

Book Description

October 12, 2010 | ISBN-10: 1449389732 | ISBN-13: 978-1449389734 | Second Edition

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.

This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
  • Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Analyze datasets with Hive, Hadoop’s data warehousing system
  • Take advantage of HBase, Hadoop’s database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk."

--Doug Cutting, Cloudera



Editorial Reviews

About the Author

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net, and IBM's developerWorks, and has spoken at several conferences, including ApacheCon 2008, where he spoke on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.


Product Details

  • Paperback: 628 pages
  • Publisher: Yahoo Press; Second Edition (October 12, 2010)
  • Language: English
  • ISBN-10: 1449389732
  • ISBN-13: 978-1449389734
  • Product Dimensions: 7 x 1.3 x 9.2 inches
  • Shipping Weight: 2.2 pounds
  • Average Customer Review: 3.9 out of 5 stars (13 customer reviews)
  • Amazon Best Sellers Rank: #317,898 in Books


Customer Reviews

Most Helpful Customer Reviews
25 of 27 people found the following review helpful
5.0 out of 5 stars The canonical reference of all things Hadoop June 13, 2011
Format:Paperback
The second edition of the already fantastic Hadoop: The Definitive Guide adds the last few missing bits to the best Hadoop reference out there.

For those not familiar with the first edition, Hadoop: The Definitive Guide is exactly what it claims to be. If you're not already familiar with Hadoop, the first and second chapters (Meet Hadoop and MapReduce, respectively) take you through the basics in both concept and code. For those used to writing data processing applications, the rationale behind Hadoop and why it's useful are immediately apparent. If you've already been exposed to Hadoop, these chapters may be redundant, but they're worth reading anyway the first time through.

The chapter on HDFS does a great job of explaining the underbelly of Hadoop's distributed file system, including the Java APIs. The section on Hadoop IO is probably introduced a bit too early - Hadoop newbies probably don't care about compression and serialization prior to reading about map reduce - but it is excellent nonetheless in its detail. That said, you'll *really* want to go back and read it to understand the details of how compression codecs work after you learn more about map reduce.

The "Writing a Map Reduce Application" chapter is probably the one existing users of Hadoop will skip. First timers will definitely get a lot out of a step-by-step walkthrough of a Java MR job from beginning to end.

The chapters on how map reduce works, types and formats (including input / output format details), and the advanced features (counters, sorting, the distributed cache, join libraries) are the ones you'll reread and reference constantly. The explanation, for instance, on how input splits are calculated demystifies the border between HDFS and the map reduce layer (and finally answers the question of "how does Hadoop know not to split in the middle of a record?"). Buy this book for these chapters, if not for the others.
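One common way a line-oriented reader avoids cutting a record in half can be sketched as follows. This is a simplified illustration of the general technique, not Hadoop's actual code and not a reproduction of the book's explanation: the class `SplitLines` and the method `readSplit` are hypothetical names, and the data is an in-memory string rather than a file in HDFS. The assumed convention is that every split except the first discards everything up to and including the first newline it sees, and every split keeps reading past its own end to finish the last line it started.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLines {
    // Returns the lines "owned" by the split [start, start + length) of data.
    // Hypothetical helper for illustration; offsets are character offsets.
    static List<String> readSplit(String data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start > 0) {
            // Not the first split: discard through the next newline, because
            // the previous split's reader is responsible for that line.
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        List<String> lines = new ArrayList<>();
        // Read every line that starts at or before the split's end offset,
        // even if the line itself runs past the end of the split.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            lines.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "alpha\nbeta\ngamma\ndelta\n";
        // Cut the data into fixed 8-character splits that ignore line boundaries.
        for (int start = 0; start < data.length(); start += 8) {
            int length = Math.min(8, data.length() - start);
            System.out.println(readSplit(data, start, length));
        }
    }
}
```

With 8-character splits, the boundaries fall in the middle of "beta" and at the newline after "gamma", yet each line is returned by exactly one split: the first split yields alpha and beta, the second gamma, the third delta.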

The chapters on HBase, Pig, ZooKeeper, and Sqoop are excellent and, in some cases, the best reference on the topic to date.

There are enough corrections, updates, and new chapters that it's worth buying the second edition if you already have the first. For anyone new to Hadoop this is a must have. If you already use Hadoop the later chapters are what you're looking for; a deep explanation of not just "how," but "why."

Some reviewers have noted the discussion of deprecated APIs. This really isn't a flaw of the book, but of premature deprecation within Hadoop itself. The newer APIs didn't have all the features of the old and anyone writing production map reduce jobs would wind up needing a lot of those features. I think the author does a great job with a tough situation while still alerting the reader that newer APIs are on the horizon. Besides, the differences are so few that it's almost not worth mentioning. While APIs may change, the core design, execution model, and architecture of Hadoop haven't changed and this is the best book on the subject.
26 of 30 people found the following review helpful
2.0 out of 5 stars Sadly, already outdated May 22, 2011
Format:Paperback
Hadoop's MapReduce and HBase went through a major API change right around the time this book was finishing up. Consequently, if you try to use the examples in the book as a guide while developing against either the Apache Hadoop latest release or against Cloudera's CDH3, you'll find a mountain of frustration in the form of deprecated or entirely deleted classes.
11 of 12 people found the following review helpful
5.0 out of 5 stars Excellent Hadoop Overview July 20, 2011
Format:Paperback|Amazon Verified Purchase
This book provides an excellent in-depth overview of all aspects of Hadoop with how-to examples that are easy to follow. It is well written, thorough and exactly what I needed to architect and build a Hadoop-based solution. Related technologies such as Hive, HBase, Sqoop, Pig and ZooKeeper are also covered in decent depth.

Other reviewers gave poor reviews due to the APIs being not up to date, which I think is unfair. Those new APIs are still only available in early unstable Hadoop versions, so current developers are best served to use the earlier APIs. The book gives samples with new APIs and shows very clearly the API changes which are minor. The concepts are identical, but a few classes have been combined into a more cohesive "Context" class in the new APIs.

So, for example, to write a data record you call "context.write(...);" rather than "output.collect(...);" with identical parameters. The structure of applications and the concepts are not changed. The changes to the syntax of Java calls are trivial and covered in the book very clearly. What is the big deal? Understanding the concepts is the most important thing and this book provides this very nicely.
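The size of that difference can be sketched in a few lines. In Hadoop's older API a map function emits records through a separate collector argument with `output.collect(key, value)`, while in the newer API it emits them through a context object with `context.write(key, value)`. The sketch below does not use Hadoop's classes: `Collector` and `Context` are hypothetical stand-in interfaces, and `ApiShapes`, `oldStyleMap`, `newStyleMap`, `runOld`, and `runNew` are names invented here so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class ApiShapes {
    // Hypothetical stand-in for the old API's output collector (not Hadoop's class).
    interface Collector<K, V> { void collect(K key, V value); }

    // Hypothetical stand-in for the new API's context object (not Hadoop's class).
    interface Context<K, V> { void write(K key, V value); }

    // Old style: records are emitted through a dedicated collector argument.
    static void oldStyleMap(String line, Collector<String, Integer> output) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) output.collect(word, 1);
        }
    }

    // New style: the same logic, but records are emitted through the context.
    static void newStyleMap(String line, Context<String, Integer> context) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) context.write(word, 1);
        }
    }

    // Runs the old-style map over one line and captures what it emits.
    static List<String> runOld(String line) {
        List<String> out = new ArrayList<>();
        oldStyleMap(line, (k, v) -> out.add(k + "=" + v));
        return out;
    }

    // Runs the new-style map over one line and captures what it emits.
    static List<String> runNew(String line) {
        List<String> out = new ArrayList<>();
        newStyleMap(line, (k, v) -> out.add(k + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runOld("to be or not"));
        System.out.println(runNew("to be or not"));
    }
}
```

Both variants emit the same records for the same input; only the object that receives them and the method name differ, which is the point the comparison is meant to make.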

I would recommend this book to anyone who is new to Hadoop and needs to learn it in depth.
Most Recent Customer Reviews
5.0 out of 5 stars The definitive Hadoop book
The standard Hadoop book for documenting and learning Hadoop. The book is well written and is a fairly easy read.
Published 1 month ago by D. Zanter
4.0 out of 5 stars Good overview
This is a good overview book of Hadoop, how it works, and the software in the Hadoop ecosystem. It's definitely a breadth book, not a depth book, so if you're looking to be an...
Published 4 months ago by Al
4.0 out of 5 stars Good coverage of the subject
The book was used to introduce the subject. It gave a good overview and enough detail to get a project going.
Published 5 months ago by Eyal
5.0 out of 5 stars Must have book on Hadoop
This book is close to what the legendary C book was for C language programming. It is very detailed and clear.
Published 7 months ago by Behzad Pirvali
4.0 out of 5 stars In-depth with lots of examples
The book has lots of examples and footnote resources that enrich the content. Some people recommend watching Cloudera training videos first and then reading this book if you are a...
Published 11 months ago by JUAN JOSE DE LEON
1.0 out of 5 stars Shallow
I had bought this book (Kindle edition), hoping it would have a good intro to programming for MapReduce. It is not.
Published 14 months ago by P. Bhowmick
4.0 out of 5 stars Useful but hard to understand
Good to have a book about this system; it is much better to have at least some book than no book at all.
Published 15 months ago by Dimitri K
5.0 out of 5 stars Makes Hadoop as easy as it should be
The documentation for Hadoop is really lacking. This fills that gap very well. I suggest just reading the book through and then trying to code something up.
Published 16 months ago by Justin Vincent
5.0 out of 5 stars Excellent Hadoop reference
This book sits next to my laptop as my Hadoop reference guide/bible. Covering installation, administration and development, this is my go-to resource for understanding Hadoop as a...
Published 20 months ago by Dan S.
2.0 out of 5 stars Outdated by the Time it hit the shelf
The APIs in this book were all outdated by the time the book hit the shelf. The authors did recognize this and mention it in the book, however you don't need 400 pages to...
Published on November 18, 2010 by Peter Harrington