Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset 1st Edition, Kindle Edition
Top Customer Reviews
I have written a detailed chapter-by-chapter review of this book on www DOT i-programmer DOT info; the first and last parts of that review are given here. For my review of all chapters, search i-programmer DOT info for STIRK together with the book's title.
This book sets out to be "A Working Guide to the Complete Hadoop Toolset" and is both wide-ranging in content and practical in its approach. Michael Frampton assumes some knowledge of Linux and SQL (but only a little), and no knowledge of Hadoop or its tools.
There is a step-by-step approach to tool download, installation, execution, and error checking. The following areas of functionality and associated tools are covered:
*Hadoop installation (version 1 and version 2)
*Web-based data collection (Nutch, Solr, Gora, HBase)
*MapReduce programming (Java, Pig, Perl, Hive)
*Scheduling (Fair and Capacity schedulers, Oozie)
*Moving data (Hadoop commands, Sqoop, Flume, Storm)
*Monitoring (Hue, Nagios, Ganglia)
*Hadoop cluster management (Ambari, CDH)
*Analysis with SQL (Impala, Hive, Spark)
*ETL (Pentaho, Talend)
*Reporting (Splunk, Talend)
Below is the first part of the chapter-by-chapter exploration of the topics covered.
Chapter 1: The Problem with Data
This chapter opens with a review of what big data is – data that can’t be stored and processed in the required timeframe using traditional methods (e.g. relational databases). It is typically defined in terms of the 3Vs (i.e. volume, velocity and variety).
Various attributes of big data are discussed, including massive scalability, commodity and cost-effective hardware, the ability to recover from hardware failure, and parallel processing.
Increasing amounts of data are being created and stored. Big data processing typically starts with data in the high terabytes; however, big data techniques can also apply to smaller systems that grow over time. Examples of high-volume data generators are given, including the Large Hadron Collider, which produces 15PB of detector data per year. The chapter ends with an overview of the rest of the book.
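To put the 15PB-per-year figure in perspective, the short sketch that follows converts it into an average sustained data rate. It assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes), a 365-day year, and data spread evenly over time; real detector output is bursty, so peak rates would be higher. The function name is illustrative only and does not come from the book.

```python
# Rough conversion of an annual data volume into an average sustained rate.
# Assumptions: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes),
# a 365-day year, and data produced evenly over the whole year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in MB/s needed to produce the given yearly volume."""
    bytes_per_year = petabytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


rate = sustained_rate_mb_per_s(15)
print(f"{rate:.0f} MB/s")  # prints "476 MB/s"
```

In other words, even averaged over a full year, 15PB corresponds to roughly half a gigabyte arriving every second, which illustrates the "volume" and "velocity" parts of the 3Vs together.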
This is a useful opening chapter, defining what big data is, its attributes, and the problems it solves. It helpfully explains the book’s practical approach, i.e. introduce a tool, show how to obtain and install it, and show example usage. There’s a very useful overview of each subsequent chapter.
This book provides a broad and practical introduction to big data, using Hadoop and its many tools. It gives comprehensive step-by-step instructions on how to download, install, configure, and run the various tools; common errors are also explained and solutions proposed.
The book is easy to read, with helpful explanations, screenshots, listings, outputs, and a logical flow between the sections and chapters. There are good links between the chapters, and to websites containing further information. It also steps back and puts what’s being discussed into the larger context of big data. The book will certainly give you more confidence in the topic.
It should be noted that there is much more information available on all the tools discussed; however, this book is a great starting point, and it does an excellent job of introducing the many tools in an easily understandable manner. This is perhaps the book to read before the more specific or advanced books (e.g. “Hadoop: The Definitive Guide”, the new edition of which is out in April 2015; look out for its review soon).
If I have one concern, it relates to who will use the whole book. This book contains both admin and development sections; however, large companies typically separate their admin and development teams, so few readers may need every section.
If you want a useful working introduction and overview of the current state of big data, Hadoop and its associated tools, I can highly recommend this very instructive book.
Look for similar items by category
- Books > Business & Money > Human Resources > Knowledge Capital
- Books > Computers & Technology > Databases & Big Data
- Books > Computers & Technology > Internet & Social Media
- Books > Computers & Technology > Networking & Cloud Computing > Data in the Enterprise > Client-Server Systems
- Kindle Store > Kindle eBooks > Business & Money > Management & Leadership > Knowledge Capital
- Kindle Store > Kindle eBooks > Computers & Technology > Databases
- Kindle Store > Kindle eBooks > Computers & Technology > Networking > Client-Server Systems