Hi,
I have written a detailed chapter-by-chapter review of this book on www DOT i-programmer DOT info; the first and last parts of that review are given here. For my review of all chapters, search i-programmer DOT info for STIRK together with the book's title.
This book aims to introduce you to a popular platform for storing and analyzing big data on Hadoop. How does it fare?
Increasing amounts of data are being created, and there’s a need to store and process this data to gain a competitive advantage. Hive is a popular platform for storing and analyzing big data on Hadoop. Hive tends to be popular because it uses a SQL-like syntax that is familiar to many people. With plenty of built-in functionality, big data analysis can be done in Hive without advanced coding skills.
The book is aimed at both the beginner and the more advanced audience (data analysts, developers, and users). Some previous experience of SQL and databases is advantageous.
Chapter 1 Overview of Big Data and Hive
The chapter opens with a brief overview of the history of data processing, covering batch processing, online processing, relational databases, and the internet. The internet has led to a massive rise in the amount of data being created, requiring new approaches to processing. This big data can be described in terms of various attributes, including volume, velocity and variety.
Big data tends to be processed on relatively cheap commodity hardware, using distributed processing. Hadoop is a popular platform for big data processing. The chapter discusses the major components of Hadoop:
*Hadoop Distributed File System (HDFS) – storage system
*MapReduce – computing system (distributes processing and aggregates results)
*Associated components (e.g. HBase, Sqoop, Flume, Impala, etc.)
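To make the MapReduce idea above concrete (distribute the processing, then aggregate the results), here is a minimal single-process Python sketch of the classic word count. It only imitates the map, shuffle and reduce phases in memory; it does not use Hadoop's real API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group emitted values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate all values collected for a single key.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a real cluster, each input split would be mapped on a different node;
    # here everything runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on Hadoop", "Hive on Hadoop"]))
    # {'big': 1, 'data': 1, 'on': 2, 'hadoop': 2, 'hive': 1}
```

The point of the split is that the map and reduce functions know nothing about where the data lives, which is what lets Hadoop run them in parallel across commodity machines.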
Having described how we arrived at big data and Hadoop, the chapter proceeds with an overview of Hive. Hive allows you to issue queries against petabytes of data using its Hive Query Language (HQL), which is similar to SQL. Hive gives a table structure to data held in HDFS. Processing data with Hive is simpler than writing equivalent code in Java.
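Hive's approach of giving a table structure to files already sitting in HDFS (often called schema-on-read) can be imitated in a few lines of Python. This is an illustrative sketch only: the column names, the tab delimiter and the sample rows are hypothetical, and the comment shows roughly the HQL statement whose effect it mimics.

```python
from collections import defaultdict

# Hypothetical table definition: column names and types applied at read time.
SCHEMA = [("region", str), ("amount", int)]


def read_table(raw_lines, schema=SCHEMA, delimiter="\t"):
    # Impose the schema on raw delimited text row by row; the underlying
    # lines are never rewritten, the structure exists only while reading.
    for line in raw_lines:
        fields = line.rstrip("\n").split(delimiter)
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def total_by_region(rows):
    # Roughly what this HQL would compute:
    #   SELECT region, SUM(amount) FROM sales GROUP BY region;
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    raw = ["north\t10\n", "south\t5\n", "north\t7\n"]
    print(total_by_region(read_table(raw)))  # {'north': 17, 'south': 5}
```

In Hive itself the one-line query replaces all of this hand-written parsing and aggregation, which is the simplification the chapter highlights.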
This chapter provides a helpful background on how we arrived at today’s big data and Hadoop platform. An overview of Hadoop and its components is given, together with a very helpful diagram of the Hadoop ecosystem (e.g. HDFS, HBase, Sqoop, Impala, etc). A useful overview of Hive is provided, highlighting its purpose and advantages.
.
.
.
Conclusion
This book provides up-to-date detail on Hive, a very popular platform for storing and analyzing big data on Hadoop.
Most topics are explained in a very readable manner, though a few sections could do with more detail (e.g. transactions). Throughout, there are helpful explanations, screenshots, practical code examples, and inter-chapter references. Some links to websites are provided for further information.
This book is especially suitable for developers and data analysts starting out with Hive. Since it also contains advanced and up-to-date material, it is suitable for more experienced developers/analysts too. If you have a background in SQL, the book is even easier to understand.
There are very few books dedicated to Hive, and these tend to be out of date now (especially since Hive changes regularly). If you want an up-to-date, practical, wide-ranging review of Hive’s functionality, I highly recommend this book.