Big Data: Principles and best practices of scalable realtime data systems 1st Edition
From the Publisher
About This Book
Services like social networks, web analytics, and intelligent e-commerce often need to manage data at a scale too big for a traditional database. Complexity increases with scale and demand, and handling Big Data is not as simple as just doubling down on your RDBMS or rolling out some trendy new technology. Fortunately, scalability and simplicity are not mutually exclusive—you just need to take a different approach. Big Data systems use many machines working in parallel to store and process data, which introduces fundamental challenges unfamiliar to most developers.
Big Data teaches you to build these systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to Big Data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of Big Data systems and how to implement them in practice.
Big Data requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful, though not required. The goal of the book is to teach you how to think about data systems and how to break down difficult problems into simple solutions. We start from first principles and from those deduce the necessary properties for each component of an architecture.
About the Author
Nathan Marz is currently working on a new startup. Previously, he was the lead engineer at BackType before it was acquired by Twitter in 2011. At Twitter, he started the streaming compute team, which provides and develops shared infrastructure to support many critical realtime applications throughout the company. Nathan is the creator of Cascalog and Storm, open-source projects that are relied upon by over 50 companies around the world, including Yahoo!, Twitter, Groupon, The Weather Channel, Taobao, and many more.
James Warren is an analytics architect at Storm8 with a background in big data processing, machine learning and scientific computing.
Top customer reviews
The book is very well organized. The introduction in chapter 1 serves as the road map for the whole book. Motivating the discussion with a simple web application based on an RDBMS, the author shows how the usual approach to scaling it becomes undesirable. After enumerating a list of desired properties, he proposes the Lambda Architecture, an approach that contrasts with a fully incremental architecture (with an RDBMS).
The Lambda architecture is partitioned into three layers:
1. A batch layer that computes different views on the big data.
2. A serving layer that answers user queries using views from the batch layer and the speed layer.
3. A speed layer that compensates with an approximate answer for the period of time during which the batch layer is still working on the complete answer.
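The three layers listed above can be illustrated with a toy sketch. This is not code from the book: the pageview events, the `compute_batch_view` function, the `SpeedLayer` class, and the `query_pageviews` function are all hypothetical names chosen for illustration, and everything runs in memory rather than on a cluster.

```python
from collections import Counter

# Hypothetical master dataset: an append-only log of (timestamp, url) pageview events.
master_dataset = [(1, "/home"), (2, "/about"), (3, "/home")]


def compute_batch_view(events):
    """Batch layer: recompute the pageview counts from scratch over all events."""
    return Counter(url for _, url in events)


class SpeedLayer:
    """Speed layer: incrementally counts events the batch view has not absorbed yet."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        _, url = event
        self.realtime_view[url] += 1

    def reset(self):
        # Called once a fresh batch view covers the events counted here.
        self.realtime_view.clear()


def query_pageviews(url, batch_view, realtime_view):
    """Serving side: answer a query by merging the batch and realtime views."""
    return batch_view[url] + realtime_view[url]


# The batch view reflects only the events present when the batch job ran.
batch_view = compute_batch_view(master_dataset)

# New events arrive afterwards: they are appended to the master dataset
# and also counted immediately by the speed layer.
speed = SpeedLayer()
for event in [(4, "/home"), (5, "/pricing")]:
    master_dataset.append(event)
    speed.on_event(event)

print(query_pageviews("/home", batch_view, speed.realtime_view))
print(query_pageviews("/pricing", batch_view, speed.realtime_view))
```

When the batch job is rerun over the full master dataset, the speed layer's view can be discarded with `reset()`, because the new batch view already includes those events.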
In the remaining chapters, the author dives deep into the rationale and requirements of all the different pieces of the Lambda Architecture.
To understand the context of the Lambda Architecture, also refer to Wikipedia for criticism.
Another feature of this book that I found very useful is that it is the first book where I could find a concise explanation of the Storm Trident framework, even though the book is not about Storm.
The only downside to the book is that the architecture and ecosystem are so new that there are not really many pragmatic solutions. For example, the theory describes a query layer that can merge the results of batch and realtime processing for client applications. However, in real life there are no ready-made solutions for doing this, so you'd have to write your own.
It'll be interesting to see how the lambda architecture matures and to see future editions of this book. Hopefully, future editions will be as well written and have a better ecosystem for practice chapters.
Most recent customer reviews
- author is well known in the field. lambda architecture is a good idea.
- original developer of Storm (together with many other relevant pieces of software, such as Cascalog) is...