Cassandra: The Definitive Guide: Distributed Data at Web Scale 2nd Edition
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition―updated for Cassandra 3.0―provides the technical details and practical examples you need to put this database to work in a production environment.
Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility.
- Understand Cassandra’s distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh―the CQL shell
- Create a working data model and compare it with an equivalent relational model
- Develop sample applications using client drivers for languages including Java, Python, and Node.js
- Explore cluster topology and learn how nodes exchange data
- Maintain a high level of performance in your cluster
- Deploy Cassandra on site, in the Cloud, or with Docker
- Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene
- ISBN-10: 1491933666
- ISBN-13: 978-1491933664
- Edition: 2nd
- Publisher: O'Reilly Media
- Publication date: August 16, 2016
- Language: English
- Dimensions: 5.91 x 0.59 x 9.84 inches
- Print length: 367 pages
Editorial Reviews
About the Author
Eben Hewitt is Director of Application Architecture at a publicly traded company where he is responsible for the design of their mission-critical, global-scale web, mobile and SOA integration projects. He has written several programming books, including Java SOA Cookbook (O'Reilly).
Product details
- Publisher : O'Reilly Media; 2nd edition (August 16, 2016)
- Language : English
- Paperback : 367 pages
- ISBN-10 : 1491933666
- ISBN-13 : 978-1491933664
- Item Weight : 1.47 pounds
- Dimensions : 5.91 x 0.59 x 9.84 inches
- Best Sellers Rank: #1,727,525 in Books (See Top 100 in Books)
- #765 in Data Mining (Books)
- #825 in Data Modeling & Design (Books)
- #1,081 in Data Processing
About the authors

Eben Hewitt is the CTO at Sabre Hospitality, where he is responsible for the technology strategy, for designing large-scale, mission-critical systems, and for leading teams to build them. He has served as CTO at one of the world's largest hotel companies and CIO of O'Reilly Media. He has been a consultant to Warburg Pincus on distributed data and a frequent speaker at conferences including Software Architecture (NYC and London), JavaOne, Strange Loop, and many others, and was introduced by Walter Isaacson to speak at the AH&LA Forum in DC.
Eben Hewitt is the author of several technical books, including Technology Strategy Patterns, Cassandra: The Definitive Guide (2 editions), Java SOA Cookbook, and Java for ColdFusion Developers, and he is a contributor to 97 Things Every Software Architect Should Know. His books have been translated into Chinese and Japanese and taught at Georgia Tech.
His first full-length play was produced in NYC at the Mint Theater and starred Shea Whigham and Kirk Acevedo.

I'm a software developer, architect, advocate, and whatever the next challenge calls for. I've worked in the defense and hospitality industries and love to write about system architecture, microservices and cloud-native technologies including Apache Cassandra.
Customer reviews
Customers say
Customers find the book very good and excellent. They say it provides a useful resource for anyone new to Cassandra and an excellent overview of the technology.
AI-generated from the text of customer reviews
Customers find the book very good and excellent. They say it provides good background on Cassandra's internal architecture details.
"Awesome book, it'll give you an overview of Cassandra, what it is and how it works internally...."
"...It can be a bit overwhelming to start with but definitely worth the read if you are going to use Cassandra"
"...The text is quite engaging and enjoyable to read...."
"a massive steak dinner of a book about Cassandra. A very good read for anyone getting into this technology."
Customers find the knowledge base useful for anyone new to Cassandra. They say it provides an excellent overview of Cassandra and is an excellent guide to learn and understand it.
"Rated 5 because I enjoyed reading it and it gave an excellent overview of Cassandra. Would recommend this to anyone that is new to Cassandra."
"A useful resource for anyone new to Cassandra...."
"Decent intro book to Cassandra. We are starting to proof out Cassandra at work and this book has benefited me."
"Excellent Guide to learn and understand Cassandra..."
Top reviews from the United States
I would suggest this book as a must-buy for anyone who wants to learn about Cassandra. If possible, try to grab the Kindle edition, which allows you to read across platforms.
However, the book is in need of additional editing – it contains enough sections that are confusing, misleading and in some cases, completely wrong, that it is not really suitable as an authoritative reference or (as its title claims) a definitive guide.
A few examples:
Page 70 contains a warning about counters, stating that “the increment and decrement operators are not idempotent”, with no additional explanation. Without further explanation, this statement is useless to most people new to Cassandra because incrementing and decrementing are normally not idempotent operations – incrementing a counter twice should be expected to leave the counter in a state different than incrementing a counter once. The passage goes on to say “There is no operation to reset a counter directly, but you can approximate a reset by reading the counter value and decrementing by that value. Unfortunately, this is not guaranteed to work perfectly, as the counter may have been changed elsewhere in between reading and writing.” While that passage may be correct, it has nothing to do with idempotence; instead it is due to the fact that read-modify-write of counters is not performed atomically by Cassandra. As it happens, there may be an issue with Cassandra counters and idempotence in versions of Cassandra prior to 2.1, and with counter inaccuracies resulting from timeouts in all versions of Cassandra, but these issues are nowhere described in the book. The book’s handling of counters is deficient in other ways as well – e.g. no detailed examples are given to illustrate how counters might be profitably employed in a real-world data model.
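The two separate issues in the counter discussion above (non-idempotent retries versus a non-atomic read-then-write reset) can be sketched in a few lines of Python. This is an in-memory stand-in only, not Cassandra and not any driver API; the `Counter` class and the `reset_by_read_then_decrement` helper are hypothetical names invented for illustration.

```python
class Counter:
    """Hypothetical in-memory stand-in for a counter column."""

    def __init__(self):
        self.value = 0

    def add(self, delta):
        self.value += delta

    def read(self):
        return self.value


def reset_by_read_then_decrement(counter, interleaved_write=None):
    """Approximate a reset: read the value, then decrement by it.

    interleaved_write simulates another client changing the counter
    between our read and our write (the two steps are not atomic).
    """
    seen = counter.read()
    if interleaved_write is not None:
        interleaved_write(counter)
    counter.add(-seen)
    return counter.read()


# (a) Non-idempotence: if a client retries an increment whose
# acknowledgement was lost, the increment is applied twice.
retried = Counter()
retried.add(1)  # first attempt applied, acknowledgement lost
retried.add(1)  # client retries the same logical increment
retried_value = retried.read()

# (b) Non-atomic reset: with no concurrent writer the reset works,
# but a +3 that lands between the read and the write survives it.
quiet = Counter()
quiet.add(5)
quiet_result = reset_by_read_then_decrement(quiet)

busy = Counter()
busy.add(5)
busy_result = reset_by_read_then_decrement(busy, lambda c: c.add(3))
```

In this sketch, case (a) is the idempotence problem and case (b) is the atomicity problem; they have different causes and would need different mitigations.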
Even more concerning is the discussion of “wide rows”, which first occurs on page 59 and continues at various points throughout the book. Page 59 defines a wide row as a row that has “lots and lots (perhaps tens of thousands or even millions) of columns”. But the following page illustrates a wide row as being synonymous with a partition, i.e. the set of rows of a table that share the same values for the columns that compose the partition key. These are two different notions, and the book does not make it clear which is the correct definition of “wide row”. A later section of the book (on page 90) uses the hotel model (introduced in the logical data modeling section) as an example of the “wide row” model. However, the largest number of columns in any table in the hotel model is 7, hardly “lots and lots”, so presumably this section is using “wide row” to mean “partition” rather than “a row with lots and lots of columns”.
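The difference between the two notions can be made concrete with a small sketch. The rows below are made-up sample data, and the column layout (with `hotel_id` assumed to be the partition key) is an illustrative assumption rather than a schema taken from the book. The point is only that "columns per row" and "rows per partition" are separate measurements.

```python
from collections import defaultdict

# Made-up sample rows: (hotel_id, date, room_number, is_available).
# Assumption for illustration: hotel_id is the partition key.
rows = [
    ("H1", "2016-01-01", 101, True),
    ("H1", "2016-01-01", 102, False),
    ("H1", "2016-01-02", 101, True),
    ("H2", "2016-01-01", 201, True),
]

# Group rows into partitions by the value of the partition key.
partitions = defaultdict(list)
for row in rows:
    partitions[row[0]].append(row)

# "Lots of columns" is a property of a single row ...
columns_per_row = len(rows[0])

# ... while partition size is the number of rows sharing a key.
rows_per_partition = {key: len(group) for key, group in partitions.items()}
```

Here every row is narrow (4 columns), yet a partition can still grow large as more rows share the same key, which matches the reviewer's reading that "wide row" is being used to mean "partition".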
More partition confusion occurs on page 97 under the heading “Calculating Partition Size”. We are warned that we need to calculate a maximum partition size to look for whether “our tables will have partitions that will be overly large”, and that “Cassandra’s hard limit is 2 billion cells per partition, but we’ll likely run into performance issues before reaching that point”. A few paragraphs later, the book calculates the partition size (in columns) of the available_rooms_by_hotel_date table from its hotel data model as the number of rows times the number of non-primary key columns. For the number of rows, it uses 5000 hotels * 100 rooms/hotel * 730 days = 365,000,000. But this is the number of rows in the whole table. Since this table’s partition key is hotel_id, there is one partition per hotel, and so the number of rows per partition is actually 100 rooms/hotel * 730 days = 73,000, a far cry from 365,000,000!
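The arithmetic in the paragraph above is easy to check. The sketch below uses the figures quoted in the review; `non_pk_columns` is a placeholder value assumed for illustration, since the review does not state how many non-primary key columns the table has.

```python
hotels = 5000
rooms_per_hotel = 100
days = 730

# Assumption for illustration only; not stated in the review.
non_pk_columns = 1

# Rows in the whole table (the figure the book uses).
rows_in_table = hotels * rooms_per_hotel * days

# Rows in one partition, given that hotel_id is the partition key:
# the hotel count drops out because each hotel is its own partition.
rows_per_partition = rooms_per_hotel * days

# Partition size as the review describes it: rows times non-primary key columns.
cells_per_partition = rows_per_partition * non_pk_columns

# Hard limit quoted from the book.
hard_limit_cells = 2_000_000_000
within_limit = cells_per_partition < hard_limit_cells
```

The two row counts differ by exactly the number of hotels (a factor of 5000), which is the size of the discrepancy the reviewer is pointing out.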
Page 186 contains a misleading statement about inserting with lightweight transactions. It states that when inserting rows with the “with not exists” qualifier, if a row already exists with the same values for the primary key columns as the row that we are trying to insert, the CQL interpreter will return a failure, along with the “values that we tried to enter”. However, a few paragraphs above, it is said that “if a transaction fails because the existing values did not match the one you expected, Cassandra will include the current ones so you can decide whether to retry or abort without making an extra request”, which sounds like Cassandra is returning the values that are already in the database rather than the ones that we tried to enter.
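The behaviour described in the second quoted passage (a failed conditional insert reports the values already stored) can be modelled with a plain dictionary. This is an in-memory sketch, not Cassandra or a driver API; `insert_if_not_exists`, the key "AZ123" and the hotel names are all hypothetical.

```python
def insert_if_not_exists(table, primary_key, values):
    """In-memory sketch of a conditional insert.

    Returns (applied, existing_row). When the insert is not applied,
    existing_row is the row already stored under that primary key,
    not the values the caller tried to write.
    """
    existing = table.get(primary_key)
    if existing is not None:
        return False, existing
    table[primary_key] = values
    return True, None


table = {}

# First insert succeeds: nothing was stored under this key yet.
first = insert_if_not_exists(table, "AZ123", {"name": "Super Hotel"})

# Second insert is rejected and reports the row that is already there.
second = insert_if_not_exists(table, "AZ123", {"name": "Other Hotel"})
```

Returning the stored row on failure is what lets a caller decide whether to retry or abort without a second read, which is the rationale given in the passage the reviewer quotes.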
A final example of misleading text occurs on page 305, where the sizing for machines used as Cassandra nodes is described. This section recommends that Cassandra nodes in development environments should have at least 2 cores and 8 GB of memory, and that Cassandra nodes in production environments should have at least “eight cores (although four cores are acceptable for virtual machines), and anywhere from 16 MB to 64 MB of memory”. This section raises two questions:
1. Why would a virtual machine need fewer cores than a physical server? This assertion seems dubious. And, even if true (which seems unlikely), it is sufficiently counterintuitive as to require explanation, but none is given.
2. Is 16 MB really sufficient RAM for a production Cassandra node? Presumably the author intended to say 16GB to 64GB (rather than MB).
In summary, the book’s scope and engaging text make it a useful text for those new to Cassandra. However, it is in need of editing, and its numerous inaccuracies and misleading sections preclude it from being useful as an authoritative reference or definitive guide. Hopefully the third edition will address these issues.
Top reviews from other countries
I had to write 8 more words to complete this review.


