Cassandra: The Definitive Guide: Distributed Data at Web Scale 3rd Edition
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This third edition, updated for Cassandra 4.0, provides the technical details and practical examples you need to put this database to work in a production environment.
Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. If you're a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra's speed and flexibility.
- Understand Cassandra's distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell
- Create a working data model and compare it with an equivalent relational model
- Develop sample applications using client drivers for languages including Java, Python, and Node.js
- Explore cluster topology and learn how nodes exchange data
- ISBN-10: 1098115163
- ISBN-13: 978-1098115166
- Edition: 3rd
- Publisher: O'Reilly Media
- Publication date: May 12, 2020
- Language: English
- Dimensions: 7 x 0.75 x 9 inches
- Print length: 426 pages
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
From the Preface
Why Apache Cassandra?
Apache Cassandra is a free, open source, distributed data storage system that differs sharply from relational database management systems (RDBMSs).
Cassandra first started as an Incubator project at Apache in January of 2009. Shortly thereafter, the committers, led by Apache Cassandra Project Chair Jonathan Ellis, released version 0.3 of Cassandra, and steadily made releases up to the milestone 3.0 release. Since 2017, the project has been led by Apache Cassandra Project Chair Nate McCall, producing releases 3.1 through the latest 4.0 release. Cassandra is being used in production by some of the biggest companies on the web, including Facebook, Twitter, and Netflix.
Its popularity is due in large part to the outstanding technical features it provides. It is durable, seamlessly scalable, and tuneably consistent. It performs blazingly fast writes, can store hundreds of terabytes of data, and is decentralized and symmetrical so there’s no single point of failure. It is highly available and offers a data model based on the Cassandra Query Language (CQL).
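As a rough illustration of what "tuneably consistent" means in practice, the following minimal Python sketch checks the usual replica-overlap rule: a read is guaranteed to reach at least one replica that holds the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names are hypothetical and are not taken from the book or from any Cassandra driver, and the sketch ignores node failures, repair, and clock skew.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        write_replicas: int,
                        read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set.

    This is the R + W > RF rule; the helper itself is illustrative only.
    """
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    # QUORUM writes combined with QUORUM reads
    print("QUORUM/QUORUM overlaps:", read_overlaps_write(rf, q, q))
    # ONE writes combined with ONE reads
    print("ONE/ONE overlaps:", read_overlaps_write(rf, 1, 1))
```

With a replication factor of 3, writing and reading at a majority of replicas satisfies the rule, while writing and reading a single replica does not, which is the trade-off between latency and consistency that the per-request consistency level exposes.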
There is a revision to this edition. See Amazon ASIN 1492097144, ISBN13: 9781492097143
Is This Book for You?
This book is intended for a variety of audiences. It should be useful to you if you are:
- A developer working with large-scale, high-volume applications, such as Web 2.0 social applications, ecommerce sites, financial services, or sensor-based Internet of Things (IoT) systems
- An application architect or data architect who needs to understand the available options for high-performance, decentralized, elastic data stores
- A database administrator or database developer currently working with standard relational database systems who needs to understand how to implement a fault-tolerant, eventually consistent data store
- A manager who wants to understand the advantages (and disadvantages) of Cassandra to help make decisions about technology strategy
- A student, analyst, or researcher who is designing a project related to Cassandra or other nonrelational data store options
This book is a technical guide. In many ways, Cassandra and other NoSQL databases represent a new way of thinking about data. Many developers who gained their professional chops in the last 15–20 years have become well versed in thinking about data in purely relational or object-oriented terms. Cassandra’s data model is different and can be difficult to wrap your mind around at first, especially for those of us with entrenched ideas about what a database is (and should be).
Using Cassandra does not mean that you have to be a Java developer. However, Cassandra is written in Java, so if you’re going to dive into the source code, a solid understanding of Java is crucial. Many of the examples in this book are in Java, but Cassandra drivers are available in a wide variety of languages, including Java, Node.js, Python, C#, PHP, Ruby, and Go.
Finally, it is assumed that you have a good understanding of how the web works, can use an integrated development environment (IDE), and are somewhat familiar with the typical concerns of data-driven applications. You might be a well-seasoned developer or administrator but still, on occasion, encounter tools used in the Cassandra world that you’re not familiar with. For example, Apache Ant is used to build Cassandra, and the Cassandra source code is available via Git. In cases where we speculate that you’ll need to do a little setup of your own in order to work with the examples, we try to support that.
Editorial Reviews
About the Author
Eben Hewitt is the CTO and Chief Architect at Sabre Hospitality, where he is responsible for the technology strategy, for designing large-scale, mission-critical systems, and for leading teams to build them. He has served as CTO at one of the world's largest hotel companies and CIO of O'Reilly Media. He has been a consultant to Warburg Pincus and others on distributed data and a frequent speaker at international conferences. He is the author of Technology Strategy Patterns (2018), Semantic Software Design (2019), Cassandra: The Definitive Guide, Java SOA Cookbook, and other books on architecture, web, and software development. He has won several innovation awards for his software design work.
Product details
- Publisher: O'Reilly Media; 3rd edition (May 12, 2020)
- Language: English
- Paperback: 426 pages
- ISBN-10: 1098115163
- ISBN-13: 978-1098115166
- Item Weight: 1.5 pounds
- Dimensions: 7 x 0.75 x 9 inches
- Best Sellers Rank: #1,877,787 in Books
- #221 in Desktop Database Books
- #853 in Data Mining (Books)
- #1,184 in Data Processing
About the authors

I'm a software developer, architect, advocate, and whatever the next challenge calls for. I've worked in the defense and hospitality industries and love to write about system architecture, microservices and cloud-native technologies including Apache Cassandra.

Eben Hewitt is the CTO at Sabre Hospitality, where he is responsible for the technology strategy, for designing large-scale, mission-critical systems, and for leading teams to build them. He has served as CTO at one of the world's largest hotel companies and CIO of O'Reilly Media. He has been a consultant to Warburg Pincus on distributed data and a frequent speaker at conferences including Software Architecture (NYC and London), JavaOne, Strange Loop, and many others, and he was introduced by Walter Isaacson to speak at the AH&LA Forum in DC.
Eben Hewitt is the author of several technical books, including Technology Strategy Patterns, Cassandra: The Definitive Guide (2 editions), Java SOA Cookbook, and Java for ColdFusion Developers, and he is a contributor to 97 Things Every Software Architect Should Know. His books have been translated into Chinese and Japanese and taught at Georgia Tech.
His first full-length play was produced in NYC at the Mint Theater and starred Shea Whigham and Kirk Acevedo.
Customer reviews
Top reviews from the United States
Frankly, the book covered almost all the topics I could think of - the reasoning behind its architecture, the intended access patterns, the usual pitfalls, data modeling paradigm shift that's necessary with stores like Cassandra, how to install, configure, run, manage and monitor, how to tweak its performance and where and how to look for bottlenecks, how to integrate it with other technologies... Like I said, pretty thorough.
The pedant in me got slightly triggered by the way the CAP theorem was introduced and used to illustrate some features of Cassandra (Designing Data-Intensive Applications taught me well), and I wish the book also included more examples of failure and gotcha modes and how to manage them.
For instance, there's no mention of how to handle hot partitions - a common problem in DynamoDB and from what I've seen, a very possible issue in Cassandra as well. Yet it wasn't mentioned even in passing - only how to treat large partitions, which is probably similar but having no experience with Cassandra I wish I didn't have to guess.
Also, seeing how much Cassandra relies on timestamps, how big of an issue is clock skew? How do we spot it, how do we mitigate it, how big of a skew can we tolerate? I imagine in AWS (and other cloud providers) it's easier to handle, you'd probably just configure the instances to use NTP with the Time Sync Service but NTP still uses the network so no silver bullet there either, and what of hosting your own clusters?
All in all, a great starting point for sure.
Both are essentially useless. You’ll get more bang for your buck asking uncle Google.
