- Paperback: 370 pages
- Publisher: O'Reilly Media; 2 edition (July 22, 2016)
- Language: English
- ISBN-10: 1491933666
- ISBN-13: 978-1491933664
- Product Dimensions: 7 x 0.8 x 9.2 inches
- Shipping Weight: 1.6 pounds (View shipping rates and policies)
- Average Customer Review: 23 customer reviews
- Amazon Best Sellers Rank: #74,726 in Books (See Top 100 in Books)
Enter your mobile number or email address below and we'll send you a link to download the free Kindle App. Then you can start reading Kindle books on your smartphone, tablet, or computer - no Kindle device required.
To get the free app, enter your mobile phone number.
Cassandra: The Definitive Guide: Distributed Data at Web Scale 2nd Edition
Use the Amazon App to scan ISBNs and compare prices.
The Amazon Book Review
Author interviews, book reviews, editors picks, and more. Read it now
Frequently bought together
Customers who bought this item also bought
About the Author
Jeff Carpenter is a software and systems architect with experience in the hospitality and defense industries. Jeff cut his teeth as an architect in the early days of Service-Oriented Architecture (SOA) and has worked on projects ranging from a complex battle planning system in an austere network environment, to a cloud-based hotel reservation system. Jeff is passionate about projects and technologies that change industries, helping troubled projects find architectural solutions, and mentoring other architects and developers.
Eben Hewitt is Director of Application Architecture at a publicly traded company where he is responsible for the design of their mission-critical, global-scale web, mobile and SOA integration projects. He has written several programming books, including Java SOA Cookbook (O'Reilly).
Top customer reviews
There was a problem filtering reviews right now. Please try again later.
However, the book is in need of additional editing – it contains enough sections that are confusing, misleading and in some cases, completely wrong, that it is not really suitable as an authoritative reference or (as its title claims) a definitive guide.
A few examples:
Page 70 contains a warning about counters, stating that “the increment and decrement operators are not idempotent”, with no additional explanation. Without further explanation, this statement is useless to most people new to Cassandra because incrementing and decrementing are normally not idempotent operations – incrementing a counter twice should be expected to leave the counter in a state different than incrementing a counter once. The passage goes on to say “There is no operation to reset a counter directly, but you can approximate a reset by reading the counter value and decrementing by that value. Unfortunately, this is not guaranteed to work perfectly, as the counter may have been changed elsewhere in between reading and writing.” While that passage may be correct, it has nothing to do with idempotence; instead it is due to the fact that read-modify-write of counters is not performed atomically by Cassandra. As it happens, there may be an issue with Cassandra counters and idempotence in versions of Cassandra prior to 2.1, and with counter inaccuracies resulting from timeouts in all versions of Cassandra, but these issues are nowhere described in the book. The book’s handling of counters is deficient in other ways as well – e.g. no detailed examples are given to illustrate how counters might be profitably employed in a real-world data model.
Even more concerning is the discussion of “wide rows” which first occurs on page 59 and continues at various points throughout the book. Page 59 defines a wide row as a row that has “lot and lots (perhaps tens of thousands or even millions) of columns”. But, the following page illustrates a wide row as being synonymous with a partition, i.e. a set of rows of a table with a common set of value for the columns that compose the partition key. These are two different notions, and the book does not make it clear which is the correct definition for “wide row”. A later section of the book (on page 90) uses the hotel model (introduced in the logical data modeling section) as an example of the “wide row” model. However, the most columns of any table in the hotel model is 7, hardly “lots and lots”, so presumably this section is using “wide row” to mean “partition” rather than “a row with lots and lots of columns”.
More partition confusion occurs on page 97 under the heading “Calculating Partition Size”. We are warned that we need to calculate a maximum partition size to look for whether “our tables will have partitions that will be overly large”, and that “Cassandra’s hard limit is 2 billion cells per partition, but we’ll likely run into performance issues before reaching that point”. A few paragraphs later, it calculates the partition size (in columns) of the available_rooms_by_hotel_date table from the book’s hotel data model as the number of rows times the number of non-primary key columns. For the number of rows, it uses 5000 hotels *100 rooms/hotel *730 days = 365,000,000. But, this is the number of rows in the table. Since this table’s partition key is hotel_id, there is one partition per hotel, and so the number of rows per partition is actually 100 rooms/hotel*730 days = 73,000, a far cry from 365,000,000!
Page 186 contains a misleading statement about inserting with light-weight transactions. It states that when inserting rows with the “with not exists” qualifier, if a row already exists with the same values for the primary key columns as the row that we are trying to insert, that the CQL interpreter will return a failure, along with the “values that we tried to enter”. However, a few paragraphs above, it is said that “if a transaction fails because the existing values did not match the one you expected, Cassandra will include the current ones so you can decide whether to retry or abort without making an extra request” – which sounds like Cassandra is returning the values that are already in the database rather than the ones that we tried to enter.
A final example of misleading text occurs on page 305, where the sizing for machines used as Cassandra nodes is described. This section recommends that Cassandra nodes in development environments should have at least 2 cores and 8 GB of memory, and that Cassandra nodes in production environments should have at least “eight cores (although four cores are acceptable for virtual machines), and anywhere from 16 MB to 64 MB of memory”. This section raises two questions:
1. Why would a virtual machine need fewer cores than a physical server? This assertion seems dubious. And, even if true (which seems unlikely), it is sufficiently counterintuitive as to require explanation, but none is given.
2. Is 16 MB really sufficient RAM for a production Cassandra node? Presumably the author intended to say 16GB to 64GB (rather than MB).
In summary, the book’s scope and engaging text make it a useful text for those new to Cassandra. However, it is in need of editing, and its numerous inaccuracies and misleading sections preclude it from being useful as an authoritative reference or definitive guide. Hopefully the third edition will address these issues.
Data modeling topics, arguably one of the most crucial aspects of learning a new database system are given a cursory glance with a single example and then left aside.
There is a lot of material about which classes to use what, the various strategies, etc. However, something that is basic for use "what is a cluster column" has two sentences in the entire book, none of which explain what it does, when I should apply it, etc.
Another topic, static columns, is also mentioned briefly. It explains _what_ it is, but in such a way that after I went online and read about it in the docs, I could say that the description is accurate, but useless.
At at no point was there any example of _when should I use it_.
The book spend a lot of time on minutia (this is how you setup the client API in Python, Java, etc). It talks about the _history_ of long dead client projects (not sure why I should care about that), but when talking about prepared statements, for example, it doesn't cover what happens if after preparing the statement, I'm connected to a node that didn't get that statement.
That seems like an obvious thing that I'll want to know, but there is no coverage of that. In the same manner, secondary indexes are covered to the extent that we know that they exists, but no discussion of their design, implication for use, etc are covered at all.
I'm very disappointed by this book.
I would suggest this book as a must buy for anyone who wants to learns about Cassandra. If possible try to grab the Kindle edition which allows you to read across platforms.