Database Internals: A Deep Dive into How Distributed Data Systems Work 1st Edition
When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.
Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed.
This book examines:
- Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each
- Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log
- Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns
- Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency
- ISBN-10: 1492040347
- ISBN-13: 978-1492040347
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: November 5, 2019
- Language: English
- Dimensions: 7 x 0.75 x 9 inches
- Print length: 370 pages
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
From the Preface
Who is this book for?
In conversations at technical conferences, I often hear the same question: “How can I learn more about database internals? I don’t even know where to start.” Most of the books on database systems do not go into details of storage engine implementation, and cover the access methods, such as B-Trees, on a rather high level. There are very few books that cover more recent concepts, such as different B-Tree variants and log-structured storage, so I usually recommend reading papers.
Everyone who reads papers knows that it’s not that easy: you often lack context, the wording might be ambiguous, there’s little or no connection between papers, and they’re hard to find. This book contains concise summaries of important database systems concepts and can serve as a guide for those who’d like to dig in deeper, or as a cheat sheet for those already familiar with these concepts.
Not everyone wants to become a database developer, but this book will help people who build software that uses database systems: software developers, reliability engineers, architects, and engineering managers.
If your company depends on any infrastructure component, be it a database, a messaging queue, a container platform, or a task scheduler, you have to read the project change-logs and mailing lists to stay in touch with the community and be up-to-date with the most recent happenings in the project.
Understanding terminology and knowing what’s inside will enable you to yield more information from these sources and use your tools more productively to troubleshoot, identify, and avoid potential risks and bottlenecks. Having an overview and a general understanding of how database systems work will help in case something goes wrong. Using this knowledge, you’ll be able to form a hypothesis, validate it, find the root cause, and present it to other project maintainers.
This book is also for curious minds: for the people who like learning things without immediate necessity, those who spend their free time hacking on something fun, creating compilers, writing homegrown operating systems, text editors, computer games, learning programming languages, and absorbing new information.
The reader is assumed to have some experience with developing backend systems and working with database systems as a user. Having some prior knowledge of different data structures will help to digest material faster.
Product details
- Publisher : O'Reilly Media; 1st edition (November 5, 2019)
- Language : English
- Paperback : 370 pages
- ISBN-10 : 1492040347
- ISBN-13 : 978-1492040347
- Item Weight : 1.3 pounds
- Dimensions : 7 x 0.75 x 9 inches
- Best Sellers Rank: #47,819 in Books
- #5 in Management Information Systems
- #5 in Data Warehousing (Books)
- #21 in Data Processing
About the author

Alex is a data infrastructure engineer, database and storage systems enthusiast, Apache Cassandra committer and a PMC member. His expertise is in storage, distributed systems, and algorithms.
Customer reviews
Top reviews from the United States
More and more I find myself just looking at the title of a section and then looking up that concept on Wikipedia for a much better treatment of the topic. I think that's the main value of this book: look at the table of contents, look up each section on Wikipedia, and save yourself the cost of the book.
I find it hard to believe that the people who gave this 5 stars actually read the entire book. The first half is a superficial review of database concepts (mostly just B-trees, which are the focus of chapters 2, 4, and 6, and then LSM trees in chapter 7). Read the Wikipedia article on B-trees and the original LSM Tree paper (free online) to get a better treatment. The second half (distributed systems) is particularly bad, so if you're more interested in that, frankly just google "Patterns of Distributed Systems" on Martin Fowler's page. It's free and much better.
Database Internals is divided into two parts. The first deals with database storage. Especially good sections put a 9-cell flash-light on how many recent architectures are indeed built to tackle complexity bottom-up; e.g., LSM (log-structured merge) trees nicely complement the "write amplification" of Solid-State Disks. The discussion on the canonical B-tree and its multiple siblings (especially Bw-tree) is very well done. The functional difference between locks and latches would be enlightening even for experienced database practitioners - locks are used to manage transactions, latches to guard the *physical* storage representation.
The second half of the book, focusing on distributed systems, is more uneven in quality. It is, however, a great start: an economized discussion of about 50 "Best Papers" on Leader Election, Failure/Crash detection, Replication, and how distributed-systems-friendly "consensus protocols" work better than atomic ones like 2-phase commit. In many ways, distributed systems have veered from monarchy (single, immutable leader deciding everything, including the next leader) to a true republic (leader is still almost omnipotent, but is regularly replaced by the constituents). The comparative analysis of Paxos, ZAB and Raft - with clear sequence diagrams - is very well done.
The quality of writing is good, though it could have been helped by more ruthless editing. The area covered is simply too broad, other than the intersection of SSDs and modern DB architecture, which is very deep and very good. Still, the book easily deserves at least 4 stars for the enthusiasm and for its good attempt to convey distributed systems pedagogy to general practitioners. Pair it with Martin Kleppmann's "Designing Data-Intensive Applications" and Ken Birman's "Guide to Reliable Distributed Systems".
Reviewed in the United States 🇺🇸 on November 30, 2019
Partly it's probably because I usually have less to say (or, more precisely, it's harder for me to be properly articulate) about things I like than I do about the ones I don't. And boy did I like Database Internals! I'll try my best to explain why; the book and the author surely deserve it.
Being a back-end engineer, the main reason for picking this one up was to better understand the distributed databases that I may end up in (or have already had) contact with. With that in mind, I planned on just skimming the first part of the book but imagine my surprise when I found myself Googling BW and LSM trees and going through papers comparing this and that algorithm and their impacts on memory, storage and CPU caches in multicore systems. The geek got suckered in! With my curiosity circuits pleasantly warmed by the first part, I moved on to the second part of the book - the main dish - where a similar scenario unfolded: again I swallowed up whatever was served and ended up digging for more and adding scores of books and papers to my to-read list.
All in all, Database Internals felt a lot like a trip to the zoo or a local museum: chock full of data structures and algorithms used by modern-day databases (and distributed systems in general), the book will showcase each item with sufficient detail for you to grasp what it's about and then provide you with enough bibliography and reference material to last you a lifetime... or at least a couple of years.
I highly recommend this book not only to people working on distributed data systems, but to anyone working with databases. This is one of the most frequently referenced books I own.
Top reviews from other countries
It's really hard to get an overview of the way databases work, given how diverse and, well, *big* they really are. Decades of practical experience don't mean one has a clear understanding of query processing, optimisation, storage subsystems, transaction processing, concurrency control, etc.
Sometimes, just sometimes, mortals get lucky and somebody writes a survey of a subfield, or an extended overview, of relevant problems. Best example I am aware of: the Red Book aka Readings in Database Systems. It's a vast survey of academic work on databases. But it's more of a collection of paper references than a linear reading.
Database Internals also feels a bit like an extended survey: numerous paper references, no code, mostly conceptual explanations. What stands out is its good linear narration, gradually coming up with definitions and clarifying explanations.
So, what this book is not: an introductory text, a textbook, a theory-centric volume, or a practice-centric work.
What this book is: a survey of typical approaches to two major aspects of databases (local storage subsystems and problems of distributed systems). An interested reader will have to follow the references; a casual reader will get familiar with terminology and common concepts in a condensed way.
I would (and definitely will) recommend the book to people already working with databases for at least a few years looking for additional insights or an overview of the field.
I found the book informative, but not very effective in building a solid understanding of concepts. I felt the author jumps from idea to (related) idea too frequently in the manner of short paragraphs, and in so doing doesn't see an idea through to the end in enough detail for it to be learned properly. Perhaps the first part was better presented; the second was not.












