- Paperback: 624 pages
- Publisher: O'Reilly Media; 1st edition (April 2, 2017)
- Language: English
- ISBN-10: 1449373321
- ISBN-13: 978-1449373320
- Product Dimensions: 7 x 1.2 x 9.2 inches
- Shipping Weight: 2.3 pounds
- Average Customer Review: 91 customer reviews
- Amazon Best Sellers Rank: #2,540 in Books
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems 1st Edition
From the Publisher
Who Should Read This Book?
If you develop applications that have some kind of server/backend for storing or processing data, and your applications use the internet (e.g., web applications, mobile apps, or internet-connected sensors), then this book is for you.
This book is for software engineers, software architects, and technical managers who love to code. It is especially relevant if you need to make decisions about the architecture of the systems you work on—for example, if you need to choose tools for solving a given problem and figure out how best to apply them. But even if you have no choice over your tools, this book will help you better understand their strengths and weaknesses.
You should have some experience building web-based applications or network services, and you should be familiar with relational databases and SQL. Any non-relational databases and other data-related tools you know are a bonus, but not required. A general understanding of common network protocols like TCP and HTTP is helpful. Your choice of programming language or framework makes no difference for this book.
If any of the following are true for you, you’ll find this book valuable:
- You want to learn how to make data systems scalable, for example, to support web or mobile apps with millions of users.
- You need to make applications highly available (minimizing downtime) and operationally robust.
- You are looking for ways of making systems easier to maintain in the long run, even as they grow and as requirements and technologies change.
- You have a natural curiosity for the way things work and want to know what goes on inside major websites and online services. This book breaks down the internals of various databases and data processing systems, and it’s great fun to explore the bright thinking that went into their design.
Sometimes, when discussing scalable data systems, people make comments along the lines of, 'You’re not Google or Amazon. Stop worrying about scale and just use a relational database'. There is truth in that statement: building for scale that you don’t need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization. However, it’s also important to choose the right tool for the job, and different technologies each have their own strengths and weaknesses. As we shall see, relational databases are important but not the final word on dealing with data.
Scope of This Book
This book does not attempt to give detailed instructions on how to install or use specific software packages or APIs, since there is already plenty of documentation for those things. Instead we discuss the various principles and trade-offs that are fundamental to data systems, and we explore the different design decisions taken by different products.
We look primarily at the architecture of data systems and the ways they are integrated into data-intensive applications. This book doesn’t have space to cover deployment, operations, security, management, and other areas—those are complex and important topics, and we wouldn’t do them justice by making them superficial side notes in this book. They deserve books of their own.
Many of the technologies described in this book fall within the realm of the Big Data buzzword. However, the term 'Big Data' is so overused and underdefined that it is not useful in a serious engineering discussion. This book uses less ambiguous terms, such as single-node versus distributed systems, or online/interactive versus offline/batch processing systems.
This book has a bias toward free and open source software (FOSS), because reading, modifying, and executing source code is a great way to understand how something works in detail. Open platforms also reduce the risk of vendor lock-in. However, where appropriate, we also discuss proprietary software (closed-source software, software as a service, or companies’ in-house software that is only described in literature but not released publicly).
The big ideas behind reliable, scalable and maintainable systems
About the Author
Martin is a researcher in distributed systems at the University of Cambridge. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. In the process he learned a few things the hard way, and he hopes this book will save you from repeating the same mistakes.
Martin is a regular conference speaker, blogger, and open source contributor. He believes that profound technical ideas should be accessible to everyone, and that deeper understanding will help us develop better software.
Top customer reviews
If you are interested in distributed systems or scalability, this book is a must-read. It gives you a high-level understanding of different technologies, including the ideas behind them, their pros and cons, and the problems they are trying to solve. A great book for practitioners who want to learn all the essential concepts quickly.
I didn't come from a traditional CS background, but I did have some basic knowledge of hardware and data structures. You will need some of that, such as hard disks vs. SSDs and AVL trees, to understand the material. If you are completely new to backend or DS, you may want to start with another book, "Web Scalability for Startup Engineers." After that book, you can read the free article "Distributed Systems for Fun and Profit" and you are good to go for this amazing book :D
Nowhere else perhaps is this more prominent than in the data space, which up-levels libraries and frameworks as the conversation starter. That gets in the way of success. It is indeed impossible to model Cassandra "tables" without understanding, at the least, quorums, compaction, and the log-structured merge data structure. Because of the way present-day solutions are built ("fits one use case perfectly well"), if these solutions are not fitted well to the particular domain, failure is just a release away.
Mr Kleppmann does a great job of articulating the "systems" aspects of data engineering. He goes from building a functional database in four lines of code to how one can interpret and implement concurrency, serializability, isolation, and linearizability (the latter for distributed systems). His book also has over 800 pointers to state-of-the-art research as well as some of computer science's classic papers. The book slows its pace in the chapter on distributed systems and in the final one. A good editor could have trimmed about 120 pages and still retained most of the value one could get from the book.
That said, if you have ever worked on data systems, especially across paradigms (IMS -> RDBMS -> NoSQL -> Map-Reduce -> Spark -> Streaming -> Polyglot), this book is pretty much the only resource out there that ties up the "loose ends" and paints a coherent narrative. Highly recommended!
Kleppmann has coherently blended the relevant computer science theory with modern use cases and applications. The focus is primarily on the core principles and thought processes that one must apply when building data services. Design concepts don't go out of date quickly, so the book has a very long shelf life.
The high-point of this book is the author's lucid prose, which indicates mastery of the subject matter and clarity of thought. Conceptualizing reality is an art and the author really shines here. Kudos for those understandable diagrams and interesting maps (and also for avoiding mathematical formulas with Greek symbols). The bibliography at the end of each chapter is thorough enough for unending personal research.
If you are working on or interviewing for big data engineering, systems design, cloud consulting, or DevOps/SRE, then this book is a keeper for a long, long time.