The Analysis of Data Project:
The Analysis of Data (TAOD) project provides educational material in the area of data analysis.
- The project features comprehensive coverage of all relevant disciplines including probability, statistics, computing, and machine learning.
- The content is almost self-contained and includes mathematical prerequisites and basic computing concepts.
- The R programming language is used to demonstrate the contents. Full code is available, facilitating reproducibility of experiments and letting readers experiment with variations of the code.
- The presentation is mathematically rigorous, and includes derivations and proofs in most cases.
- HTML versions are freely available on the website theanalysisofdata.com. Hardcopies are available at affordable prices.
Volume 1: Probability:
This volume focuses on probability theory. There are many excellent textbooks on probability, and yet this book differs from others in several ways.
- Probability theory is a wide field. This book focuses on the parts of probability that are most relevant for statistics and machine learning.
- The book contains almost all of the mathematical prerequisites, including set theory, metric spaces, linear algebra, differentiation, integration, and measure theory.
- Almost all results in the book appear with a proof.
- Probability textbooks are typically either elementary or advanced. This book strikes a balance by attempting to avoid measure theory where possible, but resorting to measure theory and other advanced material in a few places where they are essential.
- The book uses R to illustrate concepts. Full code is available in the book, facilitating reproducibility of experiments and letting readers experiment with variations of the code.
I am not aware of a single textbook that covers the material from probability theory that is necessary and sufficient for an in-depth understanding of statistics and machine learning. This book represents my best effort in that direction.
Since this book is part of a series of books on data analysis, it does not include any statistics or machine learning. Such content is postponed to future volumes.
Website:
A companion website theanalysisofdata.com contains an HTML version of this book, errata, and additional multimedia material. The website will also link to additional TAOD volumes as they become available.
Mathematical Appendices:
A large part of the book consists of six appendices on mathematical prerequisites. Probability requires knowledge of many branches of mathematics, including calculus, linear algebra, set theory, metric spaces, measure theory, and Lebesgue integration. Instead of referring the reader to a large collection of math textbooks, we include all of the necessary prerequisites here. The notes sections at the end of each chapter provide references to additional resources.