TL;DR: If Machine Learning is like Mechanics, "Learning From Data" teaches you Newton's Laws!
---------------------------------
Machine Learning (ML), Data Mining (DM), Predictive Modeling, Big Data, Statistical Inference, Pattern Recognition, Regression, Classification: whatever name you call it by, you will find hundreds of books on the subject, in theoretical as well as applied avatars. The applied ones tend to be built around ML/DM programming tools such as R, Weka (Java), and SciPy/NumPy (Python), and are not really meant to teach you the underlying foundations. But I digress too soon.
I own the three standard introductory texts in ML: Pattern Classification (Duda, Hart, Stork), Pattern Recognition (Bishop), and Machine Learning (Mitchell). In addition, I have read portions of Statistical Learning (Hastie), Machine Learning (Alpaydin), Support Vector Machines (Cristianini), and several other allied texts in natural language processing, convex optimization, etc.
Although they are considered the classic introductory texts in ML, all these books failed to make me understand what I was doing when I practiced ML. Try as I might, I could never read more than a few tens of pages of any of them. And what little I did read, my feeble brain could not retain for long.
But where all these texts failed, "Learning From Data" (LFD) succeeds.
First, an analogy:
It is all very well to arm someone with the equations for cantilever beams and have them build houses, but clearly we don't want a civil engineer who doesn't understand Newton's laws building our own house. Most well-known ML books read to me like the course readers of advanced Mechanics courses stitched together. LFD, on the other hand, feels like a book on Newton's Laws and their applications.
Writing Style:
The book serves as the reading counterpart to a set of eighteen one-hour video lectures that the first author, Yaser Abu-Mostafa, presented in a course at Caltech. The book reads almost like a transcription of the lectures. The authors address you directly throughout and manage to convey the feeling that they are holding your hand and actively helping you learn how to walk. I found the style very engaging. Once I started reading the book, finishing it took no special effort (which was exactly my difficulty with the other classic ML books).
The videos are freely available online (Google for 'learning from data caltech course'). I strongly encourage readers to watch each video first and then read the corresponding chapter of the book.
Content:
There are five central themes underlying the organization and presentation of topics in this book:
1. What is Learning?
2. Is Learning possible?
3. How to Learn?
4. How to Learn well?
5. Take-Home Lessons.
The authors build gradually from simple to complex concepts throughout the book. For example, under the topic "Is Learning Feasible?", they first use a thought experiment (drawing red and green marbles from a bin) and Hoeffding's Inequality to bound the probability that a hypothesis's in-sample error deviates from its out-of-sample error by more than a given tolerance. They then reason that if the factor multiplying the exponential term in this bound grows only polynomially in the number of training examples, the exponential decay wins and the error can be bounded. Finally, they introduce the VC dimension and show that whenever it is finite, learning is indeed feasible.
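To make the bin thought experiment concrete, here is a minimal Python sketch (my own illustration, not code from the book). It repeatedly draws N marbles independently from a bin whose true fraction of red marbles is mu, measures the sample fraction nu, and compares how often |nu - mu| > eps against the Hoeffding bound 2 exp(-2 eps^2 N). It also evaluates the VC bound 4 m_H(2N) exp(-eps^2 N / 8), with the growth function m_H(N) replaced by its polynomial upper bound N^d_vc + 1, to show the exponential eventually overtaking the polynomial. The function names and the values mu = 0.6, eps = 0.1 and d_vc = 3 are hypothetical choices for illustration only.

```python
import math
import random


def hoeffding_bound(eps: float, n: int) -> float:
    """Hoeffding bound on P[|nu - mu| > eps] for n independent draws."""
    return 2.0 * math.exp(-2.0 * eps**2 * n)


def empirical_deviation_rate(mu: float, n: int, eps: float,
                             trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the sample fraction nu of red marbles
    differs from the bin's true red fraction mu by more than eps."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # Draw n marbles independently (with replacement); each is red with probability mu.
        reds = sum(1 for _ in range(n) if rng.random() < mu)
        nu = reds / n
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials


def vc_bound(eps: float, n: int, d_vc: int) -> float:
    """VC bound 4 * m_H(2n) * exp(-eps^2 * n / 8), using the polynomial
    upper bound m_H(N) <= N^d_vc + 1 in place of the growth function."""
    growth = (2 * n) ** d_vc + 1
    return 4.0 * growth * math.exp(-eps**2 * n / 8.0)


if __name__ == "__main__":
    mu, eps = 0.6, 0.1  # hypothetical bin and tolerance, chosen for illustration
    for n in (10, 50, 100, 200):
        rate = empirical_deviation_rate(mu, n, eps, trials=10_000)
        print(f"N={n:4d}  empirical={rate:.4f}  "
              f"Hoeffding bound={hoeffding_bound(eps, n):.4f}")
    for n in (1_000, 10_000, 100_000):
        print(f"N={n:6d}  VC bound (d_vc=3)={vc_bound(eps, n, 3):.3g}")
```

With these settings, the empirical deviation rate should stay below the Hoeffding bound at every N, while the VC bound is vacuous (greater than 1) at N = 1,000 and only collapses toward zero by N = 100,000, which hints at why the book stresses the polynomial-versus-exponential argument.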
Throughout the book, the authors provide plenty of real-life application scenarios and experimentally generated examples to illustrate the theory. I found seeing the theory put into practice (even in a toy example) very useful, particularly when it was visualized through the many graphs. Several gems are scattered around the book: subtle pitfalls that even a smart person can overlook (such as inadvertent data snooping), and rules of thumb for practical applications.
The authors have clearly had to make some choices about what to focus on and what to omit. For example, the book makes no mention of Bayesian Decision Theory or Naive Bayes classification. This seems shocking at first glance, since Naive Bayes is often the first learning algorithm taught in an introductory course on pattern recognition. But after going through Yaser's book and course, such omissions appear to be a virtue. It is not the goal of this book to teach you everything about ML. If that is what you are looking for, LFD is not for you; Kevin Murphy's forthcoming text appears promising. LFD, however, gives you enough of a foundation that, should you wish to educate yourself on advanced topics like bootstrap aggregation, probabilistic graphical models, or ensemble learning, you will be sufficiently prepared.
The icing on the cake is the forum provided by the authors to discuss the book (and the lectures). Yaser has personally answered all of my questions, sometimes at 3AM, Pasadena time!
Final note on book quality:
The color printing, binding, and paper quality are all excellent. The authors could have paid more attention to detail in some portions of the book (such as using high-contrast, colorblind-friendly colors in the illustrations), but honestly, this is just me being extremely an*l. At its low price of approximately $30, the hardbound book is pure value for money. Wide dissemination of the book's contents appears to be a clear motivation.
PS: If the authors are reading this, they should look up "Ishihara test plates" and compare them with the illustration of red and green marbles on page 22, among others.