Algorithms of the Intelligent Web 1st Edition
Top Customer Reviews
I read the book front to back (twice!) before writing this review. I started reading the electronic version a couple of months ago and read the print edition again over the weekend. This is the best practical book on machine learning that you can buy today -- period. All the examples are written in Java and all algorithms are explained in plain English. The writing style is superb! The book was written by one author (Marmanis) while the other (Babenko) contributed the source code, so there are no gaps in the narrative; it is engaging, pleasant, and fluent. The author leads the reader from the very introductory concepts to some fairly advanced topics. Some of the topics are covered in the book and some are left as exercises at the end of each chapter (there is a "To Do" section, which was a wonderful idea!). I did not like some of the figures (they were probably made by the authors, not an artist), but this was only a minor aesthetic inconvenience.
The book covers four cornerstones of machine learning and intelligence, i.e. intelligent search, recommendations, clustering, and classification. It also covers a subject that today you can find only in the academic literature, i.e. combination techniques. Combination techniques are very powerful, and although the author presents them in the context of classifiers, it is clear that the same can be done for recommendations -- as the BellKor team did for the Netflix Prize.
I work in a financial company and a number of people that I work with have PhD degrees in mathematics and computer science. I found the book so fascinating that I asked them to have a look. They had nothing but praise for this book. The consensus is that everything is explained in the simplest possible way, with clarity but without sacrificing accuracy. As one of them told me, this is a major step forward in teaching AI techniques and introducing the field to millions of developers around the world. Even for experts in the field and experienced software engineers, there are important insights in almost every chapter.
We had tried to write a software library, for a small project, that analyzes log files and assesses IT risk (e.g. probability of intrusion, preemptive alerts on application performance issues, and so on) based on Segaran's book "Programming Collective Intelligence". We spent about six weeks trying to match what was in Segaran's book to what we wanted to do, but we did not find the depth and clarity that was required. On top of that, Segaran used Python, so the code had to be rewritten and things didn't quite work as expected! We are now using the code from Marmanis' book and our code analyzes Apache and WebLogic log files in order to assess risk! It just works! We wrote the code in one week! We would not have been able to succeed without reading this book.
Clearly, I am deeply impressed. This is an outstanding book; it was not just useful, it was inspiring! It is a "must have" book for every Java developer.
The content of the book includes:
* the PageRank algorithm; a content-based algorithm similar to PageRank, for which the author coined the term "DocRank" because it applies to Word, PDF, and other documents rather than Web pages; search improvements based on probabilistic methods (Naive Bayes); precision, recall, F1-score, and ROC curves;
* collaborative filtering as well as content-based recommendations;
* k-means, ROCK, DBSCAN for clustering; the best explanation about the "curse of dimensionality" ever! I finally learned what this mystic term means!
* Bayesian classification; declarative programming (through the Drools rules engine); introduction to neural networks; decision trees
* Comparing and combining classifiers: McNemar's test; Cochran's Q test; F-test; bagging; boosting; general classifier ensembles
Buy it, read it, enjoy it, and use it!
First of all, the code uses BeanShell as a way to run the examples. BeanShell is a neat idea. It's one of a number of languages that move Java closer to being a scripting language. But it's not necessary for the book's purposes. It's a bit of a pain to install, and it takes a while to get used to. In the end it's an unnecessary distraction. It's far simpler to run the examples in Eclipse with the "scripts" entered as the body of a main() method.
The preceding is a relatively minor point, but in some ways it illustrates some of the problems I had with the book. It focuses too much on the code. Yes, it's nice to have code that does what one is trying to describe, but code is not a substitute for a good explanation. In many places the book provides inadequate descriptions of the concepts, presumably on the grounds that one can just read the code. But code is not a tutorial. Code itself must be commented to be understandable. And code cannot replace a good intuitive description of the important ideas.
Furthermore, the code (and the output) takes up too much space in the book. There are pages of output when a few lines would suffice, and there are pages of code when a well-constructed paragraph would do. Pearson's coefficient is a good example. There is approximately a page of code to do the calculation. There is also half a page of code-level comments--e.g., "The method getAverage is self-explanatory; it calculates the average of the vector that's provided as an argument." But there is no straightforward description of what's going on in the computation as a whole.
Further, there are frequent references to other books and papers, as if making such references excuses the author from explaining an idea. For example, the Pearson's coefficient discussion includes this sentence: "There's a smarter way to do this that avoids a plague of numerical calculations called the roundoff error; read the article on the corrected two-pass algorithm by Chan, Golub, and LeVeque." That's all that's said about roundoff error, the smarter way to do it, or other numerical computation issues. Referring the reader to an article is not good enough. If it's worth discussing, discuss it. If it's not important enough to discuss, then don't refer the reader elsewhere except as enrichment. The author does this over and over.
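For readers unfamiliar with the computation under discussion, here is a minimal sketch of Pearson's coefficient in Java. It is not the book's code: the class and method names (PearsonSketch, pearson) are hypothetical, and it uses a plain two-pass scheme (means first, then sums of deviations) rather than the corrected two-pass algorithm the quoted sentence refers to. Returning NaN for a constant input is likewise an assumption made for this sketch.

```java
public final class PearsonSketch {
    private PearsonSketch() {}

    /**
     * Pearson correlation of two equal-length vectors.
     * Hypothetical helper for illustration; not taken from the book.
     */
    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two vectors of equal length >= 2");
        }
        int n = x.length;

        // Pass 1: compute the mean of each vector.
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Pass 2: accumulate products of deviations from the means.
        // Working with deviations avoids subtracting two large,
        // nearly equal sums, which is where roundoff tends to bite.
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }

        // Assumption: the coefficient is undefined when either vector
        // is constant, so this sketch returns NaN in that case.
        if (sxx == 0.0 || syy == 0.0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

In words: the coefficient is the sum of the products of the two vectors' deviations from their means, divided by the square root of the product of each vector's summed squared deviations. It ranges from -1 (perfectly opposed) to 1 (perfectly aligned).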
Another example of what I would consider the book's conceptual superficiality is its treatment of Bayes' Theorem. Bayes' Theorem is mentioned many times, but the only explanation is a half page translation of Bayes' Theorem into words--with no explanation of why Bayes' Theorem is true. Understanding why Bayes' Theorem is true should have been an important lesson.
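For reference, the standard argument for why the theorem holds is short. The event names A and B are notation chosen here, not the book's, and the derivation assumes P(A) > 0 and P(B) > 0. It follows directly from the definition of conditional probability applied in both directions:

```latex
\begin{align*}
P(A \cap B) &= P(A \mid B)\,P(B) && \text{(definition of } P(A \mid B)\text{)} \\
P(A \cap B) &= P(B \mid A)\,P(A) && \text{(definition of } P(B \mid A)\text{)} \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align*}
```

Both right-hand sides describe the same joint probability, so setting them equal and dividing by P(B) gives the theorem.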
A similar criticism holds for decision trees, except even more so. There is no discussion at all about how to construct a decision tree. Such a discussion would have been a perfect place to introduce the notion of entropy. But the word "entropy" doesn't even appear in the book.
All in all, I found the book disappointing. If one wants to build software that performs some of the functions discussed, the book can help. But if one wants to understand the principles underlying such software, the book is not the right place to go.
Most Recent Customer Reviews
A superb book by its own - which comes from an author who is an authority on the subject...