Customer Reviews: Collective Intelligence in Action

Customer Reviews

4.4 out of 5 stars

on November 25, 2008
I was recently asked by the publisher to review Collective Intelligence in Action. The author is Satnam Alag, a Bay Area engineer with a Ph.D. from the University of California, Berkeley. Dr. Alag is VP of NextBio, a specialized search engine.

The first chapter is free and so is the source code used in the book.

The book is for Java developers who want to implement "Collective Intelligence" applications in Java. It tells us about extracting and applying data from blogs, wikis and social network applications. I am not one to praise, but this book succeeds brilliantly. If you are a Java engineer and work with Web technologies, you must get this book. It covers topics such as computing similarity measures using vector models, Naive Bayes classifiers, inverse document frequency (idf), Machine Learning (using the Weka API), building a crawler with regular expressions, collaborative filtering (with links to open source tools), and so on.

Even if you do not work with Java, if you care for high-end Web applications, this book is for you. It reminds me of Lyon's Java Digital Signal Processing book. It offers the gist of what academia knows, but focuses on what people (engineers and researchers) do in practise.

The book is not meant for academia, however. There are references, but no theorems.

Disclaimer. I did not get paid to review this book, and I do not stand to gain anything if you buy the book. I have no relationship with the publisher or the author.

Further reading. A competing book is Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran. It uses Python instead of Java.
22 people found this helpful.
on November 12, 2008
I was not surprised to see the "Hello, Sopan. We have recommendations for you" line at the top
when I logged into the site. Yes, this kind of functionality is very easy to
implement in your application after reading Satnam's Collective Intelligence in Action.

Have you ever wondered how Netflix is able to recommend movies, what the latest trends
are in making search more intelligent, or how you can intelligently gather new content and
present it in your application?

In this book, Satnam does an excellent job providing the answers to all these questions.
The book covers a wide breadth of topics with amazing focus and detail: architecture
for adding intelligence, tagging and tag clouds, content aggregation through focused web
crawling and from the blogosphere, leveraging machine learning techniques such as clustering
and predictive modeling, intelligent search, and building a recommendation engine.

I particularly liked the approach to explain the mathematical concepts with simple examples,
followed by implementing it in simple Java and then leveraging open-source software.

This book can be very useful if you are interested in integrating different open source software packages
to deliver an enterprise-class application.

I also liked the author's style of providing a summary at the end of each chapter.
He also provides a huge set of very useful resources for further reading on the topics
covered in the chapters.

You must pick up this book if you are

[1]. serious (as a developer, manager, or architect) about adding search or
intelligence to your application
[2]. involved in developing (as a programmer, tester, or manager) a social
networking application
[3]. involved in managing the "Knowledge Management Infrastructure" of an organization of any size

This book will provide you a great foundation for developing enterprise-class applications.

I highly recommend it.
20 people found this helpful.
on January 16, 2010
To really understand this book one would probably have to be a Java programmer, which I'm not, but I was able to follow the argumentation. I do have some background with data mining using SAS and SQL and the mathematics described are fairly easy to understand for someone with even a 1st year engineering or applied math background. I also have an interest in linguistics which kept me going.

The basic idea is that one can catalog documents by removing irrelevant words (adjectives, abstract pronouns, conjunctives), "stemming" the remaining words (i.e., reducing "sews", "sewing", "resew", "sewer" to a root "sew"), creating a vector containing each root word and its frequency, and then normalizing it. One simple result is the ability to produce "word clouds". Similarity between documents is measured by taking the dot product of the two vectors. Any document compared to itself would have a dot product of 1. Two documents with no common stem words would have a dot product of zero. Similar docs would have a high value close to 1, say .8. Dissimilar docs would have a low coefficient, say .15. Even mistaking "sewer" (a conduit for waste) for "sewer" (one who uses a needle and thread) is taken into account, because both docs would only be similar on a couple of keywords, and dissimilar on most others.
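
The description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's Java code: the stop-word list, the suffix rules and the function names are all hypothetical, and the suffix stripping is a crude stand-in for a real stemmer. Punctuation handling is left out.

```python
import math
from collections import Counter

# Hypothetical stop-word list and suffix rules, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}
SUFFIXES = ("ing", "er", "s")


def crude_stem(word):
    """Strip one known suffix; a stand-in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_vector(text):
    """Return a unit-length {stem: weight} vector for a document."""
    stems = [crude_stem(w) for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(stems)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def similarity(a, b):
    """Dot product of two normalized vectors, i.e. their cosine similarity."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())
```

Because each vector is scaled to length 1, a document compared with itself scores 1 (up to rounding) and two documents with no stems in common score 0, which matches the behaviour described in the review.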

What's really neat is how this information gets collected and can be applied. Social networking sites, including the one you are reading right now, collect data on us through our choices. Browse for a book while logged on, and that's something you are interested in. Approve a review, and the words in the review, the summary of the book and the title count towards your interests. Disapprove, and that counts against your interests. Write a review, and the words you write become part of your cumulative profile as well, reduced to a vector or vectors of keywords and frequencies.

Here's how it gets applied: One of Amazon's marketing tools is its "recommendation engine". (The book talks about the Netflix recommendation engine and business model.) By matching your vector against those of other people who have bought or viewed what you have bought, a prediction can be made as to the likelihood of you being interested in something that they have bought, or not interested in items that they rejected or disliked. The more Amazon caters to what you are interested in, and doesn't bother you with irrelevancies, the happier you may be.
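
A minimal sketch of that matching step follows, assuming each person is represented by a hypothetical dictionary of item ratings. It illustrates the general user-based collaborative filtering idea only; it is not Amazon's or Netflix's actual method, and the function names are invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {item: rating} vectors."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(target, ratings, top_n=3):
    """Rank items the target user has not rated by similarity-weighted rating."""
    mine = ratings[target]
    totals, weights = {}, {}
    for user, theirs in ratings.items():
        if user == target:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with nothing in common with the target
        for item, rating in theirs.items():
            if item not in mine:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    # Weighted average: similar users' opinions count for more.
    scores = {item: totals[item] / weights[item] for item in totals}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Items that like-minded users rated highly rise to the top of the list, while items they disliked sink, which is the "likelihood of you being interested" prediction in miniature.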

Other applications discussed include the automatic creation of folksonomies (taxonomies based on popular usage) using cluster analysis, and categorization using Bayes' theorem.
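
To make the Bayes-based categorization concrete, here is a small naive Bayes text classifier with add-one smoothing. It is a sketch under assumptions: the class and method names are hypothetical, tokenization is plain whitespace splitting, and it is not the Weka-based code the book uses.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of log likelihoods for each known word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word from zeroing out a category.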

In addition to recommendation engines, Alag points out the usefulness of these techniques to search and names several search engines that apply this approach (as does Google). Other uses include tools that seek out and deliver news based on your preferences or suggest "friends" (e.g., Facebook or eHarmony might use these ideas); searches for similar material to identify copyright infringement; email filters that keep out spam for Rolex watches or Viagra (unless you are interested in Rolex watches or Viagra); virus detection engines based on code phrases; and early detection of epidemics or adverse reactions to medication through similarities in medical reports. Alag himself appears to be working at a biotech firm, NextBio, that matches public medical and genome-related data to data held by private companies.

Some of the basic tools discussed are Lucene, a free version of what Google will sell you for a search engine, and Nutch, a free web crawler, both of which require coding; and WEKA, a free open source data mining package that looks usable by the rest of us.

Loved the book and the author's organization of the material. Some of the social implications are scary, especially for privacy concerns, but so is the implication of not leveraging the information held within one's organization to provide the best possible service. For example the World Bank has the capability (not necessarily using these methods) to match similar projects around the world so that experience gained in one area can be found and applied elsewhere. This is a key fast moving tech that one needs to understand in order to see where we are going as a society. C.I. in Action is merely the opening salvo - the methods and techniques described are the basics but there is much room for refinement and elaboration and this topic could be the start of a whole new field. The book also recommends and has sparked my interest in the site [...] which is probably more accessible to someone without a math or tech background.

Finally a note to SF fans, esp. of Spider Robinson's Callahan's Crosstime Saloon series, this may be the point at which the Web starts to appear to be intelligent. :-)
11 people found this helpful.
on July 19, 2010
This book contains a lot of ideas and as such is a good starting point for further reading. But it's not a one-stop resource for actually implementing the algorithms it mentions, as a lot of them are described only in a very high level and incomplete way. For example, in the discussion of model-based recommendation engines in sections 12.3.3-12.3.5, the author gives a very short description of latent semantic indexing (LSI) and some Java code that shows how to use the Weka implementation. But firstly, the description is too short to give the user a real understanding of what is going on theoretically. And secondly, the implementation description doesn't go nearly far enough: it shows that reconstructing the original matrix from the top N dimensions of the singular value decomposition gives a close approximation to the original, but then it just stops there; it doesn't explain how to actually use the decomposition in a recommendation engine. And the section on LSI is verbose compared to the "section" on Bayesian belief networks, which at a single paragraph of text is completely inadequate for either practical or theoretical purposes. And so on throughout the book.
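
For readers unfamiliar with the step mentioned above, the truncated reconstruction can be stated compactly. This is standard linear algebra rather than the book's own notation; $A$ is a hypothetical user-item (or term-document) matrix of rank $r$, with singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$ and corresponding singular vectors $u_i$, $v_i$.

```latex
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad
A_N = \sum_{i=1}^{N} \sigma_i \, u_i v_i^{\top} \quad (N \le r),
\qquad
\lVert A - A_N \rVert_F^{2} = \sum_{i=N+1}^{r} \sigma_i^{2}.
```

The last identity shows why keeping the top $N$ dimensions gives a close approximation: the squared error is the sum of the squares of the discarded singular values, so it is small whenever those values are small.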
12 people found this helpful.
on November 11, 2009
This book is more deserving of the "Collective Intelligence" title than O'Reilly's "Programming Collective Intelligence" as it's not just about algorithms, but discusses blogs, wikis etc, and shows how to do basic implementations of features such as tag clouds or finding related content in that context. Instead of explaining specific algorithms in detail, existing Java libraries are used, e.g. WEKA for data mining and Lucene for search.

There are lots of diagrams, and (somewhat verbose) Java code. The examples in this book are good starting points for further exploration; this book is more about showing what can be done and getting you started in the right direction than providing you with an understanding of the algorithms (as does the O'Reilly book) and libraries that are used.
8 people found this helpful.
on November 8, 2008
This is a great book! It is a hands-on guidebook for implementing collective intelligence concepts. It is the first Java-based book to emphasize the underlying algorithms and technical implementation of scalable data mining techniques such as trend analysis, relationship discovery, and predictive modeling. It provides a pragmatic approach to personalization by combining content-based analysis with collaborative approaches.

The concepts and code examples in the book have been used in practice in a life science search engine named NextBio, which has proved to be a great success. I strongly recommend this book.
8 people found this helpful.
on January 22, 2009
After going through this book I can say that it has the best coverage of the details of building web sites for social networking, reviews, search engines, etc. The book not only gives good theoretical information but also goes beyond that and gives a whole bunch of relevant examples that the reader can build upon.
It gives a good introduction to libraries like Lucene, along with how to use them.
For any reader looking to build collaboration-type web sites, this is the book to go after.
3 people found this helpful.
on February 19, 2009
I don't review many books, but this one clearly stood out as an example of an author successfully providing a range of related topics, with good Java code provided to get the points across.

This book covers everything from statistics and multiple regression, to similarity matching, to decision trees, to text parsing and stemming.

The first chapter covered a set of definitions and examples that brought the older concepts of multiple regression and decision trees into modern concepts like Web 2.0. Much of the book was like getting an updated version of a "Data and Decisions" first-year graduate course, yet the examples were interesting enough to justify the math knowledge required.

I had seen pieces of this material in scattered places within the web technology world and the applied statistics world, but never brought together as well.

However, you aren't fed this in trivial concepts. This book is one where you need to read the material, try out an example, read the chapter again, improve the example, and read again. It is slow going, but worth it. The on-line download of code is essential.

My concerns about the book were that it doesn't refer to other books enough. For instance, if you want to understand Bayes nets a good place to go might be the Russell/Norvig book. However the only place that appears is in the Resources for chapter 10, which covers 2-4 different algorithms. Which algorithm matches which chapter in Russell/Norvig? You can't drill further into the material.

Instead, I approached the book from the other direction: after having started with Russell/Norvig, an applied statistics book (of which basically none are referred to), some materials on Bayesian updates, and some Data/Decisions coursework, I used Alag's book to help me understand where some of the advanced algorithms are used, and examples to help understand them. I would work to replicate the results of the D/D cases using Weka and algorithms in the book before moving on to the next chapter.

On a different front, there were no examples at the source level of using the concepts in a web application. While (as one reviewer pointed out) this allows the book to skip discussing HTTP, there could have been a simple source-level example of a Tomcat-based application that does similarity matching. The downloadable Lucene examples, for instance, include at least a few that are servlets/JSPs. This could greatly help newcomers to the concepts of the book.

Overall, this book was worth all of the time that I spent learning from it.
2 people found this helpful.
on February 2, 2009
Collective intelligence is very popular these days. Thanks to the Internet companies we can use this concept every day. I have to admit that I am a big fan of "intelligent" services. I search news on Digg, I listen to music through the player, I use Wikipedia, YouTube, Amazon etc. every day. I have often wondered how these sites work. That curiosity led me to read "Collective Intelligence in Action". After I read it, I was surprised that this book is so practical. Theory is kept to a minimum. After reading it, you should be able to add CI features to existing sites.

The author, Satnam Alag, has organized his work in a perfect way. Every chapter has an introduction, a summary and very handy references - I used them many times. All the mathematical concepts and definitions are shown in examples. There are a lot of Java code listings, so a basic knowledge of this language might be useful. The theoretical foundations are not necessary. Each chapter can be treated separately, but together they form a coherent work about recommendation systems. The author takes care not only of the correctness of the code but also of its efficiency and scalability.

The fact that Satnam presents a lot of stable and useful open source software is worth noting. Projects like Nutch, Lucene or Weka can be easily adapted to our services. The book shows how to do it from a programmer's point of view (API).

I recommend "Collective Intelligence in Action" to Java developers who would like to know how to build recommendation systems, intelligent search of their resources, automatic tagging or network crawling. The book is worth reading even if you do not plan to use CI in your application. Basic Web 2.0 mechanisms are very easy to implement and do not require a lot of theoretical knowledge; Satnam Alag has proven this in his book.

I would not recommend this book to data mining or text analysis experts. This is not an academic work; people who are looking for theoretical information about CI could be disappointed.

Dariusz Walczak
2 people found this helpful.
on January 17, 2009
"Collective Intelligence in Action" is a fantastic book for understanding the complex area of data mining and unstructured data. I have used many of the technologies illustrated in the book in my real-life projects and found it greatly helpful in improving my understanding of the subject.
3 people found this helpful.