- Hardcover: 326 pages
- Publisher: Cambridge University Press; 1 edition (December 30, 2011)
- Language: English
- ISBN-10: 1107015359
- ISBN-13: 978-1107015357
- Product Dimensions: 7.4 x 0.9 x 9.7 inches
- Shipping Weight: 1.8 pounds
- Average Customer Review: 8 customer reviews
- Amazon Best Sellers Rank: #1,164,504 in Books
Mining of Massive Datasets 1st Edition
Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike. It teaches algorithms that have been used in practice to solve key problems in data mining and includes exercises suitable for students from the advanced undergraduate level and beyond.
About the Author
Anand Rajaraman is CEO of Kosmix Inc., a website which organizes the Internet by topic. He is also a consulting assistant professor in the Computer Science Department at Stanford University. In 1996, together with four other engineers, Rajaraman founded Junglee Corp., which pioneered Internet comparison shopping. It was acquired by Amazon.com Inc. in August 1998 for 1.6 million shares of stock valued at $250 million. Rajaraman went on to become Director of Technology at Amazon.com, where he was responsible for technology strategy. He helped launch the transformation of Amazon.com from a retailer into a retail platform, enabling third-party retailers to sell on Amazon.com's website. Third-party transactions now account for almost 25% of all US transactions, and represent Amazon's fastest-growing and most profitable business segment. Rajaraman was also an inventor of the concept underlying Amazon.com's Mechanical Turk. Rajaraman and his business partner, Venky Harinarayan, co-founded Cambrian Ventures, an early-stage VC fund, in 2000. Cambrian went on to back several companies later acquired by Google and has funded companies like Mobissimo, Aster Data Systems and TheFind.com.
Jeffrey David Ullman is the Stanford W. Ascherman Professor of Computer Science (Emeritus) at Stanford University. He is also the CEO of Gradiance. Ullman's research interests include database theory, data integration, data mining and education using the information infrastructure. He is one of the founders of the field of database theory and was the doctoral advisor of an entire generation of students who later became leading database theorists in their own right. He was also the Ph.D. advisor of Sergey Brin, one of the co-founders of Google, and served on Google's technical advisory board. In 1995 he was inducted as a Fellow of the Association for Computing Machinery and in 2000 he was awarded the Knuth Prize. Ullman is also the co-recipient (with John Hopcroft) of the 2010 IEEE John von Neumann Medal, for 'laying the foundations for the fields of automata and language theory and many seminal contributions to theoretical computer science'.
Top customer reviews
I have a graduate degree in data mining and information retrieval; my primary use for this book has been to help explain the "why" and "how" behind some of our implementation choices (with related expenses) to management. The authors do a good job of not only presenting technically accurate material, but also sufficiently motivating the concepts for the arithmetically challenged among us. The text flows well and is easy to digest; the authors didn't just regurgitate gobs of indecipherable summations and product formulas.
While I can sympathize with some of the reviewers who wanted a book that connects the theoretical with the practical, this isn't it, and I can't help but believe that was never its intention. If you want a book to teach you how to do your job or implement a concept in code, buy a technology-specific book. If you want to understand the basic math and motivation for choosing one approach over another, this book provides a sound foundation.
One thing I would have liked to see more of, perhaps in the appendix materials, is non-trivial step-by-step algorithm explanations. The book is bound by size limitations (and I do appreciate not having an 800-page book), but some of the examples are too simple to show a clear step-by-step execution of the more complex algorithms, where edge cases may get lost (e.g., Locality Sensitive Functions, Clustering, etc.).
It was also disappointing to see that some of the chapters are not comprehensive (e.g., the chapter on "advertising on the web" covers only some search advertising and completely ignores models for display advertising), and some of the more modern developments in recommendation techniques are left out of the Recommendation Systems chapter. For recommender systems, I was hoping for coverage of scalable ML techniques tailored to high-volume, low-latency, and online learning requirements. The Clustering chapter does not have enough depth as far as scaling up to massive datasets is concerned.
There are also some typos and printing errors in the printed hardbound version that seem to have been corrected in the free online version of the book.