104 of 106 people found the following review helpful
Context for this review: I am a data miner with 20 years' experience, and I own the first edition of this book.
- Accessible writing style
- Broad coverage of algorithms and data mining issues, with an eye toward practical issues
- Needless technical trivia (derivations and the like) are avoided
- Algorithms are completely spelled out: a competent programmer should be able to turn these descriptions into functioning code.
- Third edition makes meaningful improvements on previous editions
- Approximately one-third of this book is now devoted to the WEKA data mining software. I have nothing against WEKA, and it is a good choice for a text such as this, since WEKA is free. In my opinion, though, this coverage consumes too many pages of this book.
- Data mining draws from a number of fields with separate roots (statistics, machine learning, pattern recognition, engineering, etc.), and many techniques go by multiple names. As with many other data mining books, this one does not always point out the aliases by which data mining methods are known.
The bottom line: This is still the best data mining text on the market.
27 of 27 people found the following review helpful
There are a couple of classics of machine learning, each with various strengths and weaknesses: "The Elements of Statistical Learning" by Hastie and company, and Bishop's book, "Pattern Recognition and Machine Learning." And now there is this book, "Data Mining." I'd say this is the most practical of the three. The other two are oriented towards theoretical underpinnings and cataloging the rich zoology of machine learning techniques. This one tells you how to get stuff done. It has lots of practical ideas on discretization, denoising, data preparation and performance characterization. It even has practical advice on things you really need an expert opinion on: for example, when using data folding techniques for cross-validation, what is a good number of folds to use? This book will tell you. It's like having a couple of seasoned experts looking over your shoulder when you're trying to get things done. It had a detailed recipe for something I really needed to solve... and their recipe worked!
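The question of how many folds to use can be made concrete with a small sketch of k-fold cross-validation. It assumes k = 10 as a widely used default, plain (unstratified) folds, and a hypothetical majority-class learner `fit_majority` standing in for a real model; all function names are illustrative and are not taken from the book or from Weka.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and split it into k folds; return (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def cross_val_error(xs: Sequence, ys: Sequence, fit: Callable, k: int = 10, seed: int = 0) -> float:
    """Train on k-1 folds, count errors on the held-out fold, and average over all n items."""
    errors = 0
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(model(xs[i]) != ys[i] for i in test)
    return errors / len(xs)


def fit_majority(xs: Sequence, ys: Sequence) -> Callable:
    """Hypothetical baseline learner: always predict the most common training label."""
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label


xs = list(range(20))
ys = ["a"] * 15 + ["b"] * 5
print(cross_val_error(xs, ys, fit_majority))  # 0.25
```

With 15 "a" labels and 5 "b" labels, every training split still has "a" as its majority, so the baseline misclassifies exactly the five "b" items: an error of 5/20 = 0.25. A real study would swap in an actual learner and, ideally, stratified folds.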
While the subject matter is similar to the Bishop and Hastie books, what this most reminded me of was the classic scientific computing text "Numerical Recipes." It's all very well having a good theoretical understanding of the techniques you're using, but it's vastly more important to have advice on using them properly. This is that book; uniquely so, thus far, in my experience.
It's also a brilliant manual for their Weka machine learning environment, which is incredibly useful. I don't use the Weka UI, but I have called upon Weka as a library extension to the R programming environment. Mostly because of this book: it's both a recipe book and a map to a large collection of recipes you can use to solve your machine learning problems.
There isn't so much on time series applications, sadly, which is something I end up working with a lot. I'd love to see an extended chapter on the particular difficulties in using machine learning techniques to mine and forecast time series.
30 of 31 people found the following review helpful
The subtitle of the book should really be emphasized more: Practical Machine Learning Tools and Techniques. This isn't a book about ad hoc SQL queries and database statistics; it is about tools to discover relationships you didn't know you were looking for. Much of the book shows how to handle knowledge formation and representation, statistical modeling and projections. My one critique in this regard is that many of the algorithm breakdowns are given in prose rather than true pseudocode.
I would like to echo other reviews that point out that the text focuses on WEKA, and the authors indicate this is by intent. Though they do give much generic information, at some point you have to pick a horse to hitch your carriage to, and an established open-source project in Java is probably the most widely accessible choice. Their coverage of WEKA claims 50% more features than the 2nd edition, and indeed it consumes half the book. I feel this is a good thing, as it lends great practicality to the book, allowing you to dig right in and actually get something done.
There are some additions to the 3rd edition that modernize the book a bit. Showing how data can be re-identified (and the ethical implications) is pertinent to today's HIPAA-regulated medical environments. They also touch on web and ubiquitous mining, reflecting our growing foray into non-traditional cloud sources of information.
31 of 33 people found the following review helpful
Fantastic book if you need to use WEKA; probably the best recommendation available.
If, however, you're not going to be using WEKA, then the book is still valuable, but I challenge its true 'practicality'. The content is thorough, but perhaps more academically oriented and less industry-focused than I would have liked. The authors keep it very accessible, particularly as far as mathematics and statistics go. While this might make the book a little more long-winded, in my view it makes it far easier to get into the groove and allows you to read it like a book.
* Highly recommended for WEKA users
* For other users, I suggest you look through it to see if it will really be helpful before plunking down the cash
6 of 6 people found the following review helpful
If you are looking for a simple how-to book that gives you a lot of sample source code, this is not for you. If you want to learn the concepts and theoretical underpinnings of various algorithms and techniques, this is a great place to start. The authors clearly stress the concepts of data mining that can be applied to a variety of specific applications. This is a must-have volume for anyone wanting to truly understand the theories and concepts behind the various approaches to data mining and the tradeoffs involved with each approach. Those with a background in artificial intelligence will have an easier time getting through this material, but such a background is not necessary to gain a solid foundation in the topics. It is well written and organized for self-study, but it may be a little intimidating for some beginners.
5 of 5 people found the following review helpful
on December 11, 2011
To get the most out of this book, you need to be either a statistician or AI professional, or willing to invest some time. But if you commit yourself, this book goes a long way toward substituting for a graduate-level course on data mining. Don't get me wrong: it is not written with an academic audience in mind; as a matter of fact, it is unusually rich with application examples. But there is a lot to digest conceptually, and many of the examples are quite involved. As such, it sits at the opposite end from the O'Reilly series of how-to books. This one gets you up to speed with one of the best, if not the best, software packages for data mining in all its many facets. With Weka and R, you have the tools to tackle many of the world's problems, and this book is the best introduction to one half of the duo.
7 of 8 people found the following review helpful
This is an excellent text. Early on, the authors define data mining "as the extraction of implicit, previously unknown and potentially useful information from data". The techniques, methodologies and algorithms for achieving this goal are the crux of this 600-page text.
It is grindingly thorough and only the dedicated will make it through the book without assistance. For those who do, they will have an excellent grounding in the theory and basic techniques of machine learning. The last third of the book introduces WEKA. WEKA is an open-source workbench that permits the data mining student to try out all the algorithms presented in the book. It's free and extensible.
The reader needs a pretty good grasp of database technology and more than a little knowledge of statistics and math. The latter is not an absolute necessity, but will make comprehending this material a faster and easier process.
The authors, collectively, have a clear writing style, free of academic cant. If you take it slow, everything ultimately becomes clear.
Overall, this is an excellent introduction to data mining, but it is not for those who expect to learn even the fundamentals of this technology in a few days' time. That just ain't gonna happen.
4 of 4 people found the following review helpful
on December 14, 2011
The book is a really good place to start learning machine learning and data mining.
- It doesn't jump into algorithms with mathematical details. It starts with what it is all about and what input and output look like in typical machine learning problems.
- One point that I really liked is that the book presents algorithms in two chapters (chapters 4 and 6). The first covers the basics and the latter gives more detail about these algorithms.
- It also covers well something that I think is mostly ignored by other books/tutorials: practical issues. How to normalize data, what happens when your data have both categorical and numerical features, discretizing numerical features, and so on.
- If you are considering using Weka, you should have this book. The authors are from the team that built Weka. For each algorithm described in the book, the names of the corresponding Weka implementations are given too. With the book, it is easier to understand the parameters of Weka's implementations of the algorithms. Also, the last part of the book is like an extensive Weka tutorial.
- In a few places, the book contains unnecessary detail, although this is not true of the book overall. One example I remember is section 4.7. The book spends five whole pages on how to find nearest neighbors efficiently (not-easy stuff), which I think is really an implementation detail. Instead, it could have explained what nearest neighbor is, or covered something else.
- The part about Weka has several figures, mostly Weka screenshots. It was difficult to follow these figures because the screenshots are black-and-white. I think these figures should be in color in the next edition, which would make them much easier to follow.
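To illustrate the practical issues listed above (normalizing numeric data, mixing categorical and numerical features) together with the basic nearest-neighbor idea, here is a minimal 1-NN sketch. It assumes numeric attributes are min-max scaled using the training data's range and that a nominal attribute contributes 0 when values match and 1 otherwise. It uses a plain linear scan rather than any efficient search technique, and the function names and toy rows are hypothetical; this is not the book's code or Weka's implementation.

```python
from typing import List, Optional, Sequence, Tuple, Union

Value = Union[float, str]
Range = Optional[Tuple[float, float]]


def column_ranges(rows: Sequence[Sequence[Value]]) -> List[Range]:
    """(min, max) for each numeric column; None marks a nominal (categorical) column."""
    ranges: List[Range] = []
    for col in zip(*rows):
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append((min(col), max(col)))
        else:
            ranges.append(None)
    return ranges


def distance(a: Sequence[Value], b: Sequence[Value], ranges: Sequence[Range]) -> float:
    """Euclidean distance over min-max normalized numeric values and 0/1 nominal mismatches."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            d = 0.0 if x == y else 1.0  # nominal: 0 if values match, 1 otherwise
        else:
            lo, hi = r
            span = hi - lo
            # (x - lo)/span - (y - lo)/span simplifies to (x - y)/span
            d = 0.0 if span == 0 else (x - y) / span
        total += d * d
    return total ** 0.5


def nearest_neighbor(rows: Sequence[Sequence[Value]], labels: Sequence[str],
                     query: Sequence[Value]) -> str:
    """1-NN by linear scan: return the label of the closest training row."""
    ranges = column_ranges(rows)
    best = min(range(len(rows)), key=lambda i: distance(rows[i], query, ranges))
    return labels[best]


rows = [(20.0, "sunny"), (30.0, "rainy"), (25.0, "sunny")]
labels = ["no", "yes", "no"]
print(nearest_neighbor(rows, labels, (29.0, "rainy")))  # yes
```

For the query (29.0, "rainy"), the numeric range is 20 to 30, so the second row is at normalized distance 0.1 while the others also pay a full 1.0 for the nominal mismatch; the prediction is therefore "yes". Without the scaling step, a numeric attribute with a large range would swamp the 0/1 nominal contributions.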
3 of 3 people found the following review helpful
I grabbed this book because I am in epidemiology and biostatistics, and am always looking for ways to apply what's been learned in other fields to how we work with data here. There are a lot of amazing advances in data mining, and in machine learning especially, so this looked like an interesting book.
And truthfully, I did find about 10-15% of this book incredibly useful and interesting, and I'll be keeping it on my bookshelf for that alone. But large chunks of the book were focused solely on using Weka (which we don't use here) or in describing data modeling techniques (which I was already well versed in).
The parts of this book that I did find useful were those that focused on how to take those data modeling techniques and actually use them in a machine learning context. The way the book is set up makes it hard to just swoop through and pick those parts out, though -- which meant that I spent more time than I'd like weeding through the information I already knew to find the new and useful bits. That said, I'm not really the target audience for this book, which means that I keep coming back to it hoping that it will be something it isn't meant to be. And for what it is, I think it does a good job, although I think its utility would have been broadened somewhat if it weren't so heavily focused on Weka users.
3 of 3 people found the following review helpful
This is my first read through a technical text on data mining. Despite only having experience with more general programming, math and statistical analysis, I found the material approachable. While the text caters well to the novice, there is substantial detail and enough content for the practitioner, which is what I'm aiming for.
The content is organized into 3 sections: basic concepts, advanced concepts, and WEKA. The organization was a major strength of the text and I was glad to have more advanced material put off for the 2nd section while I was still familiarizing myself with basic terms and methodology. The final section covers WEKA - which is a free, publicly licensed software tool capable of grinding out what you are learning in the book.
Overall, Data Mining exceeded my expectations. It covered all the topics I hoped it would, like predictive modeling, regression, recursion, noise, logic trees, efficiency, etc. While I am only a novice, I can easily recommend this book to anyone looking for a catch-all on this subject, regardless of knowledge.