Data Preparation for Data Mining (The Morgan Kaufmann Series in Data Management Systems) 1st Edition
Data Preparation for Data Mining addresses an issue unfortunately ignored by most authorities on data mining: data preparation. Thanks largely to its perceived difficulty, data preparation has traditionally taken a backseat to the more alluring question of how best to extract meaningful knowledge. But without adequate preparation of your data, the return on the resources invested in mining is certain to be disappointing.
Dorian Pyle corrects this imbalance. A twenty-five-year veteran of what has become the data mining industry, Pyle shares his own successful data preparation methodology, offering both a conceptual overview for managers and complete technical details for IT professionals. Apply his techniques and watch your mining efforts pay off in the form of improved performance, reduced distortion, and more valuable results.
On the enclosed CD-ROM, you'll find a suite of programs as C source code and compiled into a command-line-driven toolkit. This code illustrates how the author's techniques can be applied to arrive at an automated preparation solution that works for you. Also included are demonstration versions of three commercial products that help with data preparation, along with sample data with which you can practice and experiment.
- ISBN-10: 1558605290
- ISBN-13: 978-1558605299
- Edition: 1st
- Publisher: Morgan Kaufmann
- Publication date: April 5, 1999
- Language: English
- Dimensions: 7.25 x 1.25 x 9.25 inches
- Print length: 560 pages
Editorial Reviews
From the Back Cover
Features
- Offers in-depth coverage of an essential but largely ignored subject.
- Goes far beyond theory, leading you, step by step, through the author's own data preparation techniques.
- Provides practical illustrations of the author's methodology using realistic sample data sets.
- Includes algorithms you can apply directly to your own project, along with instructions for understanding when automation is possible and when greater intervention is required.
- Explains how to identify and correct data problems that may be present in your application.
- Prepares miners, helping them head into preparation with a better understanding of data sets and their limitations.
About the Author
Dorian Pyle is Chief Scientist and Founder of PTI (www.pti.com), which develops and markets Powerhouse™ predictive and explanatory analytics software. Dorian has over 20 years of experience in artificial intelligence and machine learning techniques, which are used in what is known today as “data mining” or “predictive analytics.” He has applied this knowledge as a consultant with Knowledge Stream Partners, Xchange, Naviant, Thinking Machines, and Data Miners, and with various companies directly involved in credit card marketing for banks and with manufacturing companies using industrial automation. In 1976 he was involved in building artificially intelligent machine learning systems utilizing the pioneering technologies that are currently known as neural computing and associative memories. He is current in and familiar with the most advanced technologies in data mining, including entropic analysis (information theory), chaotic and fractal decomposition, neural technologies, evolution and genetic optimization, algebra evolvers, case-based reasoning, concept induction, and other advanced statistical techniques.
Product details
- Publisher : Morgan Kaufmann; 1st edition (April 5, 1999)
- Language : English
- Paperback : 560 pages
- ISBN-10 : 1558605290
- ISBN-13 : 978-1558605299
- Item Weight : 2.07 pounds
- Dimensions : 7.25 x 1.25 x 9.25 inches
- Best Sellers Rank: #1,649,502 in Books
- #361 in Artificial Intelligence (Books)
- #449 in Database Storage & Design
- #2,177 in Artificial Intelligence & Semantics
Customer reviews
Top reviews from the United States
The book's twelve chapters can be organized into three groups. The first three discuss data exploration as the larger context in which data mining is conducted. The author reminds us that finding interesting and useful problems that data analysis can solve is at least as important as knowing how to solve them. Chapters four through eight present common data problems and offer solutions. Processes include assembling data from archives and other sources, selectively removing variables, replacing missing observations, and normalizing distributions.
The final four chapters are more specialized. Chapter 9 discusses data problems that appear in time series data and other types of series data. Chapter 10 describes issues that may remain in data sets after problems with individual variables have been corrected. Chapter 11 describes why and how to conduct a "data survey" to learn the high-level features of a data set and prepare for more detailed analysis. The last chapter closes the book with modeling and analysis techniques--where most data mining books begin.
This book is an excellent resource for practicing data miners. Its coverage of data preparation is thorough; it connects well to other aspects of data mining; and it emphasizes the overall purpose of making decisions with data. It provides an adequate statistical foundation and has a practical focus throughout. It is full of tips, tactics, and techniques. Nicely done, Mr. Pyle.
Although I still believe that someone brand new to the field should begin with Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, this should be the second book they read. Far too many books in this area read like statistics books (notably Data Mining Methods and Models).
Statistics training can be of enormous benefit to data miners, but leads to certain predictable errors. Not only that, many data miners already have statistics training and that just compounds the likelihood that they will make these mistakes when the book author fails to show the difference clearly. Pyle performs consistently well in this regard. He consistently focuses on the kinds of problems data miners are likely to see in their work.
To give just a few examples: few variables will already be stored as continuous, normally distributed variables; principal components analysis can be a problematic, even dangerous, way to eliminate predictors; missing data differs from "empty" data; and non-linearity is constantly present.
His practice data set has a real variety of variable types and dozens of predictors. If you are figuring out whether data mining can help you, start with the Berry/Linoff book. But if you are about to begin in earnest, read this book. Then, time permitting, start reading specific books on modeling or software. For instance, another Larose book, Discovering Knowledge in Data: An Introduction to Data Mining, has good, detailed coverage of algorithms and some information on Clementine.
I bought the latest edition (Mar 2011). The publisher, Morgan Kaufmann, has done a terrible job printing this book. The letters are made up of dots, similar to the ink-saving option when printing a page.
I find reading the book very disgusting.
kah
The book is well-written and more importantly, understandable. It should be required reading for every researcher and modeler BEFORE they begin their careers. The way data is prepared and aggregated determines the picture one gets from the data. It must be done correctly from the start, or all downstream processing and conclusions are suspect.
The CD that comes with the book is pretty much useless, but aside from that caveat, this is a great book. Buy it - you won't be disappointed.
Top reviews from other countries
The book comes with a CD containing software that must be very limited, because the processes it can perform are very basic, and it even stops working with some of the examples it includes.
I contacted the author through the email address given in the book and it turned out to be an unknown address; I contacted the publishers and they did not reply, so the book ends up offering rather little.

