4 of 5 people found the following review helpful:
5.0 out of 5 stars
Numerical recipes for Data Preparation, February 26, 2007
This review is from: Data Preparation for Data Mining Using SAS (The Morgan Kaufmann Series in Data Management Systems) (Paperback)
It is well known that the majority of time spent doing data mining is in the data preparation. With this book, Dr. Refaat has given analysts the tools they need to perform the often laborious task of data preparation. If you are familiar with the Numerical Recipes series of books (and if you aren't, you should be), you will recognize a similar theme. The author has:
* itemized all the elements the data miner should be aware of when they are doing data preparation,
* presented enough technical description for the reader to understand why they would be performing that particular task,
* provided all the SAS code that can be used to actually perform the data preparation step.
This book is what the practicing data miner needs.
Here's a brief rundown of the table of contents:
1. Introduction
* setting the context of data mining
2. Tasks and Data Flow
* describes what data mining can do and where data preparation fits in
3. Review of Data Mining Modeling Techniques
* an overview of data mining techniques
4. SAS Macros: A Quick Start
* just in case you haven't worked with SAS macros
5. Data Acquisition and Integration
* where you get your data from and how it's pulled together
6. Integrity Checks
* how to make sure the data is correct and even what "correct" means
7. Exploratory Data Analysis
* get to know your data
8. Sampling and Partition
* dealing with large data sets as well as getting ready to validate the models you build
9. Data Transformations
* rarely is your source data in the form most effective for modeling - this chapter describes what can be done to produce the most effective models
10. Binning and Reduction of Cardinality
* make your variables less complex and, oftentimes, more presentable and understandable
11. Treatment of Missing Values
* you will have missing values in your data - here are several approaches for dealing with them
12. Predictive Power and Variable Reduction I
* introduces the concept of identifying usefulness of input variables and reducing the required number of variables
13. Analysis of Nominal and Ordinal Variables
* how to evaluate relationships with discrete variables
14. Analysis of Continuous Variables
* how to evaluate relationships with continuous variables
15. Principal Component Analysis
* how to use PCA for variable reduction during data preparation
16. Factor Analysis
* how to use Factor Analysis for variable reduction during data preparation
17. Predictive Power and Variable Reduction II
* defines methods of simplifying and reducing input variables with respect to the target variable
18. Putting It All Together
* a case study showing the application of all these techniques for data preparation in a realistic example
Appendix. Listing of SAS Macros
* complete listing of all the SAS code referenced in the book - also included on the CD
4 of 6 people found the following review helpful:
5.0 out of 5 stars
The best data prep book so far, February 13, 2007
I have been working in data mining and with SAS for the last 10 years. This is without doubt the best book. It is concise and to the point, without a lot of fluff and useless theory. It teaches you how to actually do it! The book took me step by step through the process of data preparation using SAS and let me write fantastic macros. All the macros are included on the CD and are ready to run. I strongly recommend this book to anyone who uses SAS to work with data, whether for reporting or for modeling. I have attended many training courses on data mining, and even on data preparation, but nothing is like this book. It reveals all the secrets: how to bin variables using Gini; how to select the best modeling variables using Entropy, Gini, and Chi2; how to reduce the variables using principal component analysis; how to treat missing values, which occupies a huge chapter that in my opinion has no competitors; how to map categorical variables into dummy variables; and how to reduce cardinality using Gini (best grouping). All these things were until now the secrets of the 'gurus'. Not any more, thanks to Dr. Refaat and his book! For example, I used to use decision tree software to select the best variables, then use logistic regression to build the models. Not any more. With the SAS programs in the book, I can now select the best variables and build the model within one SAS script... I only wish the author would also write a similar book on modeling... This book is a life saver...
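For readers who have not seen Gini- or Entropy-based variable selection before, here is a rough sketch of the general idea. It is written in Python rather than SAS and is not taken from the book's macros; the function names (`impurity_reduction`, `rank_variables`) and the toy data are my own assumptions, used only for illustration. Each candidate variable is scored by the fraction of the target's impurity that disappears when the records are grouped by that variable's categories.

```python
from collections import Counter, defaultdict
from math import log2


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def impurity_reduction(values, target, impurity=gini):
    """Fraction of the target's impurity removed by grouping on `values`.

    Hypothetical helper: returns 0.0 for a useless variable and 1.0 for one
    whose categories separate the target classes perfectly.
    """
    total = impurity(target)
    if total == 0:
        return 0.0  # target has a single class; nothing to explain
    groups = defaultdict(list)
    for v, t in zip(values, target):
        groups[v].append(t)
    n = len(target)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return (total - weighted) / total


def rank_variables(columns, target, impurity=gini):
    """Sort candidate variables from most to least predictive."""
    scores = {
        name: impurity_reduction(vals, target, impurity)
        for name, vals in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
target = [1, 1, 0, 0]
columns = {
    "useless": ["x", "y", "x", "y"],  # each category is a 50/50 mix
    "perfect": ["a", "a", "b", "b"],  # categories match the target exactly
}
ranking = rank_variables(columns, target, impurity=gini)
```

Passing `impurity=entropy` instead of `gini` ranks the variables by the entropy-based version of the same measure. A real workflow would also need to handle continuous variables (by binning them first) and missing values, both of which this sketch ignores.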