- Paperback: 389 pages
- Publisher: Manning; 1st edition (April 13, 2014)
- Language: English
- ISBN-10: 1617291560
- ISBN-13: 978-1617291562
- Product Dimensions: 7.3 x 1 x 9.1 inches
- Shipping Weight: 1.5 pounds
- Average Customer Review: 35 customer reviews
- Amazon Best Sellers Rank: #97,269 in Books
Practical Data Science with R 1st Edition
About the Author
Nina Zumel co-founded Win-Vector, a data science consulting firm in San Francisco. She holds a Ph.D. in robotics from Carnegie Mellon and was a content developer for EMC's Data Science and Big Data Analytics Training Course. Nina also contributes to the Win-Vector Blog, which covers topics in statistics, probability, computer science, mathematics and optimization.
John Mount co-founded Win-Vector, a data science consulting firm in San Francisco. He has a Ph.D. in computer science from Carnegie Mellon and over 15 years of applied experience in biotech research, online advertising, price optimization and finance. He contributes to the Win-Vector Blog, which covers topics in statistics, probability, computer science, mathematics and optimization.
Earlier in the book it seemed the authors took great pains to explain, in layman's terms, the various statistical elements of the topic they were covering. They provided very clear and meaningful explanations that made a lot of sense of complex topics. But later in the book that approach seemed to largely go out the window, and they started using more technical boilerplate to describe the various statistical tests and procedures. Rather than give the technical boilerplate (as you'd see it in a textbook) and then elaborate on it with a more human-centric explanation, they would just leave it at the nearly impenetrable technical description and then proceed to explain how to conduct the test/procedure/etc. in R. But without an understanding of what you're trying to accomplish and why, it's hard to write the code to actually do it. Keep in mind that I'm relatively well prepared for this book, too, having taken as much stats and econometrics as I could fit into my four-year degree. If I found some sections of the book too technical to understand, then it seems likely that the book would benefit from some additional explanation and discussion in those later sections.
Also, I have a good deal of "boots on the ground" experience with this book from trying to apply it in my daily work. I've found that it is useful, but it could be more useful if it discussed more practical problems. For instance, much of my work is focused on producing a predictive model of the likelihood of charge-off; that is, if we approve and fund this application, how likely is it to perform or to charge off? The book shares some high-level approaches to finding problems in data (using plots and summaries), fixing those problems using various techniques, selecting variables, and conducting the statistical modeling (logistic regression in my case). But it fails to really tie those areas together beyond the high level. For instance, what are the assumptions of a logistic regression? How do you resolve issues in your data to ensure that you meet those assumptions and can perform a valid logistic regression? How do you really select variables when you're faced with at least 20 candidates (and potentially many more if you count interaction terms, unfixed variables, and variables that have been fixed in different ways)?
I suppose that, for what it was, it is "mission accomplished." I'd just like to see a lot more. Perhaps there's a need for a second volume, "Advanced Practical Data Science with R"? Or this book could have a second edition with a lot more content covering how to find data problems; how to resolve those problems intelligently (for instance, missing data is basically left as "either drop the affected records" or "use the mean as a replacement for the missing value," but there are alternative methods that may be more suitable); which data problems will cause issues in OLS regression, logistic regression, and machine learning; and how to practically select variables and a model. I feel like the book gave me some tools to apply (like a small box of tools you might purchase from a hardware store), but left a lot out. So now I'm in deep water trying to figure out why my logistic regression isn't predictive enough and what I can do about it. Is it the data and how I fixed variables? Is it the variables I've selected? Should I have used automated variable selection techniques? Or just manually tried different variables? How does an experienced practitioner approach these problems? I know they iterate: explore data, clean data, select variables, select model, test model, look at data, change data, change variables, etc. But practically speaking, what does that look like? In the book they offer a hand-coded basic variable selection script, and mention that one could also use stepwise variable selection. I'm reasonably sure that in the real world this is not actually done, mostly because their selection script does about as well as stepwise selection at choosing appropriate variables. There are many other, better ways of selecting variables, I've discovered, and I wish they'd discussed some of them (pros and cons) and shown how to conduct them in a meaningful fashion. The same goes for building a model.
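For readers who haven't met the two missing-value strategies the review describes, here is a minimal Python sketch of both, run on a hypothetical toy table (the `income` and `charged_off` columns and every value are invented for illustration). It also shows one common alternative, mean imputation plus a 0/1 "was missing" indicator column; this is offered as an assumption-labeled illustration, not as the book's method or the only option.

```python
from statistics import mean

# Hypothetical toy records (not from the book or the review); None marks a missing value.
rows = [
    {"income": 52000.0, "charged_off": 0},
    {"income": None, "charged_off": 1},
    {"income": 61000.0, "charged_off": 0},
    {"income": None, "charged_off": 0},
    {"income": 47000.0, "charged_off": 1},
]

def drop_missing(records, col):
    """Strategy 1: drop every record whose `col` value is missing."""
    return [r for r in records if r[col] is not None]

def mean_impute(records, col):
    """Strategy 2: replace missing `col` values with the mean of the observed ones."""
    fill = mean(r[col] for r in records if r[col] is not None)
    return [{**r, col: fill if r[col] is None else r[col]} for r in records]

def mean_impute_with_flag(records, col):
    """One alternative: mean-impute, and also add a 0/1 column recording which
    values were filled in, so a model can use 'was missing' as a signal."""
    imputed = mean_impute(records, col)
    return [{**new, col + "_missing": int(old[col] is None)}
            for old, new in zip(records, imputed)]

kept = drop_missing(rows, "income")
print(len(kept), "of", len(rows), "records kept")  # 3 of 5 records kept
print(sum(r["charged_off"] for r in kept), "charge-off(s) left")  # 1 charge-off(s) left
print(mean_impute_with_flag(rows, "income")[1])
```

In this toy table, dropping records throws away one of only two charge-offs. That is the same reason dropping can be costly when, as in the reviewer's data, the outcome of interest is rare; plain mean imputation keeps the rows but hides from the model which values were filled in, which is what the indicator column tries to address.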
In my case, I have a whole bunch of variables and limited data (about 2,000 records, with the desired outcome occurring in only 120 of them), and the automated tools (various R packages I've discovered and applied) take a long time to run, yield poor results, or both. But if not automated tools, then what? Manually add variables and use ANOVA to test the difference between the first and second models?
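For a logistic model, the usual counterpart of "ANOVA between the first and second model" is an analysis-of-deviance, or likelihood-ratio, test; in R, `anova(small_fit, big_fit, test = "Chisq")` on two nested `glm` fits reports this comparison (`small_fit` and `big_fit` are placeholder names). The sketch below shows the mechanics in Python with only the standard library, under stated assumptions: a hand-rolled Newton-Raphson fit, integer degrees of freedom, data that are not perfectly separable, and an invented ten-row dataset. The helper names `solve`, `fit_logistic`, `chi2_sf`, and `likelihood_ratio_test` are hypothetical, not a library API.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; each row of X must start with 1.0 for the intercept.
    Returns (coefficients, maximized log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score: X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]  # information: X^T W X
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        loglik += yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu)
    return beta, loglik

def chi2_sf(x, df):
    """Upper-tail P(X > x) for a chi-square with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

def likelihood_ratio_test(loglik_small, loglik_big, df):
    """LR statistic 2 * (ll_big - ll_small) and its chi-square p-value."""
    stat = 2.0 * (loglik_big - loglik_small)
    return stat, chi2_sf(stat, df)

# Hypothetical toy data: one candidate predictor x and a 0/1 outcome y.
x = [0.5, 1.2, 1.9, 2.3, 3.1, 3.8, 4.4, 5.0, 5.7, 6.3]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

_, ll_small = fit_logistic([[1.0] for _ in x], y)          # intercept only
beta_big, ll_big = fit_logistic([[1.0, v] for v in x], y)  # intercept + x
stat, p_value = likelihood_ratio_test(ll_small, ll_big, df=1)
print(f"LR statistic = {stat:.3f}, p = {p_value:.4f}")
```

The statistic is twice the gain in log-likelihood from the added variables, compared against a chi-square with as many degrees of freedom as parameters added. Repeating this test while adding candidates one at a time is itself a form of stepwise selection, so its p-values deserve the same caution.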
I'd just like more: more discussion, elaboration, and examples of how practical data science is conducted. This book seems to do a fantastic job as an introduction to the topic, but you'll quickly find yourself in deep water without a clue how to swim, as I did. You'll be left to your own devices, wishing, as I do, that there were more in the book (or another book) that I could study after this one to help take me from beginner data scientist to intermediate.
Overall, I'm very glad I bought and read the book.