Practical Data Science with R 1st Edition
by Nina Zumel (Author), John Mount (Author)
There is a newer edition of this item.
Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Book
Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics.
Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels.
This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed.
What's Inside
- Data science for the business professional
- Statistical analysis using the R language
- Project lifecycle, from planning to delivery
- Numerous instantly familiar use cases
- Keys to effective data presentations
About the Authors
Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com.
Table of Contents
PART 1 INTRODUCTION TO DATA SCIENCE
- The data science process
- Loading data into R
- Exploring data
- Managing data
PART 2 MODELING METHODS
- Choosing and evaluating models
- Memorization methods
- Linear and logistic regression
- Unsupervised methods
- Exploring advanced methods
PART 3 DELIVERING RESULTS
- Documentation and deployment
- Producing effective presentations
Editorial Reviews
Review
ACM SIGACT, reviewed by Allan M. Miller. doi:10.1145/3061640.3061644 (dl.acm.org/citation.cfm?doid=3061640.3061644)
"Practical Data Science with R" is a remarkable book, packed with both valuable technical material about data science and practical advice for how to conduct a successful data science project. In a field that is so new, and growing so quickly, it is an essential guide for practitioners, especially for the large numbers of new data scientists moving into the field. It is not only a worthwhile read, it can serve as a useful ongoing technical reference and practical manual for the data science practitioner.
About the Author
John Mount co-founded Win-Vector, a data science consulting firm in San Francisco. He has a Ph.D. in computer science from Carnegie Mellon and over 15 years of applied experience in biotech research, online advertising, price optimization and finance. He contributes to the Win-Vector Blog, which covers topics in statistics, probability, computer science, mathematics and optimization.
Product details
- Publisher : Manning; 1st edition (April 13, 2014)
- Language : English
- Paperback : 389 pages
- ISBN-10 : 1617291560
- ISBN-13 : 978-1617291562
- Item Weight : 1.56 pounds
- Dimensions : 7.38 x 0.9 x 9.25 inches
- Best Sellers Rank: #1,471,691 in Books
- #444 in Software Design Tools
- #599 in Software Testing
- #649 in Mathematical & Statistical Software
About the authors

I produce applied research, prototyping, and training in information extraction, algorithms, and data mining for web-scale businesses, hedge funds, and startups. Right now I do this as a consultant at Win-Vector LLC. Earlier, I managed a research group at Shopping.com, performed research in biotech, and was a trader at a hedge fund.

Nina Zumel is a Principal Consultant with Win-Vector LLC, a data science consulting firm based in San Francisco. She is the co-author with John Mount of Practical Data Science with R, which presents the process and principles of data science from a practitioner's perspective.
Her technical interests include data science, statistics, statistical learning, and data visualization. She is also interested (at a layperson's level) in cognitive science, psychology, and linguistics. When she isn't working, she writes and dances.
Technical blogs:
http://www.win-vector.com/blog/
http://ninazumel.com/posts/
Personal blog (folklore and weird fiction):
http://multoghost.wordpress.com
Customer reviews
Top reviews from the United States
Earlier in the book, the authors seemed to take great pains to explain in layman's terms the statistical elements of each topic they covered. They provided very clear and meaningful explanations that made sense of complex topics. Later in the book, that approach largely went out the window, and they started using more technical boilerplate to describe the various statistical tests and procedures. Rather than give the technical boilerplate (as you'd see it in a textbook) and then elaborate on it with a more human-centric explanation, they would just leave it at the nearly impenetrable technical description and then proceed to explain how to conduct the test or procedure in R. But without an understanding of what you're trying to accomplish and why, it's hard to write the code to actually do it. Keep in mind that I'm relatively well prepared for this book, having had as much stats and econometrics as I could fit into my four-year degree. If I found some sections of the book too technical to understand, then it seems likely that the book would benefit from additional explanation and discussion in those later sections.
Also, I have a good deal of "boots on the ground" experience with this book in my attempts to apply it in my daily work. I've found that it is useful, but could be more useful if there was more discussion of various practical problems. For instance, much of my work is focused on producing a predictive model of likelihood of charge-off. I.e., if we approve and fund this application, how likely is it to perform or charge-off. The book shares some high-level approaches to finding problems in data (using plots and summaries), fixing those problems using various techniques, selecting variables, and how to conduct the statistical modeling (logistic in my case). But it fails to really tie those areas together beyond the high-level. For instance, what are the assumptions of a logistic regression? How do you resolve issues in your data to ensure that you meet those assumptions and can perform a valid logistic regression? How do you really select variables when you're faced with at least 20 possibilities (and potentially many many more if you count interaction terms, unfixed variables, and variables which have been fixed in different ways)?
I suppose, for what it was, that it is "mission accomplished." I'd just like to see a lot more. Perhaps there's need for a second volume? Perhaps "Advanced Practical Data Science with R"? Or this book could have a second edition with a lot more content covering: finding data problems; resolving those problems intelligently (for instance, resolving missing data is basically left as "either drop the affected records" or "use the mean as a replacement for the missing value," but there are alternative methods which may be more suitable); what data problems will cause issues in OLS regression, logistic regression, and machine learning; and how to practically select variables and a model.
I feel like the book gave me some tools to apply (like a small box of tools you might purchase from a hardware store), but left a lot out. So now I'm in deep water trying to figure out why my logistic regression isn't predictive enough and what I can do about it. Is it the data and how I fixed variables? Is it the variables I've selected? Should I have used automated variable selection techniques? Or just manually tried different variables? How does an experienced practitioner approach these problems? I know they iterate: explore data, clean data, select variables, select model, test model, look at data, change data, change variables, etc... but practically speaking what does it look like?
In the book they offer a hand-coded basic variable selection script, and mention that one could also use stepwise variable selection. In the real world I'm reasonably sure that this is not actually done--mostly because their selection script does about as well as stepwise at selecting appropriate variables. There are many other better ways of selecting variables, I've discovered, and I wish that they'd discussed some of those ways (pros and cons), and shown how to conduct them in a meaningful fashion. Same thing with building a model.
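To make the missing-data point concrete, here is a minimal sketch of the two options the reviewer describes (dropping records, mean replacement) alongside one commonly used alternative: filling with the median while keeping a 0/1 flag that records which values were missing. It is written in plain Python rather than R purely as a language-neutral illustration. The function names and the `income` column are hypothetical, and nothing here is taken from the book's own code.

```python
from statistics import mean, median

def drop_missing(values):
    """Listwise deletion: keep only the observed values."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]

def impute_with_indicator(values):
    """Fill with the observed median and return a 0/1 'was missing' flag."""
    fill = median(drop_missing(values))
    filled = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return filled, was_missing

# Hypothetical income column (in thousands); None marks a missing entry.
income = [40, None, 60, 100, None]
filled, was_missing = impute_with_indicator(income)
```

Keeping the indicator lets a downstream model learn whether missingness itself is informative, which plain mean replacement hides. Whether that helps depends on why the values are missing, so treat this as one option to evaluate rather than a default.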
In my case, I have a whole bunch of variables, limited data (about 2000 records, with the desired outcome only occurring in 120 of those), and the automated tools (various R packages I've discovered and applied) either take a long time to run and/or yield poor results. But if not automated tools then what? Manually add variables and ANOVA test the difference between the first and second model?
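The "ANOVA test" between two nested logistic models that the reviewer mentions is usually carried out as a likelihood-ratio (deviance) test. The sketch below shows the idea in plain Python for the simplest possible case, where both models have closed-form fits: an intercept-only model versus a model with a single binary predictor. The totals (2000 records, 120 events) come from the review; the split by a binary flag is invented for illustration, and the function names are hypothetical. A real comparison with many predictors would need a fitted model from a statistics package.

```python
import math

def log_likelihood(events, n):
    """Bernoulli log-likelihood evaluated at the MLE p = events / n."""
    if events == 0 or events == n:
        return 0.0  # p is 0 or 1, so every observation has likelihood 1
    p = events / n
    return events * math.log(p) + (n - events) * math.log(1 - p)

def lr_test_one_binary_predictor(table):
    """Likelihood-ratio test: intercept-only vs. one binary predictor.

    table maps each of the two predictor levels to (events, n).
    Returns (test statistic, p-value) using a chi-square with 1 df.
    """
    total_events = sum(e for e, _ in table.values())
    total_n = sum(n for _, n in table.values())
    ll_small = log_likelihood(total_events, total_n)            # one shared rate
    ll_big = sum(log_likelihood(e, n) for e, n in table.values())  # one rate per level
    stat = max(2 * (ll_big - ll_small), 0.0)  # guard against rounding below zero
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value

# Hypothetical split of the reviewer's 2000 records / 120 events by a binary flag.
stat, p_value = lr_test_one_binary_predictor({"flag=1": (80, 800), "flag=0": (40, 1200)})
```

A small p-value suggests the larger model fits better than chance alone would explain. With only 120 events, though, repeating this test across dozens of candidate variables invites overfitting, which is consistent with the difficulty the reviewer reports.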
I'd just like more...more discussion and elaboration and examples of how practical data science is conducted. This book seems like it does a fantastic job as an introduction to the topic, but you'll quickly find that you'll be in deep water without a clue how to swim--as in my case. You'll be left to your own devices, and find yourself wishing, as I do, that there was more in the book (or another book) that I could study after this one which would help take me from beginner data scientist to intermediate.
Overall, I'm very glad I bought and read the book.
If you buy a hard copy, I would also recommend having a look at the colorful figures in the companion soft copy of the book.
This book isn't about using R for regression.
The "data science" the authors refer to seems to involve analyzing data around basic ratios and graphs. At least for me, some of the code provided using ggplot2 did not work properly with the downloaded datasets.
Another point of annoyance: downloading the datasets used in this book required me to download and use GitHub, which to me is a sorry piece of software that I would otherwise never use. What a hassle that was.
The book assumes you are already familiar with statistics. I specifically do not recommend this book for those who are trying to learn and understand regression using R.
Pros:
1) Fantastic examples on how to use the graphing features of R.
2) Good breakdown of basic data analysis (not on regression).
Cons:
1) Scant information on statistics and R. Don't mistake this book for a book on statistics (as "data science" implies).
2) Does not focus on regression using R.





