R for Data Science: Import, Tidy, Transform, Visualize, and Model Data 1st Edition
From the Publisher
|Data Science for Business|Data Science from Scratch|Doing Data Science|R for Data Science|Data Science at the Command Line|Python Data Science Handbook|
|What You Need to Know about Data Mining and Data-Analytic Thinking|First Principles with Python|Straight Talk from the Frontline|Visualize, Model, Transform, Tidy, and Import Data|Facing the Future with Time-Tested Tools|Tools and Techniques for Developers|
About the Author
Hadley Wickham is an Assistant Professor and the Dobelman Family Junior Chair in Statistics at Rice University. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualization. His research focuses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualization to better understand data and models.
Garrett Grolemund is a statistician, teacher and R developer who currently works for RStudio. He sees data analysis as a largely untapped fountain of value for both industry and science. Garrett received his Ph.D. at Rice University in Hadley Wickham's lab, where his research traced the origins of data analysis as a cognitive process and identified how attentional and epistemological concerns guide every data analysis.
Garrett is passionate about helping people avoid the frustration and unnecessary learning he went through while mastering data analysis. Even before he finished his dissertation, he started teaching corporate training in R and data analysis for Revolution Analytics. He's taught at Google, eBay, Acxiom and many other companies, and is currently developing a training curriculum for RStudio that will make useful know-how even more accessible.
Outside of teaching, Garrett spends time doing clinical trials research, legal research, and financial analysis. He also develops R software: he co-authored the lubridate R package, which provides methods to parse, manipulate, and do arithmetic with date-times, and wrote the ggsubplot package, which extends the ggplot2 package.
Before I get into the review to explain why I had to stop reading the book, it is important to note that this book is available online for free. I prefer print over screen, when possible. But if you don’t have a preference, just use that.
Why did I put this book down midway through Chapter 1?
A cascade of events that started with my needing solutions to the practice problems in the book.
The only way to learn math and software development is by doing. Books on these subjects should ALWAYS contain exercise problems and solutions to those problems, either at the end of the chapter or by way of an appendix at the end of the book.
The best solutions that I found are on jrnold's GitHub page. I quickly noticed, however, that the answers posted on that site didn't quite fit the exercises in the book. When comparing the online version to the printed book, I noticed that exercises from the book had been reworded or dropped completely. So between the beginning of this year, when this book was published and released for sale, and this summer, many errors had apparently been found and many revisions made.
There were so many differences between the online version and the book that I decided to stop reading the book in favor of the online version.
My 5-star Rating:
The author does an excellent job explaining topics. He is very knowledgeable and it shows. With the number of revisions in such a short time, however, I can't help but think that this book was rushed.
But if I stopped reading the book because of errors, why 5 stars? The book, by itself, might have gotten a 1-star review from me, but I am still going to learn from this author. The online version costs him/someone to keep up-to-date. Purchasing the book is an easy (and very fair) way to support this project.
The book is broken up into a number of sections that effectively build up the ability to ingest, transform, visualise and model datasets. A good portion of the book is available in an online version, to give you a taste of how it is written. Many have been following it as it was written. I have passed on copies of the book to a number of colleagues who were just starting out and the response has been uniformly positive. In my own case I was familiar with some of these packages (ggplot2, dplyr, tidyr), but found the book taught me purrr and how to better use the packages together.
I have two main caveats for readers. The first is that there are situations where packages from outside the "hadleyverse" may be required. The authors do a great job of pointing this out, but in my experience it pays to know data.table and lattice, for example, both because they can occasionally fit a problem better and because you inevitably come across other people's code where these packages are used. The other caveat is that the modelling is a little rudimentary. Most of the examples are just fitting independent regression models, whereas it seems to me that a hierarchical model would be a better fit. Still, these are small things and it would be silly to expect a single book to cover all of these areas.
In short, this is the book I would give to someone who was keen to learn about how to use R for data science. It reads really well, building up the different components whilst still being a valuable reference if you just need a reminder of a particular package (what is the difference between tibbles and data frames again?). Even though a good portion of the book is available online, it is well worth it to have the full thing on your bookshelf (digital or otherwise). On a broader note, with Max Kuhn (author of the excellent "Applied Predictive Modeling" with Kjell Johnson) joining Wickham and Grolemund at RStudio, it is a great time to start your R journey.
High quality printing, full color code and graphs. The book stays open.
Top international reviews
Worth noting that the book is made available for free online but is beautifully reproduced in book form for ease of reference.
Sure, you could find the information in this book on Google, but that can be an overwhelming experience if you’re just starting out with data analysis and trying to learn tidyverse and other essential packages. This book is written in language that is very understandable for beginners, novice R users and beyond, and is something to keep at hand at all times.
However, it's a bit like learning conversation before learning grammar... Very quickly one realizes the importance of the basics. So I personally would combine this book with other conventional R courses to learn the language properly.
Bought two copies, first one when I was trying to fill in some of my knowledge gaps about R. Then bought another copy for other analysts in my team to use. We all write better code now :-)
In my personal experience, more than 90% of the time goes into getting the data into a shape where you think it is ready for you to run your algorithms. This is a far cry from the sterile, cleaned-up data sets used in data science courses. Real-life data is very messy. This book is clearly written for practitioners who deal with real-life situations. It helped me a lot to get my data into proper form, which includes joining two disparate data sets based on keys, creating new derived columns from existing columns, removing unwanted features, taking care of all those NAs, filtering rows on criteria, and performing different aggregate functions (mean, sum, median etc.) date-wise or factor-wise. It also covers a bit of ggplot basics so you can start plotting the data from the word go. This book uses the tidyverse family of packages (especially dplyr) written by the same author. ggplot is also written by Wickham. Both packages contain functions that can improve your productivity 10x. Both have now become de facto packages used by most R data scientists.
This book helped me immensely in performing data analysis on my messy data. It taught me which operations to perform on my data and how to perform them, using very useful functions in the dplyr package, which is part of the tidyverse family.
R is a very powerful environment for data analysis. I would call it the Ferrari of the data science world. But although very powerful, it has its own quirks and learning curve, even for experienced programmers. Packages like dplyr, caret and ggplot make your life easier and allow you to fully harness the horsepower of R.