R for Data Science: Import, Tidy, Transform, Visualize, and Model Data 1st Edition
There is a newer edition of this item.
Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You'll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you've learned along the way.
You'll learn how to:
- Wrangle: transform your datasets into a form convenient for analysis
- Program: learn powerful R tools for solving data problems with greater clarity and ease
- Explore: examine your data, generate hypotheses, and quickly test them
- Model: provide a low-dimensional summary that captures true "signals" in your dataset
- Communicate: learn R Markdown for integrating prose, code, and results
- ISBN-10: 1491910399
- ISBN-13: 978-1491910399
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: January 31, 2017
- Language: English
- Dimensions: 5.91 x 0.59 x 9.84 inches
- Print length: 518 pages
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
Editorial Reviews
About the Author
Hadley Wickham is an Assistant Professor and the Dobelman Family Junior Chair in Statistics at Rice University. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualization. His research focuses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualization to better understand data and models.
Garrett Grolemund is a statistician, teacher and R developer who currently works for RStudio. He sees data analysis as a largely untapped fountain of value for both industry and science. Garrett received his Ph.D. at Rice University in Hadley Wickham's lab, where his research traced the origins of data analysis as a cognitive process and identified how attentional and epistemological concerns guide every data analysis.
Garrett is passionate about helping people avoid the frustration and unnecessary learning he went through while mastering data analysis. Even before he finished his dissertation, he started teaching corporate training courses in R and data analysis for Revolution Analytics. He has taught at Google, eBay, Acxiom and many other companies, and is currently developing a training curriculum for RStudio that will make useful know-how even more accessible.
Outside of teaching, Garrett spends time doing clinical trials research, legal research, and financial analysis. He also develops R software: he co-authored the lubridate R package, which provides methods to parse, manipulate, and do arithmetic with date-times, and wrote the ggsubplot package, which extends the ggplot2 package.
Product details
- Publisher : O'Reilly Media; 1st edition (January 31, 2017)
- Language : English
- Paperback : 518 pages
- ISBN-10 : 1491910399
- ISBN-13 : 978-1491910399
- Item Weight : 1.47 pounds
- Dimensions : 5.91 x 0.59 x 9.84 inches
- Best Sellers Rank: #66,064 in Books
- #8 in Mathematical & Statistical Software
- #22 in Data Processing
- #65 in Probability & Statistics (Books)
About the authors

Hadley is Chief Scientist at RStudio and a member of the R Foundation. He builds tools (both computational and cognitive) that make data science easier, faster, and more fun. His work includes packages for data science (ggplot2, dplyr, tidyr), data ingest (readr, readxl, haven), and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and frequent speaker promoting the use of R for data science. Learn more on his homepage, http://hadley.nz.

Customer reviews
Customers say
Customers find the book very easy to read and follow. They also appreciate the good content and reproducible code for the examples. Readers also appreciate that the book contains technical training, visual reporting, and sample datasets.
AI-generated from the text of customer reviews
Customers find the book very easy to read and follow, and appreciate the structure. They say the authors are knowledgeable, and they write with clarity. They also say the book walks you through setting up the Tidyverse and touches on finer points. They appreciate the elegance and coherence of the Tidyverse, and the author and publisher are easy to contact.
"...This section explores how to chain complicated data operations together. You also get to see some good practice in writing human readable code...."
"...It reads really well building up the different components whilst still being a valuable reference if you just need a reminder of a particular..."
"I am very happy with this book. Its easy to read and the exercises/code work and are applicable...."
"...The book is great, easy to understand, and provides a lot of questions and examples to work through in each chapter...."
Customers find the book's content useful for improving their R coding skills. They also say it's the best introduction to the Tidyverse, a great help in college, and a good reference to have on the shelf. Readers appreciate the reproducible code and images that guide them along.
"...This book is a solid reference book for learning R. Another good thing is that the companion website is regularly updated and you can copy and paste..."
"...Its easy to read and the exercises/code work and are applicable. The text is clearly written and the supporting graphics are well constructed...."
"...The book is great, easy to understand, and provides a lot of questions and examples to work through in each chapter...."
"I think this book is a perfect blend of technical training, visual reporting, sample datasets and clear communication...."
Customers find the book easy to download. They also mention that it's free on the internet.
"...And it's free! Also, the author provided answer keys for the exercises. Just google it, you will find it...."
"...This book is available to read online for free from Hadley’s website (easily found in a google search)...."
"The content of this book is useful, but is freely available online. I prefer to read/work from print, so I purchased this book...."
"It's free on the INTERNET, but I still enjoy the physical copy...."
Top reviews from the United States
1. Getting started (Explore)
2. Data manipulation (Wrangle)
3. Scripting (Program)
4. Build models
5. Presenting information (Communicate)
Here’s a brief overview of each section:
Explore
I was a bit puzzled as the book jumps directly into plotting using the ggplot library. Although the authors are clear that the purpose of introducing ggplot is to keep our motivation high, seeing the ggplot syntax without a whole lot of background can be confusing at first. But then the authors do a great job at explaining the various aspects of ggplot. So my advice to other R beginners is to just keep going forward!
After the intro to ggplot, you learn about the basics of R, such as variables and functions, as well as intermediate topics such as writing scripts, manipulating data using dplyr, and lastly EDA. This section ends with a good overview of how to use RStudio to save script files.
Wrangle
This section starts with an intro to tibbles, a new concept for handling data in R. This section is important since most of the functions in later chapters use tibbles, such as those for reading data from various sources. There are many important concepts that seem useful, such as “tidying up” your data and cleaning data points (strings and dates).
Program
This is a function section that introduces data pipes. This section explores how to chain complicated data operations together. You also get to see some good practice in writing human readable code. Then the book jumps into using the power of R as a functional programming language. Once I feel more proficient, I will take a deeper look at purrr.
Model
Now, depending on your situation, you may not be interested in this section, which focuses on predictive modeling. There are quite a few recipes that explore “why” questions. I am definitely planning to come back to this chapter later.
Communicate
This section is useful because it introduces how to create reusable reports.
Overall, this is my go-to book along with the book I previously mentioned. This book is a solid reference book for learning R. Another good thing is that the companion website is regularly updated and you can copy and paste the code examples directly into RStudio. I’m looking forward to further exploring R and learning how to create my own data visualizations.
The book is broken up into a number of sections that effectively build up the ability to ingest, transform, visualise and model datasets. A good portion of the book is available in an online version, to give you a taste of how it is written. Many have been following it as it was written. I have passed on copies of the book to a number of colleagues who were just starting out and the response has been uniformly positive. In my own case I was familiar with some of these packages (ggplot2, dplyr, tidyr), but found the book taught me purrr and how to better use the packages together.
I have two main caveats for readers. The first is that there are situations where packages from outside the "hadleyverse" may be required. The authors do a great job of pointing this out, but in my experience it pays to know data.table and lattice, for example, both because they can occasionally fit a problem better and because you inevitably come across other people's code where these packages are used. The other caveat is that the modelling is a little rudimentary. Most of the examples just fit independent regression models, whereas it seems to me that a hierarchical model would be a better fit. Still, these are small things, and it would be silly to expect a single book to cover all of these areas.
In short, this is the book I would give to someone who was keen to learn how to use R for data science. It reads really well, building up the different components whilst still being a valuable reference if you just need a reminder of a particular package (what is the difference between tibbles and data frames again?). Even though a good portion of the book is available online, it is well worth it to have the full thing on your bookshelf (digital or otherwise). On a broader note, with Max Kuhn (author of the excellent "Applied Predictive Modeling" with Kjell Johnson) joining Wickham and Grolemund at RStudio, it is a great time to start your R journey.
Top reviews from other countries
As a graduate student, this book provided me with a foundation for using R and RStudio with the tidyverse package. It makes you a generalist in R, able to do most things afterwards. It could take a few months with meaningful projects on hand to get the hang of most things. The writers are good teachers as they take you through the nicely structured concepts without confusing or overwhelming you. If you know a little you can enter different chapters without necessarily knowing the previous ones. The book exposes you to a lot of snippets of good workflows and you learn how to do things the right way. You also see what is possible generally and this seeds new ideas in one's head. Most of the best practices you learn can be taken to other languages and workflows as well. Learning R and data analysis this way will not only let you be a good data analyst but also prepare you to navigate any programming language with the foundation and framework for understanding data you get here. You can always combine R with another language where you think R, and the tidyverse package in particular, will get the job done better.
This book is available online for free but I bought the physical copy as I wanted to read it seriously from cover to cover. In 6 months I have read about 1/2 to all the content in each chapter of the book and used its code in my own workflow. I started using R sporadically for the first time two years prior and RStudio two months prior to purchasing this book.
In the end the book has become my go-to for learning and coding, as a learning tool and a handy reference. I use it with the Oxford Handbook of Medical Statistics by Janet L. Peacock for statistical methods and the STHDA website for other sample workflows and code.