- Paperback: 194 pages
- Publisher: O'Reilly Media; 1 edition (July 2, 2017)
- Language: English
- ISBN-10: 1491981652
- ISBN-13: 978-1491981658
- Product Dimensions: 6.9 x 0.4 x 9.1 inches
- Shipping Weight: 12.6 ounces
- Average Customer Review: 8 customer reviews
- Amazon Best Sellers Rank: #61,576 in Books
Text Mining with R: A Tidy Approach 1st Edition
From the Publisher
From the Preface
If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and text-heavy. Many of us who work in analytical fields are not trained in even simple interpretation of natural language.
We developed the tidytext (Silge and Robinson 2016) R package because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text. We found that using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Treating text as data frames of individual words allows us to manipulate, summarize, and visualize the characteristics of text easily and integrate natural language processing into effective workflows we were already using.
This book serves as an introduction to text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what matters are their possible applications. Thus, this book provides compelling examples of real text mining problems.
About this book
This book is focused on practical software examples and data explorations. There are few equations, but a great deal of code. We especially focus on generating real insights from the literature, news, and social media that we analyze.
We don’t assume any previous knowledge of text mining. Professional linguists and text analysts will likely find our examples elementary, though we are confident they can build on the framework for their own analyses.
We do assume that the reader is at least slightly familiar with dplyr, ggplot2, and the %>% 'pipe' operator in R, and is interested in applying these tools to text data. For users who don’t have this background, we recommend books such as R for Data Science. We believe that with a basic background and interest in tidy data, even a user early in their R career can understand and apply our examples.
Topics this book does not cover
- Clustering, classification, and prediction
- Word embedding
- More complex tokenization
- Languages other than English
About the Author
Julia Silge is a data scientist at Stack Overflow; her work involves analyzing complex datasets and communicating about technical topics with diverse audiences. She has a PhD in astrophysics and loves Jane Austen and making beautiful charts. Julia worked in academia and ed tech before moving into data science and discovering the statistical programming language R.
David Robinson is a data scientist at Stack Overflow with a PhD in Quantitative and Computational Biology from Princeton University. He enjoys developing open source R packages, including broom, gganimate, fuzzyjoin, and widyr, as well as writing about statistics, R, and text mining on his blog, Variance Explained.
Top customer reviews
Pro tip: I would recommend brushing up on your tidyverse before diving in. This should help make understanding what the authors are trying to do much easier.
It's a short book, but it's worth its weight in gold. I'm already reading it a second time!