- Paperback: 318 pages
- Publisher: O'Reilly Media; 1 edition (May 28, 2017)
- Language: English
- ISBN-10: 1491952962
- ISBN-13: 978-1491952962
- Product Dimensions: 6.9 x 0.6 x 9.1 inches
- Shipping Weight: 1.2 pounds
- Average Customer Review: 24 customer reviews
- Amazon Best Sellers Rank: #6,327 in Books
Practical Statistics for Data Scientists: 50 Essential Concepts 1st Edition
From the Publisher
About this Book
Data science is a fusion of multiple disciplines, including statistics, computer science, information technology and domain-specific fields. As a result, several different terms could be used to reference a given concept. Key terms and their synonyms will be highlighted throughout the book in a sidebar within the text.
This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics. Both of us came to the world of data science from the world of statistics, and have some appreciation of the contribution that statistics can make to the art of data science. At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a discipline is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia worthy of an ocean liner.
Two goals underlie this book:
- To lay out, in digestible, navigable and easily referenced form, key concepts from statistics that are relevant to data science.
- To explain which concepts are important and useful from a data science perspective, which are less so, and why.
50 Essential Concepts
About the Author
Peter Bruce founded and grew the Institute for Statistics Education at Statistics.com, which now offers about 100 courses in statistics, roughly a third of which are aimed at the data scientist. In recruiting top authors as instructors and forging a marketing strategy to reach professional data scientists, Peter has developed both a broad view of the target market, and his own expertise to reach it.
Andrew Bruce has over 30 years of experience in statistics and data science in academia, government and business. He has a Ph.D. in statistics from the University of Washington and has published numerous papers in refereed journals. He has developed statistical-based solutions to a wide range of problems faced by a variety of industries, from established financial firms to internet startups, and offers a deep understanding of the practice of data science.
Top customer reviews
The concepts are not astronomically explained, but with just enough depth that I can also individually explain to people what they are. What really stands out for me so far is after each or so concept, there is a section labeled as further reading (well, in the digital copy) that is usually at the end of the book altogether & I found myself realizing I have a lot of those books so the authors really know where to look & guide those who wanted more depth.
Yeah yeah yeah, the code is missing (as of mid-June 2017) but if you really understand / know which packages to use, you won't need the code. The first half of the book is two- or three-liners of code anyway; it's the explanations that matter the most. The second half of the book is the good part, which separates a white hat statistician from a grey hat data scientist, which is exactly what I wanted in a <300 page book.
Thanks for keeping me waiting since November though, thought it would never come! The O'Reilly books always keep me in awe at how they always know what topic I want to have a brief book (probably data collecting on me :P) & simultaneously leave me in suspense because I never notice I am preordering the books! Sigh. My only request is to be able to preorder the Kindle editions rather than the physical editions; my data science book cubby is starting to overwhelm my statistics cubby (NOT FOR LONG MASTERS PROGRAM ~).
It is true that the textbook does not provide in-depth coverage for all topics, but I don't think that was the intent of the authors. However, the text DOES provide an excellent introduction to topics relevant to students and data scientists. After reading the text and working through the examples, you will be equipped to further your knowledge in whichever topic you require for your data analysis task.
It is very easy to read, although sometimes too easy. There are very few math equations in this book, which is good or bad depending on your taste. E.g., Gaussians, of course, are mentioned throughout the book but I do not recall seeing the equation for a Gaussian in any dimension in this book (maybe I missed it?). In fact, it's so easy to read that I finished the book before the github repo saw its initial commit.
I'm not a huge fan of their coverage of classification models, but I might be biased because evaluation of classification models is one of my things. E.g., they discuss ROC curves in fair depth but don't mention DET plots, which, in my experience, have been well favored over ROC curves for many years. Perhaps this was a decision based on book length, and again, maybe I'm biased in this regard anyway.
There are some careless typos in the book, e.g., the word "partiular" appears on page 51, "significiantly" appears on page 273, and there are similar errors scattered throughout the book. I don't understand how the authors did not find these with a simple spell checker. Did they typeset this in notepad?
Also, the github repo was pushed on June 17. This seems to be the biggest complaint people have about this book so far.
In summary, a very easy yet worthwhile read for the price. A very high-level view of data science if you're unfamiliar with the stuff. For someone with a solid STEM background this is a light read which can be completed in about two or three weeks of spare time. For someone entirely new to the field, it is certainly accessible and worth exploring if you might be interested in the area.