Data Science from Scratch: First Principles with Python 1st Edition
There is a newer edition of this item.
Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.
- Get a crash course in Python
- Learn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data science
- Collect, explore, clean, munge, and manipulate data
- Dive into the fundamentals of machine learning
- Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
- Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
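To give a concrete sense of what implementing an algorithm "from scratch" means, here is a minimal sketch of a k-nearest neighbors classifier in plain Python, using only the standard library. It is an illustration, not code from the book; the helper names `distance`, `majority_vote`, and `knn_classify` and the toy data are hypothetical. Ties in the vote are broken by dropping the farthest neighbor and voting again, which is one common choice among several.

```python
import math
from collections import Counter
from typing import List, Tuple

Vector = List[float]


def distance(v: Vector, w: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w)))


def majority_vote(labels: List[str]) -> str:
    """Return the most common label.

    Assumes labels are ordered from nearest to farthest. On a tie,
    drop the farthest label and vote again (a single label always wins).
    """
    vote_counts = Counter(labels)
    winner, winner_count = vote_counts.most_common(1)[0]
    num_winners = sum(1 for count in vote_counts.values() if count == winner_count)
    if num_winners == 1:
        return winner
    return majority_vote(labels[:-1])


def knn_classify(k: int,
                 labeled_points: List[Tuple[Vector, str]],
                 new_point: Vector) -> str:
    """Label new_point by a majority vote among its k nearest labeled points."""
    by_distance = sorted(labeled_points,
                         key=lambda lp: distance(lp[0], new_point))
    k_nearest_labels = [label for _, label in by_distance[:k]]
    return majority_vote(k_nearest_labels)


if __name__ == "__main__":
    # Hypothetical toy data: two small clusters labeled "a" and "b"
    points = [([0.0, 0.0], "a"), ([0.1, 0.2], "a"),
              ([1.0, 1.0], "b"), ([0.9, 1.1], "b"), ([1.2, 0.8], "b")]
    print(knn_classify(3, points, [0.05, 0.1]))  # → a
    print(knn_classify(3, points, [1.0, 0.95]))  # → b
```

Sorting every point for each query keeps the code short but costs O(n log n) per prediction; library implementations typically use spatial indexes instead.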
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
Editorial Reviews
About the Author
Joel Grus is a software engineer at Google. Before that he worked as a data scientist at multiple startups. He lives in Seattle, where he regularly attends data science happy hours. He blogs infrequently at joelgrus.com.
Product details
- Publisher : O'Reilly Media; 1st edition (April 30, 2015)
- Language : English
- Paperback : 330 pages
- ISBN-10 : 149190142X
- ISBN-13 : 978-1491901427
- Item Weight : 1.32 pounds
- Dimensions : 7.01 x 0.69 x 9.17 inches
- Best Sellers Rank: #388,480 in Books
- #76 in Computer Programming Structured Design
- #129 in Enterprise Data Computing
- #245 in Data Modeling & Design (Books)
Customer reviews
Top reviews from the United States
+1 for relevant topics.
-1 for lack of real-life library usage.
-1 for lack of real-world data.
-1 NaN
^ 2/5 stars.
I don't normally leave reviews. I buy books, read them, and put them away. I decided to make an exception for this one.
Before I go any further, I'll allow that perhaps I misunderstood what this book was about. I am a data scientist who works in cybersecurity. I have been through a good number of data science boot camps, books, and online training courses. They all have one thing in common: for resources about "data," they all have pretty useless and unrealistic examples. This book is no exception. (I laughed at a part where the author literally says something similar to "real life data is usually very messy" and then continues not to use real data.)
I will start with the good parts of this book. It gives you a good crash course in Python and pretty much every critical data science concept. It is concise and filled with code examples written from scratch, with few or no libraries used (which is also a bad thing, as I will explain shortly). The flow of the book is well designed as well.
Now for the bad.
1. Pure Python is great, but it should be kept to a minimum. Sure, it gives you a good understanding of how to implement a concept in pure Python, but that is not the industry standard whatsoever. There should have been a healthy amount of real-world implementations to offset the typical college-classroom feel of the book. The pure Python approach also tends to add too much filler to the content. It would have been much better if it were "read resource X to see how to implement this in pure Python" and not "read book X to see how to use a real-world library to do this."
2. The data in this book is like the data everyone uses in their examples: completely useless. Randomly generated numbers, endless "coin flip" probability examples, typical artificial data that, I promise you, nobody analyzes on a daily basis. The book starts off with you role-playing as a new data scientist for a fictional social networking platform for data scientists. It was a very promising start, and I was eager to see how this "character" would deal with the data problems they would face... Spoiler: the book barely ever returns to it. Most examples are riddled with typical Statistics 101 material and randomly generated data. Yet again, another disappointment on that end.
Maybe I misunderstood what this book was about. I could be wrong. That being said, I am now afraid to touch another "Data Science" book or online resource, because I am thoroughly tired of reading about 300 ways to solve a problem with np.random-generated arrays, then turning to my screen of real-world data and literally looking like the Persian Room Cat Guardian.
I am not sure if I recommend this or not. It is a good book in the sense that you learn about what Data Science "contains" but definitely not how it applies to the real world.
+ Well written; I found the author funny, clear, and thorough.
- I wish it dove deeper into libraries.
- Some pages were blank and others had printing errors. Dozens of pages throughout the text are completely blank, and they were in the sections I was most interested in reading.
At first I was very worried about this book, based on the first few chapters, for one reason: the author was cracking jokes throughout the text, and I thought that if it kept up for the rest of the book I was going to be very upset. But that did not happen, and the humor turned out to be a very reasonable way to ease into this complicated subject.
The author steps through the data scientist's toolbox chapter by chapter, giving useful, insightful, clear pieces of code and textual explanations of each topic. So, for those new to data science, it gives just enough code and mathematical explanation to get the basic idea of a concept, and then moves on to the next topic.
It is often said that in writing, less is more, and this book gets things down to their essence. That is one of the great things about the book: each chapter is about 20 pages long (across 25 chapters), so each chapter can be read, and its code even exercised, in about an hour. Further, the references at the end of each chapter point the reader to expanded material at the level of one or more entire textbooks or references. Thus the book can be seen as boiling down a 25-volume set of highly technical subject matter into roughly 300 pages.
The topics that were explored best seem to be probability, working with data, regression, clustering, and databases (SQL). Some of the small but dense code samples were tough to follow, but that is due to the complexity of the algorithms, such as those for logistic regression and MapReduce. Occasionally the author uses a term that is not defined or in the index (such as "data munging," which I still haven't looked up). There are only a few typos, which indicates good editing. While the Python crash course was pretty good, Python is a vast language and that section could have been longer.
I read this book from cover to cover and logically stepped through all the code (but did not actually run any of it), and I would wholeheartedly recommend this book to anyone wanting to work in data science or its related fields, such as big data engineering or data analysis.
Top reviews from other countries
In terms of the level of maths: the book is about building your own algorithms rather than using ones from a library, so while you'll be able to get through it even if you are not that good at maths, to get the most out of it you have to be comfortable with probability/statistics and some calculus ideas, or be willing to put in the time to get there. Nothing is too intense, though. In terms of the level of programming, I would probably be considered a noob by people who consider themselves programmers, but not by people who consider themselves non-computer scientists, and the book was perfect for me.
* Excellent introduction to data structures throughout
* Useful practical considerations of the underlying algorithms
* Functional programming style, which is unusual in this genre
* Lively style by Joel