Data Science from Scratch: First Principles with Python 1st Edition
There is a newer edition of this item.
From the Publisher
|Data Science for Business|Data Science from Scratch|Doing Data Science|R for Data Science|Data Science at the Command Line|Python Data Science Handbook|
|What You Need to Know about Data Mining and Data-Analytic Thinking|First Principles with Python|Straight Talk from the Frontline|Visualize, Model, Transform, Tidy, and Import Data|Facing the Future with Time-Tested Tools|Tools and Techniques for Developers|
About the Author
Joel Grus is a software engineer at Google. Before that he worked as a data scientist at multiple startups. He lives in Seattle, where he regularly attends data science happy hours. He blogs infrequently at joelgrus.com.
- Publisher : O'Reilly Media; 1st edition (April 30, 2015)
- Language : English
- Paperback : 330 pages
- ISBN-10 : 149190142X
- ISBN-13 : 978-1491901427
- Item Weight : 1.32 pounds
- Dimensions : 7.01 x 0.69 x 9.17 inches
- Best Sellers Rank: #262,432 in Books
Top reviews from the United States
+1 for relevant topics.
-1 for lack of real life library usage.
-1 for lack of real world data.
^ 2/5 stars.
I don't normally leave reviews. I buy books, read them, and put them away. I decided to make an exception for this one.
Before I go any further, I'll give the benefit of the doubt that perhaps I misunderstood what this book was about. I am a data scientist who works in cyber security. I have been through a good number of data science boot camps, books, and online trainings. They all have one thing in common: for resources about "data," they all have pretty useless and unrealistic examples. This book is no exception. (I laughed at a part where the author literally says something like "real life data is usually very messy" and then continues to not use real data.)
I will start with the good parts of this book. It gives you a good crash course in Python and in pretty much every critical data science concept. It is concise and filled with code examples written from scratch using few or no libraries (which is also a bad thing, as I will explain shortly). The flow of the book is well designed too.
Now for the bad.
1. Pure Python is great, but it should be kept to a minimum. Sure, it gives you a good understanding of how to implement a concept in pure Python, but that is not the industry standard whatsoever. There should have been a healthy amount of real world implementations to offset the typical college classroom feel of the book. The pure Python code also tends to add too much filler to the content. It would have been much better if it was "read resource X to see how to implement this in pure Python" and not "read book X to see how to use a real world library to do this."
2. The data in this book is like all the data everyone uses in their examples: completely useless. Randomly generated numbers, endless usage of the "coin flip" probability examples, typical artificial data that, I promise you, nobody analyzes on a daily basis. The book starts off with you role playing as a new data scientist for a fictional social networking platform for data scientists, which was a very promising start, and I was eager to see how this "character" would deal with the data problems they would face... Spoiler: the book barely ever speaks about it. Most examples are riddled with typical Statistics 101 and randomly generated data. Yet again, another disappointment on that end.
Maybe I misunderstood what this book was about. I could be wrong. That being said, I am now afraid to touch another "Data Science" book or online resource, because I am sufficiently tired of reading about 300 ways to solve a problem with np.random generated arrays, then turning to my screen with real world data and literally looking like the Persian Room Cat Guardian.
I am not sure whether I recommend this or not. It is a good book in the sense that you learn what Data Science "contains," but definitely not how it applies to the real world.
+ well written; I found the author funny, clear, and thorough
- wish it dove deeper into libraries
- some pages were blank and other pages had printing errors. Dozens of pages throughout the text are completely blank, and they were in the sections I was most interested in reading.
At first I was very worried about this book, based on the first few chapters, because the author was cracking jokes throughout the text, and I thought that if it kept up for the rest of the book I was going to be very upset. But that did not happen, and it turned out to be a very reasonable way to ease into this complicated subject.
The author steps through the toolbox of the data scientist, chapter by chapter, giving useful, insightful, clear pieces of code and textual explanations of each topic. So, for those new to data science it gives just enough to get the basic idea of a concept in terms of code and mathematical explanation, and then moves on to the next topic.
It is often said that in writing, less is better and this book gets things down to their essence. That is one of the great things about the book - that the length of each chapter is about 20 pages (over 25 chapters). So each chapter can be read and the code even exercised in about an hour. Further, the references at the end of each chapter invite the reader to expanded information at the level of one or more entire textbooks or references. Thus the book can be seen as kind of boiling down a 25-volume set of highly technical subject matter into roughly 300 pages.
The topics that were explored best seem to be probability, working with data, regression, clustering, and databases (SQL). Some of the small but dense code samples were tough to follow, but that is due to their algorithmic complexity, such as those for logistic regression and MapReduce. Occasionally the author uses a term that is not defined or in the index (such as data munging, which I still haven't looked up to see what it means). There are only a small number of typos, which indicates good editing. While the Python crash course was pretty good, Python is a vast language and there could have been more to that section.
I read this book from cover to cover and logically stepped through all the code (but did not actually run any of it), and I would wholeheartedly recommend this book to anyone wanting to work in data science or its related fields, such as big data engineering or data analysis.
What is pretty underrated about this book is the very clever/clean (and challenging) way of constructing code and solving problems. Things can get very... nested. A very valuable resource that I come back to again and again.
Top reviews from other countries
In terms of level of maths, the book is about building your own algorithms rather than using ones in a library. So while you'll be able to get through the book if you are not that good at maths, to get the most out of it you have to be comfortable with probability/statistics and some calculus ideas, or willing to put the time in to get there, but nothing too intense. In terms of level of programming, I would probably be considered a noob by people who consider themselves programmers, but not by people who consider themselves non-computer scientists, and it was perfect for me.
* Excellent introduction to data structures throughout
* Useful practical considerations of the underlying algorithms
* Functional programming style which is unusual in this genre
* Lively style by Joel