- Paperback: 548 pages
- Publisher: O'Reilly Media; 1st edition (December 10, 2016)
- Language: English
- ISBN-10: 1491912057
- ISBN-13: 978-1491912058
- Product Dimensions: 7 x 1.2 x 10 inches
- Shipping Weight: 1.8 pounds
- Average Customer Review: 30 customer reviews
- Amazon Best Sellers Rank: #10,607 in Books
Python Data Science Handbook: Essential Tools for Working with Data 1st Edition
From the Publisher
- Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
- Data Science from Scratch: First Principles with Python
- Doing Data Science: Straight Talk from the Frontline
- R for Data Science: Visualize, Model, Transform, Tidy, and Import Data
- Data Science at the Command Line: Facing the Future with Time-Tested Tools
- Python Data Science Handbook: Tools and Techniques for Developers
About the Author
Jake VanderPlas is a long-time user and developer of the Python scientific stack. He currently works as an interdisciplinary research director at the University of Washington, conducts his own astronomy research, and spends time advising and consulting with local scientists from a wide range of fields.
Top customer reviews
Apart from that major oversight, the book is OK. If you want to learn data science, this is not for you; it doesn't get into the fundamentals much at all. If you are an experienced R user looking for how to translate your work into Python, this will get you started. The rest of my review comes from this perspective.
The book spends far too much time on low-level IPython, NumPy, and Matplotlib functionality (chapters 1, 2, and 4). You are rarely going to use this stuff.
The pandas section (chapter 3) is fine, but I was a little disappointed in the treatment of the grouping/aggregation functions. The book mentions the split-apply-combine paradigm of Hadley Wickham, but doesn't cover the topic in nearly as much detail as the paper of the same name. I was hoping to learn how to translate the dplyr verbs (group_by, filter, select, mutate, summarize, arrange) into pandas, but this book doesn't provide that. You will learn the basics of grouping and aggregation, but your code is going to be a lot more verbose than it was in R.
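For readers wondering what such a translation might look like, here is a rough sketch of one common way to map the dplyr verbs named above onto pandas method chaining. It is not taken from the book; the toy data, the column names, and the particular mapping (for example `query` for filter and `assign` for mutate) are assumptions made purely for illustration, and other idioms work as well.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "mass": [1.0, 3.0, 2.0, 4.0, 6.0],
    "length": [10, 30, 20, 40, 60],
})

result = (
    df
    .query("mass > 1.0")                               # filter
    .loc[:, ["species", "mass", "length"]]             # select
    .assign(ratio=lambda d: d["length"] / d["mass"])   # mutate
    .groupby("species", as_index=False)                # group_by
    .agg(                                              # summarize
        mean_mass=("mass", "mean"),
        mean_ratio=("ratio", "mean"),
        n=("mass", "size"),
    )
    .sort_values("mean_mass", ascending=False)         # arrange
    .reset_index(drop=True)
)

print(result)
```

The named-aggregation form of `agg` used here requires pandas 0.25 or later, which is newer than the book's first edition.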
The machine learning case studies in chapter 5 are pretty nice - probably the only reason I would recommend this book. The chapter provides a good overview of the scikit-learn API and effective patterns for machine learning problems.
There is no one book for data science, and this one is no exception. Just keep that in mind before buying it.
Other than that, I am really happy with my purchase.
P.S. For those complaining about black and white graphs and diagrams - check the author's GitHub.
Secondly, this book can't decide if it is a reference or a tutorial. The author gets a bit carried away showing too many features, and I often found myself nodding off or losing my concentration. With so many online references, it might be better to concentrate on being a tutorial and not try to show so many features. Or perhaps separate each chapter into a tutorial and then a reference. On the other hand, I realize that some readers might want this extra depth, so I'm just saying what I personally would have preferred. A related problem is that the material can quickly go out of date; I already found some options to be deprecated when running code.
Thirdly, I question some of the organization of the material. He often introduces some aspect, doesn't explain it properly, and then returns to it later on to explain it in more depth. An example is the scikit-learn Pipeline object: he starts using it, leaving me puzzled, and only later returns to explain it. This kind of issue was relatively common.
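For readers unfamiliar with the object mentioned above, here is a minimal sketch of a scikit-learn pipeline, which chains preprocessing steps and a final estimator into a single model. The synthetic data and the choice of steps are assumptions made for illustration and are not claimed to match the book's examples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noiseless quadratic data, invented for this illustration.
x = np.linspace(0, 1, 20)
y = 1 + 2 * x + 3 * x**2
X = x[:, np.newaxis]  # scikit-learn expects a 2D feature matrix

# The pipeline expands x into polynomial features, then fits a linear model.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
pred = model.predict(X)

# Each step is named automatically after its lowercased class name.
print([name for name, _ in model.steps])
```

Calling `fit` or `predict` on the pipeline runs every step in order, so the combined object can be used anywhere a single estimator is expected.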
In conclusion, this is a decent book and certainly not a bad one, but it is better suited to particular audiences. It would be good for those looking for a reference and relatively detailed information on a particular topic.
It is not so well suited to beginners, who I think would be confused or overwhelmed. People with some experience who are looking for more of a tutorial could be bored.
It is worth noting as well that there are many Python data science books, but nearly all of them are not very good. Relative to the other books, this is probably one of the best. In contrast, there are a number of excellent books that use R.