- Paperback: 548 pages
- Publisher: O'Reilly Media; 1st edition (December 10, 2016)
- Language: English
- ISBN-10: 1491912057
- ISBN-13: 978-1491912058
- Product Dimensions: 7 x 1 x 9.1 inches
- Shipping Weight: 1.8 pounds
- Average Customer Review: 26 customer reviews
- Amazon Best Sellers Rank: #25,180 in Books
Python Data Science Handbook: Essential Tools for Working with Data 1st Edition
From the Publisher
| Data Science for Business | Data Science from Scratch | Doing Data Science | R for Data Science | Data Science at the Command Line | Python Data Science Handbook |
|---|---|---|---|---|---|
| What You Need to Know about Data Mining and Data-Analytic Thinking | First Principles with Python | Straight Talk from the Frontline | Visualize, Model, Transform, Tidy, and Import Data | Facing the Future with Time-Tested Tools | Tools and Techniques for Developers |
About the Author
Jake VanderPlas is a long-time user and developer of the Python scientific stack. He currently works as an interdisciplinary research director at the University of Washington, conducts his own astronomy research, and spends time advising and consulting with local scientists from a wide range of fields.
Top customer reviews
Apart from that major oversight, the book is OK. If you want to learn data science, this is not for you; it doesn't get into the fundamentals much at all. If you are an experienced R user looking for how to translate your work into Python, this will get you started. The rest of my review comes from this perspective.
The book spends far too much time on low-level IPython, NumPy, and Matplotlib functionality (chapters 1, 2, and 4). You are rarely going to use this stuff.
The pandas section (chapter 3) is fine, but I was a little disappointed in the treatment of the grouping/aggregation functions. The book mentions the split-apply-combine paradigm of Hadley Wickham, but doesn't cover the topic in nearly as much detail as the paper of the same name. I was hoping to learn how to translate the dplyr verbs (group_by, filter, select, mutate, summarize, arrange) into pandas, but this book doesn't provide that. You will learn the basics of grouping and aggregation, but your code is going to be a lot more verbose than it was in R.
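For readers with the same question, here is a rough sketch of how the dplyr verbs listed above might map onto pandas method chaining. The data frame and its column names (`species`, `mass`, `length`) are made up for illustration, and the mapping is one common convention, not something taken from the book.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "mass": [1.0, 3.0, 2.0, 4.0, 6.0],
    "length": [10, 30, 20, 40, 60],
})

result = (
    df
    .query("mass > 1.0")                                    # filter(mass > 1)
    .loc[:, ["species", "mass", "length"]]                  # select(species, mass, length)
    .assign(ratio=lambda d: d["mass"] / d["length"])        # mutate(ratio = mass / length)
    .groupby("species", as_index=False)                     # group_by(species)
    .agg(mean_mass=("mass", "mean"), n=("mass", "count"))   # summarize(mean_mass = mean(mass), n = n())
    .sort_values("mean_mass", ascending=False)              # arrange(desc(mean_mass))
    .reset_index(drop=True)                                 # tidy up the row labels after sorting
)

print(result)
```

The chain reads top to bottom much like a dplyr pipeline, although, as noted above, it is still wordier than the R equivalent.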
The machine learning case studies in chapter 5 are pretty nice - probably the only reason I would recommend this book. The chapter provides a good overview of the scikit-learn API and effective patterns for machine learning problems.
I came to this book thinking it TAUGHT data science with Python, and I foolishly ignored that the title clearly has the word "HANDBOOK" in it. It is just a series of examples that show Python's data science capabilities for those who are already well versed in Python and data science. There is little to NO explanation of any of the concepts or examples; the book simply runs through example after example with little meaningful commentary on each. I am on page 64 and will most likely not continue working with this book. Do not get this book if you are a beginner Python user like me looking to take your skills to the next level; this book will not help with that. Also, there are NO exercises for you to work on by yourself to apply the concepts discussed throughout the book.
Secondly, this book can't decide if it is a reference or a tutorial. The author gets a bit carried away showing too many features, and I often found myself nodding off or losing my concentration. With so many online references available, it might be better to concentrate on being a tutorial and not try to show so many features, or perhaps to separate each chapter into a tutorial and then a reference. On the other hand, I realize that some readers might want this extra depth, so I'm just saying what I personally would have preferred. A related problem is that the material can quickly go out of date: I already found some options to be deprecated when running the code.
Thirdly, I question some of the organization of the material. He often introduces some aspect, doesn't explain it properly, and then returns to it later on to explain it in more depth. An example is the scikit-learn Pipeline object: he starts using it, leaving me puzzled, and only later returns to explain it. This kind of issue was relatively common.
In conclusion, this is a decent book and certainly not a bad one, but it is better suited to particular audiences. This book would be good for those looking for a reference and relatively detailed information on a particular topic.
It is not so well suited to beginners, who I think would be confused or overwhelmed. People with some experience who are looking for more of a tutorial could be bored.
It is worth noting as well that there are many Python data science books, but nearly all of them are not very good. Relative to the other books, this is probably one of the best. In contrast, there are a number of excellent books that use R.