Python Data Science Handbook: Essential Tools for Working with Data 1st Edition
Jake VanderPlas (Author)
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.
With this handbook, you’ll learn how to use:
- IPython and Jupyter: provide computational environments for data scientists using Python
- NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
- Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
- Matplotlib: includes capabilities for a flexible range of data visualizations in Python
- Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
Product details
- Publisher : O'Reilly Media; 1st edition (January 3, 2017)
- Language : English
- Paperback : 546 pages
- ISBN-10 : 1491912057
- ISBN-13 : 978-1491912058
- Item Weight : 2.03 pounds
- Dimensions : 7 x 1.11 x 9.19 inches
- Best Sellers Rank: #19,831 in Books
- #8 in Scientific Research
- #10 in Data Processing
- #10 in Data Modeling & Design (Books)
Customer reviews
Reviewed in the United States on September 6, 2021
Top reviews from the United States
Apart from that major oversight, the book is OK. If you want to learn data science, this is not for you; it doesn't get into the fundamentals much at all. If you are an experienced R user looking to translate your workflow into Python, this will get you started. The rest of my review comes from this perspective.
The book spends far too much time on low-level IPython, NumPy, and Matplotlib functionality (chapters 1, 2, and 4). You are rarely going to use this stuff.
The pandas section (chapter 3) is fine, but I was a little disappointed in the treatment of the grouping/aggregation functions. The book mentions the split-apply-combine paradigm of Hadley Wickham, but doesn't cover the topic in nearly as much detail as the paper of the same name. I was hoping to learn how to translate the dplyr verbs (group_by, filter, select, mutate, summarize, arrange) into pandas, but this book doesn't provide that. You will learn the basics of grouping and aggregation, but your code is going to be a lot more verbose than it was in R.
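For readers coming from the same R background, here is a rough sketch of how those dplyr verbs are commonly mapped onto pandas method chaining. It is not taken from the book; the data frame, its column names, and the `summary` variable are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for this example only.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "mass": [1.0, 3.0, 2.0, 4.0, 6.0],
    "length": [10.0, 30.0, 20.0, 40.0, 60.0],
})

summary = (
    df
    .query("mass > 1")                                   # filter(mass > 1)
    .loc[:, ["species", "mass", "length"]]               # select(species, mass, length)
    .assign(density=lambda d: d["mass"] / d["length"])   # mutate(density = mass / length)
    .groupby("species", as_index=False)                  # group_by(species)
    .agg(                                                # summarize(...)
        mean_mass=("mass", "mean"),
        mean_density=("density", "mean"),
        n=("mass", "count"),
    )
    .sort_values("mean_mass", ascending=False)           # arrange(desc(mean_mass))
    .reset_index(drop=True)
)

print(summary)
```

The chain reads top to bottom much like a dplyr pipeline, although, as noted above, it is wordier than the R equivalent.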
The machine learning case studies in chapter 5 are pretty nice - probably the only reason I would recommend this book. The chapter provides a good overview of the scikit-learn API and effective patterns for machine learning problems.
There is no one book for data science, and this one is no exception. Just keep that in mind before buying it.
Other than that, I am really happy with my purchase.
P.S. For those complaining about black and white graphs and diagrams - check the author's GitHub.
I have used it extensively for the intro to ML course at Berkeley, and for now the book belongs on my short list of desk reference books.
Top reviews from other countries
That said, some pieces are inconsistent with each other, and the order of the material may not be the best for a novice programmer. If you need to learn how to program, this isn't the book. If you are a good programmer and want to understand the ML ecosystem in Python, then get this.
The book provides a well-written introduction to the history of different functions and Python-related projects, and to the main aspects of lists, arrays and visualizations with Pandas, NumPy, Matplotlib and Seaborn. DataFrames are covered only superficially, and while plotting from arrays is covered extensively, plotting from DataFrames, deriving data from DataFrames, etc. is covered only vaguely.
I find it quite annoying that the book NEVER formally states the syntax of a specific command or variable and NEVER gives a full list of available arguments for a given command, function or variable. Therefore the book's utility as a reference is just plainly ZERO.
Furthermore, there is NO MENTION, simply NOTHING, as if the topic did not exist, of anything related to INTERACTIVE visualisation functions.
If you are looking for:
- mouse hover functions
- hoverlabel
- hoverlabel layout
- hoverlabel data
- using hovertemplate
- hovertemplate arguments
- spike lines
- shared spike lines
- mouse click events
- stacked dynamic subplots
- shared axes for stacked subplots
>> it's not there, no mention, NOTHING.
Further topics that I found covered much more coherently on Stackoverflow.com:
- How to read different types of data files into lists, arrays, dataframes, tuples of dataframes
- How to adjust decimal precision in arrays, dataframes, plots
- How to adjust decimal precision in hoverlabel
- How to create multiple dataframes from a loop
- How to plot multiple subplots from a loop
- How to plot a scatter plot/line graph and data tables in one grid
- How to select data points in plots with mouse events
>> Therefore, I did not find the book very helpful. With regard to its publication date of 2016, it's just outdated.