- Paperback: 544 pages
- Publisher: O'Reilly Media; 2nd edition (October 20, 2017)
- Language: English
- ISBN-10: 1491957662
- ISBN-13: 978-1491957660
- Product Dimensions: 7 x 1.1 x 9.2 inches
- Shipping Weight: 1.8 pounds
- Average Customer Review: 17 customer reviews
- Amazon Best Sellers Rank: #4,555 in Books
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 2nd Edition
From the Publisher
What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. My goal is to offer a guide to the parts of the Python programming language and its data-oriented library ecosystem and tools that will equip you to become an effective data analyst. While 'data analysis' is in the title of the book, the focus is specifically on Python programming, libraries, and tools as opposed to data analysis methodology. This is the Python programming you need for data analysis.
New for the Second Edition
The first edition of this book was published in 2012, at a time when open source data analysis libraries for Python (such as pandas) were very new and developing rapidly. In this updated and expanded second edition, I have overhauled the chapters to account for incompatible changes, deprecations, and new features introduced over the last five years. I’ve also added fresh content to introduce tools that either did not exist in 2012 or had not matured enough to make the first cut. Finally, I have tried to avoid writing about new or cutting-edge open source projects that may not have had a chance to mature. I would like readers of this edition to find that the content is still almost as relevant in 2020 or 2021 as it is in 2017.
The major updates in this second edition include:
- All code, including the Python tutorial, updated for Python 3.6 (the first edition used Python 2.7)
- Updated Python installation instructions for the Anaconda Python Distribution and other needed Python packages
- Updates for the latest versions of the pandas library in 2017
- A new chapter on some more advanced pandas tools, and some other usage tips
- A brief introduction to using statsmodels and scikit-learn
- I also reorganized a significant portion of the content from the first edition to make the book more accessible to newcomers.
About the Author
Wes McKinney is a New York-based software developer and entrepreneur. After finishing his undergraduate degree in mathematics at MIT in 2007, he went on to do quantitative finance work at AQR Capital Management in Greenwich, CT. Frustrated by cumbersome data analysis tools, he learned Python and started building what would later become the pandas project. He's now an active member of the Python data community and is an advocate for the use of Python in data analysis, finance, and statistical computing applications.
Wes was later the co-founder and CEO of DataPad, whose technology assets and team were acquired by Cloudera in 2014. He has since become involved in big data technology, joining the Project Management Committees for the Apache Arrow and Apache Parquet projects in the Apache Software Foundation. In 2016, he joined Two Sigma Investments in New York City, where he continues working to make data analysis faster and easier through open source software.
Top customer reviews
Once you get past the introductory point and you need to get into data analysis, this book is just what the doctor ordered, or at least it offers plenty of useful information and insight into tools that you would use for your own data analysis.
I'm lucky enough to be working with the second edition of this book, which means the author has taken the opportunity to update it in many places compared to the original edition. This gives us all the benefit of the lessons learned and the feedback the author received, and it also gives us insight into the latest libraries and tools available when the book was updated.
It is a massive revision of the 1st Edition, practically a rewrite. The emphasis is on Python 3.6 rather than 2.7, which the earlier edition used.
The book presumes operating knowledge of Python and some of its libraries.
The author describes his effort as “an introductory text in working with [primarily structured] data in Python”, that is, data that could be expressed in a spreadsheet (though the author’s description is substantially broader than my summary).
The treatment is detailed, but well-written and well-illustrated. The level of detail can fairly be described as staggering. The book does not promise to teach you data analysis itself, which is an almost limitless subject, but rather the tools with which you can analyze data.
Sample datasets are available for download to perform the exercises.
If you’re new to data analysis, I would suggest at least skimming the entire book to get an overview of what the libraries can do and then homing in on specific libraries.
Overall, I think this is the go-to resource for learning how to use Python’s data wrangling libraries.
Well written and generally doesn’t get into minutiae. Very useful.