Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 1st Edition

ISBN-13: 978-1449319793
ISBN-10: 1449319793

Frequently Bought Together

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython + Learning Python, 5th Edition

Editorial Reviews

Book Description

Data Wrangling with Pandas, NumPy, and IPython

About the Author

Wes McKinney is the main author of pandas, the popular open source Python library for data analysis. Wes is an active speaker and participant in the Python and open source communities. He worked as a quantitative analyst at AQR Capital Management and as a Python consultant before founding DataPad, a data analytics company, in 2013. He graduated from MIT with an S.B. in Mathematics.

Product Details

  • Paperback: 466 pages
  • Publisher: O'Reilly Media; 1 edition (November 1, 2012)
  • Language: English
  • Product Dimensions: 7 x 0.9 x 9.2 inches
  • Shipping Weight: 1.6 pounds
  • Average Customer Review: 4.2 out of 5 stars (92 customer reviews)
  • Amazon Best Sellers Rank: #4,328 in Books

From the Publisher

What Is This Book About?

This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.

When I say 'data', what am I referring to exactly? The primary focus is on structured data, a deliberately vague term that encompasses many different common forms of data, such as:

  • Multidimensional arrays (matrices).
  • Tabular or spreadsheet-like data in which each column may be a different type (string, numeric, date, or otherwise). This includes most kinds of data commonly stored in relational databases or tab- or comma-delimited text files.
  • Multiple tables of data interrelated by key columns (what would be primary or foreign keys for a SQL user).
  • Evenly or unevenly spaced time series.

This is by no means a complete list. Even though it may not always be obvious, a large percentage of data sets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a data set into a structured form.

As an example, a collection of news articles could be processed into a word frequency table which could then be used to perform sentiment analysis. Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data.
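To make the forms of structured data listed above a little more concrete, here is a minimal sketch of how each might look in NumPy and pandas. All sample values are invented for illustration, and the word-frequency step (turning free text into a table) uses `collections.Counter`, which is just one simple option and not necessarily the approach the book takes.

```python
import numpy as np
import pandas as pd
from collections import Counter

# 1. A multidimensional array: a 2 x 3 matrix
matrix = np.arange(6).reshape(2, 3)

# 2. Tabular data in which each column has its own type (string, integer, date)
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [34, 28, 45],
    "joined": pd.to_datetime(["2011-03-01", "2012-07-15", "2010-11-30"]),
})

# 3. Two tables related through a key column, combined much like a SQL join
orders = pd.DataFrame({"name": ["Ann", "Ann", "Cy"], "amount": [20.0, 35.5, 12.0]})
joined = pd.merge(people, orders, on="name", how="inner")

# 4. An evenly spaced (daily) time series
ts = pd.Series([1.0, 2.5, 3.0, 2.0],
               index=pd.date_range("2012-11-01", periods=4, freq="D"))

# 5. Unstructured text extracted into a structured word-frequency table
articles = ["Markets rally as rates fall", "Rates rise and markets fall"]
counts = Counter(word.lower() for text in articles for word in text.split())
word_freq = pd.Series(counts).sort_values(ascending=False)
```

Because Bob has no orders, the inner join keeps only the three rows that have a matching key; passing `how="left"` instead would keep Bob with a missing `amount`.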

Customer Reviews

Most Helpful Customer Reviews

86 of 92 people found the following review helpful By Jason Wirth on October 29, 2012
Format: Paperback
Python For Data Analysis is a book about tools. Python is an excellent general-purpose language that has developed some niche applications, science being one of them thanks to some excellent libraries such as NumPy, SciPy, IPython, Matplotlib, and, increasingly, Pandas (which Wes created). Collectively, these tools form the basis of the "scientific computing stack" and are used by anyone who gets their hands dirty with data.

To steal from the book, Wes states, "This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is NOT (author's emphasis) an exposition on analytical methods using Python as the implementation language."

This is a book for any level of professional, researcher, or academic working with data. You could be a beginner who wants to get started, a professional coming from a discipline rooted in another language like Matlab, or even someone seasoned in data manipulation with Python who wants to get more work done in less time and with greater ease.

While Pandas is the main focus of the book, sections dedicated to IPython (a shell for interactive execution) and NumPy (Matlab-like vectorized arrays) mean there is something for everyone. For example, you might already use IPython, but not to its fullest potential; Wes shows how to be more efficient using the interactive debugger.
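"Matlab-like vectorized arrays" refers to NumPy's style of applying an operation to a whole array at once instead of writing an explicit Python loop. A small sketch with made-up price data, comparing the two styles for computing simple period-over-period returns:

```python
import numpy as np

prices = np.array([10.0, 10.5, 10.2, 11.0, 10.8])

# Loop version: compute each simple return one element at a time
loop_returns = []
for i in range(1, len(prices)):
    loop_returns.append(prices[i] / prices[i - 1] - 1)

# Vectorized version: the same computation expressed on whole array slices,
# with the element-by-element work done inside NumPy's compiled code
vec_returns = prices[1:] / prices[:-1] - 1

# Element-wise arithmetic and reductions also operate on the entire array
scaled = prices * 2
mean_price = prices.mean()
```

Both versions produce the same four returns; the vectorized form is shorter and, on large arrays, typically much faster.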

Amazon limits its ratings to five stars, but if I gave a star for every time I learned something new that made my analysis easier, this book would be off the charts!
146 of 170 people found the following review helpful By R. Friesel Jr. on October 22, 2012
Format: Kindle Edition
Wes McKinney's "Python for Data Analysis" (O'Reilly, 2012) is a tour of pandas and NumPy (mostly pandas) for folks looking to crunch "big-ish" data with Python. The target audience is not Pythonistas, but rather scientists, educators, statisticians, financial analysts, and the rest of the "non-programmer" cohort that is finding more and more these days that it needs to do a little bit-sifting to get the rest of their jobs done.

First, two warnings:

1. **This book is not an introduction to Python.** While McKinney does not assume that you know *any* Python, he isn't exactly going to hold your hand on the language here. There is an appendix ("Python Language Essentials") that beginners will want to read before getting too far, but otherwise you're on your own. ("Lucky for you Python is executable pseudocode"?)

2. **This book is not about theories of data analysis.** What I mean by that is: if you're looking for a book that is going to tell you the *types* of analyses to do, this is not that book. McKinney assumes that you already know, through your "actual" training, what kinds of analyses you need to perform on your data, and how to go about the computations necessary for those analyses.

That being said, McKinney is the principal author of pandas, a Python package for data transformation and statistical analysis. The book is largely about pandas (and NumPy), offering overviews of the utilities in these packages and concrete examples of how to employ them to great effect. In examining these libraries, McKinney also delves into general methodologies for munging data and performing analytical operations on it (e.g., normalizing messy data and turning it into graphs and tables).
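As a rough illustration of the "normalizing messy data and turning it into tables" workflow mentioned here, the following is a minimal pandas sketch. The data and the cleaning rules (trimming, lowercasing, mapping the alias "nyc" to "new york", filling missing sales with zero) are invented assumptions; real data needs rules chosen for its own quirks.

```python
import numpy as np
import pandas as pd

# Hypothetical messy input: inconsistent labels, stray whitespace, a missing value
raw = pd.DataFrame({
    "city": [" Boston", "boston", "NYC ", "New York", "nyc"],
    "year": [2011, 2012, 2011, 2012, 2012],
    "sales": [100.0, 120.0, np.nan, 250.0, 90.0],
})

clean = raw.copy()
# Normalize labels: trim whitespace, lowercase, then map an alias to one canonical name
clean["city"] = clean["city"].str.strip().str.lower().replace({"nyc": "new york"})

# Choose a policy for missing values; here they are treated as zero sales
clean["sales"] = clean["sales"].fillna(0.0)

# Reshape the tidy records into a summary table: cities as rows, years as columns
table = clean.pivot_table(index="city", columns="year", values="sales", aggfunc="sum")
```

With matplotlib installed, `table.plot(kind="bar")` would turn the same summary into a chart.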
94 of 109 people found the following review helpful By Richard C. Yeh on March 29, 2013
Format: Paperback Verified Purchase
I think this book is genuinely trying to be helpful, by giving an extended tutorial on the pandas library; but the tutorial covers only selected topics, and needs to be supplemented with a comprehensive function reference. The narrative also needs to be cut with the help of a strict editor.

If you are trying to decide whether to learn to use the pandas library, this book is for you. It starts with an example of how Python and the pandas library can make it easy to do some basic analyses of data, and then develops more specialized chapters: summary statistics, data storage, data transformation (merging and joining), plotting, aggregation, time series, special considerations for financial or economic data, and advanced special topics.
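For a taste of three of the topics in that list (summary statistics, aggregation, and time series), here is a short pandas sketch. It is an illustrative example with invented sensor readings, not code taken from the book.

```python
import pandas as pd

# Hypothetical daily readings from two sensors over six days
idx = pd.date_range("2013-01-01", periods=6, freq="D")
df = pd.DataFrame({
    "sensor": ["a", "b", "a", "b", "a", "b"],
    "value": [1.0, 4.0, 2.0, 5.0, 3.0, 6.0],
}, index=idx)

# Summary statistics: count, mean, std, min, quartiles, and max of a column
stats = df["value"].describe()

# Aggregation: split the rows by sensor, then compute several statistics per group
per_sensor = df.groupby("sensor")["value"].agg(["mean", "max"])

# Time series: downsample the daily values into 3-day totals
three_day = df["value"].resample("3D").sum()
```

Here `per_sensor` has one row per sensor, and `three_day` has one row per 3-day bin starting at 2013-01-01.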

Once I decided to use the pandas library, the book suddenly became less useful. The author has a verbose pedagogical style, and the book never departs from its tutorial perspective. Functions are introduced with examples but no definitions, and it's hard to find the rare summaries of functions, function arguments, or discussion suggesting when to use one method instead of another.

If you want to do something very close to what's done in an example, it's easy to follow along. Once you want to do something not emphasized or covered by an example, there is no guidance, and no reference or dictionary section to hint at where to search next; Google will probably direct you to stackoverflow.com or the official pandas documentation site.

For example, suppose you have loaded your data into a DataFrame, and you want to use another column as the index. The book has several pages on the useful reindex() method, but that method is for resampling the data.
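For the task described here, pandas provides `DataFrame.set_index()`, which promotes an existing column to the index, whereas `reindex()` conforms a frame to a supplied list of labels. A minimal sketch with made-up data showing the difference:

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["x", "y", "z"],
    "value": [1.0, 2.0, 3.0],
})

# Promote an existing column to the index (by default the column is dropped from the data)
by_key = df.set_index("key")

# Rows can now be looked up by label through the new index
y_value = by_key.loc["y", "value"]

# reindex() does something different: it reorders rows to match the given labels
# and inserts NaN for any label not already present
conformed = by_key.reindex(["z", "x", "w"])

# reset_index() reverses set_index(), turning the index back into a column
restored = by_key.reset_index()
```

Passing `drop=False` to `set_index()` keeps a copy of the column alongside the new index.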