- Paperback: 466 pages
- Publisher: O'Reilly Media; 1 edition (November 1, 2012)
- Language: English
- ISBN-10: 1449319793
- ISBN-13: 978-1449319793
- Product Dimensions: 7 x 0.9 x 9.2 inches
- Shipping Weight: 1.8 pounds
- Average Customer Review: 157 customer reviews
- Amazon Best Sellers Rank: #49,643 in Books
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 1st Edition
There is a newer edition of this item.
From the Publisher
This is by no means a complete list. Even though it may not always be obvious, a large percentage of data sets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a data set into a structured form.
As an example, a collection of news articles could be processed into a word frequency table which could then be used to perform sentiment analysis. Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data.
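The news-article example can be sketched in plain Python. This is a minimal sketch, not the book's code: it assumes a deliberately simple tokenizer (lowercase, then keep runs of letters and apostrophes), and the function name `word_frequency_table` and the two sample sentences are hypothetical. It builds only the structured table (one row per article, one column per word); the sentiment-analysis step that could consume it is not shown.

```python
import re
from collections import Counter


def word_frequency_table(articles):
    """Hypothetical helper: turn raw text documents into (vocabulary, rows).

    Each row holds the count of every vocabulary word in one article,
    so the result has one row per article and one column per word.
    """
    # Assumed tokenizer: lowercase, then keep runs of letters/apostrophes
    counts = [Counter(re.findall(r"[a-z']+", text.lower())) for text in articles]
    # Sorted union of every word seen anywhere in the collection
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for missing keys, so absent words become 0 cells
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


articles = [
    "Markets rallied as stocks rose.",
    "Stocks fell; markets were gloomy.",
]
vocab, table = word_frequency_table(articles)
print(vocab)
for row in table:
    print(row)
```

The same rows could be loaded into a pandas `DataFrame` with `vocab` as the column labels, which is the spreadsheet-like form the excerpt describes.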
What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.
When I say 'data', what am I referring to exactly? The primary focus is on structured data, a deliberately vague term that encompasses many different common forms of data, such as:
- Multidimensional arrays (matrices).
- Tabular or spreadsheet-like data in which each column may be a different type (string, numeric, date, or otherwise). This includes most kinds of data commonly stored in relational databases or tab- or comma-delimited text files.
- Multiple tables of data interrelated by key columns (what would be primary or foreign keys for a SQL user).
- Evenly or unevenly spaced time series.
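For readers who want to see these forms concretely, here is a small sketch of how each one might be represented with NumPy and pandas. The tables, column names (`order_id`, `customer_id`, and so on), and values are invented for illustration; only the library calls (`np.arange`, `pd.DataFrame`, `DataFrame.merge`, `pd.date_range`, `pd.to_datetime`) are real APIs.

```python
import numpy as np
import pandas as pd

# 1. Multidimensional array (matrix): a 2 x 3 NumPy array
matrix = np.arange(6).reshape(2, 3)

# 2. Tabular data where each column has its own type (int, float, date)
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 20, 10],
    "amount": [9.99, 25.00, 4.50],
    "date": pd.to_datetime(["2012-11-01", "2012-11-02", "2012-11-05"]),
})

# 3. A second table related to the first through a key column:
#    customer_id is the primary key here and a foreign key in orders
customers = pd.DataFrame({
    "customer_id": [10, 20],
    "name": ["Ann", "Bob"],
})
# A left join keeps every order and attaches the matching customer name
joined = orders.merge(customers, on="customer_id", how="left")

# 4a. Evenly spaced time series: one value per calendar day
daily = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2012-11-01", periods=3, freq="D"),
)
# 4b. Unevenly spaced time series: order amounts indexed by order date
uneven = orders.set_index("date")["amount"]

print(matrix.shape)
print(orders.dtypes)
print(joined[["order_id", "name"]])
print(uneven)
```

The join in step 3 mirrors what a SQL user would write as a `LEFT JOIN ... ON customer_id`.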
About the Author
Wes McKinney is the main author of pandas, the popular open source Python library for data analysis. Wes is an active speaker and participant in the Python and open source communities. He worked as a quantitative analyst at AQR Capital Management and as a Python consultant before founding DataPad, a data analytics company, in 2013. He graduated from MIT with an S.B. in Mathematics.
Top customer reviews
In particular, see the sections Tutorials, Intro to Data Structures (Series and DataFrame), and Essential Basic Functionality.
The remaining quarter of the book is a very useful, concentrated introduction to NumPy, along with the Advanced NumPy chapter and a Python Essentials reference. The book does not cover the more recent ability to call R functions from Python. In my opinion, R is winning the R vs. pandas argument thanks to ggplot2 and to statistical learning professors publishing their code in R first. Since R is now easy to use from within Python, pandas might not get as much use. Still, it's useful to know pandas as part of a data analyst's toolkit.
I also want to warn buyers about faint printing in several physical copies of this book. I bought it from Amazon AND directly from O'Reilly Media in an attempt to get a physical copy with good, solid printing on all pages. This was not possible. The copy from O'Reilly had even fainter printing than the one I got from Amazon. Better to save your money and just get the eBook version if you are OK with that; you can usually find it cheaper online. O'Reilly puts on excellent conferences, but may be getting out of the printed book business. I guess most programmers buy eBooks now. I just find eBooks difficult to deal with when it comes to dense, technical books. I am fine with eBooks for fiction or more narrative non-fiction such as economics, popular science, or history.
Someone just buying the book now should be aware that the book is a bit old at this point, so it may not completely reflect the most current versions of the libraries covered and it doesn't cover some of the more recently developed libraries that people might be interested in. For instance, there's coverage of matplotlib but no coverage of seaborn or some of the other visualization libraries that people have begun to use more recently.