Flip to back Flip to front
Listen Playing... Paused   You're listening to a sample of the Audible audio edition.
Learn more
See all 2 images
Have one to sell? Sell on Amazon

Data Analysis with Open Source Tools Paperback – November 25, 2010

ISBN-13: 978-0596802356 ISBN-10: 0596802358 Edition: 1st
$15.63 - $15.64
Buy used
Buy new
Used & new from other sellers Delivery options vary per offer
74 used & new from $8.99
Rent from Amazon Price New from Used from
"Please retry"
Paperback, November 25, 2010
"Please retry"
$18.00 $8.99
Free Two-Day Shipping for College Students with Amazon Student Free%20Two-Day%20Shipping%20for%20College%20Students%20with%20Amazon%20Student

Hero Quick Promo
Save up to 90% on Textbooks
Rent textbooks, buy textbooks, or get up to 80% back when you sell us your books. Shop Now
$26.69 FREE Shipping on orders over $35. In Stock. Ships from and sold by Amazon.com. Gift-wrap available.

Frequently Bought Together

Data Analysis with Open Source Tools + Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython + Doing Data Science: Straight Talk from the Frontline
Price for all three: $84.19

Buy the selected items together

Editorial Reviews

Book Description

A hands-on guide for programmers and data scientists

About the Author

After previous careers in physics and softwaredevelopment, Philipp K. Janert currentlyprovides consulting services for data analysis,algorithm development, and mathematical modeling.He has worked for small start-ups and in largecorporate environments, both in the U.S. andoverseas. He prefers simple solutions that workto complicated ones that don't, and thinks thatpurpose is more important than process. Philippis the author of "Gnuplot in Action - UnderstandingData with Graphs" (Manning Publications), and haswritten for the O'Reilly Network, IBM developerWorks,and IEEE Software. He is named inventor on a handfulof patents, and is an occasional contributor to CPAN.He holds a Ph.D. in theoretical physics from theUniversity of Washington. Visit his company websiteat www.principal-value.com.


Shop the new tech.book(store)
New! Introducing the tech.book(store), a hub for Software Developers and Architects, Networking Administrators, TPMs, and other technology professionals to find highly-rated and highly-relevant career resources. Shop books on programming and big data, or read this week's blog posts by authors and thought-leaders in the tech industry. > Shop now

Product Details

  • Paperback: 540 pages
  • Publisher: O'Reilly Media; 1 edition (November 28, 2010)
  • Language: English
  • ISBN-10: 0596802358
  • ISBN-13: 978-0596802356
  • Product Dimensions: 7 x 1.4 x 9.2 inches
  • Shipping Weight: 2.2 pounds (View shipping rates and policies)
  • Average Customer Review: 4.2 out of 5 stars  See all reviews (42 customer reviews)
  • Amazon Best Sellers Rank: #160,953 in Books (See Top 100 in Books)

More About the Author

Discover books, learn about writers, read author blogs, and more.

Customer Reviews

4.2 out of 5 stars

Most Helpful Customer Reviews

41 of 41 people found the following review helpful By Code Monkey on April 16, 2011
Format: Paperback
This book covers such a wide range of topics that it necessarily skims over all of them but it always hits all the major points that an introductory survey should. Each chapter has a straight forward tone, strikes the right balance between developing mathematical rigor and developing an intuitive understanding of data , and undeniably passes on the lessons of hard earned, real world experience. But a reader who is actually working on a real data problem will almost certainly come to the realization that the understanding gained is somewhat superficial - that it's going to take a lot more heavy reading (probably of books, papers, and software tools recommended in this book) to get any real work done!

The single biggest problem with this book is its misleading title. This book is not going to teach you how to use open source software to analyze data. There is only minimal information about how one would actually use the software tools being discussed. What you get is a brief commentary about what the author thinks each software package is good for. It's the same story as with the mathematical details: you will not find them here, but this book will give you an excellent idea of what to look for. So in the end it does leave you feeling just a little bit cheated, even though all the advice you got seems extremely well informed.

What this book does astonishingly well is communicate an attitude to data analysis that most textbooks (and nearly all the college courses I took) seem to miss. Nearly every chapter is a stream of stunningly insightful observations on how to approach data, without the mathematical detail that overwhelms most practicing programmers.
Read more ›
Comment Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback. If this review is inappropriate, please let us know.
Sorry, we failed to record your vote. Please try again
209 of 231 people found the following review helpful By J. Felipe Ortega Soto on February 7, 2011
Format: Paperback
This book is aimed at offering a practical, hands-on introduction to data analysis for pragmatic readers without strong scientific or statistical background. Some basic programming experience is required. The author provides many personal (and sometimes useful) comments about different tools and procedures in data analysis.

However, a careful reading reveals many problems, specially an obscure presentation of key concepts. In my opinion, the target audience for this book would be people without previous contact with data analysis. Hence the importance of presenting its core elements correctly. Otherwise, it's useless for them.

In particular:

- Few pages are actually dedicated to present open source tools supporting the different graphs and techniques included in the book. From the title, I expected a more complete tour through available open source tools for data analysis.

- No clues about how to obtain most of the graphs and results presented in the book. No related data sets are available for download, either. A book like this is useless if we cannot learn how to replicate all the examples.

- The formula of the variance for a sample is just wrong. One must divide by n-1 and not n; see "Applied Statistics and Probability for Engineers" (Montgomery and Runger 2006).

- The author presents one of the most obscure explanations for the median I've ever come across. Recurring to an RFC (RFC 2330) to explain such a simple concept is really awkward.

- In chapter 3 and Appendix B, natural logarithms (base e) are presented in the text, while graphs plot powers of 10. Definitely, not the right way to transmit correct concepts and methods.
Read more ›
11 Comments Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback. If this review is inappropriate, please let us know.
Sorry, we failed to record your vote. Please try again
48 of 51 people found the following review helpful By Jack Sparrow on January 27, 2011
Format: Paperback
The book is very good for the intermediate-to-advanced data analysts. Beginners beware: there are some important prerequisites that are not obvious before you buy it, and there are some organization problems.

First, the prerequisites. "I strongly recommend that you make it a habit to avoid all statistical language"..."Once we start talking about standard deviations, the clarity is gone." These are two sentences in the same passage from the Preface. The rest of that passage is similar. However, even the first chapters make heavy use of statistical language. Moreover, they assume that you already know statistics to the level of density estimation, noise, splines, and regression. Page 21 even features a footnote about the Fourier transform and Fourier convolution theorem. Clearly this book is not for the statistically-shy or for mathematically-shy in general, no matter what the Preface suggests. You also need to know Python and R.

Second, the chapter organization problems. There's a mismatch between the first part of each chapter, which introduces concepts and techniques, and the Workshop part of the same chapter, which uses software. I was expecting the Workshop to illustrate the implementation of the same concepts and techniques. It's not really so. The Workshop introduces Python and R facilities at a different (lower) speed than the rest of the chapter. One could even wonder why the Workshop is in the same chapter. I'd rather that each chapter consisted of a few detailed case studies that first introduce concepts and techniques and then illustrate them with software libraries.
1 Comment Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback. If this review is inappropriate, please let us know.
Sorry, we failed to record your vote. Please try again

Most Recent Customer Reviews

Set up an Amazon Giveaway

Amazon Giveaway allows you to run promotional giveaways in order to create buzz, reward your audience, and attract new followers and customers. Learn more
Data Analysis with Open Source Tools
This item: Data Analysis with Open Source Tools
Price: $26.69
Ships from and sold by Amazon.com

Want to discover more products? Check out these pages to see more: datum tools, beginning python visualization: crafting visual transformation