Data Analysis with Open Source Tools 1st Edition

4.2 out of 5 stars (42 customer reviews)
ISBN-13: 978-0596802356
ISBN-10: 0596802358

Frequently Bought Together

  • Data Analysis with Open Source Tools
  • Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
  • Data Science from Scratch: First Principles with Python

Editorial Reviews

Book Description

A hands-on guide for programmers and data scientists

About the Author

After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don't, and thinks that purpose is more important than process. Philipp is the author of "Gnuplot in Action - Understanding Data with Graphs" (Manning Publications), and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He is named inventor on a handful of patents, and is an occasional contributor to CPAN. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company website at www.principal-value.com.

Product Details

  • Paperback: 540 pages
  • Publisher: O'Reilly Media; 1 edition (November 28, 2010)
  • Language: English
  • ISBN-10: 0596802358
  • ISBN-13: 978-0596802356
  • Product Dimensions: 7 x 1.4 x 9.2 inches
  • Shipping Weight: 2.2 pounds
  • Average Customer Review: 4.2 out of 5 stars (42 customer reviews)
  • Amazon Best Sellers Rank: #217,770 in Books

Customer Reviews

Top Customer Reviews

By J. Felipe Ortega Soto on February 7, 2011
Format: Paperback
This book aims to offer a practical, hands-on introduction to data analysis for pragmatic readers without a strong scientific or statistical background. Some basic programming experience is required. The author provides many personal (and sometimes useful) comments about different tools and procedures in data analysis.

However, a careful reading reveals many problems, especially an obscure presentation of key concepts. In my opinion, the target audience for this book is people with no previous exposure to data analysis, which makes presenting its core elements correctly all the more important. Otherwise, the book is useless to them.

In particular:

- Few pages are actually dedicated to presenting the open source tools that support the different graphs and techniques included in the book. From the title, I expected a more complete tour of the available open source tools for data analysis.

- There are no clues about how to obtain most of the graphs and results presented in the book, and no related data sets are available for download either. A book like this is useless if we cannot learn how to replicate all the examples.

- The formula for the sample variance is simply wrong. One must divide by n-1, not n; see "Applied Statistics and Probability for Engineers" (Montgomery and Runger 2006).

- The author presents one of the most obscure explanations of the median I've ever come across. Resorting to an RFC (RFC 2330) to explain such a simple concept is really awkward.

- In Chapter 3 and Appendix B, natural logarithms (base e) are presented in the text, while the graphs plot powers of 10. This is definitely not the right way to convey concepts and methods correctly.
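The three technical points above can be checked numerically. The following is a minimal Python sketch using only the standard library (`math`, `statistics`); the helper names `variance`, `median`, and `log10_from_ln` are hypothetical illustrations, not code from the book. Dividing the sum of squared deviations by n gives the population variance, while dividing by n - 1 gives the usual unbiased estimate from a sample; the median is the middle value of the sorted data (the mean of the two middle values when n is even); and a natural logarithm converts to base 10 by dividing by ln 10.

```python
import math
import statistics


def variance(xs, ddof=1):
    """Variance of xs (hypothetical helper).

    ddof=1 divides by n - 1 (sample variance, unbiased estimator);
    ddof=0 divides by n (population variance).
    """
    n = len(xs)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def median(xs):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def log10_from_ln(ln_value):
    """Convert ln(x) to log10(x) using log10(x) = ln(x) / ln(10)."""
    return ln_value / math.log(10)


data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(data, ddof=0))  # 4.0 (divide by n)
print(variance(data))          # 32/7, about 4.571 (divide by n - 1)
print(median(data))            # 4.5

# Cross-check against the standard library's implementations.
assert math.isclose(variance(data), statistics.variance(data))
assert math.isclose(variance(data, ddof=0), statistics.pvariance(data))
assert median(data) == statistics.median(data)

print(log10_from_ln(math.log(1000)))  # approximately 3.0
```

Python's `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by n) make the same distinction, which is why the sketch cross-checks against them.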
Format: Paperback
This book covers such a wide range of topics that it necessarily skims over all of them, but it always hits the major points that an introductory survey should. Each chapter has a straightforward tone, strikes the right balance between developing mathematical rigor and developing an intuitive understanding of data, and undeniably passes on the lessons of hard-earned, real-world experience. But a reader who is actually working on a real data problem will almost certainly realize that the understanding gained is somewhat superficial: it's going to take a lot more heavy reading (probably of books, papers, and software tools recommended in this book) to get any real work done!

The single biggest problem with this book is its misleading title. This book is not going to teach you how to use open source software to analyze data. There is only minimal information about how one would actually use the software tools being discussed. What you get is a brief commentary about what the author thinks each software package is good for. It's the same story as with the mathematical details: you will not find them here, but this book will give you an excellent idea of what to look for. So in the end it does leave you feeling just a little bit cheated, even though all the advice you got seems extremely well informed.

What this book does astonishingly well is communicate an attitude to data analysis that most textbooks (and nearly all the college courses I took) seem to miss. Nearly every chapter is a stream of stunningly insightful observations on how to approach data, without the mathematical detail that overwhelms most practicing programmers.
Format: Paperback
The book is very good for intermediate-to-advanced data analysts. Beginners beware: there are some important prerequisites that are not obvious before you buy it, and there are some organizational problems.

First, the prerequisites. "I strongly recommend that you make it a habit to avoid all statistical language" ... "Once we start talking about standard deviations, the clarity is gone." These are two sentences from the same passage in the Preface, and the rest of that passage is similar. However, even the first chapters make heavy use of statistical language. Moreover, they assume that you already know statistics to the level of density estimation, noise, splines, and regression. Page 21 even features a footnote about the Fourier transform and the Fourier convolution theorem. Clearly this book is not for the statistically shy, or for the mathematically shy in general, no matter what the Preface suggests. You also need to know Python and R.

Second, the chapter organization problems. There's a mismatch between the first part of each chapter, which introduces concepts and techniques, and the Workshop part of the same chapter, which uses software. I was expecting the Workshop to illustrate the implementation of the same concepts and techniques, but that is not really the case. The Workshop introduces Python and R facilities at a different (slower) pace than the rest of the chapter. One might even wonder why the Workshop is in the same chapter. I'd rather each chapter consisted of a few detailed case studies that first introduce concepts and techniques and then illustrate them with software libraries.
