Data Analysis with Open Source Tools [Paperback]

Philipp K. Janert
4.2 out of 5 stars (38 customer reviews)

Book Description

November 25, 2010; ISBN-10 0596802358; ISBN-13 978-0596802356; 1st edition
These days it seems like everyone is collecting data. But all of that data is just raw information -- to make that information meaningful, it has to be organized, filtered, and analyzed. Anyone can apply data analysis tools and get results, but without the right approach those results may be useless.

Author Philipp Janert teaches you how to think about data: how to effectively approach data analysis problems, and how to extract all of the available information from your data. Janert covers univariate data, data in multiple dimensions, time series data, graphical techniques, data mining, machine learning, and many other topics. He also reveals how seat-of-the-pants knowledge can lead you to the best approach right from the start, and how to assess results to determine if they're meaningful.

Frequently Bought Together

Data Analysis with Open Source Tools + Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Editorial Reviews

Book Description

A hands-on guide for programmers and data scientists

About the Author

After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don't, and thinks that purpose is more important than process. Philipp is the author of "Gnuplot in Action - Understanding Data with Graphs" (Manning Publications), and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He is named inventor on a handful of patents, and is an occasional contributor to CPAN. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company website at

Product Details

  • Paperback: 540 pages
  • Publisher: O'Reilly Media; 1 edition (November 25, 2010)
  • Language: English
  • ISBN-10: 0596802358
  • ISBN-13: 978-0596802356
  • Product Dimensions: 9.2 x 7 x 1.4 inches
  • Shipping Weight: 2.2 pounds
  • Average Customer Review: 4.2 out of 5 stars (38 customer reviews)
  • Amazon Best Sellers Rank: #257,460 in Books

Customer Reviews

Most Helpful Customer Reviews
202 of 221 people found the following review helpful
2.0 out of 5 stars It falls short of initial expectations February 7, 2011
This book is aimed at offering a practical, hands-on introduction to data analysis for pragmatic readers without strong scientific or statistical background. Some basic programming experience is required. The author provides many personal (and sometimes useful) comments about different tools and procedures in data analysis.

However, a careful reading reveals many problems, especially an obscure presentation of key concepts. In my opinion, the target audience for this book would be people without previous contact with data analysis, hence the importance of presenting its core elements correctly. Otherwise, the book is useless to them.

In particular:

- Few pages are actually dedicated to presenting open source tools that support the different graphs and techniques included in the book. From the title, I expected a more complete tour of the available open source tools for data analysis.

- No clues about how to obtain most of the graphs and results presented in the book. No related data sets are available for download, either. A book like this is useless if we cannot learn how to replicate all the examples.

- The formula of the variance for a sample is just wrong. One must divide by n-1 and not n; see "Applied Statistics and Probability for Engineers" (Montgomery and Runger 2006).

- The author presents one of the most obscure explanations of the median I've ever come across. Resorting to an RFC (RFC 2330) to explain such a simple concept is really awkward.

- In chapter 3 and Appendix B, natural logarithms (base e) are presented in the text, while the graphs plot powers of 10. This is definitely not the right way to convey concepts and methods correctly.
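The statistical points raised in this review can be checked with Python's standard `statistics` module. The sketch below uses an arbitrary example data set (not taken from the book) and shows that the two variance conventions differ only in the divisor: `statistics.pvariance` divides the sum of squared deviations by n, while `statistics.variance` divides by n-1 (Bessel's correction, which gives an unbiased estimate from a sample). It also computes the median and confirms that natural and base-10 logarithms differ only by a constant factor.

```python
import math
import statistics

# Arbitrary example data (not taken from the book).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean

pop_var = ss / n           # divide by n: population variance (also the maximum-likelihood estimate under a normal model)
sample_var = ss / (n - 1)  # divide by n-1: unbiased sample variance (Bessel's correction)

# The standard library offers both conventions explicitly.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))

# Median: the middle value of the sorted data, or the mean of the
# two middle values when n is even (here (4 + 5) / 2).
med = statistics.median(data)

# Natural and base-10 logarithms differ only by a constant factor:
# ln(x) = log10(x) * ln(10).
x = 1000.0
assert math.isclose(math.log(x), math.log10(x) * math.log(10))

print(pop_var, sample_var, med)
```

For this data set the sum of squared deviations is 32, so the two conventions give 32/8 = 4.0 and 32/7 (about 4.57); the gap between them shrinks as n grows.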
39 of 39 people found the following review helpful
3.0 out of 5 stars Full of insight, light on details April 16, 2011
This book covers such a wide range of topics that it necessarily skims over all of them, but it always hits all the major points that an introductory survey should. Each chapter has a straightforward tone, strikes the right balance between developing mathematical rigor and developing an intuitive understanding of data, and undeniably passes on the lessons of hard-earned, real-world experience. But a reader who is actually working on a real data problem will almost certainly come to realize that the understanding gained is somewhat superficial, and that it's going to take a lot more heavy reading (probably of books, papers, and software tools recommended in this book) to get any real work done!

The single biggest problem with this book is its misleading title. This book is not going to teach you how to use open source software to analyze data. There is only minimal information about how one would actually use the software tools being discussed. What you get is a brief commentary about what the author thinks each software package is good for. It's the same story as with the mathematical details: you will not find them here, but this book will give you an excellent idea of what to look for. So in the end it does leave you feeling just a little bit cheated, even though all the advice you got seems extremely well informed.

What this book does astonishingly well is communicate an attitude to data analysis that most textbooks (and nearly all the college courses I took) seem to miss. Nearly every chapter is a stream of stunningly insightful observations on how to approach data, without the mathematical detail that overwhelms most practicing programmers.
42 of 45 people found the following review helpful
The book is very good for intermediate-to-advanced data analysts. Beginners beware: there are some important prerequisites that are not obvious before you buy it, and there are some organization problems.

First, the prerequisites. "I strongly recommend that you make it a habit to avoid all statistical language" ... "Once we start talking about standard deviations, the clarity is gone." These are two sentences from the same passage in the Preface, and the rest of that passage is similar. However, even the first chapters make heavy use of statistical language. Moreover, they assume that you already know statistics to the level of density estimation, noise, splines, and regression. Page 21 even features a footnote about the Fourier transform and the Fourier convolution theorem. Clearly this book is not for the statistically shy, or for the mathematically shy in general, no matter what the Preface suggests. You also need to know Python and R.

Second, the chapter organization problems. There's a mismatch between the first part of each chapter, which introduces concepts and techniques, and the Workshop part of the same chapter, which uses software. I was expecting the Workshop to illustrate the implementation of the same concepts and techniques, but it doesn't really do so. The Workshop introduces Python and R facilities at a different (slower) pace than the rest of the chapter. One could even wonder why the Workshop is in the same chapter. I'd rather each chapter consisted of a few detailed case studies that first introduce concepts and techniques and then illustrate them with software libraries.
Most Recent Customer Reviews
5.0 out of 5 stars better than expected
I only read the first two chapters and browsed the rest. It explains the topics very well, and the content is practical and usable.
Published 1 month ago by joe
5.0 out of 5 stars Excellent, author shares wisdom from actually doing this.
I found this book to be a very good introduction and it gave me a few ideas to take into work.
Published 6 months ago by printer123
3.0 out of 5 stars MatLab is not Open Source; Octave is.
I'm confused: skimming the Table of Contents, the book says "Open Source Tools" then prominently mentions MatLab (an expensive commercial product) and fails to mention...
Published 7 months ago by Carol
5.0 out of 5 stars Intuitive! Good for Big-Picture Readers
I love this book on data analysis, but I do understand not everybody likes this style.

Coming from a theoretical physics background, I appreciate the book and the author a lot.
Published 13 months ago by Kwan-yuet Ho
5.0 out of 5 stars A gold mine
Every person involved in any computational science should have read this book and always keep it at arm's reach.
Published 13 months ago by HappyCrow
5.0 out of 5 stars Awesome intro to Bayesian reasoning as well as a very pragmatic guide...
Given a strong foundation in mathematics that I've rarely ever used since college 5 years ago, this book was great because:
1. ...
Published 14 months ago by Eric Czech
5.0 out of 5 stars Fantastic resource for fast-track but in depth learning of data...
I recommend this book to those who are interested in learning serious statistical and computational data analysis methods.
Published 16 months ago by Edmon Begoli
5.0 out of 5 stars Janert's work is a good book for better managing businesses by the...
Mr. Janert's exceptionally wide-ranging personal knowledge of how to apply math to business problems is the real strength of this book.
Published 17 months ago by Geoffrey R. Anderson
5.0 out of 5 stars Excellent resource for folks new to Data Analysis
This book is unusual for O'Reilly. Not because it isn't good, but because it isn't a resource about an open source software project.
Published 20 months ago by Geoffrey Clements
5.0 out of 5 stars Problem Oriented Approach
I've had some statistics courses at university (descriptive, predictive, and discriminatory), but even after those there was much to learn from this book.
Published 20 months ago by Fabio