Data Analysis with Open Source Tools [Paperback]

Philipp K. Janert
4.2 out of 5 stars (33 customer reviews)

List Price: $39.99
Price: $24.20 & FREE Shipping on orders over $25.
You Save: $15.79 (39%)
In Stock.
Ships from and sold by Amazon.com. Gift-wrap available.

Formats

  • Kindle Edition: $17.27
  • Paperback: $24.20
  • Unknown Binding: no price listed

Book Description

Published November 25, 2010; ISBN-10: 0596802358; ISBN-13: 978-0596802356; 1st edition
These days it seems like everyone is collecting data. But all of that data is just raw information -- to make that information meaningful, it has to be organized, filtered, and analyzed. Anyone can apply data analysis tools and get results, but without the right approach those results may be useless.

Author Philipp Janert teaches you how to think about data: how to effectively approach data analysis problems, and how to extract all of the available information from your data. Janert covers univariate data, data in multiple dimensions, time series data, graphical techniques, data mining, machine learning, and many other topics. He also reveals how seat-of-the-pants knowledge can lead you to the best approach right from the start, and how to assess results to determine if they're meaningful.


Frequently Bought Together

Data Analysis with Open Source Tools + Python for Data Analysis + Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
Price for all three: $75.26



Editorial Reviews

About the Author

After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don't, and thinks that purpose is more important than process. Philipp is the author of "Gnuplot in Action - Understanding Data with Graphs" (Manning Publications), and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He is named inventor on a handful of patents, and is an occasional contributor to CPAN. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company website at www.principal-value.com.


Product Details

  • Paperback: 540 pages
  • Publisher: O'Reilly Media; 1 edition (November 25, 2010)
  • Language: English
  • ISBN-10: 0596802358
  • ISBN-13: 978-0596802356
  • Product Dimensions: 7 x 1.1 x 9.2 inches
  • Shipping Weight: 2.2 pounds
  • Amazon Best Sellers Rank: #31,134 in Books


Customer Reviews

Most Helpful Customer Reviews
182 of 198 people found the following review helpful
2.0 out of 5 stars It falls short of initial expectations February 7, 2011
Format: Paperback
This book is aimed at offering a practical, hands-on introduction to data analysis for pragmatic readers without strong scientific or statistical background. Some basic programming experience is required. The author provides many personal (and sometimes useful) comments about different tools and procedures in data analysis.

However, a careful reading reveals many problems, especially an obscure presentation of key concepts. In my opinion, the target audience for this book would be people with no previous exposure to data analysis, which makes presenting its core elements correctly all the more important. Otherwise, the book is useless to them.

In particular:

- Few pages are actually dedicated to presenting the open source tools that support the different graphs and techniques included in the book. From the title, I expected a more complete tour of the open source tools available for data analysis.

- No clues about how to obtain most of the graphs and results presented in the book. No related data sets are available for download, either. A book like this is useless if we cannot learn how to replicate all the examples.

- The formula given for the sample variance is simply wrong. One must divide by n-1, not n; see "Applied Statistics and Probability for Engineers" (Montgomery and Runger 2006).

- The author presents one of the most obscure explanations of the median I've ever come across. Resorting to an RFC (RFC 2330) to explain such a simple concept is really awkward.

- In chapter 3 and Appendix B, natural logarithms (base e) are used in the text, while the graphs plot powers of 10. That is definitely not the right way to convey concepts and methods correctly.

- I agree with a previous review that the "Workshop" sections present only an ultra-short overview of some open source tools. A quick search in your favourite search engine will turn up much more informative introductions (even quick-start guides).

- Today, effective data analysis depends heavily on using the best possible implementation. While I might find it educational to learn some of these implementations, in a real situation it is much better to rely on precise implementations of algorithms that are already available (e.g. libraries in GNU R).
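To make the variance, median, and logarithm points in the list above concrete, here is a minimal Python sketch. The function names are illustrative and not taken from the book. Dividing the sum of squared deviations by n gives the population variance; dividing by n - 1 (Bessel's correction) gives the unbiased sample variance the reviewer asks for. The median is the middle of the sorted data, or the mean of the two middle values when n is even. Natural and base-10 logarithms differ only by the constant factor ln(10). In line with the last point about relying on existing libraries, the results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics


def population_variance(xs):
    """Mean squared deviation from the mean, dividing by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def sample_variance(xs):
    """Unbiased sample variance, dividing by n - 1 (Bessel's correction)."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def median(xs):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(population_variance(data))        # 4.0  (32 / 8)
print(round(sample_variance(data), 4))  # 4.5714  (32 / 7)
print(median(data))                     # 4.5  (mean of 4.0 and 5.0)

# Cross-check against the standard library's implementations.
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
assert median(data) == statistics.median(data)

# ln and log10 differ only by the constant factor ln(10),
# so switching bases rescales a log axis without changing the plot's shape.
x = 1000.0
assert math.isclose(math.log(x) / math.log(10), math.log10(x))
```

For small samples the difference between the two conventions is noticeable (here 4.0 versus about 4.57); it shrinks as n grows.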

All in all, I still recommend "R in a Nutshell" for a gentle introduction to data analysis with an open source tool (GNU R). It also has some inaccuracies and typos, but at least it's much more informative and clear. Besides, it does include an R package with all datasets and examples, ready to be installed and explored.
37 of 37 people found the following review helpful
3.0 out of 5 stars Full of insight, light on details April 16, 2011
Format: Paperback
This book covers such a wide range of topics that it necessarily skims over all of them, but it always hits all the major points that an introductory survey should. Each chapter has a straightforward tone, strikes the right balance between developing mathematical rigor and developing an intuitive understanding of data, and undeniably passes on the lessons of hard-earned, real-world experience. But a reader who is actually working on a real data problem will almost certainly come to the realization that the understanding gained is somewhat superficial: it's going to take a lot more heavy reading (probably of books, papers, and software tools recommended in this book) to get any real work done!

The single biggest problem with this book is its misleading title. This book is not going to teach you how to use open source software to analyze data. There is only minimal information about how one would actually use the software tools being discussed. What you get is a brief commentary about what the author thinks each software package is good for. It's the same story as with the mathematical details: you will not find them here, but this book will give you an excellent idea of what to look for. So in the end it does leave you feeling just a little bit cheated, even though all the advice you got seems extremely well informed.

What this book does astonishingly well is communicate an attitude to data analysis that most textbooks (and nearly all the college courses I took) seem to miss. Nearly every chapter is a stream of stunningly insightful observations on how to approach data, without the mathematical detail that overwhelms most practicing programmers. I would recommend it to any reader who understands that truly useful insights are hard to come by, but detailed algorithms and formulae are easily found in the Internet Age. I wish the book were a few hundred pages shorter and that it corrected a few sloppy mistakes (like confusing revenue and profit), but I'm certainly glad I read it.
38 of 41 people found the following review helpful
Format: Paperback
The book is very good for the intermediate-to-advanced data analysts. Beginners beware: there are some important prerequisites that are not obvious before you buy it, and there are some organization problems.

First, the prerequisites. "I strongly recommend that you make it a habit to avoid all statistical language"..."Once we start talking about standard deviations, the clarity is gone." These are two sentences in the same passage from the Preface. The rest of that passage is similar. However, even the first chapters make heavy use of statistical language. Moreover, they assume that you already know statistics to the level of density estimation, noise, splines, and regression. Page 21 even features a footnote about the Fourier transform and Fourier convolution theorem. Clearly this book is not for the statistically shy, or for the mathematically shy in general, no matter what the Preface suggests. You also need to know Python and R.

Second, the chapter organization problems. There's a mismatch between the first part of each chapter, which introduces concepts and techniques, and the Workshop part of the same chapter, which uses software. I was expecting the Workshop to illustrate the implementation of the same concepts and techniques. That is not really the case. The Workshop introduces Python and R facilities at a different (lower) speed than the rest of the chapter. One could even wonder why the Workshop is in the same chapter. I'd rather that each chapter consisted of a few detailed case studies that first introduce concepts and techniques and then illustrate them with software libraries.
Most Recent Customer Reviews
5.0 out of 5 stars Awesome intro to Bayesian reasoning as well as a very pragmatic guide...
Given a strong foundation in mathematics that I've rarely ever used since college 5 years ago, this book was great because:
1.
Published 1 month ago by Eric Czech
5.0 out of 5 stars Fantastic resource for fast-track but in depth learning of data...
I recommend this book to those who are interested in learning serious statistical and computational data analysis methods.
Published 3 months ago by Edmon Begoli
5.0 out of 5 stars Janert's work is a good book for better managing businesses by the...
Mr. Janert's exceptionally wide-ranging personal knowledge of how to apply math to business problems is the real strength of this book.
Published 4 months ago by Geoffrey R. Anderson
5.0 out of 5 stars Excellent resource for folks new to Data Analysis
This book is unusual for O'Reilly. Not because it isn't good, but because it isn't a resource about an open source software project.
Published 6 months ago by Geoffrey Clements
5.0 out of 5 stars Problem Oriented Approach
I've had some statistics courses at uni (descriptive, predictive, and discriminatory), but even after those there was much to learn from this book.
Published 6 months ago by Fabio
5.0 out of 5 stars Very Insightful
I took a gamble on this, because some of the other reviews were less than inspiring. I read that the author rambled a lot without giving enough code.
Published 6 months ago by connor
5.0 out of 5 stars A reference book that explains in detail the many ways to make sense...
Summary: Data Analysis with Open Source Tools is a reference book that explains in detail the many ways to make sense of data.
Published 9 months ago by Boanerges Aleman-Meza
2.0 out of 5 stars Wrong enough to hurt
While I'm not an expert in all the areas covered in this book, I am in a few. In those areas, this book is really wrong -- actually doing damage wrong.
Published 9 months ago by T. Carroll
2.0 out of 5 stars Prerequisites not specified
This book covers a lot of ground, but even though the author states that no prior mathematical knowledge is necessary, I felt that this was not the case.
Published 12 months ago by Amazon_Customer_X
5.0 out of 5 stars Excellent Book, For What It Is
I'm a Python software developer with an interest in applied statistics. This is an excellent book on data analysis, but for review purposes, it's worth initially pointing out what...
Published 13 months ago by Jim McGaw