Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists 1st Edition
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.
- Use graphics to describe data with one, two, or dozens of variables
- Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
- Mine data with computationally intensive methods such as simulation and clustering
- Make your conclusions understandable through reports, dashboards, and other metrics programs
- Understand financial calculations, including the time-value of money
- Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
- Become familiar with different open source programming environments for data analysis
"Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla
"An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora
- ISBN-10: 0596802358
- ISBN-13: 978-0596802356
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: December 28, 2010
- Language: English
- Dimensions: 7 x 1.4 x 9.19 inches
- Print length: 530 pages
Customer reviews
Customers say
Customers find that the book provides clear explanations of topics and concepts. They appreciate the practical and usable content, as well as the instructive approach and insights into how to think about data. The design is described as stunning and unexpected. Overall, customers consider the book a valuable and unique resource.
AI-generated from the text of customer reviews
Customers find the book provides clear explanations and useful examples of data analysis topics. They say it's a good introduction to the concepts and gives them more insights into thinking about data. The book covers a wide range of topics in a balanced way, providing a refresher on forgotten concepts.
"...The author covers many topics quickly, and not any one in depth, so if one chapter interests you, I'd plan on consulting other resources on..."
"...And it is extremely well written. The writing is in reasonably simple English, relative to the topic, yet not insulting or goofy the way the "..."
"...As Janert mentions in the first chapter, the succinct notation of mathematics is much clearer than having to try to extract the essence of twenty..."
"...This serves a good piece of introduction to the big picture because a lot of other books are overwhelming with the technical details that we often..."
Customers appreciate the book's design. They find it stunning and unexpected, highlighting the content. Readers describe it as insightful.
"...data mining and machine learning techniques, and passing through notions of design...."
"Lucid, learned, and full of insights--a great book on a difficult subject. When I pre-ordered this title, I expected it to be more cookbook-oriented...."
"Stunning! And unexpected..."
Customers appreciate the book's value for money. They find it unique and worth the effort.
"...However, it does not decrease its value. Instead, it adds value and makes it unique...."
"...new math concepts, but the value provided makes this book well worth the effort."
"...good feel for how to successfully present it, this will be worth its weight in gold to you...."
Top reviews from the United States
- Reviewed in the United States on April 19, 2012
I'm a Python software developer with an interest in applied statistics. This is an excellent book on data analysis, but for review purposes, it's worth initially pointing out what this book is not.
It is not a comprehensive survey of open source tools that are available, and it does not contain many examples of working code to implement the techniques he talks about, though there are some. For this reason, I'd strike the "with Open Source Tools" from the title in evaluating whether you want to purchase the book.
The author greatly favors mathematical notation over code examples in describing the data analysis techniques he presents. While this is not a bad thing per se, you'll have to struggle to comprehend the content if you're a programmer without an academic familiarity with math, or if you've been away from mathematics for a long time.
As other reviewers have pointed out, the organization of the content is somewhat disjointed. Going from chapter to chapter, there is little in the way of causality, and the early chapters are pretty math-heavy. The reader is advised to consult appendices at the back of the book to refresh themselves on the basics, if required.
Wait! I didn't say you shouldn't buy it.
Despite a few shortcomings, this book does offer a good introduction and overview of several basic techniques. It's an excellent survey of the current data analysis landscape for anyone who's not familiar with it. If a topic seems irrelevant to you, it's pretty easy to skip that chapter and move forward.
On top of that, the author's writing style and ways of explaining relatively esoteric concepts are generally very good. As with many good books, you get the sense the author is a co-worker, trying to explain something to you in terms you can understand. It's very example-based, even if those examples don't always involve code.
All in all, to get the most out of this book, the best approach is careful and methodical study. The author covers many topics quickly, and not any one in depth, so if one chapter interests you, I'd plan on consulting other resources on particular topics. Luckily, the author does offer several "Further Reading" recommendations for each topic.
Most books containing information on these techniques are far harder to read, and they generally cost at least twice as much. Highly recommended. Thanks for this one, Philipp.
- Reviewed in the United States on January 7, 2011
I bought this book hoping for a reference on open source tools. But the open source tools are a minor aspect of this book. The core is about data analysis--and it is fantastic. I should have known this from the title I suppose: the "data analysis" is in big font with a colorful background, and "with open source tools" is in small font--and it is literally about the same ratio with the book. Each chapter has a small section that works one example with an open source tool. And there is a chapter at the end about the array of open source tools available.
But the data analysis aspects of the book--most outstanding. I have a master's in computer science, and do data and analytics for a living, so I have many books on the topic. Some books with more of a theoretical and rigorous foundation, some with more of a hands-on slant. I was expecting this book to be the latter, but it is quite the former.
Yet it is still very practical. It is not a "theory" work as such, just a rigorous book useful in practice (there is a big difference!). Throughout the book the author points out the value of solving the problem at hand, rather than being excessively precise--which is the bigger risk in this domain. Examples would be: using visuals to get a feel for data but not trying to use visuals to give precise answers (which they fundamentally cannot), and using techniques that get "close enough" such as perturbation.
And it is extremely well written. The writing is in reasonably simple English, relative to the topic, yet not insulting or goofy the way the "Dummies" series can be for example. It is easy to read yet content rich--a fantastic combination.
- Reviewed in the United States on June 3, 2011
I've read some of the other reviews, and I do agree with most of the criticisms. There are quite a few errors in formulas and in the text, and it would've been really nice if the source code and data files had been provided on a CD or made available on a website.
That being said, the book addresses a lot of different topics - ranging from introductory, freshman-level statistics to more advanced data mining and machine learning techniques, and passing through notions of design. It doesn't go in depth into each of them, but offers a fairly good overview, and references in case you're interested. Furthermore, the author gives some useful hints on how to do outside-the-box thinking and how to apply these techniques to business.
Being a physics grad student, I've found many of the topics pretty much basic, but even so, I've learned a lot. Overall, a great introduction; I really hope the flaws are corrected in a future 2nd edition.
Top reviews from other countries
bk
Reviewed in Canada on July 28, 2018
5.0 out of 5 stars
The book provides very good math and stats foundation without diving into the code
The book provides very good math and stats foundation without diving into the code. I liked the methodology behind the way the book is structured.
Valerio Maggio
Reviewed in Italy on June 30, 2013
5.0 out of 5 stars
Excellent
A must-have book for anyone who (like me) does data analysis with open source tools (often developed for the occasion).
A text rich in theory, but with an eye always on its concrete application.
Really well written, well organized, and often useful!
Highly recommended!
Dr. Stefan Böcker
Reviewed in Germany on November 24, 2012
5.0 out of 5 stars
Extraordinary
This is the book you want if you are trying to get quickly into scientific programming and visualization with Python and R! I strongly recommend this book!
Thomas Bornheim
Reviewed in the United Kingdom on January 17, 2012
4.0 out of 5 stars
Mixed opinion
I have to agree with a lot of the US reviews. I am missing a focus in the book.
The author wants to make a point about how important it is to understand the math behind real-world problems, but I was disappointed by his attempts to convey mathematical principles. Formulas may work for some people, but to me the book failed to point out why they are necessary - or how I can add value with them in the analyses I do. In this regard, the author overpays his dues to his academic background. I can see how the author studied physics and addresses people with like-wise framed minds. But for these people, the book will be too trivial. The major disappointment for me was that the book failed to live up to its expectations regarding the subtitle "with open Source tools". I would have expected a range of cool tools to work with; instead it's GNU and R, and there is not a single end-to-end case of getting the data, figuring out the issue and then presenting it in a graph. Sometimes the style is too conversational, sometimes it is too strict and abstract. There are few moments when the two extremes touch. Other parts of the book - where the author shares his academic insights - felt awkward. The statement "You will never understand what mathematics is if you see it only as something you use to obtain certain results" will definitely find its way into my "Dictionary of Received Ideas".
Still, after all this negative criticism, I am giving it an average 4 stars. Why? There were some conversational parts that are helpful. This happens especially when the author highlights pitfalls and real-world applications of distribution laws and shows how to interpret graphical analysis (although he doesn't point out how it's done). I can put these ideas to use, and they are valuable, because they show the true expertise of the author and can serve as a guideline for people learning to get familiar with advanced statistical analysis. And I want to give credit to the broad scope of the book. I prefer this to textbooks that focus on one aspect only. Although the book is often too abstract, I appreciate the approach of covering many topics in 10-20 page essays.
Francois
Reviewed in Japan on May 23, 2013
5.0 out of 5 stars
A gold mine
Every person involved in any computational science should have read this book and always keep it at arm's reach.


