Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists 1st Edition
About the Author
After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don't, and thinks that purpose is more important than process. Philipp is the author of "Gnuplot in Action - Understanding Data with Graphs" (Manning Publications), and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He is named inventor on a handful of patents, and is an occasional contributor to CPAN. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company website at www.principal-value.com.
- Publisher : O'Reilly Media; 1st edition (December 14, 2010)
- Language : English
- Paperback : 540 pages
- ISBN-10 : 0596802358
- ISBN-13 : 978-0596802356
- Item Weight : 1.88 pounds
- Dimensions : 7 x 1.4 x 9.19 inches
- Best Sellers Rank: #957,217 in Books
Top reviews from the United States
Coming from a theoretical physics background, I appreciate the book and the author a lot. The writer put a lot of effort into explaining the background of each topic from the perspective of someone who knows a bit about the topic but not in depth. Today's data scientists come from different technical backgrounds, and the text is a good introduction to the topics. Technical details are not overwhelming, which is good for people who can pick up the technicalities on their own through other books and the web.
Anyone looking for implementations with open source tools will certainly be disappointed. (The title of the book is unfortunately misleading.) Anyone looking for technical details will not find this a good option either. However, for gaining insights and the big picture, this is the best book.
The following chapters are well written:
- Chapter 2 (A Single Variable: Shape and Distribution): This brings people into the style of the book, some basics to data analysis and wrangling, and an introduction to NumPy.
- Chapter 8 (Models from Scaling Arguments): Fitting mathematical models to data, something a lot of theorists do!
- Chapter 9 (Arguments from Probability Models).
- Chapter 13 (Finding Clusters): Introduction to various clustering (unsupervised learning) techniques.
- Chapter 18 (Predictive Analytics): A hot topic recently. This serves as a good introduction to the big picture, because many other books are so overwhelming with technical details that we often get lost when working with these tools.
Because the book offers a balance between explanation and demonstration it can be read in two different ways. First, you can read the chapters without getting involved with the code to get a better understanding of the whys and hows of the different analysis techniques. On the other hand, if you are more of a brass tacks person, you can focus on the code, run the examples, and just skim the explanations.
For those that are exploring the world of data analysis, this book is a great complement to Segaran's Programming Collective Intelligence: Building Smart Web 2.0 Applications and Russell's Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. Where the books overlap, the explanations and examples differ, which helps enormously when trying to master the concepts and techniques. However, each book contains topics not in the others. Collectively they offer a rather powerful set of tools.
Having read the other books prior to this one, I really appreciated the time spent on the mathematics behind each technique. The others get your hands dirty very quickly - and I appreciated that greatly when first exploring data mining - but I found myself wanting to have a deeper understanding, which this book so nicely provides. As Janert mentions in the first chapter, the succinct notation of mathematics is much clearer than having to try to extract the essence of twenty lines of source code. Without a doubt, though, Data Analysis is dense, and that might turn a few people off.
All said and done, I'm glad I took the time to read the book and will definitely keep it nearby.
Top reviews from other countries
The author wants to make a point of how important it is to understand the math behind real-world problems, but I was disappointed by his attempts to convey mathematical principles. Formulas may work for some people, but to me the book failed to point out why they are necessary, or how I can add value with them in the analyses I do. In this regard, the author overpays his dues to his academic background. I can see how the author studied physics and addresses people with like-minded outlooks. But for these people, the book will be too trivial. The major disappointment for me was that the book failed to live up to the expectations raised by the subtitle "with Open Source Tools". I would have expected a range of cool tools to work with; instead it's GNU and R, and there is not a single end-to-end case of getting the data, figuring out the issue, and then presenting it in a graph. Sometimes the style is too conversational, sometimes it is too strict and abstract. There are few moments when the two extremes touch. Other parts of the book, where the author shares his academic insights, felt awkward. The statement "You will never understand what mathematics is if you see it only as something you use to obtain certain results" will definitely find its way into my "Dictionary of Received Ideas".
Still after all this negative criticism, I am giving it an average 4 stars. Why? There were some conversational parts that are helpful. This happens especially when the author highlights pitfalls and real-world application on distribution laws and showing/interpreting graphical analysis (although he doesn't point out how it's done). I can put these ideas to use, and they are valuable, because they show the true expertise of the author and can serve as a guideline for people learning to get familiar with advanced statistical analysis. And I want to give credit to the broad scope of the book. I prefer this to textbooks that focus on one aspect only. Although the book is often too abstract, I appreciate the approach to cover many topics in 10-20 page essays.
There are not many formulas, so if you want a book that gives you the mathematical basis, this is not for you.
This book presents the different methods and the tools you can use to apply them, with some specific examples.
It is an applied book for getting started quickly.
The structure lets you pick the chapters you are interested in and come back later for the other parts.
Definitely a good purchase.
The subtitle "with Open Source Tools" is too ambitious: examples with Python, R, etc. are touched on, but they don't take you far.
Recommended with reservations: it takes quite a long time to read, and yet it is not really a compendium. It is more about the small pointers, such as the chapter on the history of classical statistics, why it is the way it is, and why we simply have different conditions today (computers, large volumes of measurements).
The problem with this book, though, is that it is essentially a Python book. Nearly all the examples the author presents are in Python. There's code every other page or so, so a considerable portion of the book is taken up illustrating Python code. Nowhere in the description or on the back cover does it say so. In this sense, the title is very misleading, because open source should be much more than Python. There is very little R code, which is very disappointing for a book on data analysis. I was hoping and expecting to see a diverse wealth of programming examples and open source solutions for graphing, data conversion, export, etc.
Finally, the author certainly shows some bizarre preferences when listing data analysis software.
I think this book would benefit pure programmers who want to do some superficial data analysis.
A text rich in theory, but with an eye always on its concrete application.
Really well written, well organized, and often useful!