- Paperback: 540 pages
- Publisher: O'Reilly Media; 1 edition (November 28, 2010)
- Language: English
- ISBN-10: 0596802358
- ISBN-13: 978-0596802356
- Product Dimensions: 7 x 1.4 x 9.2 inches
- Shipping Weight: 2.2 pounds
- Average Customer Review: 45 customer reviews
- Amazon Best Sellers Rank: #439,968 in Books
Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists 1st Edition
A hands-on guide for programmers and data scientists
About the Author
After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don't, and thinks that purpose is more important than process. Philipp is the author of "Gnuplot in Action - Understanding Data with Graphs" (Manning Publications), and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He is named inventor on a handful of patents, and is an occasional contributor to CPAN. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company website at www.principal-value.com.
Top customer reviews
Because the book offers a balance between explanation and demonstration it can be read in two different ways. First, you can read the chapters without getting involved with the code to get a better understanding of the whys and hows of the different analysis techniques. On the other hand, if you are more of a brass tacks person, you can focus on the code, run the examples, and just skim the explanations.
For those who are exploring the world of data analysis, this book is a great complement to Segaran's Programming Collective Intelligence: Building Smart Web 2.0 Applications and Russell's Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. Where the books overlap, the explanations and examples differ, which helps enormously when trying to master the concepts and techniques. However, each book contains topics not in the others. Collectively they offer a rather powerful set of tools.
Having read the other books prior to this one, I really appreciated the time spent on the mathematics behind each technique. The others get your hands dirty very quickly - and I appreciated that greatly when first exploring data mining - but I found myself wanting a deeper understanding, which this book so nicely provides. As Janert mentions in the first chapter, the succinct notation of mathematics is much clearer than having to try to extract the essence of twenty lines of source code. Without a doubt, though, Data Analysis is dense, and that might turn a few people off.
All said and done, I'm glad I took the time to read the book and will definitely keep it nearby.
Coming from a theoretical physics background, I appreciate the book and the author a lot. The writer put a lot of effort into explaining the background on each topic from the perspective of someone who knows a bit about the topic but not in depth. People who are currently data scientists come from different technical backgrounds, and the text is a good introduction to the topics. Technical details are not overwhelming, which is good for people who can pick up the technicalities on their own through other books and the web.
Anyone looking for implementations with open source tools will certainly be disappointed. (The title of the book is unfortunately misleading.) Nor is this a good option for anyone looking for technical details. However, for gaining insight and the big picture, this is the best book.
The following chapters are well written:
- Chapter 2 (A Single Variable: Shape and Distribution): This brings people into the style of the book, covering some basics of data analysis and wrangling, and an introduction to NumPy.
- Chapter 8 (Models from Scaling Arguments): Mathematical modeling of data, something a lot of theorists do!
- Chapter 9 (Arguments from Probability Models).
- Chapter 13 (Finding Clusters): Introduction to various clustering (unsupervised learning) techniques.
- Chapter 18 (Predictive Analytics): A hot topic recently. This serves as a good introduction to the big picture, because many other books are so overwhelming with technical details that we often get lost when working with these tools.
That being said, the book addresses a lot of different topics, ranging from introductory, freshman-level statistics to more advanced data mining and machine learning techniques, and passing through notions of design. It doesn't go in depth into each of them, but offers a fairly good overview, and references in case you're interested. Furthermore, the author gives some useful hints on how to think outside the box and how to apply these techniques in business.
Being a physics grad student, I found many of the topics pretty basic, but even so, I've learned a lot. Overall, a great introduction; I really hope the flaws are corrected in a future 2nd edition.