- Paperback: 432 pages
- Publisher: Wiley; 1 edition (November 4, 2013)
- Language: English
- ISBN-10: 111866146X
- ISBN-13: 978-1118661468
- Product Dimensions: 7.3 x 0.8 x 9.1 inches
- Shipping Weight: 1.4 pounds
- Average Customer Review: 125 customer reviews
- Amazon Best Sellers Rank: #8,519 in Books
Data Smart: Using Data Science to Transform Information into Insight 1st Edition
From the Back Cover
"Data Smart makes modern statistic methods and algorithms understandable and easy to implement. Slogging through textbooks and academic papers is no longer required!"
—Patrick Crosby, Founder of StatHat & first CTO at OkCupid
"When Mr. Foreman interviewed for a job at my company, he arrived dressed in a 'Kentucky Colonel' kind of suit and spoke about nonsensical things like barbecue, lasers, and orange juice pulp. Then, he explained how to de-mystify and solve just about any complex 'big data' problem in our company with simple spreadsheets. No server clusters, mainframes, or Hadoop-a-ma-jigs. Just Excel. I hired him on the spot. After reading this book, you too will learn how to use math and basic spreadsheet formulas to improve your business or, at the very least, how to trick senior executives into hiring you as their data scientist."
—Ben Chestnut, Founder & CEO of MailChimp
"You need a John Foreman on your analytics team. But if you can't have John, then reading this book is the next best thing."
—Patrick Lennon, Director of Analytics, The Coca-Cola Company
Most people are approaching data science all wrong. Here's how to do it right.
Not to disillusion you, but data scientists are not mystical practitioners of magical arts. Data science is something you can do. Really. This book shows you the significant data science techniques, how they work, how to use them, and how they benefit your business, large or small. It's not about coding or database technologies. It's about turning raw data into insight you can act upon, and doing it as quickly and painlessly as possible.
Roll up your sleeves and let's get going.
Relax — it's just a spreadsheet
Visit the companion website at www.wiley.com/go/datasmart to download spreadsheets for each chapter, and follow them as you learn about:
- Artificial intelligence using the general linear model, ensemble methods, and naive Bayes
- Clustering via k-means, spherical k-means, and graph modularity
- Mathematical optimization, including non-linear programming and genetic algorithms
- Working with time series data and forecasting with exponential smoothing
- Using Monte Carlo simulation to quantify and address risk
- Detecting outliers in single or multiple dimensions
- Exploring the data-science-focused R language
About the Author
John W. Foreman is Chief Data Scientist for MailChimp.com, where he leads a data science product development effort called the Email Genome Project. As an analytics consultant, John has created data science solutions for The Coca-Cola Company, Royal Caribbean International, Intercontinental Hotels Group, Dell, the Department of Defense, the IRS, and the FBI.
Top customer reviews
First, a drop about me from the standpoint of this book. I have been an IT professional for many years specializing in programming, database, and MS Office add-ons. Part of my job entails self-enrichment, that is, expanding my working knowledge in areas potentially important for my job. I chose Foreman's book to help with this task for a number of reasons: a) Data Science is a hot area and my company does have a Data Science group, b) I have lots of data experience under my belt - I felt that it would be nice for once to get some useful information from the data, and c) I have a really good Excel background - so I figured that Foreman's approach would be perfect for me - little did I know that I would seriously add to my Excel bag of tricks.
The author makes the assumptions that: a) the reader is somewhat technical, b) he knows nothing about Data Science, and c) he is relatively comfortable working in Excel.
Reading the book is a joy because Foreman has a cozy, chummy style. He definitely doesn't throw all the technical stuff at the reader rat-tat-tat machine gun style like many other authors. Instead, Foreman gently introduces his topics and then ramps up technical details carefully. This most definitely helps the learning process.
Speaking of learning, by the end of the book you will have learned important concepts in "machine learning" and I believe that you will be ready for the next step. I sure was. I found the topics interesting and I wanted to learn more. This is where the book's only problem area comes into play - the next step. Foreman has 3 references - one good but minor, one terrible, and one inappropriate. Let me explain.
Foreman recommends a free resource as a follow-on to his Forecasting Chapter. This is a good reference, but I believe that Forecasting is a minor topic in Data Science, unless, of course, Forecasting becomes your thing.
Foreman's main reference is: "Data Mining with R" by Luis Torgo. Foreman recommends this as the next step after his book. I tried to read this several times, but couldn't. It certainly wasn't my next step.
The other reference, "The Elements of Statistical Learning" by Trevor Hastie, et al., is totally inappropriate for Data Science newbies. You can check out the Amazon reviews for this book and you'll see that you need a pretty serious background in statistics to get anything out of that reference. In fact, the author Hastie says as much in his next book "An Introduction to Statistical Learning: with Applications in R". This is the appropriate next step, but I'll get to that in a moment.
Here are my recommendations:
A. Read Foreman's book and follow along with him in working through the Excel spreadsheets. This is a first step in getting comfortable with Machine Learning.
B. Take the Coursera courses: 1) Machine Learning Foundations: A Case Study Approach, and 2) Machine Learning: Regression. The courses are free unless you want completion certificates, in which case there is a nominal cost.
C. Now you are ready for: An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics). This book is also available for free from the authors - check online.
Anyway, books about "Data" seem to fit into one of the following categories:
* Extremely technical graduate-level mathematics books with lots of Greek letters and summation signs
* Pie-in-the-sky business bestsellers about how "Data" is going to revolutionize the world as we know it. (I call these "Moneyball" books)
* Technical books about the hottest new "Big Data" technology such as R and Hadoop
Data Smart is none of these. Unlike "Moneyball" books, Data Smart contains enough practical information to actually start performing analyses. Unlike most textbooks, it doesn't get bogged down in mathematical notation. And unlike books about R or the distributed data blah-blah du jour, all the examples use good old Microsoft Excel. It's geared toward competent analysts who are comfortable with Excel and aren't afraid of thinking about problems in a mathematical way. Its goal isn't to "revolutionize" your business with million-dollar software, but rather to make incremental improvements to processes with accessible analytic techniques.
I don't work at a big company, so I can't attest to the number of dollars your company will save by applying the book's methods. But I can attest that the author makes difficult mathematical concepts accessible with his quirky sense of humor and gift for metaphor. For example, I previously had not been exposed to the nitty-gritty of clustering techniques. After a couple of hours with the clustering chapters, which include illuminating diagrams and spreadsheet formulas, I felt like I had a good handle on the concepts, and would feel comfortable implementing the ideas in Excel -- or any other language, for that matter.
What I like most about the book is that it doesn't try to wave a magic data wand to cure all of your company's ills. Instead it focuses on a few areas where data and analytic techniques can deliver a concrete benefit, and gives you just enough to get started. In particular:
* Optimization techniques (Ch. 4) can systematically reduce the cost of manufacturing inputs
* Clustering techniques (Ch. 2 and 5) can deliver insights into customer behavior
* Predictive techniques (Ch. 3, 6, and 7) can increase margins with better predictions of uncertain outcomes
* Forecasting techniques (Ch. 8) can reduce waste with better demand planning
It may take some creativity to figure out how to apply the methods to your own business processes, but all of the techniques are "tried and true" in the sense of being widely deployed at large companies with big analytics budgets and teams of Ph.D.'s on staff. This book's contribution is to make these techniques available to anyone with a little background in applied mathematics and a copy of Excel. For that reason, despite the absence of glitter and/or Jack Welch on the book's cover, I think Data Smart is an important business book.
I had a few criticisms of the book as I was reading drafts, but almost all of them were addressed before the final revision. For the sake of completeness, I'll tell you what they were. Some of the chapters ran on a bit long, but these have been split up into manageable pieces. The Optimization chapter is a bit of a doozie, and used to be at the very beginning, but the reader can now "warm up" with some easier chapters on clustering and simple Bayesian techniques. The Regression chapter originally didn't discuss Receiver Operating Characteristic curves, which are important for evaluating predictive models visually, but now ROC curves are abundant.
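For readers who have not met ROC curves before, here is a minimal Python sketch of the idea (not taken from the book, which works in Excel). It sweeps a classification threshold over model scores and records the false positive rate and true positive rate at each step. The scores and labels are made-up illustration data, the function names are my own, and the sketch assumes all scores are distinct.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, one per threshold.

    Assumes labels are 0/1, both classes are present, and scores are distinct.
    """
    # Rank examples from highest to lowest score; lowering the threshold
    # admits them as predicted positives one at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    true_pos = 0
    false_pos = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model scores and true labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(curve)
print(round(auc, 4))
```

A curve that hugs the top-left corner (area near 1) means the model ranks positives above negatives almost every time; a diagonal line (area near 0.5) is no better than guessing.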
Only one real criticism from me remains: I would have liked to see more on quantile regression, which is only mentioned in passing. It's a great technique for dealing with outlier-heavy data. The book by Koenker has good but highly mathematical coverage, and I would have loved to see this subject given the Foreman treatment. But, you can't have everything, and I suppose John needs to leave some material for Data Smart 2: The Spreadsheet of Doom.
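To give a flavor of why quantile methods cope well with outliers, here is a small Python sketch (my own illustration, not from the book or from Koenker). It is not a full quantile regression: it fits only a single constant by minimizing the "pinball" loss that quantile regression uses, and compares the result with the ordinary mean. The data and function names are made up for the example.

```python
from statistics import mean, median


def pinball_loss(values, prediction, q):
    """Average pinball (quantile) loss of one constant prediction at quantile q."""
    total = 0.0
    for value in values:
        diff = value - prediction
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(values)


def best_constant(values, q):
    """Brute-force search over the data points for the lowest pinball loss."""
    return min(sorted(set(values)), key=lambda c: pinball_loss(values, c, q))


# Hypothetical data with one extreme outlier.
data = [10, 11, 12, 13, 500]

print(mean(data))                # dragged far upward by the outlier
print(median(data))              # stays with the bulk of the data
print(best_constant(data, 0.5))  # the q = 0.5 pinball minimizer matches the median
```

Quantile regression applies the same loss to a line (or any model) instead of a constant, which is why its fits are far less sensitive to a handful of extreme points than least squares.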
In sum, Data Smart is a well-written and engaging guide to getting new insights from data using familiar tools. The techniques aren't really cutting-edge -- in fact, most have been around for decades -- but to my knowledge this is the first time they've been presented in a way that lets an Excel-slinging business analyst apply the methods without needing her own team of operations researchers and data scientists. If you're not sure whether the book's sophistication is on par with your own skills, you can download a complete sample chapter (as well as example spreadsheets) from the author's website.
One last thing: unlike many books with a technical bent, the prose is engaging and extremely clear. I think this can be traced to John's childhood. When John misbehaved, his father (who is a professor of English) would punish John by forcing him to read a novel by Charles Dickens. Minor infractions resulted in A Christmas Carol being meted out, and when he was really bad he had to read Great Expectations. This is a true story which you should ask John about if you see him at a book-signing event.