Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark 1st Edition
From the Publisher
Who This Book Is For
Agile Data Science is intended to help beginners and budding data scientists to become productive members of data science and analytics teams. It aims to help engineers, analysts, and data scientists work with big data in an agile way using Hadoop. It introduces an agile methodology well suited for big data.
This book is targeted at programmers with some exposure to developing software and working with data. Designers and product managers might particularly enjoy Chapters 1, 2, and 5, which will serve as an introduction to the agile process without focusing on running code.
Agile Data Science assumes you are working in a *nix environment. Examples for Windows users aren’t available, but are possible via Cygwin.
How This Book Is Organized
This book is organized into two sections. Part I introduces the dataset and toolset we will use in the tutorial in Part II. Part I is intentionally brief, taking only enough time to introduce the tools. We go into their use in more depth in Part II, so don’t worry if you’re a little overwhelmed in Part I. The chapters that compose Part I are as follows:
- Chapter 1, Theory: Introduces the Agile Data Science methodology.
- Chapter 2, Agile Tools: Introduces our toolset, and helps you get it up and running on your own machine.
- Chapter 3, Data: Describes the dataset used in this book.
Part II is a tutorial in which we build an analytics application using Agile Data Science. It is a notebook-style guide to building an analytics application. We climb the data-value pyramid one level at a time, applying agile principles as we go. This part of the book demonstrates a way of building value step by step in small, agile iterations. Part II comprises the following chapters:
- Chapter 4, Collecting and Displaying Records: Helps you download flight data and then connect or “plumb” flight records through to a web application.
- Chapter 5, Visualizing Data with Charts and Tables: Steps you through how to navigate your data by preparing simple charts in a web application.
- Chapter 6, Exploring Data with Reports: Teaches you how to extract entities from your data and parameterize and link between them to create interactive reports.
- Chapter 7, Making Predictions: Takes what you’ve done so far and predicts whether your flight will be on time or late.
- Chapter 8, Deploying Predictive Systems: Shows how to deploy predictions to ensure they impact real people and systems.
- Chapter 9, Improving Predictions: Iteratively improves on the performance of our on-time flight prediction.
- Appendix A, Manual Installation: Shows how to manually install our tools.
About the Author
He lives on the ocean, in the fog, in Pacifica, California with Bella the Data Dog.
So to set the record straight, I read this book cover-to-cover (unusual for me). I found it to be practical, well organized, insightful at times and overall a good introduction to the topic of Data Science.
I would like to clear up something not about this book but about our entire culture, which always wants something for nothing. You will NOT be generating deep insights about your business effortlessly or quickly. By definition, these things are difficult and time-consuming.
So buy this book and get started.
Working in a data science group in I.T., we've had a lot of conversations about how I.T. operating approaches (agile, DevOps, PMO) apply to data science. Data science tasks are different: not all work is intended to lead to functioning software, and delivering results to stakeholders requires a strongly iterative approach, with reviews that discrete units of software might not otherwise get.
Russell Jurney's "Agile Data Science 2.0" goes a long way in moving that conversation in the right direction. I had three target audiences in mind when I acquired this book. The first was our PM, who had worked in I.T. for years as a director and project manager but continued to try to wrap his head around the data science process. The second was a director who was new to the data science process and wanted a better grasp of how to communicate expectations to the team. The third was myself: having spent time in both I.T. and research, I had seen the two worlds and wanted a way to help explain how they mesh.
Jurney has offered, as have many data science books, a suggested stack and how to implement it, but I thought the most valuable part of the book was the first two chapters, for their emphasis on the agile manifesto for data science, their description of the many roles that go into a team, and their highlights of how agile can make for better data science in terms of both research and products.
This is not a text for learning Spark from a developer's perspective but rather for understanding how Spark can fit in. Spark isn't the only platform, so those using Dask or other tools will still find value here.
If the book has a weakness, it's the focus on developing a web portal to expose the data science product. This isn't a bad way to do things, not at all, but it's not where our work is going at the moment, so this limits the applicability of some of the chapters. But there's nothing that keeps the book from being useful, so much so that I honestly don't know whose desk it's sitting on at the moment, since as soon as our PM finished it he gave it to a BA, who gave it to another PM...
After the obligatory introductory chapters, the book introduces a suite of tools used for the remaining chapters. These include Jupyter Notebooks, Python 3, Spark, scikit-learn, and lightweight web applications. The data it introduces is the OpenFlights Database that is freely available from the Bureau of Transportation Statistics, followed by weather datasets available from NOAA. The first goal is to use the tools and the data to predict flight delays.
With this setup, the book continues with detailed studies of collecting and displaying records, visualizing data, exploring data, making predictions, deploying predictive systems, and improving them. I appreciated how the book followed the same datasets throughout as it moved through all the stages of its proposed methodology. Overall, a solid addition to the data science library.
In my opinion, he achieves all of his goals.
As a computer forensics specialist, I always deal with data. What has changed in the past two decades is the scale of the data I have to analyze.
We’ve come a long way from analyzing a few megabytes of data. Now, the possibility exists that I may have to deal with petabytes of data to find what I am looking for – or confirm its absence.
To that extent I have to deal with people who tell me things can’t be done, usually within an adversarial relationship.
What this book does for me in a big way is clarify the process of getting from here to there.
Jurney describes the process clearly and in great detail.
While I am not exactly the intended audience, I think those who are will benefit greatly from this book.
Top international reviews
The "print-on-demand" printing is not very sharp; it looks like it was printed on some kind of inkjet printer. It is painful to read, especially the figures and graphics. There may be some advantages to "print-on-demand", but the customer should be informed about it and should have the choice of which edition to buy, just as with hardcover vs. paperback.
There are random gray/black dots all over, the figures are particularly low quality, and the main text is fuzzy.
Otherwise the content seems great, but I'll probably wait for a real printed one to read it.