

Spark in Action, Second Edition: Covers Apache Spark 3 with Examples in Java, Python, and Scala 2nd Edition
The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.
Foreword by Rob Thomas.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.
About the book
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
What's inside
- Writing Spark applications in Java
- Spark application architecture
- Ingestion through files, databases, streaming, and Elasticsearch
- Querying distributed datasets with Spark SQL
About the reader
This book does not assume previous experience with Spark, Scala, or Hadoop.
About the author
Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.
Table of Contents
PART 1 - THE THEORY CRIPPLED BY AWESOME EXAMPLES
1 So, what is Spark, anyway?
2 Architecture and flow
3 The majestic role of the dataframe
4 Fundamentally lazy
5 Building a simple app for deployment
6 Deploying your simple app
PART 2 - INGESTION
7 Ingestion from files
8 Ingestion from databases
9 Advanced ingestion: finding data sources and building your own
10 Ingestion through structured streaming
PART 3 - TRANSFORMING YOUR DATA
11 Working with SQL
12 Transforming your data
13 Transforming entire documents
14 Extending transformations with user-defined functions
15 Aggregating your data
PART 4 - GOING FURTHER
16 Cache and checkpoint: Enhancing Spark’s performances
17 Exporting data and building full data pipelines
18 Exploring deployment
Editorial Reviews
Review
--Rob Thomas, IBM
"An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing."
--Anupam Sengupta, GuardHat Inc.
"This book will help spark a love affair with distributed processing."
--Conor Redmond, InComm Product Control
"Currently the best book on the subject!"
--Markus Breuer, Materna IPS
"I am a big fan of your approach to Data Engineering, your book on Spark, and loved your talks... I am training Data Engineers at my company and your Spark in Action, 2e book is a mandatory material for it!"
-- Thiago de Faria, LINKIT
Product details
- Publisher : Manning; 2nd edition (June 2, 2020)
- Language : English
- Paperback : 576 pages
- ISBN-10 : 1617295523
- ISBN-13 : 978-1617295522
- Item Weight : 2.29 pounds
- Dimensions : 7.38 x 1.2 x 9.25 inches
- Best Sellers Rank: #553,178 in Books
- #164 in Java Programming
- #398 in Data Processing
- #654 in Python Programming
About the author

Jean-Georges "jgp" Perrin is a technology leader focusing on building innovative and modern data platforms at PayPal, president of AIDAUG, and author of Spark in Action, 2nd edition (Manning). He is passionate about software engineering and all things data, small and big. His latest endeavors bring him more and more toward data engineering, data governance, industrialization of data science, and his favorite theme, the Data Mesh. He is proud to have been recognized as a Lifetime IBM Champion. Jean-Georges shares his more than 25 years of experience in the IT industry as a presenter and participant at conferences and by publishing articles in print and online media. His blog is visible at http://jgp.ai. When not immersed in IT, which he loves, he enjoys exploring Upstate New York and New England with his wife and kids.
Customer reviews
Top reviews from the United States
Kinda waste of money to be honest.
"Learning PySpark" by Drabas and Lee, published by Packt in 2017
"Frank Kane's Taming Big Data with Apache Spark and Python" by (no surprise) Kane, Packt, 2017
"Data Analytics with Spark Using Python" by Aven, Addison Wesley, 2018
"PySpark Cookbook" by (once again) Drabas and Lee, Packt, 2018
"Developing Spark Applications with Python" by Morera and Campos, self-published in 2019
"PySpark Recipes" by Mishra, Apress, 2017
"Learning Spark" by Damji et al., O'Reilly, 2020
"Beginning Apache Spark Using Azure Databricks" by Ilijason, Apress, 2020
"Spark: The Definitive Guide" by Chambers and Zaharia, O'Reilly, 2018
Databricks themselves point to "Learning Spark" and "Spark: The Definitive Guide" as preparation aids, so I started with these, skimming both books - and strongly preferring "The Definitive Guide" - and then took a look at the others. You can see my reviews of those titles on their respective Amazon pages, but I can summarize and say that only Chambers-Zaharia and Ilijason are worth considering.
... and then I realized that I missed "Spark in Action". The exam is already taken - and passed, making me a Spark expert ;) - but I thought I needed to take a look, e.g. to see how this tome compares to Chambers-Zaharia.
Well, it could not be further away. "Definitive Spark" has Scala and Python illustrations, and "Spark in Action" is pure Java. "Definitive Spark" is a comprehensive and clearly structured reference, and "Spark in Action" is a limited hands-on tutorial. Between the two heavy tomes, "Definitive Spark" seems to pack *far* more payload, and non-Java-specific content in "Spark in Action" is not remarkable.
It's the Java focus, then, that must be the book's "unique selling proposition". Java guys, buy it. Scala/Python/undecided folks, don't bother.