Learning Spark: Lightning-Fast Data Analytics 2nd Edition

4.7 out of 5 stars 165 ratings


From the Publisher

While the book is focused on learning Spark as an analytical engine for diverse workloads, we will not cover all of the languages that Spark supports. Most of the examples in the chapters are written in Scala, Python, and SQL. Where necessary, we have infused a bit of Java. For those interested in learning Spark with R, we recommend Javier Luraschi, Kevin Kuo, and Edgar Ruiz’s Mastering Spark with R (O’Reilly).

Finally, because Spark is a distributed engine, building an understanding of Spark application concepts is critical. We will guide you through how your Spark application interacts with Spark’s distributed components and how this is decomposed into parallel tasks on a cluster. We will also cover which deployment modes are supported and in what environments.

While there are many topics we have chosen to cover, there are a few that we have opted to not focus on. These include the older low-level Resilient Distributed Dataset (RDD) APIs and GraphX, Spark’s API for graphs and graph-parallel computation. Nor have we covered advanced topics such as how to extend Spark’s Catalyst optimizer to implement your own operations, how to implement your own catalog, or how to write your own DataSource V2 data sinks and sources. Though part of Spark, these are beyond the scope of your first book on learning Spark.

Instead, we have focused and organized the book around Spark’s Structured APIs, across all its components, and how you can use Spark to process structured data at scale to perform your data engineering or data science tasks.

Editorial Reviews

About the Author

Jules S. Damji is a senior developer advocate at Databricks and an MLflow contributor. He is a hands-on developer with over 20 years of experience and has worked as a software engineer at leading companies such as Sun Microsystems, Netscape, @Home, Loudcloud/Opsware, Verisign, ProQuest, and Hortonworks, building large scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science and an MA in political advocacy and communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.

Brooke Wenig is a machine learning practice lead at Databricks. She leads a team of data scientists who develop large-scale machine learning pipelines for customers, and she teaches courses on distributed machine learning best practices. Previously, she was a principal data science consultant at Databricks. She holds an M.S. in computer science from UCLA with a focus on distributed machine learning.

Tathagata Das is a staff software engineer at Databricks, an Apache Spark committer, and a member of the Apache Spark Project Management Committee (PMC). He is one of the original developers of Apache Spark, the lead developer of Spark Streaming (DStreams), and is currently one of the core developers of Structured Streaming and Delta Lake. Tathagata holds an M.S. in computer science from UC Berkeley.

Denny Lee is a staff developer advocate at Databricks who has been working with Apache Spark since 0.6. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has an M.S. in biomedical informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise healthcare customers.

Product details

  • Publisher: O'Reilly Media; 2nd edition (August 25, 2020)
  • Language: English
  • Paperback: 397 pages
  • ISBN-10: 1492050040
  • ISBN-13: 978-1492050049
  • Item Weight: 1.4 pounds
  • Dimensions: 7.25 x 1 x 9.25 inches
  • Customer Reviews: 4.7 out of 5 stars (165 ratings)

Customer reviews

4.7 out of 5 stars
165 global ratings

Top reviews from the United States

Reviewed in the United States on July 25, 2022
5.0 out of 5 stars Covers theoretical and practical aspects of the Spark ecosystem in great depth
By Arturo Amador Cruz on July 25, 2022
This book is a great resource for learning about Spark. It covers in detail the concepts behind the Spark architecture, the theory of parallelization, and topics related to optimizing analytical pipelines running on Spark. The book has a very nice section on Delta Lake. It also covers MLflow to a good level of detail, more as a complement to the docs. The section on machine learning includes theoretical explanations of how some ML algorithms change when run in parallel, as MLlib does.
I used the book as an extra study resource when taking some Databricks certifications. It was a great addition to my study materials.
Reviewed in the United States on September 4, 2020
5.0 out of 5 stars Well organized and solid information
By JA on September 3, 2020
The book was easy to follow. The Spark shell setup was clearly written, and I found the online instructions for installing Spark locally to be sufficient as well. The book is well organized to delineate the different components of Spark, e.g. intro, Structured APIs, streaming, optimizations, data lake, and ML deployment options. While ML deployment needs for individual business use cases are highly specific, I found the overview of the deployment framework provided by the book helpful. I also liked that the book uses screenshots of the Spark UI, with arrows pointing to the relevant parts, since the UI can be hard to understand. The code samples and the graphics in other sections are useful as well. There’s also coverage of how to connect to different apps, like Beeline (which I’ve never heard of), Tableau, and Thrift. Overall, the book contains solid information on the inner workings of Spark. I would recommend giving this book a read!

Top reviews from other countries

Josh Fry
5.0 out of 5 stars Nicely laid out and explained
Reviewed in the United Kingdom on February 9, 2022
I've just started my role as a Data Engineer where I looked at Azure's Data Factory. I needed to learn PySpark so I picked up this book and found it a super useful guide. It is explained clearly, and whilst it's clearly aimed at someone who has been in the industry longer than I, I found I could easily understand it.

I haven't read the chapter on streaming or the two chapters on machine learning as it isn't applicable to me, but everything else has been just what I needed. Well done to the authors for putting together such an amazing guide.

If you want to see the different chapter contents, I've added them as photos for your ease.
kryten
5.0 out of 5 stars Recommended
Reviewed in the United Kingdom on May 22, 2021
Fernando
5.0 out of 5 stars Up-to-date content
Reviewed in Mexico on December 26, 2020
Carlos
3.0 out of 5 stars Complementary book.
Reviewed in Spain on March 14, 2021
Neville Andrade
5.0 out of 5 stars Really well written technical book
Reviewed in Germany on November 15, 2020