Parallel R [Paperback]

by Q. Ethan McCallum, Stephen Weston
4.0 out of 5 stars (6 customer reviews)

List Price: $21.99
Price: $19.85
You Save: $2.14 (10%)

Formats

Amazon Price New from Used from
Kindle Edition $9.99  
Paperback $19.85  
Unknown Binding --  

Book Description

Publication Date: November 2, 2011 | ISBN-10: 1449309925 | ISBN-13: 978-1449309923 | Edition: 1

It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product—unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets, including three chapters on using R and Hadoop together. You’ll learn the basics of Snow, Multicore, Parallel, Segue, RHIPE, and Hadoop Streaming, including how to find them, how to use them, when they work well, and when they don’t.

With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R’s memory barrier.

  • Snow: works well in a traditional cluster environment
  • Multicore: popular for multiprocessor and multicore computers
  • Parallel: part of the upcoming R 2.14.0 release
  • R+Hadoop: provides low-level access to a popular form of cluster computing
  • RHIPE: uses Hadoop’s power with R’s language and interactive shell
  • Segue: lets you use Elastic MapReduce as a backend for lapply-style operations

Frequently Bought Together

Parallel R + The Art of R Programming: A Tour of Statistical Software Design + Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Price for all three: $65.38


Editorial Reviews

Book Description

Data Analysis in the Distributed World

About the Author

Q. Ethan McCallum is a consultant, writer, and technology enthusiast, though perhaps not in that order. His work has appeared online on The O’Reilly Network and Java.net, and also in print publications such as C/C++ Users Journal, Doctor Dobb’s Journal, and Linux Magazine. In his professional roles, he helps companies make smart decisions about data and technology.

Stephen Weston has been working in high-performance and parallel computing for over 25 years. He was employed at Scientific Computing Associates in the 1990s, working on the Linda programming system, invented by David Gelernter. He was also a founder of Revolution Computing, leading the development of parallel computing packages for R, including nws, foreach, doSNOW, and doMC. He works at Yale University as an HPC Specialist.


Product Details

  • Paperback: 126 pages
  • Publisher: O'Reilly Media; 1 edition (November 2, 2011)
  • Language: English
  • ISBN-10: 1449309925
  • ISBN-13: 978-1449309923
  • Product Dimensions: 9.2 x 7 x 0.3 inches
  • Shipping Weight: 7.8 ounces
  • Average Customer Review: 4.0 out of 5 stars (6 customer reviews)
  • Amazon Best Sellers Rank: #838,997 in Books

Customer Reviews

Most Helpful Customer Reviews
5 of 5 people found the following review helpful
Format: Paperback
You have a problem: R is single-threaded, but your code would be faster if it could simultaneously run on more than one core. You have access to a cluster and/or your computer has multiple cores. Parallel R, by Q. Ethan McCallum and Stephen Weston, can help you put this extra computing power to use. The review on my blog ([...]) has several useful links.

The book describes 6 approaches to distributed computing:

1) snow
The chapter starts by showing you how to create a socket cluster on a single machine (later sections discuss MPI clusters, and socket clusters of several machines). Then a section describes how to initialize workers, with a later section giving a slightly advanced discussion on how functions are serialized to workers.

There's a great demonstration (including graphs) of why/when you should use clusterApplyLB instead of clusterApply. There's also a fantastic discussion on potential I/O issues (probably one of the most surprising/confusing issues to people new to distributed computing) and how parApply handles them. Then the authors provide a very useful parApplyLB function.
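The difference between the two scheduling functions can be sketched roughly as follows. This is not the book's own demonstration: it assumes the `parallel` package that ships with R (which provides the snow-style `makeCluster`, `clusterApply`, and `clusterApplyLB`), and `slow_task` and the `durations` vector are hypothetical stand-ins for uneven workloads.

```r
library(parallel)  # ships with R; provides snow-style cluster functions

# Start a socket cluster of two workers on the local machine.
cl <- makeCluster(2, type = "PSOCK")

# Hypothetical task: sleep for the given number of seconds, then return it.
slow_task <- function(secs) {
  Sys.sleep(secs)
  secs
}

# Uneven task lengths, chosen only for illustration.
durations <- c(0.4, 0.1, 0.4, 0.1)

# clusterApply assigns task i to a worker in fixed round-robin order.
static_time <- system.time(
  static_res <- clusterApply(cl, durations, slow_task)
)["elapsed"]

# clusterApplyLB hands the next task to whichever worker becomes free.
balanced_time <- system.time(
  balanced_res <- clusterApplyLB(cl, durations, slow_task)
)["elapsed"]

stopCluster(cl)

cat("static:", static_time, "s  load-balanced:", balanced_time, "s\n")
```

Both calls return results in input order. With uneven tasks like these, the load-balanced call would typically finish sooner, although the exact timings depend on the machine.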

There are a few (but very important!) paragraphs on random number generation using the rsprng and rlecuyer packages.
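As a rough illustration of why per-worker random number streams matter, here is a minimal sketch. It swaps in `clusterSetRNGStream` from R's bundled `parallel` package (which uses L'Ecuyer-CMRG streams) rather than the rsprng or rlecuyer packages the book covers, and the seed value is arbitrary.

```r
library(parallel)

cl <- makeCluster(2)

# Give each worker its own L'Ecuyer-CMRG stream derived from one seed.
clusterSetRNGStream(cl, iseed = 42)
first_run <- clusterCall(cl, function() runif(3))

# Resetting the streams with the same seed reproduces the same draws.
clusterSetRNGStream(cl, iseed = 42)
second_run <- clusterCall(cl, function() runif(3))

stopCluster(cl)

# Runs are reproducible, and the two workers do not share a sequence.
reproducible <- identical(first_run, second_run)
independent  <- !identical(first_run[[1]], first_run[[2]])
```

Without a step like this, workers started in the same way could end up drawing overlapping or identical sequences, which would quietly bias a simulation.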

2) multicore
The chapter starts by noting that the multicore package only works on a single computer running a POSIX-compliant operating system (i.e. almost anything except Windows).

The next section describes the mclapply function, and also explains how mclapply creates a cluster each time it's called, why this isn't a speed issue, and how it is actually beneficial. The next few sections describe some of the optional mclapply arguments, and how you can achieve load balancing with mclapply.
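A minimal sketch of the two scheduling modes, assuming the `parallel` package bundled with current R (where `mclapply` now lives) rather than the original multicore package. The `square` function and the two-core setting are hypothetical choices for illustration.

```r
library(parallel)  # mclapply originated in multicore and now lives here

# Hypothetical task used only for illustration.
square <- function(x) x^2

# Forking is unavailable on Windows, so fall back to one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Default: inputs are divided among the forked workers up front.
prescheduled <- mclapply(1:8, square, mc.cores = n_cores)

# mc.preschedule = FALSE forks one job per input as workers free up,
# which gives load balancing when task lengths vary.
balanced <- mclapply(1:8, square, mc.cores = n_cores,
                     mc.preschedule = FALSE)
```

Because the workers are forked from the calling R process on every call, they see the parent's data and loaded packages without any explicit initialization step.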
1 of 1 people found the following review helpful
3.0 out of 5 stars Out of Date January 28, 2014
Format: Kindle Edition | Verified Purchase
The packages are out of date and have been replaced by ones that are more comprehensive. Still offers a good explanation of parallel programming.
By Dennis
Format: Kindle Edition | Verified Purchase
Adding 300 pages or so would be very helpful. This book does not cover enough ground for sophisticated, statistics-literate beginners in R (like me), and I think that less of it would probably be enough for people who know more about R and "big data" tools.

I would pay many times the price for more information in this book. The author is definitely an expert: I hope he writes the right book soon, as there is a market for it.

R is a great tool and many of us are very interested in parallel, but for some this book will be just an appetizer.