Mark Bates has been developing web applications of one kind or another since 1996. He has spent an ungodly amount of time programming Java, but thankfully he discovered Ruby in late 2005, and life has been much nicer since.
Since discovering Ruby, Mark has become a prominent member of the community. He has developed various open-source projects, such as Configatron, Cachetastic, Genosaurus, APN on Rails, and the Mack Framework, just to name a few. The Mack Framework brought Mark to the forefront of distributed programming in the Ruby community. Mack was a web framework designed from the ground up to aid in the development of distributed applications.
Mark has taught classes on both Ruby and Ruby on Rails. He has spoken at several Ruby gatherings, including 2008’s RubyConf, where he spoke about building distributed applications.
Mark has an honors degree in music from the Liverpool Institute for Performing Arts. He still likes to rock out on the weekends, but set times are now 10 p.m., not 2 a.m. He lives just outside of Boston with his wife Rachel and their sons Dylan and Leo, whom he missed very much when writing this book.
Mark can be found at http://www.markbates.com and http://github.com/markbates.
Preface
I first found a need for distributed programming back in 2001. I was looking for a way to increase the performance of an application I was working on. The project was a web-based email client, and I was struggling with a few performance issues. I wanted to keep the email engine separate from the client front end. That way, I could have a beefier box handle all the processing of the incoming email and have a farm of smaller application servers handling the front end of it. That seems pretty easy and straightforward, doesn’t it? Well, the language I was using at the time was Java, and the distributed interface was RMI (remote method invocation). Easy and straightforward are not words I would use to describe my experiences with RMI.
Years later I was working on a completely different project, but I had a not-too-dissimilar problem: performance. The application this time was a large user-generated content site built using Ruby on Rails. When a user wrote, edited, or deleted an article for the site, it needed to be indexed by our search engine, our site map needed to be rebuilt, and the article needed to be injected into the top of our rating engine system. As you can imagine, none of this was quick and simple. You can also probably guess that our CEO wanted all of this to happen as close to real time as possible, but without the end user's having to wait for everything to get done. To further complicate matters, we had limited system resources and millions of articles that needed to be processed.
I didn’t want to burden our already-overworked application server boxes with these tasks, so I had to offload the processing to another machine. The question became how best to offload this work. The first idea was to use the database as the transfer mechanism. I could store all the information in the database that these systems would need. Then the machine that was to do the processing could poll the database at a regular interval, find any pending tasks, pull them out of the database, create the same heavy objects I already had, and then start processing them. The problem, as you most likely already know, is that I would now be placing more load on the database. I would be polling it continually, regardless of whether it contained any tasks. If it did have tasks, I would have to pull those records out of the database and use more system resources transforming the records back into those same heavy Ruby objects I already had.
What I really wanted to do was just send the fully formed Ruby objects I had already created to the other machine and let it do the processing. This would lessen the burden all around. In addition to the lighter load on the database, memory, and system resources, the machine doing the processing would work only when it was told to, and it wouldn’t waste resources by continually polling the database. Plus, without polling, the parts of the application the CEO wanted updated in near real time would get updated faster.
Once I realized that I wanted some sort of distributed mechanism, I decided to see what RMI-esque features Ruby had. I was already impressed with Ruby for being a terse language, but when I found the DRb (Distributed Ruby, also known as dRuby) package, I became a believer. I found that writing distributed applications in Ruby could be simple and, dare I say, fun.
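To give a flavor of that simplicity, here is a minimal sketch of a DRb server and client. It assumes a Ruby where `drb` is available (it shipped in the standard library through Ruby 3.3 and became a bundled gem in Ruby 3.4). The `ArticleIndexer` class is hypothetical, loosely modeled on the indexing task described above; it is an illustration, not code from the book's chapters.

```ruby
require 'drb/drb'

# Hypothetical service object: any plain Ruby object can be exposed via DRb.
class ArticleIndexer
  def initialize
    @indexed = []
  end

  # Records a title and returns how many titles have been indexed so far.
  def index(title)
    @indexed << title
    @indexed.size
  end
end

# Port 0 asks the OS for a free port; DRb.uri reports the actual address.
DRb.start_service('druby://localhost:0', ArticleIndexer.new)
puts "Server listening on #{DRb.uri}"

# Client side: a proxy that forwards method calls to the server's front object.
indexer = DRbObject.new_with_uri(DRb.uri)
first  = indexer.index('Distributed Ruby')
second = indexer.index('Rinda')
puts first   # prints 1
puts second  # prints 2

DRb.stop_service
```

Because client and server share one process here, DRb may dispatch the calls locally. To see them cross a process boundary, keep the server half running in one file (replacing `DRb.stop_service` with `DRb.thread.join`) and create the `DRbObject` in a second file, pointing it at the URI the server prints.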
Who Is This Book For?
This book is quite simply written for the intermediate to advanced Ruby developer who wants to start developing distributed applications. This book assumes that you have a good knowledge of Ruby, at least at the intermediate developer level. Although we will touch on some parts of the Ruby language (particularly those that might be confusing when dealing with distributed applications), we will not be going into the language in depth.
While you should know Ruby, this book assumes that you probably do not understand distributed programming and that this is your first venture into this world. If you have done distributed programming before, this book will help you quickly understand how to do it in Ruby. If you haven’t, this book will help you understand what distributed programming is and isn’t.
How Is This Book Organized?
This book is split into four parts. Part I examines what ships with the standard library in Ruby 1.8.x and beyond. We look in depth at how DRb (dRuby or Distributed Ruby) and Rinda work. We will build some simple applications in a variety of ways and use those examples to talk about the libraries. We examine the pros and cons of DRb and Rinda. By the end of Part I you should feel comfortable and ready to build your distributed applications using these libraries.
Part II looks at a variety of third-party tools, libraries, and frameworks designed to make distributed programming in Ruby easy, fun, and robust. Some of these libraries build on the DRb and Rinda libraries we learned about in Part I, and others don’t. Some are based on executing arbitrary code on another machine. Others are based on running code in the background to improve performance.
Part III takes a close look at some of the leading distributed message queues available to the Ruby community. These queues can help facilitate communication and tasks between your applications. Distributed message queues can help increase your applications’ performance by queuing up work to be done later rather than immediately, while the user waits.
Finally, Part IV looks at a few libraries that are designed to work exclusively with the Ruby on Rails web framework. These libraries might already be familiar to you if you have been using Ruby on Rails for several years. But there is always something to be learned, and that’s what the chapters in this part of this book will help you with.
During the course of the book, we will examine a breadth of different technologies; however, this book is not necessarily a how-to guide. Instead, you will use these different technologies to help understand the complex problems associated with distributed programming and several different ways you can solve these problems. You’ll use these technologies to learn about RMI, message queues, and MapReduce, among others.
How to Run the Examples
I have tried to make this book as easy to use and follow as possible. When a new technology is referenced or introduced, I give you a link to find out more about it and/or its developer(s). When you see a code sample, unless otherwise stated, I present that sample in its entirety. I have also made an extra effort to make sure that you can easily run each of those code samples as is. Unless otherwise stated, you should be able to take any code sample, copy it into a Ruby file, and run it using the ruby command, like this:
$ ruby foo.rb
There are times when a file needs to be named something specific or has to be run with a special command. In particular, Chapter 4, "Starfish," covers this issue. At that time I will call your attention to these details so that you can run the examples without hassle.
In some chapters, such as Chapters 2, "Rinda," and 8, "AMQP/RabbitMQ," background servers need to be run for the examples to run correctly. It is highly recommended that you restart these background servers between each set of examples that are presented in these chapters. A lot of these chapters iteratively build on a piece of software, and restarting the servers between runs helps eliminate potentially confusing results.
© Copyright Pearson Education. All rights reserved.