on November 4, 2011
There are hundreds of R books, but this is the best one to address the core problem of learning to *program* in R. As reviewer Jason notes, R is used by several audiences with varying needs, but anyone who uses R for long must come to terms with learning to program it. This is the book for that.
What Matloff does is to lay out the essentials of the R language (or S, if you prefer) in depth but in a readable fashion, with well-chosen examples that reinforce learning about the language itself (as opposed to focusing on statistics or data analysis).
I'm a long-time (12-year) R user, R is my platform for analytics every day, and I have programmed in a variety of languages from C to Perl. I have long regretted that R had nothing comparable to Kernighan & Ritchie ("K&R", The C Programming Language) or similar programming classics; finally it does. Matloff is not quite as beautiful and elegant as K&R (and, to be fair, he is not in their position as the language's creators), but this book has similar goals and comes reasonably close.
I think there are two primary audiences for this book: those who are learning R from a computer science or programming background; and statisticians and others who use the programming language and want a thorough exposition. In my case, for instance, despite having written perhaps 100k lines of R code over the years, there remained areas where I was uneasy (e.g., exactly how do lists relate to data frames). Matloff sets it all straight, in friendly, readable fashion. Even in rudimentary chapters, I learned shortcuts and miscellaneous functions that are quite useful. The examples throughout are more "CS-like" than statistical, which is highly advantageous for this topic.
In addition to the tutorial content, it is well-suited as a quick reference. It doesn't aim to be comprehensive from a function point of view (which is almost impossible, and what R Help is for), but it is comprehensive from a programming conceptual point of view.
In short, if you program in R and aren't a member of R-Core, I believe you'll enjoy this book, learn something, and refer back to it repeatedly.
on October 30, 2011
Jason's juxtaposition of "data analysts" and "serious R programmers" strikes me as a little unfair, but I see what he means. Consider yourself a "serious R programmer" (SRP), and buy this book, if you are interested in the following aspects of R:
Variable scope - Chapter 7
User-defined classes - Ch 9
Debugging - Ch 13
Profiling and performance (mostly, vectorization) - Ch 14
Interfacing with C/C++ and Python - Ch 15
Parallel computation ("pure R" approach using "snow" package, and C++-aided approach using "OpenMP" library) - Ch 16
I have not seen the material of Chapters 15-16 in any other R reference; the other topics have shown up elsewhere (in "R in a Nutshell", for example) but get more attention here. The chapters would have been much shorter if written in a "Nutshell" style; however, I do not automatically consider a verbose, user-friendly writing style a negative.
The early chapters introduce R much as other books do, except that they (a) eschew discussion of the language's statistical repertoire, which makes sense given the programming focus, and (b) show a greater interest in the "matrix" class. They do it quite nicely (that said, let me ask the author to reconsider his "extended examples"), but I would not recommend "The Art of R Programming" to non-SRPs; I would point them to Robert Kabacoff's "R in Action" or (the E-Z version) Paul Teetor's "R Cookbook" instead.
Overall, the book did not quite click for me. I am a "data analyst" and at present do not have much "need for speed" (cf. C/C++); on the other hand, I would like a firmer grasp of R's OOP, but here "The Art of R Programming" only whets one's appetite. Still, I cannot deny its quality and unique value for budding SRPs. If there was any wavering between four and five stars on my part, the appreciation of how pretty and inexpensive the book is tipped the scales.
on January 15, 2012
The uniformly good reviews for "The Art Of R Programming" led me to read it, and I'm glad I did. I've used R casually for years as a sort of "secret weapon" to quickly analyze a few million data points, graph them, and draw useful conclusions, all before someone else could load the data into a SQL database. I've long believed that R is a clean, well-designed language for data analysis that was missing a good introductory text for programmers. R's type system, lexical structure, run-time mechanics, and functional nature make it one of the best-designed languages around, but this also seems to be one of the best-kept secrets in the software community. Until I read "The Art of R Programming" I'd never come across material that introduced R as a programming language. Most of what I saw presented it as a statistical toolbox that you could, almost accidentally, program.
However, be warned that the book is not rigorous, either as an introduction or as a reference. It is concise, easy to read, and largely driven by case studies that show you how to do things. But it often left me uneasy as a software engineer. For example, it states that R uses "lazy evaluation", where a more accurate statement would be that R simply evaluates function arguments lazily. The description of the run-time object environment is clunky: evaluation contexts, closures, and recursion are treated separately. It does not entirely explain how symbol lookup works for functions (you won't learn why "sum <- 1; sum(1,2,3)" still evaluates to 6). The discussion of copy-on-change was so vague that I failed to understand how I could use that information.
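For readers curious about the three behaviors mentioned above, here is a minimal R sketch written for this page (not taken from the book; the function name `f` and the vectors `x` and `y` are made up). It shows that a call like `sum(1, 2, 3)` skips a non-function object named `sum`, that a function argument is evaluated only when the body uses it, and that modifying a copied vector leaves the original untouched.

```r
# 1. Function lookup: when a name appears in call position, R searches
#    the enclosing environments for a binding that is a *function*,
#    skipping non-function objects with the same name.
sum <- 1
print(sum(1, 2, 3))   # 6: base::sum is found, the numeric 'sum' is skipped
rm(sum)               # drop the shadowing variable again

# 2. Lazy evaluation of arguments: each argument is a promise that is
#    evaluated only when the function body first uses it.
f <- function(x, y) {
  if (x > 0) "y was never evaluated" else y
}
print(f(1, stop("boom")))   # no error, because y is never forced

# 3. Copy on modification: y starts out sharing x's value; changing y
#    gives y its own copy and leaves x unchanged.
x <- c(1, 2, 3)
y <- x
y[1] <- 100
print(x)   # still 1 2 3
print(y)   # 100 2 3
```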
Okay, so it's not perfect, and it's definitely no K&R. But it's still way better than any other introduction I've seen before. It may be the best way to get started and then go on to the masses of freely available information about R. I wish this book had been available years ago when I first typed "R" at my shell prompt. It would have saved me a lot of pain!
on July 6, 2012
This book's main strength is also its greatest weakness: it tries to be too much of everything to everyone. The author is obviously a great R programmer (as he demonstrates way too much), with a master's degree in CS and experience teaching R at college. However, he is often too clever by half, adding irrelevant examples of overly complex and somewhat convoluted code. I think he does this more out of love for the language than to show off, but the effect is the same: much of the book comes off as disorganized and too complex for the beginner/intermediate R user to be helpful given the topic discussed. I will say that anybody who buys this book will find something about it to like, so it is a useful addition to any R library.
To reiterate the main theme, the book is very desultory, especially compared with a great book like "R Tutorial and Exercise Solution" by Chi Yau, which is organized properly. In the first few chapters of The Art of R Programming, the author will lay out and explain some basic concepts and code examples, and then on the next page show how to manipulate various data frames with 12-20 lines of complex code. I'm not sure what audience reading introductory chapters would find this abstruse and erudite code useful at all, given the basic chapter concepts. The chapter layout itself also seems odd, as salient and trivial topics get uneven treatment relative to their importance in the real world. As an engineer and a holder of a CS degree myself, I don't find the code too complex per se; it's just too complex and superfluous given the topic discussed.
The author would have been much better served saving the fancy coding for the advanced topics later in the book, where it would have been more relevant.
on November 1, 2011
I'm always very wary of programming books with titles of the form "The Art of ... Programming", but this book is good despite the title. Matloff is a clear and thoughtful writer who takes readers through their first steps with R (whose syntax takes some learning, since it is unlike anything a typical programmer will have encountered).
I did, however, find the comparisons with C programming in the first part of the book annoying. The author continually goes on about "if you're a C programmer" and then makes some comparison to C. I didn't find this helpful (and I am a C programmer), and I think it could have been safely left out. A good example is on page 12, where it says "Matrices are indexed using double subscripting, much as in C/C++, although subscripts start from 1 instead of 0." So pretty much not like C/C++. That's a good example of how the C interludes don't help the new reader.
Just occasionally the author gets ahead of himself. Early on in the book he introduces matrices and on page 28 does a matrix addition in the form m + 10:13. He hasn't explained how that addition is going to work.
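The point about `m + 10:13` is easy to check at the R prompt. The sketch below uses a made-up 2-by-2 matrix `m` (the book's own `m` on page 28 may differ). It relies on two facts about R: a matrix is a vector stored in column-major order with a `dim` attribute, and arithmetic between a matrix and a vector works element by element on that underlying vector, recycling the shorter operand.

```r
# Hypothetical matrix; R fills it column by column: col 1 = 1,2; col 2 = 3,4
m <- matrix(1:4, nrow = 2)
m[1, 2]           # subscripts start at 1: row 1, column 2 holds 3

# m + 10:13 adds element-wise in column-major order:
# 1+10, 2+11, 3+12, 4+13, giving a 2 x 2 matrix with columns (11, 13) and (15, 17)
m + 10:13

# A shorter vector is recycled down the columns: 10:11 is reused for each column
m + 10:11         # columns (11, 13) and (13, 15)
```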
However, these complaints are pretty minor. The book does a good job of taking you from knowing nothing about R to working with complex programs and data. The chapter on S3 and S4 classes is particularly welcome, but I think it could have been more in depth and earlier in the book. They are an important topic.
Overall this is a very good book to learn R from and has enough depth that the experienced R user will find useful things in the later chapters.
on January 28, 2014
I came to this book knowing next to nothing about R. I'm an experienced programmer, but my knowledge of statistics is not as deep as it should be, and rusty.
At times the book does a great job of explaining how the various R functions work, as well as concepts such as "vectorized" functions. A bit of code is shown, followed by a lot of explanation of what it does and why. Sometimes the phrasing could use improvement, and I found myself struggling to master a concept longer than I should have, but it was enough to get the job done.
Then I got about a quarter of the way through the book and hit an extended example of applying logistic regression. First, the code included a tilde operator, which had not been mentioned anywhere in the book before that. Next, it called a function, glm, without explaining what it does, showed the results, and said, "Sure enough, we get a 2-by-8 matrix, with the jth column given the pair of estimated B[i] values obtained when we do a logistic regression using the jth explanatory variable."
In effect, the book suddenly shifted from an explain-it-all-as-we-go text to a we-assume-you-know-statistics-as-well-as-exotic-R-operators-and-functions text. I am completely unable to understand this example until and unless I dig into both the related statistical concepts and the R syntax. I can't blame the book too much for my lack of knowledge of statistics, but I can say that it was careful to explain some much simpler statistical concepts earlier. As for the R syntax, I don't think there is any excuse. It also turns out that the caret operator in this context is not at all what a programmer would expect it to be, and there is no coverage of that either.
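For readers in the same position, here is a hedged sketch of what the tilde and caret mean in R model formulas, plus a loop in the spirit of the 2-by-k coefficient matrix quoted above. It uses the built-in `mtcars` data (transmission type `am`, coded 0/1, as the response) purely as a stand-in; the book's own data set and variables are not reproduced here, and the names `f1`, `f2`, `predictors`, and `coefs` are made up. In a formula, `y ~ x` reads "model y as a function of x"; inside a formula, `^` means crossing (all interactions up to that order) rather than exponentiation, and `I()` restores the arithmetic meaning.

```r
# The caret inside a formula means "cross the terms", not "raise to a power"
f1 <- y ~ (a + b + c)^2
attr(terms(f1), "term.labels")   # "a" "b" "c" "a:b" "a:c" "b:c"

# I() protects arithmetic, so this formula has a single squared term
f2 <- y ~ I(x^2)
attr(terms(f2), "term.labels")   # "I(x^2)"

# glm() with family = binomial fits a logistic regression.
# Fit one model per explanatory variable; each fit yields an intercept
# and a slope, so sapply() binds them into a 2 x k matrix, one column
# per variable.
predictors <- c("wt", "hp", "mpg")
coefs <- sapply(predictors, function(v) {
  fit <- glm(reformulate(v, response = "am"), data = mtcars, family = binomial)
  b <- coef(fit)
  c(intercept = b[[1]], slope = b[[2]])
})
dim(coefs)    # 2 3
coefs
```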
Somewhat later was a very long example on a Discrete Event Simulator. Here, as in so many other places, the author likes cryptic variable names such as rw, evntty, inspt and appin. If you were to study the code long enough, you would eventually understand what all of these meant. But it's sloppy and irritating and makes the job of understanding the code much harder.
Not long after this, he makes a comment on recursion that made me burst out laughing:
"It's fairly abstract. I knew that the graduate student [who had asked him for advice on writing a function], as a fine mathematician, would take to recursion like a fish to water.... But many programmers find it tough."
What I, a mere dim-as-a-20-watt-bulb programmer, find tough, is a plethora of cryptic variable names. Recursion, not so much. I followed his example with ease. Maybe if I were a math graduate student I could understand those variables!
I've also been disappointed by how little attention the book gives to the fundamental differences between some of R's "families" of functions, such as apply, lapply, sapply, and tapply, or lm and glm. There is a brief hand-waving comment and then off we go. This is especially unfortunate since, in my view, the built-in R help is often impenetrable and written more as a technical spec than a clear explanation.
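As a quick orientation (a minimal sketch, not the book's treatment), the calls below show the basic difference between the four apply-family functions and between `lm()` and `glm()`. The inputs are invented, apart from the built-in `mtcars` data set.

```r
m <- matrix(1:6, nrow = 2)       # rows (1, 3, 5) and (2, 4, 6)

# apply(): apply a function over the rows (1) or columns (2) of a matrix
apply(m, 1, sum)                 # 9 12

# lapply(): apply a function to each element, always returning a list
lapply(1:3, function(i) i^2)     # list(1, 4, 9)

# sapply(): like lapply(), but simplifies the result to a vector or matrix when it can
sapply(1:3, function(i) i^2)     # c(1, 4, 9)

# tapply(): apply a function to groups of a vector defined by a grouping vector
tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), mean)   # a = 20, b = 30

# lm() fits ordinary least-squares linear models; glm() fits generalized
# linear models selected via 'family'. With the default gaussian family,
# glm() reproduces lm()'s coefficients.
all.equal(coef(lm(mpg ~ wt, data = mtcars)),
          coef(glm(mpg ~ wt, data = mtcars)))   # TRUE
```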
I have pushed on to subsequent chapters, and learned more from the book. But be forewarned that it has a tendency to shift suddenly and without warning from a from-the-ground-up perspective to a we're-all-experienced-R-users perspective.
One other comment: as others have noted here, the publisher really should have included data files so that readers could play along with the examples.
on March 20, 2013
The author provides a decent enough basic overview of commonly used R features and does elaborate on some of the internals and best practices for writing efficient code, but I have a particular pet peeve: example code that does not work.
How hard could it be to actually try to run the code before publishing, just to see if it functions without errors? This includes not only the printed examples, but also Matloff's downloadable code.
on November 28, 2011
Anyone seeking to learn R faces two major challenges: (1) learning how to swim in the sea of information (R packages, books, websites, blog posts, message boards, etc.) that threatens to drown a newbie, and (2) coming to grips with the structure, syntax, and features of the language itself. Having some idea of what one wants to do with R is clearly an important first step that will set the path of learning. R, an open source computer language, is the premier software system for statistical computing. Not only can any statistical idea be expressed in R, it is likely that someone in the open source community has already written a function to accomplish, or at least facilitate, any statistical analysis a working statistician or data scientist might be contemplating.
R functions are organized into libraries or packages that usually relate to some particular statistical task. Assuming an average of something like 20 functions per package, the 3400 available contributed packages offer over 68,000 routines to read in data, manipulate it, analyze it, and visualize the results. No one could possibly become familiar with all of these. But because R is an interpreted (instant-feedback) language that encourages experimentation, some serious, sophisticated statistical analyses can be accomplished by stringing together the appropriate functions into a script. If one's interest in R is only to perform some particular analysis, then a beginner's best bet might be to select one of the 100 or so books or blogs on doing statistics with R that provide relevant sample code, and cut and paste to get a workable script. There is no shame in this. That is why all the open source authors went to the trouble of packaging up their work.
However, if a person really wants to be able to speak the R language and become a competent R programmer, then at the present time one can find no better guide than Norman Matloff's The Art of R Programming. Professor Matloff is a statistician and a computer scientist with a considerable amount of teaching experience. His book is no mere programming reference guide. It is a carefully crafted sequence of lessons that start at the beginning and work up to some fairly advanced topics, including a lucid account of object-oriented programming in R, a presentation of the rudiments of TCP/IP operations and a discussion of R programming for the internet, examples of parallel programming with R, and a discussion spanning several chapters of how to write production-level R code, with methods and advice on debugging R code, writing efficient R code, and interfacing R with other languages. Other distinguishing features of the book are brief examples showcasing a large number of functions (including rare gems such as D() for symbolic differentiation) that indicate the power and scope of R, and over thirty "Extended Examples", each of which is a credible study in writing careful, professional code. The most captivating aspect of the book, however, is Matloff's thoughtful manner of exposition. R's rich, compact syntax can be challenging the first time around. Matloff knows where the difficulties are. His presentations of R's various features and functions begin from a point of view that anticipates the obstacles likely to confound someone going down the R path for the first time, and he guides the novice around them. I expect that The Art of R Programming will appeal to a diverse audience of aspiring R programmers.
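The D() function mentioned above ships with base R's stats package. A small sketch, using arbitrary example expressions: `D()` takes an unevaluated expression and the name of a variable, returns the derivative as another R expression, and that result can then be evaluated numerically with `eval()`.

```r
# Derivative of x^2 with respect to x, returned as an unevaluated call
dx <- D(quote(x^2), "x")
print(dx)                    # 2 * x
eval(dx, list(x = 3))        # 6

# The product rule is applied symbolically as well
dprod <- D(quote(sin(x) * x), "x")
print(dprod)
eval(dprod, list(x = pi / 2))   # approximately 1
```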
on March 18, 2015
Why did this book receive so many 5-star ratings?
Being new to R and having worked through the first five chapters, I was struggling with the data files referenced in the book. Normally, when I learn a new programming language, working through the examples goes fine, but with this book it proved a nightmare: 1) the book does not explain where the data files can be found; 2) after searching the internet, I found a link to "the data files" on the publisher's website, only to be disappointed even more: many files are missing or have different names from the ones used in the book, and some are corrupt and/or contain different values from those shown in the book.
It really made me wonder what all the five-star ratings for this book were based on. I cannot believe that these reviewers used the book intensively.
This problem is not new, although only a few reviewers mention it: if you google "missing data files art of r programming" you will find many other people who encountered the same problem.
A second problem is that the code fragments often have errors that are really hard for beginners to solve. One example is the Mount Rushmore code on page 65; another is the code for the word frequency problem on page 98. On the web I found some solutions/corrections by other readers.
Then why did this book earn so many five-star ratings? It probably has to do with the fact that it could be a very good introduction to R, if only the author (and the editorial staff at No Starch Press) had paid more attention to detail and put some extra work into providing correct data files.
on April 24, 2013
The author jumps the gun in introducing topics, presenting examples that cover material that hasn't been fully explained yet. In Example 2.9.1, the author uses the lapply function, which isn't fully covered until Chapter 4, even though Section 2.9 is about the ifelse function. Why doesn't the author simply provide an example based on the topics from Chapter 1 through Section 2.9? Each successive section should build on the prior one. As a student, I find this jumping back and forth discouraging.
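For comparison, here is the kind of self-contained example the review asks for: a small sketch with invented data (not the book's Example 2.9.1) that exercises `ifelse()` using only vectors and comparison operators, with no `lapply()`. `ifelse(test, yes, no)` evaluates the test element by element and picks the corresponding element of `yes` or `no`.

```r
x <- c(5, 2, 9, -1, 0)

# Vectorized if/else: one result per element of x
ifelse(x > 3, "big", "small")      # "big" "small" "big" "small" "small"

# yes/no can themselves be vectors; here negative values are replaced by 0
ifelse(x < 0, 0, x)                # 5 2 9 0 0
```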