on October 5, 2012
David Marr's "Vision" presents a bold computational approach to studying vision. The book redefines lay and technical concepts in a way that sets the reader up for a full understanding of Marr's view of how vision science should be approached and mastered.
I found the beginning of the book especially engaging, with an introduction to what human vision can do and how little we know about it. The rest of the book serves a separate purpose, namely to present an elaborate argument for the computational study of vision and Marr's own hypotheses, with interwoven proofs for both the approach and the theories presented. During the introduction to Marr's thought process in Chapter 1, I did not want to put the book down. Chapter 2, on low-level representation, was a little difficult to swallow since I am more interested in the recognition and algorithmic approaches of the brain, but my patience was rewarded with the remaining chapters. The foreword and afterword by those who perhaps knew Marr the best are a real treasure as well; too young to have known what an impact Marr was making on the field, I appreciate the brief glimpse of this man that I was given through these two supplemental sections.
David Marr's writing style is conversational. He invites the reader to see the study of vision the way he sees it, and he uses first-person narration without hesitation. However, the book would hardly be labeled a 'quick read' or a 'bedside book.' The technical content was a bit overwhelming to me at first, though the supplemental figures peppered throughout the book helped significantly. Marr's main thoughts were easy to follow because he heavily outlined his work not only in chapters and subheadings, but also in lists and bulleted arguments embedded within the chapters. This likely stems from the ideas presented in the book itself: computational vision must be discussed with a list of stated assumptions and acknowledged constraints in order to move forward with a neurologically plausible description of the visual system.
Here is a synopsis of each of the book's chapters:
Chapter 1: The Philosophy and the Approach
This chapter is a great introduction to human vision, computational neuroscience, and Marr's own view of the computational theory behind vision. It lays the foundation for what approaches have worked in the past, the main concepts that we learned from them, and what new approaches will get us more of the information we are looking for. Marr boldly criticizes the way artificial intelligence and neuroscience have been done in the past. The core of his argument is that any explanation of vision will have to account for three levels of understanding: computational theory, representation and algorithms, and hardware implementation. There were two bold claims that stood out in this chapter: (1) "an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism" (27); and (2) "the only way to figure out how to detect physical invariants is to treat it as an information-processing problem" (30). These ideas lay the foundation for the approach Marr defends in the following chapters.
Chapter 2: Representing the Image
Marr begins by describing a hierarchical representation of an image, involving successive 'sketches' of the image and a mathematical description for generating and processing these sketches. He uses an interesting term, 'zero-crossings', that seems to be some sort of hybrid between a receptive field and an image feature. This chapter is not for the technically naïve: it begins to assume the reader's knowledge of Fourier analysis and filters at a mathematical level. However, it raises important general points, such as the key image properties: intensity, size, density, orientation, and distance (81). Once we understand how the first sketch, the raw primal sketch, is formed, it can then be operated upon by selection, grouping, and discrimination to form tokens, virtual lines, and boundaries at different scales. This leads into the ideas in the next chapter:
Chapter 3: From Images to Surfaces
This chapter is the starting point for the computer-vision, algorithm-hungry readers. Marr walks the reader through current evidence for, and open questions about, the initial tasks in interpreting images. Some of the topics covered include stereopsis (resolving two differing images - right and left eyes - as one representation of the world), motion recognition (directional selectivity, apparent motion, shape from motion), several ways in which objects are formed as unique entities (contours and texture), and more. Marr does not just list current findings or psychophysical experiments supporting the existence of the brain's capabilities in these areas. He also presents many implausible algorithms and several potential ones that could explain how the brain does this, or at least implementations that are inspired by the robustness of the human visual system. Many of these algorithms are focused on the constraints of the physical world and the assumptions we make. Marr asserts that in order to develop an effective algorithm, we must first have a deep understanding of the problem at hand and the purpose of the algorithm.
Chapter 4: The Immediate Representation of Visible Surfaces
The fourth chapter explains how the features generated in Chapter 3 are combined into a viewer-centric representation, termed the 2 1/2-D sketch. The chapter starts off once again with Marr's almost scolding comments about the questions asked in computer vision ('What is an object? How do we segment an image?'), and why they either have no answer or are irrelevant to a problem formulated in a much more useful manner. He spends a great deal of time reformulating the problem: going back to the earlier list of image characteristics, asking what else is needed, what information is to be represented, and what the possible representations could look like.
Chapter 5: Representing Shapes for Recognition
Marr uses his last chapter of new content to explain how these representations come together into both shapes and a final 3D model of the scene. This is finally where we get to see the theory culminate in recognition and full world representation problems. It was surprising to see that his 3D representation consisted of recognizing segmented postures of objects, based on the organization of shape descriptors. He turns the recognition problem into a summation of smaller recognition problems (hand, arm, thick torso), which is what he often avoided doing in previous chapters. While the chapter is short and, in my opinion, a bit different in philosophy from the others, it was satisfying to finally reach the high-level goal of the system by the end.
Chapters 6, 7: Synopsis, Defense
Marr wraps up the book with a quick three-page synopsis of his theory, complete with a flowchart of how different representations and processes interact. The final chapter is a dialogue at the Salk Institute in which he clarifies his new theories and rebuts objections as a few of his colleagues question them. I found this chapter to be the most interesting, as the questions asked were many of the ones I had wondered about while reading, especially since this is a fairly radical way of thinking compared to the electrophysiology and empirical modeling approaches on which so many papers have been and are still based.
I have undoubtedly internalized and continue to stew on the main concepts resounding throughout the book as I pursue vision science in the next chapter of my life (graduate school). There are a few immediate problems with using the book as a general study of computational vision; it is missing key ideas such as attention, natural scene statistics, and learning, all of which play a key role in the purpose and mechanisms of vision. While the book is challenging mathematically, lacking in modern additions to the field, and too narrowly focused to completely explore human vision, this work made a huge impact on my view of and excitement for visual neuroscience. It presents a beautiful hybrid view of how human and computer vision can be mutually beneficial, where algorithmic and computational approaches to the visual system not only help us solve engineering/CS problems, but also inform us about the constraints and impressive talents of the visual system. I am guessing that when I am looking for a rekindling of motivation for vision research or a refresher on the advantages of computational neuroscience, the first book I reach for on my shelf will be Marr's.
Recommendation to Potential Buyers:
If you're thinking at all about studying vision, whether human or computer, get this book, if only to read the first chapter. Even if you don't agree with his theories or opinions, Marr's clarity of thought and personality alone are enough to leave you encouraged about where this field could go, and to inspire your own direction of thinking. Readers with a fair amount of math/electrical engineering/algorithms experience will get much more out of the book. I do not feel that I was sufficiently equipped to understand all of the concepts in the book, but I still got a lot out of it, so don't be discouraged by the technical detail. Happy reading!