From the Inside Flap
This book is intended as an introduction to computer vision for a broad audience. It provides necessary theory and examples for students and practitioners who will work in fields where significant information must be extracted automatically from images. The book should be a useful resource for professionals, a text for both undergraduate and beginning graduate courses, and a resource for enrichment of college or even high school projects. Our goals were to provide a basic set of fundamental concepts and algorithms and also to discuss some of the exciting evolving application areas. This book is unique in that it contains chapters on image databases (Chapter 8) and on virtual and augmented reality (Chapter 15), two such application areas. A final chapter (Chapter 16) gives a complete view of real-world systems that use computer vision.
Due to recent progress in the computer field, economical and flexible use of computer images is now pervasive. Computing with images is no longer just for the realm of the sciences, but also for the arts and social sciences and even for hobbyists. The book should serve an established and growing audience including those interested in multimedia; art and design; geographic information systems; and image databases, in addition to the traditional areas of automation, image science, medical imaging, remote sensing, and computer cartography.
A broad purpose at first seems impossible to achieve. However, there are other kinds of texts that already do this in other areas: calculus, physics, and general computing. We hope we have made at least a good beginning; we wanted a book that would be useful in the classroom and also to the independent reader. We find the chosen topics interesting and sometimes exciting, and hope that they are accessible to a large audience. It is assumed that use of the text in a graduate, or even senior level, computer vision course would be supplemented by papers from the archival literature. Coverage is not intended to be comprehensive; only a modest set of papers is cited at the end of each chapter.
The early chapters begin at an intuitive level and progress towards mathematical models with the goal of intuitive understanding before formal characterization. Sections marked by an asterisk (*) are more mathematical or more advanced and need not be covered in a less technical course. To strengthen the intuitive approach, we have stayed with the processing of iconic imagery for the first eleven chapters and have delayed 3D computer vision until the later chapters, but it should be easy for experienced instructors to resequence them to fit a particular course or teaching style. There are many viable applications that are entirely 2D, and many concepts and algorithms are more simply taught in their 2D form. We provide some basics of pattern recognition in Chapter 4, so that students can consider complete recognition systems before the full coverage of image features and matching. A reader should have a good idea of 2D image processing applications after Chapter 4; Chapters 5, 6, and 7 add in gray-tone, color, and texture features. Chapter 8 treats image databases, a popular recent topic. Although some colleagues advised us to place this material near the end of the book, our goal in positioning it early in the chapter sequence is to reinforce the concepts of the prior chapters and to provide material that can lead to an excellent half-term project. Segmentation and matching are treated in their 2D forms in Chapters 10 and 11, so that the basic concepts are presented in a simple form, without introducing the complexities of 3D transformations.
Characteristics of the 3D world are briefly introduced in Chapter 2 and then are studied in much more detail in Chapter 12. Chapter 12 surveys qualitatively many aspects of how a 3D world can be perceived from 2D images; it concludes with quantitative models of stereo and a study of the thin lens equation for depth-from-focus and resolving power. The transition to 3D computer vision is made in Chapter 13. The authors have found from their own teaching that the difficulty increases abruptly for students at this point. The use of matrices to model homogeneous transformations is included within the chapter rather than in appendices; the 3D versions are extensions of the simpler 2D versions given in Chapter 11. Least-squares fitting, introduced in a simple 2D context in Chapter 11, is also extended in Chapter 13. Non-linear optimization is introduced in a simple P3P context and then used for camera calibration, including the modeling of radial distortion in a lens. Chapter 14 treats 3D models and the matching of models to 3D sensed data; it is of mixed difficulty. Chapter 15 discusses applications in virtual and augmented (mixed) reality and the role of computer vision techniques.
Programming Language Issue
The book does not rely on any programming language, but uses a generic algorithmic notation. Commitment to a particular language is unnecessary, and any single choice would be the wrong language for many readers. Students who are programmers should have little trouble implementing the algorithms, as our own students have shown. Examples will eventually be provided on the World Wide Web when appropriate and available, primarily so students can quickly experiment, secondarily so that they can study some sample code.
Several tools and libraries are available to instructors and students; for example, Khoros, NIH-Image, XView, gimp, and MATLAB. There are also packages that can be purchased from companies that make machine vision hardware. The authors have decided not to base the text on any specific software because, first, most readers would be using something else, and second, it would be counterproductive to bury the essence of the image operations within the complex framework of data structures and methods needed in an industrial-strength system. Having first studied principles in an environment with few variables, the reader will then be better able to choose and use an industrial system successfully.
Ways to Use the Text
The book material can be selected, and sometimes sequenced, in different ways according to the goal of the course and interests of the instructor and students.
Chapter 3, with brief summary of Chapter 2
A minimum usage would be 1-3 lectures in a data structures and algorithms course. Chapter 3, with some background from Chapter 2, contains motivational applications and programming exercises on 2D arrays, depth-first search, and the union-find data structure for sets.
Chapters 1, 2, and 3, and optionally some of Chapters 4, 5, and 6
This could serve as an enrichment unit of 1 to 3 weeks for high school or lower division undergraduates. The objective could be as simple as a term paper or as complex as group work on a program to, say, create a 2D parts recognition system based on connected components and prototype matching of feature vectors.
Much of Chapters 1-11
This would be a survey of 2D material for an elective course for students in geography, natural resources, or microbiology, for example, provided that many of the optional sections are passed over. If most sections of Chapters 1-11 are covered, this would constitute a semester undergraduate course in image processing and analysis with an introduction to computer vision.
Most of the text
This would constitute a semester course in computer vision at the senior or first-year graduate level. There is more material in the book than can be covered well in one semester. Some sections will have to be ignored or surveyed, and the reader should not be expected to be able to work homework problems in all sections. For the quarter system, Chapters 1-4, 6-12, and 14 make a good introduction to computer vision for undergraduates. For a one-quarter graduate course, Chapters 1-4 can be minimally covered with the emphasis on Chapters 6-14 and a brief coverage of Chapter 15. For any graduate-level course, it is expected that some papers from the current literature would also be covered.
We are grateful to our many colleagues, teachers, and students with whom we have shared our interests. They have contributed much to our growing field and shared their work and excitement. Many have generously supported this book with encouragement and with contributions of ideas, figures, and algorithms. Specific citations are given throughout the book. With regret we have left out some important contributions; a text can only be so large. The several reviewers and many colleagues who have given us feedback have significantly improved our work. In particular, for careful editing, we are indebted to Mohammad Ghavamzadeh, Nick Dutta, Kevin Bowyer, Adam Clark, Yu-Yu Chou, Habib Abi-Racked, and Valentin Razmov. We take responsibility for any errors remaining in the book and for providing corrections in the future.
This book was four years in the making. We are indebted to Paul Becker of Addison Wesley-Longman for much guidance in getting the project going and to Tom Robbins of Prentice Hall for finishing it off. We thank Cathy Davison and Lorraine Evans for their persistence in helping to resolve the many cases where permissions needed to be tracked down. We are grateful to Rose Rummel-Eury and Chanda Wakefield of ICC for meticulous editing of our notation and English, and for pushing the schedule. Creating the book was not light work and it certainly helped to have a team with both skill and humor.
From the Back Cover
Scientists and science fiction writers have long been fascinated by the possibility of building intelligent machines, and the capability of understanding the visual world is a prerequisite for such a machine. This book speaks to the notable research progress being made and brings together the important problem areas where computer vision is already providing solutions. Due to recent progress in the computer field, economical and flexible use of computer images is pervasive. Computing with images is no longer just for the realm of the sciences, but also for the arts and social sciences and even for hobbyists. This book should serve an established and growing audience including those interested in multimedia, art and design, geographic information systems, and image databases, in addition to the traditional areas of automation, image science, medical imaging, remote sensing, and computer cartography.
Computer Vision presents the necessary theory and techniques for students and practitioners who will work in fields where significant information must be extracted automatically from images. It will be a useful resource book for professionals and a core text for both undergraduate and beginning graduate computer vision and imaging courses.
- Topics include image databases and virtual and augmented reality in addition to classical topics.
- Offers a complete view of two real-world systems that use computer vision.
- Contains applications from industry, medicine, land use, multimedia, and computer graphics.
- Includes over 250 exercises and programming projects, 48 separately defined algorithms, and 360 figures.
- The companion website features include image archive, sample