Pattern Recognition and Machine Learning (Information Science and Statistics)
Christopher M. Bishop (Author)
This is the first textbook on pattern recognition to present the Bayesian viewpoint. The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, an approach that no other machine learning textbook had taken at the time. No previous knowledge of pattern recognition or machine learning concepts is assumed. Familiarity with multivariate calculus and basic linear algebra is required, and some experience in the use of probabilities would be helpful though not essential, as the book includes a self-contained introduction to basic probability theory.
Editorial Reviews
Review
From the reviews:
"This beautifully produced book is intended for advanced undergraduates, PhD students, and researchers and practitioners, primarily in the machine learning or allied areas...A strong feature is the use of geometric illustration and intuition...This is an impressive and interesting book that might form the basis of several advanced statistics courses. It would be a good choice for a reading group." John Maindonald for the Journal of Statistical Software
"In this book, aimed at senior undergraduates or beginning graduate students, Bishop provides an authoritative presentation of many of the statistical techniques that have come to be considered part of ‘pattern recognition’ or ‘machine learning’. … This book will serve as an excellent reference. … With its coherent viewpoint, accurate and extensive coverage, and generally good explanations, Bishop’s book is a useful introduction … and a valuable reference for the principle techniques used in these fields." (Radford M. Neal, Technometrics, Vol. 49 (3), August, 2007)
"This book appears in the Information Science and Statistics Series commissioned by the publishers. … The book appears to have been designed for course teaching, but obviously contains material that readers interested in self-study can use. It is certainly structured for easy use. … For course teachers there is ample backing which includes some 400 exercises. … it does contain important material which can be easily followed without the reader being confined to a pre-determined course of study." (W. R. Howard, Kybernetes, Vol. 36 (2), 2007)
"Bishop (Microsoft Research, UK) has prepared a marvelous book that provides a comprehensive, 700-page introduction to the fields of pattern recognition and machine learning. Aimed at advanced undergraduates and first-year graduate students, as well as researchers and practitioners, the book assumes knowledge of multivariate calculus and linear algebra … . Summing Up: Highly recommended. Upper-division undergraduates through professionals." (C. Tappert, CHOICE, Vol. 44 (9), May, 2007)
"The book is structured into 14 main parts and 5 appendices. … The book is aimed at PhD students, researchers and practitioners. It is well-suited for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bio-informatics. Extensive support is provided for course instructors, including more than 400 exercises, lecture slides and a great deal of additional material available at the book’s web site … ." (Ingmar Randvee, Zentralblatt MATH, Vol. 1107 (9), 2007)
"This new textbook by C. M. Bishop is a brilliant extension of his former book ‘Neural Networks for Pattern Recognition’. It is written for graduate students or scientists doing interdisciplinary work in related fields. … In summary, this textbook is an excellent introduction to classical pattern recognition and machine learning (in the sense of parameter estimation). A large number of very instructive illustrations adds to this value." (H. G. Feichtinger, Monatshefte für Mathematik, Vol. 151 (3), 2007)
"Author aims this text at advanced undergraduates, beginning graduate students, and researchers new to machine learning and pattern recognition. … Pattern Recognition and Machine Learning provides excellent intuitive descriptions and appropriate-level technical details on modern pattern recognition and machine learning. It can be used to teach a course or for self-study, as well as for a reference. … I strongly recommend it for the intended audience and note that Neal (2007) also has given this text a strong review to complement its strong sales record." (Thomas Burr, Journal of the American Statistical Association, Vol. 103 (482), June, 2008)
"This accessible monograph seeks to provide a comprehensive introduction to the fields of pattern recognition and machine learning. It presents a unified treatment of well-known statistical pattern recognition techniques. … The book can be used by advanced undergraduates and graduate students … . The illustrative examples and exercises proposed at the end of each chapter are welcome … . The book, which provides several new views, developments and results, is appropriate for both researchers and students who work in machine learning … ." (L. State, ACM Computing Reviews, October, 2008)
"Chris Bishop’s … technical exposition that is at once lucid and mathematically rigorous. … In more than 700 pages of clear, copiously illustrated text, he develops a common statistical framework that encompasses … machine learning. … it is a textbook, with a wide range of exercises, instructions to tutors on where to go for full solutions, and the color illustrations that have become obligatory in undergraduate texts. … its clarity and comprehensiveness will make it a favorite desktop companion for practicing data analysts." (H. Van Dyke Parunak, ACM Computing Reviews, Vol. 49 (3), March, 2008)
From the Back Cover
The dramatic growth in practical applications for machine learning over the last ten years has been accompanied by many important developments in the underlying algorithms and techniques. For example, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic techniques. The practical applicability of Bayesian methods has been greatly enhanced by the development of a range of approximate inference algorithms such as variational Bayes and expectation propagation, while new models based on kernels have had a significant impact on both algorithms and applications.
This completely new textbook reflects these recent developments while providing a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of pattern recognition or machine learning concepts is assumed. Familiarity with multivariate calculus and basic linear algebra is required, and some experience in the use of probabilities would be helpful though not essential as the book includes a self-contained introduction to basic probability theory.
The book is suitable for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. Extensive support is provided for course instructors, including more than 400 exercises, graded according to difficulty. Example solutions for a subset of the exercises are available from the book web site, while solutions for the remainder can be obtained by instructors from the publisher. The book is supported by a great deal of additional material, and the reader is encouraged to visit the book web site for the latest information.
Christopher M. Bishop is Deputy Director of Microsoft Research Cambridge, and holds a Chair in Computer Science at the University of Edinburgh. He is a Fellow of Darwin College Cambridge, a Fellow of the Royal Academy of Engineering, and a Fellow of the Royal Society of Edinburgh. His previous textbook "Neural Networks for Pattern Recognition" has been widely adopted.
Coming soon:
*For students, worked solutions to a subset of exercises available on a public web site (for exercises marked "www" in the text)
*For instructors, worked solutions to remaining exercises from the Springer web site
*Lecture slides to accompany each chapter
*Data sets available for download
Product details
- Publisher : Springer (August 17, 2006)
- Language : English
- Hardcover : 738 pages
- ISBN-10 : 0387310738
- ISBN-13 : 978-0387310732
- Item Weight : 4.73 pounds
- Dimensions : 7.7 x 1.3 x 10.2 inches
- Best Sellers Rank: #20,671 in Books
About the author

Christopher Michael Bishop (born 7 April 1959) FREng, FRSE, is the Laboratory Director at Microsoft Research Cambridge, professor of Computer Science at the University of Edinburgh, and a Fellow of Darwin College, Cambridge. He obtained a Bachelor of Arts degree in Physics from St Catherine's College, Oxford, and a PhD in Theoretical Physics from the University of Edinburgh, with a thesis on quantum field theory.
Bio from Wikipedia, the free encyclopedia.
Customer reviews
Top reviews from the United States
In my opinion, despite the recent publication of Kevin Murphy's very comprehensive ML book, Bishop is still a better read. This is mostly because of his incredible clarity, but the book has other virtues: best-in-class diagrams, judiciously chosen; a lot of material, very well organized; excellent stage-setting (the first two chapters). Now, sometimes he's a bit cryptic: for example, the proof that various kinds of loss lead to the conditional median or mode is left as an exercise (ex 1.27). Murphy actually discusses it in some detail, and this is true in general: Murphy discusses many things that Bishop leaves to the reader. I thought chapters three and four could have been more detailed, but I really have no other complaints.
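For anyone curious about that exercise, here is a quick numerical sketch of the result (my own illustration, not the book's solution; the lognormal targets and the grid search are arbitrary choices): the minimizer of the empirical Minkowski loss mean(|y - t|^q) is the mean for q = 2, the median for q = 1, and drifts toward the mode as q -> 0.

```python
# Numerical sketch of the loss-minimizer result (not the book's solution).
import numpy as np

rng = np.random.default_rng(0)
t = rng.lognormal(mean=0.0, sigma=0.75, size=50_000)  # skewed targets

# Candidate predictions on a grid spanning the bulk of the data.
ys = np.linspace(np.quantile(t, 0.001), np.quantile(t, 0.999), 1_000)

def minkowski_minimizer(q):
    # Grid-search the single prediction y minimizing mean |y - t|^q.
    losses = [np.mean(np.abs(y - t) ** q) for y in ys]
    return ys[int(np.argmin(losses))]

print("q=2  ->", minkowski_minimizer(2.0), "  mean:  ", t.mean())
print("q=1  ->", minkowski_minimizer(1.0), "  median:", np.median(t))
print("q=0.1->", minkowski_minimizer(0.1), "  (drifts toward the mode)")
```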
Please note that to get the most out of this book you should already have a little background in linear algebra, probability, and calculus, and preferably some statistics. The first time I approached it I had no such background, and I found it a bit unfriendly and difficult; this is no fault of the book, however. Still, you don't need that much, just the basics.
Update: I should note that there are some puzzling omissions from this book. E.g., the F-score and confusion matrices are not mentioned (see Murphy section 5.7.2); it would have been very natural to cover these concepts in ch 1, along with decision theory. Nor is there much on clustering beyond K-means (see Murphy ch 25). Not a huge deal, as it's easy to get these concepts from elsewhere. I recommend using Murphy as and when you need him, to fill in the gaps.
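For reference, the omitted concepts take only a few lines to state. A minimal from-scratch sketch (the labels and predictions below are made up for illustration):

```python
# Confusion matrix and F1 score from scratch (illustrative data).
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives

confusion = np.array([[tn, fp],
                      [fn, tp]])            # rows: true class, cols: predicted

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(confusion)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```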
One more update: I've been getting into Hastie et al.'s ESL recently, and I'm really impressed with it so far. I think the practitioner should probably get familiar with both ESL and PRML, as they have complementary strengths and weaknesses. ESL is not very Bayesian at all; PRML is relentlessly so. ESL does not use graphical models or latent variables as a unifying perspective; PRML does. ESL is better on frequentist model selection, including cross-validation (ch 7). I think PRML is better for graphical models, Bayesian methods, and latent variables (which correspond to chs 8-13), and ESL better on linear models and density-based methods (and other stuff besides). Finally, ESL is way better on "local" models, like kernel regression and loess. Your mileage may vary; they are both excellent books. ESL seems a bit more mathematically dense than PRML, and is also better suited to people in industry as opposed to academia (I was in the latter but am now in the former).
One strong point is the color figures; I will say, those are very helpful. But they don't quite make up for the terse prose and large conceptual jumps. I also find it strange that sometimes simple algebraic steps are explained while other, much more complex ideas are glossed over. Anyway, 3 stars.
Pros:
- not mathematically heavy; lots of good heuristics that capture the math without delving too far in
- choice of topics and their discussion (e.g. a great place to learn about kernel methods and graphical models)
- easy to get hooked on if you mind the gaps
Cons:
- read below...
While the exposition is spotty (compare, e.g., with Feller or Gelman), the author manages a mostly linear path through fascinating topics.
The book would benefit greatly from editing by someone with a solid math background. To be fair, there are more good mistakes than bad ones: mistakes that make you think rather than mislead you... Often, when speaking with people who have more of a stats background than me, the conversation is isomorphic to:
Me: "...therefore, this statement is wrong. I think what he meant was..."
Bro: "Ah yes. But you get it, that's what he meant"
Me: "Then why didn't he write it?"
But at least dialogues like these help cement ideas...
Please correct me if any of the following contentions are wrong (I may update as I continue to read):
Some parts are not even wrong, for example:
Sec 2.1, paragraph above Eq. 2.19
"We see that this sequential approach to learning arises naturally when we adopt a Bayesian viewpoint. It is independent of the choice of prior and of the likelihood functions and depends only on the assumption of i.i.d data"
First, if you follow the thread of this section back to the contrived coin-flipping example, you will see that in the non-Bayesian point of view estimates are also updated over a sequence of experiments; hence a Bayesian point of view is in this case no more "natural" than a frequentist one. Second, by definition of i.i.d., a single fixed distribution is postulated to exist, and a prior is in fact chosen: how do you define a posterior without a prior? But OK, I think I get it: a sequential approach fits in nicely with the Bayesian point of view. I agree, and that's all that needed to be said.
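To make the point concrete, here is a sketch of the coin example (my own code; the Beta(2, 2) prior and true bias 0.7 are arbitrary choices): sequential one-flip updates reproduce the batch posterior exactly, and the frequentist running estimate m/N is updated just as "sequentially".

```python
# Beta-binomial coin example: sequential vs. batch conjugate updates.
import numpy as np

rng = np.random.default_rng(1)
flips = (rng.random(500) < 0.7).astype(int)   # simulated Bernoulli coin

a, b = 2.0, 2.0                               # Beta prior hyperparameters
for x in flips:                               # one conjugate update per flip
    a, b = a + x, b + (1 - x)

m, N = flips.sum(), flips.size
print("sequential posterior: Beta(%.0f, %.0f)" % (a, b))
print("batch posterior:      Beta(%.0f, %.0f)" % (2.0 + m, 2.0 + (N - m)))
print("posterior mean: %.3f   running MLE m/N: %.3f" % (a / (a + b), m / N))
```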
Mathematically wrong:
Same section, the statement following Eq. (2.20):
"Note that in the limit of an infinitely large data set m, l -> infinity the result (2.20) reduces to the maximum likelihood (2.8)."
First, if F is a function of x, then its limit as x -> infinity cannot itself involve x; moreover, the order and/or direction in which m and l tend to infinity is left ambiguous. Second, what he meant to say is that when m and l are both sufficiently large compared with a and b, (2.20) reduces to (2.8).
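Written out in the book's notation (m heads, l tails, Beta(a, b) prior; this is my reconstruction of what the sentence should say):

```latex
% (2.20) approaches (2.8) once the counts dwarf the prior hyperparameters:
\[
  \mathbb{E}[\mu \mid \mathcal{D}]
  = \frac{m + a}{m + a + l + b}
  = \frac{\frac{m}{m+l} + \frac{a}{m+l}}{1 + \frac{a+b}{m+l}}
  \;\approx\; \frac{m}{m+l} = \mu_{\mathrm{ML}}
  \qquad \text{when } m + l \gg a + b .
\]
```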
3rd paragraph before 2.2:
"For a finite data set, the posterior mean for mu always lies between the prior mean and the maximum likelihood estimate for mu corresponding to the relative frequencies of events given by (2.7)."
Again we are told to forget that the choice of a prior makes a difference. It seems the above statement is false: we may choose a prior that is heavily weighted on a single point, so that this prior's mean is greater than the MLE.
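(Update: writing out the beta case explicitly, the posterior mean is in fact a convex combination of the prior mean and the MLE, so it lies between them whichever of the two is larger, and my proposed counterexample doesn't go through. A quick check in the book's notation:)

```latex
% Posterior mean as a convex combination (m heads, l tails, Beta(a,b) prior):
\[
  \frac{m+a}{m+l+a+b}
  \;=\; \underbrace{\frac{a+b}{m+l+a+b}}_{\lambda}\cdot\frac{a}{a+b}
  \;+\; \underbrace{\frac{m+l}{m+l+a+b}}_{1-\lambda}\cdot\frac{m}{m+l},
  \qquad 0 < \lambda < 1 .
\]
```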
Paragraphs directly above the beginning of 2.2:
"In fact, we might wonder whether it is a general property of Bayesian learning that, as we observe more and more data, the uncertainty represented by the posterior distribution will steadily decrease..."
and then
" this result shows that, on average, the posterior variance of theta is smaller than the prior variance."
The "result", i.e. Eq (2.24) is an assertion of the form: "Suppose a,b, c > 0, c is fixed, and c = a +b. Then, if b goes up, a must go down." I don't see how this relates to what seemed to be his premise that increasing the size of a data set (sequentially or not) has the seemingly desired effect of reducing posterior variance. I suspect there are in fact limiting results in special cases that show the desired "steady" reduction in posterior variance...I wish he would have referenced them
Section 2.3, following Eq. 2.44
"... we note that the matrix \Sigma can be taken to be symmetric..."
Actually, by definition any covariance matrix *is* symmetric.
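(Though, to be fair, I suspect the point he is after concerns general quadratic forms: any square matrix can be replaced by its symmetric part without changing the exponent, since the antisymmetric part contributes nothing:)

```latex
% Only the symmetric part of a square matrix A survives in a quadratic form:
\[
  x^{\mathsf{T}} A\, x
  \;=\; x^{\mathsf{T}}\Bigl(\tfrac{1}{2}\bigl(A + A^{\mathsf{T}}\bigr)\Bigr) x ,
\]
% because x^T B x = 0 whenever B^T = -B.
```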
I could go on...
All this said, it's worth repeating:
I like the book, and not only because its mistakes and occasionally shady logic encourage the interested reader to try to discover correct (or at least less wrong) statements.
Top reviews from other countries
If you are after a practitioner's guide, or a first ML book for self-study, this probably isn't ideal. It assumes significant familiarity with multivariate calculus, probability, and basic stats (identities, moments, regression, MLE, etc.). The pitch is probably early postgraduate level, but with a few stretching parts. If this is your background, I think it's a better first ML book than MacKay (Information Theory ...), Murphy (Machine Learning ...), or Hastie et al. (Elements of Statistical Learning), due to its coherence of topics and consistency of depth. But those books are all excellent in their own ways, and Barber (Bayesian Reasoning ...) is a good alternative.
Most chapters are fairly self-contained, so once you've worked your way through the first couple of chapters, you can skip around as required. Particular highlights for me were the chapters on EM and variational methods (ch 9 & 10); I think you'd be hard pressed to find a better explanation of either. Finally, it's worth pointing out that the book is unrepentantly Bayesian, with a lack of subtlety that may grate on seasoned statisticians. Nevertheless, if the above sounds like what you're looking for, this is probably a good choice.