Text Processing in Python [Paperback]

David Mertz
4.4 out of 5 stars (19 customer reviews)

List Price: $54.99

Book Description

Publication date: June 12, 2003 | ISBN-10: 0321112547 | ISBN-13: 978-0321112545 | Edition: 1
Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretically and conceptually, of why what works works and what doesn't work doesn't work. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text-processing tasks that working programmers face daily.

Frequently Bought Together

Text Processing in Python + Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Price for both: $66.55


Editorial Reviews

From the Back Cover

Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.

Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills, either on their own or in the classroom. A companion Web site contains source code and examples from the book.

Here is some of what you will find in this book:

  • When do I use formal parsers to process structured and semi-structured data? Page 257
  • How do I work with full text indexing? Page 199
  • What patterns in text can be expressed using regular expressions? Page 204
  • How do I find a URL or an email address in text? Page 228
  • How do I process a report with a concrete state machine? Page 274
  • How do I parse, create, and manipulate Internet formats? Page 345
  • How do I handle lossless and lossy compression? Page 454
  • How do I find codepoints in Unicode? Page 465


About the Author

David Mertz came to writing about programming via the unlikely route of first being a humanities professor. Along the way, he was a senior software developer, and now runs his own development company, Gnosis Software ("We know stuff!"). David writes regular columns and articles for IBM developerWorks, Intel Developer Network, O'Reilly ONLamp, and other publications.


Product Details

  • Paperback: 544 pages
  • Publisher: Addison-Wesley Professional; 1 edition (June 12, 2003)
  • Language: English
  • ISBN-10: 0321112547
  • ISBN-13: 978-0321112545
  • Product Dimensions: 9.4 x 6.7 x 1.2 inches
  • Shipping Weight: 1.9 pounds
  • Amazon Best Sellers Rank: #910,571 in Books


Customer Reviews

Most Helpful Customer Reviews
49 of 51 people found the following review helpful
Text Processing in Python, by David Mertz, 2003, Addison Wesley, 520 pages.
If you have read an introductory book or two about programming, but you are far from being an expert, then you will benefit a lot from reading this book. If you are a competent programmer in any other language, you will benefit from this book. If you are an expert Python programmer, you will also benefit from this book.
For, as you know, there are many good introductory texts about Python. This is not one of them, for this is an advanced book, but not an inaccessible one. David Mertz has a unique style and focus that we have become familiar with from his "Charming Python" series of articles on the IBM Developer Network. Dr. Mertz is more interested in facilitating our learning process than in lecturing us, and rather than fill his pages with impressive examples designed to illustrate his expertise, he gently guides us by offering subtle yet important examples of code and analysis that make us think for ourselves.
He has a special talent for programming in the functional style, and this is a great introduction to that style of Python programming. Thus, this is also a good guide to using the newer features introduced into Python in the last few revisions, which often facilitate the functional style of programming.
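To make "the functional style" concrete, here is a minimal Python 3 sketch of the general idea: small single-purpose functions composed together and applied to every line with `map` and `filter`, with no explicit loops. It is an illustration of the style only, not an example taken from the book; the helper names (`compose`, `strip_comment`, `normalize_space`, `clean_lines`) are hypothetical.

```python
from functools import reduce


def compose(*funcs):
    """Return a function applying funcs right to left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


def strip_comment(line):
    """Drop everything from the first '#' onward (hypothetical helper)."""
    return line.split("#", 1)[0]


def normalize_space(line):
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return " ".join(line.split())


# Build the per-line cleaner by composition rather than by writing a loop body
clean_line = compose(normalize_space, strip_comment)


def clean_lines(text):
    """Clean every line, then drop lines that end up empty."""
    return list(filter(None, map(clean_line, text.splitlines())))


sample = "x = 1   # set x\n\n   # only a comment\ny   =  2\n"
print(clean_lines(sample))  # ['x = 1', 'y = 2']
```

The design choice is that each step can be tested on its own, and new steps are added by extending the `compose(...)` call rather than editing a loop.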
The text includes, in an appendix, a 40-page tutorial covering the basic Python language. This tutorial is, like the book, unique in its approach and is worthwhile even for experienced Pythonistas, as it sheds light on some of the underlying ideas behind the syntax and semantics, and it also illustrates the functional style of programming, which is sometimes quite useful when doing text processing. And, despite its many other virtues, this is a book about text processing.
41 of 44 people found the following review helpful
5.0 out of 5 stars A beautiful book September 2, 2003
Format: Paperback | Verified Purchase
Yes, I mean it: this is a beautiful book. If your aesthetic sensibilities have been informed, directly or indirectly, by Kernighan and Ritchie's influential book on C, you'll know what I mean.
I've been programming computers in various capacities since I was in my early teens (the mid-1970s) and I've been through a number of languages. Not long ago I discovered Python, and I suspect I won't need to learn any other languages for quite a long time. Guido van Rossum is a wizard.
If you're interested in learning Python, don't start here. If you've got some programming background already, Guido's tutorial (which comes bundled with the Python download) will be enough to get you rolling. I personally recommend all of O'Reilly's books on the subject (_Learning Python_ for the absolute beginner, Mark Lutz's idiosyncratic but highly useful _Programming Python_ for the next level up, the magisterial _Python Cookbook_ for pretty much anybody, and the _Nutshell_ book to be placed permanently next to your keyboard). There are others as well, and after you've gotten started, you'll be a better judge than I am of what will be most useful to you. (But I'd skip the vastly overpriced and not-very-deep _Python Programming Patterns_ unless you can buy it used.)
This one's for later; although it does offer some beginning instruction in Python, it isn't really an introductory book. However, if you do any text processing with Python -- which you almost undoubtedly do if you use Python at all -- then you _do_ want this book even if you don't know it yet.
Most of what you'll want to know is in chapter two, which sets out the basics of string processing in Python.
22 of 24 people found the following review helpful
4.0 out of 5 stars Round the world tour of string processing for Pythoneers February 28, 2004
This is the only book that really attacks the issue of string processing using Python. Unfortunately it didn't attack the text processing problems that I wanted discussed.
Also, in the area of regular expressions, the examples didn't use the Python library directly; instead, a wrapper function was used for many of the examples, and that detracted from using the book as a reference for this purpose.
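The wrapper in question appears to be the re_show() utility the author mentions in his own comment among the recent reviews. Its code is not reproduced on this page, so the following is only a minimal sketch of such a helper, assuming it marks each match by wrapping it in braces; the brace convention and the signature are assumptions, not the book's code. The second half shows the same matches obtained with the `re` module directly, which is what a reader using the book as a reference would write.

```python
import re


def re_show(pattern, text, flags=re.MULTILINE):
    """Return text with every match of pattern wrapped in {braces}.

    Hypothetical sketch of a regex-demonstration helper; the brace-marking
    convention is an assumption, not taken from the book.
    """
    return re.sub(pattern, lambda m: "{" + m.group(0) + "}", text, flags=flags)


text = "cat scat category"

# Wrapper style: visualize where the pattern matches
print(re_show(r"\bcat", text))  # {cat} scat {cat}egory

# Direct library style: the same matches as (start, end) offsets
print([m.span() for m in re.finditer(r"\bcat", text)])  # [(0, 3), (9, 12)]
```

A helper like this is convenient for teaching because the output shows match boundaries at a glance, but it hides the `re.search` / `re.finditer` calls that the reviewer wanted to see.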
I found that Python has several different ways to do string processing. Also, some of those ways come up with conflicting results. At the time of this writing the authors of Python are re-organizing and improving this area.
What is truly great about the book is the discussion of state machines, parsers, and functional programming. Although these topics detract from the focus on string processing somewhat this book is perhaps the only popular Python book out there that does these topics justice. I thought they were very well written.
My overall complaint is that this book includes too many things outside of text processing using the core Python language. But other readers may appreciate this aspect more than I did. If you want coverage on handling email specifically, the author covers that. Same with HTML processing and other specialized topics. I just wanted the lowdown on using the full string processing capabilities of the core Python language, not necessarily all the specialized libraries.
I found string processing to be messy with Python but found Ruby to be much easier. That is perhaps because Ruby is a newer language and it has some features of Perl built in. Ruby, however, does not have the extent of libraries available for Python, nor does it have as nice a Windows GUI.
Most Recent Customer Reviews
5.0 out of 5 stars Illustrative book
Text processing is a fascinating field in computer science with numerous algorithms. This area is very attractive to researchers.
Published on February 24, 2012 by rpv
4.0 out of 5 stars A very good intro book
This book is interesting; the field it covers is not one with many texts, so it's hard to do comparative analysis.
Published on December 18, 2007 by Sean Fritz
5.0 out of 5 stars Instant Classic
TPIP is an instant classic in that all you need to do is add a solid understanding of Python and you can instantly appreciate its classic nature.
Published on August 22, 2007 by James Stroud
3.0 out of 5 stars Valuable information poorly presented
There is a lot of good stuff in this book, but the presentation is lousy.

The first chapter dives into functional programming using obscure and terse high order...
Published on April 10, 2007 by Dale Wilson
5.0 out of 5 stars And now for something different...
I'd second most of the positive statements given by other reviewers. To boot, the author's voice is clear and pleasant.
Published on September 22, 2005 by R. Dlugy-Hegwer
5.0 out of 5 stars If you need to process text, this is a great book!
This book is not for everyone, but for "text processing", I know of nothing else that comes close; this book merits careful study.
Published on August 1, 2005 by Elizabeth H. Papageorge
5.0 out of 5 stars Very good, somewhat advanced
This book is not for novice programmers. However, if you are a reasonably experienced programmer in Python, or any other language for that matter, this book will serve you very...
Published on December 30, 2004 by Amazon Customer
4.0 out of 5 stars What is this book about?
In any bookstore you'd probably find hundreds of titles of "Web programming in XXX language". What about "Text processing in XXX language"?
Published on August 1, 2004 by Waiyip Tung
5.0 out of 5 stars More author clarification
Added note: The review by phrodod was quite nice, IMO. One little thing: s/he mentions my little re_show() utility that I use in the regex tutorial.
Published on June 9, 2004 by Amazon Customer
5.0 out of 5 stars Great Advanced Python Book
This book covers many of the details of processing text files to extract and/or generate more textual information from them.
Published on May 8, 2004 by Phillip David