Parsing Techniques: A Practical Guide (Monographs in Computer Science) 2nd Edition
From the Back Cover
Parsing, also referred to as syntax analysis, has been and continues to be an essential part of computer science and linguistics. Today, parsing is also applied in other disciplines; some examples are document preparation and conversion, chemical formulae typesetting, and chromosome recognition.
In addition to the traditional parsing techniques, this second edition presents new developments and discoveries: generalized deterministic parsing, linear-time substring parsing, parallel parsing, parsing as intersection, non-canonical methods, non-Chomsky systems, and many more.
Parsing techniques provide a solid basis for compiler construction and linguistics, and contribute to all existing software: they enable Web browsers to analyze HTML pages and PostScript printers to analyze PostScript, and some of the more advanced techniques are used in code generation in compilers and in data compression. Also their importance as general pattern recognizers is slowly being acknowledged.
To provide readers with low-threshold access to the full field of parsing techniques, this book uses a two-tiered structure. The basic ideas behind the existing parsing techniques are explained in an intuitive and narrative style, starting from the first principles of data structures and algorithms; this provides breadth and accessibility. The hundreds of realizations and improvements of these basic ideas are explained in an extensive annotated bibliography, in a much terser, yet still informal style; this provides depth.
The reader should have an understanding of algorithmic thinking, especially recursion; however, knowledge of any particular programming language is not required.
Top customer reviews
This is a pity, since parsing algorithms have a much wider applicability than just interpreting source code. When confronted with a parsing problem, one tends to write a specialized ad hoc parser from scratch every time. This is not only time-consuming but also error-prone, and it leads to code that is hard to extend, maintain, and debug. If some part of the input changes, one often has to make non-trivial changes to the original code in multiple places.
Instead, one has to realize that there are three major parts to any parsing problem, only two of which are domain-specific. Tokenization (or lexing) is the first step, where the input is broken into independent tokens, possibly with attached values. Tokenization abstracts the input into units of data that cannot be broken down any further. This step is problem-dependent, and there is no general algorithm that can solve it. The second part is writing the grammar for the input. The grammar defines the structure, or syntax, of the incoming token stream: which combinations of tokens are allowed to appear. It is also domain-specific, but no code has to be written in this step. Finally, the third step is to apply a general parsing algorithm to the grammar and token stream. Instead of implementing the algorithm from scratch, the user only has to define the semantic actions to execute for each production.
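As a sketch of this three-way separation (my own illustration, not from the book — the grammar, names, and semantic actions are invented), a tiny parser for sums of integers might look like:

```python
import re

# Step 1: tokenization -- domain-specific; break the input into tokens.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(text):
    tokens = []
    for num, op in TOKEN_RE.findall(text):
        if num:
            tokens.append(("NUM", int(num)))
        elif op.strip():
            tokens.append((op, op))
    return tokens

# Step 2: the grammar -- domain-specific but declarative, no code needed:
#   Expr -> Term ('+' Term)*
#   Term -> NUM
#
# Step 3: a general parsing technique (here: recursive descent) plus
# semantic actions attached to each production (here: summing values).
def parse_expr(tokens, pos=0):
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] == "+":
        rhs, pos = parse_term(tokens, pos + 1)
        value += rhs  # semantic action for Expr -> Expr '+' Term
    return value, pos

def parse_term(tokens, pos):
    kind, val = tokens[pos]
    if kind != "NUM":
        raise SyntaxError(f"expected number at token {pos}")
    return val, pos + 1

print(parse_expr(tokenize("1 + 2 + 39"))[0])  # -> 42
```

If the input format changes, only the tokenizer or the grammar needs touching; the parsing strategy itself stays untouched, which is exactly the maintainability win the reviewer describes.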
When one writes ad hoc parsing code for each problem from scratch, all three steps - tokenization, syntactic analysis, and semantic analysis - are conflated into one large piece of code, usually in an obfuscated way, that is hard to debug and maintain. On the other hand, when the three steps are clearly separated, there is less domain-specific code to write. The code becomes clearer and easier to maintain. Furthermore, it becomes much more extensible with just a few tweaks to the grammar or the tokenizer, and is much easier to share between different parser modules. The actual structural analysis of the input, which is usually the trickiest part to write and a source of major bugs, is offloaded to a well-known algorithm that has been thoroughly tested and is shared between various projects.
The ability to break the problem up this way, and to choose a parsing algorithm that is fast yet general enough to handle your grammar, requires knowledge of various parsing techniques that are usually not taught in the undergraduate CS curriculum.
This is where this book comes in. It provides an accessible, non-technical introduction to parsing, in sufficient detail for one to understand and use these formal parsing techniques in everyday projects.
In its full generality, parsing can be viewed as a graph search problem, where the nodes are the sentential forms (strings of terminals and non-terminals) and the edges are the productions applicable to each non-terminal of a node. This is a useful abstraction, but it quickly becomes impractical, since in general the search requires exponential time. To deal with this, various methods and heuristics have been developed over the past 50 years that speed up the process, but at the same time narrow the set of languages they can accept.
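As a toy illustration of this graph-search view (my own sketch, not from the book), a breadth-first search over sentential forms can recognize strings of a tiny grammar; note the pruning needed just to keep the search finite:

```python
from collections import deque

# A toy context-free grammar; uppercase letters are nonterminals.
# S -> a S b | a b, i.e. the language { a^n b^n : n >= 1 }.
GRAMMAR = {"S": ["aSb", "ab"]}

def recognizes(target, start="S"):
    """Breadth-first search over sentential forms: each node is a string
    of terminals and nonterminals; each edge rewrites one nonterminal
    by one of its productions."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, sym in enumerate(form):
            if sym not in GRAMMAR:
                continue
            for rhs in GRAMMAR[sym]:
                nxt = form[:i] + rhs + form[i + 1:]
                # Prune forms longer than the target; without a bound
                # like this the search space is infinite.
                if len(nxt) <= len(target) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

print(recognizes("aaabbb"))  # -> True
print(recognizes("aabbb"))   # -> False
```

Even with pruning, the number of sentential forms explored can grow exponentially with input length for ambiguous grammars, which is precisely why the specialized methods the book covers exist.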
Grune and Jacobs' book starts out with this general view and then narrows down to more specialized, less general, but faster methods. They divide parsing algorithms into a few general categories: top-down vs bottom-up, directional (on-line) vs non-directional (where the whole input must be known ahead of time), and deterministic vs non-deterministic (where backtracking is necessary). They devote a chapter to each combination of these categories. They start out with the general exponential-time depth-first and breadth-first backtracking parsers, followed by the CYK and Earley parsers that can parse any context-free grammar in polynomial time, and then go on to the linear-time LL, LR, and LALR methods used in compilers. Along the way they cover regular languages and their parsers, which are used to implement regular expressions, as well as generalized LL and LR techniques that can parse any context-free grammar but still retain the speed of LL and LR parsers for most practical grammars. There are also chapters on more esoteric topics that might be of interest only to a select few. These include non-canonical parsers, parsing as the intersection of a context-free and a regular language, and context-sensitive grammars. On the other hand, there are also chapters on the applied aspects of parsers: error handling, parallel parsing, and practical parser writing.
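To make the CYK mention concrete, here is a minimal CYK recognizer (my own sketch; the grammar and names are invented for illustration). CYK requires the grammar in Chomsky normal form, where every rule is either A -> B C (two nonterminals) or A -> a (a single terminal):

```python
# A CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A T | A B,  T -> S B,  A -> a,  B -> b
BINARY = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]
TERMINAL = [("A", "a"), ("B", "b")]

def cyk(word):
    n = len(word)
    if n == 0:
        return False
    # table[length - 1][start] = nonterminals deriving word[start:start+length]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[0][i] = {a for a, t in TERMINAL if t == ch}
    for length in range(2, n + 1):          # substring length
        for start in range(n - length + 1):  # start position
            for split in range(1, length):   # where to cut the substring
                left = table[split - 1][start]
                right = table[length - split - 1][start + split]
                for a, b, c in BINARY:
                    if b in left and c in right:
                        table[length - 1][start].add(a)
    return "S" in table[n - 1][0]

print(cyk("aabb"))  # -> True
print(cyk("aab"))   # -> False
```

The three nested loops make the O(n^3) cost visible, compared with the exponential blow-up of naive backtracking search over sentential forms.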
The language of the book is mostly non-technical. Great effort seems to have been made to keep CS jargon to a minimum and to explain everything in simple terms. Most of the concepts are presented as examples first, followed by a general description. There is almost no pseudocode; most of the algorithms are described in plain English. The authors do a surprisingly good job at that, leaving few ambiguities while not bogging down the reader with unnecessary details. This makes the book suitable for gentle nighttime reading, without the need for paper and pencil at hand. The material is presented in sufficient detail for a basic implementation of most of the presented algorithms, and there are plenty of references to consult for extensions.
I really liked the first half of the book. It was clear and easy to understand. However, things do speed up considerably in the second half.
Explanations become terser, there are fewer details, and it sometimes takes several re-readings of a section to fully absorb the material. I think the authors wanted to cram in more material towards the end and had to make some sacrifices in presentation.
There are some additional things that I think could have been done better. Sometimes the authors use terms that haven't been fully defined yet and only define them a few sections later. This makes the book a little hard to read, since you're constantly wondering whether you have somehow missed the definition earlier. I also wish that more connections to graph theory had been presented. For example, there are lots of algorithms that could be viewed as a graph search. The authors do allude to this but never state it explicitly. I wish they had stated clearly what the nodes and edges of those graphs are.
Some of their examples are too simplistic and may lead the reader to believe that the algorithms are simpler than they really are. For example, the authors spend only two paragraphs on parse-tree construction for the Earley parser, and their example makes it look like the algorithm is deterministic. In reality, tree construction for the Earley parser requires backtracking, and I had to consult outside sources to understand that. In addition, I wish the authors had spent more time giving examples of the LALR(1) parser. This is a very widely used method (it's the one used in Unix's yacc), but its limitations are not well understood by beginners. It's difficult for someone new to it to understand why their grammars produce conflicts and are not parsable with LALR, and I wish the authors had spent more time explaining how to detect those conflicts in various grammars.
Overall, I think this is a great overview of the parsing landscape, with enough detail to understand, use, and even implement some of the methods. Every serious programmer should be aware that these techniques and algorithms are available to them, and be ready to at least consider them the next time they're thinking of writing a parser by hand. Should the reader be interested in implementing their own parsing algorithm, there is a whole chapter devoted to the bibliography, with a short description of each reference. Equipped with the background provided by this book, the reader can go on to explore the various books and papers on parsing and should have no trouble understanding their content.
Note that the book covers many other algorithms and parser types, and I'm sure readers interested in them will find excellent coverage. This is one of the nice things about this book: you can pick up the chapter covering your particular area and read it, without needing to read the entire book from start to end. It works as an excellent, detailed reference on a wide variety of parsing techniques.
No other book approaches the clarity and comprehensiveness of this book.
When you try to read most literature about parsing, authors tend to throw around a lot of terms without explaining them. What exactly is a "deterministic" parser, a "canonical" parser, a "directional" parser? Grune and Jacobs explain every one of these distinctions lucidly, and put every known algorithm in context, showing how it compares to the rest of the field. How do the algorithms compare in which languages they can parse, how fast they are, and how much of the work can be done ahead of time? The book addresses all of these trade-offs, but doesn't stop at asymptotic complexity: in chapter 17 (the comparative survey), they note that general parsers may be a factor of ten or so slower than deterministic methods, even though both are linear. This high-level overview and comparative survey are something I was desperately seeking, and I've found nothing comparable to them anywhere.
There is also a lot of important background information that other authors tend to assume you know: for example, did you know that when authors say "LL" they almost always mean "strong LL" unless they specifically say "full LL?" Are you totally clear on the difference between strong LL, simple LL, and full LL? If you're not sure, Grune and Jacobs will give you all the explanation you need to fully understand.
This book strikes a perfect balance between breadth and depth. All significant algorithms are covered, most with enough detail to fully understand and implement them, but Grune and Jacobs punt on less practical material like proofs or rigorous formal descriptions. That information is never more than a citation away though, thanks to the 417-entry annotated bibliography, which gives you not only references to source material but a paragraph or two describing their key results.
I couldn't be happier about adding this book to my bookshelf of compiler books -- it quickly became the book I refer to most often, and I thank Grune and Jacobs for this superb guide to this vast and diverse field of computer science.