This book is about three UNIX utilities: grep, sed, and awk. With them you can write compact, terse solutions to many kinds of problems. They have been around almost as long as the UNIX operating system itself, yet they are still used to solve a wide variety of tasks today. Grep, sed, and awk are useful for processing files, searching within files, working at the command line, combining with other UNIX utilities to perform common tasks, and writing short scripts. These are just a few of the tasks for which they can be used. The goal of this book is to build and reinforce your expertise in grep, sed, and awk so that you can solve the problems you currently need to solve. In addition, I hope that after reading this book, whenever a new problem comes along, you will understand the three utilities well enough to consider using them and to implement a solution with them. To achieve this goal, the book teaches the elements of each utility. Each utility's concepts and elements are presented in detail, covering their syntax, behavior, rules, and nuances. Exercises and exercise discussions then highlight and reinforce these elements and concepts.
All three utilities share common properties. Each can read from standard input or from user-supplied input files and write to standard output. Each can interact with the rest of the UNIX environment through pipes. And all three work with regular expressions.
How This Book Is Organized
Regular expressions are covered first in this book because all three utilities require a knowledge of them. If you are not familiar with regular expressions, these chapters introduce them and go over the various metacharacters used in them. Although this coverage is extensive, it is not intended to give you advanced knowledge of the subject; it is meant for beginning and intermediate users of regular expressions. In these chapters, I assume that you have not encountered regular expressions before. If you already have, you may skim this material or skip it altogether.
The next part of the book describes grep. Because grep is best suited to searching files, I first describe how to use grep with regular expressions. Next, I cover using grep in conjunction with pipes and other UNIX programs. Finally, I discuss the various options with which you can invoke grep.
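To give a flavor of what the grep chapters cover, here is a small sketch of grep searching with a regular expression, changing its behavior with options, and filtering data in a pipeline. It assumes a POSIX shell and grep; the log lines are invented sample data, and the options used (-i, -v, -c, -n) are standard POSIX grep options.

```shell
#!/bin/sh
# A sketch, assuming a POSIX shell and grep. The log lines are invented
# sample data used only for illustration.

log='INFO start
ERROR disk full
info retry
Error timeout'

# Searching with a regular expression: lines that begin with "ERROR".
printf '%s\n' "$log" | grep '^ERROR'          # ERROR disk full

# Options change how grep matches and what it reports.
printf '%s\n' "$log" | grep -i '^error'       # both error lines, any case
printf '%s\n' "$log" | grep -c -i 'info'      # count of matching lines: 2
printf '%s\n' "$log" | grep -v -i 'error'     # only lines that do NOT match
printf '%s\n' "$log" | grep -n 'timeout'      # 4:Error timeout

# In a pipeline, grep filters text flowing to another program
# (here, sort orders the selected lines).
printf '%s\n' "$log" | grep -i 'error' | sort
```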
The next part of the book discusses sed. It covers sed addressing as well as the various commands available in sed. Sed commands provide more functionality than grep offers and can be used for a variety of applications. Finally, we discuss more advanced sed commands, such as those that operate on a multiline pattern space.
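A short sketch of the sed ideas just mentioned: a line-number address, a regular-expression address combined with the s command, and the N command, which appends the next input line to build a multiline pattern space. It assumes a POSIX sed; the input lines are invented. The sample deliberately has an even number of lines, because sed implementations differ in what N does when it reaches the last line.

```shell
#!/bin/sh
# A sketch, assuming a POSIX sed. The input lines are invented for illustration.

input='line one
line two
line three
line four'

# Line-number addressing: -n turns off automatic printing, and p prints
# only lines 2 through 3.
printf '%s\n' "$input" | sed -n '2,3p'             # line two, line three

# A regular-expression address selects lines by content; s substitutes
# text only on those lines.
printf '%s\n' "$input" | sed '/three/s/line/LINE/'  # "line three" becomes "LINE three"

# Multiline pattern space: N appends a newline plus the next input line,
# so s can replace the embedded newline and join each pair of lines.
printf '%s\n' "$input" | sed 'N;s/\n/ + /'          # "line one + line two", "line three + line four"
```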
Awk is covered last. Unlike grep and sed, awk is arguably closer to a general-purpose programming language than to a utility or special-purpose language, because it provides more of the constructs and features that general-purpose languages offer. These include:
The ability to control the flow of execution within an awk program (with conditionals, loops, and so on)
The ability to store values in user-defined variables that reference general storage locations
The ability to perform arithmetic operations
The ability to write functions
The ability to perform output in a user-specified format
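The following sketch combines several of these features in one small awk program: a user-defined function, user-defined variables, arithmetic, an if/else statement, and printf for formatted output. It assumes any POSIX awk; the shell function report, the awk function average, and the score data are hypothetical names and data chosen only for illustration.

```shell
#!/bin/sh
# A sketch, assuming any POSIX awk. The shell function "report", the awk
# function "average", and the score data are invented for illustration.

report() {
    awk '
    # A user-defined function; a and b are its parameters.
    function average(a, b) {
        return (a + b) / 2        # arithmetic
    }

    {
        avg = average($2, $3)     # user-defined variable
        total += avg              # keeps its value across input lines

        # Flow of control: pick a label based on the value.
        if (avg >= 80)
            grade = "high"
        else
            grade = "low"

        # Output in a user-specified format.
        printf "%-4s %6.2f %s\n", $1, avg, grade
    }

    END { printf "mean of averages: %.2f\n", total / NR }
    '
}

printf 'ann 90 80\nbob 70 75\ncal 60 100\n' | report
# ann   85.00 high
# bob   72.50 low
# cal   80.00 high
# mean of averages: 79.17
```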
Sed and grep either restrict these features or do not include them. Most books teach awk as a utility, presenting each feature only through an example utility that uses it. This book instead teaches awk as a language. Therefore, we will cover awk data types, variables, built-in functions, arrays, control statements, input and output, and user-defined functions. Teaching awk only as a utility, through examples that each solve one particular task, has pitfalls. If all you ever need is to solve those particular tasks, you are set. However, if you need to solve a different problem, you are left to figure out the language from the examples on your own. A better approach is to teach awk as a language, giving examples that reinforce each feature of the language, the rules governing its use, and the various ways in which it can be used. This approach has several benefits.
You cannot anticipate most of the ways in which you will use awk. By learning awk as a language and mastering its features, you will reach the goal of this book: being able to determine quickly whether awk can solve a problem and then to implement a solution.
If you learn awk simply as a utility rather than as a language, and especially if you learn it only from example utilities, then when you get an error it is hard to determine what caused it. If you understand the language, you can go over the erroneous program and more easily find the statement that caused the error, because you understand how each language construct contributes to that statement. Many people complain that awk's error messages are difficult to decipher. I believe the problem lies not in awk's error-messaging system (its run-time error messages are more verbose and informative than those of GNU C/C++) but in not having learned the awk language.
Programming languages are built on very similar principles, so understanding one language can greatly enhance your ability to understand others. Most languages share elements and constructs such as functions, scoping, coercion, call-by-value, and call-by-reference. (We go over each of these in this book.) If you have already learned a general-purpose language such as C, C++, or Java, learning awk will be easier. If you haven't, the approach taken in this book will help you understand awk and, later, learn new languages.
In awk, as in most languages, certain design decisions shaped the language. One of them is feature multiplicity: providing more than one way to express the same action. For example, either a for loop or a while loop can implement the same iteration. This flexibility benefits programmers because it lets them find more efficient, compact, and creative solutions to a given problem. It does not mean that all constructs are equal, however. Although more than one construct can solve a problem, one construct is often a better choice than another for a particular program. Truly mastering a language means understanding when one construct is better than another because it is more efficient (uses less storage or runs faster) or more compact (requires fewer lines of code or is easier to read). This requires practice. My hope is that, while reading this book and afterward, you will practice writing awk programs.
Also, think of more than one solution to a problem, write an awk program for each, and then test which solution is better (more readable, shorter, faster, or using less storage).
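As a sketch of this kind of comparison, here are two awk programs that solve the same problem, summing the fields on each input line, one written with a for loop and one with a while loop. It assumes any POSIX awk; the shell function names sum_with_for and sum_with_while are hypothetical.

```shell
#!/bin/sh
# A sketch, assuming any POSIX awk. Both programs sum the fields on each
# input line; they differ only in the loop construct used.

sum_with_for() {
    awk '{
        s = 0
        for (i = 1; i <= NF; i++)   # initialization, test, and step on one line
            s += $i
        print s
    }'
}

sum_with_while() {
    awk '{
        s = 0
        i = 1
        while (i <= NF) {           # the same loop with its parts spread out
            s += $i
            i++
        }
        print s
    }'
}

printf '1 2 3\n10 20\n' | sum_with_for     # prints 6, then 30
printf '1 2 3\n10 20\n' | sum_with_while   # prints 6, then 30
```

Both produce the same output. When a loop's initialization, test, and increment are all simple, many programmers find the for form easier to read because all three parts sit together; to compare speed, you could run each version against a large input file under `time`.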
I cannot emphasize enough that the best way to learn computer languages and utilities is by practicing. Try out sample queries of your own. I have also found that sometimes the best way to learn is by making your own mistakes: try figuring out where you went wrong, and try entering a query that you suspect might not work. You might be surprised at the result.
My Intended Audience
This is not a book about programming languages. Although this introduction has touched on language details and design issues, it does not matter whether you have encountered other languages before or taken a course on programming languages. Nor do you need an advanced understanding of the UNIX operating system. This book is about grep, sed, and the awk programming language. It is suitable for novice UNIX users who have never encountered grep, sed, and awk, and also for intermediate or advanced UNIX users who have used grep, sed, and awk but do not have advanced knowledge of them. All you need to work through this book is a general knowledge of the UNIX operating system; in particular, you need to know how to execute programs from the command line.
On the companion Web site, you'll find a student lounge where you can meet and greet other readers of the Interactive Workbooks and share tips and programs. There is also an author's corner, where you can find supplemental material for the book, notes from me about it, errata, and so forth. The answers to the "Test Your Thinking" sections from each chapter have their own module, and additional Self-Review Questions reinforce your understanding of the concepts explored in this book.
Visit the Web site periodically to share and discuss your answers.