- Paperback: 300 pages
- Publisher: Manning Publications (January 15, 2001)
- Language: English
- ISBN-10: 1930110006
- ISBN-13: 978-1930110007
- Product Dimensions: 7.4 x 0.6 x 9.2 inches
- Shipping Weight: 1 pound
- Average Customer Review: 11 customer reviews
- Amazon Best Sellers Rank: #1,813,638 in Books
Data Munging with Perl
". . . well written, informative, thought provoking . . . will be as relevant five years from now as it is today. . . . buy [one]." -- Dr. Dobb's Journal
"A very good resource for programmers who want to learn more about data parsing, data filters, and data conversion..." -- ACM Computing Reviews
"I found the sample problems and the author's solutions to be very well done. I . . . liked the design tips..." -- Pikes Peak Perl Mongers
"Well worth the price, and a good starting point for more advanced forays." -- Use.Perl.com
"The chapters are concise, the coverage is comprehensive, and the examples are plentiful and relevant." -- Web Techniques Magazine
Top customer reviews
The book is written for programmers or analysts who transform data as a regular part of their jobs. It assumes a beginning knowledge of Perl programming, as one might gain from reading Learning Perl. Part I introduces data munging as a recurring necessary evil and points out aspects of Perl that recommend it for this task. Part II surveys different types of unstructured and semi-structured data formats and suggests Perl-based strategies for working with them. Part III examines the limitations of simple data formats and discusses parsing strategies and specific techniques for working with HTML, XML and other hierarchical data structures. Part IV extracts some useful lessons from the previous chapters and suggests sources for additional study. The organization is logical and easy to follow.
Cross has written a well-designed book with helpful examples and insights. The accompanying book web site and author web site provide downloadable code and other resources. This book is of course most useful to those working in Perl. But many general concepts and strategies have transferred well to data munging tasks I have done in TextPipe.
One of Perl's mottos is: "There's more than one way to do it." A variety of ways are illustrated and explained in this book. Note that it is over ten years old and does not include the latest evolutions of the Perl language.
The book begins by revising some of those basic Perl practices that come in handy for scripting, e.g. command line options, regular expressions and sorting. The second part of the book deals with parsing fairly simple data: traditional fixed-width record data (e.g. the column-based stuff that you often find as the output of old Fortran and C programs), unstructured data (e.g. doing word counts on text files), and formats such as CSV, PNG and MP3. This is the strongest section of the book, and contains lots of useful hands-on information.
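As a rough illustration of the fixed-width case described above, the following Perl sketch splits a record with unpack. It is not taken from the book: the field layout (a 10-character name, a 3-character age and an 8-character city) and the parse_record helper are assumptions invented for this example.

```perl
use strict;
use warnings;

# Hypothetical layout: name (10 chars), age (3 chars), city (8 chars).
my $template = 'A10 A3 A8';

sub parse_record {
    my ($line) = @_;
    # The 'A' format strips trailing spaces from each extracted field.
    my ($name, $age, $city) = unpack $template, $line;
    return { name => $name, age => $age, city => $city };
}

# Build a sample left-justified record so the column widths are exact.
my $line = sprintf '%-10s%-3s%-8s', 'Alice', 42, 'London';
my $rec  = parse_record($line);
print "$rec->{name} is $rec->{age} and lives in $rec->{city}\n";
```

Keeping the template in one variable means a change to the column widths only has to be made in one place.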
The third part of the book deals with more modern forms of data files, in the shape of XML. Parsing HTML also gets a chapter to itself, after the author usefully demonstrates the limitations of any simple solution (e.g. using regexes), which provides pretty strong evidence in favour of the standard 'don't try it yourself, use a CPAN module' argument. The XML chapter itself covers the XML::Parser module in reasonable detail. However, there are now many more XML parsers in Perl out there, and XML::Parser is probably no longer the best solution (Grant McLean's Perl XML FAQ on the net has a good overview of the options). Excluding the seemingly obligatory 'here's a bunch of books and websites to learn more' chapter, the last proper chapter covers parsing with the Parse::RecDescent module, and it's a very good, gentle introduction.
If you're not working in a command line environment, there's not a whole lot here you're going to need. Equally, if you've been doing this sort of thing for a while, there's not much here that will be new to you, and not all the subjects are explored in any great depth. Some of it (particularly the XML chapter) is a bit outdated and superficial, so I would knock a star off my rating if you're more interested in the XML/HTML chapters.
But for the simpler tasks, e.g. parsing column-based data, this is recommended. You're shown all the handy tricks you need, such as piping, taking input from standard input as well as files, slurping paragraphs etc. My 4-star rating applies if this sounds like what you need: it's a clear, short and to-the-point book, which is definitely worth taking with you on your first journey into data munging.
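For readers unfamiliar with paragraph slurping, here is a minimal Perl sketch of the idea, assuming input where paragraphs are separated by blank lines. It is an illustration rather than code from the book; the read_paragraphs helper and the sample text are invented, and an in-memory filehandle stands in for a real file or standard input.

```perl
use strict;
use warnings;

sub read_paragraphs {
    my ($fh) = @_;
    # Setting $/ to the empty string selects paragraph mode:
    # each read returns one block of text up to the next blank line(s).
    local $/ = '';
    my @paras = <$fh>;
    # In paragraph mode, chomp removes all trailing newlines.
    chomp @paras;
    return @paras;
}

# Sample text; any readable filehandle (including STDIN) would work the same way.
my $text = "first line\nsecond line\n\n\nnext paragraph\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @paras = read_paragraphs($fh);
close $fh;

print scalar(@paras), " paragraphs read\n";
```

Using local restricts the changed record separator to the helper, so other reads in the program still work line by line.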
The author gives you enough information and background to start working with the more advanced Perl functions like map, grep, pack, unpack, etc. It is possible to write Perl without ever having to use these functions, but David Cross shows you how they are more effective and more powerful. This book will expand your Perl vocabulary by leaps and bounds.
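To give a flavour of what map and grep offer, here is a small Perl sketch that filters and transforms a list without explicit loops. The input lines and variable names are made up for illustration and do not come from the book.

```perl
use strict;
use warnings;

# Hypothetical input: "name:age" records mixed with a comment line.
my @lines = ('alice:42', 'bob:17', '# comment', 'carol:35');

# grep keeps only the elements for which the block is true.
my @data = grep { !/^#/ } @lines;

# map transforms each element, here into a [name, age] array reference.
my @pairs = map { [ split /:/ ] } @data;

# The two can be chained: select by age, then extract the name.
my @adults = map { $_->[0] } grep { $_->[1] >= 18 } @pairs;

print join(', ', @adults), "\n";    # prints "alice, carol"
```

The same result could be produced with nested foreach loops and temporary arrays; the chained form states the selection and the transformation in one expression each.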
I know that some people would say that the book is too thin, and it is thinner than many computer books today, but the thickness of a book does not determine its merit. Effective Perl Programming by Joseph Hall and Randal Schwartz is often cited as one of the best Perl books ever, and it's thinner than this one.
If you are a junior to intermediate level programmer, and you want to improve your Perl skills, pick up this book. You won't be disappointed.