Customer Reviews


14 Reviews
5 star: (8)
4 star: (5)
3 star: (1)
2 star: (0)
1 star: (0)
 
 
 
 
 
Average Customer Review

The most helpful favorable review
The most helpful critical review


14 of 16 people found the following review helpful:
4.0 out of 5 stars Cursed by a bad name
If only this book didn't have the "Data Crunching" name. Far from being about data analysis, this is really a general book about different data formats (e.g. text, XML, database, binary) and how they are created and accessed in different languages. It's a reasonable fundamentals book. It also serves to introduce a wide variety of different technologies, without going into...
Published on May 19, 2005 by Jack D. Herrington

versus
2 of 2 people found the following review helpful:
3.0 out of 5 stars Too short, lacks more material
Yeah, it's 'Short, Informative, Useful and Clear' (like someone already said) but... it's not enough. It seems like the introductory chapters for an excellent book, but the really important chapters do not exist.

Too expensive for 176 pages with tips on XML, regexes, DB and some unix commands.
Published on February 26, 2009 by Luiz Augusto



14 of 16 people found the following review helpful:
4.0 out of 5 stars Cursed by a bad name, May 19, 2005
This review is from: Data Crunching: Solve Everyday Problems Using Java, Python, and more. (Paperback)
If only this book didn't have the "Data Crunching" name. Far from being about data analysis, this is really a general book about different data formats (e.g. text, XML, database, binary) and how they are created and accessed in different languages. It's a reasonable fundamentals book. It also serves to introduce a wide variety of different technologies, without going into explicit depth about each.

As long as you understand what you are getting then I recommend this book. This is not a book about data processing techniques. Also, I recommend NOT using the material in the book that references SQL. The code does not use prepared statements properly and is vulnerable to SQL injection attacks.
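
To make the SQL injection concern concrete, here is a minimal sketch of a parameterized query using Python's standard sqlite3 module. It is not taken from the book; the `users` table, the `find_user` helper, and the sample data are assumptions made up for illustration.

```python
import sqlite3

def find_user(conn, name):
    # The "?" placeholder makes the driver bind `name` as data,
    # so quote characters in the input cannot change the SQL statement.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))
# A classic injection string is treated as an ordinary (non-matching) name.
print(find_user(conn, "x' OR '1'='1"))
```

Building the same query by string concatenation would let the second input rewrite the WHERE clause, which is the risk the reviewer describes.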


6 of 6 people found the following review helpful:
5.0 out of 5 stars It's about using the right tool for the right job, June 13, 2006
Gregory Wilson likes Python and bash but doesn't particularly care for XSLT (or Perl, and possibly Java as well, either), doesn't express a preference in the great Emacs vs. Vi(m) holy war, and divides programming languages into two camps - agile, like Python and Ruby, and "sturdy", like Java. He's an adjunct CS professor at the University of Toronto, a contributing editor with Dr. Dobb's Journal, and is developing "Software Carpentry", which is either a basic course on software development aimed at scientists and engineers for the Python Software Foundation or a project to develop a newer, easier-to-use set of software development tools.

In the book, "Data Crunching: Solve Everyday Problems Using Java, Python, and More", data crunching is explored through a series of examples. The closest that Wilson comes to giving a definition is when, at the start of the first chapter, he refers to data crunching/munging as the "other 10%" of a programming task that takes up the "other 90% of the time". The first example that he gives is his experience helping a high school science teacher convert PDB (Protein Data Bank) files containing the coordinates of atoms in various molecules into a format that a Fortran sphere-drawing program could process.

From the introduction, he moves on to the manipulation of text and text files using Unix command-line tools and Python, with Java work-alikes following most of the Python scripts. Although the book's subtitle, "Solve Everyday Problems Using Java, Python, and More", gives Java first billing (possibly for marketing reasons?), Wilson's preference for Python over Java is never in doubt. After presenting the Java equivalent of a Python script that counts the number of times every email address appears in a list of email addresses, he writes:

All right. It's two-and-a-half times longer than the equivalent Python program, it isn't as fast on small files, and we have to compile it before we can run it, but other than that, it's almost as easy...
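
For readers who want a feel for the kind of script being compared here, the following is a rough sketch of counting how often each address appears in a list. It is not the book's code; the `count_addresses` name, the sample addresses, and the choice to lowercase addresses before counting are all assumptions.

```python
from collections import Counter

def count_addresses(lines):
    # Assumption: addresses are compared case-insensitively and
    # surrounding whitespace is ignored; blank lines are skipped.
    cleaned = (line.strip().lower() for line in lines)
    return Counter(addr for addr in cleaned if addr)

# Hypothetical input, one address per line.
sample = [
    "ghopper@example.org\n",
    "aturing@example.org\n",
    "GHopper@example.org\n",
    "\n",
]

for address, n in count_addresses(sample).most_common():
    print(n, address)
```

In practice the lines would come from a file or standard input rather than a hard-coded list.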

With a table of useful commands, explanation of redirection and piping, and some guidelines on how to make sure that your command-line tools follow convention, the text chapter could actually be viewed as a pretty passable introduction to the philosophy of Unix.

The chapter on Regular Expressions is great. So good, in fact, that I wish I could go back in time and give myself a photocopy of those thirty-odd pages at the point that I was struggling to get a handle on RE's some years back. Also included in this chapter is a brief, but very lucid, discussion of character encoding and a bit on using grep.

Although the Text and RE chapters were my favorite, Wilson's clear and concise writing style makes the entire book, including the coverage of XML, binary data processing, and relational databases, a joy to read. With segues like "But wait a second. Wait just one pattern-matching second.", lists of email addresses to munge that include entries for Alan Turing, John von Neumann, and Grace Hopper, and the like, he also manages to inject some pleasant, if a bit groan-worthy, humor here and there into what could otherwise be a rather dry book.

He uses the last chapter, titled "Horseshoe Nails", to quickly address a number of topics, like encoding, the pitfalls of floating point arithmetic, and unit testing, which (not a surprise in a title coming from the Pragmatic Bookshelf) he likes, going so far as to say that the spread of test-driven development has been the "real revolution in programming in the last decade". Diff is introduced and he brings the venerable make to the table as a tool for automating test running.
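
As one small illustration of the floating point pitfalls mentioned above, here is a sketch using only the Python standard library. It is not from the book; it shows the well-known case where 0.1 + 0.2 does not compare equal to 0.3, and two common ways to cope with it.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly,
# so a direct equality test fails.
print(total == 0.3)

# Comparing within a tolerance is the usual remedy.
print(math.isclose(total, 0.3))

# Decimal arithmetic on string inputs keeps the values exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```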

He doesn't say it in so many words, though his retooling the old saying that "two years of hard work can save you an hour in the library" as "an hour of hard work can often save you sixty seconds on Google" comes close, but the message is to work smarter rather than harder. Use industrial-strength tools and processes when industrial-strength solutions are called for and agile, simplest-things-that-work solutions whenever possible.


6 of 7 people found the following review helpful:
5.0 out of 5 stars Just what the newbie or occasional programmer needs, June 10, 2005
By Eric Walstad (San Francisco, CA USA)
Data Crunching is a short book with great how-to-like code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics are covered in great depth but each contains enough to whet the reader's appetite for more. The text and examples are thought provoking, leading the reader to ask the right kind of questions when detailed information is needed.

The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing. The book dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. If not exactly the same as a real-life data crunching problem I've had to solve in the past, then sufficiently close to easily apply the principles (and sample code) to my problem. I thought the regular expressions section was an excellent, succinct, (re)introduction to regular expressions. Wilson starts with basic patterns, quickly and clearly working up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table of patterns, test strings and those patterns that match them. I liked the chapter on XML but noticed that there was no code example on performing an XSLT. There is, however, a good example of an XSLT template, but no code on how to process it. The chapter on relational databases covers all the most common SQL needed for daily use (think 10% of the SQL that works on 90% of the problems). This includes sub-selects, negation, aggregation and views. The last chapter, "Horseshoe Nails", covers miscellaneous topics including testing. The author of course covers unit testing but also simple ways of testing when full-blown unit testing is overkill. The last chapter also has sections on encoding, dealing with floating point numbers, dates and times and how to format them with strftime. I was impressed by the author's ability to cull such important techniques and idioms and organize them into a small, yet incredibly useful text.
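
The idea of a table of patterns against test strings can be sketched in a few lines. This is not the book's code; the `match_table` helper, the patterns, and the test strings below are assumptions chosen for illustration.

```python
import re

def match_table(patterns, strings):
    """Map each pattern to the test strings it matches anywhere."""
    return {p: [s for s in strings if re.search(p, s)] for p in patterns}

# Hypothetical patterns and test strings.
patterns = [r"^\d+$", r"[aeiou]{2}", r"\bcat\b"]
strings = ["12345", "book", "cat nap", "concat"]

for pattern, matched in match_table(patterns, strings).items():
    print(f"{pattern:12} {matched}")
```

Running a new pattern against a fixed set of strings like this is a cheap way to check what it really matches before using it on live data.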

Data Crunching covers real-life data parsing and manipulation concepts. It does so without tangential journeys into other areas of programming. Each of the five main topics includes simple code examples, usually in Python, Java or both, that clearly demonstrate the topic. The author does an impressive job of squeezing in most all the issues in the daily work of data crunching. The reader can expect to come away with something of value on each topic covered, especially the newbie or occasional script writer.


3 of 3 people found the following review helpful:
4.0 out of 5 stars good data-handling cookbook for a beginner, July 18, 2006
This book is mainly concerned with scripting as a 'glue' between applications: processing various input and output formats. The book is divided into 5 main categories of data handling: plain text, regular expressions, XML, binary data and SQL. There is a final chapter on various miscellaneous topics. Most of the examples are given in Python. Some of the code is demonstrated in Java, although, disappointingly for a book published in 2005, none of the Java 5.0 features are leveraged. However, if nothing else, it demonstrates why Java is not anyone's first choice for such activities.

If you've read any of the O'Reilly cookbook series, you will know what to expect, although the chapters are more cohesive and less episodic. Beginning programmers will get the most out of this book, although intermediate programmers should find at least some material here that's new to them.

The XML chapter is a pretty good introduction to the use and advantages/disadvantages of SAX and DOM, and XSLT is also described, although the discussion is not so clear. Those without experience with databases will welcome the chapter on SQL. The discussion on dealing with plain text files in chapter 1 was a highlight for me, a subject not often covered in much depth in cookbooks; if, like me, you still regularly need to convert between various plain text formats, this chapter will help formalise approaches that you may already be carrying out in a less than rigorous fashion.

Additionally, the paragraphs on floating point arithmetic were intriguing but all too brief. The chapter on dealing with binary is fairly good, although rather dry. Peter Seibel's discussion of binary data in the context of writing a Shoutcast server in Practical Common Lisp shows that the subject can be dealt with in a more compelling fashion. That said, for the most part, author Greg Wilson is a genial companion; the writing style is chatty, but doesn't overdo it.

Overall, if you own any cookbook-style books, there is little here that you don't already know. Even for a beginner, it's hard to see how anyone who decides they need this book hasn't already been exposed to some of the material here. In particular, does anyone really need yet another introduction to regular expressions? The treatment here isn't bad, it's just that this material is already covered in many introductory programming books (especially those that cover scripting languages like Perl and Python). As this takes up nearly 20% of the book, and there are fewer than 200 pages, it's a bit of a waste. Personally, I would have preferred more discussion of the less well-treated subjects, some of which are too sparsely described, but this would have detracted from the book's main aim.

This would be suitable for a beginner Pythonista, who for some reason didn't want the bulk of the likes of Python Cookbook. Otherwise, if you feel that some Pragmatic Programmers books can be rather lightweight and somewhat overpriced, this will not change your mind.


5 of 6 people found the following review helpful:
5.0 out of 5 stars Just the information you need to know to get rolling, July 21, 2005
There exists a set of tasks common to every software developer independent of the type of application developed and the language used. Presenting these tasks concisely to the new developer, without burying the hapless soul under a pile of thick texts, has always been a problem. The Pragmatic Bookshelf attempts to remedy this situation by giving the developer the knowledge they need to get the job done in a concise and, well, pragmatic format. One of the latest offerings in this outstanding series is "Data Crunching: Solve Everyday Problems Using Java, Python, and More" by Greg Wilson.

The core of programming comes down to data manipulation. This may be parsing XML, reformatting text data, searching a database, or any other number of a host of tasks. Typically, figuring out how to do each of these would require digesting several books in order to just get to the nuts and bolts of simple operations. "Data Crunching" fills this hole by concisely presenting the minimum amount of information required to get the job done. Just the information you need to know to get rolling, without all the fluff.

There are chapters on manipulating text files, XML documents, binary data, and relational databases. Included is a nice chapter on regular expressions, as well as a chapter on various "glue" topics relevant to solving data manipulation problems. Each chapter examines the tools and methods used to successfully manipulate the format of data being discussed. The examples used, and the book is chock full of them, are practical and relevant to the problems most often faced by developers. The examples are clearly illustrated and easy to follow.

Wilson does a fine job of presenting things in the "pragmatic" style that readers familiar with other books in the series have come to know. Each chapter stands well on its own, so the book may be used as a reference, although it's concise and a pleasant enough read that it's also worth reading through once. Great for the new developer who hasn't yet gotten his feet wet with data manipulation, yet also a nice reference for those who have been around the block a bit more, "Data Crunching" makes a fine addition to the Pragmatic series and is definitely worth having on the bookshelf.


9 of 12 people found the following review helpful:
5.0 out of 5 stars A gold mine for the software developer..., June 21, 2005
By Mathias Meyer (Berlin, Germany)
If you're reading this, you probably spend some quality time developing software. If you're developing software, chances are that you have to move data around on a daily basis (lucky you, if you don't). Be it getting data from one text format to another, moving data from a legacy system to a newer project's database, transforming XML into some more readable format for your boss or trying to get some useful data out of a former colleague's own binary format. Whatever you do in that manner, you're crunching data. Greg Wilson seems to have spent a lot of time crunching data and wants to share his wisdom with the world of pragmatic programmers. The book's coding focus is on working with Python and Java. I for my part haven't worked with Python yet, but being familiar with Ruby and Groovy it wasn't actually that hard to get an idea about what the Python code does (and I'm starting to like Python). So you've been warned about that.

Being a big fan of The Pragmatic Programmers' bookshelf I didn't hesitate to buy a copy of "Data Crunching" as well. Since I spend a lot of time doing stuff with some more or less usable data I thought it might be a good read to get some fresh ideas. And as it turns out that was a good choice.

Let's dive into the world of crunching data. Greg takes it easy on the reader in the introduction. He starts off with short examples from his professional career. This helps a lot to get an idea about what data crunching actually is. If you didn't already know it, reading the first chapter will give you some hints. The book is split up in a simple way. The next chapters will take you through most of the data sources/formats/crunching you'll most likely get in touch with. Mainly, that's text, regular expressions, XML, binary data, and relational databases. The book ends with a short chapter about the so called horseshoe nails, that being things that didn't fit anywhere else. But we'll get to that later. Not surprisingly, every chapter ends with a short summary.

The (more or less) simplest data you can work with is text. While some genius programmer in some company whose products you use/once used can always come up with a great new text format that nobody will ever understand, there's a good chance that you'll at least get an idea of its meaning by looking at a text file. Greg takes an example from the introduction some steps further to show the basics of working with text files, and also how to work with and around the common pitfalls. Being a pragmatic book, you also get the idea of how to keep your data crunching code nice and clean, and how to deal with normalising, collision detection and, of course, the basics of working with the UNIX shell (the tool of my choice for dealing with most "normal" text). After reading this chapter you have a very good idea about dealing with text. Compressing more information about dealing with text should be almost impossible.

Ah, regular expressions. The sheer joy of getting to know all the differences between grep, sed, awk, vim, Perl RE and the like just keeps me alive. Giving probably the best and shortest (but still understandable) introduction into working with regular expressions, Greg also gives good examples about what you can and what you can (or should) not do with regular expressions. Skimming through the pages you'll find that regular expressions can be applied to a lot of problems when it comes to handling input.

Working with XML is something I never really got comfortable with, but I gotta say Greg could convince me to change my mind here. He introduces the basic techniques to work with XML, being SAX and DOM, showing their strengths and their weaknesses. Pretty much nothing else to say here. Good thing is that he prefers showing how to work with JDOM (Java) and xml.dom.minidom (Python) rather than the clunky C-style DOM-API. The real beauty of working with XML is XPath, at least for me. On the other hand there is XSLT which is more verbose than useful. You might get a similar impression reading this chapter. But I'm not here to judge (well, not about XPath and XSLT, anyway), the day might come when I'll have to get back to XSLT. It's always good to know the choices you have.
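
To show the difference between the two approaches named above, here is a minimal sketch that extracts the same attribute with DOM (xml.dom.minidom) and with SAX (xml.sax) from the Python standard library. It is not the book's example; the document, the `person` element, and the helper names are assumptions.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document, for illustration only.
DOC = "<people><person name='Ada'/><person name='Grace'/></people>"

def names_dom(text):
    # DOM: parse the whole document into a tree, then query the tree.
    dom = minidom.parseString(text)
    return [el.getAttribute("name") for el in dom.getElementsByTagName("person")]

class NameHandler(xml.sax.ContentHandler):
    # SAX: react to each start tag as the parser streams past it.
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, tag, attrs):
        if tag == "person":
            self.names.append(attrs.getValue("name"))

def names_sax(text):
    handler = NameHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names

print(names_dom(DOC))
print(names_sax(DOC))
```

The DOM version is shorter but holds the whole tree in memory; the SAX version needs a handler class but never stores more than it chooses to keep.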

If you didn't get a chance to work with binary data yet, then the next chapter is for you. One could discuss whether there's still a need to fiddle with binary data in the modern world. Or you could just give it a shot. The examples are pretty straight-forward and understandable. Greg does an impressive job at working through the ups and downs of working with binary data. My fears certainly turned into curiosity after this chapter. After a short introduction into the world of 0 and its buddy 1 you'll learn how to pack and unpack different data types in fixed and variable formats with metadata.
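
Packing a variable-length field together with its metadata can be sketched with Python's struct module. This is not the book's code; the record layout (a 32-bit id, a 16-bit length, then the UTF-8 bytes of a name) and the function names are assumptions.

```python
import struct

HEADER = "<IH"  # little-endian: unsigned 32-bit id, unsigned 16-bit length

def pack_record(record_id, name):
    data = name.encode("utf-8")
    # The fixed-size header carries the length of the variable-size payload.
    return struct.pack(HEADER, record_id, len(data)) + data

def unpack_record(buf):
    header_size = struct.calcsize(HEADER)
    record_id, length = struct.unpack_from(HEADER, buf, 0)
    name = buf[header_size:header_size + length].decode("utf-8")
    return record_id, name

blob = pack_record(7, "Ada")
print(len(blob), unpack_record(blob))
```

Writing the length before the payload is what lets a reader know where one record ends and the next begins.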

The chapter on relational databases starts off with the best summary to SQL I've read so far, including joins, nested queries and normalisation. Besides text, databases definitely are one of my favourite tools for data crunching, in whatever tongue of SQL they speak. The SQL you'll learn in this chapter might be almost everything you'll ever need for working with data from MySQL, Oracle, and the like. Since working with SQL in your code is not the hardest part here, Greg keeps the focus on showing what you can do with SQL itself.
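
As a small taste of the SQL features listed above, here is a sketch of a join with aggregation and a nested query, run through Python's sqlite3 module. It is not from the book; the `authors` and `books` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalised schema: books refer to authors by id.
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers');
""")

# Join plus aggregation: how many books does each published author have?
per_author = conn.execute("""
    SELECT a.name, COUNT(b.id)
    FROM authors a JOIN books b ON b.author_id = a.id
    GROUP BY a.name ORDER BY a.name
""").fetchall()

# Nested query with negation: which authors have no books at all?
unpublished = conn.execute("""
    SELECT name FROM authors
    WHERE id NOT IN (SELECT author_id FROM books)
""").fetchall()

print(per_author)
print(unpublished)
```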

The grand finale is a small collection of so called horseshoe nails, miscellaneous techniques that will help you while crunching data that didn't really fit somewhere else. I definitely agree with Greg here in that those nails didn't fit anywhere else, but they're very much worth reading anyway. He introduces some basic tools like JUnit, diff and Make. He finishes with some short information about encoding/decoding, floating point arithmetic and working with dates and times.

This book is a gold mine for the software developer, be it a beginner or one that crunches data on a daily basis for years. The examples are very applicable to the every day life of a developer. They are simple enough to be immediately understood, but powerful enough to be good snippets to reuse, work on or build own data crunching code on them. Greg does an amazing job at keeping the examples and the text at a level that is both understandable and helpful for every developer. The book should be on your (as well as on mine) shelf whenever you have to write a small script or program to work with yet another data format the world didn't know existed. Greg will keep you sane and on track with his book. It's, after all, a pragmatic book! Just like with the other ones (which I can recommend without hesitation), you'll find tons of information packed into an entertaining, but nonetheless helpful book.


2 of 2 people found the following review helpful:
3.0 out of 5 stars Too short, lacks more material, February 26, 2009
By Luiz Augusto (São Paulo, Brazil)
Yeah, it's 'Short, Informative, Useful and Clear' (like someone already said) but... it's not enough. It seems like the introductory chapters for an excellent book, but the really important chapters do not exist.

Too expensive for 176 pages with tips on XML, regexes, DB and some unix commands.


2 of 2 people found the following review helpful:
5.0 out of 5 stars Short, Informative, Useful and Clear, August 15, 2006
Some of the best technical books are short, clear, easy to understand, and practical. Greg's book fits this description. This is a great book for exploring algorithms in the Python language. The book assumes the reader has at least a basic understanding of the Python programming language or some programming experience. I was delighted that topics were presented in a concise and unambiguous way and that the book was short. There should be more short books published!


2 of 2 people found the following review helpful:
4.0 out of 5 stars An overview of parsing and mining data with python., July 3, 2006
The book presents the topics in conjunction with showing some practical data mining examples that any person might encounter. This book is recommended to people who are interested in basic parsing of data (text, XML, binary, etc.) using Python.

I got the impression that the author was trying to cover too much in too little space. The title, for example, mentions Java, Python, and more. This is deceiving since the book uses Python for about 99% of its examples. And while the book does present Java, it only does so to show that it would be easier to use Python. Almost no other languages are covered, although there are some examples in Ruby and Bash.


4 of 5 people found the following review helpful:
5.0 out of 5 stars extremely well written and practical, May 26, 2005
By Stephen Vinoski (Chelmsford, MA USA)
I think this book is nothing short of stellar.

The book says it targets the beginning to intermediate programmer, but I think even seasoned developers will learn from this book, especially those who grew up on systems other than UNIX. Users of UNIX are of course familiar with the collection of classic programs it provides, such as grep, cat, find, sed, and awk, that can quickly be piped together to create incredibly powerful data processing facilities. While this is not a UNIX book, it does describe and promote the same mindset that resulted in those UNIX capabilities. That mindset, which has always been useful, is more important than ever given today's focus on software agility. Greg keeps that theme front and center throughout his book.

The book is full of thoughtful, practical examples. Many are written in Python, and some are in Java. There's some XML, some SQL, and even some C as well. All tastefully convey the power of simplicity as well as the utility of having agile tools that can be quickly assembled into larger solutions.

Greg's writing is exceptional. It's minimal and tight, yet it never fails to fully and clearly explain precisely what needs explaining. The fact that it was written so well meant that I tore through the roughly 200 pages in no time, and enjoyed every page. I think even Brian Kernighan would be pretty impressed.

If you have friends or family members who are about to graduate from college and enter the working world as software developers, or if you have folks on your team who have been working as developers for just a few years, get each of them a copy of this book. The fundamentals it will give them will serve them well throughout their careers. And while you're at it, get yourself a copy too, so you can remind yourself how fun it is to be able to quickly put together useful solutions to everyday problems -- when you have the right tools and the right mindset, of course.



This product

Data Crunching: Solve Everyday Problems Using Java, Python, and more.
$29.95 $21.72