


Data Crunching: Solve Everyday Problems Using Java, Python, and more. Paperback – April 20, 2005

15 customer reviews


Editorial Reviews

About the Author

Greg Wilson holds a Ph.D. in Computer Science from the University of Edinburgh, and has worked on high-performance scientific computing, data visualization, and computer security. He is the author of Data Crunching and Practical Parallel Programming (MIT Press, 1995), a contributing editor at Dr. Dobb's Journal, and an adjunct professor in Computer Science at the University of Toronto.



Product Details

  • Paperback: 176 pages
  • Publisher: Pragmatic Bookshelf (April 20, 2005)
  • Language: English
  • ISBN-10: 0974514071
  • ISBN-13: 978-0974514079
  • Product Dimensions: 7.5 x 0.6 x 9 inches
  • Shipping Weight: 13.4 ounces
  • Average Customer Review: 4.5 out of 5 stars (15 customer reviews)
  • Amazon Best Sellers Rank: #1,545,777 in Books

More About the Author

Born and raised on Vancouver Island; studied engineering at Queen's University in Ontario, worked for a while, then went to Edinburgh for a Master's, some more work, and a PhD. Traveled while writing my first book on parallel programming; came to Toronto "for a couple of years" in 1994, and have never left. I've worked for big corporations, startups, and myself (prefer the small to the large), been a university professor (enjoyed the teaching more than the red tape), and am now project lead for Software Carpentry, a crash course on software development for scientists and engineers. You can find me online at (personal stuff) or (the course).

Customer Reviews

Most Helpful Customer Reviews

11 of 11 people found the following review helpful By Brian Donovan on June 13, 2006
Format: Paperback
Gregory Wilson likes Python and bash but doesn't particularly care for XSLT (or Perl, and possibly Java as well), doesn't express a preference in the great Emacs vs. Vi(m) holy war, and divides programming languages into two camps: agile, like Python and Ruby, and "sturdy", like Java. He's an adjunct CS professor at the University of Toronto, a contributing editor with Dr. Dobb's Journal, and is developing "Software Carpentry", which is either a basic course on software development aimed at scientists and engineers for the Python Software Foundation or a project to develop a newer, easier-to-use set of software development tools.

In the book, "Data Crunching: Solve Everyday Problems Using Java, Python, and More", data crunching is explored through a series of examples. The closest that Wilson comes to giving a definition is when, at the start of the first chapter, he refers to data crunching/munging as the "other 10%" of a programming task that takes up the "other 90% of the time". The first example that he gives is his experience helping a high school science teacher convert PDB (Protein Data Bank) files containing the coordinates of atoms in various molecules into a format that a Fortran sphere-drawing program could process.

From the introduction, he moves on to the manipulation of text and text files using Unix command-line tools and Python, with Java work-alikes following most of the Python scripts. Although the book's subtitle, "Solve Everyday Problems Using Java, Python, and More", gives Java first billing (possibly for marketing reasons?), Wilson's preference for Python over Java is never in doubt.
17 of 19 people found the following review helpful By Jack D. Herrington on May 19, 2005
Format: Paperback
If only this book didn't have the "Data Crunching" name. Far from being about data analysis, this is really a general book about different data formats (e.g. text, XML, database, binary) and how they are created and accessed in different languages. It's a reasonable fundamentals book. It also serves to introduce a wide variety of different technologies, without going into explicit depth about each.

As long as you understand what you are getting then I recommend this book. This is not a book about data processing techniques. Also, I recommend NOT using the material in the book that references SQL. The code does not use prepared statements properly and is vulnerable to SQL injection attacks.
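
For readers unfamiliar with the issue raised above, here is a minimal sketch of the difference between building a query by string concatenation and using a parameterized query. It is not taken from the book: it assumes Python's standard `sqlite3` module, and the `users` table and the `find_unsafe`/`find_safe` helpers are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users (name, role) VALUES (?, ?)",
    [("alice", "admin"), ("bob", "guest")],
)


def find_unsafe(name):
    # Vulnerable: the input is pasted directly into the SQL text,
    # so quote characters in it can change the query's structure.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def find_safe(name):
    # Parameterized: the value is passed separately from the SQL text,
    # so it is always treated as data, never as SQL.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


malicious = "x' OR '1'='1"
unsafe_rows = find_unsafe(malicious)  # the injected OR clause matches every row
safe_rows = find_safe(malicious)      # no user has this literal name
```

With the concatenated query, the crafted input rewrites the `WHERE` clause and every row comes back; with the `?` placeholder, the same input is compared as an ordinary string and nothing matches.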
7 of 8 people found the following review helpful By Eric Walstad on June 10, 2005
Format: Paperback
Data Crunching is a short book with great how-to-like code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics are covered in great depth but each contains enough to whet the reader's appetite for more. The text and examples are thought provoking, leading the reader to ask the right kind of questions when detailed information is needed.

The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing. The book dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. If not exactly the same as a real-life data crunching problem I've had to solve in the past, they were sufficiently close that I could easily apply the principles (and sample code) to my problem. I thought the regular expressions section was an excellent, succinct (re)introduction to regular expressions. Wilson starts with basic patterns, quickly and clearly working up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table of patterns, test strings and those patterns that match them. I liked the chapter on XML but noticed that there was no code example on performing an XSLT transformation. There is, however, a good example of an XSLT template, but no code on how to process it. The chapter on relational databases covers all the most common SQL needed for daily use (think 10% of the SQL that works on 90% of the problems). This includes sub-selects, negation, aggregation and views. The last chapter, "Horseshoe Nails", covers miscellaneous topics including testing.
4 of 4 people found the following review helpful By Thing with a hook on July 18, 2006
Format: Paperback
This book is mainly concerned with scripting as a 'glue' between applications: processing various input and output formats. The book is divided into 5 main categories of data handling: plain text, regular expressions, XML, binary data and SQL. There is a final chapter on various miscellaneous topics. Most of the examples are given in Python. Some of the code is demonstrated in Java, although, disappointingly for a book published in 2005, none of the Java 5.0 features are leveraged. However, if nothing else, it demonstrates why Java is not anyone's first choice for such activities.

If you've read any of the O'Reilly cookbook series, you will know what to expect, although the chapters are more cohesive and less episodic. Beginning programmers will get the most out of this book, although intermediate programmers should find at least some material here that's new to them.

The XML chapter is a pretty good introduction to the use and advantages/disadvantages of SAX and DOM, and XSLT is also described, although the discussion is not so clear. Those without experience with databases will welcome the chapter on SQL. The discussion on dealing with plain text files in chapter 1 was a highlight for me, a subject not often covered in much depth in cookbooks; if, like me, you still regularly need to convert between various plain text formats, this chapter will help formalise approaches that you may already be carrying out in a less than rigorous fashion.

Additionally, the paragraphs on floating point arithmetic were intriguing but all too brief. The chapter on dealing with binary is fairly good, although rather dry.

Most Recent Customer Reviews