Enter your mobile number below and we'll send you a link to download the free Kindle App. Then you can start reading Kindle books on your smartphone, tablet, or computer - no Kindle device required.
Getting the download link through email is temporarily not available. Please check back later.

  • Apple
  • Android
  • Windows Phone
  • Android

To get the free app, enter your mobile phone number.

Buy Used
+ $3.99 shipping
Used: Like New | Details
Condition: Used: Like New
Comment: Nearly new condition book. Sail the Seas of Value.
Have one to sell? Sell on Amazon
Flip to back Flip to front
Listen Playing... Paused   You're listening to a sample of the Audible audio edition.
Learn more
See this image

Data Crunching: Solve Everyday Problems Using Java, Python, and more. Paperback – April 20, 2005

4.5 out of 5 stars 15 customer reviews

See all formats and editions Hide other formats and editions
New from Used from
"Please retry"
$15.00 $0.01

Windows 10 For Dummies Video Training
Get up to speed with Windows 10 with this video training course from For Dummies. Learn more.
click to open popover

Editorial Reviews

About the Author

Greg Wilson holds a Ph.D. in Computer Science from the University of Edinburgh, and has worked on high-performance scientific computing, data visualization, and computer security. He is the author of Data Crunching and Practical Parallel Programming (MIT Press, 1995), and is a contributing editor at Doctor Dobb's Journal, and an adjunct professor in Computer Science at the University of Toronto.


New York Times best sellers
Browse the New York Times best sellers in popular categories like Fiction, Nonfiction, Picture Books and more. See more

Product Details

  • Paperback: 176 pages
  • Publisher: Pragmatic Bookshelf; 1 edition (April 20, 2005)
  • Language: English
  • ISBN-10: 0974514071
  • ISBN-13: 978-0974514079
  • Product Dimensions: 7.5 x 0.6 x 9 inches
  • Shipping Weight: 13.4 ounces
  • Average Customer Review: 4.5 out of 5 stars  See all reviews (15 customer reviews)
  • Amazon Best Sellers Rank: #1,504,108 in Books (See Top 100 in Books)

Customer Reviews

Top Customer Reviews

Format: Paperback
Gregory Wilson likes Python and bash but doesn't particularly care for XSLT (or Perl, and possibly Java as well, either), doesn't express a preference in the great Emacs vs. Vi(m) holy war, and divides programming languages into two camps - agile, like Python and Ruby, and "sturdy", like Java. He's an adjunct CS professor at the University of Toronto, a contributing editor with Dr. Dobb's Journal, and is developing "Software Carpentry", which is either a basic course on software development aimed at scientists and engineers for the Python Software Foundation or a project to develop a newer, easier-to-use set of software development tools.

In the book, "Data Crunching: Solve Everyday Problems Using Java, Python, and More", data crunching is explored through a series of examples. The closest that Wilson comes to giving a definition is when, at the start of the first chapter, he refers to data crunching/munging as the "other 10%" of a programming task that takes up the "other 90% of the time". The first example that he gives is his experience helping a high school science teacher convert PDB (Protein Data Bank) files containing the coordinates of atoms in various molecules into a format that a Fortran sphere-drawing program could process.

From the introduction, he moves on to the manipulation of text and text files using Unix command-line tools and Python, with Java work-alikes following most of the Python scripts. Although the book's subtitle, "Solve Everyday Problems Using Java, Python, and More", gives Java first billing (possibly for marketing reasons?), Wilson's preference for Python over Java is never in doubt.
Read more ›
Comment 11 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse
Format: Paperback
Data Crunching is a short book with great how-to-like code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics are covered in great depth but each contains enough to whet the reader's appetite for more. The text and examples are thought provoking, leading the reader to ask the right kind of questions when detailed information is needed.

The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing. The book dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. If not exactly the same as a real-life data crunching problem I've had to solve in the past, then sufficiently close to easily apply the principals (and sample code) to my problem. I thought the regular expressions section was an excellent, succinct, (re)introduction to regular expressions. Wilson starts with basic patterns, quickly and clearly working up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table of patterns, test strings and those patterns that match them. I liked the chapter on XML but noticed that there was no code example on performing an XSLT. There is, however, a good example of an XSLT template, but no code on how to process it. The chapter on relational databases covers all the most common SQL needed for daily use (think 10% of the SQL that works on 90% of the problems). This includes sub-selects, negation, aggregation and views. The last chapter, "Horshoe Nails", covers miscellaneous topics including testing.
Read more ›
Comment 7 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse
Format: Paperback
This book is mainly concerned with scripting as a 'glue' between applications: processing various input and output formats. The book is divided into 5 main categories of data handling: plain text, regular expressions, XML, binary data and SQL. There is a final chapter on various miscellaneous topics. Most of the examples are given in Python. Some of the code is demonstrated in Java, although, disappointingly for a book published in 2005, none of the Java 5.0 features are leveraged. However, if nothing else, it demonstrates why Java is not anyone's first choice for such activities.

If you've read any of the O'Reilly cookbook series, you will know what to expect, although the chapters are more cohesive and less episodic. Beginning programmers will get the most out of this book, although intermediate programmers should find at least some material here that's new to them.

The XML chapter is a pretty good introduction the use and advantages/disadvantages of SAX and DOM, and XSLT is also described, although the discussion is not so clear. Those without experience with databases will welcome the chapter on SQL. The discussion on dealing with plain text files in chapter 1 was highlight for me, a subject not often covered in much depth in cookbooks; if, like me, you still regularly need to convert between various plain text formats, this chapter will help formalise approaches that you may already be carrying out in a less than rigorous fashion.

Additionally, the paragraphs on floating point arithmetic were intriguing but all too brief. The chapter on dealing with binary is fairly good, although rather dry.
Read more ›
Comment 4 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse

Most Recent Customer Reviews