Data Crunching: Solve Everyday Problems Using Java, Python, and More (Paperback)

~ Greg Wilson (Author)
4.5 out of 5 stars (14 customer reviews)

List Price: $29.95
Price: $22.76
You Save: $7.19 (24%)

Frequently Bought Together

Customers buy this book with Practical Programming: An Introduction to Computer Science Using Python (Pragmatic Programmers) by Jennifer Campbell


Customers Who Bought This Item Also Bought

Beginning Python Visualization: Crafting Visual Transformation Scripts (Books for Professionals by Professionals)
by Shai Vaingast
4.7 out of 5 stars (9)  $35.60

Python Essential Reference (4th Edition)
by David M. Beazley
5.0 out of 5 stars (5)  $29.69

Python Scripting for Computational Science (Texts in Computational Science and Engineering)
by Hans Petter Langtangen
4.5 out of 5 stars (8)  $55.96

Desktop GIS: Mapping the Planet with Open Source Tools
by Gary Sherman

Python Cookbook
by Alex Martelli
4.3 out of 5 stars (42)  $23.01

Editorial Reviews

Product Description

Every day, all around the world, programmers have to recycle legacy data, translate from one vendor's proprietary format into another's, check that configuration files are internally consistent, and search through web logs to see how many people have downloaded the latest release of their product. This kind of "data crunching" may not be glamorous, but knowing how to do it efficiently is essential to being a good programmer.

This book describes the most useful data crunching techniques, explains when you should use them, and shows how they will make your life easier. Along the way, it will introduce you to some handy, but under-used, features of Java, Python, and other languages. It will also show you how to test data crunching programs, and how data crunching fits into the larger software development picture.



About the Author

Greg Wilson is a computer programmer and former teacher who lives in Toronto.

William Lytle has been teaching architectural design for several years as well as illustrating books for children. He lives in Toronto.


Product Details

  • Paperback: 193 pages
  • Publisher: Pragmatic Bookshelf (April 20, 2005)
  • Language: English
  • ISBN-10: 0974514071
  • ISBN-13: 978-0974514079
  • Product Dimensions: 8.7 x 7.5 x 0.6 inches
  • Shipping Weight: 13.4 ounces
  • Average Customer Review: 4.5 out of 5 stars (14 customer reviews)
  • Amazon.com Sales Rank: #411,280 in Books

    Popular in these categories:

    #11 in  Books > Computers & Internet > Databases > Java & Databases
    #61 in  Books > Computers & Internet > Programming > Languages & Tools > Python


Customer Reviews

14 Reviews
5 star: (8)
4 star: (5)
3 star: (1)
2 star: (0)
1 star: (0)

Average Customer Review
4.5 out of 5 stars (14 customer reviews)

Most Helpful Customer Reviews

 
12 of 14 people found the following review helpful:
4.0 out of 5 stars Cursed by a bad name, May 19, 2005
If only this book didn't have the "Data Crunching" name. Far from being about data analysis, this is really a general book about different data formats (e.g. text, XML, database, binary) and how they are created and accessed in different languages. It's a reasonable fundamentals book. It also serves to introduce a wide variety of different technologies, without going into explicit depth about each.

As long as you understand what you are getting then I recommend this book. This is not a book about data processing techniques. Also, I recommend NOT using the material in the book that references SQL. The code does not use prepared statements properly and is vulnerable to SQL injection attacks.
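
To illustrate the concern about SQL injection, here is a minimal sketch using Python's standard `sqlite3` module. It is not the book's code: the `users` table, the function names, and the hostile input are all hypothetical and chosen only to show the difference between building a query by string concatenation and passing the value as a bound parameter.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input is pasted straight into the SQL text,
    # so quote characters in `name` can change the query's meaning.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: `name` is bound as data and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    hostile = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, hostile))  # every row comes back
    print(find_user_safe(conn, hostile))    # no rows match
```

The `?` placeholder is the style `sqlite3` uses; other database drivers use different placeholder styles, but the principle of binding values separately from the SQL text is the same.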



 
5 of 5 people found the following review helpful:
5.0 out of 5 stars It's about using the right tool for the right job, June 13, 2006
Gregory Wilson likes Python and bash but doesn't particularly care for XSLT (or Perl, and possibly Java as well), doesn't express a preference in the great Emacs vs. Vi(m) holy war, and divides programming languages into two camps: agile, like Python and Ruby, and "sturdy", like Java. He's an adjunct CS professor at the University of Toronto, a contributing editor with Dr. Dobb's Journal, and is developing "Software Carpentry", which is either a basic course on software development aimed at scientists and engineers for the Python Software Foundation or a project to develop a newer, easier-to-use set of software development tools.

In the book, "Data Crunching: Solve Everyday Problems Using Java, Python, and More", data crunching is explored through a series of examples. The closest that Wilson comes to giving a definition is when, at the start of the first chapter, he refers to data crunching/munging as the "other 10%" of a programming task that takes up the "other 90% of the time". The first example that he gives is his experience helping a high school science teacher convert PDB (Protein Data Bank) files containing the coordinates of atoms in various molecules into a format that a Fortran sphere-drawing program could process.

From the introduction, he moves on to the manipulation of text and text files using Unix command-line tools and Python, with Java work-alikes following most of the Python scripts. Although the book's subtitle, "Solve Everyday Problems Using Java, Python, and More", gives Java first billing (possibly for marketing reasons?), Wilson's preference for Python over Java is never in doubt. After presenting the Java equivalent of a Python script that counts the number of times every email address appears in a list of email addresses, he writes:

All right. It's two-and-a-half times longer than the equivalent Python program, it isn't as fast on small files, and we have to compile it before we can run it, but other than that, it's almost as easy...
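
For a sense of how short such a script can be, here is a rough sketch of counting how often each address appears in a list. It is not the book's listing: it uses `collections.Counter`, which was added to Python after the book was published, and the function name, the lowercasing step, and the sample addresses are assumptions made for illustration.

```python
from collections import Counter


def count_addresses(lines):
    """Count occurrences of each address, one address per input line."""
    counts = Counter()
    for line in lines:
        # Assumption: addresses are compared case-insensitively
        # and blank lines are ignored.
        address = line.strip().lower()
        if address:
            counts[address] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "turing@example.org\n",
        "hopper@example.org\n",
        "Turing@example.org\n",
        "\n",
    ]
    for address, n in sorted(count_addresses(sample).items()):
        print(n, address)
```

In practice the `sample` list would be replaced by an open file or `sys.stdin`, since both yield one line at a time.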

With a table of useful commands, explanation of redirection and piping, and some guidelines on how to make sure that your command-line tools follow convention, the text chapter could actually be viewed as a pretty passable introduction to the philosophy of Unix.

The chapter on Regular Expressions is great. So good, in fact, that I wish I could go back in time and give myself a photocopy of those thirty-odd pages at the point that I was struggling to get a handle on RE's some years back. Also included in this chapter is a brief, but very lucid, discussion of character encoding and a bit on using grep.

Although the Text and RE chapters were my favorite, Wilson's clear and concise writing style makes the entire book, including the coverage of XML, binary data processing, and relational databases, a joy to read. With segues like "But wait a second. Wait just one pattern-matching second.", lists of email addresses to munge that include entries for Alan Turing, John von Neumann, and Grace Hopper, and the like, he also manages to inject some pleasant, if a bit groan-worthy, humor here and there into what could otherwise be a rather dry book.

He uses the last chapter, titled "Horseshoe Nails", to quickly address a number of topics, like encoding, the pitfalls of floating-point arithmetic, and unit testing, which (not a surprise in a title coming from the Pragmatic Bookshelf) he likes, going so far as to say that the spread of test-driven development has been the "real revolution in programming in the last decade". Diff is introduced and he brings the venerable make to the table as a tool for automating test running.
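
One common floating-point pitfall of the kind mentioned above can be shown in a few lines. This is a generic illustration rather than an example taken from the book; the function name `naive_sum` is made up for the sketch.

```python
import math


def naive_sum(values):
    """Add values left to right, accumulating rounding error as it goes."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    tenths = [0.1] * 10
    # 0.1 has no exact binary representation, so the running total drifts.
    print(naive_sum(tenths) == 1.0)               # False
    # Comparing with a tolerance is the usual remedy.
    print(math.isclose(naive_sum(tenths), 1.0))   # True
    # math.fsum tracks the lost low-order bits and returns exactly 1.0 here.
    print(math.fsum(tenths) == 1.0)               # True
```

The same idea carries over to tests: asserting exact equality on computed floats tends to produce fragile tests, while a tolerance-based comparison does not.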

He doesn't say it in so many words, though his retooling the old saying that "two years of hard work can save you an hour in the library" as "an hour of hard work can often save you sixty seconds on Google" comes close, but the message is to work smarter rather than harder. Use industrial-strength tools and processes when industrial-strength solutions are called for and agile, simplest-things-that-work solutions whenever possible.



 
6 of 7 people found the following review helpful:
5.0 out of 5 stars Just what the newbie or occasional programmer needs, June 10, 2005
By Eric Walstad (San Francisco, CA USA)
Data Crunching is a short book with great how-to-like code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics are covered in great depth but each contains enough to whet the reader's appetite for more. The text and examples are thought provoking, leading the reader to ask the right kind of questions when detailed information is needed.

The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing. The book dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. They were, if not exactly the same as real-life data crunching problems I've had to solve in the past, sufficiently close that I could easily apply the principles (and sample code) to my problem.

I thought the regular expressions section was an excellent, succinct (re)introduction to regular expressions. Wilson starts with basic patterns, quickly and clearly working up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table of patterns, test strings and those patterns that match them. I liked the chapter on XML but noticed that there was no code example on performing an XSLT transformation. There is a good example of an XSLT template, but no code showing how to process it. The chapter on relational databases covers all the most common SQL needed for daily use (think 10% of the SQL that works on 90% of the problems). This includes sub-selects, negation, aggregation and views.

The last chapter, "Horseshoe Nails", covers miscellaneous topics including testing. The author of course covers unit testing but also simple ways of testing when full-blown unit testing is overkill. The last chapter also has sections on encoding, dealing with floating point numbers, dates and times and how to format them with strftime. I was impressed by the author's ability to cull such important techniques and idioms and organize them into a small, yet incredibly useful text.
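
The idea of a table of patterns and the test strings they match can be sketched roughly as follows. This is not the book's code: the function names, the patterns, and the test strings are all hypothetical, and the sketch assumes `re.search` semantics (a match anywhere in the string).

```python
import re


def match_table(patterns, strings):
    """Map each pattern to the list of test strings it matches."""
    return {
        pattern: [s for s in strings if re.search(pattern, s)]
        for pattern in patterns
    }


def print_table(patterns, strings):
    """Print one row per pattern, listing the strings that matched."""
    table = match_table(patterns, strings)
    width = max(len(p) for p in patterns)
    for pattern in patterns:
        matched = ", ".join(table[pattern]) or "(none)"
        print(pattern.ljust(width), "|", matched)


if __name__ == "__main__":
    # Hypothetical patterns and test strings, chosen only for illustration.
    patterns = [r"^a", r"b+", r"\d$"]
    strings = ["abc", "bbb", "a1", "xyz"]
    print_table(patterns, strings)
```

A table like this is a quick way to check a pattern against edge cases before using it in a larger script.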

Data Crunching covers real-life data parsing and manipulation concepts. It does so without tangential journeys into other areas of programming. Each of the five main topics includes simple code examples, usually in Python, Java or both, that clearly demonstrate the topic. The author does an impressive job of squeezing in almost all the issues in the daily work of data crunching. The reader can expect to come away with something of value on each topic covered, especially the newbie or occasional script writer.


Most Recent Customer Reviews

5.0 out of 5 stars Excellent Python Oriented Primer for Novice to Intermediate Data Parsing
This book is full of great ideas and very helpful code.

It's also nice to see the Java equivalent app/code for the Python solution.
Published 6 months ago by Chris M

3.0 out of 5 stars Too short, lacks more material
Yeah, it's 'Short, Informative, Useful and Clear' (like someone already said) but... it's not enough.
Published 8 months ago by Luiz Augusto

5.0 out of 5 stars Short, Informative, Useful and Clear
Some of the best technical books are short, clear, easy to understand, and practical. Greg's book falls into this description.
Published on August 15, 2006 by Noah Gift

4.0 out of 5 stars good data-handling cookbook for a beginner
This book is mainly concerned with scripting as a 'glue' between applications: processing various input and output formats.
Published on July 18, 2006 by Thing with a hook

4.0 out of 5 stars An overview of parsing and mining data with python.
The book presents the topics in conjunction with showing some practical data mining examples that any person might encounter.
Published on July 3, 2006 by Abdulmajed Dakkak

5.0 out of 5 stars A clever guide to extracting the data you need
Data Crunching by Greg Wilson.

The book opens with a statement of purpose: transmuting data from one form into another.
Published on June 30, 2006 by Scott Ellsworth

4.0 out of 5 stars Data Crunching: Solve Everyday Problems Using Java, Python, and More
This is an excellent introduction to typical data crunching applications. I found it to be very readable and filled with lots of examples.
Published on November 3, 2005 by Nick Bulitka

5.0 out of 5 stars Just the information you need to know to get rolling
There exists a set of tasks common to every software developer independent of the type of application developed and the language used.
Published on July 21, 2005 by Jason Menard

5.0 out of 5 stars A gold mine for the software developer...
If you're reading this, you probably spend some quality time developing software. If you're developing software, chances are that you have to move data around on a daily basis...
Published on June 21, 2005 by Mathias Meyer

5.0 out of 5 stars extremely well written and practical
I think this book is nothing short of stellar.

The book says it targets the beginning to intermediate programmer, but I think even seasoned developers will learn from...
Published on May 26, 2005 by Stephen Vinoski
