Amazon.com: Data Crunching: Solve Everyday Problems Using Java, Python, and more. (9780974514079): Greg Wilson: Books


Data Crunching: Solve Everyday Problems Using Java, Python, and more. [Paperback]

Greg Wilson (Author)
4.5 out of 5 stars (14 customer reviews)

List Price: $29.95
Price: $21.72 (eligible for FREE Super Saver Shipping on orders over $25)
You Save: $8.23 (27%)

Book Description

April 20, 2005

Every day, all around the world, programmers have to recycle legacy data, translate from one vendor's proprietary format into another's, check that configuration files are internally consistent, and search through web logs to see how many people have downloaded the latest release of their product. This kind of "data crunching" may not be glamorous, but knowing how to do it efficiently is essential to being a good programmer.

This book describes the most useful data crunching techniques, explains when you should use them, and shows how they will make your life easier. Along the way, it will introduce you to some handy, but under-used, features of Java, Python, and other languages. It will also show you how to test data crunching programs, and how data crunching fits into the larger software development picture.


Frequently Bought Together

Customers buy this book with Practical Programming: An Introduction to Computer Science Using Python (Pragmatic Programmers) $20.00

Data Crunching: Solve Everyday Problems Using Java, Python, and more. + Practical Programming: An Introduction to Computer Science Using Python (Pragmatic Programmers)


Editorial Reviews

About the Author

Greg Wilson has worked on high-performance scientific computing, data visualization, and computer security, and is currently project lead at Software Carpentry (http://software-carpentry.org). Greg has a Ph.D. in Computer Science from the University of Edinburgh, and has written and edited several technical and children's books, including "Beautiful Code" (O'Reilly, 2007).


Product Details

  • Paperback: 176 pages
  • Publisher: Pragmatic Bookshelf (April 20, 2005)
  • Language: English
  • ISBN-10: 0974514071
  • ISBN-13: 978-0974514079
  • Product Dimensions: 9 x 7.5 x 0.6 inches
  • Shipping Weight: 13.4 ounces
  • Average Customer Review: 4.5 out of 5 stars (14 customer reviews)
  • Amazon Best Sellers Rank: #387,237 in Books

More About the Author

Born and raised on Vancouver Island; studied engineering at Queen's University in Ontario, worked for a while, then went to Edinburgh for a Master's, some more work, and a PhD. Traveled while writing my first book on parallel programming; came to Toronto "for a couple of years" in 1994, and have never left. I've worked for big corporations, startups, and myself (prefer the small to the large), been a university professor (enjoyed the teaching more than the red tape), and am now project lead for Software Carpentry, a crash course on software development for scientists and engineers. You can find me online at http://third-bit.com (personal stuff) or http://software-carpentry.org (the course).

 

Customer Reviews

14 Reviews
5 star: (8)
4 star: (5)
3 star: (1)
2 star: (0)
1 star: (0)
 
 
 
 
 
Most Helpful Customer Reviews

14 of 16 people found the following review helpful:
4.0 out of 5 stars Cursed by a bad name, May 19, 2005
This review is from: Data Crunching: Solve Everyday Problems Using Java, Python, and more. (Paperback)
If only this book didn't have the "Data Crunching" name. Far from being about data analysis, this is really a general book about different data formats (e.g. text, XML, database, binary) and how they are created and accessed in different languages. It's a reasonable fundamentals book. It also serves to introduce a wide variety of different technologies, without going into explicit depth about each.

As long as you understand what you are getting then I recommend this book. This is not a book about data processing techniques. Also, I recommend NOT using the material in the book that references SQL. The code does not use prepared statements properly and is vulnerable to SQL injection attacks.
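
To illustrate the concern about prepared statements (this sketch is not taken from the book; the `users` table and the function names are hypothetical), the following compares SQL built by string formatting with a parameterized query, using Python's standard `sqlite3` module:

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input is pasted into the SQL text, so a crafted
    # value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes the value separately from the SQL,
    # so it is always treated as data, never as SQL syntax.
    return conn.execute("SELECT id, name FROM users WHERE name = ?",
                        (name,)).fetchall()
```

With an input such as `x' OR '1'='1`, the unsafe version matches every row in the table, while the parameterized version simply finds no user with that literal name.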


6 of 6 people found the following review helpful:
5.0 out of 5 stars It's about using the right tool for the right job, June 13, 2006
Gregory Wilson likes Python and bash but doesn't particularly care for XSLT (or Perl, and possibly Java), doesn't express a preference in the great Emacs vs. Vi(m) holy war, and divides programming languages into two camps - agile, like Python and Ruby, and "sturdy", like Java. He's an adjunct CS professor at the University of Toronto, a contributing editor with Dr. Dobb's Journal, and is developing "Software Carpentry", which is either a basic course on software development aimed at scientists and engineers for the Python Software Foundation or a project to develop a newer, easier-to-use set of software development tools.

In the book, "Data Crunching: Solve Everyday Problems Using Java, Python, and More", data crunching is explored through a series of examples. The closest that Wilson comes to giving a definition is when, at the start of the first chapter, he refers to data crunching/munging as the "other 10%" of a programming task that takes up the "other 90% of the time". The first example that he gives is his experience helping a high school science teacher convert PDB (Protein Data Bank) files containing the coordinates of atoms in various molecules into a format that a Fortran sphere-drawing program could process.

From the introduction, he moves on to the manipulation of text and text files using Unix command-line tools and Python, with Java work-alikes following most of the Python scripts. Although the book's subtitle, "Solve Everyday Problems Using Java, Python, and More", gives Java first billing (possibly for marketing reasons?), Wilson's preference for Python over Java is never in doubt. After presenting the Java equivalent of a Python script that counts the number of times every email address appears in a list of email addresses, he writes:

All right. It's two-and-a-half times longer than the equivalent Python program, it isn't as fast on small files, and we have to compile it before we can run it, but other than that, it's almost as easy...
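
For a sense of how short such a script can be, here is a minimal sketch of counting how often each address appears in a list. It is not the book's code; the function name is hypothetical, and treating addresses as case-insensitive is an assumption made for this example.

```python
from collections import Counter


def count_addresses(lines):
    """Count how many times each email address appears in `lines`."""
    counts = Counter()
    for line in lines:
        # Assumption: one address per line; compare case-insensitively.
        address = line.strip().lower()
        if address:  # skip blank lines
            counts[address] += 1
    return counts
```

Calling `count_addresses(open("addresses.txt"))` on a file with one address per line (the filename is illustrative) returns a mapping from address to count, and `counts.most_common()` lists the addresses from most to least frequent.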

With a table of useful commands, explanation of redirection and piping, and some guidelines on how to make sure that your command-line tools follow convention, the text chapter could actually be viewed as a pretty passable introduction to the philosophy of Unix.

The chapter on Regular Expressions is great. So good, in fact, that I wish I could go back in time and give myself a photocopy of those thirty-odd pages at the point that I was struggling to get a handle on RE's some years back. Also included in this chapter is a brief, but very lucid, discussion of character encoding and a bit on using grep.

Although the Text and RE chapters were my favorite, Wilson's clear and concise writing style makes the entire book, including the coverage of XML, binary data processing, and relational databases, a joy to read. With segues like "But wait a second. Wait just one pattern-matching second.", lists of email addresses to munge that include entries for Alan Turing, John von Neumann, and Grace Hopper, and the like, he also manages to inject some pleasant, if a bit groan-worthy, humor here and there into what could otherwise be a rather dry book.

He uses the last chapter, titled "Horseshoe Nails", to quickly address a number of topics, like encoding, the pitfalls of floating point arithmetic, and unit testing, which (not a surprise in a title coming from the Pragmatic Bookshelf) he likes, going so far as to say that the spread of test-driven development has been the "real revolution in programming in the last decade". Diff is introduced and he brings the venerable make to the table as a tool for automating test running.
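
As one example of the kind of floating point pitfall mentioned here (a sketch written for this page, not an example from the book; the helper name is hypothetical), binary floating point cannot represent 0.1 or 0.2 exactly, so exact equality tests on computed values can fail and a tolerance-based comparison is usually safer:

```python
import math


def nearly_equal(a, b, rel_tol=1e-9):
    """Compare two floats with a relative tolerance instead of ==."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Exact comparison fails because of rounding in the binary representation.
exact_match = (0.1 + 0.2 == 0.3)

# Tolerance-based comparison treats the two values as equal.
close_match = nearly_equal(0.1 + 0.2, 0.3)
```

Here `exact_match` is `False` while `close_match` is `True`.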

He doesn't say it in so many words, though his retooling the old saying that "two years of hard work can save you an hour in the library" as "an hour of hard work can often save you sixty seconds on Google" comes close, but the message is to work smarter rather than harder. Use industrial-strength tools and processes when industrial-strength solutions are called for and agile, simplest-things-that-work solutions whenever possible.


6 of 7 people found the following review helpful:
5.0 out of 5 stars Just what the newbie or occasional programmer needs, June 10, 2005
By Eric Walstad (San Francisco, CA USA)
Data Crunching is a short book with great how-to-style code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics are covered in great depth but each contains enough to whet the reader's appetite for more. The text and examples are thought-provoking, leading the reader to ask the right kind of questions when detailed information is needed.

The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing. The book dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. They were, if not exactly the same as real-life data crunching problems I've had to solve in the past, then sufficiently close that I could easily apply the principles (and sample code) to my problem. I thought the regular expressions section was an excellent, succinct (re)introduction to regular expressions. Wilson starts with basic patterns, quickly and clearly working up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table of patterns, test strings and those patterns that match them. I liked the chapter on XML but noticed that there was no code example on performing an XSLT transformation. There is, however, a good example of an XSLT template, but no code on how to process it. The chapter on relational databases covers all the most common SQL needed for daily use (think 10% of the SQL that works on 90% of the problems). This includes sub-selects, negation, aggregation and views. The last chapter, "Horseshoe Nails", covers miscellaneous topics including testing. The author of course covers unit testing but also simple ways of testing when full-blown unit testing is overkill. The last chapter also has sections on encoding, dealing with floating point numbers, dates and times and how to format them with strftime. I was impressed by the author's ability to cull such important techniques and idioms and organize them into a small, yet incredibly useful text.
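
For readers unfamiliar with strftime, here is a minimal sketch of formatting a date and time with it in Python. This is an illustration written for this page rather than code from the book; the function name and the chosen format string are assumptions.

```python
from datetime import datetime


def format_timestamp(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM' using strftime codes."""
    # %Y = four-digit year, %m = zero-padded month, %d = zero-padded day,
    # %H = hour (24-hour clock), %M = minute.
    return moment.strftime("%Y-%m-%d %H:%M")


# The book's publication date, with an arbitrary time of day.
stamp = format_timestamp(datetime(2005, 4, 20, 9, 30))
```

Here `stamp` is the string `2005-04-20 09:30`. Numeric codes like these do not depend on the system locale, unlike month-name codes such as `%B`.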

Data Crunching covers real-life data parsing and manipulation concepts. It does so without tangential journeys into other areas of programming. Each of the five main topics includes simple code examples, usually in Python, Java or both, that clearly demonstrate the topic. The author does an impressive job of squeezing in almost all the issues in the daily work of data crunching. The reader can expect to come away with something of value on each topic covered, especially the newbie or occasional script writer.
