Data Crunching: Solve Everyday Problems Using Java, Python, and more. [Paperback]

Greg Wilson
4.5 out of 5 stars (15 customer reviews)

List Price: $29.95
Price: $22.10 & FREE Shipping on orders over $25.
You Save: $7.85 (26%)

Book Description

April 20, 2005

Every day, all around the world, programmers have to recycle legacy data, translate from one vendor's proprietary format into another's, check that configuration files are internally consistent, and search through web logs to see how many people have downloaded the latest release of their product. This kind of "data crunching" may not be glamorous, but knowing how to do it efficiently is essential to being a good programmer.

This book describes the most useful data crunching techniques, explains when you should use them, and shows how they will make your life easier. Along the way, it will introduce you to some handy, but under-used, features of Java, Python, and other languages. It will also show you how to test data crunching programs, and how data crunching fits into the larger software development picture.
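As an illustration of the web-log task mentioned above, here is a minimal Python sketch (not taken from the book) that counts how many distinct hosts successfully downloaded a given file. It assumes the server writes the Common Log Format; the file name `widget-1.2.tar.gz` and the function `count_downloads` are hypothetical, and counting distinct hosts is only a rough proxy for counting people.

```python
import re

# Matches the host, requested path, and status code of a Common Log Format
# GET request, e.g.
# 1.2.3.4 - - [20/Apr/2005:10:00:00 -0500] "GET /dl/widget-1.2.tar.gz HTTP/1.1" 200 5120
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) [^"]*" (\d{3})')


def count_downloads(lines, filename):
    """Count distinct client hosts that fetched `filename` with HTTP status 200."""
    hosts = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed lines and non-GET requests
        host, path, status = match.groups()
        if path.endswith("/" + filename) and status == "200":
            hosts.add(host)  # a set ignores repeat downloads from the same host
    return len(hosts)
```

Because the function accepts any iterable of lines, an open log file can be passed in directly, e.g. `count_downloads(open("access.log"), "widget-1.2.tar.gz")` (the log path is hypothetical).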


Frequently Bought Together

Data Crunching: Solve Everyday Problems Using Java, Python, and more. + Practical Programming: An Introduction to Computer Science Using Python (Pragmatic Programmers)
Price for both: $42.08


Editorial Reviews

About the Author

Greg Wilson holds a Ph.D. in Computer Science from the University of Edinburgh, and has worked on high-performance scientific computing, data visualization, and computer security. He is the author of Data Crunching and Practical Parallel Programming (MIT Press, 1995), a contributing editor at Dr. Dobb's Journal, and an adjunct professor of Computer Science at the University of Toronto.


Product Details

  • Paperback: 176 pages
  • Publisher: Pragmatic Bookshelf (April 20, 2005)
  • Language: English
  • ISBN-10: 0974514071
  • ISBN-13: 978-0974514079
  • Product Dimensions: 7.6 x 0.6 x 9 inches
  • Shipping Weight: 13.4 ounces
  • Average Customer Review: 4.5 out of 5 stars (15 customer reviews)
  • Amazon Best Sellers Rank: #849,450 in Books

More About the Author

Born and raised on Vancouver Island; studied engineering at Queen's University in Ontario, worked for a while, then went to Edinburgh for a Master's, some more work, and a PhD. Traveled while writing my first book on parallel programming; came to Toronto "for a couple of years" in 1994, and have never left. I've worked for big corporations, startups, and myself (prefer the small to the large), been a university professor (enjoyed the teaching more than the red tape), and am now project lead for Software Carpentry, a crash course on software development for scientists and engineers. You can find me online at http://third-bit.com (personal stuff) or http://software-carpentry.org (the course).

Customer Reviews

4.5 out of 5 stars (15 customer reviews)
Most Helpful Customer Reviews
15 of 17 people found the following review helpful
4.0 out of 5 stars Cursed by a bad name May 19, 2005
Format: Paperback
If only this book didn't have the "Data Crunching" name. Far from being about data analysis, this is really a general book about different data formats (e.g. text, XML, database, binary) and how they are created and accessed in different languages. It's a reasonable fundamentals book. It also serves to introduce a wide variety of different technologies, without going into depth on any of them.

As long as you understand what you are getting, I recommend this book. This is not a book about data processing techniques. Also, I recommend NOT using the material in the book that references SQL. The code does not use prepared statements properly and is vulnerable to SQL injection attacks.
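The SQL injection point in this review can be illustrated with a short sketch using Python's built-in `sqlite3` module. The book's own database code is not reproduced here; the `users` table and both helper functions are hypothetical. The unsafe version builds the query by string formatting, while the safe version passes the value through a `?` placeholder so the driver binds it as data rather than as SQL.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: the value is pasted into the SQL text, so input such as
    # "x' OR '1'='1" changes the meaning of the WHERE clause.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn, name):
    # Parameterized: the "?" placeholder keeps the value separate from the
    # SQL text, so it is always compared as a plain string.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


# A throwaway in-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
```

With the attack string `"x' OR '1'='1"`, the unsafe function returns every row, while the safe one returns nothing because no user has that literal name.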
8 of 8 people found the following review helpful
5.0 out of 5 stars It's about using the right tool for the right job June 13, 2006
Format: Paperback
Gregory Wilson likes Python and bash but doesn't particularly care for XSLT (or Perl, and possibly Java as well, either), doesn't express a preference in the great Emacs vs. Vi(m) holy war, and divides programming languages into two camps - agile, like Python and Ruby, and "sturdy", like Java. He's an adjunct CS professor at the University of Toronto, a contributing editor with Dr. Dobb's Journal, and is developing "Software Carpentry", which is either a basic course on software development aimed at scientists and engineers for the Python Software Foundation or a project to develop a newer, easier-to-use set of software development tools.

In the book, "Data Crunching: Solve Everyday Problems Using Java, Python, and More", data crunching is explored through a series of examples. The closest that Wilson comes to giving a definition is when, at the start of the first chapter, he refers to data crunching/munging as the "other 10%" of a programming task that takes up the "other 90% of the time". The first example that he gives is his experience helping a high school science teacher convert PDB (Protein Data Bank) files containing the coordinates of atoms in various molecules into a format that a Fortran sphere-drawing program could process.

From the introduction, he moves on to the manipulation of text and text files using Unix command-line tools and Python, with Java work-alikes following most of the Python scripts. Although the book's subtitle, "Solve Everyday Problems Using Java, Python, and More", gives Java first billing (possibly for marketing reasons?), Wilson's preference for Python over Java is never in doubt.
...
6 of 7 people found the following review helpful
5.0 out of 5 stars Just what the newbie or occasional programmer needs June 10, 2005
Format: Paperback
Data Crunching is a short book with great how-to-like code examples of very common data parsing and manipulation techniques. The examples are easy to follow and clearly demonstrate the author's point. None of the topics are covered in great depth but each contains enough to whet the reader's appetite for more. The text and examples are thought provoking, leading the reader to ask the right kind of questions when detailed information is needed.

The book covers the most common aspects of data crunching, including text files, regular expressions, XML, binary files, relational databases and unit testing. The book dedicates a chapter to each of these topics. Each chapter has one or more sample problems to solve. I found the sample problems to be well thought out. If not exactly the same as a real-life data crunching problem I've had to solve in the past, they were sufficiently close that I could easily apply the principles (and sample code) to my problem.

I thought the regular expressions section was an excellent, succinct (re)introduction to regular expressions. Wilson starts with basic patterns, quickly and clearly working up to common complex patterns. The regular expressions chapter also includes a nice bit of Python code that generates a table of patterns, test strings and the patterns that match them.

I liked the chapter on XML but noticed that there was no code example of performing an XSLT transformation. There is, however, a good example of an XSLT template, but no code on how to process it. The chapter on relational databases covers all the most common SQL needed for daily use (think 10% of the SQL that works on 90% of the problems). This includes sub-selects, negation, aggregation and views. The last chapter, "Horseshoe Nails", covers miscellaneous topics including testing.
...
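The review above mentions Python code that tabulates which regular expressions match which test strings. The book's version is not shown here; the following is an independent sketch of the same idea using the standard `re` module, with hypothetical function names `match_table` and `print_table`. It uses `search`, so a pattern counts as matching if it matches anywhere in the string.

```python
import re


def match_table(patterns, strings):
    """Return rows of (string, [patterns that match somewhere in it])."""
    compiled = [(p, re.compile(p)) for p in patterns]  # compile each pattern once
    rows = []
    for s in strings:
        hits = [p for p, rx in compiled if rx.search(s)]
        rows.append((s, hits))
    return rows


def print_table(rows):
    """Print one line per test string, padded so the pattern lists line up."""
    width = max(len(s) for s, _ in rows)
    for s, hits in rows:
        print(f"{s:<{width}}  {', '.join(hits) or '-'}")


rows = match_table([r"^a", r"b+", r"\d"], ["abc", "bbb", "x1"])
print_table(rows)
```

A "-" in the printed table marks a test string that none of the patterns matched.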
3 of 3 people found the following review helpful
3.0 out of 5 stars Too short, lacks more material February 26, 2009
Format: Paperback
Yeah, it's 'Short, Informative, Useful and Clear' (as someone already said), but... it's not enough. It seems like the introductory chapters of an excellent book, but the really important chapters do not exist.

Too expensive for 176 pages of tips on XML, regexes, databases and some Unix commands.
3 of 3 people found the following review helpful
4.0 out of 5 stars good data-handling cookbook for a beginner July 18, 2006
Format: Paperback
This book is mainly concerned with scripting as a 'glue' between applications: processing various input and output formats. The book is divided into 5 main categories of data handling: plain text, regular expressions, XML, binary data and SQL. There is a final chapter on various miscellaneous topics. Most of the examples are given in Python. Some of the code is demonstrated in Java, although, disappointingly for a book published in 2005, none of the Java 5.0 features are leveraged. However, if nothing else, it demonstrates why Java is not anyone's first choice for such activities.

If you've read any of the O'Reilly cookbook series, you will know what to expect, although the chapters are more cohesive and less episodic. Beginning programmers will get the most out of this book, although intermediate programmers should find at least some material here that's new to them.

The XML chapter is a pretty good introduction to the use and advantages/disadvantages of SAX and DOM, and XSLT is also described, although the discussion is not so clear. Those without experience with databases will welcome the chapter on SQL. The discussion of plain text files in chapter 1 was a highlight for me, a subject not often covered in much depth in cookbooks; if, like me, you still regularly need to convert between various plain text formats, this chapter will help formalise approaches that you may already be carrying out in a less than rigorous fashion.

Additionally, the paragraphs on floating point arithmetic were intriguing but all too brief. The chapter on dealing with binary is fairly good, although rather dry.
...
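To make the SAX/DOM contrast described in the review above concrete, here is a small sketch using Python's standard `xml.dom.minidom` and `xml.sax` modules. It is not code from the book; the `<release>` document and the helper names are made up for illustration. Both functions extract the same `version` attributes: the DOM version builds the whole tree in memory first, while the SAX handler reacts to elements as the parser streams past them and keeps only what it chooses to.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOC = "<releases><release version='1.0'/><release version='1.1'/></releases>"


def versions_dom(text):
    # DOM: parse everything into a tree, then query the tree.
    tree = minidom.parseString(text)
    return [el.getAttribute("version") for el in tree.getElementsByTagName("release")]


class ReleaseHandler(xml.sax.ContentHandler):
    # SAX: the parser calls these methods as it reads; no tree is built.
    def __init__(self):
        super().__init__()
        self.versions = []

    def startElement(self, name, attrs):
        if name == "release":
            self.versions.append(attrs.get("version"))


def versions_sax(text):
    handler = ReleaseHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.versions
```

For small files the DOM style is usually simpler; the streaming SAX style matters when a document is too large to hold in memory comfortably.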
Most Recent Customer Reviews
4.0 out of 5 stars Good book for beginning data crunchers
I've never owned a book on this subject matter, but I wish I'd had this one years ago. It presents many topics that become common sense over time in the field.
Published 7 months ago by phalseprofet
5.0 out of 5 stars Excellent Python Oriented Primer for Novice to Intermediate Data...
This book is full of great ideas and very helpful code.

It's also nice to see the Java equivalent app/code for the Python solution.
Published on April 20, 2009 by Chris M
5.0 out of 5 stars Short, Informative, Useful and Clear
Some of the best technical books are short, clear, easy to understand, and practical. Greg's book falls into this description.
Published on August 15, 2006 by Noah Gift
4.0 out of 5 stars An overview of parsing and mining data with python.
The book presents the topics in conjunction with showing some practical data mining examples that any person might encounter.
Published on July 3, 2006 by Abdulmajed Dakkak
5.0 out of 5 stars A clever guide to extracting the data you need
Data Crunching by Greg Wilson.

The book opens with a statement of purpose: transmuting data from one form into another.
Published on June 30, 2006 by Scott Ellsworth
4.0 out of 5 stars Data Crunching: Solve Everyday Problems Using Java, Python, and More
This is an excellent introduction to typical data crunching applications. I found it to be very readable and filled with lots of examples.
Published on November 3, 2005 by Nick Bulitka
5.0 out of 5 stars Just the information you need to know to get rolling
There exists a set of tasks common to every software developer independent of the type of application developed and the language used.
Published on July 21, 2005 by Jason Menard
5.0 out of 5 stars A gold mine for the software developer...
If you're reading this, you probably spend some quality time developing software. If you're developing software, chances are that you have to move data around on a daily basis...
Published on June 21, 2005 by Mathias Meyer
5.0 out of 5 stars extremely well written and practical
I think this book is nothing short of stellar.

The book says it targets the beginning to intermediate programmer, but I think even seasoned developers will learn from...
Published on May 26, 2005 by Stephen Vinoski