Spidering Hacks

4.4 out of 5 stars (14 customer reviews)

List Price: $24.95
Price: $16.47
You Save: $8.48 (34%)

Customers Who Bought This Item Also Bought

  • Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL, by Michael Schrenk. 4.6 out of 5 stars (14), $26.37
  • Google Hacks: Tips & Tools for Finding and Using the World's Information, by Rael Dornfest. 4.4 out of 5 stars (61), $18.24
  • Perl & LWP, by Sean M. Burke
  • Programming Collective Intelligence: Building Smart Web 2.0 Applications, by Toby Segaran. 4.5 out of 5 stars (48), $26.39
  • Online Investing Hacks: 100 Industrial-Strength Tips & Tools, by Bonnie Biafore

Editorial Reviews

Product Description
Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:
  • Aggregate and associate data from disparate locations, then store and manipulate the data as you like
  • Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
  • Integrate third-party data into your own applications or web sites
  • Make your own site easier to scrape and more usable to others
  • Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the site every day
Like the other books in O'Reilly's popular Hacks series, Spidering Hacks brings you 100 industrial-strength tips and tools from the experts to help you master this technology. If you're interested in data retrieval of any type, this book provides a wealth of data for finding a wealth of data.
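The description mentions both spidering ethics and making your own site easier to scrape. One widely used convention on both sides of that exchange is the robots exclusion file, robots.txt, which a polite spider consults before fetching pages. The sketch below is not taken from the book (whose examples, according to the reviews, are mostly Perl); it uses Python's standard urllib.robotparser, and the robots.txt content, site and bot name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com. A real spider would fetch
# https://<site>/robots.txt before requesting anything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string rather than over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

rules = build_parser(ROBOTS_TXT)
agent = "ExampleSpider/0.1"  # hypothetical bot name

for url in ("https://example.com/books/", "https://example.com/private/data"):
    verdict = "allowed" if rules.can_fetch(agent, url) else "disallowed"
    print(url, verdict)

# Seconds to wait between requests, if the site asks for a delay.
print("crawl delay:", rules.crawl_delay(agent))
```

In a live spider you would instead call set_url() and read() against the site's real robots.txt, then honor any crawl delay between requests.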

About the Author
Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine, he'd love to give you a Fry Pan of Intellect upside the head. Politely, of course. And with love.

Tara Calishain is the creator of the site ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.

Product Details

  • Paperback: 424 pages
  • Publisher: O'Reilly Media, Inc.; 1st edition (November 1, 2003)
  • Language: English
  • ISBN-10: 0596005776
  • ISBN-13: 978-0596005771
  • Product Dimensions: 8.9 x 6 x 1 inches
  • Shipping Weight: 1.3 pounds
  • Average Customer Review: 4.4 out of 5 stars (14 customer reviews)
  • Amazon.com Sales Rank: #321,713 in Books

    Popular in these categories:

    #9 in  Books > Nonfiction > Social Sciences > Library & Information Science > Automation
    #52 in  Books > Computers & Internet > Microsoft > Web Browsers


Customer Reviews

14 Reviews
5 star: (7)
4 star: (6)
3 star: (0)
2 star: (1)
1 star: (0)
Most Helpful Customer Reviews

 
26 of 26 people found the following review helpful:
4.0 out of 5 stars Good book with a light start, February 14, 2004
By A Williams "honestpuck" (Neutral Bay, NSW Australia)
The 'Hacks' series from O'Reilly seems to be breeding as fast as viruses in a Windows network: every time you turn around, there's another one. While the writing and editing have remained of a high standard, some, such as 'eBay Hacks', have not really had great material. 'Spidering Hacks' is an improvement, almost back to the quality I remember from Calishain's last contribution, 'Google Hacks'.

She and Kevin Hemenway have taken a fairly complex topic, spidering and scraping web sites, and reduced it to manageable chunks in their hundred hacks. The writing has the same light, readable feel you quickly grow to expect from O'Reilly. Certainly I have never found myself faulting their editing.

There are some caveats. It seems that O'Reilly and Dornfest (the editor of this book and the series) have fallen in love with having a hundred hacks and little in the way of an introduction. I think this might have been a better book if it had been done as 90 'hacks' with a much larger introduction, as the first chapter's hacks are all too light and are really introductory material, such as how an HTML page is built and how to properly register your spider. Given that only someone with a fair amount of web knowledge is going to consider spidering a website in the first place, this early material is far too slight. From Hack 9 on, it quickly gets down to useful and informative chunks and no longer feels 'lightweight'.

This may reflect an attempt to extend the 'Hacks' series into places where it has to be forced. While the format worked well for Google and Amazon, I felt eBay was too light a topic for this series, and perhaps spidering is too heavy or complex. If this book had been written in a more traditional format, some of my complaints would disappear.

All the examples are in Perl, and the serious part of the book starts with examples using LWP::Simple to grab a page before going on to LWP::UserAgent and much more complex requests involving authentication, custom headers and posting form data. It also covers using curl and wget.
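The progression the reviewer describes, from simply grabbing a page to requests with authentication, custom headers and posted form data, can be illustrated outside Perl too. The following is a minimal Python sketch using only the standard library; it is not a translation of the book's LWP code, and the URL, field names, credentials and User-Agent string are all hypothetical. It builds the request without sending it, so it runs offline.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

def build_form_post(url: str, fields: dict, user: str, password: str) -> Request:
    """Build (but do not send) a POST request that carries form data,
    a custom User-Agent header and HTTP Basic authentication."""
    # Encode fields as application/x-www-form-urlencoded bytes.
    body = urlencode(fields).encode("ascii")
    # HTTP Basic auth is base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "User-Agent": "ExampleSpider/0.1",  # hypothetical identifier
        "Authorization": f"Basic {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(url, data=body, headers=headers, method="POST")

# Hypothetical endpoint, fields and credentials.
req = build_form_post(
    "https://example.com/search",
    {"q": "spidering hacks", "page": 2},
    user="alice",
    password="secret",
)
print(req.get_method(), req.full_url)
print(req.data)
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

Passing the resulting object to urllib.request.urlopen() would actually send it. Identifying the spider with a descriptive User-Agent, as here, makes it easier for site operators to recognize.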

Then it gets down to the nitty-gritty of scraping using HTML::TreeBuilder and HTML::TokeParser. This is expanded further over the next few hacks, and from Hack 39 through Hack 89 there is a good series of examples (perhaps a few too many). Finally, there are two chapters: one on maintaining your collection and 'Giving Back To The World', which covers making your site easy to scrape and using RSS.
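HTML::TokeParser, as its name suggests, walks a page as a stream of tokens (start tags, text, end tags). Python's standard library has no direct equivalent of that pull interface, but html.parser.HTMLParser delivers the same kinds of tokens through callbacks. Here is a minimal sketch, assuming a hypothetical sample page, that collects each link's href and anchor text; it illustrates the token-stream idea and is not code from the book.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs as the parser streams through
    start tags, text and end tags."""

    def __init__(self):
        super().__init__()
        self.links = []              # finished (href, text) pairs
        self._current_href = None    # href of the <a> we are inside, if any
        self._current_text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._current_text).split())
            self.links.append((self._current_href, text))
            self._current_href = None

# Hypothetical page fragment.
SAMPLE = """<html><body>
<p>New this week:</p>
<a href="/books/1">Spidering   Hacks</a>
<a href="/books/2"><b>Google</b> Hacks</a>
<a name="top">anchor without href</a>
</body></html>"""

collector = LinkCollector()
collector.feed(SAMPLE)
collector.close()
for href, text in collector.links:
    print(href, "->", text)
```

Nested tags such as the <b> inside the second link are ignored, but their text still lands in the anchor text, which is usually what a scraper wants.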

O'Reilly have a page for the book with ten example hacks, the index, the table of contents and errata, and you can also visit hacks.oreilly.com for the same ten hacks, with the possibility of more being added.

As a whole, this volume seems a little thin. If you've been doing the maths, you've realised that only about thirty of the hundred hacks actually give any details on building and running a serious web spider. Sure, a number of the examples provide good information on how to perform various tasks, and some of the last eleven hacks are good to know, but overall the book feels as though it lacks solid information throughout. A bit more information on various crawling and page-parsing techniques would have been good.

After that criticism I surprise myself: I'm going to recommend this book. This isn't a large field, and when you consider that most other books on writing spiders and crawlers are less than practical and more than expensive, "Spidering Hacks" has many good points. It's written for the practical Perl programmer, it examines several methods and gives lots of examples, and while not cheap, it's certainly inexpensive. Given that I found it both useful and inspiring, the complaints above may be a little like nitpicking. I should also say that I found this volume immensely useful in writing my own spider and scraper (it gets a list of new books from the web sites of several publishers). I have to be honest and admit that there are three publishers, O'Reilly, Addison Wesley and Prentice Hall, from whom I expect a decent standard, and I criticise them a little harder when they move from that norm. If this book had come from SAMS or Wrox, I might well not have looked quite so hard for flaws and might have been a little more generous in my treatment of the ones I found.

That said, I recommend this book to you if you want a practical introduction to building a web spider in Perl.

21 of 21 people found the following review helpful:
4.0 out of 5 stars Many examples of how to use spiders, April 8, 2004
By W Boudville (Terra, Sol 3)
The book has a nice collection of case studies on how to gather data from disparate websites. You might consider it as showing you a simple way to use Web Services.

Spidering is the way that search engines gather their data. But you do not have to be Altavista or Google to use spiders, nor do you have to be scanning a large fraction of the Web. The authors demystify spiders. If you can follow their examples, you get concrete instances of usage that might help with your particular application.

Thoughtfully, the examples are mostly written in Perl, with a few in Java. These languages should be familiar to many. Even if you don't know them, though, the logic of the code can still be useful. (That is, you can treat the code as pseudocode.)

While spiders are probably best known for their use by search engines, they are really only the starting point for the latter. The much harder problems start once you have the data amassed by a spider: now you have to efficiently find correlations between the various web pages. You should be aware that the book does not discuss these in any significant depth. That is not surprising, because they are outside the scope of the book. The examples do show how to use the data found by spiders, but most of them are for web pages within a given domain, so the pages are closely affiliated in content and structure.
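The reviewer notes that most examples stay within a single domain. A spider limited that way is essentially a breadth-first traversal that ignores links to other hosts. The sketch below illustrates the idea in Python over a hypothetical in-memory link graph (FAKE_WEB) standing in for real fetching and parsing; it is not code from the book.

```python
from collections import deque
from typing import Callable, List
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web": page URL -> links found on that page.
# A real spider would download each page and extract its links instead.
FAKE_WEB = {
    "https://example.com/": ["/a", "/b", "https://other.org/x"],
    "https://example.com/a": ["/b", "/c"],
    "https://example.com/b": ["/"],
    "https://example.com/c": [],
}

def crawl_same_domain(start: str, get_links: Callable[[str], List[str]],
                      max_pages: int = 100) -> List[str]:
    """Breadth-first crawl from `start`, following only links on the same host."""
    host = urlparse(start).netloc
    seen = {start}              # URLs already queued, to avoid revisits
    queue = deque([start])
    order = []                  # URLs in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            absolute = urljoin(url, link)  # resolve relative links
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order

visited = crawl_same_domain("https://example.com/", lambda u: FAKE_WEB.get(u, []))
print(visited)
```

Replacing the lambda with a function that downloads a page and extracts its links would turn this into a working, if naive, spider; max_pages is a hypothetical safety cap against runaway crawls.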

17 of 17 people found the following review helpful:
5.0 out of 5 stars Lots of great ideas, March 22, 2004
Once in a long while you get a book that inspires you with a lot of great small ideas. Spidering Hacks is just that type of book. The web has a wealth of structured and semi-structured data that is just waiting to be mined with automated tools. This book not only teaches you how to get the data out of these sources, but also gives you ideas about where to look for information and what to do with it.

This book demonstrates everything I like in a technical book. It not only describes how things are done; it also gives practical examples of how the technology can be useful in the real world, and presents them enthusiastically. It makes you want to go out and implement all of the ideas and keep going with some of your own.

Nitpicks I have with the book are minor. The 'Hacks' format seems imposed; for example, hack #8 is about installing CPAN. I don't think that section should be left out, but I don't think it's a hack either. But hey, I don't care that much about the structure as long as it isn't a serious flaw and the content within the structure is great, as it is with this book.

I have to say, O'Reilly is on a roll with the Hacks series. They have all been fine books.

Most Recent Customer Reviews

4.0 out of 5 stars Non-Fiction
Definitely a very useful book.

If you pick one of these books up, from this series, and learn just one thing, or something that saves you some time, it is well worth...
Published 14 months ago by Blue Tyson

5.0 out of 5 stars One of My Favorite 'Hacks' Books
I bought this book shortly after it came out, and actually still refer to it from time to time. This is just another book that shows you how powerful Perl can be when in the...
Published 18 months ago by M. Gorsuch

4.0 out of 5 stars Very good book
This book has a strong Perl focus, so make sure you want to use Perl. Otherwise, it's a great book with plenty of examples on integrating website data into your site.
Published 21 months ago by Gary A. Mort

5.0 out of 5 stars Perl-intensive book on web crawler design
A spider (also known as a web crawler or web robot) is a program which browses the World Wide Web in a methodical, automated manner.
Published on May 16, 2006 by calvinnme

2.0 out of 5 stars what is in a name?
Well, sometimes a generalizing lie.
IMHO, this book should have been named "(some) Spidering Hacks using Perl".
Published on December 29, 2005 by Onetitfemme

4.0 out of 5 stars Good, but needs more variety of languages
Nearly all of the examples were written using Perl, but the few pages written with PHP contained some very useful nuggets!
Published on November 12, 2004 by Kevin Carlson

5.0 out of 5 stars Example-filled and easy-to-follow
The knowledgeable collaboration of Kevin Hemenway and Tara Calishain, Spidering Hacks: 100 Industrial-Strength Tips & Tools is an extensive, 402-page instructional guidebook...
Published on March 7, 2004 by Midwest Book Review

4.0 out of 5 stars Rich samples, fits your specific needs if you're a Perl lover
If you are a Perl lover looking for a book to help you extract content from this huge, resourceful Internet, this book fits your needs quite well.
Published on February 25, 2004 by Otto Yuen

5.0 out of 5 stars Great Book
Are you ready to be the next Google? It is widely known that Google pulled out in front of (and largely obsoleted) major search engine players like Altavista and Yahoo largely...
Published on January 5, 2004

5.0 out of 5 stars A fresh idea
Spidering Hacks, like other O'Reilly "Hacks" books, lives up to the tradition. This book shows some Internet gurus' tips and tricks.
Published on December 29, 2003

© 1996-2009, Amazon.com, Inc. or its affiliates