
Spidering Hacks 1st Edition

4.2 out of 5 stars 19 customer reviews
ISBN-13: 978-0596005771
ISBN-10: 0596005776
Editorial Reviews

About the Author

Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine, he'd love to give you a Fry Pan of Intellect upside the head. Politely, of course. And with love.

Tara Calishain is the creator of the site, ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.



Product Details

  • Paperback: 424 pages
  • Publisher: O'Reilly Media; 1 edition (November 1, 2003)
  • Language: English
  • ISBN-10: 0596005776
  • ISBN-13: 978-0596005771
  • Product Dimensions: 6 x 1 x 9 inches
  • Shipping Weight: 1.3 pounds
  • Average Customer Review: 4.2 out of 5 stars (19 customer reviews)
  • Amazon Best Sellers Rank: #795,780 in Books


Customer Reviews

Top Customer Reviews

Format: Paperback
The 'Hacks' series from O'Reilly seems to be breeding as fast as viruses in a Windows network: every time you turn around there is another one. While the writing and editing have remained strong, some titles, such as 'eBay Hacks', have not really had great material. 'Spidering Hacks' is an improvement, almost back to the quality I remember in the last contribution from Calishain, 'Google Hacks'.
She and Kevin Hemenway have taken a fairly complex topic, spidering and scraping web sites, and reduced it to manageable chunks in their hundred hacks. The writing has the same light, readable feel you quickly come to expect from O'Reilly. Certainly I have never found myself faulting their editing.
There are some caveats. It seems that O'Reilly and Dornfest (the editor of this book and the series) have fallen in love with having a hundred hacks and little in the way of an introduction. I think this would have been a better book with 90 'hacks' and a much larger introduction, because the first chapter's hacks are all too light and are really introductory material, such as how an HTML page is built and how to properly register your spider. Given that only someone with a fair amount of web knowledge is going to consider spidering a website in the first place, this early material is far too slight. From Hack 9 on, the book quickly gets down to useful, informative chunks and no longer feels 'lightweight'.
This may be a reflection of trying to extend the 'Hacks' series into places where it has to be forced. While the format worked well for Google and Amazon, I felt eBay was too light a topic for this series, and perhaps spidering is too heavy or complex. If this book had been written in a more traditional format, some of my complaints would disappear.
Format: Paperback
The book has a nice collection of case studies on how to gather data from disparate websites. You might consider this as showing a simple way for you to use Web Services.
Spidering is the way that search engines gather their data. But you do not have to be Altavista or Google to use spiders. Nor do you have to be scanning a large fraction of the Web. The authors demystify spiders. If you can follow their examples, then you get concrete instances of usage that might help your particular application.
Thoughtfully, the examples are mostly written in Perl, with a few in Java. These languages should be familiar to many. Though even if you don't know them, the logic of the code can still be useful. (That is, you can treat the code as pseudocode.)
While spiders are probably best known for their use by search engines, they are really only the starting point for the latter. The much harder problems start once you have the data amassed by a spider. Now you have to efficiently find correlations between the various web pages. You should be aware that the book does not discuss these problems in any significant depth. That is not surprising, because they are outside the scope of the book. The examples do show how to use the data found by spiders. But most of these are for web pages that sit in a given domain, so the pages are closely affiliated in content and structure.
Format: Paperback
Once in a long while you get a book that inspires you with a lot of great small ideas. Spidering Hacks is just that type of book. The web has a wealth of structured and semi-structured data that is just waiting to be mined with automated tools. This book not only teaches you how to get the data out of these sources, but also gives you ideas about where to look for information and what to do with it.
This book demonstrates everything I like in a technical book. It not only describes how things are done. It also gives practical examples of how the technology can be useful in the real world, and presents them enthusiastically. It makes you want to go out and implement all of the ideas and to keep on going with some of your own.
The nitpicks I have with the book are minor. The 'Hacks' format seems imposed; for example, Hack #8 is about installing CPAN. I don't think that section should be left out, but I don't think it's a hack either. But hey, I don't care that much about the structure as long as it isn't an imposing flaw and the content within the structure is great, as it is with this book.
Have to say, O'Reilly is on a roll with the Hacks series. They have all been fine books.
By A Customer on January 5, 2004
Format: Paperback
Are you ready to be the next Google? It is widely known that Google pulled out in front of (and largely obsoleted) major search engine players like Altavista and Yahoo because of its highly accurate search results -- you find what you search for. They are so confident in their search engine spiders that they even have an "I'm Feeling Lucky" button to transport you to the first search result found -- it's arrogance, but well-deserved arrogance. In a sentence, Google works.
Enter Kevin Hemenway and Tara Calishain's latest O'Reilly book: Spidering Hacks. Continuing in the O'Reilly "Hacks" tradition, this comprehensive guidebook provides a hundred clear, useful tools for designing and implementing the next generation -- or maybe just your own customized -- spider (or bot, if you prefer).
So why build your own spider? Well, if you have a large website, your spider could check link integrity, HTML standards, and meta tags. If you are researching a topic and Google is not returning what you want, creating your own spider might be just what you need. This handy book (with examples in Perl) will show you how to:
* Create a site-friendly bot that won't get you banned by webmasters (Hack #16 -- Respecting your Scrapee's Bandwidth, and Hack #17 -- Respecting robots.txt)
* Interested in graphics, audio and video? Hacks #33 through #42 step you through collecting media files. Specific examples include scraping films from [...] (Hack #24), gathering movies from the Library of Congress (Hack #35) and archiving images from Webshots. You'll have your own personalized library in no time.
* Weblog-Free Google Results -- Weblogs (aka Blogs) are amazingly popular these days.
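
As a rough illustration of the "site-friendly bot" idea in the first bullet above, here is a minimal sketch in Python. It is not code from the book, whose examples are in Perl. It assumes a made-up robots.txt, a hypothetical bot name (`example-bot`), and placeholder example.com URLs. The robots.txt text is inlined so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-bot"  # hypothetical bot name, not from the book


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent=USER_AGENT, default=1.0):
    """Seconds to wait between requests: Crawl-delay if given, else a default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    candidates = [
        "http://example.com/index.html",
        "http://example.com/private/notes.html",
    ]
    print(allowed_urls(parser, candidates))
    print(polite_delay(parser))
```

A real bot would download the site's own robots.txt, then call `time.sleep(polite_delay(parser))` between requests to the URLs that pass the filter, so that it spares the site's bandwidth.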
