Enter your mobile number or email address below and we'll send you a link to download the free Kindle App. Then you can start reading Kindle books on your smartphone, tablet, or computer - no Kindle device required.

  • Apple
  • Android
  • Windows Phone
  • Android

To get the free app, enter your mobile phone number.

Spidering Hacks 1st Edition

4.2 out of 5 stars 19 customer reviews
ISBN-13: 978-0596005771
ISBN-10: 0596005776
Why is ISBN important?
ISBN
This bar-code number lets you verify that you're getting exactly the right version or edition of a book. The 13-digit and 10-digit formats both work.
Scan an ISBN with your phone
Use the Amazon App to scan ISBNs and compare prices.
Have one to sell? Sell on Amazon
Buy used On clicking this link, a new layer will be open
$4.30 On clicking this link, a new layer will be open
Buy new On clicking this link, a new layer will be open
$20.90 On clicking this link, a new layer will be open
More Buying Choices
34 New from $9.97 57 Used from $0.01 2 Collectible from $20.00
Free Two-Day Shipping for College Students with Prime Student Free%20Two-Day%20Shipping%20for%20College%20Students%20with%20Amazon%20Student


Windows10ForDummiesVideo
Windows 10 For Dummies Video Training
Get up to speed with Windows 10 with this video training course from For Dummies. Learn more.
$20.90 FREE Shipping on orders with at least $25 of books. Only 5 left in stock (more on the way). Ships from and sold by Amazon.com. Gift-wrap available.
click to open popover

Frequently Bought Together

  • Spidering Hacks
  • +
  • Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL
Total price: $50.86
Buy the selected items together

Editorial Reviews

About the Author

Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine, he'd love to give you a Fry Pan of Intellect upside the head. Politely, of course. And with love.

Tara Calishain is the creator of the site, ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.

NO_CONTENT_IN_FEATURE

The latest book club pick from Oprah
"The Underground Railroad" by Colson Whitehead is a magnificent novel chronicling a young slave's adventures as she makes a desperate bid for freedom in the antebellum South. See more

Product Details

  • Paperback: 424 pages
  • Publisher: O'Reilly Media; 1 edition (November 1, 2003)
  • Language: English
  • ISBN-10: 0596005776
  • ISBN-13: 978-0596005771
  • Product Dimensions: 6 x 1 x 9 inches
  • Shipping Weight: 1.4 pounds (View shipping rates and policies)
  • Average Customer Review: 4.2 out of 5 stars  See all reviews (19 customer reviews)
  • Amazon Best Sellers Rank: #1,290,219 in Books (See Top 100 in Books)

Customer Reviews

Top Customer Reviews

By W Boudville HALL OF FAMEVINE VOICE on April 8, 2004
Format: Paperback
The book has a nice collection of case studies on how to gather data from disparate websites. You might consider this as showing a simple way for you to use Web Services.
Spidering is the way that search engines gather their data. But you do not have to be Altavista or Google to use spiders. Nor do you have to be scanning a large fraction of the Web. The authors demistify spiders. If you can follow their examples, then you get concrete instances of usage that might help your particular application.
Thoughtfully, the examples are mostly written in Perl, with a few in Java. These languages should be familiar to many. Though even if you don't know them, the logic of the code can still be useful. (That is, you can treat the code as pseudocode.)
While spiders are probably best known as being used by search engines, they are really only the starting point for the latter. The much harder problems start when you have the data amassed by a spider. Now you have to efficiently find correlations between the various web pages. You should be aware that the book does not discuss these with any significant depth. Not surprising, because these are outside the scope of the book. The examples do show how to use the data found by spiders. But most of these are for web pages that sit in a given domain. So the pages are closely affiliated in content and structure.
1 Comment 24 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse
Format: Paperback
Once in a long while you get a book that inspires you with a lot of great small ideas. Spidering Hacks is just that type of book. The web has a wealth of structured and semi-structured that is just waiting to be mined with automated tools. This book not only teaches you how to get the data out of these sources, but gives you idea about where to look for information and what to do with it.
This book demonstrates everything I like in a technical book. It not only describes how things are done. It also gives practical examples of how the technology can be useful in the real world, and presents them enthusiastically. It makes you want to go out and implement all of the ideas and to keep on going with some of your own.
Nitpicks I have with the book are minor. The 'Hacks' format seems imposed, for example, hack #8 is about installing CPAN. I don't think that section should be left out, but I don't think it's a hack either. But hey, I don't care that much about the structure as long as it isn't an imposing flaw and the content within the structure is great, as it is with this book.
Have to say, O'Reilly is on a roll with the Hacks series. They have all been fine books.
Comment 18 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse
By A Customer on January 5, 2004
Format: Paperback
Are you ready to be the next Google? It is widely known that Google pulled out in front of (and largely obsoleted) major search engine players like Altavista and Yahoo largely because of Google's highly accurate search results -- you find what you search for. They are so confident in their search engine spiders they even have a "I'm feeling lucky" button to transport you to the first search result found -- it's arrogance, but well deserved arrogance. In a sentence, Google works.
Enter Kevin Hemenway and Tara Calishain's latest O'Reilly book: Spidering Hacks. Continuing in the Oreilly "Hacks" tradition, this comprehensive guidebook provides a hundred clear, useful tools for designing and implementing the next generation -- or maybe just your own customized -- spider (or bot, if you prefer.)
So why build your own spider? Well, if you have a large website, your spider could check link integrity, HTML standards and check meta-tags. If you are researching a topic and Google is not returning what you want, creating your own spider might be just what you need. This handy book (with examples in Perl) will show you how to:
* Create a site-friendly bot that wont get you banned by webmasters (Hack #16 --Respecting your Scrapee's Bandwidth, and Hack # 17 -- Respecting robots.txt)
* Interested in graphics, audio and video? Hacks #33 through #42 step you through collecting media files. Specific examples including scraping films from [...] (Hack #24), gathering movies from the Library of Congress (Hack #35) and archiving images from Webshots. You'll have your own personalized library in no time.
* Weblog-Free Google Results -- Weblogs (aka Blogs) are amazingly popular these days.
Read more ›
Comment 15 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse
Format: Paperback
The knowledgeable collaboration of Kevin Hemenway and Tara Calishain, Spidering Hacks: 100 Industrial-Strength Tips & Tools is an extensive, 402-page instructional guidebook and reference to Internet data retrieval through the use of spiders and scrapers. Including information on methodology, philosophies, and ethical considerations, as well as freely available modules, scripts, frameworks, and templates, information on how to build alternative interfaces to online databases, how to keep one's data current and share it in a user-friendly manner, and so much more, Spidering Hacks is an example-filled, easy-to-follow, highly recommended computer shelf resource.
Comment 8 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse
Format: Paperback
If you are a Perl lover and looking for a book to help you extracting contents from this huge resourceful Internet, this book quite fits your needs. Overall is good, the author shows you how to setup your spidering tools -- Perl modules. Yes, Perl, if you're Java folks, too bad. He shows you how to use Perl modules on crawling web pages, logging on to systems, extracting specific contents, and massaging data to your needs, across 100 different scenarios. Most of them are practical, but they don't cover much of the details, you have to read the programs listed in the book, which is quite painful for non-Perl people like me. In addition, it doesn't provide much of resulting screen shots after running the sample codes. Most importantly, the author tries to avoid the copyright questions by delegating URL links for readers to reference. In general, it's still a good tool book in spidering field.
Comment 14 people found this helpful. Was this review helpful to you? Yes No Sending feedback...
Thank you for your feedback.
Sorry, we failed to record your vote. Please try again
Report abuse

Most Recent Customer Reviews

Set up an Amazon Giveaway

Spidering Hacks
Amazon Giveaway allows you to run promotional giveaways in order to create buzz, reward your audience, and attract new followers and customers. Learn more about Amazon Giveaway
This item: Spidering Hacks