
Spidering Hacks Paperback – November 1, 2003

ISBN-13: 978-0596005771 ISBN-10: 0596005776 Edition: 1st



Frequently Bought Together

Spidering Hacks + Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL + Mining the Web: Discovering Knowledge from Hypertext Data


Product Details

  • Paperback: 424 pages
  • Publisher: O'Reilly Media; 1 edition (November 1, 2003)
  • Language: English
  • ISBN-10: 0596005776
  • ISBN-13: 978-0596005771
  • Product Dimensions: 1 x 5.8 x 8.7 inches
  • Shipping Weight: 1.3 pounds
  • Average Customer Review: 4.2 out of 5 stars (18 customer reviews)
  • Amazon Best Sellers Rank: #183,421 in Books

Editorial Reviews

About the Author

Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine, he'd love to give you a Fry Pan of Intellect upside the head. Politely, of course. And with love.

Tara Calishain is the creator of the site ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.



Customer Reviews

Overall it is good; the author shows you how to set up your spidering tools -- Perl modules.
Otto Yuen
If you pick one of these books up, from this series, and learn just one thing, or something that saves you some time, it is well worth it.
Blue Tyson
That said, I recommend this book to you if you want a practical introduction to building a web spider in Perl.
A Williams

Most Helpful Customer Reviews

32 of 33 people found the following review helpful By A Williams on February 14, 2004
Format: Paperback
The 'Hacks' series from O'Reilly seems to be breeding as fast as viruses on a Windows network: every time you turn around, there's another one. While the writing and editing have remained strong, some titles, such as 'eBay Hacks', have not really had great material. 'Spidering Hacks' is an improvement, almost back to the quality I remember from Calishain's last contribution, 'Google Hacks'.
She and Kevin Hemenway have taken a fairly complex topic, spidering and scraping websites, and reduced it to manageable chunks in their hundred hacks. The writing has the same light, readable feel you quickly come to expect from O'Reilly. Certainly I have never found myself faulting their editing.
There are some caveats. It seems that O'Reilly and Dornfest (the editor of this book and the series) have fallen in love with having a hundred hacks and little in the way of an introduction. I think this might have been a better book if it had been done as 90 'hacks' with a much larger introduction, since the first chapter's hacks are all too light and really amount to introductory material, such as how an HTML page is built and how to properly register your spider. Given that only someone with a fair amount of web knowledge is going to consider spidering a website in the first place, this early material is far too slight. From Hack 9 on, it quickly gets down to useful and informative chunks and no longer feels 'lightweight'.
This may be a reflection of trying to extend the 'Hacks' series into places where it has to be forced. While the format worked well for Google and Amazon, I felt the entire topic of eBay was too light for a title in this series, and perhaps spidering is too heavy or complex. If this book had been written in a more traditional format, some of my complaints would disappear.
24 of 24 people found the following review helpful By W Boudville (Hall of Fame, Top 1000 Reviewer, Vine Voice) on April 8, 2004
Format: Paperback
The book has a nice collection of case studies on how to gather data from disparate websites. You might think of this as a simple way to use Web Services.
Spidering is the way that search engines gather their data. But you do not have to be AltaVista or Google to use spiders, nor do you have to be scanning a large fraction of the Web. The authors demystify spiders. If you can follow their examples, you get concrete instances of usage that might help your particular application.
Thoughtfully, the examples are mostly written in Perl, with a few in Java. These languages should be familiar to many, but even if you don't know them, the logic of the code can still be useful. (That is, you can treat the code as pseudocode.)
While spiders are probably best known for their use by search engines, they are really only the starting point for them. The much harder problems start once you have the data amassed by a spider: you then have to efficiently find correlations between the various web pages. You should be aware that the book does not discuss these problems in any significant depth. That is not surprising, because they are outside the scope of the book. The examples do show how to use the data found by spiders, but most of them deal with web pages that sit in a given domain, so the pages are closely related in content and structure.
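In its simplest form, a spider does three things. It fetches a page, pulls out the page's links, and queues any links it has not seen yet. Like most of the book's examples, it often stays inside a single domain. The following is a minimal breadth-first sketch using only Python's standard library. It is not code from the book, whose examples are mostly Perl. The `crawl` function, the `fetch` callback and the in-memory `SITE` dictionary are hypothetical stand-ins, so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that never leaves start_url's host.

    `fetch` is a caller-supplied (hypothetical) function mapping a URL
    to its HTML text, or to None if the page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# A tiny in-memory "site" standing in for real HTTP requests.
SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.org/">X</a>',
    "http://example.com/a": '<a href="b">B</a> <a href="/">Home</a>',
    "http://example.com/b": "<p>No links here.</p>",
}
print(crawl("http://example.com/", SITE.get))
```

The crawl skips the off-site link to `other.org` and visits each of the three pages once. In a real spider, `fetch` would issue HTTP requests. That is also where the politeness rules discussed in the next review would apply.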
18 of 19 people found the following review helpful By Jack D. Herrington on March 22, 2004
Format: Paperback
Once in a long while you get a book that inspires you with a lot of great small ideas. Spidering Hacks is just that type of book. The web has a wealth of structured and semi-structured data that is just waiting to be mined with automated tools. This book not only teaches you how to get the data out of these sources, but also gives you ideas about where to look for information and what to do with it.
This book demonstrates everything I like in a technical book. It not only describes how things are done; it also gives practical examples of how the technology can be useful in the real world, and presents them enthusiastically. It makes you want to go out and implement all of the ideas and keep going with some of your own.
My nitpicks with the book are minor. The 'Hacks' format seems imposed; for example, Hack #8 is about installing CPAN. I don't think that section should be left out, but I don't think it's a hack either. But hey, I don't care that much about the structure as long as it isn't a major flaw and the content within the structure is great, as it is with this book.
Have to say, O'Reilly is on a roll with the Hacks series. They have all been fine books.
15 of 17 people found the following review helpful By A Customer on January 5, 2004
Format: Paperback
Are you ready to be the next Google? It is widely known that Google pulled ahead of (and largely made obsolete) major search engine players like AltaVista and Yahoo, mainly because of Google's highly accurate search results -- you find what you search for. They are so confident in their search engine spiders that they even have an "I'm Feeling Lucky" button that takes you straight to the first search result -- it's arrogance, but well-deserved arrogance. In a sentence, Google works.
Enter Kevin Hemenway and Tara Calishain's latest O'Reilly book: Spidering Hacks. Continuing in the O'Reilly "Hacks" tradition, this comprehensive guidebook provides a hundred clear, useful tools for designing and implementing the next generation -- or maybe just your own customized -- spider (or bot, if you prefer).
So why build your own spider? Well, if you have a large website, your spider could check link integrity, HTML standards compliance, and meta tags. If you are researching a topic and Google is not returning what you want, creating your own spider might be just what you need. This handy book (with examples in Perl) will show you how to:
* Create a site-friendly bot that won't get you banned by webmasters (Hack #16 -- Respecting your Scrapee's Bandwidth, and Hack #17 -- Respecting robots.txt)
* Interested in graphics, audio, and video? Hacks #33 through #42 step you through collecting media files. Specific examples include scraping films from [...] (Hack #24), gathering movies from the Library of Congress (Hack #35), and archiving images from Webshots. You'll have your own personalized library in no time.
* Weblog-Free Google Results -- weblogs (aka blogs) are amazingly popular these days.
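The two "site-friendly" hacks listed above boil down to two habits:

* Skip any URL that the site's robots.txt disallows.
* Pause between requests.

As a rough illustration, Python's standard-library `urllib.robotparser` can apply both rules. This is not the book's Perl code. The robots.txt text, the `ExampleSpider/0.1` user agent and the helper names are assumptions made up for this sketch.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real spider would first download
# http://<host>/robots.txt for each host it visits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "ExampleSpider/0.1"  # hypothetical user-agent string


def load_policy(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    policy = RobotFileParser()
    policy.parse(robots_text.splitlines())
    return policy


def polite_plan(urls, policy, default_delay=1.0):
    """Keep only URLs robots.txt allows and choose the delay between requests."""
    allowed = [u for u in urls if policy.can_fetch(USER_AGENT, u)]
    delay = policy.crawl_delay(USER_AGENT)  # None if robots.txt sets no Crawl-delay
    return allowed, (delay if delay is not None else default_delay)


def polite_crawl(urls, fetch, policy):
    """Fetch allowed URLs one at a time, sleeping between requests."""
    allowed, delay = polite_plan(urls, policy)
    pages = {}
    for i, url in enumerate(allowed):
        if i > 0:
            time.sleep(delay)  # spare the server's bandwidth
        pages[url] = fetch(url)  # `fetch` is a caller-supplied function
    return pages


policy = load_policy(ROBOTS_TXT)
print(polite_plan(["http://example.com/index.html",
                   "http://example.com/private/notes.html"], policy))
```

The disallowed `/private/` URL is filtered out, and the robots.txt `Crawl-delay` of 2 seconds is used as the pause between requests. The `default_delay` fallback applies only when the site does not specify a delay.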
