Amazon.com: Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL (9781593271206): Michael Schrenk: Books

Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL [Paperback]

Michael Schrenk (Author)
4.5 out of 5 stars (17 customer reviews)


There is a newer edition of this item:
Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL
$21.79
Available for Pre-order

Book Description

Publication Date: March 30, 2007 | ISBN-10: 1593271204 | ISBN-13: 978-1593271206 | Edition: Annotated

The Internet is bigger and better than what a mere browser allows. Webbots, Spiders, and Screen Scrapers is for programmers and businesspeople who want to take full advantage of the vast resources available on the Web. There's no reason to let browsers limit your online experience, especially when you can easily automate online tasks to suit your individual needs.

Learn how to write webbots and spiders that do all this and more:

  • Programmatically download entire websites
  • Effectively parse data from web pages
  • Manage cookies
  • Decode encrypted files
  • Automate form submissions
  • Send and receive email
  • Send SMS alerts to your cell phone
  • Unlock password-protected websites
  • Automatically bid in online auctions
  • Exchange data with FTP and NNTP servers

Sample projects using standard code libraries reinforce these new skills. You'll learn how to create your own webbots and spiders that track online prices, aggregate different data sources into a single web page, and archive the online data you just can't live without. You'll learn inside information from an experienced webbot developer on how and when to write stealthy webbots that mimic human behavior, tips for developing fault-tolerant designs, and various methods for launching and scheduling webbots. You'll also get advice on how to write webbots and spiders that respect website owner property rights, plus techniques for shielding websites from unwanted robots.

As a bonus, visit the author's website to test your webbots on sample target pages, and to download the scripts and code libraries used in the book.

Some tasks are just too tedious (or too important!) to leave to humans. Once you've automated your online life, you'll never let a browser limit the way you use the Internet again.



Editorial Reviews

About the Author

Michael Schrenk uses webbots and data-driven web applications to create competitive advantages for businesses. He has written for Computerworld and Web Techniques magazines and has taught courses on Web usability and Internet marketing. He has also given presentations on intelligent Web agents and online corporate intelligence at the DEFCON hacker's convention.


Product Details

  • Paperback: 328 pages
  • Publisher: No Starch Press; Annotated edition (March 30, 2007)
  • Language: English
  • ISBN-10: 1593271204
  • ISBN-13: 978-1593271206
  • Product Dimensions: 9.2 x 7 x 1 inches
  • Shipping Weight: 1.4 pounds
  • Average Customer Review: 4.5 out of 5 stars (17 customer reviews)
  • Amazon Best Sellers Rank: #449,247 in Books

More About the Author

Michael Schrenk is a software developer, author and instructor. He specializes in automated web browsing agents known as webbots. His book, "Webbots, Spiders, & Screen Scrapers" (2007, No Starch Press, San Francisco) is the definitive work on the subject.

Michael uses the Internet in new and innovative (odd?) ways to provide competitive advantages for his clients in the US, Europe, and Asia.

He also helps journalists use computers more effectively to conduct online research, both through automation and by describing where and how to find otherwise hidden online information. No stranger to Europe (he has lived in Moscow and Madrid), Mike taught at the 2008 European Investigative Journalism Conference (Brussels, Belgium), lectured twice in 2009 at The Center for Investigative Journalism (London, England), and later in 2009 led several sessions at the VVOJ Journalism conference (Utrecht, The Netherlands).

Last August, Mike made his fourth speaking appearance at the DEFCON computer hacking conference. Mike lives in sunny Las Vegas, Nevada (USA). You can contact him at http://www.schrenk.com

Watch this book's video at:
http://www.youtube.com/watch?v=B7z7tP74RxQ

 

Customer Reviews

17 Reviews
  • 5 star: 13
  • 4 star: 1
  • 3 star: 2
  • 2 star: 1
  • 1 star: 0

Average Customer Review: 4.5 out of 5 stars (17 customer reviews)

Most Helpful Customer Reviews

32 of 35 people found the following review helpful:
2.0 out of 5 stars Does the basics., December 5, 2007
By Brian "eateroftheham" (Crown Point, IN United States)
This review is from: Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL (Paperback)
"Webbots, Spiders, and Screen Scrapers" is a solid book for building basic scripts to do web scraping. Michael Schrenk covers the "should you do this" aspect very well, and devotes much of the book to these kinds of topics. For that reason alone I give him major kudos: "just because you CAN do a thing, doesn't mean you SHOULD."

Technically, the book and examples are very basic and beginner level. All code is procedural and makes no reference to object-oriented programming at all. This is great for a simple project, but building anything larger than a targeted webbot or two is beyond the scope of this book.

I was very dismayed at Mr. Schrenk's opinion of regular expressions:
"The use of regular expressions is a parsing language in itself, and most modern programming languages support aspects of regular expressions. In the right hands, regular expressions are also useful for parsing and substituting text; however, they are famous for their sharp learning curve and cryptic syntax. I avoid regular expressions whenever possible."

This disregard for regular expressions effectively wipes out a powerful toolset for budding developers. Regular expressions are no harder to learn than PHP. His reasoning for this disdain is also flawed:

"The regular expression engine used by PHP is not as efficient as engines used in other languages, and is certainly less efficient than PHP's built-in functions for parsing HTML."

PHP's preg_* functions use PCRE, a Perl-compatible regular expression library modeled on the engine used (very effectively) in Perl. There have been many studies showing that preg_*-style expressions outperform basic text matching in PHP. In this assessment the author is terribly wrong.

The book does a great job of explaining how to make single-use scripts for scraping, but never how to create a larger infrastructure. There is no focus on creating multi-process engines with pcntl_fork() or proc_open(), which are critical for scaling web scraping applications. A single script scraping a few hundred websites on a single thread would take ages compared with a multi-threaded engine.

If you are looking to break into web scraping and are not sure where to start, this is likely the best (and possibly only) book on the market. If you are intermediate or advanced, you will quickly question the author's logic and see that scaling will become the number one issue you have to overcome.


24 of 26 people found the following review helpful:
3.0 out of 5 stars Solid introduction to webbots, with a catch., April 27, 2007
By Paul M. Reinheimer "Author" (Montréal, Quebec, Canada)
This review is from: Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL (Paperback)
I picked up this book full of enthusiasm: spiders are just plain cool. They go out and start downloading data for you, reading webpages, and even understanding them a little. My enthusiasm was dashed a little, however, on page four: "You may use any of the scripts in this book for your own personal use, as long as you agree not to redistribute them... and agree not to sell or create derivative products under any circumstances." I develop in PHP professionally, and a lot of the code I write ends up being used somewhere on some sort of for-profit basis, which pretty effectively prevents me from using any code between the covers (at its strictest reading, I'm not sure I can even change the code).

The book does a great job of introducing different sorts of web agents that you can create programmatically (more than just spiders) and introduces all sorts of interesting projects along those lines. Throughout the book, a series of libraries written by the author is leveraged to make the retrieval and parsing of the various pages much easier. While newer developers will enjoy being able to concentrate on the big picture, I found myself itching for more information on the nitty-gritty.

Some of the projects explored include price monitoring, image capturing (want to be your own Google image search? :) ), link verification, spiders, and snipers. Each project received its own chapter, which effectively covered the topics involved.

Overall, I would recommend this book to beginner to intermediate PHP developers looking to tackle the world of web agents. It's a good primer on the related topics, and at the very least will give you some ideas on the complexities involved. As their skill grows, they will probably find themselves either moving past the libraries included with the book or modifying them greatly. My biggest complaint is the lack of coverage of the robots.txt file: some talk is given to it in terms of blocking robots from your own site, but I didn't see any code that actually dealt with parsing it for your own robot.


9 of 9 people found the following review helpful:
5.0 out of 5 stars Great Book with Lots of Information, August 25, 2007
This review is from: Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL (Paperback)
This book covers every aspect I could ever hope a book on web bots would cover. It goes into great detail and provides lots of background information about things such as why you should use web bots, security issues, how to authenticate a bot with password protected sites, writing search engine crawlers, parsing HTML, how to handle cookies, HTTP headers, dealing with forms and a lot more.

I was very pleased with how this book covered concepts. The book uses PHP and the cURL library as a teaching tool instead of trying to give a lesson in how to use PHP as a crawler language. The way the code is explained makes it very easy to translate into whatever language you are most comfortable coding in. The book uses fundamental functional programming concepts which make it easy to pick up the general idea without actually knowing PHP.

My boss bought this book to help my group with a project we were working on, and even my co-workers who had no background with PHP were able to use this book to write a web bot in C# (using the cURL library) very easily. The concepts from this book easily transferred over to object-oriented concepts.



Inside This Book
Key Phrases - Statistically Improbable Phrases (SIPs):
insertion parse, login criteria, stealthy webbots, target webserver, header redirections, curl setopt, referer variable, spider script, curl library, harvest links, parse array, form analyzer, target web page, page redirection, subject web page, form handler, robots meta tag, screen scrapers, target website, diagnostic page, http codes, temporary cookies, raw message, page base, search ranking
Key Phrases - Capitalized Phrases (CAPs):
Done Figure, Mozilla Firefox, Bidder's Edge, Poker Robot, Task Scheduler, Terms of Service, File Folder, Sabrina Duncan, United States, Arriba Soft, Jill Monroe, Internet Explorer, Tue Mar, Control Panel, Kelly Garrett Culver City, Request Diagnostic Page, Request Method, Text Document
 

Customer Discussions

This product's forum
Discussion | Replies | Latest Post
Download the book's software libraries | 0 | Sep 27, 2007
Meet the author at DEFCON XV in Las Vegas (Aug 3-5) | 0 | Jul 5, 2007