Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL (Paperback)

~ Michael Schrenk (Author)
Key Phrases: insertion parse, login criteria, stealthy webbots, Done Figure, Mozilla Firefox, Bidder's Edge
4.7 out of 5 stars (16 customer reviews)


Available from these sellers.


5 new from $46.95; 9 used from $30.02

Formats

Format            Amazon Price   New from   Used from
Kindle Edition    $17.57
Paperback         --

Editorial Reviews

Product Description

The Internet is bigger and better than what a mere browser allows. Webbots, Spiders, and Screen Scrapers is for programmers and businesspeople who want to take full advantage of the vast resources available on the Web. There's no reason to let browsers limit your online experience--especially when you can easily automate online tasks to suit your individual needs.

Learn how to write webbots and spiders that do all this and more:

  • Programmatically download entire websites
  • Effectively parse data from web pages
  • Manage cookies
  • Decode encrypted files
  • Automate form submissions
  • Send and receive email
  • Send SMS alerts to your cell phone
  • Unlock password-protected websites
  • Automatically bid in online auctions
  • Exchange data with FTP and NNTP servers

    Sample projects using standard code libraries reinforce these new skills. You'll learn how to create your own webbots and spiders that track online prices, aggregate different data sources into a single web page, and archive the online data you just can't live without. You'll learn inside information from an experienced webbot developer on how and when to write stealthy webbots that mimic human behavior, tips for developing fault-tolerant designs, and various methods for launching and scheduling webbots. You'll also get advice on how to write webbots and spiders that respect website owner property rights, plus techniques for shielding websites from unwanted robots.

    Some tasks are just too tedious--or too important!--to leave to humans. Once you've automated your online life, you'll never let a browser limit the way you use the Internet again.

    About the Author

    Michael Schrenk develops webbots and spiders for clients across North America. He has written for Computerworld and Web Techniques magazines and has taught college courses on web usability and Internet marketing. He's also an occasional speaker at DEFCON.


    Product Details

    • Paperback: 328 pages
    • Publisher: No Starch Press; Annotated edition (March 30, 2007)
    • Language: English
    • ISBN-10: 1593271204
    • ISBN-13: 978-1593271206
    • Product Dimensions: 9.1 x 6.9 x 0.9 inches
    • Shipping Weight: 1.4 pounds
    • Average Customer Review: 4.7 out of 5 stars (16 customer reviews)
    • Amazon.com Sales Rank: #350,596 in Books

      Popular in this category:

      #99 in  Books > Computers & Internet > Web Development > Programming > PHP

    What Do Customers Ultimately Buy After Viewing This Item?

    62% buy the item featured on this page:
    Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL, 4.7 out of 5 stars (16)
    17% buy:
    Wicked Cool PHP: Real-World Scripts That Solve Difficult Problems, 4.5 out of 5 stars (17), $19.77
    14% buy:
    Spidering Hacks, 4.4 out of 5 stars (15), $16.47
    4% buy:
    Practical Web 2.0 Applications with PHP, 4.5 out of 5 stars (19), $30.34

    Customer Reviews

    16 Reviews
    5 star: (14)
    4 star: (0)
    3 star: (1)
    2 star: (1)
    1 star: (0)

    Average Customer Review
    4.7 out of 5 stars (16 customer reviews)

    Most Helpful Customer Reviews

     
    26 of 28 people found the following review helpful:
    2.0 out of 5 stars Does the basics., December 5, 2007
    By Brian "eateroftheham" (Crown Point, IN United States)
    "Webbots, Spiders, and Screen Scrapers" is a solid book for building basic scripts to do web scraping. Michael Schrenk covers the "should you do this" aspect very well, and devotes much of the book to these kinds of topics. For that reason alone I give him major kudos: "just because you CAN do a thing, doesn't mean you SHOULD."

    Technically, the book and its examples are very basic and beginner level. All code is procedural and makes no reference to object-oriented programming at all. This is great for a simple project, but building anything larger than a targeted webbot or two is beyond the scope of this book.

    I was very dismayed at Mr. Schrenk's opinion of regular expressions:
    "The use of regular expressions is a parsing language in itself, and most modern programming languages support aspects of regular expressions. In the right hands, regular expressions are also useful for parsing and substituting text; however, they are famous for their sharp learning curve and cryptic syntax. I avoid regular expressions whenever possible."

    This disregard for regular expressions effectively wipes out a powerful toolset for budding developers. Regular expressions are no harder to learn than PHP. His reasons for disdaining them are also flawed:

    "The regular expression engine used by PHP is not as efficient as engines used in other languages, and is certainly less efficient than PHP's built-in functions for parsing HTML."

    PHP's preg_* functions are built on PCRE, a Perl-compatible regular expression library, so PHP gets essentially the same expression syntax that is used (very effectively) in Perl. There have been many studies that show preg_*-style expressions outperform basic text matching in PHP. In this assessment the author is terribly wrong.

    The book does a great job of explaining how to make single-use scripts for scraping, but never explains how to build a larger infrastructure. There is no focus on creating multi-process engines with pcntl_fork() or proc_open(), which are critical for scaling web scraping applications. A single script scraping a few hundred websites on a single thread would take ages compared with a multi-threaded engine.

    If you are looking to break into web scraping and are not sure where to start, this is likely the best (and possibly only) book on the market. If you are intermediate or advanced, you will quickly question the author's logic and see that scaling will become the number one issue you have to overcome.
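
The scaling concern raised in this review can be illustrated with a small sketch. It is written in Python rather than the book's PHP, and it uses a thread pool instead of pcntl_fork() or proc_open(), so it is an analogy to the reviewer's point and not the reviewer's or the author's method. The `fetch_page` function is a hypothetical stand-in that sleeps instead of downloading anything, and the URLs are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url):
    """Hypothetical stand-in for a download: waits briefly, returns a fake body."""
    time.sleep(0.05)  # simulated network latency (an assumption, not a measurement)
    return f"<html>{url}</html>"


def scrape_serial(urls):
    """Fetch every URL one after another on a single thread."""
    return [fetch_page(url) for url in urls]


def scrape_concurrent(urls, workers=8):
    """Fetch URLs through a pool of worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_page, urls))


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(16)]
    _, serial_seconds = timed(scrape_serial, urls)
    _, concurrent_seconds = timed(scrape_concurrent, urls)
    print(f"serial: {serial_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

Because the simulated work is almost entirely waiting, the pooled version should finish well ahead of the serial loop; real downloads would also be limited by bandwidth and by how politely the target sites need to be treated.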
    22 of 24 people found the following review helpful:
    3.0 out of 5 stars Solid introduction to webbots, with a catch., April 27, 2007
    By Paul M. Reinheimer "Author" (Montréal, Quebec, Canada)
    I picked up this book full of enthusiasm. Spiders are just plain cool: they go out and start downloading data for you, reading webpages, and even understanding them a little. My enthusiasm was dashed a little, however, on page four: "You may use any of the scripts in this book for your own personal use, as long as you agree not to redistribute them... and agree not to sell or create derivative products under any circumstances." I develop in PHP professionally, and a lot of the code I write ends up being used on some sort of for-profit basis, which pretty effectively prevents me from using any code between the covers (at its strictest reading, I'm not sure I can even change the code).

    The book does a great job of introducing different sorts of web agents that you can create programmatically (more than just spiders) and introduces all sorts of interesting projects along those lines. Throughout the book, a series of libraries written by the author is used to make the retrieval and parsing of the various pages much easier. While newer developers will enjoy being able to concentrate on the big picture, I found myself itching for more information on the nitty-gritty.

    Some of the projects explored include price monitoring, image capturing (want to be your own Google image search? :) ), link verification, spiders, and snipers. Each project receives its own chapter, which effectively covers many of the related topics.

    Overall, I would recommend this book to beginner to intermediate PHP developers looking to tackle the world of web agents. It's a good primer on the related topics, and at the very least it will give you some idea of the complexities involved. As their skill grows, they will probably find themselves either moving past the libraries included with the book or modifying them greatly. My biggest complaint is the lack of coverage of the robots.txt file: it is discussed in terms of blocking robots from your own site, but I didn't see any code that actually dealt with parsing it for your own robot.
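
The robots.txt parsing that this review found missing can be sketched briefly. The example is in Python rather than the book's PHP and relies on the standard library's `urllib.robotparser`; the sample rules, the "ExampleBot" user agent, and the example.com URLs are assumptions made up for illustration, not material from the book.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real webbot would download this from
# the target site's /robots.txt before requesting any other page.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt):
    """Create a parser from robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("http://example.com/index.html",
                "http://example.com/private/data.html"):
        print(url, allowed(rules, "ExampleBot", url))
```

A webbot would typically run a check like `allowed()` before each request and skip any URL the site owner has disallowed.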
    16 of 18 people found the following review helpful:
    5.0 out of 5 stars WOW, WOW, WOW! I'll say it again...WOW!, April 13, 2007
    By J. Dadlez "Dadio" (Riverside, CA.)
    I waited months for this book to come out and the wait was worth it. This is a great introduction to webbots, spiders, and scrapers. The writing is easy and never boring. There are lots of code examples and resources to tap into. I couldn't put it down. When was the last time you got a computer book that made you run to the keyboard to try something out?

    I'm sure there will be some a#$h@#e who will say it's too rudimentary. It's an intro, it takes you up to intermediate, and it explains stuff about PHP that I didn't even know existed. Definitely worth the money. I can't wait for the sequel.
    Most Recent Customer Reviews

    5.0 out of 5 stars Best for this subject
    The power of this book is not so much in its code examples but rather in its ability to change your perspective.
    Published 1 month ago by Dwight R. Schofield

    5.0 out of 5 stars Great introduction
    This is a great introduction on the subject. The supplied PHP library does all the work.
    Published 5 months ago by introfini

    5.0 out of 5 stars This book is useful
    This book is not very algorithmic, but you can learn the basics of writing webbots and some of the techniques involved.
    Published 13 months ago by Ching C. Nang

    5.0 out of 5 stars Great Basic Book
    Need to learn how to browse the web with your own software instead of manually browsing? This is the best book on the subject.
    Published 15 months ago by Joe Todd

    5.0 out of 5 stars a super introduction to web spiders
    I won't reiterate the excellent reviews already posted on this book, other than to say this is probably my favorite all-time programming book: excellently written, highly...
    Published 16 months ago by Yannick Pouliot

    5.0 out of 5 stars :-) bots
    This book is a great reference and/or introduction to the cURL library. After reading this book, I realized it is not intended as a single solution for bot programming.
    Published 19 months ago by C. D. Cox

    5.0 out of 5 stars Excellent Source
    I can't say enough about this book. It's informative and well laid out, and it has dynamic examples and an awesome website tie-in.
    Published 19 months ago by nita gale

    5.0 out of 5 stars Excellent cURL primer
    This is an excellent book used as an introduction to the cURL library. The author has created a set of his own functions that are well written and, with the help of the book, easy...
    Published 21 months ago by M. Strong

    5.0 out of 5 stars barry naice!
    This book is simply awesome. You will need to come armed with at least a basic knowledge of PHP, but everything is pretty straightforward.
    Published on January 14, 2008 by J. S. Garfield

    5.0 out of 5 stars Must buy for any Webbot programmer
    Great book. It is very well organized, and the code in the book is well documented and available for download.
    Published on October 17, 2007 by Varun Krishnan

    Customer Discussions

    This product's forum
    Discussion Replies Latest Post
    Download the book's software libraries 0 September 2007
    Meet the author at DEFCON XV in Las Vegas (Aug 3-5) 0 July 2007