on August 3, 2014
I learned a lot about bot and web scraping software. It was easy to put the scripts into place and then adapt them to my own needs. The provided API made things super easy. If you have a basic understanding of PHP, you should be fine putting these APIs to work and then updating them to your own specifications. The sites I have used the scrapers on that show their parameters in the query string have been very easy to work with.

It would be great to see an update to the book that specifically speaks to forms on .asp pages and how the parameters are passed from the client to the server. I have not been able to get a scraper to work with those types of pages.
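For readers stuck on the same problem: forms that submit with the POST method send their parameters in the request body rather than in the query string, and the page often carries hidden fields that have to be sent back unchanged. The sketch below is only an illustration in Python, not the book's PHP library. The sample page, the `__VIEWSTATE` field, and the helper names `FormFieldCollector` and `build_post_body` are all assumptions made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag, hidden ones included."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are also routed here.
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            self.fields[name] = attr.get("value") or ""


def build_post_body(page_html, overrides):
    """Return a URL-encoded POST body: the page's own fields plus our values."""
    collector = FormFieldCollector()
    collector.feed(page_html)
    fields = dict(collector.fields)
    fields.update(overrides)  # fill in the values a user would have typed
    return urlencode(fields)


# Hypothetical page, standing in for whatever the real server returns.
SAMPLE_PAGE = """
<form method="post" action="search.asp">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="text" name="query" value="" />
  <input type="submit" name="go" value="Search" />
</form>
"""

body = build_post_body(SAMPLE_PAGE, {"query": "webbots"})
print(body)
```

The resulting string would be sent as the body of a POST request to the form's action URL, usually together with any cookies the server set when the page was first fetched. A real page may have several forms or several submit buttons, in which case only the fields of the form being submitted and the button actually "clicked" should be included.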
One person found this helpful.
on August 25, 2007
This book covers every aspect I could ever hope a book on web bots would cover. It goes into great detail and provides lots of background information about things such as why you should use web bots, security issues, how to authenticate a bot with password protected sites, writing search engine crawlers, parsing HTML, how to handle cookies, HTTP headers, dealing with forms and a lot more.

I was very pleased with how this book covered concepts. The book uses PHP and the cURL library as a teaching tool instead of trying to give a lesson in how to use PHP as a crawler language. The way the code is explained makes it very easy to translate into whatever language you are most comfortable coding in. The book uses fundamental functional programming concepts which make it easy to pick up the general idea without actually knowing PHP.

My boss bought this book to help my group with a project we were working on, and even my co-workers who had no background with PHP were able to use this book to write a web bot in C# (using the cURL library) very easily. The concepts from this book easily transferred over to object-oriented concepts.
10 people found this helpful.
on October 30, 2011
I was hoping for more from this book given the reviews, but it's not all that. As pointed out, it has solid basics.

I did download some of the material to check it out and tried a few things. If you do not know PHP and want to get started webscraping as your primary goal, this book would be for you.

However, my situation is different: I've been programming a rather complex database-driven personal site for over 8 months while learning PHP. Previously, I had very successfully used Perl for webscraping; I became interested in the webscraping possibilities of PHP and decided to check it out. This book did not add much to what I'd already learned in the past 8+ months, although it does have a decent jumpstart guide to cURL.
One person found this helpful.
on December 5, 2007
"Webbots, Spiders, and Screen Scrapers" is a solid book for building basic scripts to do web scraping. Michael Schrenk covers the "should you do this" aspect very well, and devotes much of the book to these kinds of topics. For that reason alone I give him major kudos: "just because you CAN do a thing, doesn't mean you SHOULD."

Technically, the book and examples are very basic and beginner level. All code is procedural and has absolutely no references to object-oriented programming at all. This is great for a simple project, but building anything larger than a targeted webbot or two is beyond the scope of this book.

I was very dismayed at Mr. Schrenk's opinion of regular expressions:
"The use of regular expressions is a parsing language in itself, and most modern programming languages support aspects of regular expressions. In the right hands, regular expressions are also useful for parsing and substituting text; however, they are famous for their sharp learning curve and cryptic syntax. I avoid regular expressions whenever possible."

This disregard for regular expressions effectively wipes out a powerful toolset for budding developers. Regular expressions are no harder to learn than PHP. His reasoning for disdaining them is also flawed:

"The regular expression engine used by PHP is not as efficient as engines used in other languages, and is certainly less efficient than PHP's built-in functions for parsing HTML."

PHP's preg_* functions use PCRE, a regular expression engine modeled on the one used (very effectively) in Perl. There have been many studies showing that preg_* style expressions outperform basic text matching in PHP. In this assessment the author is terribly wrong.

The book does a great job of explaining how to make single-use scripts for scraping, but never how to create a larger infrastructure. There is no focus on creating multi-process engines with pcntl_fork() or proc_open(), which are critical for scaling web scraping applications. A single script scraping a few hundred websites on a single thread would take ages compared with a multi-process engine.
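To make the scaling point concrete, here is a minimal sketch of fetching many pages concurrently. It is written in Python and swaps in a thread pool for the forked PHP processes mentioned above, so treat it as an illustration of the idea rather than a pcntl_fork() equivalent. The function names `fetch` and `fetch_all`, the timeout, and the worker count are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch every URL concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it, so one failing site does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all downloads up front; the pool runs up to max_workers at once.
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:
                results[url] = error
    return results
```

Calling `fetch_all` with a list of a few hundred URLs means the total time is governed mostly by the slowest responses in each batch of workers, instead of by the sum of every response time as in a single sequential loop. A polite scraper would still cap the number of simultaneous requests sent to any one site.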

If you are looking to break into web scraping and are not sure where to start, this is likely the best (and possibly only) book on the market. If you are intermediate or advanced, you will quickly question the author's logic and see that scaling will become the number one issue you have to overcome.
38 people found this helpful.
on November 16, 2008
I won't re-iterate the excellent reviews already posted on this book, other than to say this is probably my favorite all-time programming book: excellently written, highly informative, and full of smarts. Plus, if you're not too aware of how powerful and useful web spiders can be, this is the book for you.
One person found this helpful.
on August 31, 2010
If you want to 'automate' your browsing then this is a great book, with examples for every conceivable application. My only grumble is that, for me at least, it needs a chapter giving the step-by-step installation process for PHP/cURL so as to get up and running quickly.
One person found this helpful.
on January 14, 2008
This book is simply awesome. You will need to come armed with at least a basic knowledge of PHP, but everything is pretty straightforward. The projects are well explained and applicable to a wide range of projects that you might be getting yourself into.
2 people found this helpful.
on April 13, 2007
I waited months for this book to come out and the wait was worth it. This is a great introduction to webbots, spiders and scrapers. The writing is easy and never boring. Lots of code examples and resources to tap into. I couldn't put it down. When was the last time you got a computer book that made you run to the keyboard to try something out?

I'm sure there will be some a#$h@#e that will say it's too rudimentary. It's an intro and it takes you up to intermediate and explains stuff about PHP that I didn't even know existed. Definitely worth the money. I can't wait for the sequel.
18 people found this helpful.
on October 8, 2009
This is a great introduction on the subject. The supplied PHP library does all the work.
One person found this helpful.
on March 5, 2012
This is a review of Michael's 2nd Edition of the same book (I received an early release copy from the publisher; I did not have an opportunity to read the 1st edition):

I thoroughly enjoyed this book. I found myself glued to this topic; I had heard about it many times before but never investigated it. This is "good stuff" and I missed out by not starting earlier. The author, Michael Schrenk, knows his stuff and is passionate about his craft, and it shows in the way he writes. All throughout the book, his excitement about how incredible this technology is, and about using these tools in creative ways, is contagious. I like to read books by authors who are so enthusiastic about their subject matter, as opposed to just droning out facts and knowledge. Reading this book was exciting and addicting. Following along and tinkering with his examples was just plain fun. His excitement and ingenious way of looking at things just rubs off; even before I got to the real-world examples, the ideas started flowing. It's like I just discovered the next BIG THING, but I'm not going to share that here.

He does a great job of explaining everything in step-by-step detail and then complements the explanations with photos and diagrams to aid comprehension. His code examples are simple, and it was easy to see what was going on. They are written in an imperative, or procedural, style as opposed to an object-oriented style, which in my opinion is better suited to teaching new or difficult concepts. It is also easier for a wider range of people with varying programming backgrounds to follow along. He also provides his own supplemental library (via the book website) to simplify using cURL itself. Using his library, I was able to quickly get things up and running and see how everything works, and that is a good thing when learning something new. It sets you on a positive spin and leaves you with nothing but good stuff to say about the subject you just learned. In the end, would I recommend this book to others? Absolutely. It is just like learning the command line: once you start and see the benefits, you never look back.