- Paperback: 392 pages
- Publisher: No Starch Press; Second Edition (March 12, 2012)
- Language: English
- ISBN-10: 1593273975
- ISBN-13: 978-1593273972
- Product Dimensions: 7 x 1 x 9.2 inches
- Shipping Weight: 1.6 pounds
- Average Customer Review: 37 customer reviews
- Amazon Best Sellers Rank: #809,971 in Books
Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL, Second Edition
About the Author
Michael Schrenk develops webbots and spiders for clients across North America. He has written for Computerworld and Web Techniques magazines and has taught college courses on web usability and Internet marketing. He is also an occasional speaker at DEFCON.
Top customer reviews
1) There is a web site associated with the book (Kindle edition), and it's supposed to contain snippets for all the code examples. Not all the examples are, in fact, there, but more important...
2) One of the very first examples is actually wrong - using a function get_with_header() when the sample output clearly had to be derived by using http_header() - which gets only the header. If you're a complete beginner, this could cause you endless grief. And maybe it's the only bad example, but it's worth noting.
3) Neither the book nor the website spends much effort at helping you get a PHP environment set up. Other than the author referring you to php.net, you're on your own. And if you successfully navigate to a place from which you can download PHP, the array of choices is quite daunting - unless you're already an expert, in which case some of the book would be redundant. I've got 45 years of programming under my belt, and I did the installation OK; but a nice, simple set of hints as to which flavor of PHP to download, and which lines in php.ini to change in order to enable the features the book relies on, would have helped.
4) That having been said, I disagree with some of the other reviewers as to the value of the custom libraries that the book relies on. The raw PHP/cURL has a daunting set of features, flags, and options. The libraries provide a much more learner-friendly interface - a way to get the concepts without being overwhelmed by things that can be safely ignored. If PHP/cURL is particle physics, this book's libraries are Newtonian - more than good enough to learn why apples fall on your head.
I'll update this review when I've finished the book.
With that said, I found the book to be very useful. The author shares very useful information and can save a newbie scraper hours of head scratching. The code library is also very useful. I hope the author comes out with a 3rd edition that's error free.
The author also provides plenty of example code, which is a very nice touch. You get to see the inner workings of web spiders. Overall it's a book worth reading.
I had never heard of cURL at the time I purchased this book, and was a slave to the web browser "OK" button. I did, however, have experience in applications programming with C, C++ and Java as standalone, non-networked applications, along with very limited JSP and servlet knowledge. For me it was obvious that I could write a script that accessed files over a network. Any network! If an application or browser can ... enough said.
What this book is all about is letting people (who think that the "Internet lives inside a web browser") know that there is more. It's informing those bound by web browsers to gather information on the www that there is another way. A way that can be automated without you having to click a button every step of the way. If this is news to you, then this book was written to let you know and give you some ideas of things you can do. I hadn't previously heard of cURL, but the idea was not foreign. I just wanted to find out how it was done. What I learned from this book can be summarised in one word (actually an acronym) "cURL". I did the first exercise in the book just to see it work and now the book sits on the shelf and has never been touched since. Was I disappointed? YES!!!!
But why is this a good book? It's not written for experienced coders or command line buffs. It is written to open a world to those who didn't know about web automation, or have heard of it and want to get a start. The title is very deceptive for someone who has developed web applications before. I don't think intentionally either, as the tone of the book and the author's discussions aren't directed towards an audience of programmers. The real value of the book is introducing web automation as an alternative to relentlessly clicking buttons in a browser.
Maybe the title should be "Webbots, Spiders, and Screen Scrapers: An Introduction to automated web activities using PHP & cURL". This may better allude to the level and intention of the book. You can use this book with no programming experience, only a little initiative to pick it up along the way. Obviously PHP knowledge gets you a long way, and if you already know cURL there's not much in this book for you outside of a few pieces of advice that are not coding related. Things like making you think about how your activities are recorded in a server's log.
If you are new to programming and wish your computer could automatically do some mundane web stuff you're sick of clicking on all the time ... read the book, that world exists.
If you are a confident programmer and your imagination bugs you constantly to get your computer to do all of your work ... "cURL".
If you already know something about web automation ... use your imagination.
He could also have covered Selenium instead of macros. Overall, I found the book outdated and better suited to beginners in the domain.