Web Scraping with Python (Community Experience Distilled) Paperback – October 28, 2015
About the Author
Richard Lawson is from Australia and studied Computer Science at the University of Melbourne. Since graduating, he has built a business specializing in web scraping while traveling the world, working remotely from over 50 countries. He is a fluent Esperanto speaker, conversational in Mandarin and Korean, and active in contributing to and translating open source software. He is currently undertaking postgraduate studies at Oxford University and in his spare time enjoys developing autonomous drones.
Top customer reviews
Packt offers the most uneven of technical material, and this is one that really requires editing. Hilariously, Packt's reviewers/editors don't speak English natively, or if they do, it's hard to believe they've finished grade school.
Go with the O'Reilly book on web scraping; it's no gem, but the code works.
Why is software so poorly designed? Because software/technical writers like Lawson lack an orderly approach to their work.
Ultimately, a disappointment made more bitter by the promise.
As for Python specifically, you should know how to import and use functions from a module and how to install new modules.
The book gives a whirlwind tour of the process and of the existing Python tools for web crawling (grabbing HTML content from a server) and web scraping (extracting the actual data from that content).
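To make the crawling/scraping distinction concrete, here is a minimal sketch using only the Python 3 standard library. It is not code from the book, and the names `download`, `TitleAndLinks`, and `scrape` are made up for illustration. `download` is the crawling half (fetching HTML from a server); `scrape` is the scraping half (pulling data out of that HTML).

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def download(url):
    """Crawling step (hypothetical helper): fetch the raw HTML for a URL."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleAndLinks(HTMLParser):
    """Scraping step: collect the page title and every link target."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a usable href
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(html):
    """Return the title and link targets found in an HTML string."""
    parser = TitleAndLinks()
    parser.feed(html)
    return {"title": parser.title.strip(), "links": parser.links}
```

In practice the two halves chain together as `scrape(download(url))`; keeping them separate means the extraction logic can be tried on a saved page without touching the network.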
The first two chapters introduce the basic web scraping scenario and the later chapters introduce complications to the process and tools/approaches to handle them.
The book provides a great high-level idea of web scraping and a self-contained Python starter kit to get up and running. It's a short read at 175 pages, with very accessible content and links to more detailed documentation. The book's site also offers a practice website for trying out scraping techniques.
I came to the book with previous exposure to various Python web scraping tools that I had pieced together from the web (tutorials, blogs, Stack Overflow). I really enjoyed the self-contained treatment of the whole web scraping process that the book provides. This book would have saved me a LOT of time and pain had it been available when I got started with web scraping. Having said that, I still learned a few things, specifically the CAPTCHA tools and some basic utilities that I can use in my day-to-day Python.
I recommend this book for anyone who is just starting out with web scraping, or who is already familiar with scraping but wants to learn how to apply Python to it. Once you have read the book, there is significant opportunity to explore the individual tools further.
Typically python libs dealing with url fetching have very simple examples of the parser.parse(<html>"hello world"</html>) type, so you have no idea what to do with a real web page, like a 'needs login' page.
The author, to his credit, does not tell you to download a 'magic' Python library. In these cases he gives a thorough walk-through on how to research the structure or scripting of the page and then go about fetching it via Python. Most of the book is in fact devoted to analysis, rather than action.
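As a rough illustration of that kind of page analysis (my own sketch, not the book's code), the standard library is enough to inspect a 'needs login' page and list the fields its form expects, including hidden ones that have to be sent back with the credentials. The names `FormInspector` and `find_login_form` are hypothetical, and treating "the form with a password input" as the login form is an assumption that will not hold on every site.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Record each form's action, method, and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "inputs": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Only named inputs are submitted with the form.
            self._current["inputs"][attrs["name"]] = {
                "type": (attrs.get("type") or "text").lower(),
                "value": attrs.get("value") or "",
            }

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_login_form(html):
    """Return the first form containing a password input, or None."""
    inspector = FormInspector()
    inspector.feed(html)
    for form in inspector.forms:
        if any(f["type"] == "password" for f in form["inputs"].values()):
            return form
    return None
```

The returned dictionary shows which values need to be posted to the form's `action` URL; actually submitting them (and keeping the session cookies) is a separate step not shown here.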
The only parts that didn't seem particularly useful to me were the chapters on creating crawlers and spiders. I don't see myself doing any of that, and I don't see why an amateur or professional would use them. A professional would probably use Elasticsearch, for instance.
Other than that the book will probably be useful for a long time.
But almost all of the other examples work well (I'm starting chapter 4 now). I think this book has very good resources for anyone interested in web scraping. At the very least, you can practice on a real example website.
The O'Reilly web scraping book and this one can be complementary. No book is a perfect guide.
Notice: The example webpage addresses changed a little. Therefore you should change the addresses in your code, too.
I would definitely recommend this book.