Large Scale Web Scraping: Building a Toolkit to Scrape All the Things!

At previous Code4Lib conferences, there have been wonderful workshops showcasing a variety of tools and methodologies useful for web scraping. However, every tool has its limitations. In this workshop, we will demonstrate some of these limitations through a case study, and use it to highlight the need for a robust toolkit of scraping tools and methodologies when collecting for archival and research purposes. The workshop will also include a discussion of best practices for ethical scraping.

We will then use practical, hands-on demonstrations to help workshop participants implement the tools and methodologies discussed, so that they leave the workshop understanding how to add these tools to their scraping repertoires.

Participants should have some experience with Python and command-line interfaces for the hands-on portion of the workshop.


Room: Omni Shoreham Hotel, Congressional B