Tidyverse

Scraping Friends

TL;DR: HTML data can be messy and difficult to work with. Tools from the tidyverse (like dplyr, purrr, and rvest) make working with it much easier, although creating clean data from HTML still takes time and patience. Ad hoc testing is a quick way to check the accuracy of an HTML parsing function. Clean data is well worth the time and effort required to obtain or create it.

Getting Started

This post outlines the process of scraping and cleaning the scripts of every Friends TV episode.