Batteries Not Included

IID observations from a universe of data

dockerterm

This details my motivation and efforts in creating dockerterm, an R package that provides an RStudio Addin for running a Docker container in the RStudio terminal. Recently I’ve been using Docker quite a bit with various projects I’m working on. I’ve always liked the idea of Docker, but as I’ve become more familiar with it I’ve started to really appreciate its magic. As a “way too brief” overview, Docker provides a convenient way to manage system dependencies by creating lightweight virtual environments known as containers.

The FutuRe is Bright

What a rush. I’m still trying to fully process everything that happened at RStudio::conf. In short, it was an incredible and transformative experience. Rather than just blast through everything that happened chronologically at the conference and everything I enjoyed (which could go on forever; the list is long), I’ll instead highlight five general takeaways/themes from the conference. In a subsequent post I’ll outline personal goals and takeaways that came from the conference as well.

Ad Hoc Testing

TL;DR testthat provides a convenient and easy-to-use unit test framework for R. While traditionally used as a formal part of package development, it can also be used interactively. Ad hoc test suites can be run as functions within an R session to quickly test the impact of code changes. I use this workflow when writing parsing functions for HTML data. Introduction Like all Hadley Wickham creations, testthat is a wonderful tool that generally improves the lives of R users.
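As a rough illustration of the workflow described above, here is a minimal sketch of an ad hoc test suite wrapped in a function. The helper parse_line() and the sample strings are hypothetical and exist only for this example; they are not taken from the original post. Only test_that() and expect_equal() are testthat functions.

```r
library(testthat)

# Hypothetical parsing helper (an assumption for illustration):
# split a "Speaker: text" string at the first colon.
parse_line <- function(x) {
  parts <- regmatches(x, regexpr(":", x, fixed = TRUE), invert = TRUE)[[1]]
  list(speaker = trimws(parts[1]), text = trimws(parts[2]))
}

# Ad hoc test suite: a plain function that bundles a few test_that() calls
# so it can be re-run from the console after every change to parse_line().
test_parse_line <- function() {
  test_that("speaker and text are separated", {
    res <- parse_line("Monica: There's nothing to tell!")
    expect_equal(res$speaker, "Monica")
    expect_equal(res$text, "There's nothing to tell!")
  })
  test_that("only the first colon splits the line", {
    res <- parse_line("Ross: Hi: again")
    expect_equal(res$speaker, "Ross")
    expect_equal(res$text, "Hi: again")
  })
}

# Run the suite interactively.
test_parse_line()
```

Re-running test_parse_line() after each edit to the parsing function gives quick feedback without setting up a package-style tests/ directory.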

Scraping Friends

TL;DR HTML data can be messy and difficult to work with. Tools from the tidyverse (like dplyr, purrr, and rvest) make this process much easier, although creating clean data from HTML takes time and patience. Ad hoc testing can be used to quickly evaluate the accuracy of an HTML parsing function. Clean data is well worth the time and effort required to obtain/create it. Getting Started This post outlines the process of scraping and cleaning the scripts to every Friends TV episode.

Summer of Data Science 2017

I first learned of #SoDS17 through Mara Averick and was further enlightened by Data Science Renee’s tweet (“Here's how to participate in the Summer of Data Science #SoDS17”, @BecomingDataSci, May 29, 2017) and accompanying blog post. As described, the basic premise is simple: set a goal to learn something new in the broad data science domain and make an effort to share the journey with others.