Automated data extraction from historical city directories: the rise and fall of mid-century gas stations in Providence, RI

By Samuel Bell, Thomas Marlow, Kai Wombacher, Anina Hitt, Neev Parikh, Andras Zsom, Scott Frickel

Posted 12 Jul 2019
bioRxiv DOI: 10.1101/701136 (published DOI: 10.1371/journal.pone.0220219)

The location of defunct, environmentally hazardous businesses like gas stations has many implications for modern American cities. To track down these locations, we present the directoreadr code (github.com/brown-ccv/directoreadr). Using scans of Polk city directories from Providence, RI, directoreadr extracts and parses business location data with a high degree of accuracy. The image processing pipeline ran without any human input for 94.4% of the pages we examined; the remaining 5.6% required some human input. By hand-checking a sample of three years, we estimate that ~94.6% of historical gas stations are correctly identified and located, with historical street changes and non-standard address formats being the main drivers of errors. As an example use, we look at gas stations, finding that they were most common early in the study period, in 1936, and began a sharp and steady decline around 1950. We are making the dataset produced by directoreadr publicly available. We hope it will be used to explore a range of important questions about socioeconomic patterns in Providence and cities like it during the transformations of the mid-1900s.

Download data

  • Downloaded 201 times
  • Download rankings, all-time:
    • Site-wide: 107,756
    • In ecology: 3,628
  • Year to date:
    • Site-wide: 123,916
  • Since beginning of last month:
    • Site-wide: 115,888


