The goal of datacheck is to scan datasets for potentially sensitive information that could constitute a privacy violation under data protection laws (e.g., GDPR, HIPAA). It is meant to help researchers screen datasets they want to share openly online. It can also be used as a research tool to scan existing datasets for potential privacy violations.

The dataset is scanned locally in the R environment and is never sent to a remote server; all processing happens within the package, on the user's own device.
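To give a rough idea of what scanning a dataset locally can look like, here is a minimal sketch in base R. It is not datacheck's API: the function `flag_email_columns()`, the regular expression, and the `survey` data are all made up for illustration, and a real scan would likely cover many more kinds of sensitive information than email addresses.

```r
# Hypothetical helper, NOT part of datacheck: returns the names of columns
# that contain at least one value resembling an email address.
# Uses base R only, so nothing leaves the local R session.
flag_email_columns <- function(df) {
  # Simplified email pattern (an assumption; real addresses can be more varied)
  email_pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
  hits <- vapply(
    df,
    function(col) any(grepl(email_pattern, as.character(col))),
    logical(1)
  )
  names(df)[hits]
}

# Made-up example data
survey <- data.frame(
  id = 1:3,
  contact = c("ana@example.org", "none", "bo@example.com"),
  score = c(4.5, 3.2, 5.0),
  stringsAsFactors = FALSE
)

flagged <- flag_email_columns(survey)
print(flagged)
```

Because the check is plain pattern matching on an in-memory data frame, it needs no network access, which is the property described above.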

Installation

You can install the development version of datacheck from GitHub with:

# install.packages("devtools")
devtools::install_github("libscie/datacheck")