Recognize and Handle Data in Formats Used by Swedish Cancer Centers • incadata

The goal of incadata is to provide basic functionality to handle data from INCA and the Regional cancer centers in Sweden.

You can install the released version of incadata from CRAN with:

install.packages("incadata")

And the development version from Bitbucket with:

# install.packages("remotes")
remotes::install_bitbucket("cancercentrum/incadata")

The function as.incadata standardizes data from INCA and Rockan:

All date formats used by Rockan are recognized as dates and coerced to such (for example: 1985-05-04, "", 19850504, 19850500, 19850000 and 8513).
Boolean values are numeric vectors in INCA: c(0, 1, 0, 1, 0, 0), but are coerced to character when exported: c(NA, "True", NA, "True", NA, NA). The package recognizes this peculiarity and coerces such values to Boolean.
Personal identity numbers are recognized even if they end with X et cetera (used in Rockan).
Standard numerical codes from Rockan are decoded (using the decoder package).
Column names are always coerced to lower case, since these are generally easier to work with.
Data frames are coerced to tibbles.
An id column is always added to data frames so that an identification variable is always at hand (regardless of whether the data has none or one of PERSNR, PNR or PAT_ID).
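
A few of the coercions listed above can be illustrated in base R. This is only a minimal sketch and not the implementation used by as.incadata: the function standardize_sketch, the example columns, and the priority order among PERSNR, PNR and PAT_ID are assumptions made for illustration.

```r
# Hypothetical export: upper-case names, Booleans exported as "True"/NA.
raw <- data.frame(
  PAT_ID    = c(101, 102, 103),
  A_RECIDIV = c(NA, "True", NA),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of incadata).
standardize_sketch <- function(x) {
  # Lower-case column names.
  names(x) <- tolower(names(x))

  # Treat character columns holding only NA and "True" as exported Booleans.
  is_exported_bool <- vapply(
    x,
    function(col) {
      is.character(col) && any(!is.na(col)) && all(col[!is.na(col)] == "True")
    },
    logical(1)
  )
  x[is_exported_bool] <- lapply(x[is_exported_bool], function(col) !is.na(col))

  # Add an id column from the first identifier found (assumed priority order),
  # falling back to row numbers when none is present.
  id_source <- intersect(c("persnr", "pnr", "pat_id"), names(x))
  x$id <- if (length(id_source) > 0) x[[id_source[1]]] else seq_len(nrow(x))
  x
}

clean <- standardize_sketch(raw)
```

With the package installed, the corresponding call would be as.incadata applied to the raw data frame, which additionally handles dates, personal identity numbers and Rockan codes.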

The package also provides functionality for easier access and archiving of register documentation (see the vignette "incadoc") and function documents.

The package also lets you:

cache data between work sessions to speed up the data loading and munging process
use a single data reading/munging function regardless of whether you work on INCA or locally
interactively engage in the coercing process of variable formats. This is handy for example if a variable is almost a date but has some additional entries that are not recognized as such.
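
The idea of caching data between work sessions can be sketched in base R as follows. The helper cached_read and the loader below are hypothetical and only illustrate the general pattern; the package's own caching mechanism may work differently.

```r
# Hypothetical helper: return the cached object if the cache file exists,
# otherwise run the (slow) loader and store its result for next time.
cached_read <- function(cache_file, loader) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  data <- loader()
  saveRDS(data, cache_file)
  data
}

# A fresh temporary path stands in for a persistent cache location.
cache_file <- tempfile(fileext = ".rds")

# Stand-in for an expensive read/munge step; counts how often it runs.
calls <- 0
slow_loader <- function() {
  calls <<- calls + 1
  data.frame(id = 1:3, value = c("a", "b", "c"))
}

first  <- cached_read(cache_file, slow_loader)  # runs the loader, writes the cache
second <- cached_read(cache_file, slow_loader)  # served from the cache file
```

In real use the cache file would live in a location that persists between sessions, so that the slow step runs only once.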