Introduction

Registers managed by the Swedish regional cancer centers (RCC), that is, the quality registers and the cancer register, can store date variables in several different formats. This is most evident in old registers handled by Rockan.

The as.Dates function

The function as.Dates (plural) is intended to replace the ordinary as.Date (singular) function from base R.

Missing dates

RCC dates are usually in the form %Y-%m-%d, such as 2020-04-09. These are recognized by ordinary R functions such as as.Date, provided there are no missing values or the missing values are coded as NA. In RCC data, however, missing dates are commonly coded as empty strings:

library(incadata)
## Loading required package: decoder
d <- c("", "2016-06-17")
as.Date(d)
## Error in charToDate(x): character string is not in a standard unambiguous format

as.Dates, on the other hand, has no problem:

as.Dates(d)
## [1] NA           "2016-06-17"
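For readers who want to see the idea in base R, the following is a minimal sketch of recoding empty strings as NA before parsing. It is not the implementation used by as.Dates; the variable names d_clean and parsed are chosen here for illustration only.

```r
d <- c("", "2016-06-17")

# Recode empty strings as missing; %in% never yields NA, so existing NAs are safe
d_clean <- d
d_clean[d_clean %in% ""] <- NA_character_

# With an explicit format, as.Date returns NA for the missing entries
parsed <- as.Date(d_clean, format = "%Y-%m-%d")
parsed
## [1] NA           "2016-06-17"
```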

Missing day and/or month

Dates from old Rockan registers can also have missing days and months (coded as “00”). as.Dates recognizes these dates and replaces “00” with an approximate date:

as.Date("2000-01-00")  # as.Date fails!
## Error in charToDate(x): character string is not in a standard unambiguous format
as.Dates("2000-01-00") # Missing day
## [1] "2000-01-15"
as.Dates("2000-00-00") # Missing month and day
## [1] "2000-07-01"
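The approximation rule suggested by the output above (the 15th for a missing day, 1 July for a missing month and day) can be sketched in base R as follows. The helper impute_partial_date is hypothetical and only illustrates the rule; it is not taken from incadata and handles only the dashed %Y-%m-%d form.

```r
# Hypothetical helper: approximate dates whose day and/or month is coded as "00"
impute_partial_date <- function(x) {
  # Missing month and day: use the middle of the year (1 July)
  x <- sub("-00-00$", "-07-01", x)
  # Missing day only: use the middle of the month (the 15th)
  x <- sub("-00$", "-15", x)
  as.Date(x, format = "%Y-%m-%d")
}

imputed <- impute_partial_date(c("2000-01-00", "2000-00-00", "2000-03-05"))
imputed
## [1] "2000-01-15" "2000-07-01" "2000-03-05"
```

A string with a missing month but a known day (for example "2000-00-15") is not covered by this sketch and would come back as NA.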

Year and week

Some old dates can have the format %Y%V (see ?strptime), such as “7403” for week 3 in 1974. This is tricky for several reasons:

  • Exact date is unknown but has to be approximated.
  • Different countries number the weeks of the year differently. Sweden uses ISO-8601 (“If the week [starting on Monday] containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1.”) but R does not.
  • Procedures for week numbering also differ between operating systems.

as.Date("7403")
## Error in charToDate(x): character string is not in a standard unambiguous format
as.Dates("7403")
## [1] "1974-01-17"
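The ISO-8601 rule quoted above is equivalent to saying that 4 January always falls in week 1, which allows a platform-independent calculation in base R. The sketch below is an illustration under stated assumptions, not the code behind as.Dates: the helper iso_week_to_date is hypothetical, it expects a four-digit year (so the century of “74” must be decided by the caller), and it picks Thursday as the representative day because that agrees with the output shown above.

```r
# Hypothetical helper: date of a given weekday (1 = Monday, ..., 7 = Sunday)
# in a given ISO-8601 week; the default weekday = 4 is Thursday (mid-week)
iso_week_to_date <- function(year, week, weekday = 4L) {
  # 4 January always belongs to ISO week 1
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(year)))
  # POSIXlt$wday is 0 for Sunday; shift so that 0 means Monday
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6L) %% 7L
  week1_monday <- jan4 - days_since_monday
  week1_monday + 7L * (week - 1L) + (weekday - 1L)
}

iso_week_to_date(1974, 3)
## [1] "1974-01-17"
```

Because the calculation relies only on date arithmetic and the weekday of 4 January, it does not depend on how the operating system interprets week-number conversion specifications.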

It is also possible to have a mixture of different dates within the same vector:

as.Dates(c("", NA, "2000-01-01", "20000101", "20000000", "7403"))
## [1] NA           NA           "2000-01-01" "2000-01-01" "2000-07-01"
## [6] "1974-01-17"
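One way to reason about such a mixed vector is to classify each element by its pattern before parsing it. The following sketch does only that classification step; the helper classify_date_string and its labels are hypothetical and are not claimed to reflect how as.Dates works internally.

```r
# Hypothetical helper: label each string by the date format it appears to use
classify_date_string <- function(x) {
  out <- rep("unknown", length(x))
  out[is.na(x) | x %in% ""] <- "missing"
  # grepl returns FALSE for NA input, so missing values are left untouched
  out[grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)] <- "ymd_dashed"
  out[grepl("^[0-9]{8}$", x)] <- "ymd_compact"
  out[grepl("^[0-9]{4}$", x)] <- "year_week"
  out
}

classes <- classify_date_string(c("", NA, "2000-01-01", "20000101", "20000000", "7403"))
classes
## [1] "missing"     "missing"     "ymd_dashed"  "ymd_compact" "ymd_compact"
## [6] "year_week"
```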