The silent [and maybe deadly?] trap in bracket subsetting.
It should be clear to you that, like several other programming languages, R provides different ways to tackle the same problem. One common task in data analysis is subsetting a data frame and, as a quick Google search will show you, there are plenty of blog posts and articles that teach different ways to do it in R. Let’s do a quick review here:
Before we can subset a data frame, we must first create one. I will create a data frame of patients named var_example with two columns: one for vital status (is_alive) and one for birth year (birthyear). Birth year values are 4-digit numbers representing the year of birth. The is_alive column can take one of three values:
- TRUE: The person is alive;
- FALSE: The person is dead;
- NA: We do not know whether this person is alive or dead.
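> # Assumption: 1000 rows and birth years 1900 to 2000; the source omits both values.
> var_example <- cbind(as.data.frame(sample(c(NA, TRUE, FALSE), size = 1000, replace = TRUE,
                                            prob = c(0.1, 0.5, 0.4))),
                       as.data.frame(sample(1900:2000, size = 1000, replace = TRUE)))
> colnames(var_example) <- c("is_alive", "birthyear")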
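To make the three possible is_alive values concrete, here is a minimal sketch in R. It uses a small hand-written data frame, toy_example (a hypothetical name, not part of the original post), with the same two columns as var_example, so the result does not depend on random sampling. The birth years are made-up illustration values.

```r
# Hypothetical toy data: a hand-written stand-in for var_example,
# with invented values chosen only for illustration.
toy_example <- data.frame(
  is_alive  = c(TRUE, FALSE, NA, TRUE, NA, FALSE),
  birthyear = c(1950L, 1932L, 1987L, 2001L, 1975L, 1940L)
)

# Show the column types: is_alive is logical (NA included), birthyear is integer.
str(toy_example)

# table() drops NA by default; useNA = "ifany" keeps the unknown patients visible.
status_counts <- table(toy_example$is_alive, useNA = "ifany")
print(status_counts)

# is.na() flags the rows whose vital status is unknown.
n_unknown <- sum(is.na(toy_example$is_alive))
print(n_unknown)
```

Counting the NA values explicitly before subsetting is a cheap sanity check, because many summaries in R leave missing values out unless asked to include them.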