http://www.strengejacke.de/sjPlot/labelleddata/

Working with labelled Data {sjmisc}

This document shows basic usage of the sjmisc package and how to work with labelled data.

Resources:

  • Download package from CRAN

  • Developer snapshot at GitHub

  • Submission of bug reports and issues at GitHub


The sjmisc-Package

This package covers three domains of functionality:

  • reading and writing data between other statistical packages (like SPSS) and R, based on the haven and foreign packages

  • hence, sjmisc also includes functions to work with labelled data

  • frequently applied recoding and variable conversion tasks

Labelled Data

In software like SPSS, it is common to have value and variable labels as variable attributes. Variable values, even if categorical, are mostly numeric. In R, however, you may use labels as values directly:

factor(c("low", "high", "mid", "high", "low"))
## [1] low  high mid  high low 
## Levels: high low mid

Reading SPSS data (with haven, foreign or sjmisc) keeps the numeric values of the variables and adds the value and variable labels as attributes. See the following example from the sample data set efc, which is part of the sjmisc-package:

library(sjmisc)
data(efc)
str(efc$e42dep)
##  atomic [1:908] 3 3 3 4 4 4 4 4 4 4 ...
##  - attr(*, "label")= chr "elder's dependency"
##  - attr(*, "labels")= Named num [1:4] 1 2 3 4
##   ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
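The attribute layout above can be reproduced by hand. The following is a minimal base R sketch that does not need sjmisc; the toy values are made up, and only the attribute names "label" and "labels" are taken from the str() output above. It shows that the values stay numeric while the labels are ordinary attributes that can be looked up by code.

```r
# Toy vector shaped like efc$e42dep (values are made up; the attribute
# names "label" and "labels" follow the str() output above)
dep <- c(3, 3, 3, 4, 4, 1, 2)

# Variable label: a single character string
attr(dep, "label") <- "elder's dependency"

# Value labels: a named numeric vector, names = labels, values = codes
attr(dep, "labels") <- c(
  "independent"          = 1,
  "slightly dependent"   = 2,
  "moderately dependent" = 3,
  "severely dependent"   = 4
)

# The data stay numeric, so arithmetic still works
mean(dep)

# Look up the label of each value by matching codes against the attribute
labs <- attr(dep, "labels")
names(labs)[match(dep, labs)]
```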

While all plotting and table functions of the sjPlot-package make use of these attributes, many other packages and functions ignore them, e.g. R base graphics:

library(sjmisc)
data(efc)
barplot(table(efc$e42dep, efc$e16sex),
        beside = TRUE,
        legend.text = TRUE)

[Figure: grouped bar plot of table(efc$e42dep, efc$e16sex)]

As you can see in the above figure, the axis and the legend only show the numeric codes instead of meaningful labels.

Adding value labels as factor values

to_label is a sjmisc-function that converts a numeric variable into a factor and uses the value labels stored in its attributes as factor levels. When the factors have labelled levels, the bar plot is labelled as well:

barplot(table(to_label(efc$e42dep),
              to_label(efc$e16sex)),
        beside = TRUE,
        legend.text = TRUE)

[Figure: grouped bar plot with value labels on the axis and in the legend]
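Conceptually, the conversion that to_label performs can be sketched in base R. The helper labels_as_levels below is hypothetical (it is not sjmisc's implementation) and assumes the "labels" attribute layout shown earlier; the toy data and its labels are made up. Comparing the two table() calls shows why base graphics need the factor conversion: table() takes its dimnames from the values, not from the attributes.

```r
# Hypothetical helper, not the sjmisc implementation: turn a numeric vector
# with a "labels" attribute into a factor whose levels are the value labels
labels_as_levels <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs)) {
    return(factor(x))  # no value labels: fall back to a plain factor
  }
  # Codes define the levels (in label order); label names become the level
  # names. Values without a matching label become NA.
  factor(x, levels = unname(labs), labels = names(labs))
}

# Toy data with assumed labels
sex <- c(2, 1, 1, 2, 2)
attr(sex, "labels") <- c(male = 1, female = 2)

table(sex)                    # dimnames are the raw codes "1" and "2"
table(labels_as_levels(sex))  # dimnames are "male" and "female"
```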

to_factor is a convenient replacement for as.factor: it converts a numeric vector into a factor, but keeps the value and variable label attributes.
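The difference between as.factor and to_factor can be illustrated with a hypothetical helper, factor_keep_labels, which is an assumption for illustration and not the sjmisc source: it builds a factor from the numeric codes and then re-attaches the "label" and "labels" attributes that factor() discards. The toy codes and labels are made up.

```r
# Hypothetical helper, not the sjmisc source: factor with the numeric codes
# as levels, plus the label attributes that factor() itself discards
factor_keep_labels <- function(x) {
  f <- factor(x)
  for (a in c("label", "labels")) {
    # If x lacks the attribute, this assigns NULL and f simply stays without it
    attr(f, a) <- attr(x, a, exact = TRUE)
  }
  f
}

# Toy data with made-up codes and labels
edu <- c(1, 3, 2, 2)
attr(edu, "label") <- "carer's level of education"
attr(edu, "labels") <- c(low = 1, intermediate = 2, high = 3)

plain <- as.factor(edu)           # label attributes are gone
kept  <- factor_keep_labels(edu)  # label attributes survive

attr(plain, "label")
attr(kept, "label")
levels(kept)
```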

Getting and setting value and variable labels

There are four functions that let you easily set or get value and variable labels of either a single vector or a complete data frame:

  • get_label() to get variable labels

  • get_labels() to get value labels

  • set_label() to set variable labels (add them as vector attribute)

  • set_labels() to set value labels (add them as vector attribute)
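For a single vector, these accessors essentially read and write two attributes. The base R sketch below uses hypothetical stand-ins (get_var_label, get_val_labels, set_var_label and set_val_labels are illustrative names, not sjmisc functions). set_val_labels assumes the codes are 1, 2, 3 and so on in label order, a simplification made only for this example.

```r
# Hypothetical stand-ins (not the sjmisc functions) for getting and
# setting the two label attributes on a single vector
get_var_label  <- function(x) attr(x, "label", exact = TRUE)
get_val_labels <- function(x) attr(x, "labels", exact = TRUE)

set_var_label <- function(x, label) {
  attr(x, "label") <- label
  x
}

set_val_labels <- function(x, labels) {
  # Simplifying assumption: codes are 1, 2, 3, ... in the order of 'labels'
  attr(x, "labels") <- stats::setNames(as.numeric(seq_along(labels)), labels)
  x
}

# Usage on toy data
cop <- c(4, 3, 3, 4, 3)
cop <- set_var_label(cop, "do you feel you cope well as caregiver?")
cop <- set_val_labels(cop, c("never", "sometimes", "often", "always"))

get_var_label(cop)
get_val_labels(cop)
```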

With get_label(), you can easily add titles to plots dynamically, i.e. depending on the variable that is plotted.

barplot(table(to_label(efc$e42dep),
              to_label(efc$e16sex)),
        beside = TRUE,
        legend.text = TRUE,
        main = get_label(efc$e42dep))

[Figure: labelled grouped bar plot titled "elder's dependency"]

get_label(efc) returns the variable labels of all variables in the data frame, and get_labels(efc) returns a list with the value labels of all its variables.
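Collecting the labels of a whole data frame amounts to applying the single-vector lookup to each column. The sketch below uses made-up toy data and base R only. It mirrors what get_label(efc) and get_labels(efc) are described as returning, but the exact return format of the sjmisc functions may differ.

```r
# Toy data frame; labels are attached to individual columns
toy <- data.frame(age = c(74, 68, 80), dep = c(4, 4, 1))
attr(toy$age, "label") <- "elder's age"
attr(toy$dep, "label") <- "elder's dependency"
attr(toy$dep, "labels") <- c("independent" = 1, "slightly dependent" = 2,
                             "moderately dependent" = 3, "severely dependent" = 4)

# One variable label per column (NA where a column has none)
var_labels <- vapply(toy, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}, character(1))

# One list element per column holding its value labels (NULL if unlabelled)
val_labels <- lapply(toy, attr, which = "labels", exact = TRUE)

var_labels
str(val_labels)
```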

Another example

Converting labelled vectors into factors usually drops label attributes (e.g. using as_factor) or replaces values with the associated labels (like to_label does). If you want to convert a labelled vector into a numeric factor, but keep the label attributes (including variable labels), use to_factor.

Functions like lm simply copy these attributes and store this information in the returned object; see the following example from the sjPlot-package:

library(sjPlot)
## #refugeeswelcome
data(efc)
# make education categorical
efc$c172code <- to_factor(efc$c172code)
fit <- lm(barthtot ~ c160age + c12hour + c172code + c161sex,
          data = efc)
sjt.lm(fit, group.pred = TRUE)


Total score BARTHEL INDEX

                                               B   CI              p
(Intercept)                                87.54   76.34 – 98.75   <.001
carer’ age                                 -0.21   -0.35 – -0.07   .004
average number of hours of care per week   -0.28   -0.32 – -0.24   <.001
carer’s level of education
  intermediate level of education           1.37   -3.12 – 5.85    .550
  high level of education                  -1.64   -7.22 – 3.93    .564
carer’s gender                             -0.39   -4.49 – 3.71    .850
Observations                                 821
R² / adj. R²                                .271 / .266

Looking at str(fit$model) shows that both the variable and the value label attributes are still there. Packages like sjPlot make use of this and automatically label the table output (as seen above).

Restore labels from subsetted data

The base subset function drops label attributes (or vector attributes in general) when subsetting data. Since version 1.0.3 of the sjmisc-package, there are handy functions to deal with this problem: copy_labels and remove_labels.

copy_labels adds the labels of the original data frame back to a subsetted data frame, and remove_labels removes all label attributes.

Losing labels during subset

efc.sub <- subset(efc, subset = e16sex == 1, select = 4:8)
str(efc.sub)
## 'data.frame':    296 obs. of  5 variables:
##  $ e17age : num  74 68 80 72 94 79 67 80 76 88 ...
##  $ e42dep : num  4 4 1 3 3 4 3 4 2 4 ...
##  $ c82cop1: num  4 3 3 4 3 3 4 2 2 3 ...
##  $ c83cop2: num  2 4 2 2 2 2 1 3 2 2 ...
##  $ c84cop3: num  4 4 1 1 1 4 2 4 2 4 ...

Add back labels

efc.sub <- copy_labels(efc.sub, efc)
str(efc.sub)
## 'data.frame':    296 obs. of  5 variables:
##  $ e17age : atomic  74 68 80 72 94 79 67 80 76 88 ...
##   ..- attr(*, "label")= Named chr "elder' age"
##   .. ..- attr(*, "names")= chr "e17age"
##  $ e42dep : atomic  4 4 1 3 3 4 3 4 2 4 ...
##   ..- attr(*, "label")= Named chr "elder's dependency"
##   .. ..- attr(*, "names")= chr "e42dep"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "independent" "slightly dependent" "moderately dependent" "severely dependent"
##  $ c82cop1: atomic  4 3 3 4 3 3 4 2 2 3 ...
##   ..- attr(*, "label")= Named chr "do you feel you cope well as caregiver?"
##   .. ..- attr(*, "names")= chr "c82cop1"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "never" "sometimes" "often" "always"
##  $ c83cop2: atomic  2 4 2 2 2 2 1 3 2 2 ...
##   ..- attr(*, "label")= Named chr "do you find caregiving too demanding?"
##   .. ..- attr(*, "names")= chr "c83cop2"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "Never" "Sometimes" "Often" "Always"
##  $ c84cop3: atomic  4 4 1 1 1 4 2 4 2 4 ...
##   ..- attr(*, "label")= Named chr "does caregiving cause difficulties in your relationship with your friends?"
##   .. ..- attr(*, "names")= chr "c84cop3"
##   ..- attr(*, "labels")= Named num  1 2 3 4
##   .. ..- attr(*, "names")= chr  "Never" "Sometimes" "Often" "Always"
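The mechanism behind copy_labels and remove_labels can be sketched in base R. The helpers copy_label_attrs and drop_label_attrs are hypothetical and only handle the two attributes "label" and "labels"; the real sjmisc functions may do more. The toy data frame stands in for efc. Row subsetting drops the attributes because `[` on a plain vector keeps only its names.

```r
# Hypothetical helpers, not the sjmisc source; they only handle the
# "label" and "labels" attributes
copy_label_attrs <- function(subsetted, original) {
  # Only columns present in both data frames can get their labels back
  for (v in intersect(names(subsetted), names(original))) {
    for (a in c("label", "labels")) {
      attr(subsetted[[v]], a) <- attr(original[[v]], a, exact = TRUE)
    }
  }
  subsetted
}

drop_label_attrs <- function(d) {
  for (v in names(d)) {
    attr(d[[v]], "label")  <- NULL
    attr(d[[v]], "labels") <- NULL
  }
  d
}

# Toy data standing in for efc
dat <- data.frame(sex = c(1, 2, 1, 2), dep = c(4, 3, 1, 2))
attr(dat$dep, "label") <- "elder's dependency"

sub <- subset(dat, sex == 1, select = dep)
is.null(attr(sub$dep, "label"))  # TRUE: row subsetting dropped the attribute

sub <- copy_label_attrs(sub, dat)
attr(sub$dep, "label")           # the label is back

clean <- drop_label_attrs(sub)
is.null(attr(clean$dep, "label"))
```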

Conclusion

When working with labelled data, especially with data sets imported from other software packages, it is very handy to make use of the label attributes. The sjmisc-package supports this and offers useful functions for these tasks.