This article describes how Data Sets are represented when their contents are referred to in R code.
If you type a variable's name into the R CODE section of a Calculation or an R Variable, Displayr will automatically use the data for that variable. For example, if there are 300 records in your data set, referencing Q3 in your R code will be interpreted as a variable of length 300.
Note that missing values appear as NA in R, and NaN values remain NaN.
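The distinction between NA and NaN matters when you count or summarize missing data. The sketch below builds a hypothetical stand-in for Q3 by hand (in Displayr the data would be supplied automatically); the values are invented for illustration. Note that is.na() is TRUE for both NA and NaN, whereas is.nan() picks out NaN only.

```r
# Hypothetical stand-in for Q3 in a 300-record data set:
# 298 observed ages, one missing value (NA) and one NaN.
Q3 <- c(rep(30, 297), 45, NA, NaN)

length(Q3)              # 300, one element per record
sum(is.na(Q3))          # 2: is.na() is TRUE for both NA and NaN
sum(is.nan(Q3))         # 1: is.nan() is TRUE only for NaN
is.na(mean(Q3))         # TRUE: missing values propagate unless removed
mean(Q3, na.rm = TRUE)  # na.rm = TRUE drops both the NA and the NaN
```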
2. Categorical variables
When a categorical variable (i.e., a Nominal or Nominal - Multi variable) is used in R, it is automatically converted to a factor, an R class that stores both a value and a label for each category. If its Structure is Ordered Categorical, it is converted to an ordered factor instead. If categories have been merged, the merging is reflected in the way the data appears in R, as follows:
- If all the categories of the variable are mutually exclusive and exhaustive, they all appear in R.
- Where categories overlap, the broader ones are excluded. For example, if the data contains three unique values, 0, 1, and 2, with labels of A, B, and C, respectively, and the categories shown on the table are A, B, C, NET, the NET category will be removed. Similarly, if the categories are A, B, B + C, C, NET, then both NET and B + C are removed.
- Any categories that are missing (i.e., hidden) are inserted, so that the categories are mutually exclusive and exhaustive.
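The sketch below reproduces, by hand, the kind of factor these rules produce for the example above (values 0, 1, 2 labelled A, B, C). It is only an illustration of what the result looks like in R; the conversion itself is done by Displayr, and the data values here are hypothetical.

```r
# Hypothetical raw codes 0, 1, 2 with labels A, B, C; NA is a missing record.
codes <- c(0, 1, 2, 1, NA)

# Nominal variable: one level per mutually exclusive category, no NET level.
nominal <- factor(codes, levels = c(0, 1, 2), labels = c("A", "B", "C"))
levels(nominal)      # "A" "B" "C"
table(nominal)       # A = 1, B = 2, C = 1 (NA excluded by default)
as.integer(nominal)  # 1 2 3 2 NA: level positions, not the original codes

# Ordered Categorical variable: an ordered factor supports < and >.
ordinal <- factor(codes, levels = c(0, 1, 2), labels = c("A", "B", "C"),
                  ordered = TRUE)
ordinal[1] < ordinal[3]  # TRUE, because A precedes C
```

Because the categories are mutually exclusive and exhaustive, table() counts each record at most once.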
3. Attributes of variables
When a variable from a data set is referred to in R code, the variable is automatically uploaded to the R Server prior to any R code being run. A variable will have the following attributes:
- name: This is the name in the original data file that has been imported into Displayr (where such a name exists and is not problematic).
- question: This is the name of the Variable Set, where the name is provided in the metadata or can be inferred.
- label: This is the label of the variable, where such a label exists.
While these attributes can be accessed in R in the usual way (e.g., attr(my.variable, "label")), the best way to access them is often with flipFormat::Labels, which attempts to construct a label of the form Question Name: Variable Label where these are different, and Variable Label where the two are the same (e.g., flipFormat::Labels(Q3) will show Q3. Age). It falls back to name and, if even this is not provided, it attempts to discern the original name of the argument.
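To make the labelling rules above concrete, here is a minimal sketch that attaches the three attributes to hand-made variables and reads them back. The helper sketch.label is hypothetical: it only approximates the behaviour described for flipFormat::Labels and is not that package's implementation. The variable values are also invented.

```r
# Hypothetical variable with the attributes Displayr would attach.
Q3 <- c(23, 35, 41)
attr(Q3, "name") <- "Q3"
attr(Q3, "question") <- "Q3. Age"
attr(Q3, "label") <- "Q3. Age"
attr(Q3, "label")  # "Q3. Age"

# Hypothetical helper approximating the rules described above
# (NOT the source of flipFormat::Labels).
sketch.label <- function(x) {
    arg.name <- deparse(substitute(x))  # capture the argument's name first
    question <- attr(x, "question")
    label <- attr(x, "label")
    if (!is.null(label)) {
        # "Question Name: Variable Label" only when the two differ
        if (!is.null(question) && question != label)
            return(paste0(question, ": ", label))
        return(label)
    }
    name <- attr(x, "name")
    if (!is.null(name))
        return(name)   # fall back to the original variable name
    arg.name           # last resort: the name of the argument
}

Coffee <- c(3, 0, 7)
attr(Coffee, "question") <- "Q4. Frequency numeric"
attr(Coffee, "label") <- "Coffee"

sketch.label(Q3)      # "Q3. Age"
sketch.label(Coffee)  # "Q4. Frequency numeric: Coffee"
```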
4. Variable Set
You can refer to a Variable Set by its name in R code. Where names contain spaces, they are surrounded by backticks (`), for example `Q4. Frequency numeric`.
Where a Variable Set contains multiple variables, they are provided as a data.frame (a tabular format), and individual variables can be selected using $. For example, `Q4. Frequency numeric`$Coffee returns a variable from the Variable Set called Q4. Frequency numeric. Here, Coffee refers to the name of one of the categories in the Variable Set, and may not correspond to a variable in the original data file (e.g., because the user renamed the category, or created a new category by merging categories).
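The sketch below imitates a two-variable Variable Set with a hand-built data.frame so the backtick and $ syntax can be tried outside Displayr. The values and the Green tea category are hypothetical; inside Displayr the data.frame is supplied for you. Backticks are needed both for the Variable Set name and for any category name that contains a space.

```r
# Hypothetical stand-in for the Variable Set "Q4. Frequency numeric".
# check.names = FALSE keeps the space in "Green tea" instead of "Green.tea".
`Q4. Frequency numeric` <- data.frame(Coffee = c(3, 0, 7),
                                      `Green tea` = c(1, 2, 0),
                                      check.names = FALSE)

ncol(`Q4. Frequency numeric`)                # 2 variables in the set
coffee <- `Q4. Frequency numeric`$Coffee     # select one variable by category name
green <- `Q4. Frequency numeric`$`Green tea` # backticks again for the space
coffee                                       # 3 0 7
```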
5. Multiple Data Sets
If you have multiple Data Sets in the project, and these contain variables or questions with the same names, the data file name is used to distinguish between them (e.g., Cola.sav$`Q3. Age`). See How to Reference and Distinguish between Different R Objects in Displayr for more information.
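The $ syntax for qualifying a name by its data file can be tried with the sketch below. Representing each Data Set as a named list is an assumption made purely for illustration, and Pepsi.sav and all values are hypothetical; the point is only that the same variable name resolves to different data depending on the file prefix.

```r
# Hypothetical stand-ins for two Data Sets that both contain "Q3. Age".
# Modelling each Data Set as a list is an assumption for this sketch.
Cola.sav <- list(`Q3. Age` = c(25, 31, 47))
Pepsi.sav <- list(`Q3. Age` = c(19, 52))

cola.age <- Cola.sav$`Q3. Age`    # Q3. Age from Cola.sav
pepsi.age <- Pepsi.sav$`Q3. Age`  # Q3. Age from Pepsi.sav
length(cola.age)                  # 3
length(pepsi.age)                 # 2
```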