You can add data sets to Displayr by writing R code. This article describes how to:
- Add a Data Set Using R Code
- Optimize the Data Set's Structure Using Names, Labels, Values, and Order
Please note this requires the Data Stories module or a Displayr license.
Add a Data Set Using R Code
1. Select Data Sources > Plus (+) > R.
2. Enter a name for the data set under Name.
3. Enter your R code where it states "Enter your R code here".
Note that your R code must return a tabular result (e.g., a data frame).
4. Press OK to import your data. It will then be added to your Data Sources tree.
5. OPTIONAL: You can test your code first by selecting the Calculation icon > Custom Code from the toolbar, which makes troubleshooting easier.
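For illustration, a minimal sketch of code that could be entered in step 3 is shown below. The column names and values are invented for the example; the only point is that the code's final value is a data frame. Ending with the object's name, as the later examples in this article do, makes the returned result explicit.

```r
# Hypothetical example data: any code whose final value is a
# data frame (or another tabular object) should satisfy step 3
my.df = data.frame(
  id     = 1:5,
  region = factor(c("North", "South", "South", "East", "North")),
  score  = c(7.5, 6.0, 8.2, 5.9, 9.1)
)
my.df  # return the data frame as the code's final value
```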
Note that, unlike when using R locally, you cannot import a data set from your local C: drive in Displayr, as a browser cannot access it. Instead, make the file available via a URL. See How to Import Data via URL.
Optimize the Data Set's Structure Using Names, Labels, Values, and Order
Displayr data sets contain many concepts that don't exist natively in R data frames (e.g., variable labels, variable sets, and "factors" with negative values). When Displayr imports data, it applies a large number of expert systems to clean and organize the data. These expert systems can also be used to tidy data sets read in via R code, as follows:
- Use consistent structures in variable names
- Use variable labels to indicate variable sets and variable labels
- Add values using the values attribute
- Place variables you wish to be grouped into variable sets next to each other
Use consistent structures in variable names
The code below generates a data frame containing two numeric variables, one factor, and one Date variable:
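# Response labels, each mapped to a numeric value (only the labels are used here)
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)
my.df = data.frame(q1 = round((0:99)/100),   # 0s and 1s
                   q2 = 1,                    # constant value of 1
                   q3 = rep(factor(names(attitude)), 25),
                   q4 = seq(as.Date("2000/1/1"), by = "month", length.out = 100))
my.df  # return the data frame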
When this code is used to create a new R data set, Displayr creates four variable sets:
- A Numeric variable set containing one variable, q1.
- A second Numeric variable set containing one variable, q2.
- A Nominal variable set, q3. (This is Displayr's equivalent of an R factor.)
- A Date/Time variable set, q4.
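Before importing, it can help to confirm the R-side type of each column, since that is what Displayr interprets. A minimal sketch using base R only; the name col.classes is introduced here for illustration. For the data frame above it should report numeric, numeric, factor, and Date.

```r
# Recreate the example data frame
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)
my.df = data.frame(q1 = round((0:99)/100),
                   q2 = 1,
                   q3 = rep(factor(names(attitude)), 25),
                   q4 = seq(as.Date("2000/1/1"), by = "month", length.out = 100))

# First class of each column, i.e. the R type Displayr sees for each variable
col.classes = sapply(my.df, function(x) class(x)[1])
col.classes
```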
The code below is identical to the first example in this section, except that the first two variables have been renamed q1a and q1b:
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)
# q1a and q1b now share the common stem q1
my.df = data.frame(q1a = round((0:99)/100),
                   q1b = 1,
                   q3 = rep(factor(names(attitude)), 25),
                   q4 = seq(as.Date("2000/1/1"), by = "month", length.out = 100))
my.df  # return the data frame
When a new R data set is created using this code (you will get a different result if you modify an existing data set), Displayr deduces that q1a and q1b belong in the same variable set and joins them together in a Binary - Multi variable set (appropriate for multiple response data). It also names the set q1 and gives its two variables the labels a and b, respectively.
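With many sub-questions, the consistent stem-plus-suffix names can be generated rather than typed. The sketch below assumes three hypothetical binary columns filled with made-up random data; it only demonstrates the R-side naming (q1a, q1b, q1c) and does not reproduce Displayr's grouping logic.

```r
set.seed(1)  # reproducible made-up data
n = 100
# Three hypothetical 0/1 responses to one multiple-response question
q1.responses = as.data.frame(matrix(rbinom(n * 3, size = 1, prob = 0.5), ncol = 3))
# Give every column the shared stem "q1" plus a letter suffix
names(q1.responses) = paste0("q1", letters[1:3])
my.df = cbind(q1.responses, q2 = seq_len(n))
my.df  # columns: q1a, q1b, q1c, q2
```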
Use variable labels to indicate variable sets and variable labels
Variable labels can be passed in by setting an R attribute called label on each variable:
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)
my.df = data.frame(q1 = round((0:99)/100),
                   q2 = 1,
                   q3 = rep(factor(names(attitude)), 25))
# Labels sharing the "Q1." prefix signal that q1 and q2 belong together
attr(my.df[[1]], "label") = "Q1. Dog"
attr(my.df[[2]], "label") = "Q1. Cat"
attr(my.df[[3]], "label") = "Q3. Attitude"
my.df  # return the data frame
When this is read into Displayr, the first two variables are again grouped into a Binary - Multi variable set, even though their variable names no longer share a common structure. Displayr groups them because their variable labels share a common structure. Displayr has also:
- Deduced that the variable set should be called Q1.
- Given the two variables in the set the labels Dog and Cat.
- Omitted the period, recognizing it as unhelpful punctuation.
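When many variables need labels, writing one attr() line per column gets repetitive. The helper below, set.labels, is a hypothetical convenience function (not part of Displayr or base R) that sets the label attribute for each column named in a named character vector.

```r
# Hypothetical helper: names of `labels` are column names, values are the labels
set.labels = function(df, labels) {
  for (col in names(labels)) {
    attr(df[[col]], "label") = labels[[col]]
  }
  df
}

my.df = data.frame(q1 = round((0:99)/100),
                   q2 = 1)
my.df = set.labels(my.df, c(q1 = "Q1. Dog", q2 = "Q1. Cat"))
sapply(my.df, attr, "label")  # check the label on each column
```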
Add values using the values attribute
The code below uses the attribute called values to associate values with specific labels:
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)
my.df = data.frame(q1 = round((0:99)/100),
                   q2 = 1,
                   q3 = rep(factor(names(attitude)), 25))
attr(my.df[[1]], "label") = "Q1. Dog"
attr(my.df[[2]], "label") = "Q1. Cat"
attr(my.df[[3]], "label") = "Q3. Attitude"
# Associate a numeric value with each of the factor's labels
attr(my.df[[3]], "values") = attitude
my.df  # return the data frame
In the resulting data set:
- The variable is still named q3.
- The label is Q3. Attitude.
- The values are correctly associated with the labels (i.e., they are not coerced to positive integers, as done with R factors by default).
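One R detail is worth checking here: factor() sorts its levels alphabetically by default, so the factor's internal codes 1 to 4 follow "Agree", "Disagree", "Strongly agree", "Strongly disagree" rather than the order of the scale. The sketch below shows this, builds the factor with levels = names(attitude) so the levels follow the scale, and looks up each response's value by its label (the association the values attribute expresses). This article does not say whether Displayr uses the factor's level order, so treat the explicit levels argument as an optional tidy-up.

```r
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)

# Default: levels are sorted alphabetically
q3.default = factor(names(attitude))
levels(q3.default)

# Explicit levels keep the scale's natural order
q3 = rep(factor(names(attitude), levels = names(attitude)), 25)
attr(q3, "values") = attitude

# Look up each response's numeric value by its label
q3.values = unname(attitude[as.character(q3)])
head(q3.values, 4)
```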
Place variables you wish to be grouped into variable sets next to each other
In this next example, we have two factors next to each other, q3 and q4:
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)
my.df = data.frame(q1 = round((0:99)/100),
                   q2 = 1,
                   q3 = rep(factor(names(attitude)), 25),
                   q4 = rep(factor(names(attitude)), 25))
attr(my.df[[1]], "label") = "Q1. Dog"
attr(my.df[[2]], "label") = "Q1. Cat"
# Adjacent factors with labels of the form "Q3. Attitude: <statement>"
attr(my.df[[3]], "label") = "Q3. Attitude: I love cola"
attr(my.df[[4]], "label") = "Q3. Attitude: I love margarine"
# Both factors share the same values
attr(my.df[[3]], "values") = attitude
attr(my.df[[4]], "values") = attitude
my.df  # return the data frame
When this is read into Displayr, Displayr infers that, because q3 and q4 are adjacent and have consistent labels and values, they should be grouped into a single Nominal - Multi variable set.
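The same pattern scales to a longer battery of statements. The sketch below builds the adjacent factor columns in a loop, giving each the same values attribute and a label of the form "Q3. Attitude: <statement>". The statements, the column names q3a, q3b, q3c, and the randomly sampled responses are all hypothetical.

```r
attitude = c("Strongly disagree" = -2, "Disagree" = -1, "Agree" = 1, "Strongly agree" = 2)
# Hypothetical statements; each becomes one factor column, kept adjacent
statements = c("I love cola", "I love margarine", "I love coffee")

set.seed(123)  # reproducible made-up responses
n = 100
my.df = data.frame(q1 = round((0:99)/100))
for (i in seq_along(statements)) {
  col = paste0("q3", letters[i])
  my.df[[col]] = factor(sample(names(attitude), n, replace = TRUE),
                        levels = names(attitude))
  attr(my.df[[col]], "label") = paste0("Q3. Attitude: ", statements[i])
  attr(my.df[[col]], "values") = attitude
}
my.df  # return the data frame
```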