This article describes how to access your data using R code and use it in calculations or other manipulations. This includes:
Working with different data structures
1. Vector = a single row/column or a series of values/strings (most commonly a single variable or single column table in Displayr).
2. Matrix = a variable set with more than one variable, or a table showing one statistic of a single type (all cells are either numeric or character). This covers the built-in drag-and-drop tables and most other tables created by Displayr features.
3. Array = a drag and drop table showing more than 1 statistic (more than 1 number in each cell).
4. Data.frame = a table showing one value in each cell, where different columns can hold different types of data. You may want to convert matrices to data.frames to make the table and its content easier to manipulate. Custom R Calculations can be data.frames.
Requirements
- Some of the functions below may require a Displayr license.
- Familiarity with the different Structures and Value Attributes for Variable Sets
- A Calculation or R variable
- Knowledge of How to Use Different Types of Data in R
Method - Accessing your data
There are several ways to access your data within R code. For more details on our point-and-click functionality, see How to Use Point and Click Inside R Code.
- By dragging and dropping the variables from the Data Sources or Reports tree:
- By clicking on rows/columns in a table:
- By the Label inside backticks:
- By the Name of the variable (found by hovering over the variable in the Data Sources tree as in the screenshot above) or an output on your page (found in General > Name in the object inspector):
Though you cannot reference a Data Set as a whole in the R code, if you have a variable that is named the same in multiple Data Sets, you will need to specify which Data Set to get it from by adding a prefix:
When a variable is referred to in R Code, the variable is automatically uploaded to the R Server prior to any R Code being run. A variable will have the following attributes:
- name. This is the Variable Name.
- question. This is the Question Name.
- label. This is the Variable Label.
While these attributes can be accessed in R in the usual way (e.g., attr(my.variable, "label")), the best way to access them is often flipFormat::Labels, which attempts to construct a label of the form Question Name: Variable Label where these are different, and Variable Label where the two are the same (e.g., flipFormat::Labels(Q3) will show Q3. Age). It falls back to name and, if even this is not provided, it attempts to discern the original name of the argument.
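As a rough sketch of the attr() approach described above, the snippet below builds a stand-in vector in plain R and reads its attributes back. The vector my.variable and its attribute values are hypothetical; in Displayr these attributes would already be attached to a referenced variable, so only the attr() calls would be needed.

```r
# Hypothetical stand-in for a variable referenced in R code.
# In Displayr these attributes are attached automatically; they are
# set by hand here only so the example runs on its own.
my.variable = c(25, 34, 51)
attr(my.variable, "name") = "Q3"
attr(my.variable, "question") = "Q3. Age"
attr(my.variable, "label") = "Q3. Age"

# Read individual attributes
variable.label = attr(my.variable, "label")
question.name = attr(my.variable, "question")

# List the names of every attribute attached to the variable
all.attributes = names(attributes(my.variable))
```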
Method - Working with different data structures
You can reference a particular bit of your data structure using square brackets [ ] and the appropriate index.
1. Vector
A vector is a series of data points of any one data type (character, numeric, etc.), but it cannot hold a mix of types; if you combine types, R converts all values to a common type, typically character. When you reference a single variable, it will be in the form of a vector. One-column tables are also interpreted by R as vectors.
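The type conversion mentioned above can be seen in a short illustration. The vector names are made up for this sketch and are not part of any Displayr document.

```r
# A vector holding only numbers stays numeric
numbers.only = c(2, 5, 10)

# Mixing numbers and text converts every element to character
mixed = c(2, "five", 10)

class(numbers.only)  # "numeric"
class(mixed)         # "character"
```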
Vectors can also be created manually using the c() function:
numbers = c(2,5,10)
strings = c("hello", "good day", "good bye")
For numeric vectors, you can also create vectors of sequential numbers using the : operator:
#numbers 1 through 10
fingers = 1:10
Or numbers at set intervals using seq():
#even numbers 2 through 10
evens = seq(from=2, to=10, by=2)
Strings are usually generated automatically using paste or by pulling the row or column headers from a table; see How to Use Paste Functions to Create Dynamic Text Using R.
Detailed example:
In the example below, we have a table called fruit:
Referencing:
- The syntax for indexing is: fruit[Item].
- To return the value for Pear, we can use the row number fruit[2] or the row label fruit["Pear"].
- To set values under 5 to missing, we can use a condition inside the brackets: fruit[fruit < 5] = NA.
Other Useful Functions:
- To return the row labels, we use names(fruit).
- To return the number of rows, we use length(fruit) or NROW(fruit).
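The vector operations above can be tried outside Displayr with a small named vector. This is only a sketch: the values below are invented stand-ins for the fruit table, chosen so that Pear sits in the second position and falls under 5.

```r
# Hypothetical values standing in for the fruit table
fruit = c(Apple = 12, Pear = 3, Orange = 8)

# Reference Pear by position and by label
pear.by.position = fruit[2]
pear.by.label = fruit["Pear"]

# Row labels and number of rows
fruit.labels = names(fruit)
n.rows = length(fruit)

# Set every value under 5 to missing using a logical condition
fruit[fruit < 5] = NA
fruit
```

Indexing by label is usually the safer choice, since it keeps working if the rows of the table are reordered.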
2. Matrix
A matrix is a table with rows and columns in which all data is of the same data type. In Displayr, built-in tables showing a single statistic, variable sets, and tables created by cbind in R will be interpreted as matrices.
This can be created manually using the matrix() function:
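tab = matrix(c(c(1,2,3), c("a","b","c")), ncol=2, nrow=3)
Because a matrix holds a single data type, the numbers in this example are stored as character strings.
Detailed example: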
In the example below, we have a crosstab table called living.alone:
Referencing:
- The syntax for indexing is:
  - living.alone[Row] for a single-column SUMMARY table.
  - living.alone[Row , Column] for any other table.
- To return the value for Male, we can use living.alone[1,] or living.alone["Male",].
- If there is only one column, use living.alone[Row , Column , drop = FALSE] to keep the original table dimensions. Otherwise, the result will be interpreted as a vector.
Other Useful Functions:
- To return the row labels, we use rownames(living.alone).
- To return the column labels, we use colnames(living.alone).
- To return the table's dimensions, we use dim(living.alone). This will return 3 (rows) and 1 (column).
- To return the number of rows, we use NROW(living.alone).
- To return the number of columns, we use NCOL(living.alone).
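To see how drop affects the result, the sketch below builds a 3 x 1 matrix by hand. The row labels other than Male, the column label, and all values are assumptions made for illustration; a real Displayr table would supply its own.

```r
# Hypothetical single-column table standing in for living.alone
living.alone = matrix(c(40, 60, 100), nrow = 3, ncol = 1,
                      dimnames = list(c("Male", "Female", "NET"), "%"))

# Without drop = FALSE the result loses its dimensions and becomes a vector
male.value = living.alone["Male", ]

# With drop = FALSE the result stays a 1 x 1 matrix
male.table = living.alone["Male", , drop = FALSE]

dim(living.alone)       # 3 1
rownames(living.alone)  # "Male" "Female" "NET"
NROW(living.alone)      # 3
NCOL(living.alone)      # 1
```

Keeping the dimensions matters when later code calls rownames() or colnames() on the result, because those return NULL for a plain vector.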
3. Array
An array is a multi-layered table in which all data is of the same data type. In Displayr, this is a crosstab with multiple statistics.
This can be created manually using the array() function:
tab = array(c(1,2,3), dim=c(3,4,2))
Detailed example:
In the example below, we have a table called living.alone with two statistics:
Referencing:
- The syntax for indexing is: living.alone[Row , Column , Statistic].
- To return the Count value for Male, we can use living.alone[1,,2] or living.alone["Male",,"Count"].
- If there is only one column, use living.alone[Row, Column, Statistic, drop = FALSE] to keep the original table dimensions.
Other Useful Functions:
- To return the table's dimensions, we use dim(living.alone). This will return 3 (rows), 1 (column), and 2 (statistics).
- To return the row labels, we use rownames(living.alone) or dimnames(living.alone)[[1]].
- To return the column labels, we use colnames(living.alone) or dimnames(living.alone)[[2]].
- To return the statistic labels, we use dimnames(living.alone)[[3]].
- To return the number of rows, we use NROW(living.alone).
- To return the number of columns, we use NCOL(living.alone).
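The same ideas carry over to a three-dimensional array. The sketch below assumes a 3 x 1 x 2 layout with Count as the second statistic, matching the indexing shown above; every label other than Male and Count, and all of the numbers, are made up for illustration.

```r
# Hypothetical table with two statistics standing in for living.alone.
# Values fill the first statistic layer first, then the second.
living.alone = array(c(40, 60, 100, 120, 180, 300),
                     dim = c(3, 1, 2),
                     dimnames = list(c("Male", "Female", "NET"),
                                     "Living alone",
                                     c("%", "Count")))

# Count for Male, by position and by label
male.count.by.position = living.alone[1, , 2]
male.count.by.label = living.alone["Male", , "Count"]

# Keep all three dimensions in the result
male.count.table = living.alone["Male", , "Count", drop = FALSE]

# Statistic labels come from the third set of dimnames
statistic.labels = dimnames(living.alone)[[3]]
```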
4. Data.frame
A data.frame is a table with rows and columns, like a matrix, but can be a mix of different types of data.
This can be created manually using the data.frame() function:
mydf = data.frame(Numbers=c(1,2,3), Letters=c("a","b","c"))
Detailed example:
Referencing and other useful functions are the same as those used when working with a matrix, with some additional functionality below.
You can additionally reference an entire column using $. For example, mydf$Letters would return only the Letters column.
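A short sketch of these referencing options, reusing the mydf example, is shown below. The matrix m and the variable names that hold the results are hypothetical and serve only to illustrate converting a matrix with as.data.frame().

```r
mydf = data.frame(Numbers = c(1, 2, 3), Letters = c("a", "b", "c"))

# Reference a whole column with $
letters.column = mydf$Letters

# Matrix-style referencing also works on a data.frame
second.row = mydf[2, ]
one.cell = mydf[2, "Numbers"]

# Converting a (hypothetical) matrix to a data.frame
m = matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("A", "B")))
m.df = as.data.frame(m)
m.df$B
```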
You can also add new columns on the fly with $:
mydf$`More Letters`=c("d","e","f")
mydf
Get a copy of the examples above in your account by clicking HERE.
See Also
Using R in Displayr Video Series
How R Works Differently in Displayr Compared to Other Programs
How to Use Different Types of Data in R
How to Use Point and Click Inside R Code
How to Reference Different Items in Your Document in R
How to Work with Conditional R Formulas
How to Extract Data from a Single Column Summary Table
How to Extract Data from a Multiple Column Table
How to Extract Data from a Multiple Column Table with Multiple Statistics
How to Extract Data from a Multiple Column Table with Nested Data