This article describes how to access your data using R code and use it in calculations or other manipulations. This includes:
Working with different data structures
1. Vector = a single row/column or a series of values/strings (most commonly a single variable or single column table in Displayr)
2. Matrix = a variable set with more than one variable, or a table showing 1 statistic, where every cell is the same type (all numeric or all character)
3. Array = a table showing more than 1 statistic (more than 1 number in each cell)
4. Data.frame = a table showing 1 value in each cell that can have different types of data in the cells
Requirements
- Some of the functions below may require a Displayr license.
- Familiarity with the different Structures and Value Attributes for Variable Sets
- A Calculation or R variable
- Knowledge of How to Use Different Types of Data in R
Method - Accessing your data
There are several ways to access your data within R code. For more detail on our point-and-click functionality see How to Use Point and Click Inside R Code.
- By dragging and dropping the variables from the Data Sources or Reports tree:
- By clicking on rows/columns in a table:
- By the Label inside backticks:
- By the Name of the variable (found by hovering over the variable in the Data Sources tree as in the screenshot above) or an output on your page (found in General > Name):
You can also reference variables in specific data sets by adding a prefix:
When a variable from a data set is referred to in R Code, the variable is automatically uploaded to the R Server prior to any R Code being run. A variable will have the following attributes:
- name. This is the Variable Name.
- question. This is the Question Name.
- label. This is the Variable Label.
While these attributes can be accessed in R in the usual way (e.g., attr(my.variable, "label")), the best way to access them is often flipFormat::Labels. It attempts to construct a label of the form Question Name: Variable Label where these are different, and Variable Label where the two are the same (e.g., flipFormat::Labels(Q3) will show Q3. Age). It falls back to name and, if even this is not provided, attempts to discern the original name of the argument.
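As a rough illustration of the attributes above, the sketch below builds a stand-in variable in plain R and reads its attributes with attr(). The variable Q3, its values, and its attribute values are hypothetical; in Displayr the attributes are attached for you when the variable is uploaded. The final lines mimic the labelling rule by hand and are an assumption for illustration, not the flipFormat::Labels implementation.

```r
# Hypothetical stand-in for a variable referenced from a data set.
Q3 = c(25, 34, 52)
attr(Q3, "name") = "Q3"
attr(Q3, "question") = "Q3. Age"
attr(Q3, "label") = "Q3. Age"

# Read a single attribute in the usual way.
variable.label = attr(Q3, "label")

# Hand-rolled approximation of the labelling rule described above
# (an assumption, not the flipFormat::Labels source code).
question.name = attr(Q3, "question")
display.label = if (identical(question.name, variable.label)) {
    variable.label
} else {
    paste0(question.name, ": ", variable.label)
}
display.label
```

Because the question name and variable label match here, only the variable label is kept.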
Method - Working with different data structures
You can reference a particular bit of your data structure using square brackets [ ] and the appropriate index.
1. Vector
A vector is a series of data points of any one data type (character, numeric, etc.), but not a mix of types; if types are mixed, R converts every element to a single common type, typically character. When you reference a single variable in your data set using R, it will be in the form of a vector. One-column tables are also interpreted by R as vectors.
Vectors can also be created manually using the c() function:
numbers = c(2,5,10)
strings = c("hello", "good day", "good bye")
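To see the single-type rule in practice, the short sketch below compares a purely numeric vector with one that mixes numbers and text. The name mixed is made up for this illustration.

```r
# All elements share one type, so the vector stays numeric.
numbers = c(2, 5, 10)
class(numbers)

# Mixing numbers and text converts every element to character.
mixed = c(2, "five", 10)
class(mixed)
mixed
```

Once converted, the numbers in mixed are text and can no longer be used in arithmetic without converting them back, for example with as.numeric().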
For numeric vectors, you can also create vectors with sequential numbers using the : operator:
#numbers 1 through 10
fingers = 1:10
Or numbers at set intervals using seq():
#even numbers from 2 through 10
evens = seq(from=2, to=10, by=2)
Strings are usually generated automatically using paste or by pulling the row or column headers from a table; see How to Use Paste Functions to Create Dynamic Text Using R.
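As a small sketch of how these pieces combine, the example below builds numeric vectors and then uses paste to generate one string per number. The "Wave" prefix and the name wave.labels are hypothetical.

```r
# Sequential and evenly spaced numeric vectors.
fingers = 1:10
evens = seq(from = 2, to = 10, by = 2)

# paste() is vectorised, so one call returns a string for each number.
wave.labels = paste("Wave", evens)
wave.labels
```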
Detailed example:
In the below example, we have a table called fruit:
Referencing:
- The syntax for indexing is: fruit[Item].
- To return the value for Pear, we can use the row number, fruit[2], or the row label, fruit["Pear"].
- To replace values under 5 with missing data, we can use a condition inside brackets: fruit[fruit < 5] = NA.
Other Useful Functions:
- To return the row labels, we use names(fruit).
- To return the number of rows, we use length(fruit) or NROW(fruit).
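The vector operations above can be tried outside Displayr with a named vector standing in for the fruit table. This is only a sketch: the row labels other than Pear and all of the values are made up.

```r
# Hypothetical stand-in for the one-column fruit table (values are made up).
fruit = c(Apple = 12, Pear = 3, Banana = 8)

# Index by position or by label; both return the Pear entry.
by.position = fruit[2]
by.label = fruit["Pear"]

# Replace values under 5 with missing data.
fruit[fruit < 5] = NA

names(fruit)   # row labels
length(fruit)  # number of rows
```

Note that the condition fruit < 5 is itself a vector of TRUE/FALSE values, one per row, which is why it can be used as an index.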
2. Matrix
A matrix is a table with rows and columns where data is the same data type. In Displayr, built-in tables showing a single statistic, variable sets, and those created by cbind in R will be interpreted as matrices.
This can be created manually using the matrix() function:
tab = matrix(c(c(1,2,3), c("a","b","c")), ncol=2, nrow=3)
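Because a matrix holds a single data type, the numbers in the call above end up stored as text. The sketch below checks this and contrasts it with an all-numeric matrix; the name nums is hypothetical.

```r
tab = matrix(c(c(1, 2, 3), c("a", "b", "c")), ncol = 2, nrow = 3)
typeof(tab)   # every cell is character
tab[, 1]      # the numbers are stored as text

# Supplying only numbers keeps the matrix numeric.
nums = matrix(c(1, 2, 3, 4, 5, 6), ncol = 2, nrow = 3)
typeof(nums)
```

If you need numbers and text side by side with their types preserved, a data.frame is usually the better fit.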
Detailed example:
In the below example, we have a crosstab table called living.alone:
Referencing:
- The syntax for indexing is:
  - living.alone[Row] for a single-column SUMMARY table.
  - living.alone[Row , Column] for any other table.
- To return the value for Male, we can use living.alone[1,] or living.alone["Male",].
- If there is only one column, use living.alone[Row , Column , drop = FALSE] to keep the original table dimensions. Otherwise, the result will be interpreted as a vector.
Other Useful Functions:
- To return the row labels, we use rownames(living.alone).
- To return the column labels, we use colnames(living.alone).
- To return the table's dimensions, we use dim(living.alone). This will return 3 (rows) and 1 (column).
- To return the number of rows, we use NROW(living.alone).
- To return the number of columns, we use NCOL(living.alone).
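A minimal sketch of the matrix operations above, using a hand-built matrix in place of the living.alone table. The row labels other than Male, the column label, and all values are assumptions made for illustration.

```r
# Hypothetical stand-in for living.alone: 3 rows, 1 column (values made up).
living.alone = matrix(c(18, 22, 20), nrow = 3, ncol = 1,
                      dimnames = list(c("Male", "Female", "NET"), "Living alone"))

# Selecting a row drops the result to a vector...
male = living.alone["Male", ]
is.matrix(male)

# ...unless drop = FALSE keeps the table dimensions.
male.table = living.alone["Male", , drop = FALSE]
dim(male.table)

rownames(living.alone)
colnames(living.alone)
dim(living.alone)
```

Keeping the dimensions matters when later code expects a table, for example when calling rownames() on the result.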
3. Array
An array is a multi-layered table where data is the same data type. In Displayr this is a crosstab with multiple statistics.
This can be created manually using the array() function:
tab = array(c(1,2,3), dim=c(3,4,2))
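Note that the call above supplies only three values for an array with 3 x 4 x 2 = 24 cells, so R recycles them to fill every cell. The sketch below inspects the result.

```r
tab = array(c(1, 2, 3), dim = c(3, 4, 2))
dim(tab)      # 3 rows, 4 columns, 2 layers
length(tab)   # 24 cells filled from 3 recycled values
tab[, , 1]    # the first layer
```

Each row holds a single repeated value because the three values are recycled down the rows first.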
Detailed example:
In the below example, we have a table called living.alone with two statistics:
Referencing:
- The syntax for indexing is: living.alone[Row , Column , Statistic].
- To return the Count value for Male, we can use living.alone[1,,2] or living.alone["Male",,"Count"].
- If there is only one column, use living.alone[Row, Column, Statistic, drop = FALSE] to keep the original table dimensions.
Other Useful Functions:
- To return the table's dimensions, we use dim(living.alone). This will return 3 (rows), 1 (column), and 2 (statistics).
- To return the row labels, we use rownames(living.alone) or dimnames(living.alone)[[1]].
- To return the column labels, we use colnames(living.alone) or dimnames(living.alone)[[2]].
- To return the statistic labels, we use dimnames(living.alone)[[3]].
- To return the number of rows, we use NROW(living.alone).
- To return the number of columns, we use NCOL(living.alone).
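The array operations above can be sketched with a hand-built array in place of the two-statistic living.alone table. Only the Male row label and the Count statistic in second position come from the text; the other labels and all values are hypothetical.

```r
# Hypothetical stand-in: 3 rows, 1 column, 2 statistics (values made up).
living.alone = array(
    c(45, 55, 100,    # first statistic: percentages
      90, 110, 200),  # second statistic: counts
    dim = c(3, 1, 2),
    dimnames = list(c("Male", "Female", "NET"),
                    "Living alone",
                    c("%", "Count")))

# Index by label or by position; both return the Count for Male.
male.count = living.alone["Male", , "Count"]
same.by.position = living.alone[1, , 2]

# drop = FALSE keeps all three dimensions.
dim(living.alone["Male", , "Count", drop = FALSE])

dimnames(living.alone)[[3]]  # statistic labels
```

Indexing by label is usually safer than indexing by position, since it keeps working if the order of rows or statistics changes.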
4. Data.frame
A data.frame is a table with rows and columns, like a matrix, but its columns can hold different types of data.
This can be created manually using the data.frame() function:
mydf = data.frame(Numbers=c(1,2,3), Letters=c("a","b","c"))
Detailed example:
Referencing and other useful functions are the same as used when working with a matrix, with some additional functionality below.
You can additionally reference an entire column using $. For example, mydf$Letters would return only the Letters column.
You can also add new columns on the fly with $:
mydf$`More Letters`=c("d","e","f")
mydf
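Putting the data.frame pieces together, the sketch below shows that each column keeps its own type and that matrix-style indexing and $ both work. The stringsAsFactors = FALSE argument is an addition not in the original call; it is included only so that text columns stay character in older versions of R.

```r
# stringsAsFactors = FALSE is an added assumption for older R versions.
mydf = data.frame(Numbers = c(1, 2, 3), Letters = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# Each column keeps its own type.
sapply(mydf, class)

# Matrix-style indexing and $ both work.
mydf[2, "Letters"]
mydf$Numbers

# Backticks allow a column name that contains a space.
mydf$`More Letters` = c("d", "e", "f")
dim(mydf)
```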
Get a copy of the examples above in your account by clicking HERE.
See Also
Using R in Displayr Video Series
How R Works Differently in Displayr Compared to Other Programs
How to Use Different Types of Data in R
How to Use Point and Click Inside R Code
How to Reference Different Items in Your Document in R
How to Work with Conditional R Formulas
How to Extract Data from a Single Column Summary Table
How to Extract Data from a Multiple Column Table
How to Extract Data from a Multiple Column Table with Multiple Statistics
How to Extract Data from a Multiple Column Table with Nested Data