This article describes how to access your data using R code and use it in calculations or other manipulations. This includes:
Working with different data structures
1. Vector = a single row/column or a series of values/strings
2. Matrix = a table showing 1 statistic of one type (either numeric or character)
3. Array = a table showing more than 1 statistic (more than 1 number in each cell)
4. Data.frame = a table showing 1 value in each cell that can have different types
Requirements
- A document with a data set.
- Familiarity with the different Structures and Value Attributes for Variable Sets.
- A Calculation or R variable.
- Knowledge of How to Use Different Types of Data in R.
Method - Accessing your data
There are several ways to access your data within R code. For more detail on our point and click functionality see How to Use Point and Click Inside R Code.
- By dragging and dropping the variables from the Data Sets or Pages tree:
- By clicking on rows/columns in a table:
- By the Label inside backticks:
- By the Name of the variable (found by hovering over the variable in the Data Sets tree as in the screenshot above) or an output on your page (found in Properties > GENERAL > Name):
You can also reference variables in specific data sets by adding a prefix:
Method - Working with different data structures
You can reference a particular part of your data structure using square brackets [] and the appropriate index.
1. Vector
A vector is a series of data points that can be any one data type (character, numeric, etc.), but not a mix of types. If you mix types, R converts every value to a common type, which is character whenever any value is a string. When you reference a single variable in your data set using R, it will be in the form of a vector. One-column tables are also interpreted by R as vectors.
Vectors can also be created manually using the c() function:
numbers = c(2,5,10)
strings = c("hello", "good day", "good bye")
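The single-type rule can be checked directly. The following is a minimal sketch with made-up values; the names nums, mixed and recovered are illustrative only and do not come from any data set.

```r
# Hypothetical values, for illustration only
nums  = c(2, 5, 10)        # all numeric, so the vector stays numeric
mixed = c(2, "five", 10)   # one string turns every element into character

class(nums)    # "numeric"
class(mixed)   # "character"

# as.numeric() converts back; entries that are not numbers become NA
# (R raises a warning, suppressed here to keep the output clean)
recovered = suppressWarnings(as.numeric(mixed))
```

Checking class() before doing arithmetic is a quick way to catch a vector that has silently become character.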
For numeric vectors, you can also create vectors with sequential numbers using the : operator:
#numbers 1 through 10
fingers = 1:10
Or numbers at set intervals using seq():
#even numbers 2 through 10
evens = seq(from=2, to=10, by=2)
String vectors are usually generated automatically, either with paste or by pulling the row or column headers from a table. See How to Use Paste Functions to Create Dynamic Text Using R.
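As a rough illustration of generating strings with paste, the sketch below builds labels from a number sequence. All values and names (brands, labels, codes, sentence) are hypothetical.

```r
# Hypothetical labels, for illustration only
brands = c("Coke", "Pepsi")

# paste() joins its arguments element by element, separated by sep
labels = paste("Brand", 1:3, sep = " ")

# paste0() is paste() with no separator
codes = paste0("Q", 1:3)

# collapse squeezes a whole vector into a single string
sentence = paste(brands, collapse = " and ")
```

Here labels holds "Brand 1", "Brand 2", "Brand 3", codes holds "Q1", "Q2", "Q3", and sentence is the single string "Coke and Pepsi".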
Detailed example:
In the below example, we have a table called fruit:
Referencing:
- The syntax for indexing is: fruit[Item].
- To return the value for Pear, we can use the row number fruit[2] or the row label fruit["Pear"].
- To replace values under 5 with missing data (NA), we can use a condition inside the brackets: fruit[fruit < 5] = NA.
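The referencing options above can be tried on a small named vector. This is a sketch only: the vector below is a hypothetical stand-in for the fruit table, and its values are made up.

```r
# Hypothetical stand-in for the fruit table (values are made up)
fruit = c(Apple = 12, Pear = 3, Banana = 7)

by_position = fruit[2]        # second element
by_label    = fruit["Pear"]   # same element, selected by its label

# A condition inside the brackets selects every matching element,
# so this sets all values under 5 to missing
fruit[fruit < 5] = NA
```

After the last line, Pear is NA while Apple and Banana keep their values. Selecting by label is usually more robust than selecting by position, because it still works if the rows are reordered.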
Other Useful Functions:
- To return the row labels, we use names(fruit).
- To return the number of rows, we use length(fruit) or NROW(fruit).
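A short sketch of these functions on the same kind of hypothetical named vector (made-up values):

```r
# Hypothetical stand-in for the fruit table (values are made up)
fruit = c(Apple = 12, Pear = 3, Banana = 7)

labels = names(fruit)    # "Apple" "Pear" "Banana"
n      = length(fruit)   # 3
n_rows = NROW(fruit)     # 3, because NROW treats a vector as a one-column table

# Lower-case nrow() returns NULL for a vector, since a vector has no dimensions
lower_case = nrow(fruit)
```

NROW is the safer choice when the same code may receive either a vector or a table.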
2. Matrix
A matrix is a table with rows and columns where data is the same data type. In Displayr, built-in tables showing a single statistic, variable sets, and those created by cbind in R will be interpreted as matrices.
This can be created manually using the matrix() function:
#a matrix holds one data type, so the numbers 1, 2, 3 are stored as "1", "2", "3"
tab = matrix(c(c(1,2,3), c("a","b","c")), ncol=2, nrow=3)
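Since cbind is mentioned above as another way to produce a matrix, here is a minimal sketch of it next to the matrix() call. The vectors male and female and their values are hypothetical.

```r
# Hypothetical values, for illustration only
male   = c(10, 20, 30)
female = c(15, 25, 35)

# cbind() binds vectors together as columns; the argument names become column labels
tab = cbind(Male = male, Female = female)

# Mixing numbers and letters in matrix() gives a character matrix
mixed = matrix(c(1, 2, 3, "a", "b", "c"), ncol = 2, nrow = 3)

is.matrix(tab)       # TRUE
is.numeric(tab)      # TRUE
is.character(mixed)  # TRUE
```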
Detailed example:
In the below example, we have a crosstab table called living.alone:
Referencing:
- The syntax for indexing is:
  - living.alone[Row] for a single column SUMMARY table.
  - living.alone[Row , Column] for any other table.
- To return the value for Male, we can use living.alone[1,] or living.alone["Male",].
- If there is only one column, use living.alone[Row , Column , drop = FALSE] to keep the original table dimensions. Otherwise, the result will be interpreted as a vector.
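The effect of drop can be seen on a small matrix. The sketch below uses a hypothetical single-column stand-in for living.alone; the row labels, column label and values are all assumptions made for illustration.

```r
# Hypothetical single-column stand-in for living.alone (labels and values are made up)
living.alone = matrix(c(53, 47, 100), nrow = 3, ncol = 1,
                      dimnames = list(c("Male", "Female", "NET"), "%"))

by_position = living.alone[1, ]        # Male row, selected by row number
by_label    = living.alone["Male", ]   # Male row, selected by label

# Without drop = FALSE the result loses its dimensions and becomes a vector
as_vector = living.alone["Male", ]
# With drop = FALSE the result is still a table (1 row by 1 column)
as_table  = living.alone["Male", , drop = FALSE]

dim(as_vector)   # NULL
dim(as_table)    # 1 1
```

Keeping the dimensions matters when later code calls rownames() or colnames() on the result, since those return NULL for a plain vector.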
Other Useful Functions:
- To return the row labels, we use rownames(living.alone).
- To return the column labels, we use colnames(living.alone).
- To return the table's dimensions, we use dim(living.alone). This will return 3 (rows) and 1 (column).
- To return the number of rows, we use NROW(living.alone).
- To return the number of columns, we use NCOL(living.alone).
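These functions can be combined to look up or change labels. A minimal sketch, again on a hypothetical stand-in for living.alone with made-up labels and values:

```r
# Hypothetical single-column stand-in for living.alone (labels and values are made up)
living.alone = matrix(c(53, 47, 100), nrow = 3, ncol = 1,
                      dimnames = list(c("Male", "Female", "NET"), "%"))

size   = dim(living.alone)    # 3 1
n_rows = NROW(living.alone)   # 3
n_cols = NCOL(living.alone)   # 1

# Find the position of a row from its label
female_row = which(rownames(living.alone) == "Female")   # 2

# The same functions can be used on the left of = to rename labels
rownames(living.alone)[3] = "Total"
```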
3. Array
An array is a multi-layered table where data is the same data type. In Displayr this is a crosstab with multiple statistics.
This can be created manually using the array() function:
tab = array(c(1,2,3), dim=c(3,4,2))
Detailed example:
In the below example, we have a table called living.alone with two statistics:
Referencing:
- The syntax for indexing is: living.alone[Row , Column , Statistic].
- To return the Count value for Male, we can use living.alone[1,,2] or living.alone["Male",,"Count"].
- If there is only one column, use living.alone[Row, Column, Statistic, drop = FALSE] to keep the original table dimensions.
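To make the three-index syntax concrete, the sketch below builds a hypothetical stand-in for the two-statistic living.alone table. The labels, the values and the assumption that Count is the second statistic are for illustration only.

```r
# Hypothetical stand-in: 3 rows x 1 column x 2 statistics (all values made up)
living.alone = array(c(53, 47, 100,      # first statistic:  %
                       120, 105, 225),   # second statistic: Count
                     dim = c(3, 1, 2),
                     dimnames = list(c("Male", "Female", "NET"),
                                     "Living alone",
                                     c("%", "Count")))

by_position = living.alone[1, , 2]              # Male, all columns, second statistic
by_label    = living.alone["Male", , "Count"]   # same cell, selected by labels

# Leaving an index empty keeps everything along that dimension
counts = living.alone[, , "Count"]                  # drops to a named vector
kept   = living.alone[, , "Count", drop = FALSE]    # still a 3 x 1 x 1 array
```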
Other Useful Functions:
- To return the table's dimensions, we use dim(living.alone). This will return 3 (rows), 1 (column) and 2 (statistics).
- To return the row labels, we use rownames(living.alone) or dimnames(living.alone)[[1]].
- To return the column labels, we use colnames(living.alone) or dimnames(living.alone)[[2]].
- To return the statistic labels, we use dimnames(living.alone)[[3]].
- To return the number of rows, we use NROW(living.alone).
- To return the number of columns, we use NCOL(living.alone).
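A brief sketch of these functions on a hypothetical three-dimensional stand-in for living.alone (labels and values are made up):

```r
# Hypothetical stand-in: 3 rows x 1 column x 2 statistics (all values made up)
living.alone = array(c(53, 47, 100, 120, 105, 225),
                     dim = c(3, 1, 2),
                     dimnames = list(c("Male", "Female", "NET"),
                                     "Living alone",
                                     c("%", "Count")))

size  = dim(living.alone)             # 3 1 2
rows  = dimnames(living.alone)[[1]]   # same result as rownames(living.alone)
cols  = dimnames(living.alone)[[2]]   # same result as colnames(living.alone)
stats = dimnames(living.alone)[[3]]   # "%" "Count"

# NROW and NCOL report the first two dimensions only
n_rows = NROW(living.alone)   # 3
n_cols = NCOL(living.alone)   # 1
```

There is no shortcut equivalent to rownames() for the third dimension, which is why the statistic labels are read through dimnames().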
4. Data.frame
A data.frame is a table with rows and columns, like a matrix, but its columns can hold different types of data.
This can be created manually using the data.frame() function:
mydf = data.frame(Numbers=c(1,2,3), Letters=c("a","b","c"))
Detailed example:
Referencing and other useful functions are the same as used when working with a matrix, with some additional functionality below.
You can additionally reference an entire column using $. For example, mydf$Letters would return only the Letters column.
You can also add new columns on the fly with $:
mydf$`More Letters`=c("d","e","f")
mydf
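Putting the data.frame pieces together, the sketch below reuses mydf from above and adds one hypothetical column, Doubled, to show a calculation on an existing column. stringsAsFactors = FALSE is set explicitly as a precaution, since older versions of R convert strings to factors by default.

```r
# Same data frame as above; stringsAsFactors = FALSE keeps Letters as plain text
mydf = data.frame(Numbers = c(1, 2, 3), Letters = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# Each column keeps its own type
types = sapply(mydf, class)        # Numbers is numeric, Letters is character

# Matrix-style referencing still works
second_letter = mydf[2, "Letters"]   # "b"

# $ returns, adds or replaces a whole column
mydf$`More Letters` = c("d", "e", "f")
mydf$Doubled = mydf$Numbers * 2      # hypothetical calculated column
```

Backticks are only needed around column names that contain spaces or other special characters, such as More Letters.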
Get a copy of the examples above in your account by clicking HERE.
See Also
Using R in Displayr Video Series
How R Works Differently in Displayr Compared to Other Programs
How to Use Different Types of Data in R
How to Use Point and Click Inside R Code
How to Reference and Distinguish between Different R Objects in Displayr
How to Work with Conditional R Formulas
How to Extract Data from a Single Column Summary Table
How to Extract Data from a Multiple Column Table
How to Extract Data from a Multiple Column Table with Multiple Statistics
How to Extract Data from a Multiple Column Table with Nested Data