This article outlines the various types of data and how to use them in your R code.
- An R variable, calculation, or a data set.
- Familiarity with the different Structures and Value Attributes for Variable Sets.
Knowing the type of data you are working with in R is useful because many functions require specific data types for their inputs and return specific types as outputs. For example, you can't perform mathematical operations on numbers that are stored as character data. Below is a list of the various data types and examples of what you can do with each.
Note, a few functions used throughout:
head() - shows just the first part of the data,
cbind() - combines data into columns,
rbind() - combines data by rows.
A logical value is either TRUE or FALSE. Logical values are produced by conditions (logical tests) and by some functions. In Displayr, logical variables can also be used as binary filters.
The following are examples that return a logical result:
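As a minimal sketch, assuming a hypothetical numeric vector called age (the values are illustrative only), both a condition and a few base R functions return logical results:

```r
# Hypothetical example values, for illustration only.
age <- c(15, 32, 47)

is_adult <- age >= 18          # a condition, tested on each element
is_adult                       # FALSE TRUE TRUE

any_over_40 <- any(age > 40)   # a function that returns a single TRUE/FALSE
has_missing <- is.na(age)      # a function that returns one TRUE/FALSE per element
class(is_adult)                # "logical"
```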
When the final result is a list of T/F results, Displayr shows them as Xs (false) and check marks (true).
You can use logical results to build a series of if ... else statements.
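A short sketch of this idea, assuming a hypothetical score and illustrative thresholds. The if ... else form works on a single TRUE/FALSE value, while base R's ifelse() applies the same kind of test to every element of a vector:

```r
# Hypothetical score; the thresholds below are illustrative only.
score <- 72

# if ... else branches on a single TRUE/FALSE value
if (score >= 80) {
    grade <- "High"
} else if (score >= 50) {
    grade <- "Medium"
} else {
    grade <- "Low"
}
grade   # "Medium"

# ifelse() evaluates the condition for each element of a vector
scores <- c(35, 72, 91)
result <- ifelse(scores >= 50, "Pass", "Fail")
result  # "Fail" "Pass" "Pass"
```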
You can use conditions to subset data in R. When you put a condition inside square brackets (as described in How to Work with Data in R), the TRUE/FALSE (T/F) results are used to select the data that is TRUE for the condition. In the example below on line 2, we create a vector of data (called a) equal to: 1, 2, 3. On line 4, the a > 2 returns F, F, T inside the brackets to select the number 3 in a. The rest of line 4 changes that number 3 to a 10, which is then displayed in the final result.
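A sketch consistent with that description, laid out so that the vector is created on line 2 and modified on line 4:

```r
# Create a vector called a
a <- c(1, 2, 3)
# a > 2 is FALSE FALSE TRUE, so only the third element is replaced
a[a > 2] <- 10
a   # 1 2 10
```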
A numeric data type is a number that is not enclosed in quotation marks ("") when Show raw R output is checked:
Here, the first 3 numbers are numeric, while the last 3 are text. Note, numeric formulas will only work on data with the correct data type. In Displayr, only numeric and binary data will allow you to adjust decimal places.
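The difference can be sketched in plain R with a hypothetical list whose first three values are numeric and whose last three are text. A list is used here because a single vector would convert every value to character:

```r
# Hypothetical values: the first three are numeric, the last three are text.
values <- list(1, 2, 3, "4", "5", "6")

types <- sapply(values, class)
types
# "numeric" "numeric" "numeric" "character" "character" "character"

sum(1, 2, 3)                          # 6, numeric data supports arithmetic
# sum("4", "5", "6")                  # would stop with an error for character input
total <- sum(as.numeric(c("4", "5", "6")))
total                                 # 15, after converting the text to numbers
```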
A character value is text, also called a string. It is always enclosed in quotation marks ("") when Show raw R output is checked. When Displayr sees "z", it treats it as text. Writing z without quotation marks, however, tells R that you are referencing an R object called z.
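A minimal illustration of the difference, assuming a hypothetical object named z that holds a number:

```r
z <- 5      # an R object called z that holds a number

"z"         # the text z, a character value
z           # the object z, which evaluates to 5

class("z")  # "character"
class(z)    # "numeric"
```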
A date can be either a Date or Date/Time object.
as.Date("2021-05-20") # Date
as.POSIXct("2021-05-20") # Date/time
Date/Time variables in Displayr are POSIXct so they can store a date or date and time. R will pull in the raw dates, but you can view aggregated dates by using the following code:
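One way to aggregate dates, sketched with base R's format() on hypothetical date/time values. This is an assumption about the approach rather than Displayr's own method, and Displayr may provide other helpers for the same task:

```r
# Hypothetical date/time values, for illustration only.
dates <- as.POSIXct(c("2021-05-20 09:30:00", "2021-05-28 14:00:00",
                      "2021-06-03 08:15:00"), tz = "UTC")

by_month <- format(dates, "%Y-%m")   # aggregate to year and month
by_month                             # "2021-05" "2021-05" "2021-06"

by_year <- format(dates, "%Y")       # aggregate to year
by_year                              # "2021" "2021" "2021"

table(by_month)                      # number of records in each month
```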
A factor in R represents categorical data and is equivalent to a Nominal or Ordinal variable in Displayr. A factor contains both a value and a label for each category. The labels are called levels, and they can be retrieved using levels(x). Levels can also be viewed in a raw output:
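As a sketch, assuming a hypothetical factor that stands in for a Nominal variable such as Gender:

```r
# Hypothetical factor standing in for a Nominal variable.
Gender <- factor(c("Male", "Female", "Female", "Male"),
                 levels = c("Male", "Female"))

levels(Gender)   # "Male" "Female"
Gender           # prints the data, followed by "Levels: Male Female"
```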
The below examples produce the data as labels and values separately:
as.character(Gender) # label
as.numeric(Gender) # value
Note, if your data is ordinal, it will appear as an ordered factor.
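Applied to hypothetical data, the two conversions and an ordered factor look like this. Note that in plain R, as.numeric() on a factor returns the position of each label within levels(), which is an assumption worth checking against the values defined in your own data set:

```r
# Hypothetical Nominal data
Gender <- factor(c("Male", "Female", "Female"), levels = c("Male", "Female"))
as.character(Gender)   # "Male" "Female" "Female"
as.numeric(Gender)     # 1 2 2, the position of each label in levels(Gender)

# Hypothetical Ordinal data, stored as an ordered factor
Satisfaction <- factor(c("Low", "High", "Medium"),
                       levels = c("Low", "Medium", "High"), ordered = TRUE)
is.ordered(Satisfaction)   # TRUE
is.ordered(Gender)         # FALSE
```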
Particulars of using factors in your code:
- When using factors in code you can treat them as character variables:
- However, when using factors in certain functions like cbind and rbind, you'll need to convert them to character first:
- Sorting and ordering use the order of the levels, not the alphabetical order of the labels:
- To access the underlying coded values, use the attr() function:
- Certain mathematical functions like max can only be run using Ordinal variables because you can't take a max of unordered categories (like male, female).
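The points in this list can be sketched with two hypothetical factors. The names and values are illustrative, and attributes() is used here only to list what is stored on a factor, since the specific attribute that holds coded values in Displayr is not named above:

```r
# Hypothetical Nominal and Ordinal data
Gender <- factor(c("Male", "Female", "Female"), levels = c("Male", "Female"))
Satisfaction <- factor(c("Low", "High", "Medium"),
                       levels = c("Low", "Medium", "High"), ordered = TRUE)

# Factors can be compared against text as if they were character data
Gender == "Female"                   # FALSE TRUE TRUE

# cbind() on raw factors keeps only the integer codes, so convert first
codes  <- cbind(Gender, Gender)      # a matrix of 1s and 2s
labels <- cbind(as.character(Gender), as.character(Gender))

# Sorting follows the level order (Male, Female), not the alphabet
sorted <- as.character(sort(Gender))
sorted                               # "Male" "Female" "Female"

# List the attributes stored on a factor, then read one with attr()
names(attributes(Gender))            # "levels" "class"
attr(Gender, "levels")               # "Male" "Female"

# max() works for ordered factors but stops with an error for unordered ones
highest <- as.character(max(Satisfaction))
highest                              # "High"
# max(Gender)                        # error: not meaningful for factors
```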