This article outlines the various types of data and how to use them in your R code.
- An R variable, calculation, or a data set.
- Familiarity with the different Structures and Value Attributes for Variable Sets.
Knowing the type of data you are working with in R is useful because certain functions require specific data types for their inputs and outputs. For example, you can't perform mathematical operations on numbers that are stored with a character data type. Below is a list of the various data types and examples of what you can do with each.
Note that a few functions are used throughout:
head() - shows just the first part of the data
cbind() - combines data by columns
rbind() - combines data by rows
A logical value is always either TRUE or FALSE. Logical values are created by a condition (a logical test) and by some functions. In Displayr, logical variables can also be used as binary filters.
Below are examples of expressions that return a logical result:
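As a minimal sketch, the following base R expressions each return a logical result. The vector x and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical data used only for illustration
x <- c(3, 7, 10)

greater_than_5 <- x > 5                  # comparison on each element
equals_10      <- x == 10                # equality test on each element
is_missing     <- is.na(c(1, NA, 3))     # a function that returns logicals
is_member      <- "a" %in% c("a", "b")   # membership test
is_num         <- is.numeric(x)          # type check

greater_than_5  # FALSE TRUE TRUE
equals_10       # FALSE FALSE TRUE
is_missing      # FALSE TRUE FALSE
is_member       # TRUE
is_num          # TRUE
```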
When the final result is a list of T/F results, Displayr shows them as Xs (false) and check marks (true).
You can use logical results to create a series of if ... else statements.
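A short sketch of this idea, assuming a hypothetical score and made-up cut-offs: each condition evaluates to TRUE or FALSE, and the first TRUE branch is run. For a whole vector of values, the vectorised ifelse() function plays the same role.

```r
# Hypothetical value and cut-offs, for illustration only
score <- 72

if (score >= 80) {
  grade <- "High"
} else if (score >= 50) {
  grade <- "Medium"
} else {
  grade <- "Low"
}
grade  # "Medium"

# if/else tests a single value; ifelse() applies the test to every element
scores <- c(35, 72, 91)
grades <- ifelse(scores >= 80, "High",
                 ifelse(scores >= 50, "Medium", "Low"))
grades  # "Low" "Medium" "High"
```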
You can use conditions to subset data in R. When you put a condition inside square brackets (as described in How to Work with Data in R), the TRUE/FALSE (T/F) results are used to select the data that is TRUE for the condition. In the example below on line 2, we create a vector of data (called a) equal to: 1, 2, 3. On line 4, the a > 2 returns F, F, T inside the brackets to select the number 3 in a. The rest of line 4 changes that number 3 to a 10, which is then displayed in the final result.
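A minimal sketch of such an example, laid out so that the vector is created on line 2 and the replacement happens on line 4:

```r
# Create a vector of data called a
a <- c(1, 2, 3)
# a > 2 is FALSE FALSE TRUE, so only the third element is selected and replaced
a[a > 2] <- 10
a  # 1 2 10
```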
A numeric data type is a number that is not enclosed in quotation marks ("") when Show raw R output is checked:
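As an illustrative sketch with hypothetical values, the code below prints three numeric values followed by three values stored as text. Only the text values appear in quotation marks in the raw output.

```r
# Hypothetical values for illustration
numbers <- c(1, 2, 3)        # numeric: printed without quotation marks
text    <- c("4", "5", "6")  # character: printed with quotation marks

print(numbers)
print(text)

total_numbers <- sum(numbers)           # works, because the data is numeric
total_text    <- sum(as.numeric(text))  # text must be converted first
```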
Here, the first 3 numbers are numeric, while the last 3 are text. Note that numeric calculations only work on data with a numeric data type. In Displayr, only numeric and binary data allow you to adjust decimal places.
A character is text, also called a string. It is always enclosed in quotation marks ("") when Show raw R output is checked. When Displayr sees "z", it treats it as text. However, writing z without quotation marks in your code tells Displayr that you are referencing an R object called z.
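A small sketch of the difference, assuming a hypothetical R object named z has been created:

```r
# Hypothetical R object named z
z <- c(1, 2, 3)

class_of_text   <- class("z")  # the quoted "z" is text
class_of_object <- class(z)    # the unquoted z refers to the object

class_of_text    # "character"
class_of_object  # "numeric"
```

If no object called z exists, the unquoted z produces an "object not found" error.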
A date can be either a Date or Date/Time object.
as.Date("2021-05-20") # Date
as.POSIXct("2021-05-20") # Date/time
Date/Time variables in Displayr are POSIXct, so they can store either a date or a date and time. R will pull in the raw dates, but you can view aggregated dates by using the following code:
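As a minimal sketch in base R, which may differ from the Displayr-specific approach, dates can be aggregated to months with format(). The dates vector is hypothetical.

```r
# The two kinds of date object
d  <- as.Date("2021-05-20")
dt <- as.POSIXct("2021-05-20 14:30:00", tz = "UTC")
class(d)   # "Date"
class(dt)  # "POSIXct" "POSIXt"

# Hypothetical raw dates, aggregated to year-month (an assumption for
# illustration, not necessarily how Displayr aggregates dates)
dates  <- as.Date(c("2021-05-20", "2021-05-28", "2021-06-03"))
months <- format(dates, "%Y-%m")
months         # "2021-05" "2021-05" "2021-06"
table(months)  # number of observations in each month
```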
A factor in R is equivalent to a category in data, and corresponds to Nominal or Ordinal variables in Displayr. A factor contains both a value and a label. The labels are called levels and can be retrieved using levels(x). Levels can also be viewed in a raw output:
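For illustration, the sketch below builds a hypothetical Gender factor in plain R (in Displayr the variable would come from your data set) and shows its levels both through levels() and in the printed raw output.

```r
# Hypothetical factor for illustration
Gender <- factor(c("Male", "Female", "Female", "Male"),
                 levels = c("Male", "Female"))

gender_levels <- levels(Gender)
gender_levels  # "Male" "Female"

# Printing the factor lists the data, followed by a "Levels:" line
print(Gender)
```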
The examples below return the data as labels and values separately:
as.character(Gender) # label
as.numeric(Gender) # value
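A runnable sketch of these two calls, using a hypothetical Gender factor. In plain R, as.numeric() returns the position of each level (1, 2, ...), which is an assumption to keep in mind if the values in your data are coded differently.

```r
# Hypothetical factor for illustration
Gender <- factor(c("Male", "Female", "Female"),
                 levels = c("Male", "Female"))

gender_labels <- as.character(Gender)  # labels
gender_values <- as.numeric(Gender)    # values (level positions in plain R)

gender_labels  # "Male" "Female" "Female"
gender_values  # 1 2 2
```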
Note, if your data is ordinal, it will appear as an ordered factor.
If the categories have been merged, this merging will be reflected in the way the data appears in R. This is done as follows:
- If all the categories of the variable are mutually exclusive and exhaustive, they all appear in R.
- Any categories that are missing (i.e., hidden) are inserted so that the categories are mutually exclusive and exhaustive.
Particulars of using factors in your code:
- When using factors in code you can treat them as character variables:
- However, when using factors in certain functions like cbind and rbind, you'll need to convert them to character first:
- Sorting and ordering use the sequential order of the levels, not the labels:
To access the underlying coded values, use the attr() function:
- Certain mathematical functions like max can only be run using Ordinal variables because you can't take a max of unordered categories (like male, female).
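The points in the list above can be sketched in base R as follows. The Gender and Satisfaction factors are hypothetical and stand in for Nominal and Ordinal variables.

```r
# Hypothetical factors for illustration
Gender <- factor(c("Male", "Female", "Female"),
                 levels = c("Male", "Female"))
Satisfaction <- factor(c("Low", "High", "Medium"),
                       levels = c("Low", "Medium", "High"),
                       ordered = TRUE)

# Factors can be compared with text as if they were character variables
is_female <- Gender == "Female"  # FALSE TRUE TRUE

# cbind() on factors keeps only the integer codes, so convert to character
codes_matrix <- cbind(Gender, Satisfaction)
label_matrix <- cbind(as.character(Gender), as.character(Satisfaction))

# Sorting follows the level order (Male, Female), not alphabetical order
sorted_gender <- as.character(sort(Gender))  # "Male" "Female" "Female"

# max() works on an ordered factor
top_rating <- as.character(max(Satisfaction))  # "High"

# max() on an unordered factor is an error
unordered_max <- try(max(Gender), silent = TRUE)
```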