This article describes how to go from raw data in your Document to long-format aggregated data (one row per combination of categories, with the aggregated value in its own column) that can be used in ggplot2 or other functions that require this format.
The final output will look like this:
Requirements
Please note these steps require a Displayr license.
- At least 3 variables from your data file. In the example above, we're using two categorical variables and a single numeric variable to create an aggregated sum.
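Outside Displayr, the same shape of output can be sketched in base R with aggregate(), which sums a numeric variable within every combination of two categorical variables and returns one row per combination. The data frame raw and its values are hypothetical stand-ins for the three variables described above. This only illustrates the target format; it is not the Displayr method described below.

```r
# Hypothetical respondent-level data: two categorical variables and one numeric variable
raw <- data.frame(
  Date = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  Age  = c("18-24", "18-24", "25-34", "18-24", "25-34", "25-34"),
  Cola = c(2, 3, 1, 4, 0, 5),
  stringsAsFactors = FALSE
)

# Sum Cola within every Date x Age combination. aggregate() returns one row
# per combination (long format): the grouping columns first, then the sum.
long <- aggregate(Cola ~ Date + Age, data = raw, FUN = sum)
long
```

Each row of long is one Date and Age pair with its summed value, which is the same layout the steps below produce from a banner table.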
Method - By Banner
For this example, we're working with three existing variables in our data: a Date variable, a categorical variable representing age ranges, and a numeric variable representing the number of Cola drinks each respondent has consumed.
1. First let's make a nested banner to use. In the Data Sources tree, select the variable that you want to use for the first (top) level of aggregation. In the example, this is a Date variable.
2. Go to Anything > Data > Variables > New > Banner. This will create a new banner variable set.
3. In the Data Sources tree, select the new banner and drag the second categorical variable underneath the first to nest it under the first set of categories, like this:
4. Drag the banner variable set onto the Page to create a table where the banner forms the stub/rows of the table.
5. Drag the third variable onto the new table and place it in the columns to create the aggregated values for each of the rows. In our example, we end up with a table like this (after selecting Sum under Data > Statistics > Cells in the object inspector):
6. Now we will create the long-format aggregated data using a custom R calculation. Go to Calculation > Custom Code.
7. Under General > R Code, paste in the following code. On the first line, highlight your.table.name and then click the table you just created to insert its name in its place.
# Get the data from the existing table:
tab <- your.table.name
# Get the two sets of nested row labels from the table's "span" attribute and turn them into a data frame:
df <- as.data.frame(attr(tab, "span"))
# Put the table's values into a third column; as.vector() drops the matrix structure so they fit a single column:
df[, 3] <- as.vector(tab)
# Name the columns of the data frame:
colnames(df) <- c("Date", "Age", "Cola drinks consumed")
# Output the data frame:
df
You will now have an R Output with a structure like the one below (the values will depend on your data):
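Because the output is in long format, it can be passed directly to ggplot2. The sketch below assumes the ggplot2 package is installed and uses a small hypothetical data frame with the same three column names as the code above; in practice you would use the df produced by that code. The factor() call keeps the months in their original order, since ggplot2 would otherwise sort text labels alphabetically.

```r
library(ggplot2)

# Hypothetical data with the same three columns as the R Output above
df <- data.frame(
  Date = c("Jan", "Jan", "Feb", "Feb"),
  Age = c("18-24", "25-34", "18-24", "25-34"),
  `Cola drinks consumed` = c(5, 3, 7, 4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep the months in their original order rather than alphabetical order
df$Date <- factor(df$Date, levels = unique(df$Date))

# One bar per row of the long data, with age groups side by side within each month
p <- ggplot(df, aes(x = Date, y = `Cola drinks consumed`, fill = Age)) +
  geom_col(position = "dodge")
p
```

Each column of the long data maps to one aesthetic (x, y, fill), which is why ggplot2 works most naturally with this format.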
Supplemental Method - Making categories into numbers
In the above example, the first column consists of text values, i.e. the labels of the months. To turn these into a numeric scale (1 to 12 for a full year of months), add the following code before the last line (df) of the code above:
# Get the unique months/labels in the order they first appear in the first column
lvs <- unique(df[, 1])
# Convert the labels to a factor (integer codes with the labels stored as levels),
# keeping the original order so that January becomes 1 instead of sorting alphabetically
x <- factor(df[, 1], levels = lvs)
# Replace the labels in the first column with their numeric codes
df[, 1] <- as.numeric(x)
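To see why the levels argument matters, the sketch below runs the conversion on a small hypothetical data frame and compares it with a plain factor() call, which orders levels alphabetically. The month labels and values are made up for illustration.

```r
# Hypothetical long-format output whose first column holds month labels
df <- data.frame(
  Date = c("January", "January", "February", "February", "March", "March"),
  Age = c("18-24", "25-34", "18-24", "25-34", "18-24", "25-34"),
  Cola = c(5, 3, 7, 4, 6, 2),
  stringsAsFactors = FALSE
)

# Without levels, factor() sorts alphabetically: February = 1, January = 2, March = 3
alphabetical <- as.numeric(factor(df[, 1]))

# With levels in order of first appearance, the months keep their calendar order
lvs <- unique(df[, 1])
x <- factor(df[, 1], levels = lvs)
df[, 1] <- as.numeric(x)
df[, 1]
# [1] 1 1 2 2 3 3
```

Using the order of first appearance works here because the banner lists the months chronologically; if your labels appear in a different order, pass the levels explicitly in the order you want.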
Supplemental Method - Using melt() in R
Please see How to Quickly Make Data Long or Wide Using R.
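For a quick sense of the wide-to-long reshape that melt() performs, the sketch below uses base R instead, so it runs without extra packages: as.table() followed by as.data.frame() turns a wide matrix into one row per cell. The matrix wide and its values are hypothetical; see the linked article for melt() itself.

```r
# Hypothetical wide table: months in rows, age groups in columns
wide <- matrix(c(5, 3, 2,
                 7, 4, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(Date = c("Jan", "Feb"),
                               Age = c("18-24", "25-34", "35+")))

# as.table() keeps the named dimnames; as.data.frame() then returns one row per
# cell, with one column per dimension plus the value column named by responseName
long <- as.data.frame(as.table(wide),
                      responseName = "Cola drinks consumed",
                      stringsAsFactors = FALSE)
long
```

The first dimension (Date) varies fastest in the result, so rows come out as Jan/18-24, Feb/18-24, Jan/25-34, and so on.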