This article describes how to go from raw data in your Document to long-format aggregated data that can be used in ggplot2 or other functions that require this format.
The final output will look like this:
- Three variables in your data file that contain different data. In the example above, we're using two categorical variables and a single numeric variable to create an aggregated sum.
For this example, we're working with three existing variables in our data: a Date variable, a categorical variable representing age ranges, and a numeric variable representing the number of Cola drinks each respondent has consumed.
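To make the target shape concrete, here is a minimal base R sketch of the same kind of aggregation done directly on respondent-level data. The data frame `raw` and all of its values are made up for illustration. This is not the table-based method described in the steps that follow. It only shows what "long-format aggregated data" means here: one row per Date and Age combination, with the summed numeric variable in a third column.

```r
# Hypothetical respondent-level data (values invented for illustration only)
raw <- data.frame(
  Date = c("January", "January", "January", "February", "February", "February"),
  Age  = c("18 to 34", "18 to 34", "35 or more", "18 to 34", "35 or more", "35 or more"),
  Cola = c(2, 3, 1, 4, 0, 5),
  stringsAsFactors = FALSE
)

# Keep the months in their order of appearance rather than alphabetical order
raw$Date <- factor(raw$Date, levels = unique(raw$Date))

# Sum the numeric variable within each Date x Age combination
long <- aggregate(Cola ~ Date + Age, data = raw, FUN = sum)

# Sort so that Age is nested within Date, mirroring a nested banner
long <- long[order(long$Date, long$Age), ]
rownames(long) <- NULL

# Use the same column names as the example in this article
colnames(long) <- c("Date", "Age", "Cola drinks consumed")

# Output the long-format data frame
long
```

Each row of `long` holds one category from the first variable, one from the second, and the aggregated sum, which is the layout that ggplot2 and similar functions expect.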
1. In the Data Sets tree, select the variable that you want to use for the first (top) level of aggregation. In the example, this is a Date variable.
2. Go to Anything > Data > Variables > New > Banner. This will create a new banner variable set.
3. In the Data Sets tree, select the new banner, and drag the second categorical variable underneath the first, to nest it under the first set of categories, like this:
4. Drag the banner variable set onto the Page to create a table where the banner forms the stub/rows of the table.
5. Drag the third variable onto the new table and place it in the columns to create the aggregated values for each of the rows. In our example here, we end up with a table like this:
6. With this table selected, go to Properties > GENERAL > Name and copy out the name of the table.
7. Go to Calculation > Custom Calculation.
8. Under Properties > R CODE, enter the following code. In the line tab <- your.table.name, replace your.table.name with the name copied in step 6.
# Get the data from the existing table:
tab <- your.table.name
# Get the two sets of nested row names and turn them into a data frame:
df <- as.data.frame(attr(tab, "span"))
# Get the values from the table and put them into the third column of the data frame:
df[, 3] <- tab
# Provide names to the columns in your data frame
colnames(df) <- c("Date", "Age", "Cola drinks consumed")
# Output the data frame
df
You will now have an R Output that looks, structurally, like the below (depending on your input data, of course):
Supplemental Method - Making categories into numbers
In the above example, the first column consists of text values, i.e. the labels of the months. To turn these into a numeric scale (1 to 12 if all twelve months are present), add the following code immediately before the final "Output the data frame" step in the code above:
# Get all unique months/labels in the first column
lvs <- unique(df[,1])
# Convert all the months/labels to factors (i.e. numeric values with labels as metadata)
x <- as.factor(df[, 1])
# Ensure that the order of the original labels is maintained in the factor (i.e. that January becomes 1)
x <- factor(x, levels = lvs)
# Put the numeric values back into the original position in the data frame
df[, 1] <- as.numeric(x)
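
The reason for setting the levels explicitly can be seen in a small standalone sketch. The `months` vector below is made up for illustration. By default, R sorts factor levels alphabetically, so converting month labels without the `levels` argument would number February before January.

```r
# Hypothetical month labels, in the order they appear in the data
months <- c("January", "January", "February", "February", "March", "March")

# Default behavior: levels are sorted alphabetically (February, January, March)
alphabetical <- as.numeric(as.factor(months))

# Order-preserving behavior: levels follow first appearance in the data
lvs <- unique(months)
in_order <- as.numeric(factor(months, levels = lvs))

alphabetical  # 2 2 1 1 3 3
in_order      # 1 1 2 2 3 3
```

Note that this numbering follows the order in which labels first appear, so January only becomes 1 if it is the first month in the table.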