How to Create Long-Format Aggregated Data Using R

This article describes how to go from raw data in your Document to long-format aggregated data that can be used in ggplot2 or other functions that require this format.

The final output will look like this:

Requirements

At least 3 variables from your data file. In the example above, we're using two categorical variables and a single numeric variable to create an aggregated sum.

Method - By Banner

For this example, we're working with three existing variables in our data: a Date variable, a categorical variable representing age ranges, and a numeric variable representing the number of Cola drinks each respondent has consumed.

1. First, let's make a nested banner to use. In the Data Sources tree, select the variable that you want to use for the first (top) level of aggregation. In the example, this is a Date variable.

2. Click '+' on any two Nominal, Binary - Multi, Binary - Multi (Compact), Ordinal, or Date/Time variables in the Data Sources tree > Banner. This will create a new banner variable set.

3. In the object inspector, select the new banner and drag the second categorical variable underneath the first, so that it is nested under the first set of categories, like this:

4. Drag the banner variable set onto the Page to create a table where the banner forms the stub/rows of the table.

5. Drag the third variable onto the new table and place it in the columns to create the aggregated values for each of the rows. In our example, we end up with a table like this (after setting Data > Statistics > Cells to Sum in the object inspector):

6. Now we will create the long-format aggregated data using a custom R calculation. Go to Calculation > Custom Code.

7. In the R Code editor, paste in the following code (on the first line, highlight your.table.name, then click on the table you just created to insert its name in its place).

# Get the data from the existing table:
tab <- your.table.name

# Get the two sets of nested row names and turn them into a data frame:
df <- as.data.frame(attr(tab, "span"))

# Get the values from the table and set that into the third column in the data frame:
df[, 3] <- tab

# Provide names to the columns in your data frame
colnames(df) <- c("Date", "Age", "Cola drinks consumed")

# Output the data frame
df

You will now have an R Output that looks, structurally, like the below (depending on your input data, of course):
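
For readers who want to see the target structure without building a banner first, the following is a minimal sketch that produces the same kind of long-format aggregated data frame directly from raw data using base R's aggregate(). It is a different route from the banner method above, not a replacement for it. The data frame raw, its column names, and its values are hypothetical and exist only for illustration.

```r
# Hypothetical raw data: one row per respondent (illustrative values only)
raw <- data.frame(
  Date = c("January", "January", "January", "February", "February", "February"),
  Age  = c("18 to 34", "18 to 34", "35 to 54", "18 to 34", "35 to 54", "35 to 54"),
  Cola = c(2, 3, 1, 4, 0, 5)
)

# Keep the months in calendar order rather than alphabetical order
raw$Date <- factor(raw$Date, levels = c("January", "February"))

# Sum the numeric variable within each Date x Age combination
long <- aggregate(Cola ~ Date + Age, data = raw, FUN = sum)

# Sort by Date, then Age, to mirror the nested banner layout
long <- long[order(long$Date, long$Age), ]
rownames(long) <- NULL

# Use the same column names as the banner example
colnames(long) <- c("Date", "Age", "Cola drinks consumed")
long
```

Each row of long holds one Date and Age combination together with its summed value, which is the shape ggplot2 expects.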

Supplemental Method - Making categories into numbers

In the above example, the first column consists of text values, i.e., the labels of the months. To turn these into a scale using numbers from 1 to 12, add the following code before the last line in the code above:

# Get all unique months/labels in the first column
lvs <- unique(df[,1])

# Convert all the months/labels to factors (i.e. numeric values with labels as metadata)
x <- as.factor(df[, 1])

# Ensure that the order of the original labels is maintained in the factor (i.e. that January becomes 1)
x <- factor(x, levels = lvs)

# Put the numeric values back into the original position in the data frame
df[, 1] <- as.numeric(x)
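
To see why the levels need to be set explicitly, here is a small self-contained sketch using a hypothetical vector of month labels in place of df[, 1]. It also shows match() as an equivalent one-step alternative, assuming the labels first appear in the order you want them numbered.

```r
# Illustrative labels only; in the article these come from df[, 1]
months <- c("January", "January", "February", "February", "March")

# as.factor() alone sorts the levels alphabetically, so February would become 1
alphabetical <- as.numeric(as.factor(months))

# Fixing the levels to the order of first appearance keeps January as 1
codes <- as.numeric(factor(months, levels = unique(months)))
codes

# match() against the unique labels gives the same codes in one step
matched <- match(months, unique(months))
```

Both approaches rely on the rows already being in calendar order. If they are not, pass the full list of month names as the levels instead of unique(months).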

Supplemental Method - Using melt() in R

Please see How to Quickly Make Data Long or Wide Using R.
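
As a rough illustration of the wide-to-long idea, the sketch below uses base R's as.table() and as.data.frame() rather than melt(), so it runs without any extra packages. The matrix wide, its dimension names, and its values are hypothetical.

```r
# Hypothetical wide table: rows are months, columns are age groups
wide <- matrix(
  c(5, 4, 1, 5),
  nrow = 2,
  dimnames = list(Date = c("January", "February"),
                  Age  = c("18 to 34", "35 to 54"))
)

# Converting to a table and then to a data frame returns one row per cell
long <- as.data.frame(as.table(wide), responseName = "Cola",
                      stringsAsFactors = FALSE)
long
```

The result has one column per dimension (Date and Age) plus a Cola column holding the cell values.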

How to Create a Custom Area Chart Using R and ggplot

How to Quickly Make Data Long or Wide Using R
