Sankey charts show the sizes of the links between different items (called nodes). Nodes are organized into groups, also known as stages or levels. These diagrams are useful for seeing how much of something flows from one item to another.
This article describes how to use a set of variables or a data table to create a Sankey visualization, which shows the flows between their categories.
Requirements
You will need either:
- A data set of raw data in the Data Sources pane with at least two variables of any type.
- A table with at least one row for each full path of the Sankey, with or without a count/weight column at the end. Examples of possible formats are listed under Data Table formats below. This can be a Raw Data Table, an R table, a summary.table in your Data Sources pane, a Pasted table, or a native drag and drop table.
Method
- From either the + menu that appears when you hover over the Report tree, or the toolbar (if you are on a Page), go to Visualization > Exotic > Sankey.
- In the object inspector, go to Data > Data Source, and select the type of data source you wish to use to create the Sankey diagram.
- If you wish to use an existing table, go to Input table and select the desired table from the drop-down menu.
- If you wish to use variables, go to Variables and select the desired variables from the drop-down menu. Alternatively, you can drag and drop the variables from the Data Sources tree into the menu itself. In this example, we have selected Gender and Preferred cola as variables.
- OPTIONAL: If using a table as the input, you may need to check Last column contains weights if the last column of the table contains the flow values to tally for the Sankey.
- Click Calculate.
- OPTIONAL: Specify the maximum number of categories by entering a number in Maximum number of categories.
You can customize the look of the diagram by going to the object inspector > Chart and adjusting the settings for Appearance, Labels, and Hover text. More detail is found in our technical documentation here.
Settings for Chart > Appearance > Links colored by are as follows:
- None: all links are shown in grey.
- Source: links are shown in the same color as the source node (left).
- Target: links are shown in the same color as the target node (right).
- First variable (or, if using a table, the left-most column in your table): similar to Source, but each node also takes the color of the node it is linked to on the left. If a node is linked to several such nodes, it takes the color of the node linked with the largest weight.
- Last variable (or, if using a table, the right-most column in your table): similar to First variable, but using the color of the Target node and looking at downstream links.
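The First variable rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the visualization's actual code: the function name `propagate_colors`, the category names, the colors, and the weights are all hypothetical, and the tie-breaking (first link encountered wins) is an assumption.

```python
def propagate_colors(first_stage_colors, links):
    """Give each later node the color of its heaviest upstream neighbor.

    first_stage_colors: {node: color} for the left-most stage.
    links: (stage, source, target, weight) tuples, where stage 0 links
           run from the first column to the second. All values here are
           hypothetical illustration data.
    """
    colors = {(0, node): color for node, color in first_stage_colors.items()}
    heaviest = {}  # heaviest incoming weight seen so far for each node
    # Process links stage by stage so upstream colors are settled first.
    for stage, source, target, weight in sorted(links, key=lambda link: link[0]):
        key = (stage + 1, target)
        if weight > heaviest.get(key, float("-inf")):
            heaviest[key] = weight
            colors[key] = colors[(stage, source)]
    return colors


# Hypothetical Gender -> Preferred cola flows.
example_links = [
    (0, "Male", "Coke", 30),
    (0, "Female", "Coke", 45),
    (0, "Male", "Pepsi", 25),
    (0, "Female", "Pepsi", 10),
]
node_colors = propagate_colors({"Male": "blue", "Female": "orange"}, example_links)
```

With these made-up weights, the Coke node inherits the Female color because that link is the heaviest one reaching it, while Pepsi inherits the Male color. The Last variable setting would apply the same idea in the opposite direction, starting from the right-most stage.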
NOTE: An error will occur if more than 20 variables are selected. It is generally advisable to show a relatively small number (e.g., 4 or 5).
Technical Notes
Data Table formats
If you'd like to use a data table as the input to the Sankey, there are several ways to format the data. Generally, each column is a group of nodes (a stage, such as Gender and Preferred cola above), and each row lists one combination of categories (nodes). The table needs at least two columns, and every combination of nodes you want plotted must be listed. You can also include a final column containing the size of the flow (the weight). Possible examples are below:
1. Raw case-level data for each path
A table of "raw data", in the sense that each combination of categories is repeated in a new row for each observation, and the flows are automatically tallied by the visualization:
In the resulting Sankey, the size of each flow is the number of rows containing that specific combination.
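To make the tallying concrete, here is a minimal Python sketch of how flows could be counted from case-level rows. It is an illustration under assumptions rather than the visualization's own implementation: the rows and the helper name `tally_links` are hypothetical.

```python
from collections import Counter

# Hypothetical case-level rows: one tuple per observation,
# one entry per stage (column) of the Sankey.
rows = [
    ("a", "c"),
    ("a", "d"),
    ("a", "d"),
    ("b", "c"),
    ("b", "d"),
]


def tally_links(rows):
    """Count how many rows pass through each link between adjacent stages."""
    links = Counter()
    for row in rows:
        # Pair each column with the next one: (col 0 -> col 1), (col 1 -> col 2), ...
        for stage, (source, target) in enumerate(zip(row, row[1:])):
            links[(stage, source, target)] += 1
    return links


links = tally_links(rows)
```

Each key identifies a link by its stage, source node, and target node, and each value is the number of rows that contain that link. Rows with more than two columns contribute one link per adjacent pair of columns.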
2. A row for each full path of links and count value for that path
A table of each combination of categories, plus a final column containing the count for that row, which is tallied instead of a value of one. Paths can be repeated, and the counts for duplicate paths are automatically added together. If the table is formatted with a count column, you will also need to check Data > Data Source > Last column contains weights:
In the resulting Sankey, the a -> d flow is now bigger because one of the rows is weighted as 5.
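As a rough sketch of the weighted case, the Python below treats the last column as a weight and adds the weights of duplicate paths together. The table contents and the helper name `tally_weighted_links` are hypothetical, chosen only so that one a -> d row carries a weight of 5; this is not the visualization's own code.

```python
from collections import defaultdict

# Hypothetical table: one row per path, with the last column as the weight.
# The a -> d path appears twice, so its weights should be summed.
table = [
    ("a", "c", 1),
    ("a", "d", 5),
    ("a", "d", 1),
    ("b", "c", 1),
    ("b", "d", 1),
]


def tally_weighted_links(table):
    """Sum the last-column weight over every link between adjacent stages."""
    links = defaultdict(float)
    for *path, weight in table:
        for stage, (source, target) in enumerate(zip(path, path[1:])):
            links[(stage, source, target)] += weight
    return dict(links)


weighted_links = tally_weighted_links(table)
```

In this made-up table, the two a -> d rows combine into a single flow of 6, while every other flow stays at 1.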
Other examples of Sankeys can be found on our blog here.
Next
Visualization - Exotic - Sankey technical documentation
How to Create an R Visualization Template
Publishing to Excel, PowerPoint, and as a PDF