Sankey charts show the size of the flows between the categories of different variables. This article describes how to use a set of variables or a data table to create a Sankey visualization that shows these flows.
Requirements
You will need any of the following:
- At least two variables of any type.
- A table of data with at least two columns (one column for each node, such as Gender and Preferred Cola above). Possible formats are listed under Data Table formats below. The data can come from a table on the page (potentially created with a custom R Calculation) or be pasted directly into the visualization.
Method
- From the toolbar, go to the Visualization icon > Exotic > Sankey.
- In the object inspector, go to Data > Data Source, and select the type of data source you wish to use to create the Sankey diagram.
- If you wish to use an existing table, go to Input table and select the desired table from the drop-down menu.
- If you wish to use variables, go to Variables and select the desired variables from the drop-down menu. Alternatively, you can drag and drop the variables from the Data Sources tree into the menu itself. In this example, we have selected Gender and Preferred Cola as variables.
- To enter the data manually, select Paste or type table. A dialog box will open where you can paste or type in your data.
- OPTIONAL: If using a table as the input, you may need to check Last column contains weights if the last column of the table contains the flow values to tally for the Sankey.
- Click Calculate.
- OPTIONAL: Specify the maximum number of categories by entering a number in Maximum number of categories.
- OPTIONAL: You can customize the look of the diagram by going to the object inspector > Chart and adjusting the settings for Appearance, Labels, and Hover text. More detail is found in our technical documentation here.
Settings for Appearance > Links colored by are as follows:
- None: all links are shown in grey.
- Source: links are shown in the same color as the source node (left).
- Target: links are shown in the same color as the target node (right).
- First variable: similar to Source, but nodes will also be the same color as nodes they are linked to on the left. If there are multiple such nodes, then the color will be taken from the node which is linked with the largest weight.
- Last variable: similar to First variable, but using the color of the Target node, and looking at downstream links.
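The First variable rule can be illustrated with a short Python sketch. This is only one reading of the description above, not the visualization's actual implementation: the function name, the node names beyond Gender and Preferred Cola, the weights, and the colors are all hypothetical, and the tie-breaking behavior (first link encountered wins) is an assumption.

```python
def color_nodes_by_first_variable(first_variable_colors, links_by_layer):
    """Propagate colors left to right from the first variable's nodes.

    first_variable_colors: {node: color} for the leftmost nodes.
    links_by_layer: list of layers, each a list of (source, target, weight),
                    ordered from the leftmost pair of variables to the rightmost.
    """
    node_colors = dict(first_variable_colors)
    for links in links_by_layer:
        heaviest = {}  # target -> (weight, source) of its heaviest incoming link
        for source, target, weight in links:
            # Assumption: on a tie, the first link encountered is kept.
            if target not in heaviest or weight > heaviest[target][0]:
                heaviest[target] = (weight, source)
        for target, (_, source) in heaviest.items():
            node_colors[target] = node_colors[source]
    return node_colors


# Hypothetical weights and colors, invented for illustration only.
links_by_layer = [
    [("Male", "Coke", 30), ("Male", "Pepsi", 10),
     ("Female", "Coke", 15), ("Female", "Pepsi", 25)],
    [("Coke", "Weekly", 20), ("Coke", "Monthly", 25),
     ("Pepsi", "Weekly", 30), ("Pepsi", "Monthly", 5)],
]
node_colors = color_nodes_by_first_variable(
    {"Male": "blue", "Female": "red"}, links_by_layer
)

# As with Source, each link takes the color of its source node.
link_colors = {
    (source, target): node_colors[source]
    for links in links_by_layer
    for source, target, _ in links
}
```

In this sketch Coke inherits the color of Male because the Male link (30) outweighs the Female link (15), and that color then carries through to later variables in the same way. Last variable would apply the same idea in reverse, starting from the rightmost nodes and looking at outgoing links.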
NOTE: An error will occur if more than 20 variables are selected. It is generally advisable to show a relatively small number (e.g., 4 or 5).
Technical Notes
Data Table formats
If you'd like to use a data table as the input to the Sankey, there are several ways to format the data. Generally, each column is a node (e.g., Gender and Preferred Cola from above) and lists the categories of that node, so that each row gives one combination of categories. You will need at least two nodes in the table, and every combination you want plotted must appear in the table. You can also include a last column that gives the size of the flow. If you do, you will also need to check Data > Data Source > Last column contains weights. Possible examples are below:
- A table of "raw data", in the sense that each combination of categories is repeated in a new row for each observation, and the flows are automatically tallied by the visualization:
You'll see that the flows in the following Sankey are the number of rows with each specific combination:
- A table of each combination of categories, with repeats if needed, plus a final column with the weight for that row to use as the value to tally instead of one:
You'll see that the a -> d flow is now bigger because one of the rows is weighted as 5:
- A table of each unique combination of nodes with a final column containing the total tally for the flow. The table below would generate the example Sankey from above:
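To make the relationship between the three formats concrete, the following Python sketch tallies flows from each kind of table. It is an illustration only and not how the visualization is implemented: the function name `tally_flows`, its `last_column_contains_weights` argument, and all of the rows are hypothetical, and it assumes a table with exactly two node columns.

```python
from collections import Counter


def tally_flows(rows, last_column_contains_weights=False):
    """Return {(left category, right category): flow size} for a list of rows."""
    flows = Counter()
    for row in rows:
        if last_column_contains_weights:
            # The final column is the value to tally for this row.
            *categories, weight = row
        else:
            # Raw data: every row counts as one observation.
            categories, weight = row, 1
        flows[tuple(categories)] += weight
    return dict(flows)


# Hypothetical tables, invented for illustration only.
raw_rows = [("a", "d"), ("a", "d"), ("a", "e"),
            ("b", "d"), ("b", "e"), ("b", "e")]
weighted_rows = [("a", "d", 5), ("a", "d", 1), ("a", "e", 1),
                 ("b", "d", 1), ("b", "e", 1), ("b", "e", 1)]
aggregated_rows = [("a", "d", 6), ("a", "e", 1), ("b", "d", 1), ("b", "e", 2)]

raw_flows = tally_flows(raw_rows)
weighted_flows = tally_flows(weighted_rows, last_column_contains_weights=True)
aggregated_flows = tally_flows(aggregated_rows, last_column_contains_weights=True)
```

Here the raw table gives an a -> d flow of 2 (two rows), while the weighted table gives 6 because one of its a -> d rows carries a weight of 5. The weighted and aggregated tables describe the same flows; the aggregated one simply lists each combination once with its total.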
Other examples of Sankeys can be found on our blog here.
Next
Visualization - Exotic - Sankey technical documentation
How to Create an R Visualization Template
Publishing to Excel, PowerPoint, and as a PDF