How to Create Data File Relationships

When an analysis is conducted between variables in multiple data files, Displayr needs to work out how observations in the data files relate to each other. That is, rules are needed to merge the data files.

A data file relationship lets you crosstab questions from two different data files by telling Displayr how the observations in those files relate to each other. The link connects the common respondents/observations in both data sets, so when you analyze them together, the appropriate value(s) from each data set line up for the corresponding observations.

Linking data sets is commonly done when you have unstacked data and want to analyze it against stacked data from the same survey, or when you have related data from two different data sources. However, a data file relationship is not a substitute for merging new variables into your data. See How to Combine Data Files by Adding New Variables for details on how to add new variables to your data set. It also does not work for analyzing wave-on-wave data. See How to Combine Data Files by Adding New Records for instructions on how to create a single data set with multiple waves of data.

There are also some limitations to working with variables across data sets when using a data file relationship:

They cannot be used in the same banner.
Filters created can only apply to an output that uses data from the same data set as the filter.
They cannot be used together in R-based analyses and outputs.

Requirements

A Displayr document containing two related data sets.
To create a relationship between two files, there must be a variable in both files that contains the same type of data (text, categorical, dates, etc.) and has some values that match. Normally, this is some sort of ID variable.
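
Before opening the dialog, it can help to sanity-check the two matching variables outside Displayr. The following is a minimal sketch in Python, not part of Displayr: the function name check_link_keys and the sample ID lists are hypothetical, and it only tests the two requirements stated above (same type of data, and at least some matching values).

```python
def check_link_keys(first_keys, second_keys):
    """Report whether two key columns look usable for linking.

    Hypothetical helper: checks that both columns hold the same
    kind of value and that at least one value appears in both.
    """
    first_types = {type(v).__name__ for v in first_keys if v is not None}
    second_types = {type(v).__name__ for v in second_keys if v is not None}
    shared = set(first_keys) & set(second_keys)
    shared.discard(None)  # missing IDs cannot be used for matching
    same_type = first_types == second_types
    return {
        "same_type": same_type,
        "shared_values": len(shared),
        "usable": same_type and len(shared) > 0,
    }


# Illustrative ID columns (made-up values).
demographics_ids = [101, 102, 103, 104]
tracking_ids = [102, 103, 103, 105]
text_ids = ["101", "102"]  # same IDs, but stored as text

report = check_link_keys(demographics_ids, tracking_ids)
mismatch = check_link_keys(demographics_ids, text_ids)
```

In this sketch, numeric IDs in one file and text IDs in the other are reported as unusable, even though they look identical on screen.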

Method

To establish a relationship between two files:

Click one of the two files you wish to link in the Data Sources tree.
Click the Edit Relationships button.
Click the New button if the files have not already been linked.

Edit Relationship Dialog

Use the two Data set dropdowns to select the two files, and the two Variable dropdowns to choose the common variable you are using to match the files, e.g., UniqueID. Note that both variables must contain the same type of data (text, categorical, date, etc.).
Use the Relationship type menu to specify how the data should be matched across the data sets (in this example, the match is One to one):
- One to one: Each single value from the first data set's variable matches exactly to a single value in the second data set's variable.
- One to many: A single value from the first data set's variable matches multiple values in the second data set's variable. This option is commonly used with stacked data.
- Many to one: Multiple values from the first data set's variable match a single value in the second data set's variable. This is the same type of relationship as One to many, with the first and second data sets swapped.
- Many to many: Multiple values from the first data set's variable match multiple values in the second data set's variable, resulting in Data Fusion.
Use the When a value is not found in the other data set menu to specify how you want to treat values that exist in one file but not the other. The choices are:
- Show a warning message (default) - When a respondent's value in the first data set's variable cannot be found in the second data set's variable (or the other way round), a warning is shown and you will not be able to proceed with the crosstab until you either fix the data or come back to this screen and select another option.
- Insert missing values into the matched data - If a respondent's value in the first data set's variable cannot be found in the second data set's variable (or the other way round), the respondent is kept in the sample, with missing data (NaN) in place of the response data that could not be matched.
- Exclude cases from the matched data - If a respondent's value in the first data set's variable cannot be found in the second data set's variable (or the other way round), the respondent is excluded from the sample.
Use the Match dates that fall in the same: menu to control how date variables are matched. A case in the first data set is matched to a case in the second data set when their dates fall in the same year, month, week, or day.
Use the Recipient menu to define the recipient when the relationship type between the two data sets is Many to many.
Click OK.
Note: if you receive a warning message, the message will tell you how to fix the problem before proceeding. For example, this message indicates that I need to either update the file or click General > Edit relationships and choose another option for when a value is found in one file but not the other.

To fix this particular problem, I will choose Insert missing values into the matched data.
Click OK.
The results are as follows:

Diagnostics button

Use the Diagnostics button in the Edit Relationships window if you want to be warned of problems you should fix prior to matching the files.
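
To get a feel for the kind of problem worth catching before matching, the sketch below works out which relationship type a pair of key columns implies and which values appear in only one file. It is an illustration in Python under my own assumptions, not a description of what the Diagnostics button actually checks; describe_relationship and the sample IDs are hypothetical.

```python
from collections import Counter


def describe_relationship(first_keys, second_keys):
    """Infer the relationship type and list values with no match."""
    first_counts = Counter(first_keys)
    second_counts = Counter(second_keys)
    shared = set(first_counts) & set(second_counts)

    # A key that repeats in a file is on the "many" side of the link.
    first_repeats = any(first_counts[k] > 1 for k in shared)
    second_repeats = any(second_counts[k] > 1 for k in shared)

    if first_repeats and second_repeats:
        kind = "Many to many"
    elif second_repeats:
        kind = "One to many"
    elif first_repeats:
        kind = "Many to one"
    else:
        kind = "One to one"

    return {
        "relationship_type": kind,
        "only_in_first": sorted(set(first_counts) - shared),
        "only_in_second": sorted(set(second_counts) - shared),
    }


# One row per respondent vs. stacked data with repeated IDs (made-up values).
respondent_ids = [101, 102, 103, 104]
stacked_ids = [101, 101, 102, 102, 103, 105]

summary = describe_relationship(respondent_ids, stacked_ids)
swapped = describe_relationship(stacked_ids, respondent_ids)
```

Here the stacked file repeats IDs, so the sketch reports One to many, and the IDs 104 and 105 are the ones that would trigger the not-found handling chosen in the dialog.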

Performing an analysis between variables in different files

Now that the files are linked, I can use variables from both files in the same table. In this example, the Gender variable comes from the Demographics file, and the Preferred cola variable comes from the Cola Tracking - January to December file.
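
Conceptually, a table like this looks up each tracking record's Gender through the shared ID and then counts the combinations. The Python sketch below illustrates that idea together with the three options for values that are not found. It is a simplified illustration, not Displayr's implementation: crosstab_linked, the option strings, and the sample rows are hypothetical, None stands in for a missing value (NaN), and only IDs missing from the demographics side are handled.

```python
from collections import Counter


def crosstab_linked(demographics, tracking, when_not_found="warn"):
    """Count (gender, cola) pairs, looking up gender by respondent ID."""
    gender_by_id = {row["id"]: row["gender"] for row in demographics}
    counts = Counter()
    for row in tracking:
        if row["id"] in gender_by_id:
            gender = gender_by_id[row["id"]]
        elif when_not_found == "warn":
            # Mirrors "Show a warning message": stop until the data is fixed.
            raise ValueError(f"ID {row['id']} not found in the other data set")
        elif when_not_found == "missing":
            # Mirrors "Insert missing values into the matched data".
            gender = None
        elif when_not_found == "exclude":
            # Mirrors "Exclude cases from the matched data".
            continue
        else:
            raise ValueError(f"Unknown option: {when_not_found}")
        counts[(gender, row["cola"])] += 1
    return counts


# Made-up rows; respondent 4 has no demographics record.
demographics = [
    {"id": 1, "gender": "Male"},
    {"id": 2, "gender": "Female"},
    {"id": 3, "gender": "Female"},
]
tracking = [
    {"id": 1, "cola": "Coke"},
    {"id": 2, "cola": "Pepsi"},
    {"id": 3, "cola": "Coke"},
    {"id": 4, "cola": "Pepsi"},
]

with_missing = crosstab_linked(demographics, tracking, "missing")
excluded = crosstab_linked(demographics, tracking, "exclude")
```

With the "missing" option the unmatched respondent still counts toward the table base, under a missing Gender; with "exclude" the base shrinks by one.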

Using Weights and Filters

When you have two data files with no relationship between them, Displayr expects the filter or weight variable to have an identical name and label in each file. You can then choose a filter, and Displayr automatically works out which data set's filter to use.

However, if you have a data file relationship, Displayr expects the weight/filter to appear in only one of the files. If it appears in both, you end up with unresolvable logical problems.

If you wish to filter the table, and the relationship is Many to many, there must be a filter variable with the same Name and Label in both data files. See How to Create Variables Across Data Files Using JavaScript for the steps to do this.

Charting Time Series With Multiple Data Sources

Once you have a project with two linked data files, you can chart a time series that shows data from both data files.

The following prerequisites must be met:

There must be a Date question in both data files with:
- The same Variable Name
- The same Question Name
- The same Question Type (Date)
If you wish to filter the Time Series chart, there must be a variable with the same Name and Label in both data files. The Usable as a filter box must be checked in the object inspector.
If you wish to weight the Time Series chart, there must be a variable with the same Name and Label in both data files. The Usable as a weight box must be checked in the object inspector.

With the above prerequisites met, you may create a Time Series chart with multiple data sources by following these steps:

Click Visualization > Time Series > Time Series with Dynamic Window from the toolbar.
From the object inspector, go to Data > Data Source, and select the type of data source you wish to use to create the Time Series chart.
- If you wish to use an existing table, go to Data and select the desired table from the dropdown menu.
- If you wish to use variables, go to Variables and first select the Date/Time structured variable, followed by the Numeric variable. Alternatively, you can drag and drop the variables from the Data Sources tree into the menu itself.
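
One way to picture how two data sources end up on a single time axis is to group each file's dates into a common period and summarize the numeric variable within each period. The Python sketch below is only an illustration of that idea, not how Displayr builds the chart: period_key, mean_by_period, and the sample data are hypothetical, and treating a week as an ISO week is my assumption.

```python
from collections import defaultdict
from datetime import date


def period_key(d, period):
    """Reduce a date to the period it falls in."""
    if period == "year":
        return (d.year,)
    if period == "month":
        return (d.year, d.month)
    if period == "week":
        iso = d.isocalendar()  # assumption: ISO week numbering
        return (iso[0], iso[1])
    if period == "day":
        return (d.year, d.month, d.day)
    raise ValueError(f"Unknown period: {period}")


def mean_by_period(rows, period):
    """Average the numeric value of (date, value) rows within each period."""
    totals = defaultdict(lambda: [0.0, 0])  # period -> [sum, count]
    for d, value in rows:
        bucket = totals[period_key(d, period)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Made-up (date, numeric value) rows from two separate files.
sales = [(date(2023, 1, 5), 10.0), (date(2023, 1, 20), 14.0), (date(2023, 2, 3), 8.0)]
ratings = [(date(2023, 1, 11), 4.0), (date(2023, 2, 14), 3.0), (date(2023, 2, 28), 5.0)]

sales_by_month = mean_by_period(sales, "month")
ratings_by_month = mean_by_period(ratings, "month")

# One shared axis; a period absent from a file shows up as None.
shared_axis = sorted(set(sales_by_month) | set(ratings_by_month))
series = [(p, sales_by_month.get(p), ratings_by_month.get(p)) for p in shared_axis]
```

The individual dates in the two files never coincide, yet both series line up once they are reduced to months, which is why the Date variable needs to be set up consistently in both files.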