How to Perform Data Fusion

This article describes how to link two data sets through data fusion. Data fusion, also known as statistical matching, involves combining data from two files whose samples do not overlap. For example, if there is one study on customer satisfaction and another on brand attitudes, data fusion can be used to combine the data.

Requirements

Two data sets are loaded in Displayr in your Data Sources tree.
In each of the files that you wish to fuse, you will need to have a micro-segment variable that has the following properties:
- It is a Nominal or Ordinal variable.
- It has the same Name, Label, and Variable Set Structure in each data file.
- It has the same unique values in each file. For example, if all respondents have values of 1, 2, 3, ..., 100 in one file, the same must be true in the other file. Importantly, there cannot be a situation where a value appears in one file but not the other.
- The unique values represent small segments. The assumption of the analysis is that:
  - People in a given segment in one data file are broadly similar to people with the same segment value in the other data file.
  - The segments explain the differences between people in both data files. For example, if brand attitudes are fused with customer satisfaction data, and age is the key determinant of both brand attitudes and customer satisfaction, then you could use age as the variable. More commonly, creating an index representing multiple variables will be appropriate.
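The requirement that both files contain exactly the same micro-segment values can be checked before setting up the relationship. The following is a minimal Python sketch, run outside Displayr, that assumes the micro-segment codes from each file have been exported as plain lists. The function name `compare_segment_values` and the sample codes are hypothetical and are not part of Displayr.

```python
def compare_segment_values(recipient_values, donor_values):
    """Return the segment values that appear in only one of the two files."""
    recipient_set = set(recipient_values)
    donor_set = set(donor_values)
    return {
        "only_in_recipient": sorted(recipient_set - donor_set),
        "only_in_donor": sorted(donor_set - recipient_set),
    }


# Hypothetical micro-segment codes exported from each data file.
satisfaction_segments = [1, 2, 2, 3, 3, 3]
attitude_segments = [1, 1, 2, 4]

mismatches = compare_segment_values(satisfaction_segments, attitude_segments)
print(mismatches)
# {'only_in_recipient': [3], 'only_in_donor': [4]}
```

Both lists in the result should be empty before the files are fused. Here, segment 3 is missing from one file and segment 4 from the other, so the micro-segment variable would need to be recoded first.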

Method

1. Select any data set folder in your Data Sources tree.

2. In the object inspector, click Edit relationships > New.

4. Select the names of each data set to link.

5. Set the micro-segment variable that appears in both data sets to match on.

6. Set the Relationship type to Many to many.

7. Choose what to do when a value is not found in the other data file.

8. Select which data file is the Recipient.

10. Press OK to save the relationship and again to go back to your document.

Please note the following:

- The sample size of the combined data will be that of the Recipient data file specified in Edit Data File Relationships.
- All of the respondents in the recipient sample are kept and used in analyses.
- The respondents in the other data file are probabilistically selected to match the number of respondents in the recipient, for each matching value of the micro-segment variable. For example, if the micro-segment variable is "Gender" and there are 10 Males in the recipient data file and 20 Males in the other data file, 10 of the 20 Males in the other data file are probabilistically selected to be used in analyses.
- The other data file's weights, if any, are an input to probabilistically selecting its respondents.
- For filters to work on tables that use variables from both files in a many-to-many relationship, there must be a filter variable in each file. The filters must have identical variable names. The steps for setting this up are described in How to Create Variables Across Data Files Using JavaScript.
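The selection described in the notes above can be illustrated with a short Python sketch. This is not Displayr's implementation. It only shows the general idea of drawing, within each micro-segment, as many respondents from the other file as the recipient has, with weights influencing the draw. The function `select_donors`, its parameters, and the fallback of reusing respondents when a segment has too few are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def select_donors(recipient_segments, donor_segments, donor_weights=None, seed=0):
    """For each segment, pick as many donor rows as the recipient has in it.

    Returns a dict mapping each segment value to a list of donor row indices.
    """
    rng = random.Random(seed)
    if donor_weights is None:
        donor_weights = [1.0] * len(donor_segments)

    # Group donor row indices by micro-segment value.
    donors_by_segment = defaultdict(list)
    for index, segment in enumerate(donor_segments):
        donors_by_segment[segment].append(index)

    # Count how many recipient rows fall in each segment.
    needed = defaultdict(int)
    for segment in recipient_segments:
        needed[segment] += 1

    selected = {}
    for segment, count in needed.items():
        pool = list(donors_by_segment.get(segment, []))
        if not pool:
            raise ValueError(f"Segment {segment!r} has no rows in the other file")
        if count > len(pool):
            # Assumption: too few donors, so draw with replacement.
            weights = [donor_weights[i] for i in pool]
            selected[segment] = rng.choices(pool, weights=weights, k=count)
            continue
        # Weighted draw without replacement, one row at a time.
        picks = []
        for _ in range(count):
            weights = [donor_weights[i] for i in pool]
            choice = rng.choices(pool, weights=weights, k=1)[0]
            picks.append(choice)
            pool.remove(choice)
        selected[segment] = picks
    return selected


# 10 Males in the recipient file, 20 Males in the other file.
recipient = ["Male"] * 10 + ["Female"] * 5
donor = ["Male"] * 20 + ["Female"] * 8
selected = select_donors(recipient, donor)
print(len(selected["Male"]), len(selected["Female"]))
```

With these inputs, 10 of the 20 Males and 5 of the 8 Females in the other file are selected, so the fused data has the recipient's sample size of 15.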

Next

How to Create Data File Relationships

How to Work with Multiple Data Files

How to Create Variables Across Data Files Using JavaScript
