This article describes how to link two data sets by performing data fusion. Data fusion, also known as statistical matching, combines the data from two data files whose samples do not overlap. For example, if one study measures customer satisfaction and a completely separate study measures brand attitudes, data fusion can be used to combine the data.
- Two data sets loaded in Displayr in your Data Sets tree.
- In each of the files that you wish to fuse, you will need a micro-segment variable with the following properties:
  - It is a Nominal or Ordinal variable.
  - It has the same Name, Label, and Variable Set Structure in each data file.
  - It has the same unique values in each file. For example, if all respondents in one file have values of 1, 2, 3, ..., 100, then the same must be true in the other file. Importantly, no value may appear in one file but not the other.
  - The unique values represent small segments. The analysis assumes that:
    - The people in a given segment in one data file are broadly similar to the people in the same segment in the other data file.
    - The segments explain differences between people in both data files. For example, when fusing brand attitudes with customer satisfaction data, if age is the key determinant of both brand attitudes and customer satisfaction, then you could use age as the micro-segment variable. More commonly, it will be appropriate to create an index representing multiple variables.
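
The requirements above can be checked before the files are loaded. The following is a minimal sketch in plain Python, run outside Displayr, that builds a micro-segment label from several variables and reports any labels that appear in only one file. The function names, the `age` and `gender` fields, and the sample records are all hypothetical and chosen only for illustration.

```python
def micro_segment(record, keys):
    """Combine several categorical answers into a single segment label."""
    return "|".join(str(record[k]) for k in keys)


def segment_mismatches(file_a, file_b, keys):
    """Return the segment labels that appear in only one of the two files."""
    seg_a = {micro_segment(r, keys) for r in file_a}
    seg_b = {micro_segment(r, keys) for r in file_b}
    return sorted(seg_a - seg_b), sorted(seg_b - seg_a)


# Hypothetical records from two separate studies.
satisfaction = [
    {"age": "18-34", "gender": "Male", "csat": 4},
    {"age": "35+", "gender": "Female", "csat": 2},
]
attitudes = [
    {"age": "18-34", "gender": "Male", "likes_brand": 1},
    {"age": "35+", "gender": "Male", "likes_brand": 0},
]

only_in_satisfaction, only_in_attitudes = segment_mismatches(
    satisfaction, attitudes, keys=("age", "gender")
)
print(only_in_satisfaction)  # segments missing from the attitudes file
print(only_in_attitudes)     # segments missing from the satisfaction file
```

If either list is non-empty, the segments are probably too fine for the available samples. Merging categories, or using fewer variables in the index, is one way to bring the two sets of values back into agreement.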
1. Select any data source folder in your Data Sets tree.
2. In the object inspector, click Edit relationships > New.
3. Select the names of the two data sets to link.
4. Select the micro-segment variable, which appears in both data sets, as the variable to match on.
5. Set the Relationship type to Many to many.
6. Choose what to do When a value is not found in the other data file.
7. Select which data file is the Recipient.
8. Press OK to save the relationship, then press OK again to return to your document.
Please note the following:
- The sample size of the combined data will be that of the Recipient data file specified in Edit Data File Relationships.
- All of the respondents in the recipient sample are kept and used in analyses.
- The respondents in the other data file are probabilistically selected so that, for each value of the micro-segment variable, their number matches the number of respondents in the recipient. For example, if the micro-segment variable is "Gender" and there are 10 Males in the recipient data file and 20 Males in the other data file, 10 of the 20 Males are probabilistically selected from the other data file to be used in analyses.
- The other data file's weights, if any, are an input to probabilistically selecting its respondents.
- For filters to work on tables that use variables from both files in a many-to-many relationship, there must be a filter variable in each file. The filters must have identical variable names.
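
To make the selection behavior described in these notes concrete, here is a rough Python sketch of per-segment, weight-aware selection. It is not Displayr's implementation: the function names, the field names, the use of weighted sampling without replacement, and the fallback to sampling with replacement when the other file has too few respondents in a segment are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def weighted_sample(rows, weights, k, rng):
    """Draw k rows with probability proportional to their weights."""
    if k > len(rows):
        # Assumption: too few donors in this segment, so reuse respondents.
        # Raises IndexError if the segment has no donors at all.
        return rng.choices(rows, weights=weights, k=k)
    pool, pool_w, picked = list(rows), list(weights), []
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=pool_w, k=1)[0]
        picked.append(pool.pop(i))  # remove so nobody is picked twice
        pool_w.pop(i)
    return picked


def fuse(recipient, donor, segment, weight=None, seed=0):
    """Pair every recipient row with a sampled donor row from its segment."""
    rng = random.Random(seed)
    donors_by_segment = defaultdict(list)
    for row in donor:
        donors_by_segment[row[segment]].append(row)
    recipients_by_segment = defaultdict(list)
    for row in recipient:
        recipients_by_segment[row[segment]].append(row)

    fused = []
    for value, rows in recipients_by_segment.items():
        candidates = donors_by_segment[value]
        weights = [d[weight] if weight else 1.0 for d in candidates]
        chosen = weighted_sample(candidates, weights, len(rows), rng)
        for r, d in zip(rows, chosen):
            fused.append({**d, **r})  # recipient values win on name clashes
    return fused


# Hypothetical data: 10 Males and 5 Females in the recipient file,
# 20 Males and 5 Females in the other (donor) file.
recipient = [{"recipient_id": i, "gender": "Male" if i < 10 else "Female", "csat": 3}
             for i in range(15)]
donor = [{"donor_id": i, "gender": "Male" if i < 20 else "Female",
          "likes_brand": i % 2, "wt": 1.0 + i % 3} for i in range(25)]

fused = fuse(recipient, donor, segment="gender", weight="wt")
print(len(fused))  # same number of rows as the recipient file
```

In this sketch every recipient row is kept, and 10 of the 20 donor Males are drawn, with higher-weighted donors more likely to be chosen. The fused rows come out grouped by segment rather than in the recipient file's original order.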