This article describes how to merge data files when you want to add new variables to your current data set. This is useful when further analysis or additional data is stored in a separate data set and you want to combine everything into a single data set.
- Select the Data Sets
- Set the Case Matching method
- Set the Variables to exclude or include
- Reading the Output
- Automatic updating
- Add the data set to your document
In this example, we have a Cola_tracker data set and a Cola_demos data set, and we wish to add 3 of the 4 demographic variables from the latter to the main tracker data set.
Requirements
- Files to be combined must be stored in the Displayr Cloud Drive. These files can be added by clicking the Initials icon at the top right > Displayr Cloud Drive > + Upload. See How to Use the Displayr Cloud Drive for details.
- Only .SAV files can be combined using this method.
- See Data - Data Set - Combine - By Variable for further technical details.
Method
Select the Data Sets
1. In the toolbar, select > Data > Data Set > Combine > By Variable.
2. Under Data > Data sets > Data set 1, type the name of the first file from the Cloud Drive to combine. In this example, we will enter Cola_tracker.sav (including the file extension).
3. Under Data > Data sets > Data set 2, type the name of the second file from the Cloud Drive to combine. In this example, we will enter Cola_demos.sav (including the file extension).
4. [OPTIONAL] If you have more data sets to combine, the object inspector will keep adding a further data set field for each one.
5. Type the name of the new data set you wish to create in the Combined data set name field. Here, we will enter Cola_final (without the file extension).
Set the Case Matching method
The best method for matching records is to set Combine data sets by to Matching IDs. Use this when both data sets contain an ID variable that stores matching values (the rows do not need to be in the same order in both data sets).
1. Enter the name of the ID variable under ID variable (data set 1).
2. Then do the same for data set 2.
Note, the variable name can be found in Displayr for a loaded data set by selecting the relevant variable in your Data Sources tree and going to General > General > Name in the object inspector.
3. [OPTIONAL] Tick Only keep cases matched to all data sets when you wish to only retain records that have the same ID in both data sets.
Things to note here:
- If a record in one file cannot be matched to a record in the other file, missing values will be substituted.
- If a record in one file matches more than one record in the other file, the data from the first file will be duplicated across each matching case in the other file.
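The matching rules described above can be sketched in plain Python to make the behavior concrete. This is only an illustration of ID-based matching in general, not Displayr's implementation: the function `combine_by_id`, the variable names `ID`, `RespID`, `Q1` and `d1`, and the sample values are all hypothetical. The sketch also assumes that unmatched records from either data set are kept unless the "only matched" option is on, which is one reading of the notes above.

```python
from collections import defaultdict


def combine_by_id(rows1, rows2, id1, id2, only_matched=False):
    """Combine two non-empty lists of dict records by matching ID values.

    Illustrative sketch only; names and behavior are assumptions.
    """
    # Index data set 2 by ID; the same ID may appear on several rows.
    index2 = defaultdict(list)
    for row in rows2:
        index2[row[id2]].append(row)

    vars1 = list(rows1[0])
    vars2 = [name for name in rows2[0] if name != id2]

    combined = []
    matched_ids = set()
    for row in rows1:
        matches = index2.get(row[id1], [])
        if matches:
            matched_ids.add(row[id1])
            # One-to-many match: repeat the data set 1 values for each match.
            for match in matches:
                combined.append({**row, **{v: match[v] for v in vars2}})
        elif not only_matched:
            # No partner in data set 2: its variables become missing values.
            combined.append({**row, **{v: None for v in vars2}})

    if not only_matched:
        # Records found only in data set 2 get missing values for the
        # data set 1 variables (assumed behavior).
        for row in rows2:
            if row[id2] not in matched_ids:
                record = {v: None for v in vars1}
                record[id1] = row[id2]
                record.update({v: row[v] for v in vars2})
                combined.append(record)
    return combined


# Hypothetical sample data; note the rows are not in the same order.
tracker = [{"ID": 1, "Q1": 5}, {"ID": 2, "Q1": 3}, {"ID": 3, "Q1": 4}]
demos = [{"RespID": 3, "d1": "F"}, {"RespID": 1, "d1": "M"}, {"RespID": 4, "d1": "F"}]

for record in combine_by_id(tracker, demos, "ID", "RespID"):
    print(record)
```

In this sample, ID 2 has no demographic record, so its `d1` value is missing, and ID 4 exists only in the demographics file, so its `Q1` value is missing. Calling the function with `only_matched=True` keeps only IDs 1 and 3, which mirrors the effect of ticking Only keep cases matched to all data sets.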
The Joining them side-by-side (no matching) option is instead used when you don't have a matching ID variable in both data sets but each row in data set 1 corresponds to the same row in data set 2. This requires the rows in both data sets to be in exactly the same order.
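To see why row order matters for a side-by-side join, here is a minimal Python sketch. It is not how Displayr performs the join: the function `combine_side_by_side` and the sample variables are hypothetical, and rejecting inputs with different row counts is a safeguard added for this sketch only.

```python
def combine_side_by_side(rows1, rows2):
    """Pair row 1 with row 1, row 2 with row 2, and so on (no ID matching)."""
    if len(rows1) != len(rows2):
        # Assumption of this sketch: treat differing row counts as an error.
        raise ValueError("Both data sets must have the same number of rows")
    return [{**left, **right} for left, right in zip(rows1, rows2)]


# Hypothetical sample data.
tracker = [{"Q1": 5}, {"Q1": 3}]
demos_same_order = [{"d1": "M"}, {"d1": "F"}]
demos_reordered = [{"d1": "F"}, {"d1": "M"}]

# Rows line up, so each respondent gets their own demographics.
print(combine_side_by_side(tracker, demos_same_order))

# Rows are swapped: the join still succeeds, but the demographics are
# silently attached to the wrong respondents.
print(combine_side_by_side(tracker, demos_reordered))
```

Because nothing in a positional join can detect a reordered file, matching on an ID variable is the safer choice whenever one is available.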
Set the Variables to exclude or include
1. By default, each Variables from data set option will be set to Include all variables except those manually omitted.
2. This provides you with a Variables to omit from data set field for each data set.
3. [OPTIONAL] If you instead change any of these to Only choose manually specified variables, you will be presented with a Variables to include from data set field for that data set.
4. When manually specifying variables to include or exclude, you can do so by using any of the below methods:
- List the variable names separated by commas, e.g. Q1,Q2.
- Set a range of variable names, which also includes all variables in between based on data set order, e.g. Q1-Q6.
- Use wildcards to include any variables that, for example, start with a variable name prefix, e.g. Q2_*.
- You can combine any of these methods by separating them with a comma.
In this example, we will omit d4 from the second data set.
The output now appends the new variables to the primary data set and, in this case, omits d4, which is not required here.
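The three ways of specifying variables (comma-separated names, ranges, and wildcards) can be illustrated with a short Python sketch. This only mimics the syntax described above and makes no claim about how Displayr parses the field: the function `select_variables` is hypothetical, and all variable names other than d4 are invented for the example.

```python
from fnmatch import fnmatchcase


def select_variables(spec, names):
    """Return the entries of `names` (in data set order) picked by `spec`.

    Illustrative sketch only; the parsing rules here are assumptions.
    """
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "*" in part:
            # Wildcard, e.g. Q2_* picks every name starting with Q2_.
            selected.update(n for n in names if fnmatchcase(n, part))
        elif part in names:
            # A single variable name.
            selected.add(part)
        elif "-" in part:
            # Range, e.g. Q1-Q3 picks Q1, Q3 and everything between them
            # according to the order of the variables in the data set.
            start, end = (p.strip() for p in part.split("-", 1))
            i, j = sorted((names.index(start), names.index(end)))
            selected.update(names[i:j + 1])
        else:
            raise ValueError(f"Unknown variable: {part}")
    return [n for n in names if n in selected]


# Hypothetical variable lists in data set order.
tracker_names = ["ID", "Q1", "Q2_1", "Q2_2", "Q3"]
demo_names = ["ID", "d1", "d2", "d3", "d4"]

print(select_variables("Q1-Q3", tracker_names))
print(select_variables("Q2_*", tracker_names))
print(select_variables("ID, Q2_*", tracker_names))

# "Include all variables except those manually omitted", omitting d4.
omitted = select_variables("d4", demo_names)
print([n for n in demo_names if n not in omitted])
```

The range form depends on the order of variables in the data set rather than on alphabetical order, so `Q1-Q3` here also picks up `Q2_1` and `Q2_2` because they sit between `Q1` and `Q3`.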
Reading the Output
- All variables in the combined data set will be listed in the output.
- Any variable in the output that has been matched will appear in blue and be expandable to reveal how it was matched.
- By default, this list will be truncated when there are many variables, but you can select Diagnostics > Variables from Combined Data in the object inspector to output the entire list of variables.
Automatic updating
If your input data sets will be periodically updated in the Cloud Drive, you can set this up as a repeatable workflow that automatically combines the latest input data sets and exports the combined data file to the Cloud Drive. This can be achieved by adding a schedule to your Data - By Variable output.
Add the data set to your document
Once the output has run, it will save the combined data set to your Displayr Cloud Drive using the name you specified. You can then do one of the following to add this data set to your document, depending on your requirements.
Add a new data set
1. Click the icon in the Data Sources tree.
2. Select Displayr Cloud Drive.
3. Select the combined data set from the list of files and press OK.
Update an existing data set
1. Select the original data set that you would like to update with the merged file in the Data Sources tree.
2. From the object inspector, click Update.
3. Select Displayr Cloud Drive.
4. Select the combined data set from the list of files and press OK.
5. A Data Difference Warning may appear, alerting you to any changes to the data set. Review these results and click Accept or Remind Me Later.
6. The data set and any connected outputs will now be updated with the combined data.
Next
How to Combine Data Files by Adding New Records
How to Work with Multiple Data Files