This article describes how to merge data files to add new variables to your current data. You might want to do this when you've categorized open-ended responses outside Displayr (although this is easy to do within Displayr) and want to merge the results into the data file to analyze them alongside the rest of your survey.
In this example, two files, census_05_demographics_men_women.sav and census_05_attitudes.sav, were merged. Variables with the black box in the first column came from the demographics file and those in the second column came from the attitudes file. Both files have a common variable called ID.
Requirements
Please note these steps require a Displayr license.
- Files to be merged must be stored on the Displayr Cloud Drive.
- Files to be merged must be .sav files.
Method
File name specification
- Select + > Data > Data Set > Combine > By Variable.
- In Data > Data sets, type the name of the first file on the Cloud Drive in the Data set 1 box.
- In Data > Data sets, type the name of the second file on the Cloud Drive in the Data set 2 box.
- Type the name of the merged data set in the Combined data set name box. By default, it will be called Combined data set.sav.
Case Matching
You can Combine data sets by either Matching IDs or joining them side-by-side (no matching). The side-by-side method is useful if both files contain the same cases but do not have an ID variable: Case 1 is matched with Case 1, Case 2 with Case 2, and so on. Only use this method if both files have the same number of cases and the cases are in the same order.
It is usually preferable to match by an ID variable if one is available. When matching by ID, enter the variable Name (not the label) in the ID variable fields.
Notes
- If a case in one file cannot be matched to a case in the other, missing values will be substituted.
- If a case in one file matches more than one case in the other, the data from the first file will be duplicated for each matching case in the other file.
If Only keep cases matched to all data sets is selected, only cases that have IDs in all data sets are retained. This checkbox is only shown when Combine data sets by has "Matching IDs" selected.
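The matching rules above can be illustrated outside Displayr with a minimal Python sketch. This is not Displayr's implementation; the function names combine_by_id and combine_side_by_side, the only_matched parameter, and the sample rows are all hypothetical, and each file is assumed to be a non-empty list of rows that share the same variable names.

```python
def combine_by_id(first, second, id_name="ID", only_matched=False):
    """Add the variables of `second` to `first`, matching rows on id_name."""
    first_vars = [k for k in first[0] if k != id_name]
    second_vars = [k for k in second[0] if k != id_name]

    # Group the second file's rows by ID so duplicate IDs are all kept.
    second_by_id = {}
    for row in second:
        second_by_id.setdefault(row[id_name], []).append(row)

    combined, matched_ids = [], set()
    for row in first:
        matches = second_by_id.get(row[id_name])
        if matches:
            matched_ids.add(row[id_name])
            # One output row per matching case: the first file's data is duplicated.
            for match in matches:
                combined.append({**row, **{v: match[v] for v in second_vars}})
        elif not only_matched:
            # No match in the second file: fill its variables with missing values.
            combined.append({**row, **{v: None for v in second_vars}})

    if not only_matched:
        # Cases that exist only in the second file get missing values for the first.
        for row in second:
            if row[id_name] not in matched_ids:
                combined.append({id_name: row[id_name],
                                 **{v: None for v in first_vars},
                                 **{v: row[v] for v in second_vars}})
    return combined


def combine_side_by_side(first, second):
    """Pair row 1 with row 1, row 2 with row 2, etc. (no ID matching)."""
    if len(first) != len(second):
        raise ValueError("Both files must have the same number of cases")
    return [{**a, **b} for a, b in zip(first, second)]


# Hypothetical sample data standing in for the two .sav files.
demographics = [{"ID": 1, "Age": 34}, {"ID": 2, "Age": 51}, {"ID": 3, "Age": 27}]
attitudes = [{"ID": 2, "Q1": "Agree"}, {"ID": 3, "Q1": "Disagree"},
             {"ID": 3, "Q1": "Neutral"}, {"ID": 4, "Q1": "Agree"}]

for case in combine_by_id(demographics, attitudes):
    print(case)
```

In this sample, ID 1 and ID 4 each appear in only one file, so they receive missing values, and ID 3 appears twice in the second file, so its Age value is repeated. Passing only_matched=True mimics the Only keep cases matched to all data sets checkbox.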
Variables
Options are:
Variables from data set 1, Variables from data set 2, etc.: Combo boxes with the options "Include all variables except those manually omitted" and "Only include manually specified variables". If the former is selected, all variables from the data set will be included in the combined data set except those manually specified to be omitted in the text controls that appear below. If the latter is selected, all variables from the data set will be excluded from the combined data set except those manually specified to be included in the text controls that appear below.
Variables to omit from data set 1, Variables to omit from data set 2, etc.: Input text controls that appear when "Include all variables except those manually omitted" is selected. These are used to specify variables that should be omitted from the combined data set. Variable ranges are supported (a range is specified by the start and end variable names separated by a dash, e.g. "Q2-Q6") and also variable name wildcards, e.g. "Q2_*", which matches all variables with names starting with "Q2_".
Variables to include from data set 1, Variables to include from data set 2, etc.: Input text controls that appear when "Only include manually specified variables" is selected. These are used to specify variables that should be included in the combined data set. Variable ranges are supported (a range is specified by the start and end variable names separated by a dash, e.g. "Q2-Q6") and also variable name wildcards, e.g. "Q2_*", which matches all variables with names starting with "Q2_".
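To make the range and wildcard syntax concrete, here is a rough Python sketch of how such a specification could be expanded against a list of variable names. It is an illustration only, not Displayr's parser. The function names are hypothetical, and two behaviors are assumptions not stated in this article: that several entries are separated by commas, and that a range such as "Q2-Q6" covers every variable positioned between Q2 and Q6 in the file's variable order.

```python
from fnmatch import fnmatchcase


def expand_variable_spec(spec, variable_names):
    """Return the names in variable_names selected by a spec such as 'Q2-Q6, Q7_*'."""
    selected = []
    for token in (t.strip() for t in spec.split(",")):  # comma separator is an assumption
        if not token:
            continue
        if "*" in token:
            # Wildcard: "Q2_*" matches every name starting with "Q2_".
            matches = [n for n in variable_names if fnmatchcase(n, token)]
        elif "-" in token and token not in variable_names:
            # Range: start and end names separated by a dash, taken in file order.
            # index() raises ValueError if either end is not a known variable.
            start, end = (part.strip() for part in token.split("-", 1))
            i, j = variable_names.index(start), variable_names.index(end)
            matches = variable_names[i:j + 1]
        else:
            matches = [token] if token in variable_names else []
        for name in matches:
            if name not in selected:
                selected.append(name)
    return selected


def variables_to_keep(variable_names, mode, spec):
    """mode='omit' keeps everything except the spec; mode='include' keeps only the spec."""
    chosen = expand_variable_spec(spec, variable_names)
    if mode == "omit":
        return [n for n in variable_names if n not in chosen]
    return [n for n in variable_names if n in chosen]


# Hypothetical variable names from one data set, in file order.
names = ["ID", "Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q2_1", "Q2_2", "Q7"]
print(variables_to_keep(names, "omit", "Q2-Q6"))
print(variables_to_keep(names, "include", "ID, Q2_*"))
```

The first call corresponds to "Include all variables except those manually omitted" and the second to "Only include manually specified variables".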
Automatic Updating
Use Automatic updating if you want the merge to refresh automatically at a regular interval. This is useful when the input data sets are regularly updated.
Update period: The time unit for regular updates. Shown when Automatic updating is selected.
Frequency: The multiple of the Update period for regular updating. Shown when Automatic updating is selected.
Start date and time: The date and time of the first update in the format dd-mm-yyyy hh:mm or mm-dd-yyyy hh:mm. Shown when Automatic updating is selected.
US date format: Whether the Start date and time is expressed in US format, i.e. mm-dd-yyyy hh:mm. Shown when Automatic updating is selected.
Time zone: An optional time zone for the Start date and time. If none is given, the default of UTC applies. The format must be Continent/City, e.g. America/Los_Angeles. See Wikipedia for a list of time zones. Shown when Automatic updating is selected.
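As a rough illustration of how these settings interact, the Python sketch below parses a Start date and time in either format and lists the update times implied by an Update period and Frequency. It is not how Displayr schedules updates. The function names and the unit names in PERIOD_LENGTHS are hypothetical (this article does not list the units offered), and the sketch always applies the UTC default rather than handling a named time zone.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical unit names; the units actually offered by Update period may differ.
PERIOD_LENGTHS = {
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
}


def parse_start(text, us_format=False):
    """Parse 'dd-mm-yyyy hh:mm', or 'mm-dd-yyyy hh:mm' when us_format is True."""
    fmt = "%m-%d-%Y %H:%M" if us_format else "%d-%m-%Y %H:%M"
    # No time zone given in this sketch, so the UTC default applies.
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def update_times(start, period, frequency, count):
    """Return the first `count` update times: every `frequency` x `period` from start."""
    step = PERIOD_LENGTHS[period] * frequency
    return [start + step * i for i in range(count)]


# The same text means different dates depending on the US date format setting.
start = parse_start("03-04-2024 09:30")                 # 3 April 2024
us_start = parse_start("03-04-2024 09:30", us_format=True)  # 4 March 2024

# Frequency 2 with an Update period of days: an update every two days.
for moment in update_times(start, "days", 2, 3):
    print(moment.isoformat())
```

The example shows why the US date format setting matters: an ambiguous value such as 03-04-2024 shifts the first update by a month if the wrong format is assumed.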
Next Steps
Once the merge completes successfully, the merged file is automatically saved to the Cloud Drive under the name entered in the Combined data set name box (Step 4 of File name specification).
To update your existing data set with the merged file:
- Select the original data set that you would like to update with the merged file in Data Sources.
- From the object inspector, click Update.
- Select Displayr Cloud Drive.
- Select the merged data set from the list of files and click OK.
- A Data Difference Warning may appear, alerting you to any changes to the data set. Review these results and click Accept or Remind Me Later.
- The data set and outputs that you created will be updated with data from the combined data set.
Next
How to Merge Files by Case (Add New Cases)
How to Work with Multiple Data Files