This article describes how to automatically recode the values of categorical variables into new midpoint values in a numeric variable, quantifying the information contained within the labels of the input variable(s).
Requirements
- A data set with Nominal/Ordinal, Nominal/Ordinal - Multi, Numeric, or Numeric - Multi variables that have numbers or ranges of numbers in their labels.
Method
- Select the variable(s) you want to recode in the Data Sources tree.
- Hover and click Plus (+) > Ready-Made New Variable(s) > Numeric Variable(s) from Code/Category Midpoints.
The QScript creates one new numeric variable for each selected variable and attempts to set a value for each category: either the number in the category's label or the midpoint of the range in that label.
Where a label contains a single number, this value will be used. If no number is detected in the label, a value of NaN will be assigned. Recoding will only be applied to variables that have three or more labels containing numbers.
Where a label contains a range of numbers, for example 18 to 24, the midpoint of the range will be used (21 in this case).
If a variable is recoded according to midpoints and it contains a lower label like Less than 18, the midpoint will be halfway between zero and the number in the label (9 in this example).
If it contains an upper label like 55 or more, the midpoint will be the number in the label plus half the width of the previous interval. So if the previous interval was 50 to 54, the midpoint will be set to 57, that is 55 + (54 - 50) / 2.
If no midpoint for a label can be determined, a value of NaN will be assigned.
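The midpoint rules above can be illustrated with a short Python sketch. This is not the QScript's actual code: the function name label_midpoints, the regular expression, and the phrases used to detect lower and upper labels ("less than", "or more") are assumptions made for illustration only.

```python
import math
import re

# Assumed number pattern: integers or decimals such as 18 or 2.5.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def label_midpoints(labels):
    """Return one value per label, or None if fewer than three labels contain numbers."""
    parsed = [[float(n) for n in NUMBER.findall(label)] for label in labels]
    if sum(1 for numbers in parsed if numbers) < 3:
        return None  # recoding is not applied

    midpoints = []
    previous_width = None  # width of the most recent range label seen
    for label, numbers in zip(labels, parsed):
        text = label.lower()
        if len(numbers) >= 2:
            # Range such as "18 to 24": use the midpoint of the range.
            low, high = numbers[0], numbers[1]
            midpoints.append((low + high) / 2)
            previous_width = high - low
        elif len(numbers) == 1 and "less than" in text:
            # Lower label: halfway between zero and the number.
            midpoints.append(numbers[0] / 2)
        elif len(numbers) == 1 and "or more" in text:
            # Upper label: number plus half the previous interval's width.
            if previous_width is None:
                midpoints.append(math.nan)
            else:
                midpoints.append(numbers[0] + previous_width / 2)
        elif len(numbers) == 1:
            # Single number: use it directly.
            midpoints.append(numbers[0])
        else:
            # No number detected in the label.
            midpoints.append(math.nan)
    return midpoints


labels = ["Less than 18", "18 to 24", "50 to 54", "55 or more", "Don't know"]
print(label_midpoints(labels))  # [9.0, 21.0, 52.0, 57.0, nan]
```

The sketch takes "previous interval" to mean the most recent range label that appears before the upper label in the list, which is one reading of the rule rather than a confirmed detail.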
If the labels include any kind of brackets, e.g. [ or (, then only the text inside the brackets will be used. If there is no closing bracket (the label has been truncated) then everything after the opening bracket will be used.
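As a rough illustration of the bracket rule, the Python sketch below extracts the text that would be examined for numbers. The function name text_inside_brackets and the set of bracket characters it recognises are assumptions, not the QScript's actual implementation.

```python
import re

# Assumed bracket pairs; the article only names [ and ( as examples.
CLOSING = {"(": ")", "[": "]", "{": "}"}


def text_inside_brackets(label):
    """Return the text inside the first bracket pair, or the whole label if there is none."""
    match = re.search(r"[\[({]", label)
    if match is None:
        return label  # no brackets: use the label unchanged
    rest = label[match.end():]
    end = rest.find(CLOSING[match.group()])
    # No closing bracket (truncated label): keep everything after the opening bracket.
    return rest if end == -1 else rest[:end]


print(text_inside_brackets("Young adults (18 to 24)"))  # 18 to 24
print(text_inside_brackets("Young adults (18 to 2"))    # 18 to 2
```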
Labels that contain references to time periods, such as days, weeks, minutes, and hours, or other units like liters or kilograms are difficult to recode in this way.