This article describes how to automatically recode the labels of categorical variables (for example, age ranges such as 18 to 24) into new numeric midpoint values that quantify the information contained in the labels of the input variable(s).
Requirements
- A data set with Nominal/Ordinal, Nominal/Ordinal - Multi, Numeric, or Numeric - Multi variables that have numbers or ranges of numbers in their labels.
Please note these steps require a Displayr license.
Method
- Select the variable you want to recode in the Data Sources tree.
- Either hover over any variable in your Data Sources tree and select Plus (+) > Ready-Made New Variable(s) > Numeric Variable(s) from Code/Category Midpoints, or, in the object inspector, select Data > TRANSFORMATIONS > Numeric Variable(s) from Code/Category Midpoints.
- The QScript will create a new variable and attempt to set a value for each category that corresponds to the number in its label, or to the midpoint of the range in its label.
Where a label contains a single number, this value will be used. If no number is detected in the label, then the value of NaN will be assigned. Recoding will only be applied for variables that have three or more labels containing numbers.
Where a label contains a range of numbers, for example 18 to 24, the midpoint value will be used (21 in this case).
If a variable is recoded according to midpoints and it contains a lower label like Less than 18, then the midpoint will be halfway between zero and the number in the label (9 in this example).
If a variable is recoded according to midpoints and it contains an upper label like 55 or more, then the midpoint will be the number in the label plus half the width of the previous interval. So if the previous interval was 50 to 54, which has a width of 4, this midpoint will be set to 55 + 2 = 57.
If no midpoint for a label can be determined, then a value of NaN will be assigned.
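The midpoint rules above can be sketched in Python. This is a minimal illustration only, not the QScript's actual code: the function name `label_values`, the number-matching pattern, the detection of lower and upper labels by the phrases "less than" and "or more", and returning `None` when fewer than three labels contain numbers are all assumptions made for the example.

```python
import math
import re

# Assumption: a "number" is a run of digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def label_values(labels):
    """Return one numeric value per label, or None if recoding does not apply."""
    parsed = [[float(n) for n in NUMBER.findall(label)] for label in labels]

    # Recoding only applies when three or more labels contain numbers.
    if sum(1 for nums in parsed if nums) < 3:
        return None

    values = []
    prev_width = None  # width of the most recent range seen, e.g. 54 - 50 = 4
    for label, nums in zip(labels, parsed):
        text = label.lower()
        if len(nums) >= 2:
            # Range such as "18 to 24": use the midpoint of the first two numbers.
            low, high = nums[0], nums[1]
            values.append((low + high) / 2)
            prev_width = high - low
        elif len(nums) == 1 and "less than" in text:
            # Lower label: halfway between zero and the number.
            values.append(nums[0] / 2)
        elif len(nums) == 1 and "or more" in text:
            # Upper label: the number plus half the previous interval's width.
            if prev_width is None:
                values.append(math.nan)
            else:
                values.append(nums[0] + prev_width / 2)
        elif len(nums) == 1:
            # A single number is used as-is.
            values.append(nums[0])
        else:
            # No number detected in the label.
            values.append(math.nan)
    return values


labels = ["Less than 18", "18 to 24", "50 to 54", "55 or more", "Don't know"]
print(label_values(labels))
```

With the labels from the examples above, the sketch yields 9, 21, 52, and 57 for the first four categories and NaN for the label without a number.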
If the labels include any kind of brackets, e.g. [ or (, then only the text inside the brackets will be used. If there is no closing bracket (the label has been truncated) then everything after the opening bracket will be used.
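The bracket rule can be illustrated with a short Python sketch. The helper name `bracket_text` is hypothetical, and the sketch assumes that only round, square, and curly brackets count and that the first opening bracket in the label is the one that matters; the actual QScript may behave differently.

```python
# Assumption: these are the bracket pairs that are recognised.
BRACKETS = {"(": ")", "[": "]", "{": "}"}


def bracket_text(label):
    """Return the part of the label that would be searched for numbers."""
    for i, ch in enumerate(label):
        if ch in BRACKETS:
            close = label.find(BRACKETS[ch], i + 1)
            if close == -1:
                # Truncated label: use everything after the opening bracket.
                return label[i + 1:].strip()
            # Use only the text inside the brackets.
            return label[i + 1:close].strip()
    # No brackets: the whole label is used.
    return label


print(bracket_text("Young adults (18 to 24)"))
print(bracket_text("Seniors [65 or mo"))
```

The first call keeps only `18 to 24`, and the second, where the closing bracket is missing, keeps `65 or mo`.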
Labels that contain references to time periods, such as days, weeks, minutes, and hours, or other units like liters or kilograms are difficult to recode in this way.
Next
How to Recode Variables Using Category Midpoints