This article describes how to go from verbatim text responses in either a single language or multiple languages:
To a state where the responses are translated into the language of your choice and automatically categorized:
This feature automatically categorizes the text variable containing unstructured text into single-response or multiple-response categories.
You can let the algorithm determine appropriate categories and their labels automatically, based on patterns observed in the data. Alternatively, you can provide a partial categorization of cases in the data, and the algorithm will predict which of the user-specified categories the remaining cases belong to, using the method described here: How to Automatically Classify New Text Data Using an Existing Categorization.
These categories for all cases can then be saved by clicking Data > SAVE VARIABLE(S) > Categories.
Requirements
You will need a Text variable in order to perform automatic coding. Text variables are represented by a small a next to the variable in the Data Sources tree:
OPTIONAL: If your input variable contains multiple languages, you will need to supply a nominal variable indicating the language of each case. This allows multiple languages to be translated at the same time.
Please note these steps require a Displayr license.
Method
- From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Automatic Categorization > Unstructured Text.
- From the object inspector, select the text variable you would like to categorize and translate from Data > DATA SOURCE > Text variable.
- From Data > TRANSLATE (GOOGLE CLOUD TRANSLATION), specify the Source language. If your text variable contains more than one language, select Specify with variable, then, in the Source language variable dropdown, select the nominal variable that contains the languages. If the language variable has missing data for any cases, Displayr will make a best guess at the language for those cases.
- Specify the Output language that you would like the responses to be translated to.
- Click Calculate if Automatic is not already ticked.
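The Specify with variable behavior in the steps above can be pictured with a small sketch. It is illustrative only and is not Displayr's implementation: `resolve_source_languages` and `toy_guess` are hypothetical names, and `toy_guess` is a deliberately crude stand-in for real language detection. The sketch only shows the rule that a case uses the language from the nominal variable when one is present, and falls back to a guess when it is missing.

```python
def resolve_source_languages(texts, languages, guess_language):
    """Return one source language per case.

    texts          -- list of verbatim responses
    languages      -- list of language codes, with None where data is missing
    guess_language -- fallback function used only for cases with no language
    """
    resolved = []
    for text, lang in zip(texts, languages):
        if lang is None or str(lang).strip() == "":
            # Missing value in the language variable: fall back to a guess.
            resolved.append(guess_language(text))
        else:
            resolved.append(lang)
    return resolved


def toy_guess(text):
    # Hypothetical stand-in for language detection: looks for a few Spanish words.
    spanish_markers = ("el", "la", "muy")
    words = text.lower().split()
    return "es" if any(w in words for w in spanish_markers) else "en"


responses = ["Great service", "El servicio fue muy lento", "Too expensive"]
language_variable = ["en", None, None]  # the second and third cases are missing

print(resolve_source_languages(responses, language_variable, toy_guess))
```

Supplying a complete language variable avoids relying on the guess at all, which is generally preferable for short responses.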
OPTIONAL:
You can provide a partial categorization of cases in the data, and the algorithm will predict which of the user-specified categories the remaining cases belong to, using the method described here: How to Automatically Classify New Text Data Using an Existing Categorization.
These categories for all cases can then be saved by clicking Data > SAVE VARIABLE(S) > Categories or First Category.
- Categories: Save variables to the data set containing the categories. Where there are multiple input variables, a separate set of variables is added for each.
- First category: Save a variable to the data set containing the first category mentioned. Where there are multiple input variables, the first category of each will be saved as a separate variable.
NOTE: Variables created with SAVE VARIABLE(S) > Categories or First Category may become invalid if the output changes, either because the input text variable was modified or because the input settings were modified. If this happens, delete the variables and recreate them.
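To illustrate the difference between the two save options, here is a minimal sketch. It assumes that Categories is stored as one 0/1 indicator per category for each case and that First Category is a single value per case. That layout is an assumption made for illustration, and the function names and example data are hypothetical.

```python
def save_categories(coded, categories):
    """One 0/1 indicator per category for each case (assumed multiple-response layout)."""
    return [{c: int(c in case) for c in categories} for case in coded]


def save_first_category(coded):
    """The first category mentioned by each case, or None if the case has no category."""
    return [case[0] if case else None for case in coded]


# Hypothetical coded output: each case lists its categories in order of mention.
coded = [["Price", "Service"], ["Service"], []]
categories = ["Price", "Service"]

print(save_categories(coded, categories))
print(save_first_category(coded))
```

In this sketch the first case keeps both categories under Categories but only "Price" under First Category, which is why First Category suits single-response analysis.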
Next
How to Automatically Translate Text Variables into Other Languages
How To Automatically Code Unstructured Text Data
How to Automatically Classify New Text Data Using an Existing Categorization