This article describes how to go from verbatim text responses, in either a single language or multiple languages, to a state where the responses are translated into the language of your choice and automatically categorized.
This feature automatically categorizes the text variable containing unstructured text into single-response or multiple-response categories.
You can allow the algorithm to determine appropriate categories and their labels automatically, based on patterns observed in the data. Alternatively, you can provide a partial categorization of cases in the data, and the algorithm will predict which of the user-specified categories the remaining cases belong to, using the method described here: How to Automatically Classify New Text Data Using an Existing Categorization.
These categories for all cases can then be saved by clicking Data > Save Variable(s) > Categories or First Category.
Requirements
You will need:
- A Text variable that is in a foreign language. Text variables are represented by an A next to the variable in the Data Sources tree.
- A nominal variable that stores the language of each text response, if the text variable contains multiple languages. Click here for a list of the currently supported languages.
Method
- Select the variable to translate and categorize automatically in the Data Sources tree.
- From the toolbar, go to Anything > Advanced Analysis > Text Analysis > Automatic Categorization > Unstructured Text.
- From Data > Translate (Google Cloud Translation), specify the Source language. If your text variable contains more than one language, select Specify with variable and then, in the Source language variable dropdown, select the nominal variable that stores the language of each response. If the language variable has missing data for any cases, Displayr will make a best guess at the language for those cases.
- Specify the Output language that you would like the responses to be translated to.
- From Categories, select Category creation > Create new categorization.
- Click Calculate if Calculate automatically is not already ticked.
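The Source language step above can be pictured with a minimal Python sketch. It is only an illustration of the described behavior (use the language variable where it has a value, guess where it is missing), not Displayr's implementation. The names `resolve_source_languages` and `toy_guess` are hypothetical, and the toy detector is a stand-in for whatever detection Displayr actually performs.

```python
from typing import Callable, List, Optional


def resolve_source_languages(
    responses: List[str],
    language_labels: List[Optional[str]],
    guess_language: Callable[[str], str],
) -> List[str]:
    """Return one source language per response.

    The label from the language variable is used when present; the
    guess_language callback is used only where the label is missing.
    """
    if len(responses) != len(language_labels):
        raise ValueError("responses and language_labels must have the same length")
    resolved = []
    for text, label in zip(responses, language_labels):
        if label is None or not label.strip():
            # Missing language: fall back to a best guess from the text itself.
            resolved.append(guess_language(text))
        else:
            resolved.append(label.strip())
    return resolved


def toy_guess(text: str) -> str:
    # Hypothetical stand-in detector, purely illustrative.
    return "Spanish" if ("ñ" in text or "¿" in text) else "English"


responses = ["Me gusta el diseño", "Great battery life", "¿Por qué es tan caro?"]
labels = ["Spanish", "English", None]  # third case has missing language data
languages = resolve_source_languages(responses, labels, toy_guess)
print(languages)
```

In this sketch only the third response is guessed, because it is the only case with a missing label.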
OPTIONAL:
You can provide a partial categorization of cases in the data, and the algorithm will predict which of the user-specified categories the remaining cases belong to, using the method described here: How to Automatically Classify New Text Data Using an Existing Categorization.
These categories for all cases can then be saved by clicking Data > Save Variable(s) > Categories or First Category.
- Categories: Save variables to the data set containing the categories. Where there are multiple input variables, a separate set of variables is added for each.
- First category: Save a variable to the data set containing the first category mentioned. Where there are multiple input variables, the first category of each will be saved as a separate variable.
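To make the difference between the two save options concrete, here is a small Python sketch. It assumes, purely for illustration, that each case carries an ordered list of assigned categories, that Categories corresponds to one 0/1 indicator per category, and that First category keeps only the first entry in each list. The function names and this data layout are hypothetical and are not taken from Displayr.

```python
from typing import Dict, List, Optional


def categories_to_indicators(
    assigned: List[List[str]], category_names: List[str]
) -> Dict[str, List[int]]:
    """Build one 0/1 indicator list per category (a multiple-response layout)."""
    return {
        name: [1 if name in cats else 0 for cats in assigned]
        for name in category_names
    }


def first_category(assigned: List[List[str]]) -> List[Optional[str]]:
    """Keep only the first category mentioned in each case (None if uncategorized)."""
    return [cats[0] if cats else None for cats in assigned]


# Each inner list holds the categories for one case, in order of mention.
assigned = [["Price", "Design"], ["Battery"], []]
category_names = ["Price", "Design", "Battery"]

indicators = categories_to_indicators(assigned, category_names)
firsts = first_category(assigned)
print(indicators)
print(firsts)
```

The first case contributes to both the Price and Design indicators, but only Price appears in the first-category result.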
NOTE: Variables created using Save Variable(s) > Categories or First Category may become invalid if the output changes, either because the input text variable was modified or because the input settings were modified. If this happens, delete the variables and recreate them.
Next
How to Automatically Translate Text Variables into Other Languages
How To Automatically Classify Unstructured Text Data
How to Automatically Classify New Text Data Using an Existing Categorization