We regularly update the underlying algorithms of our main text categorization feature. When text categorization is performed on longitudinal data, changes to these algorithms can make it necessary to re-create the themes or to re-classify text responses that have already been classified.
This article describes:
- The "algorithms" covered by the release notes
- Why changes to the underlying algorithms can necessitate redoing text analyses
- Release notes
The "algorithms" covered by the release notes
By changes to "algorithms," we're referring to any of the following:
- Changes to large language models.
- Changes to the prompts passed to large language models. Please be aware that the prompts shown in our user interface (e.g., "Form the text into groups") are only part of the full prompt that is sent to the large language models.
- Changes to how we process the data sent to the prompts.
- Changes to machine learning models (e.g., the algorithms used to calculate similarity).
These algorithms affect the following features:
- Automatic theme creation (when the user presses Create).
- Automatic classification (when the user presses Classify).
- Sort by: Similarity.
Why changes to the underlying algorithms can necessitate redoing text analyses
Some algorithm changes improve the themes that the text categorization features create. Typically, these changes have no implications for the user unless they are dissatisfied with their previous themes and wish to re-create them.
Other algorithm changes improve the accuracy with which the AI classifies responses into themes. Such a change always means that some responses previously classified into one theme are, with the updated algorithm, classified into a different theme. In longitudinal studies, this can lead to misleading conclusions: a theme may gain or lose responses purely because the algorithm changed. If the earlier data is not re-classified with the new algorithm, changes in theme size over time can be incorrectly interpreted as changes in the population over time. To fix this, redo the classification of the previously classified data.
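The effect described above can be illustrated with a small Python sketch. The two classifier functions are hypothetical toy keyword rules standing in for an older and a newer algorithm version; they are not how our text categorization actually works, and the responses are made-up examples. Both waves contain identical responses, so any difference in theme share comes from the algorithm alone.

```python
from collections import Counter

# Hypothetical stand-ins for two algorithm versions.
# These toy keyword rules are NOT the product's actual classifier.
def classify_v1(response: str) -> str:
    # Older version: only recognizes the literal word "price".
    return "Price" if "price" in response.lower() else "Other"

def classify_v2(response: str) -> str:
    # Newer, "more accurate" version: also recognizes related wording.
    text = response.lower()
    if "price" in text or "expensive" in text or "cost" in text:
        return "Price"
    return "Other"

def theme_share(responses, classify, theme):
    """Fraction of responses that the given classifier assigns to `theme`."""
    counts = Counter(classify(r) for r in responses)
    return counts[theme] / len(responses)

# Hypothetical data: wave 2 is identical to wave 1, so the population is unchanged.
wave_1 = ["Too expensive", "The price", "Slow support", "High cost"]
wave_2 = list(wave_1)

# Mixed algorithms: wave 1 was classified earlier with v1, wave 2 later with v2.
mixed = (theme_share(wave_1, classify_v1, "Price"),
         theme_share(wave_2, classify_v2, "Price"))

# Consistent algorithm: wave 1 is re-classified with v2 as well.
consistent = (theme_share(wave_1, classify_v2, "Price"),
              theme_share(wave_2, classify_v2, "Price"))

print(mixed)       # (0.25, 0.75)
print(consistent)  # (0.75, 0.75)
```

With mixed algorithm versions, the "Price" theme appears to grow from 25% to 75% of responses even though the responses did not change; re-classifying the earlier wave with the same algorithm removes this spurious trend.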
Release notes
2025-06-27
The change: Two updates may affect your Text Categorization results:
- The underlying language model has changed to Gemini 2.0 Flash.
- Algorithmic changes to how Text Categorization performs multiple response classification.
2024-11-18
The change: The theme creation and classification algorithms now take the variable set's name into account. Previously, classification was based solely on the similarity of the text to the themes. For example, a response of "price" would likely be classified into a theme relating to price, but the AI had no way of knowing the respondent's sentiment from that response alone. Now, if the variable set name is, say, "Reasons for disliking Microsoft," the AI interprets the response "price" as expressing negative sentiment about price.
This change improves both theme creation and classification. However, it can lead to large changes in both the themes identified and the classification.
Recommendations:
- Users should re-create their themes to see whether more informative themes are identified.
- Any previous classifications should be redone.