When analyzing numeric data (e.g. Age in years or Income in $), you may want to review the spread of the data and create categories from it, so that you can analyze the categories instead of the raw numeric values. Displayr makes this easy with its categorizable histogram feature, which lets you see and set the boundaries of your categories visually and take advantage of some automatic features. Other methods of banding (or categorizing) numeric data are also available, depending on your needs. This article describes how to use a numeric variable to create a histogram that can be used to divide responses into groups (buckets) for analysis.
The benefits of categorizing are:
- Avoid creating categories upfront in your survey question, which is useful when you do not yet understand your market.
- Dynamically create and adjust categories/buckets that suit the distribution of your responses (e.g. if a lot of young people answer your survey, you can specify more categories for the younger ages).
- View percentages instead of an average.
- Break down other questions in your survey by the newly created categories.
Requirements
- Log in to Displayr and select + Add New to start a new document.
- Select Anything > Data > Data Sets > Add.
- Select My Computer and upload a data set that has at least one numeric variable. The data set called bus phone survey.sav has been used in this example.
Method
- Drag Years of operation of business (numeric) from the Data Sets tree onto the page to create a table.
- With the table selected, click on Visualization > Distributions > Categorizable Histograms from the object inspector to convert your table into a histogram.
- Click the Data tab from the object inspector.
- Scroll down to the HISTOGRAM CATEGORIES section.
- This is the area that allows you to allocate the numeric data into categories.
- Observe that the options are:
- Do not generate – currently selected
- With equal proportions – this is a starting point, where the data is categorized into 3 categories with equal proportions (about 33% each, or as close as the data allows)
- With equal intervals – this is an alternative starting point, where the 3 categories are equally spaced between the minimum and maximum.
- Choose With equal proportions.
- Observe that 2 red lines have been overlaid on top of the histogram.
- Behind the scenes, a new data item has been added which represents the percentages of people in each category. The labels of the data match the labels shown above the histogram (“Less than 11,” “11 – 24” and “25 or more”).
- OPTIONAL: You may customize the categories. For example:
- Change the Number of categories to 4. A new red line is added on top of the histogram.
- Change the category cutoff points. For example, click on the first red line so it appears selected. Once it appears selected (a new grey rectangle appears around it), click and drag the line to the left or right to change its cutoff point. Once you let go of your mouse, you can observe that the category labels and percentages update automatically. (Tip: when the red lines are overlapping the blue bins, they can be difficult to select. You can change the category cutoff point in the object inspector under Chart > CATEGORY LINE > Category cutoff point.)
- You may now use the new categorized data in other charts or tables:
- Drag Q4. The businesses number of locations (a nominal variable) from the Data Sets tree onto the page.
- Change it to a visualization using Visualization > Bar > Small Multiples Bar with Tests.
- In Inputs > DATA > Columns, select the new data Histogram categories - Q5. Years of operation of business (numeric). (Tip: This new variable will be next to the original Q5 variable!)
- Observe that the chart now shows the data by your categories.
- OPTIONAL: You can change the labels by selecting the variable Histogram categories - Q5. Years of operation of business (numeric) in the Data Sets tree, clicking the Labels button in the DATA VALUES section of the object inspector, and entering new labels.
- OPTIONAL: Displayr will normally infer that this data is Numeric. However, if the data is Text, you will first need to convert it to Numeric using Data Manipulation > Structure > Average.
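To make the difference between the two starting points in HISTOGRAM CATEGORIES concrete, here is a minimal Python sketch of the general idea, run outside Displayr. It is an illustration only: the article does not state how Displayr computes its cutoffs, so the function names, the quantile method, and the sample values are all assumptions. Values equal to a cutoff go into the higher category, mirroring labels such as "Less than 11" and "11 – 24".

```python
from bisect import bisect_right
from statistics import quantiles


def equal_proportion_cutoffs(values, n_categories=3):
    """Cutoffs that put roughly the same share of responses in each category."""
    # Assumption: inclusive quantiles; Displayr's exact rule may differ.
    return quantiles(values, n=n_categories, method="inclusive")


def equal_interval_cutoffs(values, n_categories=3):
    """Cutoffs equally spaced between the minimum and the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_categories
    return [lo + width * i for i in range(1, n_categories)]


def categorize(values, cutoffs):
    """Category index per value; a value equal to a cutoff joins the higher category."""
    return [bisect_right(cutoffs, v) for v in values]


def percentages(categories, n_categories):
    """Percentage of responses that fall in each category."""
    total = len(categories)
    return [100 * categories.count(i) / total for i in range(n_categories)]


# Made-up "years of operation" values, not taken from the example data set.
years = [1, 2, 3, 5, 8, 10, 12, 15, 20, 24, 30, 45]

for name, make_cutoffs in [
    ("equal proportions", equal_proportion_cutoffs),
    ("equal intervals", equal_interval_cutoffs),
]:
    cutoffs = make_cutoffs(years)
    shares = percentages(categorize(years, cutoffs), len(cutoffs) + 1)
    print(name, [round(c, 1) for c in cutoffs], [round(s, 1) for s in shares])
```

With skewed data like this sample, equal proportions gives three similarly sized groups, while equal intervals puts most responses in the first category. That is why the two options are best treated as starting points to adjust by dragging the red lines.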
Now that you have created categorized data, there are two important points to note about Displayr in general that will help you work more effectively:
- Displayr is dynamic: if you further customize your categories in the histogram (e.g. changing the cutoff points or category labels), that will automatically flow through to any chart that shows this categorized data.
- If you make a mistake when categorizing, you can use the Undo button at any time.
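The "dynamic" behavior described above can be pictured as a breakdown that is recomputed from the current cutoffs whenever they change. The Python sketch below is a rough illustration under that assumption, not Displayr's implementation: `crosstab_percentages` is a hypothetical helper and the data is made up.

```python
from bisect import bisect_right
from collections import Counter


def crosstab_percentages(numeric, nominal, cutoffs):
    """Percentages of each nominal value within each category of the numeric variable."""
    n_categories = len(cutoffs) + 1
    counts = [Counter() for _ in range(n_categories)]
    for value, group in zip(numeric, nominal):
        # A value equal to a cutoff falls in the higher category.
        counts[bisect_right(cutoffs, value)][group] += 1
    return [
        {group: 100 * n / sum(col.values()) for group, n in col.items()}
        for col in counts
    ]


# Made-up responses: years of operation and number of locations.
years = [2, 5, 9, 11, 14, 20, 25, 30]
locations = ["One", "One", "One", "Two+", "One", "Two+", "Two+", "Two+"]

# Moving the cutoffs changes the breakdown without touching the raw data.
before = crosstab_percentages(years, locations, cutoffs=[11, 25])
after = crosstab_percentages(years, locations, cutoffs=[10, 20])
print(before)
print(after)
```

Because the categories are derived from the raw numeric values each time, nothing needs to be re-entered when a cutoff moves; only the derived percentages change.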