Figure 1. Good Life: All by respondents ONLY Code System
After several false starts parsing my data set by respondent versus interviewer turns, and further by structural subcodes based on the semi-structured interview themes of the Good Life (i.e. aspirations, resources, and obstacles related to the Good Life), I have managed to create a useful concordance and correspondence analysis (apart from words in my stop list) focused exclusively on the portions of the interviews that explore the Good Life, limited to statements from respondents. Figure 1 depicts what I've done in my Code System to achieve this. The great advantage of creating such an aggregate structural code is that it allows you to use the powerful word frequency function of MAXDictio to create a matrix of the most frequent words within very specific areas of your data, so you can home in on the information you are most interested in.
Creating Aggregate Structural Codes
The first step in creating a matrix using this approach is to create your aggregate code. In my case, I had already coded all of the interviews according to the semi-structured interview guide, which itself was divided into three subsections for the Good Life. This meant I needed to aggregate all instances of the three Good Life subcodes while also limiting the result to respondents only. This can only be achieved in MAXQDA through a two-step process. First, create an aggregate of the Good Life codes by activating all three Good Life subcodes and all interview texts, then specify the OR combination in the Text Retrieval function (the default) to retrieve all of the applicable coded segments. At this point you can autocode all of these retrieved statements to a new aggregate Good Life code. This is necessary because the next step requires a different Text Retrieval function, the "if inside" function, to retrieve only respondent turns within the Good Life codes. If you try to do this all at once, you'll be forced to choose between the two functions, which means your word frequency will include superfluous, or overly limited, information. Once you've created your aggregate Good Life code, however, you can use the 'if inside' function with the respondent-turn code to retrieve the text portions you want, and once again use the autocode function in Text Retrieval to create a meta structural code, 'Good Life: All by respondent turns ONLY', that can be used in a word frequency to produce a very targeted word-by-respondent matrix.
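MAXQDA's retrieval runs entirely through its menus, but the logic of the two-step process can be sketched in plain Python. The segment records below are made up for illustration (they are not a MAXQDA export format): step 1 is an OR union across the three subcodes, and step 2 is an "if inside" intersection with respondent turns.

```python
# Hypothetical coded segments: (document, code, speaker, text) records
segments = [
    {"doc": "int01", "code": "GL:aspirations", "speaker": "respondent",  "text": "a stable home"},
    {"doc": "int01", "code": "GL:aspirations", "speaker": "interviewer", "text": "tell me more"},
    {"doc": "int01", "code": "GL:resources",   "speaker": "respondent",  "text": "family support"},
    {"doc": "int01", "code": "warmup",         "speaker": "respondent",  "text": "fine thanks"},
]

GOOD_LIFE_SUBCODES = {"GL:aspirations", "GL:resources", "GL:obstacles"}

# Step 1: OR retrieval across the three subcodes -> aggregate "Good Life" code
good_life = [s for s in segments if s["code"] in GOOD_LIFE_SUBCODES]

# Step 2: "if inside" retrieval -> keep only respondent turns within that aggregate
good_life_respondents = [s for s in good_life if s["speaker"] == "respondent"]

for s in good_life_respondents:
    print(s["text"])
```

Running either filter alone would either pull in interviewer turns or pull in respondent talk outside the Good Life sections, which is exactly the problem the autocoded aggregate avoids.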
Executing the Word Frequency using a Meta Structural Code
Figure 2. Word Frequency of Meta Structural Code
The next step is simply to activate all of your texts along with your new meta code, 'Good Life: All by respondent turns ONLY'. You can now use the word frequency option in the MAXDictio menu to produce a matrix. Important to this process is selecting the options specified in Figure 2, including using only retrieved segments and differentiating by documents (i.e. respondents).
If you have a good stop list in place, then your results can be exported as a comma separated values file.
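The shape of the resulting word-by-respondent matrix can be sketched with a few lines of Python. The documents, texts, and stop list here are invented for illustration; MAXDictio does this internally when you differentiate by documents.

```python
from collections import Counter

# Hypothetical stop list and retrieved segments, keyed by respondent document
STOP_LIST = {"the", "a", "to", "and", "with"}
retrieved = {
    "int01": "a stable home and time with family",
    "int02": "a good job and a stable home",
}

# word -> {document -> count}, skipping stop-listed words
matrix = {}
for doc, text in retrieved.items():
    for word, n in Counter(text.lower().split()).items():
        if word in STOP_LIST:
            continue
        matrix.setdefault(word, {})[doc] = n

print(matrix["stable"])  # counts for "stable" per respondent document
```

Each row of the exported CSV corresponds to one word, with one frequency column per respondent document.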
Figure 3. Word-by-Respondent List
Now we can use Excel to edit our matrix according to a selected cut-off, which I'll choose here by creating a scree plot. This takes only a couple of clicks in Excel: select the entire frequency column and choose the X Y Scatter chart option, as shown in Figure 4.
Figure 4. Scree Plot in Excel
Finally, once you have your scree plot you can add a trendline by right-clicking on any of the points in the chart and selecting 'Add Trendline'. Now it's up to you whether to choose the point that intersects with the trendline as your cut-off (more parsimonious, but this usually means you'll have more words to consider) or to eyeball the elbow of the scree plot as your cut-off (more common) for your word list. In my case, I eyeballed the elbow and chose 72 as my cut-off, which means deleting rows 74 and below (don't forget the row numbers include the header row!). You'll also want to delete the three columns listing cumulative statistics on the concordance and re-save your matrix. From here on out it's just a matter of copying and pasting the matrix into UCINET in order to save it as a UCINET file, which can then be used to produce the correspondence plot shown in Figure 6 by using the menus: Tools–>Scaling/Decomposition–>Correspondence….
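The trendline cut-off can also be computed outside Excel. The sketch below uses NumPy with an invented ranked-frequency column and takes the first rank at which the frequency curve crosses below a fitted linear trendline; the numbers are hypothetical, not my actual concordance.

```python
import numpy as np

# Hypothetical word frequencies, sorted descending as in the exported column
freqs = np.array([120, 95, 80, 63, 50, 40, 33, 27, 22, 18,
                  15, 13, 11, 10, 9, 8, 7, 6, 5, 5])
ranks = np.arange(1, len(freqs) + 1)

# Linear trendline, like Excel's right-click 'Add Trendline' with a linear fit
slope, intercept = np.polyfit(ranks, freqs, 1)
trend = slope * ranks + intercept

# Cut-off: the first rank where the curve drops below the trendline
cutoff = int(ranks[freqs < trend][0])
print(cutoff)
```

Because word frequencies typically decay steeply, the curve sits above the trendline for the most frequent words and crosses below it early; the elbow-based alternative is a visual judgment and can land elsewhere.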
Figure 5. Pasting your word-by-respondent matrix into UCINET.
Figure 6. Word-by-respondent Correspondence Analysis
Figure 7. Income Similarity non-metric MDS with 3 dimensions (showing 2 for X and 3 for Y)
Finally, as an alternative to the above method, I can use my scree plot to choose a different cut-off according to a break in the data, which I found around the 26th word. When I limit things to just the top 26 words there is more clarity in the correspondence analysis. And for even more clarity I can transform my data into binary variables and run a similarity matrix by rows or columns only, to see just words or just subjects.
Figure 8. Words Similarity non-metric MDS with 3 dimensions (showing 2 for X and 3 for Y)
You'll notice in Figure 7 that I've replaced subject IDs with income-range labels, allowing me to explore whether those in various income groups mention certain words more than other groups do.
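The binarise-and-compare step behind the similarity matrices can be sketched in plain Python. The counts below are made up, and Jaccard is used here as one reasonable choice of binary similarity measure; UCINET offers several.

```python
# Hypothetical word-by-respondent counts
counts = {
    "home":   {"int01": 3, "int02": 0, "int03": 2},
    "family": {"int01": 1, "int02": 2, "int03": 0},
    "job":    {"int01": 0, "int02": 4, "int03": 1},
}
docs = ["int01", "int02", "int03"]

# Transform counts into binary presence/absence vectors (one row per word)
binary = {w: [1 if counts[w][d] > 0 else 0 for d in docs] for w in counts}

def jaccard(a, b):
    """Shared presences over total presences across two binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

# Row-wise similarity compares words; running it column-wise would compare subjects
print(round(jaccard(binary["home"], binary["family"]), 2))
```

Running the same function over columns instead of rows gives the subject-by-subject matrix used for the respondent MDS plots.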