A Good and Satisfying Life among African Americans in Tallahassee, FL

I’ve spent the last few days finalizing my analysis. Most of my time went to two things: coding my themes and analyzing them quantitatively (still working on the qualitative analysis!). Along the way I’ve also been assembling a few PowerPoint slides to help me organize and think things through. I’ve come to really appreciate the process of putting together slides because it forces me to think about the big picture while also making sure that each step makes sense. Check it out below.

Posted in Quantitative Analysis, Text Analysis

Word-by-Respondent Matrices and Correspondence Analysis using Aggregate and Meta Structural Codes


Figure 1. Good Life: All by respondents ONLY Code System

After several false starts parsing my data set by respondent versus interviewer turns, and further by structural subcodes based on the semi-structured interview themes of the Good Life (i.e. aspirations, resources and obstacles related to the Good Life), I have managed to create a useful concordance and correspondence analysis that excludes the words in my stop list. Both are limited to the parts of the interviews that explore the Good Life and, within those parts, to statements from respondents. Figure 1 depicts what I’ve done in my Code System to achieve this. The great advantage of creating such an aggregate structural code is that it lets you use the powerful word frequency function of MAXDictio to create a matrix of the most frequent words within very specific areas of your data, so you can home in on the information you are most interested in.

Creating Aggregate Structural Codes

The first step in creating a matrix with this approach is to create your aggregate code. In my case, I had already coded all of the interviews according to the semi-structured interview, which was itself divided into three subsections for the Good Life. This meant I needed to aggregate all instances of the three Good Life subcodes while also limiting the result to respondents only. In MAXQDA this can only be achieved with a two-step process. First, create an aggregate of the Good Life codes by activating all three Good Life subcodes and all interview texts, and then specifying the OR combination in the Text Retrieval Fx (which is the default function) to retrieve all of the applicable segments. At this point you can autocode all of the retrieved statements to a new aggregate Good Life code. This is necessary because the next step uses a different Text Retrieval Fx, the ‘if inside’ function, to retrieve only respondent turns within the Good Life codes. If you try to do this all at once you’ll be forced to choose between the two functions, which means your word frequency will include either superfluous or overly limited information. Once you’ve created your aggregate Good Life code, however, you can use the ‘if inside’ Fx with the respondent turn code to retrieve the text portions you want, and once again use the autocode function in Text Retrieval to create a meta structural code, ‘Good Life: All by respondent turns ONLY’, that can be used in a word frequency to produce a very targeted word-by-respondent matrix.
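For readers who think better in code, here is a minimal Python sketch of the logic behind the two steps. It is not MAXQDA's implementation or API: it assumes coded segments can be represented as (start, end) character offsets within one interview, and the function names and offsets are made up for illustration. Step 1 is the OR combination (a union of the three subcodes); step 2 mimics ‘if inside’ by keeping only respondent turns that lie completely within an aggregate segment.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end) character offsets; hypothetical representation


def merge_or(*code_segments: List[Segment]) -> List[Segment]:
    """Step 1: OR combination. Union of all segments, merging any overlaps."""
    merged: List[Segment] = []
    for start, end in sorted(s for segs in code_segments for s in segs):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def if_inside(inner: List[Segment], outer: List[Segment]) -> List[Segment]:
    """Step 2: keep inner segments completely enclosed by some outer segment."""
    return [
        (start, end)
        for start, end in inner
        if any(o_start <= start and end <= o_end for o_start, o_end in outer)
    ]


# Made-up offsets for a single interview.
aspirations = [(0, 100)]
resources = [(80, 200)]
obstacles = [(400, 500)]
respondent_turns = [(10, 50), (150, 190), (250, 300), (450, 520)]

good_life_all = merge_or(aspirations, resources, obstacles)
good_life_respondent_only = if_inside(respondent_turns, good_life_all)
print(good_life_all)
print(good_life_respondent_only)
```

Doing the union first and saving it (the autocode step) is what makes the second filter possible: the enclosure test needs one finished set of outer segments to test against.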

Executing the Word Frequency using a Meta Structural Code

Figure 2. Word Frequency of Meta Structural Code

The next step is simply to activate all of your texts along with your new meta code, ‘Good Life: All by respondent turns ONLY’. You can now use the word frequency option in the MAXDictio menu to produce a matrix. It is important to select the options shown in Figure 2, including using only retrieved segments and differentiating by documents (i.e. respondents).

If you have a good stop list in place, your results can then be exported as a comma-separated values (CSV) file.
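To make the shape of that export concrete, here is a rough Python sketch of a word-by-respondent frequency matrix built from retrieved segments, with a stop list applied. This is an illustration only, not what MAXDictio does internally: the tokenizer, the column names, the respondent IDs and the sample text are all assumptions.

```python
import csv
import io
import re
from collections import Counter


def word_by_respondent(segments, stop_words):
    """segments: dict of respondent id -> list of retrieved text segments."""
    counts = {}
    for rid, texts in segments.items():
        # Very simple tokenizer (an assumption): lowercase runs of letters/apostrophes.
        words = re.findall(r"[a-z']+", " ".join(texts).lower())
        counts[rid] = Counter(w for w in words if w not in stop_words)

    totals = Counter()
    for per_respondent in counts.values():
        totals.update(per_respondent)

    respondents = sorted(counts)
    # One row per word, most frequent first, one column per respondent.
    rows = [
        [word, total] + [counts[rid][word] for rid in respondents]
        for word, total in totals.most_common()
    ]
    return ["word", "frequency"] + respondents, rows


def to_csv(header, rows):
    """Serialize the matrix as comma-separated values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up retrieved segments for two hypothetical respondents.
segments = {
    "R01": ["Family is the good life.", "Family and church."],
    "R02": ["A good job and my family."],
}
stop_words = {"is", "the", "a", "and", "my"}

header, rows = word_by_respondent(segments, stop_words)
print(to_csv(header, rows))
```

The key design point mirrors the options in Figure 2: counts are taken only from the retrieved segments, and they are kept separate per document so that each respondent becomes a column.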

Figure 3. Word-by-Respondent List

Correspondence Analysis

Now we can use Excel to trim our matrix to a selected cut-off. I’ll choose mine here with a scree plot, which takes a couple of clicks in Excel: select the entire frequency column and choose the X Y Scatter chart option, as shown in Figure 4.

Figure 4. Scree Plot in Excel

Once you have your scree plot you can add a trendline by right-clicking on any of the points in the chart and selecting ‘Add Trendline’. Now it’s up to you whether to use the point that intersects with the trend line as your cut-off (more parsimonious, but this usually means you’ll have more words to consider) or to eyeball the elbow of the scree plot as your cut-off (more common) for your word list. In my case, I eyeballed the elbow and chose 72 as my cut-off, which means deleting rows 74 and below (don’t forget the row numbers include the header row!). You’ll also want to delete the three columns listing cumulative statistics on the concordance and re-save your matrix. From here on out it’s just a matter of copying and pasting the matrix into UCINET and saving it as a UCINET file, which can then be used to produce the correspondence plot shown in Figure 6 via the menus Tools -> Scaling/Decomposition -> Correspondence.
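If you would rather script the cut-off than click through Excel, the sketch below shows one possible reading of the trend-line option, plus the row arithmetic for trimming. The assumptions are mine: a straight least-squares line (like a default linear trendline), a cut-off at the first rank where the observed frequency falls to or below that line, and made-up frequencies. Eyeballing the elbow, which is what I actually did, is not automated here.

```python
def linear_trend(freqs):
    """Least-squares line through (rank, frequency), with ranks starting at 1."""
    n = len(freqs)
    ranks = range(1, n + 1)
    mean_x = sum(ranks) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, freqs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_cutoff(freqs):
    """First rank whose frequency is at or below the trend line (one interpretation)."""
    slope, intercept = linear_trend(freqs)
    for rank, freq in enumerate(freqs, start=1):
        if freq <= slope * rank + intercept:
            return rank
    return len(freqs)


def trim(rows, cutoff):
    """Keep only the top `cutoff` words (rows already sorted by frequency)."""
    return rows[:cutoff]


def first_spreadsheet_row_to_delete(cutoff, header_rows=1):
    """Spreadsheet row number where deletion starts, counting the header row."""
    return cutoff + header_rows + 1


# Made-up frequencies, sorted from most to least frequent.
freqs = [90, 50, 30, 20, 15, 12, 10, 9, 8, 6]
slope, intercept = linear_trend(freqs)
print(slope, intercept, trend_cutoff(freqs))

# With a cut-off of 72 words and one header row, deletion starts at row 74.
print(first_spreadsheet_row_to_delete(72))
```

Whichever rule you use, the trimming itself is the same: keep the top N rows and remember that the spreadsheet's row numbers are offset by the header.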

Figure 5. Pasting your word-by-respondent matrix into UCINET.

Figure 6. Word-by-respondent Correspondence Analysis

Figure 7. Income Similarity non-metric MDS with 3 dimensions (showing 2 for X and 3 for Y)

Finally, as an alternative to the above method, I can use my scree plot to choose a different cut-off according to a break in the data, which I found around the 26th word. When I limit things to just the top 26 words there is more clarity in the correspondence analysis. And for even more clarity I can transform my data into binary variables and run a similarity matrix by rows only or by columns only, to see just words or just subjects.
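Here is a small Python sketch of that last idea: dichotomize the counts, then compute similarities among rows (words) or among columns (subjects). The Jaccard coefficient is my assumption for illustration, since this post doesn't say which similarity measure was used in UCINET, and the tiny matrix is made up.

```python
def binarize(matrix):
    """Turn counts into presence/absence: 1 if the word was used at all, else 0."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


def transpose(matrix):
    """Swap rows and columns so respondents become the rows."""
    return [list(column) for column in zip(*matrix)]


def jaccard(a, b):
    """Shared presences divided by presences in either vector (assumed measure)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def similarity(vectors):
    """Square similarity matrix among the given vectors."""
    return [[jaccard(a, b) for b in vectors] for a in vectors]


# Made-up word-by-respondent counts: rows are words, columns are respondents.
counts = [
    [2, 1, 0],  # family
    [1, 1, 0],  # good
    [0, 0, 3],  # job
]

binary = binarize(counts)
word_sim = similarity(binary)                   # by rows: just words
respondent_sim = similarity(transpose(binary))  # by columns: just subjects
print(word_sim)
print(respondent_sim)
```

Either square matrix can then be fed to an MDS routine; the rows-only version places words near each other when the same people use them, and the columns-only version places people near each other when they use the same words.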

Figure 8. Words Similarity non-metric MDS with 3 dimensions (showing 2 for X and 3 for Y)

You’ll notice in Figure 7 that I’ve replaced subject IDs with income range labels, which lets me explore whether respondents in particular income groups seem to mention any words more often than other groups do.

Posted in Correspondence Analysis, Text Analysis

6005 Ways to Improve Your Word Frequencies

An Interview

Two-way Interviews Can Be Parsed Into Just Responses for Ease of Analysis

The title of this post comes from the number of turns interviewers took asking questions, probing or otherwise following up with participants in the Tallahassee data set of semi-structured interviews I’m analyzing. When you consider that this number comes from just 33 semi-structured interviews of ~1 hour each, the potential scale of a qualitative analysis can seem quite intimidating. Of course, from a glass-half-full perspective, the give-and-take of interviews means there are about 6005 participant responses to those 6005 interviewer turns (5542 actually, but I haven’t quite figured out that anomaly…), and by parsing the interviewer turns from the respondent turns you can dramatically cut the body of data to be analyzed in a word frequency or similar analysis.

Marking turns and parsing data helps eliminate a common error in quantitative analysis: producing a word list (or concordance) from entire transcripts, which biases your results according to how often an interviewer used a repetition probe or otherwise used the words you’re interested in.
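A toy Python example makes the bias easy to see. It assumes each turn in a transcript starts with a speaker label, here "I:" for the interviewer and "R:" for the respondent; those labels and the sample exchange are hypothetical, not the format of my actual transcripts.

```python
import re
from collections import Counter


def split_turns(transcript, interviewer="I:", respondent="R:"):
    """Separate a transcript into interviewer and respondent turns."""
    turns = {"interviewer": [], "respondent": []}
    current = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(interviewer):
            current = "interviewer"
            turns[current].append(line[len(interviewer):].strip())
        elif line.startswith(respondent):
            current = "respondent"
            turns[current].append(line[len(respondent):].strip())
        elif current is not None:
            # Unlabeled line: treat it as a continuation of the current turn.
            turns[current][-1] += " " + line
    return turns


def word_counts(texts):
    """Word frequencies over a list of turns, using a simple tokenizer."""
    return Counter(re.findall(r"[a-z']+", " ".join(texts).lower()))


transcript = """\
I: What does a good life mean to you?
R: Having my family close by.
I: A good life is family? Tell me more about family.
R: Family and steady work.
"""

turns = split_turns(transcript)
whole = word_counts(turns["interviewer"] + turns["respondent"])
respondent_only = word_counts(turns["respondent"])
print(whole["family"], respondent_only["family"])
print(whole["good"], respondent_only["good"])
```

In this made-up exchange the interviewer's repetition probe doubles the count for "family", and "good" shows up in the whole-transcript list even though the respondent never said it.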

Posted in Quantitative Analysis, Text Analysis

Chunky Themes and Patterned Metaphors

Giant Panda (Ailuropoda melanoleuca) "Tian Tian"

Chunk, Indeed

This week I dove into line-by-line coding, chewed my way through a bamboo forest of memo writing, and juggled metaphors in a schema analysis. The line-by-line coding and memo writing are what Bernard and Ryan (2010) describe as the first of three common steps in the process of grounded theory analysis. The title of this week’s post is based on Bernard and Ryan’s (p. 192) observation that grounded theory achieves its analytic ends by segmenting text “into chunks,” grouping chunks into themes, and analyzing those themes. By comparison, schema analysis, as they also point out, tries to find patterns in the entirety of a text.

My Experience So Far: GT Analysis

I decided to start with two separate interviews that I had previously coded only for structure. With the first interview I went line-by-line (though sometimes two lines at a time, if I’m being honest) and identified the simplest and least meta-analytic codes I could manage. I’m happy to report that the experience had its intended effect; I was forced to stop looking for what I’ve already seen in the data and instead focus on the seemingly mundane, but ultimately crucial, minutiae. For those out there who struggle with perfectionism (it’s a mixed blessing at best, as I understand it), this technique really forces you to keep moving and to stop trying to always connect things on the fly. Instead, you can simply get through the text and then go back and look for connections between your chunks, only then starting to piece things together with memos. Far from being boring, the methodical march of line-by-line coding freed me from habits that often seem to slow me down.

My Experience So Far: Schema Analysis

With my second text I focused on getting a bigger picture of the interview first and then went about methodically looking for important metaphors. Now, admittedly, the point of schema analysis is to compare metaphors across units of analysis (i.e. your interviewees), but for this round of analysis it was all I could do to finish up my line-by-line analysis, read through this second interview and then identify every metaphor I could. Upon reflection, I’m not sure the initial read-through was necessary, but after reading Bernard and Ryan’s description of schema analysis in Chapter 9 it seemed like a potentially fruitful approach. In my experience, it wasn’t very helpful in finding metaphors, but it was helpful in coming up with more cogent memos once I had identified metaphors.

Posted in Grounded Theory, Text Analysis

Rich Attribute Data (Read: Drowning in Data)

I spent time this week de-identifying attribute data and formatting the data set for import into MAXQDA. The result is 109 columns of attributes for each of the 25 respondents who currently have both attribute and semi-structured data available. Above you can see the result of my import from a UTF-16 Unicode txt file I created using Excel. Now begins the process of narrowing down the possible exploratory analyses to those that will best help me understand what the qualitative data can tell me about the social network survey data measures (e.g. tie strength or structural measures).

Posted in Text Analysis