AB01 – Graham, S. S., Kim, S.-Y., Devasto, M. D., & Keith, W. (2015). Statistical genre analysis: Toward big data methodologies in technical communication.

A team of researchers determines to bring the power of “big data” into the toolkit of technical communication scholars by piloting a research method they “dub statistical genre analysis (SGA)” and describing and explaining the method in an article published in the journal Technical Communication Quarterly (Graham, Kim, Devasto, & Keith, 2015, pp. 70-71).

Acknowledging the value academic markets have begun assigning to findings, conclusions, and theories founded upon rigorous analysis of massive data sets, this team deconstructs the amorphous “big data” phenomenon and demonstrates how their SGA methodology can be used to quantitatively describe and visually represent the generic content (e.g. types of evidence and modes of reasoning) of rhetorical situations (e.g. committee meetings) and to discover input variables (e.g. conflicts of interest) that have statistically significant effects upon output variables (e.g. recommendations) of important policy-influencing entities such as the Food and Drug Administration’s (FDA) Oncologic Drugs Advisory Committee (ODAC) (Graham et al., 2015, pp. 86-89).

The authors believe there is much to gain by integrating the “humanistic and qualitative study of discourse with statistical methods” and although they respect the “craft character of rhetorical inquiry” (Graham et al., 2015, pp 71-72) and utilize “the inductive and qualitative nature of rhetorical analysis as a necessary” initial step in their hybrid method (Graham et al., 2015, p. 77), they conclude their mixed-method SGA approach can increase the “range and power” (Graham et al., 2015 p. 92) of “traditional, inductive approaches to genre analysis” (Graham et al., 2015, p. 86) by offering the advantages “of statistical insights” while avoiding the disadvantages of statistical sterility that can emerge when the qualitative humanist element is absent (Graham et al., 2015, p. 91).

In the conclusion of their article, the researchers identify two main benefits of their hybrid SGA method. The first benefit is communication genres “can be defined with more precision” since SGA documents the actual frequency of generic conventions as they exist within a large sample of the corpus, rather than being defined more generally since traditional rhetorical methods may document the opinions experts have of the “typical” frequency of generic conventions as they perceive them to exist within a limited sample of “exemplars” selected from a small sample of the corpus. In addition, the authors argue analysis of a massive number of texts may reveal generic conventions that do not appear in the limited sample of exemplars that may be studied by practitioners of the traditional rhetorical approach involving only “critical analysis and close reading.” The second benefit is communications scholars are enabled to move beyond critical opinion and to claim statistically significant correlations between “situational inputs and outputs” and “genre characteristics that have been empirically established” (Graham et al., 2015, p. 92).

Befitting the subject of their study, the authors devote a considerable portion of their article to describing their research methodology. In the third section titled “Statistical Genre Analysis,” they begin by noting they conducted the “current pilot study” on a “relatively small subset” of the available data in order to “demonstrate the potential of SGA.” Further, they outline their research questions, the answers to two of which indeed seem to attest to the strength SGA can contribute to both the evidence and the inferences used by communication scholars in their own arguments about the communications they study. As they do in the introduction, in this section also, the authors note the intellectual lineage of SGA in various disciplines, including “rhetorical studies, linguistics,” “health communication,” psychology, and “applied statistics” (Graham et al., 2015, pp. 71, 76).

As explained earlier, the communication artifacts studied by these researches are selected from among the various artifacts arising from the FDA’s ODAC meetings, specifically the textual transcriptions of presentations (essentially opening statements) given by the sponsors (pharmaceutical manufacturing companies) of the drugs under review during meetings which usually last one or two days (Graham et al., 2015, pp. 75-76). Not only in the arenas of technical communication and rhetoric, but also in the arenas of Science and Technology Studies (STS) and of Science, Technology, Engineering, and Math (STEM) public policy, managing conflicts of interests among ODAC participants and encouraging inclusion of all relevant stakeholders in ODAC meetings are prominent issues (Graham et al., 2015, p. 72). At the conclusion of ODAC meetings, voting participants vote either for or against the issue under consideration, generally “applications to market new drugs, new indications for already approved drugs, and appropriate research/study endpoints” (Graham et al., 2015, pp. 74-76).

It is within this context the authors attempted to answer the following two research questions, among others, regarding all ODAC meetings and sponsor presentations given at those meetings between 2009 and 2012: “1. How does the distribution of stakeholders affect the distribution of votes?” and “3. How does the distribution of evidence and forms of reasoning in sponsor presentations affect the distribution of votes?” (Graham et al., 2015, pp. 75-76). Notice both of these research questions ask whether certain input variables affect certain output variables. And in this case, the output variables are votes either for or against an action that will have serious consequences for people and organizations. Put another way, this is a political (or deliberative rhetoric) situation and the ability to predict with a high degree of certainty which inputs produce which outputs could be quite valuable, given those inputs and outputs could determine substantial budget allocations, consulting fees, and pharmaceutical sales – essentially, success or failure – among other things.

Toward the aim of asking and answering research questions with such potentially high stakes, the authors applied their SGA mixed-methods approach, which they explain included four phases of research conducted over approximately six months to one year and included at least four researchers. The authors explain SGA “requires first an extensive data preparation phase” after which the researchers “subjected” the data “to various statistical tests to directly address the research questions.” They describe the four phases of their SGA method as “(a) coding schema development, (b) directed content analysis, (c) meeting data and participant demographics extraction, and (d) statistical analyses.” Before moving into a deeper discussion of their own “coding schema” development, as well as the other phases of their SGA approach, the authors cite numerous influences from scholars in “behavioral research,” “multivariate statistics,” “corpus linguistics,” and “quantitative work in English for specific purposes,” while explaining the specific statistical “techniques” they apply “can be found in canonical works of multivariate statistics such as Keppel’s (1991) Design and Analysis and Johnson and Wichern’s (2007) Applied Multivariate Statistical Analysis” (Graham et al., 2015, pp. 75-77). One important distinction the authors make between their method and these other methods is while the other methods operate at the more granular “word and sentence level” that facilitates formulation of “coding schema amenable to automated content analysis,” the authors operate at the less granular paragraph level that requires human intervention in order to formulate coding schema reflecting nuances only discernable at higher cognitive levels, for example whether particular evidentiary artifacts (transcripts) are based on randomized controlled trials (RCTs) addressing issues of “efficacy” or RCTs addressing issues of “safety and treatment-related hazards” (Graham et al., 2015, pp. 77-78). Choosing the longer, more complex paragraph as their unit of analysis requires the research method to depend upon “the inductive and qualitative nature of rhetorical analysis as a necessary precursor to both qualitative coding and statistical testing” (Graham et al., 2015, p. 77).

In the final section of their explanation of SGA, their research methodology, the authors summarize their statistical methods including both “descriptive statistics” and “inferential statistics” and how they applied these two types of statistical methods, respectively, to “provide a quantitative representation of the data set” (e.g. “mean, median, and standard deviation”) and to “estimate the relationship between variables” (e.g. “statistically significant impacts”) (Graham et al., 2015, pp. 81-83).

Returning to the point of the authors’ research – namely demonstrating how SGA empowers scholars to provide confident answers to research questions and therefore to create and assert knowledge clearly valued by societal interests – their SGA enables them to state their “multiple regression analysis” found “RCT-efficacy data and conflict of interest remained as the only significant predictors of approval rates. Oddly, the use of efficacy data seems to lower the chance of approval, whereas a greater presence of conflict of interest increases the probability of approval” (Graham et al., 2015, p. 89). Obviously, this finding encourages entities aiming to increase the probability of approval to allocate resources toward increasing the presence of conflicts of interests since that is the only input variable demonstrated to contribute to achieving their aim. On the other hand, this finding provides evidence entities claiming conflicts of interests illegally (or at least undesirably) affect ODAC participants’ votes can use to bolster their arguments “stricter controls on conflicts of interests should be deployed (Graham et al., 2015, p. 92).

Leave a Reply Cancel reply