AB02 – Boyd, D., & Crawford, K. (2012). Critical questions for Big Data

As “social scientists and media studies scholars,” Boyd and Crawford (2012) consider it their responsibility to encourage and focus the public discussion regarding “Big Data” by asserting six claims that frame the significant issues the “era of Big Data” has already presented to humanity and the diverse, competing interests that comprise it (Boyd & Crawford, 2012, pp. 662-663). Before asserting and explaining their claims, however, the authors define Big Data “as a cultural, technological, and scholarly phenomenon” that “is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets,” a phenomenon in which three primary components (fields or forces) interact: 1) technology, 2) analysis, and 3) mythology (Boyd & Crawford, 2012, p. 663). Precisely because Big Data, like other “socio-technical phenomena,” elicits both “utopian and dystopian rhetoric” and visions of the future of humanity, Boyd and Crawford think it is “necessary to ask critical questions” about “what all this data means, who gets access to what data, how data analysis is deployed, and to what ends” (Boyd & Crawford, 2012, p. 664).

The authors’ first two claims essentially concern epistemological issues regarding the nature of knowledge and truth (Boyd & Crawford, 2012, pp. 665-667). In explaining their first claim, “1. Big Data changes the definition of knowledge,” the authors draw parallels between Big Data as a “system of knowledge” and “‘Fordism’” as a “manufacturing system of mass production.” According to the authors, both systems influence people’s “understanding” in certain ways. Fordism “produced a new understanding of labor, the human relationship to work, and society at large.” And Big Data “is already changing the objects of knowledge” and suggesting new concepts that may “inform how we understand human networks and community” (Boyd & Crawford, 2012, p. 665). The authors also cite Burkholder, Latour, and others in describing how Big Data refers not only to the quantity of data, but also to the “tools and procedures” that enable people to process and analyze “large data sets,” and to the general “computational turn in thought and research” that accompanies these new instruments and methods (Boyd & Crawford, 2012, p. 665). Further, the authors state “Big Data reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and categorization of reality” (Boyd & Crawford, 2012, p. 665). Finally, as a counterpoint to the many potential benefits and positive aspects of Big Data they have emphasized thus far, the authors cite Anderson as one who has revealed the at times prejudicial and arrogant attitudes of some quantitative proponents who summarily dismiss all qualitative or humanistic approaches to gathering evidence and formulating theories as inferior (Boyd & Crawford, 2012, pp. 665-666).

In explaining their second claim, “2. Claims to objectivity and accuracy are misleading,” the authors continue considering some of the biases and misconceptions inherent in epistemologies that privilege “quantitative science and objective method” as the paths to knowledge and absolute truth. According to the authors, Big Data “is still subjective,” and even when research subjects or variables are quantified, those quantifications do “not necessarily have a closer claim on objective truth.” In the authors’ view, the preoccupation of social science and the “humanistic disciplines” with attaining “the status of quantitative science and objective method” is at least to some extent misdirected, even if understandable given the apparent value society assigns to quantitative evidence (Boyd & Crawford, 2012, pp. 666-667). Citing Gitelman and Bollier, among others, the authors contend “all researchers are interpreters of data,” not only when they draw conclusions based on their research findings, but also when they design their research and decide what will – and what will not – be measured. Overall, the authors argue against too eagerly embracing the positivist perspective on knowledge and truth and in favor of critically examining research philosophies and methods and the limitations inherent within them (Boyd & Crawford, 2012, pp. 667-668).

The authors’ third and fourth claims address research quality. Their third claim, “3. Big data are not always better data,” emphasizes the importance of quality control in research and highlights how “understanding sample, for example, is more important than ever.” Because “the public discourse around” massive, easily collected data streams such as Twitter “tends to focus on the raw number of tweets available,” and because “raw numbers” would not be a “representative sample” of most populations about which researchers seek to make claims, public perceptions and opinion could be skewed either by mainstream media’s misleading reporting about valid research or by unprofessional researchers’ erroneous claims based upon invalid research methods and evidence (Boyd & Crawford, 2012, pp. 668-669). In addition to these issues of research design, the authors highlight how further “methodological challenges” can arise “when researchers combine multiple large data sets,” challenges involving “not only the limits of the data set, but also the limits of which questions they can ask of a data set and what interpretations are appropriate” (Boyd & Crawford, 2012, pp. 669-670).

The authors’ fourth claim continues addressing research quality, but at the broader level of context. That claim, “4. Taken out of context, Big Data loses its meaning,” emphasizes the importance of considering how the research context affects research methods, findings, and conclusions. The authors imply that attitudes toward mathematical modeling and data collection may lead researchers to select data more for their suitability to large-scale, computational, automated, quantitative collection and analysis than for their suitability to discovering patterns or answering research questions. As an example, the authors consider the evolution of the concept of human networks in sociology, focusing on different ways of measuring “‘tie strength,’” a concept understood by many sociologists to indicate “the importance of individual relationships” (Boyd & Crawford, 2012, p. 670). Although recently developed concepts such as “articulated networks” and “behavioral networks” may appear at times to indicate tie strength equivalent to more traditional concepts such as “kinship networks,” the authors explain that the tie strength of kinship networks is based on more in-depth, context-sensitive data collection such as “surveys, interviews,” and even “observation,” while the tie strength of articulated or behavioral networks may rely on nothing more than interaction frequency; indeed, “measuring tie strength through frequency or public articulation is a common mistake” (Boyd & Crawford, 2012, p. 671). In general, the authors caution against considering Big Data a panacea that will objectively and definitively answer all research questions. In their view, “the size of the data should fit the research question being asked; in some cases, small is best” (Boyd & Crawford, 2012, p. 670).

The authors’ final two claims address ethical issues related to Big Data, some of which seem to have arisen in parallel with its ascent. In their fifth claim, “5. Just because it is accessible does not make it ethical,” the authors focus primarily on whether “social media users” implicitly give permission to anyone to use publicly available data about them in all contexts, even contexts the users may not have imagined, such as research studies or the collectors’ data and information products and services (Boyd & Crawford, 2012, pp. 672-673). Citing Ess and others, the authors emphasize that researchers and scholars have “accountability” for their actions, including those related to “the serious issues involved in the ethics of online data collections and analysis.” The authors encourage researchers and scholars to consider privacy issues and to proactively assess whether they should assume users have provided “informed consent” for researchers to collect and analyze users’ data just because that data is publicly available (Boyd & Crawford, 2012, pp. 672-673). In their sixth claim, “6. Limited access to Big Data creates new digital divides,” the authors note that although there is a prevalent perception that Big Data “offers easy access to massive amounts of data,” in reality access to Big Data and the ability to manage and analyze it require resources unavailable to much of the population – and this “creates a new kind of digital divide: the Big Data rich and the Big Data poor” (Boyd & Crawford, 2012, pp. 673-674). “Whenever inequalities are explicitly written into the system,” the authors assert further, “they produce class-based structures” (Boyd & Crawford, 2012, p. 675).

In their article overall, Boyd and Crawford maintain an optimistic tone while enumerating the myriad issues emanating from the phenomenon of Big Data. In concluding, the authors encourage scholars, researchers, and society to “start questioning the underlying assumptions, values, and biases of this new wave of research” (Boyd & Crawford, 2012, p. 675).
