Part I: Questions
Research is hard to do. Much of it is left to specialists who spend another 4-10 years in school after completing a first degree to acquire the proper training. It’s not only hard to do; it’s also hard to read, understand, and extrapolate from. Mass media coverage of science and social research is rife with misinterpretations: overgeneralizations, glossing over research limitations, and failing to adequately consider the characteristics of subject populations. Does more data, or “big data,” in any way, shape, or form alter this state of affairs? Is it the case, as Wired magazine (provocatively…arrogantly…and ignorantly) suggests, that “the data deluge makes the scientific method obsolete” and that “with enough data, the numbers speak for themselves”?
Being an ethnographer makes me more of a “small data” person. It seems counter-intuitive at first, but I find there are good, sound reasons to sometimes forgo the opportunity to collect more data. This gets to the ever-present questions about how much is sufficient when doing qualitative or, more specifically, ethnographic research (e.g., how many people to interview? how many months to spend in the field?). I find memory limits are an important bounding factor. Can I remember key points from each interview, distinctive elements of that individual’s story? Can I recall the setting and some of the things I observed there? Reading a transcript or my field notes, can I put myself back in that time and place? Good recall and mastery of your data help you move through it with agility and draw the kinds of surprising thematic connections across data that make ethnographic work, at times, profound. As much as qualitative data analysis software (like NVivo or ATLAS.ti) aids in rediscovering what’s in your data through coding and keyword search, I find the flexibility of my brain indispensable for drawing connections that no search algorithm would make. If I have all the data for a project at least sketchily outlined in my memory, then even if I don’t recall something exactly, I know where to look. Otherwise it is too easy to draw haphazardly and selectively from the data. It is easy to overlook counter-examples, contradictions, and challenges to my emerging claims if the mountain of data becomes too tall.
Leading into DataEdge, a data science conference my department is hosting this week, I want to list some questions that I (and maybe other ‘small data’ people) have about the big data / data analytics trend. These questions arise from my ethnographic orientation and my interest and background in applied research. For me, they are the following:
- What do researchers consider the most compelling examples, the ‘showcase’ applications of big data that involve study of the social world and social behavior?
- To what use is such a research approach being put? What actions are being taken on the basis of findings from ‘big data’ analysis?
- The data analytics discussion appears to be a US-centric debate. How well are researchers grappling with the analysis of ‘big data’ when dealing with data collected from heterogeneous, international populations?
- How do ‘big data’ analysts connect data on behavior to the meaning/intent underlying that behavior? How do they avoid (or how do they think they can avoid) getting this wrong?
- How might the analysis of ‘big data’ complement projects that are primarily ethnographic?
For good measure, a couple of interesting, probing takes on big data:
- Genevieve Bell on ‘big data as a person’
- danah boyd and Kate Crawford – Six Provocations for Big Data
Following the DataEdge conference, I will try to address some of these questions and offer some answers through a conference recap.
___________________________________________________________________
Read the rest of the posts in the “The Ethnographer’s Complete Guide to Big Data” series:
The Ethnographer’s Complete Guide to Big Data: Answers (part 2 of 3)
The Ethnographer’s Complete Guide to Big Data: Conclusions (part 3 of 3)
Sometimes I wonder whether the material generated by some ethnographic research can even be categorized as small data or big data. It seems like trying to fit, pardon the cliché, a square peg into a round hole. It reminds me of the literature on art about the difference between an analogue photograph and its digital counterpart.
Certainly, we can make data quantifiable. Then big data folks can argue about how much data is needed to cross the threshold from a lot of data points to “big data” (based on the number of terabytes, the number of physical machines required to store and process the data, etc.).
Perhaps the counterpart in ethnographic work is the question of how much time in the field is “enough” (a tiring argument!). How much data is represented by a collection of fieldnotes? Or are the fieldnotes themselves the data, easily quantified again by the size of the file (“small data”)? Doesn’t the material elicited from fieldnotes continually change in some ways? Is a year in the field really “small data”?
Anyhow, pardon the rambling. I am looking forward to your notes following the conference. I have many of the same questions you ask here and am curious to see what answers you find.