As promised, here is the final installment of my short series about ‘big data.’ I started out by declaring myself a ‘small data’ person. My intention was to be a bit provocative by suggesting that forgoing or limiting data collection might sometimes be a legitimate or even laudable choice. That contrast was perhaps overdrawn. It seemed to suggest that ‘big data’ and ethnographic approaches sat at opposite ends of some continuum. ‘How much’ is not necessarily a very interesting or relevant question for an ethnographer, but who among us hasn’t done some counting and declared some quantity (thousands of pages of notes, hundreds of days in the field, hours of audio or video recordings) meant to impress, to indicate thoroughness, depth, effort, and seriousness?
So the game of numbers is one we all probably play from time to time.
Now, to answer my few remaining questions:
1) How might big data be part of projects that are primarily ethnographic in approach?
My first exposure to ‘big data’ came from a student who managed to gain access to a truly massive collection of CDR (call detail record) data from a phone network in Rwanda. Josh Blumenstock was able to combine CDR data with results from a survey he designed and carried out with a research team in Rwanda to gain insights into the demographics of phone owners, within-country migration patterns, and reciprocity and risk management. I was terribly excited by what could be found in that kind of data, since I had been examining mobile phone ownership and gifting in nearby Uganda. I wondered how larger patterns in the data might reflect (or raise questions about) what I was coming to see at the micro level about phone ownership and sharing, especially its gendered dimensions. Indeed, Josh’s work showed a strong gender skew in ownership, with far more men than women owning phones, and with women phone owners being more affluent and better educated. My work explained the marital and other family dynamics that put far fewer phones into the hands of women than men.
However, combining these two approaches amounts to a fairly standard mixed-methods design rather than anything new. Is something more innovative possible?
I found one really interesting answer in this piece: “Numbers Have Qualities Too: Experiences with Ethno-Mining” by Ken Anderson, Dawn Nafus, and Tye Rattenbury (Intel) and Ryan Aipperspach (GoodGuide.com), from the Ethnographic Praxis in Industry Conference, 2009. In the project, the researchers created a data visualization of computing device usage that showed intensity of use, color coded by time of day. They presented research participants with this visualization of their own behavioral data and invited them to interpret and discuss this compressed visual representation of their activities. This is a really intriguing and novel way to bring together detailed, documented behavior with the meaning and intent underlying it, using a kind of projective interviewing technique.
Here is another question I posed at the beginning of the series that I can now attempt to answer:
2) What do people consider to be the compelling applications of big data?
Based on what I heard at the DataEdge conference, the ‘vision’ and sense of possibility often seemed to overwhelm the actual concrete applications in the big data/data analytics space. But here are just a few of the emerging tools and applications. The massive accumulation of Google search terms lends itself to some interesting ways of relating search patterns to time, to location, or even to events like flu epidemics. This can provide opportunities to improve the precision of economic models. You can play around with this at Google Correlate. There was much talk about Hadoop and other tools and services for processing very large data sets (Hadapt, Splunk, Cloudera). Captricity offered a very interesting tool with a somewhat different origin story. This start-up grew out of research in Tanzanian health clinics and offers a way to digitize large backlogs of paper records, transforming them into structured data.
On the whole, there was a distinctive emphasis on commercial opportunities, especially on customizing marketing messages and making ads more effective. How do people move through the airport, and what is our best bet for capturing their attention? A company called Path Intelligence, which offers technology for pedestrian path measurement, was mentioned. There was a suggestion about using big data in farming, to predict weather and yields. Another proposal was to use data from criminal records to predict crime. On this last example especially, my thoughts turn to questions about the quality and completeness of records, something ethnographers and particularly ethnomethodologists have some insight into (see Garfinkel’s ‘“Good” organizational reasons for “bad” clinic records’). Criminal records aren’t automatically collected; they are the product of a clerical process. In general, ‘dirty data’ is an acknowledged threat in ‘big data’ circles, and no apparent solution has yet been offered.
As is probably apparent, better predictive power and more effective marketing feature prominently in this list of applications. While some aspects of the ‘big data’ trend are new, the underlying dream seems to be an old one – to better anticipate what is coming in the future, to chase away uncertainty and ambiguity.
Read the rest of the posts in “The Ethnographer’s Complete Guide to Big Data” series:
The Ethnographer’s Complete Guide to Big Data: Small Data People in a Big Data World (part 1 of 3)
The Ethnographer’s Complete Guide to Big Data: Answers (part 2 of 3)