I grappled for a long time with how to approach numbers, statistics and patterns in my Wikipedia research. I’m an ethnographer, right, we’re supposed to be averse to using numbers. Right?
And then Rachelle sent me this really interesting piece by Ken Anderson and the crew from Intel’s People and Practices Research group called ‘Numbers Have Qualities Too: Experiences with Ethno-Mining’ (Ken Anderson et al, 2009) [PDF].
And I realised that there is no problem with numbers and statistics per se. The problem is when we use numbers divorced from the context from which they are extracted. The problem comes when we use numbers to speak for a community, rather than enabling the community to speak to the numbers.
In the paper, Ken Anderson, Dawn Nafus, Tye Rattenbury and Ryan Aipperspach introduce the concept of “ethno-mining” as a way of joining database mining and ethnography, and explain how they used sensing and behavioral tracking technologies to invite conversations with research participants and within the corporation as well. According to Anderson et al, ethno-mining combines the semi-automated collection and analysis of behavioral data with the collection and analysis of qualitative data in an open and iterative analytical framework, relying heavily on shared artifacts, in their case data visualizations, for the co-construction of meaning.
They write that ethno-mining is a hybrid rather than a “mixed method” – in the sense that the two elements (data mining and qualitative interviews) cannot be separated and that ‘Numbers are treated in ways that surface qualities in addition to quantities.’ They ensured this by creating visualizations that were evocative (so complex that they required explanation), leaving data in their rawest and most complete state – avoiding summaries, transformations and “cleaning up” data so that it would remain foreign to both participants and researchers.
Anderson et al write that their ethno-mining techniques grew from the very practical restrictions of doing long-term participant observation at Intel. Using tools to track how participants interacted with personal computing devices (laptops, smart phones etc) as well as other methods to collect data on behaviors that were not tracked and meanings associated with that data, the researchers believe that they were able to garner a set of clues about what transpired between their visits. They also noted how these tools enabled them to overcome some of the limits of participant observation – tracking participants’ use of laptops in bed without having to join them there (ha!).
Looking for patterns in the traces we leave behind as we interact with and experience online spaces (see Geiger and Ribes on ‘Trace Ethnography’) can be really helpful in place of the participant observation that we can do in physical spaces. And this is even more helpful when we can use the data to ask people to interpret it, asking questions like:
What were you thinking/experiencing here? How do you understand what is happening here? What do these patterns tell us about how you and your community do things? Do you think this means x? Or does it mean y? Or does it mean something totally different?
So I’ve been trying this method out in my Wikipedia research starting with a small experiment at the Wikimedia Kenya meetup last Saturday where I asked participants to comment on the patterns represented in the maps that Mark Graham has recently made to represent places on Swahili Wikipedia.
One of the three bureaucrats of Swahili Wikipedia was at the meetup and responded to the concentration of articles about places in Turkey:
Actually you will find that all these places in Turkey are on most language Wikipedias because there is a guy who specialises on Turkish geography stubs and he puts them into all the Wikipedias. So he entered them into the Swahili WP and that’s the result. I think that this is a very telling map. The moment you have someone who doesn’t care about which language he publishes in but he cares about a subject deeply – say, Turkish geographical locations…. Turkish places of settlement – he puts that into all the different languages he can find. I mean, you don’t need to understand a language very well to say ‘x is a village in such a such a district’… And that’s the result.
Babatabita’s comments parallel Graham’s explanation of ‘a few dedicated editors creating stub articles about relatively structured topics such as cities in Turkey (in the Swahili Wikipedia) or every county in the US state of Georgia (in the Arabic Wikipedia)’. But hearing Babatabita talk about this one editor highlighted for me the impact that a single person – even someone who doesn’t speak the language – can have on a small encyclopaedia like Swahili Wikipedia.
The map is not ideal as a conversation point. I definitely felt that the conversation we had could have been richer if we had had more of the raw, incomplete data that Anderson et al say is required to elicit conversations about it, but I really enjoyed the fact that this map, in some small way, was being spoken to by (a small portion of) those whom it represented. It was great to be able to share how others see the artefact that these Wikipedians are helping to create, rather than it being shared predominantly outside of the community from which it was derived. It was also wonderful to be able to share Graham’s message that he had sent to me on Twitter after I had told him that I was going to share it at the meetup:
Thanks!! Please ask them if there is anything they’d like to see mapped that would help them
Now that is what I call the start of a data conversation!