Editor’s Note: While digital ethnography is an established field within ethnography, we don’t often hear of ethnographers building digital tools to conduct their fieldwork. Wendy Hsu wants to change that. In the first of her three-part guest post series, she shows how ethnographers can use software, and even build their own software, to explore online communities. By drawing on examples from her own research on independent rock musicians, she shares with us how she moved from being an ethnographer of purely physical domains to an ethnographer who built software programs to gather more relevant qualitative data.
Wendy is currently a Mellon Digital Scholarship Postdoctoral Fellow in the Center of Digital Learning + Research at Occidental College. She recently completed a Ph.D. in the Critical and Comparative Studies program in the McIntire Department of Music at the University of Virginia. Her dissertation, an ethnography of Asian American independent rock musicians, deploys the methods of ethnomusicology and digital humanities to explore the complex interrelationships between popular music and geography in transnational contexts. She implemented methods of digital ethnography to map musicians’ social networks. She tweets at @wendyfhsu and blogs at beingwendyhsu.info. She also plays with the vintage Asian garage pop revivalist band Dzian!.
When Tricia asked me to contribute a series on Ethnography Matters, I thought that I would take this opportunity to bring together the notes on digital ethnography that I have collected over the last couple of years. I would like to push the boundaries of computational usage in ethnographic processes a bit here. I really want to expand the definition of digital ethnography beyond the use of computers, tablets, and smart phones as devices to interact with online communities, or to capture, transfer, and store field media.
In this three-part series, I am going to discuss how working with computational tools could widen the scope of ethnographic work and deepen our practice. I will stay mostly within the domain of data gathering in this first post. In the second post, I will talk about the process of interpreting and visualizing field data; and in the last post, I will focus on how the digital may transform ethnographic narrative and argumentation.
In this post, I’d like to foreground computational methodology in thinking about how we as ethnographers may deploy digital tools as we explore communities within and around digital infrastructures. I am particularly interested in how we use these tools to study communities that are digitally organized. How do we use and think about data ethnographically? How does one use computational tools to navigate in digital communities? What are the advantages of leveraging (small) data approaches in doing ethnographic work? While this post is focused on the study of digitally embedded communities, in my later posts, I will speak more broadly about how the digital may extend how we look at communities where face-to-face interactions are central.
From bars to Myspace
When I was writing my dissertation on the experiences of Asian American musicians playing independent rock music, I discovered that most of the musicians I connected with spent more time networking and promoting their music online than actually performing, rehearsing, and recording. This shifted the site of my investigation away from the strictly physical, i.e. the clubs, bars, basement parties, and coffee shops where musicians hang out, to include sites of digital social media such as Myspace, Twitter, Facebook, G-chat, etc.
In particular, I noticed that Myspace (mid to late 2000s) was a hot spot for social interactions. The musicians in my study used Myspace to extend their peer and fan networks beyond the borders of the United States. Many of them have forged connections with bands who were geographically based in Asia. I began wondering, what do these online communities look like geographically? Where in the world are the Myspace friends of my musician-informants located?
Building a software tool to gather data
I set out to explore these bands’ digital social terrain beyond what Internet browsers display by leveraging software techniques like web-scraping. Web-scraping refers to a set of programmatic methods used to extract targeted information from web pages.[i] To extract location information displayed on Myspace profile pages, I created a web-scraper in the form of an Application Programming Interface (API). APIs are, by definition, a set of software components that act as an interface for communication across applications, typically based in the web environment (a friendly explanation of API).
A familiar example of an API at work is a Twitter client (for example, Tweetbot or Tweetdeck). Developers have leveraged the robust Twitter API to build apps that let users interact with Twitter from a computer, tablet, or smart phone instead of going through Twitter.com. In the case of my API, I had to build it from scratch by writing the commands in the Ruby scripting language. I used the Mechanize Ruby gem to navigate the source code of a series of targeted Myspace pages.
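To make the idea of extracting targeted information from a page's source code more concrete, here is a minimal sketch in Ruby. It is not the scraper described in this post: the markup, the "location" class name, and the extract_location helper are all hypothetical, and the sketch reads a hard-coded string with a plain regular expression rather than fetching live pages with Mechanize.

```ruby
# Hypothetical profile markup for illustration only; this is NOT the
# actual Myspace page source, which is not reproduced in this post.
SAMPLE_PAGE = <<~HTML
  <html><body>
    <div class="profile">
      <span class="name">Example Band</span>
      <span class="location">
        Boston, Massachusetts
      </span>
    </div>
  </body></html>
HTML

# Return the text inside the first <span class="location"> element,
# with surrounding whitespace removed, or nil if no such element exists.
# (Assumption: the location sits in a span with this exact class name.)
def extract_location(html)
  match = html.match(%r{<span class="location">\s*(.*?)\s*</span>}m)
  match && match[1]
end

puts extract_location(SAMPLE_PAGE)
```

A real scraper would fetch each page over the network and would usually rely on an HTML parser rather than a regular expression, since page markup tends to be messier than this sample.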
I will take my work with the South Asian American punk band The Kominas as an example. During the period that I was web-scraping, The Kominas had close to 3,000 friends on Myspace. These were all Myspace users who had requested to become friends with The Kominas, or vice versa. My homebrewed API successfully crawled through the profile pages of 2,867 friends of The Kominas on Myspace and parsed the location-specific text in the source code of these pages. Because I planned on mapping these points, I scripted the API to use the Geokit Ruby gem to turn these friend locations into latitude and longitude coordinates.
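The step from scraped location text to mappable points can be sketched roughly as follows. This is an illustration under stated assumptions, not my original script: the GAZETTEER lookup table is a hypothetical stand-in for a geocoding service such as the one Geokit wraps, the coordinates are approximate, and the friend locations are made-up sample values rather than data from The Kominas' friend list.

```ruby
# Hypothetical stand-in for a geocoding service. Keys are normalized
# place names; values are approximate [latitude, longitude] pairs.
GAZETTEER = {
  "boston, massachusetts" => [42.36, -71.06],
  "lahore, pakistan"      => [31.52, 74.36]
}.freeze

# Made-up sample of scraped location strings (not real field data).
FRIEND_LOCATIONS = [
  "Boston, Massachusetts",
  "Lahore, Pakistan",
  "boston, massachusetts ",
  "Somewhere unlisted"
].freeze

# Look up coordinates for a location string; returns nil when the
# place is not in the table.
def geocode(location)
  GAZETTEER[location.to_s.strip.downcase]
end

# Turn a list of location strings into one point per distinct place,
# with a count of how many friends share that place. Locations that
# cannot be geocoded are skipped.
def map_points(locations)
  counts = Hash.new(0)
  locations.each do |location|
    coords = geocode(location)
    counts[coords] += 1 if coords
  end
  counts.map { |(lat, lng), n| { lat: lat, lng: lng, friends: n } }
end

map_points(FRIEND_LOCATIONS).each { |point| puts point.inspect }
```

Counting friends per place, rather than keeping one row per friend, is a design choice that suits mapping: each distinct location becomes a single point whose size or color can reflect how many friends are there.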
This software tool allowed me to go beyond the textual and discursive dimensions of collecting field data, a path previously unexplored by academic online participant-observers. With the geographical information that I gathered from the API, I mapped out these friend locations. I will go deeper into this in my next post on data interpretation and visualization. But for now, I will say that having this set of data has allowed me to extract empirical, place-based findings from an ostensibly placeless digital environment. Not only that, it has enabled me to deepen my analysis. From these software findings, I generated further questions that are geographically focused and theoretically interesting around notions of space and place. Juxtaposing my findings from traditional (and physical) participant observation with those from software explorations, I discovered patterns of social behaviors and cultural meanings that I would not have had access to otherwise.
Discovering boundaries of software spaces
With this API, I was able to reach beyond the user- and consumer-end experiences of technology. Using a computational tool — a machine-based script that communicates with other machines — I was able to explore quite literally the software infrastructures in which my field interactions occur. This became apparent when my API broke while trying to scrape location information from the friends of The Hsu-nami, a New-Jersey-based progressive erhu-rock band that I followed, on Myspace China.
In troubleshooting, I found that Myspace is in fact not as global as it has promised to be. The Myspace user networks of all (of the available) countries in the world exist on a server located in the U.S., with the exception of the users of Myspace China. Hosted on a server in China, Myspace China is positioned institutionally apart from the rest of the Myspace networks in “the world.” These institutional and social barriers are reinforced by the software barriers between Myspace China and Myspace (U.S.), where The Hsu-nami’s profile page is hosted.[ii] From this software “observation”, I have gathered enough evidence to argue that there is not one single cyber space, but rather multiple cyber spaces. The Internet is not one giant blob of space. There are borders and boundaries, both software- and hardware-dependent, that bind and separate these cyber spaces. And in the case of The Hsu-nami, forging connections with friends in China potentially suggests that ethnic meanings from musical sound and performance may transcend software barriers.
Certain digital communities are more open to software approaches than others. The Myspace community, for instance, is much more closed than Twitter and Facebook. Last.fm, by contrast, is built around an open technology that documents each user’s song selection and forms a musical taste profile unique to the user. The records of users’ listening patterns are transferred or “scrobbled” to Last.fm’s database. Last.fm makes these data available for builders to create APIs; for that reason, it has become a self-proclaimed “social music playground” on which curious programmers and designers can play with (mostly visual) patterns of music listening. One outstanding visualization project that is built upon the Last.fm API is the thesis project of Christopher Adjei and Nils Holland-Cunz. [Here’s a neat video documentation of the application they built]. Unfortunately, none of these projects were ethnographically informed. Their frameworks of analysis are restricted to the domain of software data, and are not integrated with interviews or interaction-based observations.
Ethnographers with programming skills – why not?
I should also mention that I got into thinking and doing things computationally through the back door of digital humanities. In grad school, I worked with the awesome folks at the Scholars’ Lab, a hub at the University of Virginia that trains graduate students to acquire software skills and digital humanities perspectives. At the Scholars’ Lab, under the guidance of humanities-friendly technologists (shoutout to Joe Gilbert!), I learned just enough basic programming to execute what I had envisioned.
Within the community of academic ethnographers, unfortunately, I have not encountered much discussion of computational tools like APIs or web extraction. I have seen scholars in computer science, communications, and the social sciences apply similar computational methods, such as web-crawling, in their work.[iii] But I would welcome the opportunity to meet other software-oriented ethnographers or to engage in conversations with those interested in computational methodology.
Everyone’s doing it, why aren’t we?
It’s worth mentioning that I do traditional field work: capturing performances on my digital audio recorder, taking field notes on Twitter [and Storify], interviewing musicians in coffee shops, setting up shows for them, and sharing a stage with them. But with basic computational know-how, both applied and critical, I have had the opportunity to think wildly about what a mixed-method or “multimodal” ethnography means to me. The technology of web extraction, as I illustrated above, has enabled me to accomplish the following:
- effectively gather relevant data in digital communities
- reveal the space and boundaries created by software infrastructures
- recontextualize findings from traditional field methods – in my case, in geographic terms
- illuminate how the physical/geographic intersects with the digital
Integrating physical and software field practices has satisfied my thirst as someone who is curious about our contemporary society as it is organized by various digital infrastructures.
Zooming back out a bit, it is not hard to see the relevance of web scraping and other forms of web extraction in the media and tech industry. In fact, data-mining is a pervasive practice for acquiring marketing research data. There are bots everywhere. Everyone knows that Amazon stores our browsing patterns and that user information in social media regularly gets mined as marketing research analytics.
In light of these pervasive data practices, we as ethnographers should think about how we too can leverage these technologies to better understand the infrastructural context, thus closing the knowledge gap between the (cultural and social) content and the (technical and institutional) context of our scrutiny.
In my next post, I will talk about how digital tools could facilitate data interpretation and examination. I’ll focus on mapping as a method to discover and document patterns of place-based observations. Then I will discuss how we can take advantage of the digital capacity to zoom in and out on content so that we can deepen our sensory engagement with physical ethnographic materials.
[i] The term ‘web-crawling’ is sometimes used synonymously with web-scraping. Typically, crawling refers to the technology of extracting all information on the web, similar to the technology of Google’s search engine, whereas web-scraping refers to the extraction of specific online information.
[ii] The software disconnection between China and the United States (and the rest of the world) on Myspace may be a product of the financial and political relationship between the countries. To follow up on this inquiry, one could search for news stories about Myspace’s company structure and its changes. For more detail, read David Barboza’s article “Murdoch Is Taking MySpace to China”, April 27, 2007. http://www.nytimes.com/2007/04/27/business/worldbusiness/27myspace.html (accessed on January 13, 2011).
[iii] For more on web-crawling as a social science method, read Halavais, A. (2000). National Borders on the World Wide Web. New Media & Society, 2(7), 7-28. doi: 10.1177; Lin, J., Halavais, A., & Zhang, B. Bin. (2007). The Blog Network in America: Blogs as Indicators of Relationships among US Cities. Connections, 27(2), 15-23. Retrieved from http://www.insna.org/Connections-Web/Volume27-2/Lin.pdf