In the next post for ‘The Person in the (Big) Data’ edition, Chris Birchall @birchallchris talks us through a variety of methods – big, small and mixed – that he used to study citizenship in the UK. Using some of the dominant tools for studying large data sources in one part of the study, Chris realised that these tools had a significant impact on what can be (and is being) discovered, and that the resulting findings differ markedly from those reached through deeper, mixed methods analysis. In this post, Chris asks important questions about whether big data research tools are creating some of the conditions of citizenship today and what, exactly, deeper, more nuanced analysis can tell us.
People talk about politics online in many different ways and for many different purposes. The way that researchers analyse and understand such conversation can influence the way that we depict public political opinion and citizenship. In two recent projects I investigated the nature of this conversation and the forces that influence it, as well as the networks, spaces and resources that link that talk to political action. In doing so, I encountered a methodological rift: careful, manual, time-consuming approaches produce different kinds of conclusions from the big data-driven approaches that are widespread in the commercial social media analytics industry. Both approaches could be framed as illustrations of human behaviour on the internet, but their differences show that the way we embrace big data or digital methods shapes the understanding of digital publics and citizenship that we derive from mass online data.
My recently submitted PhD study investigated online public political conversation in the UK. Drawing on the work of previous scholars who have focussed on the deliberative online public sphere (such as Coleman and Gotze, 2001; Coleman and Moss, 2012; Mutz, 2006; Wright and Street, 2007; Graham, 2012), the study acknowledged the importance of interpersonal exchange between participants, and of exposure to diverse and opposing viewpoints, in the formation of preferences and informed opinion. My initial motivation was to ask how interface design might influence people as they talk about politics in online spaces, but answering this required an examination of the more human, less technologically determined factors that are also, and often more significantly, involved in political expression.
Over the course of the study it became obvious that the methodology used to investigate these concepts influences the insight obtained, something that many researchers have discussed in the context of digital methods within social science (Baym, 2013; Boyd and Crawford, 2012; Clough et al., 2015; Gitelman and Jackson, 2013; Kitchin and Lauriault, 2014; Kitchin, 2014; Manovich, 2011; Van Dijck, 2014). Technologically mediated questions can be answered through technology-centric methods to give technologically focussed answers, while questions involving human nature, motivation and interaction can be answered by qualitative, human-centred methods to provide human-centred answers. These approaches represent the divide between the large scale, quantitative analysis of big data methods and small scale, qualitative approaches. To bridge this divide, I employed a methodology designed to combine the two through directed iterations of analysis, which began large scale and quantitative and became increasingly small scale and qualitative.
Understanding the origins, nature and development of emergent online conversation in the UK required an initially very large scale view, situating the study within the big data paradigm. I therefore created bespoke software for the collection and automated analysis of conversation data. Text was scraped from web pages, and data were also harvested about the connections between contributions: the replies and quotations held in specific user interface structures, and the likes and recommends that accompanied them. These data were then translated into metrics for connectedness and quantitative dominance, measures that provided broad headline figures for different online spaces and described, in generalised terms, the nature of the conversation occurring there.
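The post does not say exactly how connectedness and dominance were calculated, so the Python sketch below is only an illustration under two assumed definitions: connectedness as the share of contributions that take part in at least one reply or quote link (as source or target), and dominance as the share of contributions written by the most prolific author(s). The `Contribution` record, its fields and both function names are hypothetical and are not the author's software.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Contribution:
    """One scraped message (hypothetical record layout, not the author's schema)."""
    post_id: str
    author: str
    reply_to: Optional[str] = None  # id of the message this one is marked up as replying to
    quotes: Tuple[str, ...] = ()    # ids of messages quoted via the interface


def connectedness(posts: List[Contribution]) -> float:
    """Assumed metric: share of contributions taking part in at least one reply
    or quote link, as source or target. Links to messages outside the sample are ignored."""
    if not posts:
        return 0.0
    ids = {p.post_id for p in posts}
    linked = set()
    for p in posts:
        targets = list(p.quotes)
        if p.reply_to is not None:
            targets.append(p.reply_to)
        targets = [t for t in targets if t in ids]
        if targets:
            linked.add(p.post_id)
            linked.update(targets)
    return len(linked) / len(posts)


def dominance(posts: List[Contribution], top_n: int = 1) -> float:
    """Assumed metric: share of all contributions written by the top_n most prolific authors."""
    if not posts:
        return 0.0
    counts = Counter(p.author for p in posts)
    top = sum(n for _, n in counts.most_common(top_n))
    return top / len(posts)


if __name__ == "__main__":
    forum = [
        Contribution("1", "ann"),
        Contribution("2", "bob", reply_to="1"),
        Contribution("3", "ann", reply_to="2"),
        Contribution("4", "cat"),
    ]
    print(connectedness(forum), dominance(forum))  # 0.75 0.5
```

Because both figures read only the interface's reply and quote markup, they inherit any mismatch between that markup and what participants actually wrote, which is why headline numbers of this kind are best treated as a starting point for closer analysis.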
However, manual evaluation of these metrics exposed flaws in this method, as details of conversations were often misrepresented by the interface structures that contained them. For example, genuine replies were often embedded within the text of a message rather than marked up as replies, while contributions marked up as replies were often just statements of opinion or preference. Nevertheless, while somewhat inaccurate, the generalisations achieved in this first step proved useful for directing further analysis in the form of more qualitative investigation of the significant headline statistics, such as manual coding of samples, argument mapping and social network analysis. The resulting findings uncovered detailed patterns within conversations, including distinctive participatory models within particular spaces (two are illustrated below). Finally, surveys and interviews with participants provided the human-level understanding that the previous two stages lacked, offering insight into why participation occurred in the way that it did.

Two conversation maps illustrating two very different modes of participation: on the left, a highly connected forum conversation with diverse opinion present, characterised by long chains of messages, agreements (green nodes) and disagreements (red nodes); on the right, a more disconnected conversation on Twitter, full of individual expressions with only a small proportion of interactions. [Image created by author, CC BY-NC-SA 3.0]
Used to stimulate further analysis in this way, big data methods can provide significant value for social research. However, this is far from the dominant method for understanding digital publics. Mainstream approaches are often dominated by commercial social media analytics tools providing data that is difficult, or impossible, for an independent researcher to access (such as Facebook Topic data, one of the hottest marketing developments of 2015, which provides anonymised, aggregated access to private personal contributions). These tools are used not only by companies to understand customers, but also increasingly by the media, NGOs and political institutions to understand contemporary forms of political expression, as described by Ceron et al. (2013) and Roginsky and Jeanne-Perrier (2014).
A subsequent study, investigating the interconnected nature of online participatory spaces and the links between conversation and political action, used these methods to access social media data relating to various recent political events. The vast majority of the resulting data consisted of social media posts in which links to spaces for political action – such as online petitions – were shared alongside links to related mainstream news media. These results could be interpreted as depicting online citizens who respond primarily to mainstream media, but this judgment would rest on quantitative measurements that lack the nuance of the mixed methods approach described above, offering only a picture of what happened, without any real insight into why it happened. The study does describe a form of online citizenship, but it also highlights some of the implications of relying on mainstream digital analytics tools.
Our understanding of digital citizenship, and our interventions into this digital environment, are increasingly dependent on such methods, partly because of the ‘black boxed’ nature of social media and our limited capacity to access certain aspects of it, but also because of its sheer scale, which can render closer analysis impractical. Such methods are often exclusive (Baym, 2013; Boyd and Crawford, 2012; Kennedy et al., 2014), and generate crude quantitative knowledge, or what Gillespie (2014) calls “calculated publics”. Of course, all of these limitations can be accounted for during research, but big data and social media analytics are often touted as capable of answering questions about the public, or citizens, on their own.
The citizenship demonstrated by such studies is a digitally mediated construction of mainstream culture, expressed through everyday digital media production and consumption, rather than an accurate picture of the meaning of citizenship as it is experienced by people. It is not the whole picture, but it is the picture readily painted through algorithmic and analytical means. This calculated citizenship feeds back into the political process by providing an accessible and credible illustration of public opinion, reaction and preference. Technology is not necessarily changing citizenship itself; rather, it has catalysed changes in our methods of understanding citizenship within the social and political ecology of the digital world. As Clough et al. (2015) discuss, big data methods are not simply a new way of generalising across populations on a larger scale, but a methodological embodiment of an emergent conception of sociality.
References:
Baym, N.K. 2013. Data Not Seen: the uses and shortcomings of social media metrics. First Monday. 18(10).
Benkler, Y., Roberts, H., Faris, R., Solow-Niederman, A. and Etling, B. 2015. Social Mobilization and the Networked Public Sphere: Mapping the SOPA-PIPA Debate. Political Communication. 32(4), pp.594–624.
boyd, d. and Crawford, K. 2012. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Információs Társadalom. 12(2).
Ceron, A., Curini, L., Iacus, S.M. and Porro, G. 2013. Every tweet counts? How sentiment analysis of social media can improve our knowledge of citizens’ political preferences with an application to Italy and France. New Media & Society. 16(2), pp.340–358.
Clough, P., Gregory, K., Haber, B. and Scannell, R.J. 2015. The Datalogical Turn. Unpublished article. Academia.edu, pp.1–26.
Coleman, S. and Gotze, J. 2001. Bowling Together: Online Public Engagement in Policy Deliberation. London: Hansard Society.
Coleman, S. and Moss, G.S. 2012. Under Construction: the Field of Online Deliberation Research. Journal of Information Technology & Politics. 9(1), pp.1–15.
Van Dijck, J. 2014. Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society. 12(2), pp.197–208.
Gillespie, T. 2014. The Relevance of Algorithms. In: T. Gillespie, P.J. Boczkowski and K.A. Foot, eds. Media Technologies: Essays on Communication, Materiality, and Society. Cambridge, MA: MIT Press.
Gitelman, L. and Jackson, V. 2013. Introduction. In: L. Gitelman, ed. ‘Raw Data’ Is an Oxymoron. Cambridge, MA: MIT Press.
Graham, T. 2012. Beyond ‘Political’ Communicative Spaces: Talking Politics on the Wife Swap Discussion Forum. Journal of Information Technology & Politics.
Kennedy, H., Moss, G.S., Birchall, C. and Moshonas, S. 2014. Balancing the potential and problems of digital methods through action research: methodological reflections. Information, Communication & Society. 18(2), pp.172–186.
Kitchin, R. 2014. Big Data, new epistemologies and paradigm shifts. Big Data & Society. 1(1).
Kitchin, R. and Lauriault, T.P. 2014. Small data in the era of big data. GeoJournal. 80(4), pp.463–475.
Manovich, L. 2011. Trending: the promises and the challenges of big social data. http://manovich.net/index.php/projects/trending-the-promises-and-the-challenges-of-big-social-data.
Mutz, D.C. 2006. Hearing the Other Side: Deliberative versus Participatory Democracy. New York: Cambridge University Press.
Roginsky, S. and Jeanne-Perrier, V. 2014. La fabrique de la communication des parlementaires européens: ‘Tweet ton député’ et les ‘ateliers du député 2.0’. Politiques de communication. 1, pp.85–123.
Wright, S. and Street, J. 2007. Democracy, Deliberation and Design: The Case of Online Discussion Forums. New Media & Society. 9(5), pp.849–869.