Rob Procter, Manchester e-Research Centre, University of Manchester
Recently, we have seen the emergence of collaborative online efforts by vast numbers of committed individuals to produce coverage competing with the mainstream media. The wide availability of web-based platforms and access devices – now including mobile phones and tablets – has given such initiatives unprecedented reach. What is more, the multiplying potential of social media can transform the impact that an individual has if their actions are responded to by many.
A platform that has become well known for its potential to amplify an individual message is Twitter, a service that allows people to post short (140-character) messages or ‘tweets’. A tweet is said to go ‘viral’ when it is taken up, ‘retweeted’ and referred to by many. In crisis situations, the rapid spread of information through social media like Twitter can outpace traditional media, and coverage on social media platforms can challenge more mainstream accounts of events (for better or for worse).
An example of this is the role that Twitter played during the August 2011 riots in English cities. We have had the opportunity to analyse about 2.5 million tweets collected during the period of these events. Our findings show that, contrary to claims made in some initial reactions, Twitter was much less a vehicle for inciting violence than a mechanism that allowed reactions to the events to form.
In particular, Twitter was used extensively to create what became known as the ‘riots cleanup’ movement. Information about this growing movement spread through a combination of hashtags (#riotcleanup, initiated by @sophontrack on 8th August), the establishment of specific accounts on Twitter (@riotcleanup) and websites (riotcleanup.co.uk), as well as the retweeting of messages by Twitterati with high follower counts (such as @piersmorgan or @simonpegg). Looking at just the top 10 ‘information flows’, we can see that these tweets were retweeted more than 30,000 times (see Figure 1). The most retweeted message was sent by Simon Pegg and contained a pointer to the riotcleanup.co.uk website. It is followed by Piers Morgan posting a link to a picture of people holding their brooms in the air, which was originally posted by @lawcol888. These messages are followed by calls from @riotcleanup to spread information about its existence as a source of information about cleanup events as well as about the #riotcleanup hashtag. Some other tweets carry humorous messages but most contain information about cleanup events or general information and endorsements.
Figure 1: Top 10 most retweeted #riotcleanup tweets
Data and Knowledge Claims
It is impossible in retrospect to estimate the true reach of these messages. On the one hand, the groups of followers of the main sources will overlap, so simply adding up the numbers will not work and the resulting count of 7 million will be an over-estimate. Furthermore, we do not know whether the tweets were actually read and noticed by all the followers. On the other hand, the use of hashtags and the public availability of tweets mean that people not following the sources can find the tweets, and the messages contained in the tweets may well have been further disseminated through other means of communication and through mainstream media. The number of retweets does provide a strong indication that these tweets managed to carry their message to tens of thousands of people who took the time to spread them further.
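The over-counting problem can be illustrated with a minimal Python sketch. The account names and follower ids below are invented for illustration and are not taken from the riots corpus; the sketch only contrasts the sum of follower counts with the size of the union of follower sets.

```python
# Hypothetical follower sets: account names and follower ids are invented.
followers = {
    "account_a": {1, 2, 3, 4, 5},
    "account_b": {4, 5, 6, 7},
    "account_c": {1, 7, 8},
}


def naive_reach(follower_sets):
    """Sum of follower counts: shared followers are counted more than once."""
    return sum(len(ids) for ids in follower_sets.values())


def unique_reach(follower_sets):
    """Size of the union of all follower sets: each follower counted once."""
    return len(set().union(*follower_sets.values()))


print(naive_reach(followers))   # 12
print(unique_reach(followers))  # 8
```

Even the union is only an upper bound on the audience that actually saw a message, since it says nothing about whether a follower read the tweet.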
The data we have access to allows us to analyse the flow of information within Twitter about the riots and the subsequent cleanup movement. However, on the basis of the Twitter data alone it is impossible to say with confidence what the impact of these messages was on real-world events. This limitation on the knowledge claims that can be made lies in the nature of the source of the data. We faced other problems that had more to do with the fact that we were working with a fixed corpus of data that was compiled in retrospect and not primarily for research purposes. These included the lack of profile information in the initial corpus, which was subsequently amended. Similarly, we had to calculate information about tweet/retweet relationships from distance measures applied to the tweet content. This limitation exists not only because of the limited data made available to us but also because not all Twitter clients record the source tweet when a user retweets information. Some of these problems could be overcome if tweets were collected in real time through the Twitter Streaming API or through service providers such as DataSift or Gnip. These sources provide access to a much richer set of contextual data but are not without their own problems, especially when it comes to the interests of researchers wishing to access larger corpora of tweets.
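As a rough illustration of inferring tweet/retweet relationships from content alone, the following Python sketch matches a candidate retweet against earlier tweets by string similarity. The text above does not say which distance measure was used, so the similarity ratio from the standard-library difflib module stands in here as an assumption, as do the 0.8 threshold, the function names and the example tweets.

```python
import re
from difflib import SequenceMatcher

# Leading "RT @user:" marker that many clients prepend to a retweet.
RT_PREFIX = re.compile(r"^\s*RT\s+@\w+:?\s*", re.IGNORECASE)


def normalise(text):
    """Strip a leading retweet marker, lower-case and collapse whitespace."""
    text = RT_PREFIX.sub("", text)
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity in [0, 1] between two tweets after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_source(candidate, earlier_tweets, threshold=0.8):
    """Return the most similar earlier tweet, or None if below the threshold.

    The threshold is an assumed value and would need tuning on real data.
    """
    best, best_score = None, 0.0
    for tweet in earlier_tweets:
        score = similarity(candidate, tweet)
        if score > best_score:
            best, best_score = tweet, score
    return best if best_score >= threshold else None


# Invented example tweets, not taken from the corpus.
earlier = [
    "Meet at 10am with brooms to help clean up #riotcleanup",
    "Stay safe everyone tonight",
]
retweet = "RT @someuser: Meet at 10am with brooms to help clean up #riotcleanup"
print(likely_source(retweet, earlier))
```

Comparing every tweet against every earlier tweet scales quadratically, so on a corpus of millions of tweets some form of blocking (for example by hashtag or time window) would probably be needed first.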
Tweets as Big Data
Even though each tweet carries relatively little information in its 140 characters, data streamed off the Twitter Streaming API contains a rich set of contextual information such as user profile information, a timestamp, possibly the location (if the user's device supports this), the tweet id of the source a retweet is from (if the user's client supports this) and other information. For example, a set of 540k tweets gathered on the topic of the launch of the new iPad averaged about 2400 bytes per tweet. Given that some events that attract worldwide attention can lead to rates in excess of 12k tweets per second, this equates to an incoming data stream of about 27 MB/s. A normal recordable DVD would store a mere three minutes' worth of such data.
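The arithmetic behind these figures can be checked with a few lines of Python. The per-tweet size and tweet rate come from the text above; the DVD capacity of 4.7 GB (a single-layer disc) is an assumption, as the text does not state it.

```python
BYTES_PER_TWEET = 2400       # average size reported for the iPad-launch sample
TWEETS_PER_SECOND = 12_000   # peak rate quoted in the text
DVD_BYTES = 4.7e9            # assumption: single-layer recordable DVD (4.7 GB)

bytes_per_second = BYTES_PER_TWEET * TWEETS_PER_SECOND
mb_per_second = bytes_per_second / 1e6       # decimal megabytes
mib_per_second = bytes_per_second / 2**20    # binary mebibytes
dvd_minutes = DVD_BYTES / bytes_per_second / 60

print(f"{mb_per_second:.1f} MB/s, {mib_per_second:.1f} MiB/s")
print(f"one DVD holds about {dvd_minutes:.1f} minutes of the stream")
```

The quoted figure of about 27 MB/s corresponds to binary megabytes; in decimal units the rate is closer to 29 MB/s. Either way, a disc fills in a little under three minutes.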
Depending on their specific interests, researchers or other professionals may only be interested in a fraction of such data or may be able to work with a random sample. On the other hand, they may be interested in much longer periods in time, e.g., when tracking an election campaign. In any case, it is likely that they will produce corpora of data larger than what can be handled using normal desktop computers and tools.
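One way to draw a uniform random sample from a stream whose length is not known in advance is reservoir sampling. The Python sketch below is offered as an illustration of that general technique under the assumption that tweets arrive one at a time; it is not a description of how any particular sample was drawn, and the function name is hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable.

    Only k items are held in memory, however long the stream is.
    If the stream has fewer than k items, all of them are returned.
    """
    if rng is None:
        rng = random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive at both ends
            if j < k:
                sample[j] = item
    return sample


# Stand-in for a tweet stream: integers instead of tweet objects.
picked = reservoir_sample(range(100_000), 10, random.Random(42))
print(len(picked))
```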
Researchers will require tools to ingest data from the source, filter and store it, and feed it into an analysis workflow making use of computational tools such as language and sentiment detection, clustering and classification as well as social network analysis, to name a few examples. We are currently working on a cloud-based Twitter Analysis Workbench that will make such functionality available as a service, making use of Big Data technologies such as NoSQL databases, lightweight messaging middleware to produce real-time analysis workflows, and batch processing through a MapReduce model.
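To give a flavour of the MapReduce model mentioned above, the following Python sketch counts hashtags in a handful of tweets using explicit map, shuffle and reduce steps. It runs in a single process and is purely illustrative: it does not represent the Workbench, the function names are hypothetical, and the example tweets are invented.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#\w+")


def map_phase(tweet):
    """Emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(tweet)]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the counts emitted for one hashtag."""
    return key, sum(values)


def count_hashtags(tweets):
    pairs = [pair for tweet in tweets for pair in map_phase(tweet)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Invented example tweets.
tweets = [
    "Brooms at the ready #riotcleanup",
    "Heading to the high street #RiotCleanup #london",
    "No hashtags here",
]
print(count_hashtags(tweets))
```

Because each tweet is mapped independently and each hashtag is reduced independently, both phases can in principle be spread across many machines, which is what makes the model attractive for batch processing of large corpora.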
1. Procter, R., Vis, F., and Voss, A., (2012, in preparation). Reading the Riots on Twitter: methodological innovation for the analysis of big data on Twitter. To be submitted to the International Journal of Social Research Methodology, special issue on Computational Social Science: Research Strategies, Design & Methods.
2. Lewis, P., Newburn, T., et al. (2011). Reading the Riots: Investigating England's summer of disorder. http://www.guardian.co.uk/uk/interactive/2011/dec/14/reading-the-riots-investigating-england-s-summer-of-disorder-full-report
3. Lotan, G., Graeff, E., Ananny, M., Gaffney, D., Pearce, I., Boyd, D. (2011). The Revolutions Were Tweeted: Information Flows During the 2011 Tunisian and Egyptian Revolutions, International Journal of Communication (5) Feature: 1375-1405.
4. Bruns, A. How Long is a Tweet? Mapping Dynamic Conversation Networks on Twitter Using Gawk and Gephi. Information, Communication & Society, forthcoming, available online at: http://www.tandfonline.com/doi/abs/10.1080/1369118X.2011.635214