The Internet, Policy & Politics Conferences

Oxford Internet Institute, University of Oxford

Margeret Hall, Simon Caton: A Crowdsourcing Approach to Identify Common Method Bias and Self-representation

Margeret Hall, Simon Caton; Karlsruhe Service Research Institute

Pertinent questions on the measurement of social indicators include the verification of data gathered online (e.g., controlling for self-representation on social networks) and its appropriate use in community management and policy-making. Across platforms such as Facebook, LinkedIn, Twitter, and blogging services, users (sub)consciously represent themselves in a way that is appropriate for their intended audience (Qiu et al., 2012; Zhao et al., 2008). However, scholars in the social sciences and computer science have not yet adequately addressed how to control for self-representation, i.e., the propensity to display or censor oneself, in their analyses (Zhao et al., 2008; Das and Kramer, 2013). As such, researchers on these platforms risk working with personas that are ‘gamified’, that respond in socially desirable ways, or that are disinhibited online (trolls), a problem that goes beyond existing efforts to contain Common Method Biases (CMB) (Linville, 1985; Suler, 2004; Podsakoff et al., 2003). What has not yet been approached systematically is the verification of such data against offline, actual personality. In this paper, we focus on aligning traditional survey methods with unobtrusive methods that gather profile data from online social media via crowdsourcing platforms.
Social data gathered online raise challenges for researchers who aim to unobtrusively apply publicly accessible online data in generalizable social models. This research has two aims: (1) establishing the relationship between offline and online personalities via survey responses and self-produced text, and (2) mitigating biases in both traditional survey methods and publicly sourced data. In response to these aims, we hypothesize that self-representation can be identified, and thus eventually controlled for, in broad social models. Surveys are prone to rater and item effects (Podsakoff et al., 2003), and online data are susceptible to context effects. By first addressing and mitigating CMB, and then using this to isolate further personality enhancements and/or misrepresentations, we create a research method for analysing and using publicly gathered data with minimal bias. In a multi-method approach, the proposed alignment method addresses concerns of CMB in the analysis of publicly sourced data by investigating common rater effects, item characteristic effects, item context effects, and measurement context effects (Podsakoff et al., 2003). Our research creates an estimation function to lessen the distortion created by CMB and self-representation.
To facilitate our study, 509 Amazon Mechanical Turk workers began a task consisting of psychometric surveys and questions on Facebook usage, delivered via a Facebook application; 469 of them completed it. A PHP-based Facebook application simultaneously recorded each participant's unique Facebook ID and accessed their Facebook timeline via Facebook's Open Graph application programming interface (API). Workers were given the option to opt out of the HIT (Human Intelligence Task) at the stage where it linked to their Facebook profile, and privacy-aware users were able to hide their activities from the app. Participants' IDs were one-way hashed, and profile data, survey responses, and worker payment were tied to the hashed ID. The app extracted only posts, i.e., status updates, that participants made to their own timelines. Other post types, such as shares and profile updates, were excluded because they are not fully self-produced texts. This type of constraint can create first-order bias by potentially culling messages from the list of retrieved posts; as this study is not a network study, second-order bias is not considered here (González-Bailón et al., 2014).
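The pseudonymization and post-filtering steps described above can be sketched as follows. This is a minimal illustration, not the original PHP application: the abstract does not say which hash function was used, so keyed HMAC-SHA256 with a private project key is an assumption, and the post field names `type` and `message` (with `"status"` marking a status update) are hypothetical stand-ins for whatever the API returned.

```python
import hashlib
import hmac

# ASSUMPTION: a private, project-level key. The abstract only states that IDs
# were one-way hashed; the algorithm and key handling here are illustrative.
SECRET_KEY = b"replace-with-a-private-project-key"


def pseudonymize(facebook_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash (HMAC-SHA256, hex) of a Facebook user ID."""
    return hmac.new(key, facebook_id.encode("utf-8"), hashlib.sha256).hexdigest()


def self_produced_posts(posts: list[dict]) -> list[str]:
    """Keep only non-empty status updates; drop shares, profile updates, etc.

    HYPOTHETICAL schema: each post is a dict with 'type' and 'message' keys.
    """
    return [
        p["message"]
        for p in posts
        if p.get("type") == "status" and p.get("message")
    ]


def build_record(facebook_id: str, survey: dict, posts: list[dict]) -> dict:
    """Link survey answers and timeline text through the hashed ID only."""
    return {
        "participant": pseudonymize(facebook_id),
        "survey": survey,
        "posts": self_produced_posts(posts),
    }


if __name__ == "__main__":
    timeline = [
        {"type": "status", "message": "Great day at the beach!"},
        {"type": "link", "message": "Check out this article"},  # a share: excluded
        {"type": "status", "message": ""},                      # empty: excluded
    ]
    record = build_record("1234567890", {"extraversion": 3.8}, timeline)
    print(record["participant"][:16], record["posts"])
```

A keyed hash rather than a plain hash is used in this sketch because Facebook IDs come from a small, enumerable space, so an unkeyed digest could be reversed by brute force; the filter makes the first-order bias mentioned above explicit, since every non-status post is silently dropped.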
An initial screening question based on reading comprehension was employed to minimize ‘click-through’ behaviour (Berinsky et al., 2012). The crowdworkers' survey results replicate findings reported by Huppert and So (2011), John et al. (1991), and Ewig (2011). Workers self-reported current locations in six distinct geographic regions, with the majority reporting locations in North America and India. 73% of workers reported being aged 35 or younger. Gender was evenly split between women and men, with one non-disclosure and one choice of ‘Other’. 37% reported being currently unemployed, and 57% had completed at least a bachelor's degree. Because separating measurements (e.g., in time or by source) is difficult to implement on crowdwork platforms, these data make it possible to escalate the unit of analysis when obvious incidents of CMB occur (Podsakoff and Organ, 1986).
Our survey analyses suggest construct reliability and convergence, with Kaiser-Meyer-Olkin (KMO) measures of sampling adequacy for all constructs (personality, personal well-being, Facebook usage, demographics) ranging from 0.788 to 0.9. For the Facebook usage construct, a Principal Component Analysis indicated that two items, “Do other people present themselves differently in online and offline settings?” (0.391) and “I can be more open online than in real life” (0.487), did not meet the per-item KMO criterion of a 0.5 minimum and were therefore trimmed from the scale, following Podsakoff and Organ (1986). In each construct analysis, Bartlett's test of sphericity was statistically significant (p < .0005); Cronbach's alpha tests of internal consistency yielded values ranging from 0.668 to 0.841.
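The three reliability checks named above can be sketched with NumPy and SciPy using their textbook definitions: overall and per-item KMO from the correlation and anti-image (partial correlation) matrices, Bartlett's statistic $\chi^2 = -(n - 1 - \frac{2p+5}{6})\ln|R|$ with $p(p-1)/2$ degrees of freedom, and Cronbach's $\alpha = \frac{k}{k-1}\bigl(1 - \frac{\sum_i \sigma_i^2}{\sigma_{\text{total}}^2}\bigr)$. This is an illustrative sketch, not the authors' analysis code; the helper `trim_items` is a hypothetical name that applies the 0.5 per-item cut-off described in the text.

```python
import numpy as np
from scipy.stats import chi2


def kmo(data: np.ndarray) -> tuple[float, np.ndarray]:
    """Overall KMO and per-item measures of sampling adequacy (MSA).

    data: respondents x items matrix of numeric survey answers.
    """
    r = np.corrcoef(data, rowvar=False)
    inv_r = np.linalg.inv(r)
    # Partial correlations from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv_r), np.diag(inv_r)))
    partial = -inv_r / scale
    off_diag = ~np.eye(r.shape[0], dtype=bool)
    r2 = np.where(off_diag, r ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + p2.sum())
    per_item = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return float(overall), per_item


def bartlett_sphericity(data: np.ndarray) -> tuple[float, float]:
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(r)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) / 2
    return float(stat), float(chi2.sf(stat, dof))


def cronbach_alpha(data: np.ndarray) -> float:
    """Internal consistency of a scale (items in columns)."""
    k = data.shape[1]
    item_var = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_var / total_var))


def trim_items(data: np.ndarray, names: list[str], threshold: float = 0.5):
    """HYPOTHETICAL helper: drop items whose per-item MSA is below threshold."""
    _, msa = kmo(data)
    keep = msa >= threshold
    return data[:, keep], [name for name, k in zip(names, keep) if k]


if __name__ == "__main__":
    # Synthetic data only: 469 respondents, four items driven by one factor.
    rng = np.random.default_rng(42)
    factor = rng.normal(size=(469, 1))
    items = factor + 0.8 * rng.normal(size=(469, 4))
    print(kmo(items), bartlett_sphericity(items), cronbach_alpha(items))
```

With only two items the per-item MSA is always exactly 0.5, because each partial correlation then equals the raw correlation; the criterion becomes informative only with three or more items per scale.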
Whereas there is evidence of positive affectivity, as indicated by the trait Extraversion and its multiple highly significant relationships with personal well-being and Facebook usage, there is little supporting evidence of negative affectivity, with the exception of the negative correlation between the trait Neuroticism and Human Flourishing, a measure of personal well-being (r(467) = -0.250, p < .0005). A sentiment analysis of the Facebook data indicates that workers communicate positive emotions more frequently, as a form of social posturing, whereas negative emotions are hardly communicated on Facebook, in line with the results of Qiu et al. (2012). The sentiment analysis also shows that the relative frequency of positive and negative emotion shifts across the lifespan of a timeline, allowing patterns to be established and transient mood states to be estimated. By concentrating on a selection of sentiment categories known to correlate with deception, social processes, and mood, we construct and then cluster individuals' propensity to self-represent in their online social media personas (Buckels et al., 2014; Tausczik and Pennebaker, 2010; Newman et al., 2003; Yarkoni, 2010).
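One way to picture how emotion frequencies shift across a timeline's lifespan is to sort posts by time, split them into equal windows, and compute the share of positive and negative emotion words per window. The sketch below assumes dictionary-based word counting in the spirit of LIWC (Tausczik and Pennebaker, 2010); the tiny `POSITIVE` and `NEGATIVE` word lists are hypothetical placeholders, since LIWC's actual dictionaries are far larger and licensed, and the windowing scheme is an illustrative choice rather than the authors' procedure.

```python
import re
from datetime import datetime

# HYPOTHETICAL placeholder lexicons; not LIWC's actual categories.
POSITIVE = {"happy", "love", "great", "fun", "excited", "awesome"}
NEGATIVE = {"sad", "angry", "hate", "tired", "awful", "lonely"}

TOKEN = re.compile(r"[a-z']+")


def emotion_rates(text: str) -> tuple[float, float]:
    """Share of tokens that are positive / negative emotion words."""
    tokens = TOKEN.findall(text.lower())
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos / len(tokens), neg / len(tokens)


def timeline_profile(posts: list[tuple[datetime, str]], n_windows: int = 4):
    """Split a timeline into chronological windows; return (pos, neg) per window."""
    ordered = sorted(posts, key=lambda p: p[0])
    size = max(1, -(-len(ordered) // n_windows))  # ceiling division
    windows = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [emotion_rates(" ".join(text for _, text in w)) for w in windows]


if __name__ == "__main__":
    timeline = [
        (datetime(2013, 1, 5), "So happy, great fun with friends"),
        (datetime(2013, 3, 9), "Love this awesome weather"),
        (datetime(2013, 6, 2), "Tired and a bit sad today"),
        (datetime(2013, 9, 1), "Excited for the weekend"),
    ]
    for i, (pos, neg) in enumerate(timeline_profile(timeline, n_windows=2)):
        print(f"window {i}: positive={pos:.2f} negative={neg:.2f}")
```

Per-window rates like these could then serve as features for clustering users; which categories and which clustering method the authors used is not specified here.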
With personality and mood validated and a sentiment analysis performed over the lifespan of each user's Facebook timeline, we can now measure a user's propensity to portray themselves in opposition to their truthful psychological baseline. The research contribution is the accurate prediction of psychometric information from short, informal text without the need to administer costly traditional personality surveys. We propose that policy-makers as well as community managers can apply this method to their analyses of publicly sourced data in order to mitigate the effects of various phenomena, including trolling, social desirability, and acquiescent behaviours. Such an approach has diverse applications in policy and community management, in that it offers a new, accurate measurement system for drawing inferences about the general population from publicly accessible text.
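As a hypothetical illustration of measuring how far an online persona departs from the survey baseline, one could standardize a survey-based trait score and a text-derived estimate of the same trait across participants and take their difference. This is not the authors' estimation function, which the abstract does not specify; `self_representation_gap`, `flag_divergent`, and the 1.5 standard-deviation threshold are assumptions made for the sketch.

```python
import numpy as np


def zscore(x) -> np.ndarray:
    """Standardize scores across participants (sample standard deviation)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def self_representation_gap(survey_scores, text_scores) -> np.ndarray:
    """HYPOTHETICAL measure: positive values mean the online text suggests
    more of the trait than the participant's own survey answers."""
    return zscore(text_scores) - zscore(survey_scores)


def flag_divergent(gap: np.ndarray, threshold: float = 1.5) -> np.ndarray:
    """Mark participants whose online persona diverges strongly (assumed cut-off)."""
    return np.abs(gap) > threshold


if __name__ == "__main__":
    survey_extraversion = [2.1, 3.4, 3.9, 2.8, 4.5]   # illustrative values only
    text_extraversion = [3.9, 3.2, 3.8, 2.6, 4.4]
    gap = self_representation_gap(survey_extraversion, text_extraversion)
    print(np.round(gap, 2), flag_divergent(gap))
```

Standardizing both sides first keeps the comparison independent of the differing scales of questionnaire scores and text-based estimates; a gap near zero means the participant ranks similarly online and offline.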
1. Qiu, L., Lin, H., Leung, A.K., & Tov, W. (2012). Putting their best foot forward: emotional disclosure on Facebook. Cyberpsychology, Behavior and Social Networking, 15(10), 569–72. 
2. Zhao, S., Grasmuck, S., & Martin, J. (2008). Identity construction on Facebook: Digital empowerment in anchored relationships. Computers in Human Behavior, 24(5), 1816–1836. 
3. Das, S., & Kramer, A. (2013). Self-censorship on Facebook. Proc. of ICWSM 2013, 120-127.
4. Linville, P. W. (1985). Self-complexity and affective extremity: Don't put all of your eggs in one cognitive basket. Social Cognition, 3(1), 94-120.
5. Suler, J. (2004). The online disinhibition effect. Cyberpsychology & Behavior, 7(3), 321-326.
6. Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: a critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879.
7. González-Bailón, S., Wang, N., Rivero, A., Borge-Holthoefer, J., & Moreno, Y. (2014). Assessing the bias in samples of large online networks. Social Networks, 38, 16-27.
8. Berinsky, A. J., Huber, G. A., & Lenz, G. S. (2012). Evaluating online labor markets for experimental research: Amazon.com's Mechanical Turk. Political Analysis, 20(3), 351-368.
9. Huppert, F. A., & So, T. T. C. (2011). Flourishing across Europe: Application of a new conceptual framework for defining well-being. Social Indicators Research, 110(3), 837-861.
10. John, O.P., Donahue, E.M., & Kentle, R.L. (1991). The big five inventory—versions 4a and 54. Berkeley, CA.
11. Ewig, C. (2011). Identität und Soziale Netzwerke – StudiVZ und Facebook. In C. Thimm & M. Anastasiadis (Eds.), Social Media: Theorie und Praxis digitaler Sozialität. Berlin: Peter Lang Internationaler Verlag der Wissenschaft.
12. Podsakoff, P. M., & Organ, D. W. (1986). Self-reports in organizational research: Problems and prospects. Journal of Management, 12(4), 531-544.
13. Buckels, E. E., Trapnell, P. D., & Paulhus, D. L. (2014). Trolls just want to have fun. Personality and Individual Differences.
14. Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24-54.
15. Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29(5), 665-675.
16. Yarkoni, T. (2010). Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44(3), 363-373.

