An analysis of 1.2 million Reddit users found that around 16 per cent of people wrote toxic posts and 13 per cent wrote toxic comments, such as direct insults
Technology
18 August 2022
More than one in eight Reddit users publish toxic content, according to an analysis of more than 2 billion posts and comments on the social news aggregation platform.
Hind Almerekhi at Hamad Bin Khalifa University in Qatar and her colleagues gathered a data set of Reddit posts and their comments from between 2005 and 2020. They looked at any Reddit user who had posted in one of the 100 most popular subreddits – akin to forums on the site – as well as in another subreddit. That filtering resulted in a total of 2.2 billion posts and comments from 1.2 million users across more than 100,000 subreddits.
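The filtering step can be pictured with a minimal sketch. It is not the researchers' code: the function name `eligible_users`, the `(user, subreddit)` record format and the reading of the rule as "at least one top subreddit plus at least one other subreddit of any kind" are all assumptions made for illustration.

```python
from collections import defaultdict


def eligible_users(records, top_subreddits):
    """Return users who posted in a top subreddit and in at least one other.

    `records` is an iterable of (user, subreddit) pairs and `top_subreddits`
    is a set of subreddit names. Both are hypothetical inputs; the study's
    actual data format is not described in the article.
    """
    subs_by_user = defaultdict(set)
    for user, subreddit in records:
        subs_by_user[user].add(subreddit)

    return {
        user
        for user, subs in subs_by_user.items()
        # Assumed rule: some overlap with the top list, and two or more
        # distinct subreddits overall.
        if subs & top_subreddits and len(subs) >= 2
    }


records = [
    ("alice", "AskReddit"), ("alice", "cats"),
    ("bob", "AskReddit"),
    ("carol", "cats"), ("carol", "dogs"),
]
print(eligible_users(records, {"AskReddit"}))  # {'alice'}
```

Here only "alice" passes: "bob" posted in a single subreddit and "carol" never posted in one of the top subreddits.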
To judge the toxicity of the comments, the researchers hired people through a crowdsourcing platform to manually label the toxicity level of a sample of 10,000 posts and comments. The team gave them very clear criteria on “what we consider highly toxic, slightly toxic and not toxic”, says Almerekhi. Each comment was assessed by at least three workers.
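The article does not say how the three or more judgements per item were merged into a single label. One common approach is a strict majority vote, sketched below. The function name `aggregate_label`, the label strings and the decision to return `None` when no label wins a majority are assumptions, not details from the study.

```python
from collections import Counter

# The three categories named by the researchers.
LABELS = ("not toxic", "slightly toxic", "highly toxic")


def aggregate_label(votes):
    """Combine crowdworker votes for one item by strict majority.

    Returns the winning label, or None if no label has more than half of
    the votes (an assumed convention for flagging items for review).
    """
    if len(votes) < 3:
        raise ValueError("each item needs at least three votes")
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


print(aggregate_label(["highly toxic", "highly toxic", "not toxic"]))  # highly toxic
print(aggregate_label(["highly toxic", "slightly toxic", "not toxic"]))  # None
```

Returning `None` for a three-way split keeps ambiguous items out of the training data rather than guessing a label for them.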
The resulting labelled data set was used to train a neural network to categorise the toxicity of the remaining posts and comments.
The algorithm found that 2 per cent of posts and 6 per cent of comments were highly toxic. A further 7 per cent of posts and 11.5 per cent of comments were slightly toxic, with the remainder of posts and comments classed as not toxic. Highly toxic posts included direct insults and swear words; slightly toxic posts included milder insults (such as “hideous”); not toxic posts contained neither.
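The "remainder" follows from simple arithmetic on the reported shares. The short sketch below uses only the percentages quoted above; the variable and function names are illustrative.

```python
# Shares reported by the classifier, in per cent.
posts = {"highly toxic": 2.0, "slightly toxic": 7.0}
comments = {"highly toxic": 6.0, "slightly toxic": 11.5}


def summarise(shares):
    """Return the combined toxic share and the not-toxic remainder."""
    toxic = sum(shares.values())
    return {"toxic": toxic, "not toxic": 100.0 - toxic}


print(summarise(posts))     # {'toxic': 9.0, 'not toxic': 91.0}
print(summarise(comments))  # {'toxic': 17.5, 'not toxic': 82.5}
```

On these figures, about 91 per cent of posts and 82.5 per cent of comments were classed as not toxic.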
Overall, around 16 per cent of people in the data set were responsible for toxic posts and 13 per cent for toxic comments. However, that behaviour could and did change depending on the community. Four in five people showed changes in the average amount of toxicity in their posts, depending on the subreddit they posted in.
Savvas Zannettou at the Delft University of Technology in the Netherlands says that the analysis focuses only on the mainstream side of Reddit. This means it is likely to understate the impact of users who also visit fringe web communities, which could be more toxic, he says.
A Reddit spokesperson told New Scientist: “The study in question confirms our own research and insights: that the vast majority of content on Reddit is healthy, and users tend to positively adjust their behaviour in accordance with community norms.” They added that some of the data is approaching 20 years old, so it does not reflect how Reddit’s policies on speech have changed.
Journal reference: PeerJ Computer Science, DOI: 10.7717/peerj-cs.1059