Assessing the toxicity of Reddit comments

Credit: CC0 Public Domain

New research, published in PeerJ Computer Science, analyzes over 87 million posts and 2.205 billion comments on Reddit from more than 1.2 million unique users. It examines changes in the online behavior of users who publish in multiple communities on Reddit by measuring “toxicity.”

The analysis of user behavior showed that 16.11% of users publish toxic posts, and 13.28% of users publish toxic comments. Among users who publish posts, 30.68% exhibit changes in their toxicity across different communities (subreddits), as do 81.67% of users who publish comments, indicating that users adapt their behavior to the communities’ norms.

The study suggests that one way to limit the spread of toxicity is to limit the communities in which users can participate. The researchers found a positive correlation between the increase in the number of communities and the increase in toxicity but cannot guarantee that this is the only reason behind the increase in toxic content.

Various types of content can be shared and published on social media platforms, enabling users to communicate with each other in different ways. The growth of social media platforms has unfortunately led to an explosion of malicious content such as harassment, profanity, and cyberbullying. Various reasons may motivate users of social media platforms to spread harmful content. It has been shown that publishing toxic content (i.e., malicious behavior) spreads: the malicious behavior of some users can influence non-malicious users and make them misbehave, negatively impacting online communities.

“One challenge with studying online toxicity is the multitude of forms it takes, including hate speech, harassment, and cyberbullying. Toxic content often contains insults, threats, and offensive language, which, in turn, contaminate online platforms. Several online platforms have implemented prevention mechanisms, but these efforts are not scalable enough to curtail the rapid growth of toxic content on online platforms. These challenges call for developing effective automatic or semiautomatic solutions to detect toxicity from a large stream of content on online platforms,” say the authors, Ph.D. (ABD) Hind Almerekhi, Dr. Haewoon Kwak and Professor Bernard J. Jansen.

“Monitoring the change in users’ toxicity can be an early detection method for toxicity in online communities. The proposed methodology can identify when users exhibit a change by calculating the toxicity percentage in posts and comments. This change, combined with the toxicity level our system detects in users’ posts, can be used efficiently to stop toxicity dissemination.”
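As a rough illustration of the toxicity percentage the authors describe, the following Python sketch computes, for each user, the share of toxic items per community and flags users whose share differs across communities. It is not the authors' code: the function names, the sample data, and the 10-point `min_gap` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def toxicity_percentages(items):
    """Return {user: {community: percent of that user's items labeled toxic}}.

    `items` is an iterable of (user, community, is_toxic) tuples.
    The tuple layout is a hypothetical format chosen for this sketch.
    """
    # For each user and community, keep a [toxic_count, total_count] tally.
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for user, community, is_toxic in items:
        tally = counts[user][community]
        tally[0] += int(is_toxic)
        tally[1] += 1
    return {
        user: {
            community: 100.0 * toxic / total
            for community, (toxic, total) in per_community.items()
        }
        for user, per_community in counts.items()
    }


def users_with_change(percentages, min_gap=10.0):
    """List users active in two or more communities whose toxicity
    percentage differs by at least `min_gap` points between communities.

    The threshold is an assumption for illustration, not a value from the study.
    """
    flagged = []
    for user, per_community in percentages.items():
        values = list(per_community.values())
        if len(values) >= 2 and max(values) - min(values) >= min_gap:
            flagged.append(user)
    return sorted(flagged)


# Made-up sample data: alice is toxic in one community only, bob in none.
sample = [
    ("alice", "r/news", True), ("alice", "r/news", False),
    ("alice", "r/cats", False), ("alice", "r/cats", False),
    ("bob", "r/news", False), ("bob", "r/cats", False),
]
pct = toxicity_percentages(sample)
print(pct["alice"])
print(users_with_change(pct))
```

In this toy data only alice is flagged, because half of her items in one community are toxic and none are in the other. In the study, the toxicity labels come from the trained model rather than from hand-written values.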

The research team, with the aid of crowdsourcing, built a labeled dataset of 10,083 Reddit comments, then used the dataset to train and fine-tune a Bidirectional Encoder Representations from Transformers (BERT) neural network model. The model predicted the toxicity levels of 87,376,912 posts from 577,835 users and 2,205,581,786 comments from 890,913 users on Reddit over 16 years, from 2005 to 2020.

The study used the toxicity levels of user content to identify toxicity changes by a user within the same community, across multiple communities, and over time. In terms of toxicity detection performance, the fine-tuned BERT model achieved a classification accuracy of 91.27% and an Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.963, outperforming several baseline machine learning and neural network models.
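For readers unfamiliar with the two reported metrics, the Python sketch below shows one common way to compute them from a classifier's scores. It is an illustration only, not the authors' evaluation code: the labels, scores, and 0.5 decision threshold are made up, and AUC is computed with the pairwise-ranking definition (the probability that a randomly chosen toxic item scores higher than a randomly chosen non-toxic one, with ties counted as half).

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of items whose thresholded score matches the true label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts how often a positive (toxic) item outranks a negative one;
    ties contribute 0.5. Quadratic in the number of items, so it is
    suitable only for small illustrative inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = toxic) and model scores for six items.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]
print(round(accuracy(labels, scores), 3))
print(round(roc_auc(labels, scores), 3))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two figures are reported separately.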

More information: Hind Almerekhi et al, Investigating toxicity changes of cross-community redditors from 2 billion posts and comments, PeerJ Computer Science (2022). DOI: 10.7717/peerj-cs.1059

Citation: Assessing the toxicity of Reddit comments (2022, August 18) retrieved 20 August 2022 from

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.

