A Telegram corpus for hate speech, offensive language, and online harm

Abstract

We present a new text corpus from the social media platform Telegram that is rich in indirect forms of divisive speech. We scraped all messages from one channel of supporters of Donald Trump, covering a large part of his presidency, from late 2016 until January 2021. The discussion among group members over this long period includes the spread of disinformation, the disparagement of out-group members, and other forms of offensive speech. To encourage research into such practices of poisoning public political discourse, we added automatic annotations of offensive language to all messages. We further added manual annotations of harmful language to a portion of the posts to enable the analysis of more implicit forms of online harm.

Author's Profile

Mihaela Popa-Wyatt
University of Manchester

Analytics

Added to PP
2021-04-08
