Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter

Download Edit this record How to cite View on PhilPapers
In real-time, Twitter strongly imprints world events, popular culture, and the day-to-day; Twitter records an ever growing compendium of language use and change; and Twitter has been shown to enable certain kinds of prediction. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing, day-scale curation of over 100 billion tweets containing around 1 trillion 1-grams from 2008 to 2020. For each day, we break tweets into 1-, 2-, and 3-grams across 150+ languages, record usage frequencies, and generate Zipf distributions. We make the data set available through an interactive time series viewer, and as downloadable time series and daily distributions. We showcase a few examples of the many possible avenues of study we aim to enable including how social amplification can be visualized through ‘contagiograms’.
No keywords specified (fix it)
(categorize this paper)
PhilPapers/Archive ID
Upload history
Archival date: 2020-08-03
View other versions
Added to PP index

Total views
41 ( #58,407 of 64,209 )

Recent downloads (6 months)
11 ( #45,974 of 64,209 )

How can I increase my downloads?

Downloads since first upload
This graph includes both downloads from PhilArchive and clicks on external links on PhilPapers.