T4SA - sporedata/researchdesigneR GitHub Wiki
General description
The Twitter for Sentiment Analysis (T4SA) corpus is a collection of tweets containing both text and images, collected from July to December 2016. During this period, the researchers used Twitter’s Sample API to access a random 1% sample of the stream of all globally produced tweets, discarding:
- tweets not containing any static image, or containing other media (i.e., tweets containing only videos and/or animated GIFs were also discarded)
- tweets not written in the English language
- tweets whose text was less than 5 words long
- retweets.
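The filtering criteria above can be sketched as a single predicate. This is a minimal illustration, not the researchers' actual code: each tweet is modeled as a plain dictionary, and the field names (`text`, `lang`, `media_types`, `is_retweet`) are hypothetical rather than taken from the Twitter API. Counting words by splitting on whitespace and treating "static image" as a `"photo"` media type are likewise assumptions.

```python
MIN_WORDS = 5  # tweets with fewer than 5 words are discarded


def keep_tweet(tweet):
    """Return True if a tweet passes all four filters listed above.

    `tweet` is a dict with hypothetical keys: "text", "lang",
    "media_types" (list of strings) and "is_retweet" (bool).
    """
    media = tweet.get("media_types", [])
    # At least one static image, and no other media (video, animated GIF).
    only_static_images = bool(media) and all(m == "photo" for m in media)
    is_english = tweet.get("lang") == "en"
    # Assumption: words are whitespace-separated tokens.
    long_enough = len(tweet.get("text", "").split()) >= MIN_WORDS
    not_retweet = not tweet.get("is_retweet", False)
    return only_static_images and is_english and long_enough and not_retweet


# Made-up example records, one per filtering rule.
sample = [
    {"text": "Sunset over the bay tonight, simply stunning",
     "lang": "en", "media_types": ["photo"], "is_retweet": False},
    {"text": "Watch this clip from the game",
     "lang": "en", "media_types": ["video"], "is_retweet": False},
    {"text": "Che bella giornata oggi a Roma",
     "lang": "it", "media_types": ["photo"], "is_retweet": False},
    {"text": "So cute",
     "lang": "en", "media_types": ["photo"], "is_retweet": False},
    {"text": "Sunset over the bay tonight, simply stunning",
     "lang": "en", "media_types": ["photo"], "is_retweet": True},
]

kept = [t for t in sample if keep_tweet(t)]
print(len(kept))  # only the first record passes every filter
```

Keeping every rule as a named boolean makes it easy to log which criterion rejected a given tweet if the thresholds need to be audited later.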
Related publications / Literature
- Sentiment analysis on Italian tweets - In this paper, the authors collected one year's worth of tweets, from February 2012 to February 2013.
Data access
You can download the T4SA dataset at Cross-Media Learning for Image Sentiment Analysis in the Wild.