T4SA - sporedata/researchdesigneR GitHub Wiki

General description

The Twitter for Sentiment Analysis (T4SA) corpus is a collection of tweets, containing both text and images, collected from July to December 2016. During this period, the researchers used Twitter's Sample API to access a random 1% sample of the stream of all tweets produced worldwide, discarding:

tweets that did not contain any static image, or that contained other media (tweets containing only videos and/or animated GIFs were also discarded);
tweets not written in English;
tweets whose text was fewer than 5 words long;
retweets.
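The four discard rules above can be sketched as a filter over tweet objects. This is a minimal sketch, not the T4SA authors' code. It assumes tweets arrive as dictionaries shaped like Twitter API v1.1 JSON (fields `lang`, `retweeted_status`, `extended_entities.media[].type`, and `text`/`full_text`). It also assumes word count is approximated by whitespace splitting, and the helper names `keep_tweet` and `filter_tweets` are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

MIN_WORDS = 5  # tweets whose text has fewer than 5 words are discarded


def keep_tweet(tweet: Dict[str, Any]) -> bool:
    """Return True if a tweet passes all four T4SA-style discard rules."""
    # Rule 4: retweets carry a 'retweeted_status' object in v1.1 JSON (assumption).
    if "retweeted_status" in tweet:
        return False
    # Rule 2: keep only tweets that Twitter tagged as English.
    if tweet.get("lang") != "en":
        return False
    # Rule 1: require at least one static image ('photo') and no other media
    # types such as 'video' or 'animated_gif' (assumed v1.1 media type names).
    media = tweet.get("extended_entities", {}).get("media", [])
    types = {m.get("type") for m in media}
    if "photo" not in types or not types <= {"photo"}:
        return False
    # Rule 3: approximate word count by whitespace splitting (assumption; this
    # also counts URLs and @mentions as words).
    text = tweet.get("full_text", tweet.get("text", ""))
    if len(text.split()) < MIN_WORDS:
        return False
    return True


def filter_tweets(tweets: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily yield only the tweets that pass keep_tweet."""
    return (t for t in tweets if keep_tweet(t))
```

For example, `list(filter_tweets(stream))` keeps only English, non-retweeted tweets of at least five words whose attachments are static images only. The original authors' exact tokenization and media checks are not stated on this page, so the thresholds and field names here are illustrative.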

Related publications / Literature

Sentiment analysis on Italian tweets - In this paper, the authors collected one year's worth of tweets, from February 2012 to February 2013.

Data access

You can download the T4SA dataset from Cross-Media Learning for Image Sentiment Analysis in the Wild.