Available data sources - digitalmethodsinitiative/4cat GitHub Wiki

This page lists the data source scripts available for 4CAT. Some are fully functional; others are deprecated. Let us know if you have a new data source to add.

For information specific to a data source, check the README file in that data source's folder.

| Name | Source | Active | Objects | Local (continuous scraper) | Notes |
|---|---|---|---|---|---|
| 4chan | 4chan API | Yes | Comments + OPs | Yes | We wrote several scripts in the `helper-scripts` folder to import data from 4chan archives, e.g. this script to import CSV dumps from 4plebs. |
| 8chan | 8chan API | No (archives only) | Comments + OPs | Yes | 8chan is now defunct. We scraped live data while it was still online. Let us know if you are interested in a copy of the database. |
| 8kun | 8chan API | Yes | Comments + OPs | Yes | Similar to the 4chan data source. |
| 9gag | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Bitchute | Scraping | No (see issue) | Videos + comments | No | Uses BitChute's web search endpoint and scrapes data from the live website. |
| Bluesky | Bluesky API | Yes | Posts | No | Uses the Bluesky API. |
| Douban | Scraping | Yes | Comments + OPs | No | Small datasets can be collected; due to rate limiting, large searches may not complete. |
| Douyin | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Gab | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Imgur | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Import from tool | Files from other tools | Yes | - | No | Imports files from other tools, such as YouTube Data Tools. |
| Instagram | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| LinkedIn | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Media upload | Upload media files | Yes | Images and videos | No | Imports image and video files so they can be analyzed with 4CAT's processors. |
| Parler | Parler API | No | Posts | No | Uses Parler's unofficial web API; requires a valid Parler login. |
| Pinterest | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Reddit | Pushshift API | No (archive only) | Comments + OPs | No | Data retrieved via Pushshift. No longer available after Reddit's API price increase in July 2023. |
| Rednote / Xiaohongshu | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Telegram | Telegram API | Yes | Messages in open groups | No | Requires a personal API key, which anyone with a Telegram account can obtain here. |
| Threads | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| TikTok | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Truth Social | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Tumblr | Tumblr API | Yes | Posts + reblogs | No | Requires API keys, which you can obtain here. |
| X/Twitter | Twitter API & Zeeschuimer | Yes | Tweets | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Usenet | - | | Comments + OPs | Yes | Requires a local, static Usenet database. |
| VK | | | | | |