Available data sources - digitalmethodsinitiative/4cat GitHub Wiki
On this page we list the scripts for data sources. Some of these are fully functional, others are deprecated. Let us know if you have a new data source to add.
For information specific to a data source, see the README file in that data source's folder.
| Name | Source | Active | Objects | Local (Continuous scraper) | Notes |
|---|---|---|---|---|---|
| 4chan | 4chan API | Yes | Comments + OPs | Yes | We wrote several scripts in the helper-scripts folder to import data from 4chan archives, e.g. this script to import CSV dumps from 4plebs. |
| 8chan | 8chan API | No (Archives only) | Comments + OPs | Yes | 8chan is now defunct. We scraped live data when it was still online. Let us know in case you are interested in a database copy. |
| 8kun | 8chan API | Yes | Comments + OPs | Yes | Similar to the 4chan data source. |
| 9gag | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Bitchute | Scraping | No (issue) | Videos + comments | No | Uses BitChute's web search endpoint, and scrapes data from the live website. |
| Bluesky | Bluesky API | Yes | Posts | No | Uses the Bluesky API. |
| Douban | Scraping | Yes | Comments + OPs | No | Small datasets can be collected; due to rate-limiting, large searches may not complete properly. |
| Douyin | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Gab | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Imgur | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Import from tool | Files from other tools | Yes | - | No | Imports files exported by other tools, such as YouTube Data Tools. |
|  | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
|  | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Media upload | Upload media files | Yes | Images and videos | No | Import image and video files so they can be analyzed using 4CAT's processors. |
|  | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
|  |  | No (Archive only) |  |  |  |
| Rednote / Xiaohongshu | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Telegram | Telegram API | Yes | Messages in open groups | No | Requires a personal API key, which can be obtained by anyone with a Telegram account here. |
| Threads | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Truth Social | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| TikTok | Zeeschuimer | Yes | Posts | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Tumblr | Tumblr API | Yes | Posts + reblogs | No | Requires API keys, which you can obtain here. |
| X/Twitter | Twitter API & Zeeschuimer | Yes | Tweets | No | Must be actively scraped via your browser and the Zeeschuimer plugin. |
| Usenet | - |  | Comments + OPs | Yes | Requires a local, static Usenet database. |
| VK |  |  |  |  |  |
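
Several imageboard sources in the table (4chan, 8kun) list "Comments + OPs" as their objects. The minimal Python sketch below shows what that distinction means in the raw data. It assumes the JSON layout of 4chan's public read-only API, where a thread is an object with a `posts` array and the opening post (OP) is the entry whose `resto` field is 0. The function name `split_thread` and the sample thread are hypothetical illustrations, not 4CAT's own scraper code.

```python
import json
from typing import Any


def split_thread(thread_json: str) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Separate the opening post (OP) of a thread from its replies.

    Assumes the layout of 4chan's public read-only API, where a thread is a
    JSON object whose "posts" array holds every post in the thread.
    """
    posts = json.loads(thread_json)["posts"]
    # "resto" is 0 for the OP; for a reply it holds the thread (OP) number.
    op = next(post for post in posts if post.get("resto", 0) == 0)
    replies = [post for post in posts if post.get("resto", 0) != 0]
    return op, replies


# Hypothetical, hand-written sample rather than real scraped data.
sample = json.dumps({
    "posts": [
        {"no": 100, "resto": 0, "com": "Opening post"},
        {"no": 101, "resto": 100, "com": "First reply"},
        {"no": 102, "resto": 100, "com": "Second reply"},
    ]
})
op, replies = split_thread(sample)
print(op["no"], [reply["no"] for reply in replies])  # prints: 100 [101, 102]
```

Keeping OPs and replies apart in this way is what allows a dataset to be analysed per thread as well as per individual comment.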