Twitch Platform Exploration - weiyinc11/HateSpeechModerationTwitch GitHub Wiki

Method of Controlled Environment Creation: a private Twitch stream listed under a game that is either fictitious or has no viewers; access by invite link only.


Suspicious User Controls (Ban User Detection), described below: "Over time, the tool will learn from the actions taken by the creators and mods and the accuracy of its predictions will continue to improve." AutoMod, described: "When enabled, AutoMod pre-screens chat messages and holds messages that contain content detected as risky, preventing them from being visible on the channel unless they are approved by a mod. Blocked Terms allow creators to further tailor AutoMod by adding custom terms or phrases that will be always blocked in their channel. These features are best utilized through Mod View, a customizable channel interface that provides mods with a toolbelt of ‘widgets’ for moderation tasks like reviewing messages held by AutoMod, keeping tabs on actions taken by other mods on their team, changing moderation settings, and more."

More in-depth into the Machine Learning tools that are integrated in the platform:

  • Scans the platform's chats for offensive usernames, spam and scams, offensive emotes, and bot accounts. It also scans livestreams for "content that is particularly objectionable and potentially harmful to our communities, including extreme violence, gore, and pornography." Flagged content is then sent for human review.
  • The models also detect a set of globally banned keywords.
  • "We make sure to collect training data from a broad sampling of expert labeled data (either internal.. labeled by our policy and Trust & Safety teams or by external vendors using criteria provided by our policy team)".
  • Twitch claims to carry out dedicated bias monitoring and mitigation work for biased datasets.
  • The models are also reviewed weekly by an internal team, and beyond detecting harmful content they also take action: "For credible user reports of high-harm content, Twitch’s automated systems also temporarily downgrade the visibility of potentially problematic content until a full moderator review can be completed, including hiding the reported channel from discoverability, meaning the channel cannot be found through the home page, search bar, or other discovery tools, until the moderator review is completed."

Enforcement combines "machine detection and user reporting... sent to our Safety Operations team for human review." Twitch acknowledges the "ephemeral" nature of the content and claims, "Nevertheless, we have found ways to make machine detection viable and useful on Twitch, and we will continue to invest in these technologies to improve them."

Image taken from: https://safety.twitch.tv/s/article/Safety-at-Twitch?language=en_US#4Service-LevelSafety This image also seems to draw a distinction between AutoMod and "machine detection", suggesting an internal machine learning or NLP tool that Twitch uses at the service level.

AutoMod is available in 19 languages and "holds potentially inappropriate messages, preventing them from being visible in chat unless approved by a mod".

Shield Mode is another feature provided to Twitch streams, which can be activated when a streamer is receiving many harmful messages in their stream's chat. It bundles functions described below such as keyword filters, chat modes (limiting access to chat based on user attributes), chat verification options, and AutoMod itself. It appears to be an all-inclusive tool that is customisable and automated to the same extent as the AutoMod functions.

Viewer-level safety options

All of this happens after the Twitch-wide and channel-wide moderation efforts.

  • Streamers add tags to their streams for specific themes (e.g. tobacco, gambling, adult content)
  • Viewers will get an interstitial when visiting the stream warning them of the content label/theme, they can then opt to continue watching
  • Viewers can also toggle, from their account settings, to hide videos labeled with each theme entirely, effectively removing them from the homepage/recommendations. None are hidden by default. (Screenshot 2024-12-06 122557)
  • Content Labels include:
    • Sexual Themes;
      • Sexual Themes has an additional setting in account settings where if you do not hide it, you can blur the previews for it (on by default)
    • Drugs, Intoxication, or Excessive Tobacco Use;
    • Gambling; Violent and Graphic Depictions;
    • Significant Profanity or Vulgarity;
    • Mature-Rated Games;
    • Politics and Sensitive Social Issues

Screenshot 2024-12-06 123143

  • If you are not signed in, all labeled content is hidden by default. You can enable all of it at once, but not each theme granularly; you have to sign in to hide/unhide each theme individually.
  • "If you cannot change your Content Display Preferences settings, it might be because they’re being controlled by your account or device based on your location. For example, some regions prohibit certain content and in those cases Twitch must comply with any local requirements."

2. Chat filters

Screenshot 2024-12-06 123430

  • Filter settings follow each user from channel to channel
  • Set globally from any chat; persists
  • Settings for each of: Discrimination, Sexually Explicit Language, Hostility, and Profanity.
  • Can be configured even if not signed in, but then it won't persist (it does persist across a browser session, though).

3. (Loosely related) Blocking

  • Blocking whispers (DMs) from strangers is on by default. "Stranger" is defined by: (Screenshot 2024-12-06 123935)
  • Users can also specifically block (and unblock) other users: Screenshot 2024-12-06 125332
    • Only works with signed in users

4. (Loosely related; Not mentioned in the graphic) User interactions with Twitch recommendations

  • Users can sort recommended channels by attributes
  • Users can flag channels they aren’t interested in, which hides them from future recommendations
  • Users can flag content categories "Not interested" (a specific game, or something like the "Just Chatting" category)
  • Users can supposedly flag specific videos "Not interested"
    • Didn't get this to work: whereas I can flag "Not interested" for channels and categories, for videos there is only an option to "Report video".
    • Even after I tried reporting a video and blocking the creator, it did not show up in my settings as previously given video feedback. (Screenshot 2024-12-06 122400)
  • All channels, videos, and categories that users have previously flagged with “Not interested” appear in account settings to be revoked later.
  • Only works with signed in users

Enforcement Methods


Twitch describes the role of a moderator as mainly taken on by human streamers and a team of human reviewers. It carefully outlines and documents the expectations of a human moderator's role, the importance of communicating with the streamer, and approaches to moderating different types of harmful content: "Dealing with Trolls: Trolls are viewers who intentionally try to disrupt the chat and provoke other viewers. Communicate with the streamer on how they’d like you to handle trolls in chat.

Handling Malicious Spam: Spam can be a major problem in the chat. Moderators should use chat filters and other tools to remove spam as quickly as possible.

Enforcing Rules: Enforcing the streamer’s rules and guidelines can sometimes be challenging. Moderators should be consistent in their enforcement and not show favoritism or bias towards any particular viewer.

Handling Difficult Viewers: Some viewers may be difficult to deal with, either because they are violating the rules or being disruptive in the chat. In these situations, moderators should remain calm and professional and defuse the situation as quickly as possible to maintain the chat quality.

Dealing with Hate Raids: Unfortunately, some bad actors will use the Twitch Raid feature to send hate-filled messages via hordes of bot accounts maliciously. To immediately help combat these hate raids activate Shield Mode and work with the streamer on the next steps. The streamer can also adjust their raid settings in their channel’s settings."

The platform also provides several chat modes:

  1. Follower-Only Mode - enables chat only for followers who have followed for a configurable period, from 0 minutes to 3 months
  2. Unique Chat Mode - prevents users from sending duplicate messages
  3. Slow Mode - restricts users from sending multiple messages in a row
  4. Sub-Only Mode - enables chat only for subscribers, VIPs, and mods
  5. Emote-Only Mode - restricts chat to emotes only
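The chat modes above can also be toggled programmatically. A minimal sketch, assuming the Helix "Update Chat Settings" endpoint (PATCH /helix/chat/settings); the field names follow the public Helix docs, but the exact payload shape should be verified against the current API reference before use:

```python
# Build the JSON body for a PATCH to /helix/chat/settings.
# Field names (follower_mode, slow_mode_wait_time, ...) mirror the Helix
# docs; this only shapes the payload and does not perform the HTTP call.
def build_chat_settings(follower_mode=False, follower_minutes=0,
                        slow_mode=False, slow_seconds=30,
                        subscriber_mode=False, emote_mode=False,
                        unique_chat_mode=False):
    body = {
        "follower_mode": follower_mode,
        "slow_mode": slow_mode,
        "subscriber_mode": subscriber_mode,
        "emote_mode": emote_mode,
        "unique_chat_mode": unique_chat_mode,
    }
    if follower_mode:
        # Minutes a user must have followed before chatting (0 = any follower).
        body["follower_mode_duration"] = follower_minutes
    if slow_mode:
        # Seconds each user must wait between messages.
        body["slow_mode_wait_time"] = slow_seconds
    return body
```

Sending the payload would additionally require `broadcaster_id` and `moderator_id` query parameters plus an OAuth token with the relevant chat-management scope.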

1. Default Settings on the Platform (Moderation Tab on Creator Dashboard/Settings/Moderation)


AutoMod Bot - a built-in machine learning and NLP detection tool

Platform AutoMod Docs:

  • "AutoMod uses machine learning and natural language processing algorithms to hold risky messages from chat so they can be reviewed by a channel moderator before appearing to other viewers in the chat."
  • Detection tools built-in to stream chats that "blocks inappropriate or harassing chat... detects misspellings and evasive language automatically"
  • There are different levels of AutoMod that a creator can activate for their stream, corresponding to the level of moderation across 4 main categories: "discrimination, sexual content, hostility, and profanity". !! The data used to train or inform this detection is referred to as the "dictionary" that each setting contains, which "is managed by Twitch".
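The per-category levels could be represented as below. This is an illustration only: the category names and the 0-4 range mirror the dashboard UI, while the underlying Helix "Update AutoMod Settings" endpoint uses finer-grained fields, so treat this as a sketch rather than the API schema:

```python
# Four AutoMod categories as exposed in the creator dashboard (assumption:
# each adjustable on a 0-4 scale, 0 = off, 4 = maximum filtering).
CATEGORIES = ("discrimination", "sexual_content", "hostility", "profanity")

def automod_config(**levels):
    """Return a validated {category: level} mapping, defaulting to 0 (off)."""
    config = {c: 0 for c in CATEGORIES}
    for category, level in levels.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown AutoMod category: {category}")
        if not 0 <= level <= 4:
            raise ValueError("AutoMod levels range from 0 (off) to 4 (max)")
        config[category] = level
    return config
```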

Screenshot 2024-06-27 at 1 23 11 AM
Depicting the settings of Twitch's AutoMod when turning it on. !! Interestingly, Twitch is developing a machine learning tool that is perhaps trained on moderator actions, so that a model can carry out what is likely to be moderated... [This might be adjacent to our exploration but still interesting]

While Discord relies on user reports as its detection scheme and then relays the action to human moderators (plus automated blurring of explicit images), Twitch seems to use machine learning models to detect these policy violations for the streamer's review.

Screenshot 2024-06-27 at 1 18 13 AM

Blocked Terms and Phrases is a keyword-matching tool similar to one Discord also provides. Twitch also offers a Block Hyperlinks feature, but only as a complete ban on any URLs in chat, exempting moderators, editors, and VIPs in a creator's channel.
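A toy sketch of the kind of misspelling/evasion handling the AutoMod docs describe ("detects misspellings and evasive language automatically"). This is an assumption-laden illustration, not Twitch's actual algorithm: it lowercases, maps common character substitutions, strips separators, and collapses repeated letters before substring-matching against a blocked list:

```python
# Map common leetspeak substitutions back to letters (0->o, 4->a, ...).
SUBSTITUTIONS = str.maketrans("013457@$!", "oieastasi")

def normalize(text):
    text = text.lower().translate(SUBSTITUTIONS)
    # Drop spaces and punctuation so "b a d" matches "bad".
    text = "".join(ch for ch in text if ch.isalpha())
    # Collapse runs of the same letter ("baaad" -> "bad"); note this also
    # mangles legitimate double letters, a known cost of this crude approach.
    collapsed = []
    for ch in text:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)

def is_blocked(message, blocked_terms):
    """Return True if any blocked term survives normalization as a substring."""
    msg = normalize(message)
    return any(normalize(term) in msg for term in blocked_terms)
```

A real system would pair this with the learned models described above; pure normalization rules are easy to evade and prone to false positives (the Scunthorpe problem).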

  • Suspicious User Detection: "Uses machine learning to detect potential ban evaders using a signal level you select. When enabled, users will be flagged and surfaced to moderators for review"
  • Flags possible or likely channel-ban evaders who enter the streamer's chat; flagged users are placed in either restricted mode ("likely" evaders) or monitoring mode.
  • Automatically turned on for all streamers on the platform, whereas the AutoMod tool is not.

Custom Chat Rules can be implemented by the streamer on the same Creator Dashboard. However, there is no documentation detailing how these chat rules are defined and enforced.

Shared Moderator Comments: Twitch also seems to provide a shared moderation view through which a creator's team can share ban information. Mod View has a section for mod actions and an AutoMod queue of flagged content from the stream's chat.


2. Extensions: allow external apps to be integrated into a creator's stream. Under the Twitch developer platform, the Twitch API appears to let users create their own extensions to add to their own streams.

Documentation for Building a custom Twitch Extension: https://dev.twitch.tv/docs/extensions/ Twitch also provides documentation for building your own Twitch Chatbot:

  • Uses an Internet Relay Chat (IRC) interface over a WebSocket or TCP connection to interact with Twitch chat.
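The IRC interface above can be sketched with a few framing helpers. These assume the documented endpoint `wss://irc-ws.chat.twitch.tv:443`; they only build the protocol lines, and a real bot would send them over a WebSocket and read the replies:

```python
def login_lines(oauth_token, nick):
    """Lines sent once after connecting, in order (token needs chat scopes)."""
    return [
        f"PASS oauth:{oauth_token}",
        f"NICK {nick.lower()}",  # Twitch requires the lowercase login name
    ]

def join(channel):
    """Join a channel's chat room."""
    return f"JOIN #{channel.lower()}"

def privmsg(channel, text):
    """Frame a chat message for a channel."""
    return f"PRIVMSG #{channel.lower()} :{text}"

def pong_for(line):
    """Twitch periodically sends PING; reply or the server drops the bot."""
    if line.startswith("PING"):
        return "PONG" + line[len("PING"):]
    return None
```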

Rate limits are important to note when inputting our dataset; hence, we may need to get our bot verified: https://dev.twitch.tv/limit-increase/ - complete the form and wait for Twitch approval. The form requires a developer name, email, bot name and description, reason for verification, bot commands, whether the bot has a panel with contact info, a bot description, instructions on how to add and remove the bot, and whether the bot uses 2FA.
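To stay under the per-window chat limits while replaying our dataset, a sliding-window limiter like the sketch below could pace sends. The specific numbers (commonly cited as 20 messages per 30 seconds for a normal account, with higher ceilings for verified bots) are assumptions to confirm on Twitch's rate-limit docs:

```python
import collections
import time

class SlidingWindowLimiter:
    """Allow at most max_events sends in any trailing window_seconds span."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self.max_events = max_events
        self.window = window_seconds
        self.clock = clock  # injectable for testing
        self.sent = collections.deque()

    def try_acquire(self):
        """Return True (and record the event) if sending now is allowed."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # drop timestamps outside the window
        if len(self.sent) < self.max_events:
            self.sent.append(now)
            return True
        return False
```

Usage: wrap every `PRIVMSG` send in `if limiter.try_acquire(): ...` and sleep briefly otherwise; exceeding Twitch's limits can get the bot temporarily locked out of chat.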


Screenshot 2024-06-28 at 11 21 04 AM

Under the Twitch API, the following functions for moderation may be of use: Screenshot 2024-06-28 at 11 26 00 AM

If needed, we could build an extension that includes a verified bot moderating at potentially customisable levels and uses the API to add channel moderators (limit of 10 within a 10-second window). It could also handle data insertion and switch between policies so that our experimental design stays compact.
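Shaping the add-moderator call might look like the sketch below. The path follows the Helix "Add Channel Moderator" endpoint (POST /helix/moderation/moderators) and the standard Helix headers (Client-Id plus a user OAuth bearer token); the required scope should be verified against the current API docs:

```python
HELIX = "https://api.twitch.tv/helix"

def build_add_moderator_request(broadcaster_id, user_id, client_id, oauth_token):
    """Return (url, params, headers) for use with any HTTP client.

    broadcaster_id: the channel owner's numeric user ID
    user_id: the account to promote to moderator
    """
    url = f"{HELIX}/moderation/moderators"
    params = {"broadcaster_id": broadcaster_id, "user_id": user_id}
    headers = {
        "Client-Id": client_id,
        "Authorization": f"Bearer {oauth_token}",
    }
    return url, params, headers
```

The 10-per-10-second cap noted above means bulk promotion for an experiment would need the same kind of pacing as chat messages.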
