AI Moderation System
The AI Moderation System is the core intelligence of AIMod, providing sophisticated content analysis and automated moderation decisions. This system leverages state-of-the-art language models through LiteLLM to analyze messages, images, and user behavior.
🧠 Core AI Engine
LiteLLM Integration
AIMod uses LiteLLM as its AI abstraction layer, supporting multiple providers:
Supported Providers:
- OpenRouter - Primary provider with access to multiple models
- GitHub Copilot - Enterprise-grade AI with code understanding
- OpenAI - Direct API integration
- Anthropic Claude - Advanced reasoning capabilities
- Google Gemini - Multimodal analysis
Configuration:
```python
import os

def get_litellm_client():
    return LiteLLM(
        api_base="https://openrouter.ai/api/v1",
        api_key=os.getenv("OPENROUTER_API_KEY"),
        model="github_copilot/gpt-4.1",  # Default model
    )
```
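Because LiteLLM dispatches on the model string, switching providers is mostly a configuration change. A minimal sketch of calling the completion API directly, with illustrative model names rather than AIMod defaults:
```python
import litellm

async def moderate_with(model: str, system_prompt: str) -> str:
    """Send a moderation prompt to whichever provider the model prefix selects."""
    # The prefix picks the provider, e.g. "openrouter/...", "anthropic/...",
    # "gemini/..."; API keys are read from the matching environment variables.
    response = await litellm.acompletion(
        model=model,
        messages=[{"role": "system", "content": system_prompt}],
    )
    return response.choices[0].message.content
```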
System Prompt Architecture
The AI system uses a sophisticated prompt template that includes:
Context Components:
- Server rules and guidelines
- Channel-specific rules
- User role and permissions
- Recent message history
- Channel category and settings
Prompt Template Structure:
```
You are an AI moderation assistant for a Discord server with a very edgy and dark sense of humor.
Your primary function is to analyze message content and attached media based STRICTLY on the
server rules provided below, using all available context. Your default stance should be to
IGNORE messages unless they are a CLEAR and SEVERE violation.

Server Rules:
---
{rules_text}
---

Context Information:
- User's Server Role: {user_role}
- Channel Category: {channel_category}
- Channel Age-Restricted/NSFW: {nsfw_status}
- Recent Channel History: {recent_messages}

Message to Analyze:
{message_content}

Respond with a JSON object containing your decision...
```
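The prompt asks the model to answer with a JSON object; the fields the pipeline reads in step 7 below are action, rule_violated, reasoning, and confidence. An illustrative payload (values invented for the example):
```python
# Illustrative decision payload; field names follow the keys the bot reads
# (action, rule_violated, reasoning, confidence) -- the values are made up.
example_decision = {
    "action": "WARN",              # IGNORE | WARN | TIMEOUT | BAN | GLOBAL_BAN
    "rule_violated": "Rule 3: No harassment",
    "reasoning": "Message targets another member with repeated insults.",
    "confidence": 85,              # 0-100, see Confidence Scoring below
}
```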
🔍 Message Processing Pipeline
1. Message Reception
```python
@commands.Cog.listener(name="on_message")
async def message_listener(self, message: discord.Message):
    # Initial filtering
    if message.author.bot:
        return
    if not message.content and not message.attachments:
        return
    if not message.guild:
        return
```
2. Configuration Checks
```python
# Check if moderation is enabled
if not await get_guild_config_async(message.guild.id, "ENABLED", True):
    return

# Check channel exclusions
if await is_channel_excluded(message.guild.id, message.channel.id):
    return
```
3. Global Ban Enforcement
```python
# Auto-ban globally banned users
if message.author.id in GLOBAL_BANS:
    ban_reason = "Globally banned for severe universal violation."
    await message.guild.ban(message.author, reason=ban_reason)
    return
```
4. Content Preprocessing
```python
# Truncate long messages
content = truncate_text(message.content, max_length=2000)

# Process attachments
attachment_descriptions = []
for attachment in message.attachments:
    if attachment.content_type and attachment.content_type.startswith('image/'):
        description = await self.media_processor.process_image(attachment.url)
        attachment_descriptions.append(description)
```
5. Context Building
```python
# Get channel-specific or server rules
rules_text = await get_channel_rules(message.guild.id, message.channel.id)

# Build user context
user_role = "Administrator" if message.author.guild_permissions.administrator else "Member"
channel_category = message.channel.category.name if message.channel.category else "Uncategorized"
nsfw_status = getattr(message.channel, 'nsfw', False)

# Get recent message history
recent_messages = await get_recent_channel_history(message.channel, limit=5)
```
6. AI Analysis
```python
# Construct the full prompt
system_prompt = SYSTEM_PROMPT_TEMPLATE.format(
    rules_text=rules_text,
    user_role=user_role,
    channel_category=channel_category,
    nsfw_status=nsfw_status,
    recent_messages=recent_messages,
    message_content=content,
    attachment_descriptions=attachment_descriptions,
)

# Call AI service
response = await self.genai_client.acompletion(
    model=ai_model,
    messages=[{"role": "system", "content": system_prompt}],
    temperature=0.1,
    max_tokens=500,
)
```
7. Decision Processing
```python
# Parse AI response
ai_decision = json.loads(response.choices[0].message.content)

# Extract decision components
action = ai_decision.get("action", "IGNORE")
rule_violated = ai_decision.get("rule_violated", "")
reasoning = ai_decision.get("reasoning", "")
confidence = ai_decision.get("confidence", 0)
```
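Models occasionally wrap the JSON in prose or code fences; a defensive parse, shown here as a sketch rather than the bot's actual error handling, keeps one malformed reply from breaking the listener:
```python
import json

def parse_ai_decision(raw: str) -> dict:
    """Best-effort extraction of the JSON decision object from the model reply."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the first {...} block if the model added surrounding text.
        start, end = raw.find("{"), raw.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(raw[start : end + 1])
            except json.JSONDecodeError:
                pass
    # Default to a safe no-op decision when parsing fails entirely.
    return {"action": "IGNORE", "reasoning": "Unparseable AI response", "confidence": 0}
```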
8. Action Execution
Based on the AI decision, the system executes the appropriate moderation action (see the dispatch sketch after this list):
- IGNORE: No action taken
- WARN: Delete message + send warning DM
- TIMEOUT: Timeout user + delete message + send notification
- BAN: Ban user + delete message + log action
- GLOBAL_BAN: Global ban + notify all servers
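A dispatch step along these lines (a sketch; the timeout duration and DM wording are assumptions) ties the decision to the corresponding discord.py calls:
```python
import datetime

import discord

async def execute_action(message: discord.Message, decision: dict) -> None:
    """Map an AI decision onto Discord moderation calls (illustrative only)."""
    action = decision.get("action", "IGNORE")
    reason = decision.get("reasoning", "AI moderation action")

    if action == "IGNORE":
        return

    # Every non-IGNORE action removes the offending message first.
    await message.delete()

    if action == "WARN":
        await message.author.send(f"Your message was removed: {reason}")
    elif action == "TIMEOUT":
        # Duration would come from TIMEOUT_DURATIONS (see Advanced Configuration).
        await message.author.timeout(datetime.timedelta(hours=1), reason=reason)
        await message.author.send(f"You have been timed out: {reason}")
    elif action in ("BAN", "GLOBAL_BAN"):
        await message.guild.ban(message.author, reason=reason)
        # GLOBAL_BAN would additionally add the user ID to the global ban list.
```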
🎯 Decision Engine
Action Types
IGNORE
- Default action for acceptable content
- No moderation action taken
- Message remains visible
WARN
- For minor rule violations
- Message is deleted
- User receives warning DM
- Infraction logged in database
TIMEOUT
- For moderate violations
- User is timed out (muted)
- Duration based on violation severity
- Message deleted and user notified
BAN
- For severe violations
- User is banned from the server
- Recent messages deleted
- Permanent record created
GLOBAL_BAN
- For extreme violations
- User banned across all servers
- Added to global ban list
- Immediate enforcement
Confidence Scoring
The AI provides confidence scores (0-100) for its decisions (see the gating sketch after this list):
- 90-100: High confidence, automatic execution
- 70-89: Medium confidence, execute with logging
- 50-69: Low confidence, execute but flag for review
- 0-49: Very low confidence, log but don't execute
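A self-contained sketch of that banding (the label strings are illustrative, not the bot's internal names):
```python
def classify_confidence(confidence: int, threshold: int = 70) -> str:
    """Map a 0-100 confidence score to how the decision should be handled."""
    if confidence >= 90:
        return "execute"            # high confidence: act automatically
    if confidence >= threshold:
        return "execute_and_log"    # medium: act, keep an audit trail
    if confidence >= 50:
        return "execute_and_flag"   # low: act, but surface to moderators
    return "log_only"               # very low: record, take no action
```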
Rule Violation Categories
Content Violations:
- Spam and excessive posting
- NSFW content in inappropriate channels
- Hate speech and discrimination
- Harassment and bullying
- Doxxing and privacy violations
Behavioral Violations:
- Raid participation
- Bot-like behavior
- Evading moderation
- Impersonation
- Malicious links
🖼️ Media Processing
Image Analysis
The system can analyze images using computer vision:
```python
class MediaProcessor:
    async def process_image(self, image_url: str) -> str:
        # Download and process image
        image_data = await self.download_image(image_url)

        # OCR text extraction
        extracted_text = self.extract_text(image_data)

        # Content classification
        content_type = self.classify_content(image_data)

        # Generate description
        description = f"Image contains: {content_type}"
        if extracted_text:
            description += f" Text: {extracted_text}"
        return description
```
Capabilities:
- OCR Text Extraction: Extract text from images
- Content Classification: Identify NSFW or inappropriate content
- Meme Detection: Recognize common meme formats
- QR Code Scanning: Detect and analyze QR codes
Attachment Processing
Supported File Types (see the routing sketch after this list):
- Images: PNG, JPG, GIF, WebP
- Documents: PDF, TXT (text extraction)
- Archives: ZIP, RAR (content listing)
- Audio/Video: Basic metadata extraction
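A routing sketch for MediaProcessor under the assumption of per-type handler methods (process_document and list_archive_contents are hypothetical names):
```python
import discord

async def process_attachment(self, attachment: discord.Attachment) -> str:
    """Dispatch an attachment to a type-specific handler (illustrative sketch)."""
    content_type = attachment.content_type or ""

    if content_type.startswith("image/"):
        return await self.process_image(attachment.url)
    if content_type in ("application/pdf", "text/plain"):
        return await self.process_document(attachment.url)        # hypothetical helper
    if content_type in ("application/zip", "application/x-rar-compressed"):
        return await self.list_archive_contents(attachment.url)   # hypothetical helper
    if content_type.startswith(("audio/", "video/")):
        return f"Media file: {attachment.filename} ({attachment.size} bytes)"
    return f"Unsupported attachment: {attachment.filename}"
```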
⚙️ Configuration Options
Guild-Level Settings
```python
# AI Moderation Settings
ENABLED: bool = True                        # Enable/disable AI moderation
AI_MODEL: str = "github_copilot/gpt-4.1"    # AI model to use
CONFIDENCE_THRESHOLD: int = 70              # Minimum confidence for action
RULES_TEXT: str = "..."                     # Server rules for AI context
```
Channel-Specific Settings
```python
# Channel Exclusions
AI_EXCLUDED_CHANNELS: List[int] = []        # Channels to skip moderation

# Channel-Specific Rules
AI_CHANNEL_RULES: Dict[int, str] = {        # Custom rules per channel
    123456789: "This is a meme channel, be more lenient",
    987654321: "This is a serious discussion channel",
}
```
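When no channel override exists, the lookup would fall back to the server-wide RULES_TEXT. A minimal sketch, assuming the config helpers shown earlier:
```python
async def get_channel_rules(guild_id: int, channel_id: int) -> str:
    """Return channel-specific rules if configured, else the server rules (sketch)."""
    channel_rules = await get_guild_config_async(guild_id, "AI_CHANNEL_RULES", {})
    if channel_id in channel_rules:
        # Channel override found (e.g. a meme channel with looser rules).
        return channel_rules[channel_id]
    # Fall back to the server-wide rules used in the system prompt.
    return await get_guild_config_async(guild_id, "RULES_TEXT", "")
```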
Advanced Configuration
```python
# Timeout Durations (in seconds)
TIMEOUT_DURATIONS = {
    "minor": 300,        # 5 minutes
    "moderate": 3600,    # 1 hour
    "severe": 86400,     # 24 hours
}

# Auto-escalation settings
AUTO_ESCALATE_ENABLED: bool = True
ESCALATION_THRESHOLDS = {
    "warnings": 3,   # Ban after 3 warnings
    "timeouts": 2,   # Ban after 2 timeouts
}
```
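With those settings, an escalation check after each new infraction could look like the following sketch (get_infraction_counts is a hypothetical helper):
```python
import discord

async def maybe_escalate(guild: discord.Guild, member: discord.Member) -> None:
    """Escalate to a ban once warning/timeout thresholds are exceeded (sketch)."""
    if not AUTO_ESCALATE_ENABLED:
        return
    counts = await get_infraction_counts(guild.id, member.id)   # hypothetical helper
    if (counts.get("warnings", 0) >= ESCALATION_THRESHOLDS["warnings"]
            or counts.get("timeouts", 0) >= ESCALATION_THRESHOLDS["timeouts"]):
        await guild.ban(member, reason="Auto-escalation: repeated infractions")
```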
📊 Performance Metrics
Response Times
- Average AI Response: 2-5 seconds
- Message Processing: <1 second (excluding AI)
- Database Operations: <100ms
- Cache Hits: 85-95% for configuration
Accuracy Metrics
- False Positive Rate: <5%
- False Negative Rate: <10%
- User Appeal Success: ~15%
- Moderator Override: <8%
Resource Usage
- Memory per Message: ~50KB
- CPU per Analysis: ~100ms
- Database Queries: 2-4 per message
- API Calls: 1 per analyzed message
🔧 Troubleshooting
Common Issues
AI Service Unavailable (see the fallback sketch after this list):
- Fallback to rule-based moderation
- Queue messages for later processing
- Notify administrators of service issues
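A sketch of that fallback pattern (notify_admins and rule_based_check are placeholder names, not the actual implementation):
```python
import json

async def analyze_with_fallback(self, system_prompt: str, content: str) -> dict:
    """Call the AI service; fall back to rule-based checks when it fails (sketch)."""
    try:
        response = await self.genai_client.acompletion(
            model=self.ai_model,
            messages=[{"role": "system", "content": system_prompt}],
            temperature=0.1,
            max_tokens=500,
        )
        return json.loads(response.choices[0].message.content)
    except Exception as exc:
        # Service unavailable: alert admins and fall back to simple keyword checks;
        # the message could also be queued for re-analysis once the service recovers.
        await self.notify_admins(f"AI service error: {exc}")   # hypothetical helper
        return self.rule_based_check(content)                  # hypothetical fallback
```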
High False Positive Rate:
- Adjust confidence thresholds
- Review and update server rules
- Fine-tune system prompts
Performance Issues:
- Enable caching for frequent queries
- Optimize database indexes
- Consider rate limiting
Monitoring
Key Metrics to Monitor:
- AI response times
- Error rates and types
- Decision confidence distribution
- User appeal rates
- Moderator override frequency
Next: Database System - Comprehensive database architecture and operations