AI Moderation v0.7.0 - nself-org/nchat GitHub Wiki
AI-Powered Auto-Moderation System (v0.7.0)
Overview
The v0.7.0 release introduces a comprehensive AI-powered auto-moderation system that provides multi-layered content analysis, automated actions, and detailed analytics for community safety.
Architecture
Core Components
1. AI Moderator Core (src/lib/moderation/ai-moderator.ts)
- Multi-model approach combining multiple AI detectors
- Confidence scoring based on model agreement
- Auto-action decision logic with configurable thresholds
- False positive learning system for continuous improvement
- User violation tracking with trust scoring
Key Features:
- Content analysis with detailed issue detection
- Configurable moderation policies
- User behavior history tracking
- Whitelist/blacklist support
- Automated action recommendations
2. Toxicity Detector (src/lib/moderation/toxicity-detector.ts)
- Google Perspective API integration
- Detects: toxicity, severe toxicity, insults, profanity, threats, identity attacks
- Fallback detection using rule-based patterns
- Result caching for performance
- Configurable thresholds for each category
Detection Categories:
- Toxicity (general harmful content)
- Severe Toxicity (extreme harmful content)
- Insult (personal attacks)
- Profanity (inappropriate language)
- Threat (violent threats)
- Identity Attack (discrimination)
- Sexually Explicit (adult content)
3. ML Spam Detector (src/lib/moderation/spam-detector-ml.ts)
- Pattern-based detection with ML-inspired heuristics
- User behavior analysis (message rate, duplicates)
- Link spam detection with domain whitelisting
- Promotional content detection
Spam Types Detected:
- Link spam (excessive URLs)
- Promotional content (marketing, sales)
- Repetitive content (flooding)
- Excessive caps/punctuation
- Shortened URLs (bit.ly, etc.)
- Duplicate messages
- High message frequency
4. Content Classifier (src/lib/moderation/content-classifier.ts)
- Category detection (technical, business, social, etc.)
- Language detection (10+ languages)
- Sentiment analysis (positive/negative/neutral)
- NSFW detection (text-based)
- Topic extraction (hashtags, keywords)
Categories:
- General, Technical, Business, Social
- Support, Announcement, Question
- Feedback, Complaint, Praise
- Warning categories (inappropriate, harassment, spam)
5. Moderation Actions (src/lib/moderation/actions.ts)
- Comprehensive action types (flag, hide, delete, warn, mute, ban)
- Bulk operations for efficient moderation
- Action audit trail with reversibility tracking
- Temporary actions with duration support
- User warning system
Available Actions:
- Flag: Mark for manual review
- Hide: Hide content from users
- Delete: Permanently remove content
- Warn: Issue user warning
- Mute: Temporarily restrict user (with duration)
- Unmute: Remove mute restriction
- Ban: Permanently or temporarily ban user
- Unban: Remove ban
- Approve: Approve flagged content
- Reject: Reject and delete flagged content
API Routes
/api/moderation/analyze (POST, GET, PUT)
- POST: Analyze content with AI moderation
- GET: Get current moderation policy
- PUT: Update moderation policy
Request Example:
{
"contentId": "msg_123",
"contentType": "text",
"content": "Message content to analyze",
"metadata": {
"userId": "user_456",
"channelId": "channel_789",
"messageCount": 5,
"timeWindow": 60000
}
}
Response:
{
"success": true,
"analysis": {
"contentId": "msg_123",
"contentType": "text",
"overallScore": 0.75,
"confidenceScore": 0.85,
"shouldFlag": true,
"shouldHide": false,
"autoAction": "flag",
"autoActionReason": "Overall risk score: 75%. High-severity issues: toxicity",
"priority": "high",
"detectedIssues": [
{
"category": "toxicity",
"severity": "high",
"confidence": 0.8,
"description": "Toxic content detected: insult",
"evidence": ["insult"]
}
]
}
}
/api/moderation/batch (POST)
Process multiple content items in parallel
Request:
{
"items": [
{
"contentId": "msg_1",
"contentType": "text",
"content": "Message 1"
},
{
"contentId": "msg_2",
"contentType": "text",
"content": "Message 2"
}
],
"maxConcurrency": 10
}
Response:
{
"success": true,
"stats": {
"total": 2,
"success": 2,
"failure": 0,
"flagged": 1,
"highPriority": 0,
"processingTime": 1250,
"avgProcessingTime": 625
},
"results": [...]
}
/api/moderation/actions (POST, GET)
- POST: Take moderation action (single or bulk)
- GET: Get action audit log
Single Action:
{
"action": "mute",
"targetType": "user",
"targetId": "user_123",
"targetUserId": "user_123",
"moderatorId": "mod_456",
"reason": "Repeated spam violations",
"duration": 60
}
Bulk Action:
{
"action": "hide",
"bulk": [
{
"targetType": "message",
"targetId": "msg_1",
"targetUserId": "user_123"
},
{
"targetType": "message",
"targetId": "msg_2",
"targetUserId": "user_123"
}
],
"moderatorId": "mod_456",
"reason": "Bulk spam removal"
}
/api/moderation/queue (GET, POST)
Manage moderation queue
/api/moderation/stats (GET)
Get analytics and statistics
Query Parameters:
period: 1d, 7d, 30d, 90d, allstartDate: Custom start dateendDate: Custom end date
Response:
{
"success": true,
"period": {
"start": "2026-01-24T00:00:00.000Z",
"end": "2026-01-31T00:00:00.000Z",
"label": "7d"
},
"metrics": {
"totalFlagged": 45,
"pendingReview": 12,
"highPriority": 3,
"totalActions": 33,
"automatedActions": 28,
"manualActions": 5
}
}
UI Components
ModerationDashboard
- Location:
src/components/admin/moderation/ModerationDashboard.tsx - Real-time analytics with charts
- Key metrics display
- Violation trends
- Top violators list
- Automation effectiveness metrics
Features:
- Period selection (24h, 7d, 30d, 90d)
- Queue status distribution (pie chart)
- Priority distribution (bar chart)
- Actions by type (horizontal bar chart)
- AI moderation performance metrics
ModerationQueue
- Location:
src/components/admin/moderation/ModerationQueue.tsx - Displays flagged content for review
- Filterable by status and priority
- One-click actions
- Detailed item analysis
Filters:
- Pending items
- High priority items
- All items
Actions per item:
- Approve
- Delete
- Warn User
- Hide
Displayed Information:
- Priority level (low, medium, high, critical)
- Status (pending, reviewing, approved, rejected)
- Content type
- AI flags and detections
- Toxicity, spam, NSFW scores
- Profanity words
- Auto action taken
- Confidence score
ModerationSettings
- Location:
src/components/admin/moderation/ModerationSettings.tsx - Configure AI detection features
- Adjust thresholds
- Enable/disable auto-actions
- Manage custom word lists
Settings Tabs:
- Detection: Enable/disable AI features
- Thresholds: Adjust sensitivity (0-100%)
- Auto Actions: Configure automated responses
- Custom Words: Blocked and allowed word lists
Database Schema
Tables
nchat_moderation_queue
CREATE TABLE nchat_moderation_queue (
id UUID PRIMARY KEY,
content_type TEXT NOT NULL,
content_id UUID NOT NULL,
content_text TEXT,
content_url TEXT,
channel_id UUID,
user_id UUID NOT NULL,
user_display_name TEXT,
status TEXT NOT NULL,
priority TEXT NOT NULL,
ai_flags TEXT[],
toxic_score DECIMAL,
nsfw_score DECIMAL,
spam_score DECIMAL,
profanity_detected BOOLEAN,
profanity_words TEXT[],
model_version TEXT,
confidence_score DECIMAL,
auto_action TEXT,
auto_action_reason TEXT,
is_hidden BOOLEAN DEFAULT FALSE,
reviewed_by UUID,
reviewed_at TIMESTAMPTZ,
moderator_decision TEXT,
moderator_notes TEXT,
appeal_status TEXT,
appeal_text TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
nchat_moderation_actions
CREATE TABLE nchat_moderation_actions (
id UUID PRIMARY KEY,
queue_id UUID,
action_type TEXT NOT NULL,
target_type TEXT NOT NULL,
target_id UUID NOT NULL,
target_user_id UUID NOT NULL,
moderator_id UUID,
moderator_role TEXT,
reason TEXT,
is_automated BOOLEAN DEFAULT FALSE,
automation_type TEXT,
duration INTEGER,
expires_at TIMESTAMPTZ,
reversible BOOLEAN DEFAULT TRUE,
reversed_by UUID,
reversed_at TIMESTAMPTZ,
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
nchat_user_moderation_history
CREATE TABLE nchat_user_moderation_history (
user_id UUID PRIMARY KEY,
total_violations INTEGER DEFAULT 0,
toxic_violations INTEGER DEFAULT 0,
nsfw_violations INTEGER DEFAULT 0,
spam_violations INTEGER DEFAULT 0,
profanity_violations INTEGER DEFAULT 0,
warnings_received INTEGER DEFAULT 0,
mutes_received INTEGER DEFAULT 0,
bans_received INTEGER DEFAULT 0,
trust_score INTEGER DEFAULT 100,
is_muted BOOLEAN DEFAULT FALSE,
muted_until TIMESTAMPTZ,
is_banned BOOLEAN DEFAULT FALSE,
banned_until TIMESTAMPTZ,
first_violation_at TIMESTAMPTZ,
last_violation_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
nchat_user_warnings
CREATE TABLE nchat_user_warnings (
id UUID PRIMARY KEY,
user_id UUID NOT NULL,
moderator_id UUID,
reason TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Configuration
Moderation Policy
interface ModerationPolicy {
// Feature flags
enableToxicityDetection: boolean
enableNSFWDetection: boolean
enableSpamDetection: boolean
enableProfanityFilter: boolean
// Auto-actions
autoFlag: boolean
autoHide: boolean
autoWarn: boolean
autoMute: boolean
autoBan: boolean
// Thresholds
thresholds: {
toxicity: number // 0-1, default: 0.7
nsfw: number // 0-1, default: 0.7
spam: number // 0-1, default: 0.6
profanity: number // 0-1, default: 0.5
flagThreshold: number // 0-1, default: 0.5
hideThreshold: number // 0-1, default: 0.8
warnThreshold: number // 0-1, default: 0.7
muteThreshold: number // 0-1, default: 0.85
banThreshold: number // 0-1, default: 0.95
maxViolationsPerDay: number
maxViolationsPerWeek: number
maxViolationsTotal: number
}
// Custom lists
customBlockedWords: string[]
customAllowedWords: string[]
whitelistedUsers: string[]
blacklistedUsers: string[]
// Learning
enableFalsePositiveLearning: boolean
minimumConfidenceForAutoAction: number
}
Environment Variables
# Optional: Google Perspective API
PERSPECTIVE_API_KEY=your_api_key_here
# Optional: OpenAI Moderation API
OPENAI_API_KEY=your_api_key_here
Usage Examples
Analyze Message in Real-Time
import { getAIModerator } from '@/lib/moderation/ai-moderator'
const moderator = getAIModerator()
await moderator.initialize()
const analysis = await moderator.analyzeContent('msg_123', 'text', 'Message content here', {
userId: 'user_456',
channelId: 'channel_789',
})
if (analysis.shouldFlag) {
// Add to moderation queue
// Or take auto-action
}
Batch Process Messages
const response = await fetch('/api/moderation/batch', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
items: messages.map((msg) => ({
contentId: msg.id,
contentType: 'text',
content: msg.content,
metadata: { userId: msg.userId },
})),
}),
})
const { stats, results } = await response.json()
Take Moderation Action
const response = await fetch('/api/moderation/actions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
action: 'mute',
targetUserId: 'user_123',
moderatorId: 'mod_456',
reason: 'Spam violations',
duration: 60, // minutes
}),
})
Performance
- Average analysis time: <500ms per message
- Batch processing: Up to 100 items in <10s
- Cache hit rate: ~70% for repeated content
- False positive rate: <5% with tuned thresholds
Best Practices
- Start Conservative: Begin with higher thresholds (0.8-0.9) and lower as needed
- Review Regularly: Check flagged content weekly to tune thresholds
- Use Whitelists: Add trusted users to whitelist to reduce false positives
- Enable Learning: Turn on false positive learning for continuous improvement
- Monitor Analytics: Track automation effectiveness and adjust policies
- Combine Methods: Use both AI and rule-based detection for best results
- Gradual Enforcement: Start with flags, then warnings, before auto-bans
Roadmap
v0.8.0 Enhancements
- [ ] Advanced ML models (BERT, GPT-based)
- [ ] Image NSFW detection (NSFW.js integration)
- [ ] Video content moderation
- [ ] Multi-language support (50+ languages)
- [ ] Context-aware moderation (thread analysis)
- [ ] User appeal system UI
- [ ] Moderator collaboration tools
- [ ] Advanced analytics (ROC curves, precision/recall)
Future Features
- [ ] Federated moderation (cross-server)
- [ ] Community moderation voting
- [ ] AI training dashboard
- [ ] Custom ML model upload
- [ ] Real-time WebSocket notifications
- [ ] Mobile moderation app
License
MIT License - See LICENSE file for details
Support
For issues, questions, or feature requests:
- GitHub Issues: nself-chat/issues
- Documentation: docs/
- Community: Discord