Troubleshooting Guide - openguard-bot/openguard GitHub Wiki

Troubleshooting Guide

Comprehensive troubleshooting guide for common AIMod issues, error resolution, and performance optimization.

🚨 Common Issues

Bot Connection Issues

Bot Not Responding to Commands

Symptoms:

Commands not executing
No response from bot
Bot appears offline

Solutions:

# Check bot status
sudo systemctl status aimod-bot.service

# View recent logs
journalctl -u aimod-bot.service --since "1 hour ago"

# Restart bot service
sudo systemctl restart aimod-bot.service

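# Check Discord token validity
python -c "
import discord
import os
from dotenv import load_dotenv
load_dotenv()

# Fail early with a clear message if the token is missing from the environment
token = os.getenv('DISCORD_TOKEN')
if not token:
    raise SystemExit('DISCORD_TOKEN is not set')

client = discord.Client(intents=discord.Intents.default())

@client.event
async def on_ready():
    print(f'Bot connected as {client.user}')
    await client.close()

# An invalid token raises discord.LoginFailure here
client.run(token)
"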

Permission Errors

Symptoms:

"Missing Permissions" errors
Commands fail silently
Moderation actions not working

Solutions:

Check Bot Permissions:
- Ensure the bot has the Administrator permission, or each of the specific permissions listed below
- Verify the bot's highest role is above the roles of the users it moderates
- Check channel-specific permission overrides

Required Permissions:

✅ Administrator (recommended)
OR specific permissions:
✅ Ban Members
✅ Kick Members
✅ Manage Messages
✅ Manage Channels
✅ Manage Roles
✅ View Audit Log
✅ Send Messages
✅ Use Application Commands
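
The permission list above can also be checked from code. The following is a minimal sketch, assuming discord.py, where `guild.me.guild_permissions` exposes one boolean attribute per permission. The helper name `missing_permissions` and the `REQUIRED_PERMISSIONS` tuple are illustrative and not part of AIMod; a `SimpleNamespace` stands in for the real permissions object so the example runs without a bot.

```python
from types import SimpleNamespace

# Attribute names follow discord.py's Permissions flags (assumed library).
REQUIRED_PERMISSIONS = (
    "ban_members",
    "kick_members",
    "manage_messages",
    "manage_channels",
    "manage_roles",
    "view_audit_log",
    "send_messages",
)


def missing_permissions(perms) -> list[str]:
    """Return the required permission names that `perms` does not grant."""
    # Administrator implies every other permission.
    if getattr(perms, "administrator", False):
        return []
    return [name for name in REQUIRED_PERMISSIONS if not getattr(perms, name, False)]


if __name__ == "__main__":
    # Stand-in for guild.me.guild_permissions so the sketch runs offline.
    demo = SimpleNamespace(ban_members=True, kick_members=True, send_messages=True)
    print(missing_permissions(demo))
```

Inside a running bot, the same function could be called as `missing_permissions(guild.me.guild_permissions)`. It does not cover role hierarchy, which still has to be checked separately.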

Database Issues

Connection Failures

Symptoms:

"Database connection failed" errors
Slow response times
Configuration not saving

Diagnosis:

# Test PostgreSQL connection
psql -h localhost -U aimod_user -d aimod_bot -c "SELECT 1;"

# Check PostgreSQL status
sudo systemctl status postgresql

# View PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*-main.log

# Check connection pool status
python -c "
from database.connection import get_pool
import asyncio

async def test():
    pool = await get_pool()
    print(f'Pool size: {pool.get_size()}')
    print(f'Pool max size: {pool.get_max_size()}')

asyncio.run(test())
"

Solutions:

# Restart PostgreSQL
sudo systemctl restart postgresql

# Reset database permissions
# (connect to aimod_bot so the schema-level grants apply to its tables, not to the default postgres database)
sudo -u postgres psql -d aimod_bot << EOF
GRANT ALL PRIVILEGES ON DATABASE aimod_bot TO aimod_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO aimod_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO aimod_user;
EOF

# Increase connection limits
# (the path contains the PostgreSQL major version; replace 13 with the version you have installed)
sudo nano /etc/postgresql/13/main/postgresql.conf
# max_connections = 200
# shared_buffers = 256MB

# Restart PostgreSQL after config changes
sudo systemctl restart postgresql
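
Before raising `max_connections`, it can help to check whether the configured pools fit under the current limit at all. The sketch below is plain arithmetic under stated assumptions: PostgreSQL keeps `superuser_reserved_connections` (3 by default) out of reach of ordinary roles, and the pool sizes shown are hypothetical, not AIMod's actual settings.

```python
def connection_headroom(max_connections: int, pool_max_sizes: list[int], reserved: int = 3) -> int:
    """Connections left once every pool is full; a negative value means over-subscription."""
    return max_connections - reserved - sum(pool_max_sizes)


if __name__ == "__main__":
    # Hypothetical pools: 20 connections for the bot, 10 for the dashboard backend.
    print(connection_headroom(200, [20, 10]))
    # A limit of 100 cannot serve pools of 60 and 50 at the same time.
    print(connection_headroom(100, [60, 50]))
```

A negative result suggests lowering the pool maximums or raising `max_connections`; a large positive result suggests the connection failures have another cause.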

Migration Issues

Symptoms:

Data not appearing after migration
Inconsistent record counts
Migration script failures

Solutions:

# Re-run migration with verbose output
python migrate_json_to_postgresql.py --verbose

# Validate migration
python -c "
from migrate_json_to_postgresql import validate_migration
import asyncio
asyncio.run(validate_migration())
"

# Check for partial migration
psql -h localhost -U aimod_user -d aimod_bot -c "
SELECT 
    'guild_config' as table_name, 
    COUNT(*) as record_count 
FROM guild_config
UNION ALL
SELECT 
    'user_infractions' as table_name, 
    COUNT(*) as record_count 
FROM user_infractions;
"

# Rollback and retry if needed
# (See Database Migration guide for rollback procedures)
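
To spot a partial migration, the counts returned by the query above can be compared with the number of records in the JSON source. This is a minimal sketch; the function name `count_mismatches` and the sample numbers are illustrative, and how the expected counts are obtained depends on the layout of your JSON files.

```python
def count_mismatches(expected: dict[str, int], actual: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each table whose counts differ to (expected, actual)."""
    mismatches = {}
    for table, want in expected.items():
        got = actual.get(table, 0)  # a table missing from the database counts as 0 rows
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


if __name__ == "__main__":
    # Hypothetical counts: JSON source on the left, PostgreSQL on the right.
    source = {"guild_config": 12, "user_infractions": 340}
    database = {"guild_config": 12, "user_infractions": 338}
    print(count_mismatches(source, database))
```

An empty result means the record counts agree; it does not prove that the contents of the rows are identical.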

AI Provider Issues

API Key Errors

Symptoms:

"Invalid API key" errors
AI moderation not working
Authentication failures

Solutions:

# Test API key validity
curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \
     https://openrouter.ai/api/v1/models

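# Check environment variables (reports whether each is set without printing the secret)
for var in OPENROUTER_API_KEY OPENAI_API_KEY GITHUB_TOKEN; do
    if [ -n "$(printenv "$var")" ]; then echo "$var is set"; else echo "$var is NOT set"; fi
done

# Verify .env file (prints the variable names only)
grep -oE "^(OPENROUTER|OPENAI|GITHUB)[A-Z_]*" .env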

# Test LiteLLM integration
python test_litellm_integration.py
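
When key status has to be pasted into a bug report or a chat, a masked view avoids leaking the secret. The following sketch uses only the standard library; the helper names `mask` and `key_report` are illustrative, and the variable names are the ones used in the commands above.

```python
import os

KEY_NAMES = ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "GITHUB_TOKEN")


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def key_report(env=os.environ) -> dict[str, str]:
    """Map each expected variable to a masked value, or 'MISSING' if unset or blank."""
    report = {}
    for name in KEY_NAMES:
        value = env.get(name, "").strip()
        report[name] = mask(value) if value else "MISSING"
    return report


if __name__ == "__main__":
    for name, status in key_report().items():
        print(f"{name}: {status}")
```

The script reads the process environment, so values that exist only in `.env` show as MISSING unless they have been loaded first, for example with `load_dotenv()`.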

Rate Limiting

Symptoms:

"Rate limit exceeded" errors
Slow AI responses
Intermittent failures

Solutions:

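# Implement exponential backoff
import asyncio
import random

# Assumption: AI calls go through LiteLLM; substitute your provider's rate-limit exception if not
from litellm.exceptions import RateLimitError

async def retry_with_backoff(func, max_retries=3):
    """Retry an async callable on rate limits, waiting 2**attempt seconds plus jitter."""
    for attempt in range(max_retries):
        try:
            return await func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(wait_time)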

# Monitor API usage
# Check provider dashboard for usage limits
# Consider upgrading API plan if needed
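
Backoff reacts after a limit has been hit; a client-side limiter can keep the bot under the limit in the first place. The sketch below is one possible sliding-window limiter, not AIMod's implementation. The class name and the limit values are illustrative, and the real quota has to come from your provider's documentation.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls: int, period: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else return False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_calls=2, period=60.0)  # hypothetical quota
    print([limiter.try_acquire() for _ in range(3)])
```

When `try_acquire()` returns False, the caller can skip the AI check, queue the message, or sleep briefly before trying again.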

Redis Cache Issues

Connection Failures

Symptoms:

Cache misses
Slow configuration loading
Session issues

Solutions:

# Test Redis connection
redis-cli ping

# Check Redis status
sudo systemctl status redis-server

# View Redis logs
sudo journalctl -u redis-server --since "1 hour ago"

# Clear Redis cache
# (WARNING: FLUSHALL deletes every key in every database on this Redis server)
redis-cli FLUSHALL

# Restart Redis
sudo systemctl restart redis-server

Memory Issues

Symptoms:

Redis out of memory errors
Cache evictions
Performance degradation

Solutions:

# Check Redis memory usage
redis-cli INFO memory

# Configure memory limits
sudo nano /etc/redis/redis.conf
# maxmemory 512mb
# maxmemory-policy allkeys-lru

# Monitor memory usage (prints keys, memory, and client counts once per second)
redis-cli --stat -i 1

# Monitor latency
redis-cli --latency-history -i 1
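
The output of `redis-cli INFO memory` is a list of `key:value` lines, which makes it easy to turn into a single usage figure. This is a minimal sketch using the `used_memory` and `maxmemory` fields that Redis reports; the function names are illustrative.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse `redis-cli INFO` output into a dict, skipping section headers and blank lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_usage_ratio(fields: dict[str, str]):
    """Return used_memory / maxmemory, or None when no limit is set (maxmemory is 0)."""
    limit = int(fields.get("maxmemory", 0))
    if limit == 0:
        return None
    return int(fields["used_memory"]) / limit


if __name__ == "__main__":
    # Hypothetical sample: 256 MiB used out of a 512 MiB limit.
    sample = "# Memory\r\nused_memory:268435456\r\nmaxmemory:536870912\r\n"
    print(memory_usage_ratio(parse_info(sample)))
```

The text can be captured with `subprocess.run(["redis-cli", "INFO", "memory"], capture_output=True, text=True).stdout`. A ratio close to 1.0 means evictions are likely under an LRU policy.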

🔧 Performance Issues

High CPU Usage

Diagnosis:

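# Monitor CPU usage
htop
# -d, joins multiple matching PIDs with commas, which is the format top -p expects
top -p "$(pgrep -d, -f 'python.*bot\.py')"

# Profile the bot (profile.stats is written when the bot exits, so stop it with Ctrl+C after reproducing the load)
python -m cProfile -o profile.stats bot.py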
python -c "
import pstats
p = pstats.Stats('profile.stats')
p.sort_stats('cumulative').print_stats(20)
"

Solutions:

# Optimize database queries
# Use connection pooling
# Implement caching for frequent operations
# Reduce AI API calls with smart caching

# Example optimization:
from cachetools import TTLCache

class OptimizedProcessor:
    def __init__(self):
        self.cache = TTLCache(maxsize=1000, ttl=300)
    
    async def process_message(self, message):
        cache_key = f"{message.guild.id}:{hash(message.content)}"
        if cache_key in self.cache:
            return self.cache[cache_key]
        
        result = await self.expensive_operation(message)
        self.cache[cache_key] = result
        return result
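
One caveat with the example above: Python salts `hash()` for strings per process, so the same message produces a different key after a restart or in another worker. That is harmless for an in-process `TTLCache`, but it breaks keys stored in a shared cache such as Redis. A stable digest avoids this; the helper name below is illustrative.

```python
import hashlib


def stable_cache_key(guild_id: int, content: str) -> str:
    """Build a cache key that is identical across processes and restarts."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"{guild_id}:{digest}"


if __name__ == "__main__":
    print(stable_cache_key(123456789, "hello"))
```

SHA-256 is slower than `hash()`, but the cost is small next to an AI API call.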

High Memory Usage

Diagnosis:

# Monitor memory usage
free -h
ps aux | grep python

# Python memory profiling
pip install memory-profiler
python -m memory_profiler bot.py
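
If installing an extra package is not an option, the standard library's `tracemalloc` module gives a similar per-line view of allocations. The sketch below shows the idea on a stand-in workload; where to start tracing and when to take the snapshot inside the bot is left as an assumption.

```python
import tracemalloc


def top_allocations(snapshot, limit: int = 5) -> list[str]:
    """Return the `limit` source lines that hold the most allocated memory."""
    stats = snapshot.statistics("lineno")  # sorted from largest to smallest
    return [str(stat) for stat in stats[:limit]]


if __name__ == "__main__":
    tracemalloc.start()
    data = [bytes(1000) for _ in range(1000)]  # stand-in for the bot's workload
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    for line in top_allocations(snapshot):
        print(line)
```

Taking two snapshots some minutes apart and comparing them with `snapshot.compare_to(earlier, "lineno")` shows which lines are growing, which is usually more telling than a single snapshot.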

Solutions:

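# Implement proper cleanup
import gc
import weakref

from discord.ext import commands

class MemoryOptimizedCog(commands.Cog):
    def __init__(self):
        # Entries disappear once nothing else references the value.
        # Values must be weak-referenceable objects (not plain str, int, list, or dict).
        self.cache = weakref.WeakValueDictionary()

    async def cog_unload(self):
        # discord.py calls this hook when the cog is removed
        self.cache.clear()
        gc.collect()

# Use generators for large datasets
# (large_dataset_generator and process_item are placeholders for your own code)
async def process_large_dataset():
    async for item in large_dataset_generator():
        yield process_item(item)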

# Limit cache sizes
from cachetools import TTLCache
cache = TTLCache(maxsize=1000, ttl=300)  # Limit size

Slow Database Queries

Diagnosis:

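-- Enable query logging
-- (log_statement = 'all' logs every statement and is very verbose; reset it once the diagnosis is done)
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log duration of queries > 1s
SELECT pg_reload_conf();

-- Check slow queries
-- (requires the pg_stat_statements extension, loaded via shared_preload_libraries;
--  the column is mean_exec_time on PostgreSQL 13+, mean_time on older versions)
SELECT query, mean_exec_time, calls 
FROM pg_stat_statements 
ORDER BY mean_exec_time DESC 
LIMIT 10;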

-- Check missing indexes
-- (tables with many sequential scans and few index scans are candidates for a new index)
SELECT relname, seq_scan, seq_tup_read, idx_scan 
FROM pg_stat_user_tables 
WHERE schemaname = 'public' 
ORDER BY seq_tup_read DESC;

Solutions:

-- Add missing indexes
CREATE INDEX CONCURRENTLY idx_user_infractions_guild_user_time 
ON user_infractions(guild_id, user_id, timestamp);

CREATE INDEX CONCURRENTLY idx_event_logs_guild_type_time 
ON event_logs(guild_id, event_type, timestamp);

-- Optimize queries
-- Use EXPLAIN ANALYZE to understand query plans
EXPLAIN ANALYZE SELECT * FROM user_infractions 
WHERE guild_id = 123456789 AND user_id = 987654321;

-- Update table statistics
ANALYZE user_infractions;
ANALYZE guild_config;
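
When comparing plans before and after adding an index, two things matter most: whether a sequential scan is still present, and the execution time. The sketch below pulls both out of the text that `EXPLAIN ANALYZE` prints. The function name and the sample plan are illustrative, and real plans vary by PostgreSQL version and data.

```python
import re

_SEQ_SCAN = re.compile(r"Seq Scan on (\w+)")
# Older PostgreSQL versions print "Execution time", newer ones "Execution Time".
_EXEC_TIME = re.compile(r"Execution time: ([\d.]+) ms", re.IGNORECASE)


def summarize_plan(plan_text: str) -> dict:
    """Extract sequentially scanned tables and total execution time from a text plan."""
    match = _EXEC_TIME.search(plan_text)
    return {
        "seq_scans": _SEQ_SCAN.findall(plan_text),
        "execution_ms": float(match.group(1)) if match else None,
    }


if __name__ == "__main__":
    # Hypothetical plan for the query above, before the index exists.
    sample = """\
Seq Scan on user_infractions  (cost=0.00..431.00 rows=1 width=64) (actual time=0.020..2.150 rows=1 loops=1)
  Filter: ((guild_id = 123456789) AND (user_id = 987654321))
  Rows Removed by Filter: 19999
Planning Time: 0.080 ms
Execution Time: 2.170 ms
"""
    print(summarize_plan(sample))
```

An empty `seq_scans` list after creating the index suggests the planner is now using it. On very small tables a sequential scan can still be the cheaper plan.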

🌐 Dashboard Issues

Frontend Build Failures

Symptoms:

Build process fails
Missing dependencies
TypeScript errors

Solutions:

# Clear node modules and reinstall
cd dashboard/frontend
rm -rf node_modules package-lock.json
npm install

# Fix TypeScript errors
npm run type-check

# Update dependencies
npm audit fix
npm update

# Build with verbose output
npm run build -- --verbose

Backend API Errors

Symptoms:

500 Internal Server Error
Authentication failures
CORS issues

Solutions:

# Check backend logs
journalctl -u aimod-backend.service --since "1 hour ago"

# Test API endpoints
curl -X GET http://localhost:8000/api/health

# Check CORS configuration
# Ensure frontend URL is in allowed origins

# Restart backend service
sudo systemctl restart aimod-backend.service
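
The health check above can also be scripted, for example from a cron job or a deployment step. This is a minimal sketch using only the standard library. It assumes the `/api/health` endpoint from the curl command returns HTTP 200 when the backend is healthy; the function name `check_health` is illustrative.

```python
import urllib.error
import urllib.request


def check_health(url: str = "http://localhost:8000/api/health", timeout: float = 5.0):
    """Return (ok, detail) for a GET request to the health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, or timeout.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    ok, detail = check_health()
    print("healthy" if ok else "unhealthy", detail)
```

The two failure branches point to different causes: an "unreachable" result suggests the service is down or bound to another port, while an HTTP 500 suggests the service is running but failing internally, so the backend logs are the next place to look.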

Authentication Issues

Symptoms:

Login redirects fail
JWT token errors
Session timeouts

Solutions:

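# Verify Discord OAuth2 configuration
echo "$DISCORD_CLIENT_ID"
echo "$DISCORD_REDIRECT_URI"

# Check that the secrets are set without printing their values
for var in DISCORD_CLIENT_SECRET JWT_SECRET; do
    if [ -n "$(printenv "$var")" ]; then echo "$var is set"; else echo "$var is NOT set"; fi
done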

# Test Discord API connectivity
curl -H "Authorization: Bot $DISCORD_TOKEN" \
     https://discord.com/api/v10/users/@me
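
For JWT token errors and session timeouts, it often helps to look inside the token the dashboard is sending. The sketch below decodes the payload with the standard library and reports the time left until the `exp` claim. It does not verify the signature, so it is for debugging only, and it assumes the dashboard issues standard three-segment JWTs with an `exp` claim, which the source does not confirm.

```python
import base64
import json
import time
from typing import Optional


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    try:
        _header, payload_b64, _signature = token.split(".")
    except ValueError:
        raise ValueError("expected three dot-separated segments") from None
    # base64url segments omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds left before `exp`; negative if expired, None if the token has no `exp`."""
    exp = jwt_payload(token).get("exp")
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)
```

A negative value means the session has expired and the user needs to log in again. A positive value combined with a rejected token points toward a mismatch between the secret that signed the token and the backend's current `JWT_SECRET`.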

🔍 Debugging Tools

Log Analysis

# Centralized log viewing
tail -f /var/log/aimod/*.log

# Search for specific errors
grep -r "ERROR" /var/log/aimod/
grep -r "CRITICAL" /var/log/aimod/

# Filter by time range
journalctl -u aimod-bot.service --since "2025-01-13 10:00:00" --until "2025-01-13 11:00:00"

# Follow logs in real-time
journalctl -u aimod-bot.service -f
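
A quick count per severity shows whether errors are isolated or systemic before reading individual entries. This is a minimal sketch that assumes each log line contains the level name as plain text (for example `ERROR`); AIMod's exact log format is not documented here, so adjust the matching as needed.

```python
from collections import Counter

# Ordered from most to least severe.
LEVELS = ("CRITICAL", "ERROR", "WARNING")


def count_levels(lines) -> Counter:
    """Count log lines per level; each line is counted once, at its most severe level."""
    counts = Counter()
    for line in lines:
        for level in LEVELS:
            if level in line:
                counts[level] += 1
                break
    return counts


if __name__ == "__main__":
    # Hypothetical log lines.
    sample = [
        "2025-01-13 10:00:01 ERROR database timeout",
        "2025-01-13 10:00:02 INFO message processed",
        "2025-01-13 10:00:03 CRITICAL shutting down",
    ]
    print(count_levels(sample))
```

An open file can be passed directly, as in `with open(path) as f: count_levels(f)`, so large logs are read line by line rather than loaded into memory.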

Database Debugging

-- Check active connections
SELECT pid, usename, application_name, client_addr, state, query_start, query 
FROM pg_stat_activity 
WHERE datname = 'aimod_bot';

-- Check table sizes
SELECT schemaname, tablename, 
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables 
WHERE schemaname = 'public' 
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Check index usage
-- (pg_stat_user_indexes names these columns relname and indexrelname)
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes 
ORDER BY idx_scan DESC;

Performance Monitoring

#!/bin/bash
# System monitoring script
echo "=== System Status $(date) ==="
echo "CPU: $(top -bn1 | grep "Cpu(s)" | awk '{print $2}')"
echo "Memory: $(free | grep Mem | awk '{printf "%.1f%%", $3/$2 * 100.0}')"
echo "Disk: $(df -h / | awk 'NR==2{print $5}')"

echo "=== Service Status ==="
systemctl is-active aimod-bot.service
systemctl is-active aimod-backend.service
systemctl is-active postgresql.service
systemctl is-active redis-server.service

echo "=== Database Connections ==="
sudo -u postgres psql -d aimod_bot -c "SELECT count(*) FROM pg_stat_activity;"

echo "=== Recent Errors ==="
journalctl -u aimod-bot.service --since "1 hour ago" | grep -i error | tail -5

📞 Getting Help

Community Support

Discord Server: Join our community for real-time help
GitHub Issues: Report bugs and request features
Documentation: Check existing guides first

Professional Support

Priority Support: Available for production deployments
Custom Development: Feature development and customization
Consulting: Architecture and optimization consulting

Reporting Issues

When reporting issues, include:

Environment Information:
- Operating system and version
- Python version
- PostgreSQL version
- Redis version
Error Details:
- Complete error messages
- Stack traces
- Relevant log entries
Reproduction Steps:
- Steps to reproduce the issue
- Expected vs actual behavior
- Configuration details

System Information:

# Generate system report
echo "=== System Information ===" > debug_report.txt
uname -a >> debug_report.txt
python --version >> debug_report.txt 2>&1
psql --version >> debug_report.txt
redis-server --version >> debug_report.txt

echo "=== Service Status ===" >> debug_report.txt
systemctl status aimod-bot.service --no-pager >> debug_report.txt

echo "=== Recent Logs ===" >> debug_report.txt
journalctl -u aimod-bot.service --since "1 hour ago" --no-pager >> debug_report.txt

# Review debug_report.txt and redact any tokens or API keys before sharing it

For additional support, visit our Discord Server or create an issue on GitHub.