Troubleshooting Guide - openguard-bot/openguard GitHub Wiki
Troubleshooting Guide
Comprehensive troubleshooting guide for common AIMod issues, error resolution, and performance optimization.
🚨 Common Issues
Bot Connection Issues
Bot Not Responding to Commands
Symptoms:
- Commands not executing
- No response from bot
- Bot appears offline
Solutions:
# Check bot status
sudo systemctl status aimod-bot.service
# View recent logs
journalctl -u aimod-bot.service --since "1 hour ago"
# Restart bot service
sudo systemctl restart aimod-bot.service
# Check Discord token validity
python -c "
import discord
import os
from dotenv import load_dotenv
load_dotenv()
client = discord.Client(intents=discord.Intents.default())
@client.event
async def on_ready():
    print(f'Bot connected as {client.user}')
    await client.close()
client.run(os.getenv('DISCORD_TOKEN'))
"
Permission Errors
Symptoms:
- "Missing Permissions" errors
- Commands fail silently
- Moderation actions not working
Solutions:
- Check Bot Permissions:
  - Ensure bot has Administrator permission
  - Verify bot role is above target user roles
  - Check channel-specific permissions
- Required Permissions:
  ✅ Administrator (recommended) OR specific permissions:
  ✅ Ban Members
  ✅ Kick Members
  ✅ Manage Messages
  ✅ Manage Channels
  ✅ Manage Roles
  ✅ View Audit Log
  ✅ Send Messages
  ✅ Use Slash Commands
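The checklist above can also be verified programmatically. The following is a minimal sketch, not part of AIMod: `REQUIRED_PERMISSIONS` and `missing_permissions` are hypothetical names, and the permission keys are assumed to follow discord.py 2.x attribute naming (verify them against your library version). It takes a plain mapping of permission names to booleans, which in discord.py can probably be obtained with something like `dict(guild.me.guild_permissions)`.

```python
# Hypothetical helper; the names below are illustrative and not part of AIMod.
# Permission keys are assumed to match discord.py 2.x attribute names.
REQUIRED_PERMISSIONS = (
    "ban_members",
    "kick_members",
    "manage_messages",
    "manage_channels",
    "manage_roles",
    "view_audit_log",
    "send_messages",
    "use_application_commands",
)


def missing_permissions(granted):
    """Return the required permissions that are absent or False in `granted`.

    `granted` maps permission names to booleans. Administrator implies
    every other permission, so it short-circuits the check.
    """
    if granted.get("administrator", False):
        return []
    return [name for name in REQUIRED_PERMISSIONS if not granted.get(name, False)]


if __name__ == "__main__":
    example = {"ban_members": True, "kick_members": True, "send_messages": True}
    for name in missing_permissions(example):
        print(f"Missing permission: {name}")
```

An empty result means the bot holds either Administrator or every permission on the list; role hierarchy and channel overrides still need to be checked separately.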
Database Issues
Connection Failures
Symptoms:
- "Database connection failed" errors
- Slow response times
- Configuration not saving
Diagnosis:
# Test PostgreSQL connection
psql -h localhost -U aimod_user -d aimod_bot -c "SELECT 1;"
# Check PostgreSQL status
sudo systemctl status postgresql
# View PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*-main.log
# Check connection pool status
python -c "
from database.connection import get_pool
import asyncio
async def test():
    pool = await get_pool()
    print(f'Pool size: {pool.get_size()}')
    print(f'Pool max size: {pool.get_max_size()}')
asyncio.run(test())
"
Solutions:
# Restart PostgreSQL
sudo systemctl restart postgresql
# Reset database permissions (connect to aimod_bot so the schema-level grants apply to that database)
sudo -u postgres psql -d aimod_bot << EOF
GRANT ALL PRIVILEGES ON DATABASE aimod_bot TO aimod_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO aimod_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO aimod_user;
EOF
# Increase connection limits (adjust the version directory, 13 here, to match your installation)
sudo nano /etc/postgresql/13/main/postgresql.conf
# max_connections = 200
# shared_buffers = 256MB
# Restart PostgreSQL after config changes
sudo systemctl restart postgresql
Migration Issues
Symptoms:
- Data not appearing after migration
- Inconsistent record counts
- Migration script failures
Solutions:
# Re-run migration with verbose output
python migrate_json_to_postgresql.py --verbose
# Validate migration
python -c "
from migrate_json_to_postgresql import validate_migration
import asyncio
asyncio.run(validate_migration())
"
# Check for partial migration
psql -h localhost -U aimod_user -d aimod_bot -c "
SELECT
'guild_config' as table_name,
COUNT(*) as record_count
FROM guild_config
UNION ALL
SELECT
'user_infractions' as table_name,
COUNT(*) as record_count
FROM user_infractions;
"
# Rollback and retry if needed
# (See Database Migration guide for rollback procedures)
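To make "inconsistent record counts" concrete, the counts returned by the query above can be compared against the counts expected from the JSON source. The sketch below is illustrative only: `find_count_mismatches` is a hypothetical helper, not part of the migration script, and it assumes you have already collected both sets of counts as dictionaries keyed by table name.

```python
def find_count_mismatches(source_counts, db_counts):
    """Return {table: (expected, actual)} for every table whose counts differ.

    `source_counts` holds the record counts expected from the JSON files,
    `db_counts` the counts reported by PostgreSQL. A table missing from
    `db_counts` is treated as having zero rows.
    """
    mismatches = {}
    for table, expected in source_counts.items():
        actual = db_counts.get(table, 0)
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    # Example numbers are made up for illustration.
    expected = {"guild_config": 10, "user_infractions": 5}
    actual = {"guild_config": 10, "user_infractions": 3}
    for table, (want, got) in find_count_mismatches(expected, actual).items():
        print(f"{table}: expected {want} rows, found {got}")
```

Any table reported here is a candidate for a partial migration and worth investigating before rolling back.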
AI Provider Issues
API Key Errors
Symptoms:
- "Invalid API key" errors
- AI moderation not working
- Authentication failures
Solutions:
# Test API key validity
curl -H "Authorization: Bearer $OPENROUTER_API_KEY" \
https://openrouter.ai/api/v1/models
# Check environment variables
echo $OPENROUTER_API_KEY
echo $OPENAI_API_KEY
echo $GITHUB_TOKEN
# Verify .env file
grep -E "(OPENROUTER|OPENAI|GITHUB)" .env
# Test LiteLLM integration
python test_litellm_integration.py
Rate Limiting
Symptoms:
- "Rate limit exceeded" errors
- Slow AI responses
- Intermittent failures
Solutions:
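# Implement exponential backoff
import asyncio
import random
# Assumes LiteLLM; replace with your provider's rate-limit exception if different
from litellm.exceptions import RateLimitError
async def retry_with_backoff(func, max_retries=3):
    """Retry an async callable on rate-limit errors, waiting 2^attempt seconds plus jitter."""
    for attempt in range(max_retries):
        try:
            return await func()
        except RateLimitError:
            # Out of retries: surface the error to the caller
            if attempt == max_retries - 1:
                raise
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(wait_time)
# Monitor API usage
# Check provider dashboard for usage limits
# Consider upgrading API plan if needed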
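To try the backoff pattern above without calling a real provider, it can be exercised against a simulated failure. This is a self-contained sketch under stated assumptions: `RateLimitError` here is a local stand-in for the provider's exception, and `make_flaky_call` and the `base_delay` parameter are hypothetical additions that keep the demo fast.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Local stand-in for the provider's rate-limit exception (assumption)."""


async def retry_with_backoff(func, max_retries=3, base_delay=1.0):
    """Retry `func` on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return await func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # retries exhausted
            # Delay doubles on each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


def make_flaky_call(failures):
    """Build a coroutine function that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    async def call():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RateLimitError("simulated rate limit")
        return "ok"

    return call


if __name__ == "__main__":
    # Two simulated failures, then success on the third attempt.
    print(asyncio.run(retry_with_backoff(make_flaky_call(2), base_delay=0.01)))
```

With the default `max_retries=3`, a call that fails three times in a row re-raises the error, so callers still need to handle the failure case.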
Redis Cache Issues
Connection Failures
Symptoms:
- Cache misses
- Slow configuration loading
- Session issues
Solutions:
# Test Redis connection
redis-cli ping
# Check Redis status
sudo systemctl status redis-server
# View Redis logs
sudo journalctl -u redis-server --since "1 hour ago"
# Clear Redis cache (FLUSHALL deletes every key in every database on this instance)
redis-cli FLUSHALL
# Restart Redis
sudo systemctl restart redis-server
Memory Issues
Symptoms:
- Redis out of memory errors
- Cache evictions
- Performance degradation
Solutions:
# Check Redis memory usage
redis-cli INFO memory
# Configure memory limits
sudo nano /etc/redis/redis.conf
# maxmemory 512mb
# maxmemory-policy allkeys-lru
# Track latency in 1-second windows to spot performance degradation
redis-cli --latency-history -i 1
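The output of `redis-cli INFO memory` can be turned into a single "how full is Redis" figure. The sketch below is one possible approach, not part of AIMod: `parse_redis_info` and `memory_usage_ratio` are hypothetical helpers that rely on the `used_memory` and `maxmemory` fields (both in bytes) in the INFO output.

```python
def parse_redis_info(text):
    """Parse `redis-cli INFO` output into a dict of field name -> string value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_ratio(fields):
    """Return used_memory / maxmemory, or None when no limit is configured."""
    maxmemory = int(fields.get("maxmemory", 0))
    if maxmemory == 0:  # 0 means Redis has no memory limit set
        return None
    return int(fields["used_memory"]) / maxmemory


if __name__ == "__main__":
    import sys

    # Usage: redis-cli INFO memory | python redis_memory_check.py  (file name is illustrative)
    ratio = memory_usage_ratio(parse_redis_info(sys.stdin.read()))
    if ratio is None:
        print("No maxmemory limit configured")
    else:
        print(f"Memory usage: {ratio:.1%} of maxmemory")
```

A ratio that stays close to 1.0 suggests evictions are likely under the `allkeys-lru` policy, and that the limit or the cached data volume should be revisited.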
🔧 Performance Issues
High CPU Usage
Diagnosis:
# Monitor CPU usage
htop
top -p "$(pgrep -d, -f 'python.*bot.py')"  # -d, joins multiple PIDs with commas, as top expects
# Check Python profiling
python -m cProfile -o profile.stats bot.py
python -c "
import pstats
p = pstats.Stats('profile.stats')
p.sort_stats('cumulative').print_stats(20)
"
Solutions:
# Optimize database queries
# Use connection pooling
# Implement caching for frequent operations
# Reduce AI API calls with smart caching
# Example optimization:
from cachetools import TTLCache
class OptimizedProcessor:
    def __init__(self):
        # Keep at most 1000 results, each for 5 minutes
        self.cache = TTLCache(maxsize=1000, ttl=300)
    async def process_message(self, message):
        cache_key = f"{message.guild.id}:{hash(message.content)}"
        if cache_key in self.cache:
            return self.cache[cache_key]
        # expensive_operation is a placeholder for the costly call being cached
        result = await self.expensive_operation(message)
        self.cache[cache_key] = result
        return result
High Memory Usage
Diagnosis:
# Monitor memory usage
free -h
ps aux | grep python
# Python memory profiling
pip install memory-profiler
python -m memory_profiler bot.py
Solutions:
# Implement proper cleanup
import gc
import weakref
class MemoryOptimizedCog:
    def __init__(self):
        # Entries disappear automatically once nothing else references the value
        self.cache = weakref.WeakValueDictionary()
    async def cog_unload(self):
        self.cache.clear()
        gc.collect()
# Use generators for large datasets
# (large_dataset_generator and process_item are placeholders)
async def process_large_dataset():
    async for item in large_dataset_generator():
        yield process_item(item)
# Limit cache sizes
from cachetools import TTLCache
cache = TTLCache(maxsize=1000, ttl=300)  # Limit size
Slow Database Queries
Diagnosis:
-- Enable query logging (log_statement = 'all' is very verbose; reset it after debugging)
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 1000; -- Log queries > 1s
SELECT pg_reload_conf();
-- Check slow queries (requires the pg_stat_statements extension)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Inspect column statistics to identify candidates for missing indexes
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
ORDER BY n_distinct DESC;
Solutions:
-- Add missing indexes
CREATE INDEX CONCURRENTLY idx_user_infractions_guild_user_time
ON user_infractions(guild_id, user_id, timestamp);
CREATE INDEX CONCURRENTLY idx_event_logs_guild_type_time
ON event_logs(guild_id, event_type, timestamp);
-- Optimize queries
-- Use EXPLAIN ANALYZE to understand query plans
EXPLAIN ANALYZE SELECT * FROM user_infractions
WHERE guild_id = 123456789 AND user_id = 987654321;
-- Update table statistics
ANALYZE user_infractions;
ANALYZE guild_config;
🌐 Dashboard Issues
Frontend Build Failures
Symptoms:
- Build process fails
- Missing dependencies
- TypeScript errors
Solutions:
# Clear node modules and reinstall
cd dashboard/frontend
rm -rf node_modules package-lock.json
npm install
# Fix TypeScript errors
npm run type-check
# Update dependencies
npm audit fix
npm update
# Build with verbose output
npm run build -- --verbose
Backend API Errors
Symptoms:
- 500 Internal Server Error
- Authentication failures
- CORS issues
Solutions:
# Check backend logs
journalctl -u aimod-backend.service --since "1 hour ago"
# Test API endpoints
curl -X GET http://localhost:8000/api/health
# Check CORS configuration
# Ensure frontend URL is in allowed origins
# Restart backend service
sudo systemctl restart aimod-backend.service
Authentication Issues
Symptoms:
- Login redirects fail
- JWT token errors
- Session timeouts
Solutions:
# Verify Discord OAuth2 configuration
echo $DISCORD_CLIENT_ID
echo $DISCORD_CLIENT_SECRET
echo $DISCORD_REDIRECT_URI
# Check JWT secret
echo $JWT_SECRET
# Test Discord API connectivity
curl -H "Authorization: Bot $DISCORD_TOKEN" \
https://discord.com/api/v10/users/@me
🔍 Debugging Tools
Log Analysis
# Centralized log viewing
tail -f /var/log/aimod/*.log
# Search for specific errors
grep -r "ERROR" /var/log/aimod/
grep -r "CRITICAL" /var/log/aimod/
# Filter by time range
journalctl -u aimod-bot.service --since "2025-01-13 10:00:00" --until "2025-01-13 11:00:00"
# Follow logs in real-time
journalctl -u aimod-bot.service -f
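When `grep` output becomes too noisy, a per-level summary helps show whether errors are isolated or widespread. The following is a small sketch, assuming the bot's log lines contain standard Python logging level names in upper case; `count_levels` and the script name used in the usage comment are hypothetical.

```python
import re
import sys
from collections import Counter

# Standard Python logging level names; assumed to appear verbatim in each log line.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def count_levels(lines):
    """Count log lines per level, using the first level name found on each line."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage (script name is illustrative):
    #   journalctl -u aimod-bot.service --since "1 hour ago" | python count_levels.py
    for level, total in count_levels(sys.stdin).most_common():
        print(f"{level}: {total}")
```

Lines without a recognizable level are ignored, so the totals may be lower than the raw line count.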
Database Debugging
-- Check active connections
SELECT pid, usename, application_name, client_addr, state, query_start, query
FROM pg_stat_activity
WHERE datname = 'aimod_bot';
-- Check table sizes
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Performance Monitoring
#!/bin/bash
# System monitoring script
echo "=== System Status $(date) ==="
echo "CPU: $(top -bn1 | grep "Cpu(s)" | awk '{print $2}')"
echo "Memory: $(free | grep Mem | awk '{printf "%.1f%%", $3/$2 * 100.0}')"
echo "Disk: $(df -h / | awk 'NR==2{print $5}')"
echo "=== Service Status ==="
systemctl is-active aimod-bot.service
systemctl is-active aimod-backend.service
systemctl is-active postgresql.service
systemctl is-active redis-server.service
echo "=== Database Connections ==="
sudo -u postgres psql -d aimod_bot -c "SELECT count(*) FROM pg_stat_activity;"
echo "=== Recent Errors ==="
journalctl -u aimod-bot.service --since "1 hour ago" | grep -i error | tail -5
📞 Getting Help
Community Support
- Discord Server: Join our community for real-time help
- GitHub Issues: Report bugs and request features
- Documentation: Check existing guides first
Professional Support
- Priority Support: Available for production deployments
- Custom Development: Feature development and customization
- Consulting: Architecture and optimization consulting
Reporting Issues
When reporting issues, include:
- Environment Information:
  - Operating system and version
  - Python version
  - PostgreSQL version
  - Redis version
- Error Details:
  - Complete error messages
  - Stack traces
  - Relevant log entries
- Reproduction Steps:
  - Steps to reproduce the issue
  - Expected vs actual behavior
  - Configuration details
- System Information:
# Generate system report
echo "=== System Information ===" > debug_report.txt
uname -a >> debug_report.txt
python --version >> debug_report.txt
psql --version >> debug_report.txt
redis-server --version >> debug_report.txt
echo "=== Service Status ===" >> debug_report.txt
systemctl status aimod-bot.service >> debug_report.txt
echo "=== Recent Logs ===" >> debug_report.txt
journalctl -u aimod-bot.service --since "1 hour ago" >> debug_report.txt
For additional support, visit our Discord Server or create an issue on GitHub.