# LLM & GPU Support

## Overview
reNgine-ng v2.2.0 reworks the feature introduced in v2.0.0: AI-driven analysis through Large Language Models (LLMs), with optional GPU acceleration for better performance.
## LLM Integration

### What is LLM in reNgine-ng?
LLM (Large Language Model) integration allows reNgine-ng to automatically analyze vulnerabilities and generate comprehensive reports with:
- Attack Surface Analysis: Get a complete attack surface analysis of a subdomain based on the detected technologies
- Vulnerability Analysis: Detailed impact assessment and technical explanations
- Remediation Guidance: Step-by-step fixes and mitigation strategies
- Risk Assessment: Business impact and priority recommendations
- Report Generation: Human-readable vulnerability reports
### Supported LLM Providers

#### Local Models (Ollama)
- Pros: Privacy, no API costs, offline capability
- Cons: Requires local resources, setup complexity
- Requirements: Docker with optional GPU support
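If you want to verify that the bundled Ollama instance is reachable before downloading models, you can query its HTTP API directly. A minimal sketch, assuming the default `OLLAMA_INSTANCE` address shown later in this guide and Ollama's standard `/api/tags` endpoint (which lists locally available models); the `web` service name is an assumption and may differ in your compose setup:

```bash
# Query the Ollama API from another container on the same Docker network
docker compose exec web curl -s http://ollama:11434/api/tags
# An empty "models" array means Ollama is running but no model has been downloaded yet
```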
#### Remote Models (OpenAI)
- Pros: No local resources needed, always available
- Cons: API costs, requires internet, data privacy considerations
- Requirements: OpenAI API key
## GPU Support

### Why GPU Acceleration?
GPU acceleration significantly improves LLM performance:
- Speed: 3-10x faster inference compared to CPU
- Efficiency: Better resource utilization
- Scalability: Handle multiple LLM requests simultaneously
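A quick way to gauge the difference on your own hardware is to time the same prompt with and without GPU support enabled. This is a rough sketch, assuming the Ollama container is named `rengine-ollama-1` (as in the Advanced Usage example below) and that a small model such as `llama3.2:3b` has already been downloaded:

```bash
# Run once before and once after `make up GPU=1`, then compare the wall-clock times
time docker exec rengine-ollama-1 ollama run llama3.2:3b "Explain SQL injection in two sentences."
```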
### Supported GPU Types

#### NVIDIA GPUs
- Requirements: NVIDIA drivers, Docker with nvidia-runtime
- Automatic Detection: reNgine-ng automatically detects NVIDIA GPUs
- Models: All Ollama models support NVIDIA acceleration
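If `docker info` does not list the nvidia runtime, one common way to add it on Debian/Ubuntu hosts is the NVIDIA Container Toolkit. This is a sketch only; it assumes the NVIDIA drivers and the toolkit's apt repository are already installed, and other distributions use different package managers:

```bash
# Install the toolkit, register the nvidia runtime with Docker, and restart the daemon
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```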
#### AMD GPUs
- Requirements: ROCm drivers, compatible AMD GPU
- Automatic Detection: reNgine-ng automatically detects AMD ROCm support
- Models: Limited model support depending on ROCm compatibility
## Getting Started

### 1. Setup with CPU Only (Default)

```bash
# Start reNgine-ng normally
make up
```

### 2. Setup with GPU Support

```bash
# Enable GPU support during startup
make up GPU=1

# Or for development mode
make dev_up GPU=1

# Building with GPU support
make build GPU=1
```
> [!NOTE]
> GPU detection is automatic. reNgine-ng will detect your GPU type (NVIDIA/AMD) and configure accordingly.
### 3. Configure LLM Models

- Navigate to Settings → LLM Toolkit
- Choose your LLM strategy:
  - Local Ollama: Download and use local models
  - OpenAI: Configure API key in Settings → API Vault
## LLM Model Management

### Downloading Local Models
The LLM Toolkit interface provides:
- Recommended Models: Curated list with performance requirements
- Real-time Progress: WebSocket-based download progress
- Resource Requirements: RAM and storage requirements for each model
- Model Comparison: Feature comparison and best use cases
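Downloads can also be checked from the command line. A small sketch, assuming the Ollama container name used in the Advanced Usage example:

```bash
# List the models currently available to the local Ollama instance
docker exec -it rengine-ollama-1 ollama list
```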
### Model Recommendations

#### For Bug Bounty & Light Usage
- llama3.2:3b (2GB RAM)
- mistral:7b (4GB RAM)
#### For Professional Use
- llama3.1:8b (5GB RAM)
- codellama:13b (8GB RAM)
#### For Enterprise/Heavy Usage
- llama3.1:70b (35GB RAM)
- codellama:34b (20GB RAM)
> [!WARNING]
> **Hardware Requirements**
> Ensure your system has sufficient RAM before downloading large models. The interface shows requirements for each model.
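The recommended models are normally downloaded through the LLM Toolkit interface, but they can also be pulled manually via the Ollama container (container name as in the Advanced Usage example below):

```bash
# Pull one of the lightweight recommended models directly through Ollama
docker exec -it rengine-ollama-1 ollama pull llama3.2:3b
```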
## Configuration Guide

### GPU Environment Variables
The following variables are automatically configured but can be overridden:
```bash
# GPU Support (0=disabled, 1=enabled)
GPU=1

# Automatically detected (nvidia/amd/none)
GPU_TYPE=nvidia
DOCKER_RUNTIME=nvidia

# Ollama instance (default points to local)
OLLAMA_INSTANCE=http://ollama:11434
```
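If Ollama runs on a separate machine (for example a dedicated GPU server), `OLLAMA_INSTANCE` can point at that host instead of the bundled container. A minimal sketch, assuming your deployment reads these variables from a `.env` file and that the remote host exposes Ollama's default port; the IP address is a placeholder:

```bash
# .env — use a remote Ollama server instead of the local container
GPU=0                                      # no local GPU needed when inference runs remotely
OLLAMA_INSTANCE=http://192.168.1.50:11434  # placeholder address for your Ollama host
```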
### Scan Engine Configuration
Enable LLM analysis in your scan engines:
```yaml
vulnerability_scan: {
  'run_nuclei': true,
  'fetch_llm_report': true,   # Enable LLM analysis
  'nuclei': {
    'severities': ['medium', 'high', 'critical']
  }
}
```
### API Configuration
For OpenAI integration:
- Go to Settings → API Vault
- Add your OpenAI API key
- Select a GPT model in the LLM Toolkit
## Performance Optimization

### GPU Memory Management
Monitor GPU usage during scans:
```bash
# NVIDIA GPU monitoring
nvidia-smi

# AMD GPU monitoring
rocm-smi
```
### Model Selection Strategy
Choose models based on your use case:
| Use Case | Recommended Model | RAM Required | Speed | Quality |
|---|---|---|---|---|
| Quick Analysis | llama3.2:3b | 2GB | Fastest | Good |
| Balanced | mistral:7b | 4GB | Fast | Very Good |
| Professional | llama3.1:8b | 5GB | Moderate | Very Good |
| Enterprise | llama3.1:70b | 35GB | Slow | Excellent |
### Threading Optimization
Adjust concurrent scans based on resources:
```yaml
# Conservative (low resources)
vulnerability_scan: {
  'concurrency': 10,
  'rate_limit': 10
}
```

```yaml
# Aggressive (high resources + GPU)
vulnerability_scan: {
  'concurrency': 50,
  'rate_limit': 100
}
```
## Troubleshooting

### Common GPU Issues

#### NVIDIA GPU Not Detected

```bash
# Check NVIDIA runtime
docker info | grep nvidia

# Verify GPU access
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```
#### AMD GPU Not Detected

```bash
# Check ROCm installation
rocm-smi

# Verify Docker ROCm support
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi
```
### LLM Model Issues

#### Model Download Fails
- Check available disk space
- Verify internet connection
- Try smaller models first
- Check Docker container logs
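For the last point, the Ollama container logs usually reveal why a download failed (insufficient disk space, network timeouts, and so on). Assuming the container name from the Advanced Usage example:

```bash
# Follow the Ollama container logs while a model download is in progress
docker logs -f rengine-ollama-1
```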
#### Model Performance Issues
- Monitor system RAM usage
- Consider GPU vs CPU performance
- Adjust concurrency settings
- Try different model variants
### Memory Issues

#### Out of Memory Errors
```bash
# Check system resources
free -h
df -h

# Monitor Docker containers
docker stats

# Reduce model size or concurrency if memory is exhausted
```
## Best Practices

### Security Considerations
- Local Models: Keep models updated for security patches
- API Keys: Use environment variables, never commit to repositories
- Data Privacy: Consider data sensitivity when choosing local vs remote
- Network Security: Secure API communication channels
### Cost Optimization
- OpenAI Usage: Monitor API usage and set billing alerts
- Local Resources: Balance model size with available hardware
- Scan Frequency: Use LLM analysis selectively on high-value findings (see the example after this list)
- Caching: Leverage result caching to avoid redundant analysis
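One way to apply the Scan Frequency advice above is to enable `fetch_llm_report` only in an engine whose nuclei severities are limited to the findings you care most about. A sketch reusing the scan engine keys from the Configuration Guide; the severity selection is just an example:

```yaml
vulnerability_scan: {
  'run_nuclei': true,
  'fetch_llm_report': true,              # LLM analysis only in this engine
  'nuclei': {
    'severities': ['high', 'critical']   # restrict LLM usage to high-value findings
  }
}
```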
### Performance Tips
- GPU Utilization: Monitor GPU usage and adjust concurrency
- Model Warming: Keep frequently used models loaded (see the sketch after this list)
- Batch Processing: Group similar vulnerabilities for analysis
- Resource Monitoring: Set up alerts for resource exhaustion
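For the Model Warming tip above, Ollama can keep a model resident in memory via the `keep_alive` parameter of its generate API (a value of `-1` keeps it loaded until the server restarts). A sketch, assuming the default Ollama address used elsewhere in this guide and that the request is made from a container on the same Docker network:

```bash
# Preload a model so the first LLM request of a scan doesn't pay the cold-start cost
curl http://ollama:11434/api/generate -d '{"model": "llama3.1:8b", "keep_alive": -1}'
```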
### Scaling Strategies
- Horizontal Scaling: Run multiple reNgine-ng instances
- Model Distribution: Use different models for different scan types
- Load Balancing: Distribute LLM requests across instances
- Resource Allocation: Dedicate resources per scan intensity
## Advanced Usage

### Custom Model Integration
Add custom Ollama models:
1. Pull the model manually:
   ```bash
   docker exec -it rengine-ollama-1 ollama pull custom-model
   ```
2. The model will appear in the LLM Toolkit interface
3. Select and configure it for use
### Multi-GPU Setup
For multiple GPUs:
```bash
# Use specific GPU
CUDA_VISIBLE_DEVICES=0 make up GPU=1

# Use multiple GPUs (advanced)
CUDA_VISIBLE_DEVICES=0,1 make up GPU=1
```
### Integration with CI/CD
Example pipeline integration:
```yaml
- name: Run reNgine with LLM
  run: |
    make up GPU=1
    # Wait for services
    sleep 60
    # Run scan with LLM analysis
    curl -X POST "http://localhost/api/scan/" \
      -H "Content-Type: application/json" \
      -d '{"target": "example.com", "engine": "Initial Scan - reNgine recommended"}'
```
Remember: LLM integration enhances your reconnaissance capabilities but should complement, not replace, manual security analysis and verification.