# LLM & GPU Support

## Overview
reNgine-ng v2.2.0 reworks the feature introduced in v2.0.0: AI-driven analysis through Large Language Models (LLMs), with optional GPU acceleration for better performance.
## LLM Integration

### What is LLM in reNgine-ng?
LLM (Large Language Model) integration allows reNgine-ng to automatically analyze vulnerabilities and generate comprehensive reports with:
- Attack Surface Analysis: Get a complete attack surface analysis of a subdomain based on the detected technologies
- Vulnerability Analysis: Detailed impact assessment and technical explanations
- Remediation Guidance: Step-by-step fixes and mitigation strategies
- Risk Assessment: Business impact and priority recommendations
- Report Generation: Human-readable vulnerability reports
### Supported LLM Providers

#### Local Models (Ollama)
- Pros: Privacy, no API costs, offline capability
- Cons: Requires local resources, setup complexity
- Requirements: Docker with optional GPU support
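If you want to verify that the bundled Ollama instance is reachable before downloading models, you can query its HTTP API directly. A minimal sketch, assuming the default `OLLAMA_INSTANCE` address shown later in this guide and Ollama's standard `/api/tags` endpoint (which lists locally available models); the `web` service name is an assumption and may differ in your compose setup:

```bash
# Query the Ollama API from another container on the same Docker network
docker compose exec web curl -s http://ollama:11434/api/tags
# An empty "models" array means Ollama is running but no model has been downloaded yet
```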
#### Remote Models (OpenAI)
- Pros: No local resources needed, always available
- Cons: API costs, requires internet, data privacy considerations
- Requirements: OpenAI API key
## GPU Support

### Why GPU Acceleration?
GPU acceleration significantly improves LLM performance:
- Speed: 3-10x faster inference compared to CPU
- Efficiency: Better resource utilization
- Scalability: Handle multiple LLM requests simultaneously
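A quick way to gauge the difference on your own hardware is to time the same prompt with and without GPU support enabled. This is a rough sketch, assuming the Ollama container is named `rengine-ollama-1` (as in the Advanced Usage example below) and that a small model such as `llama3.2:3b` has already been downloaded:

```bash
# Run once before and once after `make up GPU=1`, then compare the wall-clock times
time docker exec rengine-ollama-1 ollama run llama3.2:3b "Explain SQL injection in two sentences."
```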
### Supported GPU Types

#### NVIDIA GPUs
- Requirements: NVIDIA drivers, Docker with nvidia-runtime
- Automatic Detection: reNgine-ng automatically detects NVIDIA GPUs
- Models: All Ollama models support NVIDIA acceleration
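If `docker info` does not list the nvidia runtime, one common way to add it on Debian/Ubuntu hosts is the NVIDIA Container Toolkit. This is a sketch only; it assumes the NVIDIA drivers and the toolkit's apt repository are already installed, and other distributions use different package managers:

```bash
# Install the toolkit, register the nvidia runtime with Docker, and restart the daemon
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```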
#### AMD GPUs
- Requirements: ROCm drivers, compatible AMD GPU
- Automatic Detection: reNgine-ng automatically detects AMD ROCm support
- Models: Limited model support depending on ROCm compatibility
## Getting Started

### 1. Setup with CPU Only (Default)

```bash
# Start reNgine-ng normally
make up
```

### 2. Setup with GPU Support

```bash
# Enable GPU support during startup
make up GPU=1

# Or for development mode
make dev_up GPU=1

# Building with GPU support
make build GPU=1
```
> [!NOTE]
> GPU detection is automatic. reNgine-ng will detect your GPU type (NVIDIA/AMD) and configure accordingly.
### 3. Configure LLM Models

- Navigate to Settings → LLM Toolkit
- Choose your LLM strategy:
  - Local Ollama: Download and use local models
  - OpenAI: Configure API key in Settings → API Vault
## LLM Model Management

### Downloading Local Models
The LLM Toolkit interface provides:
- Recommended Models: Curated list with performance requirements
- Real-time Progress: WebSocket-based download progress
- Resource Requirements: RAM and storage requirements for each model
- Model Comparison: Feature comparison and best use cases
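Downloads can also be checked from the command line. A small sketch, assuming the Ollama container name used in the Advanced Usage example:

```bash
# List the models currently available to the local Ollama instance
docker exec -it rengine-ollama-1 ollama list
```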
### Model Recommendations

#### For Bug Bounty & Light Usage
- llama3.2:3b (2GB RAM)
- mistral:7b (4GB RAM)
#### For Professional Use
- llama3.1:8b (5GB RAM)
- codellama:13b (8GB RAM)
#### For Enterprise/Heavy Usage
- llama3.1:70b (35GB RAM)
- codellama:34b (20GB RAM)
> [!WARNING]
> **Hardware Requirements**
> Ensure your system has sufficient RAM before downloading large models. The interface shows requirements for each model.
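The recommended models are normally downloaded through the LLM Toolkit interface, but they can also be pulled manually via the Ollama container (container name as in the Advanced Usage example below):

```bash
# Pull one of the lightweight recommended models directly through Ollama
docker exec -it rengine-ollama-1 ollama pull llama3.2:3b
```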
## Configuration Guide

### GPU Environment Variables
The following variables are automatically configured but can be overridden:
```bash
# GPU Support (0=disabled, 1=enabled)
GPU=1

# Automatically detected (nvidia/amd/none)
GPU_TYPE=nvidia
DOCKER_RUNTIME=nvidia

# Ollama instance (default points to local)
OLLAMA_INSTANCE=http://ollama:11434
```
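If Ollama runs on a separate machine (for example a dedicated GPU server), `OLLAMA_INSTANCE` can point at that host instead of the bundled container. A minimal sketch, assuming your deployment reads these variables from a `.env` file and that the remote host exposes Ollama's default port; the IP address is a placeholder:

```bash
# .env — use a remote Ollama server instead of the local container
GPU=0                                      # no local GPU needed when inference runs remotely
OLLAMA_INSTANCE=http://192.168.1.50:11434  # placeholder address for your Ollama host
```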
### Scan Engine Configuration
Enable LLM analysis in your scan engines:
```yaml
vulnerability_scan: {
  'run_nuclei': true,
  'fetch_llm_report': true,   # Enable LLM analysis
  'nuclei': {
    'severities': ['medium', 'high', 'critical']
  }
}
```
### API Configuration
For OpenAI integration:
- Go to Settings → API Vault
- Add your OpenAI API key
- Select a GPT model in the LLM Toolkit
## Performance Optimization

### GPU Memory Management
Monitor GPU usage during scans:
```bash
# NVIDIA GPU monitoring
nvidia-smi

# AMD GPU monitoring
rocm-smi
```
### Model Selection Strategy
Choose models based on your use case:
| Use Case | Recommended Model | RAM Required | Speed | Quality |
|---|---|---|---|---|
| Quick Analysis | llama3.2:3b | 2GB | Fastest | Good |
| Balanced | mistral:7b | 4GB | Fast | Very Good |
| Professional | llama3.1:8b | 5GB | Moderate | Very Good |
| Enterprise | llama3.1:70b | 35GB | Slow | Excellent |
### Threading Optimization
Adjust concurrent scans based on resources:
```yaml
# Conservative (low resources)
vulnerability_scan: {
  'concurrency': 10,
  'rate_limit': 10
}
```

```yaml
# Aggressive (high resources + GPU)
vulnerability_scan: {
  'concurrency': 50,
  'rate_limit': 100
}
```
## Troubleshooting

### Common GPU Issues

#### NVIDIA GPU Not Detected

```bash
# Check NVIDIA runtime
docker info | grep nvidia

# Verify GPU access
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```
#### AMD GPU Not Detected

```bash
# Check ROCm installation
rocm-smi

# Verify Docker ROCm support
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi
```
### LLM Model Issues

#### Model Download Fails
- Check available disk space
- Verify internet connection
- Try smaller models first
- Check Docker container logs
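For the last point, the Ollama container logs usually reveal why a download failed (insufficient disk space, network timeouts, and so on). Assuming the container name from the Advanced Usage example:

```bash
# Follow the Ollama container logs while a model download is in progress
docker logs -f rengine-ollama-1
```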
#### Model Performance Issues
- Monitor system RAM usage
- Consider GPU vs CPU performance
- Adjust concurrency settings
- Try different model variants
### Memory Issues

#### Out of Memory Errors
```bash
# Check system resources
free -h
df -h

# Monitor Docker containers
docker stats

# Reduce model size or concurrency if memory is exhausted
```
## Best Practices

### Security Considerations
- Local Models: Keep models updated for security patches
- API Keys: Use environment variables, never commit to repositories
- Data Privacy: Consider data sensitivity when choosing local vs remote
- Network Security: Secure API communication channels
### Cost Optimization
- OpenAI Usage: Monitor API usage and set billing alerts
- Local Resources: Balance model size with available hardware
- Scan Frequency: Use LLM analysis selectively on high-value findings (see the example after this list)
- Caching: Leverage result caching to avoid redundant analysis
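One way to apply the Scan Frequency advice above is to enable `fetch_llm_report` only in an engine whose nuclei severities are limited to the findings you care most about. A sketch reusing the scan engine keys from the Configuration Guide; the severity selection is just an example:

```yaml
vulnerability_scan: {
  'run_nuclei': true,
  'fetch_llm_report': true,              # LLM analysis only in this engine
  'nuclei': {
    'severities': ['high', 'critical']   # restrict LLM usage to high-value findings
  }
}
```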
### Performance Tips
- GPU Utilization: Monitor GPU usage and adjust concurrency
- Model Warming: Keep frequently used models loaded (see the sketch after this list)
- Batch Processing: Group similar vulnerabilities for analysis
- Resource Monitoring: Set up alerts for resource exhaustion
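For the Model Warming tip above, Ollama can keep a model resident in memory via the `keep_alive` parameter of its generate API (a value of `-1` keeps it loaded until the server restarts). A sketch, assuming the default Ollama address used elsewhere in this guide and that the request is made from a container on the same Docker network:

```bash
# Preload a model so the first LLM request of a scan doesn't pay the cold-start cost
curl http://ollama:11434/api/generate -d '{"model": "llama3.1:8b", "keep_alive": -1}'
```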
### Scaling Strategies
- Horizontal Scaling: Run multiple reNgine-ng instances
- Model Distribution: Use different models for different scan types
- Load Balancing: Distribute LLM requests across instances
- Resource Allocation: Dedicate resources per scan intensity
## Advanced Usage

### Custom Model Integration
Add custom Ollama models:
1. Pull the model manually:
   ```bash
   docker exec -it rengine-ollama-1 ollama pull custom-model
   ```
2. The model will appear in the LLM Toolkit interface
3. Select and configure it for use
### Multi-GPU Setup
For multiple GPUs:
```bash
# Use specific GPU
CUDA_VISIBLE_DEVICES=0 make up GPU=1

# Use multiple GPUs (advanced)
CUDA_VISIBLE_DEVICES=0,1 make up GPU=1
```
### Integration with CI/CD
Example pipeline integration:
```yaml
- name: Run reNgine with LLM
  run: |
    make up GPU=1
    # Wait for services
    sleep 60
    # Run scan with LLM analysis
    curl -X POST "http://localhost/api/scan/" \
      -H "Content-Type: application/json" \
      -d '{"target": "example.com", "engine": "Initial Scan - reNgine recommended"}'
```
Remember: LLM integration enhances your reconnaissance capabilities but should complement, not replace, manual security analysis and verification.