
🤖 LLM & GPU Support

Overview

reNgine-ng v2.2.0 reworks the feature introduced in v2.0.0: AI-driven analysis through Large Language Models (LLMs), with optional GPU acceleration for better performance.

-----------------------------------------------------

🧠 LLM Integration

What is LLM in reNgine-ng?

LLM (Large Language Model) integration allows reNgine-ng to automatically analyze vulnerabilities and generate comprehensive reports with:

  • Attack Surface Analysis: A complete attack surface analysis of a subdomain, based on the technologies detected on it
  • Vulnerability Analysis: Detailed impact assessment and technical explanations
  • Remediation Guidance: Step-by-step fixes and mitigation strategies
  • Risk Assessment: Business impact and priority recommendations
  • Report Generation: Human-readable vulnerability reports

Supported LLM Providers

🏠 Local Models (Ollama)

  • Pros: Privacy, no API costs, offline capability
  • Cons: Requires local resources, setup complexity
  • Requirements: Docker with optional GPU support

โ˜๏ธ Remote Models (OpenAI)

  • Pros: No local resources needed, always available
  • Cons: API costs, requires internet, data privacy considerations
  • Requirements: OpenAI API key

-----------------------------------------------------

🖥️ GPU Support

Why GPU Acceleration?

GPU acceleration significantly improves LLM performance:

  • Speed: 3-10x faster inference compared to CPU
  • Efficiency: Better resource utilization
  • Scalability: Handle multiple LLM requests simultaneously

Supported GPU Types

🟢 NVIDIA GPUs

  • Requirements: NVIDIA drivers, Docker with nvidia-runtime
  • Automatic Detection: reNgine-ng automatically detects NVIDIA GPUs
  • Models: All Ollama models support NVIDIA acceleration

🔴 AMD GPUs

  • Requirements: ROCm drivers, compatible AMD GPU
  • Automatic Detection: reNgine-ng automatically detects AMD ROCm support
  • Models: Limited model support depending on ROCm compatibility

-----------------------------------------------------

🚀 Getting Started

1. Setup with CPU Only (Default)

# Start reNgine-ng normally
make up

2. Setup with GPU Support

# Enable GPU support during startup
make up GPU=1

# Or for development mode
make dev_up GPU=1

# Building with GPU support
make build GPU=1

[!NOTE]
GPU detection is automatic. reNgine-ng will detect your GPU type (NVIDIA/AMD) and configure accordingly.
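
To confirm what was detected after startup, you can reuse the host-side checks from the Troubleshooting section further down:

# Show which runtimes Docker exposes (look for "nvidia")
docker info | grep -i runtime

# Host-side GPU visibility (NVIDIA and AMD respectively)
nvidia-smi
rocm-smi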

3. Configure LLM Models

  1. Navigate to Settings → LLM Toolkit
  2. Choose your LLM strategy:
    • Local Ollama: Download and use local models
    • OpenAI: Configure API key in Settings → API Vault

-----------------------------------------------------

🔧 LLM Model Management

Downloading Local Models

The LLM Toolkit interface provides:

  • Recommended Models: Curated list with performance requirements
  • Real-time Progress: WebSocket-based download progress
  • Resource Requirements: RAM and storage requirements for each model
  • Model Comparison: Feature comparison and best use cases
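
If you prefer the command line, models can also be pulled directly through the Ollama container and then show up in the LLM Toolkit interface. A minimal sketch, using the container name from the Advanced Usage section below (adjust it if your Compose project is named differently):

# Pull one of the recommended models and list what is installed
docker exec -it rengine-ollama-1 ollama pull llama3.2:3b
docker exec -it rengine-ollama-1 ollama list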

Model Recommendations

🎯 For Bug Bounty & Light Usage

  • llama3.2:3b (2GB RAM)
  • mistral:7b (4GB RAM)

⚡ For Professional Use

  • llama3.1:8b (5GB RAM)
  • codellama:13b (8GB RAM)

💪 For Enterprise/Heavy Usage

  • llama3.1:70b (35GB RAM)
  • codellama:34b (20GB RAM)

[!WARNING]
Hardware Requirements
Ensure your system has sufficient RAM before downloading large models. The interface shows requirements for each model.
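
A quick pre-flight check before pulling a large model (the same commands appear again under Troubleshooting):

# Free RAM and available disk space on the host
free -h
df -h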

-----------------------------------------------------

⚙️ Configuration Guide

GPU Environment Variables

The following variables are automatically configured but can be overridden:

# GPU Support (0=disabled, 1=enabled)
GPU=1

# Automatically detected (nvidia/amd/none)
GPU_TYPE=nvidia
DOCKER_RUNTIME=nvidia

# Ollama instance (default points to local)
OLLAMA_INSTANCE=http://ollama:11434
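
For example, to point reNgine-ng at an Ollama server running on another machine instead of the bundled container, override the instance URL. This is a sketch: the host and port below are placeholders, and where you set the variables depends on how you deploy (typically the environment file read by Docker Compose).

# Use a remote Ollama server and skip local GPU configuration
GPU=0
OLLAMA_INSTANCE=http://192.168.1.50:11434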

Scan Engine Configuration

Enable LLM analysis in your scan engines:

vulnerability_scan: {
  'run_nuclei': true,
  'fetch_llm_report': true,  # Enable LLM analysis
  'nuclei': {
    'severities': ['medium', 'high', 'critical']
  }
}

API Configuration

For OpenAI integration:

  1. Go to Settings → API Vault
  2. Add your OpenAI API key
  3. Select GPT model in LLM Toolkit
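
To sanity-check a key before saving it in the API Vault, a direct call to the OpenAI models endpoint is enough (this assumes the key is exported as OPENAI_API_KEY in your shell):

# A valid key returns a JSON list of available models
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"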

-----------------------------------------------------

📊 Performance Optimization

GPU Memory Management

Monitor GPU usage during scans:

# NVIDIA GPU monitoring
nvidia-smi

# AMD GPU monitoring  
rocm-smi
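
For continuous monitoring during a long scan, either tool can be wrapped in watch:

# Refresh GPU utilization every 2 seconds (Ctrl+C to stop)
watch -n 2 nvidia-smi
watch -n 2 rocm-smi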

Model Selection Strategy

Choose models based on your use case:

| Use Case | Recommended Model | RAM Required | Speed | Quality |
|----------|-------------------|--------------|-------|---------|
| Quick Analysis | llama3.2:3b | 2GB | ⚡⚡⚡ | 🟡 Good |
| Balanced | mistral:7b | 4GB | ⚡⚡ | 🟢 Very Good |
| Professional | llama3.1:8b | 5GB | ⚡ | 🟢 Very Good |
| Enterprise | llama3.1:70b | 35GB | 🐌 | 🟢 Excellent |

Threading Optimization

Adjust concurrent scans based on resources:

# Conservative (low resources)
vulnerability_scan: {
  'concurrency': 10,
  'rate_limit': 10
}

# Aggressive (high resources + GPU)
vulnerability_scan: {
  'concurrency': 50,
  'rate_limit': 100
}

-----------------------------------------------------

🔍 Troubleshooting

Common GPU Issues

NVIDIA GPU Not Detected

# Check NVIDIA runtime
docker info | grep nvidia

# Verify GPU access
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

AMD GPU Not Detected

# Check ROCm installation
rocm-smi

# Verify Docker ROCm support
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi

LLM Model Issues

Model Download Fails

  • Check available disk space
  • Verify internet connection
  • Try smaller models first
  • Check Docker container logs (see the commands below)
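
For the disk space and log checks, the commands look like this (the container name matches the one used in the Advanced Usage section; adjust it to your setup):

# Disk space available on the host
df -h

# Follow the Ollama container logs while a download is running
docker logs -f rengine-ollama-1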

Model Performance Issues

  • Monitor system RAM usage
  • Consider GPU vs CPU performance
  • Adjust concurrency settings
  • Try different model variants

Memory Issues

Out of Memory Errors

# Check system resources
free -h
df -h

# Monitor Docker containers
docker stats

# Reduce model size or concurrency

-----------------------------------------------------

🎯 Best Practices

🔒 Security Considerations

  • Local Models: Keep models updated for security patches
  • API Keys: Use environment variables, never commit to repositories
  • Data Privacy: Consider data sensitivity when choosing local vs remote
  • Network Security: Secure API communication channels

💰 Cost Optimization

  • OpenAI Usage: Monitor API usage and set billing alerts
  • Local Resources: Balance model size with available hardware
  • Scan Frequency: Use LLM analysis selectively on high-value findings
  • Caching: Leverage result caching to avoid redundant analysis

⚡ Performance Tips

  • GPU Utilization: Monitor GPU usage and adjust concurrency
  • Model Warming: Keep frequently used models loaded
  • Batch Processing: Group similar vulnerabilities for analysis
  • Resource Monitoring: Set up alerts for resource exhaustion

📈 Scaling Strategies

  • Horizontal Scaling: Run multiple reNgine-ng instances
  • Model Distribution: Use different models for different scan types
  • Load Balancing: Distribute LLM requests across instances
  • Resource Allocation: Dedicate resources per scan intensity

-----------------------------------------------------

🔬 Advanced Usage

Custom Model Integration

Add custom Ollama models:

  1. Pull model manually: docker exec -it rengine-ollama-1 ollama pull custom-model
  2. The model will appear in the LLM Toolkit interface
  3. Select and configure for use

Multi-GPU Setup

For multiple GPUs:

# Use specific GPU
CUDA_VISIBLE_DEVICES=0 make up GPU=1

# Use multiple GPUs (advanced)
CUDA_VISIBLE_DEVICES=0,1 make up GPU=1

Integration with CI/CD

Example pipeline integration:

- name: Run reNgine with LLM
  run: |
    make up GPU=1
    # Wait for services
    sleep 60
    # Run scan with LLM analysis
    curl -X POST "http://localhost/api/scan/" \
      -H "Content-Type: application/json" \
      -d '{"target": "example.com", "engine": "Initial Scan - reNgine recommended"}'
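
The fixed sleep works but is fragile; a more robust variant polls until the web interface answers before starting the scan (a sketch, assuming the same http://localhost base URL as above):

# Wait until reNgine-ng responds instead of sleeping a fixed time
until curl -sf -o /dev/null http://localhost/; do
  echo "Waiting for reNgine-ng..."
  sleep 5
done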

Remember: LLM integration enhances your reconnaissance capabilities but should complement, not replace, manual security analysis and verification.