User Guide - pascaldisse/open-sourcefy GitHub Wiki
User Guide
Complete guide for using Open-Sourcefy to decompile Windows PE executables into compilable C source code.
Overview
Open-Sourcefy provides multiple ways to analyze and decompile binary executables through its 17-agent Matrix pipeline. This guide covers all usage scenarios from basic analysis to advanced customization.
Basic Usage
First Analysis {#first-analysis}
To run your first analysis with Open-Sourcefy:
# Start with the default binary
python main.py
# This runs the complete 17-agent Matrix pipeline on input/launcher.exe
# Output will be generated in output/launcher/{timestamp}/
What happens during your first analysis:
- Agent 0: Coordinates the entire pipeline
- Agents 1-4: Foundation analysis (binary structure, imports, patterns)
- Agents 5-8: Advanced decompilation with Ghidra integration
- Agents 9-13: Source reconstruction and compilation
- Agents 14-16: Quality assurance and final validation
Expected output:
- Reconstructed C source code in
output/launcher/{timestamp}/compilation/src/
- Compiled executable in
output/launcher/{timestamp}/compilation/build/
- Comprehensive analysis reports in
output/launcher/{timestamp}/reports/
Analysis time: 15-30 minutes for typical 5MB binary
Quick Analysis
# Analyze default binary (launcher.exe in input/ directory)
python main.py
# Analyze specific binary
python main.py path/to/binary.exe
# Analyze with custom output directory
python main.py binary.exe --output-dir my_analysis
Pipeline Modes
Full Pipeline (Recommended)
# Run all 17 agents for complete analysis
python main.py --full-pipeline
# Full pipeline with debug logging
python main.py --full-pipeline --debug
Specialized Modes
# Decompilation only (agents 1,2,5,7,14)
python main.py --decompile-only
# Analysis without compilation (agents 1,2,3,4,5,6,7,8,9,14,15)
python main.py --analyze-only
# Compilation testing only (agents 1,2,4,5,6,7,8,9,10,11,12,13)
python main.py --compile-only
Agent Selection
Individual Agents
# Run single agent
python main.py --agents 1
# Run multiple specific agents
python main.py --agents 1,3,7,15
# Run agent ranges
python main.py --agents 1-5
python main.py --agents 10-16
Agent Information
# List all available agents
python main.py --list-agents
# Show agent dependencies
python main.py --show-dependencies
# Display agent descriptions
python main.py --describe-agents
Advanced Usage
Environment and Configuration
Environment Validation
# Verify all dependencies and paths
python main.py --verify-env
# Show current configuration summary
python main.py --config-summary
# Validate specific components
python main.py --verify-ghidra
python main.py --verify-build-system
Development Options
# Dry run (show execution plan without running)
python main.py --dry-run
# Debug mode with comprehensive logging
python main.py --debug --profile
# Benchmark mode with performance metrics
python main.py --benchmark --profile
Pipeline Validation
Quality Validation
# Basic pipeline validation
python main.py --validate-pipeline basic
# Comprehensive validation with all checks
python main.py --validate-pipeline comprehensive
# Agent-specific validation
python main.py --validate-agent 5
Update and Maintenance
# Update mode for incremental development
python main.py --update
# Force reprocessing (ignore cached results)
python main.py --force-reprocess
# Clean temporary files
python main.py --clean-temp
Understanding Output
Output Directory Structure
When you run Open-Sourcefy, it creates a structured output directory:
output/{binary_name}/{timestamp}/
├── agents/ # Individual agent outputs
│ ├── agent_01_sentinel/
│ ├── agent_02_architect/
│ └── ... (all agents)
├── ghidra/ # Ghidra decompilation results
│ ├── launcher.exe.gzf # Ghidra project file
│ ├── decompiled/ # Decompiled C code
│ └── analysis/ # Analysis results
├── compilation/ # Generated source and build files
│ ├── src/ # Reconstructed C source code
│ ├── include/ # Header files
│ ├── resources/ # Extracted resources
│ └── CMakeLists.txt # Build configuration
├── reports/ # Pipeline execution reports
│ ├── execution_report.json
│ ├── quality_report.json
│ └── agent_summary.json
└── logs/ # Detailed execution logs
├── pipeline.log
├── agents/ # Per-agent logs
└── debug/ # Debug information
Key Output Files
Comprehensive Reports
reports/execution_report.json
: Complete pipeline execution summaryreports/quality_report.json
: Quality metrics and validation resultsreports/agent_summary.json
: Individual agent performance and results
Generated Source Code
compilation/src/
: Reconstructed C source filescompilation/include/
: Generated header filescompilation/resources/
: Extracted and converted resources
Build Files
compilation/CMakeLists.txt
: CMake build configurationcompilation/Makefile
: Generated Makefile for compilationcompilation/project.vcxproj
: Visual Studio project file
Success Indicators
Pipeline Success
Look for these indicators of successful execution:
{
"pipeline_status": "SUCCESS",
"agents_completed": 16,
"agents_failed": 0,
"overall_quality": 0.85,
"compilation_status": "SUCCESS",
"binary_size_accuracy": 0.8336
}
Output Quality Metrics
- Compilation Success: Generated code compiles without errors
- Size Accuracy: ~83% of original binary size achieved
- Function Recovery: 538+ functions successfully identified
- Resource Extraction: Icons, dialogs, strings extracted
Working with Results
Compilation Testing
Using Generated Build Files
# Navigate to compilation directory
cd output/launcher/20250619-123456/compilation/
# Compile using CMake
cmake .
make
# Or using Visual Studio (Windows)
msbuild project.vcxproj /p:Configuration=Release
Verification
# Compare binary sizes
ls -la original_binary.exe
ls -la output/compilation/launcher.exe
# Test functionality (if applicable)
./output/compilation/launcher.exe
Code Review
Generated Source Analysis
- Main Function: Look in
src/main.c
for program entry point - Function Definitions: Individual functions in
src/functions/
- Data Structures: Type definitions in
include/types.h
- Resources: Converted resources in
resources/
Quality Assessment
- Code Comments: Auto-generated documentation
- Function Signatures: Type-inferred parameters and returns
- Control Flow: Reconstructed program logic
- Error Handling: Identified exception patterns
Customization
Configuration Files
config.yaml
)
Main Configuration (agents:
timeout: 300
retry_count: 2
parallel_execution: true
ghidra:
headless_timeout: 600
custom_scripts: true
decompilation_timeout: 60
output:
structured_dirs: true
compression: false
cleanup_temp: true
build_config.yaml
)
Build Configuration (build_system:
visual_studio:
version: "2022_preview"
installation_path: "C:/Program Files/Microsoft Visual Studio/2022/Preview"
build_tools:
rc_exe_path: "C:/Program Files (x86)/Windows Kits/10/bin/x64/rc.exe"
lib_exe_path: "C:/Program Files/Microsoft Visual Studio/2022/Preview/VC/Tools/MSVC/14.XX.XXXXX/bin/Hostx64/x64/lib.exe"
Environment Variables
Required Variables
# AI functionality (optional but recommended)
export ANTHROPIC_API_KEY=your_api_key_here
# Ghidra integration
export GHIDRA_HOME=/path/to/ghidra
export JAVA_HOME=/path/to/java
Debug Variables
# Enable debug logging
export MATRIX_DEBUG=true
# Enable AI-enhanced analysis
export MATRIX_AI_ENABLED=true
# Custom temporary directory
export MATRIX_TEMP_DIR=/custom/temp/path
Performance Optimization
System Requirements
- Memory: 16GB+ RAM for optimal performance
- CPU: Multi-core processor recommended for parallel execution
- Storage: SSD recommended for improved I/O performance
- Network: Stable connection for AI service integration
Performance Tuning
# Optimize for speed (fewer quality checks)
python main.py --fast
# Optimize for quality (more thorough analysis)
python main.py --thorough
# Parallel processing control
python main.py --max-parallel 4
# Memory usage control
python main.py --max-memory 8G
Troubleshooting
Common Issues
Pipeline Failures
# Check agent logs
cat output/binary/timestamp/logs/agents/agent_XX.log
# Run with debug logging
python main.py --debug
# Validate specific agent
python main.py --validate-agent X
Compilation Issues
# Verify build environment
python main.py --verify-build-system
# Check generated code
less output/binary/timestamp/compilation/src/main.c
# Test minimal compilation
cd output/binary/timestamp/compilation
gcc -c src/main.c
Resource Issues
# Monitor memory usage
python main.py --profile --debug
# Clean temporary files
python main.py --clean-temp
# Check disk space
df -h output/
Getting Help
Log Analysis
- Pipeline Logs:
output/{binary}/{timestamp}/logs/pipeline.log
- Agent Logs:
output/{binary}/{timestamp}/logs/agents/
- Debug Logs:
output/{binary}/{timestamp}/logs/debug/
Support Resources
- Issues: GitHub Issues
- Documentation: Troubleshooting for detailed problem resolution
- Architecture: Architecture Overview for system understanding
Next: Configuration Guide - Advanced configuration options
Related: Troubleshooting - Problem resolution guide