sfp_advanced_correlation - poppopjmp/spiderfoot GitHub Wiki
The Advanced Correlation Engine correlates data and resolves entities across different data sources and platforms. It uses graph algorithms, temporal analysis, and machine learning techniques to identify relationships between entities.
- Username Correlation: Match usernames across social platforms
- Email Address Linking: Connect email addresses to multiple accounts
- Name Matching: Fuzzy matching for real names and aliases
- Behavioral Patterns: Identify users by activity patterns
- Time-Series Correlation: Detect synchronized activities
- Event Clustering: Group related events by time windows
- Behavioral Rhythms: Identify user activity patterns
- Anomaly Detection: Spot unusual temporal patterns
- Location Clustering: Group events by geographic proximity
- Movement Patterns: Track entity movement over time
- Distance Analysis: Calculate Haversine distances
- Regional Correlations: Connect activities by location
- Graph Construction: Build entity relationship graphs
- Centrality Analysis: Identify key entities in networks
- Community Detection: Find entity clusters and groups
- Confidence Scoring: Quantify relationship strength
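The Haversine distance named in the list above can be sketched as follows. This is a minimal illustration, not the module's own code: the function names `haversine_km` and `within_cluster_distance` are hypothetical, and the 10 km default simply mirrors the `clustering_distance_km` option documented on this page.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_cluster_distance(p, q, clustering_distance_km=10.0):
    """True if two (lat, lon) tuples are close enough to share a cluster.

    Hypothetical helper: the threshold name mirrors the documented option.
    """
    return haversine_km(p[0], p[1], q[0], q[1]) <= clustering_distance_km
```

A pairwise check like this is enough for small event sets; larger sets would usually need a spatial index rather than comparing every pair.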
[advanced_correlation]
# Enable correlation engine
correlation_enabled = True
# Confidence threshold for matches (0.0-1.0)
confidence_threshold = 0.7
# Time window for temporal correlation (hours)
temporal_window_hours = 24
# Maximum entities to track
max_entities = 10000
# Enable geospatial clustering
geospatial_enabled = True
# Clustering distance in kilometers
clustering_distance_km = 10.0
# Minimum cluster size
min_cluster_size = 3
# Enable entity resolution
entity_resolution_enabled = True
# String similarity threshold
similarity_threshold = 0.8
# Enable cross-platform correlation
cross_platform_correlation = True
Input events:
- USERNAME
- EMAILADDR
- SOCIAL_MEDIA
- GEOINFO
- HUMAN_NAME
- PHONE_NUMBER
- IP_ADDRESS
- DOMAIN_NAME
Output events:
- CORRELATION_MATCH
- ENTITY_RELATIONSHIP
- PATTERN_DETECTION
- TEMPORAL_ANOMALY
- GEOSPATIAL_CLUSTER
- IDENTITY_RESOLUTION
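As a rough illustration of how the `[advanced_correlation]` options shown earlier could be read and sanity-checked, the sketch below uses Python's standard `configparser`. It assumes the options live in an INI-style file exactly as written on this page. How SpiderFoot itself loads module options is not described here, and `load_options` is a hypothetical helper.

```python
import configparser

# Sample text copied from the options documented on this page.
SAMPLE = """
[advanced_correlation]
# Enable correlation engine
correlation_enabled = True
# Confidence threshold for matches (0.0-1.0)
confidence_threshold = 0.7
# Time window for temporal correlation (hours)
temporal_window_hours = 24
# Maximum entities to track
max_entities = 10000
"""


def load_options(text):
    """Parse the [advanced_correlation] section into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["advanced_correlation"]
    options = {
        "correlation_enabled": section.getboolean("correlation_enabled", fallback=True),
        "confidence_threshold": section.getfloat("confidence_threshold", fallback=0.7),
        "temporal_window_hours": section.getint("temporal_window_hours", fallback=24),
        "max_entities": section.getint("max_entities", fallback=10000),
    }
    # The page documents the threshold as a value in the range 0.0-1.0.
    if not 0.0 <= options["confidence_threshold"] <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    return options
```

Validating the threshold range at load time surfaces typos early instead of silently matching everything or nothing.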
- Levenshtein Distance: Character-level differences
- Jaro-Winkler: Optimized for names and short strings
- Soundex: Phonetic matching for name variations
- N-gram Analysis: Token-based similarity
- Centrality Measures: Betweenness, closeness, eigenvector
- Community Detection: Louvain algorithm implementation
- Path Analysis: Shortest paths between entities
- Clustering Coefficient: Network density analysis
- Time Windows: Configurable correlation periods
- Event Frequency: Activity pattern recognition
- Rhythm Analysis: Periodic behavior detection
- Anomaly Scoring: Statistical deviation detection
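The time-window and anomaly-scoring ideas in the list above can be sketched with the standard library alone. This is one possible reading, not the module's implementation: events are grouped whenever consecutive timestamps fall within the window, and anomaly scores are plain z-scores. Both function names are hypothetical, and the 24-hour default mirrors `temporal_window_hours`.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def cluster_by_time_window(timestamps, window_hours=24):
    """Group timestamps so consecutive events in a cluster are at most window_hours apart."""
    window = timedelta(hours=window_hours)
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= window:
            clusters[-1].append(ts)   # close enough to the previous event
        else:
            clusters.append([ts])     # gap too large: start a new cluster
    return clusters


def anomaly_scores(counts):
    """Z-score of each count against the population mean and standard deviation."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return [0.0] * len(counts)    # no variation, so nothing stands out
    return [(c - mu) / sigma for c in counts]


events = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 3, 12, 0),
    datetime(2024, 1, 3, 13, 0),
]
clusters = cluster_by_time_window(events, window_hours=24)
```

With the sample events, the 50-hour gap between 1 January and 3 January splits the data into two clusters of two events each.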
python sf.py -s [email protected] -t EMAILADDR -m sfp_social,sfp_twitter,sfp_tiktok_osint,sfp_advanced_correlation
python sf.py -s target_domain.com -t DOMAIN_NAME -m sfp_dnsresolve,sfp_ssl,sfp_advanced_correlation
python sf.py -s 192.168.1.0/24 -t NETBLOCK -m sfp_geoip,sfp_portscan_tcp,sfp_advanced_correlation
The module works with YAML-based correlation rules:
name: "Cross-Platform TikTok Identity Correlation"
description: "Correlates TikTok users with other social platforms"
triggers:
  - event_type: "TIKTOK_PROFILE"
conditions:
  - field: "username"
    operator: "matches_pattern"
    value: "social_username_pattern"

name: "Suspicious Activity Pattern"
description: "Detects coordinated suspicious activities"
triggers:
  - event_type: "CORRELATION_MATCH"
conditions:
  - field: "confidence_score"
    operator: "greater_than"
    value: 0.8
- Entity Cache: In-memory entity storage with TTL
- Relationship Cache: Pre-computed relationship graphs
- Pattern Cache: Cached pattern recognition results
- Query Optimization: Efficient database queries
- Batch Processing: Process multiple entities simultaneously
- Incremental Updates: Update graphs without full rebuilding
- Memory Management: Automatic cleanup of old entities
- Parallel Processing: Multi-threaded correlation analysis
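The entity cache with TTL and the automatic cleanup of old entities mentioned above could look roughly like the sketch below. The class `EntityCache` and its parameters are hypothetical; the `max_entities` cap only mirrors the option of the same name, and evicting the oldest entry first is an assumption made for illustration.

```python
import time


class EntityCache:
    """In-memory cache whose entries expire ttl_seconds after insertion."""

    def __init__(self, ttl_seconds=3600.0, max_entities=10000, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_entities = max_entities
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._items = {}      # key -> (expires_at, value); dicts keep insertion order

    def put(self, key, value):
        self.purge_expired()
        self._items.pop(key, None)  # re-inserting moves the key to the newest slot
        if self._items and len(self._items) >= self.max_entities:
            oldest = next(iter(self._items))
            del self._items[oldest]  # evict the oldest entry to respect the cap
        self._items[key] = (self._clock() + self.ttl_seconds, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]     # lazily drop an expired entry on access
            return default
        return value

    def purge_expired(self):
        now = self._clock()
        expired = [k for k, (exp, _) in self._items.items() if now >= exp]
        for k in expired:
            del self._items[k]

    def __len__(self):
        return len(self._items)
```

Passing the clock in as a parameter keeps the expiry logic deterministic in tests. In normal use the default `time.monotonic` is unaffected by system clock changes.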
# Social media investigation with correlation
-m sfp_twitter,sfp_tiktok_osint,sfp_linkedin,sfp_advanced_correlation
# Blockchain analysis with entity correlation
-m sfp_blockchain_analytics,sfp_advanced_correlation
# Email investigation with pattern detection
-m sfp_hunter,sfp_haveibeen,sfp_emailrep,sfp_advanced_correlation
- Entity Graphs: Visual representation of relationships
- Cluster Maps: Geographic and logical clustering
- Timeline Views: Temporal correlation visualization
- Confidence Heatmaps: Relationship strength indicators
- Entity Profiles: Comprehensive entity summaries
- Relationship Analysis: Connection strength assessment
- Pattern Summaries: Detected behavioral patterns
- Anomaly Reports: Unusual activity identification
- Pattern Recognition: ML-based pattern detection
- Similarity Learning: Improved string matching
- Anomaly Detection: Statistical and ML-based detection
- Predictive Analysis: Relationship prediction
- Rule Engine: YAML-based rule definitions
- Custom Triggers: User-defined correlation conditions
- Weighted Scoring: Configurable confidence calculations
- Rule Validation: Syntax and logic verification
- Anonymization: Optional PII anonymization
- Retention Policies: Configurable data cleanup
- Access Controls: Restricted correlation data access
- Audit Logging: Comprehensive activity logging
- GDPR Compliance: Privacy-by-design implementation
- Data Minimization: Only necessary data correlation
- Consent Management: User consent tracking
- Right to Erasure: Data deletion capabilities
- High Memory Usage: Reduce max_entities or enable cleanup
- Slow Correlation: Adjust confidence thresholds
- False Positives: Fine-tune similarity thresholds
- Missing Correlations: Check input data quality
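To see why tuning the similarity threshold changes the false-positive rate, the sketch below scores username pairs with a normalized Levenshtein similarity and compares the score against a threshold. It is an illustration under assumptions: the helper names are hypothetical, the 0.8 default mirrors `similarity_threshold`, and the module may combine several measures rather than Levenshtein alone.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a, b, similarity_threshold=0.8):
    """Case-insensitive match decision against the configured threshold."""
    return similarity(a.lower(), b.lower()) >= similarity_threshold
```

For example, `jsmith` and `jsmyth` differ by one edit over six characters, a similarity of about 0.83. The pair matches at a threshold of 0.8 but not at 0.9, so raising the threshold trades missed correlations for fewer false positives.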
# Optimize for speed
confidence_threshold = 0.8
temporal_window_hours = 12
max_entities = 5000
# Optimize for accuracy
confidence_threshold = 0.6
temporal_window_hours = 48
max_entities = 20000
For more information on correlation techniques, see the Advanced Analytics Guide.