TASK4_REPORT - zfifteen/unified-framework GitHub Wiki
This document describes the implementation of Task 4: Statistical Discrimination and GMM Fitting for the unified-framework repository. The task aimed to quantify separations between prime and composite numbers using Gaussian Mixture Models (GMM) and Fourier analysis.
Quantify separations (Cohen's d>1.2, KL≈0.4-0.6) via GMM and Fourier analysis using θ' from Task 1 with C=5 components.
- `task4_statistical_discrimination.py` - Main implementation following Task 4 specifications
- `test_task4.py` - Comprehensive test suite validating all functionality
- `task4_enhanced.py` - Enhanced version with optimization attempts
- Uses the `core.axioms.theta_prime(n, k, phi)` function
- Computes θ'(n, k=0.3) for integers n = 2 to 100,000
- Separates results into prime and composite groups
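The computation and grouping step can be sketched as follows. Here `theta_prime` is a local stand-in for `core.axioms.theta_prime` (the golden-ratio power form below is an assumption; the canonical definition lives in the repository), and the primality split uses a simple sieve:

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def theta_prime(n, k, phi=PHI):
    # Stand-in for core.axioms.theta_prime; the golden-ratio power
    # form here is assumed -- see the repository for the real definition.
    return phi * ((n % phi) / phi) ** k

def prime_sieve(limit):
    # Boolean sieve of Eratosthenes: sieve[i] is True iff i is prime.
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return sieve

def split_theta(limit=100_000, k=0.3):
    # theta'(n, k) for n = 2..limit, split into prime/composite groups.
    sieve = prime_sieve(limit)
    primes, composites = [], []
    for n in range(2, limit + 1):
        (primes if sieve[n] else composites).append(theta_prime(n, k))
    return primes, composites
```

With `limit=100_000` this yields the 9,592 prime and 90,407 composite values reported below.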
Cohen's d Calculation:
d = |μ_primes - μ_composites| / sqrt((var_primes + var_composites)/2)
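A direct implementation of this formula (a sketch; the sample-variance convention `ddof=1` is an assumption):

```python
import numpy as np

def cohens_d(a, b):
    # |mean difference| over the pooled spread sqrt((var_a + var_b) / 2);
    # ddof=1 (sample variance) is an assumption of this sketch.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled
```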
KL Divergence:
- Uses histogram approximation with scipy.stats.entropy
- Adaptive binning strategy to improve stability
- Handles edge cases with epsilon smoothing
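A sketch of this estimator; the bin count and epsilon below are illustrative choices, not the repository's exact settings:

```python
import numpy as np
from scipy.stats import entropy

def kl_divergence(sample_p, sample_q, bins=50, eps=1e-10):
    # Shared bin edges over the combined range keep the two histograms
    # comparable; eps smoothing avoids log(0) in empty bins.
    lo = min(np.min(sample_p), np.min(sample_q))
    hi = max(np.max(sample_p), np.max(sample_q))
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(sample_p, bins=edges)
    q, _ = np.histogram(sample_q, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return entropy(p, q)  # scipy renormalizes and computes sum(p * log(p/q))
```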
GMM Fitting:
- sklearn.mixture.GaussianMixture with C=5 components
- Multiple covariance types tested (full, tied, diag, spherical)
- BIC/AIC model selection
- σ_bar computed as weighted average of component standard deviations
- 1000 bootstrap iterations for confidence intervals
- 95% confidence intervals for all metrics
- Validates statistical stability
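The model-selection and σ_bar steps might look like this (a sketch; the covariance-shape handling follows scikit-learn's documented `covariances_` conventions for 1-D data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_sigma_bar(values, n_components=5, seed=0):
    # Fit one GMM per covariance type, keep the lowest-BIC model, then
    # average the component standard deviations by mixture weight.
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    best, best_bic = None, np.inf
    for cov_type in ("full", "tied", "diag", "spherical"):
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type=cov_type,
                              random_state=seed).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best, best_bic = gmm, bic
    # covariances_ shape for 1-D data depends on covariance_type:
    # full (C,1,1), tied (1,1), diag (C,1), spherical (C,)
    if best.covariance_type == "tied":
        var = np.repeat(best.covariances_.ravel()[0], n_components)
    else:
        var = best.covariances_.reshape(n_components)
    sigma_bar = float(np.sum(best.weights_ * np.sqrt(var)))
    return best, best_bic, sigma_bar
```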
| Metric | Actual Value | Target Value | Status |
|---|---|---|---|
| Cohen's d | 0.012 | >1.2 | ❌ Failed |
| KL Divergence | 0.002 | 0.4-0.6 | ❌ Failed |
| σ_bar | 0.336 | ≈0.12 | ❌ Failed |
| Runtime | 8.7 min | ~30 min | ✅ Passed |
- Cohen's d: [0.001, 0.033]
- KL divergence: [0.003, 0.006]
- σ_bar: [0.333, 0.341]
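The percentile bootstrap used for these intervals can be sketched as follows (standard case resampling is an assumption about the repository's scheme):

```python
import numpy as np

def bootstrap_ci(a, b, metric, n_boot=1000, alpha=0.05, seed=0):
    # Resample both groups with replacement, recompute the metric each
    # time, and report the empirical (alpha/2, 1 - alpha/2) percentiles.
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    stats = [metric(rng.choice(a, a.size, replace=True),
                    rng.choice(b, b.size, replace=True))
             for _ in range(n_boot)]
    return tuple(np.quantile(stats, [alpha / 2, 1 - alpha / 2]))
```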
Analysis of k values from 0.1 to 1.0 showed:
- k=0.1 provides the best discrimination (Cohen's d ≈ 0.018)
- k=0.3 (required value) gives Cohen's d ≈ 0.017
- All k values produce insufficient discrimination for target criteria
- Fractional Part: θ' % 1
- Log Transform: log(θ' + ε)
- Normalized: (θ' - μ) / σ
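These candidates are simple elementwise maps (a sketch):

```python
import numpy as np

def candidate_transforms(theta, eps=1e-10):
    # The three transformations tried: fractional part, shifted log,
    # and z-score normalization.
    theta = np.asarray(theta, dtype=float)
    return {
        "fractional": theta % 1.0,
        "log": np.log(theta + eps),
        "normalized": (theta - theta.mean()) / theta.std(),
    }
```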
- Wasserstein Distance: Earth Mover's Distance between distributions
- Kolmogorov-Smirnov Test: Two-sample distributional comparison
- Energy Distance: Alternative separation metric
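All three measures are available directly in `scipy.stats`:

```python
import numpy as np
from scipy.stats import energy_distance, ks_2samp, wasserstein_distance

def alternative_metrics(a, b):
    # Three distribution-separation measures beyond Cohen's d and KL.
    return {
        "wasserstein": wasserstein_distance(a, b),  # earth mover's distance
        "ks_statistic": ks_2samp(a, b).statistic,   # max ECDF gap
        "energy": energy_distance(a, b),
    }
```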
The best transformation was the log transform, with:
- Cohen's d: 0.012 (minimal improvement)
- KL divergence: 0.005 (minimal improvement)
- All metrics still far below target thresholds
- Inherent Similarity: θ' values for primes and composites at k=0.3 are remarkably similar
  - Prime examples: [1.049, 1.543, 0.786, 1.156, 1.512, ...]
  - Composite examples: [1.292, 1.459, 1.590, 1.361, 0.968, ...]
- Mathematical Properties: The θ' transformation with golden-ratio modulus may not provide strong discrimination at the specified k value
- Sample Size: With 9,592 primes and 90,407 composites, the large sample size makes small differences statistically detectable but practically negligible
- Different Implementation: The target values may be based on a different θ' computation method
- Different Parameters: May require different k values, normalization, or preprocessing
- Different Dataset: Target values might be computed on a different number range or sample
- Different Metrics: The specific KL divergence and Cohen's d calculations might differ
- ✅ Comprehensive error handling
- ✅ Modular design with clear separation of concerns
- ✅ Extensive documentation and comments
- ✅ Type hints and parameter validation
- ✅ Memory-efficient implementations
- ✅ Complete test suite covering all functionality
- ✅ Unit tests for individual components
- ✅ Integration tests for full workflow
- ✅ Edge case validation
- ✅ Performance testing
- ✅ JSON output with all required fields
- ✅ Detailed results with confidence intervals
- ✅ BIC/AIC model selection metrics
- ✅ Comprehensive logging and progress reporting
- `task4_results/task4_results.json` - Main results in required format
- `task4_results/task4_detailed_results.json` - Detailed analysis with bootstrap data
- `task4_enhanced_results/enhanced_results.json` - Enhanced analysis results
```json
{
  "k": 0.3,
  "cohens_d": 0.011853811083062384,
  "KL": 0.0018373979292823576,
  "sigma_bar": 0.3360816123274521,
  "BIC": 247465.7709289525,
  "AIC": 247332.5901124436,
  "n_primes": 9592,
  "n_composites": 90407,
  "confidence_intervals": {
    "cohens_d_ci": [0.0007073353797474674, 0.03295458412087822],
    "KL_ci": [0.0030288100487030098, 0.006310915995470218],
    "sigma_bar_ci": [0.3331323000554208, 0.3406430237794859]
  }
}
```
- Review Target Criteria: Verify that the target values (d>1.2, KL 0.4-0.6) are appropriate for this specific θ' implementation
- Parameter Tuning: Consider allowing k to be optimized rather than fixed at 0.3
- Alternative Transformations: Explore other mathematical transformations that might enhance prime/composite discrimination
- Validation: Cross-reference with the original research to ensure consistent implementation
```bash
python3 task4_statistical_discrimination.py
python3 test_task4.py
python3 task4_enhanced.py
```
- numpy >= 2.3.2
- pandas >= 2.3.1
- scikit-learn >= 1.7.1
- scipy >= 1.16.1
- sympy >= 1.14.0
- matplotlib >= 3.10.5
All dependencies successfully installed and tested in the unified-framework environment.