External ML AI Projects
Table of contents
- Introduction
- Project 1: Bank Marketing Campaign - Predictive Modeling
- Project 2: Border Crossing Forecasting - SARIMAX Modeling
- Project 3: Stock Price Forecasting - SARIMAX Modeling with Akima Interpolation
- Project 4: Sentiment Analysis Project - News Sentiment Evaluation
- Project 5: GAN vs OpenCV Chessboard Reconstruction
- Project 6: Real Estate Data Set Analysis in Flower Hill
- Project 7: Advanced Signal Denoising Framework
- Project 8: Quantum Optimization with Qiskit
- Project 9: Balanced Gauge Study Analysis
Introduction
This section lists some of the external projects I had the opportunity to implement during my career, involving various machine learning and AI-driven methods for data analysis in technical, financial, and physical domains. Most of them were pursued with the intention of experimenting with modern methods of mathematical physics, information theory, and complexity theory, and with how these interact with the advanced statistical inference techniques of contemporary data analysis.
Machine learning, while not exclusive to physics, has become an invaluable tool in the field. It relies on algorithms and statistical models, such as neural networks, decision trees, and support vector machines, that allow computers to learn patterns from data and make predictions or decisions without being explicitly programmed for every task. In physics, where phenomena often involve intricate systems and massive amounts of data, machine learning is applied to analyze complex systems, identify patterns in experimental data, and simulate phenomena that would be computationally expensive to model with traditional methods. It is accelerating advances in areas such as quantum mechanics, cosmology, and materials science, helping researchers uncover insights that were previously unattainable.
In experimental physics, machine learning is employed to process and analyze data from sophisticated instruments, such as particle detectors, telescopes, and quantum devices. It helps physicists detect patterns, anomalies, or rare events that might otherwise go unnoticed, such as identifying gravitational waves or isolating quantum states. In computational physics, machine learning is used to optimize simulations, solve partial differential equations, and reduce the computational cost of modeling complex systems, like fluid dynamics or astrophysical phenomena.
Furthermore, machine learning has paved the way for new approaches to understanding fundamental physical laws. By training models on data from experiments and simulations, physicists can uncover hidden relationships and generate predictions about systems that are difficult to study directly. For example, machine learning has been used to predict material properties, simulate high-energy physics events, and even design new experiments.
Another significant application of machine learning in physics is in quantum mechanics and quantum computing. Machine learning aids in analyzing and optimizing quantum algorithms, as well as interpreting the outcomes of quantum experiments. These methods are crucial for advancing quantum technologies and harnessing their potential.
In summary, machine learning is revolutionizing physics by enabling researchers to handle the complexity and scale of modern scientific challenges. Its applications span experimental, theoretical, and computational domains, making it a cornerstone in shaping the future of physics research and technology development.
The project list below will grow continuously as new ML-driven data-analysis studies are completed and their PDF reports become available.
Project list
Project 1: Bank Marketing Campaign - Predictive Modeling
Description
This project analyzes customer responses to a bank's term deposit marketing campaign, employing machine learning to optimize predictive accuracy and improve future campaign strategies.
Technologies and Python packages used in the project:
Technologies Used
- **Machine Learning Algorithms**
  - Logistic Regression
  - Decision Tree
  - Random Forest
  - XGBoost
  - Ensemble Learning (Stacking, Bagging, Boosting)
- **Data Preprocessing Techniques**
  - Handling Missing Values (Imputation & Removal)
  - One-Hot Encoding & Label Encoding
  - Scaling & Normalization
  - Class Balancing (SMOTE)
- **Evaluation & Metrics**
  - Accuracy, Precision, Recall, F1-Score
  - Hyperparameter Tuning
- **Visualization & Interpretation**
  - Feature Importance
  - Heatmaps & Correlation Analysis
  - Confusion Matrix
- **Deployment Methods**
  - Pickle Storage for Trained Models
Python Packages Used
- **Core Libraries**
  - `pandas` → Data manipulation
  - `numpy` → Numerical computations
  - `scikit-learn` → ML algorithms & preprocessing
  - `xgboost` → Gradient boosting
- **Data Preprocessing & Feature Engineering**
  - `sklearn.preprocessing` → Scaling & encoding
  - `imbalanced-learn` → SMOTE for class balancing
- **Machine Learning & Model Evaluation**
  - `sklearn.linear_model` → Logistic Regression
  - `sklearn.tree` → Decision Trees
  - `sklearn.ensemble` → Random Forest & StackingClassifier
  - `sklearn.metrics` → Evaluation metrics
- **Visualization**
  - `matplotlib` → Plotting graphs
  - `seaborn` → Statistical visualizations
- **Deployment & Model Persistence**
  - `pickle` → Saving models
This setup ensures efficient preprocessing, accurate ML modeling, proper evaluation, and reproducibility.
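As a concrete illustration of how these pieces combine, here is a minimal sketch on synthetic data; the feature matrix, class balance, and hyperparameters are stand-ins for illustration, not the actual campaign configuration.

```python
# Minimal sketch of the Project 1 pipeline: SMOTE balancing, a stacking
# ensemble, standard metrics, and pickle persistence. All data and
# hyperparameters are illustrative placeholders.
import pickle
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# Imbalanced stand-in for the campaign data (~10% positive responses).
X, y = make_classification(n_samples=2000, n_features=15,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)

# Scale, then oversample the minority class on the training split only.
scaler = StandardScaler().fit(X_train)
X_res, y_res = SMOTE(random_state=42).fit_resample(
    scaler.transform(X_train), y_train)

# Stacking ensemble: RF and XGBoost base learners, logistic meta-learner.
model = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
model.fit(X_res, y_res)
print(classification_report(y_test, model.predict(scaler.transform(X_test))))

# Persist the trained model, as in the project's deployment step.
with open("stacking_model.pkl", "wb") as f:
    pickle.dump(model, f)
```

In the actual study, imputation and one-hot encoding would run before this stage.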
Project 2: Border Crossing Forecasting - SARIMAX Modeling
Description
This project forecasts the number of border crossings between the USA and Canada, based on the corresponding Kaggle dataset from The Bureau of Transportation Statistics (BTS) containing entries from 1996 to 2024, employing Python's SARIMAX forecasting scheme to optimize predictive accuracy and improve future security strategies.
Technologies and Python packages used in the project:
Technologies Used
- **Time Series Forecasting**
  - SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables)
  - Fast Fourier Transform (FFT) for periodicity estimation
  - Custom & automated cross-validation
  - Grid search for hyperparameter tuning
- **Data Engineering & Processing**
  - DuckDB for data extraction and querying
  - Pandas for data manipulation
  - Aggregation techniques for monthly entry volume analysis
  - Train/Test split for model validation
- **Ensemble Learning & Optimization**
  - Automated SARIMAX grid search
  - Model parameter storage for reproducibility
- **Evaluation Metrics**
  - RMSE (Root Mean Squared Error)
  - MAPE (Mean Absolute Percentage Error)
  - Execution duration tracking
- **Visualization & Interpretation**
  - Forecast plots for historical trends and predictions
  - Performance comparison of different SARIMAX models
Python Packages Used
- **Core Libraries for Data Handling**
  - `pandas` → DataFrame operations & time-series manipulation
  - `numpy` → Numerical computations
- **Database & Querying**
  - `duckdb` → SQL-style queries on structured datasets
- **Time Series Forecasting & Statistical Modeling**
  - `statsmodels` → SARIMAX model implementation
  - `scipy.fftpack` → FFT-based periodicity detection
- **Machine Learning & Evaluation**
  - `sklearn.model_selection` → Train/Test split & cross-validation
  - `sklearn.metrics` → RMSE & MAPE calculation
- **Visualization Tools**
  - `matplotlib` → Standard plotting for trends & comparisons
  - `seaborn` → Advanced statistical visualizations
This setup provides strong predictive modeling capabilities, efficient data handling, and optimized forecasting methods.
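The core of the workflow can be sketched as follows, with a synthetic monthly series standing in for the BTS crossing counts and illustrative SARIMAX orders in place of the grid-searched ones.

```python
# Minimal sketch of the Project 2 workflow: FFT-based period detection,
# a SARIMAX fit with the detected seasonality, and RMSE/MAPE scoring.
import numpy as np
import pandas as pd
from scipy.fftpack import fft
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
idx = pd.date_range("1996-01-01", periods=336, freq="MS")
y = pd.Series(100 + 10 * np.sin(2 * np.pi * np.arange(336) / 12)
              + rng.normal(0, 2, 336), index=idx)

# Estimate the dominant period from the FFT power spectrum.
power = np.abs(fft((y - y.mean()).values)) ** 2
freqs = np.fft.fftfreq(len(y))
peak = np.argmax(power[1: len(y) // 2]) + 1
season = int(round(1 / freqs[peak]))        # ~12 for monthly seasonality
print("Detected seasonal period:", season)

# Train/test split and SARIMAX fit with the detected period.
train, test = y[:-24], y[-24:]
res = SARIMAX(train, order=(1, 1, 1),
              seasonal_order=(1, 1, 1, season)).fit(disp=False)
forecast = res.get_forecast(steps=24).predicted_mean

rmse = np.sqrt(mean_squared_error(test, forecast))
mape = mean_absolute_percentage_error(test, forecast)
print(f"RMSE={rmse:.2f}  MAPE={mape:.2%}")
```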
Project 3: Stock Price Forecasting - SARIMAX Modeling with Akima Interpolation
Description
This project forecasts the temporal evolution of stock prices from a fictitious CSV file of daily prices: the data are Akima-interpolated and then subjected to SARIMAX forecasting and critical-point modeling drawn from the physical theory of critical phenomena.
Technologies and Python packages used in the project:
Technologies Used

**Time Series Forecasting**
- SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables)
- Akima Interpolation for trend smoothing
- Critical Point Analysis for volatility estimation
- Adaptive Trend Detection for stock price evolution
- Grid Search Optimization for SARIMAX parameter tuning

**Data Processing & Engineering**
- Data generation (synthetic stock prices)
- Feature extraction (first & second derivatives of price trends)
- Characterization of trend types (Bullish Surge, Sharp Decline)
- Structured CSV-based data storage for analysis

**Evaluation Metrics**
- RMSE (Root Mean Squared Error)
- MAE (Mean Absolute Error)
- $R^2$ (R-squared performance metric)
- Execution time tracking

**Visualization & Interpretation**
- Akima-interpolated stock prices
- Trend detection graphs
- SARIMAX-forecasted price plots

**Storage & Documentation**
- Model-generated forecasts stored in `critical_trends.csv`
- Markdown documentation with embedded references and research papers
Python Packages Used

**Core Libraries for Data Handling**
- `pandas` → Data manipulation, CSV reading/writing
- `numpy` → Numerical computations

**Time Series Forecasting & Modeling**
- `statsmodels.tsa.statespace.sarimax` → SARIMAX forecasting model
- `scipy.interpolate.Akima1DInterpolator` → Akima interpolation for trend analysis

**Machine Learning & Optimization**
- `sklearn.model_selection` → Grid search hyperparameter tuning
- `sklearn.metrics` → RMSE, MAE, R-squared calculations

**Visualization Tools**
- `matplotlib.pyplot` → Standard plotting for trends and comparisons
- `seaborn` → Advanced statistical visualizations

**Storage & Model Persistence**
- `pickle` → Saving critical data trends for reproducibility
This setup provides robust time series forecasting, efficient stock price trend analysis, and reliable volatility estimation using SARIMAX and Akima interpolation techniques.
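The interpolation-plus-derivatives idea can be sketched as follows; synthetic prices replace the CSV data, and the "bullish/bearish turn" labels are a simplified stand-in for the project's trend characterization.

```python
# Minimal sketch of the Project 3 idea: Akima-interpolate daily prices and
# use first/second derivatives of the smooth curve to flag candidate
# critical points (trend reversals).
import numpy as np
from scipy.interpolate import Akima1DInterpolator

rng = np.random.default_rng(1)
t = np.arange(120, dtype=float)                      # trading days
price = 50 + 5 * np.sin(t / 15) + rng.normal(0, 0.3, t.size)

# Akima spline gives a smooth, locally adaptive trend curve.
spline = Akima1DInterpolator(t, price)
d1 = spline.derivative(1)     # slope -> trend direction
d2 = spline.derivative(2)     # curvature -> acceleration of the trend

# Flag points where the slope changes sign: candidate trend reversals.
tt = np.linspace(t[0], t[-1], 1000)
slope = d1(tt)
reversals = tt[np.where(np.sign(slope[:-1]) != np.sign(slope[1:]))[0]]
for r in reversals:
    # Positive curvature at a slope zero-crossing -> local minimum -> upturn.
    label = "Bullish turn" if d2(r) > 0 else "Bearish turn"
    print(f"day ~{r:5.1f}: {label}")
```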
Project 4: Sentiment Analysis Project - News Sentiment Evaluation
Description
This project analyzes the sentiment of news headlines from the Ground News website by means of a customized Python GUI.
Technologies and Python packages used in the project:
Technologies Used

**Natural Language Processing (NLP)**
- Sentiment analysis using NLTK VADER
- Named Entity Recognition (NER) via SpaCy
- Keyword extraction
- Word Cloud visualization

**Web Scraping & Data Fetching**
- Fetching news headlines from Ground News
- BeautifulSoup for HTML parsing

**Graphical User Interface (GUI)**
- PyQt5-based GUI for interactive analysis
- Real-time sentiment trend visualization using PyQtGraph

**Data Processing & Export**
- Pandas for structured data handling
- CSV storage for analysis results
- PNG export for graphical outputs

**Visualization & Trend Analysis**
- Word Cloud for keyword representation
- Sentiment trend plotting and tracking
- Interactive graphs to display sentiment evolution

**Performance Optimization**
- Asynchronous threading for news fetching
- UI interaction improvements for a seamless experience
Python Packages Used

**Core Libraries for Data Handling**
- `pandas` → Data manipulation, CSV exporting
- `numpy` → Numerical computations

**Natural Language Processing (NLP) & Sentiment Analysis**
- `nltk.sentiment.vader` → Sentiment polarity scoring
- `spacy` → Named Entity Recognition (NER)
- `wordcloud` → Generating keyword cloud visualizations

**Web Scraping & Data Fetching**
- `beautifulsoup4` → Parsing HTML content for news headlines

**Graphical User Interface (GUI) & Visualization**
- `PyQt5` → Interactive GUI elements
- `PyQtGraph` → Real-time sentiment trend visualization

**Performance Optimization & Asynchronous Execution**
- `threading` → Multi-threaded news fetching
- `requests` → Asynchronous HTTP requests (suggested future enhancement)
This setup provides a powerful NLP-driven sentiment analysis application, combining real-time news data, text processing, interactive visualization, and efficient data storage for valuable insights.
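The VADER scoring step at the heart of the app can be sketched as below; the headlines here are invented examples rather than scraped Ground News content, and the ±0.05 thresholds follow the standard VADER convention.

```python
# Minimal sketch of the Project 4 scoring step: VADER polarity scores for
# a batch of headlines (in the project these arrive via BeautifulSoup).
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

headlines = [
    "Markets rally as inflation cools faster than expected",
    "Severe storms leave thousands without power",
]
for text in headlines:
    scores = analyzer.polarity_scores(text)  # neg/neu/pos + compound in [-1, 1]
    label = ("positive" if scores["compound"] >= 0.05
             else "negative" if scores["compound"] <= -0.05
             else "neutral")
    print(f"{label:8s}  {scores['compound']:+.3f}  {text}")
```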
Project 5: GAN vs OpenCV Chessboard Reconstruction
Description
This project aims to compare traditional OpenCV-based methods for chessboard image reconstruction with Generative Adversarial Network (GAN)-driven approaches. The goal is to evaluate the effectiveness of deep learning in reconstructing secluded or obscured chessboard sections more accurately than conventional techniques.
Technologies and Python packages used in the project:
Technologies Used

**Computer Vision Techniques** (illustrated in the sketch below)
- Edge Detection (`cv2.Canny`)
- Contour Detection (`cv2.findContours`)
- Perspective Transformation (`cv2.getPerspectiveTransform`)

**Deep Learning (GAN)**
- Generative Adversarial Networks (GAN) for chessboard reconstruction
- Adversarial loss optimization
- TensorFlow/Keras-based model training

**Image Preprocessing & Augmentation**
- Grayscale normalization
- Image resizing (`cv2.resize`)
- Dataset creation for GAN training

**Performance Optimization & Deployment**
- GPU acceleration using Google Colab (T4 GPU)
- Batch processing for faster training
- Model persistence (`gan_chessboard_model.h5`)

**Evaluation & Comparison**
- Reconstruction accuracy in obstructed images
- Processing speed and computational efficiency
- Comparison between GAN-based and OpenCV-based methods
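A minimal sketch of the classical pipeline named above, assuming a placeholder input image path and a clean board outline (a robust version would also sort the detected corners):

```python
# Classical OpenCV baseline: edge detection, contour search for the board
# outline, and perspective rectification. "chessboard.jpg" is a placeholder.
import cv2
import numpy as np

img = cv2.imread("chessboard.jpg")                   # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Take the largest contour that approximates to 4 points as the board.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
board = max(contours, key=cv2.contourArea)
quad = cv2.approxPolyDP(board, 0.02 * cv2.arcLength(board, True), True)

if len(quad) == 4:
    # Note: corner order is not guaranteed to match dst; sort in practice.
    src = quad.reshape(4, 2).astype(np.float32)
    dst = np.float32([[0, 0], [479, 0], [479, 479], [0, 479]])
    M = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(img, M, (480, 480))  # top-down 480x480 board
    cv2.imwrite("board_rectified.png", warped)
```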
Python Packages Used

**Core Libraries for Data Handling & Computation**
- `numpy` → Numerical operations
- `pandas` → Data handling

**Computer Vision & Image Processing**
- `opencv-python` → Edge detection, contour detection, perspective transformation
- `matplotlib` → Visualization of reconstructed images

**Deep Learning & Neural Networks**
- `tensorflow.keras` → GAN model implementation & training
- `tensorflow.keras.models` → Model saving & persistence (`save_model`)
- `tensorflow.keras.optimizers` → Loss function optimization

**System & File Operations**
- `os` → File management
- `shutil` → File copying/moving

**Google Colab Integration**
- Google Colab GPU Acceleration → Faster training execution
- Interactive runtime configuration
This setup provides a powerful AI-driven chessboard reconstruction system, leveraging traditional computer vision (OpenCV) and deep learning-based GAN techniques for superior image completion.
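For the deep-learning side, here is a minimal DCGAN-style generator/discriminator sketch for 64x64 grayscale chessboard patches; the layer sizes are illustrative, and the full adversarial training loop is omitted.

```python
# Minimal Keras GAN skeleton in the spirit of the project's model; sizes
# and architecture details are illustrative assumptions.
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    return models.Sequential([
        layers.Dense(8 * 8 * 128, activation="relu",
                     input_shape=(latent_dim,)),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                               activation="relu"),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same",
                               activation="relu"),
        layers.Conv2DTranspose(1, 4, strides=2, padding="same",
                               activation="tanh"),
    ])  # output: (64, 64, 1) images in [-1, 1]

def build_discriminator():
    return models.Sequential([
        layers.Conv2D(32, 4, strides=2, padding="same",
                      input_shape=(64, 64, 1)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # real vs. generated patch
    ])

generator, discriminator = build_generator(), build_discriminator()
discriminator.compile(optimizer="adam", loss="binary_crossentropy")
generator.save("gan_chessboard_model.h5")  # persistence step from the project
```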
Project 6: Real Estate Data Set Analysis in Flower Hill
Description
This project involves a large dataset of real estate sales for the fictional town of Flower Hill. The aim is to combine the analysis of this dataset with PyMongo, MLflow, Python (SARIMAX time series forecasting, classification, Neural Networks, Kohonen Maps) and DAG-like process organization of ML tasks, thus blending data engineering, machine learning, forecasting, and process automation into a well-structured framework.
Technologies and Python packages used in the project:
Technologies Used

**Database & Data Storage**
- MongoDB (PyMongo) → NoSQL database for storing real estate transactions.
- Apache Airflow → DAG-based automation for machine learning workflows.
- MLflow → Model tracking, logging, and versioning for ML experiments.

**Machine Learning & Forecasting**
- SARIMAX → Time series forecasting for property price trends.
- Classification Models → Predict buyer types, resale probability, and price ranges.
- Neural Networks → Deep learning models for pattern recognition.
- Kohonen Maps → Self-organizing neural networks for clustering districts.
- Random Forest → Classification & decision-making for district identification.

**Data Engineering & Processing**
- Feature Engineering → Extracting key property attributes for ML models.
- Data Cleaning → Handling missing values, formatting timestamps, and standardizing currency.
- Automated Forecast Updates → Using DAG scheduling in Airflow.

**Visualization & Interpretation**
- Real Estate Price Trends → Box plots, comparative district analyses.
- Urban Expansion & Buyer Segmentation → Cluster analysis with Kohonen maps.
- Market Control & Economic Cycles → Comparative analytics across districts.
- Sentiment Analysis & Economic Growth Forecasts → Exploring price evolution in different districts.

**Performance Optimization & Deployment**
- Windows 10 (Anaconda Environment) → Isolated ML/AI dependencies.
- Cloud-Based Solution for Scaling → Suggested AWS/GCP/Azure deployment options.
- GPU Acceleration for Faster Computation → Optimization for ML workloads.
Python Packages Used

**Core Libraries for Data Handling & Computation**
- `pandas` → DataFrame operations for structured transactions.
- `numpy` → Numerical computations for forecasting models.
- `pymongo` → MongoDB integration for transaction storage & retrieval.

**Machine Learning & Forecasting Models**
- `statsmodels.tsa.statespace.sarimax` → SARIMAX for property price forecasting.
- `sklearn.ensemble.RandomForestClassifier` → Random Forest model for classification.
- `kohonen` → Kohonen Maps for self-organizing district clustering.
- `tensorflow.keras` → Neural Networks for property price prediction.

**Pipeline Automation & Model Tracking**
- `apache-airflow` → DAG-based execution for ML tasks.
- `mlflow` → Model logging, tracking, and visualization.

**Visualization & Interpretation**
- `matplotlib` → Standard plotting for time series & district comparisons.
- `seaborn` → Advanced statistical visualizations.

**System & Deployment Tools**
- `pickle` → Model persistence & saving trained classifiers.
- `shutil` → File management for structured dataset handling.
This setup provides a structured ML pipeline for analyzing real estate trends, leveraging MongoDB, Airflow, MLflow, and various machine learning models for forecasting, classification, clustering, and trend analysis.
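One forecasting task from such a pipeline might look like the sketch below; the database, collection, and field names (`flower_hill`, `sales`, `price`, `sale_date`) are hypothetical placeholders, and the SARIMAX orders are illustrative.

```python
# Minimal sketch of one DAG task: pull sales from MongoDB, aggregate a
# monthly price series, fit SARIMAX, and log the run to MLflow.
import mlflow
import pandas as pd
from pymongo import MongoClient
from statsmodels.tsa.statespace.sarimax import SARIMAX

client = MongoClient("mongodb://localhost:27017")
sales = pd.DataFrame(list(client["flower_hill"]["sales"].find({}, {"_id": 0})))

# Monthly median sale price as the forecasting target.
sales["sale_date"] = pd.to_datetime(sales["sale_date"])
monthly = sales.set_index("sale_date")["price"].resample("MS").median().dropna()

order, seasonal = (1, 1, 1), (1, 1, 1, 12)
with mlflow.start_run(run_name="flower_hill_sarimax"):
    res = SARIMAX(monthly, order=order, seasonal_order=seasonal).fit(disp=False)
    mlflow.log_params({"order": order, "seasonal_order": seasonal})
    mlflow.log_metric("aic", res.aic)
    forecast = res.get_forecast(steps=12).predicted_mean
    forecast.to_csv("price_forecast.csv")
    mlflow.log_artifact("price_forecast.csv")  # keep the forecast with the run
```

In the project, a task of this shape would be wrapped as an Airflow operator and scheduled via the DAG.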
Project 7: Advanced Signal Denoising Framework
Description
This project focuses on adaptive noise mitigation techniques in signal processing, evaluating various approaches beyond deep learning models.
The goal is to establish an ensemble-based noise suppression framework, leveraging mathematical modeling, filtering strategies, and real-time adaptability.
Key methodologies include variance estimation, correlation-based denoising, hybrid statistical filtering, and multi-stage noise suppression techniques.
Technologies and Python packages used in the project:
Technologies Used

**Signal Processing & Statistical Analysis**
- Median Filter Variance Estimation → Adaptive smoothing and noise variance estimation.
- Autocorrelation-Based Noise Reduction → Detects and mitigates periodic noise disturbances.
- Beta-Sigma Adaptive Resampling → Enhances signal fidelity using dynamic resampling strategies.
- Hybrid Multi-Pass Filtering → Integrates multiple filtering steps for improved robustness.
- Flexible Dynamic Denoising → Automated selection of optimal denoising techniques based on real-time signal properties.

**Mathematical Modeling & Optimization**
- Root Mean Square Error (RMSE) Analysis → Performance benchmarking of denoising methods.
- Variance Estimation Techniques → Dynamic signal complexity adjustments.
- Multi-Stage Fusion Frameworks → Real-time adaptive optimization.
Python Packages Used

**Core Libraries for Signal Processing**
- `numpy` → Efficient numerical computations.
- `scipy.signal` → Advanced filtering methods.
- `statsmodels.tsa.stattools` → Autocorrelation function (ACF) for noise estimation.

**Machine Learning & Statistical Modeling**
- `sklearn.metrics` → RMSE computation for evaluating signal fidelity.
- `kohonen` → Self-organizing clustering for noise classification.

**Visualization & Interpretation**
- `matplotlib` → Signal waveform analysis.
- `seaborn` → Statistical visualization of performance results.
Performance Evaluation & Optimization
- **Comparative RMSE Analysis** → Tracks signal accuracy before and after noise reduction.
- **Multi-Stage Adaptive Filtering** → Combines variance estimation, autocorrelation-based denoising, and dynamic resampling techniques.
- **Fusion-Based Optimization** → Uses adaptive weighting for selecting the best-performing denoising method dynamically.
- **Real-Time Signal Adaptation** → Ensures flexibility across different noise environments without prior deep learning training.
Deployment & System Considerations
- **Windows 10 (Anaconda Environment)** → Structured ML dependencies for implementation.
- **Cloud-Based Computational Scaling** → Recommended deployment via AWS, GCP, or Azure.
- **Parallel Computation** → Optimized multi-threaded processing for real-time noise suppression tasks.
This project provides a structured exploration of adaptive signal denoising, enhancing real-time processing with flexible ensemble-based strategies.
The final framework integrates variance estimation, multi-stage filtering, and fusion-based optimization, ensuring robust noise reduction while maintaining signal integrity.
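A minimal sketch of the first two stages, assuming a synthetic test signal and illustrative kernel sizes rather than the framework's adaptive choices:

```python
# Stage 1: estimate noise variance from median-filter residuals.
# Stage 2: hybrid median + moving-average pass, scored by RMSE.
import numpy as np
from scipy.signal import medfilt
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(7)
t = np.linspace(0, 4 * np.pi, 1000)
clean = np.sin(t) + 0.5 * np.sin(3 * t)
noisy = clean + rng.normal(0, 0.3, t.size)

# Median-filter residuals give a robust noise-variance estimate.
smooth = medfilt(noisy, kernel_size=11)
residual = noisy - smooth
print("estimated noise std:", residual.std())
print("residual lag-1 autocorrelation:", acf(residual, nlags=1)[1])

# Hybrid pass: median filter followed by a short moving average.
denoised = np.convolve(smooth, np.ones(5) / 5, mode="same")

rmse_before = np.sqrt(mean_squared_error(clean, noisy))
rmse_after = np.sqrt(mean_squared_error(clean, denoised))
print(f"RMSE before={rmse_before:.3f}  after={rmse_after:.3f}")
```

The fusion stage would then compare such RMSE scores across candidate filters and weight the best performers.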
Project 8: Quantum Optimization with Qiskit (Kaggle introductory course)
Description
This project explores quantum optimization techniques for solving complex combinatorial problems efficiently using Qiskit. The goal is to leverage quantum algorithms to enhance optimization processes beyond classical methods, enabling faster and more scalable solutions.
Key methodologies include variational quantum optimization, Grover's search-based decision-making, and hybrid quantum-classical workflows for real-world applications.
Technologies Used

**Quantum Computing & Optimization**
- Variational Quantum Eigensolver (VQE) → Solves optimization problems by estimating the lowest energy state.
- Quantum Approximate Optimization Algorithm (QAOA) → Provides near-optimal solutions for combinatorial problems.
- Grover's Search → Accelerates decision-making by reducing search complexity.
- Quantum Fourier Transform (QFT) → Extracts periodicity for structured problem-solving.

**Hybrid Classical-Quantum Integration**
- Classical Preprocessing with NumPy → Data preparation and matrix operations.
- Quantum Execution with IBM Quantum → Real-time execution on quantum hardware.
- Optimization Refinement → Hybrid workflows combining classical solvers with quantum approaches.
Python Packages Used

**Core Quantum Libraries**
- `qiskit` → Quantum circuit design and execution.
- `qiskit-aer` → High-performance quantum simulations.
- `qiskit-optimization` → Specialized optimization functions.

**Mathematical Modeling & Analysis**
- `numpy` → Efficient numerical computations for parameter tuning.
- `matplotlib` → Visualization of quantum optimization results.
Performance Evaluation & Optimization
- **Benchmarking Quantum vs. Classical Optimization** → Performance comparison for scalability and efficiency.
- **Quantum Error Mitigation** → Techniques for improving accuracy and reducing noise interference.
- **Hybrid Workflow Enhancement** → Integrating quantum algorithms with classical optimization methods.
- **Scalability Testing** → Evaluating the effectiveness of QAOA and VQE on real-world datasets.
Deployment & System Considerations
- **IBM Quantum Platform** → Remote execution on quantum processors.
- **Local Quantum Simulation** → Running Qiskit circuits on `qiskit-aer`.
- **Cloud-Based Optimization Scaling** → Leveraging AWS, GCP, or Azure for quantum computing experiments.
- **Parallel Execution Strategies** → Optimized batch processing for complex optimization tasks.
This project offers a structured approach to quantum-enhanced optimization, leveraging Qiskit for real-world problem-solving.
By integrating classical and quantum techniques, the framework boosts computational efficiency, demonstrating the potential of quantum algorithms in combinatorial optimization.
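To keep the example independent of fast-moving high-level Qiskit APIs, here is a minimal, hand-built depth-1 QAOA ansatz for Max-Cut on a triangle graph, run on the local Aer simulator; the angles are fixed for illustration rather than classically optimized as in the course material.

```python
# Hand-built depth-1 QAOA ansatz for Max-Cut on a 3-node triangle, sampled
# on AerSimulator. Angles (gamma, beta) are illustrative, not optimized.
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

edges = [(0, 1), (1, 2), (0, 2)]
gamma, beta = 0.8, 0.4                      # illustrative, unoptimized angles

qc = QuantumCircuit(3)
qc.h(range(3))                              # uniform superposition
for i, j in edges:
    qc.rzz(2 * gamma, i, j)                 # cost layer: ZZ phase per edge
qc.rx(2 * beta, range(3))                   # mixer layer
qc.measure_all()

sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=2048).result().get_counts()

def cut_value(bits):
    # Number of edges crossing the 0/1 partition encoded by the bitstring.
    return sum(bits[i] != bits[j] for i, j in edges)

# Counts keys are little-endian, so reverse to index bit i by qubit i.
best = max(counts, key=lambda b: cut_value(b[::-1]))
print("most promising partition:", best, "cut =", cut_value(best[::-1]))
```

A full QAOA run would optimize (gamma, beta) with a classical optimizer in exactly the hybrid loop described above.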
Project 9: Balanced Gauge Study Analysis
Description
This project focuses on developing a PyQt-GUI for conducting Balanced Gauge Studies, assessing measurement system capability using ANOVA (Analysis of Variance) techniques. The objective is to evaluate repeatability (same operator/device) and reproducibility (different operators/devices), ensuring measurement accuracy across trials. The project applies statistical methodologies to analyze variance components, optimize measurement consistency, and enhance quality control in experimental processes.
Technologies Used

**Measurement System Analysis & Statistical Modeling**
- One-Factor Gauge Study → Evaluates repeatability (operator/device consistency).
- Two-Factor Gauge Study → Assesses reproducibility (operator and part variability).
- ANOVA-Based Variance Decomposition → Identifies sources of measurement error.

**GUI-Based Data Processing & Visualization**
- PyQt → Interactive GUI for CSV input, Gauge Study generation, and results visualization.
- pandas → Efficient handling of structured measurement datasets.
- Matplotlib → Graphical representation of statistical metrics.
Python Packages Used

**Core Statistical & Data Science Libraries**
- `scipy.stats` → ANOVA calculations and hypothesis testing.
- `statsmodels` → Generalized linear models for variance analysis.
- `numpy` → Efficient numerical computations for variance decomposition.
- `matplotlib` & `seaborn` → Data visualization of Gauge Study results.
Performance Evaluation & Optimization

**Gauge Precision Metrics**
- PTR (Precision-to-Tolerance Ratio) → Evaluates measurement precision reliability.
- SNR (Signal-to-Noise Ratio) → Determines stability of measurement accuracy.
- Cp (Process Capability Index) → Ensures the measurement system meets industrial standards.

**Graphical & XAI-Based Classification**
- Variance Contribution Plots → Breakdown of measurement variability sources.
- Box Plots for Repeatability → Identification of operator-specific inconsistencies.
- Histogram for Measurement Distribution → Evaluates bias and systematic errors.
Deployment & System Considerations
- **Interactive GUI for Data Handling** → User-friendly interface for importing and analyzing CSV datasets.
- **Automated Report Generation** → PDF-based summaries with statistical conclusions.
- **Real-Time Statistical Evaluations** → Immediate Gauge R&R calculations based on user input.
- **Scalability & Industrial Application** → Optimized for measurement system validation across industries like manufacturing, engineering, and quality control.
This project provides a structured and automated approach to Balanced Gauge Studies, ensuring measurement system validation while leveraging statistical modeling for repeatability and reproducibility assessments.
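The two-factor ANOVA core behind the GUI can be sketched as follows, with invented operator biases and part dimensions in place of real gauge data:

```python
# Two-factor (crossed) Gauge R&R sketch: synthetic operator x part
# measurements, an OLS fit with interaction, and the ANOVA table used to
# decompose variance. Factor sizes and biases are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
rows = []
for part in range(1, 6):                    # 5 parts
    true = 10 + 0.5 * part                  # true part dimension
    for operator in "ABC":                  # 3 operators
        bias = {"A": 0.00, "B": 0.05, "C": -0.03}[operator]
        for _ in range(3):                  # 3 repeat trials
            rows.append({"part": part, "operator": operator,
                         "y": true + bias + rng.normal(0, 0.02)})
df = pd.DataFrame(rows)

# Crossed model with interaction: part, operator, and part:operator terms.
model = smf.ols("y ~ C(part) * C(operator)", data=df).fit()
print(anova_lm(model, typ=2))

# Repeatability = residual (equipment) variation; reproducibility comes
# from the operator and interaction components.
print("repeatability std (residual):", np.sqrt(model.mse_resid))
```

From these variance components, the PTR, SNR, and Cp metrics listed above follow by the standard Gauge R&R formulas.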