# Deep Learning Basics
Welcome to the world of Deep Learning, a powerful subfield of Machine Learning that mimics the human brain through artificial neural networks.
## What is Deep Learning?
Deep Learning (DL) is a branch of Machine Learning (ML) that uses artificial neural networks with many layers, known as deep networks, to automatically learn from large amounts of data.
It is inspired by the structure and functioning of the human brain, and is designed to analyze patterns and solve complex problems by processing information through multiple interconnected layers of artificial neurons.
## Why Deep Learning?
Deep Learning is behind some of the most advanced AI systems today. Applications of Deep Learning include:
- Computer Vision
  - Image classification, object detection, and facial recognition, as well as image generation models like DALL·E.
- Speech and Language Processing
  - Natural Language Processing (NLP): DL is the backbone of many NLP tasks such as text classification, machine translation, sentiment analysis, and chatbot development.
  - Speech recognition: Voice assistants and transcription systems. DL enables speech recognition by using deep neural networks to convert spoken language (audio) into text. OpenAI's Whisper, Google's Speech-to-Text API, and Apple's Siri all use DL-based architectures for accurate, multilingual speech recognition.
- Healthcare
  - Deep learning is used in medical image analysis, drug discovery, and predictive healthcare models.
- Autonomous Decision Making
  - Autonomous Vehicles: Self-driving cars utilize deep learning for perception (recognizing objects and obstacles), decision-making, and control.
## Difference Between ML and DL

| Feature | Machine Learning (ML) | Deep Learning (DL) |
|---|---|---|
| Learning Type | Needs feature engineering | Learns features automatically |
| Structure | Shallow models (SVM, Decision Trees, etc.) | Deep Neural Networks (DNNs, CNNs, RNNs) |
| Data Requirement | Works well with smaller datasets | Requires large datasets |
| Computation | Less resource intensive | Requires high computational power (GPU/TPU) |
| Mimics Human Brain? | Not directly | Yes, via neural networks |
## Core Concepts of Deep Learning
### Neural Networks (NNs)
- Neural networks are computational models inspired by the brain. They consist of layers of interconnected nodes (or neurons) that work together to solve problems.
- A general term for a network of interconnected nodes (neurons) inspired by the human brain; it is a foundational concept in Artificial Intelligence and Deep Learning.
- A broad term that covers all types of neural networks, from basic to complex (shallow or deep), including ANN, CNN, RNN, GAN, etc.
### Artificial Neural Network (ANN)
- The most basic and standard form of a neural network. It usually refers to fully connected feedforward networks (FNNs).
- The foundational structure inspired by how biological neurons work.
- An Artificial Neural Network (ANN) is a type of Neural Network that consists of layers of nodes (neurons), typically including an input layer, hidden layer(s), and an output layer.

Sample Deep Learning Training Workflow:

Input data → Forward pass using activation functions → Compute loss → Apply backpropagation → Update weights using gradient descent → Repeat over epochs & batches → Use techniques like dropout to prevent overfitting → Improve with transfer learning if needed.
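Below is a minimal sketch of that workflow as a PyTorch training loop. The model, data, and hyperparameters are made-up placeholders for illustration, not a prescribed recipe.

```python
# Minimal sketch of the training workflow above (toy data, arbitrary hyperparameters).
import torch
import torch.nn as nn

X = torch.randn(256, 10)                     # toy input data
y = torch.randint(0, 2, (256,))              # toy labels
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.3), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5):                       # repeat over epochs...
    for i in range(0, len(X), 32):           # ...and mini-batches
        xb, yb = X[i:i+32], y[i:i+32]
        logits = model(xb)                   # forward pass (activation functions applied inside)
        loss = loss_fn(logits, yb)           # compute loss
        optimizer.zero_grad()
        loss.backward()                      # backpropagation computes gradients
        optimizer.step()                     # gradient descent updates the weights
```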
### Backpropagation
- Backpropagation is the algorithm used for training neural networks.
- It improves the model by adjusting the weights of the network based on the error between predicted and actual outputs, optimizing the model through gradient descent.
### Gradient Descent
- Gradient descent is an optimization technique used to minimize the loss function by adjusting the weights of the network in the direction of the steepest decrease in error.
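As a tiny worked sketch, here is gradient descent applied by hand to a single weight; the data point, learning rate, and one-weight model are made-up values chosen so the loss has an obvious minimum.

```python
# Gradient descent on one weight: model y_pred = w * x, loss = (w*x - y)^2,
# so d(loss)/dw = 2 * (w*x - y) * x.
x, y = 3.0, 6.0        # one hypothetical training example (true relation: y = 2x)
w, lr = 0.0, 0.05      # initial weight and learning rate

for step in range(20):
    grad = 2 * (w * x - y) * x   # gradient of the loss with respect to w
    w -= lr * grad               # step in the direction of steepest decrease in error
print(round(w, 3))               # approaches 2.0, the weight that minimizes the loss
```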
### Overfitting and Underfitting
- Overfitting: The model is too complex and learns the training data too well, including noise, leading to poor generalization on new data.
- Underfitting: The model is too simple and fails to capture important patterns in the data.
### Dropout
- Dropout is a regularization technique used to prevent overfitting by randomly turning off neurons during training, forcing the model to learn more robust features.
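A minimal dropout sketch in PyTorch, assuming arbitrary layer sizes; note that dropout is only active in training mode.

```python
# Dropout as a layer in a small network (layer sizes are illustrative assumptions).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations during training
    nn.Linear(64, 2),
)

model.train()                         # dropout is active in training mode
out_train = model(torch.randn(8, 20))
model.eval()                          # dropout is disabled at evaluation time
out_eval = model(torch.randn(8, 20))
```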
### Transfer Learning
- Transfer learning involves taking a pre-trained model on one task and fine-tuning it for a related task, saving time and computational resources.
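A common transfer-learning sketch: reuse an ImageNet-pretrained model from torchvision, freeze its feature extractor, and train only a new output head. The 10-class head is a hypothetical example, and the `weights=` API assumes a recent torchvision version (older versions use `pretrained=True`).

```python
# Fine-tuning a pre-trained model on a new task (sketch; output size is a placeholder).
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained on ImageNet

for param in model.parameters():      # freeze the pre-trained feature extractor
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 10)  # new head, trained on the new task only
```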
### Activation Functions
- Help the network learn non-linear patterns.
- Functions like ReLU, Sigmoid, and Tanh determine whether a neuron is activated and its output passed on to the next layer.
### Epochs & Batches
- An epoch is one complete pass over the training data; a batch is the chunk of samples processed before each weight update.
## Architecture of Neural Networks
A neural network is made up of several layers:
- Input Layer: The first layer that receives the data.
- Hidden Layers: Intermediate layers that perform computations to extract features.
- Output Layer: The final layer that produces the result.
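The layer structure maps directly onto code. Here is a minimal sketch in PyTorch; the layer sizes (4 inputs, 16 hidden units, 3 outputs) are arbitrary assumptions for illustration.

```python
# Input -> hidden -> output structure as a simple feedforward network.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16),   # input layer: 4 features in, 16 hidden units out
    nn.ReLU(),
    nn.Linear(16, 16),  # hidden layer: extracts intermediate features
    nn.ReLU(),
    nn.Linear(16, 3),   # output layer: e.g., scores for 3 classes
)
```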
## Training a Neural Network
Training involves adjusting the weights of the connections between neurons through backpropagation. The process works by minimizing the error between predicted and actual outputs.
## Activation Functions
Activation functions introduce non-linearity into the network, allowing it to learn complex patterns. Common activation functions include:
- ReLU (Rectified Linear Unit)
- Sigmoid
- Tanh
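For intuition, the three functions can be written out directly with NumPy; this is a sketch of their behavior, not how frameworks implement them internally.

```python
# Common activation functions, written out by hand.
import numpy as np

def relu(x):
    return np.maximum(0, x)          # passes positives through, zeroes out negatives

def sigmoid(x):
    return 1 / (1 + np.exp(-x))      # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes values into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), sigmoid(z), tanh(z))
```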
## Loss Functions
A loss function measures the difference between the predicted and actual outputs, guiding the model to improve during training. Examples include:
- Mean Squared Error (MSE) for regression tasks.
- Cross-Entropy Loss for classification tasks.
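Both losses are easy to compute by hand; the sample values below are made up purely for illustration.

```python
# MSE and cross-entropy computed directly with NumPy.
import numpy as np

# Mean Squared Error (regression): average of squared differences
y_true = np.array([3.0, 5.0])
y_pred = np.array([2.5, 5.5])
mse = np.mean((y_pred - y_true) ** 2)              # 0.25

# Cross-Entropy (classification): penalizes confident wrong predictions
p_true = np.array([1.0, 0.0, 0.0])                 # one-hot true label
p_pred = np.array([0.7, 0.2, 0.1])                 # predicted class probabilities
cross_entropy = -np.sum(p_true * np.log(p_pred))   # about 0.357
print(mse, cross_entropy)
```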
## Optimizers
Optimizers update the model's weights, using the gradients and a learning rate, to minimize the loss function during training. Popular optimizers include:
- Stochastic Gradient Descent (SGD)
- Adam
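Swapping optimizers in PyTorch is a one-line change; the model and learning rates below are placeholder assumptions.

```python
# Choosing an optimizer for a model's parameters.
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)

sgd  = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = optim.Adam(model.parameters(), lr=0.001)

# A single update step looks the same for either optimizer:
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```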
## Overfitting & Underfitting
- Overfitting: When a model learns the training data too well, including noise, and performs poorly on new data.
- Underfitting: When a model is too simple to capture the underlying patterns of the data.
## Deep Learning Workflow: End-to-End Hierarchy
1. Data Preprocessing
   ├── Clean, normalize, and transform raw data
   └── Split into training, validation, and test sets
2. Model Architecture Design
   ├── Choose neural network type (ANN, CNN, RNN, etc.)
   └── Define layers, activation functions, loss function
3. Forward Propagation
   ├── Input data passes through network layers
   └── Outputs predictions (initially random)
4. Loss Calculation
   ├── Measure error between predicted and actual output
   └── Uses a loss function (e.g., Cross-Entropy, MSE)
5. Backpropagation
   ├── Compute gradients using the chain rule
   └── Gradients flow backward to update weights
6. Gradient Descent
   ├── Optimizer updates weights to minimize loss
   └── Techniques: SGD, Adam, RMSprop, etc.
7. Epochs & Batches
   ├── Train using mini-batches for efficiency
   └── Repeat over multiple epochs for convergence
8. Regularization Techniques
   ├── Dropout: Randomly drop neurons to prevent overfitting
   └── L1/L2 Regularization, Early Stopping, etc.
9. Validation Loop
   ├── Evaluate model on unseen validation data during training
   └── Helps monitor overfitting and adjust accordingly
10. Hyperparameter Tuning
    ├── Adjust learning rate, batch size, network depth, etc.
    └── Techniques: Grid Search, Random Search, Bayesian Optimization
11. Evaluation
    ├── Test model on holdout test set
    └── Use metrics: Accuracy, Precision, Recall, F1, ROC-AUC, etc.
12. Transfer Learning *(if applicable)*
    ├── Reuse pre-trained model weights on new but related tasks
    └── Fine-tune only a few layers
13. Model Saving
    └── Save final model for reuse (e.g., `.h5`, `.pt`, `.pkl` files)
14. Deployment
    ├── Deploy model in production (cloud, edge, mobile, API)
    └── Use frameworks: TensorFlow Serving, TorchServe, FastAPI, etc.
15. Monitoring & Feedback Loop
    ├── Monitor real-world performance
    └── Collect feedback and retrain if needed (MLOps)
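The sketch below ties several of the steps in the outline above together (data splitting, mini-batch training, a validation loop, crude early stopping, and model saving). The synthetic data, architecture, and hyperparameters are all placeholder assumptions; a real project would add proper preprocessing, tuning, and evaluation metrics.

```python
# Minimal end-to-end training sketch with a validation loop and model saving.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Steps 1-2: data preparation and model design (synthetic binary-classification data)
X = torch.randn(1000, 20)
y = (X.sum(dim=1) > 0).long()
train_set, val_set = random_split(TensorDataset(X, y), [800, 200])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

best_val = float("inf")
for epoch in range(20):                          # step 7: epochs & batches
    model.train()
    for xb, yb in train_loader:
        loss = loss_fn(model(xb), yb)            # steps 3-4: forward pass + loss
        optimizer.zero_grad()
        loss.backward()                          # step 5: backpropagation
        optimizer.step()                         # step 6: gradient descent (Adam)

    model.eval()                                 # step 9: validation loop
    with torch.no_grad():
        val_loss = sum(loss_fn(model(xb), yb).item() for xb, yb in val_loader)
    if val_loss < best_val:                      # crude early stopping criterion
        best_val = val_loss
        torch.save(model.state_dict(), "best_model.pt")   # step 13: model saving
```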
## Popular Neural Network Architectures in Deep Learning
1. Convolutional Neural Networks (CNNs)
- CNNs are specifically designed for processing grid-like data, such as images. They automatically detect patterns and features (e.g., edges, textures) in images, making them highly effective for computer vision tasks.
- Example Use Cases: Image classification, object detection, facial recognition.
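A minimal CNN sketch in PyTorch; the channel counts, kernel sizes, assumed 32x32 RGB input, and 10-class output are illustrative placeholders.

```python
# Small CNN for 32x32 RGB images (sizes are illustrative assumptions).
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local patterns (edges, textures)
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # classify into 10 classes
)
```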
2. Recurrent Neural Networks (RNNs)
- RNNs are designed to handle sequential data by maintaining a memory of previous inputs in the form of hidden states.
- Example Use Cases: Natural language processing, speech recognition, time-series prediction.
3. Long Short-Term Memory (LSTM)
- LSTMs are a type of RNN designed to avoid the vanishing gradient problem, making them more effective for capturing long-term dependencies in sequential data.
- Example Use Cases: Sentiment analysis, machine translation, time-series forecasting.
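A minimal LSTM sketch for a sequence-classification task such as sentiment analysis; the feature size, hidden size, sequence length, and two-class head are arbitrary assumptions.

```python
# LSTM over a batch of sequences, classifying from the final hidden state.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
head = nn.Linear(32, 2)                   # e.g., binary sentiment from the last hidden state

x = torch.randn(4, 15, 8)                 # batch of 4 sequences, 15 steps, 8 features each
outputs, (h_n, c_n) = lstm(x)             # h_n: final hidden state for each sequence
logits = head(h_n[-1])                    # shape: (4, 2)
```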
4. Generative Adversarial Networks (GANs)
- GANs consist of two networks β a generator and a discriminator β that are trained together. The generator creates data (e.g., images), and the discriminator tries to distinguish between real and fake data. This setup helps improve the generator's output over time.
- Example Use Cases: Image generation, deepfake creation, data augmentation.
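A toy sketch of the two GAN components; the noise size, layer widths, and 784-dimensional "image" are made-up placeholders, and training the adversarial game properly takes more care than shown here.

```python
# Generator and discriminator as two small networks.
import torch
import torch.nn as nn

generator = nn.Sequential(       # maps random noise to a fake sample (e.g., a flattened image)
    nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh()
)
discriminator = nn.Sequential(   # scores how "real" a sample looks
    nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid()
)

noise = torch.randn(16, 64)
fake = generator(noise)
realism_score = discriminator(fake)   # the generator is trained to push this score toward 1
```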
5. Transformers
- Transformers are a type of model architecture that processes sequences of data by using mechanisms like self-attention to weigh the importance of different elements of the input.
- Example Use Cases: Machine translation, text generation (like GPT models), speech recognition.
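The self-attention mechanism can be sketched with PyTorch's built-in multi-head attention layer; the embedding size, head count, and sequence length are arbitrary assumptions.

```python
# Self-attention: each token attends to every token in the same sequence.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

x = torch.randn(2, 10, 32)            # batch of 2 sequences, 10 tokens, 32-dim embeddings
out, weights = attn(x, x, x)          # query = key = value = x  (self-attention)
print(out.shape, weights.shape)       # (2, 10, 32) and (2, 10, 10): attention over tokens
```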
## Tools and Libraries for Deep Learning
- TensorFlow
  - An open-source framework developed by Google for building and training deep learning models.
- PyTorch
  - A popular deep learning framework known for its flexibility and ease of use, developed by Facebook's AI Research lab.
- Keras
  - A high-level neural networks API that runs on top of TensorFlow, designed to simplify building deep learning models.
- Caffe
  - A deep learning framework developed by Berkeley AI Research (BAIR), known for its speed and efficiency in image processing.
- MXNet
  - A flexible deep learning framework used for both research and production, developed under the Apache Software Foundation.
## The Future of Deep Learning
Deep Learning continues to evolve rapidly, with future trends focusing on:
- Improved models and architectures: Approaches like Neural Architecture Search (NAS) are emerging to automate the process of discovering the best models.
- Quantum Deep Learning: Exploring the potential of quantum computing for deep learning to solve problems that classical computers cannot.
- Ethical AI: Ensuring fairness, accountability, and transparency in deep learning models to mitigate bias and ethical concerns.
## References & Resources