Self Supervised and Semi Supervised Learning - tech9tel/ai GitHub Wiki

🤹 Self-Supervised & Semi-Supervised Learning

Explore the middle ground between supervised and unsupervised learning. These methods exploit unlabeled data, either by deriving training signals from the data itself or by combining it with a small labeled set, to improve learning.

📘 Definition:

Self-Supervised Learning: Leverages unlabeled data by generating labels from the data itself.
Semi-Supervised Learning: Combines a small amount of labeled data with a large amount of unlabeled data.

🎯 Goal: Reduce the need for expensive manual labeling.

Self-Supervised Learning: learn from data without human-labeled examples.

🧩 Definition: Learns from unlabeled data by creating labels from the data itself.
🧠 Analogy: Like solving a jigsaw puzzle without the box cover — the learner figures it out through structure and clues.
📘 Technical Insight: A pretext task is designed (e.g., predicting missing words in a sentence) so the model learns representations that are useful for downstream tasks.
🔍 Example Tasks: Image colorization, sentence completion, contrastive learning.
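
To make the pretext-task idea concrete, here is a minimal sketch of how a "predict the missing word" task can turn a raw sentence into a training example with no human labeling. The function name `make_masked_examples`, the `[MASK]` placeholder, and the masking probability are assumptions chosen for illustration; real systems use subword tokenizers and more elaborate masking rules.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is an assumption for this sketch


def make_masked_examples(sentence, mask_prob=0.3, seed=0):
    """Turn one unlabeled sentence into (masked_tokens, targets).

    `targets` maps each masked position to the original word, so the
    labels are generated from the data itself.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = tok
    # Guarantee at least one prediction target per non-empty sentence.
    if not targets and tokens:
        i = rng.randrange(len(tokens))
        masked[i] = MASK
        targets[i] = tokens[i]
    return masked, targets


sentence = "the cat sat on the mat"
masked, targets = make_masked_examples(sentence)
print(masked)   # model input: the sentence with some words hidden
print(targets)  # training labels: the hidden words, keyed by position
```

A model trained to recover `targets` from `masked` has to pick up word order and context, and those learned representations are what get reused for downstream tasks.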

Semi-Supervised Learning: a blend of a few labeled samples and many unlabeled ones.

🏷️ Definition: Uses a small set of labeled data and a large set of unlabeled data to improve learning accuracy.
🧠 Analogy: A student with a few solved examples (labeled) learns to solve the rest on their own (unlabeled).
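📘 Technical Insight: The model is first trained on the labeled data, then uses the structure of the unlabeled data (for example, by assigning pseudo-labels to its confident predictions, or by enforcing consistent predictions under perturbation) to refine what it has learned.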
🔍 Example Use Cases: Text classification with limited annotations, medical imaging with a few labeled scans.
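
The "learn from the labeled data, then use the unlabeled data" loop can be sketched with pseudo-labeling, one common semi-supervised technique. The example below is a toy illustration using only the standard library: the nearest-centroid classifier, the distance-based confidence score, the `threshold` value, and all function names are assumptions made for this sketch, not a reference implementation.

```python
import math


def centroids(points, labels):
    """Compute the mean 2-D point of each class."""
    sums, counts = {}, {}
    for (x, y), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + x, sy + y)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab])
            for lab, (sx, sy) in sums.items()}


def predict(cents, point):
    """Return (label, confidence) for the nearest centroid.

    Confidence is a simple distance ratio in [0.5, 1]; it assumes at
    least two classes and is a stand-in for a model's probability.
    """
    dists = sorted((math.dist(point, c), lab) for lab, c in cents.items())
    (d1, lab), (d2, _) = dists[0], dists[1]
    conf = 1.0 - d1 / (d1 + d2) if d1 + d2 > 0 else 0.5
    return lab, conf


def pseudo_label(labeled_x, labeled_y, unlabeled_x, threshold=0.8, rounds=5):
    """Iteratively add confidently predicted unlabeled points to the training set."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        cents = centroids(x, y)  # fit on everything labeled so far
        remaining, added = [], False
        for p in pool:
            lab, conf = predict(cents, p)
            if conf >= threshold:
                x.append(p)      # accept the model's own guess as a label
                y.append(lab)
                added = True
            else:
                remaining.append(p)  # too uncertain; leave unlabeled
        pool = remaining
        if not added:
            break
    return centroids(x, y), pool


labeled_x = [(0.0, 0.0), (4.0, 4.0)]
labeled_y = ["a", "b"]
unlabeled_x = [(0.5, 0.2), (0.1, 0.8), (3.6, 4.2), (4.3, 3.7), (2.0, 2.0)]

final_cents, still_unlabeled = pseudo_label(labeled_x, labeled_y, unlabeled_x)
print(final_cents)
print(still_unlabeled)
```

The confidence threshold is the key design choice: set it too low and wrong pseudo-labels reinforce the model's own mistakes, set it too high and the unlabeled data is barely used.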

| Learning Type | Input Data | Labels Used? | Key Idea | Common Use Cases | Examples |
|---|---|---|---|---|---|
| 🧠 Supervised | Labeled data | ✅ Yes | Learn mapping from input to known output | Classification, Regression | Spam detection, House price prediction |
| 🔍 Unsupervised | Unlabeled data | ❌ No | Find hidden patterns or structure | Clustering, Dimensionality Reduction | Customer segmentation, PCA |
| 🎮 Reinforcement | Agent in environment | ❌ No (rewards) | Learn actions via rewards & penalties | Game AI, Robotics, Self-driving cars | AlphaGo, Robot arm training |
| 🧩 Self-Supervised | Raw data (generates own labels) | ⚠️ Indirect | Predict parts of data from other parts | Pretraining LLMs, Contrastive Learning | BERT, SimCLR, GPT Pretraining |
| 🌓 Semi-Supervised | Small labeled + large unlabeled dataset | ⚠️ Partial | Use few labels + structure from data | NLP, Medical imaging, Fraud Detection | Pseudo-labeling, Mean Teacher |