AI & ML - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki
- Data
The raw examples a model learns from.
- Feature
A measurable property of the data that serves as model input.
- Model
The learned function that maps features to predictions.
- Supervised Learning
Training a model using labeled data, where the outcome for each example is already known.
- Unsupervised Learning
It involves training a model on data that has no labeled outcomes; the model tries to find patterns, similarities, or groups within the data on its own.
- Reinforcement Learning
It is about training a model through trial and error, where it receives rewards or penalties based on its actions.
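As a minimal sketch of the supervised idea, here is a hand-rolled 1-nearest-neighbor classifier: it memorizes labeled examples and predicts the label of the closest one. All data and names below are made up for illustration.

```python
# Minimal illustration of supervised learning: a 1-nearest-neighbor
# classifier predicts a label for a new point by finding the closest
# labeled training example.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train_points, train_labels, query):
    # "Training" here is simply memorizing labeled examples;
    # prediction looks up the nearest one.
    distances = [euclidean(p, query) for p in train_points]
    return train_labels[distances.index(min(distances))]

# Labeled data: (height_cm, weight_kg) -> species
points = [(25, 4), (30, 6), (80, 30), (90, 35)]
labels = ["cat", "cat", "dog", "dog"]

print(predict_1nn(points, labels, (28, 5)))   # a small animal -> "cat"
print(predict_1nn(points, labels, (85, 32)))  # a large animal -> "dog"
```

Unsupervised learning would get the same points without the labels and have to discover the two groups itself (e.g., by clustering).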
Retrieval-Augmented Generation (RAG) enhances the abilities of LLMs by allowing them to access external data sources, such as databases or search engines, to improve the accuracy of their responses.
For example, if you ask ChatGPT about new tax regulations, it recognizes the need for recent information and, because it uses RAG, retrieves relevant data from external sources such as government websites to provide an accurate response beyond its original training data.
In the context of Large Language Models (LLMs) and AI/ML, a vector is a fundamental concept that plays several important roles. Let's break it down and then dive into the other key terminology.
In LLMs, a vector usually refers to a numerical representation of text (like a word, sentence, paragraph, or document).
This is called an embedding, which is a high-dimensional array of numbers (e.g., a 768-dimensional vector).
These vectors capture the semantic meaning of text. Words with similar meanings have vectors that are close together in this space.
Example:
"dog" -> [0.12, 0.45, -0.33, ..., 0.56]
"puppy" -> [0.14, 0.47, -0.30, ..., 0.59] (close to "dog")
Used in:
Search
Question answering
Similarity matching
Vector databases
An embedding is the actual vector representation of data.
Generated by a model like OpenAI's text-embedding-ada-002, SentenceTransformers, or LLaMA variants with an embedding head.
Embeddings reduce high-dimensional discrete data (like text) to continuous numeric form.
The embedding space is the multi-dimensional space in which each vector (embedding) lives.
Semantic similarity is represented by distance or angle (e.g., cosine similarity).
Used in semantic search, retrieval-augmented generation (RAG), and clustering.
Vector databases are specialized databases that store vectors and allow fast similarity search.
Popular tools: FAISS, Pinecone, Weaviate, Chroma, Qdrant.
They allow you to retrieve relevant documents or information based on embedding similarity.
Metrics used to compare vectors:
Cosine similarity: Measures angle between vectors.
Euclidean distance: Measures straight-line distance.
Dot product: Raw similarity signal.
Used in nearest neighbor search.
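These metrics can be computed by hand on toy vectors. The numbers below are made up, echoing the earlier "dog"/"puppy" example; real embeddings have hundreds of dimensions.

```python
import math

# Toy 4-dimensional "embeddings" (made-up numbers, for illustration only).
dog   = [0.12, 0.45, -0.33, 0.56]
puppy = [0.14, 0.47, -0.30, 0.59]
car   = [-0.50, 0.10, 0.80, -0.20]

def dot(a, b):
    # Dot product: raw similarity signal.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine similarity: angle between vectors, in [-1, 1].
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(cosine_similarity(dog, puppy))   # close to 1.0 (very similar)
print(cosine_similarity(dog, car))     # much lower
print(euclidean_distance(dog, puppy))  # small distance
```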
Nearest neighbor search is the way the most similar vectors are found in the database.
Approximate Nearest Neighbor (ANN) techniques like HNSW make this efficient at scale.
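A brute-force version of nearest neighbor search is easy to sketch: it scans every vector, which is exactly the cost that ANN structures like HNSW avoid at scale. The document IDs and vectors below are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    # index: list of (doc_id, vector) pairs. Brute force scores every
    # entry; ANN indexes (HNSW, IVF, ...) avoid this full scan.
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

index = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.8, 0.2, 0.1]),
    ("doc-c", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], index, k=2))  # doc-a and doc-b rank highest
```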
Retrieval-Augmented Generation (RAG) combines vector search with language models.
Steps:
1. The user asks a question.
2. The query is embedded.
3. Similar documents are retrieved from a vector database.
4. The LLM uses those documents to generate an accurate answer.
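The four steps can be sketched end to end. Note that the embed() function here is a deliberately crude stand-in (character-frequency counts), not a real embedding model; in practice you would call an embedding model and a real vector database.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in "embedding": letter-frequency vector. Real systems use
    # a trained embedding model instead.
    counts = Counter(text.lower())
    return [counts.get(ch, 0) for ch in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

documents = [
    "The standard VAT rate increased to 21 percent this year.",
    "Our office cafeteria serves lunch from noon until two.",
]

# 1. The user asks a question.
question = "What is the current VAT rate?"
# 2. The query is embedded.
query_vec = embed(question)
# 3. The most similar document is retrieved from the "vector store".
best_doc = max(documents, key=lambda d: cosine(query_vec, embed(d)))
# 4. The retrieved context is handed to the LLM inside the prompt.
prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```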
Chunking is the process of breaking text into manageable pieces before embedding.
It is necessary for embedding long documents (e.g., 500 words per chunk).
Chunk size affects retrieval quality in RAG.
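A word-based chunker with a small overlap might look like this; the sizes are illustrative defaults, not recommendations, and should be tuned for your retriever.

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into chunks of ~chunk_size words, with a small overlap
    so sentences cut at a boundary still appear in neighboring chunks."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the end of the document
    return chunks

# A synthetic 1200-word document: word0 word1 ... word1199
doc = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_words(doc, chunk_size=500, overlap=50)
print(len(chunks))  # 3 overlapping chunks
```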
Tokenization is the process of breaking text into tokens (words, subwords, or characters).
Vectors are usually generated after tokenization.
Important for managing input length and model behavior.
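A toy tokenizer shows the idea. Real LLM tokenizers use subword schemes such as BPE, so their token counts differ; the regex and budget below are purely illustrative.

```python
import re

def tokenize(text):
    # Toy tokenizer: lowercase words and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def truncate_to_budget(text, max_tokens):
    # Keep only as many tokens as fit in the model's token budget --
    # a crude way to manage input length.
    return tokenize(text)[:max_tokens]

tokens = tokenize("Vectors are generated after tokenization!")
print(tokens)       # ['vectors', 'are', 'generated', 'after', 'tokenization', '!']
print(len(tokens))  # 6
```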
Positional encodings are vectors used inside models to represent where a token appears in the input sequence.
LLMs have a context window (e.g., 4K or 32K tokens) within which they process inputs.
Content beyond this window is not visible to the model unless it is retrieved again (for example, via RAG).
Here's a curated list of the best AI + no-code/low-code workflow orchestration tools that allow you to build and automate AI-enhanced applications without heavy coding:
- Zapier + OpenAI Plugin
Type: No-code workflow automation
Best for: Connecting over 5000 apps with AI logic
AI Support: Built-in OpenAI & ChatGPT integrations, can trigger workflows based on AI outputs
Strengths: Simple UI, scalable automations, chatbot integrations
- Make.com (formerly Integromat)
Type: Visual low-code automation platform
Best for: More complex, branching workflows with conditionals
AI Support: OpenAI, Hugging Face, Replicate integrations
Strengths: Visual editor, robust error handling, and webhook support
- n8n
Type: Open-source workflow automation
Best for: Developers or advanced users who want self-hosted workflows
AI Support: Native OpenAI integration, works well with LangChain, Ollama, etc.
Strengths: Self-hosting, great for building AI agents, supports custom code blocks
- Flowise AI
Type: Low-code visual builder for LLM apps (based on LangChain)
Best for: Building AI chatbots, agents, RAG pipelines
AI Support: LangChain, OpenAI, Ollama, HuggingFace, Pinecone, Chroma, etc.
Strengths: Drag-and-drop nodes, real-time testing, embeddable UIs
- AirOps
Type: AI workflows + apps platform
Best for: AI-enhanced internal tools and workflows
AI Support: GPT, Claude, embeddings, retrieval
Strengths: Business-friendly AI workflows (extractions, summaries), spreadsheet-style UI
- Pipedream
Type: Low-code integration platform for developers
Best for: Serverless functions + AI API orchestration
AI Support: Hugging Face, OpenAI, Cohere, custom HTTP
Strengths: Free to use, great for background jobs, supports JS/Python code
- Langflow
Type: Visual builder for LangChain-based workflows
Best for: Local or cloud-based LLM apps with visual agent orchestration
AI Support: LangChain, Vector DBs, LLMs (OpenAI, Ollama, etc.)
Strengths: Free and open source, rapid prototyping for AI agents
- Autocode
Type: Serverless JS workflows platform
Best for: Building AI-powered bots, Discord/Slack integrations
AI Support: OpenAI, API chaining
Strengths: Real-time JS editor with instant deploy, prebuilt templates