AI & ML - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki
- Data
The raw examples a model learns from.
- Feature
A measurable property of the data that serves as model input.
- Model
The learned function that maps features to predictions.
- Supervised Learning
Training a model using labeled data, where the outcome for each example is already known.
- Unsupervised Learning
It involves training a model on data that has no labeled outcomes; the model tries to find patterns, similarities, or groups within the data on its own.
- Reinforcement Learning
It is about training a model through trial and error, where it receives rewards or penalties based on its actions.
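As a minimal sketch of the supervised idea, here is a hand-rolled 1-nearest-neighbor classifier: it memorizes labeled examples and predicts the label of the closest one. All data and names below are made up for illustration.

```python
# Minimal illustration of supervised learning: a 1-nearest-neighbor
# classifier predicts a label for a new point by finding the closest
# labeled training example.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_1nn(train_points, train_labels, query):
    # "Training" here is simply memorizing labeled examples;
    # prediction looks up the nearest one.
    distances = [euclidean(p, query) for p in train_points]
    return train_labels[distances.index(min(distances))]

# Labeled data: (height_cm, weight_kg) -> species
points = [(25, 4), (30, 6), (80, 30), (90, 35)]
labels = ["cat", "cat", "dog", "dog"]

print(predict_1nn(points, labels, (28, 5)))   # a small animal -> "cat"
print(predict_1nn(points, labels, (85, 32)))  # a large animal -> "dog"
```

Unsupervised learning would get the same points without the labels and have to discover the two groups itself (e.g., by clustering).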
Retrieval-Augmented Generation (RAG) enhances the abilities of LLMs by allowing them to access external data sources, such as databases or search engines, to improve the accuracy of their responses.
For example, if you ask ChatGPT about new tax regulations, it recognizes the need for recent information and, because it uses RAG, retrieves relevant data from external sources such as government websites to provide an accurate response beyond its original training data.
In the context of Large Language Models (LLMs) and AI/ML, a vector is a fundamental concept that plays several important roles. Let's break it down and then dive into the other key terminology.
In LLMs, a vector usually refers to a numerical representation of text (like a word, sentence, paragraph, or document).
This is called an embedding, which is a high-dimensional array of numbers (e.g., a 768-dimensional vector).
These vectors capture the semantic meaning of text. Words with similar meanings have vectors that are close together in this space.
Example:
"dog" -> [0.12, 0.45, -0.33, ..., 0.56]
"puppy" -> [0.14, 0.47, -0.30, ..., 0.59] (close to "dog")
Used in:
Search
Question answering
Similarity matching
Vector databases
An embedding is the actual vector representation of data.
Generated by a model like OpenAI's text-embedding-ada-002, SentenceTransformers, or LLaMA variants with an embedding head.
Embeddings reduce high-dimensional discrete data (like text) to continuous numeric form.
The embedding space is the multi-dimensional space in which each vector (embedding) lives.
Semantic similarity is represented by distance or angle (e.g., cosine similarity).
Used in semantic search, retrieval-augmented generation (RAG), and clustering.
Vector databases are specialized databases that store vectors and allow fast similarity search.
Popular tools: FAISS, Pinecone, Weaviate, Chroma, Qdrant.
They allow you to retrieve relevant documents or information based on embedding similarity.
Metrics used to compare vectors:
Cosine similarity: Measures angle between vectors.
Euclidean distance: Measures straight-line distance.
Dot product: Raw similarity signal.
Used in nearest neighbor search.
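These metrics can be computed by hand on toy vectors. The numbers below are made up, echoing the earlier "dog"/"puppy" example; real embeddings have hundreds of dimensions.

```python
import math

# Toy 4-dimensional "embeddings" (made-up numbers, for illustration only).
dog   = [0.12, 0.45, -0.33, 0.56]
puppy = [0.14, 0.47, -0.30, 0.59]
car   = [-0.50, 0.10, 0.80, -0.20]

def dot(a, b):
    # Dot product: raw similarity signal.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine similarity: angle between vectors, in [-1, 1].
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(cosine_similarity(dog, puppy))   # close to 1.0 (very similar)
print(cosine_similarity(dog, car))     # much lower
print(euclidean_distance(dog, puppy))  # small distance
```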
Nearest neighbor search is the way the most similar vectors are found in the database.
Approximate Nearest Neighbor (ANN) techniques like HNSW make this efficient at scale.
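A brute-force version of nearest neighbor search is easy to sketch: it scans every vector, which is exactly the cost that ANN structures like HNSW avoid at scale. The document IDs and vectors below are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    # index: list of (doc_id, vector) pairs. Brute force scores every
    # entry; ANN indexes (HNSW, IVF, ...) avoid this full scan.
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

index = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.8, 0.2, 0.1]),
    ("doc-c", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], index, k=2))  # doc-a and doc-b rank highest
```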
Retrieval-Augmented Generation (RAG) combines vector search with language models.
Steps:
1. The user asks a question.
2. The query is embedded.
3. Similar documents are retrieved from a vector database.
4. The LLM uses those documents to generate an accurate answer.
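The four steps can be sketched end to end. Note that the embed() function here is a deliberately crude stand-in (character-frequency counts), not a real embedding model; in practice you would call an embedding model and a real vector database.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in "embedding": letter-frequency vector. Real systems use
    # a trained embedding model instead.
    counts = Counter(text.lower())
    return [counts.get(ch, 0) for ch in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

documents = [
    "The standard VAT rate increased to 21 percent this year.",
    "Our office cafeteria serves lunch from noon until two.",
]

# 1. The user asks a question.
question = "What is the current VAT rate?"
# 2. The query is embedded.
query_vec = embed(question)
# 3. The most similar document is retrieved from the "vector store".
best_doc = max(documents, key=lambda d: cosine(query_vec, embed(d)))
# 4. The retrieved context is handed to the LLM inside the prompt.
prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```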
Chunking is the process of breaking text into manageable pieces before embedding.
It is necessary for embedding long documents (e.g., 500 words per chunk).
Chunk size affects retrieval quality in RAG.
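A word-based chunker with a small overlap might look like this; the sizes are illustrative defaults, not recommendations, and should be tuned for your retriever.

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into chunks of ~chunk_size words, with a small overlap
    so sentences cut at a boundary still appear in neighboring chunks."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the end of the document
    return chunks

# A synthetic 1200-word document: word0 word1 ... word1199
doc = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_words(doc, chunk_size=500, overlap=50)
print(len(chunks))  # 3 overlapping chunks
```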
Tokenization is the process of breaking text into tokens (words, subwords, or characters).
Vectors are usually generated after tokenization.
Important for managing input length and model behavior.
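A toy tokenizer shows the idea. Real LLM tokenizers use subword schemes such as BPE, so their token counts differ; the regex and budget below are purely illustrative.

```python
import re

def tokenize(text):
    # Toy tokenizer: lowercase words and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def truncate_to_budget(text, max_tokens):
    # Keep only as many tokens as fit in the model's token budget --
    # a crude way to manage input length.
    return tokenize(text)[:max_tokens]

tokens = tokenize("Vectors are generated after tokenization!")
print(tokens)       # ['vectors', 'are', 'generated', 'after', 'tokenization', '!']
print(len(tokens))  # 6
```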
Positional encodings are vectors used inside models to represent where a token appears in the input sequence.
LLMs have a context window (e.g., 4K or 32K tokens) within which they process inputs.
Content beyond this window is not visible to the model unless it is retrieved again (for example, via RAG).
Here's a curated list of the best AI + no-code/low-code workflow orchestration tools that allow you to build and automate AI-enhanced applications without heavy coding:
- Zapier + OpenAI Plugin
Type: No-code workflow automation
Best for: Connecting over 5000 apps with AI logic
AI Support: Built-in OpenAI & ChatGPT integrations, can trigger workflows based on AI outputs
Strengths: Simple UI, scalable automations, chatbot integrations
- Make.com (formerly Integromat)
Type: Visual low-code automation platform
Best for: More complex, branching workflows with conditionals
AI Support: OpenAI, Hugging Face, Replicate integrations
Strengths: Visual editor, robust error handling, and webhook support
- n8n
Type: Open-source workflow automation
Best for: Developers or advanced users who want self-hosted workflows
AI Support: Native OpenAI integration, works well with LangChain, Ollama, etc.
Strengths: Self-hosting, great for building AI agents, supports custom code blocks
- Flowise AI
Type: Low-code visual builder for LLM apps (based on LangChain)
Best for: Building AI chatbots, agents, RAG pipelines
AI Support: LangChain, OpenAI, Ollama, HuggingFace, Pinecone, Chroma, etc.
Strengths: Drag-and-drop nodes, real-time testing, embeddable UIs
- AirOps
Type: AI workflows + apps platform
Best for: AI-enhanced internal tools and workflows
AI Support: GPT, Claude, embeddings, retrieval
Strengths: Business-friendly AI workflows (extractions, summaries), spreadsheet-style UI
- Pipedream
Type: Low-code integration platform for developers
Best for: Serverless functions + AI API orchestration
AI Support: Hugging Face, OpenAI, Cohere, custom HTTP
Strengths: Free to use, great for background jobs, supports JS/Python code
- Langflow
Type: Visual builder for LangChain-based workflows
Best for: Local or cloud-based LLM apps with visual agent orchestration
AI Support: LangChain, Vector DBs, LLMs (OpenAI, Ollama, etc.)
Strengths: Free and open source, rapid prototyping for AI agents
- Autocode
Type: Serverless JS workflows platform
Best for: Building AI-powered bots, Discord/Slack integrations
AI Support: OpenAI, API chaining
Strengths: Real-time JS editor with instant deploy, prebuilt templates