1.1. Terms We Need To Know - SamuraiBarbi/jttw-ai GitHub Wiki

## Hardware Related Terms

**GPU (Graphics Processing Unit)**: A hardware component that accelerates the training and inference of AI models by parallelizing computations.

**CPU (Central Processing Unit)**: A general-purpose processor that handles overall system management and the execution of instructions in AI applications.

**RAM (Random Access Memory)**: Computer memory that provides fast, temporary storage for the data the CPU is currently using or processing.

**VRAM (Video Random Access Memory)**: A type of RAM on the GPU used to store graphical data such as textures and frame buffers; in AI workloads it also holds model weights and activations, so its capacity often determines which models a GPU can run.

**CUDA Cores**: Processing units within an NVIDIA GPU that execute computations in parallel, heavily used in deep learning tasks.

**Tensor Cores**: Specialized processing units in modern NVIDIA GPUs that accelerate matrix operations, particularly beneficial for deep learning workloads.

## Model Related Terms

**Model**: A mathematical representation or framework that an AI system uses to learn patterns from data and make predictions or generate responses.

**LLM (Large Language Model)**: A type of AI model, typically with billions of parameters, designed specifically for understanding and generating human language.
**NLP (Natural Language Processing)**: A subset of artificial intelligence dedicated to the interaction between computers and humans in natural language. Its objective is to enable computers to comprehend, interpret, and produce human language with meaning and contextual relevance. NLP underpins systems that analyze, understand, and generate text or speech; advances in deep learning and transformer architectures have markedly improved these capabilities, enabling tasks such as content generation and language translation. NLP models are typically pre-trained on extensive datasets and then fine-tuned for specific purposes.
**Fine-Tuning**: A machine learning technique in which a pre-trained model is further trained on a specific task or dataset to improve its performance there. It is particularly valuable when labeled data for the target task is scarce, since it leverages knowledge the model already learned from a broader domain. Widely applied in natural language processing, computer vision, and speech recognition, fine-tuning often achieves better results with less labeled data and compute than training a model from scratch.
**Quantize**: The process of reducing the numerical precision of a model's parameters and/or activations, representing values with fewer bits (e.g., 8-bit integers instead of 32-bit floats) while trying to preserve the model's accuracy. Quantization shrinks a model's file size and memory footprint, making it more practical to deploy on resource-constrained devices.
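As a minimal sketch of the idea, the snippet below maps a list of floats to 8-bit integers using a single shared scale factor (a simple symmetric scheme for illustration; production libraries use more sophisticated variants):

```python
def quantize_int8(values):
    """Map floats to 8-bit integers using one shared scale factor."""
    scale = max(abs(v) for v in values) / 127  # largest magnitude maps to 127
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in q_values]

weights = [0.12, -0.5, 0.33, 1.0, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value is close to its original, but each quantized value
# fits in 8 bits instead of 32.
```

The round trip is lossy: every value is off by at most half a quantization step (`scale / 2`), which is the accuracy cost traded for a 4x smaller representation.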
**Training**: The process of teaching a model to perform a specific task or discern patterns from data. Using a sizable dataset of input-output pairs, the model iteratively adjusts its parameters to minimize the gap between its predictions and the actual outputs. The goal is a well-generalized model that makes accurate predictions, or generates meaningful responses, on new, unseen data; success depends on the quality of the training data, the model architecture, and the optimization techniques employed.
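The loop below is a toy illustration of that process: a single-parameter model is repeatedly nudged to shrink the gap between its predictions and known targets (plain Python, no framework; the dataset and learning rate are made up for the example):

```python
# Toy dataset: the underlying rule is y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0             # the model's single trainable parameter
learning_rate = 0.05

for epoch in range(100):             # one epoch = one full pass over the data
    for x, target in data:
        prediction = w * x
        error = prediction - target
        w -= learning_rate * error * x   # gradient step on squared error

# After training, w has converged close to the true value 2.0.
```

Real training is the same loop at vastly larger scale: millions-to-billions of parameters, batched data, and an optimizer in place of the hand-written update.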
**LoRA (Low-Rank Adaptation of Large Language Models)**: A widely adopted, lightweight training technique that substantially reduces the number of trainable parameters. It injects a small set of new low-rank weight matrices into the model and trains only those, yielding faster, more memory-efficient training and small weight files (typically a few hundred MB) that are easy to store and share. LoRA can be combined with other training techniques, such as DreamBooth, to further accelerate training.
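To see why this saves so much, compare parameter counts: instead of updating a full d x d weight matrix, LoRA trains two thin matrices of rank r, so the trainable count drops from d*d to 2*d*r (illustrative arithmetic; the layer width and rank below are typical but arbitrary choices):

```python
d = 4096   # width of one weight matrix in a hypothetical transformer layer
r = 8      # LoRA rank, chosen much smaller than d

full_params = d * d           # parameters updated by ordinary fine-tuning
lora_params = d * r + r * d   # parameters in the two low-rank matrices A and B

print(full_params)                 # 16777216
print(lora_params)                 # 65536
print(full_params // lora_params)  # 256x fewer trainable parameters
```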
**Weights**: The learned parameters of a model, associated with its input features. These coefficients are adjusted iteratively during training to minimize the gap between the model's predictions and the actual target values, and they are what enables the model to make accurate predictions from input data.
**Epochs**: Complete passes through the training dataset. During each epoch the model adjusts its parameters to reduce the difference between predictions and targets. Choosing the number of epochs is a balancing act: too few and the model underlearns, too many and it risks overfitting.
**MoE (Mixture of Experts)**: An AI architecture that breaks a complex task into simpler subtasks, each handled by a specialized "expert" model. The experts run in parallel, and a gating network combines (or selects among) their outputs. This dynamic routing lets the system handle diverse inputs by activating only the most relevant experts.
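A toy sketch of the gating idea: two hypothetical "expert" functions each produce an output, and a softmax gate mixes them with weights that sum to 1 (in a real MoE layer the experts are neural networks and the gate scores are learned per input):

```python
import math

def softmax(scores):
    """Turn raw gate scores into mixture weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical experts, each specialising in a different mapping.
def expert_a(x): return 2 * x
def expert_b(x): return x + 10

def moe(x, gate_scores):
    weights = softmax(gate_scores)       # the gating network's decision
    outputs = [expert_a(x), expert_b(x)]
    return sum(w * o for w, o in zip(weights, outputs))

# The gate strongly prefers expert_a for this input, so the result
# stays close to expert_a's answer of 6.0.
result = moe(3.0, gate_scores=[4.0, 0.0])
```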

## Usage Related Terms

**Prompt**: A user-provided input or instruction given to an AI system to elicit a specific response or perform a task.

**Inference**: The process of using a trained model to make predictions or generate outputs from new, unseen data.

**Tokens**: The units of text a language model processes. Depending on the tokenizer, a token can be a single character, a subword fragment, or an entire word.
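The snippet below shows the idea with a toy word-level tokenizer; real LLMs use learned subword vocabularies (e.g., byte-pair encoding) with tens of thousands of entries, but the principle of mapping text to discrete integer IDs is the same:

```python
# Toy vocabulary; real tokenizers learn theirs from large corpora.
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}

def tokenize(text):
    """Map each word to its vocabulary ID, unknown words to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

tokens = tokenize("The cat sat")
# tokens == [0, 1, 2]; an out-of-vocabulary word like "dog" maps to 3
```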
**Context**: The surrounding information that influences how an AI system interprets a given input or task. For LLMs, the context window is the maximum number of tokens the model can consider at once.

**Chat, Instruct, Completion**: Different task formats an AI model can serve: holding a multi-turn conversation (Chat), following a one-off instruction (Instruct), or continuing a given text prompt (Completion).
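A quick sketch of how the same request might be phrased for each task type; the role/content message shape shown for chat is a widely used convention, not any particular vendor's API:

```python
# Completion: the model simply continues raw text.
completion_prompt = "The capital of France is"

# Instruct: the model follows an explicit instruction.
instruct_prompt = "Name the capital of France."

# Chat: the model sees a structured conversation. The role/content
# dictionaries below follow a common convention for chat-style models.
chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
```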
**OOM (Out of Memory)**: An error that occurs when a program exhausts the available memory (RAM or VRAM) and cannot allocate more, leading to instability or termination. In AI work it commonly appears when a model or batch is too large for the GPU's VRAM.
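A common way to anticipate OOM before loading a model is back-of-the-envelope arithmetic on its weights; the estimate below deliberately ignores activations, KV caches, and framework overhead, which add to the real requirement:

```python
def model_memory_gb(n_params, bytes_per_param):
    """Rough lower bound on memory needed just to hold the weights.

    Ignores activations, KV caches, and framework overhead,
    all of which increase the real requirement.
    """
    return n_params * bytes_per_param / 1024**3

seven_b = 7_000_000_000          # a 7-billion-parameter model
fp32 = model_memory_gb(seven_b, 4)   # ~26.1 GB at 32-bit precision
fp16 = model_memory_gb(seven_b, 2)   # ~13.0 GB at 16-bit precision
int8 = model_memory_gb(seven_b, 1)   # ~6.5 GB quantized to 8 bits
```

This is also why quantization (above) matters in practice: dropping from 32-bit to 8-bit weights can turn an OOM error into a model that fits comfortably in VRAM.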