Notes on AI LLM & Cognition - GRibbans/Gribbans GitHub Wiki
Table of Contents
- Table of Contents
- Applied & Practical
- Academic / Theoretical
- Human Cognition (Thinking/Thought)
- Todo
> [!NOTE]
> Work in progress: this page is a first draft and potentially subject to extensive change.
Applied & Practical
Improving LLM Response Quality / Ability aka Prompt Engineering
How to optimally tap into a model's capabilities during interaction via instructions (prompts). Essentially, you need to provide enough context for the model to fully understand what is being asked of it, while also matching the prompt to the style (structure) of the data used to train the LLM.
First-call Prompt Engineering
There are some basic steps that can positively or negatively affect LLM responses.
- Word Ordering
- Spacing
- Capitalisation
- Self-reflection
- Ask to 'Ruminate' - Asking the LLM to ruminate on your prompt can improve output.
Further general advice to improve reasoning and output:
- Direct writing - Be direct and to the point
- Inclusive prompts - Include your audience in your prompt
- Structured format - E.g. format the prompt as ### Instruction ###, followed by ### Question ###, etc.
- Task breakdown - Break down complicated tasks into multiple prompts
- Positive language - Use directives like “do” instead of “don’t”
- Mandatory instructions - Use phrases like You MUST e.g. "You must provide a step-by-step guide to..."
- Penalty warning - Use phrases like "You’ll be penalized"
- Natural responses - Request a simple response e.g. "answering in a natural, human-like manner..."
- Unbiased descriptions - Request that the LLM make sure its answer is unbiased and doesn't rely on stereotypes
- Linguistic mimicry - E.g. "Use the same language as the following paragraph to explain the importance of exercise:"
- Precise instructions - Clearly state the requirements that the model must follow to produce content, in the form of keywords or instructions
Next-Stage Prompt Engineering
- Example-driven prompts - Use example-driven (few-shot) prompting.
- CoT (Chain-of-Thought) prompts - Combine chain-of-thought (CoT) prompts with few-shot prompts: "Let's think through this step by step. Solve each step and explain how you arrived at your answer."
- Delimited prompts - Use delimiters to structure text.
- ReAct model - The Reasoning and Acting paradigm guides the LLM to respond in a structured manner to complex queries. Results suggest that ReAct outperforms other leading methods in language and decision-making tasks and enhances human understanding of, and trust in, large language models (LLMs). It works best when combined with Chain-of-Thought (CoT), running steps as individual tasks whose results feed into the next step, utilising both internal knowledge and external information during reasoning.
- BoT (Buffer of Thoughts) - TBC

CO-STAR Framework
- Context - provide the background
- Objective (or Task) - define the task to be performed
- Style - instruct a writing style: the kind of sentences wanted; formal, informal, magazine-style, colloquial, or allude to a known style
- Tone - the attitude of the response; e.g. serious, humorous, empathetic
- Audience - who is it for?
- Response - output format: text, Python, SQL, JSON, etc.
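A minimal sketch of the ReAct loop described above, with a stubbed model and one tool. The `call_model` stub, the `lookup` tool, and the Thought/Action/Observation labels are illustrative assumptions, not a specific library's API:

```python
import re

def lookup(term: str) -> str:
    """Hypothetical external tool the model can act with."""
    facts = {"ReAct": "Reasoning and Acting paradigm for LLMs."}
    return facts.get(term, "no result")

def call_model(history: str) -> str:
    """Stub standing in for a real LLM call; scripted for the demo."""
    if "Observation:" not in history:
        return "Thought: I should look this up.\nAction: lookup[ReAct]"
    return "Thought: I have the answer.\nFinal Answer: ReAct combines reasoning steps with tool use."

def react_loop(question: str, max_steps: int = 5) -> str:
    """Alternate reasoning (Thought) and tool use (Action) until a Final Answer appears."""
    history = f"Question: {question}"
    for _ in range(max_steps):
        reply = call_model(history)
        history += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.+?)\]", reply)
        if match:  # run the requested tool and feed the observation back in
            tool, arg = match.groups()
            if tool == "lookup":
                history += f"\nObservation: {lookup(arg)}"
    return "no answer"

print(react_loop("What is ReAct?"))
```

The key design point is the feedback loop: each Observation is appended to the history, so the next reasoning step can use external information alongside the model's internal knowledge.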
Retrieval Augmentation (RAG)
TBA
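Pending the full write-up, a toy sketch of the retrieve-then-generate idea behind RAG. Naive word overlap stands in for a real embedding/vector-similarity model, and all function names are illustrative:

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for vector similarity)."""
    q_words = set(query.lower().split())
    scored = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def augment_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved context so the model answers from it rather than parametric memory."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nAnswer using only the context above.\nQuestion: {query}"

docs = [
    "Context caching stores input tokens for reuse across requests.",
    "ReAct interleaves reasoning steps with tool calls.",
]
print(augment_prompt("How does context caching work?", docs))
```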
Function Calling
Function calling enhances the capabilities of Large Language Models (LLMs) in two main ways:
- Structured Responses: LLMs can generate structured responses, like JSON objects, that can be used as arguments in subsequent functions within LLM applications. This is a more secure pattern, as your application stays in control of executing the set of functions for which the LLM has generated arguments.
- Function Invocation: Optionally, LLMs can directly invoke functions using these structured JSON objects as arguments. This allows them to utilise external tools such as Python interpreters, search websites, or external databases, opening up a vast range of possibilities.
Outline the ideal use cases and limitations: function calling references information directly at request time rather than relying on training data.
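A sketch of the structured-response pattern above, where the application, not the model, executes the function using the JSON arguments the model produced. The weather function and the call format are invented for illustration and do not match any particular vendor's API:

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical local function the LLM can request via a function call."""
    return {"city": city, "temp": 21, "unit": unit}

# Registry of functions the application permits the model to trigger.
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}

def dispatch(llm_output: str) -> dict:
    """Parse the model's JSON function call and invoke the matching registered function."""
    call = json.loads(llm_output)
    func = AVAILABLE_FUNCTIONS[call["name"]]  # unknown function names raise KeyError
    return func(**call["arguments"])

# Simulated model output: a structured JSON object, as described above.
model_response = '{"name": "get_weather", "arguments": {"city": "Leeds"}}'
print(dispatch(model_response))
```

The registry is what makes this the "more secure" pattern: the model can only nominate functions, and the application decides which ones actually run.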
Improving LLM Speed (Response Times)
Inference
TBA
Routing
TBA - add details on RouteLLM / vLLM / LiteLLM
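In the meantime, a toy sketch of the routing idea: send easy prompts to a cheap, fast model and hard ones to a strong model. The thresholds, marker words, and model names are all invented:

```python
def route(prompt: str) -> str:
    """Pick a model tier from a crude complexity estimate (length plus keywords)."""
    hard_markers = ("prove", "derive", "step by step", "analyse")
    words = len(prompt.split())
    if words > 50 or any(m in prompt.lower() for m in hard_markers):
        return "strong-model"  # slower, more capable, more expensive
    return "cheap-model"       # fast, low cost, good enough for simple queries

print(route("What is the capital of France?"))
print(route("Prove that the sum of two odd numbers is even."))
```

Real routers such as RouteLLM learn this decision from preference data rather than using hand-written rules, but the cost/quality trade-off is the same.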
Context caching
See Context Caching by Google AI
Using the context caching feature, you pass content to the model once, cache the input tokens, and then refer to the cached tokens in subsequent requests.
Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases such as:
- Chat-bots with extensive system instructions
- Repetitive analysis of lengthy video files
- Recurring queries against large document sets
- Frequent code repository analysis or bug fixing
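Google's API does the caching server-side; as a purely conceptual sketch, here is a client-side cache keyed on the shared prefix. The class and method names are invented and do not correspond to the Google AI SDK:

```python
import hashlib

class PrefixCache:
    """Store a large shared context once and reference it by handle in later requests."""

    def __init__(self):
        self._store: dict[str, str] = {}

    def put(self, context: str) -> str:
        """Cache the context and return a short handle for reuse."""
        handle = hashlib.sha256(context.encode()).hexdigest()[:12]
        self._store[handle] = context
        return handle  # subsequent requests send this instead of the full context

    def build_request(self, handle: str, query: str) -> str:
        """Combine the cached prefix with a short follow-up query."""
        return self._store[handle] + "\n\nUser query: " + query

cache = PrefixCache()
h = cache.put("SYSTEM: You are a support bot. <thousands of tokens of instructions>")
# Each short follow-up reuses the cached prefix rather than resending it.
print(cache.build_request(h, "How do I reset my password?"))
```

This mirrors the use cases listed above: the saving comes from many short requests sharing one substantial, unchanging context.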
Approaches & Techniques for LLM Systems
> [!NOTE]
> Definition: an approach is a theoretical method, a set of procedures based on theories. A technique is a strategy used for a specific goal, including overcoming expected obstacles.
Improving Risk Management and Security
- NeMo Guardrails - toolkit to design and develop guardrails for LLM apps/systems.
- Llama Guard (PDF available) - an LLM-based input-output safeguard model geared towards human-AI conversation use cases.
- Colang - modelling language for designing guardrails for conversational systems.
Colang Terminology
- LLM-based application: a software application that uses an LLM.
- Bot: synonym for LLM-based application.
- Utterance: the raw text coming from the user or the bot.
- Intent: the canonical form (i.e. structured representation) of a user/bot utterance.
- Event: something that has happened and is relevant to the conversation, e.g. user is silent, user clicked something, user made a gesture, etc.
- Action: custom code that the bot can invoke; usually for connecting to a third-party API.
- Context: any data relevant to the conversation (i.e. a key-value dictionary).
- Flow: a sequence of messages and events, potentially with additional branching logic.
- Rails: specific ways of controlling the behaviours of a conversational system (a.k.a. bot), e.g. not talking about politics, responding in a specific way to certain user requests, following a predefined dialogue path, using a specific language style, extracting data, etc.
Core Syntactic Elements of Colang
- User messages - User message definition blocks define the canonical form message that should be associated with various user utterances.
- Bot messages - Bot message definition blocks define the utterances that should be associated with various bot message canonical forms.
- Flows - Flows represent how you want the conversation to unfold, as sequences of user messages, bot messages, and potentially other events.
- Subflows - Subflows are a particular type of flow. While flows are applied automatically to the current conversation (when there is a match), subflows are called explicitly by other flows/subflows.
- Actions - Actions are custom functions available to be invoked from flows but are not defined in Colang. They are made available to the guardrails configuration at runtime by the host application.
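To illustrate utterances, intents, bot messages, and flows, here is a tiny Python mock of the mappings Colang defines. The real runtime is NeMo Guardrails; everything below is a simplified stand-in, and the example data is invented:

```python
# User messages: canonical forms (intents) associated with user utterances.
USER_MESSAGES = {
    "hello": "express greeting",
    "hi there": "express greeting",
    "tell me about the election": "ask politics",
}

# Bot messages: utterances associated with bot canonical forms.
BOT_MESSAGES = {
    "express greeting": "Hello! How can I help?",
    "refuse politics": "Sorry, I can't discuss politics.",
    "clarify": "Sorry, I didn't catch that.",
}

# Flows: which bot canonical form follows each user intent
# (the "ask politics" -> "refuse politics" entry acts as a rail).
FLOWS = {
    "express greeting": "express greeting",
    "ask politics": "refuse politics",
}

def respond(utterance: str) -> str:
    """Utterance -> intent -> flow -> bot canonical form -> bot utterance."""
    intent = USER_MESSAGES.get(utterance.lower(), "unknown intent")
    bot_form = FLOWS.get(intent, "clarify")
    return BOT_MESSAGES[bot_form]

print(respond("Hello"))
print(respond("tell me about the election"))
```

In real Colang the utterance-to-intent step is done by the LLM rather than exact string matching, which is what lets rails generalise beyond listed examples.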
Academic / Theoretical
Research Papers Focused on AI & LLM
Collection of journal articles, mainly found across HuggingFace posts, and certain YouTube channels.
arXiv - Maintained by Cornell University, it is one of the main sites hosting AI & LLM related papers.
> [!NOTE]
> Find the journal article on extreme extended training to form geometric learning and recall. Addendum - the article was on 'grokking'.
PAPER TITLE | REFERENCE |
---|---|
Attention Is All You Need - The legendary paper that introduced Transformers and kick-started the LLM explosion | 1706.03762 |
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction | 2408.04948v1 |
Future Lens: Anticipating Subsequent Tokens from a Single Hidden State | 2311.04897 |
Language Models (Mostly) Know What They Know | 2207.05221 |
Plan-and-solve prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | 2305.04091 |
Think Before You Speak | 2310.02226 |
On the binding problem in Artificial Neural Networks | 2012.05208 |
AutoCoder Enhancing Code Large Language Model with AIEV-INSTRUCT | 2405.14906 |
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | arXiv.2406.04271 |
Chain-of-Note - Enhancing robustness in Retrieval Augmented Large Language Models | arXiv.2311.09210 |
Demystifying Prompts in Language Models via Perplexity Estimation | arXiv.2212.04037 |
Emergent Tool Use from Multi-Agent Autocurricula | arXiv.1909.07528 |
Faults in Deep Reinforcement Learning Programs: a Taxonomy and a Detection Approach | arXiv.2101.00135 |
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | arXiv.2407.10930 |
Grammatical Error Correction - A survey on the SOTA | arXiv.2211.05166 |
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery | arXiv.2302.03668 |
How to Prompt? Opportunities and Challenges of Zero and Few-shot Learning in Creative Applications of Generative Models | arXiv.2209.01390 |
Knowledge Technologies by Nick Milton | arXiv.0802.3789 |
LLM In-Context Recall is Prompt Dependent | arXiv.2404.08865 |
Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT | arXiv.2302.11382 |
QLoRA Efficient Fine-tuning of Quantized LLMs | arXiv.2305.14314 |
RAFT - A new way to teach LLMs to be better at RAG | arXiv.2403.10131 |
ReAct: Synergizing Reasoning and Acting in Language Models | arXiv.2210.03629 |
Scaling Instructable Agents Across Many Simulated Worlds | arXiv.2404.10179 |
Taxonomy of Prompt Modifiers for Text-to-Image Generation | arXiv.2204.13988 |
Research Papers on Related Topics
A few from subject areas surrounding AI / LLM development, such as: Argumentation, Reasoning, Learning, Logic, Thinking, Ontology, Taxonomies, among others.
PAPER TITLE | SOURCE |
---|---|
A critical review of argument visualization tools: do users become better reasoners? | ResearchGate |
Buffer of Thoughts - Thought-Augmented Reasoning with Large Language Models | arXiv.2406.04271 |
Continual Learning, Fast and Slow | arXiv.2209.02370 |
Defeasible Logic Programming: An Argumentative Approach | arXiv.cs/0302029 |
Events as Entities in Ontology-Driven Conceptual Modelling | Springer |
Language of Thought Hypothesis | Archive.org |
Role of the Systemic View in Foundational Ontologies' | Semantic Scholar |
Types and taxonomic structures in conceptual modelling - A novel ontological theory and engineering support | Elsevier |
Research Papers from the Association of Computational Linguistics (ACL Anthology)
The ACL Anthology covers the study of computational linguistics and natural language processing.
- ACL Anthology - SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval) - 10.18653/v1/S19-2010
Research Papers from Meta AI Research Lab
Human Cognition (Thinking/Thought)
- Language of Thought Hypothesis
- Ghost in the Machine (210 minutes) - Video by Machine Learning Street Talk on cognition and AI/LLM.
Todo
- Joint Embedding Predictive Architecture (JEPA)
- Find paper on 'grokking' and the benefits of extended learning, and how LLMs are overcoming the mid-point diminishing return.
- Add research paper: Connectionism and Cognitive Architecture: A Critical Analysis by Jerry A. Fodor and Zenon W. Pylyshyn
- Add research paper: On the binding problem in Artificial Neural Networks 2012.05208
- Papers by Jürgen Schmidhuber
- RAG / RALM
- RAG + Fine-tuning
- Multi-modal models
- Agents / Agentic / Micro-Agents
- Parallel running of models