Notes on AI LLM & Cognition

[Header image: abstract geometric motif]



[!NOTE] Work in Progress
This page is a first draft and potentially subject to extensive change.

Applied & Practical

Improving LLM Response Quality / Ability aka Prompt Engineering

How to optimally tap into a model's capabilities during interaction via instructions (prompts). Essentially, you need to provide enough context for the model to fully understand what is being asked of it, while also matching the prompt to the style (structure) of the data used to train the LLM.

First-call Prompt Engineering

There are some basic steps that can positively and negatively affect LLM responses.

  • Word Ordering
  • Spacing
  • Capitalisation
  • Self-reflection
  • Ask to 'Ruminate' - Asking the LLM to ruminate on your prompt can improve output (see the sketch after this list).
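
As a loose illustration of the 'ruminate' point above, here is a minimal sketch of a two-pass reflect-then-answer wrapper. The `ask_llm` stub and both prompt texts are assumptions, not a specific vendor API.

```python
# Illustrative sketch of a two-pass "ruminate, then answer" wrapper.
# ask_llm is a placeholder: swap in whatever chat-completion client you use.

def ask_llm(prompt: str) -> str:
    # Stub so the example runs end-to-end; replace with a real model call.
    return f"<model response to: {prompt[:40]}...>"

def ruminate_then_answer(question: str) -> str:
    # Pass 1: ask the model to reflect on the question before answering.
    reflection = ask_llm(
        "Ruminate on the following question. List the key assumptions, "
        f"ambiguities and sub-problems before answering anything:\n\n{question}"
    )
    # Pass 2: feed the reflection back and request the final answer.
    return ask_llm(
        f"Question:\n{question}\n\nYour earlier reflection:\n{reflection}\n\n"
        "Now give a concise final answer."
    )

print(ruminate_then_answer("Why does prompt word order matter?"))
```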

Further general advice to improve reasoning and output:

  • Direct writing - Be direct and to the point
  • Inclusive prompts - Include your audience in your prompt
  • Structured format - E.g. format the prompt as ### Instruction ###, followed by ### Question ###, etc. (see the sketch after this list)
  • Task breakdown - Break down complicated tasks into multiple prompts
  • Positive language - Use directives like “do” instead of “don’t”
  • Mandatory instructions - Use phrases like You MUST e.g. "You must provide a step-by-step guide to..."
  • Penalty warning - Use phrases like "You’ll be penalized"
  • Natural responses - Request a simple response e.g. "answering in a natural, human-like manner..."
  • Unbiased descriptions - Ask the LLM to make sure the answer is unbiased and doesn't rely on stereotypes
  • Linguistic mimicry - Use the same language as the following paragraph to explain the importance of exercise:
  • Precise instructions - Clearly state the requirements that the model must follow to produce content, in the form of keywords or instructions
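
To make the structured-format, mandatory-instruction and unbiased-description points above concrete, here is a minimal sketch of assembling such a prompt in Python. The function name, section headers and example values are illustrative assumptions.

```python
# Sketch: building a prompt with explicit ### sections, direct wording and a
# mandatory "You MUST" instruction, per the list above. Purely illustrative.

def build_prompt(instruction: str, question: str, audience: str) -> str:
    return (
        "### Instruction ###\n"
        f"You MUST {instruction} "
        f"Write for {audience}, answering in a natural, human-like manner, "
        "and make sure the answer is unbiased and does not rely on stereotypes.\n\n"
        "### Question ###\n"
        f"{question}\n"
    )

print(build_prompt(
    instruction="provide a step-by-step guide to setting up a Python virtual environment.",
    question="How do I isolate project dependencies on Linux?",
    audience="a junior developer",
))
```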

Next-Stage Prompt Engineering

  • Example-driven prompts - Use example-driven (few-shot) prompting.
  • CoT (Chain of Thought) prompts - Combine chain-of-thought (CoT) prompts with few-shot prompts: "Let's think through this step by step. Solve each step and explain how you arrived at your answer."
  • Delimited prompts - Use delimiters to structure text.
  • ReAct model - The Reasoning and Acting paradigm guides the LLM to respond to complex queries in a structured manner. Results suggest that ReAct outperforms other leading methods in language and decision-making tasks and enhances human understanding of, and trust in, large language models (LLMs). It works best when combined with Chain-of-Thought (CoT), treating steps as individual tasks whose results feed into the next step, utilizing both internal knowledge and external information during reasoning.
  • BoT (Buffer of Thoughts) - TBC
  • CO-STAR framework - a checklist for structuring prompts (see the sketch after the list below):

  • Context - provide the background
  • Objective (or Task) - define the task to be performed
  • Style - instruct a writing style: the kind of sentences wanted; formal, informal, magazine style, colloquial, or allude to a known style
  • Tone - set the attitude of the response, e.g. professional, friendly, or persuasive
  • Audience - who is it for?
  • Response - the output format: text, Python, SQL, JSON, etc.
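
A minimal sketch of a CO-STAR style prompt that also appends the chain-of-thought trigger quoted above. The field values and the `co_star` dictionary are illustrative assumptions, not a fixed template.

```python
# Sketch: a CO-STAR style prompt that also appends the chain-of-thought
# trigger quoted above. Field names mirror the list; values are examples.

co_star = {
    "Context":   "You are reviewing a pull request for a small Flask web app.",
    "Objective": "Summarise the riskiest changes and suggest tests.",
    "Style":     "Concise, informal engineering review notes.",
    "Tone":      "Constructive and friendly.",
    "Audience":  "The author of the pull request.",
    "Response":  "A short Markdown list.",
}

prompt = "\n".join(f"### {k} ###\n{v}" for k, v in co_star.items())
prompt += (
    "\n\nLet's think through this step by step. "
    "Solve each step and explain how you arrived at your answer."
)
print(prompt)
```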

Retrieval-Augmented Generation (RAG)

TBA

Function Calling

Function calling enhances the capabilities of Large Language Models (LLMs) in two main ways:

  • Structured Responses: LLMs can generate structured responses, such as JSON objects, that can be used as arguments to subsequent functions within LLM applications. This is a more secure way of ensuring that your application stays in control of executing the set of functions for which the LLM has generated arguments (see the sketch below).
  • Function Invocation: Optionally, LLMs can directly invoke functions using these structured JSON objects as arguments. This allows them to use external tools such as Python interpreters, search websites, or external databases, opening up a vast range of possibilities.
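
A provider-agnostic sketch of the 'structured responses' route described above: the model is assumed to have already returned a JSON object naming a function and its arguments, and the application (not the model) decides whether to execute it. The JSON shape, function name and arguments are assumptions; real providers each define their own schema.

```python
import json

# Functions the application is willing to execute on the model's behalf.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub for a real API call

ALLOWED_FUNCTIONS = {"get_weather": get_weather}

# Assume the LLM returned this JSON as its "function call" (the shape is
# illustrative; check your provider's schema).
llm_output = '{"name": "get_weather", "arguments": {"city": "Leeds"}}'

call = json.loads(llm_output)
func = ALLOWED_FUNCTIONS.get(call["name"])  # the app stays in control
if func is None:
    raise ValueError(f"Model requested an unknown function: {call['name']}")
result = func(**call["arguments"])
print(result)  # feed this back to the model in the next turn
```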

To do: outline the ideal use case and the limitations, since the model is not drawing on training data but directly referencing external information.

Improving LLM Speed (Response Times)

Inference

TBA

Routing

TBA - add details on RouteLLM / vLLM / LiteLLM.

Context caching

See Context Caching by Google AI

Using the context caching feature, you can pass content to the model once, cache the input tokens, and then refer to the cached tokens in subsequent requests (see the sketch after the list below).

Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases such as:

  • Chat-bots with extensive system instructions
  • Repetitive analysis of lengthy video files
  • Recurring queries against large document sets
  • Frequent code repository analysis or bug fixing
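
The sketch below only illustrates the general caching pattern (create a cache once, then reference it on later requests); the client class and method names are hypothetical stand-ins, not the Google AI SDK.

```python
# Hypothetical client and method names, used only to show the pattern:
# pay once to ingest the large shared context, then reuse the cached tokens
# cheaply on each short follow-up request.

class FakeCachingClient:
    def create_cache(self, content: str, ttl_seconds: int) -> str:
        # A real provider would ingest `content`, cache its tokens for
        # `ttl_seconds`, and return a handle to the cache.
        return "cache-123"

    def generate(self, cached: str, prompt: str) -> str:
        # A real provider would prepend the cached tokens to the new prompt.
        return f"[answer grounded in {cached}] {prompt}"

client = FakeCachingClient()

# One-off: cache the large shared context (e.g. lengthy system instructions).
large_context = "...imagine a long system prompt or document set here..."
cache_id = client.create_cache(content=large_context, ttl_seconds=3600)

# Many cheap follow-ups: only the short prompt is sent as new input tokens.
for question in ["Where is auth handled?", "List the public endpoints."]:
    print(client.generate(cached=cache_id, prompt=question))
```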

Approaches & Techniques for LLM Systems

[!NOTE] Definition
An approach is a theoretical method: a set of procedures based on underlying theories. A technique is a strategy used to achieve a specific goal, including overcoming expected obstacles.

GenAI Cookbook

Improving Risk Management and Security

  • NeMo Guardrails - a toolkit to design and develop guardrails for LLM apps/systems.
  • Llama Guard (PDF available) - an LLM-based input-output safeguard model geared towards human-AI conversation use cases.
  • CoLang - a modelling language for the design of guardrails for conversational systems.

CoLang Terminology

  • LLM-based Application: a software application driven by an LLM.
  • Bot: synonym for LLM-based application.
  • Utterance: the raw text coming from the user or the bot.
  • Intent: the canonical form (i.e. structured representation) of a user/bot utterance.
  • Event: something that has happened and is relevant to the conversation, e.g. the user is silent, the user clicked something, the user made a gesture, etc.
  • Action: custom code that the bot can invoke, usually for connecting to a third-party API.
  • Context: any data relevant to the conversation (i.e. a key-value dictionary).
  • Flow: a sequence of messages and events, potentially with additional branching logic.
  • Rails: specific ways of controlling the behaviour of a conversational system (a.k.a. bot), e.g. not talking about politics, responding in a specific way to certain user requests, following a predefined dialogue path, using a specific language style, extracting data, etc.

Core Syntactic Elements of CoLang

  • User Messages - user message definition blocks define the canonical form message that should be associated with various user utterances.
  • Bot Messages - bot message definition blocks define the utterances that should be associated with various bot message canonical forms.
  • Flows - flows represent how you want the conversation to unfold, including sequences of user messages, bot messages, and potentially other events.
  • Subflows - subflows are a particular type of flow. While flows are meant to be applied automatically to the current conversation (when there is a match), subflows are meant to be called explicitly by other flows/subflows.
  • Actions - actions are custom functions available to be invoked from flows but are not defined in Colang. They are made available to the guardrails configuration at runtime by the host application (see the sketch below).
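
As a hedged illustration of how Actions connect to host-application code, the sketch below registers a Python function with NeMo Guardrails' Python API. The configuration path, action name and the commented Colang flow are made-up examples, and the exact signatures should be checked against the NeMo Guardrails documentation.

```python
# Hedged sketch: registering a custom action so a Colang flow can invoke it.
# Config path, action name and flow are illustrative; check the NeMo
# Guardrails docs for the current API.

from nemoguardrails import LLMRails, RailsConfig

async def check_order_status(order_id: str) -> str:
    # Custom code the bot can invoke, e.g. a call to a third-party API.
    return f"Order {order_id} is in transit."

config = RailsConfig.from_path("./config")   # folder holding the .co flow files
rails = LLMRails(config)
rails.register_action(check_order_status, name="check_order_status")

# The corresponding Colang flow (in the config folder) might look like:
#   define flow order status
#     user ask order status
#     $status = execute check_order_status(order_id=$order_id)
#     bot inform order status
```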


Academic / Theoretical

Research Papers Focused on AI & LLM

A collection of journal articles, mainly found via HuggingFace posts and certain YouTube channels.

arXiv - Maintained by Cornell University, it is one of the main sites publishing AI & LLM-related papers.

[!NOTE]
Find the journal article on extreme extended training to form geometric learning and recall. Addendum: the article was on 'grokking'.

| Paper | Reference |
| --- | --- |
| Attention Is All You Need (the legendary paper that introduced Transformers and kick-started the LLM explosion) | arXiv:1706.03762 |
| HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction | arXiv:2408.04948v1 |
| Future Lens: Anticipating Subsequent Tokens from a Single Hidden State | arXiv:2311.04897 |
| Language Models (Mostly) Know What They Know | arXiv:2207.05221 |
| Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | arXiv:2305.04091 |
| Think Before You Speak | arXiv:2310.02226 |
| On the Binding Problem in Artificial Neural Networks | arXiv:2012.05208 |
| AutoCoder: Enhancing Code Large Language Model with AIEV-INSTRUCT | arXiv:2405.14906 |
| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | arXiv:2406.04271 |
| Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Large Language Models | arXiv:2311.09210 |
| Demystifying Prompts in Language Models via Perplexity Estimation | arXiv:2212.04037 |
| Emergent Tool Use from Multi-Agent Autocurricula | arXiv:1909.07528 |
| Faults in Deep Reinforcement Learning Programs: A Taxonomy and a Detection Approach | arXiv:2101.00135 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | arXiv:2407.10930 |
| Grammatical Error Correction: A Survey of the State of the Art | arXiv:2211.05166 |
| Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery | arXiv:2302.03668 |
| How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning in Creative Applications of Generative Models | arXiv:2209.01390 |
| Knowledge Technologies by Nick Milton | arXiv:0802.3789 |
| LLM In-Context Recall is Prompt Dependent | arXiv:2404.08865 |
| A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT | arXiv:2302.11382 |
| QLoRA: Efficient Fine-tuning of Quantized LLMs | arXiv:2305.14314 |
| RAFT: A new way to teach LLMs to be better at RAG | arXiv:2403.10131 |
| ReAct: Synergizing Reasoning and Acting in Language Models | arXiv:2210.03629 |
| Scaling Instructable Agents Across Many Simulated Worlds | arXiv:2404.10179 |
| A Taxonomy of Prompt Modifiers for Text-to-Image Generation | arXiv:2204.13988 |

Research Papers on Related Topics

A few papers from subject areas surrounding AI / LLM development, such as argumentation, reasoning, learning, logic, thinking, ontology, and taxonomies, among others.

| Paper | Source |
| --- | --- |
| A critical review of argument visualization tools: do users become better reasoners? | ResearchGate |
| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | arXiv:2406.04271 |
| Continual Learning, Fast and Slow | arXiv:2209.02370 |
| Defeasible Logic Programming: An Argumentative Approach | arXiv:cs/0302029 |
| Events as Entities in Ontology-Driven Conceptual Modelling | Springer |
| Language of Thought Hypothesis | Archive.org |
| Role of the Systemic View in Foundational Ontologies | Semantic Scholar |
| Types and taxonomic structures in conceptual modelling: a novel ontological theory and engineering support | Elsevier |

Research Papers from the Association for Computational Linguistics (ACL Anthology)

The ACL Anthology covers the study of computational linguistics and natural language processing.

Research Papers from Meta AI Research Lab

Human Cognition (Thinking/Thought)

  • Language of Thought Hypothesis
  • Ghost in the Machine (210 minutes) - video by Machine Learning Street Talk on cognition and AI/LLMs


Todo

  • Joint Embedding Predictive Architecture (JEPA)
  • Find the paper on 'grokking' and the benefits of extended learning, and how LLMs are overcoming the mid-point diminishing return.
  • Add research paper: Connectionism and Cognitive Architecture: A Critical Analysis by Jerry A. Fodor and Zenon W. Pylyshyn
  • Add research paper: On the binding problem in Artificial Neural Networks (arXiv:2012.05208)
  • Papers by Jürgen Schmidhuber
  • RAG / RALM
  • RAG + Fine-tuning
  • Multi-modal models
  • Agents / Agentic / Micro-Agents
  • Parallel running of models