LLM (Large Language Model)

Models

https://kipp.ly/transformer-taxonomy/

https://kipp.ly/transformer-inference-arithmetic/

Transformer

Position Encoding

multi-head

Swin Transformer

transformer performance optimization

KV cache

Flash Decoding

LLaMA

Frameworks

llama.cpp

llama2.c

Optimization