AI Machine Learning Engineering - BKJackson/BKJackson_Wiki GitHub Wiki

TensorRT-LLM

Accelerating Large Language Model Inference with TensorRT-LLM: A Comprehensive Guide
Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM - NVIDIA blog
TensorRT-LLM GitHub - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way.

vLLM

vLLM Github - Easy, fast, and cheap LLM serving for everyone
vLLM - Sky Lab UC Berkeley - A High-Throughput and Memory-Efficient Inference and Serving Engine for LLMs
vLLM Blog - Latest news about vLLM & applications
NYC vLLM Meetup Slides - May 7, 2025
Efficient Memory Management for Large Language Model Serving with PagedAttention - Kwon et al., 2023
Red Hat Optimized Model Hub
Red Hat LLM Compressor Tools GitHub - Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

GuideLLM

GuideLLM - Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

HuggingFace Transformer (HFT) Format

HuggingFace Transformers - Github

MLC LLM

MLC LLM - Tailored for client-side use, it brings LLM capabilities directly to end users

gRPC

gRPC Home - A high performance, open source universal RPC framework
gRPC and AI: A Powerful Partnership - May 20, 2025

Machine Learning Security

Architectural Neural Backdoors from First Principles - Feb 10, 2024

Books

Machine Learning Engineering Open Book - An open collection of methodologies, tools, and step-by-step instructions to help with successful training of large language models and multi-modal models. This is technical material suitable for LLM/VLM training engineers and operators: it contains many scripts and copy-and-paste commands to help you quickly address your needs.
The Art of Debugging

My Training

AWS Technical Essentials - 4-hour course, free