AI Machine Learning Engineering - BKJackson/BKJackson_Wiki GitHub Wiki
Related Pages
TensorRT-LLM
Accelerating Large Language Model Inference with TensorRT-LLM: A Comprehensive Guide
Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM - NVIDIA blog
TensorRT-LLM GitHub - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
vLLM
vLLM GitHub - Easy, fast, and cheap LLM serving for everyone
vLLM - Sky Lab UC Berkeley - A High-Throughput and Memory-Efficient Inference and Serving Engine for LLMs
vLLM Blog - Latest news about vLLM & applications
NYC vLLM Meetup Slides - May 7, 2025
Efficient Memory Management for Large Language Model Serving with PagedAttention - Kwon et al., 2023
Red Hat Optimized Model Hub
Red Hat LLM Compressor Tools GitHub - Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
GuideLLM
GuideLLM - Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
HuggingFace Transformer (HFT) Format
HuggingFace Transformers - GitHub
MLC LLM
MLC LLM - Tailored for client-side use, it brings LLM capabilities directly to end-users
gRPC
gRPC Home - A high performance, open source universal RPC framework
gRPC and AI: A Powerful Partnership - May 20, 2025
Machine Learning Security
Architectural Neural Backdoors from First Principles - Feb 10, 2024
Books
Machine Learning Engineering Open Book - This is an open collection of methodologies, tools, and step-by-step instructions to help with successful training of large language models and multi-modal models. This is technical material suitable for LLM/VLM training engineers and operators. That is, the content here contains lots of scripts and copy-n-paste commands to enable you to quickly address your needs.
The Art of Debugging
My Training
AWS Technical Essentials - 4-hour course, free