Build Enterprise-Worthy LLM Inference with Open Source and Kubernetes
Walk through orchestrating multi-node LLM inference on Kubernetes, with practical GPU scheduling tactics built around an e-commerce use case from Microsoft's team.
Microsoft's team walks through orchestrating multi-node LLM inference on Kubernetes using real e-commerce examples, showing exactly how to configure GPU scheduling and optimize data transfers between nodes. You'll learn how NVIDIA Dynamo and Azure Kubernetes Service work together to solve the practical bottlenecks that appear when you move from a notebook to production scale. The workshop covers concrete tactics for advanced hardware such as the GB200 NVL72 and for distributed workload optimization on AKS, giving you patterns you can apply to your own infrastructure. The session is hands-on enough that you'll come away understanding the tradeoffs between latency, cost, and throughput rather than just hearing theory.
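To give a flavor of the kind of GPU scheduling configuration the session deals with, here is a minimal sketch of a Kubernetes pod spec that requests a GPU and pins an inference worker to a GPU node pool on AKS. The node pool name, taint, image, and model are illustrative assumptions, not details taken from the workshop.

```yaml
# Minimal sketch: one inference worker requesting a GPU on AKS.
# The pool name (gpunodepool), taint, image, and model are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference-worker
spec:
  nodeSelector:
    agentpool: gpunodepool        # target a dedicated AKS GPU node pool
  tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"        # tolerate the taint often placed on GPU pools
  containers:
    - name: inference
      image: vllm/vllm-openai:latest   # any OpenAI-compatible serving image
      args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
      resources:
        limits:
          nvidia.com/gpu: 1       # device-plugin resource; the scheduler only
                                  # places the pod on nodes advertising free GPUs
```

Multi-node serving stacks such as NVIDIA Dynamo layer their own workers and routing on top, but the same primitives (GPU resource requests, node selectors, taints and tolerations) are what keep inference pods off CPU-only nodes.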