Mixture of Experts (MoE) is a cutting-edge neural network architecture that scales model capacity efficiently by routing each input through only a small subset of expert subnetworks, so the compute spent per input grows with the number of experts selected rather than the total number of experts. In this course, instructor Vaibhava Lakshmi Ravideshik explores the inner workings of MoE, from its core components (the experts and the gating network that chooses among them) to advanced routing strategies such as top-k gating. The course balances theoretical understanding with hands-on coding, using PyTorch to implement a simplified MoE layer. Along the way, you’ll also review real-world applications of MoE in state-of-the-art models such as Mixtral and GPT-4 (widely reported, though not officially confirmed, to use an MoE design).
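To make "top-k gating" concrete, here is a minimal PyTorch sketch of a simplified MoE layer. This is not the course's implementation. It assumes a single linear gate, a softmax renormalized over only the k selected logits, and small two-layer feed-forward experts. The class name `TopKMoE` and all sizes are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Hypothetical simplified MoE layer: a linear gate routes each token to k experts."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # Gating network: produces one logit per expert for every token.
        self.gate = nn.Linear(d_model, num_experts)
        # Each expert is an independent small feed-forward network (an assumption).
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                              # (tokens, experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # (tokens, k)
        # Softmax over the k selected logits only, so each token's weights sum to 1.
        weights = F.softmax(topk_vals, dim=-1)             # (tokens, k)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Find the (token, slot) pairs that were routed to expert e.
            token_ids, slot_ids = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # expert e received no tokens, so it does no work
            expert_out = expert(x[token_ids])              # run expert only on its tokens
            # Add each expert output, scaled by its gate weight, back into place.
            out.index_add_(0, token_ids, weights[token_ids, slot_ids].unsqueeze(-1) * expert_out)
        return out


torch.manual_seed(0)
moe = TopKMoE(d_model=16, d_hidden=32, num_experts=4, k=2)
tokens = torch.randn(8, 16)
y = moe(tokens)
print(y.shape)  # torch.Size([8, 16])
```

Only k of the experts run for any given token, which is where the efficiency comes from. Production MoE layers typically add refinements such as load-balancing losses and per-expert capacity limits, which this sketch omits.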