Reinforcement Learning for LLM Alignment and Reasoning by Pearson

Pretraining gives LLMs capability, not judgment. In this course, learn how reinforcement learning techniques like direct preference optimization (DPO) and group relative policy optimization (GRPO) shape model behavior, safety, and reasoning, and how to build the evaluation and governance systems that keep alignment on track. This course is an ideal fit for developers, data scientists, and ML engineers who fine-tune or deploy LLMs and want to improve the safety, effectiveness, and reasoning capabilities of those models.

Note: This course was created by Pearson. We are pleased to host this training in our library.
