
AI Safety and Alignment Curriculum

Tags: AI Safety, Alignment, Education, Curriculum, Fellowship

An 8-week structured curriculum introducing fellows and students to AI Safety and Alignment. Covers machine learning foundations, alignment challenges, interpretability, control methods, red-teaming, and governance.

Overview

This curriculum provides a comprehensive introduction to AI Safety and Alignment, structured as an 8-week guided reading program. It begins with foundational machine learning concepts (Week 0, an optional prerequisite) and progresses through increasingly advanced topics, including mechanistic interpretability, scalable oversight, deceptive alignment, and governance frameworks. It is designed for people preparing to work on AI safety research, particularly those joining alignment fellowships such as MATS or SERI.

Who This Is For

Fellows, students, and researchers preparing to work in AI Safety and Alignment

What's Included

  • 8-week structured syllabus with progressive difficulty
  • Week-by-week topic themes and conceptual focus
  • Curated core readings and optional further readings
  • Time estimates for each reading/video
  • Covers ML foundations, alignment challenges, interpretability, control methods, red-teaming, and governance
  • Designed for fellowship preparation and self-study

Curriculum Structure

The curriculum is organized as a progressive 8-week program, with each week focusing on a specific theme or set of concepts:

  • Week 0 (Optional Prerequisite): Foundational machine learning concepts for those new to the field
  • Weeks 1-8: Progressive exploration of AI Safety and Alignment topics, from basic concepts to advanced research areas

Each week includes curated core readings, optional further readings, and time estimates to help you plan your study schedule. Topics covered include mechanistic interpretability, scalable oversight, deceptive alignment, red-teaming methodologies, and governance frameworks.