An 8-week structured curriculum introducing fellows and students to AI Safety and Alignment. Covers machine learning foundations, alignment challenges, interpretability, control methods, red-teaming, and governance.
Overview
This curriculum provides a comprehensive introduction to AI Safety and Alignment, structured as an 8-week guided reading program. It begins with foundational machine learning concepts (Week 0, an optional prerequisite) and progresses through increasingly advanced topics, including mechanistic interpretability, scalable oversight, deceptive alignment, and governance frameworks. It is designed for people preparing to work on AI safety research, particularly those joining fellowships such as MATS, SERI, or similar alignment programs.
Who This Is For
Fellows, students, and researchers preparing to work in AI Safety and Alignment
What's Included
- 8-week structured syllabus with progressive difficulty
- Week-by-week topic themes and conceptual focus
- Curated core readings and optional further readings
- Time estimates for each reading/video
- Covers ML foundations, alignment challenges, interpretability, control methods, and governance
- Designed for fellowship preparation and self-study
Curriculum Structure
The curriculum is organized as a progressive 8-week program, with each week focusing on a specific theme or set of concepts:
- Week 0 (Optional Prerequisite): Foundational machine learning concepts for those new to the field
- Weeks 1-8: Progressive exploration of AI Safety and Alignment topics, from basic concepts to advanced research areas
Each week includes curated core readings, optional further readings, and per-item time estimates to help you plan your study schedule. Topics span mechanistic interpretability, scalable oversight, deceptive alignment, red-teaming methodologies, and governance frameworks.