Course Description
AI holds great promise and, many believe, great peril. What can mathematicians contribute to ensuring that promise is fulfilled, and peril avoided?
Topics may include: predictive coding, good regulator theorems, Markov decision processes, power-seeking theorems, signaling games, evolution of cooperation, open-source game theory, multi-agent learning, opponent shaping, logical uncertainty, usable information under computational constraints, proper scoring rules, forecast aggregation, Bayesian truth serum, coherence theorems, multi-objective optimization.
Related courses
This course is loosely modeled on the AI Alignment course taught by Roger Grosse at the University of Toronto.
Useful background
Machine learning, game theory, and stochastic processes (at the level of MATH 4740).
Books
Papers
My plan is to cover bits and pieces of some of the Causal Incentives Working Group papers, starting with Agent Incentives and Discovering agents.
Seminar
Starting in November, we'll devote each class to one of the following papers: a 45-minute student presentation followed by a 30-minute class discussion of the paper. The presentation can be slides (encouraged!) or blackboard. You should aim to state the paper's main result precisely and put it in context: what hole in human knowledge does this paper fill? The follow-up discussion will poke at the paper to examine its strengths and weaknesses, and identify open questions and research directions that build on it.
Anthropic's interpretability papers (2021-2024)
Probing and steering language models
Hidden incentives
Cooperation and bounded agents
Class notes
I'll ask for a volunteer to take notes each class!
Instructions for notetakers: Use this LaTeX template, update the date and topic in the filename and in the header, and email me the .tex and .pdf of your notes so I can post them here. If anything in the lecture was confusing, you're encouraged to send me a draft of the notes and ask me questions! Notes are due 1 week after the lecture.
2024 Aug 26 & 28: Math for AI Safety (slides)
2024 Sep 4: Conditional Independence
2024 Sep 9: d-separation theorem
2024 Sep 11: G-Markov distributions
2024 Sep 16: Causal models: P(A|B) versus P(A|do(B))
2024 Sep 18: Causal models: counterfactuals
2024 Sep 23: Experiments with OpenAI's reasoning model o1. Speculations on long chain of thought leading to a trapped prior.
2024 Sep 25: Influence diagrams, value of information, response incentive, and value of control as defined in the Agent Incentives paper by Everitt et al.
2024 Sep 30 & Oct 2: Using causal models to reason about hidden incentives. Examples:
2024 Oct 7: Causal games and mechanised causal diagrams, as defined in the papers Discovering agents (Kenton et al., 2022) and Causality in games (Hammond et al., 2023).
2024 Oct 16: Overview of the research papers we'll cover in November, so you can make an informed choice of which paper to present!
2024 Oct 21: von Neumann-Morgenstern coherence theorem: Non-EU-maximizing agents are exploitable
2024 Oct 23: Dutch book theorems: Agents with incoherent beliefs are exploitable. How to quantify the incoherence of a set of beliefs. How to aggregate multiple weak predictions into one strong prediction
2024 Oct 28 & 30: Prepare for your seminar presentation (Lionel in Cambridge, UK this week)
2024 Nov 4: Common knowledge, Aumann agreement theorem, bounded rationality
2024 Nov 6: Overview of optional student research projects!
Seminar (student presentations of research papers)
2024 Nov 11: Guiding formal theorem provers with informal proofs (Presenter: Baran Zadeoğlu)
2024 Nov 13: Toy models of superposition (Presenter: Jacob Ornelas)
2024 Nov 18: Learning time-scales in two-layer neural networks (Presenter: Haoxuan Fu)
2024 Nov 20: Undetectable watermarks for language models (Presenter: Elijah Blum)
2024 Nov 25: Open problems in causal machine learning (Presenter: Suvadip Sana)
2024 Dec 2: Hidden incentives for auto-induced distributional shift (Presenter: Arkar Oak Soe)
2024 Dec 4: Open-source game theory (Presenter: Matthew Haulmark or Lionel)
Presenter: Set the stage for your talk by crafting 1-3 warmup questions for us to think about beforehand.
Email me the questions at least 72 hours before your presentation and I'll pass them on for everyone to think about. The warmup questions should be about background knowledge or context that's useful for understanding the paper you're presenting.
Audience: You'll get the most out of the seminar if you look at the relevant paper beforehand and come with questions about it!
2024 Dec 9 (last class): Student research projects, or debug Lionel's research program
Questions
Email me with questions about the course, or to request a particular topic!