Causal Incentives Working Group


We are a group of researchers interested in using causal models to understand agent incentives, in order to design safe and fair AI algorithms.

If you are interested in collaborating on any related problems, feel free to reach out to us.



Agent Incentives: A Causal Perspective (AI:ACP): presents sound and complete graphical criteria for four incentive concepts: value of information, value of control, response incentives, and control incentives.
T. Everitt*, R. Carey*, E. Langlois*, P.A. Ortega, S. Legg.

How RL Agents Behave When Their Actions Are Modified: studies how user interventions affect the learning of different RL algorithms.
E. Langlois, T. Everitt.

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice (video): introduces a notion of subgames in multi-agent (causal) influence diagrams, alongside classic equilibrium refinements. The paper also reports on pycid.
L. Hammond, J. Fox, T. Everitt, A. Abate, M. Wooldridge.

Reward tampering problems and solutions in reinforcement learning: A causal influence diagram perspective (summary): analyzes various reward tampering (aka “wireheading”) problems with causal influence diagrams.
T. Everitt, M. Hutter, R. Kumar, V. Krakovna.
Synthese, 2021

Modeling AGI safety frameworks with causal influence diagrams
T. Everitt, R. Kumar, V. Krakovna, S. Legg.
IJCAI AI Safety Workshop, 2019

The Incentives that Shape Behavior (summary): superseded by AI:ACP.
R. Carey*, E. Langlois*, T. Everitt, S. Legg.

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings (summary): superseded by AI:ACP.
T. Everitt, P.A. Ortega, E. Barnes, S. Legg.

(* denotes equal contribution)


pycid: A Python implementation of causal influence diagrams, built on pgmpy.
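To illustrate what a causal influence diagram encodes, here is a minimal plain-Python sketch (this is an illustrative data structure, not pycid's actual API): a directed acyclic graph whose nodes are typed as chance, decision, or utility variables. The class and method names (`InfluenceDiagram`, `observations`, `influences_utility`) are hypothetical.

```python
# Illustrative sketch of a causal influence diagram as a typed DAG.
# Not pycid's API; class and method names are made up for this example.
from collections import defaultdict


class InfluenceDiagram:
    """A DAG whose nodes are chance, decision, or utility variables."""

    def __init__(self, edges, decisions, utilities):
        self.decisions = set(decisions)
        self.utilities = set(utilities)
        self.parents = defaultdict(set)
        self.children = defaultdict(set)
        for u, v in edges:
            self.parents[v].add(u)
            self.children[u].add(v)

    def observations(self, decision):
        """Parents of a decision node: what the agent sees when choosing."""
        return set(self.parents[decision])

    def influences_utility(self, node):
        """True if some utility node is reachable from `node` along directed edges."""
        stack, seen = [node], {node}
        while stack:
            n = stack.pop()
            for c in self.children[n]:
                if c in self.utilities:
                    return True
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return False


# A one-decision diagram: chance node S is observed by decision D,
# and both S and D feed into the utility U.
cid = InfluenceDiagram(
    edges=[("S", "D"), ("S", "U"), ("D", "U")],
    decisions=["D"],
    utilities=["U"],
)
print(cid.observations("D"))        # {'S'}: the decision observes S
print(cid.influences_utility("D"))  # True: D has a directed path to U
```

In pycid itself, diagrams are built on top of pgmpy's Bayesian network machinery, so they additionally carry conditional probability distributions and support the incentive analyses described in the papers above.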

CID LaTeX package: A package for drawing professional-looking influence diagrams; see the tutorial.

Working group members