Causal Incentives Working Group


We are a group of researchers interested in using causal models to understand agent incentives, with the aim of designing safe and fair AI algorithms.

If you are interested in collaborating on any related problems, feel free to reach out to us.


Papers

Agent Incentives: A Causal Perspective (AI:ACP) (summary): presents sound and complete graphical criteria for four incentive concepts: value of information, value of control, response incentives, and control incentives.
T. Everitt*, R. Carey*, E. Langlois*, P.A. Ortega, S. Legg.
AAAI-21

How RL Agents Behave When Their Actions Are Modified (summary): studies how user interventions affect the learning of different RL algorithms.
E. Langlois, T. Everitt.
AAAI-21

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice (video, summary): introduces a notion of subgames in multi-agent (causal) influence diagrams, alongside classic equilibrium refinements.
L. Hammond, J. Fox, T. Everitt, A. Abate, M. Wooldridge.
AAMAS-21

Reward tampering problems and solutions in reinforcement learning: A causal influence diagram perspective (summary, summary 2): analyzes various reward tampering (aka “wireheading”) problems with causal influence diagrams.
T. Everitt, M. Hutter, R. Kumar, V. Krakovna.
Synthese, 2021

PyCID: A Python Library for Causal Influence Diagrams (github): describes our Python package for analyzing (multi-agent) causal influence diagrams.
J. Fox, T. Everitt, R. Carey, E. Langlois, A. Abate, M. Wooldridge.
SciPy, 2021

Modeling AGI safety frameworks with causal influence diagrams
T. Everitt, R. Kumar, V. Krakovna, S. Legg.
IJCAI AI Safety Workshop, 2019

The Incentives that Shape Behavior (summary): superseded by AI:ACP.
R. Carey*, E. Langlois*, T. Everitt, S. Legg.

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings (summary): superseded by AI:ACP.
T. Everitt, P.A. Ortega, E. Barnes, S. Legg.

(* denotes equal contribution)
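Several of the graphical criteria in AI:ACP, such as the one for value of information, are phrased in terms of d-separation in the diagram's graph. As a rough illustration of that kind of check, here is a minimal, dependency-free Python sketch of a d-separation test using the standard reachability ("Bayes-ball" style) traversal. The function name `d_separated` and the edge-list input format are hypothetical choices made for this sketch; they are not the pycid API, and the code does not implement any of the paper's incentive criteria itself.

```python
from collections import defaultdict, deque


def d_separated(edges, x, y, z):
    """Return True if node x is d-separated from node y given the set z.

    Hypothetical helper for illustration (not part of pycid).
    edges: iterable of (parent, child) pairs describing a DAG.
    """
    parents, children = defaultdict(set), defaultdict(set)
    for a, b in edges:
        children[a].add(b)
        parents[b].add(a)
    z = set(z)

    # Nodes in z together with all their ancestors: a collider on a path
    # is "opened" only if it lies in this set.
    anc_z, stack = set(), list(z)
    while stack:
        n = stack.pop()
        if n not in anc_z:
            anc_z.add(n)
            stack.extend(parents[n])

    # "up": node reached from one of its children (moving against an edge);
    # "down": node reached from one of its parents (moving along an edge).
    queue, visited = deque([(x, "up")]), set()
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y and node not in z:
            return False  # an active path from x to y exists
        if direction == "up" and node not in z:
            # Unobserved node entered from below: continue to parents and children.
            queue.extend((p, "up") for p in parents[node])
            queue.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in z:
                # Unobserved chain node: keep moving downward.
                queue.extend((c, "down") for c in children[node])
            if node in anc_z:
                # Collider that is in z or has a descendant in z: path opens.
                queue.extend((p, "up") for p in parents[node])
    return True


# Tiny single-decision diagram: observation X informs decision D; both affect utility U.
cid_edges = [("X", "D"), ("X", "U"), ("D", "U")]
print(d_separated(cid_edges, "X", "U", {"D"}))  # False: X -> U is a direct edge
```

The traversal tracks the direction in which each node was entered, which is what lets it distinguish chains and forks (blocked by conditioning) from colliders (opened by conditioning on the collider or one of its descendants).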

Software

pycid: A Python implementation of causal influence diagrams, built on pgmpy.

CID LaTeX Package: A package for drawing professional-looking influence diagrams; see the tutorial.

Working group members