Agent Incentives: A Causal Perspective (AI:ACP) (summary): presents sound and complete graphical criteria for four incentive concepts: value of information, value of control, response incentives, and control incentives.
T. Everitt*, R. Carey*, E. Langlois*, P.A. Ortega, S. Legg.
How RL Agents Behave When Their Actions Are Modified (summary): studies how user interventions affect the learning of different RL algorithms.
E. Langlois, T. Everitt.
Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice (video, summary): introduces a notion of subgames in multi-agent (causal) influence diagrams, alongside classic equilibrium refinements.
L. Hammond, J. Fox, T. Everitt, A. Abate, M. Wooldridge.
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective (summary, summary 2): analyzes various reward tampering (aka “wireheading”) problems with causal influence diagrams.
T. Everitt, M. Hutter, R. Kumar, V. Krakovna.
PyCID: A Python Library for Causal Influence Diagrams (github): describes our Python package for analyzing (multi-agent) causal influence diagrams.
J. Fox, T. Everitt, R. Carey, E. Langlois, A. Abate, M. Wooldridge.
Modeling AGI Safety Frameworks with Causal Influence Diagrams
T. Everitt, R. Kumar, V. Krakovna, S. Legg.
IJCAI AI Safety Workshop, 2019
Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings (summary): superseded by AI:ACP.
T. Everitt, P.A. Ortega, E. Barnes, S. Legg.
(* denotes equal contribution)