Agent Incentives: A Causal Perspective (AI:ACP) (summary): presents sound and complete graphical criteria for four incentive concepts: value of information, value of control, response incentives, and control incentives.
T. Everitt*, R. Carey*, E. Langlois*, P.A. Ortega, S. Legg.
How RL Agents Behave When Their Actions Are Modified (summary): studies how user interventions affect the learning of different RL algorithms.
E. Langlois, T. Everitt.
Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice (video, summary): introduces a notion of subgames in multi-agent (causal) influence diagrams, alongside classic equilibrium refinements. The paper also reports on pycid, a Python library for causal influence diagrams.
L. Hammond, J. Fox, T. Everitt, A. Abate, M. Wooldridge.
Reward tampering problems and solutions in reinforcement learning: A causal influence diagram perspective (summary, summary 2): analyzes various reward tampering (aka “wireheading”) problems with causal influence diagrams.
T. Everitt, M. Hutter, R. Kumar, V. Krakovna.
Modeling AGI safety frameworks with causal influence diagrams
T. Everitt, R. Kumar, V. Krakovna, S. Legg.
IJCAI AI Safety Workshop, 2019
Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings (summary): superseded by AI:ACP.
T. Everitt, P.A. Ortega, E. Barnes, S. Legg.
(* denotes equal contribution)