The Shutdown Problem: Incomplete Preferences as a Solution

Abstract

I explain and motivate the shutdown problem: the problem of creating artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I then propose a solution: train agents to have incomplete preferences. Specifically, I propose that we train agents to lack a preference between every pair of different-length trajectories. I suggest a way to train such agents using reinforcement learning: we give the agent lower reward for repeatedly choosing same-length trajectories.
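
The training idea in the last sentence of the abstract can be illustrated with a toy reward rule. This is only a minimal sketch under assumptions that the abstract does not state: the function name `discounted_rewards`, the parameters `base_reward` and `discount`, and the choice of a geometric penalty are all hypothetical, and the paper's actual reward function may differ.

```python
from collections import Counter


def discounted_rewards(choices, base_reward=1.0, discount=0.5):
    """Return one reward per choice in a sequence of chosen trajectory lengths.

    Assumption (not from the abstract): each earlier choice of the same
    length multiplies the reward by `discount`, so repeating a length pays less.
    """
    counts = Counter()  # how often each trajectory length has been chosen so far
    rewards = []
    for length in choices:
        rewards.append(base_reward * discount ** counts[length])
        counts[length] += 1
    return rewards


# Always choosing the same length earns less in total than varying the length.
same_length = discounted_rewards([3, 3, 3, 3])
varied_length = discounted_rewards([3, 5, 3, 5])
print(sum(same_length), sum(varied_length))
```

Under this toy rule, an agent that always picks the same trajectory length collects less total reward than one that varies its choice between lengths, which is the direction of incentive the abstract describes.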

Author's Profile

Elliott Thornley
University of Oxford

Analytics

Added to PP
2024-03-05
