The Shutdown Problem: Incomplete Preferences as a Solution

Elliott Thornley

Abstract

I explain and motivate the shutdown problem: the problem of creating artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I then propose a solution: train agents to have incomplete preferences. Specifically, I propose that we train agents to lack a preference between every pair of different-length trajectories. I suggest a way to train such agents using reinforcement learning: we give the agent lower reward for repeatedly choosing same-length trajectories.
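The reward idea in the last sentence of the abstract can be illustrated with a small sketch. This is not the paper's reward function: it assumes, purely for illustration, that the reward for a choice is multiplied by a fixed factor for every earlier choice of the same trajectory length. The names `discounted_reward`, `counts`, and `discount` are hypothetical, and the value 0.9 is arbitrary.

```python
from collections import Counter


def discounted_reward(base_reward, length, counts, discount=0.9):
    """Scale base_reward down by how often `length` was already chosen.

    `counts` maps trajectory length -> number of earlier choices of that
    length and is updated in place. `discount` (0 < discount < 1) is an
    assumed hyperparameter, not a value taken from the paper.
    """
    reward = base_reward * discount ** counts[length]
    counts[length] += 1
    return reward


counts = Counter()
# Choosing the same trajectory length three times yields shrinking rewards.
same = [discounted_reward(1.0, 5, counts) for _ in range(3)]
# Choosing a length not picked before is not discounted.
other = discounted_reward(1.0, 8, counts)
```

Under this assumed scheme, an agent that always picks one trajectory length earns less over repeated choices than one that varies its choice between lengths, which is the pressure the abstract describes.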

Author's Profile

Elliott Thornley

University of Oxford

Archival history

First archival date: 2024-03-05
Latest version: 3 (2025-05-18)

Keywords

the shutdown problem, shutdownability, corrigibility, AI safety, constructive decision theory, incomplete preferences
