Switch to: References

Citations of:

Shutdown-seeking AI

Simon Goldstein & Pamela Robinson

Philosophical Studies:1-13 (forthcoming)

Add citations

You must login to add citations.

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists.Elliott Thornley - forthcoming - Philosophical Studies:1-28.details I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. And (...) Download Export citation Bookmark 2 citations
Towards Shutdownable Agents via Stochastic Choice.Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho & Louis Thomson - 2024 - Global Priorities Institute Working Paper.details Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn't happen. A key part of the IPP is using a novel 'Discounted REward for Same-Length Trajectories (DREST)' reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'), and (2) choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a (...) Download Export citation Bookmark

1