Switch to: References

Add citations

You must login to add citations.
  1. Towards Shutdownable Agents via Stochastic Choice.Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho & Louis Thomson - 2024 - Global Priorities Institute Working Paper.
    Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn't happen. A key part of the IPP is using a novel 'Discounted REward for Same-Length Trajectories (DREST)' reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'), and (2) choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a (...)
    Download  
     
    Export citation  
     
    Bookmark  
  • The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists.Elliott Thornley - forthcoming - Philosophical Studies:1-28.
    I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. And (...)
    Download  
     
    Export citation  
     
    Bookmark   1 citation