Results for 'shutdownability'

Order:
  1. The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists.Elliott Thornley - forthcoming - Philosophical Studies.
    I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. And (...)
    Download  
     
    Export citation  
     
    Bookmark  
  2. The Shutdown Problem: Incomplete Preferences as a Solution.Elliott Thornley - manuscript
    I explain and motivate the shutdown problem: the problem of creating artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I then propose a solution: train agents to have incomplete preferences. Specifically, I propose that we train agents to lack a preference between every pair of different-length trajectories. I suggest a way to train such agents using reinforcement learning: (...)
    Download  
     
    Export citation  
     
    Bookmark