Off-Switching Not Guaranteed

Philosophical Studies 1-13 (forthcoming)

Abstract

Hadfield-Menell et al. (2017) propose the Off-Switch Game, a model of human-AI cooperation in which AI agents always defer to humans because they are uncertain about our preferences. I explain two reasons why AI agents might not defer. First, AI agents might not value learning. Second, even if AI agents value learning, they might not be certain to learn our actual preferences.
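For intuition, here is a minimal numerical sketch of the deferral argument in the Off-Switch Game. It assumes, purely for illustration, a Gaussian belief over the human's utility U(a) for the robot's action and a perfectly rational human who permits the action exactly when U(a) > 0; under these assumptions, deferring weakly dominates both acting and switching off, since E[max(U(a), 0)] >= max(E[U(a)], 0). The paper's point is that this guarantee can fail once such assumptions are relaxed.

```python
import numpy as np

# Sketch of the Off-Switch Game (Hadfield-Menell et al. 2017).
# The robot shares the human's utility U(a) but is uncertain about it.
# Illustrative assumptions (not from the paper): the robot's belief over
# U(a) is a standard Gaussian, and the human is perfectly rational,
# permitting the action exactly when U(a) > 0.

rng = np.random.default_rng(0)
u_samples = rng.normal(loc=0.0, scale=1.0, size=100_000)  # belief over U(a)

# Acting directly yields U(a); switching off yields 0.
ev_act = u_samples.mean()
ev_off = 0.0

# Deferring to a rational human yields U(a) when U(a) > 0 and 0 otherwise,
# since the human switches the robot off whenever the action is harmful.
ev_defer = np.maximum(u_samples, 0.0).mean()

print(f"E[act]   = {ev_act:.3f}")    # ~0.0
print(f"E[off]   = {ev_off:.3f}")    # 0.0
print(f"E[defer] = {ev_defer:.3f}")  # ~0.399 = 1/sqrt(2*pi); weakly dominates
```

Under any belief distribution the same comparison holds, because max(U(a), 0) >= U(a) and max(U(a), 0) >= 0 pointwise; the dominance depends on the human reliably acting on their actual preferences, which is precisely what the paper questions.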

Author's Profile

Sven Neth
University of Pittsburgh
