Switch to: References

Citations of:

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Elliott Thornley

Philosophical Studies:1-28 (forthcoming)

Add citations

You must login to add citations.

How Much Should Governments Pay to Prevent Catastrophes? Longtermism's Limited Role.Carl Shulman & Elliott Thornley - 2025 - In Jacob Barrett, Hilary Greaves & David Thorstad, Essays on Longtermism: Present Action for the Distant Future. Oxford University Press.details Longtermists have argued that humanity should significantly increase its efforts to prevent catastrophes like nuclear wars, pandemics, and AI disasters. But one prominent longtermist argument overshoots this conclusion: the argument also implies that humanity should reduce the risk of existential catastrophe even at extreme cost to the present generation. This overshoot means that democratic governments cannot use the longtermist argument to guide their catastrophe policy. In this paper, we show that the case for preventing catastrophe does not depend on longtermism. (...) Download Export citation Bookmark 6 citations
What is AI safety? What do we want it to be?Jacqueline Harding & Cameron Domenico Kirk-Giannini - manuscriptdetails The field of AI safety seeks to prevent or reduce the harms caused by AI systems. A simple and appealing account of what is distinctive of AI safety as a field holds that this feature is constitutive: a research project falls within the purview of AI safety just in case it aims to prevent or reduce the harms caused by AI systems. Call this appealingly simple account The Safety Conception of AI safety. Despite its simplicity and appeal, we argue that (...) Download Export citation Bookmark
LLMs Can Never Be Ideally Rational.Simon Goldstein - manuscriptdetails LLMs have dramatically improved in capabilities in recent years. This raises the question of whether LLMs could become genuine agents with beliefs and desires. This paper demonstrates an in principle limit to LLM agency, based on their architecture. LLMs are next word predictors: given a string of text, they calculate the probability that various words can come next. LLMs produce outputs that reflect these probabilities. I show that next word predictors are exploitable. If LLMs are prompted to make probabilistic predictions (...) Download Export citation Bookmark
Towards Shutdownable Agents via Stochastic Choice.Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho & Louis Thomson - 2024 - Global Priorities Institute Working Paper.details Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn't happen. A key part of the IPP is using a novel 'Discounted REward for Same-Length Trajectories (DREST)' reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'), and (2) choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a (...) Download Export citation Bookmark
Beyond Preferences in AI Alignment.Tan Zhi-Xuan, Micah Carroll, Matija Franklin & Hal Ashton - forthcoming - Philosophical Studies:1-51.details The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term apreferentistapproach to AI alignment. In this paper, we characterize (...) Download Export citation Bookmark

1