  1. Discovering Agents. Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott & Tom Everitt - 2023 - Artificial Intelligence 322 (C): 103963.
    Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is simply assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly, that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we (...)
  2. Value Alignment for Advanced Artificial Judicial Intelligence. Christoph Winter, Nicholas Hollman & David Manheim - 2023 - American Philosophical Quarterly 60 (2): 187-203.
    This paper considers challenges resulting from the use of advanced artificial judicial intelligence (AAJI). We argue that these challenges should be considered through the lens of value alignment. Instead of discussing why specific goals and values, such as fairness and nondiscrimination, ought to be implemented, we consider the question of how AAJI can be aligned with goals and values more generally, in order to be reliably integrated into legal and judicial systems. This value alignment framing draws on AI safety and (...)
  3. Predicting and Preferring. Nathaniel Sharadin - forthcoming - Inquiry: An Interdisciplinary Journal of Philosophy.
    The use of machine learning, or “artificial intelligence” (AI), in medicine is widespread and growing. In this paper, I focus on a specific proposed clinical application of AI: using models to predict incapacitated patients’ treatment preferences. Drawing on results from machine learning, I argue that this proposal faces a special moral problem. Machine learning researchers owe us assurance on this front before experimental research can proceed. In my conclusion I connect this concern to broader issues in AI safety.
  4. Automation, Alignment, and the Cooperative Interface. Julian David Jonker - forthcoming - The Journal of Ethics: 1-22.
    The paper demonstrates that social alignment is distinct from value alignment as currently understood in the AI safety literature, and argues that social alignment is an important research agenda. Work provides an important example for the argument, since work is a cooperative endeavor and part of the larger manifold of social cooperation. These cooperative aspects of work are individually and socially valuable, and so they must be given a central place when evaluating the impact of AI (...)