Citations of:

Artificial Intelligence, Values, and Alignment

Iason Gabriel

Minds and Machines 30 (3):411-437 (2020)

The linguistic dead zone of value-aligned agency, natural and artificial. Travis LaCroix - 2024 - Philosophical Studies:1-23. The value alignment problem for artificial intelligence (AI) asks how we can ensure that the “values” (i.e., objective functions) of artificial systems are aligned with the values of humanity. In this paper, I argue that linguistic communication is a necessary condition for robust value alignment. I discuss the consequences that the truth of this claim would have for research programmes that attempt to ensure value alignment for AI systems or, more loftily, those programmes that seek to design robustly beneficial or ethical artificial agents.
Disagreement, AI alignment, and bargaining. Harry R. Lloyd - forthcoming - Philosophical Studies:1-31. New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, biomedicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that – dependent on the alignment target chosen – our AIs might optimise for objectives that reflect the values only of a certain subset of society, and (...)
The “Trolley Problem” in Fully Automated AI-Driven Media: A Challenge Beyond Autonomous Driving. Juan Wang & Bin Ye - 2024 - Journal of Media Ethics 39 (4):244-262. The rapid progress of artificial intelligence (AI) has resulted in its integration into various stages of the media process, including information gathering, processing, and distribution. This integration has raised the possibility of AI dominating the media industry, leading to an era of “autonomous driving” within AI-driven media systems. Similar to the ethical dilemma known as the “trolley problem” (TP) in autonomous driving, a comparable problem arises in AI automated media. This study examines the emergence of the new TP in fully (...)
Language Agents and Malevolent Design. Inchul Yum - 2024 - Philosophy and Technology 37 (104):1-19. Language agents are AI systems capable of understanding and responding to natural language, potentially facilitating the process of encoding human goals into AI systems. However, this paper argues that if language agents can achieve easy alignment, they also increase the risk of malevolent agents building harmful AI systems aligned with destructive intentions. The paper contends that if training AI becomes sufficiently easy or is perceived as such, it enables malicious actors, including rogue states, terrorists, and criminal organizations, to create powerful (...) 1 citation
A sociotechnical system perspective on AI. Olya Kudina & Ibo van de Poel - 2024 - Minds and Machines 34 (3):1-9.
Existentialist risk and value misalignment. Ariela Tubert & Justin Tiehen - forthcoming - Philosophical Studies. We argue that two long-term goals of AI research stand in tension with one another. The first involves creating AI that is safe, where this is understood as solving the problem of value alignment. The second involves creating artificial general intelligence, meaning AI that operates at or beyond human capacity across all or many intellectual domains. Our argument focuses on the human capacity to make what we call “existential choices”, choices that transform who we are as persons, including transforming what (...) 1 citation
Adaptable robots, ethics, and trust: a qualitative and philosophical exploration of the individual experience of trustworthy AI. Stephanie Sheir, Arianna Manzini, Helen Smith & Jonathan Ives - forthcoming - AI and Society:1-14. Much has been written about the need for trustworthy artificial intelligence (AI), but the underlying meaning of trust and trustworthiness can vary or be used in confusing ways. It is not always clear whether individuals are speaking of a technology’s trustworthiness, a developer’s trustworthiness, or simply of gaining the trust of users by any means. In sociotechnical circles, trustworthiness is often used as a proxy for ‘the good’, illustrating the moral heights to which technologies and developers ought to aspire, at (...) 1 citation
On the computational complexity of ethics: moral tractability for minds and machines. Jakob Stenseke - 2024 - Artificial Intelligence Review 57 (105):90. Why should moral philosophers, moral psychologists, and machine ethicists care about computational complexity? Debates on whether artificial intelligence (AI) can or should be used to solve problems in ethical domains have mainly been driven by what AI can or cannot do in terms of human capacities. In this paper, we tackle the problem from the other end by exploring what kind of moral machines are possible based on what computational systems can or cannot do. To do so, we analyze normative (...)
Challenges of responsible AI in practice: scoping review and recommended actions. Malak Sadek, Emma Kallina, Thomas Bohné, Céline Mougenot, Rafael A. Calvo & Stephen Cave - forthcoming - AI and Society:1-17. Responsible AI (RAI) guidelines aim to ensure that AI systems respect democratic values. While a step in the right direction, they currently fail to impact practice. Our work discusses reasons for this lack of impact and clusters them into five areas: (1) the abstract nature of RAI guidelines, (2) the problem of selecting and reconciling values, (3) the difficulty of operationalising RAI success metrics, (4) the fragmentation of the AI pipeline, and (5) the lack of internal advocacy and accountability. Afterwards, (...) 1 citation
Ethics of generative AI and manipulation: a design-oriented research agenda. Michael Klenk - 2024 - Ethics and Information Technology 26 (1):1-15. Generative AI enables automated, effective manipulation at scale. Despite the growing general ethical discussion around generative AI, the specific manipulation risks remain inadequately investigated. This article outlines essential inquiries encompassing conceptual, empirical, and design dimensions of manipulation, pivotal for comprehending and curbing manipulation risks. By highlighting these questions, the article underscores the necessity of an appropriate conceptualisation of manipulation to ensure the responsible development of Generative AI technologies. 1 citation
Artificial Intelligence: Arguments for Catastrophic Risk. Adam Bales, William D'Alessandro & Cameron Domenico Kirk-Giannini - 2024 - Philosophy Compass 19 (2):e12964. Recent progress in artificial intelligence (AI) has drawn attention to the technology’s transformative potential, including what some see as its prospects for causing large-scale harm. We review two influential arguments purporting to show how AI could pose catastrophic risks. The first argument, the Problem of Power-Seeking, claims that, under certain assumptions, advanced AI systems are likely to engage in dangerous power-seeking behavior in pursuit of their goals. We review reasons for thinking that AI systems might seek power, that (...) 5 citations
A Personalized Patient Preference Predictor for Substituted Judgments in Healthcare: Technically Feasible and Ethically Desirable. Brian D. Earp, Sebastian Porsdam Mann, Jemima Allen, Sabine Salloch, Vynn Suren, Karin Jongsma, Matthias Braun, Dominic Wilkinson, Walter Sinnott-Armstrong, Annette Rid, David Wendler & Julian Savulescu - 2024 - American Journal of Bioethics 24 (7):13-26. When making substituted judgments for incapacitated patients, surrogates often struggle to guess what the patient would want if they had capacity. Surrogates may also agonize over having the (sole) responsibility of making such a determination. To address such concerns, a Patient Preference Predictor (PPP) has been proposed that would use an algorithm to infer the treatment preferences of individual patients from population-level data about the known preferences of people with similar demographic characteristics. However, critics have suggested that even if such (...) 26 citations
Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition. Adrià Moret - 2023 - Journal of Artificial Intelligence and Consciousness 10 (02):309-334. Ambitious value learning proposals to solve the AI alignment problem and avoid catastrophic outcomes from a possible future misaligned artificial superintelligence (such as Coherent Extrapolated Volition [CEV]) have focused on ensuring that an artificial superintelligence (ASI) would try to do what humans would want it to do. However, present and future sentient non-humans, such as non-human animals and possible future digital minds could also be affected by the ASI’s behaviour in morally relevant ways. This paper puts forward Sentientist Coherent Extrapolated (...)
Ethics of Artificial Intelligence. Stefan Buijsman, Michael Klenk & Jeroen van den Hoven - forthcoming - In Nathalie Smuha (ed.), Cambridge Handbook on the Law, Ethics and Policy of AI. Cambridge University Press. Artificial Intelligence (AI) is increasingly adopted in society, creating numerous opportunities but at the same time posing ethical challenges. Many of these are familiar, such as issues of fairness, responsibility and privacy, but are presented in a new and challenging guise due to our limited ability to steer and predict the outputs of AI systems. This chapter first introduces these ethical challenges, stressing that overviews of values are a good starting point but frequently fail to suffice due to the context (...)
Value alignment, human enhancement, and moral revolutions. Ariela Tubert & Justin Tiehen - forthcoming - Inquiry: An Interdisciplinary Journal of Philosophy. Human beings are internally inconsistent in various ways. One way to develop this thought involves using the language of value alignment: the values we hold are not always aligned with our behavior, and are not always aligned with each other. Because of this self-misalignment, there is room for potential projects of human enhancement that involve achieving a greater degree of value alignment than we presently have. Relatedly, discussions of AI ethics sometimes focus on what is known as the value alignment (...)
Calibrating machine behavior: a challenge for AI alignment. Erez Firt - 2023 - Ethics and Information Technology 25 (3):1-8. When discussing AI alignment, we usually refer to the problem of teaching or training advanced autonomous AI systems to make decisions that are aligned with human values or preferences. Proponents of this approach believe it can be employed as means to stay in control over sophisticated intelligent systems, thus avoiding certain existential risks. We identify three general obstacles on the path to implementation of value alignment: a technological/technical obstacle, a normative obstacle, and a calibration problem. Presupposing, for the purposes of (...) 1 citation
Human achievement and artificial intelligence. Brett Karlan - 2023 - Ethics and Information Technology 25 (3):1-12. In domains as disparate as playing Go and predicting the structure of proteins, artificial intelligence (AI) technologies have begun to perform at levels beyond which any humans can achieve. Does this fact represent something lamentable? Does superhuman AI performance somehow undermine the value of human achievements in these areas? Go grandmaster Lee Sedol suggested as much when he announced his retirement from professional Go, blaming the advances of Go-playing programs like AlphaGo for sapping his will to play the game at (...)
Friendly AI will still be our master. Or, why we should not want to be the pets of super-intelligent computers. Robert Sparrow - 2024 - AI and Society 39 (5):2439-2444. When asked about humanity’s future relationship with computers, Marvin Minsky famously replied “If we’re lucky, they might decide to keep us as pets”. A number of eminent authorities continue to argue that there is a real danger that “super-intelligent” machines will enslave, perhaps even destroy, humanity. One might think that it would swiftly follow that we should abandon the pursuit of AI. Instead, most of those who purport to be concerned about the existential threat posed by AI default to worrying about what (...) 5 citations
Moral disagreement and artificial intelligence. Pamela Robinson - 2024 - AI and Society 39 (5):2425-2438. Artificially intelligent systems will be used to make increasingly important decisions about us. Many of these decisions will have to be made without universal agreement about the relevant moral facts. For other kinds of disagreement, it is at least usually obvious what kind of solution is called for. What makes moral disagreement especially challenging is that there are three different ways of handling it. Moral solutions apply a moral theory or related principles and largely ignore the details of the disagreement. (...)
On the Philosophy of Unsupervised Learning. David S. Watson - 2023 - Philosophy and Technology 36 (2):1-26. Unsupervised learning algorithms are widely used for many important statistical tasks with numerous applications in science and industry. Yet despite their prevalence, they have attracted remarkably little philosophical scrutiny to date. This stands in stark contrast to supervised and reinforcement learning algorithms, which have been widely studied and critically evaluated, often with an emphasis on ethical concerns. In this article, I analyze three canonical unsupervised learning problems: clustering, abstraction, and generative modeling. I argue that these methods raise unique epistemological and (...) 4 citations
Risk and Responsibility in Context. Adriana Placani & Stearns Broadhead (eds.) - 2023 - New York: Routledge. This volume bridges contemporary philosophical conceptions of risk and responsibility and offers an extensive examination of the topic. It shows that risk and responsibility combine in ways that give rise to new philosophical questions and problems. Philosophical interest in the relationship between risk and responsibility continues to rise, due in no small part to environmental crises, emerging technologies, legal developments, and new medical advances. Despite such interest, scholars are just now working out how to conceive of the links between (...)
Est-ce que Vous Compute? Arianna Falbo & Travis LaCroix - 2022 - Feminist Philosophy Quarterly 8 (3). Cultural code-switching concerns how we adjust our overall behaviours, manners of speaking, and appearance in response to a perceived change in our social environment. We defend the need to investigate cultural code-switching capacities in artificial intelligence systems. We explore a series of ethical and epistemic issues that arise when bringing cultural code-switching to bear on artificial intelligence. Building upon Dotson’s (2014) analysis of testimonial smothering, we discuss how emerging technologies in AI can give rise to epistemic oppression, and specifically, a (...) 2 citations
In Conversation with Artificial Intelligence: Aligning Language Models with Human Values. Atoosa Kasirzadeh - 2023 - Philosophy and Technology 36 (2):1-24. Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this (...) 11 citations
An Institutionalist Approach to AI Ethics: Justifying the Priority of Government Regulation over Self-Regulation. Thomas Ferretti - 2022 - Moral Philosophy and Politics 9 (2):239-265. This article explores the cooperation of government and the private sector to tackle the ethical dimension of artificial intelligence. The argument draws on the institutionalist approach in philosophy and business ethics defending a ‘division of moral labor’ between governments and the private sector. The goal and main contribution of this article is to explain how this approach can provide ethical guidelines to the AI industry and to highlight the limits of self-regulation. In what follows, I discuss three institutionalist claims. First, (...) 2 citations
Artificial virtuous agents in a multi-agent tragedy of the commons. Jakob Stenseke - 2022 - AI and Society:1-18. Although virtue ethics has repeatedly been proposed as a suitable framework for the development of artificial moral agents, it has proven difficult to approach from a computational perspective. In this work, we present the first technical implementation of artificial virtuous agents in moral simulations. First, we review previous conceptual and technical work in artificial virtue ethics and describe a functionalistic path to AVAs based on dispositional virtues, bottom-up learning, and top-down eudaimonic reward. We then provide the details of a (...) 1 citation
AI and society: a virtue ethics approach. Mirko Farina, Petr Zhdanov, Artur Karimov & Andrea Lavazza - 2024 - AI and Society 39 (3):1127-1140. Advances in artificial intelligence and robotics stand to change many aspects of our lives, including our values. If trends continue as expected, many industries will undergo automation in the near future, calling into question whether we can still value the sense of identity and security our occupations once provided us with. Likewise, the advent of social robots driven by AI appears to be shifting the meaning of numerous, long-standing values associated with interpersonal relationships, like friendship. Furthermore, powerful actors’ and institutions’ (...) 8 citations
Why Machines Will Never Rule the World: Artificial Intelligence without Fear. Jobst Landgrebe & Barry Smith - 2022 - Abingdon, England: Routledge. The book’s core argument is that an artificial intelligence that could equal or exceed human intelligence, sometimes called artificial general intelligence (AGI), is for mathematical reasons impossible. It offers two specific reasons for this claim: Human intelligence is a capability of a complex dynamic system, the human brain and central nervous system. Systems of this sort cannot be modelled mathematically in a way that allows them to operate inside a computer. In supporting their claim, the authors, Jobst Landgrebe and Barry Smith, marshal evidence (...) 4 citations
Twisted thinking: Technology, values and critical thinking. Lavinia Marin & Steinert Steffen - 2022 - Prometheus. Critical Studies in Innovation 38 (1):124-140. Technology should be aligned with our values. We make the case that attempts to align emerging technologies with our values should reflect critically on these values. Critical thinking seems like a natural starting point for the critical assessment of our values. However, extant conceptualizations of critical thinking carve out no space for the critical scrutiny of values. We will argue that we need critical thinking that focuses on values instead of taking them as unexamined starting points. In order to play (...)
Interdisciplinary Confusion and Resolution in the Context of Moral Machines. Jakob Stenseke - 2022 - Science and Engineering Ethics 28 (3):1-17. Recent advancements in artificial intelligence have fueled widespread academic discourse on the ethics of AI within and across a diverse set of disciplines. One notable subfield of AI ethics is machine ethics, which seeks to implement ethical considerations into AI systems. However, since different research efforts within machine ethics have discipline-specific concepts, practices, and goals, the resulting body of work is pestered with conflict and confusion as opposed to fruitful synergies. The aim of this paper is to explore ways to (...) 3 citations
The Ghost in the Machine has an American accent: value conflict in GPT-3. Rebecca Johnson, Giada Pistilli, Natalia Menedez-Gonzalez, Leslye Denisse Dias Duran, Enrico Panai, Julija Kalpokiene & Donald Jay Bertulfo - manuscript. The alignment problem in the context of large language models must consider the plurality of human values in our world. Whilst there are many resonant and overlapping values amongst the world’s cultures, there are also many conflicting, yet equally valid, values. It is important to observe which cultural values a model exhibits, particularly when there is a value conflict between input prompts and generated outputs. We discuss how the co-creation of language and cultural value impacts large language models (LLMs). (...)
The Global Governance of Artificial Intelligence: Some Normative Concerns. Eva Erman & Markus Furendal - 2022 - Moral Philosophy and Politics 9 (2):267-291. The creation of increasingly complex artificial intelligence (AI) systems raises urgent questions about their ethical and social impact on society. Since this impact ultimately depends on political decisions about normative issues, political philosophers can make valuable contributions by addressing such questions. Currently, AI development and application are to a large extent regulated through non-binding ethics guidelines penned by transnational entities. Assuming that the global governance of AI should be at least minimally democratic and fair, this paper sets out three desiderata (...) 4 citations
Human Goals Are Constitutive of Agency in Artificial Intelligence. Elena Popa - 2021 - Philosophy and Technology 34 (4):1731-1750. The question whether AI systems have agency is gaining increasing importance in discussions of responsibility for AI behavior. This paper argues that an approach to artificial agency needs to be teleological, and consider the role of human goals in particular if it is to adequately address the issue of responsibility. I will defend the view that while AI systems can be viewed as autonomous in the sense of identifying or pursuing goals, they rely on human goals and other values incorporated (...) 10 citations
Aligning artificial intelligence with human values: reflections from a phenomenological perspective. Shengnan Han, Eugene Kelly, Shahrokh Nikou & Eric-Oluf Svee - 2022 - AI and Society 37 (4):1383-1395. Artificial Intelligence (AI) must be directed at humane ends. The development of AI has produced great uncertainty about how to ensure AI alignment with human values (AI value alignment) through AI operations from design to use. To address this problem, we adopt the phenomenological theories of material values and technological mediation as a first step. In this paper, we first discuss AI value alignment in the relevant AI studies. Second, we briefly present what are material values and (...) 3 citations
Ethics-based auditing of automated decision-making systems: nature, scope, and limitations. Jakob Mökander, Jessica Morley, Mariarosaria Taddeo & Luciano Floridi - 2021 - Science and Engineering Ethics 27 (4):1–30. Important decisions that impact human lives, livelihoods, and the natural environment are increasingly being automated. Delegating tasks to so-called automated decision-making systems (ADMS) can improve efficiency and enable new solutions. However, these benefits are coupled with ethical challenges. For example, ADMS may produce discriminatory outcomes, violate individual privacy, and undermine human self-determination. New governance mechanisms are thus needed that help organisations design and deploy ADMS in ways that are ethical, while enabling society to reap the full economic and social benefits of (...) 13 citations
Challenges of Aligning Artificial Intelligence with Human Values. Margit Sutrop - 2020 - Acta Baltica Historiae Et Philosophiae Scientiarum 8 (2):54-72. As artificial intelligence systems are becoming increasingly autonomous and will soon be able to make decisions on their own about what to do, AI researchers have started to talk about the need to align AI with human values. The AI ‘value alignment problem’ faces two kinds of challenges, a technical and a normative one, which are interrelated. The technical challenge deals with the question of how to encode human values in artificial intelligence. The normative challenge is associated with two questions: “Which values (...) 5 citations
Where Bioethics Meets Machine Ethics. Anna C. F. Lewis - 2020 - American Journal of Bioethics 20 (11):22-24. Char et al. question the extent and degree to which machine learning applications should be treated as exceptional by ethicists. It is clear that of the suite of ethical issues raised by mac... 2 citations
Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence. Shakir Mohamed, Marie-Therese Png & William Isaac - 2020 - Philosophy and Technology 33 (4):659-684. This paper explores the important role of critical science, and in particular of post-colonial and decolonial theories, in understanding and shaping the ongoing advances in artificial intelligence. Artificial intelligence is viewed as amongst the technological advances that will reshape modern societies and their relations. While the design and deployment of systems that continually adapt holds the promise of far-reaching positive change, they simultaneously pose significant risks, especially to already vulnerable peoples. Values and power are central to this discussion. Decolonial theories (...) 40 citations
The argument for near-term human disempowerment through AI. Leonard Dung - 2024 - AI and Society:1-14. Many researchers and intellectuals warn about extreme risks from artificial intelligence. However, these warnings typically come without systematic arguments in support. This paper provides an argument that AI will lead to the permanent disempowerment of humanity, e.g. human extinction, by 2100. It rests on four substantive premises which it motivates and defends: first, the speed of advances in AI capability, as well as the capability level current systems have already reached, suggest that it is practically possible to build AI systems (...) 4 citations
Assessment of Cognitive Behavioral Characteristics in Intelligent Systems with Predictive Ability and Computing Power. Oleg V. Kubryak, Sergey V. Kovalchuk & Nadezhda G. Bagdasaryan - 2023 - Philosophies 8 (5):75. The article proposes a universal dual-axis intelligent systems assessment scale. The scale considers the properties of intelligent systems within the environmental context, which develops over time. In contrast to the frequent consideration of the “mind” of artificial intelligent systems on a scale from “weak” to “strong”, we highlight the modulating influences of anticipatory ability on their “brute force”. In addition, the complexity, the “weight” of the cognitive task and the ability to critically assess it beforehand determine the actual set of (...)
What ethics can say on artificial intelligence: Insights from a systematic literature review. Francesco Vincenzo Giarmoleo, Ignacio Ferrero, Marta Rocchi & Massimiliano Matteo Pellegrini - 2024 - Business and Society Review 129 (2):258-292. The abundance of literature on ethical concerns regarding artificial intelligence (AI) highlights the need to systematize, integrate, and categorize existing efforts through a systematic literature review. The article aims to investigate prevalent concerns, proposed solutions, and prominent ethical approaches within the field. Considering 309 articles from the beginning of the publications in this field up until December 2021, this systematic literature review clarifies what the ethical concerns regarding AI are, and it charts them into two groups: (i) ethical concerns that (...)
Know Thyself, Improve Thyself: Personalized LLMs for Self-Knowledge and Moral Enhancement. Alberto Giubilini, Sebastian Porsdam Mann, Cristina Voinea, Brian Earp & Julian Savulescu - 2024 - Science and Engineering Ethics 30 (6):1-15. In this paper, we suggest that personalized LLMs trained on information written by or otherwise pertaining to an individual could serve as artificial moral advisors (AMAs) that account for the dynamic nature of personal morality. These LLM-based AMAs would harness users’ past and present data to infer and make explicit their sometimes-shifting values and preferences, thereby fostering self-knowledge. Further, these systems may also assist in processes of self-creation, by helping users reflect on the kind of person they want to be (...) 1 citation
Beyond Preferences in AI Alignment. Tan Zhi-Xuan, Micah Carroll, Matija Franklin & Hal Ashton - forthcoming - Philosophical Studies:1-51. The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize (...)
Exploring the psychology of LLMs’ Moral and Legal Reasoning. Guilherme F. C. F. Almeida, José Luiz Nunes, Neele Engelmann, Alex Wiegmann & Marcelo de Araújo - forthcoming - Artificial Intelligence.
No wheel but a dial: why and how passengers in self-driving cars should decide how their car drives. Johannes Himmelreich - 2022 - Ethics and Information Technology 24 (4):1-12. Much of the debate on the ethics of self-driving cars has revolved around trolley scenarios. This paper instead takes up the political or institutional question of who should decide how a self-driving car drives. Specifically, this paper is on the question of whether and why passengers should be able to control how their car drives. The paper reviews existing arguments (those for passenger ethics settings and for mandatory ethics settings respectively) and argues that they fail. Although the arguments are not successful, they (...) 2 citations
Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective. Tom Everitt, Marcus Hutter, Ramana Kumar & Victoria Krakovna - 2021 - Synthese 198 (Suppl 27):6435-6467. Can humans get arbitrarily capable reinforcement learning agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we study when an RL agent has an instrumental goal to tamper with its reward process, and describe design principles that prevent instrumental (...) 4 citations
Reconstructing AI Ethics Principles: Rawlsian Ethics of Artificial Intelligence. Salla Westerstrand - 2024 - Science and Engineering Ethics 30 (5):1-21. The popularisation of Artificial Intelligence (AI) technologies has sparked discussion about their ethical implications. This development has forced governmental organisations, NGOs, and private companies to react and draft ethics guidelines for future development of ethical AI systems. Whereas many ethics guidelines address values familiar to ethicists, they seem to lack in ethical justifications. Furthermore, most tend to neglect the impact of AI on democracy, governance, and public deliberation. Existing research suggests, however, that AI can threaten key elements of western democracies (...)
Value Alignment for Advanced Artificial Judicial Intelligence. Christoph Winter, Nicholas Hollman & David Manheim - 2023 - American Philosophical Quarterly 60 (2):187-203. This paper considers challenges resulting from the use of advanced artificial judicial intelligence (AAJI). We argue that these challenges should be considered through the lens of value alignment. Instead of discussing why specific goals and values, such as fairness and nondiscrimination, ought to be implemented, we consider the question of how AAJI can be aligned with goals and values more generally, in order to be reliably integrated into legal and judicial systems. This value alignment framing draws on AI safety and (...) 2 citations
Domesticating Artificial Intelligence. Luise Müller - 2022 - Moral Philosophy and Politics 9 (2):219-237. For their deployment in human societies to be safe, AI agents need to be aligned with value-laden cooperative human life. One way of solving this “problem of value alignment” is to build moral machines. I argue that the goal of building moral machines aims at the wrong kind of ideal, and that instead, we need an approach to value alignment that takes seriously the categorically different cognitive and moral capabilities between human and AI agents, a condition I call deep agential (...)
A qualified defense of top-down approaches in machine ethics. Tyler Cook - forthcoming - AI and Society:1-15. This paper concerns top-down approaches in machine ethics. It is divided into three main parts. First, I briefly describe top-down design approaches, and in doing so I make clear what those approaches are committed to and what they involve when it comes to training an AI to behave ethically. In the second part, I formulate two underappreciated motivations for endorsing them, one relating to predictability of machine behavior and the other relating to scrutability of machine decision-making. Finally, I present three (...)
Ethics-based auditing of automated decision-making systems: intervention points and policy implications. Jakob Mökander & Maria Axente - 2023 - AI and Society 38 (1):153-171. Organisations increasingly use automated decision-making systems (ADMS) to inform decisions that affect humans and their environment. While the use of ADMS can improve the accuracy and efficiency of decision-making processes, it is also coupled with ethical challenges. Unfortunately, the governance mechanisms currently used to oversee human decision-making often fail when applied to ADMS. In previous work, we proposed that ethics-based auditing (EBA), that is, a structured process by which ADMS are assessed for consistency with relevant principles or norms, can (a) help organisations (...) 4 citations
