  • Two Types of AI Existential Risk: Decisive and Accumulative. Atoosa Kasirzadeh - manuscript
    The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects the serious possibility of AI x-risks manifesting incrementally through a series of smaller yet interconnected disruptions, gradually crossing critical thresholds over time. This paper contrasts the (...)
  • Existentialist risk and value misalignment. Ariela Tubert & Justin Tiehen - forthcoming - Philosophical Studies.
    We argue that two long-term goals of AI research stand in tension with one another. The first involves creating AI that is safe, where this is understood as solving the problem of value alignment. The second involves creating artificial general intelligence, meaning AI that operates at or beyond human capacity across all or many intellectual domains. Our argument focuses on the human capacity to make what we call “existential choices”, choices that transform who we are as persons, including transforming what (...)
  • Disagreement, AI alignment, and bargaining. Harry R. Lloyd - forthcoming - Philosophical Studies:1-31.
    New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, biomedicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that – dependent on the alignment target chosen – our AIs might optimise for objectives that reflect the values only of a certain subset of society, and (...)
  • Language Agents and Malevolent Design. Inchul Yum - 2024 - Philosophy and Technology 37 (104):1-19.
    Language agents are AI systems capable of understanding and responding to natural language, potentially facilitating the process of encoding human goals into AI systems. However, this paper argues that if language agents can achieve easy alignment, they also increase the risk of malevolent agents building harmful AI systems aligned with destructive intentions. The paper contends that if training AI becomes sufficiently easy or is perceived as such, it enables malicious actors, including rogue states, terrorists, and criminal organizations, to create powerful (...)
  • Promotionalism, Orthogonality, and Instrumental Convergence. Nathaniel Sharadin - forthcoming - Philosophical Studies:1-31.
    Suppose there are no in-principle restrictions on the contents of arbitrarily intelligent agents’ goals. According to “instrumental convergence” arguments, potentially scary things follow. I do two things in this paper. First, focusing on the influential version of the instrumental convergence argument due to Nick Bostrom, I explain why such arguments require an account of “promotion,” i.e., an account of what it is to “promote” a goal. Then, I consider whether extant accounts of promotion in the literature -- in particular, probabilistic (...)
  • Deception and manipulation in generative AI. Christian Tarsney - forthcoming - Philosophical Studies.
    Large language models now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own purposes. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of (...)
  • Is Alignment Unsafe? Cameron Domenico Kirk-Giannini - 2024 - Philosophy and Technology 37 (110):1–4.
    Inchul Yum (2024) argues that the widespread adoption of language agent architectures would likely increase the risk posed by AI by simplifying the process of aligning artificial systems with human values and thereby making it easier for malicious actors to use them to cause a variety of harms. Yum takes this to be an example of a broader phenomenon: progress on the alignment problem is likely to be net safety-negative because it makes artificial systems easier for malicious actors to control. (...)
  • AI safety: a climb to Armageddon? Herman Cappelen, Josh Dever & John Hawthorne - forthcoming - Philosophical Studies.
    This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: Optimism, Mitigation, and (...)
  • A Roadmap for Governing AI: Technology Governance and Power-Sharing Liberalism. Danielle Allen, Woojin Lim, Sarah Hubbard, Allison Stanger, Shlomit Wagman, Kinney Zalesne & Omoaholo Omoakhalen - 2025 - AI and Ethics 4 (4).
    This paper aims to provide a roadmap for governing AI. In contrast to the reigning paradigms, we argue that AI governance should be not merely a reactive, punitive, status-quo-defending enterprise, but rather the expression of an expansive, proactive vision for technology—to advance human flourishing. Advancing human flourishing in turn requires democratic/political stability and economic empowerment. To accomplish this, we build on a new normative framework that will give humanity its best chance to reap the full benefits, while avoiding the dangers, (...)