  1. Language Agents and Malevolent Design. Inchul Yum - 2024 - Philosophy and Technology 37 (104): 1-19.
    Language agents are AI systems capable of understanding and responding to natural language, potentially facilitating the process of encoding human goals into AI systems. However, this paper argues that if language agents can achieve easy alignment, they also increase the risk of malevolent agents building harmful AI systems aligned with destructive intentions. The paper contends that if training AI becomes sufficiently easy or is perceived as such, it enables malicious actors, including rogue states, terrorists, and criminal organizations, to create powerful (...)
  2. Provably Safe Artificial General Intelligence via Interactive Proofs. Kristen Carlson - 2021 - Philosophies 6 (4): 83.
    Methods are currently lacking to _prove_ artificial general intelligence (AGI) safety. An AGI 'hard takeoff' is possible, in which a first-generation AGI₁ rapidly triggers a succession of more powerful AGIₙ that differ dramatically in their computational capabilities (AGIₙ ≪ AGIₙ₊₁). No proof exists that AGI will benefit humans, nor does a provably sound value-alignment method exist. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for (...)