  1. Language Agents and Malevolent Design. Inchul Yum - 2024 - Philosophy and Technology 37 (104): 1-19.
    Language agents are AI systems capable of understanding and responding to natural language, potentially facilitating the process of encoding human goals into AI systems. However, this paper argues that if language agents can achieve easy alignment, they also increase the risk of malevolent agents building harmful AI systems aligned with destructive intentions. The paper contends that if training AI becomes sufficiently easy or is perceived as such, it enables malicious actors, including rogue states, terrorists, and criminal organizations, to create powerful (...)
  2. Provably Safe Artificial General Intelligence via Interactive Proofs. Kristen Carlson - 2021 - Philosophies 6 (4): 83.
    Methods are currently lacking to _prove_ artificial general intelligence (AGI) safety. An AGI 'hard takeoff' is possible, in which a first-generation AGI₁ rapidly triggers a succession of more powerful AGIₙ that differ dramatically in their computational capabilities (AGIₙ ≪ AGIₙ₊₁). No proof exists that AGI will benefit humans, nor does a provably sound value-alignment method exist. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for (...)