Is Alignment Unsafe?

Philosophy and Technology 37 (110):1–4 (2024)

Abstract

Inchul Yum (2024) argues that the widespread adoption of language agent architectures would likely increase the risk posed by AI by simplifying the process of aligning artificial systems with human values and thereby making it easier for malicious actors to use them to cause a variety of harms. Yum takes this to be an example of a broader phenomenon: progress on the alignment problem is likely to be net safety-negative because it makes artificial systems easier for malicious actors to control. I offer some reasons for skepticism about this surprising and pessimistic conclusion.

Author's Profile

Cameron Domenico Kirk-Giannini
Rutgers University - Newark

