Results for 'AI safety'

946 found
  1. AI Safety: A Climb To Armageddon? Herman Cappelen, Josh Dever & John Hawthorne - manuscript
    This paper presents an argument that certain AI safety measures, rather than mitigating existential risk, may instead exacerbate it. Under certain key assumptions - the inevitability of AI failure, the expected correlation between an AI system's power at the point of failure and the severity of the resulting harm, and the tendency of safety measures to enable AI systems to become more powerful before failing - safety efforts have negative expected utility. The paper examines three response strategies: (...)
  2. Global Solutions vs. Local Solutions for the AI Safety Problem. Alexey Turchin - 2019 - Big Data Cogn. Comput 3 (1).
    There are two types of artificial general intelligence (AGI) safety solutions: global and local. Most previously suggested solutions are local: they explain how to align or “box” a specific AI (Artificial Intelligence), but do not explain how to prevent the creation of dangerous AI in other places. Global solutions are those that ensure any AI on Earth is not dangerous. The number of suggested global solutions is much smaller than the number of proposed local solutions. Global solutions can be (...)
    2 citations
  3. AI Rights for Human Safety. Peter Salib & Simon Goldstein - manuscript
    AI companies are racing to create artificial general intelligence, or “AGI.” If they succeed, the result will be human-level AI systems that can independently pursue high-level goals by formulating and executing long-term plans in the real world. Leading AI researchers agree that some of these systems will likely be “misaligned,” pursuing goals that humans do not desire. This goal mismatch will put misaligned AIs and humans into strategic competition with one another. As with present-day strategic competition between nations with incompatible goals, (...)
  4. Levels of Self-Improvement in AI and their Implications for AI Safety. Alexey Turchin - manuscript
    Abstract: This article presents a model of self-improving AI in which improvement could happen on several levels: hardware, learning, code and goals system, each of which has several sublevels. We demonstrate that despite diminishing returns at each level and some intrinsic difficulties of recursive self-improvement—like the intelligence-measuring problem, testing problem, parent-child problem and halting risks—even non-recursive self-improvement could produce a mild form of superintelligence by combining small optimizations on different levels and the power of learning. Based on this, we analyze (...)
  5. Acceleration AI Ethics, the Debate between Innovation and Safety, and Stability AI’s Diffusion versus OpenAI’s Dall-E. James Brusseau - manuscript
    One objection to conventional AI ethics is that it slows innovation. This presentation responds by reconfiguring ethics as an innovation accelerator. The critical elements develop from a contrast between Stability AI’s Diffusion and OpenAI’s Dall-E. By analyzing the divergent values underlying their opposed strategies for development and deployment, five conceptions are identified as common to acceleration ethics. Uncertainty is understood as positive and encouraging, rather than discouraging. Innovation is conceived as intrinsically valuable, instead of worthwhile only as mediated by social (...)
  6. Unpredictability of AI. Roman Yampolskiy - manuscript
    The young field of AI Safety is still in the process of identifying its challenges and limitations. In this paper, we formally describe one such impossibility result, namely Unpredictability of AI. We prove that it is impossible to precisely and consistently predict what specific actions a smarter-than-human intelligent system will take to achieve its objectives, even if we know the terminal goals of the system. In conclusion, the impact of Unpredictability on AI Safety is discussed.
    3 citations
  7. Will AI and Humanity Go to War? Simon Goldstein - manuscript
    This paper offers the first careful analysis of the possibility that AI and humanity will go to war. The paper focuses on the case of artificial general intelligence, AI with broadly human capabilities. The paper uses a bargaining model of war to apply standard causes of war to the special case of AI/human conflict. The paper argues that information failures and commitment problems are especially likely in AI/human conflict. Information failures would be driven by the difficulty of measuring AI capabilities, (...)
  8. Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest”. Alexey Turchin - manuscript
    In this article we explore a promising approach to AI safety: sending a message now (by openly publishing it on the Internet) that may be read by any future AI, no matter who builds it and what goal system it has. Such a message is designed to affect the AI’s behavior in a positive way, that is, to increase the chances that the AI will be benevolent. In other words, we try to persuade a “paperclip maximizer” that it is (...)
  9. Values in science and AI alignment research. Leonard Dung - manuscript
    Roughly, empirical AI alignment research (AIA) is an area of AI research which investigates empirically how to design AI systems in line with human goals. This paper examines the role of non-epistemic values in AIA. It argues that: (1) Sciences differ in the degree to which values influence them. (2) AIA is strongly value-laden. (3) This influence of values is managed inappropriately and thus threatens AIA’s epistemic integrity and ethical beneficence. (4) AIA should strive to achieve value transparency, critical scrutiny (...)
  10. Will AI take away your job? [REVIEW] Marie Oldfield - 2020 - Tech Magazine.
    Will AI take away your job? The answer is probably not. AI systems can be good predictive systems and be very good at pattern recognition. AI systems have a very repetitive approach to sets of data, which can be useful in certain circumstances. However, AI does make obvious mistakes. This is because AI does not have a sense of context. As humans, we have years of experience in the real world. We have vast amounts of contextual data stored in our (...)
  11. Group Prioritarianism: Why AI should not replace humanity. Frank Hong - 2024 - Philosophical Studies:1-19.
    If a future AI system can enjoy far more well-being than a human per resource, what would be the best way to allocate resources between these future AI and our future descendants? It is obvious that on total utilitarianism, one should give everything to the AI. However, it turns out that every Welfarist axiology on the market also gives this same recommendation, at least if we assume consequentialism. Without resorting to non-consequentialist normative theories that suggest that we ought not always (...)
  12. AI Alignment Problem: “Human Values” don’t Actually Exist. Alexey Turchin - manuscript
    Abstract. The main current approach to AI safety is AI alignment, that is, the creation of AI whose preferences are aligned with “human values.” Many AI safety researchers agree that the idea of “human values” as a constant, ordered set of preferences is at least incomplete. However, the idea that “humans have values” underlies a lot of thinking in the field; it appears again and again, sometimes popping up as an uncritically accepted truth. Thus, it deserves a (...)
    1 citation
  13. AI Alignment vs. AI Ethical Treatment: Ten Challenges. Adam Bradley & Bradford Saad - manuscript
    A morally acceptable course of AI development should avoid two dangers: creating unaligned AI systems that pose a threat to humanity and mistreating AI systems that merit moral consideration in their own right. This paper argues these two dangers interact and that if we create AI systems that merit moral consideration, simultaneously avoiding both of these dangers would be extremely challenging. While our argument is straightforward and supported by a wide range of pretheoretical moral judgments, it has far-reaching moral implications (...)
  14. Unjustified untrue "beliefs": AI hallucinations and justification logics. Kristina Šekrst - forthcoming - In Kordula Świętorzecka, Filip Grgić & Anna Brozek (eds.), Logic, Knowledge, and Tradition. Essays in Honor of Srecko Kovac.
    In artificial intelligence (AI), responses generated by machine-learning models (most often large language models) may contain false information presented as fact. For example, a chatbot might state that the Mona Lisa was painted in 1815. This phenomenon is called AI hallucination, a term borrowed from human psychology, with the key difference that AI hallucinations are connected to unjustified beliefs (that is, AI “beliefs”) rather than to perceptual failures.
    AI hallucinations may have their source in the data itself, that is, the (...)
  15. AI-Related Misdirection Awareness in AIVR. Nadisha-Marie Aliman & Leon Kester - manuscript
    Recent AI progress led to a boost in beneficial applications from multiple research areas including VR. Simultaneously, in this newly unfolding deepfake era, ethically and security-relevant disagreements arose in the scientific community regarding the epistemic capabilities of present-day AI. However, given what is at stake, one can postulate that for a responsible approach, prior to engaging in a rigorous epistemic assessment of AI, humans may profit from a self-questioning strategy, an examination and calibration of the experience of their own epistemic (...)
  16. Catastrophically Dangerous AI is Possible Before 2030. Alexey Turchin - manuscript
    In AI safety research, the median timing of AGI arrival is often taken as a reference point, which various polls predict will happen in the middle of the 21st century, but for maximum safety, we should determine the earliest possible time of dangerous AI arrival. Such dangerous AI could be either AGI, capable of acting completely independently in the real world and of winning most real-world conflicts with humans, or an AI helping humans to build weapons of mass (...)
  17. Assessing the future plausibility of catastrophically dangerous AI. Alexey Turchin - 2018 - Futures.
    In AI safety research, the median timing of AGI creation is often taken as a reference point, which various polls predict will happen in the second half of the 21st century, but for maximum safety, we should determine the earliest possible time of dangerous AI arrival and define a minimum acceptable level of AI risk. Such dangerous AI could be either narrow AI facilitating research into potentially dangerous technology like biotech, or AGI, capable of acting completely independently in the (...)
  18. The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists. Elliott Thornley - forthcoming - Philosophical Studies:1-28.
    I explain the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. And (...)
    1 citation
  19. AI as IA: The use and abuse of artificial intelligence (AI) for human enhancement through intellectual augmentation (IA). Alexandre Erler & Vincent C. Müller - 2023 - In Fabrice Jotterand & Marcello Ienca (eds.), The Routledge Handbook of the Ethics of Human Enhancement. Routledge. pp. 187-199.
    This paper offers an overview of the prospects and ethics of using AI to achieve human enhancement, and more broadly what we call intellectual augmentation (IA). After explaining the central notions of human enhancement, IA, and AI, we discuss the state of the art in terms of the main technologies for IA, with or without brain-computer interfaces. Given this picture, we discuss potential ethical problems, namely inadequate performance, safety, coercion and manipulation, privacy, cognitive liberty, authenticity, and fairness in more (...)
  20. Military AI as a Convergent Goal of Self-Improving AI. Alexey Turchin & David Denkenberger - 2018 - In Roman V. Yampolskiy (ed.), Artificial Intelligence Safety and Security. CRC Press.
    Better instruments to predict the future evolution of artificial intelligence (AI) are needed, as the destiny of our civilization depends on it. One way to make such predictions is to analyze the convergent drives of any future AI, an approach started by Omohundro. We show that one of the convergent drives of AI is a militarization drive, arising from AI’s need to wage a war against its potential rivals by either physical or software means, or to increase its bargaining power. (...)
    3 citations
  21. Cybercrime and Online Safety: Addressing the Challenges and Solutions Related to Cybercrime, Online Fraud, and Ensuring a Safe Digital Environment for All Users: A Case of African States (10th edition). Emmanuel N. Vitus - 2023 - Tijer- International Research Journal 10 (9):975-989.
    The internet has made the world more connected than ever before. Taking advantage of this online transition, cybercriminals target flaws in online systems, networks, and infrastructure. Businesses, government organizations, individuals, and communities across the world, particularly in African countries, are severely affected economically and socially. Many African countries have focused on developing secure electricity and internet networks, yet cybersecurity usually receives less attention than it should. One of Africa's major issues is the lack of (...)
  22. How to design AI for social good: seven essential factors. Luciano Floridi, Josh Cowls, Thomas C. King & Mariarosaria Taddeo - 2020 - Science and Engineering Ethics 26 (3):1771–1796.
    The idea of artificial intelligence for social good is gaining traction within information societies in general and the AI community in particular. It has the potential to tackle social problems through the development of AI-based solutions. Yet, to date, there is only limited understanding of what makes AI socially good in theory, what counts as AI4SG in practice, and how to reproduce its initial successes in terms of policies. This article addresses this gap by identifying seven ethical factors that are (...)
    38 citations
  23. Deontology and Safe Artificial Intelligence. William D’Alessandro - forthcoming - Philosophical Studies:1-24.
    The field of AI safety aims to prevent increasingly capable artificially intelligent systems from causing humans harm. Research on moral alignment is widely thought to offer a promising safety strategy: if we can equip AI systems with appropriate ethical rules, according to this line of thought, they'll be unlikely to disempower, destroy or otherwise seriously harm us. Deontological morality looks like a particularly attractive candidate for an alignment target, given its popularity, relative technical tractability and commitment to harm-avoidance (...)
  24. Literature Review: What Artificial General Intelligence Safety Researchers Have Written About the Nature of Human Values. Alexey Turchin & David Denkenberger - manuscript
    Abstract: The field of artificial general intelligence (AGI) safety is quickly growing. However, the nature of human values, with which future AGI should be aligned, is underdefined. Different AGI safety researchers have suggested different theories about the nature of human values, but these theories contradict one another. This article presents an overview of what AGI safety researchers have written about the nature of human values, up to the beginning of 2019. Twenty-one authors were reviewed, and some of them have (...)
  25. Artificial thinking and doomsday projections: a discourse on trust, ethics and safety. Jeffrey White, Dietrich Brandt, Jan Söffner & Larry Stapleton - 2023 - AI and Society 38 (6):2119-2124.
    The article reflects on where AI is headed and the world along with it, considering trust, ethics and safety. Implicit in artificial thinking and doomsday appraisals is the engineered divorce from reality of sublime human embodiment. Jeffrey White, Dietrich Brandt, Jan Soeffner, and Larry Stapleton, four scholars associated with AI & Society, address these issues, and more, in the following exchange.
  26. On Controllability of Artificial Intelligence. Roman Yampolskiy - 2016
    The invention of artificial general intelligence is predicted to cause a shift in the trajectory of human civilization. In order to reap the benefits and avoid the pitfalls of such powerful technology, it is important to be able to control it. However, the possibility of controlling artificial general intelligence and its more advanced version, superintelligence, has not been formally established. In this paper, we present arguments as well as supporting evidence from multiple domains indicating that advanced AI can’t be fully controlled. Consequences of (...)
    5 citations
  27. Artificial Intelligence Ethics and Safety: practical tools for creating "good" models. Nicholas Kluge Corrêa
    The AI Robotics Ethics Society (AIRES) is a non-profit organization founded in 2018 by Aaron Hui to promote awareness and the importance of ethical implementation and regulation of AI. AIRES is now an organization with chapters at universities such as UCLA (Los Angeles), USC (University of Southern California), Caltech (California Institute of Technology), Stanford University, Cornell University, Brown University, and the Pontifical Catholic University of Rio Grande do Sul (Brazil). AIRES at PUCRS is the first international chapter of AIRES, and (...)
  28. Designometry – Formalization of Artifacts and Methods. Soenke Ziesche & Roman Yampolskiy - manuscript
    Two interconnected surveys are presented, one of artifacts and one of designometry. Artifacts are objects that have an originator and do not exist in nature. Designometry is a new field of study that aims to identify the originators of artifacts. The space of artifacts is described, as are the domains that pursue designometry, currently without collaboration or common methodologies. On this basis, synergies as well as a generic axiom and heuristics for the quest of the creators of artifacts (...)
  29. From Confucius to Coding and Avicenna to Algorithms: Cultivating Ethical AI Development through Cross-Cultural Ancient Wisdom. Ammar Younas & Yi Zeng - manuscript
    This paper explores the potential of integrating ancient educational principles from diverse eastern cultures into modern AI ethics curricula. It draws on the rich educational traditions of ancient China, India, Arabia, Persia, Japan, Tibet, Mongolia, and Korea, highlighting their emphasis on philosophy, ethics, holistic development, and critical thinking. By examining these historical educational systems, the paper establishes a correlation with modern AI ethics principles, advocating for the inclusion of these ancient teachings in current AI development and education. The proposed integration (...)
  30. Catching Treacherous Turn: A Model of the Multilevel AI Boxing. Alexey Turchin - manuscript
    With the fast pace of AI development, the problem of preventing its global catastrophic risks arises. However, no satisfactory solution has been found. Among the possibilities, confinement of AI in a box is considered a low-quality solution for AI safety. However, some treacherous AIs can be stopped by effective confinement if it is used as an additional measure. Here, we propose an idealized model of the best possible confinement by aggregating all known ideas in the field (...)
  31. Is Alignment Unsafe? Cameron Domenico Kirk-Giannini - 2024 - Philosophy and Technology 37 (110):1–4.
    Inchul Yum (2024) argues that the widespread adoption of language agent architectures would likely increase the risk posed by AI by simplifying the process of aligning artificial systems with human values and thereby making it easier for malicious actors to use them to cause a variety of harms. Yum takes this to be an example of a broader phenomenon: progress on the alignment problem is likely to be net safety-negative because it makes artificial systems easier for malicious actors to (...)
  32. A global taxonomy of interpretable AI: unifying the terminology for the technical and social sciences. Lode Lauwaert - 2023 - Artificial Intelligence Review 56:3473–3504.
    Since its emergence in the 1960s, Artificial Intelligence (AI) has grown to conquer many technology products and their fields of application. Machine learning, as a major part of the current AI solutions, can learn from the data and through experience to reach high performance on various tasks. This growing success of AI algorithms has led to a need for interpretability to understand opaque models such as deep neural networks. Various requirements have been raised from different domains, together with numerous tools (...)
  33. Machines learning values. Steve Petersen - 2020 - In S. Matthew Liao (ed.), Ethics of Artificial Intelligence. Oxford University Press.
    Whether it would take one decade or several centuries, many agree that it is possible to create a *superintelligence*, an artificial intelligence with a godlike ability to achieve its goals. And many who have reflected carefully on this fact agree that our best hope for a "friendly" superintelligence is to design it to *learn* values like ours, since our values are too complex to program or hardwire explicitly. But the value learning approach to AI safety faces three particularly philosophical puzzles: (...)
    2 citations
  34. The Unlikeliest of Duos; Why Super Intelligent AI Will Cooperate with Humans. Griffin Pithie - manuscript
    The focus of this article is the "good-will theory", which explains the effect humans can have on the safety of AI, along with how it is in the best interest of a superintelligent AI to work alongside humans and not overpower them. Future papers on the good-will theory will be published, discussing different points regarding possible or actual objections to the theory.
  35. Unmonitorability of Artificial Intelligence. Roman Yampolskiy - manuscript
    Artificial intelligence (AI) systems have ushered in a transformative era across various domains, yet their inherent traits of unpredictability, unexplainability, and uncontrollability have given rise to concerns surrounding AI safety. This paper aims to demonstrate the infeasibility of accurately monitoring advanced AI systems to predict the emergence of certain capabilities prior to their manifestation. Through an analysis of the intricacies of AI systems, the boundaries of human comprehension, and the elusive nature of emergent behaviors, we argue for the impossibility (...)
  36. Artificial Intelligence: Arguments for Catastrophic Risk. Adam Bales, William D'Alessandro & Cameron Domenico Kirk-Giannini - 2024 - Philosophy Compass 19 (2):e12964.
    Recent progress in artificial intelligence (AI) has drawn attention to the technology’s transformative potential, including what some see as its prospects for causing large-scale harm. We review two influential arguments purporting to show how AI could pose catastrophic risks. The first argument — the Problem of Power-Seeking — claims that, under certain assumptions, advanced AI systems are likely to engage in dangerous power-seeking behavior in pursuit of their goals. We review reasons for thinking that AI systems might seek power, that (...)
    3 citations
  37. Robustness to fundamental uncertainty in AGI alignment. G. Gordon Worley III - manuscript
    The AGI alignment problem has a bimodal distribution of outcomes with most outcomes clustering around the poles of total success and existential, catastrophic failure. Consequently, attempts to solve AGI alignment should, all else equal, prefer false negatives (ignoring research programs that would have been successful) to false positives (pursuing research programs that will unexpectedly fail). Thus, we propose adopting a policy of responding to points of metaphysical and practical uncertainty associated with the alignment problem by limiting and choosing necessary assumptions (...)
  38. Classification of Global Catastrophic Risks Connected with Artificial Intelligence. Alexey Turchin & David Denkenberger - 2020 - AI and Society 35 (1):147-163.
    A classification of the global catastrophic risks of AI is presented, along with a comprehensive list of previously identified risks. This classification allows the identification of several new risks. We show that at each level of AI’s intelligence power, separate types of possible catastrophes dominate. Our classification demonstrates that the field of AI risks is diverse, and includes many scenarios beyond the commonly discussed cases of a paperclip maximizer or robot-caused unemployment. Global catastrophic failure could happen at various levels of (...)
    11 citations
  39. An Enactive Approach to Value Alignment in Artificial Intelligence: A Matter of Relevance. Michael Cannon - 2021 - In Vincent C. Müller (ed.), Philosophy and Theory of AI. Springer Cham. pp. 119-135.
    The “Value Alignment Problem” is the challenge of how to align the values of artificial intelligence with human values, whatever they may be, such that AI does not pose a risk to the existence of humans. Existing approaches appear to conceive of the problem as "how do we ensure that AI solves the problem in the right way", in order to avoid the possibility of AI turning humans into paperclips in order to “make more paperclips” or eradicating the human race (...)
  40. Predicting and Preferring. Nathaniel Sharadin - forthcoming - Inquiry: An Interdisciplinary Journal of Philosophy.
    The use of machine learning, or “artificial intelligence” (AI), in medicine is widespread and growing. In this paper, I focus on a specific proposed clinical application of AI: using models to predict incapacitated patients’ treatment preferences. Drawing on results from machine learning, I argue this proposal faces a special moral problem. Machine learning researchers owe us assurance on this front before experimental research can proceed. In my conclusion I connect this concern to broader issues in AI safety.
    1 citation
  41. Large Language Models and Biorisk. William D’Alessandro, Harry R. Lloyd & Nathaniel Sharadin - 2023 - American Journal of Bioethics 23 (10):115-118.
    We discuss potential biorisks from large language models (LLMs). AI assistants based on LLMs such as ChatGPT have been shown to significantly reduce barriers to entry for actors wishing to synthesize dangerous, potentially novel pathogens and chemical weapons. The harms from deploying such bioagents could be further magnified by AI-assisted misinformation. We endorse several policy responses to these dangers, including prerelease evaluations of biomedical AIs by subject-matter experts, enhanced surveillance and lab screening procedures, restrictions on AI training data, and access (...)
    3 citations
  42. Hacking the Simulation: From the Red Pill to the Red Team. Roman V. Yampolskiy - manuscript
    Many researchers have conjectured that humankind is simulated along with the rest of the physical universe: the Simulation Hypothesis. In this paper, we do not evaluate evidence for or against such a claim, but instead ask a computer science question, namely: Can we hack the simulation? More formally, the question could be phrased as: Could generally intelligent agents placed in virtual environments find a way to jailbreak out of them? Given that the state-of-the-art literature on AI containment answers in (...)
  43. Ética e Segurança da Inteligência Artificial: ferramentas práticas para se criar "bons" modelos [Artificial Intelligence Ethics and Safety: practical tools for creating "good" models]. Nicholas Kluge Corrêa - manuscript
    The AI Robotics Ethics Society (AIRES) is a non-profit organization founded in 2018 by Aaron Hui with the aim of promoting awareness of the importance of the ethical implementation and regulation of AI. AIRES is now an organization with chapters at universities such as UCLA (Los Angeles), USC (University of Southern California), Caltech (California Institute of Technology), Stanford University, Cornell University, Brown University and the Pontifical Catholic University of Rio Grande do Sul (Brazil). AIRES at PUCRS is (...)
  44. LLMs Can Never Be Ideally Rational. Simon Goldstein - manuscript
    LLMs have dramatically improved in capabilities in recent years. This raises the question of whether LLMs could become genuine agents with beliefs and desires. This paper demonstrates an in principle limit to LLM agency, based on their architecture. LLMs are next word predictors: given a string of text, they calculate the probability that various words can come next. LLMs produce outputs that reflect these probabilities. I show that next word predictors are exploitable. If LLMs are prompted to make probabilistic predictions (...)
  45. Towards Shutdownable Agents via Stochastic Choice. Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho & Louis Thomson - 2024 - Global Priorities Institute Working Paper.
    Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn't happen. A key part of the IPP is using a novel 'Discounted REward for Same-Length Trajectories (DREST)' reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'), and (2) choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a (...)
  46. A Tri-Opti Compatibility Problem for Godlike Superintelligence. Walter Barta - manuscript
    Various thinkers have been attempting to align artificial intelligence (AI) with ethics (Christian, 2020; Russell, 2021), the so-called problem of alignment, but some suspect that the problem may be intractable (Yampolskiy, 2023). In the following, we make an argument by analogy to analyze the possibility that the problem of alignment could be intractable. We show how the Tri-Omni properties in theology can direct us towards analogous properties for artificial superintelligence, Tri-Opti properties. However, just as the Tri-Omni properties are vulnerable to (...)
  47. The Shutdown Problem: Incomplete Preferences as a Solution. Elliott Thornley - manuscript
    I explain and motivate the shutdown problem: the problem of creating artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I then propose a solution: train agents to have incomplete preferences. Specifically, I propose that we train agents to lack a preference between every pair of different-length trajectories. I suggest a way to train such agents using reinforcement learning: (...)
    1 citation
  48. Digital suffering: why it's a problem and how to prevent it. Bradford Saad & Adam Bradley - 2022 - Inquiry: An Interdisciplinary Journal of Philosophy.
    As ever more advanced digital systems are created, it becomes increasingly likely that some of these systems will be digital minds, i.e. digital subjects of experience. With digital minds comes the risk of digital suffering. The problem of digital suffering is that of mitigating this risk. We argue that the problem of digital suffering is a high stakes moral problem and that formidable epistemic obstacles stand in the way of solving it. We then propose a strategy for solving it: Access (...)
    2 citations
  49. Robustness to Fundamental Uncertainty in AGI Alignment. G. G. Worley III - 2020 - Journal of Consciousness Studies 27 (1-2):225-241.
    The AGI alignment problem has a bimodal distribution of outcomes with most outcomes clustering around the poles of total success and existential, catastrophic failure. Consequently, attempts to solve AGI alignment should, all else equal, prefer false negatives (ignoring research programs that would have been successful) to false positives (pursuing research programs that will unexpectedly fail). Thus, we propose adopting a policy of responding to points of philosophical and practical uncertainty associated with the alignment problem by limiting and choosing necessary assumptions (...)
  50. Artificial intelligence in medicine: Overcoming or recapitulating structural challenges to improving patient care? Alex John London - 2022 - Cell Reports Medicine 100622 (3):1-8.
    There is considerable enthusiasm about the prospect that artificial intelligence (AI) will help to improve the safety and efficacy of health services and the efficiency of health systems. To realize this potential, however, AI systems will have to overcome structural problems in the culture and practice of medicine and the organization of health systems that impact the data from which AI models are built, the environments into which they will be deployed, and the practices and incentives that structure their (...)
    1 citation