Abstract
While philosophers hold that it is patently absurd to blame robots or hold them morally responsible [1], a series of recent empirical studies suggest that people do ascribe blame to AI systems and robots in certain contexts [2]. This is disconcerting: Blame might be shifted from the owners, users or designers of AI systems to the systems themselves, leading to the diminished accountability of the responsible human agents [3]. In this paper, we explore one of the potential underlying reasons for robot blame, namely the folk's willingness to ascribe inculpating mental states or "mens rea" to robots. In a vignette-based experiment (N=513), we presented participants with a situation in which an agent knowingly runs the risk of bringing about substantial harm. We manipulated agent type (human v. group agent v. AI-driven robot) and outcome (neutral v. bad), and measured both moral judgment (wrongness of the action and blameworthiness of the agent) and mental states attributed to the agent (recklessness and the desire to inflict harm). We found that (i) judgments of wrongness and blame were relatively similar across agent types, possibly because (ii) attributions of mental states were, as suspected, similar across agent types. This raised the question - also explored in the experiment - whether people attribute knowledge and desire to robots in a merely metaphorical way (e.g., the robot "knew" rather than really knew). However, (iii), according to our data people were unwilling to downgrade to mens rea in a merely metaphorical sense when given the chance. Finally, (iv), we report a surprising and novel finding, which we call the inverse outcome effect on robot blame: People were less willing to blame artificial agents for bad outcomes than for neutral outcomes. This suggests that they are implicitly aware of the dangers of overattributing blame to robots when harm comes to pass, such as inappropriately letting the responsible human agent off the moral hook.