Abstract
There is a non-trivial chance that sometime in the (perhaps somewhat distant) future, someone will build an artificial general intelligence that will surpass human-level cognitive proficiency and go on to become "superintelligent", vastly outperforming humans. The advent of superintelligent AI has great potential, for good or ill. It is therefore imperative that we find a way to ensure, long before one arrives, that any superintelligence we build will consistently act in ways congenial to our interests. This is a very difficult challenge, in part because most of the final goals we could give an AI admit of so-called "perverse instantiations". I propose a novel solution to this puzzle: instruct the AI to love humanity. The proposal is compared with Yudkowsky's Coherent Extrapolated Volition and Bostrom's Moral Modeling proposals.