Abstract
The re-use of publicly available (personal) data for originally unanticipated purposes has become common practice. Without such secondary uses, the development of many AI systems, such as large language models (LLMs) and ChatGPT, would not have been possible. This chapter addresses the ethical implications of such secondary processing, with a particular focus on data protection and privacy issues. Legal and ethical evaluations of secondary processing of
publicly available personal data diverge considerably, both among scholars and among the general public. While some of these uses are met with opposition and criticism, others are almost unanimously viewed as unproblematic. Often, proponents and opponents of such practices invoke the same ethical and legal standards to reach opposite conclusions. This state of affairs shows that considerations other than the public availability of data must play a role. It calls for a theoretical clarification of the additional criteria that should guide decisions about the (legally informed) ethical acceptability of re-processing practices. To contribute to this goal, the present chapter maps the ongoing debate and systematises the existing contributions around three lines of argument: a consent-centred position, an approach that focuses on the distinction between data and information, and a line of argument that focuses on the contextual norms that govern flows of information. The chapter further relates these arguments to three underlying conceptions of privacy and data protection (rights-based, structural and contextual) and discusses the advantages and disadvantages of each position in the light of concrete examples. It concludes by arguing for a mixed approach that combines core elements of the structural and contextual approaches. The chapter aims to contribute to existing research in the fields of data, AI and research ethics, and to reconnect the debate with ethical and legal scholarship on privacy and data protection. In doing so, it aims to make a theoretical contribution towards refining existing conceptions of privacy and data protection so that they are better suited to ‘drive our digital world’ as far as the use of publicly available data is concerned.