Abstract
Model-data symbiosis is the view that there is an interdependent and mutually beneficial
relationship between data and models, whereby models are not only data-laden, but data are also
model-laden or model-filtered. In this paper, I elaborate and defend the second, more
controversial, component of the symbiosis view. In particular, I construct a preliminary
taxonomy of the different ways in which theoretical and simulation models are used in the
production of data sets. These include data conversion, data correction, data interpolation, data
scaling, data fusion, data assimilation, and synthetic data. Each is defined and briefly illustrated
with an example from the geosciences. I argue that model-filtered data are typically more
accurate and reliable than the so-called raw data, and hence beneficially serve the epistemic aims
of science. By illuminating the methods by which raw data are turned into scientifically useful
data sets, this taxonomy provides a foundation for developing a more adequate philosophy of
data.