Each of our sensory modalities (vision, touch, taste, etc.) works on a slightly different timescale, with differing temporal resolutions and processing lags. This raises the question of how, or indeed whether, these sensory streams are co-ordinated or ‘bound’ into a coherent multisensory experience of the perceptual ‘now’. In this paper I evaluate one account of how temporal binding is achieved, the temporal windows hypothesis, and argue that, in its simplest form, this hypothesis is inadequate to capture a variety of multisensory phenomena. Rather, the evidence suggests the existence of a more complex temporal structure in which multiple overlapping windows support distinct functional mechanisms. To aid in the precise formulation of such views, I propose a taxonomy of temporal window types and their characteristics, which in turn suggests promising avenues for future empirical and philosophical research. I conclude by examining some philosophical implications of multi-window models for the metaphysics of perception and perceptual experience more generally.