Philosophers often state that the persistence of objects in vision is experienced differently from the persistence of sounds in audition. This difference is expressed using metaphors drawn from the metaphysical debate between endurantism and perdurantism. For instance, it is claimed that only sounds are perceived as “temporally extended”. The paper investigates whether it is justified to characterize visually experienced objects and auditorily experienced sounds as different types of entities: endurants and perdurants, respectively. This issue is analyzed from the perspective of the major specifications of the endurance/perdurance distinction, which are connected, inter alia, with the notions of temporal parts and temporal localization. It is argued that it is unjustified to characterize visually experienced objects and auditorily experienced sounds as different types of entities with respect to how they persist. The apparent difference in their ways of persisting can instead be explained by contingent differences between typical visual and auditory experiences.