Abstract
The proposed deduplication framework uses a hybrid cloud architecture that combines the scalability of public clouds with the security of private clouds. By combining client-side hashing, metadata indexing, and machine learning-based duplicate detection, it achieves significant storage savings without compromising data integrity. Real-time testing on a hybrid cloud setup showed a 65% reduction in storage requirements and a 40% improvement in data retrieval times. The system also records deduplication activities in a blockchain-based immutable log, which improves transparency and traceability. The study concludes with an evaluation of the framework's impact on cost efficiency, system performance, and scalability. Future work aims to add multi-cloud interoperability and advanced compression algorithms to further improve storage management.
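
To make the core mechanism concrete, the following is a minimal sketch of how client-side hashing, a metadata index, and tamper-evident logging could fit together. It is an illustration under stated assumptions, not the framework's implementation: the class name `DedupIndex`, its methods, the choice of SHA-256, and the in-memory dictionary are all hypothetical. A simple hash-chained list stands in for the blockchain log, and only exact duplicates are detected; the machine learning-based detection is not modeled.

```python
import hashlib
import json


class DedupIndex:
    """Hypothetical in-memory metadata index keyed by content hash."""

    def __init__(self):
        self.index = {}  # content hash -> metadata (size, reference count)
        self.log = []    # append-only, hash-chained log of dedup events

    def _append_log(self, event):
        # Chain each entry to the previous one so later tampering is detectable.
        prev_hash = self.log[-1]["entry_hash"] if self.log else "0" * 64
        payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
        entry_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.log.append({"prev": prev_hash, "event": event, "entry_hash": entry_hash})

    def upload(self, data: bytes) -> bool:
        """Return True if the data must be stored, False if it is a duplicate."""
        digest = hashlib.sha256(data).hexdigest()  # assumed to run on the client
        if digest in self.index:
            self.index[digest]["refs"] += 1
            self._append_log({"action": "dedup", "hash": digest})
            return False
        self.index[digest] = {"size": len(data), "refs": 1}
        self._append_log({"action": "store", "hash": digest})
        return True

    def verify_log(self) -> bool:
        """Recompute the hash chain and report whether every entry is intact."""
        prev = "0" * 64
        for entry in self.log:
            payload = json.dumps({"prev": prev, "event": entry["event"]}, sort_keys=True)
            expected = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            if entry["prev"] != prev or entry["entry_hash"] != expected:
                return False
            prev = entry["entry_hash"]
        return True
```

In this sketch, a second `upload` of identical bytes only increments a reference count and appends a log entry, so no additional storage is consumed; `verify_log` fails if any recorded event is altered afterwards.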