Enhanced Image Captioning Using CNN and Transformers with Attention Mechanism

International Journal of Engineering Innovations and Management Strategies 1 (1):1-12 (2024)

Abstract

Image captioning has advanced markedly with the integration of deep learning techniques, notably Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, for generating descriptive captions for images. Despite these improvements, capturing intricate details and context remains a challenge. This project introduces an enhanced image captioning model that integrates transformers with an attention mechanism to address these limitations. The model uses CNNs for feature extraction and LSTMs for sequence generation, while transformer-style attention focuses the decoder on significant image regions, with the aim of producing more contextually rich and coherent captions. Experimental results indicate that incorporating transformer attention significantly improves caption accuracy and descriptiveness over traditional CNN-LSTM models. This advancement is particularly beneficial for applications such as assistive technologies for the visually impaired, content-based image retrieval, automatic image annotation for digital asset management, and improved human-computer interaction. The approach represents a substantial step toward more precise and detailed image captioning, with potential impact across numerous fields.
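To make the attention step described in the abstract concrete, the following is a minimal NumPy sketch of scaled dot-product attention, in which a single LSTM decoder hidden state (the query) attends over a set of CNN region features. All names, shapes, and the choice of scaled dot-product scoring are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, regions):
    """One attention step over CNN region features (illustrative sketch).

    query:   (d,)   hidden state from the LSTM decoder
    regions: (n, d) feature vectors for n image regions from the CNN encoder
    Returns the context vector (d,) fed to the next decoding step,
    and the attention weights (n,) over the image regions.
    """
    # Scaled dot-product scores between the query and each region feature.
    scores = regions @ query / np.sqrt(query.shape[0])   # (n,)
    weights = softmax(scores)                            # (n,), sums to 1
    context = weights @ regions                          # weighted sum, (d,)
    return context, weights

# Toy usage: 4 region features of dimension 4, a dummy decoder state.
q = np.ones(4)
R = np.eye(4)
context, weights = attend(q, R)
```

In a full model, `context` would be concatenated with the word embedding at each decoding step, so the LSTM conditions on whichever image regions the attention weights currently emphasize.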
