Enhancing Efficiency in Content-based Image Retrieval System Using Pre-trained Convolutional Neural Network Models

Pp: 241-262 (22)


Abstract

Traditionally, image retrieval is done using a text-based approach, in which the user must query metadata or textual information such as keywords, tags, or descriptions. The effectiveness and utility of this approach for solving image retrieval problems in the digital realm are limited. We introduce an innovative method that relies on visual content for image retrieval. Various visual aspects of the image, including color, texture, and shape, are employed to identify relevant images. The choice of the most suitable feature significantly influences the system's performance. The Convolutional Neural Network (CNN) is an important machine learning model, but creating an efficient new CNN model requires considerable time and computational resources. Many CNN models have already been pre-trained on large image datasets, such as ImageNet, which contains millions of images. We can use these pre-trained CNN models by transferring the learned knowledge to solve our specific content-based image retrieval task.
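As an illustration of the retrieval step, here is a minimal sketch of ranking database images by the similarity of their feature vectors to a query vector. It assumes that each image has already been reduced to a fixed-length descriptor by a pre-trained CNN. The use of cosine similarity, the function names, and the toy vectors are assumptions made for this example; the abstract does not state which similarity measure the chapter uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Return image names sorted from most to least similar to the query.

    `database` maps an image name to its feature vector. In a real system
    these vectors would come from a pre-trained CNN (assumption); here
    they are supplied directly.
    """
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy descriptors standing in for CNN features (hypothetical values).
toy_database = {
    "img_a": [1.0, 0.0],
    "img_b": [0.0, 1.0],
    "img_c": [1.0, 1.0],
}
toy_ranking = rank_by_similarity([1.0, 0.1], toy_database)
```

With these toy values the query points almost along the first axis, so `img_a` ranks first, `img_c` second, and `img_b` last.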

In this chapter, we propose using a pre-trained CNN model, ResNet, for efficient content-based image retrieval (CBIR). The experiment was conducted by applying a pre-trained ResNet model to the Paris 6K and Oxford 5K datasets. Retrieval performance was measured and compared with the state-of-the-art AlexNet model. We found that the AlexNet architecture takes longer to reach accurate results. The ResNet architecture does not need to fire all neurons at every epoch, which significantly reduces training time and improves accuracy. Once a feature has been extracted, the ResNet architecture does not extract it again; instead, it tries to learn a new feature. To measure performance, we used mean average precision (mAP), obtaining 92.12% on Paris 6K and 84.81% on Oxford 5K. We also report mean precision at different ranks; for example, at the first rank we obtain 100% on Paris 6K and 97.06% on Oxford 5K.
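The ranking metrics mentioned above (per-query average precision averaged over all queries, and precision at a given rank) can be sketched as follows. This is the simple non-interpolated form computed from binary relevance lists. It is an assumption that this matches the chapter's evaluation protocol, which may differ, for example in how ambiguous images are handled. All function names are hypothetical.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k retrieved images that are relevant.

    `ranked_relevance` is a list of 0/1 flags in retrieval order.
    """
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Non-interpolated average precision for one query."""
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0  # no relevant images were retrieved for this query
    hits = 0
    score = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / rank  # precision at each relevant position
    return score / total_relevant


def mean_average_precision(all_rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)


# Two toy queries (hypothetical relevance judgments).
toy_queries = [[1, 0, 1, 0], [0, 1]]
toy_map = mean_average_precision(toy_queries)
```

For the first toy query, relevant images sit at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2; precision at rank 1 is 1.0 because the top result is relevant.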

Keywords: Content-based image retrieval, Convolutional neural network architectures, Transfer learning.
