A HYBRID DEEP LEARNING AND DCT-BASED FEATURE FUSION FRAMEWORK FOR CONTENT-BASED IMAGE RETRIEVAL
Keywords:
Image Retrieval, Feature Fusion, Discrete Cosine Transform, Vision Transformer, Convolutional Neural Network

Abstract
Content-based image retrieval (CBIR) systems often suffer from low retrieval accuracy owing to the limited semantic content of their high-level representations and weak texture discrimination. To overcome these shortcomings, this paper presents a hybrid CBIR framework that combines deep learning features with handcrafted texture descriptors in the Discrete Cosine Transform (DCT) domain. The proposed model couples a Convolutional Neural Network (CNN) with a Vision Transformer (ViT) to extract robust deep features, and supplements them with handcrafted features (color histograms, Hu moments, and DCT-based texture descriptors) to strengthen spatial and frequency-domain representation. A feature fusion strategy merges the deep and handcrafted features into a single unified representation for retrieval. The proposed method is evaluated on several benchmark datasets (WANG, CIFAR-10, Oxford Flowers, and GPR1200) using standard metrics: accuracy, precision, recall, F1-score, and ROC analysis. Experimental results show that the hybrid framework substantially outperforms conventional CBIR methods and single deep models, reaching a top retrieval accuracy of 94%. These results indicate that combining deep semantic features with DCT-domain texture information improves retrieval performance and robustness across diverse image datasets, making the approach well suited to demanding CBIR applications.
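The fusion idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 8×8 low-frequency DCT block size, the 768-dimensional deep embedding, and L2-normalised concatenation are all assumptions made for the example.

```python
import numpy as np
from scipy.fft import dctn

def dct_texture_features(gray, block=8):
    """Low-frequency 2-D DCT coefficients as a texture descriptor.
    `block` (an assumed parameter) keeps the top-left block x block coefficients,
    where most of the image energy concentrates after the transform."""
    coeffs = dctn(gray.astype(np.float64), norm="ortho")
    return coeffs[:block, :block].flatten()

def fuse(deep_feat, handcrafted_feat, eps=1e-12):
    """Concatenate L2-normalised deep and handcrafted vectors
    so neither modality dominates by raw magnitude."""
    d = deep_feat / (np.linalg.norm(deep_feat) + eps)
    h = handcrafted_feat / (np.linalg.norm(handcrafted_feat) + eps)
    return np.concatenate([d, h])

rng = np.random.default_rng(0)
gray = rng.random((64, 64))   # stand-in for a grayscale image
deep = rng.random(768)        # stand-in for a CNN/ViT embedding
fused = fuse(deep, dct_texture_features(gray))
print(fused.shape)            # (832,): 768 deep + 64 DCT dims
```

In a full system, the color histogram and Hu-moment vectors would be normalised and concatenated in the same way before nearest-neighbour search over the fused descriptors.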













