Content-based image retrieval with compact deep convolutional features

Research output: Contribution to journal › Article

Abstract

Convolutional neural networks (CNNs) with deep learning have recently achieved remarkable success and superior performance in computer vision applications. Most CNN-based methods extract image features at the last layer using a single CNN architecture with orderless quantization approaches, which limits the utilization of intermediate convolutional layers for identifying local image patterns. As one of the first works in the context of content-based image retrieval (CBIR), this paper proposes a new bilinear CNN-based architecture using two parallel CNNs as feature extractors. The activations of convolutional layers are directly used to extract the image features at various image locations and scales. The network architecture is initialized by deep CNNs sufficiently pre-trained on a large generic image dataset and then fine-tuned for the CBIR task. Additionally, an efficient bilinear root pooling is proposed and applied to the low-dimensional pooling layer to reduce the dimension of image features to compact but highly discriminative image descriptors. Finally, end-to-end training with backpropagation is performed to fine-tune the final architecture and to learn its parameters for the image retrieval task. The experimental results achieved on three standard benchmarking image datasets demonstrate the outstanding performance of the proposed architecture at extracting and learning complex features for the CBIR task without prior knowledge about the semantic meta-data of images. For instance, using a very compact image vector of length 16, we achieve a retrieval accuracy (mAP) of 95.7% on Oxford5K and 88.6% on Oxford105K, which outperforms the best results reported by state-of-the-art approaches. Additionally, a noticeable reduction is attained in the extraction time required for image features and the memory size required for storage.
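
The abstract does not spell out the pooling step, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the common form of bilinear pooling from the bilinear-CNN literature (sum of outer products over spatial locations, signed square root, L2 normalisation), which may differ from the paper's "bilinear root pooling". The function name `bilinear_root_pool`, the 7x7 grid, and the choice of 4 channels per stream (giving a 4 x 4 = 16-length descriptor) are illustrative assumptions.

```python
import numpy as np

def bilinear_root_pool(feat_a, feat_b, eps=1e-12):
    """Pool two sets of local features into one compact descriptor.

    feat_a: array of shape (N, Ca), activations of stream A at N locations.
    feat_b: array of shape (N, Cb), activations of stream B at the same locations.
    Returns a unit-length vector of size Ca * Cb.
    (Hypothetical helper; an assumed form of the pooling, not the paper's code.)
    """
    # Sum of outer products over all N locations -> (Ca, Cb) matrix.
    pooled = feat_a.T @ feat_b
    vec = pooled.reshape(-1)
    # Signed square root dampens very large responses.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    # L2 normalisation so that a dot product equals cosine similarity.
    return vec / (np.linalg.norm(vec) + eps)

# Random stand-ins for the activations of two parallel CNNs on a 7x7 grid.
rng = np.random.default_rng(0)
stream_a = rng.standard_normal((49, 4))
stream_b = rng.standard_normal((49, 4))
descriptor = bilinear_root_pool(stream_a, stream_b)
print(descriptor.shape)
```

With descriptors normalised this way, a retrieval system could rank database images by the dot product between the query descriptor and each stored descriptor.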
Original language: English
Pages (from-to): 95-105
Number of pages: 11
Journal: Neurocomputing
Volume: 249
Early online date: 5 Apr 2017
DOIs: 10.1016/j.neucom.2017.03.072
Publication status: Published - 2 Aug 2017

Fingerprint

Image retrieval
Learning
Neural networks
Benchmarking
Semantics
Network architecture
Metadata
Backpropagation
Computer vision
Datasets
Chemical activation
Data storage equipment

Keywords

  • CBIR
  • Deep learning
  • Convolutional neural networks
  • Bilinear compact pooling
  • Similarity matching

Cite this

@article{cc18c77cdde4406dba500dcd74907ca4,
title = "Content-based image retrieval with compact deep convolutional features",
abstract = "Convolutional neural networks (CNNs) with deep learning have recently achieved remarkable success and superior performance in computer vision applications. Most CNN-based methods extract image features at the last layer using a single CNN architecture with orderless quantization approaches, which limits the utilization of intermediate convolutional layers for identifying local image patterns. As one of the first works in the context of content-based image retrieval (CBIR), this paper proposes a new bilinear CNN-based architecture using two parallel CNNs as feature extractors. The activations of convolutional layers are directly used to extract the image features at various image locations and scales. The network architecture is initialized by deep CNNs sufficiently pre-trained on a large generic image dataset and then fine-tuned for the CBIR task. Additionally, an efficient bilinear root pooling is proposed and applied to the low-dimensional pooling layer to reduce the dimension of image features to compact but highly discriminative image descriptors. Finally, end-to-end training with backpropagation is performed to fine-tune the final architecture and to learn its parameters for the image retrieval task. The experimental results achieved on three standard benchmarking image datasets demonstrate the outstanding performance of the proposed architecture at extracting and learning complex features for the CBIR task without prior knowledge about the semantic meta-data of images. For instance, using a very compact image vector of length 16, we achieve a retrieval accuracy (mAP) of 95.7{\%} on Oxford5K and 88.6{\%} on Oxford105K, which outperforms the best results reported by state-of-the-art approaches. Additionally, a noticeable reduction is attained in the extraction time required for image features and the memory size required for storage.",
keywords = "CBIR, Deep learning, Convolutional neural networks, Bilinear compact pooling, Similarity matching",
author = "Ahmad Alzu'bi and Abbes Amira and Naeem Ramzan",
note = "12 months embargo",
year = "2017",
month = "8",
day = "2",
doi = "10.1016/j.neucom.2017.03.072",
language = "English",
volume = "249",
pages = "95--105",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier B.V.",

}

Content-based image retrieval with compact deep convolutional features. / Alzu'bi, Ahmad; Amira, Abbes; Ramzan, Naeem.

In: Neurocomputing, Vol. 249, 02.08.2017, p. 95-105.

TY - JOUR

T1 - Content-based image retrieval with compact deep convolutional features

AU - Alzu'bi, Ahmad

AU - Amira, Abbes

AU - Ramzan, Naeem

N1 - 12 months embargo

PY - 2017/8/2

Y1 - 2017/8/2

AB - Convolutional neural networks (CNNs) with deep learning have recently achieved remarkable success and superior performance in computer vision applications. Most CNN-based methods extract image features at the last layer using a single CNN architecture with orderless quantization approaches, which limits the utilization of intermediate convolutional layers for identifying local image patterns. As one of the first works in the context of content-based image retrieval (CBIR), this paper proposes a new bilinear CNN-based architecture using two parallel CNNs as feature extractors. The activations of convolutional layers are directly used to extract the image features at various image locations and scales. The network architecture is initialized by deep CNNs sufficiently pre-trained on a large generic image dataset and then fine-tuned for the CBIR task. Additionally, an efficient bilinear root pooling is proposed and applied to the low-dimensional pooling layer to reduce the dimension of image features to compact but highly discriminative image descriptors. Finally, end-to-end training with backpropagation is performed to fine-tune the final architecture and to learn its parameters for the image retrieval task. The experimental results achieved on three standard benchmarking image datasets demonstrate the outstanding performance of the proposed architecture at extracting and learning complex features for the CBIR task without prior knowledge about the semantic meta-data of images. For instance, using a very compact image vector of length 16, we achieve a retrieval accuracy (mAP) of 95.7% on Oxford5K and 88.6% on Oxford105K, which outperforms the best results reported by state-of-the-art approaches. Additionally, a noticeable reduction is attained in the extraction time required for image features and the memory size required for storage.

KW - CBIR

KW - Deep learning

KW - Convolutional neural networks

KW - Bilinear compact pooling

KW - Similarity matching

U2 - 10.1016/j.neucom.2017.03.072

DO - 10.1016/j.neucom.2017.03.072

M3 - Article

VL - 249

SP - 95

EP - 105

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -