Bridging CNNs and Unsupervised Image Clustering: A Hybrid Data-Driven and Model-Based Approach
Efficiently and effectively grouping a large set of unlabeled images into clusters based on extracted visual representations remains a challenging problem. Although convolutional neural networks (CNNs) have proven effective for visual representation learning, it is difficult to learn a good CNN model from unlabeled images in a purely data-driven manner, even when starting from a model pre-trained on a large-scale image dataset such as ImageNet. To address this problem, we propose a hybrid data-driven and model-based approach that jointly solves clustering and representation learning with a CNN in an iterative manner. Given an input image set, we first find initial cluster centroids using a randomly initialized (or pre-trained) CNN model. To reduce complexity, we perform mini-batch clustering: cluster labels are assigned to the samples of a mini-batch drawn at random from the input image set, and this step is repeated until all images have been processed. Subsequently, image samples with reliable labels are extracted from the noisy cluster labels based on Laplacian graph smoothness priors and used to update the CNN model, and the updated model is then used to re-cluster the images. Representation learning and clustering are iterated until the clustering accuracy converges. Experimental results demonstrate that the proposed method outperforms state-of-the-art clustering schemes in terms of accuracy and storage complexity on large-scale image sets containing millions of images.
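The two non-CNN steps of the loop described above (mini-batch label assignment and selection of reliably labeled samples) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the clustering rule or the exact form of the smoothness prior, so the sketch assumes nearest-centroid assignment and a k-nearest-neighbour graph on which a sample counts as reliable when few of its neighbours carry a different label (the per-node contribution to the Laplacian quadratic form of the one-hot label signal). The function names, `k`, and `max_disagreement` are hypothetical, `features` stands in for CNN embeddings, and the CNN update step is omitted.

```python
import numpy as np


def assign_minibatches(features, centroids, batch_size, rng):
    """Assign every sample to its nearest centroid, one random mini-batch at a time."""
    n = features.shape[0]
    labels = np.empty(n, dtype=int)
    order = rng.permutation(n)  # random sampling without replacement
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        # Squared Euclidean distance from each batch sample to each centroid.
        d2 = ((features[idx, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels[idx] = d2.argmin(axis=1)
    return labels


def select_reliable(features, labels, k=5, max_disagreement=0.2):
    """Return indices of samples whose labels are smooth on a k-NN graph.

    Assumption: reliability is the fraction of a sample's k nearest
    neighbours that share its label. Dense pairwise distances are used
    for brevity, so this is only suitable for small toy inputs.
    """
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    disagreement = (labels[neighbours] != labels[:, None]).mean(axis=1)
    return np.flatnonzero(disagreement <= max_disagreement)


# Toy usage: two well-separated blobs standing in for CNN features.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                      rng.normal(5.0, 0.5, (50, 2))])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_minibatches(features, centroids, batch_size=16, rng=rng)
reliable = select_reliable(features, labels)
```

In a full pipeline along the lines of the abstract, the samples indexed by `reliable` would serve as pseudo-labeled training data for the CNN update, after which features are re-extracted and the two steps repeated.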