Computer Vision Datasets

Some Computer Vision Dataset Sources

ImageNet: The de facto standard image dataset for benchmarking algorithms and models. It is organized according to the WordNet hierarchy.

Google’s Open Images: 9 million URLs to images “that have been annotated with labels spanning over 6,000 categories” under a Creative Commons license.

LSUN: A scene understanding dataset with many ancillary tasks (room layout estimation, saliency prediction, etc.).

MS COCO: COCO is a large-scale object detection, segmentation, and captioning dataset containing over 200,000 labeled images. It is used for object segmentation, recognition in context, and many other use cases.

LabelMe: A large dataset created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) containing 187,240 images, of which 62,197 are annotated, and 658,992 labeled objects.

Lego Bricks: 12,700+ images of 16 different Lego bricks, computer-rendered using Blender and sorted into folders by brick type.

Columbia University Image Library: COIL-100 is a dataset featuring 100 objects, each imaged at regular angular steps through a full 360° rotation. It is very useful for object recognition.

Visual Genome: Visual Genome is a dataset and knowledge base created in an effort to connect structured image concepts to language. It features a detailed visual knowledge base with captioning of 108,077 images.

YouTube-8M: A large-scale labeled dataset that consists of millions of YouTube video IDs, with annotations from a vocabulary of more than 3,000 visual entities.

Labeled Faces in the Wild: 13,000 labeled images of human faces, for use in developing applications that involve facial recognition.

Stanford Dogs Dataset: Contains 20,580 images across 120 dog breed categories, which works out to roughly 170 images per class on average.

Places: Scene-centric database with 205 scene categories and 2.5 million images with a category label.

CelebFaces: Face dataset with more than 200,000 celebrity images, each with 40 attribute annotations.

Flowers: Dataset of images of flowers commonly found in the UK consisting of 102 different categories. Each flower class consists of between 40 and 258 images with different pose and light variations.

Plant Image Analysis: A collection of datasets spanning over 1 million images of plants, covering 11 plant species to choose from.

Home Objects: A dataset of assorted household objects, mostly from the kitchen, bathroom, and living room, split into training and test sets.

CIFAR-10: A large image dataset of 60,000 32×32 colour images split into 10 classes. The dataset is divided into five training batches and one test batch, each containing 10,000 images.
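As a rough illustration of the batch layout just described, here is a minimal sketch of reading one batch from the Python version of the CIFAR-10 download. It assumes the usual file names (`data_batch_1` through `data_batch_5` and `test_batch`) and the usual pickled dictionary with `b"data"` and `b"labels"` keys, where each image is a flat row of 3,072 values stored as the red plane, then green, then blue. The helper names `load_batch` and `pixel_rgb` are made up for this example, and unpickling the real files also requires NumPy to be installed.

```python
import pickle

IMAGE_SIDE = 32
PLANE = IMAGE_SIDE * IMAGE_SIDE  # 1,024 values per colour channel

# Assumed file names inside the extracted archive: five training
# batches plus one test batch, 10,000 images each (60,000 in total).
TRAIN_FILES = [f"data_batch_{i}" for i in range(1, 6)]
TEST_FILE = "test_batch"


def load_batch(path):
    """Return (rows, labels) from one pickled CIFAR-10 batch file."""
    with open(path, "rb") as f:
        # The batches were pickled under Python 2, so keys come back as bytes.
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], list(batch[b"labels"])


def pixel_rgb(row, r, c):
    """Return the (R, G, B) values at image row r, column c of a flat 3,072-value row."""
    offset = r * IMAGE_SIDE + c
    return (row[offset], row[PLANE + offset], row[2 * PLANE + offset])
```

Calling `load_batch` on each entry of `TRAIN_FILES` and concatenating the results gives the 50,000 training images; `TEST_FILE` holds the remaining 10,000.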

CompCars: Contains 163 car makes with 1,716 car models, with each car model labeled with five attributes, including maximum speed, displacement, number of doors, number of seats, and type of car.

Indoor Scene Recognition: A very specific dataset, useful because most scene recognition models perform better on outdoor scenes. It has 67 indoor categories and a total of 15,620 images.

VisualQA: VQA is a dataset containing open-ended questions about 265,016 images. Answering these questions requires an understanding of both vision and language. For each image, there are at least 3 questions, with 10 answers per question.

