Skip to article frontmatterSkip to article content

Introduction

Authors
Affiliations
Université de Montréąl
Université de Montréąl

Visual object recognition entails the interpretation of complex, multidimensional semantic information from natural scenes (e.g., an object’s typical appearance, function, and context) and is a central area of investigation in both cognitive neuroscience and artificial intelligence. To enable direct comparison of human and artificial visual processing, and to support the development of computational models of human visual understanding, a critical challenge is the creation of image benchmarks with sufficient scale, ecological validity, and behavioral grounding. Here we present Images10k, a dataset designed to probe visual semantic knowledge through 8,382 naturalistic images spanning over 15 distinct semantic categories. These images feature objects within their typical environmental contexts, and their descriptive labels and category assignments have been rigorously validated through online behavioral experiments. We anticipate that this openly available, large-scale dataset of ecologically valid images will serve as a valuable benchmark for both computational and neuroscientific investigations of visual semantic knowledge and object recognition.

A common practice in psychology and cognitive science has been the use of highly controlled, often artificial stimuli, typically encountered exclusively within laboratory settings. For instance, numerous vision studies have employed images depicting objects isolated from their natural backgrounds, creating a significant discrepancy between experimental stimuli and the rich, contextualized visual environments humans experience daily. This methodological choice not only limits the ecological validity of findings but also presents challenges for computational modeling. The relatively small scale of stimulus sets typically used in human cognitive neuroscience, compared to those employed in machine learning, can impair the development and training of artificial neural networks aiming to model perceptual mechanisms from human brain activity. Furthermore, copyright restrictions often limit the accessibility of image sets used in published research, hindering reproducibility and collaborative efforts.

In recent years, there has been a growing trend towards utilizing larger and more naturalistic object image datasets in cognitive neuroscience. However, existing resources present their own limitations. Classical databases of object concepts often rely on manually curated, and thus potentially limited or biased, sets of concepts. Similarly, databases of naturalistic object images may consist of objects cropped from their context, or conversely, comprise vast collections of images with variable quality, necessitating extensive manual curation.

The investigation of semantic knowledge—how humans and artificial systems represent and process meaning (e.g., a lemon’s color, flavor, and typical use)—is a central theme in both cognitive neuroscience and artificial intelligence. To facilitate direct comparisons between human and artificial semantic representations, and to support the advancement of natural language processing (NLP) in modeling human understanding, the development of large-scale, robust benchmarks is critical. Such benchmarks, particularly those incorporating rich visual information, are essential for bridging these disciplines.

Here, we introduce Images10k, a novel dataset comprising 8,382 high-quality, naturalistic images spanning over 15 distinct semantic categories. A key feature of this dataset is its rigorous behavioral validation: using crowdsourcing methodologies, we have confirmed the accuracy of object labels and their corresponding category assignments for each image. The images depict objects within their natural contexts, enhancing ecological validity.

Images10k is designed to address several of the aforementioned limitations. It offers a large-scale, openly available resource of ecologically valid visual stimuli. We anticipate that this dataset will serve as a valuable tool for researchers in psychology, neuroscience, and computer science. Specifically, it can support investigations into semantic knowledge representation, facilitate the training and benchmarking of computational models of vision and cognition, and promote reproducible research through its open accessibility. This resource aims to bridge the gap between systematic, controlled experimentation and the complexities of real-world perception and understanding.