This section covers the various experiments conducted for the NSD dataset. This includes details on stimuli and experimental design (e.g. the order in which stimuli were presented).
This is a PDF report of the acquisition protocol for data collected at 3T. (Note: The diffusion scans are named dir98 and dir99, whereas the actually acquired data contain 99 and 100 volumes, respectively. This is because there is an additional b=0 volume at the beginning. Also, note that the actual b-values recorded in the .bval files deviate slightly from the "dialed-in" values of 0, 1500, and 3000.)
This movie is a screen capture of an example segment of the prf experiment.
nsddata/stimuli/prf/RETBAR*
These are sequences of "aperture masks" that correspond to the multibar runs in the prf experiment. The files with "small" in the filename are resized versions of the masks. These resized versions have the aperture masks averaged across consecutive 1-s chunks of the spatiotemporal stimulus, with the exception of the file with "4div3" in the filename, which has been averaged across successive 4/3-s chunks. These aperture masks were used in analyzing the fMRI data from the prf experiment (1-s for the high-resolution preparation; 4/3-s for the standard-resolution preparation). The files without "small" in the filename are the original (unresized and unaveraged) versions of the masks — these masks update at a rate of 15 frames per second. Note that we provide .mp4 versions for convenience; however, the .mp4 files have some (very slight) compression artifacts in them, so be wary when using these files for actual analysis.
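To make the chunk-averaging concrete, here is a minimal Python sketch of the scheme described above. The mask array and its layout are placeholders (how you load the RETBAR* files depends on which format you use); only the 15 frames/s rate and the 1-s / 4/3-s chunking come from the description.

```python
import numpy as np

# Stand-in for one of the RETBAR* aperture-mask sequences, laid out as
# rows x columns x frames at 15 frames per second. (The loading step and
# array layout are assumptions for illustration.)
masks = (np.random.rand(200, 200, 300) > 0.5).astype(np.float32)

fps = 15
chunk = fps * 1          # frames per 1-s chunk; a 4/3-s chunk would be 20 frames
nchunks = masks.shape[2] // chunk

# Average consecutive chunks of frames, mirroring how the "small" files were made.
small = masks[:, :, :nchunks * chunk].reshape(
    masks.shape[0], masks.shape[1], nchunks, chunk).mean(axis=3)
```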
nsddata/stimuli/prf/RETWEDGERINGMASH*
Same information as RETBAR* except corresponding to the wedgering runs in the prf experiment.
Names of the 5 domains used in the floc experiment. The domains are listed in order and have a 1-to-2 relationship to the categories: the first domain consists of the first two categories, the second domain consists of the third and fourth categories, and so on.
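To make the index arithmetic concrete, here is a minimal Python sketch (assuming 10 categories, as implied by the 1-to-2 mapping; the actual domain and category names are given in the floc stimulus files themselves):

```python
# Index arithmetic implied by the 1-to-2 relationship described above.
def domain_of_category(category):
    """Map a 1-based floc category index (1-10) to its 1-based domain index (1-5)."""
    return (category + 1) // 2

assert [domain_of_category(c) for c in range(1, 11)] == [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```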
Information regarding the resting-state experiment
This movie is a screen capture of the beginning of the resting-state experiment (type 2, instructed-breath). Notice that after 12 seconds, the cross turns red, which instructs the subject to take a deep breath.
Information regarding the NSD experiment
The 73,000 images used in the NSD experiment are a subset of the COCO images, specifically the 2017 train/val split (see http://cocodataset.org for details). NSD images were selected from the COCO database such that all of the NSD images have “stuff”, “panoptic”, and “coco” annotations. In addition, since the NSD experiment involved square stimulus presentation, we cropped COCO images using a specific method that attempted to minimize loss of semantic information in the images (details are provided in the "Details on crop selection for COCO images" section below).
COCO annotations can be accessed on the COCO web site. The following Python notebook is helpful for getting started:
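Independently of that notebook, the following is a minimal sketch of accessing annotations through the standard pycocotools API. The annotation file path and the cocoId value are assumptions for illustration (the cocoId column of the csv file described below provides the actual IDs).

```python
from pycocotools.coco import COCO

# Assumes the 2017 annotation files have been downloaded from http://cocodataset.org.
coco = COCO('annotations/instances_train2017.json')

coco_id = 262145                        # hypothetical cocoId from the csv described below
ann_ids = coco.getAnnIds(imgIds=[coco_id])
anns = coco.loadAnns(ann_ids)
print(f'{len(anns)} object annotations for COCO image {coco_id}')
```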
nsddata/experiments/nsd/nsd_stim_info_merged.csv
This is a comma-separated text file that contains information related to the selection and preparation of the NSD images. After a header row, there is one row for each of the 73,000 images used in the NSD experiment.
Column 1 is the 0-based image number (0-72999).
Column 2 (cocoId) is the ID number assigned to this image in the COCO database.
Column 3 (cocoSplit) is either “train2017” or “val2017”. The COCO web site designates different splits of images into training and validation sets. The NSD experiment does not involve any use of this designation (such as in the experimental design), but we provide this information just in case it is useful.
Column 4 (cropBox) is a tuple of four numbers indicating how the original COCO image was cropped. The format is (top, bottom, left, right) in fractions of image size. Notice that cropping was always performed along only the largest dimension. Thus, there are always two 0’s in the cropBox.
Column 5 (loss) is the object-loss score after cropping. See manuscript for more details, as well as the "Details on crop selection for COCO images" section below.
Column 6 (nsdId) is the 0-based index of the image into the full set of 73k images used in the NSD experiment. Values are the same as column 1. (Note that in some other cases, 73k IDs are specified as 1-based. Here the IDs are specified as 0-based.)
Column 7 (flagged) is True if the image has questionable content (e.g. violent or salacious content).
Column 8 (BOLD5000) is True if the image is included in the BOLD5000 dataset (http://bold5000.github.io). Note that NSD images are square-cropped, so the images are not quite identical across the two datasets.
Column 9 (shared1000) is True if the image is one of the special 1,000 images that are shown to all 8 subjects in the NSD experiment.
Columns 10-17 (subjectX) contain 0 or 1 indicating whether that image was shown to subject X (X ranges from 1-8).
Columns 18-41 (subjectX_repN) contain either 0, indicating that the image was not shown to subject X, or a positive integer T, indicating that the image was shown to subject X on repetition N (X ranges from 1-8; N ranges from 0-2, for a total of 3 trials). T is the trialID associated with that showing of the image. The trialID is a 1-based index from 1 to 30000 corresponding to the chronological order of all 30,000 stimulus trials that a subject encounters over the course of the NSD experiment. Each of the 73k NSD images has either 3 trialIDs (if it was shown to only one subject) or 24 trialIDs (if it was shown to all 8 subjects).
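As a minimal pandas sketch of working with these columns (the file location next to the .pkl file and the parsing of the True/False columns as booleans are assumptions):

```python
import pandas as pd

info = pd.read_csv('nsddata/experiments/nsd/nsd_stim_info_merged.csv')

# 0-based 73k-IDs of the shared 1,000 images
shared_ids = info.loc[info['shared1000'] == True, 'nsdId'].values

# Trial IDs (1-based, 1-30000) on which subject 1 saw the image with nsdId 2950
# (an arbitrary example; a value of 0 means that repetition did not occur).
reps = info.loc[info['nsdId'] == 2950,
                ['subject1_rep0', 'subject1_rep1', 'subject1_rep2']]
```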
nsddata/experiments/nsd/nsd_stim_info_merged.pkl
This contains the same information as the nsd_stim_info_merged.csv file, but is in Python-readable pickle file format (use pandas to read).
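For example (a minimal sketch, assuming a pandas version able to read the stored pickle):

```python
import pandas as pd

# Same dataframe as the csv, loaded from the pickle.
info = pd.read_pickle('nsddata/experiments/nsd/nsd_stim_info_merged.pkl')
```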
This movie is a screen capture of one entire run of the nsd experiment.
nsddata/experiments/nsd/nsd_expdesign.mat
Contents:
<masterordering> is 1 x 30000 with the sequence of trials (indices relative to 10k)
<basiccnt> is 3 x 40 where we calculate, for each scan session separately, the number of distinct images in that session that have a number of presentations equal to the row index.
<sharedix> is 1 x 1000 with sorted indices of the shared images (relative to 73k)
<subjectim> is 8 x 10000 with indices of images (relative to 73k). The first 1000 are the common shared 1000 images; it turns out that the indices for these 1000 are in sorted order. This is for simplicity, and there is no significance to the order (since the order in which the 1000 images are shown is randomly determined). The remaining 9000 for each subject are in a randomized, non-sorted order.
<stimpattern> is 40 sessions x 12 runs x 75 trials. Elements are 0/1 indicating when stimulus trials actually occur. Note that the same <stimpattern> is used for all subjects.
Note: subjectim(:,masterordering) is 8 x 30000 indicating the temporal sequence of 73k-ids shown to each subject. This sequence refers only to the stimulus trials (ignoring the blank trials and the rest periods at the beginning and end of each run).
Note: All of these indices (in the nsd_expdesign.mat file) are 1-based indices.
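For example, a minimal Python sketch of loading the design file and reconstructing the per-subject stimulus sequence (the subjectim(:,masterordering) expression from the note above); the file path is relative to the data root, and all stored values remain 1-based as noted:

```python
from scipy.io import loadmat

design = loadmat('nsddata/experiments/nsd/nsd_expdesign.mat')

masterordering = design['masterordering'].ravel()   # 30000 trial indices into 10k (1-based)
subjectim      = design['subjectim']                # 8 x 10000 73k-ids (1-based)
sharedix       = design['sharedix'].ravel()         # 1000 sorted 73k-ids (1-based)

# 8 x 30000 matrix of 73k-ids in the order shown to each subject;
# subtract 1 from masterordering because Python indexing is 0-based.
sequence_73k = subjectim[:, masterordering - 1]
```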
nsddata_stimuli/stimuli/nsd/nsd_stimuli.hdf5
This is a single .hdf5 file that contains all images used in the nsd experiment across all subjects. <imgBrick> is 3 channels x 425 pixels x 425 pixels x 73,000 images and is in uint8 format. These images are shown on a gray background with RGB value (127,127,127).
The images in the .hdf5 file constitute the official list of the 73k images. When we use the term ‘73k-ID’, this refers to an index into this list of 73k images (1-indexed).
There is a special common set of 1,000 images, which are a subset of the 73k. Each of the eight subjects sees the shared 1,000 images, as well as 9,000 unique images (with the caveat that some subjects did not complete all 40 NSD scan sessions).
Here is an example of how to use MATLAB to quickly load in the 10239th image.
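A minimal sketch (the h5read start/count arguments are 1-based; the exact permutation needed to orient the image for display is an assumption):

```matlab
% Read the 10239th image (1-based 73k-ID) from the imgBrick dataset
im = h5read('nsddata_stimuli/stimuli/nsd/nsd_stimuli.hdf5', '/imgBrick', ...
            [1 1 1 10239], [3 425 425 1]);
% <im> is 3 x 425 x 425 uint8; rearrange to 425 x 425 x 3 for display
figure; imshow(permute(im, [3 2 1]));
```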
nsddata/stimuli/nsd/shared1000/
In this folder, there are 1,000 standard RGB .png files (uint8, 425 pixels x 425 pixels x 3 channels). Each file is named "sharedAAAA_nsdBBBBB.png", where AAAA ranges from 1 through 1000 and BBBBB indicates the 73k-ID (1-indexed). These are the 1,000 shared images common to all subjects. Note that the 73k-IDs are in sorted order.
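For instance, a small Python sketch for mapping each shared-image index to the 73k-ID encoded in its filename (the folder path is relative to the data root; zero-padding of AAAA/BBBBB is handled generically rather than assumed):

```python
import re
from pathlib import Path

pattern = re.compile(r'shared(\d+)_nsd(\d+)\.png')
shared_to_73k = {}
for f in sorted(Path('nsddata/stimuli/nsd/shared1000').glob('*.png')):
    m = pattern.match(f.name)
    if m:
        shared_index, nsd_id = map(int, m.groups())   # both 1-based
        shared_to_73k[shared_index] = nsd_id
```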
nsddata/stimuli/nsd/special100/
This folder contains a subset of the files in the “shared1000” folder. Of the shared 1,000 images, there is a subset of 515 images that all 8 subjects saw for all 3 trials. From these 515 images, we chose a subset of size 100 in order to maximally span semantic space. These specially chosen 100 images are contained in this folder. These 100 images were used in the nsdmeadows experiment and in the nsdmemory experiment.
nsddata/stimuli/nsd/special3/
This folder contains a subset of the files in the “shared1000” folder. The valence/arousal component of the nsdmeadows experiment involved the special100 images as well as 3 additional images pulled from the subset of 515 images (as described above). These 3 additional images were selected on the criterion of having strong negative valence.
Simple text files that contain the 73k IDs (1-indexed) that comprise the various sets of images. The "notshown" file indicates 73k IDs of images that were not shown to any NSD subject (due to the fact that not all 8 subjects completed all prescribed sessions).
Details on performance bonuses provided during NSD data acquisition
In each scan session from nsd11–20, the subject could earn up to $15 in extra bonus. The bonus consisted of $3 for each of four metrics on which the subject achieved better than the mean performance attained by that subject in sessions nsd01–10; these metrics were the general BOLD quality metric, the intentionally vague “performance metric” (which was actually the performance on easy trials), raw motion, and detrended motion (as described in the NSD data paper). The subject also earned $3 for achieving a response rate higher than 99%.
In each scan session from nsd21–30, the subject could earn up to $25 in extra bonus. The bonus consisted of $5 for agreeing to participate in the resting-state runs conducted in those sessions, $5 if the physiological recordings remained stable throughout the session, $5 for staying awake and fixated during each resting-state run (thus, $10 in total was possible), and $5 for achieving the “performance metric” above the mean observed for that subject in sessions nsd01–20.
In each scan session from nsd31–40, the subject could earn up to $35 in extra bonus. The bonus consisted of $20 for participating in that scan session, $5 for achieving a response rate higher than 99%, and $10 for agreeing to participate in 1–2 additional miscellaneous scanning runs unrelated to NSD.
Information regarding the nsdpostbehavior experiments
nsddata/experiments/csf/csf_screencapture.png
This screenshot shows how contrast sensitivity functions were quickly measured.
This movie shows an example of what subjects experienced during the nsdmeadows experiment, which was conducted using the web-based Meadows platform.
Presentation files for experiments
nsddata_other/experimentcode/
This directory is an archive of materials used to conduct the various experiments in the NSD dataset.
Details on crop selection for COCO images
To select the optimal cropping box for each image, we computed an “object loss” score for each candidate crop. Object loss was defined as the fraction of objects that are cropped by more than 50% of their total pixel count. We used only “thing” annotations to compute object loss. We did not use “stuff” annotations because these are often large and redundant, so severely cropping them can produce large object-loss scores while changing the semantic content of the image very little. When calculating object loss, we did not include “things” that occupied less than 0.5% of the total pixels in the image. Finally, we imposed a bias toward center crops, selecting a left, right, top, or bottom crop only if the object loss of the center crop exceeded the object loss of the left/right or top/bottom crop by more than 25%. For portrait-oriented images containing people, we always used the top crop, as these images almost always depicted human faces in the upper third of the image.
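Below is a minimal Python sketch of this procedure. The binary-mask representation of "thing" annotations, the relative form of the 25% comparison, and the helper names are assumptions for illustration, and the always-top-crop rule for portraits containing people is omitted; the actual implementation may differ.

```python
import numpy as np

def object_loss(crop_box, thing_masks, min_frac=0.005):
    """Fraction of 'thing' annotations losing > 50% of their pixels to a crop.

    crop_box is (top, bottom, left, right) expressed as fractions of image size
    removed from each edge, matching the cropBox column described earlier.
    Things occupying less than min_frac of the image are ignored.
    """
    counted, lost = 0, 0
    for mask in thing_masks:                      # each mask: H x W binary array
        h, w = mask.shape
        total = mask.sum()
        if total < min_frac * h * w:
            continue
        top, bottom, left, right = crop_box
        kept = mask[int(round(top * h)): h - int(round(bottom * h)),
                    int(round(left * w)): w - int(round(right * w))].sum()
        counted += 1
        if kept < 0.5 * total:
            lost += 1
    return lost / counted if counted else 0.0

def choose_crop(thing_masks, h, w):
    """Pick a square crop along the long dimension, biased toward the center."""
    excess = abs(h - w) / max(h, w)               # fraction to trim from the long side
    if w >= h:                                    # landscape: trim left/right
        candidates = {'center': (0, 0, excess / 2, excess / 2),
                      'left':   (0, 0, 0, excess),           # keep the left part
                      'right':  (0, 0, excess, 0)}           # keep the right part
    else:                                         # portrait: trim top/bottom
        candidates = {'center': (excess / 2, excess / 2, 0, 0),
                      'top':    (0, excess, 0, 0),           # keep the top part
                      'bottom': (excess, 0, 0, 0)}           # keep the bottom part
    losses = {name: object_loss(box, thing_masks) for name, box in candidates.items()}
    best_alt = min((n for n in candidates if n != 'center'), key=losses.get)
    # Center bias: only abandon the center crop if its loss exceeds the best
    # alternative's by more than 25% (interpreted here as a relative comparison).
    pick = best_alt if losses['center'] > 1.25 * losses[best_alt] else 'center'
    return pick, candidates[pick], losses[pick]
```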
We examined all cropped images in the “val” portion of the train/val split and rejected any image, regardless of object loss score, for which cropping caused obvious “semantic loss”.
When examining the “val” images we observed the relationship between object loss and semantic loss, and noted several trends that guided our selection/rejection of “train” images.
First, we found that for landscape-oriented images an object-loss score of 0.0 was a reliable indication of negligible “semantic loss”. Thus, we automatically accepted all landscape-oriented images in the “train” set with an object-loss score of 0.0.
Second, we found that for landscape-oriented images, crops resulting in 0.0 < object loss < 0.2 occasionally, but not often, induced appreciable semantic loss. Semantic loss occurred when small but key peripheral objects (e.g., a soccer ball) were cropped. We also noted that when images depicted a small number of salient objects, such as people, captions often indicated the number of objects (e.g., “four people sitting around a table”). In these cases, crops sometimes made the image inconsistent with the quantities stated in the captions. Thus, we screened all landscape-oriented images in the “train” set with 0.0 < object loss < 0.2 for special cases such as these, comparing images to their written captions where necessary.
Third, we found that for portrait-oriented images crops resulting in object loss = 0.0 occasionally, but not often, induced appreciable semantic loss. These images tended to contain a small number of objects with two distinct kinds of terrain in the bottom (e.g., sand, floor) and top (e.g., sky, ceiling) of the image. Cropping the bottom terrain often decontextualized images, for example by reducing “person running on a beach” to “person running”. Many portrait-oriented images depicted tall buildings towering over a semantically meaningful scene such as a flea-market or a street parade. Thus, we screened all portrait-oriented images in the “train” set with object-loss = 0.0 for special cases such as these, comparing images to their written captions where necessary.
After screening the “train” and “val” images, the 73,000 images selected for NSD had a maximum object loss of 0.167 and a median of 0.08.