Technical notes

This section contains a number of technical details that help document the NSD dataset.

Final numbers


Some of the NSD subjects did not complete all 40 planned NSD core scan sessions. Here we provide some useful summary statistics on what is present in the NSD dataset. Note that the numbers are calculated with respect to the full dataset.
How many core NSD scan sessions did each of the 8 NSD subjects complete?
[40 40 32 30 40 32 40 30]
How many distinct images were shown at least once to each subject?
[10,000 10,000 9,411 9,209 10,000 9,411 10,000 9,209]
How many distinct images were shown at least twice to each subject?
[10,000 10,000 8,355 7,846 10,000 8,355 10,000 7,846]
How many distinct images were shown all three times to each subject?
[10,000 10,000 6,234 5,445 10,000 6,234 10,000 5,445]
How many trials did each subject perform?
[30,000 30,000 24,000 22,500 30,000 24,000 30,000 22,500]
How many of the shared 1,000 images were shown at least once to each subject?
[1,000 1,000 930 907 1,000 930 1,000 907]
How many of the shared 1,000 images were shown all 3 times to every subject?
515
How many of the shared 1,000 images were shown at least 2 times to every subject?
766
How many of the shared 1,000 images were shown at least once to every subject?
907
What is the total number of distinct images, aggregated across all subjects?
70,566
What is the total number of trials, aggregated across all subjects?
213,000
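
As a quick consistency check, the per-subject trial counts follow directly from the session counts (the 750 trials/session figure follows from the numbers above, e.g. 30,000 trials / 40 sessions). A minimal MATLAB sketch:

% trial counts follow from the session counts at 750 trials per session
sessions = [40 40 32 30 40 32 40 30];
trials   = sessions * 750    % [30000 30000 24000 22500 30000 24000 30000 22500]
total    = sum(trials)       % 213000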

Data sizes


The following table lists, for each subject, the matrix dimensions of the high-res (1.0-mm) functional data preparation, the matrix dimensions of the standard-res (1.8-mm) functional data preparation, and the number of vertices in the left- and right-hemisphere cortical surfaces.
 
              1.0-mm dims     1.8-mm dims   LH vertices  RH vertices
 Subject 1    [145 186 148]   [81 104 83]   227021       226601
 Subject 2    [146 190 150]   [82 106 84]   239633       239309
 Subject 3    [145 190 146]   [81 106 82]   240830       243023
 Subject 4    [152 177 143]   [85 99 80]    228495       227262
 Subject 5    [141 173 139]   [79 97 78]    197594       198908
 Subject 6    [152 202 148]   [85 113 83]   253634       259406
 Subject 7    [139 170 145]   [78 95 81]    198770       200392
 Subject 8    [143 184 139]   [80 103 78]   224364       224398

On the issue of valid voxels


Due to spatial distortion and/or head displacement over the course of a scan session, voxels on the edges of the imaged volume may not obtain a full set of data for that session. In pre-processing, such voxels are detected, deemed “invalid”, and are essentially set to 0 for the whole scan session. Brain voxels of interest are almost always valid.
 
The files named valid*.nii.gz provide information regarding which voxels contain valid data. Invalid voxels exhibit the following behavior:
timeseries*.nii.gz – Invalid voxels have pre-processed time-series data values that are all zeroes over the course of the entire scan session.
mean*.nii.gz – Invalid voxels have a mean intensity of 0.
R2*.nii.gz – Invalid voxels have a GLM variance explained value of NaN.
betas*.[nii.gz,hdf5] – Invalid voxels have betas that are all zeroes. (This is the result of the data being saved in int16 format, which converts NaNs to 0.)
meanbeta*.nii.gz – Invalid voxels have mean betas equal to 0.
onoffbeta*.nii.gz – Invalid voxels have onoffbeta weights equal to NaN.

Note that voxels outside of the brain mask are also set to 0 in the time-series data and in the beta weights; thus, they appear similar to invalid voxels.
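
For example, here is a minimal MATLAB sketch (file names are hypothetical; niftiread requires a relatively recent MATLAB) of restricting an analysis to valid voxels:

% load the validity mask and a quantity of interest (hypothetical file names)
valid = niftiread('valid_session01.nii.gz') > 0;   % logical mask of valid voxels
r2    = niftiread('R2_session01.nii.gz');          % GLM variance explained
% restrict to valid voxels (invalid voxels would otherwise contribute 0s or NaNs)
r2valid = r2(valid);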

Computational tips


The massive scale of the NSD dataset poses some computational challenges. Here we comment on some issues related to computational efficiency. 
File format choices are important. For example, HDF5 provides fast access when the data are stored uncompressed (as is done for the NSD betas).
Pre-allocation of variables when loading data into memory is important (otherwise, unnecessary time costs are incurred).
Consider using single-precision format ('single' in MATLAB, i.e. float32) to reduce memory usage.
For huge data, breaking up the analysis into chunks may be necessary in order to reduce memory usage (e.g., analyze one subject at a time).
In general, when loading chunks from an HDF5 file, it is fastest to load along the last dimension. However, the HDF5 files used for the NSD betas were saved with ChunkSize [1 1 1 750], which deliberately chunks the trials together; this was done because one will typically want all of the trials for a given set of voxels. For the NSD betas, speed benefits are therefore obtained when loading chunks along the third dimension (as opposed to the first or second dimensions); see the sketch after this list.
Vectorization of code is important (avoid for-loops if possible).
If averaging across trials for the same image, one can do this efficiently through a single indexing operation (e.g. an indexing matrix that is 3 trials x N images), as opposed to using a for-loop.
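
To illustrate the last two points, here is a minimal MATLAB sketch (the file name and the '/betas' dataset name are assumptions, as is the NSD convention of recovering percent signal change by dividing the int16 values by 300):

% load all 750 trials for a single slice of voxels, exploiting the fact that
% the betas are chunked along the trial dimension (ChunkSize [1 1 1 750])
betas = h5read('betas_session01.hdf5', '/betas', [1 1 50 1], [Inf Inf 1 Inf]);
betas = single(betas) / 300;   % int16 -> percent signal change (assumed convention)

% given a voxels x trials matrix b and a hypothetical 3 x N index matrix ix
% (one column per image, giving the 3 trials on which that image was shown),
% average across trials with a single indexing operation (no for-loop)
b   = reshape(betas, [], size(betas,4));                        % voxels x trials
avg = squeeze(mean(reshape(b(:,ix(:)), size(b,1), 3, []), 2));  % voxels x N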

Timing issues


Here is how timing issues are dealt with in the NSD dataset:
An empirical audio check of a typical fMRI scanning run (i.e. an NSD run involving 188 volumes at a TR of 1.6 s) indicates the following breakdown: 31.8 s elapse from the start of scanner calibration noises to the start of the EPI noises; then 8 s pass from the start of the EPI noises until the start of the first actual recorded fMRI volume (these 8 s are due to dummy fMRI volumes); and finally, 300.8 s (i.e. 188*1.6) elapse from the start of the first recorded fMRI volume until the end of the EPI noises (indicating that data collection is complete). Thus, the dummy fMRI volumes are already dropped and do not show up in the NSD dataset. We consider the start of the first recorded fMRI volume to be time = 0.
The fMRI volumes are acquired at 1600 ms TR, and this is assumed to be exactly accurate. Empirical measurements of scanner triggers, as detected by the stimulus computer, indicate that the difference between successive triggers is consistently between 1599.95 and 1600.12 ms. Some of this variability is due to polling uncertainty. We believe this is good validation that the 1600 ms number can be trusted.
The stimulus computer controls the experiment presentation. The presentation code locks to the display rate of the BOLDscreen monitor, and empirical measurements of the duration of each 5-min (300 s) run come out to consistently between 299.955 s and 299.97 s. Thus, we are confident that the timing of the experimental presentation is highly reliable. Because these values are not exactly 300.000 s, in the pre-processing of the fMRI data, we resample the fMRI data to a sampling rate of 0.999878 s. (Note that 0.999878*300 = 299.9634 s.) Specifically, the high-resolution (func1mm) preparation of the data uses a new sampling rate of 0.999878 s, while the low-resolution (func1pt8mm) preparation of the data uses a new sampling rate of (0.999878)*(4/3) = 1.3331707 s. These numbers are quite close to 1 s and 4/3 s, respectively, and we often abbreviate using those numbers for simplicity.
Note that the fMRI acquisition extends slightly longer than the experiment duration. For example, for a typical NSD run, the experiment lasts 299.9634 s, while the fMRI acquisition lasts 188 * 1.6 = 300.8 s. This is intentional and no cause for concern.
With respect to the pre-processing of the fMRI data, the total duration of the func1mm preparation of each fMRI run is 0.999878 * 301 volumes = 300.96 s. The total duration of the func1pt8mm preparation of each fMRI run is (0.999878)*(4/3) * 226 volumes = 301.29 s. Notice that the two numbers are slightly different, and extend slightly beyond the original extent of the acquisition (1600 ms * 188 volumes = 300.8 s). This is all expected behavior, and is due to how the pre-processing code decides to place the final time points.
After the pre-processing of the fMRI data, it is convenient to simply interpret the fMRI data as being sampled at a rate of 1 s (or 4/3 s), even though that is not exactly accurate.
Slice acquisition order was determined from the DICOM header of the fMRI volumes. In the temporal pre-processing of the fMRI data, all slices were sampled to be coincident with the first (temporally) acquired slices. (Note that multiple slices were “first” because of the multiband acquisition.)
The experimental design comes in 4-s trials; thus, fMRI volumes after pre-processing land exactly on the onset of each trial (4 s is divisible by 1 s and by 4/3 s; see the sketch after these notes).
At the beginning of each run, the stimulus computer waits for a trigger to be sent by the MRI scanner, and once the trigger is detected, the computer starts the experiment. Note that there is a brief and somewhat variable (about 5-20 ms) delay between the detection of the trigger and the first stimulus frame shown (e.g. due to the fixed refresh rate of the monitor). Thus, there may be a small (and more or less fixed) delay between the fMRI data and the stimulus frames. This seems like a relatively minor issue: the readout of the first slice in the EPI sequence itself takes some time, so there is already a delay (e.g. half of the readout window) that is essentially being ignored here.
The internal MR scanner clock shows some odd behavior. From the AcquisitionTime headers stored in the EPI DICOMs, we extracted the average duration of each TR volume, and that number comes out to 1606.425 ms. This is surprising, since the empirical measurements from the stimulus computer indicate that the TR (as reflected in the triggers sent by the scanner) is essentially exactly 1600 ms. Checks that we performed strongly suggest that, for the purposes of the internal times recorded by the scanner in the DICOMs and in the physiological data, the MR scanner believes the DICOMs come at a rate of 1606.425 ms. Under the assumptions we make when extracting the physiological data, the physiological data and the DICOM times are very nicely consistent with one another. Moreover, the number of samples that we extract corresponding to the actual fMRI acquisition empirically turns out to be around 15,040-15,041, which is essentially exactly 50 Hz for a run duration of 188*1.6 = 300.8 s. Thus, our working interpretation is that (i) the correct time is being recorded by the stimulus computer; (ii) the MR scanner in fact achieves exactly the timing requested (1600-ms TR); (iii) the MR scanner has some strange internal timing system that is internally consistent but does not match the stimulus computer's timing; and (iv) the user need not worry about the strange MR scanner timing.
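
As a concrete illustration of the sampling-rate discussion above, the following sketch (our own illustration, not NSD code) maps hypothetical 4-s trial onsets to volume indices under the convenient 1 s / (4/3) s interpretation:

onsets  = 0:4:296;             % hypothetical trial onsets within a run (s)
idx1mm  = onsets / 1 + 1;      % 1-based volume indices, func1mm (1-s sampling)
idx1pt8 = onsets / (4/3) + 1;  % 1-based volume indices, func1pt8mm ((4/3)-s sampling)
% both index vectors are exactly integral, confirming that pre-processed
% volumes land on trial onsets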

FreeSurfer notes

FreeSurfer includes an internal T1 volume (e.g. mri/T1.mgz). Beware that although this volume contains essentially the same image data as the original 0.8-mm anatomical volume that we provided to FreeSurfer, it has some header differences. Thus, if you load the raw image data from the two volumes, you may have to apply a specific set of flips, rotations, and shifts to get them to match up, because the orientation and exact positioning of the two volumes are different. A NIFTI-header-aware application that properly interprets the orientation and origin information will reveal that the two volumes are identical, in the sense that both volumes, when properly interpreted, are in the same position (e.g. (0,0,0) in millimeters corresponds to the same location in the two volumes). The following shows how the image data (ignoring headers) can be matched between the two volumes.
% load aseg (FreeSurfer's automatic segmentation, in FreeSurfer's conformed space)
sourcedata = '~/nsd/nsddata/freesurfer/subj01/mri/aseg.mgz';
vol = cvnloadmgz(sourcedata);

% bring it to our anat0pt8 space: permute and flip to match the axis
% conventions of the official 0.8-mm anatomical volume...
vol = flipdim(flipdim(permute(vol,[1 3 2]),3),1);

% ...and then apply a one-voxel shift along the first and third dimensions
volB = zeros(size(vol));
volB(2:end,:,2:end) = vol(1:end-1,:,1:end-1);
Note that we have converted some of the standard FreeSurfer output volumes to conform to the formats used for the NSD data. For example: nsddata/ppdata/subj01/func1pt8mm/aseg.nii.gz
The FreeSurfer surfaces (e.g. lh.white) have coordinates that must be interpreted with respect to the FreeSurfer headers. This is quite tricky, and requires using the FreeSurfer vox2ras and vox2ras-tkr information. Here is the basic idea (see preprocess_nsd_calculatetransformations.m) for how we map FreeSurfer’s surface coordinates to a 1-based coordinate system that corresponds to the official T1 0.8-mm anatomical volume:
newcoord = inv(M)*Norig*inv(Torig)*[tkrR tkrA tkrS 1]' + 1
where [tkrR tkrA tkrS] are coordinates stored in the surface file, Torig is the output from vox2ras-tkr, Norig is the output from vox2ras, and M is the voxel-to-world transformation from the official T1 0.8-mm anatomical volume. The idea is that we first map from surface coordinates to 0-based pixel (CRS) space (i.e. inv(Torig)), then we map from FreeSurfer’s 0-based pixel space to physical RAS space (i.e. Norig), and then we map from physical RAS space to 0-based pixel space associated with the official T1 0.8-mm anatomical volume. Finally, we add 1 to the coordinates in order to convert to 1-based pixel space (i.e. 1 means the center of the first voxel).
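For example, here is a minimal MATLAB sketch (assuming Norig, Torig, and M have already been obtained as 4 x 4 matrices, e.g. via mri_info --vox2ras, mri_info --vox2ras-tkr, and the NIFTI header of the official T1, respectively; read_surf is from the FreeSurfer MATLAB toolbox):

% load surface vertices ([tkrR tkrA tkrS] coordinates)
[verts, faces] = read_surf('lh.white');

% map to 1-based image coordinates of the official T1 0.8-mm volume
xyz      = [verts ones(size(verts,1),1)]';     % 4 x V homogeneous coordinates
newcoord = inv(M) * Norig * inv(Torig) * xyz;  % into 0-based T1 pixel space
newcoord = newcoord(1:3,:) + 1;                % 0-based -> 1-based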
In the diffusion files (nsddata_diffusion), various cortical surfaces are provided in GIFTI format. The coordinates contained in these GIFTI files are "world coordinates" and they are identical to the surface coordinates contained in the usual FreeSurfer surface files after making sure to convert the surface coordinates to physical RAS space.

MNI notes

All NIFTI files that we write are in LPI ordering (the first voxel is Left, Posterior, and Inferior). This applies even to files written by nsd_mapdata in the MNI space. Note that this is the same as what FreeSurfer calls “RAS” ordering, since that nomenclature refers to which directions have increasing voxel indices.
The MNI template (1mm) (borrowed from FSL) has matrix dimensions [182 218 182] and is in RPI ordering (first voxel is right, posterior, inferior). The origin lies at 1-based image coordinates (91,127,73).
NSD files provided in MNI (1mm) space have the same matrix dimensions [182 218 182] and are in LPI ordering. The origin lies at 1-based image coordinates (92,127,73). Note that while the MNI template is in RPI ordering, NSD files that are provided in MNI space are in LPI ordering. When comparing these two types of files in an application that understands and respects the NIFTI header information, everything should be correct and in correspondence.
When using nsd_mapdata to map from MNI to some other space, note that the source data is expected to be in RPI ordering (since that is what the MNI template uses). This means that if one performs analyses of, for example, the NSD beta weights prepared in MNI space (which have LPI ordering), the results need to be flipped along the first dimension before being passed to nsd_mapdata.m (see the sketch after these notes).
Furthermore, when trying to map MNI source data, the data should be EXACTLY the same resolution, matrix size, etc. as the MNI 1-mm template. (For example, if your MNI source data is 2-mm, you need to bring it to 1-mm resolution.) There are many ways to do this; one option is resliceniftitomatch.m as provided in https://github.com/cvnlab/knkutils/.
When using nsd_mapdata to map to MNI space, note that the output variable is returned to the workspace in RPI ordering. But notice that if you ask nsd_mapdata to write out a NIFTI file, that file has data stored in LPI ordering.
All NIFTI files that we write have the origin set to the exact center of the image slab. The only exception to this is when nsd_mapdata writes out MNI space files: in this case, we set the origin to match that used in the MNI template files.
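
For example (a minimal sketch; the file name is hypothetical, and nsd_mapdata's full argument list is documented in nsd_mapdata.m):

% NSD-style MNI files are LPI-ordered, but nsd_mapdata expects RPI-ordered
% MNI source data, so flip along the first dimension first
data = niftiread('mni_space_result.nii.gz');   % hypothetical LPI-ordered file
data = flipdim(data, 1);                       % LPI -> RPI
% transformeddata = nsd_mapdata(1, 'MNI', 'func1pt8', data);  % example call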

Other notes

Recorded reaction times in the behavioral data have some rounding error due to the presentation of images at a 10-Hz rate. The stimulus computer both controls image presentation and records button presses; approximately every 100 ms, the computer has to do work to present the image, and at these points in time, a button press will be logged a few milliseconds late. (You will see this effect if you plot a histogram of a large number of RTs with bin widths of 1 ms; see the sketch after these notes.)
Note that the func1pt8mm and func1pt0mm preparations have origins that are in slightly different places. This is because the fields of view of the two preparations are different and because we set the origin to be the center of the image slab in both cases.
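
A quick way to see the reaction-time effect mentioned above (a one-line sketch; rts is a hypothetical vector of reaction times in milliseconds):

histogram(rts, 'BinWidth', 1)   % periodic structure appears at ~100-ms intervals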

Transform files


Various coregistration procedures were performed in the pre-processing of NSD data, and the results of these procedures have been written out to a collection of files. Essentially, we have pre-computed a large number of possible mappings that the user might want to perform. These pre-computed transform files are used by the nsd_mapdata utility in order to map data from one space to another, and ordinary users should not need to worry about the contents of these files.

nsddata/ppdata/subjAA/transforms/

This directory contains the set of pre-computed transform files for subject AA.

Note that file format conventions vary across different software packages. Thus, these files are not necessarily "standard" and not necessarily compatible "off the shelf" with a given software package!

The basic form of a filename is "X-to-Y", indicating that this file contains information on how to access data from X for each location in Y. For example, "func1pt0-to-MNI.nii.gz" is a NIFTI file with the dimensionality of the 1-mm MNI space; there are three volumes in this file, corresponding to three spatial dimensions; and each value indicates how to pull from the 1.0-mm functional space. Intuitively, this file provides func1pt0 coordinates in an MNI-like volume.

Our convention is to use image coordinates for volume data. For example, 1 is the center of the first voxel; 2 is the center of the second voxel; and 1.5 is exactly in between the centers of the first and second voxels. Furthermore, our convention is to use 1-based indexing for surface data. For example, 1 indicates the first surface vertex.
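
To make this concrete, here is a minimal MATLAB sketch (our own illustration; nsd_mapdata handles this for you) of applying a transform file by hand:

% each volume of an "X-to-Y" file stores 1-based image coordinates into
% space X for every location in space Y
t   = niftiread('func1pt0-to-MNI.nii.gz');         % MNI-shaped; 4th dim holds the 3 coordinates
src = niftiread('some_volume_in_func1pt0.nii.gz'); % hypothetical source volume
% sample the source volume at those coordinates. note interp3's convention:
% X corresponds to the SECOND array dimension and Y to the FIRST, so the
% first two coordinate volumes are swapped in the call
vals = interp3(double(src), double(t(:,:,:,2)), double(t(:,:,:,1)), double(t(:,:,:,3)));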

To conform to FreeSurfer conventions, files named like "lh.X-to-Y.mgz" indicate how to access data from X for each location in the left hemisphere Y surface. For example, "lh.func1pt8-to-layerB2.mgz" indicates, for each surface vertex in the mid-gray left hemisphere cortical surface, how to pull data from the 1.8-mm functional volume.

For transform files involving fsaverage, all values are indices and not spatial locations (since our convention is to use nearest-neighbor interpolation for fsaverage-related transformations).

Additional documentation can be found in preprocess_nsd_calculatetransformations.m.


Inaccuracy in anatomical to functional space transformation

We recently discovered a slight inaccuracy in the transformation from anatomical space to functional space for each NSD subject. Specifically, in the spatial transformations that are pre-calculated (and used in nsd_mapdata), the transformation from anatomical to functional space for a given subject was calculated with a minor error: the order of inverse transformations given to ANTS was flipped (see line 202 in this function).

Fortunately, the size of the discrepancy is quite small. See the following visualization:

[Figure: visualization of the discrepancy between the original and corrected anatomical-to-functional transformations]
For the sake of consistency with existing analyses, we leave the original transformations and data files as-is.

Note that updating the transformation would affect some NSD data files, including: (1) the conversion of surface-based ROIs (e.g. Kastner2015, HCP_MMP1, nsdgeneral, corticalsulc, streams, floc, and prf ROIs) to EPI functional space, and (2) the warping of anatomical volumes to functional space (e.g. T1, T2). We suspect that the differences would be negligible.