Diffusion data

This section covers the measurements and pre-processing of diffusion-weighted magnetic resonance imaging data (dMRI) prepared for the NSD dataset.

Data were preprocessed using publicly available pipelines on brainlife.io. These pipelines were used to remove artifacts as fully as possible; see the note on data limitations at the end of this section. After artifact removal/minimization, a series of additional brainlife.io pipelines were used to generate and share data derivatives, including minimally preprocessed dMRI data, tractography, and network outputs.

Diffusion (dMRI) data collection

The four diffusion-weighted acquisitions were combined into two runs of diffusion data (referred to as ‘run_1’, ‘run_2’). The two diffusion runs were combined (stacked in the 4th dimension) before being processed. Data preprocessing included susceptibility distortion, motion, and eddy-current correction.

Cloud processing via  brainlife.io 

All processing was performed on brainlife.io, a reproducible, open, cloud-based service. Brainlife.io orchestrates large-scale data storage, processing via open-service code applications (apps), and high-performance computing resources to process neuroimaging data quickly and reproducibly.

All of the code and pipelines used for processing the data described below can be found on brainlife.io and, from there, on GitHub.com. A table at the end of this document provides references to all pipelines used for data processing and generation.

The output files generated are further described below.

Diffusion-weighted imaging (dMRI).

The preprocessed dMRI data were used as the basis for all further modeling and analyses. These data include NIFTI images and the corrected b-values (bvals) and b-vectors (bvecs) in FSL format. The NIFTIs are aligned with, and have the same slice dimensions and voxel size as, the official 0.8-mm T1w images provided with NSD (see Structural data). All NIFTI-based volume derivatives from the dMRI data maintain the same slice dimensions and voxel size. (Note that in our preprocessing, we drop the very last acquired volume; hence there is a one-volume mismatch between the number of volumes in the raw data (99, 99, 100, 100 for the four raw diffusion acquisitions) and the number of volumes in the preprocessed data (98 for 'Run 1', which combines the first two acquisitions, and 99 for 'Run 2', which combines the second two acquisitions).)

 nsddata_diffusion/ppdata/subjAA/run_*/dwi.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dwi.bvecs 
 nsddata_diffusion/ppdata/subjAA/run_*/dwi.bvals 
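
As a minimal sketch of how the FSL-format gradient files listed above can be read, the following Python parses the text of a bvals file (one row of values, one per volume) and a bvecs file (three rows: x, y, z) and groups volume indices by shell. The function names, the rounding step of 100 s/mm^2, and the example values are illustrative assumptions and are not taken from the NSD release.

```python
from collections import defaultdict


def parse_bvals(text):
    """Parse FSL-format b-values: whitespace-separated numbers, one per volume."""
    return [float(tok) for tok in text.split()]


def parse_bvecs(text):
    """Parse FSL-format b-vectors: three rows (x, y, z), one column per volume.

    Returns one (x, y, z) tuple per volume.
    """
    rows = [[float(tok) for tok in line.split()]
            for line in text.strip().splitlines() if line.strip()]
    if len(rows) != 3 or len({len(r) for r in rows}) != 1:
        raise ValueError("expected exactly 3 rows of equal length")
    return list(zip(*rows))


def group_by_shell(bvals, step=100):
    """Group volume indices by b-value rounded to the nearest `step` (assumed)."""
    shells = defaultdict(list)
    for index, b in enumerate(bvals):
        shells[int(round(b / step) * step)].append(index)
    return dict(shells)


# Hypothetical example values, only to show the expected text layout.
example_bvals = parse_bvals("0 1495 1505 3000 5")
example_shells = group_by_shell(example_bvals)
```

To use this with real files, pass the contents of dwi.bvals and dwi.bvecs (for example via `open(path).read()`); the number of parsed b-values should match the number of b-vector columns and the number of volumes in dwi.nii.gz.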

Signal-to-noise ratio (SNR) quantification.

Following preprocessing and separation of the dMRI data into its component runs, the signal-to-noise ratio (SNR) was computed using a brainlife.io App implementing methods available in the scientific library DIPy.org. The output of this process is a .csv file describing the SNR in the diffusion-weighted volumes along the x-, y-, and z-directions and the SNR across the non-diffusion-weighted volumes:
 nsddata_diffusion/ppdata/subjAA/run_*/snr/snr.csv 

dMRI brain mask.

A brain mask was generated with an App implementing FSL BET and used for all dMRI signal modeling and analyses. The brain mask was generated from the preprocessed, combined dMRI data. The same mask was used for all subsequent processing steps:
 nsddata_diffusion/ppdata/subjAA/brainmask/mask.nii.gz 

Visual area parcellation.

A parcellation of the visual areas was implemented using the 180-area multi-modal cortical atlas (Glasser et al., 2016). The atlas and areas were imported into dMRI volume space. The areas were used to segment the optic radiation and to generate area-to-area connectivity matrices. A key.txt file is also provided; it maps the voxel values in the NIFTI file to the indices of the areas in the parcellation. A label.json file is also provided and includes additional information about the parcellation NIFTI.

 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-parcellation/parcellation.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-parcellation/key.txt 
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-parcellation/label.json 

Diffusion signal modeling and data derivatives

The Diffusion Tensor Imaging (DTI; Le Bihan et al., Journal of Magnetic Resonance Imaging, 2001), Diffusion Kurtosis Imaging (DKI; Rosenkrantz et al., Journal of Magnetic Resonance Imaging, 2015), and Neurite Orientation Dispersion and Density Imaging (NODDI; Zhang et al., NeuroImage, 2012) models were fit to the dMRI data.

Diffusion Tensor Imaging (DTI).

The fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity maps from the DTI model were generated using methods from MRTrix3 (JD Tournier et al., Neuroimage 2019) as implemented in a brainlife.io App.
 nsddata_diffusion/ppdata/subjAA/run_*/dti/ad.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dti/fa.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dti/md.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dti/rd.nii.gz 
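
For readers less familiar with the four DTI scalars listed above, the sketch below computes them from the three eigenvalues of a diffusion tensor using the standard textbook definitions. It is an illustration only: the function name and example eigenvalues are assumptions, and this is not the code used by the brainlife.io App or MRTrix3.

```python
import math


def dti_scalars(l1, l2, l3):
    """Compute FA, MD, AD, and RD from three tensor eigenvalues.

    Eigenvalues are sorted so that AD is the largest one, independent of input order.
    """
    l1, l2, l3 = sorted((l1, l2, l3), reverse=True)
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity (principal eigenvalue)
    rd = (l2 + l3) / 2.0               # radial diffusivity
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = 2.0 * (l1 ** 2 + l2 ** 2 + l3 ** 2)
    fa = math.sqrt(num / den) if den > 0 else 0.0  # fractional anisotropy in [0, 1]
    return {"fa": fa, "md": md, "ad": ad, "rd": rd}


# Hypothetical eigenvalues in mm^2/s, chosen only for illustration.
example = dti_scalars(1.7e-3, 0.3e-3, 0.3e-3)
```

An isotropic tensor (three equal eigenvalues) gives FA = 0, while a tensor with a single non-zero eigenvalue gives FA = 1.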

Additional parameters were also returned by MRTrix3 given the multi-shell nature of the data.
 nsddata_diffusion/ppdata/subjAA/run_*/dti/cs.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dti/cl.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dti/cp.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dti/kurtosis.nii.gz 


Diffusion Kurtosis Imaging (DKI).

The implementation of DKI provided by the library DIPy.org was used via a brainlife.io App to generate DKI model parameter estimates. Maps were generated for both the DTI measures (fractional anisotropy, mean diffusivity, axial diffusivity, radial diffusivity) and the DKI-specific measures (axial kurtosis, geodesic anisotropy, mean kurtosis, radial kurtosis).
 nsddata_diffusion/ppdata/subjAA/run_*/dki/ad.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dki/fa.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dki/md.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dki/rd.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dki/ak.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dki/ga.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dki/mk.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/dki/rk.nii.gz 

Neurite Orientation Dispersion and Density Imaging (NODDI).

The NODDI implementation available in the library AMICO was used via a brainlife.io App to generate all parameter estimates. The neurite density, orientation dispersion, and isotropic volume fraction maps were generated. Two fits of the NODDI model were applied per dMRI run, differing in the value of the parallel diffusivity parameter (d//).

The first model fitting was performed with d// = 1.7 x 10^-3 mm^2/s, which is designed for fitting in deep white matter. In the data, this fit is stored in the noddi-wm directory.

The second model fitting was performed with d// = 1.1 x 10^-3 mm^2/s, which was found to be the optimal value for gray matter mapping as identified in Fukutomi et al., 2018. This fit is stored in the noddi-cortex directory. The files within each directory have the same names, and thus we describe one set of directories below.

 nsddata_diffusion/ppdata/subjAA/run_*/noddi-{}/ndi.nii.gz 
# neurite density index map for either the white matter (wm) or cortex fits
 nsddata_diffusion/ppdata/subjAA/run_*/noddi-{}/odi.nii.gz 
# orientation dispersion index map for either the white matter (wm) or cortex fits
 nsddata_diffusion/ppdata/subjAA/run_*/noddi-{}/isovf.nii.gz 
# isotropic volume fraction map for either the white matter (wm) or cortex fits
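
The `noddi-{}` placeholder above stands for the two fit directories, noddi-wm and noddi-cortex. As a small convenience sketch, the following Python expands the placeholder into the six expected file paths for one subject and run. The function name and the `root` default are assumptions for illustration; adjust `root` to wherever the data are stored locally.

```python
def noddi_paths(subject, run, root="nsddata_diffusion/ppdata"):
    """Return {(fit, metric): path} for both NODDI fits of one subject and run.

    `subject` is e.g. "subj01" and `run` is e.g. "run_1".
    """
    fits = ("wm", "cortex")             # white-matter and cortex fits
    metrics = ("ndi", "odi", "isovf")   # maps listed in the text above
    return {
        (fit, metric): f"{root}/{subject}/{run}/noddi-{fit}/{metric}.nii.gz"
        for fit in fits
        for metric in metrics
    }


paths = noddi_paths("subj01", "run_1")
```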

Constrained Spherical Deconvolution (CSD).

CSD models were fit for diffusion tractography across multiple spherical harmonic orders (Lmax = 2, 4, 6, and 8) using MRTrix3.

 nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax2.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax4.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax6.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax8.nii.gz 
 nsddata_diffusion/ppdata/subjAA/run_*/csd/response.txt 

Tractography.

Whole-brain diffusion tractography was performed using a brainlife.io App implementing an advanced version of MRTrix3’s anatomically-constrained tractography (ACT) methodology (McPherson and Pestilli, Communications Biology, 2021). The multi-shell constrained spherical deconvolution (CSD) model was used to identify fiber orientation distributions. Two CSD model orders (Lmax = 6 and 8) were used to generate separate tractograms, each containing 1.5 million streamlines. The two tractograms were then merged into a single tractogram containing 3 million streamlines, implementing a simplified version of Ensemble Tractography (Takemura et al., PLoS Computational Biology, 2018).

The optic radiations were identified using a novel brainlife.io App (Lmax 8) using parallel transport tractography implemented in the software library Trekker (Aydogan et al., IEEE TMI, 2021). To identify the terminations of the optic radiation, the LGN as identified with FreeSurfer and V1 as identified by the multimodal parcellation were used. 5,000 streamlines were generated for the optic radiation in each hemisphere. The left and right optic radiations were then merged to generate a single tractogram containing 10,000 streamlines.

 nsddata_diffusion/ppdata/subjAA/run_*/track/track-lmax6.tck 
 nsddata_diffusion/ppdata/subjAA/run_*/track/track-lmax8.tck 
 nsddata_diffusion/ppdata/subjAA/run_*/track/track-merged.tck 
 nsddata_diffusion/ppdata/subjAA/run_*/track/track-optic-radiation.tck 

Major white matter tracts segmentation.

The 61 major white matter tracts were segmented using the merged 3,000,000-streamline whole-brain tractogram. The segmentation was performed using a brainlife.io App implementing an improved version of rules provided by the White Matter Query Language (WMQL; Wassermann et al., Brain Structure and Function, 2016). The segmentation outputs are organized into MATLAB files (.mat) containing two cell structures:
White Matter Tract Name: the name of each white matter tract (1 x 61 tracts),
White Matter Tract-streamline Index: the integer index of the tract assigned to every streamline in the merged whole-brain tractogram (1 x 3,000,000 streamlines).
Following tract segmentation, a brainlife.io App was used to remove outlier streamlines from each tract. Outlier streamlines were defined as those with at least one node whose x,y,z coordinates lie more than 3 standard deviations away from the median white matter tract trajectory (i.e., median x,y,z tract coordinates). The resulting classification structure with outliers removed was returned (classification-cleaned.mat). Finally, a classification structure was generated for the optic radiation tractogram (classification-optic-radiation.mat), along with a version with outliers removed (classification-optic-radiation-cleaned.mat).
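
The outlier rule described above can be sketched as follows. This is one possible reading of the rule, not the brainlife.io App's implementation: it assumes that all streamlines in a tract have been resampled to the same number of nodes, it evaluates each node and each axis (x, y, z) separately, and it uses the population standard deviation. The function name `keep_mask` is hypothetical.

```python
from statistics import median, pstdev


def keep_mask(streamlines, n_sd=3.0):
    """Return one boolean per streamline: True to keep, False if it is an outlier.

    `streamlines` is a list of streamlines, each a list of (x, y, z) nodes.
    All streamlines are assumed to have the same number of nodes.
    """
    n_nodes = len(streamlines[0])
    if any(len(s) != n_nodes for s in streamlines):
        raise ValueError("streamlines must be resampled to a common node count")

    keep = [True] * len(streamlines)
    for node in range(n_nodes):
        for axis in range(3):
            values = [s[node][axis] for s in streamlines]
            center = median(values)   # median tract coordinate at this node
            spread = pstdev(values)   # spread across streamlines at this node
            if spread == 0:
                continue              # all streamlines agree; nothing to flag
            for i, value in enumerate(values):
                if abs(value - center) > n_sd * spread:
                    keep[i] = False
    return keep


# Toy tract: 19 identical streamlines plus one that strays far along y.
typical = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
stray = [(0.0, 0.0, 0.0), (1.0, 10.0, 0.0)]
mask = keep_mask([typical] * 19 + [stray])
```

In the toy tract only the last streamline is flagged, because a single node exceeding the threshold on a single axis is enough to mark the whole streamline as an outlier.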

Note that poor segmentations of the cinguli were returned in both the classification-wholebrain.mat and classification-wholebrain-cleaned.mat files for subj02, subj03, subj07, and subj08.

 nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-wholebrain.mat 
 nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-wholebrain-cleaned.mat 
 nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-optic-radiation.mat 
 nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-optic-radiation-cleaned.mat 
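
Once the two structures described above (tract names and the per-streamline index) have been loaded into Python lists, the number of streamlines per tract can be tallied as in the sketch below. The sketch assumes MATLAB-style 1-based tract indices and that an index of 0 marks an unclassified streamline; both are assumptions to verify against the actual files. The tract names and function name are hypothetical.

```python
from collections import Counter


def streamlines_per_tract(names, index):
    """Count streamlines assigned to each tract.

    `names` is the list of tract names; `index` holds one integer per
    streamline, where k (1-based, assumed) refers to names[k - 1].
    """
    counts = Counter(index)
    return {name: counts.get(k, 0) for k, name in enumerate(names, start=1)}


# Hypothetical toy input: two tracts, five streamlines, one unclassified (0).
toy_counts = streamlines_per_tract(["tractA", "tractB"], [1, 1, 0, 2, 1])
```

Comparing the counts from classification-wholebrain.mat with those from the cleaned file gives the number of streamlines removed per tract by the outlier step.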

Tract Profiles and macrostructural statistics.

Tract Profiles (Yeatman et al., 2012) were used to map DTI, DKI, and NODDI metrics along the core of the segmented whole-brain white matter tracts and the optic radiation. These profiles, together with quantitative statistics of macrostructure including tract volume, length, and streamline count, are provided in a single .csv file following the format of AFQ-Browser (Yeatman/Rokem). As brainlife.io treats DTI and DKI as the same datatype (with differentiating datatype tags), profilometry was performed separately on DTI and DKI measures, but NODDI values were computed in both. The two sets of outputs are stored in separate directories, tract-statistics/dti and tract-statistics/dki. Each directory includes the profiles for the whole-brain segmentation following streamline outlier removal and for the optic radiation segmentation following streamline outlier removal.

 nsddata_diffusion/ppdata/subjAA/run_*/tract-statistics/*/tractmeasures-wholebrain.csv  # whole-brain segmentation statistics derived from either DTI or DKI models and NODDI
 nsddata_diffusion/ppdata/subjAA/run_*/tract-statistics/*/tractmeasures-optic-radation.csv  # optic radiation segmentation statistics derived from either DTI or DKI models and NODDI
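
To illustrate the idea of a tract profile, the sketch below averages a diffusion metric across streamlines at each node along a tract. It is a deliberately simplified, unweighted version: the published Tract Profiles method weights each streamline by its distance from the tract core, and that weighting is omitted here. The function name and the toy values are hypothetical.

```python
def mean_profile(samples):
    """Average a metric across streamlines at each node along a tract.

    `samples` is a list with one entry per streamline; each entry is a list of
    metric values (e.g., FA) sampled at the same number of nodes.
    """
    n_nodes = len(samples[0])
    if any(len(s) != n_nodes for s in samples):
        raise ValueError("every streamline must be sampled at the same nodes")
    return [sum(s[k] for s in samples) / len(samples) for k in range(n_nodes)]


# Hypothetical toy values: two streamlines sampled at three nodes.
toy_profile = mean_profile([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
```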

Visual area networks.

The merged 3,000,000 whole-brain tractogram was used in combination with the visual areas defined by the multi-modal cortical atlas to build a connectivity matrix of the visual system using a  brainlife.io App  implementing MRTrix3's method to build networks.

Multiple network measures were generated, including standard network measures such as fiber count, density, and length, as well as more advanced measures derived from the DTI, DKI, and NODDI models.
Note that the DTI and DKI matrices have been separated into distinct directories (i.e., visual-area-networks/dti and visual-area-networks/dki). Both directories contain the NODDI matrices generated alongside the DTI and DKI matrices. The same networks were then normalized by density. A final network of density normalized by length was also computed. The streamline weights defined by SIFT2 and the node assignments are also provided.

 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/density.csv 
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/length.csv 
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/count.csv 
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/{}_mean.csv  # DTI, DKI, NODDI measures
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/{}_mean_density.csv  # DTI, DKI, NODDI measures normalized by density
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/weights.csv 
 nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/assignments.csv 

Measures of cortical white matter properties.

Diffusion measures derived from the DTI, DKI, and NODDI models were mapped to the ‘midthickness’ surface derived from FreeSurfer following the procedures outlined in Fukutomi et al., 2018. Each diffusion model mapping is designated by a cortexmap-{} directory. Each model directory contains a main directory titled cortexmap. Within this directory are three sub-directories containing various surface GIFTI (gii) files: func, label, surf.
Func contains the diffusion measures for each model mapped to the cortical midthickness surface, including temporal signal-to-noise ratio (tSNR).
Label contains the Desikan-Killiany (aparc.a2009s) atlas converted to GIFTI.
Surf contains all of the surfaces generated during the procedures, including (but not limited to) the midthickness surface and inflated versions of the midthickness surface. The remaining surfaces are FreeSurfer surfaces converted to GIFTI that were necessary for generating the midthickness surface and for mapping the diffusion model data to it.

Note that the func.gii metric surface files, and the GIFTI derivatives, may not load well into FreeSurfer but will load into Connectome Workbench. To ease the burden on users who are more accustomed to FreeSurfer's outputs, .mgh versions of the metric files are also provided. The GIFTI versions of the pial, white, and .label files are simple conversions of the FreeSurfer outputs using mris_convert. The midthickness GIFTI surface, to which the dMRI measures of microstructure were mapped, is nearly identical to the LayerB2 files described in Structural data, although it was derived slightly differently. However, this only matters if a user wants to replicate the cortex mapping analysis, because the *func.gii files and the FreeSurfer surfaces have the same number of vertices.

 nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/func/*/*h.{}.func.gii  or  *.mgh  # hemispheric diffusion measure mapped to midthickness surface in gifti and Freesurfer datatypes
 nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/label/*h.aparc.a2009s.native.label.gii  # hemispheric Desikan-Killiany (aparc.a2009s) atlas in gifti
 nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/surf/*h.midthickness.native.surf.gii  # hemispheric midthickness surface in gifti
 nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/surf/*h.midthickness.inflated.surf.gii 
# hemispheric inflated midthickness surface in gifti
 nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/surf/*h.midthickness.very_inflated.surf.gii  # hemispheric very inflated midthickness surface in gifti

Statistics of cortical midthickness mapped diffusion measures.

DTI, DKI, and NODDI metrics mapped to the cortical midthickness surface were summarized within the parcels of both the Desikan-Killiany (aparc.a2009s) and 180-area multi-modal cortical atlases and written to .csv files compatible with the format proposed by AFQ-Browser (Yeatman et al., Nature Communications 2017). As brainlife.io treats DTI and DKI as the same datatype (with differentiating datatype tags), the statistics were computed separately on DTI and DKI measures, but NODDI values were computed in both. The two sets of outputs are stored in separate directories, cortexmap-statistics/func/dti and cortexmap-statistics/func/dki. Each directory includes the number of non-zero vertices (COUNT_NONZERO), minimum (MIN), maximum (MAX), average (MEAN), median (MEDIAN), mode (MODE), and standard deviation (STDEV) of each diffusion-based measure within each parcel found in the Desikan-Killiany (aparc.a2009s; aparc) and 180-area multi-modal cortical (hcp-mmp; parc) atlases.

 nsddata_diffusion/ppdata/subjAA/run_*/cortexmap-statistics/*/aparc_{}.csv  # summary statistic for each DTI or DKI, and every NODDI, measure in every parcel in the aparc.a2009s atlas
 nsddata_diffusion/ppdata/subjAA/run_*/cortexmap-statistics/*/parc_{}.csv  # summary statistic for each DTI or DKI, and every NODDI, measure in every parcel in the hcp-mmp atlas
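
The per-parcel summary statistics named above can be illustrated with the following sketch, which groups per-vertex values by parcel label and computes each statistic using the Python standard library. Several details are assumptions rather than documented behavior of the pipeline: zeros are included in all statistics other than COUNT_NONZERO, the population standard deviation is used, and ties in MODE resolve to the first value encountered. The function name and the toy input are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, mode, pstdev


def parcel_stats(values, labels):
    """Summarize per-vertex values within each parcel.

    `values` holds one measurement per surface vertex and `labels` holds the
    parcel label of the same vertex.
    """
    by_parcel = defaultdict(list)
    for value, label in zip(values, labels):
        by_parcel[label].append(value)

    stats = {}
    for label, vals in by_parcel.items():
        stats[label] = {
            "COUNT_NONZERO": sum(1 for v in vals if v != 0),
            "MIN": min(vals),
            "MAX": max(vals),
            "MEAN": mean(vals),
            "MEDIAN": median(vals),
            "MODE": mode(vals),
            "STDEV": pstdev(vals),  # population SD (an assumption)
        }
    return stats


# Hypothetical toy input: five vertices in two parcels.
toy_stats = parcel_stats([0.0, 0.2, 0.2, 0.4, 0.5], [1, 1, 1, 1, 2])
```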

Colormap for visual-area parcellation

Below is a table of the ROI parcellations and colormap used to generate the visual area networks and images found in the NSD data paper. Note that these colors are not identical to the colors of the HCP_MMP parcellation.
Visual white matter parcel-color correspondence for visual white matter network analyses. HCP-MMP parcel ID and Color (hex) correspondence for scatterplots in Results Figure 5b,c. This is also the order of the nodes found in the network matrices in Results Figure 5b.

Preprocessing applications implemented via  brainlife.io 

Descriptions of, and web links to, the open-source code and open cloud services used in the processing of this dataset.

Additional dMRI data preprocessing and data limitations.

The version of the diffusion derivatives that we provide online differs in some preprocessing steps from what is demonstrated in the NSD data paper. These changes were made to improve the quality of the diffusion derivatives with respect to strong slice-motion-eddy interactions in the raw dMRI data.

The preprocessing changes involved using only FSL's Topup and Eddy for preprocessing. It is important to note that although this change in the preprocessing corrected a significant amount of the artifact, it may not have completely rid the data of the artifact. See screenshots for examples. Following preprocessing, the combined dMRI data were aligned to the anatomical (T1w) image and split into the component runs, and all further processing was performed on each run separately.

Example of regions where updated preprocessing improved artifact correction.
Example of regions where updated preprocessing did not completely correct artifact.