Diffusion data

This section covers the measurements and pre-processing of diffusion-weighted magnetic resonance imaging data (dMRI) prepared for the NSD dataset.

Data were preprocessed using publicly available pipelines on brainlife.io. The preprocessing pipelines were designed to remove artifacts as far as possible; see the note at the end. After artifact removal or minimization, a series of additional brainlife.io pipelines was used to generate and share data derivatives, including minimally preprocessed dMRI data, tractography, and network outputs.

Diffusion (dMRI) data collection

The four diffusion-weighted acquisitions were combined into two runs of diffusion data (referred to as ‘run_1’ and ‘run_2’). The two diffusion runs were combined (stacked in the 4th dimension) before being processed. Data preprocessing included susceptibility-distortion, motion, and eddy-current correction.

Cloud processing via  brainlife.io 

All processing was performed on the reproducible, open cloud-based service known as  brainlife.io .  Brainlife.io  orchestrates large-data storage, processing via open-service code applications (apps), and high-speed large computing resources to quickly and reproducibly process neuroimaging data.

All of the code and pipelines used for processing the data described below can be found on brainlife.io and, from there, on GitHub.com. A table at the end of this document lists the applications used for data processing and generation.

The output files generated are further described below.

Diffusion-weighted imaging (dMRI).

The preprocessed dMRI data were used as the basis for all further modeling and analyses. The preprocessed data comprise NIFTI images and the corrected b-values (bvals) and b-vectors (bvecs) in FSL format. These NIFTIs are aligned with, and have the same slice dimensions and voxel size as, the official 0.8-mm T1w images provided with NSD (see Structural data). All NIFTI-based volume derivatives from the dMRI data maintain the same properties with regard to slice dimensions and voxel size. (Note that in our preprocessing, we drop the very last acquired volume; hence there is a one-volume mismatch between the number of volumes in the raw data (99, 99, 100, and 100 for the four raw diffusion acquisitions) and the number of volumes in the preprocessed data (98 for 'Run 1', which combines the first two acquisitions, and 99 for 'Run 2', which combines the second two acquisitions).)

nsddata_diffusion/ppdata/subjAA/run_*/dwi.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dwi.bvecs
nsddata_diffusion/ppdata/subjAA/run_*/dwi.bvals
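
As a minimal sketch of how the FSL-format gradient tables can be checked, the snippet below parses b-values (one whitespace-separated number per volume) and b-vectors (three rows for x, y, and z, one column per volume) from text and confirms that the two agree on the number of volumes. The function names and the toy three-volume table are illustrative assumptions, not part of the NSD release.

```python
def parse_bvals(text):
    """Parse FSL-style b-values: whitespace-separated numbers, one per volume."""
    return [float(tok) for tok in text.split()]


def parse_bvecs(text):
    """Parse FSL-style b-vectors: three rows (x, y, z), one column per volume."""
    rows = [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) != 3:
        raise ValueError(f"expected 3 rows (x, y, z), found {len(rows)}")
    if len({len(row) for row in rows}) != 1:
        raise ValueError("rows of the b-vector table differ in length")
    # Transpose so that each volume gets one (x, y, z) tuple.
    return list(zip(*rows))


def check_gradient_table(bvals, bvecs):
    """Return the number of volumes after confirming both tables agree."""
    if len(bvals) != len(bvecs):
        raise ValueError(f"{len(bvals)} b-values but {len(bvecs)} b-vectors")
    return len(bvals)


# Toy three-volume gradient table (made-up values, not NSD data).
example_bvals = "0 1500 3000\n"
example_bvecs = "0 1 0\n0 0 1\n0 0 0\n"

bvals = parse_bvals(example_bvals)
bvecs = parse_bvecs(example_bvecs)
n_volumes = check_gradient_table(bvals, bvecs)
```

For the real files, the text would be read from dwi.bvals and dwi.bvecs, and the resulting volume count should match the 4th dimension of dwi.nii.gz for that run.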

Signal-to-noise ratio (SNR) quantification.

Following preprocessing and separation of the dMRI data into its component runs, the signal-to-noise ratio (SNR) was computed using  a brainlife.io App  implementing methods available on the scientific library  DIPy.org . The output of this process is a .csv file describing the SNR found across the x-, y-, or z-directions in diffusion-weighted volumes and the SNR across the non-diffusion weighted volumes:
nsddata_diffusion/ppdata/subjAA/run_*/snr/snr.csv

dMRI brain mask.

A brain mask was generated with an App implementing FSL BET, using the preprocessed and combined dMRI data. The same mask was used for all subsequent dMRI signal modeling and analysis steps:
nsddata_diffusion/ppdata/subjAA/brainmask/mask.nii.gz

Visual area parcellation.

A parcellation of the visual areas was implemented using the 180-area multi-modal cortical atlas (Glasser et al., 2016). The atlas and its areas were imported into dMRI volume space. The areas were used to segment the optic radiation and to generate area-to-area connectivity matrices. A key.txt file is also provided; it maps the voxel values in the NIFTI file to the indices of the areas in the parcellation. A label.json file is also provided and includes important information about the parcellation NIFTI.

nsddata_diffusion/ppdata/subjAA/run_*/visual-area-parcellation/parcellation.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-parcellation/key.txt
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-parcellation/label.json

Diffusion signal modeling and data derivatives

The Diffusion Tensor Imaging (DTI; Le Bihan et al., Journal of Magnetic Resonance Imaging, 2001), Diffusion Kurtosis Imaging (DKI; Rosenkrantz et al., Journal of Magnetic Resonance Imaging, 2015), and Neurite Orientation Dispersion and Density Imaging (NODDI; Zhang et al., NeuroImage, 2012) models were fit to the dMRI data.

Diffusion Tensor Imaging (DTI).

The fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity maps from the DTI model were generated using methods from MRTrix3 (JD Tournier et al., Neuroimage, 2019) as implemented in a brainlife.io App.
nsddata_diffusion/ppdata/subjAA/run_*/dti/ad.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dti/fa.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dti/md.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dti/rd.nii.gz
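
For readers less familiar with these four maps, the sketch below shows the standard textbook definitions of FA, MD, AD, and RD in terms of the three diffusion tensor eigenvalues. It is an illustration of the definitions only, not the MRTrix3 code used to produce the files; the function name and the example eigenvalues are assumptions.

```python
import math


def dti_scalars(ev_a, ev_b, ev_c):
    """Compute FA, MD, AD, and RD from three tensor eigenvalues (mm^2/s)."""
    # Sort so that l1 is the largest (principal) eigenvalue.
    l1, l2, l3 = sorted((ev_a, ev_b, ev_c), reverse=True)
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity
    rd = (l2 + l3) / 2.0               # radial diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy; defined as 0 when all eigenvalues are 0.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"fa": fa, "md": md, "ad": ad, "rd": rd}


# Made-up eigenvalues for a single anisotropic voxel (not NSD data).
voxel = dti_scalars(1.7e-3, 0.3e-3, 0.3e-3)
isotropic = dti_scalars(1.0e-3, 1.0e-3, 1.0e-3)
stick = dti_scalars(1.7e-3, 0.0, 0.0)
```

Under these definitions FA ranges from 0 for a perfectly isotropic tensor to 1 for a tensor with a single non-zero eigenvalue.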

Additional parameters were also returned by MRTrix3 given the multi-shell nature of the data.
nsddata_diffusion/ppdata/subjAA/run_*/dti/cs.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dti/cl.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dti/cp.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dti/kurtosis.nii.gz


Diffusion Kurtosis Imaging (DKI).

The implementation of DKI provided by the library DIPy.org was used via a brainlife.io App to generate DKI model parameter estimates. Maps were generated for both the DTI measures (fractional anisotropy, mean diffusivity, axial diffusivity, radial diffusivity) and the DKI-specific measures (axial kurtosis, geodesic anisotropy, mean kurtosis, radial kurtosis).
nsddata_diffusion/ppdata/subjAA/run_*/dki/ad.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dki/fa.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dki/md.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dki/rd.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dki/ak.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dki/ga.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dki/mk.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/dki/rk.nii.gz

Neurite Orientation Dispersion and Density Imaging (NODDI).

The NODDI implementation available in the library AMICO was used via a brainlife.io App to generate all parameter estimates. The neurite density, orientation dispersion, and isotropic volume fraction maps were generated. Two fits of the NODDI model were applied per dMRI run. The parallel diffusivity parameter (d//) differed between the two fits.

The first model fitting was performed with d// = 1.7 × 10⁻³ mm²/s, which is designed for fitting in deep white matter. In the data, this fit is stored in the noddi-wm directory.

The second model fitting was performed with d// = 1.1 × 10⁻³ mm²/s, which was found to be the optimal value for gray matter mapping as identified in Fukutomi et al., 2018. This fit is stored in the noddi-cortex directory. The files within each directory have the same names, and thus we describe one set of directories below.

nsddata_diffusion/ppdata/subjAA/run_*/noddi-{}/ndi.nii.gz
# neurite density index map for either the white matter (wm) or cortex fits
nsddata_diffusion/ppdata/subjAA/run_*/noddi-{}/odi.nii.gz
# orientation dispersion index map for either the white matter (wm) or cortex fits
nsddata_diffusion/ppdata/subjAA/run_*/noddi-{}/isovf.nii.gz
# isotropic volume fraction map for either the white matter (wm) or cortex fits
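
The `noddi-{}` placeholder above expands to the two fit directories. A small sketch of that expansion is shown below; the helper name, the example subject ID (following the `subjAA` pattern), and the default root are assumptions for illustration.

```python
from pathlib import PurePosixPath

NODDI_FITS = ("wm", "cortex")          # the two fit directories named in the text
NODDI_MAPS = ("ndi", "odi", "isovf")   # maps stored in each fit directory


def noddi_paths(subject, run, root="nsddata_diffusion/ppdata"):
    """Return {(fit, map): relative path} for one subject and run."""
    base = PurePosixPath(root) / subject / f"run_{run}"
    return {
        (fit, name): str(base / f"noddi-{fit}" / f"{name}.nii.gz")
        for fit in NODDI_FITS
        for name in NODDI_MAPS
    }


# Hypothetical subject ID used only to show the expansion.
paths = noddi_paths("subj01", 1)
```

Each (fit, map) pair yields one path, so a single run has six NODDI maps in total.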

Constrained Spherical Deconvolution (CSD).

CSD models were fit for diffusion tractography at multiple spherical harmonic orders (Lmax = 2, 4, 6, and 8) using MRTrix3.

nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax2.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax4.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax6.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/csd/lmax8.nii.gz
nsddata_diffusion/ppdata/subjAA/run_*/csd/response.txt
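
As a rough guide to what the different Lmax values imply, the sketch below computes the number of coefficients in an even-order spherical harmonic series, (Lmax + 1)(Lmax + 2) / 2. Whether the 4th dimension of each lmax*.nii.gz file matches these counts is an assumption (it holds for the even-order basis commonly used for fiber orientation distributions) and is not stated in this document.

```python
def n_sh_coefficients(lmax):
    """Number of coefficients in an even-order spherical harmonic series."""
    if lmax < 0 or lmax % 2 != 0:
        raise ValueError("lmax must be a non-negative even integer")
    return (lmax + 1) * (lmax + 2) // 2


# Coefficient counts for the four harmonic orders listed above.
counts = {lmax: n_sh_coefficients(lmax) for lmax in (2, 4, 6, 8)}
# counts == {2: 6, 4: 15, 6: 28, 8: 45}
```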

Tractography.

Whole-brain diffusion tractography was performed using a brainlife.io App implementing an advanced version of MRTrix3’s anatomically-constrained tractography (ACT) methodology (McPherson and Pestilli, Communications Biology, 2021). The multi-shell constrained spherical deconvolution (CSD) model was used to identify fiber orientation distributions. Two CSD model orders (Lmax), namely 6 and 8, were used to separately generate tractograms. Each tractogram was generated with 1.5 million streamlines. The two tractograms were merged into a single tractogram containing 3 million streamlines, implementing a simplified version of Ensemble Tractography (Takemura et al., PLoS Computational Biology, 2018).

The optic radiations were identified using a novel brainlife.io App (Lmax 8) based on parallel transport tractography as implemented in the software library Trekker (Aydogan et al., IEEE TMI, 2021). To identify the terminations of the optic radiation, the LGN as identified with FreeSurfer and V1 as identified by the multimodal parcellation were used. For each hemisphere's optic radiation, 5,000 streamlines were generated. The left and right optic radiations were then merged to generate a single tractogram containing 10,000 streamlines.

nsddata_diffusion/ppdata/subjAA/run_*/track/track-lmax6.tck
nsddata_diffusion/ppdata/subjAA/run_*/track/track-lmax8.tck
nsddata_diffusion/ppdata/subjAA/run_*/track/track-merged.tck
nsddata_diffusion/ppdata/subjAA/run_*/track/track-optic-radiation.tck
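
One quick sanity check on a downloaded tractogram is to read the streamline count from the text header of the .tck file. The sketch below parses the header of the MRtrix track format (a first line reading `mrtrix tracks`, then `key: value` lines up to `END`) from a binary stream. The function name and the in-memory example header are assumptions; the example bytes are synthetic and not taken from an NSD file.

```python
import io


def read_tck_header(stream):
    """Read the text header of an MRtrix .tck file from a binary stream."""
    first = stream.readline().decode("ascii", errors="replace").strip()
    if first != "mrtrix tracks":
        raise ValueError("stream does not start with an MRtrix tracks header")
    header = {}
    while True:
        raw = stream.readline()
        if not raw:
            raise ValueError("reached end of stream before the END marker")
        line = raw.decode("ascii", errors="replace").strip()
        if line == "END":
            return header
        key, sep, value = line.partition(":")
        if sep:
            # If a key repeats, the last occurrence is kept.
            header[key.strip()] = value.strip()


# Synthetic header standing in for a real file opened with open(path, "rb").
fake_file = io.BytesIO(
    b"mrtrix tracks\n"
    b"datatype: Float32LE\n"
    b"count: 3000000\n"
    b"file: . 67\n"
    b"END\n"
)
header = read_tck_header(fake_file)
n_streamlines = int(header["count"])
```

If the `count` field is present, it would be expected to read 1,500,000 for the per-Lmax tractograms, 3,000,000 for the merged tractogram, and 10,000 for the optic radiation tractogram, based on the numbers given above.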

Major white matter tracts segmentation.

The 61 major white matter tracts were segmented using the 3,000,000-streamline whole-brain tractogram. The segmentation was performed using a brainlife.io App implementing an improved version of rules provided by the White Matter Query Language (WMQL; Wassermann et al., Brain Structure and Function, 2016). The segmentation outputs are organized into MATLAB files (.mat) containing two cell structures:
    1. White Matter Tract Name: the name of each white matter tract (1 x 61 tracts),
    2. White Matter Tract-streamline Index: the integer index of each tract for every streamline in the merged whole-brain tractogram (1 x 3,000,000 streamlines).
Following the tract segmentation, a brainlife.io App was used to remove outlier streamlines from each tract. Outlier streamlines were defined as those with at least one node whose x,y,z coordinates lie more than 3 standard deviations away from the median white matter tract trajectory (i.e., the median x,y,z tract coordinates). The resulting classification structure with outliers removed was returned (classification-wholebrain-cleaned.mat). Finally, a classification structure was generated for the optic radiation tractogram (classification-optic-radiation.mat), along with a version with outliers removed (classification-optic-radiation-cleaned.mat).
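
The outlier rule described above can be sketched as follows. This is a simplified illustration, not the App's implementation: it assumes every streamline in a tract has been resampled to the same number of nodes, applies the 3-standard-deviation threshold separately to each axis at each node, and uses the population standard deviation. All of these choices, and the function name, are assumptions.

```python
import statistics


def find_outlier_streamlines(streamlines, n_sd=3.0):
    """Return indices of streamlines with any node far from the median path.

    streamlines: list of streamlines, each a list of (x, y, z) nodes,
    all with the same number of nodes (an assumption of this sketch).
    """
    n_nodes = len(streamlines[0])
    if any(len(s) != n_nodes for s in streamlines):
        raise ValueError("all streamlines must have the same number of nodes")
    outliers = set()
    for node in range(n_nodes):
        for axis in range(3):
            values = [s[node][axis] for s in streamlines]
            center = statistics.median(values)   # median tract coordinate
            spread = statistics.pstdev(values)   # spread across streamlines
            if spread == 0:
                continue
            for idx, value in enumerate(values):
                if abs(value - center) > n_sd * spread:
                    outliers.add(idx)
    return sorted(outliers)


# Toy tract: 19 identical straight streamlines plus one that bows sideways.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bowed = [(0.0, 0.0, 0.0), (1.0, 10.0, 0.0), (2.0, 0.0, 0.0)]
toy_tract = [list(straight) for _ in range(19)] + [bowed]
outlier_ids = find_outlier_streamlines(toy_tract)
```

In the toy tract only the last streamline (index 19) is flagged, because its middle node lies well outside the spread of the other streamlines along the y axis.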

Note that poor segmentations of the cinguli were returned in both the classification-wholebrain.mat and classification-wholebrain-cleaned.mat files for subj02, subj03, subj07, and subj08.

nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-wholebrain.mat
nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-wholebrain-cleaned.mat
nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-optic-radiation.mat
nsddata_diffusion/ppdata/subjAA/run_*/tract-segmentation/classification-optic-radiation-cleaned.mat
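
Once the two cell structures have been loaded from a classification file, the per-tract streamline counts can be tallied as in the sketch below. It assumes the index values are 1-based positions into the tract-name list (MATLAB convention) and that 0 marks a streamline not assigned to any tract; both are assumptions, and the tract names shown are placeholders.

```python
from collections import Counter


def streamlines_per_tract(names, index):
    """Count streamlines per tract from a names list and a per-streamline index."""
    counts = Counter(i for i in index if i != 0)   # 0 = unassigned (assumed)
    unknown = [i for i in counts if not 1 <= i <= len(names)]
    if unknown:
        raise ValueError(f"index values without a matching name: {unknown}")
    # enumerate from 1 to mirror 1-based MATLAB indexing (assumed).
    return {name: counts.get(k, 0) for k, name in enumerate(names, start=1)}


# Placeholder tract names and a toy seven-streamline index vector.
toy_names = ["tractA", "tractB", "tractC"]
toy_index = [1, 0, 2, 2, 0, 1, 2]
tract_counts = streamlines_per_tract(toy_names, toy_index)
```

Comparing these counts between a classification file and its cleaned counterpart would show how many streamlines outlier removal dropped from each tract.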

Tract Profiles and macrostructural statistics.

DTI, DKI, and NODDI metrics were mapped along the core of the segmented whole-brain white matter tracts and the optic radiation using Tract Profiles (Yeatman et al., 2012). Quantitative statistics of macrostructure, including tract volume, length, and streamline count, are provided in a single .csv file following the format of AFQ-Browser (Yeatman/Rokem). As brainlife.io treats DTI and DKI as the same datatype (with differentiating datatype tags), profilometry was performed separately on DTI and DKI measures, but NODDI values were computed in both. The two sets of outputs are stored in separate directories, tract-statistics/dti and tract-statistics/dki. Each directory includes the profiles for the whole-brain segmentation and for the optic radiation segmentation, both following streamline outlier removal.

nsddata_diffusion/ppdata/subjAA/run_*/tract-statistics/*/tractmeasures-wholebrain.csv # whole-brain segmentation statistics derived from either DTI or DKI models and NODDI
nsddata_diffusion/ppdata/subjAA/run_*/tract-statistics/*/tractmeasures-optic-radation.csv # optic radiation segmentation statistics derived from either DTI or DKI models and NODDI

Visual area networks.

The merged 3,000,000-streamline whole-brain tractogram was used in combination with the visual areas defined by the multi-modal cortical atlas to build a connectivity matrix of the visual system using a brainlife.io App implementing MRTrix3's method to build networks.

Multiple network measures were generated, including both standard network measures such as fiber count, density, and length and more advanced measures derived from the DTI, DKI, and NODDI models.
Note that the DTI and DKI matrices have been separated into distinct directories (i.e., visual-area-networks/dti and visual-area-networks/dki). Both directories contain the NODDI matrices generated alongside the DTI and DKI matrices. The same networks were then normalized by density. A final network of density normalized by length was also computed. The streamline weights defined by SIFT2 and node assignments are also provided.

nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/density.csv
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/length.csv
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/count.csv
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/{}_mean.csv # DTI, DKI, NODDI measures
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/{}_mean_density.csv # DTI, DKI, NODDI measures normalized by density
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/weights.csv
nsddata_diffusion/ppdata/subjAA/run_*/visual-area-networks/*/assignments.csv

Measures of cortical white matter properties.

Diffusion measures derived from the DTI, DKI, and NODDI models were mapped to the ‘midthickness’ surface derived from FreeSurfer following procedures outlined in Fukutomi et al., 2018. Each diffusion model mapping is designated by a cortexmap-{} directory. Each model directory contains a main directory titled cortexmap. Within this directory are three sub-directories containing various surface GIFTI (gii) files: func, label, surf.
  • Func contains the diffusion measures for each model mapped to the cortical midthickness surface, including temporal signal-to-noise ratio (tSNR).
  • Label contains the Destrieux (aparc.a2009s) atlas converted to GIFTI.
  • Surf contains all of the surfaces generated during the procedures, including (but not limited to) the midthickness surface and inflated versions of the midthickness surface. The remaining surfaces are FreeSurfer surfaces converted to GIFTI that were necessary for generating the midthickness surface and for mapping the diffusion model data to it.

Note that the func.gii metric surface files and the other GIFTI derivatives may not load well into FreeSurfer but will load into Connectome Workbench. To ease the burden on users who are more accustomed to FreeSurfer's outputs, .mgh versions of the metric files are also provided. The GIFTI versions of the pial, white, and .label files are simple conversions of the FreeSurfer outputs using mris_convert. The midthickness GIFTI surface, to which the dMRI measures of microstructure were mapped, is nearly identical to the LayerB2 files described in Structural data, although it was derived slightly differently. However, this only matters if a user wants to replicate the cortex mapping analysis, as the number of vertices in the *func.gii files and the FreeSurfer surfaces is the same.

nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/func/*/*h.{}.func.gii or *.mgh # hemispheric diffusion measure mapped to midthickness surface in GIFTI and FreeSurfer datatypes
nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/label/*h.aparc.a2009s.native.label.gii # hemispheric Destrieux (aparc.a2009s) atlas in GIFTI
nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/surf/*h.midthickness.native.surf.gii # hemispheric midthickness surface in GIFTI
nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/surf/*h.midthickness.inflated.surf.gii # hemispheric inflated midthickness surface in GIFTI
nsddata_diffusion/ppdata/subjAA/run_*/cortexmap/surf/*h.midthickness.very_inflated.surf.gii # hemispheric very inflated midthickness surface in GIFTI

Statistics of cortical midthickness mapped diffusion measures.

DTI, DKI, and NODDI metrics mapped to the cortical midthickness surface were summarized within both the Destrieux (aparc.a2009s) and 180-area multi-modal cortical atlases and written to .csv files compatible with the format proposed by AFQ-Browser (Yeatman et al., Nature Communications, 2017). As brainlife.io treats DTI and DKI as the same datatype (with differentiating datatype tags), profilometry was performed separately on DTI and DKI measures, but NODDI values were computed in both. The two sets of outputs are stored in separate directories, cortexmap-statistics/func/dti and cortexmap-statistics/func/dki. Each directory includes the number of non-zero vertices (COUNT_NONZERO), minimum (MIN), maximum (MAX), average (MEAN), median (MEDIAN), mode (MODE), and standard deviation (STDEV) of each diffusion-based measure within each parcel of the Destrieux (aparc.a2009s; aparc) and 180-area multi-modal cortical (hcp-mmp; parc) atlases.

nsddata_diffusion/ppdata/subjAA/run_*/cortexmap-statistics/*/aparc_{}.csv # summary statistic for each DTI or DKI, and every NODDI, measure in every parcel in the aparc.a2009s atlas
nsddata_diffusion/ppdata/subjAA/run_*/cortexmap-statistics/*/parc_{}.csv # summary statistic for each DTI or DKI, and every NODDI, measure in every parcel in the hcp-mmp atlas
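
To make the seven summary statistics concrete, the sketch below computes them for per-vertex values grouped by parcel label. It is an illustration only: it includes all vertices of a parcel (zeros included) and uses the population standard deviation for STDEV, neither of which is stated in this document, and the parcel labels and values are made up.

```python
import statistics
from collections import defaultdict


def parcel_statistics(values, labels):
    """Summarize per-vertex values within each parcel label."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    by_parcel = defaultdict(list)
    for value, label in zip(values, labels):
        by_parcel[label].append(value)
    summary = {}
    for label, vals in by_parcel.items():
        summary[label] = {
            "COUNT_NONZERO": sum(1 for v in vals if v != 0),
            "MIN": min(vals),
            "MAX": max(vals),
            "MEAN": statistics.fmean(vals),
            "MEDIAN": statistics.median(vals),
            "MODE": statistics.mode(vals),
            "STDEV": statistics.pstdev(vals),  # population SD (assumed)
        }
    return summary


# Six made-up vertex values in two hypothetical parcels.
toy_values = [0.2, 0.4, 0.4, 0.0, 0.5, 0.7]
toy_labels = ["V1", "V1", "V1", "V1", "V2", "V2"]
stats = parcel_statistics(toy_values, toy_labels)
```

Whether zero-valued vertices should count toward the other statistics depends on how the pipeline treats them, so comparisons against the provided .csv files should check that convention first.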

Colormap for visual-area parcellation

Below is a table of the ROI parcellations and colormap used to generate the visual area networks and images found in the NSD data paper. Note that these are not the same colors as those of the HCP_MMP parcellation.
HCP-MMP Parcel | Color (HEX) | HCP-MMP Parcel | Color (HEX)
lh.v1          | #000000     | rh.v1          | #1CE6FF
lh.vmv1        | #FFFF00     | rh.vmv1        | #FF34FF
lh.mst         | #FF4A46     | rh.mst         | #008941
lh.v6          | #006FA6     | rh.v6          | #A30059
lh.v2          | #FFDBE5     | rh.v2          | #0000A6
lh.vmv2        | #7A4900     | rh.vmv2        | #63FFAC
lh.v3          | #B79762     | rh.v3          | #8FB0FF
lh.vmv3        | #004D43     | rh.vmv3        | #997D87
lh.v4          | #5A0007     | rh.v4          | #809693
lh.v8          | #FEFFE6     | rh.v8          | #1B4400
lh.fef         | #4FC601     | rh.fef         | #3B5DFF
lh.pef         | #4A3B53     | rh.pef         | #FF2F80
lh.v3a         | #61615A     | rh.v3a         | #BA0900
lh.v7          | #6B7900     | rh.v7          | #00C2A0
lh.ips1        | #FFAA92     | rh.ips1        | #FF90C9
lh.ffc         | #B903AA     | rh.ffc         | #D16100
lh.v3b         | #DDEEFFFF   | rh.v3b         | #000035
lh.lo1         | #7B4F4B     | rh.lo1         | #A1C299
lh.lo2         | #3000018    | rh.lo2         | #0AA6D8
lh.pit         | #013349     | rh.pit         | #00846F
               | #372101     |                | #FFB500
lh.mip         | #C2FFED     | rh.mip         | #A079BF
lh.pres        | #CC0744     | rh.pres        | #C0B9B2
lh.pros        | #C2FF99     | rh.pros        | #001E09
lh.pha1        | #00489C     | rh.pha1        | #6F0062
lh.pha3        | #0CBD66     | rh.pha3        | #EEC3FF
lh.te1p        | #456D75     | rh.te1p        | #B77B68
               | #7A87A1     |                | #788D66
lh.te2p        | #885578     | rh.te2p        | #FAD09F
lh.pht         | #FF8A9A     | rh.pht         | #D157A0
               | #BEC459     |                | #456648
lh.tpoj2       | #0086ED     | rh.tpoj2       | #886F4C
lh.tpoj3       | #34362D     | rh.tpoj3       | #B4A8BD
lh.dvt         | #00A6AA     | rh.dvt         | #452C2C
lh.pgp         | #636375     | rh.pgp         | #A3C8C9
lh.ip0         | #FF913F     | rh.ip0         | #938A81
lh.v6a         | #575329     | rh.v6a         | #00FECF
lh.pha2        | #B05B6F     | rh.pha2        | #8CD0FF
lh.v4t         | #3B9700     | rh.v4t         | #04F757
lh.fst         | #C8A1A1     | rh.fst         | #1E6E00
lh.v3cd        | #7900D7     | rh.v3cd        | #A77500
lh.lo3         | #6367A9     | rh.lo3         | #A05837
lh.vvc         | #6B002C     | rh.vvc         | #772600
Visual white matter parcel-color correspondence for visual white matter network analyses. HCP-MMP parcel ID and Color (hex) correspondence for scatterplots in Results Figure 5b,c. This is also the order of the nodes found in the network matrices in Results Figure 5b.
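
To reuse these colors in plotting code, the hex strings can be converted to RGB triplets. The sketch below does this for a few entries copied from the table and rejects any string that is not exactly six hexadecimal digits; the function and variable names are assumptions.

```python
import re

_HEX_COLOR = re.compile(r"#([0-9A-Fa-f]{6})")


def hex_to_rgb(color):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of 0-255 integers."""
    match = _HEX_COLOR.fullmatch(color)
    if match is None:
        raise ValueError(f"not a six-digit hex color: {color!r}")
    digits = match.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# A few parcel-color pairs copied from the table above.
parcel_colors = {"lh.v1": "#000000", "rh.v1": "#1CE6FF", "lh.mst": "#FF4A46"}
parcel_rgb = {parcel: hex_to_rgb(code) for parcel, code in parcel_colors.items()}
```

Note that the entries for lh.v3b and lh.lo2 as printed in the table do not have six hex digits, so they would raise a ValueError here and need to be checked against the original colormap before use.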

Preprocessing applications implemented via  brainlife.io 

Application                             | Github repository | Open Service DOI | Git branch
Tissue type segmentation                |                   |                  | binarize-v1.0
Visual area parcellation                |                   |                  | visual-white-matter-glasser-dwi-v1.0
dMRI preprocessing                      |                   |                  | cuda-v1.0
dMRI-T1 Registration                    |                   |                  | v1.0
SNR Calculation                         |                   |                  | plot
Brain mask Generation                   |                   |                  | dwi
NODDI model fit                         |                   |                  | 1.3
Diffusion Kurtosis Fit                  |                   |                  | 1.1.1
Constrained Spherical Deconvolution Fit |                   |                  | csd_generation-v1.0
Whole-brain Tractography                |                   |                  | 1.3
Merging Tractography Files              |                   |                  | two-tck
Optic radiation Tractography            |                   |                  | optic-radiation-v1.2
Structural Connectome                   |                   |                  | sift2_v1.2_centers_netneuro
White Matter Anatomy Segmentation       |                   |                  | 3.9
Remove Tract Outliers                   |                   |                  | 1.3
Tract Profiles                          |                   |                  | 1.13
Cortex Tissue Mapping                   |                   |                  | v1.2-snr-input
Cortical Summary Statistics             |                   |                  | v1.1
Description and web-links to the open-source code and open cloud services used in the processing of this dataset.

Additional dMRI data preprocessing and data limitations.

The version of the diffusion derivatives that we provide online has some changes in preprocessing compared to what is described in the NSD data paper. These changes were made to improve the quality of the diffusion derivatives with respect to strong slice-motion-eddy interactions in the raw dMRI data.

The preprocessing changes involved using only FSL's Topup and Eddy for preprocessing. It is important to note that although this change corrected a significant amount of the artifact, it may not have completely rid the data of the artifact. See screenshots for examples. Following preprocessing, the combined dMRI data were aligned to the anatomical (T1w) image and split into the individual runs, and all further processing was performed on each run separately.

Example of regions where updated preprocessing improved artifact correction.
Example of regions where updated preprocessing did not completely correct artifact.