Functional data (NSD)

This covers GLM results for the NSD experiment. The goal of the main GLM analysis of the NSD data was to estimate single-trial betas (BOLD response amplitudes) for each voxel.
File format issues for betas﻿
The files that contain NSD betas are very large. The units of the prepared betas are percent signal change. However, for some of ther NSD data files that we have prepared, the betas have been multiplied by 300 and converted to int16 format to reduce space usage. Upon loading the beta files, the values should be immediately converted back to percent signal change by casting to decimal format (e.g. single or double) and dividing by 300.
 
For volume-based format of the betas, two versions have been prepared:
NIFTI (.nii.gz). These data are in int16 format. A liberal brain mask has been applied such that non-brain voxels have been zeroed-out in order to save disk space. The .gz indicates that the files are compressed (to save disk space). The advantage of .nii.gz format is that it is standard and easy-to-use, but the disadvantage is that the files must be uncompressed when loading and must be completely loaded into memory.
HDF5 (.hdf5). These data are in int16 format. '/betas' is X voxels x Y voxels x Z voxels x 750 trials. A liberal brain mask has been applied such that non-brain voxels have been zeroed-out. This file is in HDF5 format (with a specific chunk size of [1 1 1 750]) in order to enable very fast random access to small parts of the data file. A disadvantage of this format is that the file is uncompressed and therefore large in size.
 
Here is an example of how to use MATLAB to quickly load all 750 single-trial betas associated with 5 voxels from a single scan session, using h5read.m.
data = h5read('betas_session01.hdf5','/betas',[10 10 10 1],[1 1 5 750]);
Note that these are 1-indexed (due to MATLAB’s convention), and hence we are loading the 10th, 11th, 12th, 13th, and 14th voxels along the third dimension.
Results of a simple ON-OFF GLM﻿
Besides the single-trial GLM, the NSD were also analyzed with a simple ON-OFF GLM in order to derive some useful quantities.
﻿
nsddata/ppdata/subjAA/func*/onoffbeta_sessionBB.nii.gz
﻿
This is the beta (in percent signal change units) obtained, for session BB, for a simple GLM model that describes experiment-related variance with a simple ON-OFF predictor (one condition, canonical HRF).
﻿
subj01/func1mm/onoffbeta_session10.nii.gzsubj01/func1mm/onoffbeta_session10.nii.gz
﻿
nsddata/ppdata/subjAA/func*/onoffbeta.nii.gz
﻿
This is the average (using nanmean.m) of the onoffbeta across all sessions.
 
nsddata/ppdata/subjAA/func*/R2_sessionBB.nii.gz
﻿
This is the voxel-wise variance explained (0-100) for the simple ON-OFF GLM model for session BB.
﻿
subj01/func1mm/R2_session10.nii.gzsubj01/func1mm/R2_session10.nii.gz
﻿
nsddata/ppdata/subjAA/func*/R2.nii.gz
nsddata/freesurfer/subjAA/label/[lh,rh].R2.mgz
﻿
This is the voxel-wise variance explained, averaged across all sessions (using nanmean.m).
Results of single-trial GLM﻿
For single-trial GLM, we analyzed the time-series data from the NSD experiment using 3 different GLM models. The identifiers for these models are:
betas_assumehrf (beta version 1; b1) - GLM in which a canonical HRF is used
betas_fithrf (beta version 2; b2) - GLM in which the HRF is estimated for each voxel
betas_fithrf_GLMdenoise_RR (beta version 3; b3) – GLM in which the HRF is estimated for each voxel, the GLMdenoise technique is used for denoising, and ridge regression is used to better estimate the single-trial betas.
﻿
The interpretation of the betas obtained from these GLMs is that they are the BOLD response amplitudes evoked by each stimulus trial relative to the baseline signal level present during the absence of a stimulus (“gray screen”). Note that betas are expressed in percent signal change by dividing the full set of amplitudes obtained for a voxel by the grand mean intensity observed for that voxel in a given scan session and then multiplying by 100. 
 
Betas are provided both in the subject-native volume spaces (func1mm and func1pt8mm), a subject-native surface space (nativesurface) as well as in group spaces (fsaverage and MNI). Details on the nativesurface and group spaces are provided later.
 
Note that to save disk space, the ‘betas_assumehrf’ version is provided for the func1pt8mm space but not for the func1mm space.
 
nsddata_betas/ppdata/subjAA/func*/betas_*/betas_sessionBB.[nii.gz,hdf5]
 
These are single-trial betas (that have been multiplied by 300 and converted to integer format). The betas are in chronological order. There are 750 betas since there are 750 stimulus trials in each scan session (after concatenating all 12 runs). The betas correspond to the data acquired in session BB for subject AA.
 
nsddata_betas/ppdata/subjAA/func*/betas_*/meanbeta.nii.gz
nsddata_betas/ppdata/subjAA/func*/betas_*/meanbeta_sessionBB.nii.gz
 
For each session, the mean of all single-trial betas is calculated (meanbeta_sessionBB); then, this mean is averaged across all scan sessions (meanbeta). The result is a volume that indicates the voxel-wise average single-trial beta obtained for subject AA. (Please note that although the file format is single, the values must still be divided by 300 in order to return to percent signal change units.)
﻿
subj05/func1mm/betas_fithrf/meanbeta.nii.gzsubj05/func1mm/betas_fithrf/meanbeta.nii.gz
﻿
nsddata_betas/ppdata/subjAA/func*/betas_*/R2.nii.gz
nsddata_betas/ppdata/subjAA/func*/betas_*/R2_sessionBB.nii.gz
 
This contains the variance explained by the GLM model in each session (R2_sessionBB), and the average of this quantity across all sessions (R2). Please note that the R2 values for the ‘betas_assumehrf’ and ‘betas_fithrf’ models are probably not very useful given that these models are very flexible and can essentially fit nearly all of the variance in a given time-series (even if the time-series has no reliable stimulus-evoked responses). In contrast, the R2 for the ‘betas_fithrf_GLMdenoise_RR’ may be useful given that the ridge-regression regularization does shrink the model according to the response reliability that appears to be in the data for each given voxel. NaNs are possible in R2_sessionBB.nii.gz for invalid voxels. For R2.nii.gz, we compute the mean using nanmean.
﻿
subj05/func1mm/betas_fithrf_GLMdenoise_RR/R2.nii.gzsubj05/func1mm/betas_fithrf_GLMdenoise_RR/R2.nii.gz
﻿
nsddata_betas/ppdata/subjAA/func*/betas_*/R2run_sessionBB.nii.gz
 
This contains the variance explained by the GLM model calculated separately for each run in a given session.
 
nsddata_betas/ppdata/subjAA/func*/betas_*/HRFindex_sessionBB.nii.gz
nsddata_betas/ppdata/subjAA/func*/betas_*/HRFindexrun_sessionBB.nii.gz
 
Index of the chosen HRF for each voxel (integer between 1 and 20). This is estimated for each run in a session (HRFindexrun_sessionBB). The final HRF used to analyze the entire session of data is determined by combining results across runs (HRFindex_sessionBB).
﻿
subj05/func1mm/betas_fithrf_GLMdenoise_RR/HRFindex_session15.nii.gzsubj05/func1mm/betas_fithrf_GLMdenoise_RR/HRFindex_session15.nii.gz
﻿
nsddata_betas/ppdata/subjAA/func*/betas_*/FRACvalue_sessionBB.nii.gz
 
The fractional regularization level chosen for each voxel. Note that invalid voxels (e.g. outside of brain) are given a fraction of 1.
﻿
subj05/func1mm/betas_fithrf_GLMdenoise_RR/FRACvalue_session15.nii.gzsubj05/func1mm/betas_fithrf_GLMdenoise_RR/FRACvalue_session15.nii.gz
Single-trial GLM results in nativesurface format 
nsddata_betas/ppdata/subjAA/nativesurface/betas_*/[lh,rh].betas_sessionBB.hdf5
 
These files contain betas in the native FreeSurfer surface space for a given subject. They are saved in .hdf5 format to allow for very rapid access to subsets of the available vertices.
 
To generate these betas, we take the 1-mm subject-native volume betas, resample via cubic interpolation onto the subject-native cortical surfaces (which exist at 3 different depths), and average the resulting betas across depths. The resulting matrices have dimensions vertices x trials (and are separated by hemisphere).
 
Note that the betas are saved in int16 format and are multiplied by 300. In the case of missing data in a given scan session (i.e., due to head motion, a spatial location is moved out of the imaging field-of-view), it is possible that vertices have their betas set to all zeros. (There are very few instances where data are missing for cortical surface vertices; see  nsddata/information/knowndataproblems.txt  for more information. To detect such cases, one can simply check in each scan session whether all betas for a given vertex are equal to 0.) The ‘ChunkSize’ for the .hdf5 files is [1 T] where T is the total number of trials; this makes loading of all of the trials for single vertex (or small group of vertices) very fast.
 
Here is an example of how to use MATLAB to quickly load all 750 single-trial betas associated with the first 100 vertices from a single scan session.
 
data = h5read(‘lh.betas_session01.hdf5','/betas',[1 1],[100 750]);
 
Note that the indices in MATLAB are 1-based.
Single-trial GLM results in group spaces (fsaverage, MNI) 
The primary advantage of the subject-native spaces is that they provide the highest-resolution version of the NSD data. However, group analyses may be of interest, and one may want to transform the NSD data to group spaces prior to analysis. (Note that in theory, one can perform analyses of subject-native data and then transform to group spaces at the very end of the analysis process; this will likely give similar but not identical results.)
 
The group space versions of the betas are obtained by taking the betas in the subject-native 1-mm volume space and then resampling the betas to the group spaces (more details on the resampling procedures for fsaverage and MNI is provided below). Thus, there is some additional interpolation (and loss of resolution) inherent in the group-space betas.
 
nsddata_betas/ppdata/subjAA/fsaverage/betas_*/[lh,rh].betas_sessionBB.mgh
 
These files contain betas in the FreeSurfer fsaverage space. To generate these betas, we start with the subject-native surface format (i.e. take the 1-mm subject-native volume betas, resample via cubic interpolation onto the subject-native cortical surfaces (which exist at 3 different depths), average the resulting betas across depths), but then we additionally map via nearest-neighbor interpolation to the fsaverage surface. Note that the betas are saved in decimal format and are in percent signal change units (i.e. they are not multiplied by 300). In the case of missing data, it is possible that betas will have NaNs. Here is a simple example: 
a1 = load_mgh('lh.betas_session04.mgh');
>> size(a1)
ans =
      163842           1           1         750
 
nsddata_betas/ppdata/subjAA/MNI/betas_fithrf/betas_sessionBB.nii.gz
 
These files contain betas in MNI space. To generate these betas, we take the 1-mm subject-native volume betas and resample them via cubic interpolation into MNI space. Note that values are in int16 format and are multiplied by 300. Note that voxels with invalid data for a given scan session (either because data was missing from the subject-native volume or because the location is outside of the subject-native brain mask) will have their betas set to all zeros. Finally, note that to save disk space, we provide only the ‘betas_fithrf’ version of the betas in MNI space (we do not include ‘betas_assumehrf’ nor ‘betas_fithrf_GLMdenoise_RR’).
 
nsddata_betas/ppdata/subjAA/MNI/betas_fithrf/valid_sessionBB.nii.gz
 
These files correspond to betas_sessionBB.nii.gz and indicate which voxels contain valid data for each given scan session.
Noise ceiling 
Noise ceiling estimates have been computed based on the trial-to-trial reliability of the beta weights. In essence, the more repeatable the response across repeated presentations of an image, the more variance in the response can be attributed to a stimulus-related signal. These noise ceiling estimates are useful for putting an upper bound on the amount of variance that can be explained/predicted in a given voxel’s (or vertex’s) beta weights. Formal description of the statistical theory behind the noise ceiling calculation can be found in the NSD data paper.
 
nsddata_betas/ppdata/subjAA/func*/betas_*/ncsnr.nii.gz
nsddata_betas/ppdata/subjAA/fsaverage/betas_*/[lh,rh].ncsnr.mgh
nsddata_betas/ppdata/subjAA/nativesurface/betas_*/[lh,rh].ncsnr.mgh
﻿
These files provide the noise ceiling signal-to-noise ratio (ncsnr) for each voxel (or vertex). These ncsnr values are computed on basis of all of the beta weights obtained in all NSD scan sessions. Values are generally between 0 and 0.6 but can go higher (a subset of voxels/vertices will be exactly 0, and this is expected behavior given the nature of the procedure). Invalid voxels (e.g. outside the brain) are given a value of NaN. The ncsnr can be easily converted into noise ceilings (see below). The "ncsnr_split1" and "ncsnr_split2" files reflect calculations of the ncsnr value from two independent splits of the images available for each given subject.
﻿
subj05/func1mm/betas_fithrf_GLMdenoise_RR/ncsnr.nii.gzsubj05/func1mm/betas_fithrf_GLMdenoise_RR/ncsnr.nii.gz
Conversion of ncsnr to noise ceiling percentagesIn the NSD data paper, we explain that the noise ceiling (NC) can be expressed as:
﻿
Add a caption...
where sigma_signal is the standard deviation of the signal and sigma_noise is the standard deviation of the noise. But how can this be computed based on knowledge of the noise ceiling signal-to-noise ratio (ncsnr)? Before deriving that result, consider the fact that the user may wish to average together responses across several trials conducted for each image. By averaging, the user is effectively reducing the variance of the noise. Since we are assuming that the noise is Gaussian-distributed, the effective noise variance becomes:
﻿
Add a caption...
where n is the number of trials that are averaged together. We can now re-write the noise ceiling as:
﻿
Add a caption...
Dividing the numerator and denominator by sigma_noise2, we obtain
﻿
Add a caption...
which further reduces to
﻿
Add a caption...
This shows how the noise ceiling for a given voxel can be computed from its ncsnr value.
﻿
One complication is that one might be using a preparation of the data in which different images have different numbers of trials that are averaged together. To flexibly deal with any potential scenario, we can use a weighted average to pool variance estimates across different images and re-write the noise ceiling equation as:
﻿
Add a caption...
where A is the number of data points that reflect 3 trials, B is the number of data points that reflect 2 trials, and C is the number of data points that reflect 1 trial. With some algebra, we can then re-write the noise ceiling equation as follows:
﻿
Add a caption...
Notice that this equation is simply a more general version of the earlier noise ceiling equation.
Technical notesThe b3 betas are appropriate only for brain regions where there is some expectation that the BOLD response will be consistent across the repetitions of a given image. This is because the regularization level is based on cross-validation of responses to image repetitions.
Ridge regression tends to shrink betas and therefore induces bias for percent signal change to be closer to 0. We apply a post-hoc scale and offset to the b3 betas to approximately match what is observed for unregularized betas (see NSD data paper for details). If absolute units of percent signal change are of specific interest, the betas_fithrf (b2) preparation is more straightforward to interpret and is therefore recommended for use instead.
If one seeks to perform connectivity-based analyses that look for correlations in betas across voxels (or regions), there may be large differences in results comparing b1 and b2 against b3. The general expectation is that the GLMdenoise procedure (which is incorporated as part of b3) will tend to remove global signal correlations that may exist in the fMRI data. 
In the ncsnr values that are provided, there are occasionally high values outside the brain; this is likely an artifact due to an interaction between the fact that imaging artifacts tend to have low temporal frequencies and the specific temporal distribution of repeated trials in the NSD experiment.
﻿