Old-school fMRI analysis often amounts to looking for (and finding) blobs, which can be viewed as a qualitative analysis goal. Being quantitative about fMRI data is presumably harder than being qualitative, but also more insightful.
The overarching goal of this page is to address the general question "How do you carefully distill complex raw fMRI data into meaningful numbers?"
It is necessary to have a solid foundation for being quantitative about fMRI before we can hope to make valid comparisons across ROIs and subjects.
Part of the challenge in being quantitative is that fMRI has many confounding factors: pulse sequence parameter choices (e.g. TE), physiological drug state (e.g. caffeine), vasculature, variations in SNR, variations in head motion, and other factors that are not of intrinsic interest.
Other reasons why it's challenging to be quantitative:
fMRI generally speaking has very low SNR
voxel selection is often involved and will affect the numerical results
lots of variability across (human) subjects (some of which are neurally real; others of which are not of interest)
lots of different choices in analysis (every choice is going to have a downstream impact)
Hence, you have to know your statistical principles thoroughly and be a careful data analyst.
Quantities and units
Raw images - We might call the units of raw brain volumes "raw scanner units". In such data, the scale (gain) of the magnitudes is arbitrary: you could multiply the entire scan by a constant and nothing meaningful would change.
Pre-processed images - Here we mean preprocessing steps that are "gentle" and don't do that much to the data (e.g. registration, motion correction, correcting spatial distortion). We still are essentially in raw scanner units.
BOLD percent signal change - This is NO LONGER in raw units. This metric is invariant to the scale of the original data; it is unitless (a fraction or percentage).
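As a concrete sketch (with made-up numbers), percent signal change is typically computed per voxel relative to a baseline such as the voxel's mean over time:

```python
import numpy as np

# Hypothetical raw voxel time series in arbitrary scanner units.
raw = np.array([1000., 1005., 1020., 1015., 998., 1010.])

# Percent signal change relative to the voxel's mean over time:
psc = 100 * (raw - raw.mean()) / raw.mean()

# Invariance to gain: multiplying the raw data by a constant
# leaves PSC unchanged, because the gain cancels out.
scaled = 2.5 * raw
psc_scaled = 100 * (scaled - scaled.mean()) / scaled.mean()
print(np.allclose(psc, psc_scaled))  # True
```

Because the mean appears in both the numerator and the denominator, any multiplicative gain on the raw data cancels out.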
tSNR (temporal SNR) - mean of voxel (deemed to be "signal") divided by standard deviation over time (deemed to be "noise"). This metric is unitless and insensitive to gain. This metric is sensitive to additive offset. Often used to compare spatially across the brain (e.g. to find where tSNR is low).
Note: tSNR is sort of dumb if there are real signals in your dataset, since it treats those signals as "noise". But practically, it doesn't make a huge difference whether or not you compensate for real evoked BOLD responses, since BOLD responses are quite small (on the order of just 1-3%).
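A minimal numerical sketch (simulated data, no real signal) of tSNR's gain invariance and offset sensitivity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical voxel time series: mean signal 1000, noise sd 20.
ts = 1000 + 20 * rng.standard_normal(200)

tsnr = ts.mean() / ts.std()

# Gain-invariant: scaling the data leaves tSNR unchanged...
assert np.isclose(tsnr, (3 * ts).mean() / (3 * ts).std())

# ...but an additive offset changes it, since the offset shifts
# the mean without touching the standard deviation.
tsnr_offset = (ts + 500).mean() / (ts + 500).std()
```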
CNR (contrast-to-noise ratio).
In fMRI, we usually mean the contrast between two states (e.g. activated, deactivated). Measurements of these two states typically occur at different times.
However, it could also refer to contrast across space (that's more for anatomical analyses).
Note that TE will influence the BOLD percent signal change you observe. To get the best image SNR, one would choose a low TE (i.e. sample the signal when it is strongest). But to get the best BOLD CNR, one would not choose a maximally low TE but rather a delayed TE (closer to the tissue T2*). This example helps explain the difference between SNR (defined in the tSNR sense) and CNR.
z-scoring - For example, for each voxel's time series within a scan session, one could subtract the mean and divide by the standard deviation. We are again unitless (you could say the units are standard deviations). Note that we lose the absolute magnitude (because we've removed the mean).
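For example (toy numbers), z-scoring makes any two time series that differ only by gain and offset identical, which is exactly why the absolute magnitude is lost:

```python
import numpy as np

ts = np.array([1000., 1010., 990., 1020., 980.])

# z-score: subtract the mean, divide by the standard deviation.
z = (ts - ts.mean()) / ts.std()

# The absolute magnitude is gone: a time series with the same
# shape but a different gain and offset z-scores identically.
ts2 = 5 * ts + 200
z2 = (ts2 - ts2.mean()) / ts2.std()
print(np.allclose(z, z2))  # True
```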
t values - Difference in means normalized by the standard error of that difference. The units are standard errors (where the standard error is typically the standard deviation divided by the square root of the number of data points).
Does the t-value depend on the multiplicative scaling of the data on which it's computed? [NO]
Does the t-value depend on an additive offset introduced to the data? [NO]
What factors cause the t-value to increase or decrease? [Increasing the number of data points (e.g. more trials, or longer trial durations) increases the t-value; so does a bigger BOLD difference or reduced noise.]
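A sketch of these invariances with simulated condition responses (the two-sample t-statistic below is the unpooled, equal-n form; the pooled version behaves the same way):

```python
import numpy as np

def tval(a, b):
    # Difference in means divided by the standard error of that difference.
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

rng = np.random.default_rng(1)
a = 2.0 + rng.standard_normal(50)  # hypothetical "activated" responses
b = 1.0 + rng.standard_normal(50)  # hypothetical "baseline" responses

t = tval(a, b)
# Multiplicative scaling and shared additive offsets leave t unchanged:
assert np.isclose(t, tval(10 * a, 10 * b))
assert np.isclose(t, tval(a + 7, b + 7))
# More data shrinks the standard error, so t grows (on average) with n.
```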
MVPA percent correct - This can be viewed as a multivariate t value. Hence, it inherits the same types of issues as t values.
correlation - For example, the Pearson correlation between a boxcar predictor and the pre-processed time-series data of a voxel. Bounded between -1 and 1. We can ask the same old questions: does it depend on scaling of the data? [NO] does it depend on additive offsets on the data? [NO] Does it depend on number of trials? [NO]
Note: with more data, there WILL be some numerical fluctuation in the correlation, but its expected value is stable. In other words, with more and more data the correlation does not systematically grow; rather, we gain more and more confidence in its "true" value, and the statistical significance gets better and better.
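For instance (simulated boxcar design and voxel), the correlation's invariance to gain and offset is easy to verify:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical boxcar predictor (on/off blocks) and a noisy voxel
# time series that follows it.
boxcar = np.tile(np.r_[np.ones(10), np.zeros(10)], 10)
voxel = 3 * boxcar + rng.standard_normal(boxcar.size)

r = np.corrcoef(boxcar, voxel)[0, 1]

# Invariant to gain and additive offset on the data:
r_transformed = np.corrcoef(boxcar, 5 * voxel + 100)[0, 1]
print(np.isclose(r, r_transformed))  # True
```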
variance explained - Imagine the context of quantifying how well a computational model predicts some held-out preprocessed time-series data. How does this quantity change with amount of data in the held-out set? How about with noise in the held-out data?
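A toy answer to these questions, using R² (coefficient of determination) as the variance-explained metric: with a fixed model (here, a perfect one), more noise in the held-out data lowers variance explained, because the noise is unpredictable by definition. With larger held-out sets, R² fluctuates less, but its expected value is governed by the noise level.

```python
import numpy as np

def r_squared(y_true, y_pred):
    # 1 minus (residual sum of squares / total sum of squares).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 8 * np.pi, 400))  # hypothetical "true" response
pred = signal                                     # a perfect model

# Even a perfect model explains less variance in noisier held-out data:
r2_low_noise = r_squared(signal + 0.2 * rng.standard_normal(400), pred)
r2_high_noise = r_squared(signal + 1.0 * rng.standard_normal(400), pred)
```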
unitless metrics - like ratios, angles (e.g. the angle made by the response to A and the response to B, interpreted as a point in the 2D Cartesian plane), and indices (e.g. (A-B)/(A+B)).
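A quick sketch (hypothetical response values) showing why these metrics are unitless and gain-invariant:

```python
import numpy as np

# Hypothetical mean responses (in percent signal change) to two
# conditions A and B in some voxel.
A, B = 2.4, 0.8

# Selectivity index: bounded in [-1, 1] when A, B >= 0.
index = (A - B) / (A + B)

# Angle of the (A, B) response vector in the 2D plane (radians):
angle = np.arctan2(B, A)

# Both are invariant to a shared gain on the responses:
assert np.isclose(index, (3 * A - 3 * B) / (3 * A + 3 * B))
assert np.isclose(angle, np.arctan2(3 * B, 3 * A))
```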
What factors affect a given quantity?
Does the distance to the RF coil change the quantity?
What about scanner drift (e.g. due to gradient heating)? What does it do to your metric?
Is your analysis sensitive to a multiplicative scaling of the data?
What happens with increasing amounts of noise? "If my voxel had more noise, what would happen?"
What happens with increasing amounts of samples? "If I had more data, what would happen?"
Properties of the voxel/brain
Ideally, we are trying to quantify properties of the brain, not properties of how we measure it. We want metrics that are INSENSITIVE to boring aspects of the measurement (e.g. the amount of data, the level of noise in a subject, the level of PSC that happens to exist in a brain region, etc.).
Examples:
Anatomical things (e.g. size of the brain)
Functional metrics that are based on ratios.
Preferred visual field position (pRF center).
Independence
When performing fMRI, in what circumstances have we obtained a new sample?
What counts as a sample? You can consider trials, runs, scan session, and/or subjects.
However, VOXELS AND BRAIN AREAS ARE NOT SAMPLES! All voxels/brain areas are sampled simultaneously in the fMRI measurement. And, there are tons of "noise correlations" in a single fMRI volume.
Note that computing standard errors is not a problem per se. The problem comes when you INTERPRET those standard errors as actually indicative of the sampling variability in your measurements. If all you are doing is describing the variability of the results across voxels, that is fine. But if you are trying to assess the reliability of the results, you need to quantify variability across trials (or runs, sessions, or subjects).
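A toy simulation of this pitfall, assuming a shared per-run noise component across voxels (the "noise correlations" mentioned above): the "standard error" computed across voxels can wildly understate the true run-to-run sampling variability.

```python
import numpy as np

rng = np.random.default_rng(4)
n_voxels, n_runs = 100, 40

# Hypothetical measurements: every voxel shares a common per-run
# noise term, plus a smaller independent per-voxel noise term.
shared = rng.standard_normal((1, n_runs))           # one value per run
private = 0.5 * rng.standard_normal((n_voxels, n_runs))
data = 1.0 + shared + private                       # true mean = 1.0

# "Standard error" treating voxels as samples, within a single run.
# The shared noise term is constant across voxels, so it vanishes
# from the spread, making this number deceptively small.
se_across_voxels = data[:, 0].std(ddof=1) / np.sqrt(n_voxels)

# Actual run-to-run variability of the across-voxel mean:
run_means = data.mean(axis=0)
sd_across_runs = run_means.std(ddof=1)
```

Here `sd_across_runs` comes out an order of magnitude larger than `se_across_voxels`: the across-voxel standard error describes spatial spread within one measurement, not the reliability of the measurement.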