MVPA and related issues

Basic version of what MVPA is:

  • Vanilla MVPA: run a "searchlight" across the brain and, at each location, apply some multivariate classifier to assess where in the brain there is information about some experimental manipulation of interest
  • MVPA is heavily used in cognitive neuroscience
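
The searchlight idea can be sketched in a few lines. The example below is a toy 1-D version on synthetic data (real analyses use a 3-D spherical searchlight over actual fMRI volumes, e.g. with nilearn); the trial counts, signal strength, and informative-voxel locations are all made up for illustration.

```python
# Toy 1-D "searchlight" on synthetic data (illustrative only; real analyses
# would slide a 3-D sphere over fMRI volumes, e.g. nilearn's SearchLight).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels, radius = 120, 50, 2
y = rng.integers(0, 2, n_trials)             # two condition labels per trial
X = rng.normal(size=(n_trials, n_voxels))    # noise everywhere...
X[:, 20:25] += y[:, None] * 1.5              # ...except voxels 20-24 carry signal

# Slide a window across voxels; score each center with 5-fold cross-validation.
scores = np.full(n_voxels, np.nan)
for center in range(radius, n_voxels - radius):
    window = X[:, center - radius:center + radius + 1]
    scores[center] = cross_val_score(LinearSVC(), window, y, cv=5).mean()
```

Centers near the informative voxels score well above the 50% chance level, while pure-noise centers hover around chance. Note that any window that merely overlaps the informative voxels will also score above chance, which matters for interpreting searchlight maps.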

Terminology

  • MVPA: multivariate pattern analysis. A.k.a. multivoxel pattern analysis, pattern classification, decoding, multivariate decoding, etc. One (very broad) definition is the use of multiple brain units simultaneously in some analysis. So this would actually include things like PLS and PCA!
  • classification: supervised learning problem where the dependent variable is discrete in nature. (x (independent variables, features, etc.) ⟶ y (dependent variable, classes, etc.))
  • multivariate: referring to the fact that usually you have more than one x (i.e. more than one feature)
  • searchlight: a metaphor for a moving window that selects a local neighborhood of neurons/voxels
  • cross-validation: the act of testing some estimated "model" on some new unseen data
  • overfitting: a model with too much flexibility can capture variance in a given set of data "too well" (in a pejorative sense)... i.e. it fits idiosyncratic noise and will fail to generalize
  • voxel selection: the experimenter has to decide somehow what "features" (e.g. voxels) to use in some given analysis. Oftentimes, voxel selection is well-intentioned: for example, you wouldn't want to include white matter voxels.
  • noise: any unknown variability of brain responses across, e.g., trials. Noise is the reason that MVPA is challenging. Anything that affects SNR (broadly construed) is going to show up in your MVPA results.
  • confound: NOT noise; something that is reliably correlated/associated with some other thing that you actually care about.
  • classifiers: there are a wide variety of different "algorithms" (e.g. LDA, SVM, nearest-neighbor, Naive Bayes, split-half correlation) that can attempt to solve a given classification problem
  • train / test split: Typically, one reserves some amount of data for cross-validation (test data); the remainder of the data is used to estimate parameters (training data). Note that there may be other similar terms out there (validation, out-of-sample, novel data, testing, etc.). A common way to perform a train/test split is n-fold cross-validation where the data are split into n equal parts and then the cross-validation is done n times, each time holding out one of the parts.
  • representational similarity analysis: A set of analysis approaches based on quantifying the "similarity structure" across experimental conditions into a RDM (representational dissimilarity matrix). An RDM is like MVPA on steroids (e.g. do MVPA for all pairs of conditions).
  • information: MVPA is often described as a way to establish that there exists information in such and such brain region.
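
Several of these terms (classification, cross-validation, train/test split, chance level) can be made concrete in a short sketch. The data below are synthetic and purely illustrative; the choice of LDA and 5-fold stratified splitting is just one reasonable configuration among many.

```python
# Hedged sketch of the core ingredients: a supervised classifier, an n-fold
# train/test split, and accuracy relative to chance. Data are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
n_trials, n_features = 100, 10
y = np.repeat([0, 1], n_trials // 2)          # condition labels (classes)
X = rng.normal(size=(n_trials, n_features))   # "voxel" responses (features)
X[y == 1, :3] += 1.0                          # only 3 features carry information

# n-fold cross-validation: each fold is held out once as unseen test data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=cv).mean()
# accuracy well above the 50% chance level suggests decodable information
```

Testing only on held-out folds is what guards against the overfitting described above: a flexible model can fit the training data arbitrarily well, so only unseen data reveal whether it generalizes.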

Resources:

  • Searchlight analysis: Promise, pitfalls, and potential
  • Etzel2011
  • Pereira2009 (a good primer on MVPA)
  • MVPA meanderings
  • Kendrick has a few slides from a long time ago
  • Resolving Ambiguities of MVPA Using Explicit Models of Representation
  • Encoding and decoding in fMRI
  • Cross-validation and permutations in MVPA
  • NSD Abu Dhabi materials include some stuff

Software tools:

  • scikit-learn, nilearn
  • MATLAB Stats Toolbox? (LDA, SVM)
  • custom home grown code?
  • libsvm (an industry-standard C-based SVM library...)
  • There is certainly a zoo of "neuroimaging"-specific toolboxes out there...
  • AFNI 3dsvm
  • PyMVPA
  • The Decoding Toolbox (MATLAB)

Motivation

  • Typical reasons:
  • MVPA is more sensitive than other analyses (e.g. univariate (one-voxel-at-a-time) analyses).
  • Because it seems cool and fancy
  • Information in the brain might importantly be coded somehow in many units (neurons/voxels/etc.)
  • It may provide a more effective characterization of what a brain area "does".
  • Engineering. MVPA can be a useful way to get some outcome (e.g. brain-machine interfaces). Like, decoding as the actual goal of an analysis.
  • Prediction/machine learning. E.g., learn reliable biomarkers of some phenotype.
  • As a first step towards assessing a system: Like, first establish if a given brain region has reliable selectivity/tuning before doing more detailed analyses
  • Often MVPA implies an obsession with percent correct.
  • RSA often does not have that connotation.
  • MVPA is not necessarily a theory of brain. Keep in mind the difference between data analysis and a brain theory.

Machinery

  • How should we select the machinery (i.e. the classification method) for MVPA?
  • Various methods
  • Be a penguin and do what other people do
  • Think about the dimensionality of the problem and choose a method that is optimal for what you are doing
  • First, start with the goal of the analyses and then choose accordingly. Maybe the goal is to "understand the data better", which would then motivate simple-minded classification methods
  • SVM: as an off-the-shelf method, it has generally very good robustness to large numbers of features...
  • LDA: has nice interpretation properties (you get a "generative model" of the different classes). A downside is that estimating the parameters is hard (takes a lot of data)
  • Naive Bayes: has benefit of easy interpretation and can also perform well even with limited data
  • Empirically, many different classification algorithms on neuroimaging data often yield "similar" results.
  • How to split your data?
  • There are many ways, and there is not necessarily a single "right" way.
  • Some issues to consider
  • You want to try to get independent sets of data (i.e. where noise is independent across splits)
  • If you want to stress-test your model, you might want to deliberately test generalization on "hard data". The stickler / hard-ass could claim that training and testing on data that come from, e.g., the same scan session is too "easy" and demand testing on some separate scan session.
  • There are a number of statistical issues (see  🐰Modeling concepts  ).
  • There are practical time considerations too — running many splits might take too much computational time.
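
The empirical point that different classification algorithms often yield similar results is easy to check on synthetic data. The sketch below compares three of the methods mentioned above (linear SVM, LDA, Naive Bayes) on one made-up problem; the data and effect sizes are arbitrary illustrations, not a claim about any real dataset.

```python
# Compare several off-the-shelf classifiers on the same synthetic data;
# on typical problems they often land in roughly the same ballpark.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_features = 200, 30
y = rng.integers(0, 2, n_trials)
X = rng.normal(size=(n_trials, n_features))
X[:, :5] += y[:, None] * 0.8                  # weak signal in 5 of 30 features

clfs = {"SVM": LinearSVC(),
        "LDA": LinearDiscriminantAnalysis(),
        "NB": GaussianNB()}
accs = {name: cross_val_score(clf, X, y, cv=5).mean()
        for name, clf in clfs.items()}
```

All three land well above chance with only modest differences between them, which is consistent with the practical advice to start simple and let the goal of the analysis drive the choice of machinery.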

Issues

  • Debacle of "how" information is represented... Orientation in V1 and/or visual features in visual cortex. Is it fine-scale or is it coarse-scale? Radial bias, salt-and-pepper organization, etc.?
  • Voxel selection — the spatial scale of your MVPA analyses needs to be 'matched' to the relevant neural population.
  • Be careful about interpreting your searchlight result: If there is a single voxel that is informative, then ANY searchlight that includes that voxel might have good classification performance.
  • Statistical significance, i.e., how does one establish the statistical significance level for some MVPA-style analysis? A common, effective approach is permutation of, for example, condition labels.
  • Be aware of knobs/hyperparameters: there are many choices that go into any single MVPA-style analysis. Know what you are doing, and bear in mind how the choices might affect your final bottom-line results.
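
The label-permutation approach to statistical significance can be sketched as follows. The data are synthetic and the permutation count is kept small for illustration; scikit-learn also offers a built-in `permutation_test_score` that wraps this same logic.

```python
# Hedged sketch of a label-permutation test for decoding significance:
# shuffling the condition labels breaks the X-y association, giving a
# null distribution of "chance" accuracies. Data are synthetic.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_trials, n_features = 80, 20
y = rng.integers(0, 2, n_trials)
X = rng.normal(size=(n_trials, n_features))
X[:, :4] += y[:, None]                        # informative features

observed = cross_val_score(LinearSVC(), X, y, cv=5).mean()

n_perm = 200
null = np.empty(n_perm)
for i in range(n_perm):
    y_perm = rng.permutation(y)               # labels no longer match the data
    null[i] = cross_val_score(LinearSVC(), X, y_perm, cv=5).mean()

# p-value: fraction of permutations scoring at least as well as observed
# (the +1 terms give the standard small-sample-safe estimate)
p = (np.sum(null >= observed) + 1) / (n_perm + 1)
```

The null distribution also makes the chance level empirical rather than assumed, which matters when trial counts per class are unbalanced.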