RSA noise ceilings

  • RSA noise ceiling
  • define where the noise comes from: variability across repeated trials of the same stimulus. (note: this treats potentially interesting cognitive effects (e.g. attention, memory) as "noise".)
  • if you have infinite trials, then the RDM has no noise, and so noise ceiling is 100%
  • (we are not talking about group-average noise ceilings)
  • the challenge is that noise is not independent from voxel to voxel
  • hence, we have to model/characterize what this noise is!
  • resting-state
  • spontaneous activity has structure (i.e. correlations) across voxels / brain areas
  • also shows up in the "residuals" after removing "task-evoked" responses
  • noise correlations
  • traditional topic in computational neuroscience
  • trial-to-trial variability for a fixed stimulus exhibits weak pairwise correlations across neurons
  • connection to MVPA
  • LDA explicitly accommodates noise covariance; if you have enough data to estimate the covariance, this can lead to better classification performance.
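A minimal numpy sketch (a hypothetical 2-voxel example, not from the notes) of why accommodating noise covariance helps: with strongly correlated noise, the Fisher/LDA direction cov^{-1}(mu1 - mu2) beats a correlation-blind (diagonal-covariance) classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical 2-voxel example: two conditions, strongly correlated noise
mu_a = np.array([0.0, 0.0])
mu_b = np.array([1.0, 0.0])
rho = 0.95
noise_cov = np.array([[1.0, rho], [rho, 1.0]])

def lda_weights(mu1, mu2, cov):
    # Fisher discriminant direction: cov^{-1} (mu1 - mu2)
    return np.linalg.solve(cov, mu1 - mu2)

def accuracy(w, n=20000):
    # simulate test trials from each class; classify by projection vs midpoint
    L = np.linalg.cholesky(noise_cov)
    xa = mu_a + rng.standard_normal((n, 2)) @ L.T
    xb = mu_b + rng.standard_normal((n, 2)) @ L.T
    thresh = w @ (mu_a + mu_b) / 2
    return (np.sum(xa @ w < thresh) + np.sum(xb @ w > thresh)) / (2 * n)

acc_full = accuracy(lda_weights(mu_b, mu_a, noise_cov))                    # uses full noise covariance
acc_diag = accuracy(lda_weights(mu_b, mu_a, np.diag(np.diag(noise_cov))))  # ignores noise correlations
print(acc_full, acc_diag)
```

Analytically, accuracy is Phi(d'/2) with d'^2 = dmu' cov^{-1} dmu, so the full-covariance classifier lands near 94% while the diagonal one lands near 69%.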
  • signal correlations
  • in the absence of noise, there are likely correlations across units in terms of their representation (e.g. nearby V1 neurons have correlated orientation tuning curves)
  • people are interested in whether noise correlations follow signal correlations
  • if you compute correlations between two units using different trials, this removes noise correlations and helps isolate signal correlations
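A hedged simulation of that cross-trial trick (all parameters hypothetical): two units with correlated tuning (signal) and correlated trial noise. Correlating the two units within the same trial mixes signal and noise correlations; correlating across different trials keeps only the signal correlation, since trial noise is independent across trials.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stim = 5000

# signal: two units with correlated tuning across stimuli
s = rng.standard_normal(n_stim)               # shared tuning component
sig1 = s + 0.5 * rng.standard_normal(n_stim)
sig2 = s + 0.5 * rng.standard_normal(n_stim)

def one_trial():
    shared = rng.standard_normal(n_stim)      # noise shared by both units on this trial
    r1 = sig1 + shared + 0.3 * rng.standard_normal(n_stim)
    r2 = sig2 + shared + 0.3 * rng.standard_normal(n_stim)
    return r1, r2

rA1, rA2 = one_trial()
rB1, rB2 = one_trial()

corr_same = np.corrcoef(rA1, rA2)[0, 1]   # same trial: signal + noise correlation
corr_cross = np.corrcoef(rA1, rB2)[0, 1]  # different trials: signal correlation only
print(corr_same, corr_cross)
```

With these settings the same-trial correlation sits well above the cross-trial one, and the cross-trial value approximates the (attenuated) signal correlation.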
  • PCA
  • 'greedy' approach to finding directions of shared variance
  • importantly, it is influenced (heavily) by noise correlations
  • PCA is matched well to multivariate Gaussian distributions!
  • cross-validated PCA idea (covariance computed from distinct trials)
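A sketch of the cross-validated PCA idea under an assumed rank-1 signal plus isotropic noise (all numbers hypothetical): computing the covariance between two independent trials of the same stimuli lets the noise average out, so the cross-trial covariance recovers the signal covariance while the within-trial covariance is inflated by noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n_stim, n_vox = 2000, 10

# rank-1 signal (variance 4 along direction u) plus unit-variance isotropic noise
u = np.ones(n_vox) / np.sqrt(n_vox)
signal = rng.standard_normal((n_stim, 1)) * 2.0 @ u[None, :]
trialA = signal + rng.standard_normal((n_stim, n_vox))
trialB = signal + rng.standard_normal((n_stim, n_vox))

def centered(X):
    return X - X.mean(axis=0)

# within-trial covariance: contains signal + noise
cov_within = centered(trialA).T @ centered(trialA) / (n_stim - 1)
# cross-trial covariance: noise is independent across trials and averages out
C = centered(trialA).T @ centered(trialB) / (n_stim - 1)
cov_cross = (C + C.T) / 2   # symmetrize before eigendecomposition

evals_within = np.linalg.eigvalsh(cov_within)
evals_cross = np.linalg.eigvalsh(cov_cross)
print(evals_within[-1], evals_cross[-1])   # top eigenvalues
```

Here the within-trial top eigenvalue comes out near 5 (signal 4 + noise 1) while the cross-trial top eigenvalue comes out near the true signal variance of 4.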
  • condition number
  • condition number of a matrix is the ratio between maximal and minimal singular values
  • deep connection to experimental design, regression stability, etc.
  • note: if data are perfectly confined to a subspace, the condition number is infinite
  • in a sense, the lower the condition number, the better (inversion is more stable)
  • if there are fewer data points than dimensions, the covariance matrix is singular (i.e. condition number is infinite)
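The two regimes above can be checked directly in numpy (sizes here are arbitrary illustrations): with many more observations than dimensions the covariance is well conditioned; with fewer observations than dimensions it is singular and the condition number blows up.

```python
import numpy as np

rng = np.random.default_rng(3)

# well-sampled case: many more observations than dimensions
X = rng.standard_normal((1000, 5))
cond_good = np.linalg.cond(np.cov(X, rowvar=False))

# undersampled case: fewer observations than dimensions -> singular covariance
Y = rng.standard_normal((4, 10))
cond_bad = np.linalg.cond(np.cov(Y, rowvar=False))

print(cond_good, cond_bad)   # modest vs. astronomically large (infinite up to round-off)
```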
  • covariance
  • X: m observations x n voxels [a la functional connectivity]
  • X: m data points x n dimensions [a la clustering]
  • X: m voxels x n stimuli [a la RSA]
  • cov(X) is proportional to X'*X (up to mean removal). it is basically a bunch of (co)variances.
  • note that cov.m removes the mean of each column
  • sigma = sqrt(det(cov(X))) equals the sqrt of the product of the eigenvalues of cov(X), and is (up to a constant) the "hypervolume" of the data ellipsoid.
  • when m <= n, sigma is 0: mean removal leaves rank at most m-1, so at least one eigenvalue is 0 and the condition number is infinite!
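A quick numpy check of both claims (sizes and standard deviations are arbitrary illustrations): sqrt(det(cov)) tracks the product of the per-dimension spreads when m >> n, and the determinant collapses to 0 when m <= n.

```python
import numpy as np

rng = np.random.default_rng(4)

# m >> n: sqrt(det(cov)) acts as a "hypervolume" of the data ellipsoid
X = rng.standard_normal((500, 3)) * np.array([1.0, 2.0, 3.0])  # independent dims, sds 1, 2, 3
C = np.cov(X, rowvar=False)
sigma = np.sqrt(np.linalg.det(C))          # expect roughly sqrt(1 * 4 * 9) = 6
prod_eig = np.prod(np.linalg.eigvalsh(C))  # det equals the product of eigenvalues

# m <= n: after mean removal the rank is at most m-1, so det collapses to 0
Y = rng.standard_normal((3, 3))
det_small = np.linalg.det(np.cov(Y, rowvar=False))
print(sigma, det_small)
```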
  • statistical approach
  • key concept is the multivariate Gaussian distribution
  • if you sum two Gaussian density functions, you do not necessarily get a Gaussian (that is a mixture). but if you draw samples from two Gaussian distributions and sum the samples, the result is again Gaussian-distributed. Gaussian distributions summate!
  • let's assume the signal is distributed according to a multivariate Gaussian (i.e. natural images in voxel space)
  • let's assume the noise is distributed according to a multivariate Gaussian (and is the same, regardless of the stimulus)
  • approach: build 'generative' model of multivoxel responses. then everything follows.
  • same general strategy as the univariate 'encoding model' noise ceiling (as laid out in the NSD data paper): estimate the (co)variance of the full data distribution; subtract the signal estimate and estimate the (co)variance of the noise distribution; subtract the two, producing an estimate of the (co)variance of the signal distribution.
  • steps:
    1. estimate signal and noise distributions from empirical data
       1a. estimate the full data distribution
       1b. subtract the signal estimate and estimate the noise distribution
       1c. subtract the noise distribution from the full data distribution and estimate the signal distribution
    2. use Monte Carlo simulations to derive the expected noise ceiling for a given analysis flavor
    3. use PCA to investigate the nature of the signal distribution and of the noise distribution, and more!
  • estimating covariance matrices is hard when data are limited
  • shrinkage
  • regularize the covariance estimate
  • check how well an estimated covariance matrix generalizes to left-out data (cross-validation)
  • shrinking the off-diagonal elements to 0 tends to make the cloud more ball-like and the condition number smaller (i.e. better)
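A sketch of that shrinkage-plus-cross-validation recipe (ground truth, sample sizes, and the candidate shrinkage levels are all hypothetical): blend the sample covariance toward its diagonal, and pick the blend weight that maximizes Gaussian log-likelihood on held-out data.

```python
import numpy as np

rng = np.random.default_rng(7)
n_vox = 20

# hypothetical ground truth with real correlations; few samples to estimate from
A = rng.standard_normal((n_vox, n_vox))
true_cov = A @ A.T / n_vox
train = rng.multivariate_normal(np.zeros(n_vox), true_cov, size=30)
test = rng.multivariate_normal(np.zeros(n_vox), true_cov, size=2000)

S = np.cov(train, rowvar=False)   # raw estimate: ill-conditioned (30 samples, 20 dims)

def shrink(S, lam):
    # shrink off-diagonal elements toward 0
    return (1 - lam) * S + lam * np.diag(np.diag(S))

def gauss_loglik(C, X):
    # average log-likelihood of held-out data under N(0, C)
    sign, logdet = np.linalg.slogdet(C)
    quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(C), X)
    return -0.5 * (logdet + quad.mean() + X.shape[1] * np.log(2 * np.pi))

lams = np.linspace(0, 1, 21)
scores = [gauss_loglik(shrink(S, lam), test) for lam in lams]
best_lam = lams[int(np.argmax(scores))]
print(best_lam)
```

In this setting some intermediate shrinkage beats both the raw estimate (too noisy) and the pure diagonal (discards real correlations), and the shrunken matrix has a smaller condition number than the raw one.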
  • scientific applications (what good is all this technical stuff for?)
  • trustable RSA noise ceilings will help assessment of model performance
  • corrected PCA results might yield interesting/meaningful dimensions (e.g. removing the influence of noise correlations might reveal structure that is otherwise hidden)
  • better understanding of whether the structure of trial-to-trial variability (resting-state-like activity) matches that of stimulus-evoked activity?
  • todo:
  • more testing, more thinking
  • simulations to verify correctness and to check accuracy of estimates