RSA noise ceilings

  • RSA noise ceiling
  • define where the noise comes from: variability across repeated trials of the same stimulus. (note: this treats potentially interesting cognitive effects (e.g. attention, memory) as "noise".)
  • if you have infinite trials, then the RDM has no noise, and so noise ceiling is 100%
  • (we are not talking about group-average noise ceilings)
  • the challenge is that noise is not independent from voxel to voxel
  • hence, we have to model/characterize what this noise is!
  • resting-state
  • spontaneous activity has structure (i.e. correlations) across voxels / brain areas
  • also shows up in the "residuals" after removing "task-evoked" responses
  • noise correlations
  • traditional topic in computational neuroscience
  • trial-to-trial variability for a fixed stimulus exhibits weak pairwise correlations across neurons
  • connection to MVPA
  • LDA explicitly accommodates noise covariance; if you have enough data to estimate the covariance, this can lead to better classification performance.
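A minimal numpy sketch (a hypothetical 2-voxel example, not from the notes) of why accommodating noise covariance helps: with strongly correlated noise, the Fisher/LDA direction cov^{-1}(mu1 - mu2) beats a correlation-blind (diagonal-covariance) classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical 2-voxel example: two conditions, strongly correlated noise
mu_a = np.array([0.0, 0.0])
mu_b = np.array([1.0, 0.0])
rho = 0.95
noise_cov = np.array([[1.0, rho], [rho, 1.0]])

def lda_weights(mu1, mu2, cov):
    # Fisher discriminant direction: cov^{-1} (mu1 - mu2)
    return np.linalg.solve(cov, mu1 - mu2)

def accuracy(w, n=20000):
    # simulate test trials from each class; classify by projection vs midpoint
    L = np.linalg.cholesky(noise_cov)
    xa = mu_a + rng.standard_normal((n, 2)) @ L.T
    xb = mu_b + rng.standard_normal((n, 2)) @ L.T
    thresh = w @ (mu_a + mu_b) / 2
    return (np.sum(xa @ w < thresh) + np.sum(xb @ w > thresh)) / (2 * n)

acc_full = accuracy(lda_weights(mu_b, mu_a, noise_cov))                    # uses full noise covariance
acc_diag = accuracy(lda_weights(mu_b, mu_a, np.diag(np.diag(noise_cov))))  # ignores noise correlations
print(acc_full, acc_diag)
```

Analytically, accuracy is Phi(d'/2) with d'^2 = dmu' cov^{-1} dmu, so the full-covariance classifier lands near 94% while the diagonal one lands near 69%.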
  • signal correlations
  • in the absence of noise, there are likely correlations across units in terms of their representation (e.g. nearby V1 neurons have correlated orientation tuning curves)
  • people are interested in whether noise correlations follow signal correlations
  • if you compute correlations between two units using different trials, this removes noise correlations and helps isolate signal correlations
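A hedged simulation of that cross-trial trick (all parameters hypothetical): two units with correlated tuning (signal) and correlated trial noise. Correlating the two units within the same trial mixes signal and noise correlations; correlating across different trials keeps only the signal correlation, since trial noise is independent across trials.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stim = 5000

# signal: two units with correlated tuning across stimuli
s = rng.standard_normal(n_stim)               # shared tuning component
sig1 = s + 0.5 * rng.standard_normal(n_stim)
sig2 = s + 0.5 * rng.standard_normal(n_stim)

def one_trial():
    shared = rng.standard_normal(n_stim)      # noise shared by both units on this trial
    r1 = sig1 + shared + 0.3 * rng.standard_normal(n_stim)
    r2 = sig2 + shared + 0.3 * rng.standard_normal(n_stim)
    return r1, r2

rA1, rA2 = one_trial()
rB1, rB2 = one_trial()

corr_same = np.corrcoef(rA1, rA2)[0, 1]   # same trial: signal + noise correlation
corr_cross = np.corrcoef(rA1, rB2)[0, 1]  # different trials: signal correlation only
print(corr_same, corr_cross)
```

With these settings the same-trial correlation sits well above the cross-trial one, and the cross-trial value approximates the (attenuated) signal correlation.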
  • PCA
  • 'greedy' approach to finding directions of shared variance
  • importantly, it is influenced (heavily) by noise correlations
  • PCA is matched well to multivariate Gaussian distributions!
  • cross-validated PCA idea (covariance computed from distinct trials)
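A sketch of the cross-validated PCA idea under an assumed rank-1 signal plus isotropic noise (all numbers hypothetical): computing the covariance between two independent trials of the same stimuli lets the noise average out, so the cross-trial covariance recovers the signal covariance while the within-trial covariance is inflated by noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n_stim, n_vox = 2000, 10

# rank-1 signal (variance 4 along direction u) plus unit-variance isotropic noise
u = np.ones(n_vox) / np.sqrt(n_vox)
signal = rng.standard_normal((n_stim, 1)) * 2.0 @ u[None, :]
trialA = signal + rng.standard_normal((n_stim, n_vox))
trialB = signal + rng.standard_normal((n_stim, n_vox))

def centered(X):
    return X - X.mean(axis=0)

# within-trial covariance: contains signal + noise
cov_within = centered(trialA).T @ centered(trialA) / (n_stim - 1)
# cross-trial covariance: noise is independent across trials and averages out
C = centered(trialA).T @ centered(trialB) / (n_stim - 1)
cov_cross = (C + C.T) / 2   # symmetrize before eigendecomposition

evals_within = np.linalg.eigvalsh(cov_within)
evals_cross = np.linalg.eigvalsh(cov_cross)
print(evals_within[-1], evals_cross[-1])   # top eigenvalues
```

Here the within-trial top eigenvalue comes out near 5 (signal 4 + noise 1) while the cross-trial top eigenvalue comes out near the true signal variance of 4.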
  • condition number
  • condition number of a matrix is the ratio between maximal and minimal singular values
  • deep connection to experimental design, regression stability, etc.
  • note: if data are perfectly confined to a subspace, the condition number is infinite
  • in a sense, the lower the condition number, the better (inversion is more stable)
  • if there are fewer data points than dimensions, the covariance matrix is singular (i.e. condition number is infinite)
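The two regimes above can be checked directly in numpy (sizes here are arbitrary illustrations): with many more observations than dimensions the covariance is well conditioned; with fewer observations than dimensions it is singular and the condition number blows up.

```python
import numpy as np

rng = np.random.default_rng(3)

# well-sampled case: many more observations than dimensions
X = rng.standard_normal((1000, 5))
cond_good = np.linalg.cond(np.cov(X, rowvar=False))

# undersampled case: fewer observations than dimensions -> singular covariance
Y = rng.standard_normal((4, 10))
cond_bad = np.linalg.cond(np.cov(Y, rowvar=False))

print(cond_good, cond_bad)   # modest vs. astronomically large (infinite up to round-off)
```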
  • covariance
  • X: m observations x n voxels [a la functional connectivity]
  • X: m data points x n dimensions [a la clustering]
  • X: m voxels x n stimuli [a la RSA]
  • cov(X) is proportional to X'*X (up to mean removal). it is basically a bunch of (co)variances.
  • note that cov.m removes the mean of each column
  • sigma = sqrt(det(cov(X))) equals the sqrt of the product of the eigenvalues of cov(X), and is (up to a constant) the "hypervolume" of the data ellipsoid.
  • when m <= n, sigma is 0: mean removal leaves rank at most m-1, so at least one eigenvalue is 0 and the condition number is infinite!
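A quick numpy check of both claims (sizes and standard deviations are arbitrary illustrations): sqrt(det(cov)) tracks the product of the per-dimension spreads when m >> n, and the determinant collapses to 0 when m <= n.

```python
import numpy as np

rng = np.random.default_rng(4)

# m >> n: sqrt(det(cov)) acts as a "hypervolume" of the data ellipsoid
X = rng.standard_normal((500, 3)) * np.array([1.0, 2.0, 3.0])  # independent dims, sds 1, 2, 3
C = np.cov(X, rowvar=False)
sigma = np.sqrt(np.linalg.det(C))          # expect roughly sqrt(1 * 4 * 9) = 6
prod_eig = np.prod(np.linalg.eigvalsh(C))  # det equals the product of eigenvalues

# m <= n: after mean removal the rank is at most m-1, so det collapses to 0
Y = rng.standard_normal((3, 3))
det_small = np.linalg.det(np.cov(Y, rowvar=False))
print(sigma, det_small)
```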
  • statistical approach
  • key concept is the multivariate Gaussian distribution
  • if you sum two Gaussian density functions, you do not necessarily get a Gaussian (that is a mixture). but if you draw samples from two Gaussian distributions and sum the samples, the result is again Gaussian-distributed. Gaussian distributions summate!
  • let's assume the signal is distributed according to a multivariate Gaussian (i.e. natural images in voxel space)
  • let's assume the noise is distributed according to a multivariate Gaussian (and is the same, regardless of the stimulus)
  • approach: build 'generative' model of multivoxel responses. then everything follows.
  • same general strategy as the univariate 'encoding model' noise ceiling (as laid out in the NSD data paper): estimate the (co)variance of the full data distribution; subtract the signal estimate and estimate the (co)variance of the noise distribution; subtract the two, producing an estimate of the (co)variance of the signal distribution.
  • steps:
    1. estimate signal and noise distributions from empirical data
       1a. estimate the full data distribution
       1b. subtract the signal estimate and estimate the noise distribution
       1c. subtract the noise distribution from the full data distribution and estimate the signal distribution
    2. use Monte Carlo simulations to derive the expected noise ceiling for a given analysis flavor
    3. use PCA to investigate the nature of the signal distribution and of the noise distribution, and more!
  • estimating covariance matrices is hard when data are limited
  • shrinkage
  • regularize the covariance estimate
  • check how well an estimated covariance matrix generalizes to left-out data (cross-validation)
  • shrinking the off-diagonal elements to 0 tends to make the cloud more ball-like and the condition number smaller (i.e. better)
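A sketch of that shrinkage-plus-cross-validation recipe (ground truth, sample sizes, and the candidate shrinkage levels are all hypothetical): blend the sample covariance toward its diagonal, and pick the blend weight that maximizes Gaussian log-likelihood on held-out data.

```python
import numpy as np

rng = np.random.default_rng(7)
n_vox = 20

# hypothetical ground truth with real correlations; few samples to estimate from
A = rng.standard_normal((n_vox, n_vox))
true_cov = A @ A.T / n_vox
train = rng.multivariate_normal(np.zeros(n_vox), true_cov, size=30)
test = rng.multivariate_normal(np.zeros(n_vox), true_cov, size=2000)

S = np.cov(train, rowvar=False)   # raw estimate: ill-conditioned (30 samples, 20 dims)

def shrink(S, lam):
    # shrink off-diagonal elements toward 0
    return (1 - lam) * S + lam * np.diag(np.diag(S))

def gauss_loglik(C, X):
    # average log-likelihood of held-out data under N(0, C)
    sign, logdet = np.linalg.slogdet(C)
    quad = np.einsum('ij,jk,ik->i', X, np.linalg.inv(C), X)
    return -0.5 * (logdet + quad.mean() + X.shape[1] * np.log(2 * np.pi))

lams = np.linspace(0, 1, 21)
scores = [gauss_loglik(shrink(S, lam), test) for lam in lams]
best_lam = lams[int(np.argmax(scores))]
print(best_lam)
```

In this setting some intermediate shrinkage beats both the raw estimate (too noisy) and the pure diagonal (discards real correlations), and the shrunken matrix has a smaller condition number than the raw one.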
  • scientific applications (what good is all this technical stuff for?)
  • trustable RSA noise ceilings will help assessment of model performance
  • corrected PCA results might yield interesting/meaningful dimensions (e.g. removing the influence of noise correlations might reveal structure that is otherwise hidden)
  • better understanding of whether the structure of trial-to-trial variability (resting-state-like activity) matches that of stimulus-evoked activity?
  • todo:
  • more testing, more thinking
  • simulations to verify correctness and to check accuracy of estimates