Ground-truth simulations

Why are these useful?

  • You can cheaply explore parameters to probe model behaviors, and build a prediction machine (build a model, make predictions).
  • Directly compare the efficacy of different methods in a very controlled setting
  • Does a given method actually produce anything close to what the truth is?
  • Exercise your code. Test your code.
  • Test your thinking (testing your knowledge/understanding/survey of the system).
  • Test your thinking (testing basic/fundamental principles, e.g., of statistics).
  • There are limits to intuition: simulating a complex model in your head is sometimes impossible or really hard.
  • Exploring extreme ranges of parameters and seeing if things break down, get better, etc.
  • Can use for optimizing stimulus/experimental design.
  • Models are cheap. Experiments performed on models are cheap.
  • Quantify the estimation error of a method (and see how tolerable/good/bad it is).
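
The recovery/error-quantification idea above can be sketched as a minimal Monte Carlo simulation. Everything here is a hypothetical choice for illustration: a linear ground-truth model, a chosen slope and Gaussian noise level, and a least-squares estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: y = true_slope * x + noise
true_slope = 2.5          # the parameter we declare to be "truth"
noise_sd = 1.0            # assumed Gaussian measurement noise
n_points = 50
n_experiments = 1000      # repeated "experiments" on the model are cheap

x = np.linspace(0, 1, n_points)
estimates = np.empty(n_experiments)
for i in range(n_experiments):
    y = true_slope * x + rng.normal(0, noise_sd, n_points)
    # Least-squares estimate of the slope (no intercept, for simplicity)
    estimates[i] = (x @ y) / (x @ x)

error = estimates - true_slope
print(f"mean estimate: {estimates.mean():.3f}")
print(f"bias:          {error.mean():.3f}")
print(f"variance:      {estimates.var():.3f}")
```

Because the true slope is known, the deviation of the estimates from it can be quantified directly, which is exactly what is impossible with real data.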

What are the limitations of ground-truth simulations?

  • Is there even a ground truth? I.e., how do you know that the ground truth you are using is actually accurate?
  • If you are very concerned about the accuracy of your model in your simulations, that's really the domain of modeling (as opposed to simulations).
  • You have to think about the GOAL of your ground-truth simulations:
  • In some cases, it doesn't matter much if the ground truth you simulate is distinct from the system. Maybe the point of your simulation still holds even if it is semi-inaccurate.
  • Sometimes you WANT a simplified system, because it helps you distill and isolate a key concept that you are trying to demonstrate.
  • In other cases, the accuracy of your ground truth might matter a lot. For example, if you do an fMRI simulation based on a woefully inaccurate HRF, the conclusions you obtain could be very inaccurate...

Assumptions in the simulations / designing a simulation

  • Obviously, the conclusions one draws from a simulation are dependent on all of the assumptions that go into the simulations.
  • Note that assumptions aren't necessarily bad...
  • Assumptions are useful, e.g., for compressing our understanding and compressing the details of the system we are studying
  • The very act of spelling out assumptions is hugely useful.
  • It would be nice not to hard-code anything (make your code general; declare all numerical constants).
  • It would be nice to be modular (i.e., accept any form of "ground-truth anatomy" or whatever).
  • Generative models (forward models) - ways of specifying a system such that you can generate new data points to your heart's content (as opposed to discriminative models...)
  • Signal and noise. You have to be very careful in how you define and operationalize these. "What type of noise did you assume?" can be a very important question. The ground truth arguably includes both signal and noise, and both components need to be thought through carefully in any simulation.
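
The design principles above (a generative forward model, an explicit signal/noise split, nothing hard-coded, modular inputs) can be sketched as follows. The sinusoidal signal template and i.i.d. Gaussian noise are assumptions chosen purely for illustration, not a claim about any real system.

```python
import numpy as np

def generate_data(signal_amplitude, noise_sd, n_trials, n_timepoints, rng):
    """Forward (generative) model: each trial is a fixed signal template
    scaled by signal_amplitude, plus i.i.d. Gaussian noise.
    All numerical constants are passed in as arguments (nothing hard-coded),
    so the same code can be rerun under different assumptions."""
    t = np.linspace(0, 1, n_timepoints)
    template = np.sin(2 * np.pi * t)      # assumed signal shape
    signal = signal_amplitude * template  # the "signal" component
    noise = rng.normal(0, noise_sd, size=(n_trials, n_timepoints))
    return signal + noise                 # measurements, trials x time

rng = np.random.default_rng(1)
data = generate_data(signal_amplitude=3.0, noise_sd=1.0,
                     n_trials=100, n_timepoints=200, rng=rng)
print(data.shape)   # (100, 200)
```

Because the model is generative, new data points can be drawn to your heart's content, and swapping in a different signal template or noise model requires changing only one clearly declared assumption.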

Some examples of simulations in the neuroimaging / computational world

  • Simulating brain responses to stimulus properties (e.g. pRF, example:  https://daslab.shinyapps.io/viewPRF/ )
  • Simulating realistic fMRI time-series data in response to different types of experimental designs (block vs event, trial durations, rest periods, etc; example,  https://daslab.shinyapps.io/BetaSeriesSim/ )
  • Simulating noisy anatomical data and seeing how well signal processing methods can handle them
  • Simulations of statistical principles, linear algebra, and the like (example:  https://daslab.shinyapps.io/LogisticRegression/ ).
  • Simulations of pure noise and seeing what analysis results are produced
  • Note the close connection to permutation / shuffling / resampling approaches for analyzing real data
  • Simulations of pure signal and seeing what analysis results are produced
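
A pure-noise simulation can be sketched as follows: correlate many noise-only "voxels" with an arbitrary regressor and count how often an uncorrected threshold is crossed. The voxel count, regressor, and threshold here are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Analysis run on PURE noise: there is no signal anywhere by construction,
# so every suprathreshold "result" is a false positive.
n_voxels = 10_000
n_timepoints = 100
regressor = rng.normal(size=n_timepoints)
noise = rng.normal(size=(n_voxels, n_timepoints))

# Pearson correlation of each noise voxel with the regressor
z_reg = (regressor - regressor.mean()) / regressor.std()
z_noise = (noise - noise.mean(axis=1, keepdims=True)) / noise.std(axis=1, keepdims=True)
r = z_noise @ z_reg / n_timepoints

threshold = 0.2  # arbitrary uncorrected threshold (an assumption)
false_positive_rate = np.mean(np.abs(r) > threshold)
print(f"fraction of noise voxels exceeding threshold: {false_positive_rate:.3f}")
```

This is closely related to permutation/shuffling approaches on real data: there, too, the analysis is run on data in which the effect of interest has been destroyed, to see what results arise from noise alone.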

Definitions

  • Ground truth - some system that you control and declare is the signal.
  • It's easier to think about deterministic systems...
  • But sometimes you have stochastic systems (e.g. probabilistic models)
  • Noise - stuff that gets added into your measurement that you aren't necessarily interested in and/or whose source you don't actually know.
  • Estimation - The idea that we are trying to guess/derive/infer some underlying system parameter based on some limited and noisy data.
  • Recovery - Like estimation, but where we know what the 'ground truth' is, and the emphasis is on matching the estimates to the true value.
  • Metrics [see  https://arxiv.org/abs/2201.09351 ]
  • Error - the overall deviation between an estimate and the ground truth. It includes both bias and variance.
  • Bias - the discrepancy between the expected value of your estimate and the ground truth. In general, we don't want bias.
  • Variance - the variability of your estimate across repeated experiments.
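
The relationship between these metrics (error decomposes into bias squared plus variance) can be demonstrated numerically with a deliberately biased estimator. The shrinkage estimator of a mean below is just an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

true_mean = 5.0       # the ground truth we declare
noise_sd = 2.0
n_samples = 10
n_experiments = 100_000

# A deliberately biased estimator: shrink the sample mean toward zero.
shrink = 0.8
data = rng.normal(true_mean, noise_sd, size=(n_experiments, n_samples))
estimates = shrink * data.mean(axis=1)

bias = estimates.mean() - true_mean          # expected value minus truth
variance = estimates.var()                   # spread across repeated experiments
mse = np.mean((estimates - true_mean) ** 2)  # overall squared error

print(f"bias^2 + variance: {bias**2 + variance:.3f}   mse: {mse:.3f}")
```

The two printed numbers match (up to floating-point error), illustrating that overall error includes both a bias component and a variance component, as in the definitions above.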