Computational analysis

Overall scope:

  • Let's consider the case of somebody who wants to be proficient at conducting computational data analysis
  • e.g. writing R/Python/MATLAB code that implements some mathematical analyses
  • We aren't talking about the more general case of trying to use black-box tools on various data.
  • Related, but not quite the same, are these pages:  🔢Basic math ,  🛠️Coding tips ,  🐰Modeling concepts 

3 levels:

  • Conceptual
      • high-level descriptions of what analyses do, why they're useful, how different analysis approaches are similar/different, how this helps us understand the brain, and how the analyses relate to some overarching theory of the brain
  • Mathematical
      • describing a specific analysis at a mathematical/abstract level (not lines of code)
      • what is the fundamental model or algorithm that you are trying to implement?
      • critical is the desire for clear mathematical and/or statistical principles
      • sometimes, some analyses/preprocessing steps are more heuristic in nature (as opposed to principled)
      • often, the Methods section of a paper has mathematical descriptions, e.g. matrix notation, mathematical notation, dimensionality of the dataset, etc.
      • simple model forms: lines, planes, Gaussians, sinusoids, polynomials, exponentials
  • Programmatic
      • refers to the actual code implementation for a given analysis
      • efficiency (compute time, memory usage)
      • functions vs. scripts
      • how general-purpose is your code?
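To make the mathematical vs. programmatic distinction concrete, here is a minimal sketch (in Python with NumPy; the same idea applies in R or MATLAB, and the specific numbers are illustrative, not from the original notes). The mathematical level is the OLS formula β = (XᵀX)⁻¹Xᵀy; the programmatic level is one particular implementation of it:

```python
import numpy as np

# Mathematical level: ordinary least squares, beta = (X^T X)^{-1} X^T y.
# Programmatic level: one way to implement it. Note that lstsq is more
# numerically stable than explicitly inverting X^T X.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + 1 regressor
true_beta = np.array([2.0, 0.5])
y = X @ true_beta + 0.1 * rng.standard_normal(n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [2.0, 0.5]
```

The same mathematical description could be implemented many programmatic ways (explicit inverse, QR, gradient descent); the math pins down *what* is computed, the code pins down *how*.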

Recommendations:

  • Divide and conquer (with respect to the 3 levels). Figure out which bit you need improvement on.
  • If you are trying to deeply understand an analysis, or if you are trying to code an analysis yourself, understanding the conceptual level precedes the mathematical level which precedes the programmatic level.
  • If you are just "using" some tool, you can (in theory) stop at the conceptual level.
  • Consider sketching your code architecture out before you start programming. (A flow chart is like an outline for your code.)
  • For big data, there are many additional woes/headaches. See nsdabudhabi.
  • Speed vs. accuracy. When designing code, you may find yourself sacrificing some accuracy for speed.
  • Consider whether you should worry about precision issues.
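As a sketch of why precision can matter: the classic one-pass variance formula is fast but can lose accuracy when values are large relative to their spread (the numbers below are illustrative, not from the original notes):

```python
import numpy as np

# Naive one-pass variance: E[x^2] - E[x]^2. Fast, but subtracts two huge,
# nearly equal numbers, so precision is lost when the mean is large
# relative to the spread (catastrophic cancellation).
x = np.array([1e8 + 1, 1e8 + 2, 1e8 + 3], dtype=np.float64)

naive = np.mean(x**2) - np.mean(x)**2     # unreliable here (can even go negative)
two_pass = np.mean((x - np.mean(x))**2)   # subtract the mean first: accurate

print(naive, two_pass)  # true variance is 2/3; two_pass recovers it
```

This is the kind of speed-vs-accuracy trade-off worth checking deliberately rather than discovering by accident.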

Things to consider when "coding it yourself":

  • Who is the audience?
  • Corner cases. Make sure your tool won't blow up in some niche cases.
  • API issues
  • Consider your scope. Carve your analysis into separate stages.
  • Input expectations. What types of inputs are valid for the user to give to you?
  • Output. How much to expose to the user? Variable naming. Probably best to err on the side of returning more, not less.
  • Generality of your code.
      • Do you assume idiosyncratic data file formats, or work with general data formats?
      • How do you approach outputs (figures, results)? Do you save files to disk, or allow the user to do that?
      • What assumptions about the data (or experiment/stimuli) are you making? Be sure to tell the user about these assumptions.
  • Consider re-use of tools that exist. I.e., don't reinvent the wheel. (One idea is to reinvent a stone wheel, but maybe ultimately use a modern wheel.)
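A small sketch of what input expectations and corner-case handling can look like in practice. The helper `zscore_columns` is hypothetical (invented for illustration, not from the original notes):

```python
import numpy as np

def zscore_columns(data):
    """Z-score each column of a 2D array.

    Hypothetical helper illustrating input validation and corner cases.
    """
    data = np.asarray(data, dtype=float)
    # Input expectations: state clearly what is valid, and fail loudly otherwise.
    if data.ndim != 2:
        raise ValueError(f"expected a 2D array (observations x variables), got ndim={data.ndim}")
    if data.shape[0] < 2:
        raise ValueError("need at least 2 rows to compute a standard deviation")
    sd = data.std(axis=0)
    # Corner case: a constant column has sd == 0, which would divide by zero.
    # Here we leave such columns at 0 rather than returning NaN/Inf.
    sd[sd == 0] = 1.0
    return (data - data.mean(axis=0)) / sd
```

The point is not this particular policy (raising vs. warning, zeroing vs. NaN) but that each corner case gets a deliberate, documented decision rather than an accidental blow-up.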

The divide between the tool and the user:

  • Often, you will be both the tool maker and the tool user.
  • When you wear your "tool maker" hat:
      • Consider using ground-truth simulated data.
      • Consider using tried-and-true classic experimental data.
  • When you wear your "tool user" hat:
      • Be methodical and careful about how you apply your tool (e.g. organize your units, areas, subjects, sessions, trials).
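The ground-truth simulation idea can be sketched as follows (the specific model and parameter values are illustrative): generate data with known parameters, run your tool, and confirm it recovers them before trusting it on real data.

```python
import numpy as np

# Tool-maker check: simulate data with known ("ground truth") parameters,
# then verify the tool recovers them.
rng = np.random.default_rng(1)
n, true_slope, true_intercept = 500, 3.0, -1.0
x = rng.uniform(0, 10, n)
y = true_intercept + true_slope * x + rng.standard_normal(n)  # noise sd = 1

slope_hat, intercept_hat = np.polyfit(x, y, deg=1)

# Recovery should be close to ground truth (tolerances reflect the noise level).
assert abs(slope_hat - true_slope) < 0.1
assert abs(intercept_hat - true_intercept) < 0.3
```

If the tool fails on data it was literally built to fit, that failure is far easier to debug here than in a real dataset.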

How to learn?

  • The main approach is just do it. Learn by doing.
  • Tutorials. In order to learn computational analysis, small-scale and didactic examples might be useful. Are there any out there?
      •  https://dartbrains.org/content/GLM_Single_Subject_Model.html 
      •  https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials 
  • Maybe take some existing code and play with it. Change inputs, change parameters.
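"Change inputs, change parameters" can be as simple as a small sweep. A hypothetical example (polynomial fits to noisy data; all specifics invented for illustration): vary one parameter and watch how the output responds.

```python
import numpy as np

# Learn by playing: sweep the polynomial degree and watch the training
# fit error respond. (Training error shrinks as the model gets more
# flexible -- which is itself a lesson about overfitting.)
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

rms = {}
for deg in (1, 3, 5):
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    rms[deg] = float(np.sqrt(np.mean(resid ** 2)))
    print(deg, rms[deg])
```

One parameter at a time, with a quantity to watch, turns "playing with code" into a concrete experiment.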