Experimental design issues

What are the main practical choices that a typical visual/cognitive neuroscience study faces when deciding how to design an fMRI experiment?

The issues mentioned on this page are diverse and complicated. The goal of this page is to get you thinking. Once you have a complete, documented plan of an fMRI experiment, you should reach out to an fMRI expert to give it a deep think (unless you yourself are an expert). Every experiment is unique, and designing it well really requires specific knowledge of all of the interacting choices. And even though there are a huge number of issues to consider, bear in mind that not all issues are critical; which ones matter depends on your target of study.

In order to get the best possible data, think of it as "you get only one shot". So, you need to think about all possibilities, contingencies, and things that can go wrong, and try to prevent them. The space of possible experiments is essentially infinite, so the closer you get to the "best possible" experiment, the better.

Why is fMRI experimental design hard?

  • Noise. The SNR in the BOLD response is modest. Hence, every little optimization matters.
  • Nonstationarities related to MR imaging, the brain, "human" reasons, or operator variability. You want to make sure your experiment "averages" nonstationarities out.
  • Timing/synchronization. People often have timing bugs.
  • Human time is limited. Staying awake and focused for long periods of time is very hard, and there is only so much that a human can take.
  • Baseline signal levels are tricky to estimate well.
  • The BOLD signal in fMRI is sluggish; the slow hemodynamic response function (HRF) results in overlap between responses to successive events.
  • Stimulus delivery / equipment / eyetracking is hard within the scanner. Moreover, keeping the body/head still means that motor experiments are especially hard to run.
  • Note: if your target of study is more cognitive in nature, then there are a lot more experimental design issues to consider and handle than if your target of study is more sensory in nature.
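
The sluggish-HRF point above can be made concrete with a quick simulation. Below is a minimal sketch (the double-gamma HRF parameters are standard SPM-style assumptions, not taken from this page) showing how responses to events spaced 4 s apart pile on top of each other:

```python
import numpy as np
from math import gamma

def hrf(t):
    """Double-gamma HRF (SPM-style defaults: peak ~5 s, undershoot ~15 s)."""
    return t**5 * np.exp(-t) / gamma(6) - t**15 * np.exp(-t) / gamma(16) / 6

dt = 0.1                          # time resolution in seconds
t = np.arange(0, 30, dt)
h = hrf(t)

# One event every 4 s (NSD-like pacing); responses sum under linearity
n = int(40 / dt)
signal = np.zeros(n)
for onset in [0, 4, 8, 12]:
    i = int(onset / dt)
    signal[i:i + len(h)] += h[:n - i]

# The summed signal peaks well above a single isolated response:
# individual trial responses cannot be read off the raw time series.
```

Because the HRF spans ~30 s while trials last only 4 s, any trial's measured response contains substantial contributions from its neighbors, which is exactly why event-related analyses lean on the linearity/summation assumption.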

General flavors of experiments

  • "Block" experiments: Conditions/events of interest are deliberately/intentionally very long (~12-30 s long). Presumably, these blocks are ignoring or blurring over fine-scale neural/cognitive stuff occurring within the block (e.g. "individual events").
  • "Event-related" experiments. Experimenter has a "finer" grain sense of what a single important cognitive/neural event is, and the experiment doesn't try to group/block these things together (and, e.g., randomly samples them over time).
  • Note that technically, whether a design counts as block or event-related really refers to how the experimenter interprets/analyzes a given set of data. The definition of an event can be set by the experimenter, or by the subject's behavioral response.
  • Periodic: Some periodic modulation of some dimension of interest (e.g. every 30 s). For example, a rotating contrast-reversing checkerboard wedge (people analyze the Fourier phase at stimulus frequency)
  • Continuous: For example, watching a complex movie.
  • Resting-state: Where you don't really ask the subject to do anything in particular except to stay awake and fixate a cross. (Or, arguably, you are asking them to rest.)
  • Physiology-triggered experiments: Triggering the onset/timing of some experimental manipulations based on some physiological measure (cardiac / respiration)
  • Feedback-based experiments / BCI: where you analyze the data from the brain in real-time and that changes what you tell/show the subject. You could imagine an experiment where the subject's behavior (e.g. self-reported perceptual state) triggers or starts the experiment.

Major issues

Stimulus/condition duration

  • This is the typical block vs. event-related choice.
  • Major advantage of block design (where the "same" stimulus is repeated or sustained for a long time (e.g. 10-30 s)) is bigger, stronger BOLD responses.
  • Major disadvantage of block design is that it's not very informative (very few distinct conditions). Also, the longer the block duration, the fewer trials you can squeeze in, so there is some balancing to be done.
  • Major advantage of event-related design is it's flexible and you can get a lot of trials and distinct conditions in.
  • Major disadvantage of event-related design is that SNR is generally lower and so if you are too aggressive, you'll get useless data. Also, there is a heavy reliance on the assumption of temporal linearity/summation.
  • NSD (the Natural Scenes Dataset) used 3-s ON / 1-s OFF = 4-s trials. (Relative to other studies, this is somewhat fast. But of course from a vision point of view, it is really really slow.)

Number of distinct conditions vs. number of trials per condition

  • This is a typical trade-off that people have to consider. It sort of depends on how critical it is that you can very reliably measure individual conditions... Sometimes this is important; sometimes it is not. If you are in a modeling framework, you probably care less about individual conditions and more about sampling many different configurations of features.

Blank/rest periods

  • Typically you need some amount of this in order to have a reasonable estimate of the baseline signal level (so that you know whether your experimental conditions evoke positive (or negative) BOLD responses relative to baseline (e.g., looking at a gray screen)).

Overall time commitments

  • Run (a period of continuous scanning during which your subjects are "on task"): how long should runs be? Suggestion: 3-10 minutes.
  • If runs are too short, you waste scanning overhead time (each scan might take an extra 1 min prior to actual data collection); also, you need your subjects to achieve "steady state". And it may be psychologically daunting to face a large number of runs.
  • If runs are too long, it's really tiring (fatigue [and motion fatigue (sneezing)]; fixation becomes really hard to maintain), the file sizes start getting somewhat annoyingly large, and if something craps out mid-run, you lose more data than you would with short runs. Long runs also invite further instabilities having to do with motion and fieldmap drift.
  • Number of runs / scan duration (number of minutes of actual fMRI data): maximum of 8 long runs, 10-12 moderately long runs, or 16 short runs? A good range maybe: 40-80 min of actual functional runs?
  • It sort of depends on how fun your experiment is, what type of person you are scanning, etc.
  • With long sessions, arousal/attention/comfort will wane.
  • You can monitor behavioral performance over the course of the session, so you can check!!

Adaptation

  • Within a long block of similar conditions, you may not actually drive neural activity very much due to adaptation effects (e.g. if you have a sequence of faces, you may be adapting after the first couple of faces). Hence, the benefits of block designs may be less than you expect.

Trial ordering

  • Nonstationarities are annoying (e.g., suppose your subject was more sleepy or blinked more in run 4 (or, alternatively, run 4 was subject to egregious hardware instabilities), and it just turns out run 4 is where you stuck all of one type of condition).
  • The typical solution is to "distribute/randomize" events/conditions as much as possible. (For a multi-session experiment, consider distributing across that too. As well as scanning at the same time of day.)
  • Context effects / serial dependence (does the response on a given trial depend on what happened before?) If so, and if that's not of your interest, one strategy is to try to just shuffle/randomize/etc. Note that if you have a tractable number of conditions, perhaps you could do a fully balanced design (e.g., where every condition type is preceded by every other condition type).
  • Pseudorandom designs can be used to try to balance out any strange order effects. (e.g. pseudorandom order of the conditions within each of multiple runs)
  • Purely random orderings are risky (because of flukes!). Pseudorandom implies the experimenter still exerts some control over the process. For example, you can generate several random sequences and do some basic checks to make sure you didn't get unlucky. As another example, if you sample truly randomly, you may not actually get the same number of trials across conditions.
  • Some issues to think about: do you keep the same order across subjects or attempt to counterbalance across subjects? Does each subject see the same experiment or not?
  • If you randomize or pseudorandomize for each individual subject, there is no guarantee that the order that group A got was somehow the "same" as group B. Maybe group A got lucky, but group B didn't. Hence, you should consider actually analyzing the random outcomes and/or enforcing control over the process.
  • But, note that some experiments actually are interested in studying the effect of time on brain activity, in which case randomizing is not what you would do.
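
One simple way to implement the "generate and check" idea above is rejection sampling: build an exactly balanced trial list, shuffle it, and reject shuffles that fail basic sanity checks. The check used here (a cap on same-condition run length) is just an illustrative criterion; real designs might also check transition counts or correlation with drift regressors.

```python
import random
from collections import Counter

def pseudorandom_order(conditions, n_reps, max_run=3, n_attempts=1000, seed=0):
    """Return a balanced trial order with no condition repeated > max_run times in a row."""
    rng = random.Random(seed)
    trials = list(conditions) * n_reps   # exact balance guaranteed by construction
    for _ in range(n_attempts):
        rng.shuffle(trials)
        longest = run = 1
        for prev, cur in zip(trials, trials[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        if longest <= max_run:
            return list(trials)
    raise RuntimeError("no acceptable sequence found; relax the criteria")

order = pseudorandom_order(["face", "house", "word"], n_reps=10)
print(Counter(order))   # every condition appears exactly 10 times
```

Because the trial list is built balanced and only then shuffled, the unequal-trial-counts pitfall of truly random sampling cannot occur; the rejection step handles ordering flukes.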

Jittering

  • Should you allow variable spacing between conditions/trials?
  • It's complicated, since it depends on the GLM design matrix that you intend to employ (e.g. if you are trying to estimate the HRF using a finite impulse response (FIR) model, or if you are trying to analyze your data using a fixed assumed HRF). What is optimal for one target analysis may not necessarily be optimal for another target analysis.
  • There are some psychological effects of jittering (but whether you care is a separate question). A jittered design is less predictable, and that may be a distinct advantage: when the subject is anticipating the start/type of a trial, this psychological/arousal effect can be observed in the BOLD response (e.g., work by Eli Merriam & Zvi Roth, Aniruddha Das).
  • In general, there may not be that much of an argument to jitter. For example, NSD used a fixed 3-s ON 1-s OFF design (with occasional blank trials).

Task

  • Let's define "task" as the behavioral goal (and perhaps also, the cognitive strategies) that you want your subject to have in the context of sitting in the scanner and participating in your experiment.
  • Consider exactly what you task your subject with, because it can/will matter to the brain responses.
  • Consider if your experiment involves task switching (this is a whole topic of study in cognition).
  • Consider how well your subjects actually UNDERSTAND the task.
  • Consider formulating very very specific wording for task instructions.
  • Consider "training" your subject beforehand?
  • Consider having rigorous evidence that they did the task you wanted?
  • Consider giving explicit instructions to the subject immediately before each run? (What if they forget, or swap their response buttons?)
  • Are there "task"/arousal/engagement effects (cf. Roth PLOS Biology 2021) in your experiment? In general, you don't want these global confounding effects. (They could even affect continuous designs, not just slow event-related designs, e.g. in slowly changing bistable perception:  https://www.jneurosci.org/content/33/5/2188.abstract )

Feasibility of doing the actual task (maybe it's too tiring or annoying)?

  • TRY YOUR EXPERIMENT OUT IN THE SCANNER TO SEE HOW YOU FEEL.
  • In fact, you may want to do actual real training with your subjects before you do the actual experiment. E.g. using even a mock scanner.

Subjects

  • How are you going to incentivize and wheedle your subjects to being good little mice? Use money? Use your verbal encouragement? Use threat of scanning them again? Use beer? Use snacks? Pre-select subjects who want to be winners? Screen your subjects before scanning them?
  • Training your subjects.
  • Are they experienced? Do they know what it means to be a good subject? Do they understand that breathing influences BOLD and so they should breathe at a regular pace (without breath-holding)?
  • Do they understand the task?
  • Do they really know the level of head motion that is damaging for fMRI?
  • The worry about psych'ing your subjects out (e.g., by putting too much burden on them).
  • It often happens that lab members scan each other.
  • Some pitfalls: some experiments need naive observers (people who don't know what the experiment is about). Some people murmur complaints about 'authors as subjects'. It is also not clear that lab people necessarily generate better SNR (since part of the quality of BOLD data depends on intrinsic vascular characteristics, etc.).
  • Decide who your target subjects are. Super subjects vs. naive subjects vs. patients. Consider using mock scanning environments where you can train and/or filter out wiggle monsters.
  • How many subjects do you need? People fall into widely different camps (large n, like >=16, or small n, like <=8).
  • MAKE SURE YOU POLITELY ASK YOUR SUBJECT TO USE THE RESTROOM PRIOR TO PUTTING THEM IN.
  • How do you manage and interact with your subject?
  • Be nice. Be clear with your expectations. Be efficient in your acquisition procedure. Pack them comfortably. Don't mess up on your end. Tell them how long the scan is going to be. If the subject wants you to show an entertaining movie during the setup and/or anatomicals, do that. If the subject doesn't want to talk to you, don't make them talk to you. You can consider just using button presses to communicate with your subject.
  • Feel them out in terms of their tiredness. Give them breaks if they want it. (Even after the break is done, the actual experiment doesn't actually start until probably like ~1-2 min after that.) Don't give them radio silence (i.e. periods of time where they can't hear or see anything) if that freaks them out.
  • If you can monitor them during the scan (either through behavior or eye-tracking video), and you notice some bad performance from your subject, consider POLITELY SCOLDING, STOPPING THE SCAN, etc. You could also re-order scans, and/or repeat a run (if that is compatible with your overall experimental design).

Eyetracking

  • In general, it's really hard to get good data, so don't expect it to just "work".
  • And it increases setup time significantly. It may make your subject mad/annoyed.
  • And you may face a situation where the "calibration" just doesn't work → so you have to make a game-time decision as to what to do.

Physiological data (pulse, respiratory)?

  • If you have streamlined procedures and can reliably perform them (e.g. like 1-2 extra minutes), seems useful to acquire.

Head motion

  • Gentle or aggressive cushioning?
  • Bite bar (it's horrible, and may not actually help in the end)
  • Head cast, head case (it can/does help but can be uncomfortable and costs money)
  • Low-cost solution for giving subjects feedback on their head motion: tape (from the forehead onto the coil)
  • Training/instruction, possibly stick them in a mock scanner?
  • FIRMM (real-time monitoring of how much the head is moving); prospective motion correction (slices are adjusted in real-time); use navigators [old reference, applied to fMRI].
  • Feedback to subjects about their motion (e.g. the cross changes color if the subject motion is detected to exceed a threshold)
  • The distinction between slow drifts and transient head motion (framewise displacement, FD). For fMRI, it's the transients that are of most concern.
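
To make the drift-vs-transient distinction concrete, here is a sketch of Power-style framewise displacement computed from six rigid-body motion parameters (the 50 mm head radius for converting rotations to millimeters is a conventional assumption):

```python
import numpy as np

def framewise_displacement(motion, head_radius=50.0):
    """FD per frame from a (T, 6) array: 3 translations (mm) + 3 rotations (radians)."""
    d = np.diff(motion, axis=0)
    d[:, 3:] *= head_radius            # rotations -> mm of arc at the head surface
    return np.abs(d).sum(axis=1)

# A 1 mm slow drift vs. a 1 mm transient jump: same net displacement,
# but only the transient produces a large FD spike.
frames = 101
drift = np.zeros((frames, 6)); drift[:, 0] = np.linspace(0, 1, frames)
jump  = np.zeros((frames, 6)); jump[50:, 0] = 1.0
print(framewise_displacement(drift).max())   # ~0.01 mm per frame
print(framewise_displacement(jump).max())    # ~1.0 mm at the transient
```

This is why FD-based monitoring flags the jump but barely notices the drift, even though both end up 1 mm away from where they started.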

Motion tracking?

  • Some device in the scanner that might track subject's head position?
  • Alternatively, if the pulse sequence allows it (prospective motion correction (i.e. correction of the slice slab during the actual sequence)), this might be very helpful.

Equipment

  • Lock everything down. Everything needs to be reproducible from day to day. Every time you set up and tear down, go through reproducible reliable procedures.

Behavior

  • Being able to get instant behavioral results for your experiment (after each run or as each trial is done) is super helpful.
  • Helps you check that your subjects are awake!
  • Fatigue... does it matter to you if subjects do worse and worse as the runs go on? Do you want to monitor this in behavior and/or the actual fMRI data?
  • Do you give subjects feedback during the actual experiment (e.g. a little green checkmark or even a text results screen)? It's a hard issue. One advantage is that feedback makes the subject feel better and lets them know whether they are doing the right thing. The main disadvantage is that it introduces yet more cognitive confounds/phenomena into your experiment. You could, for example, tell your subjects their performance after the run is over (when scanning isn't happening).

Multiple scan session experiments (fMRI)

  • If an extra session is prescribed for structurals and localizers, it's no big deal.
  • But if you are trying to pool/aggregate fMRI data across sessions, there are lots of ways in which the data will change across days:
  • Awakeness, arousal
  • Behavioral performance
  • Shim / fieldmap stuff
  • Head motion jitteriness
  • Physiology (e.g., caffeine, hunger state)
  • Time of day
  • Development, potentially (the brain shape may change over time!)
  • "raw %BOLD magnitude" may vary from day to day

Noise

  • Probably the number one plague for fMRI?
  • What are all the potential sources of response variability?
  • head motion (THIS IS A BIGGIE)
  • thermal noise (even a phantom fluctuates)
  • low-frequency drifts
  • subject's mental state (sleep / eyes closed, cognitive variability)
  • cardiac/respiratory noise
  • neural adaptation (real effects, but not something you understand / care about)
  • Your solutions can be usefully divided into acquisition solutions vs. analysis solutions

Efficiency and statistical power

  • Issues at stake here: interstimulus intervals, the order of your conditions, the number of trials per condition, the number of runs, and things like that.
  • If you know exactly how you are going to analyze your fMRI time-series data, THEN you can pose the statistical question of which design is optimal. It is very difficult to say in general that "you should do such-and-such" since the truth of this statement really depends on how you plan to analyze the data.
  • Think about the X'*X matrix (where X is your design matrix).
  • One can run pure code simulations (and there are papers on this) to check how efficient your experimental design is. There are tools and ideas out there like optseq2. Alternatively, if you are technically savvy, you can relatively easily generate some ground-truth simulations and analyze those data to evaluate efficiency.
  • All else equal, you obviously want more trials, more runs, trials whose responses are minimally overlapping, etc.
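
As a sketch of the X'*X idea: under a GLM, the variance of a contrast estimate scales with c (X'X)^-1 c', so a common efficiency measure is its reciprocal. The toy comparison below (the double-gamma HRF and the specific orderings are illustrative assumptions) shows why a blocked ordering beats a rapid fixed alternation for a differential [1, -1] contrast:

```python
import numpy as np
from math import gamma

def hrf(t):
    """Double-gamma HRF (SPM-style defaults, assumed for illustration)."""
    return t**5 * np.exp(-t) / gamma(6) - t**15 * np.exp(-t) / gamma(16) / 6

def design_matrix(onsets_per_condition, total_s, tr=1.0):
    """Convolve per-condition event trains with the HRF to form the GLM design matrix."""
    h = hrf(np.arange(0, 32, tr))
    n = int(total_s / tr)
    X = np.zeros((n, len(onsets_per_condition)))
    for j, onsets in enumerate(onsets_per_condition):
        stick = np.zeros(n)
        for onset in onsets:
            stick[int(onset / tr)] = 1.0
        X[:, j] = np.convolve(stick, h)[:n]
    return X

def efficiency(X, contrast):
    """Reciprocal of the contrast-estimate variance term, trace(c (X'X)^-1 c')."""
    c = np.atleast_2d(contrast)
    return 1.0 / np.trace(c @ np.linalg.inv(X.T @ X) @ c.T)

total = 240  # a 4-minute run; 30 trials per condition in both designs
alternating = design_matrix(([o for o in range(0, total, 8)],
                             [o for o in range(4, total, 8)]), total)
blocked     = design_matrix(([o for o in range(0, 120, 4)],
                             [o for o in range(120, total, 4)]), total)

# Blocked ordering is far more efficient for the differential contrast,
# because rapid alternation makes the two regressors nearly collinear.
assert efficiency(blocked, [1, -1]) > efficiency(alternating, [1, -1])
```

Note that a detection contrast (e.g., [1, 0]) can rank designs differently, which is exactly why "which design is optimal" depends on the analysis you intend to run.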

fMRI pulse sequence choices:

  • field strength (lower-field machines might be more reliable for data collection; lower fields generally have more compatible peripheral equipment, and setup is usually easier, less finicky, and faster)
  • brain coverage (number of slices needed)
  • spatial resolution (voxel size)
  • temporal resolution (TR)
  • acceleration: partial Fourier and/or parallel and/or multiband acceleration
  • gradient-echo vs. spin-echo vs. GRASE vs. VASO??
  • phase-encode direction
  • fairly minor: TE (use typical ones for your field strength), flip angle (use Ernst angle)
  • General rule of thumb: Probably you can use a generic "off the shelf" fMRI pulse sequence choice. Don't try to go aggressive, UNLESS you are trying to probe a cutting-edge part of the acquisition space (like sub-mm fMRI).
  • General rule of thumb for really trying to tailor a fMRI sequence to a particular experiment: make up a scientific question → field strength → GE/SE/VASO → brain coverage → decide spatial resolution that you need → decide continuous vs. clustered vs. sparse (for auditory studies) → decide if you really care about temporal information and then decide how to trade off TR vs. multiband (keep multiband low [1-3] to help SNR unless you really need the fast sampling) [in general, TR should not really go much slower than 2-3 s]
  • You might be limited by your hardware
  • Make sure to empirically pilot your prospective sequences (choose a few pulse sequence parameter choices (protocols), and get real actual human data (fMRI while subjects are just sitting)). Get an expert to visually inspect the data.
  • 7T: distortion, dropout, louder and more uncomfortable (sound, dizziness, claustrophobic, room environment aesthetics) for your subjects
  • You should have a fully specified scan protocol that lists all the types of scans you will collect and in what order. E.g., Setup/localizer, Fieldmaps (reverse phase encode or dual echo fieldmaps), T1 anatomy?, Resting-state. Factor that into your scan session design.
  • Think about overall protocol (sequence of scans, runs) and order of events. If a subject might bail, consider putting important scans first. Also, subjects like breaks, so plan for them.

Other miscellaneous issues:

  • The more "cognitive" the experiment, the more likely that there will be a series/sequence of different cognitive processes that you elicit in the subject on a single trial.
  • If you are trying to distinguish or understand or analyze the individual subcomponents...then that complicates the experimental design choices. You perhaps should view each individual subcomponent of a trial as an independent "condition" and model the data accordingly.
  • Catch trials: cue the subject to perform some particular task, but don't actually show the stimulus. Doing this helps to decorrelate cue-related activity from stimulus-related activity.
  • People sometimes like to "repeat the entire run" multiple times (this typically refers to repetition of the experimental stimulus). This makes reliability analyses easy, and averaging across full runs will lower measurement noise. The caveat, of course, is that "cognition" doesn't necessarily repeat in exactly the same way.
  • What data will you have acquired if the subject demands to get out of the scanner before you are done? Would be nice to have usable/relatively balanced data even with incomplete data acquisition.

Checklist:

  • Have you tested your experiment thoroughly? Test it SEVERAL DAYS BEFORE a real subject shows up.
  • Will you always get the behavioral data saved, even if a run crashes?
  • Have you tried out your experiment outside the scanner? Inside the scanner? Have you and your friends sat through the entire experiment? Does it feel right? Is it doable? Is it too hard? Too easy?
  • Have you fully documented the nature of your experiment paradigm, how it is structured, and how you will run it? Here is a sample documentation that Kendrick made for an experiment.
  • Have you tested the actual scanner experiment thoroughly? If you get a real subject, is your experiment or scan protocol going to crash on them?
  • Check and double-check peripheral equipment: button box, display, eyetracking, etc.
  • Have you checked your stimulus delivery/timing? How accurate can you confirm it is?
  • Do you know your "protocol" for introducing naive or semi-naive subjects into your experiment and what you will say to them, etc.?
  • Have you checked your experimental design trial ordering and you know 100% exactly what conditions happen when and where in each run?
  • Have you had some "physicist" check over your pulse sequence plan?
  • Have you had some "data analyst" check over your general experimental design and plan for analysis?
  • Have you tried to get the physicist to talk to the data analyst to make sure there isn't anything lost in translation?
  • Advanced extra credit: Do actual simulations of the fMRI responses that you think you can get given the type of protocol you are using, and run your entire analysis on that fake synthesized data. Example
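
A minimal version of that extra-credit exercise: synthesize data from a known ground truth with the GLM you intend to fit, add noise at a plausible level, and check that your analysis recovers the truth. Everything here (design, noise level, betas) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_conditions = 300, 3

# Stand-in for an HRF-convolved design matrix from your actual protocol
X = rng.standard_normal((n_timepoints, n_conditions))
beta_true = np.array([1.0, 0.5, 0.0])          # ground-truth condition amplitudes
noise_sd = 0.5
y = X @ beta_true + noise_sd * rng.standard_normal(n_timepoints)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# If recovery fails here, it will certainly fail on real (noisier) data.
# Shrink n_timepoints or raise noise_sd to probe where the design breaks down.
assert np.allclose(beta_hat, beta_true, atol=0.2)
```

Swapping in your real design matrix, trial counts, and a realistic noise model turns this into a cheap dress rehearsal of the entire analysis pipeline.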

Parting thoughts

  • How critical, really, are all of these various experimental design considerations? Can we find examples of "pitfalls" and data that turned out poorly? Wouldn't it be useful to cobble together a collection of examples? Are we sure that we aren't worrying about tiny/minor things?
  • EEG has an example of incidental eyeblinks confounding the interpretation of effects.