Basic statistics

  • mean
  • variance
  • statistic - some summary property of a set of data
  • standard deviation - the square root of the variance. Note that the reason for dividing by n-1 (Bessel's correction) is to obtain an unbiased estimator of the population variance (and an almost unbiased estimator of the population standard deviation)
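A quick NumPy sketch of the n-1 distinction (`ddof` is NumPy's knob for it; the data values here are arbitrary):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population variance: divide by n (ddof=0, NumPy's default)
var_pop = data.var(ddof=0)
# Sample variance: divide by n-1 (ddof=1), unbiased for the population variance
var_samp = data.var(ddof=1)

print(var_pop)   # 4.0
print(var_samp)  # ~4.571 (i.e. 32/7)
```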
  • sample vs. population
  • standard error - the standard deviation of the sampling distribution of a statistic; the most common case is the standard error of the mean (SEM)
  • confidence interval - the N% confidence interval is an interval such that N% of the time across repeated experiments we would expect to find the true population parameter in the interval
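A sketch of the standard error and an approximate confidence interval (assumes NumPy; the 1.96 critical value is the normal approximation, so treat this as illustrative rather than exact):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=100)

mean = sample.mean()
# Standard error of the mean: sample std (n-1 flavor) divided by sqrt(n)
sem = sample.std(ddof=1) / np.sqrt(len(sample))

# Approximate 95% confidence interval using the normal critical value 1.96
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean = {mean:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```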
  • estimator - procedure that estimates the population parameter
  • expectation - mean across an infinite number of samples/experiments
  • bias - the discrepancy between the expected value of the estimate vs. the population parameter
  • median - 50th percentile. Associated with non-parametric statistics and robustness (as a major advantage).
  • mode - value corresponding to the peak of a probability distribution
  • robust - referring to working well across a variety of different situations
  • probability distribution
  • histogram
  • parametric - assuming or conforming to some model. In the case of statistics, this typically refers to assumptions about the shape of the probability distribution
  • non-parametric - tending to not make assumptions about probability distributions
  • percentiles - The 99th percentile of a set of values is the number at or below which 99% of the values fall. Non-parametric concept. Note that with finite data, there are funny interpolation games one must play in order to get sensible values
  • quartile - the cut points at the [25 50 75] percentiles (sometimes the 0th and 100th, i.e. the min and max, are included as well)
  • range - difference between max and min values
  • spread - referring to how dispersed a set of values is (how much the numbers differ from each other)
  • iqr (and semi-iqr) - the interquartile range is p(75)-p(25). The semi-interquartile range is one half of that.
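A small NumPy sketch of percentiles, quartiles, and the IQR; note the interpolated values, which is the "funny interpolation games" point above (the data values are arbitrary):

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)

# NumPy's default is linear interpolation between order statistics
p25, p50, p75 = np.percentile(values, [25, 50, 75])
iqr = p75 - p25
semi_iqr = iqr / 2

print(p25, p50, p75)  # 3.25 5.5 7.75 (interpolated, not actual data values)
print(iqr)            # 4.5
```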
  • centering (a.k.a. mean-centering) - subtract the mean of a set of values
  • z-scoring - centering and dividing by the standard deviation. This transforms the numbers into z-score units
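Z-scoring in a couple of lines (NumPy; ddof=0 is used here, though some people use ddof=1):

```python
import numpy as np

x = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Center, then divide by the standard deviation
z = (x - x.mean()) / x.std(ddof=0)

# The result has mean 0 and standard deviation 1 (z-score units)
print(z.mean())  # ~0.0
print(z.std())   # 1.0
```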
  • rank - one approach is to convert your continuous-valued data into ranks (e.g. 1 2 3 4 5 ...). This can be useful when pursuing non-parametric methods.
  • correlation - Pearson's correlation is typically what we mean by correlation
  • for z-scored data (x) and z-scored data (y), correlation is equal to the average product between corresponding data points in x and y (similar to a dot product)
  • it's equivalent-ish to fitting a line on x to predict y: correlation is a measure of how well a linear function of x predicts y (and vice versa).
  • correlation is symmetric (order does not matter)
  • Spearman's correlation - Pearson's correlation on rank-transformed data. Good for being "robust" to simple monotonic nonlinearities
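A sketch contrasting Pearson and Spearman on a monotonic nonlinearity (NumPy only; the double-argsort rank trick assumes no ties, and the exponential relationship is an arbitrary illustrative choice):

```python
import numpy as np

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    # Rank-transform (no ties here), then take Pearson's r on the ranks
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

x = np.linspace(1, 5, 50)
y = np.exp(x)  # monotonic but nonlinear function of x

print(round(pearson(x, y), 3))   # < 1: the relationship is not linear
print(round(spearman(x, y), 3))  # 1.0: the ranks are perfectly preserved
```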
  • supervised learning - Can be viewed as mapping X to y, or predicting y from X. You have a bunch of data examples where X and y are known, and you are trying to learn a good (predictive) mapping.
  • unsupervised learning - learn structure in a set of data. Basically you only have X and you want to learn something about X. Examples include PCA and clustering.
  • multivariate statistics
  • observation matrix - 2D matrix where you have observations x features. Each row is a "subject" (or "trials" or repeated experiments), each column is some measured property.
  • feature - sometimes referred to generically as "dimensions". For example, 2 features => 2 dimensional space. You can think of your observation matrix as just a bunch of points in high-dimensional feature space.
  • distance - there are different metrics for quantifying distances in high-dimensional feature space (e.g., Euclidean distance, Manhattan distance, cosine distance, correlation distance (1-r), Mahalanobis distance, etc.)
  • distance matrix - a square matrix (observations x observations) that quantifies the distance between all pairs of observations.
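A distance matrix via broadcasting (NumPy; Euclidean distance, with toy data chosen to give clean values):

```python
import numpy as np

# 4 observations x 2 features
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 4.0],
              [3.0, 0.0]])

# Pairwise Euclidean distances: broadcast to a (4, 4, 2) array of
# coordinate differences, then reduce over the feature axis
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

print(D.shape)  # (4, 4): observations x observations
print(D[0, 1])  # 5.0 (a 3-4-5 triangle)
```

The result is symmetric with zeros on the diagonal, as any distance matrix should be.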
  • similarity vs. dissimilarity - one is just the flip side of the other (high similarity means low dissimilarity), and in different situations, people use one or the other.
  • statistical significance
  • t-test
  • two-sample t-test
  • paired t-test
  • null hypothesis
  • ANOVA - one-way ANOVA is just the t-test extended to more than two groups
  • alpha - the significance threshold; typically, 0.05
  • effect size - refers to the population (the system you are trying to measure and characterize)... it's the size of the effect. For example, a 30% increase in BOLD response from condition A (1% BOLD increment above baseline) to condition B (1.3% BOLD increment above baseline); alternatively, you could quantify the effect size as 0.3% BOLD. Importantly, the effect size is a property of the population and is independent of how you choose to sample it.
  • Type I error - rejecting the null when in fact the null hypothesis is correct
  • Type II error - not rejecting the null when in fact the null hypothesis is incorrect
  • power - in a given scenario (e.g. for a certain sample size, for a certain effect size), the probability of rejecting the null hypothesis (when it is in fact incorrect)
  • null hypothesis statistical testing (NHST) - Under the null hypothesis, how likely is our current observation? The answer is the p-value (i.e. the statistical significance level). If the p-value is lower than some specified (arbitrary) alpha level, we REJECT the null hypothesis. If the p-value is relatively high (e.g. 0.3), then we just fail to reject the null hypothesis. Technically, we are not really proffering evidence of the truth of the null hypothesis.
  • permutation approaches - non-parametric methods for calculating/determining statistical significance.
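A minimal permutation test sketch for a two-group mean difference (NumPy; the group sizes, true effect size, and 10,000-permutation count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, 50)
group_b = rng.normal(1.0, 1.0, 50)  # true mean difference of 1

observed = group_b.mean() - group_a.mean()

# Under the null, group labels are exchangeable: shuffle the labels many
# times and count how often a shuffled difference is at least as extreme.
pooled = np.concatenate([group_a, group_b])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    perm_diff = pooled[50:].mean() - pooled[:50].mean()
    if abs(perm_diff) >= abs(observed):
        count += 1

p_value = count / n_perm
print(p_value)  # small: the observed difference is unlikely under the null
```

No distributional assumptions were needed, which is the "non-parametric" part.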
  • regression - one or more predictor variables ("regressors") are used to "predict" an outcome variable ("the data"), where the outcome variable is treated as continuous
  • model fitting / estimation
  • residual sum of squares (RSS) - typically, regression is aimed at tweaking the parameters of a model such that RSS is minimal
  • likelihood - probability of a data point (or multiple data points) given some probabilistic model
  • estimation - a process for determining the value of some unknown parameter based on a set of data
  • maximum likelihood estimation (MLE) - one flavor of method for estimation, where you choose the value that maximizes the likelihood of the data. Note that MLE for Gaussian noise assumptions is equivalent to finding parameters that minimize squared error.
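A sketch of the MLE / least-squares equivalence under Gaussian noise (NumPy; the model, noise level, and grid-search range are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 40)

# Least-squares fit: minimizes the sum of squared residuals
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Gaussian log-likelihood of the data as a function of (slope, intercept);
# with sigma fixed, it is just a negative scaled RSS plus a constant
def loglik(b, sigma=0.1):
    resid = y - (b[0] * x + b[1])
    return -0.5 * np.sum(resid ** 2) / sigma ** 2 - len(y) * np.log(sigma)

# A tiny grid search over slopes confirms the least-squares slope also
# maximizes the likelihood
slopes = np.linspace(slope - 0.5, slope + 0.5, 201)
best = slopes[np.argmax([loglik((s, intercept)) for s in slopes])]
print(abs(best - slope) < 0.01)  # True: MLE and least squares agree
```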
  • maximum a posteriori (MAP) estimation - if you impose a prior (a la Bayesian statistics), this is a different flavor of estimation method where your MAP estimate is the peak of the posterior distribution.
  • point estimation - just one value as your guess of the parameter
  • interval estimation - providing a range (like a confidence interval) for your guess of the parameter
  • classification - the outcome variable is discrete rather than continuous, consisting of "classes" or "categories". Note that 'logistic regression' is actually a method for classification.
  • methods to manipulate your data (see  🔧Resampling techniques  for more information)
  • permutation - usually associated with NHST
  • bootstrapping - a cool, conceptually easy, computationally demanding non-parametric method for assessing the reliability of your data measures
  • cross-validation - testing a model on out-of-sample data (i.e. data to which your model was blind (it had no access)). Specifically: take the data you have, split it into two groups, train and test; fit the model on train and evaluate it on test. Good for obtaining unbiased assessments of model accuracy and, therefore, for comparing models.
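Bootstrap and cross-validation sketches (NumPy; the statistic, split sizes, and slope-only model are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(5.0, 2.0, 200)

# --- Bootstrap: resample with replacement, recompute the statistic each time ---
boot_medians = np.array([
    np.median(rng.choice(data, size=len(data), replace=True))
    for _ in range(2000)
])
# The spread of the bootstrap distribution quantifies reliability
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])
print(f"bootstrap 95% CI for the median: [{ci_low:.2f}, {ci_high:.2f}]")

# --- Cross-validation: fit on train, evaluate on held-out test data ---
x = np.linspace(0, 1, 100)
y = 3.0 * x + rng.normal(0, 0.2, 100)
idx = rng.permutation(100)
train, test = idx[:70], idx[70:]
slope = (x[train] @ y[train]) / (x[train] @ x[train])  # slope-only fit on train
test_mse = np.mean((y[test] - slope * x[test]) ** 2)   # out-of-sample error
print(test_mse)
```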