Scale and offset

Introduction:

  • Scale and offset are fundamental concepts. They come up in all sorts of contexts: z-scoring, normalization, standardization of variables, etc.
  • Offset: Think constant term. You are changing the mean. The term 'centering' is also common (i.e. subtract the mean, or, in other words, add an offset such that the mean is zero).
  • Scale: Think gain term (often assumed to be > 0). Scale pertains to the dynamic range of some data. You are changing the standard deviation. Scale changes the units!
  • Any and all quantitative analyses are affected by, and potentially change, the scale and offset of some data.
  • Motivation: Quantitative units matter!! So you should care.
  • Motivation: When combining/averaging a bunch of different units (e.g. voxels), the scale of each unit influences the result.
  • Especially in fMRI, there is massive heterogeneity across voxels (or regions); this is heterogeneity across space. There are also substantial changes in mean (and potentially scale) across time (e.g., the mean of a voxel might change substantially in a new run of data or a new scan session).


Should you normalize or not?

  • Normalization can get rid of nuisance factors. This is good if you want to get rid of them.
  • Another reason to normalize is to "democratize" voxels (in other words, put them on the same scale).
  • Dangers of normalization include creating dependencies in your data (you have to be careful). Another danger is that if you forget (or don't know) what normalization was done, this can lead to incorrect interpretations. Yet another danger is that normalization can corrupt the actual signals in your data.


Basic math facts:

  • z-scoring: subtract mean and then divide by the standard deviation. After z-scoring, we know the mean is 0 and the standard deviation is 1. The units after z-scoring are "standard deviations".
  • There is a close link to Gaussian distributions. The mean and standard deviation are the fundamental parameters that characterize a Gaussian distribution (the first and second moments, respectively).
  • If a set of data are not de-meaned, then scaling the data also changes the mean!! Be careful.
  • Think about the zero point: where is it, and where does it map to after the transformation?
  • Scaling by x means that the standard deviation will grow by a factor of x. In contrast, the variance will grow by a factor of x^2.
  • Note that normalization (like z-scoring) causes information to be lost!
  • z-scoring is invertible if you save the parameters (i.e., the constant that you subtracted, the constant that you divided by).
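These basic facts are easy to verify numerically. Below is a minimal sketch using NumPy with made-up simulated data (the specific means, standard deviations, and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)  # arbitrary simulated data

# z-score: subtract the mean, then divide by the standard deviation
mu, sigma = x.mean(), x.std(ddof=1)
z = (x - mu) / sigma
print(np.isclose(z.mean(), 0.0, atol=1e-12))   # True: mean is 0
print(np.isclose(z.std(ddof=1), 1.0))          # True: standard deviation is 1

# invertible only if we saved the parameters mu and sigma
x_recovered = z * sigma + mu
print(np.allclose(x_recovered, x))             # True

# scaling by c multiplies the std by c but the variance by c^2
c = 3.0
print(np.isclose((c * x).std(ddof=1), c * sigma))             # True
print(np.isclose((c * x).var(ddof=1), c**2 * x.var(ddof=1)))  # True
```

Note that `ddof=1` requests the sample standard deviation (dividing by n-1), which matches the formula used in these notes.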


Other connections:

  • After a set of data are de-meaned (i.e. centered), z-scoring is essentially L2-normalization. Notice that after de-meaning, the standard deviation is sqrt(sum( x.^2 )/(n-1)), while the length of the vector is sqrt(sum( x.^2 )). Hence, on de-meaned data, z-scoring divides by the vector length up to a constant factor of sqrt(n-1).
  • In a scatter plot, offset corresponds to the position (i.e. translation) of the dots, and scale corresponds to their spread.
  • Variance explained is invariant to scaling and offset. This is because typical regression models have the flexibility of scale and offset applied to the model output.
  • Note that if you normalize, immediate statistical dependencies are created.
  • Often, in a machine-learning context, you have a bunch of heterogeneous predictors (features), and in those scenarios a common knee-jerk reaction is to just z-score each feature. This can make sense: if each feature has arbitrary units that you don't care about, why not z-score each one?
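The link between z-scoring and L2-normalization mentioned above can be made concrete with a short NumPy sketch (the data here are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)

# de-mean (center) the data
xc = x - x.mean()

# z-scoring the centered data divides by std = ||xc|| / sqrt(n-1)
z = xc / xc.std(ddof=1)

# L2-normalization divides by the vector length ||xc||
u = xc / np.linalg.norm(xc)

# the two results differ only by a constant factor of sqrt(n-1)
print(np.allclose(z, u * np.sqrt(len(x) - 1)))  # True
```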


Regression:

  • Think: is there a constant term in the model??
  • "Oh, let's standardize each predictor and also include a constant term" - this is a typical approach, and can work well.
  • Scale comes from the freedom of the betas. The betas scale the predictors, so scaling is implicit when you fit the betas.
  • In standard least squares, the scale of the predictors that you use doesn't matter to the goodness of fit of your final model. In other words, you could scale the predictors however you like, and the goodness of fit that you achieve will be the same.
  • z-scoring doesn't change the expressivity of a regression model (assuming you have a constant term included in the model). However, the betas that you estimate will change and you need to interpret them accordingly.
  • However, scale does matter in ridge regression. If you set up your design matrix with predictors that have vastly different scales, you are effectively regularizing those predictors very differently (the penalty on the betas depends on the units of the predictors).
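These regression points can be demonstrated with a small NumPy sketch. The simulated data, the scale factor of 1000, and the ridge penalty value are all arbitrary choices for illustration. Ordinary least squares (with a constant term) achieves identical goodness of fit regardless of predictor scale, with the betas simply rescaling, whereas a plain ridge penalty is sensitive to predictor scale:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 - 0.5 * x2 + 2.0 + rng.normal(scale=0.3, size=n)

def ols(X, y):
    """Return OLS betas and the residual norm (goodness of fit)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, np.linalg.norm(y - X @ beta)

# design matrix with a constant term
X = np.column_stack([np.ones(n), x1, x2])
beta_a, fit_a = ols(X, y)

# rescale one predictor by 1000 (e.g. a change of units)
Xs = np.column_stack([np.ones(n), 1000 * x1, x2])
beta_b, fit_b = ols(Xs, y)

# goodness of fit is identical; the beta simply shrinks by 1000
print(np.isclose(fit_a, fit_b))                 # True
print(np.isclose(beta_b[1], beta_a[1] / 1000))  # True

# ridge regression (L2 penalty on the betas) is NOT scale-invariant:
def ridge(X, y, lam):
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

lam = 10.0
resid_a = np.linalg.norm(y - X @ ridge(X, y, lam))
resid_b = np.linalg.norm(y - Xs @ ridge(Xs, y, lam))
# the two residual norms differ: the rescaled predictor is
# effectively regularized much less than the original one
print(resid_a, resid_b)
```

(For simplicity this sketch penalizes the constant term too; in practice the intercept is usually left unpenalized, but the scale-sensitivity point is the same.)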