Scatter plots and low-dimensional regression

Introduction:

  • Why are scatter plots so useful? From a visualization point of view, they show you all the data.
  • Scatter plots have a close linkage to model building
  • Scatter plots have a close linkage to nonparametric methods
  • Scatter plots force you to think deeply about "what is error?"
  • Recall distinction between marginal distribution and joint distribution
  • (Hint: see the recording on 🏄‍♀️Special Topics for figures and demonstrations)

Issues:

  • Fitting a line in a scatter plot
    • Standard least-squares regression.
    • Simple line fitting is a two-parameter multiple regression (the predictors are x and a constant term): y = ax + b.
    • What is error? OLS implies squared errors (vertical distances) and that the error is in the y-variable. The line minimizes the sum of the squares of these residuals.
    • Goodness-of-fit: R². Conceptually, this is the size of your residuals (in explaining y) compared to the variance of y.
    • Model reliability / parameter reliability [e.g. bootstrapping]
    • Model selection [e.g. cross-validation, statistical significance of higher-order terms, AIC/BIC]. One approach is to build nested models...
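To make the pieces above concrete, here is a minimal sketch with NumPy: a two-parameter fit y = ax + b, the R² computation, and a bootstrap confidence interval for the slope. The synthetic data, noise level, and bootstrap count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (purely illustrative): y = 2x + 1 plus Gaussian noise in y.
x = rng.uniform(0, 10, 200)
y = 2 * x + 1 + rng.normal(0, 1.5, 200)

# Two-parameter least squares: y = a*x + b.
a, b = np.polyfit(x, y, 1)

# R²: size of the residuals relative to the total variance in y.
resid = y - (a * x + b)
r2 = 1 - resid.var() / y.var()

# Parameter reliability via bootstrapping: resample (x, y) pairs with replacement
# and refit; the spread of refit slopes gives a confidence interval.
slopes = []
for _ in range(1000):
    idx = rng.integers(0, len(x), len(x))
    slopes.append(np.polyfit(x[idx], y[idx], 1)[0])
ci = np.percentile(slopes, [2.5, 97.5])

print(f"a={a:.2f}, b={b:.2f}, R2={r2:.2f}, slope 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Note that the bootstrap here resamples whole (x, y) pairs, which matches the "fresh sampling of noise" view of independence discussed below.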
  • Alternatives that go beyond vanilla line fitting:
    • Robust regression (median, not mean) [e.g. "median absolute deviation"]
    • Error in two dimensions (see below)
    • Fit relationships that are more complex than straight lines
    • You can go nonparametric (binned scatter plot, local regression)
    • Mixed-effects models
    • Bayesian parameter estimation
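As one concrete robust alternative (a sketch, not the only option): the Theil-Sen estimator replaces the mean-based OLS slope with the median of all pairwise slopes. The data and the outlier corruption below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data with outliers: OLS gets pulled, a median-based fit does not.
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, 50)
y[:5] += 30  # corrupt a few points

# Theil-Sen estimator: median (not mean) of all pairwise slopes.
i, j = np.triu_indices(len(x), k=1)
slopes = (y[j] - y[i]) / (x[j] - x[i])
a_robust = np.median(slopes)
b_robust = np.median(y - a_robust * x)

# Ordinary least squares for comparison.
a_ols = np.polyfit(x, y, 1)[0]
print(f"OLS slope: {a_ols:.2f}, Theil-Sen slope: {a_robust:.2f}")
```

The median makes the fit insensitive to a modest fraction of wild points, at the cost of computing all O(n²) pairwise slopes.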
  • The issue of independence
    • Are the data points in your scatter plot independent, and if so, what does that independence reflect? One way to think about it: does each dot reflect a fresh sampling of the noise?
    • Note that a different issue is the independence (or dependence) of the two variables we are plotting. That is ultimately what we are usually trying to understand when we make a scatter plot.
  • Errors in two dimensions
    • Fundamentally different!
    • Known as "Deming regression"
    • Might be useful.
    • One drawback is that it can be more CPU-intensive: with a known ratio of error variances there is a closed-form solution, but more general versions require iterative methods.
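A sketch of the equal-error-variance case, which does have a closed form. The simulated noise levels are assumptions, chosen so that noise in x visibly attenuates the OLS slope while the Deming fit does not.

```python
import numpy as np

rng = np.random.default_rng(2)

# True relationship y = 2x + 1, with noise added to BOTH x and y (equal variances).
x_true = rng.uniform(0, 10, 300)
x = x_true + rng.normal(0, 0.8, 300)
y = 2 * x_true + 1 + rng.normal(0, 0.8, 300)

# Deming regression with error-variance ratio delta = 1 (equal errors):
# the slope has a closed form in the sample variances and covariance.
sxx = np.var(x, ddof=1)
syy = np.var(y, ddof=1)
sxy = np.cov(x, y)[0, 1]
slope = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
intercept = y.mean() - slope * x.mean()

# OLS, by contrast, attenuates the slope when x is noisy.
slope_ols = np.polyfit(x, y, 1)[0]
print(f"Deming slope: {slope:.2f}, OLS slope: {slope_ols:.2f}")
```

The attenuation of the OLS slope here is the classic "errors-in-variables" bias: noise in x flattens the fitted line, which is exactly what treating error in two dimensions corrects.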
  • Moving to nonlinear relationships. What are our strategies?
    • Binned scatter plot. (Requires some choice of bin size.)
    • Higher-order polynomials or other nonlinear transformations of your predictors
    • The Fourier transform is one choice for parameterizing the x-axis. If you smooth your data (i.e. delete high frequencies), this can be viewed as fitting a nonlinear function to your data (i.e. using a basis set of sinusoids that exist only at low frequencies).
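A minimal sketch of that idea (the grid size, the signal, and the cutoff of 10 frequencies are all illustrative assumptions; note that the FFT view assumes evenly spaced x):

```python
import numpy as np

rng = np.random.default_rng(3)

# Noisy samples of a smooth signal on a regular grid.
n = 256
t = np.linspace(0, 1, n, endpoint=False)
y = np.sin(2 * np.pi * 3 * t) + rng.normal(0, 0.5, n)

# "Fit" by deleting high frequencies: keep only the lowest 10 frequency components.
coef = np.fft.rfft(y)
coef[10:] = 0
y_smooth = np.fft.irfft(coef, n)

# The result is the least-squares projection of y onto low-frequency sinusoids.
err = np.sqrt(np.mean((y_smooth - np.sin(2 * np.pi * 3 * t)) ** 2))
print(f"RMS error after smoothing: {err:.3f}")
```

Keeping 10 of 129 frequency components discards most of the noise power while retaining the (low-frequency) signal, so the smoothed curve tracks the true function closely.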
    • Local regression (a.k.a. LOWESS = locally weighted scatterplot smoothing)
      • A CPU-intensive, data-driven method that flexibly allows any shape of model.
      • Basically, you fit a simple model (e.g. a linear model) to local windows of your data.
      • Window size is a major choice. The choice can be viewed in terms of bias and variance.
      • Pros: minimal assumptions, elegant
      • Cons: CPU-intensive; breaks down (poor performance) in high-dimensional situations.
      • Fieldmap regularization is an example of local regression in 3D.
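The local-regression recipe can be sketched by hand in a few lines. This hand-rolled version is illustrative only (it omits the robustness iterations of full LOWESS; libraries such as statsmodels provide production implementations), and the test signal is an assumption.

```python
import numpy as np

def lowess(x, y, frac=0.3):
    """Minimal local linear regression sketch (no robustness iterations).

    For each point, fit a weighted straight line to its nearest neighbors,
    with tricube weights that fall off with distance. `frac` is the window
    size: the fraction of the data used per local fit (the bias/variance knob).
    """
    n = len(x)
    k = max(2, int(frac * n))
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]                       # the k nearest neighbors
        w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3   # tricube weights
        # Weighted least squares for a local line a*x + b.
        A = np.vstack([x[idx], np.ones(k)]).T * np.sqrt(w)[:, None]
        a, b = np.linalg.lstsq(A, y[idx] * np.sqrt(w), rcond=None)[0]
        fitted[i] = a * x[i] + b
    return fitted

# Illustrative use: recover a sine wave from noisy samples.
rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(0, 0.3, 200)
y_hat = lowess(x, y, frac=0.2)
```

Shrinking `frac` lowers bias (the local line tracks curvature better) but raises variance (fewer points per window), which is the bias/variance trade-off named above.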
  • Data sampling and impact on model fitting