Assumes there is approximately a linear relationship between $X$ and $Y$: $Y \approx \beta_0 + \beta_1 X$
$\beta_0$ and $\beta_1$ are known as coefficients or parameters
Use training data to produce estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, then predict future response values by $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
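A minimal sketch of the prediction step. The coefficient values here are made up for illustration; in practice the estimates come from the fitting procedure in the next section.

```python
# Hypothetical estimates, for illustration only; real values come from least squares fitting.
beta0_hat, beta1_hat = 0.2, 1.95

def predict(x):
    """Predicted response yhat = beta0_hat + beta1_hat * x."""
    return beta0_hat + beta1_hat * x

print(predict(3.0))  # prediction for a new value x = 3.0
```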
Estimating Coefficients
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ represent $n$ observation pairs, each of which consists of a measurement of $X$ and a measurement of $Y$
Measure the closeness of the fit to the data using least squares (one of several possible ways to measure closeness)
Let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ be the prediction for $Y$ based on the $i$th value of $X$. Then $e_i = y_i - \hat{y}_i$ represents the $i$th residual: the difference between the observed response value and the response value that is predicted. We define the residual sum of squares (RSS) as
$$\mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2,$$
or equivalently
$$\mathrm{RSS} = (y_1 - \hat{\beta}_0 - \hat{\beta}_1 x_1)^2 + (y_2 - \hat{\beta}_0 - \hat{\beta}_1 x_2)^2 + \cdots + (y_n - \hat{\beta}_0 - \hat{\beta}_1 x_n)^2.$$
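A minimal sketch of computing residuals and the RSS, assuming a small made-up dataset and arbitrary coefficient estimates (the arrays and the values of `beta0_hat`/`beta1_hat` are illustrative, not from the text):

```python
import numpy as np

# Hypothetical data and assumed coefficient estimates, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta0_hat, beta1_hat = 0.2, 1.95

y_hat = beta0_hat + beta1_hat * x   # predictions yhat_i
residuals = y - y_hat               # e_i = y_i - yhat_i
rss = np.sum(residuals ** 2)        # RSS = e_1^2 + ... + e_n^2
print(rss)
```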
Least squares chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize RSS. The minimizers are:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the sample means.
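A sketch of the closed-form least squares estimates on the same kind of made-up data (the `least_squares_fit` helper and the toy arrays are assumptions for illustration):

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least squares estimates for simple linear regression."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta0_hat, beta1_hat = least_squares_fit(x, y)
print(beta0_hat, beta1_hat)
# Cross-check: np.polyfit(x, y, 1) returns [slope, intercept] for the same fit.
```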
Assessing Coefficient Accuracy
Standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$:
$$\mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right], \qquad \mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
where $\sigma^2 = \mathrm{Var}(\epsilon)$, assuming the errors $\epsilon_i$ have a common variance $\sigma^2$ and are uncorrelated.
Estimate $\sigma$ using the residual standard error: $\mathrm{RSE} = \sqrt{\mathrm{RSS}/(n-2)}$
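A sketch of estimating $\sigma$ with the RSE and plugging it into the standard error formulas, again on made-up data (names such as `sxx` are my own shorthand, not from the text):

```python
import numpy as np

# Hypothetical data; fit the coefficients, then estimate sigma and the SEs.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
x_bar = x.mean()

beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / np.sum((x - x_bar) ** 2)
beta0_hat = y.mean() - beta1_hat * x_bar

rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
rse = np.sqrt(rss / (n - 2))                                # estimate of sigma
sxx = np.sum((x - x_bar) ** 2)                              # sum of squared deviations of x
se_beta1 = np.sqrt(rse ** 2 / sxx)                          # SE(beta1_hat)
se_beta0 = np.sqrt(rse ** 2 * (1 / n + x_bar ** 2 / sxx))   # SE(beta0_hat)
print(rse, se_beta0, se_beta1)
```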
The $R^2$ statistic measures the proportion of variability in $Y$ that can be explained using $X$: $R^2 = (\mathrm{TSS} - \mathrm{RSS})/\mathrm{TSS} = 1 - \mathrm{RSS}/\mathrm{TSS}$, where $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares. An $R^2$ statistic that is close to 1 indicates that a large proportion of the variability in the response is explained by the regression. A number near 0 indicates that the regression does not explain much of the variability in the response; this might occur because the linear model is wrong, or the error variance $\sigma^2$ is high, or both.
For a simple linear regression (*this does not hold for a Multiple Linear Regression), $R^2 = r^2$, where $r = \mathrm{Cor}(X, Y)$ is the sample correlation.
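A sketch that computes $R^2$ from RSS and TSS and checks numerically that it matches the squared sample correlation in the simple linear regression setting (toy data assumed, as above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
x_bar, y_bar = x.mean(), y.mean()

beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)   # residual sum of squares
tss = np.sum((y - y_bar) ** 2)                         # total sum of squares
r_squared = 1 - rss / tss

r = np.corrcoef(x, y)[0, 1]      # sample correlation Cor(X, Y)
print(r_squared, r ** 2)         # equal (up to floating point) for simple linear regression
```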