Correlation and Regression
Definitions
Bivariate data
If one of the variables has been controlled in some way or is used to explain the other, it is called the independent or explanatory variable. The other variable is called the dependent or response variable.
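Plot bivariate data with a scatter diagram, one variable on the x-axis and the other on the y-axis. By convention, the explanatory variable goes on the x-axis.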
Correlation
A correlation is a mathematical relationship between variables. It does not mean that one variable causes the other.
Linear Correlations
A linear correlation is one that follows a straight line.
- Positive linear correlation is when low $x$ values correspond to low $y$ values, and high $x$ values correspond to high $y$ values.
- Negative linear correlation is when low $x$ values correspond to high $y$ values, and high $x$ values correspond to low $y$ values.
- If the values of $x$ and $y$ form a random pattern, then there's no correlation.
Line of Best Fit
The line that best fits the data points is called the line of best fit.
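Linear regression is a mathematical way of finding the line of best fit, $y = a + bx$, by minimising the sum of squared errors.
The sum of squared errors, or SSE, is given by
$$\text{SSE} = \sum (y - \hat{y})^2$$
where $\hat{y} = a + bx$ is the value the line predicts for each $x$.
The slope of the line $y = a + bx$ is
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
The value of $a$ is given by
$$a = \bar{y} - b\bar{x}$$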
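The least squares formulas for $b$ and $a$ can be sketched in a few lines of Python. This is only an illustration: the function names `fit_line` and `sse` and the sample data are made up for the example and are not part of the original notes.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_xx
    a = y_bar - b * x_bar
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared errors between observed y and predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
error = sse(xs, ys, a, b)
print(f"a = {a:.3f}, b = {b:.3f}, SSE = {error:.3f}")
```

For this data the slope works out to roughly 0.6 and the intercept to roughly 2.2. Any other choice of $a$ and $b$ gives a larger SSE.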
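The correlation coefficient, $r$, is a number between $-1$ and $1$ that describes the strength and direction of the linear relationship between $x$ and $y$.
You find $r$ by calculating
$$r = \frac{b\,s_x}{s_y}$$
where
$$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
and
$$s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$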
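As a rough sketch, the correlation coefficient can be computed from the slope and the two sample standard deviations. The helper name `correlation_coefficient` and the data are assumptions made for illustration. `statistics.stdev` from the standard library uses the $n - 1$ denominator.

```python
import statistics


def correlation_coefficient(xs, ys):
    """Return r = b * s_x / s_y for paired samples xs and ys."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Least squares slope b of the line y = a + b*x.
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    s_x = statistics.stdev(xs)  # sample standard deviation of x
    s_y = statistics.stdev(ys)  # sample standard deviation of y
    return b * s_x / s_y


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation_coefficient(xs, ys)
print(f"r = {r:.4f}")
```

A value of $r$ close to $1$ or $-1$ indicates points lying close to a straight line, while a value near $0$ indicates little or no linear correlation.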
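Least squares regression: alternate notation
The covariance is
$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$
In this notation, with $s_x$ and $s_y$ the sample standard deviations of $x$ and $y$, the slope is $b = s_{xy} / s_x^2$ and the correlation coefficient is $r = s_{xy} / (s_x s_y)$.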
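The coefficient of determination
The coefficient of determination is given by $r^2$, the square of the correlation coefficient. It is the proportion of the variation in $y$ that is explained by the line of best fit.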
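The covariance notation can be checked numerically with a short sketch. The function name `covariance` and the sample data are illustrative assumptions. The last lines confirm that $r^2$ equals $1 - \text{SSE}/\sum (y - \bar{y})^2$ for a least squares line.

```python
import statistics


def covariance(xs, ys):
    """Sample covariance s_xy with an n - 1 denominator."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    return sum((x - x_bar) * (y - y_bar)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s_xy = covariance(xs, ys)
s_x = statistics.stdev(xs)
s_y = statistics.stdev(ys)

b = s_xy / s_x ** 2          # slope in covariance notation
r = s_xy / (s_x * s_y)       # correlation coefficient
r_squared = r ** 2           # coefficient of determination

# Cross-check: r^2 is the share of variation in y explained by the line.
a = statistics.mean(ys) - b * statistics.mean(xs)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - statistics.mean(ys)) ** 2 for y in ys)
explained = 1 - sse / sst

print(f"s_xy = {s_xy:.3f}, b = {b:.3f}, r^2 = {r_squared:.3f}")
```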
Non-linear relationships
If the relationship between your variables isn't linear, you can sometimes transform it to a linear form.
The trick is to transform your non-linear equation so that it takes the form
$$y' = a + bx$$
where $y'$ is a function of $y$. For example, $y = e^{a + bx}$ becomes $\ln y = a + bx$, so $y' = \ln y$.
Once you've transformed your $y$ values, you can use least squares regression on the transformed data to find the values of $a$ and $b$.
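As an illustration of the transformation idea, the sketch below fits an exponential curve $y = e^{a + bx}$ by regressing $\ln y$ on $x$. The exponential model, the helper `fit_line`, and the data are assumptions chosen for the example. Other non-linear forms need a different transformation.

```python
import math


def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


# Hypothetical data that follows y = 2 * 3**x exactly.
xs = [0, 1, 2, 3]
ys = [2, 6, 18, 54]

# Transform y so the relationship becomes linear: y' = ln(y) = a + b*x.
ys_transformed = [math.log(y) for y in ys]
a, b = fit_line(xs, ys_transformed)

# Undo the transformation to express the fit as y = A * B**x.
A = math.exp(a)
B = math.exp(b)
print(f"a = {a:.4f}, b = {b:.4f}, so y is approximately {A:.2f} * {B:.2f}**x")
```

Because the regression minimises squared errors in $\ln y$ rather than in $y$, the result on noisy data can differ from a direct non-linear fit.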
The confidence interval for the slope of a regression line
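The confidence interval for $b$ takes the form
$$(b - \text{margin of error},\; b + \text{margin of error})$$
The margin of error is
$$\text{margin of error} = t(\nu)\, s_b$$
where $t(\nu)$ is the critical value of the $t$-distribution with $\nu = n - 2$ degrees of freedom at the chosen confidence level.
The standard deviation of $b$ is
$$s_b = \frac{\sqrt{\sum (y - \hat{y})^2 / (n - 2)}}{\sqrt{\sum (x - \bar{x})^2}}$$
where $\hat{y} = a + bx$ is the value predicted by the regression line.
The confidence interval is therefore
$$(b - t(\nu)\, s_b,\; b + t(\nu)\, s_b)$$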
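A minimal sketch of the slope confidence interval follows. The function name `slope_confidence_interval` and the data are illustrative assumptions. The critical value is passed in by the caller, and the value 3.182 used here is the two-sided 95% value for 3 degrees of freedom taken from a standard $t$ table.

```python
import math


def slope_confidence_interval(xs, ys, t_critical):
    """Return (b, s_b, low, high) for the slope of y = a + b*x.

    t_critical is the t-distribution critical value for n - 2 degrees
    of freedom at the desired confidence level, supplied by the caller.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Standard deviation (standard error) of the slope estimate.
    s_b = math.sqrt(sse / (n - 2)) / math.sqrt(sum_xx)
    margin = t_critical * s_b
    return b, s_b, b - margin, b + margin


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# n = 5, so nu = 3; 3.182 is the two-sided 95% critical value for nu = 3.
b, s_b, low, high = slope_confidence_interval(xs, ys, t_critical=3.182)
print(f"b = {b:.3f}, s_b = {s_b:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

If the interval contains $0$, as it does for this small sample, the data do not rule out a slope of zero at that confidence level.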