-
Regression line
- a straight line that describes how a response variable y changes as an explanatory variable x changes. One variable explains or predicts the other.
- May be used to predict the value of y for a given value of x.
-
Least-squares regression line:
- the unique line such that the sum of the squared vertical (y) distances between the data points and the line is the smallest possible.
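As a rough illustration of the definition above, the sketch below fits a line to a small made-up data set and checks that nudging the intercept or slope only increases the sum of squared vertical distances. The data and the helper names (sum_squared_errors, least_squares_line) are assumptions chosen for the example, not part of these notes.

```python
from statistics import mean

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up illustrative data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)

# Try a few other lines with a nudged intercept and/or slope.
nudged = [
    sum_squared_errors(xs, ys, a + da, b + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]
print(best, min(nudged))
```

Every nudged line has a larger sum of squares than the fitted one, which is what "smallest possible" means here.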
-
Facts about least-squares regression:
- 1. The distinction between explanatory and response variables is essential in regression.
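- 2. There is a close connection between correlation and the slope of the least-squares line: a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y.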
- 3. The least-squares regression line always passes through the point (x̄, ȳ), the means of x and y.
- 4. The correlation r describes the strength of a straight-line relationship. The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
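Facts 2 and 3 can be checked numerically. The sketch below assumes the standard relationship slope = r × (s_y / s_x), which is not spelled out in these notes, and uses made-up data; it then confirms that the resulting line predicts the mean of y at the mean of x.

```python
from statistics import mean, stdev

# Made-up illustrative data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

# Correlation r: sum of products of standardized values, divided by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

# Fact 2 (assumed standard form): slope is r scaled by s_y / s_x.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

# Fact 3: the line passes through the point of means.
predicted_at_mean = intercept + slope * x_bar
print(r, slope, intercept, predicted_at_mean)
```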
-
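Equation of least-squares regression line:
- ŷ = a + bx, with slope b = r(s_y / s_x) and intercept a = ȳ − b·x̄, where s_x and s_y are the standard deviations of x and y.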
-
Coefficient of determination, r²
r²: the fraction of the variance in y that can be explained by changes in x; the unexplained remainder is the vertical scatter of the points about the regression line.
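One way to see this fraction in practice is to compare the scatter left over around the fitted line with the total scatter of y around its mean. The function name r_squared and the data below are assumptions for illustration only.

```python
from statistics import mean

def r_squared(xs, ys):
    """Fraction of the variation in y explained by the least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar

    # Total variation of y around its mean.
    total = sum((y - y_bar) ** 2 for y in ys)
    # Variation left unexplained: squared vertical scatter around the line.
    unexplained = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    return 1 - unexplained / total

# Made-up illustrative data (not from the notes).
print(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

For this toy data set the regression accounts for about 60% of the variation in y.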
-
Residuals
residual = y − ŷ, the signed vertical distance between an observed y and the value ŷ predicted by the regression line.
-
Residual plots
- Residuals are the distances between y-observed and y-predicted. We plot them in a residual plot.
- If the residuals are scattered randomly around 0, chances are that your data fit a linear model, are roughly normally distributed, and contain no outliers.
- The x-axis in a residual plot is the same as on the scatterplot.
- The horizontal line at 0 on the residual plot corresponds to the regression line on the scatterplot.
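The residuals, and the (x, residual) pairs that a residual plot would show, can be computed along these lines. This is a minimal sketch on made-up data; the variable names are illustrative, and the actual plotting step is left out.

```python
from statistics import mean

# Made-up illustrative data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fit the least-squares line y-hat = intercept + slope * x.
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y-hat.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Points of the residual plot: same x-axis as the scatterplot,
# residuals on the vertical axis, reference line at 0.
plot_points = list(zip(xs, residuals))

for x, res in plot_points:
    print(f"x = {x}: residual = {res:+.2f}")
print("sum of residuals:", round(sum(residuals), 10))
```

For a least-squares line the residuals sum to zero up to rounding, which is why the reference line of a residual plot sits at 0.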
-
Outlier:
An observation that lies outside the overall pattern of observations.
-
Influential individual
- An observation that markedly changes the regression if removed.
- This is often an outlier on the x-axis.
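A quick way to check whether a point is influential is to fit the line with and without it and compare. In the sketch below the data are made up, with the last point placed far out on the x-axis; slope_of is a hypothetical helper name.

```python
from statistics import mean

def slope_of(points):
    """Slope of the least-squares line through a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Made-up data: the last point is an outlier in the x direction.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5), (20, 1)]

slope_all = slope_of(points)
slope_without_last = slope_of(points[:-1])

print(f"slope with all points:     {slope_all:.3f}")
print(f"slope without (20, 1):     {slope_without_last:.3f}")
```

Here removing the single far-out point changes the slope from negative to positive, so that point would count as influential.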
-
Interpolation
- Making predictions
- The equation of the least-squares regression allows you to predict y for any x within the range studied. This is called interpolating.
-
Lurking variable
- A variable not included in the study design that does have an effect on the variables studied.
- It can falsely suggest a relationship.
-
Confounded variables
- Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
-
Extrapolation
is the use of a regression line for predictions outside the range of x values used to obtain the line.
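The difference between interpolating and extrapolating can be made explicit in code by checking each new x against the range of x values used for the fit. The sketch below is one possible way to do this; the function names fit and predict, the returned flag, and the data are all assumptions for illustration.

```python
from statistics import mean

def fit(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def predict(x_new, xs, ys):
    """Return (predicted y, True if x_new lies within the observed x range)."""
    intercept, slope = fit(xs, ys)
    within_range = min(xs) <= x_new <= max(xs)
    return intercept + slope * x_new, within_range

# Made-up illustrative data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

for x_new in (3, 10):
    y_hat, within = predict(x_new, xs, ys)
    kind = "interpolation" if within else "extrapolation"
    print(f"x = {x_new}: predicted y = {y_hat:.2f} ({kind})")
```

The arithmetic is the same in both cases; the flag only signals that the second prediction relies on the line holding outside the data it was fitted to.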