-
Two relationships
- Deterministic
- Stochastic: y=f(x)+e
- => e=disturbance, random error
-
Why is there randomness?
- 1. Measurement error
- 2. We cannot observe all independent variables
-
The Classical Multiple Linear Regression Model
-
CMLRM assumptions (5+1)
- 1. Linearity: dep. var. = linear fn. of the indep. vars. & the disturbance term
- 2. X (n x k) has full column rank k, i.e. n>=k
- 3. Exogeneity of the regressors: E[eps_i|x]=0 => Cov[eps,x]=0
- 4. Spherical Disturbances: E[eps*eps'|x]=sigma2*In
- 4.1. Var[eps_i|x]=sigma2: homoskedasticity
- 4.2. Cov[eps_i,eps_j|x]=0: nonautocorrelation
- 5. Indep. vari.s are not stochastic (fixed in repeated samples)
- (6. Normality: eps|x ~ N(0, sigma2*In))
-
How to estimate b? y=Xb+e
- 1. Method of Moments
- 2. Maximum Likelihood
- 3. Least Squares
-
Method of Moments
equate population moments with sample moments: E[x*eps]=0 => solve (1/n)x'(y-xb)=0 for b
-
Maximum Likelihood
- based on Normality
- Maximize the log-likelihood fn.
- b_ml = b_mm
-
Least Squares
- min. e'e
- b_ls=b_ml=b_mm (in the CMLR)=inv(x'x)x'y
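A minimal numpy sketch of the closed-form estimator above, on simulated data; n, k, beta_true and the DGP are illustrative assumptions, not from the notes:
```python
# Sketch: b_ls = inv(x'x) x'y on simulated data (all names illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # constant + 2 regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)                           # y = X*beta + eps

b_ls = np.linalg.inv(X.T @ X) @ X.T @ y          # textbook formula
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # numerically safer equivalent

print(b_ls)
print(np.allclose(b_ls, b_lstsq))                # same estimates
```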
-
CLRM: residual maker matrix
e = [I - x*inv(x'x)x']y = My
-
CLRM: properties of residual maker matrix M
- symmetric: M' = M
- idempotent: MM = M
- Mx = 0, My = e, Me = e
-
CLRM: projection matrix
- P = x*inv(x'x)x'
- => Py = x*inv(x'x)x'y = xb = y_hat = y - e
-
CLRM: properties of projection matrix P
- symmetric: P' = P
- idempotent: PP = P
- Px = x, Py = y_hat, PM = MP = 0
-
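A small numpy check of the M and P properties listed above, on simulated data (the DGP is an illustrative assumption):
```python
# Sketch checking M = I - X inv(X'X) X' and P = X inv(X'X) X' numerically.
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P

print(np.allclose(M, M.T), np.allclose(M @ M, M))  # M symmetric, idempotent
print(np.allclose(M @ X, 0))                       # MX = 0
print(np.allclose(P @ X, X))                       # PX = X
print(np.allclose(P @ M, 0))                       # PM = 0
print(np.allclose(P @ y + M @ y, y))               # y_hat + e = y
```
-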
Simple vs. Multiple regression
- simple: y=b0+b1x+e
- multiple: y=b0+b1x1+...+bkxk+e
-
Partitioned Regression
- y=x1b1+x2b2+e
- b1=inv(x1'x1)x1'(y-x2b2)
- b2=inv(x2'x2)x2'(y-x1b1)
- If x1'x2=0 (x1 and x2 orthogonal), then b1=inv(x1'x1)x1'y & b2=inv(x2'x2)x2'y
-
Frisch-Waugh-Lovell Thm
- In the linear LS regression of y on 2 sets of variables, x1 and x2, the subvector b2 is the set of coefficients obtained when the residuals from a regression of y on x1 alone (M1y) are regressed on the set of residuals from regressing each column of x2 on x1 (M1x2)
- : b2=inv((M1x2)'(M1x2))(M1x2)'(M1y)
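A numerical check of the FWL result on simulated data (the DGP and names are illustrative assumptions):
```python
# Sketch: b2 from the full regression equals the coefficients from
# regressing M1*y on M1*X2.
import numpy as np

rng = np.random.default_rng(2)
n = 300
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])    # constant + one regressor
X2 = rng.normal(size=(n, 2))                               # two more regressors
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

b_full = np.linalg.lstsq(X, y, rcond=None)[0]              # full regression
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T      # residual maker for X1
b2_fwl = np.linalg.lstsq(M1 @ X2, M1 @ y, rcond=None)[0]   # partialled-out regression

print(np.allclose(b_full[2:], b2_fwl))                     # True: identical b2
```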
-
Coro1 of Frisch-Waugh-Lovell Thm
Slopes in a multiple regression with a constant term are obtained by regressing the deviations of y from its mean on the deviations of the x's from their means
-
CLRM: Goodness of fit
- SST = SSR + SSE
- <=> Total sum of squares = regression sum of squares + error sum of squares
- The higher SSR is (relative to SST), the better the fit
-
Coefficient of Determination
- R^2 = SSR/SST = 1 - SSE/SST: the fraction of the variation in y explained by the regression
-
2 Problems of Coefficient of Determination
- 1. More regressors => higher R2
- 2. w/o constant => R2>1 or <0 possible
-
Fix for "more regressors => higher coefficient of determination"
Adjusted R2 = 1 - [(SSE/(n-k))/(SST/(n-1))]
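A numpy sketch computing R^2 and adjusted R^2 from the sums of squares, on simulated data (DGP and names are illustrative assumptions):
```python
# Sketch: R^2 = 1 - SSE/SST and adj. R^2 = 1 - (SSE/(n-k))/(SST/(n-1)).
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 4                                  # k includes the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.8, 0.0, -0.3]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
SST = np.sum((y - y.mean()) ** 2)
SSE = e @ e                                    # error sum of squares
R2 = 1 - SSE / SST
adj_R2 = 1 - (SSE / (n - k)) / (SST / (n - 1))

print(R2, adj_R2)                              # adj_R2 <= R2
```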
-
b_ols (Small Sample Properties)
- Unbiased
- Efficient = BLUE by Gauss-Markov thm.
-
Gauss-Markov thm
In the CLRM with regressor matrix X, the LS estimator b is the best linear unbiased estimator (BLUE), i.e. the minimum-variance (efficient) linear unbiased estimator of beta, regardless of whether X is deterministic or stochastic
-
s2_ols
- s2 = e'e/(n-k) is unbiased for sigma2
- => est. Var(b|x) = s2*inv(x'x)
-
b_ols (large sample property)
- Consistent
- Asymptotic efficiency (b_ols=b_ml; attains the Cramer-Rao lower bound)
- Asymptotic dist. (asy. normally dist.)
- => if plim(x'x/n)=Q, then sqrt(n)(b-beta) converges in distribution to N(0, sigma2*inv(Q))
-
s2_ols (large sample property)
- Consistent
- => Est. Asy. Var(b) = s2*inv(x'x)
-
OLS hypothesis testing: Z & t dist.
- (b_k-beta_k)/sqrt(sigma2*[inv(x'x)]_kk) ~ Z(0,1)
- => same numerator with s2 in place of sigma2: (b_k-beta_k)/sqrt(s2*[inv(x'x)]_kk) ~ t(n-k)
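A numpy/scipy sketch of the t statistic above, on simulated data (DGP and names are illustrative assumptions):
```python
# Sketch: t_k = (b_k - beta_k) / sqrt(s^2 [inv(X'X)]_kk), here for H0: beta_k = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / (n - k)                                  # unbiased sigma^2 estimate
var_b = s2 * np.linalg.inv(X.T @ X)                   # est. Var(b|X)
se = np.sqrt(np.diag(var_b))

t_stats = b / se                                      # H0: beta_k = 0
p_vals = 2 * stats.t.sf(np.abs(t_stats), df=n - k)
print(t_stats, p_vals)
```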
-
OLS: t-test interval
Pr[-t(a/2, n-k) <= t statistic <= t(a/2, n-k)] = 1-a
-
Type I error vs. Type II error
- Type I error: incorrectly reject true H0
- Type II error: incorrectly fail to reject (accept) false H0
- (type I) a: level of significance
- 1-a: confidence coefficient
- (type II) 1-b: power of the test
-
2 Potential Problems of OLS
- 1. Multicollinearity
- 2. Missing observations
-
How to handle Multicollinearity
- 1. nothing if bi is significant
- 2. Get more data
- 3. Drop one of collinear vari.s
- 4. Group collinear vari.s together
-
How to handle missing obs.
- 1. y and x missing for the same obs.: no problem (use the complete cases)
- 2. y missing, x observed: filling in for y is not a good idea
- 3. y observed, x missing
- >> zero-order method: replace missing x with x_bar
- >> modified zero-order method: set missing x to 0 and add a dummy column (=0 if complete, =1 if missing)
- >> another way: regress x on y and replace missing x with x_hat
-
Inference & Test: Rb=q
Wald stat = (Rb-q)'[sigma2*R*inv(x'x)*R']^-1 (Rb-q) ~ Chi[J]
cf. (n-k)s^2/sigma^2~Chi[n-k]
-
If H0: b_k=beta_k (J=1)
- F test[1,n-k] = t-test^2[n-k]
- therefore, r.v.~F[1,n-k], then sqrt(r.v.)~t[n-k]
-
Test unrestricted vs. restricted models
F = [(e*'e* - e'e)/J] / [e'e/(n-k)] ~ F[J,n-k], where e* are the restricted residuals
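A sketch of the restricted-vs-unrestricted F test above, on simulated data (the DGP and the choice of restrictions are illustrative assumptions):
```python
# Sketch: F = [(e*'e* - e'e)/J] / [e'e/(n-k)] for H0: last J slopes are zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.7, 0.0, 0.0]) + rng.normal(size=n)

def ssr(Xmat, yvec):
    """Sum of squared residuals from an OLS fit."""
    b = np.linalg.lstsq(Xmat, yvec, rcond=None)[0]
    e = yvec - Xmat @ b
    return e @ e

J = 2                                   # number of restrictions
ee_u = ssr(X, y)                        # unrestricted
ee_r = ssr(X[:, :k - J], y)             # restricted (drop the last J columns)
F = ((ee_r - ee_u) / J) / (ee_u / (n - k))
p = stats.f.sf(F, J, n - k)
print(F, p)
```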
-
If H0: all beta_k=0
[R^2/(k-1)]/[(1-R^2)/(n-k)]~F[k-1,n-k]
-
Large sample test (2)
- 1. Asymptotic t-test: asymptotically, t->std. normal dist (Z(0,1))
- 2. Asymptotic F-test: Asymptotically J*F~chi(J)
-
Test non-linear restrictions
Asymptotically, Wald ~ Chi(J) (via the delta method)
-
Measures of Accuracy of Prediction
- 1. Root mean squared error
- 2. Mean absolute error
- 3. Theil U-statistic
-
Regarding accuracy of prediction: compare y_hat & y_i.. however, what if we don't know y_i?
Divide the sample into two groups, use group A to fit the model and predict group B, and compare the predictions y_hat with the actual y_i in group B
-
Binary variables
- Dummies
- 1. binary case
- 2. several categories
- 3. several groupings
- 4. threshold effects
- 5. interaction terms >> intercept dummies & interaction dummies (e.g. b1*x1+b2*x1*D)
-
Structural Changes (coefficient)
- compare two groups' parameters
- stat~F(# of restrictions, d.f.)
- e.g. s x's are different ~ F(s, n-k-s)
-
Structural changes (variance)
W=(b1-b2)'[Var(b1)+Var(b2)]^-1 (b1-b2)~Chi(J)
-
Omit relevant vari.
coefficient: Biased, but more efficient
-
Include irrelevant vari.
Coefficient: Unbiased, but less efficient
-
Model building
- 1. simple>>general
- 2. general>>simple (recommended) since omission is worse than including irrelevant variables (cf. Kennedy's book)
-
Model selection criteria (4)
- 1. adj. R^2
- 2. Akaike Info. criterion
- 3. Bayesian (Schwarz) info. criterion
- 4. Prediction criterion
-
Choosing b/w nonnested models
- 1. encompassing model
- H0: y=xb+e
- H1: y=zr+e
- y=xb_bar+zr_bar+(x,z)d+e
- F-test: b_bar=0 (evidence against H0) or r_bar=0 (evidence against H1)
- 2. J-test
- y=(1-lambda)xb + lambda*z*r + e
- regress y on z to get r_hat; then regress y on x & z*r_hat >> get lambda_hat & test lambda=0
-
When? Generalized Least Squares
- 1. Heteroskedasticity
- 2. Autocorrelation
- >> violate the assumption of spherical disturbances of OLS
-
b_ols in GLS cases: small sample property
- 1. unbiased
- 2. efficiency is not guaranteed
-
b_ols in GLS cases: Asymptotic properties
- 1. consistent
- 2. asy'ly normally dist.
- 3. asymptotic efficiency: NO
-
b_GLS (Sigma known), E(eps eps'|x)=sigma^2*Sigma
- inv(Sigma) = P'P
- then x* = Px, y* = Py, eps* = P*eps
- b_gls = inv(x*'x*)x*'y* = inv(x' inv(Sigma) x) x' inv(Sigma) y
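A numpy sketch of GLS with a known (diagonal) Sigma; the heteroskedastic DGP and the weights w are illustrative assumptions:
```python
# Sketch: b_gls = inv(X' inv(Sigma) X) X' inv(Sigma) y, equivalently OLS on (PX, Py)
# with inv(Sigma) = P'P.
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = np.exp(rng.uniform(-1, 1, size=n))            # known variance weights
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(w)

Sigma_inv = np.diag(1.0 / w)
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

# Equivalent: transform the data with P = diag(1/sqrt(w)) and run OLS.
P = np.diag(1.0 / np.sqrt(w))
b_trans = np.linalg.lstsq(P @ X, P @ y, rcond=None)[0]
print(np.allclose(b_gls, b_trans))
```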
-
Small sample property of b_gls (Sigma known)
- 1. unbiased
- 2. efficient (by the same argument as in the CLRM, thus BLUE)
-
b_gls: Asymptotic properties
- 1. consistent
- 2. asy'ly normally dist.
- 3. asy'ly efficient
-
Sigma completely unknown
- GLS impossible
- 1. do OLS >> unbiased estimator
- 2. Est. Asy. Var(b) >> White's heteroskedasticity-consistent estimator
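A sketch of White's heteroskedasticity-consistent covariance (the HC0 form), on simulated heteroskedastic data (the DGP is an illustrative assumption):
```python
# Sketch: Est.Asy.Var(b) = inv(X'X) [sum_i e_i^2 x_i x_i'] inv(X'X).
import numpy as np

rng = np.random.default_rng(7)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sd = 0.5 + np.abs(X[:, 1])                       # variance depends on the regressor
y = X @ np.array([1.0, 1.5]) + rng.normal(size=n) * sd

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * e[:, None] ** 2)               # sum_i e_i^2 x_i x_i'
V_white = XtX_inv @ meat @ XtX_inv

s2 = e @ e / (n - X.shape[1])
V_naive = s2 * XtX_inv                           # usual OLS covariance (wrong here)
print(np.sqrt(np.diag(V_white)), np.sqrt(np.diag(V_naive)))
```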
-
Sigma partially known: Feasible GLS >> procedure
- 1. Run OLS
- 2. Regress e_i^2 on z: e_i^2 = a'z_i + u_i >> get a_hat >> Sigma_hat = Sigma(a_hat)
- 3. b_FGLS = inv[x' inv(Sigma_hat) x] x' inv(Sigma_hat) y
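A sketch of the FGLS steps above; here step 2 uses a multiplicative form (regress log e^2 on z), which is an illustrative variant of the linear auxiliary regression in the card, and the DGP is an illustrative assumption:
```python
# Sketch of FGLS: OLS, estimate the skedastic function, then weighted (GLS) fit.
import numpy as np

rng = np.random.default_rng(8)
n = 400
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z])
sigma_i = np.exp(0.3 + 0.8 * z)                    # true skedastic function
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * sigma_i

# 1. OLS
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ols

# 2. regress log(e^2) on z to estimate the skedastic function
Z = np.column_stack([np.ones(n), z])
a_hat = np.linalg.lstsq(Z, np.log(e ** 2), rcond=None)[0]
w_hat = np.exp(Z @ a_hat)                          # diagonal of Sigma_hat

# 3. FGLS = weighted least squares with weights 1/w_hat
Si = 1.0 / w_hat
b_fgls = np.linalg.solve(X.T @ (X * Si[:, None]), X.T @ (Si * y))
print(b_ols, b_fgls)
```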
-
Sigma partially known: MLE
in the log-likelihood fn., write inv(Sigma) as a matrix of functions of a (i.e. Sigma = Sigma(a)) and maximize over b, sigma2, and a jointly
-
Sigma partially known >> FGLS, MLE, or GMM
-
4 tests for Heteroskedasticity
- 1. eyeball test
- 2. White's general test ~ Chi(p-1); H0: all sigma_i^2 are the same
- 3. Goldfeld-Quandt test ~ F(n1-k, n2-k); H0: the two groups' sigma^2 are the same
- 4. Breusch-Pagan (Godfrey) LM test: LM stat. ~ Chi(p)
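A sketch of a Breusch-Pagan-type LM test in the n*R^2 (Koenker/studentized) form; the data and the choice of z are illustrative assumptions:
```python
# Sketch: regress e^2 on z, compare n*R^2 to Chi2 with dof = regressors in z.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 500
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (0.5 + np.abs(z))  # heteroskedastic

b = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ b) ** 2

Z = np.column_stack([np.ones(n), z])           # auxiliary regressors (constant + z)
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
fitted = Z @ g
R2_aux = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

LM = n * R2_aux
p_val = stats.chi2.sf(LM, df=Z.shape[1] - 1)   # dof = z regressors excl. constant
print(LM, p_val)                               # small p-value -> heteroskedasticity
```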
-
Common reasons for Endogeneity (violates exogeneity: Cov(eps, x_i) != 0)
- measurement error
- lagged dep. vari.
- simultaneity
- omitted vari.
-
b_ols in the endogeneity case
- biased & inconsistent
-
small/large sample properties: b_iv=inv(z'x)z'y (instrumental variables) when L=K
- 1. biased
- 2. Var-Cov(b_iv) is larger than that of OLS >> based on the MSE criterion, OLS can be preferred
- 3. consistent
- 4. asy'ly normally dist.
- 5. Est. Asy. Var(b_iv) is also consistent
-
properties of b_iv (L>K): regress x on z >> x_hat >> replace x with x_hat (2SLS)
- 1. biased
- 2. consistent
- 3. asy'ly normally dist.
- 4. Asy. Var(b_iv) - Asy. Var(b_ols) >= 0 (positive semidefinite)
- b_iv: biased but consistent, though less efficient
- b_ols: biased & inconsistent
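A numpy sketch of the 2SLS idea above on simulated data with one endogenous regressor and two instruments (the DGP is an illustrative assumption):
```python
# Sketch: first stage x_hat = Pz x, second stage regress y on [1, x_hat].
import numpy as np

rng = np.random.default_rng(10)
n = 1000
z = rng.normal(size=(n, 2))                      # two instruments
u = rng.normal(size=n)                           # common shock -> endogeneity
x = 1.0 + z @ np.array([0.8, 0.5]) + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)       # true slope = 2, Cov(x, eps) != 0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]     # inconsistent (biased upward here)

Pz = Z @ np.linalg.inv(Z.T @ Z) @ Z.T            # projection on the instruments
X_hat = Pz @ X
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
print(b_ols, b_2sls)                             # 2SLS slope should be near 2
```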
-
Hausman test (general)
- H0: plim(theta_hat - theta_tilde) = 0
- >> stat. = (theta_hat - theta_tilde)' inv(V_H/n) (theta_hat - theta_tilde) ~ Chi(# of parameters)
- where V_H = V(theta_hat) + V(theta_tilde) - 2Cov(theta_hat, theta_tilde)
- If theta_hat is efficient under H0, then Cov(.) = V(theta_hat)
- Then H = (theta_hat - theta_tilde)' inv[(V(theta_tilde) - V(theta_hat))/n] (theta_hat - theta_tilde) ~ Chi(# of parameters)
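A sketch of a Hausman-type contrast for a single potentially endogenous regressor, i.e. the scalar case H = (b_iv - b_ols)^2 / (V_iv - V_ols); the DGP is an illustrative assumption and a full implementation would use the matrix form above:
```python
# Sketch: compare the efficient-under-H0 estimator (OLS) with the consistent one (2SLS).
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n = 2000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + rng.normal(size=n)                 # endogenous regressor
y = 1.0 + 2.0 * x + u + rng.normal(size=n)     # eps = u + noise, Cov(x, eps) != 0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
k = X.shape[1]

# OLS (theta_hat) with the usual covariance s^2 inv(X'X)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e_ols = y - X @ b_ols
V_ols = (e_ols @ e_ols / (n - k)) * np.linalg.inv(X.T @ X)

# 2SLS (theta_tilde), covariance using residuals computed with the original X
Pz = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
X_hat = Pz @ X
b_iv = np.linalg.lstsq(X_hat, y, rcond=None)[0]
e_iv = y - X @ b_iv
V_iv = (e_iv @ e_iv / (n - k)) * np.linalg.inv(X_hat.T @ X_hat)

d = b_iv[1] - b_ols[1]                         # contrast on the slope only
H = d ** 2 / (V_iv[1, 1] - V_ols[1, 1])
print(H, stats.chi2.sf(H, df=1))               # small p-value -> x is endogenous
```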
-
Hausman test (IV case)
- H0: plim x'eps/n=0
- H1: not 0 >> only iv is consistent
-
IV in GLS case
- b_iv
- biased
- consistent
- asy'ly normally dist. & Asy. Var(b_iv): Sigma appears!
-
Weak instrument problem
z is only weakly correlated with x
-
Results of a weak instrument (2)
- 1. Var(b_iv) goes up
- 2. in large samples, the inconsistency of b_iv can be worse than that of b_ols
-
3 test of weak instrument
- 1. R^2 measures
- 2. Godfrey test
- 3. F-statistic measures
-
Alternatives to IV
- 1. limited info. ML
- y=xb and x1=zr+u >> maximize the joint likelihood
- 2. split sample IV
- (y1,x1,z1) (y2,x2,z2)
- regress x1 on z1 in group 1 to get r_hat, then predict x2_hat = z2*r_hat
- reduces bias
-
Test z'eps=0
- 1. L=K.. we cannot test
- 2. L>K
- a. Sargan test
- b. C-test