
Two relationships
 Deterministic: y = f(x)
 Stochastic: y = f(x) + eps
 => eps = disturbance (random error)

Why is there randomness?
 1. Measurement error
 2. We cannot observe all independent variables

The Classical Multiple Linear Regression Model

CMLRM assumptions (5+1)
 1. Linearity: dep. var. = linear fn. of indep. vars. & disturbance term
 2. X (n x k) has full column rank k, i.e. n >= k
 3. Exogeneity of the regressors: E[eps_i|x] = 0 => Cov[eps, x] = 0
 4. Spherical disturbances: E[eps eps'|x] = sigma^2 * I_n
  4.1. Var[eps_i|x] = sigma^2: homoskedasticity
  4.2. Cov[eps_i, eps_j|x] = 0 (i != j): nonautocorrelation
 5. Indep. vars. are not stochastic (fixed in repeated samples)
 (6. Normality: eps|x ~ N(0, sigma^2 * I_n))

How to estimate b? y=Xb+e
 1. Method of Moments
 2. Maximum Likelihood
 3. Least Squares

Method of Moments
 set population moments equal to the corresponding sample moments

Maximum Likelihood
 based on Normality
 Maximize the loglikelihood fn.
 b_ml = b_mm

Least Squares
 min. e'e
 b_ls = b_ml = b_mm (in the CMLR) = inv(x'x)x'y
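A minimal numpy sketch (simulated data, an assumption — not from the notes) showing that the closed-form LS estimator inv(X'X)X'y matches numpy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

b_closed = np.linalg.inv(X.T @ X) @ X.T @ y        # (X'X)^{-1} X'y
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)    # numerically stable solver

print(np.allclose(b_closed, b_lstsq))  # True
```

In practice `lstsq` (or a QR/Cholesky solve) is preferred over explicitly inverting X'X.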

CLRM: residual maker matrix
e = [I - x(x'x)^{-1}x']y = My

CLRM: properties of residual maker matrix M
 symmetric (M' = M), idempotent (MM = M), MX = 0, My = e

CLRM: projection matrix
 P = x(x'x)^{-1}x'
 => x(x'x)^{-1}x'y = xb = y_hat = y - e

CLRM: properties of projection matrix P
 symmetric (P' = P), idempotent (PP = P), PX = X, Py = y_hat, PM = MP = 0, P + M = I
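The M and P properties above can be checked numerically; this sketch uses simulated X (an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix
M = np.eye(n) - P                      # residual maker

print(np.allclose(M, M.T))            # symmetric
print(np.allclose(M @ M, M))          # idempotent
print(np.allclose(M @ X, 0))          # MX = 0, so My = e (residuals)
print(np.allclose(P @ X, X))          # PX = X, so Py = y_hat
print(np.allclose(M @ P, 0))          # M and P are orthogonal
```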

Simple vs. Multiple regression
 simple: y=b_{0}+b_{1}x+e
 multiple: y=b_{0}+b_{1}x_{1}+...+b_{k}x_{k}+e

Partitioned Regression
 y = x1 b1 + x2 b2 + e
 b1 = inv(x1'x1) x1'(y - x2 b2)
 b2 = inv(x2'x2) x2'(y - x1 b1)
 If x1'x2 = 0 (orthogonal), then b1 = inv(x1'x1)x1'y & b2 = inv(x2'x2)x2'y

Frisch-Waugh-Lovell Thm
 In the linear LS regression of y on 2 sets of variables, x1 and x2, the subvector b2 is the set of coefficients obtained when the residuals from a regression of y on x1 alone (M1 y) are regressed on the set of residuals from a regression of each column of x2 on x1 (M1 x2)
 : b2 = inv((M1 x2)'(M1 x2)) (M1 x2)'(M1 y)
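A numerical sketch of the Frisch-Waugh-Lovell theorem on simulated data (an assumption): the subvector b2 from the full regression equals the coefficients from regressing M1*y on M1*x2:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2))
y = X1 @ [1.0, 0.5] + X2 @ [2.0, -1.0] + rng.normal(size=n)

# full regression on [X1, X2]; keep the X2 coefficients
X = np.hstack([X1, X2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]
b2_full = b_full[X1.shape[1]:]

# partial out X1 from both y and X2, then regress residuals on residuals
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
b2_fwl = np.linalg.lstsq(M1 @ X2, M1 @ y, rcond=None)[0]

print(np.allclose(b2_full, b2_fwl))  # True
```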

Corollary of Frisch-Waugh-Lovell Thm
Slopes in a multiple regression with a constant term are obtained by regressing deviations of y from its mean on deviations of the x's from their means

CLRM: Goodness of fit
 SST = SSR + SSE
 <=> Total sum of squares = regression sum of squares + error sum of squares
 The higher SSR is relative to SST, the better the fit

Coefficient of Determination
 R^2 = SSR/SST = 1 - SSE/SST: the fraction of the variation in y explained by the regression

2 Problems of Coefficient of Determination
 1. More regressors => higher R^2 (R^2 never decreases when a regressor is added)
 2. w/o a constant term => R^2 > 1 or < 0 is possible

Fixing "more variables = higher coefficient of determination"
Adjusted R^2 = 1 - [(SSE/(n-k))/(SST/(n-1))]
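A short sketch (simulated data, an assumption) computing R^2 = 1 - SSE/SST and the adjusted R^2, which penalizes extra regressors:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ [1.0, 2.0, 0.0] + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
SSE = e @ e                             # error sum of squares
SST = ((y - y.mean()) ** 2).sum()       # total sum of squares

r2 = 1 - SSE / SST
r2_adj = 1 - (SSE / (n - k)) / (SST / (n - 1))
print(0 <= r2 <= 1, r2_adj <= r2)  # True True
```

The adjusted version is never larger than R^2 once k > 1, which is the point of the penalty.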

b_ols (Small Sample Properties)
 Unbiased
 Efficient = BLUE by the Gauss-Markov thm.

Gauss-Markov thm
In the CLRM with regressor matrix X, the LS estimator b is the Best Linear Unbiased Estimator, i.e. the minimum variance (efficient) linear unbiased estimator of beta, regardless of whether X is deterministic or stochastic

s^{2}_ols
 Unbiased
 => Est. Var(b|x) = s^{2}(x'x)^{-1}

b_ols (large sample properties)
 Consistent
 Asymptotically efficient (b_ols = b_ml; attains the Cramer-Rao Lower Bound)
 Asymptotic dist. (asy. normally dist.)
 => if plim(x'x/n) = Q, then sqrt(n)(b - beta) converges in distribution to N(0, sigma^{2} Q^{-1})

s^{2}_ols (large sample property)
 Consistent
 => Est. Asy. Var(b) = s^{2}(x'x)^{-1}

OLS_Hypothesis testing: Z & t dist
 (b_k - beta_k)/sqrt(sigma^{2}(x'x)^{-1}_{kk}) ~ N(0,1)
 => replacing sigma^2 with s^2: (b_k - beta_k)/sqrt(s^{2}(x'x)^{-1}_{kk}) ~ t(n-k)

OLS: t-test interval
Pr[-t_{a/2} <= statistic <= t_{a/2}] = 1 - a
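The interval above can be sketched with simulated data (an assumption), using scipy for the t critical value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ [1.0, 2.0] + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / (n - k)                            # unbiased estimate of sigma^2
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]) # std. error of slope

a = 0.05
t_crit = stats.t.ppf(1 - a / 2, df=n - k)       # t_{a/2, n-k}
lo, hi = b[1] - t_crit * se, b[1] + t_crit * se # 95% confidence interval

print(lo < hi, round(t_crit, 2))  # True 1.98
```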

Type I error vs. Type II error
 Type I error: incorrectly reject a true H0
 Type II error: incorrectly fail to reject (accept) a false H0
 (type I) a: level of significance
 1 - a: confidence coefficient
 (type II) 1 - b: power of the test

2 Potential Problems of OLS
 1. Multicollinearity
 2. Missing observations

How to handle Multicollinearity
 1. nothing if bi is significant
 2. Get more data
 3. Drop one of collinear vari.s
 4. Group collinear vari.s together

How to handle missing obs.
 1. y_n, x_n (both complete): no problem
 2. y_{n-t}, x_n: filling in for y is not a good idea
 3. y_n, x_{n-t}
 >> zero-order method: replace missing x with x_bar
 >> modified zero-order method: fill missing x with 0 and add a dummy column (d = 0 if complete / d = 1 if missing)
 >> another way: regress x on y and replace missing x with x_hat


Inference & Test: Rb = q
Wald test ~ Chi-sq[J]
cf. (n-k)s^2/sigma^2 ~ Chi-sq[n-k]

If H0: b_k = beta_k (J=1)
 F test[1, n-k] = (t-test[n-k])^2
 therefore, if r.v. ~ F[1, n-k], then sqrt(r.v.) ~ t[n-k]

Test unrestricted vs. restricted models
F[J, n-k]

If H0: all beta_k = 0
[R^2/(k-1)]/[(1-R^2)/(n-k)] ~ F[k-1, n-k]
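A sketch of the overall-significance F statistic on simulated data (an assumption); with nonzero true slopes the statistic should comfortably exceed the critical value:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ [1.0, 2.0, -1.0] + rng.normal(size=n)   # true slopes nonzero

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
r2 = 1 - (e @ e) / ((y - y.mean()) ** 2).sum()

# F = [R^2/(k-1)] / [(1-R^2)/(n-k)] ~ F[k-1, n-k] under H0
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(F > 3.09)  # True: exceeds the 5% critical value of F(2, 97), approx. 3.09
```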

Large sample tests (2)
 1. Asymptotic t-test: asymptotically, t -> std. normal dist. N(0,1)
 2. Asymptotic F-test: asymptotically, J*F ~ Chi-sq(J)

Test nonlinear restrictions
Asymptotically, Wald ~ Chi-sq(J)

Measures of Accuracy of Prediction
 1. Root mean squared error
 2. Mean absolute error
 3. Theil U-statistic

Regarding accuracy of prediction: compare y_hat & y_i.. however, what if we don't know y_i?
Divide the sample into two groups, use group A to predict group B, and compare the predictions with the actuals as y_hat & y_i

Binary variables
 Dummies
 1. binary case
 2. several categories
 3. several groupings
 4. threshold effects
 5. interaction terms >> intercept dummies & interaction dummies (e.g. b1*x1+b2*x1*D)

Structural Changes (coefficient)
 compare two groups' parameters
 stat ~ F(# of restrictions, d.f.)
 e.g. s x's are different ~ F(s, n-k-s)

Structural changes (variance)
W = (b1 - b2)'[Var(b1) + Var(b2)]^{-1}(b1 - b2) ~ Chi-sq(J)

Omit relevant vari.
coefficient: Biased, but more efficient

Include irrelevant vari.
Coefficient: Unbiased, but less efficient

Model building
 1. simple >> general
 2. general >> simple (recommended), since omitting relevant variables is worse than including irrelevant ones (cf. Kennedy's book)

Model selection criteria (4)
 1. adj. R^2
 2. Akaike Info. criterion
 3. Bayesian (Schwarz) info. criterion
 4. Prediction criterion

Choosing b/w nonnested models
 1. encompassing model
 H0: y = xb + e
 H1: y = zr + e
 encompassing: y = x b_bar + z r_bar + (x,z)d + e
 F-test: H0 that b_bar (or r_bar) = 0 >> reject H0 or H1
 2. J-test
 y = (1-lambda)xb + lambda zr + e
 regress y on z to get r_hat, then regress y on x & z r_hat >> get lambda_hat & test lambda = 0

When? Generalized Least Squares
 1. Heteroskedasticity
 2. Autocorrelation
 >> both violate the spherical-disturbances assumption of OLS

b_ols in GLS cases: small sample properties
 1. unbiased
 2. efficiency is not guaranteed

b_ols in GLS cases: Asymptotic properties
 1. consistent
 2. asy'ly normally dist.
 3. asymptotically efficient? NO

b_GLS (Sigma known), E(eps eps'|x) = sigma^2 * Sigma
 inv(Sigma) = P'P
 then x* = Px, y* = Py, eps* = P eps
 b_gls = inv(x*'x*) x*'y*
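The transformation above can be sketched for the diagonal (heteroskedastic) case with simulated data (an assumption): OLS on the transformed data equals the direct GLS formula.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = rng.uniform(0.5, 2.0, size=n)          # known error variances (diagonal Sigma)
Sigma = np.diag(w)
y = X @ [1.0, 2.0] + rng.normal(size=n) * np.sqrt(w)

Sigma_inv = np.linalg.inv(Sigma)
P = np.diag(1 / np.sqrt(w))                # here P'P = Sigma^{-1} (diagonal case)

# OLS on the transformed data (x* = Px, y* = Py) ...
b_transformed = np.linalg.lstsq(P @ X, P @ y, rcond=None)[0]
# ... equals the direct GLS formula inv(x' Sigma^{-1} x) x' Sigma^{-1} y
b_gls = np.linalg.inv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv @ y

print(np.allclose(b_transformed, b_gls))  # True
```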

Small sample properties of b_gls (Sigma known)
 1. unbiased
 2. efficient (by the same argument as the OLS case, thus BLUE)


b_gls: Asymptotic properties
 1. consistent
 2. asy'ly normally dist.
 3. asy'ly efficient

Sigma completely unknown
 GLS impossible
 1. do OLS >> unbiased estimator
 2. Est. Asy. Var(b) >> White's heteroskedasticity-consistent estimator

Sigma partially known: Feasible GLS >> procedure
 1. Run OLS, get residuals e_i
 2. Regress e_i^2 = a'z_i + u_i >> get a_hat >> Sigma_hat = Sigma(a_hat)
 3. b_FGLS = inv[x' inv(Sigma_hat) x] x' inv(Sigma_hat) y

Sigma partially known: MLE
in the loglikelihood fn., inv(Sigma) is written as a matrix function of the parameters a

Sigma partially known >> FGLS, MLE, or GMM

4 tests for Heteroskedasticity
 1. eyeball test
 2. White's general test ~ Chi-sq(p-1): H0 is that all sigma^2 are the same
 3. Goldfeld-Quandt test ~ F(n1-k, n2-k): H0 is that the two groups' sigma^2 are the same
 4. Breusch-Pagan (Godfrey) LM test: LM stat. ~ Chi-sq(p)
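A hedged sketch of the studentized (Koenker) form of the Breusch-Pagan test, LM = n * R^2 from the auxiliary regression of e^2 on the candidate variance drivers; data are simulated and homoskedastic (an assumption):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ [1.0, 2.0] + rng.normal(size=n)          # homoskedastic errors

b = np.linalg.lstsq(X, y, rcond=None)[0]
e2 = (y - X @ b) ** 2                            # squared OLS residuals

# auxiliary regression of e^2 on a constant and x
Z = X
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
resid = e2 - Z @ g
r2_aux = 1 - (resid @ resid) / ((e2 - e2.mean()) ** 2).sum()

LM = n * r2_aux                                  # ~ Chi-sq(p) under H0
print(LM >= 0)  # True
```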

Common reasons for Endogeneity (violates exogeneity: Cov(eps_i, x_i) != 0)
 measurement error
 lagged dep. vari.
 simultaneity
 omitted vari.

b_ols in the endogeneity case
 biased & inconsistent

small/large sample properties: b_iv = inv(z'x)z'y (instrumental variables) when L = K
 1. biased
 2. Var-Cov(estimator) is larger than that of OLS >> based on the MSE criterion, OLS can be preferred
 3. consistent
 4. asy'ly normally dist.
 5. Est. Asy. Var(b_iv) is also consistent

properties of b_iv (L>K): regress x on z >> x_hat >> replace x with x_hat (2SLS)
 1. biased
 2. consistent
 3. asy'ly normally dist.
 4. Asy. Var(b_iv) - Asy. Var(b_ols) > 0 (positive definite)
 b_iv: biased & consistent, but less efficient
 b_ols: biased & inconsistent
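A sketch of the L > K (2SLS) case on simulated data (an assumption): a common shock makes x endogenous, OLS is inconsistent, and using x_hat from the first-stage regression on the instruments recovers the true slope:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=(n, 2))                      # two instruments (L > K)
u = rng.normal(size=n)                           # common shock -> endogeneity
x = z @ [1.0, 1.0] + u + rng.normal(size=n)
y = 2.0 * x + u + rng.normal(size=n)             # true slope is 2

# first stage: x_hat from regressing x on z
g = np.linalg.lstsq(z, x, rcond=None)[0]
x_hat = z @ g

b_ols = (x @ y) / (x @ x)                        # inconsistent under endogeneity
b_iv = (x_hat @ y) / (x_hat @ x)                 # 2SLS, consistent

print(abs(b_iv - 2.0) < abs(b_ols - 2.0))  # True
```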

Hausman test (general)
 H0: plim(theta_hat - theta_tilde) = 0
 >> stat. = (theta_hat - theta_tilde)' inv(V_H/n) (theta_hat - theta_tilde) ~ Chi-sq(# of parameters in theta)
 where V_H = V(theta_hat) + V(theta_tilde) - 2Cov(theta_hat, theta_tilde)
 If theta_hat is efficient under H0, then Cov(.) = V(theta_hat)
 Then H = (theta_hat - theta_tilde)' inv[(V(theta_tilde) - V(theta_hat))/n] (theta_hat - theta_tilde) ~ Chi-sq(# of parameters)

Hausman test (IV case)
 H0: plim x'eps/n = 0
 H1: plim x'eps/n != 0 >> only b_iv is consistent


IV in GLS case
 b_iv
 biased
 consistent
 asy'ly normally dist. & Asy. Var(b_iv): Sigma appears!

Weak instrument problem
z is only weakly correlated with x

Results of weak instruments (2)
 1. Var(b_iv) goes up
 2. even in large samples, b_iv's inconsistency can be worse than that of b_ols

3 tests of weak instruments
 1. R^2 measures
 2. Godfrey test
 3. F-statistic measures

Alternatives to IV
 1. limited info. ML (LIML)
 y = xb + e and x1 = zr + u >> maximize the joint likelihood
 2. split sample IV
 (y1, x1, z1) (y2, x2, z2)
 get r_hat from group 1 by regressing x1 on z1 >> predict x2_hat = z2 r_hat
 reduces bias

Test z'eps = 0
 1. L = K: exactly identified, so we cannot test
 2. L > K
 a. Sargan test
 b. C-test

