# econometrics2

## Two kinds of relationships

- Deterministic: y = f(x)
- Stochastic: y = f(x) + ε, where ε is the disturbance (random error)

## Why is there randomness?

1. Measurement error
2. We cannot observe all independent variables

## The Classical Multiple Linear Regression Model (CMLRM)

y = Xβ + ε, i.e. y = E[y|X] + ε

## CMLRM assumptions (5 + 1)

1. Linearity: the dependent variable is a linear function of the independent variables and the disturbance term
2. X (n × k) has full rank k, so n ≥ k
3. Exogeneity of the regressors: E[ε_i|X] = 0 ⇔ Cov[ε, X] = 0
4. Spherical disturbances: E[εε'|X] = σ²I_n
   - Var[ε_i|X] = σ²: homoskedasticity
   - Cov[ε_i, ε_j|X] = 0 for i ≠ j: nonautocorrelation
5. The independent variables are nonstochastic (fixed in repeated samples)
6. (Normality: ε|X ~ N(0, σ²I_n))

## How to estimate b in y = Xb + e?

1. Method of moments: set population moments equal to sample moments
2. Maximum likelihood: based on normality; maximize the log-likelihood function; b_ML = b_MM
3. Least squares: minimize e'e

In the CMLRM, b_LS = b_ML = b_MM = (X'X)⁻¹X'y.

## CLRM: residual maker matrix M

e = [I − X(X'X)⁻¹X']y = My

Properties of M:
1. My = e
2. MX = 0
3. Me = e

## CLRM: projection matrix P

P = X(X'X)⁻¹X', so Py = Xb = ŷ = y − e

Properties of P:
1. Py = ŷ
2. PX = X
3. Pe = 0

## Simple vs. multiple regression

- Simple: y = b0 + b1x + e
- Multiple: y = b0 + b1x1 + ... + bkxk + e

## Partitioned regression

y = X1b1 + X2b2 + e, with
- b1 = (X1'X1)⁻¹X1'(y − X2b2)
- b2 = (X2'X2)⁻¹X2'(y − X1b1)

If X1'X2 = 0 (orthogonal sets of regressors), then b1 = (X1'X1)⁻¹X1'y and b2 = (X2'X2)⁻¹X2'y.

## Frisch-Waugh-Lovell theorem

In the linear LS regression of y on two sets of variables, X1 and X2, the subvector b2 is the set of coefficients obtained when the residuals from a regression of y on X1 alone (M1y) are regressed on the residuals from a regression of each column of X2 on X1 (M1X2):

b2 = [(M1X2)'(M1X2)]⁻¹(M1X2)'(M1y)

## Corollary of the Frisch-Waugh-Lovell theorem

The slopes in a multiple regression with a constant term are obtained by regressing the deviations of y from its mean on the deviations of the x's from their means.

## CLRM: goodness of fit

SST = SSR + SSE
(total sum of squares = regression sum of squares + error sum of squares)

The larger SSR is relative to SST, the better the fit.

## Coefficient of determination

R² = SSR/SST = 1 − SSE/SST

## Two problems with the coefficient of determination

1. More regressors ⇒ higher R²
2. Without a constant term, R² > 1 or R² < 0 is possible

## Fix for "more variables ⇒ higher R²"

Adjusted R² = 1 − [(SSE/(n−k)) / (SST/(n−1))]

## b_OLS: small-sample properties

- Unbiased
- Efficient: BLUE, by the Gauss-Markov theorem

## Gauss-Markov theorem

In the CLRM with regressor matrix X, the LS estimator b is the best linear unbiased estimator, i.e. the minimum-variance (efficient) linear unbiased estimator of β, regardless of whether X is deterministic or stochastic.

## s²_OLS (small sample)

Unbiased ⇒ Est. Var(b|X) = s²(X'X)⁻¹

## b_OLS: large-sample properties

1. Consistent
2. Asymptotically efficient (b_OLS = b_ML; by the Cramér-Rao lower bound)
3. Asymptotically normally distributed: if plim(X'X/n) = Q, then √n(b − β) converges in distribution to N(0, σ²Q⁻¹)

## s²_OLS: large-sample properties

Consistent ⇒ Est. Asy. Var(b) = s²(X'X)⁻¹

## OLS hypothesis testing: Z and t distributions

(b_k − β_k)/√(σ²(X'X)⁻¹_kk) ~ N(0, 1); the same statistic with s² in place of σ² is distributed t(n−k).

## OLS: t-test interval

Pr[−t(α/2) ≤ statistic ≤ t(α/2)] = 1 − α
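The least-squares formula and the M/P identities above are easy to check numerically. A minimal sketch with simulated data (the names and data-generating process are illustrative, not from the cards):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
# Regressor matrix with a constant column; full rank k, n >= k
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)  # y = X*beta + eps

# Least squares: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Projection matrix P = X(X'X)^{-1}X' and residual maker M = I - P
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

y_hat = P @ y  # Py = fitted values
e = M @ y      # My = residuals

assert np.allclose(M @ X, 0.0)    # MX = 0
assert np.allclose(M @ e, e)      # Me = e
assert np.allclose(P @ X, X)      # PX = X
assert np.allclose(P @ e, 0.0)    # Pe = 0
assert np.allclose(y_hat + e, y)  # y = y_hat + e
```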
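The Frisch-Waugh-Lovell theorem above can be verified the same way: the coefficients on X2 from the full regression equal those from regressing the partialled-out y on the partialled-out X2. A sketch under the same illustrative simulated-data assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # includes the constant
X2 = rng.normal(size=(n, 2))
X = np.hstack([X1, X2])
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)

# Full regression: b2 is the subvector of coefficients on X2
b_full = np.linalg.solve(X.T @ X, X.T @ y)
b2_full = b_full[2:]

# FWL: residual maker for X1, M1 = I - X1(X1'X1)^{-1}X1'
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M1X2, M1y = M1 @ X2, M1 @ y
b2_fwl = np.linalg.solve(M1X2.T @ M1X2, M1X2.T @ M1y)

assert np.allclose(b2_full, b2_fwl)  # identical coefficient vectors
```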
## Type I vs. Type II error

- Type I error: incorrectly rejecting a true H0
- Type II error: incorrectly failing to reject (accepting) a false H0
- (Type I) α is the level of significance; 1 − α is the confidence coefficient
- (Type II) 1 − β is the power of the test

## Two potential problems for OLS

1. Multicollinearity
2. Missing observations

## How to handle multicollinearity

1. Do nothing, if the b_i are significant
2. Get more data
3. Drop one of the collinear variables
4. Group the collinear variables together

## How to handle missing observations

1. y and x both complete: no problem
2. Some y missing, x complete: filling in values for y is not a good idea
3. y complete, some x missing:
   - Zero-order method: replace the missing x with x̄
   - Modified zero-order method: add a dummy column equal to 0 where x is complete and 1 where x is missing
   - Another way: regress x on y and replace the missing values with the fitted x̂

## Inference and testing: H0: Rβ = q

Wald test ~ χ²[J]; cf. (n−k)s²/σ² ~ χ²[n−k]

## If H0: b_k = β_k (J = 1)

F[1, n−k] = t[n−k]²; so if a random variable is ~ F[1, n−k], its square root is ~ t[n−k].

## Testing unrestricted vs. restricted models

F[J, n−k]. If H0 is that all slope coefficients are zero,

[R²/(k−1)] / [(1−R²)/(n−k)] ~ F[k−1, n−k]

## Large-sample tests (2)

1. Asymptotic t-test: asymptotically, t → standard normal, N(0, 1)
2. Asymptotic F-test: asymptotically, J·F ~ χ²(J)

## Testing nonlinear restrictions

Asymptotically, the Wald statistic ~ χ²(J).

## Measures of accuracy of prediction

1. Root mean squared error
2. Mean absolute error
3. Theil U-statistic

These compare ŷ with y_i. If y_i is unknown for the prediction period, split the sample into two groups, use group A to fit the model, predict group B, and compare the predictions ŷ with the observed y_i in group B.

## Binary variables (dummies)

1. Binary case
2. Several categories
3. Several groupings
4. Threshold effects
5. Interaction terms: intercept dummies and interaction dummies (e.g. b1·x1 + b2·x1·D)

## Structural changes (coefficients)

Compare the two groups' parameters; the statistic is ~ F(# of restrictions, d.f.). E.g. if s of the x's have different coefficients across the groups, the statistic is F(s, n−k−s).
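The overall F statistic above can be computed two equivalent ways, from R² and from the sums of squares, and the two must agree when the model contains a constant. A sketch (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 120, 4  # k counts the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.8, 0.0, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

sse = e @ e                            # error sum of squares
ssr = ((y_hat - y.mean()) ** 2).sum()  # regression sum of squares
sst = ((y - y.mean()) ** 2).sum()      # total sum of squares
assert np.allclose(ssr + sse, sst)     # SST = SSR + SSE (model has a constant)

r2 = 1.0 - sse / sst
adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))

# Overall F for H0: all slopes are zero, computed two equivalent ways
f_from_r2 = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
f_from_ss = (ssr / (k - 1)) / (sse / (n - k))
assert np.allclose(f_from_r2, f_from_ss)
```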
## Structural changes (variances)

W = (b1 − b2)'[Var(b1) + Var(b2)]⁻¹(b1 − b2) ~ χ²(J)

## Omitting a relevant variable

Coefficient: biased, but more efficient.

## Including an irrelevant variable

Coefficient: unbiased, but less efficient.

## Model building

1. Simple >> general
2. General >> simple (recommended, since omission is worse than including irrelevant variables; cf. Kennedy's book)

## Model selection criteria (4)

1. Adjusted R²
2. Akaike information criterion
3. Bayesian (Schwarz) information criterion
4. Prediction criterion

## Choosing between nonnested models

1. Encompassing model. H0: y = Xb + e vs. H1: y = Zr + e. Estimate y = X̄b + Z̄r + Wd + e (X̄, Z̄ the variables unique to each model, W the common ones) and F-test whether the coefficients on X̄ (or Z̄) are zero, rejecting H0 or H1 accordingly.
2. J-test. y = (1 − λ)Xb + λZr + e: regress y on Z to get r̂, then regress y on X and Zr̂; obtain λ̂ and test λ = 0.

## When is Generalized Least Squares needed?

1. Heteroskedasticity
2. Autocorrelation

Both violate the OLS assumption of spherical disturbances.

## b_OLS in GLS cases: small-sample properties

1. Unbiased
2. Efficiency is not guaranteed

## b_OLS in GLS cases: asymptotic properties

1. Consistent
2. Asymptotically normally distributed
3. Asymptotically efficient? No.

## b_GLS (Σ known), with E(εε'|X) = σ²Σ

Let Σ⁻¹ = P'P; then transform x* = Px, y* = Py, ε* = Pε, and

b_GLS = (X*'X*)⁻¹X*'y*

## Small-sample properties of b_GLS (Σ known)

1. Unbiased
2. Efficient (BLUE, as in the OLS case)

## σ²_GLS

Unbiased and consistent.

## b_GLS: asymptotic properties

1. Consistent
2. Asymptotically normally distributed
3. Asymptotically efficient

## Σ completely unknown

GLS is impossible.
1. Do OLS: still an unbiased estimator
2. Est. Asy. Var(b): use White's heteroskedasticity-consistent estimator

## Σ partially known: Feasible GLS procedure

1. Run OLS
2. Regress e_i² = α'z_i + u_i to get α̂, hence Σ̂ = Σ(α̂)
3. b_FGLS = [X'Σ̂⁻¹X]⁻¹X'Σ̂⁻¹y

## Σ partially known: MLE

In the log-likelihood function, write Σ⁻¹ as a matrix function of the parameters α, i.e. Σ(α)⁻¹.

So with Σ partially known, estimate by FGLS, MLE, or GMM.

## Four tests for heteroskedasticity

1. Eyeball test
2. White's general test ~ χ²(p−1); H0: all σ_i² are the same
3. Goldfeld-Quandt test ~ F(n1−k, n2−k); H0: the two groups' σ² are the same
4. Breusch-Pagan (Godfrey LM) test: LM statistic ~ χ²(p)
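The FGLS procedure above can be sketched end to end. The exponential skedastic function, the log(e²) auxiliary regression, and all names here are illustrative assumptions, one common specification rather than the only choice:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
z = rng.uniform(1.0, 3.0, size=n)        # variable driving the variance
X = np.column_stack([np.ones(n), x])
sigma_i = np.exp(0.5 * z)                # true (unknown) skedastic function
y = X @ np.array([1.0, 2.0]) + sigma_i * rng.normal(size=n)

# Step 1: OLS -- still unbiased here, but inefficient
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_ols

# Step 2: model the variance, e.g. log(e_i^2) = a0 + a1*z_i + u_i
Z = np.column_stack([np.ones(n), z])
a_hat = np.linalg.solve(Z.T @ Z, Z.T @ np.log(e ** 2))
var_hat = np.exp(Z @ a_hat)              # diagonal of Sigma_hat

# Step 3: transformed (weighted) regression: divide each row by sigma_i_hat
w = 1.0 / np.sqrt(var_hat)
Xs, ys = X * w[:, None], y * w
b_fgls = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
```

With diagonal Σ̂ this weighted regression is exactly b_FGLS = [X'Σ̂⁻¹X]⁻¹X'Σ̂⁻¹y from the card above.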
## Common causes of endogeneity (violations of exogeneity: Cov(ε, x) ≠ 0)

- Measurement error
- Lagged dependent variables
- Simultaneity
- Omitted variables

## b_OLS under endogeneity

1. Biased
2. Inconsistent

## Properties of b_IV = (Z'X)⁻¹Z'y (instrumental variables, L = K)

1. Biased
2. Its variance-covariance matrix is larger than that of OLS, so by the MSE criterion OLS can still be preferred
3. Consistent
4. Asymptotically normally distributed
5. Est. Asy. Var(b_IV) is also consistent

## Properties of b_IV (L > K): regress X on Z to get X̂, then replace X with X̂

1. Biased
2. Consistent
3. Asymptotically normally distributed
4. Asy. Var(b_IV) − Asy. Var(b_OLS) > 0

In short: b_IV is biased and consistent but less efficient; b_OLS is biased and inconsistent.

## Hausman test (general)

H0: plim(θ̂ − θ̃) = 0.

Statistic: (θ̂ − θ̃)'[V_H/n]⁻¹(θ̂ − θ̃) ~ χ²(# of parameters), where V_H = V(θ̂) + V(θ̃) − 2Cov(θ̂, θ̃).

If θ̂ is efficient under H0, then Cov(θ̂, θ̃) = V(θ̂), and

H = (θ̂ − θ̃)'{[V(θ̃) − V(θ̂)]/n}⁻¹(θ̂ − θ̃) ~ χ²(# of parameters)

## Hausman test (IV case)

H0: plim X'ε/n = 0; H1: it is not 0, in which case only IV is consistent.

## Endogeneity tests (2)

1. Hausman test
2. Wu test

## IV in the GLS case

b_IV is biased, consistent, and asymptotically normally distributed, and Σ appears in Asy. Var(b_IV).

## Weak instrument problem

Z is only weakly correlated with X.

## Consequences of weak instruments (2)

1. Var(b_IV) goes up
2. In large samples, b_IV can end up less consistent than b_OLS

## Three tests for weak instruments

1. R² measures
2. Godfrey test
3. F-statistic measures

## Alternatives to IV

1. Limited-information ML: y = Xb and X1 = Zr + u, estimated jointly by likelihood
2. Split-sample IV: with groups (y1, X1, Z1) and (y2, X2, Z2), get r̂ from group 1 by regressing X1 on Z1, then predict X̂2 = Z2·r̂; this reduces the bias

## Testing Z'ε = 0 (overidentification)

1. L = K: cannot be tested
2. L > K:
   a. Sargan test
   b. C-test
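The IV logic above can be sketched with a simulated endogenous regressor; the data-generating process and names are illustrative. With more instruments than regressors (L > K), the first stage replaces X by its projection on Z (two-stage least squares):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=(n, 2))                  # two exogenous instruments
u = rng.normal(size=n)                       # common shock -> endogeneity
x = z @ np.array([1.0, 0.5]) + u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])         # K = 2 (constant + x)
y = X @ np.array([1.0, 2.0]) + u + rng.normal(size=n)  # eps correlated with x

Z = np.column_stack([np.ones(n), z])         # L = 3 instruments incl. constant

# OLS is inconsistent here because Cov(x, eps) != 0
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# L > K: first stage regresses X on Z, second stage uses X_hat
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
```

Comparing the slope in `b_ols` (pushed away from 2.0 by the positive Cov(x, ε)) with the slope in `b_2sls` (close to 2.0) illustrates why IV is preferred under endogeneity despite its larger variance.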
Author: lucia831124 · ID: 79045 · Card set: econometrics2 · Updated: 2011-04-18