
What is Econometrics?
A unification of three necessary views: statistics, economic theory, and mathematics.
A field of economics that concerns itself with the application of mathematical statistics and the tools of statistical inference to the empirical measurement of relationships postulated by economic theory



Vector Space
 Closed under scalar multiplication
 Closed under addition

Basis vectors
A linearly independent set of vectors that span a vector space

Linearly independent vectors
The only solution of Ac=0 is c=0 (where the columns of A are the vectors)

Singular vs. Nonsingular matrices
Det(A)=0 <=> Singular

Properties of Determinant
 1. a row (column) of zeros => det=0
 2. det(A')=det(A)
 3. interchanging two rows(columns) => change the sign of det
 4. If 2 rows (columns) are identical => det=0
 5. If one row (column) is a multiple of another => det=0
 6. rows (columns) linearly independent <=> det<>0
 7. det(A*B)=det(A)*det(B)
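A quick numpy check of properties 2, 3, and 7 (the matrices are arbitrary illustrative numbers):

```python
import numpy as np

# Two arbitrary nonsingular matrices for illustration only.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 4.0], [0.0, 2.0]])

# Property 2: det(A') = det(A)
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

# Property 3: swapping two rows flips the sign of the determinant
A_swapped = A[[1, 0], :]
assert np.isclose(np.linalg.det(A_swapped), -np.linalg.det(A))

# Property 7: det(AB) = det(A) * det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
```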

Row Rank & Column Rank
the maximum number of linearly independent rows (columns)

Properties of Rank
 1. rank(A*B)<=min(rank(A),rank(B))
 2. rank(A)=rank(A'*A)=rank(A*A')
 3. If A has full column rank, then Ax<>0 for any nonzero x
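A small sketch of property 2 on a deliberately rank-deficient matrix (the third column is the sum of the first two):

```python
import numpy as np

# Tall 4x3 matrix with rank 2 by construction: col3 = col1 + col2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
r = np.linalg.matrix_rank(A)
assert r == 2
# Property 2: rank(A) = rank(A'A) = rank(AA')
assert np.linalg.matrix_rank(A.T @ A) == r
assert np.linalg.matrix_rank(A @ A.T) == r
```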

Inverse matrix
AA^(-1)=A^(-1)A=I

Properties of inverse matrix
 1. det(inv(A))=1/det(A)
 2. inv(inv(A))=A
 3. inv(A)'=inv(A')
 4. A is symmetric => inv(A) is symmetric
 5. inv(ABC)=inv(C)inv(B)inv(A)
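Properties 3 and 5 checked numerically on arbitrary nonsingular matrices (numbers are illustrative only):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 4.0], [0.0, 2.0]])
C = np.array([[3.0, 0.0], [1.0, 1.0]])
inv = np.linalg.inv

# Property 3: (inv(A))' = inv(A')
assert np.allclose(inv(A).T, inv(A.T))

# Property 5: inv(ABC) = inv(C) inv(B) inv(A) -- note the reversed order
assert np.allclose(inv(A @ B @ C), inv(C) @ inv(B) @ inv(A))
```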

Characteristic roots & vectors
=Eigenvalues & eigenvectors
 (A-lambda*I)*c=0
 >> lambda=eigenvalues
 >> c=eigenvectors

Properties of characteristic roots
 1. Zero characteristic roots are possible
 2. Rank of a symmetric matrix = # of nonzero characteristic roots
 => rank of any matrix A = # of nonzero eigenvalues of A'A (symmetric)
 3. det = product of its characteristic roots
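Properties 2 and 3 illustrated on a symmetric matrix that is singular by construction:

```python
import numpy as np

# Symmetric 3x3 matrix with a zero row/column, so one eigenvalue is 0.
S = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])
lam = np.linalg.eigvalsh(S)   # eigenvalues of a symmetric matrix

# Property 3: det = product of the characteristic roots
assert np.isclose(np.prod(lam), np.linalg.det(S))

# Property 2: rank = number of nonzero characteristic roots
assert np.sum(~np.isclose(lam, 0.0)) == np.linalg.matrix_rank(S)
```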

Trace of a square matrix
Sum(a_{ii}) for all i=1,...,n

Properties of trace
 1. tr(A)=tr(A')
 2. tr(AB)=tr(BA)
 3. tr(ABC)=tr(BCA)=tr(CAB)
 4. A scalar = its trace
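The cyclic property (2 and 3 above) verified on random matrices; sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 4, 4))   # three random 4x4 matrices

assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# tr(ABC) is invariant under cyclic permutation, not arbitrary reordering
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
```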

Quadratic form & definiteness
q=x'Ax for any nonzero x:
1. q>0 <=> positive definite <=> eigenvalues all +
2. q>=0 <=> positive semidefinite <=> eigenvalues all >=0 (some may be 0)
3. q<0 <=> negative definite <=> eigenvalues all -
4. q<=0 <=> negative semidefinite <=> eigenvalues all <=0 (some may be 0)
5. q takes both signs <=> indefinite <=> some eigenvalues -, some +
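Classifying definiteness through eigenvalues, as in the list above (diagonal matrices chosen so the eigenvalues are obvious):

```python
import numpy as np

A_pd  = np.array([[2.0, 0.0], [0.0, 3.0]])    # eigenvalues 2, 3  -> positive definite
A_ind = np.array([[1.0, 0.0], [0.0, -1.0]])   # eigenvalues 1, -1 -> indefinite

assert np.all(np.linalg.eigvalsh(A_pd) > 0)   # case 1: all eigenvalues +
lam = np.linalg.eigvalsh(A_ind)
assert lam.min() < 0 < lam.max()              # case 5: some -, some +
```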

Properties of a symmetric matrix A of a quadratic form
1. If A is positive definite, then det(A)>0; if positive semidefinite, det(A)>=0
2. If A is positive definite, then inv(A) is also positive definite <=> characteristic roots of inv(A) are reciprocals of those of A
3. If the (nxK) matrix A has full (column) rank, then A'A is positive definite => x'A'Ax>0 for any nonzero x

Compare size of matrices
 Q. definiteness of (A-B)
 => for all nonzero x,
 x'(A-B)x >0 or <0 ?
 positive definite or negative definite?

Random variable
Continuous vs. Discrete

PDF vs. CDF
 PDF: f(x); for continuous X, P(X=x)=0
 CDF: F(c)=sum(integral)_{x<=c} f(x)

Moments
 1. rth moment about the origin: E[X^{r}]
 2. rth moment about the mean of X: E[(X-E(X))^{r}]

E(X)?
 sum f(x)*x
 integral f(x)*x dx

Properties of E(X)
 1. E(b)=b, b is a scalar
 2. Y=aX+b => E(Y)=aE(X)+b
 3. if X and Y are independent, then E(XY)=E(X)*E(Y)

2nd moment about the mean = variance
 a measure of dispersion
 sum f(x)*(x-E(x))^{2}

E[(x-E(x))^{2}]?
E[X^2]-(E[X])^{2}
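The identity Var(X) = E[X^2] - (E[X])^2 checked numerically for a fair six-sided die:

```python
import numpy as np

x = np.arange(1, 7)        # outcomes 1..6
p = np.full(6, 1 / 6)      # fair die: equal probabilities

EX  = np.sum(p * x)        # E[X] = 3.5
EX2 = np.sum(p * x**2)     # E[X^2] = 91/6

var_direct  = np.sum(p * (x - EX) ** 2)   # definition: E[(X - E[X])^2]
var_moments = EX2 - EX ** 2               # shortcut: E[X^2] - (E[X])^2
assert np.isclose(var_direct, var_moments)   # both equal 35/12
```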

3rd moment about the mean
 skewness
 if it>0 => positive skew (peak on the left, long right tail)
 if it<0 => negative skew (peak on the right, long left tail)

4th moment about the mean
 kurtosis
 high kurtosis: fat tails
 low kurtosis: thin tails

Moment Generating Function (MGF)
M(t)=E[exp(Xt)]
=> M^{(n)}(0)=E[X^{n}] (nth derivative of M evaluated at t=0)

Normal Dist (mu,sigma^{2})
f(x)=[1/(sigma*sqrt(2*pi))]*exp(-(x-mu)^{2}/(2*sigma^{2}))

Standard normal dist
 Z=(X-mu)/sigma
 when X~N(mu,sigma^{2})

Chi square dist(d)
 d=degrees of freedom
 Chi(d)=sum of d independent z^{2}'s

t distribution (d)
 t=z/sqrt(Chi(d)/d)
 t->z as d->inf

F distribution (n1,n2)
 [Chi(n1)/n1]/[Chi(n2)/n2]
 e.g. F[K-1,n-K]=[R^{2}/(K-1)]/[(1-R^{2})/(n-K)] under H0: all slope coefficients of the CLRM are 0

Joint Distribution
f(x,y)

Marginal probability
 f_{x}(x)=sum_{y}f(x,y)
 f_{y}(y)=sum_{x}f(x,y)

Independence of joint distribution
 1. f(x,y)=f_{x}(x)*f_{y}(y)
 2. for any functions g1(x) and g2(y),
 E[g1(x)g2(y)]=E[g1(x)]*E[g2(y)]

Covariance
 E[(x-E(x))*(y-E(y))]
 = E[xy]-E[x]*E[y]

What if X and Y are independent?
Cov=0

Correlation
Cov(x,y)/(st.dev(x) st.dev(y))

Q. Correlation=0 => independent?
No
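A classic counterexample: take X symmetric about 0 and Y = X^2. Then Cov(X, Y) = E[X^3] = 0, yet Y is completely determined by X. A discrete version with equal weights on {-2,-1,0,1,2}:

```python
import numpy as np

# X uniform on a symmetric support, Y = X^2 (fully dependent on X).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Cov(X, Y) = E[XY] - E[X]E[Y]; here E[XY] = E[X^3] = 0 and E[X] = 0.
cov = np.mean(x * y) - np.mean(x) * np.mean(y)
assert np.isclose(cov, 0.0)   # zero correlation despite perfect dependence
```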

VarCov matrix
 diagonal = var(x_{i})
 off-diagonal = Cov(x_{i},x_{j})

Conditional Distribution
f(y|x)=f(x,y)/f_{x}(x)

Distributions of functions of r.v.s
 a. change of variables
 b. using MGF
 a. Assume that we know f(x) & y=g(x)
 1. x=g^{-1}(y)
 2. dx/dy
 3. domain of y
 4. f_Y(y)=f(g^{-1}(y))*abs(dx/dy)
 or f(g^{-1}(y))*abs(det(dx/dy)) (multivariate case)
b. using MGF e.g. E[exp(axt)]
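Steps 1-4 above, worked for y = g(x) = x^2 with X ~ Uniform(0,1): x = g^{-1}(y) = sqrt(y), dx/dy = 1/(2*sqrt(y)), domain (0,1), so f_Y(y) = 1/(2*sqrt(y)) and the implied CDF is F_Y(c) = sqrt(c). A simulation check (sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.uniform(size=200_000) ** 2   # draws of Y = X^2, X ~ Uniform(0,1)

# Empirical CDF should match F_Y(c) = sqrt(c) from the change of variables.
for c in (0.09, 0.25, 0.64):
    assert abs(np.mean(y <= c) - np.sqrt(c)) < 0.01
```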

Statistics
A function of r.v.s that does not depend on unknown parameters
e.g. sample mean, median...

Random sample <=> iid (independently identically distributed)
A sample of n observations on one or more variables, x1, ..., xn, drawn independently from the same probability distribution f(x1,...,xn|theta)

Estimators vs. Estimates
Estimators (statistics) = a formula for using data to estimate a parameter
Estimates = the values you get by plugging data into the estimator

Method of moments
 sample moments = population moments
 e.g. sum(x_{i})/n = E[x]
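A method-of-moments sketch under an assumed Exponential model: since E[X] = 1/lambda, equating the first sample moment to the population moment gives lambda_hat = 1/x_bar (the true rate and sample size below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated iid sample from Exponential with true rate lambda = 2,
# i.e. mean 1/2 (scale parameter in numpy's convention).
x = rng.exponential(scale=1 / 2.0, size=100_000)

# MoM: set sample mean = E[X] = 1/lambda and solve for lambda.
lam_hat = 1 / x.mean()
assert abs(lam_hat - 2.0) < 0.05   # close to the true rate in a large sample
```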

Maximum likelihood estimation
: likelihood function & loglikelihood fn.
 cf. dist is known
 maximize L(theta|x1,...,xn) or lnL(.)

MLE procedures
 1. Find L by multiplying f(x_{i})'s
 2. Take the log (not necessary, but convenient)
 3. Find the theta's to maximize lnL(.)
 4. Use FOC=0
 5. Check SOC: negative definite
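The steps above for the textbook case of mu in N(mu, 1) with sigma known: the FOC sum(x_i - mu) = 0 gives mu_hat = x_bar, and the SOC is -n < 0. A numeric confirmation that x_bar maximizes lnL (data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=1.0, size=500)   # sample from N(5, 1)

def loglik(mu):
    # lnL(mu | x) for N(mu, 1), dropping the additive constant -n/2*ln(2*pi)
    return -0.5 * np.sum((x - mu) ** 2)

mu_hat = x.mean()   # closed-form MLE from the FOC
# lnL at mu_hat beats nearby candidate values, consistent with the SOC.
for mu in (mu_hat - 0.5, mu_hat + 0.5):
    assert loglik(mu_hat) > loglik(mu)
```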

Ways to evaluate estimators
 1. Monte Carlo analysis
 2. Pre-data analysis (small/large sample properties)

Small Sample Properties
 1. Unbiasedness
 2. Variance (Precision)
 3. Mean Square Error
 4. Efficiency

Unbiased
E(theta_hat)=theta
Bias=E(theta_hat)-theta

Variance
We prefer an estimator with smaller variance

MSE (Mean Squared Error)
 theta_hat=t:
 MSE(t)=Var(t)+[Bias(t)]^{2}
 =E[(t-E(t))^{2}]+[E(t)-theta]^{2}
 =E[(t-theta)^{2}]
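A Monte Carlo check of the decomposition MSE(t) = Var(t) + Bias(t)^2, using the deliberately biased estimator t = 0.9*x_bar of mu (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, n, reps = 2.0, 20, 100_000

# One estimate t = 0.9 * x_bar per replication (a biased estimator of mu).
t = 0.9 * rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)

mse   = np.mean((t - mu) ** 2)       # E[(t - theta)^2]
var_t = np.var(t)                    # E[(t - E(t))^2]
bias2 = (np.mean(t) - mu) ** 2       # [E(t) - theta]^2
assert np.isclose(mse, var_t + bias2)   # decomposition holds
```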

Efficiency
 Unbiased &
 the smallest variance
 => Cramer-Rao lower bound
 if the estimator is unbiased, then its variance >= CRLB = {-E[SOC of lnL(.)]}^{-1}
 cf. attaining the CRLB is a sufficient condition for efficiency, not a necessary one

Large sample property
=asymptotic properties as the sample size -> inf
 1. consistent
 2. asymptotically efficient

Consistency
plim theta_hat=theta

Asymptotically efficient
consistent & the smallest asymptotic variance

Convergence in Probability
 x_{n} ->p c
 lim_{n->inf} Pr(|x_{n}-c|>eps)=0
 <=> lim_{n->inf} Pr(|x_{n}-c|<eps)=1 for any eps>0

Mean Square Convergence
 x_{n} ->ms c
 mu_{n}->c & sigma^{2}_{n}->0 as n->inf

Mean Sq. Convergence => Convergence in Probability (not true conversely)
 Because of Chebyshev's inequality
 : Pr(|x-mu|>eps)<=(sigma^{2}/eps^{2})
 e.g. x_bar (sample mean)
 E(sample mean)=mu
 Var(sample mean)=sigma^{2}/n
 as n->inf, E(.)->mu & Var(.)->0, thus it is consistent

Khinchine's Weak Law of Large numbers
If x1,...,xn is a random iid sample from a distribution with a finite mean E(x_{n})=mu, then plim(sample mean)=mu
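A simulation sketch of the weak law: sample means of iid Uniform(0,1) draws (mu = 0.5) settle on mu as n grows. One simulated path, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Absolute deviation of x_bar from mu = 0.5 at increasing sample sizes.
devs = [abs(rng.uniform(size=n).mean() - 0.5) for n in (100, 10_000, 1_000_000)]

assert devs[-1] < 0.005   # essentially on top of mu at n = 1,000,000
```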

Convergence in Distribution
 F(x): limiting distribution
 if lim_{n->inf}|F_{n}(x_{n})-F(x)|=0 at all continuity points of F(x)
 x_{n} ->d x

Convergence in dist.
Q. Does x_{n} converge to a constant?
No. Different from convergence in probability: the limit here is a distribution, not a point
Convergence in dist. is related to the CLT

Lindeberg-Levy univariate central limit theorem (Asymptotic normality)
 Sums of r.v.s (like the sample mean or a weighted sum) are approximately normally distributed in large samples, no matter the distribution of the original population
 Formal def: let x1,...,xn be a random sample from a probability distribution with finite mean mu and finite variance sigma^2. Then sqrt(n)*(x_bar_n - mu) converges in distribution to N(0,sigma^2)
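A CLT sketch with deliberately skewed data: for Exponential(1) draws (mu = sigma = 1), z = sqrt(n)*(x_bar - mu) should look N(0,1) in large samples. The sample size and replication count below are arbitrary simulation choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 20_000

# One standardized sample mean per replication.
z = np.sqrt(n) * (rng.exponential(size=(reps, n)).mean(axis=1) - 1.0)

# Mean near 0 and standard deviation near 1, as N(0, 1) predicts.
assert abs(z.mean()) < 0.05
assert abs(z.std() - 1.0) < 0.05
```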

Repeated sampling
Get samples from the identical population distribution

Difference b/w joint dist & likelihood fn.
 Joint dist = L(x1,...,xn|theta)
 Likelihood = L(theta|x1,...,xn)

Classical estimators vs. Bayesian approach
In the Bayesian approach, estimation is not a matter of deducing the values of parameters, but rather one of continually updating and sharpening our subjective beliefs about the state of the world

