-
Statistics:
- The science that deals with the analysis and classification of empirical data. It also attempts to draw conclusions based on past or present experience.
-
Population
- The totality of all data studied.
-
Sample
- A subset of data drawn from a population.
-
Parameter
- A numerical measurement referring to a population.
-
Statistic:
A numerical measurement referring to a sample.
-
Census:
The collection of data from every member of the population.
-
Variable
- A quantity that can assume different values. Letters are used to denote variables.
-
Constant
- A quantity that has a fixed value. It is the opposite of a variable.
-
Random Variable
- A variable that assumes values depending on chance.
-
Discrete Variable
- It assumes a finite number of values or, if it assumes infinitely many values, the values can be counted using the counting numbers 1, 2, 3, …
-
Continuous Variable
- It assumes infinitely many values that cannot be counted. There are no gaps between the values.
-
Scales or levels of measurement: Nominal
- Non-numerical data such as names, labels, categories, etc. They cannot be ordered.
-
Scales or levels of measurement: Interval
- Like ordinal, but differences between values are meaningful. There is no natural starting point, i.e. there is no true zero, so ratios are meaningless. For example, body temperatures in degrees Celsius or Fahrenheit.
-
Scales or levels of measurement: Ordinal
- The data can be ordered, but differences either cannot be determined or are meaningless, e.g. rating movies using stars.
-
Scales or levels of measurement: Ratio
- This is the highest level of measurement for numerical data. There is a natural zero starting point, and both differences and ratios are meaningful.
-
Types of data: Categorical or Qualitative:
- Non-numerical data, e.g. color, party or religious affiliation, etc.
- Categorical data use either the nominal or ordinal scale of measurement.
-
Types of data: Quantitative or Numerical:
- Numerical data, e.g. test scores, income figures, etc.
- Quantitative data use either the interval or ratio scale of measurement.
-
Types of data: Categorical variable
- A variable with categorical data.
- Statistical analysis for categorical variables is limited to summarizing the data by category or computing the proportion of the observations in each category.
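
The category summary described above can be sketched in a few lines of Python. This is only an illustration: the function name `category_proportions` and the party-affiliation responses are made up for the example and are not part of the original card.

```python
from collections import Counter

def category_proportions(observations):
    """Return {category: proportion of observations} for categorical data."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)   # number of observations per category
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# Hypothetical responses, invented for illustration only.
responses = ["Democrat", "Republican", "Independent", "Democrat",
             "Republican", "Democrat", "Democrat", "Independent"]
proportions = category_proportions(responses)
print(proportions)
```

Note that only counting and dividing by the total are applied to the data; no arithmetic is performed on the category labels themselves.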
-
Types of data: Quantitative variable
- A variable with numerical data.
- The data can be manipulated mathematically and the results are meaningful. For example, we can add the data and divide by the number of observations to arrive at the average value.
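
As a small sketch of the averaging step just described (add the data, then divide by the number of observations), assuming a hypothetical list of test scores invented for this example:

```python
def mean(values):
    """Average value: the sum of the data divided by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

# Hypothetical test scores, invented for illustration only.
scores = [72, 85, 90, 66, 87]
average_score = mean(scores)
print(average_score)
```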
-
Types of data: Cross-sectional data
Data collected at one point in time.
- For example, a media research company calls 5,000 households at random to determine the proportion of households tuned to NBC to watch the opening ceremony of the 2012 Olympic Games.
-
Types of data: Time series data
Data collected at regular intervals over time.
- Typical measuring points are months (for example, monthly unemployment figures for the last three years) or quarters (for example, company quarterly reports for the last two years).
- The best way to represent time series data is with a line graph.
-
For statistical studies first we must identify what we want to study. This is referred to as...
The variable of interest
-
Observational statistics:
We observe and measure specific characteristics, but we do not attempt to control or modify the subjects being studied. A Gallup poll is an example of an observational study.
-
Experimental statistics:
- We conduct an experiment or, as we say in Statistics, we apply some treatment and then observe its effects on the subjects. (The subjects are usually called experimental units.)
- Pharmaceutical companies conduct such experiments when they test new drugs.
-
One of the two parts of statistics: Descriptive Statistics
This part of Statistics attempts to summarize or describe the important characteristics of a set of data.
- Methods of summarizing data include tables; pictures such as bar charts, pie charts, histograms, frequency polygons, and line charts; and numbers that measure a specific characteristic of the data. For example, the mean or average measures the center of a set of data.
-
One of the two parts of statistics: Inferential Statistics or Statistical Inference
- This part of Statistics attempts to make inferences, or draw conclusions or generalizations, about a large population based on a sample drawn from that population.
- The tools used are based on Probability and Probability Distributions and are extremely sophisticated.
- The methods used in Statistical Inference have a solid mathematical foundation, and they yield valid results provided that the sample is representative of the population.
- So the weak link in Statistical Inference is the sample and the sample size. A biased sample will yield unreliable results.
- How to choose a “good” sample is a science in itself.
-
Methods of sampling: Random sampling
Each member of the population has an equal chance of being selected.
-
Methods of sampling: Simple random sample of size n
- Every possible sample of the same size n has an equal chance of being selected.
- Notice that there is a difference between a random sample and a simple random sample.
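
One way to draw a simple random sample in Python is the standard library's `random.sample`, which selects n distinct elements without replacement so that every subset of size n is equally likely. The wrapper name `simple_random_sample`, the `seed` parameter, and the household numbers below are assumptions made for this sketch.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: households numbered 1 to 100.
households = list(range(1, 101))
sample = simple_random_sample(households, n=10, seed=42)
print(sample)
```

By contrast, a scheme that picks one whole row of a seating chart at random gives each person an equal chance of selection, but not every group of the same size, so it is a random sample without being a simple random sample.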
-
Methods of sampling: Stratified
- Divide the population into sub-populations or strata, and then draw a sample from each stratum.
- Note: If the sample selected from each stratum is a random sample, then this procedure (first stratification, then random sampling) is called stratified random sampling. It is a special case of stratified sampling.
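
A minimal sketch of stratified random sampling follows. It assumes the same number of units is drawn from every stratum, which is only one possible allocation (proportional allocation is another). The function name, the `stratum_of` callback, and the student records are hypothetical.

```python
import random
from collections import defaultdict

def stratified_random_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # step 1: stratification
    sample = []
    for members in strata.values():
        k = min(n_per_stratum, len(members))    # small strata are taken whole
        sample.extend(rng.sample(members, k))   # step 2: random sampling
    return sample

# Hypothetical (name, class year) records, invented for illustration only.
students = [("Ann", "freshman"), ("Bob", "freshman"), ("Cara", "freshman"),
            ("Dan", "senior"), ("Eve", "senior"), ("Finn", "senior")]
sample = stratified_random_sample(students, lambda s: s[1], n_per_stratum=2, seed=1)
print(sample)
```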
-
Methods of sampling: Systematic
- Choose a starting point and then select every kth element.
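
Systematic sampling maps naturally onto list slicing. In the sketch below, the function name and the choice of a random starting point among the first k positions (when no start is given) are assumptions for the example, not something the card specifies.

```python
import random

def systematic_sample(population, k, start=None, seed=None):
    """Select every kth element of a list, beginning at index `start`."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if start is None:
        # Assumption: pick the starting index at random among the first k positions.
        start = random.Random(seed).randrange(k)
    return population[start::k]

# Hypothetical population: items numbered 1 to 20; start at the third item.
items = list(range(1, 21))
sample = systematic_sample(items, k=5, start=2)
print(sample)  # [3, 8, 13, 18]
```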
-
Methods of sampling: Cluster
- Divide the population into sections or clusters, choose a few clusters at random, and then perform a census within each selected cluster. That is, select all the elements from the chosen clusters.
- A special case of cluster sampling is area sampling, where the clusters are geographic subdivisions.
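
The two steps of cluster sampling (choose clusters at random, then take every element of each chosen cluster) might be sketched as follows. The function name and the city-block data are hypothetical; the blocks play the role of geographic clusters, as in area sampling.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose n_clusters clusters at random and keep every element of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)       # random choice of clusters
    return {name: list(clusters[name]) for name in chosen}  # census within each one

# Hypothetical city blocks and the households on them, invented for illustration.
blocks = {
    "Block A": ["A1", "A2", "A3"],
    "Block B": ["B1", "B2"],
    "Block C": ["C1", "C2", "C3", "C4"],
    "Block D": ["D1"],
}
selected = cluster_sample(blocks, n_clusters=2, seed=7)
print(selected)
```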
-
Methods of sampling: Convenience
- Simply use data that are readily and conveniently available. This does not yield statistically valid results.
-
Methods of sampling: Voluntary Response
- A voluntary response sample is one in which the respondents themselves decide whether to be included or not.
- Such a sample is flawed and should not be used for making general statements about a population.
-
Methods of sampling: Multistage Sampling
Sampling schemes that combine several sampling methods are called multistage samples.
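
As one possible illustration of combining methods, the sketch below uses two stages: clusters are chosen at random, and then a simple random sample is drawn within each chosen cluster. This particular combination, the function name, and the school data are assumptions for the example; many other multistage designs exist.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: simple random sample in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical schools and student IDs, invented for illustration only.
schools = {
    "North": [101, 102, 103, 104],
    "South": [201, 202, 203],
    "East": [301, 302, 303, 304, 305],
}
selected = two_stage_sample(schools, n_clusters=2, n_per_cluster=2, seed=3)
print(selected)
```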