
Descriptive statistics
Descriptive statistics are used to describe, or summarize, data in ways that are meaningful and useful. For example, it would not be useful to know that all of the participants wore blue shoes. However, it would be useful to know how spread out the anxiety ratings were. Descriptive statistics are at the heart of all quantitative analysis.

Inferential statistics
Inferential statistics makes inferences about populations using data drawn from those populations. Instead of gathering data from the entire population, the statistician collects a sample or samples and makes inferences about the whole population from them. The sample is a set of data taken from the population to represent the population. Probability distributions, hypothesis testing, correlation testing and regression analysis all fall under the category of inferential statistics.

Population
In statistics, a population is a complete set of items that share at least one property in common that is the subject of a statistical analysis. For example, the population of German people share a common geographic origin, language, literature, and genetic heritage, among other traits, that distinguish them from people of different nationalities. As another example, the Milky Way galaxy comprises a star population. In contrast, a statistical sample is a subset drawn from the population to represent the population in a statistical analysis.[2] If a sample is chosen properly, characteristics of the entire population that the sample is drawn from can be inferred from corresponding characteristics of the sample.

Sample
A data sample is a set of data collected and/or selected from a statistical population by a defined procedure.[1] Typically, the population is very large, making a census or a complete enumeration of all the values in the population impractical or impossible. The sample usually represents a subset of manageable size. Samples are collected and statistics are calculated from the samples so that one can make inferences or extrapolations from the sample to the population. This process of collecting information from a sample is referred to as sampling. The data sample may be drawn from a population without replacement, in which case it is a subset of a population; or with replacement, in which case it is a multisubset.[2]

Qualitative Variables
Qualitative data is a categorical measurement expressed not in terms of numbers, but rather by means of a natural language description. In statistics, it is often used interchangeably with "categorical" data.
 For example: favorite color = "blue"
 height = "tall"
Although we may have categories, the categories may have a structure to them. When there is not a natural ordering of the categories, we call these nominal categories. Examples might be gender, race, religion, or sport. When the categories may be ordered, these are called ordinal variables. Categorical variables that judge size (small, medium, large, etc.) are ordinal variables. Attitudes (strongly disagree, disagree, neutral, agree, strongly agree) are also ordinal variables. Note that the distance between these categories is not something we can measure.

Quantitative Variables
Quantitative data is a numerical measurement expressed not by means of a natural language description, but rather in terms of numbers. However, not all numbers are continuous and measurable. For example, a social security number is a number, but not something that one can meaningfully add or subtract.
 For example: molecule length = "450 nm"
 height = "1.8 m"
Quantitative data are always associated with a scale measure. Probably the most common scale type is the ratio scale. Observations of this type are on a scale that has a meaningful zero value but also have an equidistant measure (i.e., the difference between 10 and 20 is the same as the difference between 100 and 110). For example, a 10-year-old girl is twice as old as a 5-year-old girl. Since you can measure zero years, time is a ratio-scale variable. Money is another common ratio-scale quantitative measure. Observations that you count are usually ratio-scale (e.g., number of widgets).

Nominal Level Data
A set of data is said to be nominal if the values / observations belonging to it can be assigned a code in the form of a number where the numbers are simply labels. You can count but not order or measure nominal data. For example, in a data set males could be coded as 0, females as 1; marital status of an individual could be coded as Y if married, N if single.

What type of data is this...
Gender
Nominal

What type of data is this...
Hair colour
Nominal

What type of data is this...
Where do you live?
Nominal

Ordinal Data
A set of data is said to be ordinal if the values / observations belonging to it can be ranked (put in order) or have a rating scale attached. You can count and order, but not measure, ordinal data. The categories for an ordinal set of data have a natural order. For example, suppose a group of people were asked to taste varieties of biscuit and classify each biscuit on a rating scale of 1 to 5, representing strongly dislike, dislike, neutral, like, strongly like. A rating of 5 indicates more enjoyment than a rating of 4, for example, so such data are ordinal. However, the distinction between neighbouring points on the scale is not necessarily always the same. For instance, the difference in enjoyment expressed by giving a rating of 2 rather than 1 might be much less than the difference in enjoyment expressed by giving a rating of 4 rather than 3.
With ordinal scales, it is the order of the values that is important and significant, but the differences between them are not really known. Consider a 1-to-4 happiness scale: we know that a #4 is better than a #3 or #2, but we don't know, and cannot quantify, how much better it is. For example, is the difference between "OK" and "Unhappy" the same as the difference between "Very Happy" and "Happy"? We can't say. Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc. "Ordinal" is easy to remember because it sounds like "order", and that's the key to ordinal scales: it is the order that matters, but that's all you really get from them. Advanced note: the best way to determine central tendency on a set of ordinal data is to use the mode or median; the mean cannot be defined from an ordinal set.
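Following the advanced note, the median and mode are the appropriate central-tendency measures for ordinal data. A minimal Python sketch (the ratings below are made up for illustration):

```python
import statistics

# Hypothetical ordinal ratings on a 1-5 scale
# (1 = strongly dislike ... 5 = strongly like)
ratings = [2, 4, 4, 5, 3, 4, 1, 5, 4]

print(statistics.median(ratings))  # 4
print(statistics.mode(ratings))    # 4
```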

What type of data is this...
Likert scale: How do you feel today? 1 = happy, 10 = unhappy
Ordinal

What type of data is this...
How satisfied are you with our service? 1 = unsatisfied, 5 = satisfied
Ordinal

Interval Data
An interval scale is a scale of measurement where the distance between any two adjacent units of measurement (or 'intervals') is the same but the zero point is arbitrary. Scores on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. For example, the time interval between the starts of years 1981 and 1982 is the same as that between 1983 and 1984, namely 365 days. The zero point, year 1 AD, is arbitrary; time did not begin then. Other examples of interval scales include the heights of tides, and the measurement of longitude.
Interval scales are numeric scales in which we know not only the order, but also the exact differences between the values. The classic example of an interval scale is Celsius temperature, because the difference between each value is the same: the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Time is another good example of an interval scale in which the increments are known, consistent, and measurable. Interval scales are nice because the realm of statistical analysis on these data sets opens up. For example, central tendency can be measured by mode, median, or mean; standard deviation can also be calculated. Like the others, you can remember the key points of an "interval scale" pretty easily. "Interval" itself means "space in between," which is the important thing to remember: interval scales not only tell us about order, but also about the value between each item. Here's the problem with interval scales: they don't have a "true zero." For example, there is no such thing as "no temperature" on the Celsius scale. Without a true zero, it is impossible to compute ratios. With interval data, we can add and subtract, but cannot multiply or divide. Consider this: 10 degrees + 10 degrees = 20 degrees, no problem there. But 20 degrees is not twice as hot as 10 degrees, because there is no such thing as "no temperature" when it comes to the Celsius scale. Bottom line: interval scales are great, but we cannot calculate ratios, which brings us to our last measurement scale.

What type of data is this...
Temperature
Interval

Ratio data
Ratio scales are the ultimate nirvana when it comes to measurement scales, because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero, which allows for a wide range of both descriptive and inferential statistics to be applied. At the risk of repeating myself, everything above about interval data applies to ratio scales, plus ratio scales have a clear definition of zero. Good examples of ratio variables include height and weight. Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be meaningfully added, subtracted, multiplied, and divided (hence "ratios"). Central tendency can be measured by mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation, can also be calculated from ratio scales.

What type of data is this...
height
Ratio

What type of data is this...
weight
Ratio

Discrete data
A set of data is said to be discrete if the values / observations belonging to it are distinct and separate, i.e. they can be counted (1, 2, 3, ...). Examples might include the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth; gender (male, female); blood group (O, A, B, AB).

Continuous Data
A set of data is said to be continuous if the values / observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data. For example height, weight, temperature, the amount of sugar in an orange, the time required to run a mile.

Frequency Table
A frequency table is a way of summarising a set of data. It is a record of how often each value (or set of values) of the variable in question occurs. It may be enhanced by the addition of percentages that fall into each category. A frequency table is used to summarise categorical, nominal, and ordinal data. It may also be used to summarise continuous data once the data set has been divided up into sensible groups. When we have more than one categorical variable in our data set, a frequency table is sometimes called a contingency table because the figures found in the rows are contingent upon (dependent upon) those found in the columns.
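A minimal way to build a frequency table with percentages, sketched in Python with made-up categorical data:

```python
from collections import Counter

# Hypothetical nominal data: favourite colours
colours = ["blue", "red", "blue", "green", "blue", "red"]

freq = Counter(colours)
total = sum(freq.values())

# Frequency table: each value, its count, and its percentage of the total
for value, count in freq.most_common():
    print(f"{value}: {count} ({count / total:.0%})")
# blue: 3 (50%)
# red: 2 (33%)
# green: 1 (17%)
```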

Relative Frequency
How often something happens divided by all outcomes.
Example: if your team has won 9 games from a total of 12 games played:
 the Frequency of winning is 9
 the Relative Frequency of winning is 9/12 = 75%
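The same calculation in Python:

```python
# Relative frequency = frequency of the event / total number of outcomes
wins = 9
games = 12

relative_frequency = wins / games
print(relative_frequency)  # 0.75, i.e. 75%
```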

Histogram
a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.

Population mean
Population mean = sum of the values in the population / number of values in the population
In probability and statistics, mean and expected value are used synonymously to refer to one measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution.

Sample Mean
Sample mean = sum of all values in the sample / number of values in the sample
The mean is the average of the numbers: a calculated "central" value of a set of numbers. To calculate: Just add up all the numbers, then divide by how many numbers there are. Example: what is the mean of 2, 7 and 9?
(2+7+9)/3 = 18/3 = 6
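The worked example in Python, using the standard library's statistics module:

```python
import statistics

values = [2, 7, 9]
mean = statistics.mean(values)  # (2 + 7 + 9) / 3
print(mean)  # 6
```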

Properties of the arithmetic mean
 1. Every set of interval-level or ratio-level data has a mean.
 2. All the values are included in computing the mean.
 3. The mean is unique.
 4. The sum of the deviations of each value from the mean is zero.

The sum of deviations from the mean is zero.
Show this .... the mean of 3,8 and 4 is 5.
Σ(x - mean) = 0
(3-5) + (8-5) + (4-5) = (-2) + 3 + (-1) = 0
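The same check in Python:

```python
import statistics

data = [3, 8, 4]
m = statistics.mean(data)            # 5

deviations = [x - m for x in data]   # [-2, 3, -1]
print(sum(deviations))               # 0
```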

What are advantages and disadvantages of the Mean?
 Advantage: All the data is used to find the answer
 Disadvantage: Very large or very small numbers can distort the answer

What are advantages and disadvantages of the Median?
 Advantage: Very big and very small values don't affect it
 Disadvantage: Takes a long time to calculate for a very large set of data

What are advantages and disadvantages of the Mode?
 Advantage:The only average we can use when the data is not numerical
 Disadvantages: There may be more than one mode, there may be no mode at all if none of the data is the same, and it may not accurately represent the data

Weighted Mean
a mean that is computed with extra weight given to one or more elements of the sample.
For example, instead of adding all data and dividing by the number of data items...
[2(20) + 1(40) + 6(100)] / (20 + 40 + 100) = 680/160 = 4.25
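Reading the figures above as values 2, 1 and 6 carrying weights 20, 40 and 100 (which sum to 160), the weighted mean can be computed like this:

```python
values  = [2, 1, 6]
weights = [20, 40, 100]

# Weighted mean: sum of value * weight, divided by the sum of the weights
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(weighted_mean)  # 680 / 160 = 4.25
```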

Mode
The mode is the value that appears most often in a set of data. The mode of a discrete probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled.

Geometric mean
((X1)(X2)(X3)...(XN))^(1/N)
X = individual score, N = sample size (number of scores)
In mathematics, the geometric mean is a type of mean or average which indicates the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean, which uses their sum). The geometric mean is defined as the nth root of the product of n numbers.
Geometric mean is a kind of average of a set of numbers that is different from the arithmetic average. The geometric mean is well defined only for sets of positive real numbers. This is calculated by multiplying all the numbers (call the number of numbers n), and taking the nth root of the total. A common example of where the geometric mean is the correct choice is when averaging growth rates.
 Geometric Mean Example:
 To find the Geometric Mean of 1, 2, 3, 4, 5:
 Step 1: N = 5, the total number of values. Find 1/N: 1/N = 0.2
 Step 2: Now find the Geometric Mean using the formula.
 ((1)(2)(3)(4)(5))^0.2 = (120)^0.2
 So, Geometric Mean = 2.60517. This example will guide you to calculate the geometric mean manually.
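The same calculation with the standard library (statistics.geometric_mean is available from Python 3.8):

```python
import statistics

data = [1, 2, 3, 4, 5]

gm = statistics.geometric_mean(data)  # (1*2*3*4*5) ** (1/5) = 120 ** 0.2
print(round(gm, 5))  # 2.60517
```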

Dispersion
In statistics, dispersion (also called variability, scatter, or spread) denotes how stretched or squeezed a distribution is (theoretical or that underlying a statistical sample). Common examples of measures of statistical dispersion are the variance, standard deviation and interquartile range.

A large measure of dispersion indicates...
that the mean is not a reliable representation of the data

Measures of dispersion
Range: largest value - smallest value
Mean deviation

Range
Largest value - smallest value

variance
variance measures how far a set of numbers is spread out. A variance of zero indicates that all the values are identical. Variance is always nonnegative: a small variance indicates that the data points tend to be very close to the mean (expected value) and hence to each other, while a high variance indicates that the data points are very spread out around the mean and from each other.

Standard deviation
In statistics and probability theory, the standard deviation (SD) (represented by the Greek letter sigma, σ) measures the amount of variation or dispersion from the average.
The standard deviation is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the standard deviation is big; and vice versa.
It is important to distinguish between the standard deviation of a population and the standard deviation of a sample. They have different notation, and they are computed differently.
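Python's statistics module makes the population/sample distinction explicit: pstdev divides the squared deviations by N, while stdev divides by n - 1. The data set below is made up for illustration:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean = 5

# Population standard deviation: squared deviations averaged over N
print(statistics.pstdev(data))   # 2.0

# Sample standard deviation: squared deviations averaged over n - 1
print(statistics.stdev(data))    # ~2.138
```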

Chebyshev's theorem
Chebyshev's theorem refers to several theorems, all proven by Russian mathematician Pafnuty Chebyshev. They include: Chebyshev's inequality, Bertrand's postulate, Chebyshev's sum inequality and Chebyshev's equioscillation theorem. Chebyshev's inequality is the theorem most often used in stats. It states that no more than 1/k² of a distribution's values are more than k standard deviations away from the mean. With a normal distribution, standard deviations tell you how much of that distribution's data are within k standard deviations from the mean.
If you have a distribution that isn't normal, you can use Chebyshev's to help you find out what percentage of the data is clustered around the mean. Chebyshev's Inequality relates to the distribution of numbers in a set. The formula was originally developed by Chebyshev's friend, Irénée-Jules Bienaymé. In layman's terms, the formula helps you figure out the number of values that are inside and outside the standard deviation. The standard deviation tells you how far away values are from the average of the set. Roughly two-thirds of the values should fall within one standard deviation either side of the mean in a normal distribution. In statistics, it's often referred to as Chebyshev's Theorem (as opposed to Chebyshev's Inequality). This can be a little confusing, but it's just semantics.
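The bound is easy to tabulate: for any distribution, at least 1 - 1/k² of the values lie within k standard deviations of the mean:

```python
# Chebyshev's inequality: at most 1/k**2 of a distribution's values
# are more than k standard deviations from the mean.
for k in (2, 3, 4):
    within = 1 - 1 / k**2
    print(f"k = {k}: at least {within:.1%} within {k} standard deviations")
# k = 2: at least 75.0% ...
```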

Empirical Rule
The empirical rule shows that 68% of the values will fall within the first standard deviation of the mean, 95% within the first two standard deviations, and 99.7% within the first three standard deviations of the mean.

Coefficient of variation
In probability theory and statistics, the coefficient of variation (CV) is a standardized measure of dispersion of a probability distribution or frequency distribution. It is defined as the ratio of the standard deviation to the mean.

Skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a realvalued random variable about its mean. The skewness value can be positive or negative, or even undefined.

Percentiles
A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found.

Box Plot
In descriptive statistics, a box plot or boxplot is a convenient way of graphically depicting groups of numerical data through their quartiles.

What is probability?
Probability is the chance that something will happen, i.e. how likely it is that some event will occur. Sometimes you can measure a probability with a number like "10% chance of rain", or you can use words such as impossible, unlikely, possible, even chance, likely and certain.

Mutually exclusive
A statistical term used to describe events that cannot occur at the same time: the occurrence of one rules out the occurrence of the others.
 Examples:
 Turning left and turning right are Mutually Exclusive (you can't do both at the same time)
 Tossing a coin: Heads and Tails are Mutually Exclusive
 Cards: Kings and Aces are Mutually Exclusive
 What is not Mutually Exclusive:
 Turning left and scratching your head can happen at the same time
 Kings and Hearts, because we can have a King of Hearts!

Subjective probability
A probability derived from an individual's personal judgment about whether a specific outcome is likely to occur. Subjective probabilities contain no formal calculations and only reflect the subject's opinions and past experience.
Subjective probabilities differ from person to person. Because the probability is subjective, it contains a high degree of personal bias. An example of subjective probability could be asking New York Yankees fans, before the baseball season starts, the chances of New York winning the World Series. While there is no absolute mathematical proof behind the answer to the example, fans might still reply in actual percentage terms, such as the Yankees having a 25% chance of winning the World Series.

Permutation
A permutation is an arrangement of all or part of a set of objects, with regard to the order of the arrangement.
For example, suppose we have a set of three letters: A, B, and C. We might ask how many ways we can arrange 2 letters from that set. Each possible arrangement would be an example of a permutation. The complete list of possible permutations would be: AB, AC, BA, BC, CA, and CB.
When they refer to permutations, statisticians use a specific terminology. They describe permutations as n distinct objects taken r at a time. Translation: n refers to the number of objects from which the permutation is formed; and r refers to the number of objects used to form the permutation. Consider the example from the previous paragraph. The permutation was formed from 3 letters (A, B, and C), so n = 3; and the permutation consisted of 2 letters, so r = 2.
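Both the listing and the count can be reproduced with the standard library (math.perm requires Python 3.8+):

```python
import itertools
import math

letters = ["A", "B", "C"]   # n = 3 objects

# All permutations of the letters taken r = 2 at a time
perms = ["".join(p) for p in itertools.permutations(letters, 2)]
print(perms)            # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']

print(math.perm(3, 2))  # 6, i.e. n! / (n - r)!
```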

Combination
A combination is a selection of all or part of a set of objects, without regard to the order in which objects are selected.
For example, suppose we have a set of three letters: A, B, and C. We might ask how many ways we can select 2 letters from that set. Each possible selection would be an example of a combination. The complete list of possible selections would be: AB, AC, and BC.
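The corresponding sketch for combinations, where order is ignored:

```python
import itertools
import math

letters = ["A", "B", "C"]   # n = 3 objects

# All combinations of the letters taken r = 2 at a time
combos = ["".join(c) for c in itertools.combinations(letters, 2)]
print(combos)           # ['AB', 'AC', 'BC']

print(math.comb(3, 2))  # 3, i.e. n! / (r! * (n - r)!)
```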

Conditional Probability
The probability that event B occurs, given that event A has already occurred, is P(B|A) = P(A and B) / P(A)
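A small worked example of the formula, using one roll of a fair six-sided die (the events chosen here are arbitrary):

```python
from fractions import Fraction

# A = "roll is even", B = "roll is greater than 3"
outcomes = range(1, 7)

p_a       = Fraction(sum(1 for x in outcomes if x % 2 == 0), 6)            # 1/2
p_a_and_b = Fraction(sum(1 for x in outcomes if x % 2 == 0 and x > 3), 6)  # 1/3

# P(B|A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 2/3
```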

Contingency Tables
In statistics, a contingency table (also referred to as cross tabulation or crosstab) is a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables.

Tree diagram
A diagram used in strategic decision making, valuation or probability calculations. The diagram starts at a single node, with branches emanating to additional nodes, which represent mutually exclusive decisions or events.

random variable
a variable that assumes numerical values that are determined by the outcome of an experiment

Discrete random variable
 Possible values can be counted or listed.
 •For example: the number of defective units in a batch of 20, or a listener rating (on a scale of 1 to 5) in a music survey

Continuous random variable
 May assume any numerical value in one or more intervals
 •For example: the waiting time for a credit card authorization, the interest rate charged on a business loan

What are the properties of a discrete random variable?
 1. Each probability is between 0 and 1
 2. All outcomes are mutually exclusive
 3. The probabilities must add up to 1

expected value of a discrete probability distribution
multiply each value by its probability and sum them up
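In Python, with a made-up payoff distribution:

```python
# Hypothetical discrete distribution: payoff values and their probabilities
values        = [0, 10, 100]
probabilities = [0.5, 0.25, 0.25]   # must sum to 1

# Expected value: multiply each value by its probability and sum them up
expected = sum(v * p for v, p in zip(values, probabilities))
print(expected)  # 0*0.5 + 10*0.25 + 100*0.25 = 27.5
```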

What is the binomial distribution?
 A binomial experiment (a series of independent Bernoulli trials) is a statistical experiment that has the following properties:
  The experiment consists of n repeated trials.
  Each trial can result in just two possible outcomes.
  We call one of these outcomes a success and the other, a failure.
  The probability of success, denoted by P, is the same on every trial.
  The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.

What type of question is this?
Suppose historical data indicates that 10% of customers who enter a store make a purchase. From a sample of 4 customers entering the store, what is the probability that 2 of them will make a purchase?
Binomial distribution
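The question above can be answered with the binomial formula P(x) = C(n, x) * p^x * (1 - p)^(n - x):

```python
import math

n, p = 4, 0.10   # 4 customers, each purchasing with probability 10%
x = 2            # exactly 2 purchases

prob = math.comb(n, x) * p**x * (1 - p)**(n - x)
print(round(prob, 4))  # 6 * 0.01 * 0.81 = 0.0486
```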

Hypergeometric distribution
In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the probability of k successes in n draws, without replacement, from a finite population of size N containing exactly K successes, wherein each draw is either a success or a failure.

The Poisson distribution
a discrete frequency distribution that gives the probability of a number of independent events occurring in a fixed time.
For example, if the average number of people that rent movies on a Friday night at a single video store location is 400, a Poisson distribution can answer such questions as, "What is the probability that more than 600 people will rent movies?" Therefore, application of the Poisson distribution enables managers to introduce optimal scheduling systems. One of the most famous historical practical uses of the Poisson distribution was estimating the annual number of Prussian cavalry soldiers killed due to horse kicks. Other modern examples include estimating the number of car crashes in a city of a given size; in physiology, this distribution is often used to calculate the probabilistic frequencies of different types of neurotransmitter secretions.

Find the probability that 3 errors (x = 3) will occur in a week
Poisson distribution
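A sketch of the Poisson probability mass function for this card. The card does not state the weekly error rate, so the mean below (lam = 2 errors per week) is an assumed value:

```python
import math

lam = 2.0  # ASSUMED average number of errors per week (not given on the card)
x = 3      # exactly 3 errors

# Poisson pmf: P(X = x) = lam**x * e**(-lam) / x!
prob = lam**x * math.exp(-lam) / math.factorial(x)
print(round(prob, 4))  # 0.1804
```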

continuous probability distribution
 A continuous random variable may assume any numerical value in one or more intervals
 Use a continuous probability distribution to assign probabilities to intervals of values
 Many business measures such as sales, investment, costs and revenue can be represented by continuous random variables
 Other names for a continuous probability distribution are probability curve or probability density function

Examples of continuous variables
height, time, age, and temperature

Properties of continuous probabilities
 1. f(x) ≥ 0 for all x
 2. The total area under the curve of f(x) is equal to 1
 3. P(a ≤ x ≤ b) is given by the area under the probability curve between a and b

uniform distribution
In statistics, a type of probability distribution in which all outcomes are equally likely. A deck of cards has a uniform distribution because drawing a heart, club, diamond or spade is equally likely.
Looks like a square
In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable. The support is defined by the two parameters, a and b, which are its minimum and maximum values. The distribution is often abbreviated U(a,b). It is the maximum entropy probability distribution for a random variate X under no constraint other than that it is contained in the distribution's support.[1]

Normal probability distribution
In probability theory, the normal (or Gaussian) distribution is a very commonly occurring continuous probability distribution: a function that tells the probability that an observation will fall between any two real limits, with the curve approaching zero on either side.
The normal curve is symmetrical around its mean and bellshaped
So the mean is also the median, and is also the mode
The tails of the normal extend to infinity in both directions
The tails get closer to the horizontal axis but never touch it (the distribution is asymptotic)
The area under the entire normal curve is 1. The area under either half of the curve is 0.5. Denote the mean as μ and the standard deviation as σ.
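These properties can be checked numerically with statistics.NormalDist (Python 3.8+):

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal

# Symmetry: half the area lies on either side of the mean
print(z.cdf(0))  # 0.5

# Area within one standard deviation of the mean
print(round(z.cdf(1) - z.cdf(-1), 4))  # 0.6827
```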

The empirical rule
The empirical rule, also known as the three-sigma rule or the 68-95-99.7 rule, provides a quick estimate of the spread of data in a normal distribution given the mean and standard deviation. Specifically, the empirical rule states that for a normal distribution, about 68% of the data fall within one standard deviation of the mean, 95% within two, and 99.7% within three.

What are some reasons to sample?
 1. To contact the whole population would be time consuming.
 2. The cost of studying all the items in a population may be prohibitive.
 3. It is physically impossible to check all items in the population.
 4. Certain tests are destructive in nature.
 5. The sample results are (often) adequate.

Simple random sampling
 A sample selected so that each item or person in the population has an equal chance of being included.
 •To select the sample we can use tables of random numbers, the lottery method, a random number generator…
 •There is no selection bias in the way our sample is chosen.
 •A key advantage of a simple random sample is its representativeness of the population. In a true random sample, this is only compromised by luck.
 •A key disadvantage of a simple random sample is feasibility: it requires knowledge of, access to, and cooperation of the entire population.

Systematic random sampling
 •The items or individuals of the population are arranged in some order. A starting point is selected randomly and then every kth member of the population is selected for the sample.
 •k is calculated as the population size divided by the sample size.
 •When the physical order is related to the population characteristic, then systematic random sampling should not be used.
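A sketch of the procedure (the population and sample size below are made up):

```python
import random

population = list(range(1, 101))    # 100 items in some order
sample_size = 10

k = len(population) // sample_size  # k = population size / sample size = 10
start = random.randrange(k)         # random starting point in the first interval

sample = population[start::k]       # every kth member from the starting point
print(len(sample))  # 10
```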

Stratified random sampling
 A population is divided into subgroups, called strata, and a sample is randomly selected from each stratum.
 Once the strata are defined, we can apply simple random sampling within each group or stratum to collect the sample.
 •Can ensure that specific groups are represented, even proportionally, in the sample
 •Could be used if variance between groups (strata) is very different
 •Strata must be carefully defined

cluster sampling
 •A population is divided into clusters using naturally occurring geographic or other boundaries.
 •Then, clusters are selected randomly and a sample is collected randomly from each selected cluster.
Cluster sampling is a sampling technique used when "natural" but relatively homogeneous groupings are evident in a statistical population. It is often used in marketing research. In this technique, the total population is divided into these groups (or clusters) and a simple random sample of the groups is selected.

sampling distribution of the sample mean
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution.

Central limit theorem
In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.[1][2] That is, suppose that a sample is obtained containing a large number of observations, each observation being randomly generated in a way that does not depend on the values of the other observations, and that the arithmetic average of the observed values is computed. If this procedure is performed many times, the central limit theorem says that the computed values of the average will be distributed according to the normal distribution (commonly known as a "bell curve").
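A quick simulation of the theorem: sample means drawn from a decidedly non-normal (uniform) population still cluster around the population mean:

```python
import random
import statistics

random.seed(42)  # reproducible

# 2000 samples of size 30 from Uniform(0, 1); record each sample's mean
sample_means = [
    statistics.mean(random.uniform(0, 1) for _ in range(30))
    for _ in range(2000)
]

# The sample means pile up near the population mean of 0.5
print(round(statistics.mean(sample_means), 2))  # ~0.5
```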

sampling - how large is large enough?
 •If the sample size is at least 30, then for most sampled populations, the sampling distribution of sample means is approximately normal.
 •We will assume that if n is at least 30, the sampling distribution of x̄ is approximately normal.
 •If the population is normal, then the sampling distribution of x̄ is normal regardless of the sample size.

