What is a frequency distribution? What is it used for? What does it look like?
A frequency distribution is a method of summarizing data in a table, which can also be displayed graphically.
It is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of values in the sample.
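A minimal sketch of how such a table could be built. The scores, the interval width of 5, and the starting value of 35 are made-up assumptions for illustration, and `grouped_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def grouped_frequencies(scores, width, start):
    """Count how many whole-number scores fall in each interval.

    Intervals run start..start+width-1, start+width..start+2*width-1, etc.
    Returns a list of (low, high, frequency) rows.
    """
    counts = Counter((score - start) // width for score in scores)
    table = []
    for k in range(min(counts), max(counts) + 1):
        low = start + k * width
        table.append((low, low + width - 1, counts.get(k, 0)))
    return table


# Hypothetical sample of whole-number scores (not taken from the notes).
scores = [36, 38, 41, 42, 42, 45, 47, 51, 52, 58]

for low, high, freq in grouped_frequencies(scores, width=5, start=35):
    print(f"{low}-{high}: {freq}")
```

Each printed row is one entry of the frequency distribution: an interval and the count of scores that fell in it.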
What is a histogram? What is it used for? What does it look like?
- A histogram is a graphical representation showing a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable.
Explain the true limits of an interval. Define the real Upper and the real Lower Limits.
The true limits of the interval are decimal values that fall halfway between the top of one interval and the bottom of the next.
An interval's real upper limit is the largest value that would be classed as being in the interval.
The real lower limit of an interval is the smallest value that would be classed as falling into the interval.
Ex: The interval 35-39 would include all values between 34.5 and 39.5.
Define a midpoint.
A midpoint is the average of the upper and lower limits. When we plot the data, we often plot the points as if they all fell at the midpoints of their respective intervals.
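The real limits and the midpoint of the 35-39 interval can be checked with a short sketch. The function names and the `unit` parameter (the measurement precision, assumed to be 1 for whole-number scores) are illustrative assumptions.

```python
def real_limits(lower, upper, unit=1):
    """Return the (real lower limit, real upper limit) of an interval.

    The real limits sit half a measurement unit outside the stated limits.
    """
    half = unit / 2
    return lower - half, upper + half


def midpoint(lower, upper):
    """Average of an interval's lower and upper limits."""
    return (lower + upper) / 2


print(real_limits(35, 39))  # (34.5, 39.5)
print(midpoint(35, 39))     # 37.0
```

The midpoint is the same whether it is computed from the stated limits or from the real limits.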
What is a Kernel Density Plot?
A kernel density plot tries to fit a smooth curve to a distribution of data while at the same time taking into account the fact that there is a lot of random noise in the observations that should not be allowed to distort the curve too much.
- Kernel density plots pay no attention to the mean and standard deviation of the observations.
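One way to sketch the idea is to place a small bell-shaped bump on every observation and add the bumps up. The Gaussian kernel, the bandwidth of 0.5, and the observations are all assumptions chosen for illustration; statistical software typically picks the bandwidth automatically.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x.

    Each observation contributes a Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical observations, chosen only for illustration.
observations = [2.1, 2.4, 2.5, 3.0, 3.2, 5.8, 6.1, 6.3]
density = gaussian_kde(observations, bandwidth=0.5)

for x in [2.5, 4.5, 6.0]:
    print(f"x={x}: density={density(x):.3f}")
```

A larger bandwidth gives a smoother curve that follows the noise less closely; a smaller one gives a bumpier curve.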
What are two criticisms of histograms and frequency distributions?
1) Histograms show observations that have been grouped into intervals, therefore they lose the actual numerical values of the individual scores in each interval.
2) Frequency distributions retain the values of the individual observations, but they can be difficult to use when they do not summarize the data sufficiently.
What graphing approach avoids the criticisms of histograms and frequency distributions?
Stem and leaf display.
Describe how a stem and leaf plot works.
- The digits are divided into leading digits that form the stem and trailing digits which form the leaves.
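A small sketch of the stem/leaf split, assuming non-negative two-digit whole numbers so that the tens digit is the stem and the units digit is the leaf. The scores and the function name are made up for illustration.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    rows = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        rows[stem].append(leaf)
    return dict(rows)


# Hypothetical scores (not taken from the notes).
scores = [36, 38, 41, 42, 42, 45, 47, 51, 52, 58]

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

The length of each row shows the frequency, as in a histogram, while the leaves preserve the individual scores.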
What is a disadvantage of using a stem and leaf plot?
How do we solve this problem?
A disadvantage is that for some data sets it will lead to a grouping that is too coarse (the intervals are too large).
- Use Tukey's Method:
What is a bimodal distribution?
A bimodal distribution is one that has two predominant peaks.
Describe a unimodal distribution.
A unimodal distribution has only one major peak.
What is the term used to refer to the number of major peaks in a distribution?
Modality.
Describe a negatively skewed distribution.
- A negatively skewed distribution has a tail that trails off toward the low end of the scale, so the mean is pulled well below where the majority of the data occur.
Describe a positively skewed distribution.
- A positively skewed distribution has a tail that trails off toward the high end of the scale, so the mean is pulled well above where the majority of the data occur.
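The pull of the tail on the mean can be seen by comparing the mean with the median. The data below are invented for illustration; the negatively skewed set is simply the mirror image of the positively skewed one.

```python
from statistics import mean, median

# Hypothetical positively skewed data: most scores are low, one is very high.
positive = [1, 2, 2, 3, 3, 3, 4, 4, 20]
# Mirror image: most scores are high, one is very low.
negative = [21 - x for x in positive]

print("positive skew: mean", round(mean(positive), 2), "median", median(positive))
print("negative skew: mean", round(mean(negative), 2), "median", median(negative))
```

In the positively skewed set the mean lies above the median; in the negatively skewed set it lies below.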
Describe and define Kurtosis.
Kurtosis refers to the relative concentration of scores in the center, upper ends, lower ends and shoulders of a distribution.
Define mesokurtic. Draw the function.
- A mesokurtic distribution has a neutral degree of kurtosis; the normal distribution is the standard example.
Define and draw platykurtic.
- A platykurtic distribution is what results when you start with a normal distribution and move scores from both the center and the tails into the shoulders; the curve becomes flatter.
Define and draw leptokurtic.
- A leptokurtic distribution is what results when scores move from the shoulders into both the center and the tails; the curve becomes more peaked, with thicker tails.
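The notes define kurtosis descriptively. As an assumption, the sketch below uses one common numerical index, the fourth moment divided by the squared second moment, minus 3, which is about 0 for a normal distribution, negative for flatter shapes, and positive for more peaked, heavier-tailed shapes. The two data sets are invented.

```python
def excess_kurtosis(data):
    """Moment-based kurtosis index: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


# Hypothetical data: scores spread evenly, so the shoulders are full.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Hypothetical data: scores piled in the center plus two extreme tails.
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]

print("flat:", round(excess_kurtosis(flat), 2))
print("peaked:", round(excess_kurtosis(peaked), 2))
```

The flat set gives a negative value (platykurtic direction) and the peaked set a positive one (leptokurtic direction).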
Explain what a measure of central tendency is.
Measures of central tendency refer to a set of measures that reflect where on the scale the distribution is centered.
What are the three measures of central tendency?
Mode, median, and mean.
Define the Mode. Explain the term bi-modal.
The mode is simply the most common score, i.e. the score obtained by the largest number of participants.
If there are 2 scores that occur the most then the distribution is bi-modal.
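A quick check using Python's standard `statistics.multimode` (available from Python 3.8), which returns every score tied for most frequent. The scores are made up for illustration.

```python
from statistics import multimode

# Hypothetical scores in which 5 and 8 each occur three times.
scores = [3, 5, 5, 5, 7, 8, 8, 8, 9]
modes = multimode(scores)

print(modes)  # [5, 8]
print("bi-modal" if len(modes) == 2 else "not bi-modal")

# The mode also works for nominal (category) data.
print(multimode(["red", "blue", "red", "green"]))  # ['red']
```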
Define Median. How is it calculated?
The median is the 50th percentile: the center score when the scores are arranged in numerical order. If the center point falls between 2 scores, the average of those 2 scores is taken as the median.
Define Median Location. State the equation.
Median location is the position in the ordered scores where the median falls.
Median location = (N + 1) / 2
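A sketch of finding the median through its location, assuming the usual formula (N + 1) / 2 for the median location. The function names and scores are illustrative.

```python
import math


def median_location(n):
    """Position of the median among n ordered scores: (N + 1) / 2."""
    return (n + 1) / 2


def median_from_location(scores):
    """Average the scores just below and just above the median location."""
    ordered = sorted(scores)
    location = median_location(len(ordered))
    lower = ordered[math.floor(location) - 1]  # positions count from 1
    upper = ordered[math.ceil(location) - 1]
    return (lower + upper) / 2


odd = [5, 1, 9, 3, 7]        # N = 5, location 3.0
even = [5, 1, 9, 3, 7, 11]   # N = 6, location 3.5

print(median_location(len(odd)), median_from_location(odd))    # 3.0 5.0
print(median_location(len(even)), median_from_location(even))  # 3.5 6.0
```

With an odd N the location is a whole number and the median is an actual score; with an even N the location ends in .5 and the median is the average of the two neighboring scores.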
In what situation will all three measures of central tendency be the same?
When the distribution is symmetric and unimodal.
In what situation will the mean and median be equal?
When the distribution is symmetric.
What are the 4 advantages and 3 disadvantages of using the Mode as a measure of central tendency?
- Advantages: 1) Represents the largest number of subjects
- 2) It is a score that actually occurred, whereas the mean and median may end up being values that never occurred in the data
- 3) The probability that an observation drawn at random will be equal to the mode is greater than the probability that it will be equal to any other specific score
- 4) Applicable to nominal data
- Disadvantages: 1) The mode depends on how we group our data
- 2) May not be representative of the entire collection of numbers
- 3) Cannot write a standard equation for the mode.
What are the advantages and disadvantages of the Median?
- Advantages: 1) Unaffected by extreme scores
- Disadvantages: 1) A problem when there are many repeated scores in the variable
- 2) Cannot write a standard equation for the median
What are the advantages and disadvantages of the Mean?
- Advantages: 1) Can be manipulated algebraically
- 2) Sample mean is generally a better estimate of the population mean than other measures of central tendency
- Disadvantages: 1) Influenced by extreme scores
- 2) The value may not actually exist in the data
- 3) Its interpretation in terms of the underlying variable being measured requires at least some faith in the interval properties of the data.
What is a trimmed mean?
A trimmed mean is a mean calculated on data in which we have discarded a certain percentage of the data at each end of the distribution.
Why would you use a trimmed mean?
- 1) Creates a more stable estimate of the population mean from the sample mean.
- 2) Controls problems with skewness.
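A minimal sketch of a trimmed mean. The data, the 10% trimming proportion, and the function name are assumptions for illustration.

```python
def trimmed_mean(scores, proportion):
    """Mean after dropping `proportion` of the scores from each end.

    `proportion` must be below 0.5 so that some scores remain.
    """
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)  # number of scores dropped per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical scores with one extreme value.
scores = [2, 3, 4, 4, 5, 5, 6, 6, 7, 48]

print(sum(scores) / len(scores))   # 9.0  (ordinary mean)
print(trimmed_mean(scores, 0.10))  # 5.0  (10% trimmed mean)
```

Dropping the lowest and highest score removes the extreme value of 48, so the trimmed mean sits where most of the scores are.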
What are the measures of dispersion?
- 1) Range
- 2) Interquartile Range
- 3) Variance
- 4) Standard Deviation
Define Range.
The range is a measure of distance from the lowest to the highest score.
What is an advantage and disadvantage of using the range as a measure of variability?
- Advantage: 1) Easy to calculate
- Disadvantage: 1) Relies on extreme values, so if there are outliers in the data the range may give a distorted picture of the variability.
Define Interquartile Range.
The interquartile range is obtained by getting rid of the upper and lower 25% of the distribution and taking the range of what remains.
The quartiles divide the ordered data into four equal parts: the 1st quartile (Q1) cuts off the lowest 25%, the 2nd quartile (Q2) is the median, and the 3rd quartile (Q3) cuts off the highest 25%.
The interquartile range is the difference between the 3rd and 1st quartiles: Q3 - Q1.
Describe the advantages and disadvantages of the interquartile range.
- Advantages: 1) Unlike the range, it is not affected by extreme values
- Disadvantages:1) Discards too much data
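The contrast between the range and the interquartile range can be sketched with Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software; the "inclusive" method used here is an assumption, and the data are invented.

```python
from statistics import quantiles


def score_range(scores):
    """Distance from the lowest to the highest score."""
    return max(scores) - min(scores)


def interquartile_range(scores):
    """Q3 - Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _q2, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data, with and without one extreme score.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(score_range(clean), interquartile_range(clean))                # 8 4.0
print(score_range(with_outlier), interquartile_range(with_outlier))  # 89 4.0
```

The single outlier changes the range from 8 to 89, while the interquartile range stays the same.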
Define variance. State the equation for sample variance and population variance.
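Variance is a measure of dispersion: the average squared deviation of the scores from their mean.
Sample Variance is: s^2 = sum((X - xbar)^2) / (N - 1)
Population Variance is: sigma^2 = sum((X - mu)^2) / N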
Define standard deviation. State the equation for sample sd and population sd.
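Standard deviation is the positive square root of the variance.
Sample sd is: s = sqrt(sum((X - xbar)^2) / (N - 1))
Population sd is: sigma = sqrt(sum((X - mu)^2) / N)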
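A short check of the variance and standard deviation formulas, assuming the standard definitions (denominator N - 1 for a sample, N for a population). It compares a by-hand calculation with Python's `statistics` module; the scores are made up.

```python
import math
from statistics import pstdev, pvariance, stdev, variance


def sum_of_squares(scores):
    """Sum of squared deviations of the scores from their mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)


# Hypothetical scores with mean 5 and sum of squares 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
ss = sum_of_squares(scores)

sample_var = ss / (n - 1)   # treats the scores as a sample
population_var = ss / n     # treats the scores as the whole population

print(sample_var, variance(scores))
print(population_var, pvariance(scores))
print(math.sqrt(sample_var), stdev(scores))
print(math.sqrt(population_var), pstdev(scores))
```

The by-hand values agree with `variance`/`stdev` (sample versions) and `pvariance`/`pstdev` (population versions).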
What are the properties of good Estimators (4)?
- 1) Unbiased
- 2) Consistent
- 3) Efficient
- 4) Sufficient
What are the most important properties of good estimators?
Unbiased & Consistent
Define unbiased statistic.
A statistic is unbiased if the mean of the sampling distribution of the statistic is equal to the population parameter being estimated.
Example: E(xbar) = mu, i.e. the expected value of the sample mean equals the population mean.
Define point estimation.
Point estimation is using a sample value (a statistic) to infer a population value (a parameter).
What type of variance is a biased estimator? What type of variance is an unbiased estimator?
- Sample variance with a denominator of N is biased.
- Sample variance with a denominator of N-1 is unbiased.
Does the biased sample variance (denominator of N) underestimate or overestimate the true population variance?
On average, it underestimates the true population variance.
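The bias can be checked exactly on a tiny example by listing every possible sample. The population {1, 2, 3}, the sample size of 2, and sampling with replacement are assumptions chosen to keep the enumeration small.

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3]       # hypothetical population
mu = mean(population)        # population mean
sigma2 = pvariance(population)  # population variance


def sample_variance(sample, denominator):
    """Sum of squared deviations from the sample mean over `denominator`."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / denominator


n = 2
samples = list(product(population, repeat=n))  # all 9 samples, with replacement

avg_mean = mean(sum(s) / n for s in samples)
avg_biased = mean(sample_variance(s, n) for s in samples)
avg_unbiased = mean(sample_variance(s, n - 1) for s in samples)

print("mu:", mu, "average sample mean:", avg_mean)
print("sigma^2:", round(sigma2, 4))
print("average variance with N:", round(avg_biased, 4))
print("average variance with N-1:", round(avg_unbiased, 4))
```

Averaged over all possible samples, the sample mean equals mu and the N - 1 version equals sigma^2, while the N version comes out too small.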
Which measure of central tendency has a relative efficiency greater than the others?
The mean.
Which is more efficient, the biased sample variance or the unbiased sample variance?
Biased sample variance.
In what situation do the unbiased and biased estimators of sample variance have nearly equal efficiency?
When N is large.
Define consistency in terms of point estimation.
A statistic is consistent if it has a higher probability of being close to the population value the larger the sample size.
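Consistency can be illustrated with a small simulation of the sample mean. The normal population (mu = 50, sigma = 10), the sample sizes, and the number of repetitions are all assumptions for illustration; exact numbers depend on the random seed.

```python
import random
from statistics import mean


def average_error(sample_size, repetitions, mu=50.0, sigma=10.0):
    """Average distance between the sample mean and mu over many samples."""
    errors = []
    for _ in range(repetitions):
        sample = [random.gauss(mu, sigma) for _ in range(sample_size)]
        errors.append(abs(mean(sample) - mu))
    return mean(errors)


random.seed(0)  # fixed seed so the run is repeatable
for n in (10, 100, 1000):
    print(n, round(average_error(n, repetitions=200), 3))
```

As the sample size grows, the sample mean tends to land closer to the population mean, which is what consistency describes.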
Define Sufficiency in terms of point estimation.
If a sample statistic contains all the information in the data about the value of some parameter, then the statistic is a sufficient estimator for that parameter.