Three main aspects of statistics.
(Design, Description, Inference)
- 1) Design: Planning how to obtain data that answer the questions of interest, including how to conduct an experiment that produces trustworthy results.
- 2) Description: Exploring and summarizing patterns in the data. This includes descriptive statistics and graphs that present the data from an experiment clearly.
- 3) Inference: Making decisions and predictions based on the data. Here we take the data gathered in the experiment, along with its descriptive statistics, and infer how the results apply to the general population. This is where inferential statistics comes into play.
Identify the parts of a study that are associated with Design, Description and Inference.
- Design: The beginning stages of a study. This is where you develop your hypothesis and decide on how you will acquire the data to test your hypothesis.
- Description: Where you take the data gathered in the experiment/study and compile it into descriptive statistics, which can include the mean, median, mode, and standard deviation, to name a few. Graphs are also used here to display the data visually.
- Inference: The end of an experiment/study. This is where you take the descriptive statistics gathered from your sample and apply the results to the population of interest.
Difference between Sample and Population.
- Population: All of the subjects that are of interest.
- Sample: The subset of the population from which we actually gather data.
Difference between statistics and parameters.
- Statistic: A number computed from data that describes some characteristic of a sample.
- Parameter: A number that describes some characteristic of the population.
What is a Random Sample?
- A simple random sample of size n is one chosen so that every possible sample of that size has an equal chance of being selected, which also means every member of the population has an equal chance of being selected. No single procedure is required to draw one; typically you would use a random number generator. Random sampling gives a good chance that the sample represents the population, which lets us make well-founded inferences when applying results from the sample to the population.
Difference between descriptive and inferential statistics.
- Descriptive Statistics: Methods used for summarizing the data obtained from the sample. This usually includes graphs and measures of central tendency. Descriptive statistics are facts about the sample.
- Inferential Statistics: Methods of making decisions or predictions about the population based on data obtained from a sample. An inference is a very educated guess based on our descriptive statistics.
Understand something about the role of computers and calculators in statistics.
- MINITAB: Statistical software that can compute descriptive and inferential statistics.
- Calculators: The TI-83+ and TI-84 are graphing calculators that can perform similar statistical analyses but are more limited than MINITAB.
Know the types of variables and how to determine if a variable is quantitative or qualitative (categorical.)
- Quantitative Variable: A variable whose observations take numerical values that represent different magnitudes of the variable.
- Qualitative (Categorical) Variable: A variable whose observations fall into categories. Examples include religion, eye color, hair color, gender, and marital status.
Know the types of quantitative variables. AND How to determine if a quantitative variable is discrete or continuous.
- Quantitative Variables: Discrete and continuous.
- Discrete Variable: Possible values form a set of separate numbers, such as 0, 1, 2, 3. This type of variable is usually a count. (For example: number of siblings, number of cars in a parking lot)
- Continuous Variable: Has a set of possible values that contain an entire interval of numbers in it. Some examples of this are weight, average daily temperature, time taken to study for a test, height, blood pressure.
Understand frequency tables and be able to find proportions or relative frequencies.
- Frequency Table: A numerical summary of the data which compiles the frequency of the values in a data set.
- Relative frequencies/ proportions: The frequency of the individual observations divided by the total number of observations.
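As a quick illustration of the two definitions above, here is a minimal Python sketch that builds a frequency table and its relative frequencies. The eye-color data and the helper name `frequency_table` are made up for the example.

```python
from collections import Counter

# Hypothetical sample of eye colors (made-up data for illustration only).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]

def frequency_table(observations):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(observations)
    total = len(observations)
    # Relative frequency = frequency of the category / total number of observations.
    return {category: (count, count / total) for category, count in counts.items()}

table = frequency_table(eye_colors)
for category, (count, proportion) in table.items():
    print(f"{category:<6} {count:>2} {proportion:.3f}")
```

The relative frequencies always add up to 1, which is a handy check on your arithmetic.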
What are Pareto charts?
Pareto Chart: A bar graph whose categories are ordered from most frequent to least frequent.
Be able to determine the shape of a distribution, including skewness and modality.
- Unimodal Distribution: Has one peak. The highest part of the distribution is the mode, so a unimodal distribution has one mode.
- Bimodal Distribution: Has two distinct mounds, which result from having two modes.
- Symmetric Distribution: A distribution whose two sides are roughly mirror images of each other. Note that a bimodal distribution can still be symmetric.
- Left Skewed Distribution: The longer tail falls on the left side of the graph.
- Right Skewed Distribution: The longer tail falls on the right side of the graph.
Be able to understand, construct and interpret time series plots.
A data set collected over a period of time is called a time series; when it is displayed graphically, with time on the horizontal axis, it is a time series plot.
Be able to find and interpret the mean, median, and mode for a set of data.
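- Mean: The average of the data (the sum of the observations divided by the number of observations). This is NOT resistant to extreme values.
- Median: The middle number in a data set when arranged in ascending order (the average of the two middle numbers when the number of observations is even). This IS resistant to extreme values.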
- Mode: the observation that occurs the most often.
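To see what "resistant" means in practice, here is a small Python sketch using the standard `statistics` module. The income values are made up, and the single extreme value of 500 is added only to show how each measure reacts.

```python
import statistics

# Hypothetical incomes in thousands of dollars (made-up data).
incomes = [30, 32, 35, 35, 38, 40, 41]
with_outlier = incomes + [500]  # add one extreme value

mean_without = statistics.mean(incomes)
mean_with = statistics.mean(with_outlier)
median_without = statistics.median(incomes)
median_with = statistics.median(with_outlier)
mode_without = statistics.mode(incomes)
mode_with = statistics.mode(with_outlier)

print("mean:  ", mean_without, "->", mean_with)
print("median:", median_without, "->", median_with)
print("mode:  ", mode_without, "->", mode_with)
```

The mean jumps a great deal when the extreme value is added, while the median barely moves and the mode does not change at all.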
Understand how the shape of a distribution affects the mean and median.
- In a normal distribution (or any symmetric, unimodal distribution) the mean, median, and mode are all at the peak of the curve.
- If the distribution is right skewed, the mean is pulled toward the right tail, past the mode and median.
- If the distribution is left skewed, the mean is pulled toward the left tail, below the median and mode.
Be able to find the range and standard deviation for a set of data values.
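The range is the smallest number subtracted from the largest number (range = max - min). The standard deviation measures the typical distance of the observations from the mean.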
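Both measures of spread can be computed in a few lines of Python. This is a sketch on made-up data; it assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes, and shows the population version (`statistics.pstdev`, dividing by n) for comparison.

```python
import statistics

# Made-up data values for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)          # largest value minus smallest value
sample_sd = statistics.stdev(data)          # divides the squared deviations by n - 1
population_sd = statistics.pstdev(data)     # divides the squared deviations by n

print("range:", data_range)
print("sample standard deviation:", round(sample_sd, 3))
print("population standard deviation:", population_sd)
```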
Be able to understand, interpret, and apply the Empirical Rule.
- For bell-shaped distributions, about 68.3% of the data values fall within 1 standard deviation of the mean.
- About 95.4% of the data values fall within 2 standard deviations of the mean.
- About 99.7% of the data values fall within 3 standard deviations of the mean.
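The three percentages can be checked against the normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.1%}")
```

Real data that are only roughly bell-shaped will match these percentages only approximately.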
Understand the “three standard deviations from the mean” rule for outliers.
An observation more than 3 standard deviations away from the mean (a z-score below -3 or above 3) is considered a potential outlier.
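A minimal sketch of this rule in Python, assuming the z-score is computed as (observation - mean) / standard deviation with the sample standard deviation. The test scores are made up, and the function names are only for illustration.

```python
import statistics

def z_scores(data):
    """z-score of each observation: number of standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (assumption)
    return [(x - mean) / sd for x in data]

def three_sd_outliers(data):
    """Observations whose z-score is below -3 or above 3."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > 3]

# Made-up scores: twenty values near 50 plus one unusually high score.
scores = [48, 49, 50, 51, 52] * 4 + [90]
outliers = three_sd_outliers(scores)
print(outliers)
```

Note that in very small data sets no observation can reach a z-score of 3, so this rule is most useful with larger samples.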
What is the five number summary for a set of data values?
Min, Q1, Q2 (the median), Q3, Max.
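Here is one way to get the five number summary in Python, as a sketch on made-up data. Be aware that textbooks, calculators, and software use slightly different conventions for quartiles, so Q1 and Q3 from `statistics.quantiles` may not match the values found by hand or on a TI calculator.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using Python's 'inclusive' quartile method."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))

# Made-up data values for illustration.
values = [1, 3, 5, 7, 9, 11, 13]
summary = five_number_summary(values)
print(summary)
```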
Be able to understand and interpret quartiles, percentiles, and deciles.
- A percentile is a value such that approximately that percentage of the observations fall below it. Example: Scoring at the 90th percentile means that approximately 90% of people scored below you. There are only 99 percentiles.
- Quartiles work the same way but divide the data into 4 equal segments: Q1 is the 25th percentile, Q2 (the median) is the 50th, and Q3 is the 75th. There are only 3 quartiles.
- Deciles divide the data into 10 equal segments. There are only 9 deciles.
Be able to interpret the p-value to determine if an association exists between two categorical variables
A p-value < 0.05 is considered a statistically significant finding, meaning there is evidence of an association. The smaller the p-value, the stronger the evidence.
Understand what is meant by the response variable and the explanatory variable.
- Response variable: The outcome variable on which comparisons are made. In the example above the response variable is happiness.
- Explanatory variable: defines the groups to be compared with respect to the values of the response variable. It explains why you are seeing a response.
Be able to find and interpret the correlation coefficient, r, for a set of data. You may use your calculator to find r
The correlation coefficient measures the strength and direction of the linear relationship between two quantitative variables. It always falls between -1 and 1.
Understand what r represents and know how to guess an approximate value for r from a scatter plot
The closer the correlation coefficient is to 1 or -1, the more closely the observations fall along a straight line; the closer it is to 0, the weaker the linear pattern.
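If you want to check a calculator result, r can be computed directly from its definition. The sketch below uses made-up data (hours studied and scores) and a hand-written `correlation` function rather than any particular library routine.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]                              # made-up x-values
r_perfect_up = correlation(hours, [2, 4, 6, 8, 10])   # points exactly on a rising line
r_perfect_down = correlation(hours, [10, 8, 6, 4, 2]) # points exactly on a falling line
r_scattered = correlation(hours, [2, 4, 5, 4, 5])     # rising trend with scatter

print(r_perfect_up, r_perfect_down, round(r_scattered, 3))
```

Points exactly on a line give r of 1 or -1; the scattered set gives a positive r that is noticeably less than 1.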
Understand positive and negative associations between two quantitative variables.
- Positive associations: As one variable increases, the other tends to increase as well.
- Negative associations: As one variable increases, the other tends to decrease.
Understand the effect of outliers on the regression model.
Outliers can have a big effect on the regression: the regression line is pulled toward the outlier. How much it is pulled depends on how extreme the outlier is and on where it is located. If the outlying point is toward the middle of the other x-values, it doesn’t have much leverage and doesn’t affect the linear model greatly. If the outlier is not toward the middle of the x-values (closer to one of the ‘end’ values), it can have a large impact on the linear model.
Understand when outlying points are influential.
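As stated in the previous answer, an outlier can have a larger or smaller impact on the linear model depending on where it falls within the data set. An outlier is influential when its x-value is far from the other x-values (giving it leverage) and it falls away from the trend of the rest of the data.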
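The effect of an outlier's location can be seen by fitting the least-squares line with and without it. This is a sketch on made-up data; the `least_squares` helper is written out by hand for illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up points that fall exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

slope_base, _ = least_squares(xs, ys)
# Outlier in the middle of the x-values: low leverage.
slope_middle, intercept_middle = least_squares(xs + [3], ys + [16])
# Outlier far from the other x-values: high leverage.
slope_end, intercept_end = least_squares(xs + [10], ys + [2])

print("no outlier:    ", slope_base)
print("middle outlier:", slope_middle, intercept_middle)
print("end outlier:   ", slope_end, intercept_end)
```

In this made-up example the middle outlier shifts the line upward but leaves the slope at 2, while the end outlier flattens the slope almost completely.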
Understand what extrapolation is and whether or not it is appropriate
This is when you use a linear model to predict y-values for x-values that are outside the range of the x-values in the data. It becomes riskier the further you get from the observed range of data. Remember the example about predicting the price of a truck from its age: when we extrapolated to a 14-year-old truck, the predicted price was negative, meaning the seller would end up paying you!
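The sketch below shows how a prediction like that can go wrong. The intercept, slope, and observed age range are hypothetical numbers chosen for illustration; they are not the values from the class example.

```python
# Assumption: a hypothetical fitted line, price = 30000 - 2500 * age,
# fitted to trucks that were between 1 and 10 years old.
INTERCEPT = 30_000.0
SLOPE = -2_500.0
OBSERVED_AGE_RANGE = (1, 10)

def predicted_price(age_years):
    """Price predicted by the (hypothetical) linear model."""
    return INTERCEPT + SLOPE * age_years

def is_extrapolation(age_years):
    """True when the age is outside the range of ages used to fit the line."""
    low, high = OBSERVED_AGE_RANGE
    return not (low <= age_years <= high)

for age in (5, 14):
    print(age, predicted_price(age), "extrapolation" if is_extrapolation(age) else "within range")
```

Checking whether an x-value is inside the observed range before trusting the prediction is a simple habit that avoids nonsense answers like a negative price.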
Know the difference between experiments and observational studies.
An experiment is where the researcher assigns subjects to different treatments (ideally at random) and controls other variables. An observational study, on the other hand, simply observes subjects without assigning treatments, so it can only establish an association. An observational study DOES NOT establish causation. Typically an observational study is done when it is impossible or unethical to do an experiment.
Understand the different components of experiments.
- You have a response variable and an explanatory variable. The explanatory variable explains what is happening to the response variable; it defines the groups that are going to be compared. The response variable is the outcome variable on which the comparisons are made. This is typically where your data is.
- There is also a control group, which receives a placebo if applicable. The treatment group is the group that actually receives your treatment.
- The Placebo Effect: This happens when participants receiving a placebo believe they are receiving the actual treatment and therefore improve. The improvement has nothing to do with the treatment; it comes from the participant’s belief.
- Randomization: All of the participants should be randomly assigned to groups in an attempt to lower the chance of bias.
- Replication: You repeat the experiment in order to validate that your findings are due to the treatment and not to random chance.
- Blinding/Double Blinding: This is when either the participant or the participant and the experimenter are unaware of who is receiving the actual treatment and who is receiving the placebo.
Understand when causation can be established.
Causation can only be established through a well-conducted experiment with significant findings that tie the variables under investigation together, leading one to say that changes in the explanatory variable caused changes in the response variable. Remember: correlation (an association) is NOT causation!
Understand simple random sampling and some sources of bias.
- Biases: (1) Undercoverage, which is when some part of the population is not as well represented as it should be. (2) Nonresponse bias, which is when some subjects cannot be reached or decline to participate (for example, this can happen with phone and mail surveys). (3) Sampling bias, which usually results from not having a random sample or random assignment to groups.
- Simple random sampling: Involves choosing samples in such a way that every possible sample is just as likely to be chosen as any other. To use the random number generator on your calculator, follow these steps: Math → prob → randInt (#5) and hit enter → (enter lowest number, enter highest number) and hit enter until you have the required number of random integers.
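The calculator steps have a close analogue in Python's standard `random` module. This is a sketch: the function names are made up, and the population of 60 numbered subjects is an assumption for the example.

```python
import random

def rand_int_draws(lowest, highest, how_many, rng):
    """Mimic hitting enter repeatedly on the calculator: repeated integers are possible."""
    return [rng.randint(lowest, highest) for _ in range(how_many)]

def simple_random_sample(population_size, sample_size, rng):
    """Draw distinct ID numbers from 1..population_size; every possible sample is equally likely."""
    return rng.sample(range(1, population_size + 1), sample_size)

rng = random.Random(42)  # fixed seed so the example is repeatable
draws = rand_int_draws(1, 60, 5, rng)
sample_ids = simple_random_sample(60, 5, rng)
print("calculator-style draws:", draws)
print("simple random sample:  ", sample_ids)
```

With the calculator-style draws you would skip any repeated number and draw again; `random.sample` handles that for you by never returning the same ID twice.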
Understand the different aspects of conducting an experiment and the corresponding Terminology
- Experimental Units: Subjects.
- Treatment: These are the conditions imposed upon the subjects.
- Explanatory variable: Defines groups and treatments
- Response variable: Outcome
- Randomized experiments: Subjects randomly assigned to treatments.
- Placebo: A fake treatment; sugar pill.
Understand different sampling techniques, such as cluster sampling, stratified sampling, and simple random samples.
- Cluster Sampling: You identify groups (clusters), choose some of the clusters at random, and take all of the members of each chosen cluster. With the apartment example, you can think of each floor as a cluster and choose two floors in order to obtain the 20 subjects needed.
- Stratified sampling: You divide the population into groups—strata—and then randomly choose part of the sample from each group. So for the apartment example you would choose an apartment from each floor.
- Simple Random Sample: Each possible sample is equally likely.
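The difference between cluster and stratified sampling can be sketched with the apartment example. The building layout here (5 floors with 10 apartments each) is an assumption made for illustration, as are the function names.

```python
import random

# Assumption: a hypothetical building with 5 floors and 10 apartments per floor.
FLOORS = 5
APARTMENTS_PER_FLOOR = 10
building = {
    floor: [f"{floor}-{unit:02d}" for unit in range(1, APARTMENTS_PER_FLOOR + 1)]
    for floor in range(1, FLOORS + 1)
}

def cluster_sample(building, n_clusters, rng):
    """Choose whole floors at random and take every apartment on them."""
    chosen_floors = rng.sample(sorted(building), n_clusters)
    return [apt for floor in chosen_floors for apt in building[floor]]

def stratified_sample(building, per_stratum, rng):
    """Choose the same number of apartments at random from every floor."""
    return [apt for floor in sorted(building)
            for apt in rng.sample(building[floor], per_stratum)]

rng = random.Random(0)  # fixed seed so the example is repeatable
cluster = cluster_sample(building, 2, rng)        # 2 floors x 10 apartments = 20
stratified = stratified_sample(building, 4, rng)  # 5 floors x 4 apartments = 20
print("cluster:   ", cluster)
print("stratified:", stratified)
```

Both samples contain 20 apartments, but the cluster sample covers only two floors while the stratified sample includes every floor.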
Know the difference between retrospective and prospective studies
- Retrospective study: You are looking back into the past. Example 9 on page 184 looks back at cases of smokers and non-smokers to compare their rates of lung cancer. There was no ongoing study, only an observational study that looks at information from the past.
- Prospective study: You are looking into the future. Example 11 on page 186 describes a study that began in 1976 and followed 121,700 female nurses, who fill out a questionnaire every two years. It looks for connections between a multitude of factors and the risk of coronary heart disease, pulmonary disease, and stroke. The study follows its subjects into the future.
Understand the concepts of multi factor experiments, matched pairs designs, and blocking
- Multifactor Experiments: A single experiment that analyzes two or more factors. For example, Example 12 on page 189 looks into the effectiveness of nicotine patches and/or antidepressants for quitting smoking.
- Matched Pairs Design: Subjects are assigned to all the treatments but they do the treatments at different times. So in the example on page 190 examining the effectiveness of migraine medicine a subject would be assigned to the drug group for the first migraine and then the placebo group for the next migraine.
- Blocking: This is where you arrange the subjects in ‘blocks’ where the subjects are similar to each other. So in the example given in class with examining the effectiveness of a weight loss program we would block participants into certain weight classes because typically a person that weighs more will be more apt to lose more weight than a person who is smaller.
What are independent trials?
Trials are independent if the outcome of one trial is not affected by the outcome of another. For example, if you flip a coin 50 times and get tails 50 times, this does not affect the chances of the 51st toss being heads.
What is the law of large numbers?
As we repeat the experiment more and more times, the relative frequency with which an event occurs gets closer and closer to the true probability.
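A simulation makes this easy to see. The sketch below flips a simulated fair coin (probability of heads assumed to be 0.5) and reports the relative frequency of heads for larger and larger numbers of flips; the function name and seed are arbitrary.

```python
import random

def relative_frequency_of_heads(n_flips, seed=1):
    """Simulate n_flips independent fair coin flips and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} flips: relative frequency of heads = {relative_frequency_of_heads(n):.4f}")
```

With only a few flips the relative frequency can be far from 0.5, but it settles close to 0.5 as the number of flips grows.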
Be able to list sample spaces for an experiment and find basic probabilities
A sample space is a list of all possible outcomes. The sample space for flipping a coin 3 times has 8 possible outcomes, which means there is a 1/8 chance of getting 3 tails in a row.
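The sample space for three coin flips can be listed by computer. This sketch assumes a fair coin, so that all outcomes are equally likely and a probability is just (number of outcomes in the event) / (number of outcomes in the sample space).

```python
from fractions import Fraction
from itertools import product

# Every sequence of three flips, written as strings like "HTH".
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

def probability(event):
    """Probability of an event when all outcomes are equally likely (fair coin assumed)."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_three_tails = probability(lambda outcome: outcome == "TTT")
p_exactly_two_heads = probability(lambda outcome: outcome.count("H") == 2)

print(sample_space)
print("P(three tails) =", p_three_tails)
print("P(exactly two heads) =", p_exactly_two_heads)
```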
Understand mutually exclusive or disjoint events
Mutually exclusive (disjoint) events are two events that have no outcomes in common. For example, drawing a club and drawing a heart are mutually exclusive because a club is a black card and a heart is a red card, so a single card cannot be both. Also, mutually exclusive events can be independent only if at least one of the events has probability 0, that is, it is an impossible event.
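The card example can be checked by listing a standard 52-card deck and treating each event as a set of cards. This is a sketch assuming one card is drawn at random, so every card is equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event_cards):
    """Probability of drawing a card in the event (one random card assumed)."""
    return Fraction(len(event_cards), len(deck))

clubs = {card for card in deck if card[1] == "clubs"}
hearts = {card for card in deck if card[1] == "hearts"}

p_both = prob(clubs & hearts)    # cards that are both a club and a heart
p_either = prob(clubs | hearts)  # cards that are a club or a heart

print("P(club and heart) =", p_both)
print("P(club or heart)  =", p_either)
print("P(club) * P(heart) =", prob(clubs) * prob(hearts))
```

Because the events are disjoint, P(club and heart) is 0 and P(club or heart) equals P(club) + P(heart). They are not independent, since independence would require P(club and heart) to equal P(club) * P(heart), which is not 0.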