STATS 121

  1. Bar graph
    a graphical representation of categorical data. The name of each category is listed on the x axis, and a bar is placed over each category name with height equal to the frequency (or percentage) in that category
  2. Bias
    a condition that occurs when the design of a study systematically favors certain outcomes
  3. Blocking
    the grouping of individuals according to some characteristic, like rats in the same litter or plots of land at the same location. The random allocation is carried out separately within each group
  4. Boxplot
    a plot of data based on the five number summary. A line is drawn from the minimum observation to Q1; a box is drawn from Q1 to Q3 with a vertical line at the median; and a line is drawn from Q3 to the maximum observation. Good for side-by-side comparisons
  5. Categorical variable
    a variable that can be classified into groups or categories such as gender, religion, zip code, etc. Typically, words are used to describe an individual
  6. Comparative study
    a study where the explanatory variable has two active treatments rather than an active treatment versus a control. The purpose of the study is to determine which treatment works best rather than whether a treatment works
  7. Completely randomized design
    an experimental design where all individuals participating in the experiment are assigned at random to the treatments
  8. Confounded variable
    a variable whose effect on the response variable cannot be separated from the effect of the explanatory variable on the response variable. (Note: usually confounded variables are lurking variables but only a few lurking variables are also confounded)
  9. Confounding
    a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable
  10. Control
    an 'inactive' treatment where no experimental condition is applied to the individuals in order to determine whether the active treatment works. Randomizing together with a control enables the researcher to manage lurking variables when there is not a comparison group. Note: a control is not necessary for a valid experiment as long as two or more comparison treatments are used
  11. Convenience sample
    a sample where the researcher contacts those subjects who are readily available and does not use any random selection. The results are almost surely biased
  12. Distribution
    a list or a graph that shows the possible values of a variable together with the frequency of each value
  13. dotplot
    a one dimensional plot of a quantitative data set where each value in the data set is represented by a dot above its corresponding location on the x axis
  14. Double blind
    neither the subject nor the doctor, nurse or whoever is diagnosing the results knows which treatment the subject received
  15. experiment
    a study where a treatment is deliberately imposed on each individual in the study before responses are measured in order to observe responses to the treatment. A valid experiment must have 1) control or comparison, 2) randomization and 3) replication
  16. Explanatory variable
    a variable that may or may not explain the outcomes (responses) of a study. It is described using a phrase that describes all possible treatments. Note: an observational study can have an explanatory variable, but a valid experiment always has an explanatory variable
  17. five number summary
    minimum, Q1, median, Q3, maximum; preferred when data are very skewed or have outliers
  18. histogram
    a graphical display of a quantitative data set; data are separated into intervals of equal width and a bar is drawn over each interval with height equal to the frequency (or percentage) of observations in that interval. Frequencies (or percentages) are given on the y axis (hence, a histogram gives a distribution). Histograms are described by shape, center and spread. Used for large data sets.
  19. individual
    the basic unit (or subject) of the experiment upon which a treatment is applied
  20. interquartile range (IQR)
    a measure of variability recommended for skewed data or data with outliers; computed as IQR = Q3 - Q1
  21. lack of realism
    a weakness in experiments where the setting of the experiment does not realistically duplicate the conditions we really want to study
  22. left skewed
    a density curve where the left side of the distribution extends in a long tail (Mean < Median)
  23. Lurking variable
    a variable that has an important effect on the relationship among the variables in a study but is not taken into account
  24. mean
    a measure of the center of the data; it's the point that "balances" the data
  25. median
    a measure of the center of the data; it's the point such that half the observations are smaller and the other half are larger (the midpoint of the ordered data set)
  26. multi-stage sample
    sampling is conducted in stages; for a two-stage sample, the individuals are grouped according to some characteristic: groups are first randomly selected and then individuals are randomly selected from those selected groups. (In a stratified sample, individuals are randomly selected from every group.) For example, states could be randomly selected; then school districts within the selected states; then schools within the selected school districts; and finally students would be randomly selected from the selected schools. That would be a four-stage sample
  27. non-response bias
    bias resulting when individuals selected to be in a survey either cannot be contacted or refuse to answer survey questions
  28. Normal distribution
    a bell-shaped symmetric density curve used to model many data sets that have a symmetric mound or bell shape
  29. observational study
    a study that merely observes conditions of individuals in a population and records information; the population is disturbed as little as possible (note: treatments are not imposed on units)
  30. Outlier
    an observation that falls outside the overall pattern of the data set. Can be detected by checking: observation < Q1 - 1.5 × IQR or observation > Q3 + 1.5 × IQR
  31. Pie chart
    a graphical display of categorical data using a "pie"; each category is represented as a slice where the size of the slice is proportional to the percentage of data in that category. Not recommended by statisticians
  32. placebo effect
    the response of patients to any treatment even though it has no physical effect
  33. population
    the entire group of individuals about whom we desire to collect information
  34. probability sample
    a sample selected using a random device where each individual in the population has a chance (doesn't have to be equal) of being selected. Probability samples are necessary for making inferences. Examples include: SRS, stratified and multistage
  35. Q1
    a location measure of the data such that one-fourth or 25% of the data is smaller than it.
  36. Q3
    a location measure of the data such that three-fourths or 75% of the data is smaller than it.
  37. quantitative variable
    a variable with numerical values such as height or weight
  38. random number table
    a table of the digits 0-9 whose order cannot be predicted but where, in the long run, each digit occurs 10% of the time.
  39. Randomization
    a method of assigning individuals in an experiment to treatment groups using some random device that eliminates bias and gives each unit the same probability of being assigned to any treatment group. Randomization "balances" the treatment groups, thus averaging out lurking and extraneous variables. It also allows us to use the laws of probability to make inferences. Randomization as a condition can be SRS or RAT (random allocation to treatments)
  40. Range
    the maximum observation minus the minimum observation
  41. Replication
    having more than one individual in each treatment group. Replication is necessary for measuring variability. Also, the greater the replication, the more precise the results
  42. Response bias
    bias resulting from individuals in a sample lying or giving an incorrect response because they do not have knowledge about the question or can't recall; response bias could also result from the wording of the question or from interviewers influencing the response, either intentionally or unintentionally
  43. Response variable
    a variable that gives the results (may not be a number) of the outcome of a study; measured on an individual
  44. right skewed distribution
    a density curve where the right side of the distribution extends in a long tail (Mean > Median)
  45. sample
    a subset of individuals in the population; the group of individuals about which we actually collect information
  46. simple random sample
    a sample of size n selected from the population in such a way that each possible sample of size n has an equal chance of being selected
  47. standard deviation
    a measure of the "average" or typical deviation of the observations about the mean; measures variability of data about the mean
  48. standard Normal curve
    a Normal distribution with a mean of zero and a standard deviation of one. Probabilities are given in Table A for values of the standard Normal variable
  49. Statistically significant
    results of a study that differ too much from what we expected to attribute to chance variation alone
  50. stemplot
    a graphical representation of a quantitative data set. Leading values of each data point are presented as stems and second digits are given as leaves. Used for small data sets
  51. stratified sampling
    a sampling scheme where the population has been divided into strata according to some characteristic and a simple random sample is selected from within each stratum
  52. symmetric distribution
    a density curve where the right half is a mirror image of the left half of the distribution (Mean = Median)
  53. Undercoverage bias
    bias that occurs because the list of the population from which the sample is drawn is incomplete, meaning that some people in the population (e.g., the homeless) are not listed for selection
  54. Voluntary response sample
    a method of sample selection that consists of people choosing themselves by responding to a general appeal
  55. z-score
    a measure of the number of standard deviations a value or observation is from the mean; a standardized value
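
Worked example for the numerical summary cards (five number summary, IQR, outlier, z-score, standard Normal curve). This is only a sketch in Python, not part of the original card set. The function names and the data values are made up for illustration. It assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when the number of observations is odd; other quartile conventions give slightly different answers. `NormalDist` from the standard library stands in for a Table A look-up.

```python
from statistics import NormalDist, median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the ordered data, excluding the overall median when n is odd.
    """
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    half = len(xs) // 2
    lower = xs[:half]               # observations below the median position
    upper = xs[len(xs) - half:]     # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def iqr(data):
    """Interquartile range: IQR = Q3 - Q1."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


def find_outliers(data):
    """Observations below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    spread = q3 - q1
    low_fence = q1 - 1.5 * spread
    high_fence = q3 + 1.5 * spread
    return [x for x in data if x < low_fence or x > high_fence]


def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / standard_deviation


def area_below(z):
    """Proportion of the standard Normal curve (mean 0, sd 1) below z."""
    return NormalDist(mu=0, sigma=1).cdf(z)


# Hypothetical data set with one unusually large observation.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print("five number summary:", five_number_summary(scores))
print("IQR:", iqr(scores))
print("outliers:", find_outliers(scores))
print("z-score of 70 when mean = 60 and sd = 5:", z_score(70, 60, 5))
print("area below z = 2:", area_below(2))
```

With these made-up scores, the only observation flagged by the 1.5 × IQR rule is 30, which is also why the five number summary is preferred over the mean and standard deviation for data like this.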
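
Worked example for the sampling and design cards (simple random sample, stratified sampling, completely randomized design, blocking). Again, this is only a sketch in Python and not part of the original card set. The function names, the seed, and the rat and litter labels are hypothetical, and Python's `random` module plays the role of the "random device" (a random number table would serve the same purpose).

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every possible sample of size n has an equal chance of selection."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS from within every stratum.

    strata maps a stratum name to the list of individuals in that stratum.
    """
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def randomize_to_treatments(individuals, treatments, rng):
    """Completely randomized design: shuffle everyone, then deal them out
    to the treatments in turn so the group sizes are as equal as possible."""
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, individual in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(individual)
    return groups


def randomize_within_blocks(blocks, treatments, rng):
    """Blocking: carry out the random allocation separately in each block."""
    return {name: randomize_to_treatments(members, treatments, rng)
            for name, members in blocks.items()}


# Hypothetical example: 12 rats from 3 litters, two treatments.
rng = random.Random(121)  # fixed seed only so the sketch is repeatable
litters = {
    "litter 1": ["rat 1", "rat 2", "rat 3", "rat 4"],
    "litter 2": ["rat 5", "rat 6", "rat 7", "rat 8"],
    "litter 3": ["rat 9", "rat 10", "rat 11", "rat 12"],
}
all_rats = [rat for members in litters.values() for rat in members]

print("SRS of 4 rats:", simple_random_sample(all_rats, 4, rng))
print("stratified sample:", stratified_sample(litters, 2, rng))
print("completely randomized:",
      randomize_to_treatments(all_rats, ["drug", "control"], rng))
print("randomized within litters:",
      randomize_within_blocks(litters, ["drug", "control"], rng))
```

The difference between the last two calls is the point of blocking: in the completely randomized design a litter could end up mostly in one treatment group by chance, while randomizing within litters guarantees each litter is split evenly between the treatments.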
Author
kccall93
ID
132115
Card Set
STATS 121
Description
Test 1
Updated