
Bar graph
a graphical representation of categorical data. The name of each category is listed on the x axis, and a bar is placed over each category name with height equal to the frequency (or percentage) in that category

Bias
a condition that occurs when the design of a study systematically favors certain outcomes

Blocking
the grouping of individuals according to some characteristic, such as rats in the same litter or plots of land at the same location. The random allocation is carried out separately within each group
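
A minimal Python sketch of random allocation within blocks, assuming each block (for example, a litter) is given as a list of individuals and that each block's size is a multiple of the number of treatments. The function name `block_randomize` and the litter and diet labels are hypothetical, chosen only for illustration.

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Randomly allocate individuals to treatments separately within each block.

    blocks: dict mapping a block label (e.g. a litter) to a list of individuals.
    treatments: list of treatment labels.
    Returns a dict mapping each individual to its assigned treatment.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # random order within this block only
        # Deal the shuffled members out to the treatments in turn, so each
        # treatment receives the same number of units from every block.
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

litters = {
    "litter A": ["A1", "A2", "A3", "A4"],
    "litter B": ["B1", "B2", "B3", "B4"],
}
print(block_randomize(litters, ["diet 1", "diet 2"], seed=1))
```

Because the shuffle happens inside each block, differences between litters cannot pile up in one treatment group.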

Boxplot
a plot of data based on the five number summary. A line is drawn from the minimum observation to Q1; a box is drawn from Q1 to Q3 with a vertical line at the median; and a line is drawn from Q3 to the maximum observation. Good for side-by-side comparisons of several data sets

Categorical variable
a variable that can be classified into groups or categories, such as gender, religion, or zip code. Typically, words (category labels) are used to describe an individual

Comparative study
a study where the explanatory variable has two active treatments rather than an active treatment versus a control. The purpose of the study is to determine which treatment works best rather than whether a treatment works

Completely randomized design
an experimental design where all individuals participating in the experiment are assigned at random to the treatments
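
In a completely randomized design, all individuals are shuffled together, with no grouping, before being dealt out to the treatments. A minimal sketch, assuming equal group sizes are wanted; `completely_randomized` and the subject labels are hypothetical.

```python
import random

def completely_randomized(individuals, treatments, seed=None):
    """Assign every individual to a treatment at random, with no blocking."""
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)  # one shuffle over the whole list of individuals
    # Deal out in turn, so group sizes differ by at most one.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

subjects = [f"subject {i}" for i in range(1, 21)]
groups = completely_randomized(subjects, ["treatment", "control"], seed=7)
print(sum(t == "treatment" for t in groups.values()))  # prints 10
```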

Confounded variable
a variable whose effect on the response variable cannot be separated from the effect of the explanatory variable on the response variable. (Note: confounded variables are usually lurking variables, but only a few lurking variables are also confounded.)

Confounding
a situation where the effect of one variable on the response variable cannot be separated from the effect of another variable on the response variable

Control
an 'inactive' treatment where no experimental condition is applied to the individuals, in order to determine whether the active treatment works. Randomizing together with a control enables the researcher to manage lurking variables when there is no comparison group. Note: a control is not necessary for a valid experiment as long as two or more comparison treatments are used

Convenience sample
a sample where the researcher contacts those subjects who are readily available and does not use any random selection. The results are almost surely biased

Distribution
a list or a graph that shows the possible values of a variable together with the frequency of each value
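
For a categorical variable, the distribution can be tabulated by counting how often each value occurs. A minimal Python sketch using the standard `collections.Counter`; the `colors` data and the helper name `frequency_distribution` are made up for illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each distinct value to (frequency, percentage), sorted by value."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n) for value, count in sorted(counts.items())}

colors = ["red", "blue", "red", "green", "red", "blue"]
for value, (count, pct) in frequency_distribution(colors).items():
    print(f"{value:>6}: {count}  ({pct:.1f}%)")
```

The counts (or percentages) are exactly the bar heights of a bar graph for the same data.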

dotplot
a one-dimensional plot of a quantitative data set where each value in the data set is represented by a dot above its corresponding location on the x axis

Double blind
neither the subject nor the doctor, nurse, or whoever is diagnosing the results knows which treatment the subject received

experiment
a study where a treatment is deliberately imposed on each individual in the study before responses are measured, in order to observe responses to the treatment. A valid experiment must have 1) control or comparison, 2) randomization, and 3) replication

Explanatory variable
a variable that may or may not explain the outcomes (responses) of a study. It is described using a phrase that describes all possible treatments. Note: an observational study can have an explanatory variable, but a valid experiment always has an explanatory variable

five number summary
minimum, Q1, median, Q3, maximum; preferred when data are very skewed or have outliers
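
A minimal Python sketch of the five number summary. It assumes one common hand-calculation convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd. Other conventions exist (for example, the interpolation used by default in `numpy.percentile`) and can give slightly different quartiles. The function names are hypothetical.

```python
def median(sorted_vals):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); needs at least two observations."""
    x = sorted(data)
    n = len(x)
    lower = x[: n // 2]        # observations below the median position
    upper = x[(n + 1) // 2 :]  # observations above the median position
    return x[0], median(lower), median(x), median(upper), x[-1]

print(five_number_summary([13, 1, 9, 5, 3, 11, 7]))  # (1, 3, 7, 11, 13)
```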

histogram
a graphical display of a quantitative data set. Data are separated into intervals of equal width, and a bar is drawn over each interval with height equal to the frequency (or percentage) of observations in that interval; frequencies (or percentages) are given on the y axis (hence, a histogram gives a distribution). Histograms are described by shape, center, and spread. Used for large data sets.
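
Counting observations in equal-width intervals is the core of a histogram. A minimal sketch, assuming intervals of the form [start, start + width), [start + width, start + 2 × width), and so on; `histogram_counts` and the data are hypothetical, and a row of `#` characters stands in for each drawn bar.

```python
def histogram_counts(data, start, width, n_bins):
    """Count observations in n_bins equal-width intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)  # index of the interval containing x
        if not 0 <= k < n_bins:
            raise ValueError(f"{x} falls outside the chosen intervals")
        counts[k] += 1
    return counts

data = [3, 7, 12, 15, 18, 22, 25, 29]
counts = histogram_counts(data, start=0, width=10, n_bins=3)
for k, c in enumerate(counts):
    print(f"{k * 10:>3} to <{(k + 1) * 10:>3}: {'#' * c}")
```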

individual
the basic unit (or subject) of the experiment upon which a treatment is applied

interquartile range (IQR)
a measure of variability recommended for skewed data or data with outliers; computed as IQR = Q3 - Q1

lack of realism
a weakness in experiments where the setting of the experiment does not realistically duplicate the conditions we really want to study

left skewed
a density curve where the left side of the distribution extends in a long tail (Mean < Median)

Lurking variable
a variable that has an important effect on the relationship among the variables in a study but is not taken into account

mean
a measure of the center of the data; it's the point that "balances" the data

median
a measure of the center of data; it's the point such that half the numbers are smaller and the other half are larger (the midpoint of the ordered data set)
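
The difference between these two measures of center shows up when an extreme value is added: the mean moves toward it, while the median barely changes. A short illustration using Python's standard `statistics` module, with made-up values.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]
with_outlier = values + [400]  # add one extreme observation

print(mean(values), median(values))              # both equal 35
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 95.8; median is 36.5
```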

multistage sample
sampling is conducted in stages. For a two-stage sample, the individuals are grouped according to some characteristic; groups are first randomly selected, and then individuals are randomly selected from those selected groups. (In a stratified sample, by contrast, individuals are randomly selected from every group.) For example, states could be randomly selected; then school districts within the selected states; then schools within the selected school districts; and finally students would be randomly selected from the selected schools. That would be a four-stage sample

nonresponse bias
bias resulting when individuals selected to be in a survey either cannot be contacted or refuse to answer survey questions

Normal distribution
a bell-shaped symmetric density curve used to model many data sets that have a symmetric mound or bell shape

observational study
a study that merely observes conditions of individuals in a population and records information; the population is disturbed as little as possible (note: treatments are not imposed on units)

Outlier
an observation that falls outside the overall pattern of the data set. Can be detected by checking whether observation < Q1 - 1.5 × IQR or observation > Q3 + 1.5 × IQR
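
The 1.5 × IQR rule can be sketched in Python as follows, assuming the hand-calculation quartile convention in which Q1 and Q3 are the medians of the lower and upper halves of the ordered data (excluding the overall median when n is odd). The helper names and the data are hypothetical.

```python
def median(sorted_vals):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves of the ordered data."""
    x = sorted(data)
    n = len(x)
    return median(x[: n // 2]), median(x[(n + 1) // 2 :])

def iqr_outliers(data):
    """Flag observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < low or v > high]

data = [2, 4, 5, 5, 6, 7, 8, 30]
print(quartiles(data), iqr_outliers(data))  # (4.5, 7.5) [30]
```

Here IQR = 7.5 - 4.5 = 3, so the fences are 0 and 12, and only 30 is flagged.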

Pie chart
a graphical display of categorical data using a "pie"; each category is represented as a slice whose size is proportional to the percentage of data in that category. Not recommended by statisticians, because differences in slice sizes are harder to judge than differences in bar heights

placebo effect
the response of patients to a treatment even though the treatment has no physical effect (for example, a sugar pill)

population
the entire group of individuals about whom we desire to collect information

probability sample
a sample selected using a random device where each individual in the population has a chance (not necessarily equal) of being selected. Probability samples are necessary for making inferences. Examples include SRS, stratified, and multistage samples

Q1
a location measure of the data such that one-fourth (25%) of the data are smaller than it.

Q3
a location measure of the data such that three-fourths (75%) of the data are smaller than it.

quantitative variable
a variable with numerical values such as height or weight

random number table
a table of the digits 0 through 9 whose order cannot be predicted, but in the long run, each digit occurs 10% of the time.

Randomization
a method of assigning individuals in an experiment to treatment groups using some random device that eliminates bias and gives each unit the same probability of being assigned to any treatment group. Randomization "balances" the treatment groups, thus averaging out lurking and extraneous variables, and allows us to use the laws of probability to make inferences. Randomization as a condition can be SRS or RAT (random allocation to treatments)

Range
the maximum observation minus the minimum observation

Replication
having more than one individual in each treatment group. Replication is necessary for measuring variability; also, the greater the replication, the more precise the results

Response bias
bias resulting from individuals in a sample lying or giving incorrect responses because they do not have knowledge about the question or can't recall; response bias can also result from the wording of the question or from interviewers influencing the response, either intentionally or unintentionally

Response variable
a variable that records the outcome of a study (it may not be a number); measured on an individual

right skewed distribution
a density curve where the right side of the distribution extends in a long tail (Mean > Median)

sample
a subset of individuals in the population; the group of individuals from which we actually collect information

simple random sample
a sample of size n selected from the population in such a way that each possible sample of size n has an equal chance of being selected
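
Python's standard `random.sample` draws items without replacement, so every subset of size n is equally likely, which matches this definition. A minimal sketch; the population of 100 labelled students and the seed are made up.

```python
import random

population = [f"student {i}" for i in range(1, 101)]  # hypothetical population of 100
rng = random.Random(2024)  # seeded only so the draw can be reproduced
srs = rng.sample(population, k=10)  # SRS of size n = 10, drawn without replacement
print(srs)
```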

standard deviation
a measure of the "average" or typical deviation of the observations from the mean; measures variability of data about the mean
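
Written out, the sample standard deviation is s = sqrt( Σ (x_i - x̄)² / (n - 1) ). The n - 1 divisor is the usual convention for a sample and is an assumption here, since the entry above does not specify it. A sketch that follows the formula step by step and checks it against `statistics.stdev`, which uses the same divisor; the data are made up.

```python
import math
import statistics

def sample_sd(data):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n                           # the mean
    squared_devs = [(x - xbar) ** 2 for x in data]  # squared deviations from the mean
    return math.sqrt(sum(squared_devs) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data), statistics.stdev(data))
```

For these data the mean is 5 and the squared deviations sum to 32, so s = sqrt(32 / 7), about 2.14.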

standard Normal curve
a Normal distribution with mean zero and standard deviation one. Probabilities are given in Table A for values of the standard Normal variable

Statistically significant
results of a study that differ too much from what we would expect to be attributed to chance variation alone

stemplot
a graphical representation of a quantitative data set. The leading digits of each data value are presented as the stem and the final digit is given as the leaf. Used for small data sets
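
A minimal text stemplot, assuming non-negative whole-number data in which the last digit is the leaf and the remaining leading digits form the stem. `stemplot` is a hypothetical helper name, and the data are made up.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot lines for non-negative integers (stem = leading digits, leaf = last digit)."""
    leaves = defaultdict(list)
    for value in sorted(data):          # sorting first keeps each row's leaves in order
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # include empty stems too
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {row}")
    return lines

print("\n".join(stemplot([12, 15, 21, 23, 23, 37, 41])))
```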

stratified sampling
a sampling scheme where the population has been divided into strata according to some characteristic and a simple random sample is selected from within each stratum
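
A minimal sketch of stratified sampling: a separate simple random sample is drawn from each stratum with `random.sample`. The strata, their members, and the fixed per-stratum sample size are assumptions made for simplicity; `stratified_sample` is a hypothetical name.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum individuals from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

strata = {
    "urban": [f"U{i}" for i in range(1, 51)],
    "rural": [f"R{i}" for i in range(1, 31)],
}
sample = stratified_sample(strata, n_per_stratum=5, seed=3)
print(sample)
```

Every stratum is represented, which is the key difference from a multistage sample, where only some groups are selected.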

symmetric distribution
a density curve where the right half is a mirror image of the left half of the distribution (Mean = Median)

Undercoverage bias
bias that occurs because the list of the population from which the sample is drawn is incomplete, meaning that some people in the population are not listed for selection (for example, homeless people)

Voluntary response sample
a method of sample selection that consists of people choosing themselves by responding to a general appeal

z-score
a measure of the number of standard deviations a value or observation lies from the mean; a standardized value
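
The standardized value is z = (x - mean) / standard deviation. A one-line Python version, with a made-up example: a score of 130 on a scale with mean 100 and standard deviation 15.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
```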

