
statistics
the science of data

individuals/cases
objects described by a set of data

variable
any characteristic of an individual; it can take different values for different individuals

quantitative variable
numeric

qualitative variable
takes non-numeric values such as words or labels (categorical)

values
the particular numbers or categories a variable takes on

observational study
observes individuals and measures variables of interest but does not attempt to influence the responses; used to describe a group or situation

response
variable that measures an outcome or result of a study

sampling
gaining information about the whole by examining only a part

sample surveys
survey a group of individuals by studying only some of its members, selected to represent the larger group

population
entire group of individuals about which we want information (the group we want to study)

sample
part of population from which we actually get the information and use it to draw conclusions about the whole

census
sample survey that attempts to include entire population in sample

experiments
deliberately imposing some treatment on individuals to observe their responses; can give evidence of cause and effect

biased
statistical study design that systematically favors certain outcomes

convenience sampling
selection of individuals who are easiest to reach

voluntary response sampling
sample chooses itself by responding to a general appeal (e.g., a call-in poll)

simple random sample
choose a sample of n individuals from the population in such a way that every set of n individuals has an equal chance to be the sample actually selected
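
A minimal sketch of drawing an SRS in Python, assuming the population fits in a list. The function name `simple_random_sample` and the 20 labeled individuals are hypothetical, made up for illustration; `random.sample` draws without replacement so that every set of n individuals is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    return rng.sample(population, n)

# Hypothetical population of 20 labeled individuals.
population = [f"person_{i}" for i in range(1, 21)]
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```

Changing the seed (or omitting it) gives a different sample of the same size.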

table of random digits
 long string of digits with two properties:
 1. each entry is equally likely to be any of the digits 0 through 9
 2. entries are independent of each other
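
One common way to use such a table is to label the individuals with two-digit numbers and read the table in two-digit groups. The sketch below assumes a population of at most 99 individuals labeled 01, 02, and so on; the helper names and the digit string are hypothetical, not taken from any published table.

```python
import random

def random_digit_line(length, seed=None):
    """Build a string of digits, each equally likely to be 0-9, independently."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) for _ in range(length))

def choose_labels(digits, n, population_size):
    """Read two-digit groups left to right; keep labels 1..population_size, skip repeats."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# Hypothetical line of digits; groups read as 19, 22, 39, 50, 34, 05, ...
line = "19223950340575628713"
labels = choose_labels(line, 4, 30)
print(labels)
```

Groups that match no label (such as 39 when only 30 individuals exist) are simply skipped.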

parameter
number that describes the population

statistic
number that describes a sample

parameter is to __________ as statistic is to ______.
population; sample

variability
describes how spread out values are


p
proportion (a fraction: the count in a group divided by the total)


margin of error
how close the sample statistic is expected to be to the population parameter, stated as "plus or minus" some number of percentage points (e.g., plus or minus 2 points)

95% confident
95% of all possible samples give a result within the margin of error of the truth about the population

confidence statements
 fact about what happens in all possible samples, used to say how trustworthy the result of one sample is; has two parts:
 1. margin of error
 2. level of confidence
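
The "all possible samples" idea can be illustrated by simulation. This is a sketch under stated assumptions, not part of the notes: it uses the quick approximation 1/sqrt(n) for a 95% margin of error on a sample proportion, and it draws each individual independently with a known true proportion (as if the population were very large). The function name `coverage` is hypothetical.

```python
import math
import random

def coverage(p_true, n, trials, seed=0):
    """Fraction of simulated samples whose proportion lands within the margin of error."""
    rng = random.Random(seed)
    margin = 1 / math.sqrt(n)  # assumed quick 95% margin of error
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n  # the sample statistic
        if abs(p_hat - p_true) <= margin:
            hits += 1
    return hits / trials

result = coverage(p_true=0.5, n=400, trials=2000)
print(f"share of samples within the margin of error: {result:.3f}")
```

The printed share should be close to 0.95, which is what a 95% confidence statement claims.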

sampling errors
 errors caused by the act of taking a sample
 undercoverage
 random sampling error
 biased sampling methods

random sampling error
deviation between the sample statistic and the population parameter caused by chance in selecting a random sample

nonsampling error
 errors not related to the act of selecting a sample from the population
 processing errors
 response error
 nonresponse

sampling frame
list of individuals from which the sample is actually drawn; ideally it lists every individual in the population

undercoverage
occurs when some groups in the population are left out of the process of choosing the sample

processing errors
mistakes in mechanical tasks like arithmetic or entering responses into a computer

response error
when subject gives incorrect response (lie, guess, bad memory)

nonresponse
failure to obtain data from an individual selected for a sample (cannot be contacted or refuses to cooperate)

stratified random sample
 1. divide the sampling frame into distinct groups of individuals, called strata
 2. take a separate SRS in each stratum and combine these to make the complete sample
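
The two steps can be sketched as follows. The function name `stratified_sample`, the `stratum_of` callback, and the small student frame are hypothetical, chosen only to make the example runnable.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Split the frame into strata, take an SRS in each, and combine the results."""
    rng = random.Random(seed)
    strata = {}
    for individual in frame:
        # Step 1: group individuals by stratum.
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for name in sorted(strata):
        # Step 2: separate SRS within each stratum.
        sample.extend(rng.sample(strata[name], n_per_stratum[name]))
    return sample

# Hypothetical frame: (name, class year) pairs.
frame = [(f"s{i}", "freshman") for i in range(10)] + [(f"t{i}", "senior") for i in range(6)]
picked = stratified_sample(frame, lambda person: person[1], {"freshman": 3, "senior": 2}, seed=4)
print(picked)
```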

probability sample
sample chosen by chance

response variable
variable that measures an outcome or result of a study (dependent)

explanatory variable
variable that we think explains or causes change to response variable (independent)

subjects
individuals studied in an experiment

treatment
specific experimental condition applied to subjects

lurking variable
variable that has an important effect on the relationship among the variables in a study but is not one of the explanatory variables studied

confounded
two variables are confounded when their effects on a response variable cannot be distinguished from each other

clinical trials
experiment that studies effectiveness of medical treatments on actual patients

placebo
dummy treatment with no active ingredients

placebo effect
response to a dummy treatment

double-blind
neither the subjects nor the people recording the results know which treatment each subject received

randomized comparative experiment
experiment that randomly assigns subjects to two or more treatments and compares the responses; because the groups are treated alike except for the treatments themselves, it allows cause-and-effect conclusions

control group
group that serves as the baseline for comparison; can be a placebo group or a group given no treatment at all

control
control the effects of lurking variables on the response, most simply by comparing two or more treatments

randomize
use impersonal chance to assign subjects to treatments

statistically significant
observed effect of a size that would rarely occur by chance

comparative
comparing two or more groups; good practice in observational studies as well as experiments

matching
combining comparison with matching when creating a control group, so the control individuals resemble those being studied

nonadherers
subjects who participate but do not follow the experimental treatment

dropouts
those who begin an experiment that continues over an extended period of time but do not complete it

generalizability
whether the results of a study hold accurately for the whole population of interest

completely randomized
experimental design in which all the experimental subjects are allocated at random among all the treatments
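
A minimal sketch of a completely randomized assignment: shuffle the subjects with impersonal chance, then deal them out to the treatments in turn. The function name and the subject and treatment labels are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Randomly allocate every subject to one of the treatments, in near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance, not the experimenter, decides the order
    assignment = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(subject)
    return assignment

subjects = [f"subject_{i}" for i in range(1, 13)]
groups = completely_randomized(subjects, ["drug", "placebo"], seed=7)
print(groups)
```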

matched pair design
compares two treatments using pairs of subjects that are matched as closely as possible

block design
random assignment of subjects to treatments is carried out separately within each block

block
group of experimental subjects that are similar in some way, known before the experiment, that is expected to affect the response to the treatments
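
A block design can be sketched by repeating the random assignment inside each block. The function name `block_design` and the two blocks below are hypothetical examples.

```python
import random

def block_design(blocks, treatments, seed=None):
    """blocks maps a block name to its subjects; randomize separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # a separate randomization for this block
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks of subjects who are similar before the experiment starts.
blocks = {"men": ["m1", "m2", "m3", "m4"], "women": ["w1", "w2", "w3", "w4"]}
plan = block_design(blocks, ["A", "B"], seed=3)
print(plan)
```

Each treatment ends up with the same number of subjects from every block, so the blocking variable cannot be confounded with the treatment.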

measure
to measure a property of a person or thing, assign a number to represent that property

instrument
what we use to make a measurement

units
what we use to record the measurement

variable
the numerical result of a measurement

valid
a variable is a valid measure of a property if it is relevant or appropriate as a representation of that property

predictive validity
can be used to predict success on tasks that are related to the property measured

bias in measurement
systematically tends to overstate or understate the true value of the measured property

random error in measurement
repeated measurements on the same individual give different results

reliable
a measurement is reliable if its random error is small

average in measurement
the average of several repeated measurements of the same individual is more reliable than a single measurement
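
A small simulation can illustrate why averaging helps. It is only a sketch under assumptions not in the notes: readings have no bias, and the random error follows a normal distribution. The function name and the numbers (true value 70, error standard deviation 1) are hypothetical.

```python
import random
import statistics

def spread_of_results(true_value, noise_sd, readings_per_result, trials, seed=0):
    """Spread (standard deviation) of results, each result averaging several readings."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        readings = [rng.gauss(true_value, noise_sd) for _ in range(readings_per_result)]
        results.append(statistics.mean(readings))
    return statistics.stdev(results)

single = spread_of_results(70.0, 1.0, readings_per_result=1, trials=2000)
averaged = spread_of_results(70.0, 1.0, readings_per_result=10, trials=2000)
print(f"single reading spread: {single:.2f}, average-of-10 spread: {averaged:.2f}")
```

The averaged results vary less from trial to trial, which is what "more reliable" means here.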

distribution
the distribution of a variable tells us what values it takes and how often it takes each value

frequency table
 table built from the raw data
 two columns: values and frequency (how often each value occurs)

roundoff errors
rounded entries don't quite add up to the total, which is rounded separately
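
A short sketch that builds a frequency table from raw data and shows a roundoff error in the percent column. The function name `frequency_table` and the six raw values are hypothetical.

```python
from collections import Counter

def frequency_table(raw_data):
    """Return rows of (value, frequency, percent rounded to one decimal place)."""
    counts = Counter(raw_data)
    total = sum(counts.values())
    return [(value, count, round(100 * count / total, 1))
            for value, count in sorted(counts.items())]

raw = ["A", "B", "C", "A", "B", "C"]  # hypothetical raw data
table = frequency_table(raw)
for value, count, percent in table:
    print(value, count, percent)

# Each rounded percent is 33.3, so the column adds to about 99.9 rather than 100.
percent_total = sum(percent for _, _, percent in table)
print(round(percent_total, 1))
```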

pie chart
shows how a whole is divided into parts and forces us to see that the parts make up a whole

bar graph
compares quantities for the categories of a variable using the heights of bars; unlike a pie chart, the categories need not be parts of one whole

categorical variable
places an individual into one of several groups or categories

quantitative variable
takes numerical values for which arithmetic operations like adding and averaging make sense

pictogram
bar graph in which pictures replace the bars; can mislead because the pictures' sizes are not proportional to the values

line graph
displays change over time by plotting each variable against time

histogram
graph of the distribution of a quantitative variable; the bars touch, with no gaps between adjacent classes

center
midpoint of distribution

spread
variability of the data (described apart from outliers)

shape
peaks (unimodal, bimodal, multimodal) and symmetry

right skew (positively skewed)
when the tail goes to the right

left skew (negatively skewed)
when the tail goes to the left

stemplot
stem is on the left and leaves are on the right

median
midpoint of a distribution: the number positioned halfway through the ordered observations

quartiles Q1 and Q3
Q1 is the midpoint of the observations below the median and Q3 is the midpoint of those above it; with the median they divide the observations into quarters

five number summary
min, Q1, median, Q3, max
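
A sketch of computing the five-number summary by hand. It assumes one common textbook convention: the quartiles are the medians of the lower and upper halves, leaving out the overall median when the number of observations is odd. Other software may use slightly different quartile rules. The function names and data are hypothetical, and at least two observations are needed.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the halves on each side of the median."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

odd_summary = five_number_summary([7, 1, 4, 2, 6, 3, 5])
even_summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8])
print(odd_summary)
print(even_summary)
```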

boxplot
graph of the five-number summary

Mean (xbar)
average of set of observations

mode
most frequent number

standard deviation (s)
measures roughly the average distance of the observations from their mean
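
A sketch of the mean, mode, and standard deviation on a small data set. The notes do not give a formula for s; the code assumes the usual sample version, which divides the sum of squared deviations by n - 1 before taking the square root. The data values and the function name are hypothetical.

```python
import math
import statistics

def sample_standard_deviation(values):
    """s = sqrt( sum of (x - mean)^2 / (n - 1) ); needs at least two observations."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar = statistics.mean(data)            # average of the observations
most_frequent = statistics.mode(data)    # most frequent value
s = sample_standard_deviation(data)
print(x_bar, most_frequent, round(s, 3))
```

The hand-written function agrees with `statistics.stdev` from the standard library, which uses the same n - 1 convention.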

