
Statistics
Branch of mathematics that focuses on the organization, analysis, and interpretation of a group of numbers.
 *how to prove a point
 *numbers to use to advance a cause
 *data are not theory neutral
 *numbers don't mean anything out of a particular context
 *Designed to advance a particular cause and supported by particular backgrounds

Descriptive Statistics
Procedures for summarizing a group of scores or otherwise making them more comprehensible
 *used to summarize and describe data
 *data is succinct and clear
 *a way to characterize an overall opinion in one number, ways to get around the mounds of data
 *Describing what you actually collected

Inferential Statistics
Procedures for drawing conclusions based on the scores collected in a research study but going beyond them.
*includes methods for generalizing beyond the actual sample data to infer the properties of population data that you as a researcher did not actually collect
Example: effects of a drug on memory performance
 * is a step beyond descriptive
 *consider the assumptions for "generalizability"
 * what applies to a smaller group can actually apply to a larger group

Variable
characteristic that can have different values
Example: Stress level, age, gender, religion

Values
possible number or category that a score can have

Score
particular person's value on a variable

Data
This is a generic term for whatever is being studied; it is a plural term (data are). It could be social groups (rugby team). It could be events (basketball games). It could be organisms (two-toed tree sloth).
**a set of measurements that are made from the observations you make or the research you conduct.
**anything you are interested in and ask questions about, for which you have collected data. But having collected data does not mean you can analyze it.

Raw Data:
The original measurements, not things that have been derived.
 Example
 Raw: Number of suicide attempts (reported)
Derived, transformed: Severity of depression

Sets of Data
 *Samples
 *Populations
 *Parameters

Samples
What statistics deals with most often. Subsets of populations.
Part of a population, a set of data from which we draw conclusions about the population of interest.
*A sample cannot be larger than the population it is drawn from
 *Samples are often more convenient and practical to use than populations are
 *Limited Time
 *Limited resources
 *Limited accessibility to subjects

Populations
the group that we are interested in; can be any size, from 5 people to an entire country. What defines it is interest, not size. We will not generalize beyond it. The species as a whole.
"Everybody" but have to make a distinction of what we are looking for.
*the complete set of data that we want to draw inferences from or make conclusions about.
 Examples
 **all people between the ages of 12 and 15 who smoke cigarettes.
**all Drexel freshmen from Zimbabwe

Parameters
Quantitative summary characteristics of populations. Deals with population
Greek symbols are used to specify parameters

Mean (Parameters)
Mean µ

Standard Deviation (Parameters)
standard deviation σ

Regression weights (Parameters)
regression weights β

Correlation Coefficients (Parameters)
Correlation Coefficients ρ

Mean Differences (Parameters)
Mean Differences ∆

Statistics symbols
Roman (Latin) letters are used to specify statistics.

Mean (Statistics)
Mean  M

Standard Deviation (Statistics)
 Standard Deviation s
 One and only one thing

Correlation coefficient (Statistics)
Correlation coefficient r

Mean difference (Statistics)
Mean difference  d

Finite Populations
sometimes a small set of data is of interest for its own sake
Example: Drexel freshmen from Zimbabwe
*here if only 10 exist and all 10 are participating in your study you are working with a finite population
*NOTE: you would use parameters to summarize the data of this group.

Ways of obtaining Parameters
 *Census
 *The Random Sample

Census
 A case where the entire population is measured via a survey. Measuring everybody in a population like a country, city or state.
 *can be completed on a large population
Example: The US Census, The Drexel Men's Basketball Team

The Random Sample
Although it may seem that there is no relation or connection between observations, that does not mean they are unrelated. There could very well be a relation to each stimulus.
*every observation in the population has an equal chance of being included
*the choice of any one observation does not change the likelihood of the choice of any other observation.

Random samples are generally .....
*Not identical to each other
*Not identical to the population
However, random samples are more like the population the larger the samples are.
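
A minimal sketch of the points above, assuming a made-up population of the integers 1 to 1000 and a simple random sample drawn without replacement (one common reading of the two conditions for a random sample). The names `population` and `sample_mean` are hypothetical, chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 stand in for real scores.
population = list(range(1, 1001))
population_mean = statistics.mean(population)  # the parameter (mu)

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def sample_mean(n):
    """Draw a simple random sample of size n (without replacement) and return its mean."""
    sample = rng.sample(population, n)
    return statistics.mean(sample)


small = sample_mean(10)    # a small sample: its mean may sit far from mu
large = sample_mean(500)   # a larger sample: its mean tends to sit closer to mu
census = sample_mean(len(population))  # measuring everyone is a census

print(population_mean, small, large, census)
```

Sample means generally differ from each other and from the population mean; only the census case is guaranteed to match the parameter exactly.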

Variable
any attribute, property, or characteristic of some organism or object
*A variable is not a constant. There should be a possibility of difference.
*For a variable of interest, not all members of a population or sample will have the same scores or values on that variable.
Examples: eye color, number of classes attended, score on the first exam, etc.

Categorical variable
if y (our variable) represents an observation on some category.
Example: y = mental health status
y1= depressed, y2 = depressed (diff level of severity), y3 = normal

Numeric
if z (our variable) is something that we can count or measure.
Example: z= number of arrests
z1=4, z2=1, z3= 5

Two kinds of Variables
*Dependent variable
*Independent variable

Independent Variable (IV)
the variable that is controlled or manipulated
 Examples
 *the number of cigarettes smoked per day
 *number of hours studying for exam 1
 *Gender* (can't change your sex)
 *Handedness* (you can be made to switch which hand you write with, but it would be uncomfortable)

Organismic Variable
Type of variable, this is a characteristic of an organism. Also called demographic variable. Typically used as an independent variable.
Examples: gender, height, religion, beliefs about smoking

Dependent Variable (DV)
the measured variable that is believed to result from manipulation of the independent variable. Something measured, not controlled. A consequence of the IV.

Examples of IV vs DV
 IV: number of hours studying, mental health status, amount of exercise per week
 DV: score on Exam 1, number of suicide attempts, average weight loss per week
**Whatever the dependent variable is depends on what the independent variable is

Discrete Variable
can be measured exactly by counting. It takes on a countable number of values, usually whole numbers. The mean of a discrete variable can still involve a decimal (we are concerned with groups as a whole, not individuals)
Examples
 *Number correct on first exam: 20
 *Number of parking tickets: 5

Discrete equals
whole numbers

Continuous Variable
takes on an infinity of values within some interval, where a value may require an infinite number of digits to specify exactly.
Examples: time, weight

Constants
 the same value exists for everyone measured (characteristics of your observations that could have been variables became constants)

Variables
multiple values exist across those measured

Qualitative variables
 levels differ by category, quality, or characteristic (one kind of eye color or two kinds of eye color)

Quantitative variables
 variables differ by amount or quantity (the amount of time it took you to react to a certain stimulus).

Discrete vs. Continuous
*Discrete variables can be measured exactly
*Continuous variables are refined ad infinitum
*Materialism and reductionism

Nominal Data
classification into mutually exclusive categories
No logical order is needed, only that the categories differ (male to female or female to male, there is nothing in between).
Numbers may be used, but only to identify categories.
Distinguishing things by kind (male or female, blue eyes or brown eyes)
 NOTE: counting is the only operation you can perform on the data; you can't really average the numbers in these cases. There is “one” more of that category or name.
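
A small sketch of counting as the one operation that makes sense for nominal data, using made-up eye-color observations. The list `eye_colors` and the code table `codes` are hypothetical.

```python
from collections import Counter

# Hypothetical eye-color observations (nominal data).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Counting how many observations fall in each category.
counts = Counter(eye_colors)
print(counts["brown"])  # 3
print(counts["blue"])   # 2
print(counts["green"])  # 1

# Numeric codes only label the categories. Averaging them would
# produce a number with no meaning, so it is deliberately not done here.
codes = {"brown": 1, "blue": 2, "green": 3}
```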

Ordinal Data
*Classification using numbers (though not always) where the numbers:
represent mutually exclusive quantities
have ordering based on the relationships of > and <

Interval data
numbers represent mutually exclusive quantities that have an ordering and have equal steps along the measured variable.
In other words, a 1-point difference at any location along the measured variable is the same as a 1-point difference at any other location.
Examples: temperature in Fahrenheit or Celsius

Ratio Data
Numbers represent mutually exclusive quantities that have an ordering, with equal intervals along the measured variable and have the property that a true zero point exists.
*This zero point indicates the total absence of the measured attribute.
*Negative numbers do not exist
Examples: temperature in Kelvin, drug dosage, time elapsed
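
A rough illustration of why the true zero point matters, assuming two made-up temperatures of 10 and 20 degrees Celsius. The helper `celsius_to_kelvin` is a hypothetical name; the conversion itself (adding 273.15) is standard.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15


cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales: the step is 10 degrees either way.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale, where zero means "none".
ratio_c = warm_c / cold_c  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)  # about 1.035

print(diff_c, diff_k, ratio_c, ratio_k)
```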

Central Tendency
The central value toward which scores tend. Trying to describe a distribution distinctly.
*measures of central tendency provide us with a single summary figure that describes the central location of an entire distribution of observations
*measures of central tendency help us to simplify the comparison of two or more groups tested under different conditions.
Most common: Mode, Median, Arithmetic Mean

Mode
The most frequent score in the distribution; the score with the highest frequency
In ungrouped distributions: mode is the score that appears with the greatest frequency
In grouped distributions: mode is taken as the midpoint of the class interval that contains the greatest number of scores

Properties of the Mode:
the mode is easy to obtain, but is not very stable from sample to sample.
*in grouped data, the mode may be strongly affected by the width and the location of the class intervals.
There may be more than one mode for a set of scores
With numerical data, the mean or the median is often preferred to the mode
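
A short sketch of finding the mode in an ungrouped distribution, including the case of more than one mode. The scores are made up; `statistics.multimode` needs Python 3.8 or later.

```python
import statistics

scores = [3, 5, 5, 7, 8, 8, 9]  # hypothetical ungrouped scores

# multimode returns every score tied for the highest frequency,
# in the order each first appears in the data.
modes = statistics.multimode(scores)
print(modes)  # [5, 8]

# The mode also works for category labels, not just numbers.
eye_colors = ["brown", "blue", "brown"]
print(statistics.mode(eye_colors))  # brown
```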

Remember: the mode (Mo) is the only measure of central tendency that can be used with qualitative (nominal) data

The Median (Mdn)
the middle
The Median of the distribution is the point along the scale of possible scores below which 50% of the scores fall
In other words: Median is the value that divides the distribution in two halves

How to find the Mdn
*Put scores in rank from lowest to highest
*Make sure to include zero (if it is an actual score)
*if n (or N) is an odd number, the median will be the score that has an equal number of scores below and above it.
 *if n is an even number, the median is taken as the point halfway between the two scores that bracket the middle position
 Example: 12, 14, 15, 18, 19, 20 gives a median of (15 + 18) / 2 = 16.5
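
The steps for finding the median can be checked with Python's standard `statistics` module. This sketch uses the six scores from these notes for the even case and drops the last score to get an odd case.

```python
import statistics

even_scores = [20, 12, 19, 14, 18, 15]  # same six scores, deliberately unordered
odd_scores = [12, 14, 15, 18, 19]

# Step 1: rank the scores from lowest to highest.
ranked = sorted(even_scores)
print(ranked)  # [12, 14, 15, 18, 19, 20]

# Even n: halfway between the two middle scores, 15 and 18.
print(statistics.median(even_scores))  # 16.5

# Odd n: the single middle score.
print(statistics.median(odd_scores))  # 15
```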

Two interpretations of the mean:
 “The mean can be viewed as the amount that each person would get if the total amount (not frequency) of the variable being measured were divided up equally” (p.110)
 *Example: income for faculty
 *The mean is the balance point of the distribution: the sum of all deviations around the mean = 0
Use: can be used with any quantitative level of measurement.
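
Both interpretations of the mean can be seen in a small worked example with four made-up scores.

```python
scores = [2, 4, 6, 8]  # hypothetical scores

# Interpretation 1: divide the total amount up equally.
total = sum(scores)          # 20
mean = total / len(scores)   # 5.0

# Interpretation 2: the deviations around the mean cancel out.
deviations = [x - mean for x in scores]  # [-3.0, -1.0, 1.0, 3.0]

print(mean)             # 5.0
print(sum(deviations))  # 0.0
```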

Qualitative data
ways of labeling information (eye color brown eyes vs. blue eyes). Qualities that you have

Quantitative data
people vary in terms of an amount of something that you could possess

Mode you use for??
for qualitative

Median you use for ??
For quantitative
*only characterizes a distribution by a single score. Does not care about extreme scores; only interested in the middle number, the middlemost X. Useful if you're looking at a distribution with extreme scores.
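
A quick sketch of how one extreme score moves the mean but not the median. The two income lists (in thousands) are made up for illustration.

```python
import statistics

typical = [30, 35, 40, 45, 50]        # hypothetical incomes, no extreme score
with_extreme = [30, 35, 40, 45, 500]  # same list with one extreme score

print(statistics.mean(typical), statistics.median(typical))            # 40 40
print(statistics.mean(with_extreme), statistics.median(with_extreme))  # 130 40
```

The median stays at 40 in both lists, while the mean is pulled from 40 up to 130 by the single extreme score.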

Variability
a measure of variability is a single summary figure that describes the spread of observations within a distribution (for eye color, there are different types of eye color that occur in our distribution). If everybody has the same eye color, then that is a constant.

Measures of variability: What are they?
*the measures of variability express quantitatively the extent to which the scores in a distribution scatter about or cluster together.
 *Measures of variability describe the spread of an entire set of scores:
o They do not specify how far a particular score diverges from the center of a group
o They do not provide information about the shape of the distribution or the performance of a group.

*Nomothetic approach to research
 is my measure representative of anything or anyone?
Concerned with measuring variables

Range
*difference between the highest and lowest scores
*Two types: Exclusive and Inclusive

Exclusive range
distance between the midpoints of the intervals containing the two most extreme scores (highest score minus the lowest score)

Inclusive range:
distance between the upper limit of the highest score and the lower limit of the lowest score.
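
A minimal sketch of the two kinds of range, assuming whole-number scores whose real limits extend half a measurement unit on either side of each score. The function names and the `unit` parameter are hypothetical.

```python
def exclusive_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


def inclusive_range(scores, unit=1):
    """Upper limit of the highest score minus lower limit of the lowest score.

    Assumes each score's limits sit half a measurement unit above and below it.
    """
    upper_limit = max(scores) + unit / 2
    lower_limit = min(scores) - unit / 2
    return upper_limit - lower_limit


scores = [12, 14, 15, 18, 19, 20]
print(exclusive_range(scores))  # 8
print(inclusive_range(scores))  # 9.0
```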

Properties of the Range
 1 The range is ideal for preliminary work or in other circumstances where precision is not an important requirement.
2 The range is very sensitive to outliers
3 The range is not sensitive to the total condition of the distribution
4 The range is of little use beyond the descriptive level
5 The range depends on sample size: a greater sample size tends to mean a greater range

*Negative features of the Range
highly sensitive to extreme scores (outliers)
Sampling fluctuation is extreme
Magnitude depends on sample size
Virtually useless in advanced statistics

The Variance
(a kind of mean: a typical way in which scores differ/deviate)
*if deviation scores provide the distance of each raw score from the mean, the mean of the deviation scores might be an attractive measure of variability
BUT: Remember!
*The sum of all deviations from the mean equals zero
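
Because the deviations always sum to zero, their plain mean is useless as a measure of spread. The usual way around this is to square each deviation before averaging. A sketch with eight made-up scores, treating them as a whole population (dividing by N):

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n             # 5.0

deviations = [x - mean for x in scores]
print(sum(deviations))             # 0.0, so the mean deviation says nothing

# Squaring removes the signs, so the deviations no longer cancel.
squared = [d ** 2 for d in deviations]
variance = sum(squared) / n        # 32 / 8 = 4.0
sd = math.sqrt(variance)           # 2.0

print(variance, sd)
```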

UBE
*The unbiased estimate formula for the variance corrects for the tendency of the traditional formula to underestimate the population variance
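
A sketch comparing the two formulas on made-up sample data. It assumes the "traditional formula" divides the sum of squared deviations by n and the unbiased estimate divides by n - 1, which is the usual convention; the variable names are hypothetical.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample scores
n = len(sample)
mean = sum(sample) / n
sum_of_squares = sum((x - mean) ** 2 for x in sample)  # 32.0

traditional = sum_of_squares / n     # divides by n
unbiased = sum_of_squares / (n - 1)  # divides by n - 1, so it is slightly larger

# The standard library offers both: pvariance divides by n, variance by n - 1.
print(traditional, statistics.pvariance(sample))
print(unbiased, statistics.variance(sample))
```

Dividing by the smaller number n - 1 makes the estimate a little larger, which offsets the tendency of the n formula to come out too small.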

Properties of the Standard Deviation
The SD is closely related to the arithmetic mean
The SD is the most important of the measures of variability
The SD is responsive to the exact position of every score in the distribution
The SD is very sensitive to the presence of a few extreme scores (thus, for skewed distributions it may not be the best measure of variability)

