Branch of mathematics that focuses on the organization, analysis, and interpretation of a group of numbers.
*how to prove a point
*numbers to use to advance a cause
*data are not theory neutral
*numbers don't mean anything out of a particular context
*Designed to advance a particular cause and supported by particular backgrounds
Procedures for summarizing a group of scores or otherwise making them more comprehensible
*used to summarize and describe data
*data become succinct and clear
*a way to characterize an overall opinion in one number; a way to get around the mounds of data
*Describing what you actually collected
Procedures for drawing conclusions based on the scores collected in a research study but going beyond them.
*includes methods for generalizing beyond the actual sample data to infer the properties of population data that you as a researcher did not actually collect
Example- effects of drug on memory performance
* is a step beyond descriptive
*consider the assumptions for "generalizability"
* what applies to a smaller group can actually apply to a larger group
characteristic that can have different values
Example: Stress level, age, gender, religion
possible number or category that a score can have
particular person's value on a variable
This is a generic term for whatever is being studied; it is a plural term (data are). It could be social groups (a rugby team), events (basketball games), or organisms (the two-toed tree sloth).
**a set of measurements that are made from the observations you make or the research you conduct.
**anything you are interested in and ask questions about once you have collected data. But having collected data does not mean you can analyze what you have collected.
The original measurements, not things that have been derived.
Raw: Number of suicide attempts (reported)
Derived, transformed: Severity of depression
Sets of Data
what statistics deals with the most. Subsets of populations
Part of a population, a set of data from which we draw conclusions about the population of interest.
*A sample drawn from a large population can be larger than some other, smaller population; size does not define a population
*Samples are often more convenient and practical to use than populations are
*Limited accessibility to subjects
the group that we are interested in; it can be any size, from 5 to an entire country or the species as a whole. What defines it is interest, not size. Results will not be generalized beyond it.
"Everybody", but we have to specify exactly what we are looking for.
*the complete set of data that we want to draw inferences from or make conclusions about.
**all people between the ages of 12 and 15 who smoke cigarettes.
**all Drexel freshmen from Zimbabwe
Quantitative summary characteristics of populations. Deals with populations.
Greek symbols are used to specify parameters
Standard Deviation (Parameters)
standard deviation- σ
Regression weights (Parameters)
regression weights- β
Correlation Coefficients (Parameters)
Correlation Coefficients- ρ
Mean Differences (Parameters)
Mean Differences- ∆
Latin (Roman) letters are used to specify statistics.
Mean - M
Standard Deviation (Statistics)
Standard Deviation- s
One and only one thing
Correlation coefficient (Statistics)
Correlation coefficient- r
Mean difference (Statistics)
Mean difference - d
sometimes a small set of data is of interest for its own sake
Example: Drexel freshmen from Zimbabwe
*here, if only 10 exist and all 10 are participating in your study, you are working with a finite population
*NOTE: you would use parameters to summarize the data of this group.
Ways of obtaining Parameters
*The Random Sample
A case where the entire population is measured via a survey. Measuring everybody in a population like a country, city or state.
*can be completed on a large population
Example: The US Census, The Drexel Men's Basketball Team
The Random Sample
Although it may seem that there is no relation or connection, that doesn't mean the observations aren't related. There may very well be a relation between the stimuli.
*every observation in the population has an equal chance of being included
*the choice of any one observation does not change the likelihood of the choice of any other observation.
Random samples are generally .....
*Not identical to each other
*Not identical to the population
However, the larger a random sample is, the more closely it tends to resemble the population.
any attribute, property, or characteristic of some organism, object, or sample
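The claim that larger random samples resemble the population more closely can be checked with a small simulation. This is a sketch under stated assumptions: the population of 10,000 normally distributed scores (mean 70, SD 10) is invented for illustration, and `average_error` is a hypothetical helper, not a standard function.

```python
import random
import statistics

# Hypothetical population of 10,000 scores (illustrative, not real data)
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.gauss(70, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the population parameter


def average_error(n, trials=200):
    """Average absolute gap between the sample mean and mu over many random samples of size n."""
    gaps = []
    for _ in range(trials):
        # simple random sample: every member has an equal chance of being included
        sample = rng.sample(population, n)
        gaps.append(abs(statistics.mean(sample) - mu))
    return statistics.mean(gaps)


for n in (5, 50, 500):
    print(f"n = {n:>3}: average |M - mu| = {average_error(n):.2f}")
```

Individual samples still differ from each other and from the population; only the typical gap shrinks as n grows.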
*A variable is not a constant. There should be a possibility of difference.
*For a variable of interest, not all members of a population or sample will have the same scores or values on that variable.
Examples- eye color, number of classes attended, score on the first exam, etc.
if y (our variable) represents an observation on some category.
Example- y = mental health status
y1 = depressed, y2 = depressed (different level of severity), y3 = normal
if z (our variable) is something that we can count or measure.
Example: z= number of arrests
z1=4, z2=1, z3= 5
Two kinds of Variables
Independent Variable (IV)
the variable that is controlled or manipulated
*the number of cigarettes smoked per day
*number of hours studying for exam 1
*Gender* (can't change your sex)
*Handedness* (someone could make you switch which hand you write with, but it would be uncomfortable)
A type of variable that is a characteristic of an organism, also called a demographic variable. Typically used as an independent variable.
Examples- Gender, height, religion, beliefs about smoking
Dependent Variable (DV)
the measured variable that is believed to result from manipulation of the independent variable. Something measured, not controlled. A consequence of the IV
Examples of IV vs DV
IV- number of hours studying, mental health status, amount of exercise per week
DV- score on Exam 1, Number of suicide attempts, Average weight loss per week
**Whatever the dependent variable is depends on what the independent variable is
can be exactly measured by counting. It takes on a finite number of values, usually whole numbers. A mean can involve a decimal (we are concerned with groups as a whole not individuals)
*Number correct on first exam- 20
*Number of parking tickets- 5
takes on an infinity of values within some interval, where each value requires an infinite number of numeric characters to specify.
Examples- time, weight
- the same value exists for everything measured (in the sense that, in your observations, what could have been a variable became a constant)
multiple values exist across everything measured
levels differ by category, quality, or characteristic (e.g., one kind of eye color versus another kind of eye color)
- variables differ by amount or quantity (e.g., the amount of time it took you to react to a certain stimulus).
Discrete vs. Continuous
*Discrete variables can be measured exactly
*Continuous variables can be refined ad infinitum
*Materialism and reductionism
classification into mutually exclusive categories
-No logical order is needed, only that the categories differ (male to female or female to male; there is nothing in between).
-Numbers may be used, but only to identify categories.
-distinguishing things by kind (male or female, blue eyes or brown eyes)
NOTE- counting is the only operation you can perform on nominal data; you can't meaningfully average the numbers in these cases. Each observation just adds “one” more to the count of that category or name.
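Counting category membership is the one meaningful operation on nominal data. A minimal sketch with invented eye-color data, using the standard library's `collections.Counter`:

```python
from collections import Counter

# Hypothetical nominal data: eye color of 8 people (illustrative only)
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

# Each observation adds one to the count of its category
counts = Counter(eye_colors)
for color, freq in counts.most_common():
    print(color, freq)

# If categories were coded as numbers (1 = brown, 2 = blue, ...), the codes would be
# labels only; their average would not describe anything real.
```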
*Classification using numbers (though not always) where the numbers:
-represent mutually exclusive quantities
-have ordering based on the relationships of > and <
numbers represent mutually exclusive quantities that have an ordering and have equal steps along the measured variable.
In other words, a 1-point difference in any location along the measured variable is the same as a 1-point difference at any other location.
Examples- temperature in Fahrenheit or Celsius
Numbers represent mutually exclusive quantities that have an ordering, with equal intervals along the measured variable and have the property that a true zero point exists.
*This zero point indicates the total absence of the measured attribute.
*Negative numbers do not exist
Examples- temperature in Kelvin, drug dosage, time elapsed
The central value toward which scores tend; an attempt to describe a distribution succinctly.
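The difference between interval and ratio scales shows up when you take ratios. This sketch uses the standard temperature conversions (the two example temperatures are arbitrary): on Celsius and Fahrenheit the ratio depends on where the arbitrary zero sits, while Kelvin has a true zero.

```python
def c_to_f(c):
    """Celsius to Fahrenheit (both interval scales)."""
    return c * 9 / 5 + 32


def c_to_k(c):
    """Celsius to Kelvin (a ratio scale with a true zero)."""
    return c + 273.15


low_c, high_c = 10.0, 20.0

# Ratios on interval scales change with the choice of zero, so they are not meaningful
ratio_c = high_c / low_c                  # 2.0 in Celsius
ratio_f = c_to_f(high_c) / c_to_f(low_c)  # 68 / 50 in Fahrenheit
# On the Kelvin (ratio) scale the ratio reflects amounts of the attribute itself
ratio_k = c_to_k(high_c) / c_to_k(low_c)

print(ratio_c, ratio_f, round(ratio_k, 3))
```

Differences stay meaningful on every scale: a 10-degree Celsius step is always an 18-degree Fahrenheit step, wherever it occurs.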
*measures of central tendency provide us with a single summary figure that describes the central location of an entire distribution of observations
*measures of central tendency help us to simplify the comparison of two or more groups tested under different conditions.
Most common: Mode, Median, Arithmetic Mean
The most frequent score in the distribution- the score with the highest frequency
In ungrouped distributions: mode is the score that appears with the greatest frequency
In grouped distributions: mode is taken as the midpoint of the class interval that contains the greatest number of scores
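The two rules for the mode can be sketched as follows. The scores are invented, and `grouped_mode` is a hypothetical helper that assumes whole-number scores and equal-width class intervals starting at `lower`; `statistics.multimode` is the standard-library function that returns every most-frequent value.

```python
from collections import Counter
from statistics import multimode

# Ungrouped distribution: the mode is the most frequent score (hypothetical scores)
scores = [3, 5, 5, 6, 7, 7, 7, 9]
print(multimode(scores))           # every score tied for the highest frequency
print(multimode([1, 1, 2, 2, 3]))  # a set of scores can have more than one mode


def grouped_mode(scores, lower, width):
    """Midpoint of the class interval holding the most scores.

    Assumes whole-number scores and intervals [lower, lower + width - 1],
    [lower + width, ...]; ties go to the lowest interval.
    """
    bins = Counter((s - lower) // width for s in scores)
    top = max(sorted(bins), key=bins.get)  # first maximum in ascending order
    start = lower + top * width
    return start + (width - 1) / 2


grouped = [61, 62, 63, 64, 65, 66, 67, 68, 69, 72]
print(grouped_mode(grouped, 60, 5))   # intervals 60-64, 65-69, 70-74
print(grouped_mode(grouped, 60, 10))  # intervals 60-69, 70-79
```

The same data give a grouped mode of 67 with width-5 intervals but 64.5 with width-10 intervals, which illustrates how interval width affects the mode in grouped data.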
Properties of the Mode:
the mode is easy to obtain, but is not very stable from sample to sample.
*in grouped data, the mode may be strongly affected by the width and the location of the class intervals.
There may be more than one mode for a set of scores
With numerical data, the mean or the median is often preferred to the mode
Remember the mode (Mo) is the only measure of central tendency that can be used with nominal data
The Median (Mdn)-
The Median of the distribution is the point along the scale of possible scores below which 50% of the scores fall
In other words: Median is the value that divides the distribution in two halves
How to find the Mdn
*Put scores in rank from lowest to highest
*Make sure to include zero (if it is an actual score)
*if n (or N) is an odd number, the median will be the score that has an equal number of scores below and above it.
* if n is an even number, the median is taken as the point halfway between the two scores that bracket the middle position
Example (n = 6, an even number): 12, 14, 15, 18, 19, 20 → Mdn = (15 + 18) / 2 = 16.5
Two interpretations of the mean:
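The ranking procedure for the median translates directly into code. A minimal sketch (the `median` function here is illustrative; the standard library's `statistics.median` applies the same odd/even rule and is used as a check):

```python
import statistics


def median(scores):
    """Rank the scores, then take the middle one (odd n)
    or the point halfway between the two middle scores (even n)."""
    ranked = sorted(scores)  # rank from lowest to highest; zeros stay in if they are real scores
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]  # equal number of scores below and above
    return (ranked[middle - 1] + ranked[middle]) / 2


print(median([12, 14, 15, 18, 19, 20]))             # even n: halfway between 15 and 18
print(median([12, 14, 15, 18, 19]))                 # odd n: the third score
print(statistics.median([12, 14, 15, 18, 19, 20]))  # standard-library check
```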
“The mean can be viewed as the amount that each person would get if the total amount (not frequency) of the variable being measured were divided up equally” (p.110)
*****Income for faculty
the sum of all deviations around the mean = 0
Use: can be used with any quantitative level of measurement.
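Both ideas about the mean (the equal share, and deviations summing to zero) can be seen in a few lines. The scores are hypothetical:

```python
# Hypothetical scores (illustrative only)
scores = [2, 4, 4, 5, 10]

total = sum(scores)
mean = total / len(scores)  # the equal share each person would get: 25 / 5

deviations = [x - mean for x in scores]
print(mean, deviations, sum(deviations))
```

With less convenient numbers, floating-point rounding can leave a sum like 1e-15 instead of exactly 0; the property itself still holds.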
ways of labeling information (eye color brown eyes vs. blue eyes). Qualities that you have
people vary in terms of an amount of something that they could possess
Mode you use for??
Median you use for ??
*characterizes a distribution by a single score and is not affected by extreme scores; it is only interested in the middle number, the middlemost x. Useful if you're looking at a distribution with extreme scores.
a measure of variability is a single summary figure that describes the spread of observations within a distribution (e.g., eye color: there are different eye colors that occur in our distribution). If everybody has the same eye color, then eye color is a constant.
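The median's resistance to extreme scores, as in the faculty-income example, can be shown with invented salaries (in thousands of dollars; the figures are assumptions for illustration):

```python
from statistics import mean, median

# Hypothetical faculty salaries in thousands of dollars (invented for illustration)
salaries = [60, 62, 65, 70, 75]
with_outlier = salaries + [400]  # add one extreme salary

print(mean(salaries), median(salaries))          # mean and median are close
print(mean(with_outlier), median(with_outlier))  # mean jumps; median barely moves
```

One extreme value pulls the mean from 66.4 to 122, while the median only shifts from 65 to 67.5.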
Measures of variability: What are they?
*the measures of variability express quantitatively the extent to which the scores in a distribution scatter about or cluster together.
*Measures of variability describe the spread of an entire set of scores:
-They do not specify how far a particular score diverges from the center of a group
-They do not provide information about the shape of the distribution or the performance of a group.
*Nomothetic approach to research
- is my measure representative of anything or anyone?
Concerned with measuring variables
*difference between the highest and lowest scores
*Two types: Exclusive and Inclusive
distance between the midpoints of the intervals containing the two most extreme scores (highest score minus the lowest score)
distance between the upper limit of the highest score and the lower limit of the lowest score.
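The exclusive and inclusive ranges differ by one measurement unit. This sketch assumes ungrouped whole-number scores whose real limits lie half a unit on either side of each score; the helper names and scores are hypothetical.

```python
def exclusive_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


def inclusive_range(scores, unit=1):
    """Upper real limit of the highest score minus lower real limit of the lowest.

    Assumes each score's real limits lie half a measurement unit on either side of it.
    """
    upper_limit = max(scores) + unit / 2
    lower_limit = min(scores) - unit / 2
    return upper_limit - lower_limit


scores = [3, 7, 8, 12]
print(exclusive_range(scores), inclusive_range(scores))
```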
Properties of the Range
1- the range is ideal for preliminary work or in other circumstances where precision is not an important requirement.
2- The range is very sensitive to outliers
3- The range is not sensitive to the total condition of the distribution
4- The range is of little use beyond the descriptive level
5- The range depends on sample size: larger samples tend to have a greater range
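The dependence of the range on sample size can be simulated. This is a sketch assuming samples drawn from a standard normal population; `average_range` is a hypothetical helper.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility


def average_range(n, trials=500):
    """Mean range of n draws from a standard normal distribution, over many trials."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / trials


for n in (5, 50, 500):
    print(n, round(average_range(n), 2))
```

The population spread never changes, yet the typical range keeps growing with n because larger samples are more likely to include extreme scores.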
*Negative feature of the Range
-highly sensitive to extreme scores (outliers)
-Sampling fluctuation is extreme
-Magnitude depends on sample size
-Virtually useless in advanced statistics
(a kind of mean: the typical way in which scores differ, or deviate, from the mean)
*if deviation scores provide the distance of each raw score from the mean, the mean of the deviation scores might be an attractive measure of variability
*The sum of all deviations from the mean equals zero
*The unbiased estimate formula for the variance corrects for the tendency of the traditional formula to underestimate the population variance
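The reasoning above (deviations sum to zero, so they are squared; then divide by n or by n - 1) can be traced on hypothetical scores. The standard library's `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1, so they serve as checks.

```python
import math
import statistics

scores = [2, 4, 4, 5, 10]  # hypothetical scores
n = len(scores)
m = sum(scores) / n

deviations = [x - m for x in scores]
print(sum(deviations))  # always 0, so the plain mean of deviations is useless

ss = sum(d ** 2 for d in deviations)  # sum of squared deviations (SS)
var_traditional = ss / n              # divides by n; tends to underestimate the population variance
var_unbiased = ss / (n - 1)           # unbiased estimate divides by n - 1
sd = math.sqrt(var_unbiased)          # standard deviation: square root of the variance

print(var_traditional, var_unbiased, sd)
print(statistics.pvariance(scores), statistics.variance(scores), statistics.stdev(scores))
```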
Properties of the Standard Deviation
The SD is closely related to the arithmetic mean
The SD is the most important of the measures of variability
The SD is responsive to the exact position of every score in the distribution
The SD is very sensitive to the presence of a few extreme scores (thus, for skewed