-
individuals
- Individuals can be people,
- animals, plants, or any object of interest.
-
variable
- any characteristic of an
- individual. A variable varies
- among individuals.
-
distribution
- tells us what values the
- variable takes and how often it takes these values.
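A minimal sketch of tabulating a distribution, assuming a small made-up data set (the `siblings` values are hypothetical, not from the course):

```python
from collections import Counter

# Hypothetical data: number of siblings reported by ten individuals.
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

# The distribution: each value the variable takes, and how often it takes it.
counts = Counter(siblings)
n = len(siblings)
distribution = {value: counts[value] / n for value in sorted(counts)}

for value, proportion in distribution.items():
    print(f"{value}: count={counts[value]}, proportion={proportion:.2f}")
```

The same tabulation works for a categorical variable: replace the numbers with category labels and the counts and proportions are computed the same way.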
-
quantitative variable
- Something that takes
- numerical values for which arithmetic operations, such as adding and averaging,
- make sense.
-
categorical variable
- Something that falls into
- one of several categories. What we can record is the count or proportion of
- individuals in each category.
-
Ways to chart categorical data
Bar graphs and pie charts
-
Bar graphs
- Each category is
- represented by one bar. The bar’s height shows the count (or sometimes the
- percentage) for that particular category
-
Pie charts
- Each slice represents a piece of one whole. The size of
- a slice depends on what percent of the whole this category represents.
-
Ways to chart quantitative data
Histograms, stemplots, and line graphs (time plots)
-
Line graphs: time plots
- Use when there is a
- meaningful sequence, like time. The line connecting the points helps emphasize
- any change over time
-
Histograms and stemplots
- These are summary graphs
- for a single variable. They are very useful to understand the pattern of
- variability in the data
-
Histograms
- - The range of values that a
- variable can take is divided into equal size intervals.
- - The histogram shows the
- number of individual data points that fall in each interval.
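The two steps above can be sketched as follows. The `scores` data and the choice of intervals of width 10 starting at 50 are assumptions made for illustration only:

```python
# Hypothetical data: exam scores for twelve students.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 78, 82, 88, 95]

# Equal-size intervals: [50, 60), [60, 70), ..., [90, 100].
low, high, width = 50, 100, 10
n_bins = (high - low) // width
counts = [0] * n_bins

for x in scores:
    # A value equal to `high` is placed in the last interval.
    index = min((x - low) // width, n_bins - 1)
    counts[index] += 1

# Text histogram: one star per data point in each interval.
for i, count in enumerate(counts):
    left = low + i * width
    print(f"{left}-{left + width}: {'*' * count}")
```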
-
stemplots
- - To compare two related distributions, a back-to-back stemplot with common
- stems is useful.
- - Stemplots do not work well for large
- datasets.
- - When the observed values have too many
- digits, trim the numbers before making
- a stemplot.
- - When plotting a moderate number of
- observations, you can split each
- stem.
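A minimal sketch of a basic (unsplit, one-sided) stemplot, assuming two-digit values so that the tens digit is the stem and the ones digit is the leaf. The `ages` data are hypothetical:

```python
from collections import defaultdict

# Hypothetical data: ages of fifteen individuals.
ages = [23, 25, 31, 34, 34, 38, 40, 41, 47, 52, 55, 58, 61, 63, 70]

# Stem = tens digit, leaf = ones digit; leaves are kept in increasing order.
stems = defaultdict(list)
for age in sorted(ages):
    stems[age // 10].append(age % 10)

# Print every stem between the smallest and largest, even if it has no leaves.
lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    lines.append(f"{stem} | {leaves}")
print("\n".join(lines))
```

Each row reads like a histogram bar turned on its side, but the individual values can still be recovered from the leaves.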
-
Interpreting histograms
We can describe the overall pattern of a histogram by its shape, center, and spread.
-
A distribution is symmetric if ...
- the right and left sides
- of the histogram are approximately mirror images of each other.
-
A distribution is skewed to the right
- if the right side of
- the histogram (side with larger values) extends much farther out than the left
- side
-
skewed to the left
- if the left side of
- the histogram extends much farther out than the right side
-
An important kind of
deviation is an outlier
- Outliers are observations that lie outside the overall
- pattern of a distribution
-
A trend is
a rise or fall that persists over time, despite small irregularities.
-
seasonal variation
A pattern that repeats itself at regular intervals of time
-
mean
- add all values, then divide by the number of individuals. It is the “center of
- mass.”
-
median
- the midpoint of a
- distribution—the number such that half of the observations are smaller and half are larger
-
Comparing the mean and the median
- The
- mean and the median are the same when the distribution is exactly symmetric. The
- median is a measure of center that is resistant to skew and outliers. The mean
- is not.
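A small sketch of this resistance, using Python's standard `statistics` module on made-up data (the `salaries` values are hypothetical). Replacing the largest value with an extreme one moves the mean a long way and leaves the median where it was:

```python
from statistics import mean, median

# Hypothetical data: five salaries, in thousands of dollars (symmetric).
salaries = [40, 45, 50, 55, 60]

# Same data with the largest value replaced by an extreme outlier.
with_outlier = salaries[:-1] + [600]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("symmetric data:", mean_before, median_before)
print("with outlier:  ", mean_after, median_after)
```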
-
first quartile, Q1
- the value
- in the sample that has 25% of the data at or below it
-
third quartile, Q3
- the
- value in the sample that has 75% of the data at or below it
-
1.5 * IQR rule for outliers
- An observation is a suspected outlier if it falls more than 1.5
- times the size of the interquartile range (IQR = Q3 - Q1) below the first quartile or
- above the third quartile
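A hedged sketch of the rule: an observation is flagged when it lies more than 1.5 * IQR below Q1 or above Q3, where IQR = Q3 - Q1. The `data` values are hypothetical. Quartile conventions differ between textbooks and software, so the quartiles from `statistics.quantiles` (default method) may not match a hand calculation exactly:

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

low_fence = q1 - 1.5 * iqr    # anything below this is a suspected outlier
high_fence = q3 + 1.5 * iqr   # anything above this is a suspected outlier

outliers = [x for x in data if x < low_fence or x > high_fence]
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("suspected outliers:", outliers)
```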
-
standard deviation s
- - s measures spread about the mean and should be
- used only when the mean is the measure of center.
- - s = 0 only when all observations have the same
- value and there is no spread.
- Otherwise, s > 0.
- - s is not resistant to outliers.
- - s has the same units of measurement as the
- original observations.
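The card does not state the formula, so this sketch assumes the usual sample standard deviation, which divides the sum of squared deviations from the mean by n - 1. The `data` values are hypothetical:

```python
from math import sqrt
from statistics import stdev

# Hypothetical measurements.
data = [4, 8, 6, 5, 7]

n = len(data)
x_bar = sum(data) / n

# Sum of squared deviations from the mean, divided by n - 1, then square root.
s = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

# The standard library computes the same quantity.
s_library = stdev(data)

# When every observation has the same value there is no spread, so s = 0.
s_constant = stdev([3, 3, 3])

print(s, s_library, s_constant)
```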
-
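linear transformations
- do not change the basic shape of a distribution (skew, symmetry,
- multimodality). But they do change the measures of center and spread: for x_new = a + b*x,
- measures of center are multiplied by b and shifted by a, and measures of spread are multiplied by |b|.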
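As a sketch of how center and spread respond, take a linear transformation x_new = a + b*x. Measures of center (mean, median) become a + b times the old value, and the standard deviation is multiplied by |b|. The Celsius readings below are hypothetical; the conversion to Fahrenheit uses a = 32 and b = 1.8:

```python
from statistics import mean, median, stdev

# Hypothetical temperatures in degrees Celsius.
celsius = [10, 15, 20, 25, 40]

# Linear transformation x_new = a + b * x (Celsius to Fahrenheit).
a, b = 32, 1.8
fahrenheit = [a + b * c for c in celsius]

print("mean:  ", mean(fahrenheit), "vs", a + b * mean(celsius))
print("median:", median(fahrenheit), "vs", a + b * median(celsius))
print("stdev: ", stdev(fahrenheit), "vs", abs(b) * stdev(celsius))
```

Adding a shifts the center but leaves the spread unchanged; only the multiplier b rescales the spread.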
-
density curve
- The
- total area under the curve, by definition, is equal to 1, or 100%.
- The area under the
- curve for a range of values is the proportion of all observations for that
- range
-
median of a density curve is
the equal-areas point
- the
- point that divides the area under the curve in half.
-
mean of a density curve is
the balance point
at which the curve would balance if it were made of solid material.
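A numerical sketch of the density-curve ideas above, assuming a simple hypothetical density f(x) = 2x on the interval from 0 to 1 (chosen only because its areas are easy to check by hand). Areas are approximated with a midpoint sum:

```python
STEPS = 100_000


def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def integrate(func, lo, hi, steps=STEPS):
    """Approximate the area under func on [lo, hi] with a midpoint sum."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width


total_area = integrate(density, 0, 1)      # should be close to 1
share = integrate(density, 0, 0.5)         # proportion of observations in [0, 0.5]

median_point = 0.5 ** 0.5                  # equal-areas point for this density
area_below_median = integrate(density, 0, median_point)

# Balance point: the integral of x * f(x) over the whole range.
mean_point = integrate(lambda x: x * density(x), 0, 1)

print(total_area, share, area_below_median, mean_point)
```

This density is skewed to the left, and the mean (about 0.667) sits below the median (about 0.707), pulled toward the long tail.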
-
Normal – or Gaussian –
distributions
- a family of symmetrical,
- bell-shaped density curves defined by a mean μ (mu) and a standard deviation
- σ (sigma): N(μ, σ).
-
z-score
- measures the number of standard deviations that a data value x
- is from the mean μ.
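In symbols, z = (x - μ) / σ. A minimal sketch, where the helper name `z_score` and the example distribution (mean 64.5, standard deviation 2.5) are hypothetical:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical distribution of heights in inches: mean 64.5, standard deviation 2.5.
mu, sigma = 64.5, 2.5

z_above = z_score(67.0, mu, sigma)   # one standard deviation above the mean
z_below = z_score(59.5, mu, sigma)   # two standard deviations below the mean

print(z_above, z_below)
```

A positive z-score means the value is above the mean, and a negative z-score means it is below.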