# Statistics - PAP-575

- Definition of Statistics: The theory, procedures, and methodology by which data are summarized.
- Descriptive Statistics: Numerically describe a phenomenon, such as how many patients are diagnosed with lung cancer per year, or the average cholesterol levels among patients in the control group and patients in the treatment group.
- Inferential Statistics: Make predictions about a population based on a representative sample. (The outcome of the treatment group is assumed to predict how the population will respond.)
- For a normal distribution with a mean of 500 and a standard deviation of 100, a score of 600 is better than what percent of the scores? 84%
- Percent under the normal curve within 1, 2, and 3 standard deviations of the mean: 68%, 95%, 99.7%
- On a normal curve, where are the mean and median? At the center of the curve (the highest point).
- A linear regression result of r² = 0.45 means: 45% of the variation in the response variable can be accounted for by variation in the predictor variable.
- In a positive correlation scatter plot: as x increases, y increases.
- Predicted value: An estimate of the response variable based on the linear model.
- Residual: The difference between the observed and predicted values.
- Observed value: The value of an individual response variable found in a sample.
- In order to use a t-test, the sample data need to have: a unimodal, symmetric distribution.
- In a scatterplot with dots that seem to follow a linear pattern: the variables are correlated.
- t-test: A test for differences between groups, using continuous data, with an unknown population standard deviation.
- z-test: A test for differences between groups, using continuous data, with a known population standard deviation.
- ANOVA (F-test; ANalysis Of VAriance): A test for differences between more than two groups whose group sizes are nearly equal. The data must be normally distributed and the groups must have similar variance. Similar to a t-test.
- Chi-squared test (χ²): A test that can be used for differences or correlations; used with categorical data, non-normal distributions, or small sample sizes.
- Median: The value that splits the distribution in half.
- Mode: The most frequently occurring value.
- Standard Deviation: Roughly the average distance of the values in a sample from its mean; the square root of the variance.
- Correlation: A type of test that checks for relationships between variables.
- Reliability: The degree to which a study's data are consistent.
- Validity: The degree to which a study's data are accurate.
- Qualitative Data Types: Ordinal; not measurable (observations); sex (male/female); positive/negative test results; Likert scale. Studied by non-parametric statistical testing (χ²).
- Likert Scale: 1. Strongly Disagree, 2. Disagree, 3. Neutral, 4. Agree, 5. Strongly Agree
- Nominal: Categorical data type (e.g. urban, suburban, rural).
- Binary: Categorical data type (e.g. dead/alive).
- Ordinal: Categorical data type (e.g. mild, moderate, severe); Likert scale.
- Categorical Data Types: Nominal (urban, suburban, rural); binary (dead/alive); ordinal (mild, moderate, severe).
- Quantitative Data Types: Patient age, weight, cholesterol level (continuous data types).
- Continuous Data Types: Numerical; lab values; integer data; ratio data (contains a true zero). Studied by parametric statistical testing (t-test).
- Variance: The sum of the squared deviations from the mean divided by (n-1). Its square root is the standard deviation.
- Simple random sample: Selecting subjects such that each subject is chosen entirely by chance and every member of the population has an equal chance of being selected.
- Stratified random sample: Selecting subjects such that each subject is randomly chosen from categories of participants that meet predetermined criteria.
- Skewed Distribution: The median is the best summary of the center of the distribution.
- Researchers want to know if regular exercise or good nutrition is more effective for weight loss. What is the best null hypothesis for this study? "Exercise and nutrition are equally effective"
- How many modes does a normal distribution have? 1
- Alpha: The probability of a Type 1 error, that is, the accepted risk of rejecting the null hypothesis when the outcome happened by random chance alone. The p-value is compared to alpha: with alpha = 0.05, p < 0.05 is considered significant.
- Beta (β): The probability of a Type 2 error (false negative); often set at 0.20. Power is defined as 1-β and is the probability that a false null hypothesis is correctly rejected, so that a Type 2 error is avoided.
- Critical value: The numerical value of the test statistic associated with alpha.
- Test statistic: The value that results from the statistical hypothesis test (e.g. t, F, χ²).
- p-value: The proportion of area under the curve beyond the test statistic; the probability of getting a test statistic at least this extreme from chance variability alone if the null hypothesis is true. p < 0.05 is considered significant.
- Publication bias: When an article is not published for some reason other than the quality of the article or the study.
- Type 1 Error: Incorrectly rejecting the null hypothesis (saying there is a difference when in fact there is not). False positive.
- Type 2 Error: Incorrectly failing to reject the null hypothesis. False negative.
- Alpha halves: The risk of Type 1 error is split when a test of equality is performed. If equality is tested, the researchers assume that either group could turn out superior, in which case alpha should be divided in half between the two tails. A test of equality is referred to as a two-tailed test, meaning there are two sides.
- Confidence Interval: The range of expected values that contains the true population parameter with a given percentage likelihood (e.g. 95%). A narrow CI is best.
- Frequency: A descriptive statistic showing the number of observations, or the proportion of observations if expressed as a percentage.
- A correlation coefficient (r) of 1.0 means: the variables are perfectly (100%) positively correlated.
- A variable, other than the predictor variable, that influences the response variable is referred to as: a lurking variable.
- Three questions for data studies: Is the study valid? Are the results important? Can the results help you?
- Positive skew: Right skewed; outliers are at the right (more positive) end of the graph. (Example: age of PA Class of 2016.)
- Negative skew: Left skewed; outliers are at the left (more negative) end of the graph. (Example: age at death.)
- Mann-Whitney U test: A non-parametric test, used when the data do not meet the assumptions of a parametric test such as the t-test.
- SEM (standard error of the mean): SEM = S/√n, so the larger the sample, the smaller the SEM. Because of the square-root relationship, halving the SEM requires 4 times the sample size.
- (Mode < median < mean) describes what kind of distribution? Positively (right) skewed.
- A researcher notes that the 95% confidence interval is very wide in the first component of the clinical trial. Generally speaking, what would be the best way to halve the width of the confidence interval and maintain that level of statistical significance? Quadruple the number of subjects in the study.
- A test comparing the effectiveness of a modified penicillin (94% cure rate) with standard penicillin G (82% cure rate) for treating a pharyngeal abscess generates a p < 0.05. What does this mean? If the new drug were really only as effective as regular penicillin, the probability of seeing a difference in cure rates at least this large by chance alone would be less than 0.05.
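
The square-root relationship in the SEM and confidence-interval cards above can be checked numerically. The following is a minimal Python sketch, assuming a large-sample (z-based) 95% interval. The standard deviation of 100 and the sample sizes are hypothetical values chosen for illustration, and the helper names `sem` and `ci_half_width` are not from the card set.

```python
import math
from statistics import NormalDist


def sem(sd: float, n: int) -> float:
    """Standard error of the mean: SEM = S / sqrt(n)."""
    return sd / math.sqrt(n)


def ci_half_width(sd: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a z-based confidence interval (large-sample assumption)."""
    alpha = 1 - confidence
    # Two-tailed: alpha is split in half between the two tails.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * sem(sd, n)


sd = 100.0  # hypothetical sample standard deviation
for n in (100, 400):  # hypothetical sample sizes; 400 is 4x 100
    print(n, sem(sd, n), round(ci_half_width(sd, n), 1))
# prints:
# 100 10.0 19.6
# 400 5.0 9.8
```

Quadrupling the sample from 100 to 400 halves both the SEM and the interval half-width while the confidence level stays at 95%.
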
- Statistical power addresses what? Type II errors.
- Kaplan-Meier curve: Survivor curve; plots changes over time as they occur.
- With random sampling, we are testing to see if our samples represent: the same underlying population (null hypothesis) or two different underlying populations (alternative hypothesis).
- Bonferroni correction: (p-value of interest)/(number of comparisons). Example: with 200 comparisons, 0.05/200 = 0.00025 is the “equivalent” p-value threshold; a result must fall below it to reject the null hypothesis of no difference.
- Point Estimate: A single value (statistic) that is measured from a sample (e.g. a mean or a proportion).
- What value should not be included in a confidence interval for it to be considered significant? When evaluating a difference, a 95% CI that includes zero is non-significant; when evaluating a ratio, a 95% CI that includes one is non-significant.

Author: Pandora320. ID: 257742. Card Set: Statistics - PAP-575. Description: Statistics. Updated: 2014-01-27T02:06:27Z
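
The Bonferroni arithmetic and the confidence-interval rule from the cards above can be sketched in Python. This is a minimal illustration, not a full analysis: the function names and the example interval bounds are hypothetical, and only the 0.05/200 figure comes from the card set.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold: alpha / number of comparisons."""
    return alpha / n_comparisons


def ci_is_significant(lower: float, upper: float, null_value: float) -> bool:
    """A CI is significant when it excludes the null value.

    Use null_value=0 for a difference and null_value=1 for a ratio.
    """
    return not (lower <= null_value <= upper)


# Example from the card: 0.05 spread over 200 comparisons.
print(f"{bonferroni_threshold(0.05, 200):.5f}")  # 0.00025

# Hypothetical 95% CI for a difference that spans zero: non-significant.
print(ci_is_significant(-1.2, 3.4, null_value=0))  # False

# Hypothetical 95% CI for a ratio that excludes one: significant.
print(ci_is_significant(1.1, 2.3, null_value=1))  # True
```
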