Tuesday, April 13, 2010

Statistical Theory

Note: I will keep this post archived in the "Key Blog Posts" section of the column to the right for quick reference by users. The latest revisions to the post will be marked via italicized, red text. I will add content as posts require.

In general, the hyperlinked term will take you to a Wikipedia article on the topic with far more detail. My text within this archived post will provide a short, layman's explanation of the term. I will also list any specific posts that go into the topic in more detail.

General Theory
  • parametric statistics: a branch of general statistical theory in which an underlying distribution type (Weibull, normal, etc.) for a particular data set is assumed, and thus a specific set of test assumptions is used when calculating test statistics from the data set. By building assumed distributions into the test statistics, more precise calculations or projections can be made. Compared to other statistical methods, however, parametric statistics require the user to perform more checks of the data to ensure the underlying assumptions are correct.
  • non-parametric statistics: a branch of general statistical theory that does not require the user to assume that sample data has been taken from a particular distribution type. Non-parametric tests are often used when the original data neither fits a parametric distribution nor can be transformed into one, or when using rank-ordered or some other non-continuous data type. Non-parametric statistical tests provide less precise calculations than parametric tests, but they are also more "forgiving".
  • Null (Ho) and alternative (Ha) hypotheses: The null hypothesis is one that is assumed to be true at the outset of a test and often represents the default position, such as there being no difference between two data sets. Statistical tests are then performed to determine how likely the observed sample data would be if the null hypothesis were true (this likelihood is reflected in the p-value). If the p-value is sufficiently small, one can safely reject the null hypothesis and accept the alternative hypothesis (example: one population is greater than the other).
  • p-value: Used in statistical tests, the p-value is an estimate of the likelihood of obtaining a result at least as extreme as the one observed in the test, assuming the null hypothesis is true. The p-value helps one assess the risk of falsely rejecting the null hypothesis. Common statistical practice sets the critical p-value at 0.05, meaning one accepts less than a 5% chance of incorrectly rejecting the null hypothesis. P-values are calculated differently for each test and its assumed probability distribution.
  • normality or normal distribution: a distribution of data centered around a mean that can be characterized by its spread from the mean via its standard deviation. This is the most commonly discussed statistical distribution, often referred to as a "bell-shaped curve". The most frequently used statistical analyses - t-tests, regression, and other parametric statistical tests - rely on the assumption that the data being studied are normally distributed. Thus, ensuring normality in the underlying data set via something like an Anderson-Darling test is a prerequisite to beginning such parametric tests.
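To make the null hypothesis / p-value mechanics above concrete, here is a minimal Python sketch of a two-sided one-sample z-test - a parametric test that assumes the data are normally distributed with a known standard deviation. The sample values, hypothesized mean, and sigma are all made up for illustration:

```python
import math

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test. Ho: population mean equals mu0.

    Assumes the data are normally distributed and the population
    standard deviation `sigma` is known (a parametric assumption).
    Returns the z statistic and the two-sided p-value.
    """
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal distribution:
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical measurements; is the mean different from 5.0?
data = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
z, p = one_sample_z_test(data, mu0=5.0, sigma=0.2)
if p < 0.05:
    print(f"p = {p:.3f}: reject Ho")
else:
    print(f"p = {p:.3f}: fail to reject Ho")
```

Here the p-value (about 0.48) is well above the 0.05 critical value, so the sample gives no grounds to reject the null hypothesis that the mean is 5.0.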
Parametric Analysis

This section of the post will be completed at a later date.

Non-Parametric Analysis
  • Mann-Whitney Test: A test used to determine whether or not two sample sets of data come from the same distribution. It is similar to a two-sample t-test, but does not require the data to be normally distributed. The test has an Ho = distributions of both groups are the same, while Ha = distributions of the groups are different. Thus, it can be used to tell whether one population is greater or less than another in a statistical sense.
  • Levene's Test: A test used to determine whether or not the variances of two samples are the same. It is similar to the F-Test, but it does not require that the underlying data be normally distributed. It is especially useful in evaluating whether or not two sample populations meet the equal variances requirements of other statistical tests, or to evaluate an increase in spread or disparity when that is the statistic of concern.
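To illustrate how a rank-based test like the Mann-Whitney avoids distributional assumptions, below is a minimal Python sketch of the U statistic with a normal-approximation p-value. This is a bare-bones illustration (no tie correction for the variance, and the normal approximation is only reasonable for moderate sample sizes); in practice a library routine such as scipy.stats.mannwhitneyu is the better choice. The sample data are invented:

```python
import math

def mann_whitney_u(x, y):
    """Mann-Whitney U test, two-sided, via the normal approximation.

    Ho: the two samples come from the same distribution. Works on
    ranks only, so no normality assumption is needed. A minimal
    sketch: no tie correction, no continuity correction.
    """
    combined = sorted((v, i) for i, v in enumerate(x + y))
    # Assign ranks (1-based), averaging ranks across tied values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])            # rank sum for the first sample
    u1 = r1 - n1 * (n1 + 1) / 2     # U statistic for the first sample
    mu = n1 * n2 / 2                # mean of U under Ho
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, p

# Completely separated samples give the most extreme U (here, 0):
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(f"U = {u}, p = {p:.3f}")
```

A small p-value here would suggest the two groups do not share the same distribution; for Levene's test the analogous library routine is scipy.stats.levene.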
