Copy of `Concept Stew - Statistics for the Terrified: Glossary`

The original wordlist, or the website hosting it, no longer exists. This page preserves a copy of the original information, which may have been taken offline because it was outdated.

Concept Stew - Statistics for the Terrified: Glossary
Category: Mathematics and statistics > Statistics
Date & country: 13/10/2007, UK
Words: 143

Probability
The likelihood of an event happening; must be between 0 and 1.

Proportion
An observed fraction of the total.

Prospective study
By nature a long-term experiment, which plots the progression from an initial state to the particular state under examination. For example, how many first-time prisoners from a particular prison re-offend within one year of release? See also Retrospective study.

Protocol
The protocol of an experiment is a set of rules laid down at the start to impose a rigid structure on everyone involved, to ensure consistency, and to provide a focus for the experiment by defining the decision rule.

Random sample
Every item in the population has the same chance of being selected for a random sample, with no favouritism.

Randomisation
The process of ensuring that a sample is truly random. There are many ways of doing this, and at different levels. At the first level, the entire sample should be chosen according to some strategy that does not unduly favour any particular group of individuals. A further level ensures that the subjects already chosen for the study are randomly allocated to the treatments of the study. This removes any suspicion that unscrupulous experimenters have ensured that their particular hobby horse is tried out on the most susceptible subjects.
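
Both levels can be sketched with Python's standard random module; the subject IDs, sample size, and two-treatment split below are made up for illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Level 1: draw a random sample -- every member of the population
# has the same chance of selection.
population = list(range(1, 101))        # hypothetical subject IDs 1..100
sample = random.sample(population, 10)  # 10 subjects, no favouritism

# Level 2: randomly allocate the chosen subjects to two treatments.
random.shuffle(sample)
treatment_a, treatment_b = sample[:5], sample[5:]

print(sorted(treatment_a))
print(sorted(treatment_b))
```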

Range
The difference between the maximum and minimum values in a sample or population.

Ranking
The data are sorted into numerical order and each value is given a rank according to its position. This is the basis of most non-parametric tests.
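
A minimal sketch of ranking in plain Python, with the usual convention that tied values share the average of the ranks they would otherwise occupy.

```python
def ranks(data):
    """Rank values in ascending order (1-based); tied values share
    the average of the ranks they would otherwise occupy."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    result = [0.0] * len(data)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and data[order[j + 1]] == data[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

print(ranks([3, 1, 4, 1, 5]))  # → [3.0, 1.5, 4.0, 1.5, 5.0]
```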

Regression
Also known as linefitting. A method that finds the best 'line' through a set of plotted points, used to model an outcome variable in terms of a linear combination of predictor variables (also called independent variables). See also Multiple regression.
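
A minimal sketch of the 'best line' idea for a single predictor, using the ordinary least-squares formulas in plain Python; the example points are made up and lie exactly on a line so the answer is easy to check.

```python
def linefit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Points that lie exactly on y = 1 + 2x:
print(linefit([0, 1, 2, 3], [1, 3, 5, 7]))  # → (1.0, 2.0)
```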

Relative risk
For instance, the ratio of the risk of disease in one group compared to another group.

Getting statistical significance is not necessarily the end of the story. In any experiment you would also want to see practical significance, which is not always present. In practical terms, the statistically significant difference may be so small as to be irrelevant. Note that we may see practical significance in a sample, but we cannot conclude that this is a real effect in the population unless we also have statistical significance.

Residual
The difference between the observed value and the value predicted by the model under investigation. In a linefit situation, this would be the vertical difference between the line and the actual point.

Retrospective study
In a retrospective study you work backwards from a condition (eg. disease) to the causal factor. For example, how many of those first-time prisoners who re-offended within a year of release served sentences at a liberal prison as opposed to a traditional prison? See also Prospective study.

Risk
The proportion succumbing to disease in a group, for example.

Sample
A set of individuals chosen (usually randomly) from a larger population.

Sample size
The number of individuals in a sample.

Sampling
The process of drawing a sample of subjects from a population.

sd
See Standard deviation.

se
See Standard error.

Significance
Significance is an English word that has been hijacked by statisticians. In general usage the term significant difference means an important difference, but 'significant' in this sense is subjective. On the other hand, statistical significance is objective, and is based on the concept 'p<0.05'. In an experiment, a difference is detected by challenging a null hypothesis of 'no difference'. When p is less than 0.05, the null hypothesis is rejected; when p is greater than 0.05, the null hypothesis is not rejected. So what does p<0.05 really mean? A p value of less than 0.05 means that there is less than a 0.05 probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis is true. This concept is widely misunderstood, even by experienced users of statistics. See also Probability.

Significance level
The standard significance levels are 5% (p<0.05) and 1% (p<0.01), corresponding to 95% and 99% confidence.

Simple regression
See Linefitting.

Single-blind
A double-blind study, in which neither subject nor evaluator knows what treatment or regime has been administered, reduces the risk of bias (psychological or otherwise) being introduced by either the investigator or the subjects of the study. Single-blind occurs when one of these two is aware of the treatment or regime administered. See also Blind study, Blinded evaluation, Double-blind.

Skewed data
Data which are not symmetrically distributed.

Slope
Also called the gradient. The rate of increase in the vertical-axis variable for a unit change in the horizontal-axis variable.

Standard deviation (sd)
The variance and its square root, the standard deviation, are the pre-eminent statistics used to summarise how much variability there is in a sample or population.
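
Both can be computed with Python's standard statistics module; the data below are made up, chosen so the population variance works out to a whole number.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions (divide the sum of squared deviations by n):
pvar = statistics.pvariance(data)
psd = statistics.pstdev(data)

# Sample versions (divide by n - 1), used when the data are a sample
# drawn from a larger population:
svar = statistics.variance(data)
ssd = statistics.stdev(data)

print(pvar, psd)   # population variance and sd
print(svar, ssd)   # sample variance and sd
```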

Standard error (se)
Standard error is a kind of tolerance around the sample mean, and in many ways the most important concept in statistics. It is related to, but not the same as, standard deviation. Essentially, if you could take infinitely many samples of a particular size from a population, the means of the samples would themselves form a population. The standard error is the standard deviation of this population of sample means. More formally this is called the standard error of the mean. By the same token, there are standard errors of any estimated parameter, eg. the standard error of the intercept and standard error of the gradient.
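
The repeated-sampling idea can be illustrated by simulation with Python's standard library; the population parameters and sample size here are made up. The standard deviation of the simulated sample means should be close to sigma / sqrt(n).

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n = 50.0, 10.0, 25

# Draw many samples of size n and record each sample's mean.
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(20_000)
]

# The sd of the sample means approximates the standard error.
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)  # = 2.0 here

print(empirical_se, theoretical_se)
```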

Subject
This is used in statistics to mean an individual (rather than its more usual meaning of a topic). It need not be a person; depending on the discipline it might, for example, be an animal.

t statistic
The statistic produced by the paired t test and the two sample t test.

Test statistic
A numeric summary value used to determine significance under the decision rule.

Two sample t test
This test is used to compare the means between two separate samples of individuals. It assumes the data are normally distributed.
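
The pooled statistic behind this test can be sketched in plain Python; the two groups below are made up, chosen so the arithmetic gives a simple exact answer.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled two-sample t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pool the two sample variances, weighted by degrees of freedom.
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb)
    )

print(two_sample_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # → -1.0
```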

Twoway analysis of variance
A generalisation of oneway analysis of variance which includes levels of a second factor. This approach also allows for interactions between the factors to be modelled and tested.

Type I error
False positive - wrongly concluding that there is a significant difference. See also Multiple testing (multiplicity).

Type II error
False negative - failing to find a significant difference when it exists.

Unblinded study
A trial in which the investigator knows which patients receive which treatments and each patient is aware of their own particular treatment. See also Blind study.

Uniform distribution
The distribution where everything is equally likely. See also Distribution.

Variable
A measurement which can vary. For instance, height is a measurement which varies from person to person, as opposed to pi which does not vary from circle to circle; that is a constant.

Variance
A summary value indicating the amount of variation within the data. See also Standard deviation.

Welch's approximation
A modified test statistic used in a two sample t test scenario when the variances of the two groups are unequal.
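
A sketch of the modified statistic together with the Welch-Satterthwaite approximation to the degrees of freedom; the example data are made up, chosen so that with equal variances and equal group sizes the result matches the pooled two sample t statistic.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite), for unequal group variances."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na   # squared se contribution of group a
    vb = statistics.variance(b) / nb   # squared se contribution of group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # → (-1.0, 8.0)
```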

Wilcoxon test
This is the non-parametric equivalent of the paired t test. It ranks the raw data into ascending order, then compares the total of the ranks in each group against Wilcoxon tables.
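
The paired (signed-rank) version can be sketched in plain Python, using the same average-rank treatment of ties as under Ranking; the before/after scores are made up for illustration.

```python
def signed_rank_sums(before, after):
    """Wilcoxon signed-rank sums: rank the absolute paired differences
    (zero differences dropped; ties share an average rank), then sum
    the ranks of the positive and of the negative differences."""
    diffs = [b - a for a, b in zip(before, after) if b - a != 0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average 1-based rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical paired scores (before vs after a treatment):
before = [10, 12, 9, 14, 8]
after = [11, 10, 12, 10, 13]
print(signed_rank_sums(before, after))  # → (9.0, 6.0)
```

The smaller of the two sums is the one compared against Wilcoxon tables.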

Wilks-Shapiro test
A test used to determine whether a sample of data could have come from a normally distributed population.

X axis
By convention this is the horizontal axis on a graph.

Y axis
By convention this is the vertical axis on a graph.