Copy of `Concept Stew - Statistics for the Terrified: Glossary`
The original word list, or the website that hosted it, no longer exists. This page preserves a copy of the original information, which may have been taken offline because it is outdated.
Concept Stew - Statistics for the Terrified: Glossary
Category: Mathematics and statistics > Statistics
Date & country: 13/10/2007, UK. Words: 143
Probability: The likelihood of an event happening; must be between 0 and 1.
Proportion: An observed fraction of the total.
Prospective study: By nature a long-term experiment, which plots the progression from an initial state to the particular state under examination. For example, how many first-time prisoners from a particular prison re-offend within one year of release? See also Retrospective study.
Protocol: The protocol of an experiment is a set of rules laid down at the start to impose a rigid structure on everyone involved, to ensure consistency, and to provide a focus for the experiment by defining the decision rule.
Random sample: Every item in the population has the same chance of being selected for a random sample, with no favouritism.
Randomisation: The process of ensuring that a sample is truly random. There are many ways of doing this, at different levels. The entire sample should be chosen according to some strategy that does not unduly favour any particular group of individuals. A further level ensures that the subjects already chosen for the study are randomly allocated to the treatments of the study. This removes any suspicion that unscrupulous experimenters ensure that their particular hobby horse is tried out on the most susceptible subjects.
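As an illustration of random sampling and random allocation (not part of the original glossary), the sketch below uses Python's standard random module; the subject names and group sizes are made up for the example.

```python
import random

# Hypothetical population of subject identifiers (illustrative only).
population = [f"subject_{i}" for i in range(1, 101)]

# Random sample: every subject has the same chance of being chosen.
sample = random.sample(population, k=20)

# Randomisation: allocate the chosen subjects to treatments at random.
random.shuffle(sample)
treatment_group = sample[:10]
control_group = sample[10:]

print(treatment_group)
print(control_group)
```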
Range: The difference between the maximum and minimum values in a sample or population.
Rank: The data is sorted into numerical order and each value is given a rank according to its position. This is the basis of most non-parametric tests.
Regression: Also known as linefitting. A method that finds the best 'line' through a set of plotted points, used to model an outcome variable in terms of a linear combination of predictor variables (also called independent variables). See also Multiple regression.
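A minimal linefitting sketch, assuming SciPy is available; the x and y values are invented for illustration. It also computes the residuals discussed under Residual below.

```python
from scipy import stats

# Invented data: predictor (x) and outcome (y) values.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

# Fit the best straight line through the plotted points.
fit = stats.linregress(x, y)
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}")

# Residuals: vertical differences between each point and the fitted line.
residuals = [yi - (fit.intercept + fit.slope * xi) for xi, yi in zip(x, y)]
print(residuals)
```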
Relative risk: The ratio of the risk in one group to the risk in another; for instance, the ratio of the risk of disease in one group compared to that in another group.
Relevance: Getting statistical significance is not necessarily the end of the story. In any experiment you would also want to see practical significance, which is not always present. In practical terms, a statistically significant difference may be so small as to be irrelevant. Note that we may see practical significance in a sample, but we cannot conclude that this is a real effect in the population unless we also have statistical significance.
Residual: The difference between the observed value and the value predicted by the model under investigation. In a linefit situation, this would be the vertical difference between the line and the actual point.
Retrospective study: In a retrospective study you work backwards from a condition (eg. disease) to the causal factor. For example, how many of those first-time prisoners who re-offended within a year of release served sentences at a liberal prison as opposed to a traditional prison? See also Prospective study.
Risk: The proportion succumbing to disease in a group, for example.
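A small worked example of risk and relative risk; the counts below are hypothetical and are used only to show the arithmetic.

```python
# Hypothetical counts: cases of disease out of the number in each group.
cases_group_1, n_group_1 = 30, 200   # risk = 0.15
cases_group_2, n_group_2 = 10, 200   # risk = 0.05

risk_group_1 = cases_group_1 / n_group_1
risk_group_2 = cases_group_2 / n_group_2

# Relative risk: ratio of the risk in one group to the risk in the other.
relative_risk = risk_group_1 / risk_group_2
print(risk_group_1, risk_group_2, relative_risk)  # 0.15 0.05 3.0
```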
Sample: A set of individuals chosen (usually randomly) from a larger population.
Sample size: The number of individuals in a sample.
Sampling: The process of drawing a sample of subjects from a population.
sd: See Standard deviation.
se: See Standard error.
Significance: Significance is an English word that has been hijacked by statisticians. In general usage a significant difference means an important difference, but 'significant' in this sense is subjective. Statistical significance, on the other hand, is objective, and is based on the concept 'p<0.05'. In an experiment, a difference is detected by challenging a null hypothesis of 'no difference'. When p is less than 0.05, the null hypothesis is rejected; when p is greater than 0.05, it is not rejected. So what does p<0.05 really mean? A p value of less than 0.05 means that, if the null hypothesis were true, the probability of obtaining a test statistic at least as extreme as the one observed would be less than 0.05. This concept is widely misunderstood, even by experienced users of statistics. See also Probability.
Significance level: The standard significance levels are p<0.05 (corresponding to 95% confidence) and p<0.01 (corresponding to 99% confidence).
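A sketch of the decision rule at the p<0.05 level, assuming SciPy; the sample values and the hypothesised mean of 100 are invented for the example.

```python
from scipy import stats

# Invented sample, tested against a null hypothesis that the mean is 100.
sample = [103, 98, 105, 110, 99, 104, 107, 101, 106, 102]
result = stats.ttest_1samp(sample, popmean=100)

alpha = 0.05  # significance level, p < 0.05
if result.pvalue < alpha:
    print(f"p = {result.pvalue:.3f}: reject the null hypothesis")
else:
    print(f"p = {result.pvalue:.3f}: do not reject the null hypothesis")
```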
Simple regression: See Linefitting.
Single-blind: A double-blind study, in which neither subject nor evaluator knows what treatment or regime has been administered, reduces the risk of bias (psychological or otherwise) being introduced by either the investigator or the subjects of the study. Single-blind occurs when one of these two is aware of the treatment or regime administered. See also Blind study, Blinded evaluation, Double-blind.
Skewed: Data which is not symmetrically distributed.
Slope: Also called the gradient. The rate of increase in the vertical-axis variable for a unit change in the horizontal-axis variable.
Standard deviation (sd): The variance and its square root, the standard deviation, are the pre-eminent statistics used to summarise how much variability there is in a sample or population.
Standard error (se): Standard error is a kind of tolerance around the sample mean, and in many ways the most important concept in statistics. It is related to, but not the same as, standard deviation. Essentially, if you could take infinitely many samples of a particular size from a population, the means of the samples would themselves form a population. The standard error is the standard deviation of this population of sample means. More formally this is called the standard error of the mean. By the same token, there are standard errors for any estimated parameter, eg. the standard error of the intercept and the standard error of the gradient.
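A simulation sketch of the standard error of the mean, assuming NumPy: repeatedly drawing samples from a population and taking the standard deviation of the resulting sample means gives a value close to sd divided by the square root of the sample size. All numbers here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: normally distributed, mean 100, sd 15.
population_sd, sample_size = 15, 25

# Draw many samples of the same size and record each sample mean.
sample_means = [rng.normal(100, population_sd, sample_size).mean()
                for _ in range(10_000)]

# Standard error of the mean: the sd of this population of sample means.
print(np.std(sample_means))                  # empirical standard error
print(population_sd / np.sqrt(sample_size))  # theoretical sd / sqrt(n) = 3.0
```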
Subject: This is used in statistics to mean an individual (rather than its more usual meaning of a topic). It need not be a person; depending on the discipline, it might, for example, be an animal.
t statistic: The statistic produced by the paired t test and the two sample t test.
Test statistic: A numeric summary value used to determine significance under the decision rule.
Two sample t test: This test is used to compare the means between two separate samples of individuals. It assumes the data are normally distributed.
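A sketch of a two sample t test using SciPy's ttest_ind; the two groups of measurements are invented for the example.

```python
from scipy import stats

# Invented measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0]
group_b = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0, 6.2]

# Standard two sample t test (assumes equal variances and normal data).
result = stats.ttest_ind(group_a, group_b)
print(result.statistic, result.pvalue)
```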
Twoway analysis of variance: A generalisation of oneway analysis of variance which includes levels of a second factor. This approach also allows for interactions between the factors to be modelled and tested.
Type I error: False positive - wrongly concluding that there is a significant difference. See also Multiple testing (multiplicity).
Type II error: False negative - failing to find a significant difference when it exists.
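A simulation sketch of the Type I error rate, assuming NumPy and SciPy: when the null hypothesis is true, testing at p<0.05 produces a false positive roughly 5% of the time. The group sizes and distributions are arbitrary choices for the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
false_positives = 0
n_trials = 2_000

for _ in range(n_trials):
    # Both groups come from the same distribution, so the null hypothesis is true.
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1  # Type I error: a 'significant' difference by chance

print(false_positives / n_trials)  # roughly 0.05
```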
Unblinded: A trial in which the investigator knows which patients receive which treatments and each patient is aware of their own particular treatment. See also Blind study.
Uniform distribution: The distribution where everything is equally likely. See also Distribution.
Variable: A measurement which can vary. For instance, height is a measurement which varies from person to person, as opposed to pi which does not vary from circle to circle; that is a constant.
Variance: A summary value indicating the amount of variation within the data. See also Standard deviation.
Welch's approximation: A modified test statistic used in a two sample t test scenario when the variances of the two groups are unequal.
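With SciPy's ttest_ind, Welch's approximation is requested by passing equal_var=False; the data below are of the same invented kind as in the two sample t test sketch above, but with visibly different spreads.

```python
from scipy import stats

# Invented groups with clearly unequal variances.
group_a = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1]
group_b = [4.0, 7.5, 5.5, 8.0, 3.5, 6.5, 7.0]

# equal_var=False applies the Welch-corrected two sample t test.
result = stats.ttest_ind(group_a, group_b, equal_var=False)
print(result.statistic, result.pvalue)
```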
Wilcoxon test: This is the non-parametric equivalent of the paired t test. It ranks the differences between the paired values in ascending order of magnitude, then compares the totals of the ranks for positive and negative differences against Wilcoxon tables.
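A sketch of the Wilcoxon signed-rank test with SciPy, used here in place of printed Wilcoxon tables; the paired before/after values are invented.

```python
from scipy import stats

# Invented paired measurements, e.g. before and after a treatment.
before = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.2, 12.0]
after  = [11.5, 11.0, 12.2, 12.9, 11.2, 11.8, 12.6, 11.7]

# Non-parametric alternative to the paired t test, based on ranked differences.
result = stats.wilcoxon(before, after)
print(result.statistic, result.pvalue)
```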
Wilks-Shapiro test: A test used to determine whether a sample of data could have come from a normally distributed population.
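A sketch of this test as exposed in SciPy (where it appears as scipy.stats.shapiro, usually under the name Shapiro-Wilk); the sample values are invented.

```python
from scipy import stats

# Invented sample to be checked for normality.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

statistic, pvalue = stats.shapiro(sample)
# A small p value suggests the data did not come from a normal population.
print(statistic, pvalue)
```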
X axis: By convention this is the horizontal axis on a graph.
Y axis: By convention this is the vertical axis on a graph.