## Medical Statistics

A population is everyone in the group you want to draw conclusions about. A sample is drawn from the population and used to infer truths about it, and randomisation is important to ensure the sample is representative.

Frequency is the count of individuals with a particular quality.

• Proportional or relative frequency is the proportion of the study group having that particular quality
• Cumulative frequency for a value is the number of subjects with values less than or equal to that value. This is used only for qualitative variables that can be ordered, or for quantitative variables
• The mode of a continuous variable is the most frequent value
• The median is the central value; it divides the distribution into two equal halves (the 2nd quartile)
• Quantiles divide up the distribution and help to describe it further.
• Quartiles divide the distribution into four equal quarters by finding the points below which 25%, 50% and 75% of the data lie. The interquartile range is the distance between the 1st and 3rd quartiles; it covers the middle 50% of the observations and so gives a measure of how spread out the data are around the central tendency.
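The definitions above can be sketched with Python's standard library. The scores below are made-up illustrative data, not real measurements, and the "inclusive" quartile method is only one of several conventions in use.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical ordinal pain scores (1-5) for 12 patients; illustrative only.
scores = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5]
n = len(scores)

# Frequency: count of individuals with each value.
frequency = Counter(scores)

# Relative frequency: proportion of the study group with each value.
relative = {v: c / n for v, c in sorted(frequency.items())}

# Cumulative frequency: number of subjects with a value <= v.
cumulative = {}
running = 0
for v in sorted(frequency):
    running += frequency[v]
    cumulative[v] = running

# Mode: the most frequent value.
mode_value = frequency.most_common(1)[0][0]

# Quartiles (the 2nd quartile is the median) and interquartile range.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print(frequency, relative, cumulative, mode_value, (q1, q2, q3), iqr)
```

Different software packages interpolate quartiles slightly differently, so small discrepancies in `q1` and `q3` between tools are expected.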

Unimodal distributions are frequency distribution curves with one peak (the mode):

1. Symmetrical and bell-shaped – normal distribution
2. Positively skewed or skewed to the right (the long tail is on the right)
3. Negatively skewed or skewed to the left (the long tail is on the left)

Plot the data – visual interpretation allows us to see patterns, shape, range and mode. Distribution-free or non-parametric methods are used when the plotted data show no apparent pattern. For data where patterns can be recognised (particularly those where frequencies are clustered around the middle value in a symmetrical way), we can describe:

• Central tendency = average. Typical measures of this are the mode, mean, and median (middle value)
• Dispersion – how far each value is away from the central tendency, measured by the standard deviation (SD), which is calculated as follows:
1. Square the deviations from the mean and add them \[sum of squares\]
2. Divide by the degrees of freedom \[n-1\] \[once the mean has been calculated from the data points, one degree of freedom is used up: only n-1 of the deviations are free to vary\]
3. The square root of the answer = SD
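The three steps can be followed by hand in a few lines of Python. The blood pressure readings are hypothetical, chosen only so the arithmetic is easy to check; `statistics.stdev` is used at the end as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical systolic blood pressures (mmHg); illustrative data only.
values = [118, 122, 125, 130, 135]

m = mean(values)

# Step 1: square the deviations from the mean and add them (sum of squares).
sum_of_squares = sum((x - m) ** 2 for x in values)

# Step 2: divide by the degrees of freedom (n - 1).
variance = sum_of_squares / (len(values) - 1)

# Step 3: the square root of the answer is the SD.
sd = sqrt(variance)

print(m, sum_of_squares, variance, sd)
print(stdev(values))  # library value, should agree with sd
```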

Standard deviations are important because, in a normal distribution:

• About 68% of observations fall within 1 SD either side of the mean
• About 95% of observations fall within 2 SDs either side of the mean
• More than 99% of observations fall within 3 SDs either side of the mean
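These proportions can be checked against the cumulative distribution function of a standard normal distribution. This is a minimal sketch using `statistics.NormalDist` (Python 3.8+); the helper name `proportion_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {proportion_within(k):.4f}")
```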

Normal distribution is special in that the mean, median and mode are all the same; dispersion is completely symmetrical. It is important as:

• Many characteristics are normally (or nearly normally) distributed (e.g. height, weight)
• Well selected samples can lead to useful inferences about the whole population
• There are very versatile statistical methods that use the features of this distribution
• The Central Limit Theorem: if random samples are taken from a normal distribution, then the distribution of sample means will also be normal
• Further, by the same theorem, even if the underlying distribution is not normal, the distribution of sample means will be approximately normal provided the samples are large enough – allowing a further useful application of statistical methods
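A small simulation can illustrate the Central Limit Theorem. The sketch below assumes a strongly right-skewed population (an exponential distribution, chosen purely for illustration) and compares the skewness of sample means for very small and for larger samples. The function names and sample sizes are arbitrary choices, not part of the original notes.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, n_samples=2000):
    """Means of repeated random samples from a skewed (exponential) population."""
    return [
        mean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

def skewness(xs):
    """Simple moment-based skewness: 0 for a perfectly symmetrical distribution."""
    m = mean(xs)
    s = pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

means_small = sample_means(sample_size=2)    # very small samples
means_large = sample_means(sample_size=50)   # larger samples

print("skewness of means, n=2: ", skewness(means_small))
print("skewness of means, n=50:", skewness(means_large))
```

The means of the larger samples should be noticeably less skewed, i.e. closer to the symmetrical shape of a normal distribution, while the means of very small samples still reflect the skew of the population.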

The t-distribution is related to the normal distribution and is designed to reflect our increased uncertainty about the truth when we are dealing with relatively small samples: small samples of the population are likely not to look normal, and repeated small samples may not look like each other. The t-test is used to see if the apparent difference between the means of two samples is 'real' or if it's just due to random variation.
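As a rough sketch, a t statistic for two small samples can be calculated as follows. The notes do not say which form of the t-test is meant, so this assumes the classic two-sample (Student's) version with equal variances. Both groups are hypothetical data and `pooled_t` is a made-up helper name.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements in two small groups; illustrative data only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4]

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances, with its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    # Standard error of the difference between the two means.
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df

t_stat, df = pooled_t(group_a, group_b)
print(t_stat, df)
```

The t statistic is then compared with the t-distribution for the given degrees of freedom. Published t tables give a two-sided 5% critical value of about 2.31 for 8 degrees of freedom, so a larger absolute value would be called significant at that level.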

## Probability

A probability always lies between 0.0 and 1.0. So, when an event for a random variable never happens the probability is zero. When it always happens the probability is 1.0.

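• Multiplication rule – when two events are independent (e.g. rolling 2 dice), the probability that both occur is the product of their individual probabilities (e.g. two sixes: 1/6 × 1/6 = 1/36)
• Addition rule – when two events are mutually exclusive, the probability that either occurs is the sum of their individual probabilities (e.g. getting 5 or 6 on one roll: 1/6 + 1/6 = 1/3)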
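Both rules can be checked with exact fractions for fair six-sided dice. The "two sixes" case is an added illustration of the multiplication rule, and the enumeration at the end is only a cross-check.

```python
from fractions import Fraction
from itertools import product

p_face = Fraction(1, 6)  # probability of any one face on a fair die

# Addition rule (mutually exclusive events): 5 or 6 on one roll.
p_five_or_six = p_face + p_face

# Multiplication rule (independent events): a six on each of two dice.
p_two_sixes = p_face * p_face

# Cross-check by listing all 36 equally likely outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))
enumerated = Fraction(
    sum(1 for a, b in outcomes if a == 6 and b == 6), len(outcomes)
)

print(p_five_or_six, p_two_sixes, enumerated)
```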

Significance testing: we commonly want to know if two samples are from the same population. We want to know the significance of any differences we have measured.

There are two important hypotheses:

• The hypothesis you want to test for significance – for example, we observe from our practice that the serum rhubarb is often elevated in patients suffering from coeliac disease. This might lead us to the hypothesis that serum rhubarb is predictably higher in these patients than in people who do not have coeliac disease. In order to test this, we must devise a research protocol that will allow us to test the clinical question: "Is the serum rhubarb level in patients with coeliac disease higher than it is in patients who do not have coeliac disease?"
• The null hypothesis – the hypothesis we actually go out and test with our statistical methods. The null hypothesis for our question above would be "There is no difference in serum rhubarb between those who have coeliac disease and those who do not." We test whether this null hypothesis should be rejected or not rejected

To test whether the difference is significant, we pose the null hypothesis that there is no difference between the two groups and calculate a test statistic. The test statistic is converted into a probability (the p value): the probability of seeing a difference at least as large as the one observed if the null hypothesis were true. With that probability in hand, we can decide whether to reject or not reject the null hypothesis.

By convention, the decision to "reject" or "not reject" the null hypothesis is based on whether the p value falls on one or the other side of a threshold value, commonly 5%. So, when the probability of obtaining a difference this large by chance alone, assuming the null hypothesis is true, is less than 0.05 (P < 0.05), we reject the null hypothesis and accept there is likely to be a real difference.
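One way to see where a p value comes from is a permutation test, sketched below. The notes do not name a specific test, so the permutation approach is an assumption made here for illustration. The "serum rhubarb" values are invented, and `permutation_p_value` is a hypothetical helper name.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Invented serum rhubarb levels (arbitrary units); illustrative data only.
coeliac = [7.2, 6.8, 7.9, 8.1, 7.5, 6.9]
control = [6.1, 6.5, 5.9, 6.8, 6.3, 6.0]

def permutation_p_value(a, b, n_shuffles=10000):
    """Two-sided p value: how often a random relabelling of the subjects
    gives a difference in means at least as large as the observed one."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b  # under the null hypothesis the group labels are arbitrary
    extreme = 0
    for _ in range(n_shuffles):
        random.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

p_value = permutation_p_value(coeliac, control)
print("p =", p_value)
print("reject null hypothesis" if p_value < 0.05 else "do not reject null hypothesis")
```

The p value here is an estimate of how often chance alone would produce a difference this large if the null hypothesis were true. It is not the probability that the null hypothesis is correct.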

• Type 1 Error is falsely rejecting the null hypothesis: we conclude there is a difference when there is none. Its probability is set by the significance threshold (e.g. 0.05)
• Type 2 Error is falsely failing to reject the null hypothesis; there is a real difference between our groups, but we have been unable to confidently reject the null hypothesis

The power of a study is its ability to detect a given difference, i.e. the probability of avoiding a Type 2 Error when that difference really exists. The usual convention for power is that we should have an 80% chance of detecting the difference we seek (the power of a study should be 0.8).
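Power depends on the size of the difference sought, the variability of the measurements, the sample size and the significance threshold. The sketch below uses a normal approximation for comparing two group means with equal group sizes and a known common SD. These are simplifying assumptions, and `approx_power` is a made-up helper name rather than a standard function.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two means.

    delta: true difference between group means (the difference we seek)
    sigma: common standard deviation in each group (assumed known)
    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)       # threshold set by the Type 1 Error rate
    se = sigma * sqrt(2 / n_per_group)      # standard error of the difference
    return z.cdf(abs(delta) / se - z_crit)  # chance of crossing the threshold

# Power to detect a difference of half an SD with various group sizes.
for n in (20, 64, 100):
    print(n, round(approx_power(delta=0.5, sigma=1.0, n_per_group=n), 3))
```

Under these assumptions, roughly 64 subjects per group are needed to reach the conventional power of 0.8 for a difference of half a standard deviation. Smaller groups leave a higher chance of a Type 2 Error.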