Hello and welcome back to the bioinformaticamente blog. In the previous article, we discussed hypothesis testing, and today we will continue our overview of inferential statistics. So, without further delay, let's get started.
A few days ago, I was at home with my two nieces, one seven years old and the other three. Noticing that the younger one was feeling down, I asked her what was making her so sad. She told me that her older sister was noticeably taller than her, which she thought was utterly unfair! I tried to reassure her by downplaying the height difference and explaining that it was simply due to their age difference. However, as she moved closer and lined up with her sister, she exclaimed, "Can't you see? She is taller!"
I found her behavior fascinating. By comparing her height to that of her older sister, she acted just as scientists often do. She compared the values of a variable (height in her case) in two distinct groups/units (her height versus her sister's).
This event inspired me to write this article, where I will talk about comparing groups concerning a certain variable and comparing variables within the same group.
Comparison Between Groups/Samples Regarding a Specific Variable
Generally, scientists aim to compare the average value of a certain variable under examination between two groups of statistical units, in order to determine whether the two groups are statistically different with respect to that variable.
To assess this difference, the so-called 'two-sample t-test' can be used. There are various types of two-sample t-tests, as illustrated in the diagram below:

In general, the key distinction points among these statistical tests are:
- Unpaired or Paired Samples. In the case of unpaired samples, measurements made on different, independent groups are compared. For paired samples, the comparison is between measurements taken on the same group but at different times (for example, before and after a treatment).
- Parametric or Non-parametric. Parametric tests assume that the variable under examination follows a specific probability distribution taken as a reference model, often the normal distribution. Non-parametric tests do not require the data to follow any specific probability distribution.
- Equal Variance or Not. When we say that two groups have equal variances, it means that the variability within each of the groups is assumed to be the same. This concept is also referred to as "homoscedasticity." This is important because many statistical tests, like the t-test, assume that the variances are equal across groups in order to produce accurate results. If the assumption of homoscedasticity is violated (i.e., if the variances are significantly different), then the results of the statistical test might be misleading. For example, in a t-test, unequal variances can affect the calculation of the test statistic and may lead to incorrect conclusions about the significance of the difference between the two groups. Before conducting a two-sample test, it is therefore often advisable to test for homoscedasticity. Tools like Levene's test or Bartlett's test can be used to assess whether the variances are equal across groups; if these tests indicate that the variances are significantly different (heteroscedasticity), alternative statistical methods or adjustments may be needed (a minimal example of how this check looks in R is sketched right after this list). To know more, please take a look at this video and also at this video.
Discussing in detail each type of two-sample test shown in the diagram above is quite complex, so I have decided to provide excellent sources for each of these.
About the two-sample Student t-test:
AND:
About the two-sample Welch t-test:
About the two-sample Mann-Whitney U-test:
AND:
About the two-sample paired t-test:
AND:
AND:
About the two-sample Wilcoxon signed-rank test:
AND:
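To make the diagram above a bit more concrete, here is a hedged sketch (with invented numbers and hypothetical variable names) of how each of these two-sample tests is typically called in base R, through t.test() and wilcox.test():
# Invented data: two independent groups, plus one group measured before and after a treatment
x <- c(12, 15, 14, 16, 13, 11)
y <- c(18, 17, 19, 21, 20, 22)
before <- c(70, 72, 68, 75, 71)
after <- c(71, 74, 71, 79, 76)
t.test(x, y, var.equal = TRUE)              # Student t-test (unpaired, equal variances)
t.test(x, y)                                # Welch t-test (unpaired, unequal variances)
wilcox.test(x, y)                           # Mann-Whitney U test (unpaired, non-parametric)
t.test(before, after, paired = TRUE)        # paired t-test
wilcox.test(before, after, paired = TRUE)   # Wilcoxon signed-rank test (paired, non-parametric)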
Ok. At this point, we might wonder what happens if the number of samples to be compared is more than two. Which statistical test should be used to understand if such samples are statistically different?
In this case, too, there are various types of tests, and the distinction checkpoints are the same as those seen above, as you can see from the diagram below:

Before consulting the sources listed below to understand in detail how each of the tests expressed on the right side of the above image works, it is necessary to answer an important question.
In the context of statistically comparing 3 or more samples, what is meant by "multiple comparison post tests"?
When conducting statistical comparisons of more than two samples, "multiple comparison post tests" are an essential tool. These tests are used after an initial analysis, such as an ANOVA (Analysis of Variance), has indicated that there are significant differences among the groups.
Multiple comparison post tests serve two main purposes:
1) Identify Specific Differences: When you have more than two groups, an overall test like ANOVA can tell you that there is a significant difference somewhere among these groups, but it doesn't specify where the differences lie. In other words, while ANOVA can tell you that there is a statistically significant difference among the groups you're comparing, it doesn't identify which specific groups are different from each other. Post tests help in accurately pinpointing which specific groups differ, enhancing the interpretability of the research.
For example, imagine you have three groups, A, B, and C, and you run an ANOVA test. The ANOVA might indicate that there is a significant difference in the means of these groups. However, it doesn’t tell you:
Whether group A is different from group B,
Whether group B is different from group C,
Or whether group A is different from group C.
To find out the specific differences between these groups, you would perform a multiple comparison post-test, like Tukey's HSD or Bonferroni Correction. These tests compare each pair of groups (A vs. B, B vs. C, A vs. C) to identify where exactly the significant differences lie. They help in pinpointing which groups are significantly different from each other, providing a more detailed understanding of your data.
2) Control for Type I Errors: These tests are designed to control the risk of false positives (Type I errors), which increases with the number of comparisons. When you perform multiple statistical tests, the probability of incorrectly rejecting at least one true null hypothesis (i.e., making a Type I error) grows, because each test carries its own risk of a false positive and these risks accumulate over multiple tests. To counter this issue, the Bonferroni correction, for instance, adjusts the criterion for statistical significance by dividing the desired overall alpha level (e.g., 0.05) by the number of comparisons being made. For example, if you're conducting 10 tests and want to maintain an overall alpha of 0.05, the Bonferroni correction would set the significance level for each individual test at 0.05 / 10 = 0.005.
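As a tiny illustration of that arithmetic, here is a sketch using base R's p.adjust() on some invented p-values (raw_p is hypothetical): with method = "bonferroni" each p-value is simply multiplied by the number of tests, which is equivalent to comparing the raw p-values against 0.05 / 10.
# Ten invented p-values from ten hypothetical pairwise comparisons
raw_p <- c(0.001, 0.004, 0.012, 0.020, 0.030, 0.041, 0.050, 0.200, 0.450, 0.800)
# Bonferroni-adjusted p-values (each raw value multiplied by 10 and capped at 1)
p.adjust(raw_p, method = "bonferroni")
# Equivalent view: which raw p-values fall below the corrected threshold 0.05 / 10 = 0.005?
raw_p < 0.05 / length(raw_p)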
In general, these types of tests are used in any experimental setup where comparisons between multiple groups or conditions are needed.
Below are the most common types of multiple comparison tests:
- Tukey's Honest Significant Difference (HSD) Test: Used to find means that are significantly different from each other. It’s widely used due to its balance between Type I and Type II error rates.
- Bonferroni Correction: Adjusts p-values to reduce the chances of Type I errors, but can be overly conservative.
- False Discovery Rate (FDR): This offers a more balanced approach between discovering true effects and controlling false positives.
- Scheffé’s Test: Offers more flexibility when comparing different sets of groups and is generally more conservative.
- [Dunnett’s Test](https://youtu.be/_wFlvQuPoew?si=I-osCYDHgaI4XlPx): Compares multiple treatments with a control group.
The choice of which post-hoc test to use depends on various factors, including the number of comparisons, the nature of the data, and the balance between Type I and Type II error tolerances. In any case, you should always ensure that the data meet the assumptions of the chosen test, such as normality or homogeneity of variances.
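As a minimal sketch of such a post-hoc workflow in R (with invented data and hypothetical variable names, not a real analysis): fit a one-way ANOVA with aov() and then run TukeyHSD() to see which pairs of groups actually differ.
# Invented measurements for three groups A, B and C
values <- c(5.1, 5.4, 4.9, 5.2, 6.3, 6.1, 6.5, 6.0, 5.0, 5.2, 4.8, 5.1)
group <- factor(rep(c("A", "B", "C"), each = 4))
# Overall one-way ANOVA: is there any difference among the group means?
fit <- aov(values ~ group)
summary(fit)
# Tukey's HSD post-hoc test: which specific pairs (B-A, C-A, C-B) differ?
TukeyHSD(fit)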
Now that we have talked about multiple comparison tests, we can focus on the different types of statistical tests for comparing more than two groups:
About the one-way ANOVA (3 or more samples):
AND:
About the Kruskal-Wallis test (3 or more samples):
AND:
About the one-way repeated-measures ANOVA (3 or more samples):
AND:
About the Friedman test (3 or more samples):
AND:
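As a rough companion to the sources above, here is how the non-parametric alternatives are typically called in base R (the one-way ANOVA itself was sketched in the previous section); the numbers and variable names are invented for illustration.
# Kruskal-Wallis test: 3 or more independent (unpaired) groups, non-parametric
values <- c(5.1, 5.4, 4.9, 6.3, 6.1, 6.5, 5.0, 5.2, 4.8)
group <- factor(rep(c("A", "B", "C"), each = 3))
kruskal.test(values ~ group)
# Friedman test: 3 or more paired conditions measured on the same subjects, non-parametric
# Rows = subjects, columns = conditions (invented scores)
scores <- matrix(c(7, 8, 6,
                   5, 7, 4,
                   8, 9, 7,
                   6, 7, 5), nrow = 4, byrow = TRUE)
friedman.test(scores)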
Comparison between Variables
Let's make one final effort. Scientists often also compare different variables to find possible relationships or patterns between them. When you wonder if your daily back pain is due to sleeping with your partner's feet in your face, you are essentially comparing two variables. Is variable A, the frequency with which your partner kicks you during the night, somehow related to variable B, the intensity of morning back pain?
When evaluating the relationship between two variables, one must first ask if they are:
- Independent: knowing the values of one variable does not help you know the values of the other variable, because the two are distinct and independent.
- Correlated or Associated: knowing the values of one variable is enough to know, predict, or hypothesize the values of the second variable, because the two behave as if they were connected or associated.
To quantify the relationship between two considered variables, it is necessary to calculate some important indices such as the covariance.
Covariance:
Covariance is a numerical index that measures how much two quantitative variables change together. To calculate it, you take, for each statistical unit, the product of the deviations of the two variables from their respective means, and then average these products: cov(x, y) = (1/n) Σ (xᵢ − x̄)(yᵢ − ȳ). (Remember, variance is a statistical index that expresses how much the values of a single variable vary relative to its mean value; in fact, the covariance of a variable with itself is exactly its variance.)

Observing the formula above, we can add that:
- Two compared variables are said to be concordant when, as the value of variable x increases, an increase in the value of variable y is observed.
- Two compared variables are said to be discordant when, as x increases, y decreases, or when x decreases, y increases.
- If the covariance value is zero or very close to zero, it is said that there is no covariance between the two considered variables.
Covariance is a very powerful index for quantifying the correlation, or the relationship, between two variables under examination, but it is difficult to interpret because its values can range from -infinity to +infinity, as you can see from the image above. For this reason, covariance as such is rarely used; instead, the Bravais-Pearson linear correlation coefficient is used, whose effectiveness is shown in the image below.

To graphically visualize the correlation between two quantitative variables, one can use the scatter plot.
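As a small R sketch with invented data (x and y are hypothetical), you can see all three ideas at once: cov() returns a number whose scale depends on the units of measurement, cor() rescales it by the two standard deviations so the result always falls between -1 and +1, and plot() draws the scatter plot.
# Invented paired measurements on the same statistical units
x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.3)
y <- c(2.0, 2.9, 3.5, 5.1, 5.4, 6.8)
cov(x, y)                    # covariance: hard to interpret, its scale depends on the units
cor(x, y)                    # Bravais-Pearson r: always between -1 and +1
cov(x, y) / (sd(x) * sd(y))  # same value as cor(x, y): r is just a standardized covariance
plot(x, y)                   # scatter plot of the two variables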

There are various types of linear correlations between two quantitative variables depending on the value of the linear correlation coefficient (often indicated by the letter r), as you can notice from this image:

Please pay attention to the term "LINEAR"!!!
In the context of statistics, the term "linear" in linear correlation refers to a relationship between two variables in which a change in one variable is associated with a proportional change in the other. This relationship can be represented as a straight line on a scatter plot, where each point represents a pair of values for the two variables.
There are also non-linear correlations that cannot be captured by the linear correlation coefficient. It's important to note that the Bravais-Pearson correlation coefficient is specifically designed to measure the strength and direction of a linear relationship between two variables. It is not suitable for quantifying non-linear correlations, as you can see in the image below.

In general, when we want to compare variables that are:
- quantitative or ordinal qualitative.
- value scales or rankings (e.g., Likert scale).
- paired, meaning measured on the same statistical units.
- without a known probability distribution, for example, when they do not follow a normal probability distribution.
- characterized by a NON-linear relationship.
it is more convenient to use other methods to investigate the correlation between two variables rather than calculating the linear correlation coefficient.
Let's look at some of these methods:
Spearman's Rank Correlation Coefficient:
Specifically, Spearman's rank correlation coefficient measures co-ranking, that is, the relationship between the ranks of two variables measured on the same statistical units. For instance, it is very useful when we want to compare two rankings collected on the same statistical units, such as comparing the ranks (rank = position of a statistical unit in a certain ranking) of the students in a class based on their mathematics grades with their ranks based on their English grades.
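To stay with the classroom example, here is a minimal sketch with invented grades (math and english are hypothetical vectors): in R, cor() and cor.test() with method = "spearman" convert the values to ranks internally and compute the rank correlation.
# Invented mathematics and English grades for the same six students
math <- c(9, 7, 8, 5, 6, 10)
english <- c(8, 6, 10, 4, 7, 9)
cor(math, english, method = "spearman")       # Spearman's rank correlation coefficient
cor.test(math, english, method = "spearman")  # same coefficient plus a significance test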

Please take a look at these two videos below to know more about Spearman's rank correlation coefficient.
Pearson's Chi-Square:
Pearson's Chi-Square is used when one wants to test the hypothesis of independence between two ordered qualitative variables or two discrete quantitative variables, arranged in a contingency table.
A system of hypotheses is defined as follows:
- H0: The two variables under examination are independent.
- H1: The two variables under examination are associated.
It should be noted that the chi-square statistic follows a known probability distribution, used as a reference model and known, of course, as the chi-square distribution. Statisticians are quite imaginative, aren't they?
This distribution has a positively asymmetric shape that varies with the degrees of freedom. As you can see from the image below, this distribution is used to reject or not reject the null hypothesis, taking into account only the right tail of the distribution and delineating the rejection zone thanks to the usual alpha significance threshold. Generally, if the calculated chi-square value falls within the rejection region, it is said that the two variables under examination are associated and not independent.

To know more about the chi-square independence test, please take a look at these three videos:
or read here.
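In R, the test boils down to building a contingency table and passing it to chisq.test(); here is a minimal sketch with an invented table (tab is hypothetical) crossing a dose level with a treatment response.
# Invented contingency table: response (rows) by dose level (columns)
tab <- matrix(c(20, 15, 5,
                10, 18, 22),
              nrow = 2, byrow = TRUE,
              dimnames = list(response = c("no", "yes"),
                              dose = c("low", "medium", "high")))
# Chi-square test of independence (H0: the two variables are independent)
chisq.test(tab)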
ATTENTION!!!
The chi-square test of independence allows us to understand whether two variables (two discrete quantitative, or two ordered qualitative) are independent of each other or not. However, if these variables are associated, it does not provide any information about the direction of the association. To obtain information about the direction of the association between two variables, it is necessary to calculate Kendall's tau-b index.
Kendall's Correlation:
Kendall's correlation allows us to measure the direction and therefore the type of association between two scales of values or, in general, two variables. More broadly, it takes into account the concordances and discordances between all possible pairs of xy values.
There are three types of Kendall correlation indices depending on the situation, called: Tau-a, Tau-b, and Tau-c.
But to learn more, I strongly suggest watching these videos below:
Or read here.
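As with Spearman's coefficient, base R can compute Kendall's tau directly through cor() and cor.test() with method = "kendall"; here is a minimal sketch with two invented rankings of the same items (judge_1 and judge_2 are hypothetical).
# Invented rankings of the same six items produced by two judges
judge_1 <- c(1, 2, 3, 4, 5, 6)
judge_2 <- c(2, 1, 4, 3, 6, 5)
cor(judge_1, judge_2, method = "kendall")       # Kendall's tau
cor.test(judge_1, judge_2, method = "kendall")  # tau plus a significance test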
Ok. This time, I'm afraid I might have gone overboard with the information overload, but it was necessary. I'll see you in the next article where we'll continue discussing inferential statistics. Hang in there, my friends, we're almost through. See you soon.
P.S. Here's a very brief summary of what was discussed in this article.

And remember, if you start dreaming in p-values and confidence intervals, it might be time to take a break and enjoy some statistical humor - like wondering if a median really is just a pedestrian refuge in the highway of data!
Keep crunching those numbers, but don't forget to crunch on some snacks too. 😉📊🍪
UPDATE OF 22 JAN 2024
What is the hypergeometric test?
The hypergeometric test is a statistical test used to determine if there are significant differences between two groups. This test is particularly useful when working with small sample sizes and one wants to know if the distribution of a certain attribute (for example, the number of reads that map to a certain gene) in one group is significantly different from that in another group.
To explain in a simple way, let's imagine having an urn full of balls, some of which are red and others blue. We want to know if drawing a certain number of red balls from this urn is a rare (significant) event or not.
Here are the steps for the hypergeometric test:
- Define the Population: Decide the total size of the population (for example, the total number of balls in the urn).
- Define Success: Decide what you consider a "success" (for example, drawing a red ball).
- Sample Size: Determine the size of the sample (how many balls you draw from the urn).
- Successes in the Sample: Count how many "successes" there are in your sample (how many red balls you have drawn).
- Successes in the Population: Count how many successes there are in the total population (how many red balls are in the urn).
The hypergeometric test then calculates the probability of obtaining at least as many successes as those observed in your sample, given the sample size and the number of successes in the population. If this probability is very low (generally less than 5%), it suggests that the result of your sample did not occur by chance, but reflects something significant.
In practice, this test is often used in biology to understand, for example, if a certain group of genes is overrepresented in a set of genes of interest compared to what would be expected by chance.
Take a look at a toy example of hypergeometric test in R:
Imagine we have a group of 100 people, of which 20 have attended a certain training course. Subsequently, it is discovered that 10 of these 20 people have received a promotion at work. However, among the 80 people who did not attend the course, only 15 have received a promotion. We want to know if attending the course had a significant impact on the chances of getting a promotion.
In this case:
- The total number of successes in the population (promotions) is 25 (10 + 15).
- The size of the group of interest (people who attended the course) is 20.
- The number of successes in the group of interest (people who attended the course and got a promotion) is 10.
- The total size of the population is 100.
In R, we can use the phyper function to perform this test. Here is the code for the test:
# Number of successes in the group of interest (promoted people who attended the course)
successes_group_of_interest <- 10
# Size of the group of interest (people who attended the course)
size_group_of_interest <- 20
# Total number of successes in the population (all promotions)
total_successes <- 25
# Total size of the population
total_population <- 100
# Hypergeometric test: probability of observing 10 or more successes under the null hypothesis.
# Arguments of phyper(q, m, n, k): q = observed successes - 1, m = successes in the population,
# n = failures in the population, k = size of the group drawn from the population
p_value <- phyper(successes_group_of_interest - 1,
                  total_successes,
                  total_population - total_successes,
                  size_group_of_interest,
                  lower.tail = FALSE)
print(p_value)
In this example, with lower.tail = FALSE, phyper calculates the probability of obtaining a number of successes at least as large as the one observed in our group of interest, assuming that the course has no impact. In other words, it returns the upper tail of the distribution: the probability of an outcome as extreme as the one observed, or more extreme, if the course did not influence promotions.
If the resulting p-value is below a threshold (such as 0.05), we can conclude that there is a statistically significant difference, suggesting that the course had an impact on the chances of getting a promotion.