Statistical Testing for Small Samples

For n = 3, a special case of the Wilcoxon rank-sum test (https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-18/issue-1/On-a-Test-of-Whether-one-of-Two-Random-Variables/10.1214/aoms/1177730491.full) was derived as follows. The test is applied in https://www.frontiersin.org/articles/10.3389/fphar.2021.673621/full.

Given two sets of real values A and B, assume that the probability of a tie, p(x_A = x_B), is 0 when a pair x_A ∈ A, x_B ∈ B is selected at random. Thus, under the null hypothesis that A and B follow the same distribution, p(x_A > x_B) = p(x_A < x_B) = 0.5. When A and B are compared pairwise, each comparison is then treated as an independent Bernoulli trial with probability of success p = 0.5, and the expected number of comparisons with x_A > x_B, which serves as the test statistic, is n(x_A > x_B) = |A|×|B| / 2.

To test a sample, we count the observed number of comparisons with x_A > x_B, denoted N, over the |A|×|B| pairwise comparisons. The p-value, that is, the probability of observing at least N such comparisons (a result this extreme or more extreme), is

$$p = \sum_{k=N}^{|A| \times |B|} C_k^{|A| \times |B|} \, 0.5^{|A| \times |B|} \tag{1}$$

where C_k^{|A|×|B|} is the number of combinations when picking k (successes) out of |A|×|B| trials. If A and B each have 3 samples and x_A > x_B for all 9 pairwise comparisons, then the p-value is 0.5⁹ × C₉⁹ = 1/512 ≈ 0.002. If x_A > x_B for 8 pairwise comparisons, then the p-value is 0.5⁹ × C₈⁹ + 0.5⁹ × C₉⁹ = 10/512 ≈ 0.0195.
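As a quick numerical check, the binomial tail sum described above can be evaluated directly. The following is a minimal Matlab sketch, not part of the original analysis; the helper name pairwisePValue is an assumption introduced for illustration, and it relies only on the built-in nchoosek.

```matlab
% Sketch of the tail sum: p = sum_{k=N}^{T} C(T, k) * 0.5^T, with T = |A|*|B|.
% pairwisePValue is a hypothetical helper name, not from the original text.
pairwisePValue = @(N, T) sum(arrayfun(@(k) nchoosek(T, k), N:T)) * 0.5^T;

p9 = pairwisePValue(9, 9);   % x_A > x_B in all 9 comparisons
p8 = pairwisePValue(8, 9);   % x_A > x_B in at least 8 of 9 comparisons

fprintf('N >= 9: p = %.4f\n', p9);
fprintf('N >= 8: p = %.4f\n', p8);
```

The same helper applies to other set sizes by changing T, for example pairwisePValue(36, 36) for two sets of 6 samples.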

Simulation 1. In each experiment, A and B each contain three random real numbers drawn from the uniform distribution between 0 and 1. We count the number of comparisons with x_A > x_B and repeat the experiment 10000 times. The histogram of n(x_A > x_B) illustrates the ‘extreme’ portion (the p-value) where N = n(x_A > x_B) ≥ 8. The Matlab code for the experiment is as follows.

    numExperiment = 10000;
    numAMore = zeros(numExperiment, 1); % n(A > B) in each experiment
    for i = 1 : numExperiment
        A = rand(3, 1); % 3 uniform random numbers between 0 and 1
        B = rand(3, 1);
        for j = 1 : length(A)
            % count the elements of B that are smaller than A(j)
            numAMore(i) = numAMore(i) + sum(B < A(j));
        end
    end
    hist(numAMore)

Figure 1. Distribution of n(x_A > x_B) in simulation 1.

Figure 1 shows that outcomes with n(x_A > x_B) = 9 make up only a small proportion of the n(x_A > x_B) distribution. This reflects why the p-value is small (0.002) when x_A > x_B is observed for all 9 pairwise comparisons.

Simulation 2. This is similar to simulation 1, but A has 6 random numbers drawn from the standard normal distribution, and B has 6 random numbers drawn from a normal distribution with mean (and therefore median) 0 and standard deviation 2. Thus, A and B have the same median.

    numExperiment = 10000;
    numAMore = zeros(numExperiment, 1); % n(A > B) in each experiment
    for i = 1 : numExperiment
        A = normrnd(0, 1, 6, 1); % 6 standard normal random numbers
        B = normrnd(0, 2, 6, 1); % 6 normal random numbers, mean = 0, std = 2
        for j = 1 : length(A)
            % count the elements of B that are smaller than A(j)
            numAMore(i) = numAMore(i) + sum(B < A(j));
        end
    end
    hist(numAMore)

Figure 2. Distribution of n(x_A > x_B) in simulation 2.

Figure 2 reflects the small p-value obtained when x_A > x_B holds for all 36 pairwise comparisons. According to formula (1), the p-value is

$$p = C_{36}^{36} \times 0.5^{36} \approx 1.46 \times 10^{-11}$$

The standard Wilcoxon rank-sum test for very small sample sizes

We also examine the standard Matlab implementation of the Wilcoxon rank-sum test (https://www.mathworks.com/help/stats/ranksum.html).

When n = 3 and n(x_A > x_B) = 9, for example with A = {3.1, 3.2, 3.3} and B = {2.1, 2.2, 2.3}, the call

ranksum([3.1,3.2,3.3], [2.1,2.2,2.3])

yields p-value = 0.1.

When n = 6 and n(x_A > x_B) = 36, for example with A = {3.1, 3.2, 3.3, 3.4, 3.5, 3.6} and B = {2.1, 2.2, 2.3, 2.4, 2.5, 2.6}, the call

ranksum([3.1,3.2,3.3, 3.4, 3.5, 3.6], [2.1,2.2,2.3, 2.4, 2.5, 2.6])

yields p-value = 0.0022.
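For comparison with the ranksum outputs above, the pairwise count n(x_A > x_B) and the binomial tail sum from the earlier derivation can be evaluated on the same data. The following is a minimal sketch under that derivation's assumptions; the variable names are illustrative, and bsxfun is used so that it also runs on Matlab releases without implicit expansion.

```matlab
% Count n(x_A > x_B) over all pairwise comparisons, then apply the tail sum
% p = sum_{k=N}^{T} C(T, k) * 0.5^T with T = |A|*|B|.
A = [3.1, 3.2, 3.3];
B = [2.1, 2.2, 2.3];

% |A| x |B| logical matrix whose (i, j) entry is A(i) > B(j)
isAMore = bsxfun(@gt, A(:), B(:).');
N = sum(isAMore(:));            % observed number of x_A > x_B
T = numel(A) * numel(B);        % total number of pairwise comparisons

p = sum(arrayfun(@(k) nchoosek(T, k), N:T)) * 0.5^T;
fprintf('N = %d of %d comparisons, p = %.4g\n', N, T, p);
```

Replacing A and B with the six-element sets above evaluates the n = 6 case in the same way.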

Further reading

Bian W, Chen W, Nguyen T, Zhou Y, Zhang J. miR-199a Overexpression Enhances the Potency of Human Induced-Pluripotent Stem-Cell–Derived Cardiomyocytes for Myocardial Repair. Frontiers in pharmacology. 2021 Jun 3;12:673621.