For n = 3, a special case of the Wilcoxon rank-sum test (https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-18/issue-1/On-a-Test-of-Whether-one-of-Two-Random-Variables/10.1214/aoms/1177730491.full) is derived as follows. The test is applied in https://www.frontiersin.org/articles/10.3389/fphar.2021.673621/full.

Given two sets of real values A and B, the probability of a tie, *p*(*x*_{A} = *x*_{B}), is taken to be 0 when a pair *x*_{A} ∈ A, *x*_{B} ∈ B is selected at random. Thus, if A and B follow the same distribution, we have the null hypothesis *p*(*x*_{A} > *x*_{B}) = *p*(*x*_{A} < *x*_{B}) = 0.5. Then, when A and B are compared pairwise, each comparison can be considered a Bernoulli trial with probability of success *p* = 0.5, and the expected value of the test statistic *n*(*x*_{A} > *x*_{B}), the number of comparisons with *x*_{A} > *x*_{B}, is |A|×|B| / 2.

In testing the sample, we count the observed number *n*(*x*_{A} > *x*_{B}), denoted N, over the |A|×|B| pairwise comparisons. The probability of getting N or more comparisons with *x*_{A} > *x*_{B}, that is, a result at least as extreme as the observed one, is the *p-value*:

*p* = Σ_{k=N}^{|A|×|B|} 0.5^{|A|×|B|} × *C*_{k}^{|A|×|B|}   (1)

where *C*_{k}^{|A|×|B|} is the number of combinations in picking k successes over |A|×|B| trials. If A and B have 3 samples each and *x*_{A} > *x*_{B} for all 9 pairwise comparisons, then the p-value is 0.5^{9} × *C*_{9}^{9} = 0.002. If *x*_{A} > *x*_{B} for 8 pairwise comparisons, then the p-value is 0.5^{9} × *C*_{8}^{9} + 0.5^{9} × *C*_{9}^{9} = 0.0195.
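The tail probability described above (the sum of 0.5^{|A|×|B|} × *C*_{k}^{|A|×|B|} over k = N, ..., |A|×|B|) can be evaluated numerically. The following is a minimal sketch rather than the author's code; the names `binomTail` and `nPairs` are hypothetical and introduced only for illustration.

```matlab
% Sketch: probability of N or more successes among nPairs comparisons,
% each treated as a Bernoulli trial with success probability 0.5.
% binomTail and nPairs are illustrative names, not from the original text.
binomTail = @(N, nPairs) ...
    sum(arrayfun(@(k) nchoosek(nPairs, k), N:nPairs)) * 0.5^nPairs;

nPairs = 3 * 3;                 % |A| x |B| for 3 samples in each set
pAll9  = binomTail(9, nPairs);  % x_A > x_B in all 9 comparisons
pAtLeast8 = binomTail(8, nPairs);  % x_A > x_B in 8 or 9 comparisons

fprintf('N = 9: p-value = %.4f\n', pAll9);
fprintf('N = 8: p-value = %.4f\n', pAtLeast8);
```

With 9 comparisons, the two calls evaluate to 1/512 and 10/512 respectively.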

**Simulation 1.** Here, in each experiment, both A and B have three random real numbers drawn from the uniform distribution between 0 and 1. In each experiment, we count the number of comparisons such that *x*_{A} > *x*_{B}. We repeat the same experiment 10000 times. The histogram of *n*(*x*_{A} > *x*_{B}) illustrates the ‘extreme’ portion (p-value) when N = *n*(*x*_{A} > *x*_{B}) ≥ 8. The Matlab code for the experiment is as follows.

    numExperiment = 10000;
    numAMore = zeros(numExperiment, 1); % n(A > B) in all experiments
    for i = 1 : length(numAMore)
        A = rand(3, 1); % 3 uniform random numbers between 0 and 1
        B = rand(3, 1);
        for j = 1 : length(A)
            % count the elements of B that are smaller than A(j)
            numAMore(i) = numAMore(i) + nnz(B < A(j));
        end
    end
    hist(numAMore)

Figure 1. Distribution of *n*(*x*_{A} > *x*_{B}) in simulation 1.

Figure 1 shows that outcomes with *n*(*x*_{A} > *x*_{B}) = 9 account for only a small part of the *n*(*x*_{A} > *x*_{B}) distribution. This reflects why we compute a small p-value (0.002) when observing *x*_{A} > *x*_{B} for all 9 pairwise comparisons.
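The share of experiments in the upper tail can also be read from the simulated counts directly instead of from the histogram. The sketch below reruns the same kind of experiment and tabulates the counts; it is an illustration under the same setup, and `tailFraction` and `counts` are hypothetical names not used in the original code. Results vary from run to run because the samples are random.

```matlab
% Sketch: tabulate n(A > B) over repeated experiments and measure the
% fraction of experiments with n(A > B) >= 8.
numExperiment = 10000;
numAMore = zeros(numExperiment, 1);
for i = 1 : numExperiment
    A = rand(3, 1);   % 3 uniform random numbers between 0 and 1
    B = rand(3, 1);
    % 3-by-3 logical matrix of all pairwise comparisons A(j) > B(k)
    numAMore(i) = nnz(bsxfun(@gt, A, B'));
end

% counts(m + 1) holds the number of experiments with n(A > B) = m
counts = accumarray(numAMore + 1, 1, [10 1]);
tailFraction = mean(numAMore >= 8);   % empirical share with n >= 8

disp([(0:9)' counts]);
fprintf('Fraction of experiments with n >= 8: %.4f\n', tailFraction);
```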

**Simulation 2.** Similar to simulation 1, but A has 6 random numbers following the standard normal distribution, and B has 6 random numbers following the normal distribution with a mean (which is also the median) of 0 and a standard deviation of 2. Here, A and B have the same median.

    numExperiment = 10000;
    numAMore = zeros(numExperiment, 1); % n(A > B) in all experiments
    for i = 1 : length(numAMore)
        A = normrnd(0, 1, 6, 1); % 6 standard normal random numbers
        B = normrnd(0, 2, 6, 1); % 6 normal random numbers, mean = 0, std = 2
        for j = 1 : length(A)
            % count the elements of B that are smaller than A(j)
            numAMore(i) = numAMore(i) + nnz(B < A(j));
        end
    end
    hist(numAMore)

Figure 2. Distribution of *n*(*x*_{A} > *x*_{B}) in simulation 2.

Figure 2 reflects the small p-value obtained when *x*_{A} > *x*_{B} is observed in all 36 pairwise comparisons. According to formula (1), the p-value is 0.5^{36} × *C*_{36}^{36} ≈ 1.46×10^{-11}.

**The standard Wilcoxon rank-sum test result in cases with very small sample sizes**

We also examine the standard Matlab implementation of the Wilcoxon rank-sum test (https://www.mathworks.com/help/stats/ranksum.html).

When n = 3 and *n*(*x*_{A} > *x*_{B}) = 9, for example A = {3.1, 3.2, 3.3} and B = {2.1, 2.2, 2.3}, the call

`ranksum([3.1, 3.2, 3.3], [2.1, 2.2, 2.3])`

yields a p-value of 0.1.

When n = 6, for example A = {3.1, 3.2, 3.3, 3.4, 3.5, 3.6} and B = {2.1, 2.2, 2.3, 2.4, 2.5, 2.6}, the call

`ranksum([3.1, 3.2, 3.3, 3.4, 3.5, 3.6], [2.1, 2.2, 2.3, 2.4, 2.5, 2.6])`

yields a p-value of 0.0022.
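For a side-by-side comparison with the `ranksum` values above, the pairwise-comparison p-value can be computed on the same example data. This is a minimal sketch that needs no toolbox; `countGreater` and `binomTail` are hypothetical helper names introduced here, and the calculation treats every comparison as a Bernoulli trial with success probability 0.5, as in the derivation earlier in this article.

```matlab
% Example data from the text
A3 = [3.1 3.2 3.3];              B3 = [2.1 2.2 2.3];
A6 = [3.1 3.2 3.3 3.4 3.5 3.6];  B6 = [2.1 2.2 2.3 2.4 2.5 2.6];

% countGreater (illustrative name): number of pairs with x_A > x_B
countGreater = @(A, B) nnz(bsxfun(@gt, A(:), B(:)'));
% binomTail (illustrative name): probability of N or more successes
% among nPairs Bernoulli(0.5) trials
binomTail = @(N, nPairs) ...
    sum(arrayfun(@(k) nchoosek(nPairs, k), N:nPairs)) * 0.5^nPairs;

nPairs3 = numel(A3) * numel(B3);   % 9 pairwise comparisons
nPairs6 = numel(A6) * numel(B6);   % 36 pairwise comparisons
N3 = countGreater(A3, B3);
N6 = countGreater(A6, B6);
p3 = binomTail(N3, nPairs3);
p6 = binomTail(N6, nPairs6);

fprintf('n = 3: N = %d of %d, p-value = %.3g\n', N3, nPairs3, p3);
fprintf('n = 6: N = %d of %d, p-value = %.3g\n', N6, nPairs6, p6);
```

In both examples every element of A exceeds every element of B, so N equals the total number of comparisons and the result reduces to 0.5 raised to that number.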

**Further reading**

Bian W, Chen W, Nguyen T, Zhou Y, Zhang J. miR-199a Overexpression Enhances the Potency of Human Induced-Pluripotent Stem-Cell–Derived Cardiomyocytes for Myocardial Repair. *Frontiers in Pharmacology*. 2021 Jun 3;12:673621.