In order to run a Mann-Whitney U test, the following four assumptions must be met. The first three relate to your choice of study design, whilst the fourth reflects the nature of your data:

- Assumption #1: You have **one dependent variable** that is measured at the **continuous** or **ordinal** level. Examples of **continuous variables** include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. Examples of **ordinal variables** include Likert items (e.g., a 7-point scale from "strongly agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 5-point scale explaining how much a customer liked a product, ranging from "Not very much" to "Yes, a lot").
- Assumption #2: You have **one independent variable** that consists of **two categorical**, **independent groups** (i.e., a **dichotomous variable**). Example independent variables that meet this criterion include gender (two groups: "males" or "females"), employment status (two groups: "employed" or "unemployed"), transport type (two groups: "bus" or "car"), smoker (two groups: "yes" or "no"), trial (two groups: "intervention" or "control"), and so forth.
- Assumption #3: You should have **independence of observations**, which means that there is no relationship between the observations in each group of the independent variable or between the groups themselves. For example, there must be different participants in each group with no participant being in more than one group. This is more of a study design issue than something you can test for, but it is an important assumption of the Mann-Whitney U test. If your study fails this assumption, you will need to use another statistical test instead of the Mann-Whitney U test (e.g., a **Wilcoxon signed-rank test**).
- Assumption #4: You must determine whether the **distributions of scores for both groups of your independent variable** (e.g., the distribution of scores for "males" and the distribution of scores for "females" for the independent variable, "gender") have the **same shape** or a **different shape**. This will determine how you interpret the results of the Mann-Whitney U test. Since this is a critical assumption of the Mann-Whitney U test, and will affect how to work your way through this guide, we discuss this further in the next section.

Note: Practically speaking, your **independent variable** can actually have **three or more groups** (e.g., the independent variable, "transport type", could have four groups: "bus", "car", "train" and "plane"). However, when you run the Mann-Whitney U test procedure in SPSS, you will need to decide which two groups you want to compare (e.g., you could compare "bus" and "car", or "bus" and "plane", and so forth).
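To see how many two-group comparisons an independent variable with more than two groups allows, the pairs can be listed outside SPSS. The following is a minimal Python sketch, not part of the SPSS procedure; the group labels are simply those from the "transport type" example above.

```python
from itertools import combinations

# Group labels taken from the "transport type" example in the note above.
groups = ["bus", "car", "train", "plane"]

# Every unordered pair of groups is one candidate Mann-Whitney U comparison.
pairs = list(combinations(groups, 2))

for first, second in pairs:
    print(f"{first} vs {second}")
```

With four groups there are six possible pairs, and each Mann-Whitney U test compares exactly one of them.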

If you are unfamiliar with any of the above terms, you might wish to read our **Types of Variables** guide or use our **Statistical Test Selector** to check you are using the correct test before going any further.

Evaluating the distributions of the two groups of your independent variable

The Mann-Whitney U test was developed as a test of *stochastic equality* (Mann and Whitney, 1947). However, it is not often that the test is directly interpreted in this way. In practice, the Mann-Whitney U test is more broadly used to interpret whether there are **differences in the "distributions" of two groups** or **differences in the "medians" of two groups**. This is not so much a choice that you make, but is instead based on whether the **distributions of scores for both groups of your independent variable** (e.g., the distribution of scores for "males" and the distribution of scores for "females" for the independent variable, "gender") have the **same shape** or a **different shape**.
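Stochastic equality means that a randomly chosen score from one group is just as likely to be higher than a randomly chosen score from the other group as it is to be lower. As a rough illustration, the Python sketch below counts these outcomes across every male-female pair of scores. The engagement scores and the function name `pairwise_comparison` are invented for this example and do not come from the guide's data.

```python
def pairwise_comparison(group_a, group_b):
    """Return the proportions of (a, b) pairs where a < b, a == b and a > b."""
    total = len(group_a) * len(group_b)
    less = sum(1 for a in group_a for b in group_b if a < b)
    equal = sum(1 for a in group_a for b in group_b if a == b)
    greater = total - less - equal
    return less / total, equal / total, greater / total

# Hypothetical engagement scores, invented for illustration only.
males = [3, 4, 5, 6]
females = [5, 6, 7, 8]

p_less, p_equal, p_greater = pairwise_comparison(males, females)
print(p_less, p_equal, p_greater)
```

Here the male score is lower in 13 of the 16 pairs and higher in only 1, so these two samples are far from stochastically equal.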

If the two distributions have a **different shape**, the Mann-Whitney U test is used to determine whether there are differences in the **distributions** of your two groups. However, if the two distributions are the **same shape**, the Mann-Whitney U test is used to determine whether there are differences in the **medians** of your two groups. We discuss these two different approaches to using the Mann-Whitney U test in turn:

Let us consider the first possible objective for using the Mann-Whitney U test – testing for differences in distributions – by considering an example where engagement score was measured in males and females. Using this interpretation of the Mann-Whitney U test, we would wish to know whether male and female engagement scores are similar or whether one gender has higher or lower values than the other. An example of similar engagement scores and dissimilar engagement scores can be seen in the diagrams shown below:

For the above diagrams, you would most likely want to confirm that males and females had similar scores in the diagram on the left, but that females had higher engagement scores than males in the diagram on the right. The Mann-Whitney U test can do this – determine whether the values in one group are lower or higher than the values in the other group (e.g., females higher than males) – by comparing the **mean ranks** of each distribution of scores (e.g., male and female engagement scores).

The Mann-Whitney U test works by ranking each score of the dependent variable (e.g., engagement), irrespective of the group it is in (e.g., males or females), according to its size, with the smallest rank assigned to the smallest value. The ranks obtained for males are then averaged, as are the females' ranks. This results in a mean rank for males and a mean rank for females. If the distributions are identical, which is the null hypothesis of the Mann-Whitney U test, the mean rank is expected to be the same for both males and females. However, if one group (e.g., females) tends to have higher values than the other group, that group's scores will have been assigned higher ranks and will have a higher mean rank (and vice versa for the group with lower scores). It is this difference in mean rank that is tested by the Mann-Whitney U test for statistical significance. Using this approach, different distributions of scores can be accommodated by the Mann-Whitney U test when determining whether values (i.e., via mean ranks) are different between two groups, as shown below:

Both charts above show non-identical distributions, but with females having higher engagement scores than males in both cases. The chart on the left shows the distribution of male and female engagement scores having the same shape, but a different location (i.e., the female scores are 'shifted' to the right). However, the chart on the right shows dissimilarly shaped distributions of male and female engagement scores, but again with females tending to score higher than males for engagement. The mean rank of both of these distributions can be calculated and assessed by the Mann-Whitney U test to determine whether one group has higher or lower scores than the other group.
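The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the SPSS implementation: the engagement scores are invented, the helper name `average_ranks` is hypothetical, and tied scores are assumed to share the average of the ranks they occupy.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions in the block
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks

# Hypothetical engagement scores, invented for illustration only.
males = [3, 4, 5, 6]
females = [5, 6, 7, 8]

# Rank all scores together, irrespective of group, then split the ranks by group.
ranks = average_ranks(males + females)
male_ranks = ranks[:len(males)]
female_ranks = ranks[len(males):]

mean_rank_males = sum(male_ranks) / len(male_ranks)
mean_rank_females = sum(female_ranks) / len(female_ranks)

# U for males: their rank sum minus the smallest rank sum a group of that size could have.
u_males = sum(male_ranks) - len(males) * (len(males) + 1) / 2
u_females = len(males) * len(females) - u_males

print(mean_rank_males, mean_rank_females, u_males, u_females)
```

In this made-up sample the mean rank is 3.0 for males and 6.0 for females, which reflects the females' tendency towards higher scores.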

Sometimes you will be required to explicitly state the null and alternative hypotheses for a Mann-Whitney U test, and then state which was accepted and rejected at the end of the experiment. One such null hypothesis might be:

H_{0}: the distributions of scores for the two groups are equal

And the alternative hypothesis might be:

H_{A}: the distributions of scores for the two groups are not equal

However, another way to express the alternative hypothesis is as follows:

H_{A}: the mean ranks of the two groups are not equal

The reason for describing the alternative hypothesis with respect to mean ranks is a problem that can occur if you have groups with different variances. Under these conditions, you can have very different distributions and yet fail to reject the null hypothesis of equal distributions (see, for example, Hart (2001)), and you may not get a good idea of whether values are higher or lower in one group compared to the other. Indeed, any interpretation of differences between groups becomes difficult when variances are not equal.
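A small constructed example shows how this can happen. In the Python sketch below, the two invented samples have clearly different spreads, yet their mean ranks are identical, so a comparison based on mean ranks would not separate them. The data and the helper name `mean_rank` are hypothetical, and the helper assumes there are no tied values.

```python
def mean_rank(group, combined):
    """Mean rank of a group's values within the combined sample (assumes no ties)."""
    ordered = sorted(combined)
    # With no ties, a value's rank is its 1-based position in the sorted sample.
    return sum(ordered.index(value) + 1 for value in group) / len(group)

# Hypothetical scores: one tightly clustered group and one widely spread group.
narrow = [4, 5, 6, 7]
wide = [1, 2, 9, 10]
combined = narrow + wide

mean_rank_narrow = mean_rank(narrow, combined)
mean_rank_wide = mean_rank(wide, combined)

print(mean_rank_narrow, mean_rank_wide)
print(max(narrow) - min(narrow), max(wide) - min(wide))
```

Both groups have a mean rank of 4.5 even though one has a range of 3 and the other a range of 9.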

You read in the previous section that – regardless of similar or dissimilar distributions – you can use the Mann-Whitney U test to determine whether engagement scores are higher or lower in males versus females based on the use of mean ranks to describe the group differences. However, rather than **mean ranks**, it would be nice if we were able to describe our data using the more familiar **median** value. This would be more in keeping with the Mann-Whitney U test being used as an alternative to the independent-samples t-test (i.e., both would then use a measure of central tendency: the 'mean' for the independent-samples t-test and the 'median' for the Mann-Whitney U test). Indeed, the Mann-Whitney U test can be used for this very purpose, but it requires an additional **assumption** about the shapes of the distributions: *to compare medians, the distributions of engagement scores for males and females must have the same shape (including dispersion)* (see below):

First, consider the chart on the right where the distributions are differently shaped. In this situation, you are limited to describing the differences between male and female engagement scores using higher/lower statements, as described in the previous section. However, the chart on the left shows an example where the distributions of engagement scores for males and females are the **same shape**. As such, only the **location** of the engagement scores is considered to be different between the two groups, with the **median** being the **measure of location** used. This is sometimes referred to as a **shift in location** (i.e., all scores are being shifted to the right). What all this means is that we can use the Mann-Whitney U test to determine if the groups' medians are statistically significantly different, whereas before we could only make more general higher/lower statements based on mean ranks.
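A shift in location can be illustrated with a short Python sketch. The scores below are invented, and the female scores are constructed by adding a constant (the hypothetical `shift`) to every male score, so the two samples have exactly the same shape by design.

```python
from statistics import median

# Hypothetical engagement scores, invented for illustration only.
males = [2, 3, 3, 4, 5, 7, 9]
shift = 3
females = [score + shift for score in males]  # same shape, moved to the right

median_difference = median(females) - median(males)

# Centre each group on its own median: identical shapes give identical deviations.
male_deviations = sorted(score - median(males) for score in males)
female_deviations = sorted(score - median(females) for score in females)

print(median_difference)
print(male_deviations == female_deviations)
```

Because the only difference between the two samples is their location, the difference in medians (3) fully describes how the groups differ.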

Expressing the difference in the medians as a null and alternative hypothesis, we have:

H_{0}: the distributions of the two groups are equal

H_{A}: the medians of the two groups are not equal

It is important to note that the null hypothesis is the same whether you use the Mann-Whitney U test to detect differences in distributions or differences in medians; namely, that the distributions of the two groups are equal. It is just that with the assumption of similarly shaped distributions, you can attribute any differences between groups highlighted by the Mann-Whitney U test to a difference in medians.
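One way to see what the null hypothesis of equal distributions implies is that, if it holds, the group labels are interchangeable: every way of allocating the ranks to the two groups is equally likely. The Python sketch below uses this idea to compute an exact two-sided p-value for a very small sample. It is an illustration under stated assumptions (invented scores, no tied values, a hypothetical function name `exact_two_sided_p`) and is not a description of how SPSS calculates its p-values.

```python
from itertools import combinations

def exact_two_sided_p(group_a, group_b):
    """Exact two-sided p-value for the rank sum of group_a (assumes no tied values)."""
    combined = sorted(group_a + group_b)
    n_a, n_total = len(group_a), len(combined)
    # With no ties, a value's rank is its 1-based position in the sorted sample.
    observed = sum(combined.index(value) + 1 for value in group_a)
    expected = n_a * (n_total + 1) / 2  # rank sum expected under the null hypothesis
    extreme = 0
    count = 0
    # Under the null, every set of n_a ranks is equally likely to belong to group_a.
    for ranks in combinations(range(1, n_total + 1), n_a):
        count += 1
        if abs(sum(ranks) - expected) >= abs(observed - expected):
            extreme += 1
    return extreme / count

# Hypothetical engagement scores, invented for illustration only.
males = [1, 2, 3]
females = [4, 5, 6]

p_value = exact_two_sided_p(males, females)
print(p_value)
```

Even though every male score is below every female score, only 2 of the 20 possible allocations are this extreme, giving p = 0.1; with three scores per group, no two-sided result can be smaller than this.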

If you are still finding the idea of having to test for differences in the **distributions** or **medians** of your two groups a little challenging, it will become clearer as you work through the guide. After setting up your data in SPSS in the Example & Data Setup section next and then running the Mann-Whitney U test procedures in the Procedures section, we will return to the idea of testing for differences in the mean ranks or medians of your two groups in the Assumptions and Interpreting & Reporting sections of this guide. If you are working through this guide step-by-step, we suggest moving on to the Example & Data Setup section now.

