
Spearman's Correlation using Stata

Introduction

The Spearman rank-order correlation coefficient (shortened to Spearman’s rank correlation in Stata) is a nonparametric test which measures the strength and direction of association between two variables that are measured on an ordinal or continuous scale. The Spearman correlation coefficient is often denoted by the symbol r_s (or the Greek letter ρ, pronounced rho). It is a useful test when Pearson's correlation cannot be run due to violations of normality, a non-linear relationship or when ordinal variables are being used.

For example, you could use a Spearman's correlation to understand whether there is an association between high jump performance and time spent training (i.e., your two variables would be "high jump performance", measured in height jumped, and "time spent training", measured in hours per week). If there was a moderate, positive association, we could say that more time spent training was associated with better high jump performance. Alternatively, you could use a Spearman's correlation to understand whether there is an association between length of unemployment and health (i.e., your two variables would be "length of unemployment", measured in days, and "health", measured using a continuous scale). If there was a strong, negative association, we could say that the longer the length of unemployment, the lower the level of health.

In this guide, we show you how to carry out a Spearman's correlation using Stata, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a Spearman's correlation to give you a valid result. We discuss these assumptions next.


Assumptions

There are two assumptions that underpin a Spearman's correlation: (1) your two variables are measured on an ordinal or continuous scale; and (2) there is a monotonic relationship between your two variables. If either of these two assumptions is not met, analysing your data using a Spearman's correlation might not lead to a valid result. Since assumption #1 relates to your choice of variables, it cannot be tested for using Stata. However, you should decide whether your study meets this assumption before moving on.

Fortunately, you can check assumption #2 using Stata. When moving on to assumption #2, do not be surprised if your data fails this assumption since this can happen when working with real-world data rather than textbook examples, which often only show you how to carry out a Spearman's correlation when everything goes well. However, don't worry because even when your data fails this assumption, there is often a solution to overcome this (e.g., transforming your data or using another statistical test instead). Just remember that if you do not check that your data meets this assumption or you do not test for it correctly, the results you get when running a Spearman's correlation might not be valid.
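One common way to check assumption #2 is to inspect a scatterplot of your two variables. The following is a minimal sketch, assuming your variables are named salary and edu_years, as in the example later in this guide. The lowess smoother is an optional visual aid that we have added here, not a formal test.

```stata
* Scatterplot of the two variables to judge, by eye, whether the
* relationship is monotonic (consistently increasing or decreasing)
twoway scatter salary edu_years

* Optional: overlay a lowess smoother to make the overall trend easier to see
twoway (scatter salary edu_years) (lowess salary edu_years)
```

If the points rise (or fall) fairly consistently from left to right, even along a curve, a monotonic relationship is plausible.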

It is also worth noting that your variables do not need to be normally distributed to run a Spearman’s correlation. In addition, Spearman's correlation is not very sensitive to outliers (unusual observations in your data), so the presence of these data points does not automatically invalidate the results you get from running a Spearman's correlation.

In the section, Test Procedure in Stata, we illustrate the Stata procedure required to perform a Spearman's correlation assuming that no assumptions have been violated. First, we set out the example we use to explain the Spearman's correlation procedure in Stata.


Example

The value of receiving an education is always in the news, particularly with respect to the salary improvement that can be attained with a "graduate job". As a very simple study, a researcher wanted to know whether the number of years that a person was educated was associated with their salary at 35 years old. To carry out this analysis, the researcher recruited a small sample of 13 participants aged 35 years old. The researcher recorded the number of years of education each participant had received (entered into the variable, edu_years) and their current salary (entered into the variable, salary). Expressed in variable terms, the researcher wanted to correlate salary and edu_years.

Note: The example and data used for this guide are fictitious. We have just created them for the purposes of this guide. In addition, it was assumed that the data failed the assumptions required for running a Pearson's correlation.


Setup in Stata

In Stata, we created two variables: (1) salary, which is a participant's salary (in $1,000s per year); and (2) edu_years, which is the number of years of education that a participant has received.

Note: It does not matter which variable you create first.

After creating these two variables – edu_years and salary – we entered their scores into the two columns of the Data Editor (Edit) spreadsheet, as shown below:

[Figure: Data Editor for the Spearman's correlation in Stata]

Published with written permission from StataCorp LP.
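If you prefer to check the setup using code rather than the Data Editor, a sketch along the following lines may help. It assumes the two variables have already been created and filled in as described above; the label text is our own wording, added only for illustration.

```stata
* Attach descriptive labels to the two variables (label text is illustrative)
label variable salary "Salary (thousands of dollars per year)"
label variable edu_years "Years of education received"

* Confirm that both variables exist and check their storage types
describe salary edu_years

* Display the scores so they can be checked against the Data Editor
list salary edu_years
```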


Test Procedure in Stata

In this section, we show you how to analyse your data using a Spearman's correlation in Stata when the two assumptions described in the previous section, Assumptions, have not been violated. You can carry out a Spearman's correlation using code or Stata's graphical user interface (GUI). After you have carried out your analysis, we show you how to interpret your results.


Code

The basic code to run a Spearman's correlation takes the form:

spearman VariableA VariableB

Using this code, Stata will report: (a) the number of observations (i.e., participants) in the Spearman's correlation analysis; (b) Spearman's correlation coefficient; and (c) its statistical significance (i.e., p-value). There are various other options available in Stata, but we will concentrate on the basic statistics in this guide.
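As a hedged illustration of the other options mentioned above, the sketch below uses the stats() and pw options of spearman. VariableA, VariableB and VariableC are placeholders, as in the basic code, and the layout of the resulting output may differ from the basic output discussed in this guide.

```stata
* Request the coefficient, the number of observations and the p-value explicitly
spearman VariableA VariableB, stats(rho obs p)

* With more than two variables, pw uses all available pairs of observations
* instead of dropping every case that has any missing value
spearman VariableA VariableB VariableC, stats(rho obs p) pw
```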

Substituting the two variables in our example – salary and edu_years – for VariableA and VariableB, respectively, in the code shown above, you get the following code:

spearman salary edu_years

This is the code you need to enter into Stata to run a Spearman's correlation on these variables. Enter the code into the Command box in Stata, as shown below:

[Figure: Command box for the Spearman's correlation in Stata]

Now press the "Return/Enter" key on your keyboard to generate the results. The Stata output that will be produced is shown in the section, Output of a Spearman's correlation in Stata, below.
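After spearman has run, Stata keeps its main results in memory as stored results, which can be convenient if you want to reuse them (e.g., in a do-file). The sketch below assumes the example variables from this guide; r(N), r(rho) and r(p) are the results that spearman stores when two variables are specified.

```stata
* Run the test, then list everything spearman has stored
spearman salary edu_years
return list

* Display the individual results with chosen formatting
display "N   = " r(N)
display "rho = " %6.4f r(rho)
display "p   = " %6.4f r(p)
```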


Graphical User Interface (GUI)

The three steps required to carry out a Spearman's correlation in Stata 12 and 13 are shown below:


Output of a Spearman's correlation in Stata

If your data passed assumption #2 (i.e., there was a monotonic relationship between your two variables), which we explained earlier in the Assumptions section, you will only need to interpret the following Spearman's correlation output in Stata:

[Figure: Output for a Spearman's correlation in Stata (including statistical significance and observations)]


The first line (i.e., "spearman salary edu_years") contains the code that Stata ran to generate a Spearman's correlation. If you followed the code approach, you will recognize this as the code you entered into Stata. On the other hand, GUI users will not recognize this code, but this is the code that was run behind the scenes when you selected the various options in the spearman - Spearman's rank correlation coefficients dialogue box.

The next line ("Number of obs = 13") contains the number of observations (i.e., participants) that were analysed. There were 13 participants in this example; hence "13" in the output. The next line ("Spearman's rho = 0.8583") presents the actual value for Spearman's correlation coefficient. You can see that Spearman's rho (ρ) is 0.8583. Spearman's correlation coefficient can range from -1 to +1, and a Spearman's ρ of 0.8583 indicates a strong monotonic relationship. As Spearman's coefficient is positive, you can conclude that a greater number of years in education is associated with a larger salary. The last line ("Prob > |t| = 0.0002") of the output presents the two-tailed statistical significance (i.e., p-value) of the Spearman's correlation coefficient. You can see that Spearman's correlation coefficient is statistically significant because p = .0002, which is less than .05 (a common threshold for statistical significance).
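To make this output more concrete, the sketch below reproduces it by hand. It relies on two facts: Spearman's rho is Pearson's correlation computed on the ranks of the data (with tied values given average ranks), and the "Prob > |t|" value comes from a t-statistic with n - 2 degrees of freedom. The names rank_salary, rank_edu and t_stat are our own.

```stata
* Rank each variable; egen's rank() gives tied values their average rank
egen rank_salary = rank(salary)
egen rank_edu    = rank(edu_years)

* Pearson's correlation of the ranks should match Spearman's rho
correlate rank_salary rank_edu

* Recompute the two-tailed p-value from rho and the number of observations
quietly spearman salary edu_years
scalar t_stat = r(rho) * sqrt((r(N) - 2) / (1 - r(rho)^2))
display "t = " t_stat "   p = " 2 * ttail(r(N) - 2, abs(t_stat))
```

With the example data, both steps should agree with the values in the output above, allowing for rounding.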


Reporting a Spearman's correlation

When you report the output of your Spearman's correlation, it is good practice to include:

Based on the results above, we could report the results of this study as follows:

A Spearman's correlation was run to assess the relationship between salary and years of education using a small sample of 13 participants aged 35 years old. There was a strong positive correlation between salary and years of education, which was statistically significant, r_s = .8583, p = .0002.

In addition to reporting the results as above, a diagram can be used to visually present your results. For example, you could do this using a scatterplot. This can make it easier for others to understand your results and is easily produced in Stata.
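As a sketch of how such a diagram could be produced, the code below draws a scatterplot of the two example variables and saves it to a file. The axis titles and the file name spearman_scatter.png are our own choices, not part of the original example.

```stata
* Scatterplot with salary on the y-axis and years of education on the x-axis
twoway scatter salary edu_years, ytitle("Salary (thousands of dollars per year)") xtitle("Years of education")

* Save the graph as an image in the current working directory
graph export "spearman_scatter.png", replace
```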
