
Pearson Product-Moment Correlation (cont...)

How can you detect outliers?

In correlation analysis, an outlier is a data point that does not fit the general trend of your data: it is a wayward (extreme) value that you would not expect given the rest of your data points. You can detect outliers in much the same way as you detect a linear relationship, by plotting the two variables against each other and visually inspecting the graph for wayward (extreme) points. You can then remove or manipulate a particular point, as long as you can justify doing so (far more robust methods for detecting outliers are available in regression analysis). If you cannot justify removing the data point(s), you can instead run a non-parametric test such as Spearman's rank-order correlation or Kendall's tau correlation, both of which are much less sensitive to outliers. The diagram below indicates what a potential outlier might look like:

An outlier in Correlation.
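
To see why a rank-based measure is less sensitive to an outlier, the following minimal sketch compares Pearson's r with Spearman's rank-order correlation on the same data. The data values and the helper names (`pearson_r`, `average_ranks`, `spearman_rho`) are made up for illustration, and only the Python standard library is assumed.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: a steadily increasing trend with one extreme final value.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

print(f"Pearson r    = {pearson_r(x, y):.2f}")
print(f"Spearman rho = {spearman_rho(x, y):.2f}")
```

With these values the extreme point pulls Pearson's r well below 1, while Spearman's coefficient is unaffected because the ordering of the points has not changed.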

Why is testing for outliers so important?

Outliers can have a very large effect on the line of best fit and the Pearson correlation coefficient, which can lead to very different conclusions about your data. This is most easily seen by comparing scatterplots of a linear relationship with an outlier included and after its removal, looking at both the line of best fit and the correlation coefficient, as in the diagram below:

The effect of an outlier in Correlation.
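
The same comparison can be made numerically. The sketch below fits a least-squares line and computes Pearson's r twice, once with a single wayward point included and once after removing it. The data and the function name `fit_and_correlate` are hypothetical, chosen only to make the effect visible.

```python
import math

def fit_and_correlate(x, y):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical data: nine points close to a straight line, then one outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 8.9, 0.0]

slope_with, intercept_with, r_with = fit_and_correlate(x, y)
slope_clean, intercept_clean, r_clean = fit_and_correlate(x[:-1], y[:-1])

print(f"With outlier:    slope = {slope_with:.2f}, r = {r_with:.2f}")
print(f"Without outlier: slope = {slope_clean:.2f}, r = {r_clean:.2f}")
```

In this made-up example one point out of ten is enough to roughly halve both the slope and the correlation coefficient. Removing a point in practice still needs a justification.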

What is homoscedasticity?

Homoscedasticity means that the spread (variance) of the data points around the line of best fit remains similar as you move along the line. Your data must show homoscedasticity for you to run a Pearson product-moment correlation. Homoscedasticity is most easily demonstrated diagrammatically, as below:

Homoscedasticity in Correlation.
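
As a rough numerical counterpart to inspecting the scatterplot, the sketch below fits a line of best fit and compares the spread of the residuals in the lower and upper halves of the x range. This is an informal illustration, not a formal test of homoscedasticity. The function name `residual_spread_ratio` and both data sets are invented for the example.

```python
def residual_spread_ratio(x, y):
    """Mean squared residual in the upper half of x divided by the lower half.

    A value near 1 suggests a similar spread along the line of best fit;
    a value far from 1 suggests the spread changes (heteroscedasticity).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residuals ordered by x, so the list can be split into two halves.
    residuals = [b - (intercept + slope * a) for a, b in sorted(zip(x, y))]
    half = n // 2
    lower, upper = residuals[:half], residuals[half:]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(upper) / mean_square(lower)

x = list(range(1, 13))
signs = [1 if i % 2 else -1 for i in x]

# Hypothetical data: constant scatter around y = 2x ...
y_even_spread = [2 * a + 0.5 * s for a, s in zip(x, signs)]
# ... and scatter that grows with x.
y_growing_spread = [2 * a + 0.2 * a * s for a, s in zip(x, signs)]

print(f"Constant spread ratio: {residual_spread_ratio(x, y_even_spread):.2f}")
print(f"Growing spread ratio:  {residual_spread_ratio(x, y_growing_spread):.2f}")
```

Splitting the data into two halves is an arbitrary choice made to keep the example short. A plot of the residuals against x usually tells you more.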

Can you establish cause-and-effect?

No, the Pearson correlation cannot determine a cause-and-effect relationship. It can only establish the strength of the association between two variables. As stated earlier, it does not even distinguish between independent and dependent variables.

How do I report the output of a Pearson product-moment correlation?

You need to state that you used the Pearson product-moment correlation and report the value of the correlation coefficient, r, as well as the degrees of freedom (df). You should express the result as follows:

Expressing the Pearson Correlation.

where the degrees of freedom (df) is the number of data points minus 2 (N - 2). If you have not tested the significance of the correlation then leave that section out of the results.
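
A small helper can assemble the reported result from r and the number of data points. This sketch assumes the conventional layout `r(df) = value, p = value`. The function name `format_pearson_report`, the number of decimal places, and the example values are illustrative assumptions, not something specified in this guide.

```python
def format_pearson_report(r, n, p=None):
    """Build a report string for a Pearson correlation.

    r: correlation coefficient, n: number of data points,
    p: significance value, or None if significance was not tested.
    """
    df = n - 2  # degrees of freedom: number of data points minus 2
    report = f"r({df}) = {r:.2f}"
    if p is not None:
        # Only include the significance part when it was actually tested.
        report += f", p = {p:.3f}"
    return report

# Hypothetical values, for illustration only.
print(format_pearson_report(0.65, 30))
print(format_pearson_report(0.65, 30, p=0.003))
```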


Can I determine whether the association is statistically significant?

Yes. The easiest way to do this is with a statistical programme such as SPSS. We provide a guide on how to do this, which you can find here. You need to be careful how you interpret the statistical significance of a correlation. A statistically significant correlation coefficient does not mean that you have a strong association. The test only assesses the null hypothesis that there is no relationship. By rejecting the null hypothesis, you accept the alternative hypothesis that there is a relationship, but you learn nothing about the strength of that relationship or its importance.
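
The difference between significance and strength can be illustrated with the standard t statistic for a correlation, t = r × √(df / (1 − r²)) with df = N − 2. The sketch below applies it to the same weak correlation at two sample sizes. The function name, the value of r, and the sample sizes are assumptions chosen for illustration, and the critical values in the comments are approximate two-tailed 5% values from t tables.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing the null hypothesis of no linear relationship."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2))

weak_r = 0.1  # a weak association in both cases

t_small = correlation_t_statistic(weak_r, 30)    # df = 28, critical t is about 2.05
t_large = correlation_t_statistic(weak_r, 1000)  # df = 998, critical t is about 1.96

print(f"N = 30:   t = {t_small:.2f}")
print(f"N = 1000: t = {t_large:.2f}")
```

With 30 data points the weak correlation falls short of the critical value, while with 1,000 data points it exceeds it, even though the association is just as weak.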

What is the Coefficient of Determination?

The coefficient of determination, r², is the square of the Pearson correlation coefficient, r. So, for example, a Pearson correlation coefficient of 0.6 would result in a coefficient of determination of 0.6², which is 0.36. Therefore, r² = 0.36. The coefficient of determination, with respect to correlation, is the proportion of the variance that is shared by both variables. It gives a measure of the amount of variation that can be explained by the model (the correlation is the model). It is sometimes expressed as a percentage (e.g., 36% instead of 0.36) when we discuss the proportion of variance explained by the correlation. However, we must never write r² = 36%, or any other percentage. We must always write it as a proportion (e.g., r² = 0.36).
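
The arithmetic above, and the link between the squared correlation and the variance explained by the line of best fit, can be checked with a short sketch. The data values and function names are hypothetical. The second function computes the explained proportion directly as 1 − SS_residual / SS_total for a least-squares line.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def explained_variance_proportion(x, y):
    """Proportion of the variance in y explained by the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_residual / ss_total

# The worked example from the text: r = 0.6 gives a proportion of 0.36.
print(round(0.6 ** 2, 2))

# Hypothetical data: the two routes to the coefficient of determination agree.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 4, 6, 8]
print(f"r squared          = {pearson_r(x, y) ** 2:.4f}")
print(f"explained variance = {explained_variance_proportion(x, y):.4f}")
```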

To run a Pearson correlation in SPSS, go to our guide here.
