Point-Biserial Correlation using SPSS Statistics
Introduction
A point-biserial correlation is used to measure the strength and direction of the association that exists between one continuous variable and one dichotomous variable. It is a special case of Pearson's product-moment correlation, which is normally applied when you have two continuous variables; here, one of the two variables is instead measured on a dichotomous scale.
For example, you could use a point-biserial correlation to determine whether there is an association between salaries, measured in US dollars, and gender (i.e., your continuous variable would be "salary" and your dichotomous variable would be "gender", which has two categories: "males" and "females"). Alternatively, you could use a point-biserial correlation to determine whether there is an association between cholesterol concentration, measured in mmol/L, and smoking status (i.e., your continuous variable would be "cholesterol concentration", a marker of heart disease, and your dichotomous variable would be "smoking status", which has two categories: "smoker" and "non-smoker").
This "quick start" guide shows you how to carry out a point-biserial correlation using SPSS Statistics, as well as how to interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a point-biserial correlation to give you a valid result. We discuss these assumptions next.
Assumptions
When you choose to analyse your data using a point-biserial correlation, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using a point-biserial correlation. You need to do this because it is only appropriate to use a point-biserial correlation if your data "passes" five assumptions that are required for a point-biserial correlation to give you a valid result. In practice, checking for these five assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these five assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out a point-biserial correlation when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let’s take a look at these five assumptions:
- Assumption #1: One of your two variables should be measured on a continuous scale. Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. You can learn more about continuous variables in our article: Types of Variable.
- Assumption #2: Your other variable should be dichotomous. Examples of dichotomous variables include gender (two groups: male or female), employment status (two groups: employed or unemployed), smoker (two groups: yes or no), and so forth.
- Assumption #3: There should be no outliers in the continuous variable within each category of the dichotomous variable. You can detect outliers using boxplots.
- Assumption #4: Your continuous variable should be approximately normally distributed for each category of the dichotomous variable. You can test this using the Shapiro-Wilk test of normality.
- Assumption #5: Your continuous variable should have equal variances for each category of the dichotomous variable. You can test this using Levene's test of equality of variances.
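The boxplot check in assumption #3 can be approximated outside SPSS Statistics with the common 1.5 × IQR rule, applied separately within each category of the dichotomous variable. The minimal Python sketch below assumes that rule; its quartiles come from statistics.quantiles with method="inclusive", so the fences may differ slightly from the quartiles SPSS Statistics uses to draw its boxplots. The helper names and the salary figures are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def outliers_by_group(continuous, dichotomous, k=1.5):
    """Apply the IQR rule within each category of the dichotomous variable."""
    groups = {}
    for x, g in zip(continuous, dichotomous):
        groups.setdefault(g, []).append(x)
    return {g: iqr_outliers(xs, k) for g, xs in groups.items()}


# Hypothetical data, for illustration only: one unusually high male salary.
salary = [41000, 43500, 39800, 42000, 40500, 95000,
          38000, 39500, 40200, 41800, 37900, 40900]
gender = ["male"] * 6 + ["female"] * 6

print(outliers_by_group(salary, gender))  # {'male': [95000], 'female': []}
```

Checking each category separately matters: a value that looks ordinary in the pooled data can still be an outlier within its own group, and vice versa.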
You can check assumptions #3, #4 and #5 using SPSS Statistics. Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a point-biserial correlation might not be valid.
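As a rough illustration of what Levene's test (assumption #5) measures, the minimal Python sketch below computes the classic mean-centred Levene statistic W by hand: each observation is replaced by its absolute deviation from its own group mean, and a one-way ANOVA F ratio is formed on those deviations. This should correspond to the row SPSS Statistics labels "Based on Mean", but treat that as an assumption. Under equal variances, W is approximately F-distributed with (k - 1, N - k) degrees of freedom; if SciPy is available, scipy.stats.f.sf(W, k - 1, N - k) gives the p-value, scipy.stats.levene(..., center="mean") performs the whole test, and scipy.stats.shapiro covers the normality check in assumption #4. The function name levene_w and the data are hypothetical.

```python
from statistics import fmean


def levene_w(*groups):
    """Hypothetical helper: mean-centred Levene's W for k groups.

    Raises ZeroDivisionError if every absolute deviation equals its
    group's average deviation (no within-group spread at all).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - m) for x in g] for g, m in zip(groups, means)]
    z_means = [fmean(zg) for zg in z]
    z_grand = fmean([v for zg in z for v in zg])
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical cholesterol values (mmol/L), for illustration only.
smokers = [6.1, 6.5, 5.9, 6.3]
non_smokers = [5.2, 4.8, 5.0, 4.6]
print(levene_w(smokers, non_smokers))  # close to 0: similar spread

# Very different spreads produce a large W.
print(levene_w([1, 2, 3, 4], [0, 10, 20, 30]))
```

A large W (and hence a small p-value) suggests the variances differ between the two categories, which would violate assumption #5.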
In the Procedure section, we illustrate the SPSS Statistics procedure to perform a point-biserial correlation, assuming that no assumptions have been violated. First, we set out the example we use to explain the point-biserial correlation procedure in SPSS Statistics.