Partial Correlation using SPSS Statistics
Partial correlation is a measure of the strength and direction of a linear relationship between two continuous variables whilst controlling for the effect of one or more other continuous variables (also known as 'covariates' or 'control' variables). Although partial correlation does not make the distinction between independent and dependent variables, the two variables are often considered in such a manner (i.e., you have one continuous dependent variable and one continuous independent variable, as well as one or more continuous control variables).
Note: Many aspects of partial correlation can be dealt with using multiple regression and it is sometimes recommended that this is how you approach your analysis. This is somewhat evident in SPSS Statistics, where you can carry out partial correlation using two different procedures: Correlate and Regression.
For example, you could use partial correlation to understand whether there is a linear relationship between 10,000 m running performance and VO2max (a marker of aerobic fitness), whilst controlling for wind speed and relative humidity (i.e., the continuous dependent variable would be "10,000 m running performance", measured in minutes and seconds, the continuous independent variable would be VO2max, which is measured in ml/min/kg, and the two control variables – that is, the two other continuous independent variables you are adjusting for – would be "wind speed", measured in mph, and "relative humidity", expressed as a percentage). You may believe that there is a relationship between 10,000 m running performance and VO2max (i.e., the larger an athlete's VO2max, the better their running performance), but you would like to know if this relationship is affected by wind speed and humidity (e.g., if the relationship changes when taking wind speed and humidity into account since you suspect that athletes' performance decreases in more windy and humid conditions).
Alternatively, you could use partial correlation to understand whether there is a linear relationship between ice cream sales and price, whilst controlling for daily temperature (i.e., the continuous dependent variable would be "ice cream sales", measured in US dollars, the continuous independent variable would be "price", also measured in US dollars, and the single control variable – that is, the single continuous independent variable you are adjusting for – would be daily temperature, measured in °C). You may believe that there is a relationship between ice cream sales and price (i.e., sales go down as price goes up), but you would like to know if this relationship is affected by daily temperature (e.g., if the relationship changes when taking into account daily temperature since you suspect customers are more willing to buy ice creams, irrespective of price, when it is a really nice, hot day).
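The link between partial correlation and regression can be made concrete with a short sketch outside SPSS Statistics. The Python code below is only an illustration, not the SPSS Statistics procedure: it regresses each of the two variables of interest on the control variable(s) and then correlates the two sets of residuals, which is one standard way of obtaining a partial correlation. The function name partial_corr, the variable names and the simulated ice cream data are all assumptions made up for this example.

```python
import numpy as np

def partial_corr(x, y, controls):
    """Partial correlation of x and y, adjusting for the control variable(s)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Accept one control (1-D) or several controls (one column each)
    controls = np.asarray(controls, dtype=float).reshape(len(x), -1)
    # Design matrix: intercept plus the control variables
    design = np.column_stack([np.ones(len(x)), controls])
    # Residuals of x and y after least-squares regression on the controls
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    # The partial correlation is the Pearson correlation of the residuals
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Simulated (made-up) data loosely modelled on the ice cream example
rng = np.random.default_rng(0)
temperature = rng.normal(20, 5, size=200)                        # degrees C
price = 2.0 + 0.02 * temperature + rng.normal(0, 0.2, size=200)  # US dollars
sales = 100 + 8 * temperature - 30 * price + rng.normal(0, 10, size=200)

r_simple = float(np.corrcoef(sales, price)[0, 1])      # ignores temperature
r_partial = partial_corr(sales, price, temperature)    # adjusts for temperature
print(f"zero-order r = {r_simple:.3f}, partial r = {r_partial:.3f}")
```

Comparing the zero-order and partial coefficients shows how much of the apparent relationship between sales and price is tied up with temperature. With real data you would pass your own variables in place of the simulated ones.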
This "quick start" guide shows you how to carry out a partial correlation using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a partial correlation to give you a valid result. We discuss these assumptions next.
When you choose to analyse your data using partial correlation, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using partial correlation. You need to do this because it is only appropriate to use a partial correlation if your data "passes" five assumptions that are required for a partial correlation to give you a valid result. In practice, checking for these five assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.
Before we introduce you to these five assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., is not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out a partial correlation when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let’s take a look at these five assumptions:
- Assumption #1: You have one (dependent) variable and one (independent) variable and these are both measured on a continuous scale (i.e., they are measured on an interval or ratio scale). Examples of continuous variables include revision time (measured in hours), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), temperature (measured in °C), sales (measured in US dollars), and so forth.
- Assumption #2: You have one or more control variables, also known as covariates (i.e., control variables are just variables that you are using to adjust the relationship between the other two variables; that is, your dependent and independent variables). These control variables are also measured on a continuous scale (i.e., they are continuous variables). Examples of continuous variables are provided above.
- Assumption #3: There needs to be a linear relationship between all of your variables (i.e., your two variables of interest and each control variable). That is, all possible pairs of variables must show a linear relationship. This is often checked by visually inspecting a scatterplot for each pair.
- Assumption #4: There should be no significant outliers. Outliers are simply single data points within your data that do not follow the usual pattern. Partial correlation is sensitive to outliers, which can have a very large effect on the line of best fit and the correlation coefficient, leading to incorrect conclusions regarding your data. Therefore, it is best if there are no outliers or they are kept to a minimum.
- Assumption #5: Your variables should be approximately normally distributed. In order to assess the statistical significance of the partial correlation, you need to have bivariate normality for each pair of variables, but this assumption is difficult to assess, so a simpler method is more commonly used whereby the distribution of each variable is tested individually. This can be done using the Shapiro-Wilk test of normality, which is easy to run in SPSS Statistics.
You can check assumptions #3, #4 and #5 using SPSS Statistics. Remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a partial correlation might not be valid.
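If you want to see what checks of this kind look like outside SPSS Statistics, the Python sketch below illustrates two of them: screening a variable for outliers (assumption #4) and running the Shapiro-Wilk test on a single variable (assumption #5). It is a rough illustration under stated assumptions rather than a reproduction of the SPSS Statistics output. The helper names flag_outliers and shapiro_p, the cut-off of 3 standard deviations and the simulated data are all choices made for this example, and other outlier rules are equally defensible.

```python
import numpy as np
from scipy import stats

def flag_outliers(values, threshold=3.0):
    """Indices of points more than `threshold` SDs from the mean.

    The 3-SD cut-off is an illustrative convention, not a fixed rule.
    """
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=1)
    return np.flatnonzero(np.abs(z) > threshold)

def shapiro_p(values):
    """p-value of the Shapiro-Wilk test; small values suggest non-normality."""
    _, p_value = stats.shapiro(values)
    return float(p_value)

# Simulated (made-up) variables: one roughly normal, one clearly skewed
rng = np.random.default_rng(1)
normal_var = rng.normal(50, 10, size=100)
skewed_var = rng.exponential(scale=10, size=100)
# The same normal variable with one extreme value appended at the end
with_outlier = np.append(normal_var, 200.0)

for name, var in [("normal_var", normal_var), ("skewed_var", skewed_var)]:
    print(name, "Shapiro-Wilk p =", round(shapiro_p(var), 4))
print("outlier indices:", flag_outliers(with_outlier))
```

A Shapiro-Wilk p-value below your chosen significance level (commonly .05) suggests that the variable departs from normality. Linearity (assumption #3) is not covered here because it is usually judged by eye from scatterplots.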
In the section, Test Procedure in SPSS Statistics, we illustrate the SPSS Statistics procedure to perform a partial correlation assuming that no assumptions have been violated. First, we set out the example we use to explain the partial correlation procedure in SPSS Statistics.