Linear regression, also known as simple linear regression or bivariate linear regression, is used when we want to predict the value of a dependent variable based on the value of an independent variable. The dependent variable can also be referred to as the outcome, target or criterion variable, whilst the independent variable can also be referred to as the predictor, explanatory or regressor variable. We will refer to these as dependent and independent variables throughout this guide.
For example, you could use linear regression to understand whether test anxiety can be predicted based on revision time (i.e., the dependent variable would be "test anxiety", measured using an anxiety index, and the independent variable would be "revision time", measured in hours). Alternatively, you could use linear regression to understand whether cholesterol concentration (cholesterol being a fatty substance in the blood linked to heart disease) can be predicted based on time spent exercising (i.e., the dependent variable would be "cholesterol concentration", measured in mmol/L, and the independent variable would be "time spent exercising", measured in hours).
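Whichever example you choose, the model being fitted is a straight line, predicted DV = b0 + b1 × IV, with the intercept b0 and slope b1 chosen by ordinary least squares. Minitab does this calculation for you. The following is a minimal sketch of the same arithmetic in Python using only the standard library. The function name and the five data points are hypothetical and are not taken from this guide's example.

```python
from statistics import fmean

def fit_simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (one predictor).

    Returns (b0, b1, r_squared): intercept, slope and the proportion of
    the variability in y explained by the line.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired data with at least 3 observations")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    # Residual sum of squares: the variability the line leaves unexplained
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, 1 - sse / syy

# Hypothetical data: revision time (hours) and exam score
hours = [2, 4, 6, 8, 10]
scores = [50, 55, 62, 64, 71]
b0, b1, r2 = fit_simple_linear_regression(hours, scores)
print(f"predicted score = {b0:.2f} + {b1:.3f} x hours, R^2 = {r2:.3f}")
```

The slope is the expected change in the dependent variable for a one-unit increase in the independent variable. R² compares the residual variability with the total variability in the dependent variable.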
Note: If you have two or more independent variables, rather than just one, you need to use multiple regression. Alternatively, if you just want to establish whether a linear relationship exists, but are not making predictions, you could use Pearson's correlation. If your dependent variable is dichotomous, you could use a binomial logistic regression.
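Pearson's correlation and simple linear regression are closely linked. Pearson's r treats the two variables symmetrically, but with a single predictor the least-squares slope equals r × (s_y / s_x), and r² equals the regression R². The following minimal sketch illustrates this with hypothetical data. The function names are illustrative only.

```python
import math
from statistics import fmean, stdev

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient between x and y."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def slope_from_r(x, y):
    """Least-squares slope of y on x expressed through r: b1 = r * s_y / s_x."""
    return pearson_r(x, y) * stdev(y) / stdev(x)

hours = [2, 4, 6, 8, 10]          # hypothetical revision times
scores = [50, 55, 62, 64, 71]     # hypothetical exam scores
r = pearson_r(hours, scores)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, slope = {slope_from_r(hours, scores):.3f}")
```

Swapping the two variables leaves r unchanged but changes the slope. This is why regression, unlike correlation, requires you to decide which variable is dependent.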
In this guide, we show you how to carry out linear regression using Minitab, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for linear regression to give you a valid result. We discuss these assumptions next.
Linear regression has seven assumptions. You cannot test the first two of these assumptions with Minitab because they relate to your study design and choice of variables. However, you should check whether your study meets these assumptions before moving on. If these assumptions are not met, there is likely to be a different statistical test that you can use instead. Assumptions #1 and #2 are explained below:
Assumptions #3, #4, #5, #6 and #7 relate to the nature of your data and can be checked using Minitab. You have to check that your data meets these assumptions because, if it does not, the results you get when running a linear regression might not be valid. In fact, do not be surprised if your data violates one or more of these assumptions. This is not uncommon. However, there are possible solutions to such violations (e.g., transforming your data) that allow you to still use a linear regression. These assumptions are: a linear relationship between the two variables (Assumption #3); no significant outliers (Assumption #4); independence of observations (Assumption #5); homoscedasticity (Assumption #6); and residuals (errors) that are approximately normally distributed (Assumption #7).
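Several of these checks work on the model's residuals. For example, outliers (Assumption #4) are commonly screened using standardized residuals. Each residual is divided by s × √(1 − hᵢ), where s is the residual standard error and hᵢ is the leverage of observation i. The following is a minimal sketch of that calculation for one predictor. The cutoff of 3 is an assumed rule of thumb, not a value taken from this guide or from Minitab, so adjust it to your own criterion.

```python
import math
from statistics import fmean

def standardized_residuals(x, y):
    """Internally studentized residuals for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard error, with n - 2 degrees of freedom
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    if s == 0:
        return [0.0] * n  # perfect fit: no residual variation to standardize
    # Leverage for a single predictor: h_i = 1/n + (x_i - x_bar)^2 / Sxx
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]

def flag_outliers(x, y, cutoff=3.0):
    """Indices of observations whose |standardized residual| exceeds cutoff (assumed 3)."""
    return [i for i, z in enumerate(standardized_residuals(x, y)) if abs(z) > cutoff]
```

A flagged case is a prompt to investigate, for example for a data-entry error. It is not automatically grounds for deleting the observation.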
In practice, checking for assumptions #3, #4, #5, #6 and #7 will probably take up most of your time when carrying out linear regression. However, it is not a difficult task, and Minitab provides all the tools you need to do this.
In the Test Procedure in Minitab section, we illustrate the Minitab procedure required to perform linear regression, assuming that no assumptions have been violated. First, we set out the example we use to explain the linear regression procedure in Minitab.
An educator wanted to determine whether students' exam scores were related to revision time. For example, as students spent more time revising, did their exam scores also increase (a positive relationship), or did the opposite happen? The educator also wanted to know what proportion of the variability in exam scores revision time could explain, and to be able to predict exam scores. The educator could then determine whether, for example, students who spent just 10 hours revising could still pass their exam. Therefore, the dependent variable was "exam score", measured on a scale from 0 to 100, and the independent variable was "revision time", measured in hours.
To carry out the analysis, the educator recruited 40 students. The length of time spent revising (i.e., the independent variable, Revision time) and the exam score (i.e., the dependent variable, Exam score) were recorded for all 40 participants. Expressed in variable terms, the educator wanted to regress Exam score on Revision time. A linear regression was used to determine whether there was a statistically significant relationship between exam score and revision time.
Note: The example and data used for this guide are fictitious. We have just created them for the purposes of this guide.
In Minitab, we entered our two variables into the first two columns (C1 and C2). Under column C1 we entered the name of the dependent variable, Exam score, and under column C2 we entered the name of the independent variable, Revision time. Finally, we entered the scores for the dependent variable, Exam score, into column C1, and the values for the independent variable, Revision time, into column C2. This is illustrated below:
Published with written permission from Minitab Inc.
Note: It does not matter whether you enter the dependent variable or independent variable under C1 or C2. We have just entered the data into Minitab this way in our example.
In this section, we show you how to analyze your data using a linear regression in Minitab when none of the seven assumptions set out in the Assumptions section have been violated. The three steps required to run a linear regression in Minitab are shown below:
Click Stat > Regression > Regression... on the top menu, as shown below:
You will be presented with the following Regression dialogue box:
Transfer the dependent variable, C1 Exam score, into the Response: box, and the independent variable, C2 Revision time, into the Predictors: box. You will end up with the dialogue box shown below:
Note: To transfer the two variables, first click inside the Response: box. Your two variables will then appear in the main left-hand box (i.e., C1 Exam score and C2 Revision time), and the transfer button, which is usually greyed out, becomes active. Since the Response: box is where you put your dependent variable, select the appropriate variable in the main left-hand box (i.e., C1 Exam score in our example) and either press the transfer button or simply double-click on the variable. Then follow the same procedure for the independent variable, which should be transferred into the Predictors: box (i.e., C2 Revision time in our example).
The Minitab output for a linear regression is shown below:
The output provides four important pieces of information:
In this example, R² = 72.8%, whilst the adjusted R² = 72.1%, which means that the independent variable, Revision time, explains 72.8% of the variability in the dependent variable, Exam score. Adjusted R² is also an estimate of the effect size, which, at 72.1%, is indicative of a large effect size according to Cohen's (1988) classification. In this example, the regression model is statistically significant, F(1, 38) = 101.90, p < .0005. This indicates that, overall, the regression model statistically significantly predicts the dependent variable, Exam score.
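R², adjusted R² and the overall F statistic are linked by standard formulas. For n observations and k predictors, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), and F = (R²/k) / ((1 − R²)/(n − k − 1)) with (k, n − k − 1) degrees of freedom. The following minimal sketch re-derives the reported values from R² = 72.8% and n = 40. The function names are illustrative.

```python
def adjusted_r_squared(r2, n, k=1):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_from_r_squared(r2, n, k=1):
    """Overall F statistic, with (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n = 40       # students in the example
r2 = 0.728   # R^2 as reported, rounded to 72.8%
print(round(adjusted_r_squared(r2, n), 3))  # 0.721
print(round(f_from_r_squared(r2, n), 1))    # 101.7
```

The small gap between 101.7 and the reported F(1, 38) = 101.90 arises because the reported R² is rounded. The degrees of freedom, 1 and 40 − 1 − 1 = 38, match the output.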
Note: In addition to the linear regression output above, you will also have to interpret: (a) the scatterplots you used to check whether there was a linear relationship between your two variables (i.e., Assumption #3); (b) casewise diagnostics to check that there were no significant outliers (i.e., Assumption #4); (c) the Durbin-Watson statistic to check for independence of observations (i.e., Assumption #5); (d) a scatterplot of the standardized residuals against the standardized predicted values to determine whether your data showed homoscedasticity (i.e., Assumption #6); and (e) a histogram (with superimposed normal curve) and a Normal P-P Plot to check whether the residuals (errors) of the model were approximately normally distributed (i.e., Assumption #7). See the Assumptions section earlier if you are unsure what these assumptions are. Remember that if your data failed any of these assumptions, the output from the linear regression procedure (i.e., the output discussed above) might not be valid, and you will have to take steps to deal with such violations (e.g., transforming your data using Minitab) or use a different statistical test.
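The Durbin-Watson statistic compares each residual with the one before it: d = Σ(eₜ − eₜ₋₁)² / Σeₜ². It ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The following minimal sketch assumes you already have the residuals in the order the observations were collected, for example residuals stored from a Minitab fit. The function name is illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between successive residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, -1, 1]))  # 2.0
print(durbin_watson([1, -1, 1, -1]))  # 3.0: residuals alternate in sign
```

The statistic is only meaningful when the order of the observations is meaningful, such as the order in which the data were collected.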
When you report the output of your linear regression, it is good practice to include:
Based on the Minitab output above, we could report the results of this study as follows:
A linear regression established that revision time statistically significantly predicted exam score, F(1, 38) = 101.90, p < .0005, and revision time accounted for 72.8% of the variability in exam score. The regression equation was: predicted exam score = 44.540 + 0.555 × (revision time).
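The reported equation can be used directly to answer the educator's question about students who revise for 10 hours. The following minimal sketch plugs values into that equation. The function name is illustrative. Whether the result counts as a pass depends on a pass mark, which this example does not state.

```python
def predicted_exam_score(revision_hours):
    """Predicted exam score from the fitted equation 44.540 + 0.555 x hours."""
    return 44.540 + 0.555 * revision_hours

print(round(predicted_exam_score(10), 2))  # 50.09
```

Each additional hour of revision adds an estimated 0.555 marks. Predictions are most trustworthy for revision times within the range observed in the 40 students.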
In addition to reporting the results as above, a diagram can be used to visually present your results. For example, you could use a scatterplot with confidence and prediction intervals (although prediction intervals are less commonly added). This can make it easier for others to understand your results. Furthermore, you can use your linear regression equation to make predictions about the value of the dependent variable based on different values of the independent variable. Minitab does not produce these predicted values as part of the linear regression procedure above, but it provides a separate procedure for doing so.
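The two kinds of interval differ in what they describe. A confidence interval covers the mean exam score for all students with a given revision time. A prediction interval covers the score of a single new student, so it is always wider. The following is a minimal sketch of the standard formulas for one predictor. To keep the sketch to the standard library, the two-sided critical t value with n − 2 degrees of freedom must be supplied by you. The function name and data are hypothetical.

```python
import math
from statistics import fmean

def intervals_at(x, y, x0, t_crit):
    """Fitted value, confidence interval (mean response) and prediction
    interval (new observation) at x0 for the least-squares line of y on x.

    t_crit: two-sided critical t value with n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    fit = b0 + b1 * x0
    # Standard errors for the mean response and for a single new observation
    se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    se_new = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pi = (fit - t_crit * se_new, fit + t_crit * se_new)
    return fit, ci, pi

# Hypothetical data (n = 5); 3.182 is approximately the two-sided 95% t value for 3 df
hours = [2, 4, 6, 8, 10]
scores = [50, 55, 62, 64, 71]
fit, ci, pi = intervals_at(hours, scores, 6, 3.182)
print(f"fit {fit:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, 95% PI {pi[0]:.2f} to {pi[1]:.2f}")
```

Both intervals are narrowest at the mean of the independent variable and widen as x0 moves away from it, which produces the curved bands seen on such scatterplots.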