Laerd Statistics

Binomial logistic regression using Minitab

Introduction

A binomial logistic regression is used to predict a dichotomous dependent variable based on one or more continuous or nominal independent variables. It is the most common type of logistic regression and is often simply referred to as logistic regression. Minitab, however, refers to it as binary logistic regression. In many ways, a binomial logistic regression can be thought of as a multiple linear regression for a dichotomous rather than a continuous dependent variable.
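To make the comparison with multiple linear regression concrete, the model can be written as below. This is the standard textbook form rather than anything specific to Minitab, and the symbols are notation introduced here for illustration: p is the probability of the outcome treated as the event, x_1 to x_k are the independent variables, and the betas are the coefficients to be estimated.

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)}}
```

The right-hand side of the first equation is the same linear combination used in multiple linear regression. The difference is that it models the log-odds of the outcome rather than the outcome itself, so the predicted probability always stays between 0 and 1.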

For example, you could use a binomial logistic regression to understand whether the presence of heart disease can be predicted from physical activity level, cholesterol concentration, glucose concentration and body composition. Heart disease is the dichotomous dependent variable (i.e., presence of heart disease is either "Yes" or "No"). Physical activity level (in minutes per week), cholesterol concentration (in mmol/L) and glucose concentration (in mmol/L) are continuous independent variables, and body composition is a nominal independent variable (i.e., with three groups: "Normal", "Overweight" and "Obese").

Another example where you could use a binomial logistic regression is to understand whether the premature failure of a new type of light bulb (i.e., failure before its one-year warranty expires) can be predicted from the total duration the light is on, the number of times the light is switched on and off, and the temperature of the ambient air. In this case, premature failure is the dichotomous dependent variable (i.e., the light bulb fails within its one-year warranty: "Yes" or "No"). The other three variables used to predict light bulb failure are all continuous independent variables: the total duration the light is on (in minutes), the number of times the light is switched on and off, and the ambient air temperature (in °C).

In this guide, we show you how to carry out a binomial logistic regression using Minitab, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a binomial logistic regression to give you a valid result. We discuss these assumptions next.

Minitab

Assumptions

Binomial logistic regression has six assumptions. You cannot test the first two of these assumptions with Minitab because they relate to your study design and choice of variables. However, you should check whether your study meets these assumptions before moving on. If these assumptions are not met, there is likely to be a different statistical test that you can use instead. Assumptions #1 and #2 are explained below:

Assumptions #3, #4, #5 and #6 relate to the nature of your data and can be checked using Minitab. You have to check that your data meets these assumptions because if it does not, the results you get when running a binomial logistic regression might not be valid. In fact, do not be surprised if your data violates one or more of these assumptions. This is not uncommon. However, there are possible solutions to correct such violations (e.g., transforming your data) such that you can still use binomial logistic regression. Assumptions #3, #4, #5 and #6 are explained below:

In practice, checking for assumptions #3, #4, #5 and #6 will probably take up most of your time when carrying out a binomial logistic regression. However, it is not a difficult task, and Minitab provides all the tools you need to do this.

In the section, Test Procedure in Minitab, we illustrate the Minitab procedure required to perform binomial logistic regression assuming that no assumptions have been violated. First, we set out the example we use to explain the binomial logistic regression procedure in Minitab.

Example

A marathon is a very hard race, and many people who have never run a marathon before do not finish. A sport scientist is interested in reducing this dropout rate by discovering what might predict whether a first-time marathon runner quits the race. To do this, the researcher randomly interviewed many finishers and non-finishers, all of whom were first-time marathon runners, at a number of marathon races across the world. Each runner was asked how long they had been training for the marathon, whether they were running for a charity, their age and whether the marathon was considered a 'prestigious' marathon (e.g., the London Marathon, which draws huge crowds).

Therefore, in this example, the dichotomous dependent variable is finished_race, which has two categories: "Yes" and "No". The length of training prior to the marathon was a continuous independent variable, training_duration (in months), and participants' age was also a continuous independent variable, age (in years). Whether a participant was running for a charity was a dichotomous independent variable, charity, with two categories: "Yes" and "No". In total, 203 first-time runners were recruited.

Note: The example and data used for this guide are fictitious. We have just created them for the purposes of this guide.

Setup in Minitab

In Minitab, we entered our four variables into the first four columns of the worksheet. In the first column, we entered the name of the dichotomous dependent variable, finished_race. In the second column, we entered the name of the continuous independent variable, training_duration. In the third column, we entered the name of the dichotomous independent variable, charity. In the fourth column, we entered the name of the continuous independent variable, age. The data setup is shown below:

Data setup for binary logistic regression in Minitab

Published with written permission from Minitab Inc.

Note: It does not matter in which order you enter the variables into Minitab.
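As a rough illustration of the layout described above, the sketch below builds the same four-column structure outside Minitab using Python's standard library. The two data rows are invented placeholders (they are not taken from this guide's dataset), and the names `worksheet_csv` and `load_runners` are hypothetical.

```python
import csv
import io

# Hypothetical worksheet contents: one row per runner, one column per variable.
# The two data rows are invented placeholders, not values from the guide's dataset.
worksheet_csv = """finished_race,training_duration,charity,age
Yes,12,No,34
No,3,Yes,27
"""

def load_runners(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "finished_race": row["finished_race"],                 # dichotomous dependent variable
            "training_duration": float(row["training_duration"]),  # continuous, in months
            "charity": row["charity"],                             # dichotomous independent variable
            "age": float(row["age"]),                              # continuous, in years
        })
    return rows

runners = load_runners(worksheet_csv)
print(runners[0])
```

Because each value is looked up by column name rather than by position, reordering the columns in the input gives the same result, which mirrors the note about variable order.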

Test Procedure in Minitab

In this section, we show you how to analyse your data using a binomial logistic regression in Minitab when the six assumptions set out in the Assumptions section have not been violated. The six steps required to run a binomial logistic regression in Minitab are shown below:

Output of the binomial logistic regression in Minitab

You will notice that there is a lot of output produced by Minitab after you have run the binary logistic regression procedure. We summarize some of the most important parts of the output, as shown below:

Output from the binary logistic regression procedure in Minitab

This output provides three important pieces of information:

In this example, the Hosmer-Lemeshow test is not statistically significant (p = .721), which indicates that the model fits the data well. The p-values for the training_duration, charity and age coefficients indicate that only training duration (p < .0005) and age (p = .022) are statistically significant predictors of dropout in a marathon race amongst first-time runners.
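The two decision rules used in this interpretation run in opposite directions, which is easy to mix up: for the Hosmer-Lemeshow test a non-significant result is the desirable one, whereas for a coefficient a significant result marks a predictor. The sketch below spells this out using the p-values reported in this guide. The 0.05 threshold is an assumption (the guide does not state its significance level), "p < .0005" is stored as its upper bound, and the function names are hypothetical.

```python
ALPHA = 0.05  # assumed significance level; not stated explicitly in the guide

# p-values as reported in this guide; "p < .0005" is stored as its upper bound.
hosmer_lemeshow_p = 0.721
coefficient_p = {"training_duration": 0.0005, "charity": 0.373, "age": 0.022}

def model_fits_well(p_value, alpha=ALPHA):
    """Hosmer-Lemeshow test: a NON-significant result suggests adequate fit."""
    return p_value >= alpha

def significant_predictors(p_values, alpha=ALPHA):
    """Return the predictors whose coefficient p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]

print(model_fits_well(hosmer_lemeshow_p))     # True
print(significant_predictors(coefficient_p))  # ['training_duration', 'age']
```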

Note: A Classification Table is very useful to produce, but is not produced automatically by Minitab. Nonetheless, it can be produced in Minitab by selecting the correct options in the binary logistic regression procedure and following these up with further tests. Producing this table will allow you to calculate percentage accuracy in classification (PAC), Sensitivity, Specificity, positive predictive value and negative predictive value, all potentially useful measures in evaluating your data.
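Once a classification table is available, the measures listed in the note follow from its four cells. The sketch below is a minimal illustration using the usual definitions of these measures. The function name is hypothetical, and the cell counts are invented for the example (they are not results from this guide's data).

```python
def classification_measures(tp, fn, fp, tn):
    """Compute summary measures (as percentages) from a 2x2 classification table.

    tp: observed "Yes", predicted "Yes"    fn: observed "Yes", predicted "No"
    fp: observed "No",  predicted "Yes"    tn: observed "No",  predicted "No"
    Assumes every row and column of the table contains at least one case.
    """
    total = tp + fn + fp + tn
    return {
        "pac": 100 * (tp + tn) / total,       # percentage accuracy in classification
        "sensitivity": 100 * tp / (tp + fn),  # observed "Yes" cases correctly predicted
        "specificity": 100 * tn / (tn + fp),  # observed "No" cases correctly predicted
        "ppv": 100 * tp / (tp + fp),          # predicted "Yes" cases that were correct
        "npv": 100 * tn / (tn + fn),          # predicted "No" cases that were correct
    }

# Invented counts, for illustration only.
measures = classification_measures(tp=40, fn=10, fp=5, tn=45)
for name, value in measures.items():
    print(f"{name}: {value:.1f}%")
```

Which category counts as "Yes" depends on which outcome is treated as the event in the analysis, so the labels above should be matched to the coding used in your own model.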

Reporting the output of the binomial logistic regression

When you report the output of your binomial logistic regression, it is good practice to include:

Based on the Minitab output above, you could report the results as follows:

A binomial logistic regression was run to understand the effects of training duration, running for charity and age on dropout in a marathon race for first-time runners. The Hosmer-Lemeshow test showed that the model fitted the data well, p = .721. Both time spent training for the marathon (p < .0005) and a runner's age (p = .022) statistically significantly predicted dropout. However, running for a charity did not statistically significantly predict dropout, p = .373.

In addition to reporting the results as above, a diagram can be used to visually present your results. This can make it easier for others to understand your results. Furthermore, you can use Minitab to make predictions about dropout (the dependent variable) based on values you define for your independent variables. This is a separate procedure available in Minitab that you can use once you have run the binary logistic regression procedure.
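To give a sense of what such a prediction involves, the sketch below converts a set of coefficients and chosen values of the independent variables into a predicted probability. It is not Minitab's procedure. The coefficients are hypothetical placeholders (the actual estimates are not reproduced in this guide), the function name is invented, and the event is assumed to be finishing the race with charity coded as 1 for "Yes" and 0 for "No".

```python
import math

# Hypothetical coefficients on the log-odds scale. These are placeholders,
# NOT the estimates from the Minitab output discussed in this guide.
COEFFICIENTS = {
    "constant": -2.0,
    "training_duration": 0.30,  # per month of training
    "charity": 0.20,            # applied when charity == "Yes"
    "age": -0.02,               # per year of age
}

def predicted_probability(training_duration, charity, age, coef=COEFFICIENTS):
    """Return the predicted probability of the event for one runner."""
    log_odds = (
        coef["constant"]
        + coef["training_duration"] * training_duration
        + coef["charity"] * (1 if charity == "Yes" else 0)
        + coef["age"] * age
    )
    # The logistic function maps log-odds onto a probability between 0 and 1.
    return 1 / (1 + math.exp(-log_odds))

p = predicted_probability(training_duration=12, charity="No", age=30)
print(round(p, 3))
```

With real output, the placeholder values would be replaced by the estimated coefficients, taking care to use the same event and category coding as the fitted model.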

Portions of information contained in this publication/book are printed with permission of Minitab Inc. All such material remains the exclusive property and copyright of Minitab Inc. All rights reserved.
