Multinomial Logistic Regression using SPSS Statistics

Introduction

Multinomial logistic regression (often just called "multinomial regression") is used to predict a nominal dependent variable given one or more independent variables. It is sometimes considered an extension of binomial logistic regression to allow for a dependent variable with more than two categories. As with other types of regression, multinomial logistic regression can have nominal and/or continuous independent variables and can have interactions between independent variables to predict the dependent variable.
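To make "predicting a nominal dependent variable" concrete, the short Python sketch below (not part of the SPSS Statistics procedure) shows how a fitted multinomial logistic regression turns its coefficients into a predicted probability for each category. It assumes the usual reference-category parameterisation: each non-reference category has its own intercept and slopes, which give the log-odds of that category versus the reference category, and the reference category's log-odds are fixed at 0. The function name and all coefficient values are hypothetical.

```python
import math


def category_probabilities(coefficients, x, reference):
    """Predicted probability of each category of a nominal dependent variable.

    Hypothetical helper assuming reference-category coding: `coefficients` maps each
    non-reference category to (intercept, [slopes]); the reference category's
    log-odds are fixed at 0.
    """
    # Log-odds of each non-reference category relative to the reference category
    log_odds = {
        category: intercept + sum(b * xi for b, xi in zip(slopes, x))
        for category, (intercept, slopes) in coefficients.items()
    }
    # Softmax, with the reference category contributing exp(0) = 1 to the denominator
    denominator = 1.0 + sum(math.exp(eta) for eta in log_odds.values())
    probabilities = {c: math.exp(eta) / denominator for c, eta in log_odds.items()}
    probabilities[reference] = 1.0 / denominator
    return probabilities


if __name__ == "__main__":
    # Made-up coefficients for the drink example, with "Water" as the reference
    # category and age (in years) as the only independent variable
    coefs = {"Coffee": (-1.0, [0.03]), "Soft Drink": (1.5, [-0.04]), "Tea": (-2.0, [0.05])}
    for drink, p in category_probabilities(coefs, [40], reference="Water").items():
        print(f"{drink}: {p:.3f}")
```

With these invented coefficients, the log-odds of a 40-year-old preferring Coffee rather than Water is -1.0 + 0.03 * 40 = 0.2, and the four predicted probabilities always sum to 1.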

For example, you could use multinomial logistic regression to understand which type of drink consumers prefer based on location in the UK and age. Here, the dependent variable would be "type of drink", with four categories (Coffee, Soft Drink, Tea and Water), and the independent variables would be the nominal variable "location in UK", assessed using three categories (London, South UK and North UK), and the continuous variable "age", measured in years. Alternatively, you could use multinomial logistic regression to understand whether factors such as employment duration within the firm, total employment duration, qualifications and gender affect a person's job position. Here, the dependent variable would be "job position", with three categories (junior management, middle management and senior management), and the independent variables would be the continuous variables "employment duration within the firm" and "total employment duration", both measured in years, and the nominal variables "qualifications", with four categories (no degree, undergraduate degree, master's degree and PhD), and "gender", with two categories ("males" and "females").

This "quick start" guide shows you how to carry out a multinomial logistic regression using SPSS Statistics and explains some of the tables that SPSS Statistics generates. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet for a multinomial logistic regression to give you a valid result. We discuss these assumptions next.

Assumptions of a multinomial logistic regression

When you choose to analyse your data using multinomial logistic regression, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using multinomial logistic regression. You need to do this because it is only appropriate to use multinomial logistic regression if your data "passes" six assumptions that are required for multinomial logistic regression to give you a valid result. In practice, checking for these six assumptions just adds a little bit more time to your analysis, requiring you to click a few more buttons in SPSS Statistics when performing your analysis, as well as think a little bit more about your data, but it is not a difficult task.

Before we introduce you to these six assumptions, do not be surprised if, when analysing your own data using SPSS Statistics, one or more of these assumptions is violated (i.e., not met). This is not uncommon when working with real-world data rather than textbook examples, which often only show you how to carry out a multinomial logistic regression when everything goes well! However, don’t worry. Even when your data fails certain assumptions, there is often a solution to overcome this. First, let's take a look at these six assumptions:

Assumption #1: Your dependent variable should be measured at the nominal level. Examples of nominal variables include ethnicity (e.g., with three categories: Caucasian, African American and Hispanic), transport type (e.g., with four categories: bus, car, tram and train), profession (e.g., with five groups: surgeon, doctor, nurse, dentist and therapist), and so forth. Multinomial logistic regression can also be used when the dependent variable is ordinal, but you might consider running an ordinal logistic regression instead. You can learn more about types of variables in our article: Types of Variable.
Assumption #2: You have one or more independent variables that are continuous, ordinal or nominal (including dichotomous variables). However, ordinal independent variables must be treated as being either continuous or categorical. They cannot be treated as ordinal variables when running a multinomial logistic regression in SPSS Statistics; something we highlight later in the guide. Examples of continuous variables include age (measured in years), revision time (measured in hours), income (measured in US dollars), intelligence (measured using IQ score), exam performance (measured from 0 to 100), weight (measured in kg), and so forth. Examples of ordinal variables include Likert items (e.g., a 7-point scale from "strongly agree" through to "strongly disagree"), amongst other ways of ranking categories (e.g., a 3-point scale explaining how much a customer liked a product, ranging from "Not very much", to "It is OK", to "Yes, a lot"). Example nominal variables were provided in the previous bullet.
Assumption #3: You should have independence of observations and the dependent variable should have mutually exclusive and exhaustive categories.
Assumption #4: There should be no multicollinearity. Multicollinearity occurs when you have two or more independent variables that are highly correlated with each other. This makes it difficult to understand which variable contributes to the explanation of the dependent variable and causes technical issues when calculating a multinomial logistic regression. Determining whether there is multicollinearity is an important step in multinomial logistic regression. Unfortunately, this is a time-consuming process in SPSS Statistics that requires you to create any dummy variables that are needed and run multiple linear regression procedures.
Assumption #5: There needs to be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable.
Assumption #6: There should be no outliers, high leverage values or highly influential points.
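Assumption #4 describes a manual check: create dummy variables for any nominal independent variables, then look for multicollinearity among the resulting predictors. The Python sketch below illustrates the arithmetic behind that check outside SPSS Statistics; it is a minimal, hypothetical illustration, not the guide's SPSS procedure. It computes variance inflation factors (VIFs), one of the collinearity statistics that a linear regression can report, using the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix (equivalently 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors). The function names and the example data are made up.

```python
import math


def dummy_code(values, reference):
    """Hypothetical helper: one 0/1 indicator column per category except the reference."""
    levels = sorted(set(values) - {reference})
    return {level: [1.0 if v == level else 0.0 for v in values] for level in levels}


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length numeric columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centred]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j]) for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        # Swap in the row with the largest absolute value in this column
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1, then clear this column in every other row
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Made-up data: "location in UK" (London as the reference category) plus age in years
    location = ["London", "South UK", "North UK", "London", "North UK", "South UK"]
    age = [23, 35, 51, 30, 44, 60]
    dummies = dummy_code(location, reference="London")
    names = list(dummies) + ["age"]
    for name, value in zip(names, vif(list(dummies.values()) + [age])):
        print(f"{name}: VIF = {value:.2f}")
```

The reference category is left out on purpose: a dummy variable for every category would sum to a constant, which makes the correlation matrix singular, and the sketch reports that case as a ValueError.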

You can check assumptions #4, #5 and #6 using SPSS Statistics. Assumptions #1, #2 and #3 should be checked first, before moving onto assumptions #4, #5 and #6. Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running a multinomial logistic regression might not be valid.
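This section does not say how Assumption #5 is tested. One common approach, assumed here rather than taken from this guide, is the Box-Tidwell procedure: for each continuous independent variable x, add an interaction term x * ln(x) to the model alongside x itself; a statistically significant interaction term suggests that x is not linearly related to the logit of the dependent variable. The Python sketch below only builds those extra terms (the model still has to be refitted with them, for example in SPSS Statistics), and the function name and new column names are hypothetical.

```python
import math


def box_tidwell_terms(columns):
    """Hypothetical helper: build an x * ln(x) term for each continuous predictor.

    `columns` maps a variable name to its values. Box-Tidwell needs strictly positive
    values because ln(x) is undefined for x <= 0.
    """
    terms = {}
    for name, values in columns.items():
        if any(v <= 0 for v in values):
            raise ValueError(f"{name} has non-positive values; Box-Tidwell needs x > 0")
        # Interaction between the variable and its own natural log
        terms[f"{name}_x_ln_{name}"] = [v * math.log(v) for v in values]
    return terms


if __name__ == "__main__":
    # Made-up ages; the new column would be entered into the model alongside "age"
    print(box_tidwell_terms({"age": [23, 35, 51, 30, 44, 60]}))
```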

In the section, Procedure, we illustrate the SPSS Statistics procedure to perform a multinomial logistic regression assuming that no assumptions have been violated. First, we introduce the example that is used in this guide.