Poisson regression is used to predict a dependent variable that consists of "count data" given one or more independent variables. The variable we want to predict is called the dependent variable (or sometimes the response, outcome, target or criterion variable). The variables we are using to predict the value of the dependent variable are called the independent variables (or sometimes the predictor, explanatory or regressor variables). Some examples where Poisson regression could be used are described below:
Having carried out a Poisson regression, you will be able to determine which of your independent variables (if any) have a statistically significant effect on your dependent variable. For categorical independent variables you will be able to determine the percentage increase or decrease in counts of one group (e.g., deaths amongst "children" riding on roller coasters) versus another (e.g., deaths amongst "adults" riding on roller coasters). For continuous independent variables you will be able to interpret how a single unit increase or decrease in that variable is associated with a percentage increase or decrease in the counts of your dependent variable (e.g., a decrease of $1,000 in salary – the independent variable – on the percentage change in the number of times people in Australia default on their credit card repayments – the dependent variable).
This "quick start" guide shows you how to carry out Poisson regression using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for Poisson regression to give you a valid result. We discuss these assumptions next.
When you choose to analyse your data using Poisson regression, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using Poisson regression. You need to do this because it is only appropriate to use Poisson regression if your data "passes" five assumptions that are required for Poisson regression to give you a valid result. In practice, checking for these five assumptions will take the vast majority of your time when carrying out Poisson regression. However, it is essential that you do this because it is not uncommon for data to violate (i.e., fail to meet) one or more of these assumptions. Even when your data does fail some of these assumptions, there is often a solution to overcome this. First, let's take a look at these five assumptions:
You can check assumptions #3, #4 and #5 using SPSS Statistics. Assumptions #1 and #2 should be checked first, before moving onto assumptions #3, #4, and #5. Just remember that if you do not run the statistical tests on these assumptions correctly, the results you get when running Poisson regression might not be valid.
Also, if your data violates Assumption #5, which is extremely common when carrying out Poisson regression, you need to first check whether you have "apparent Poisson overdispersion". Apparent Poisson overdispersion is where the model has not been specified correctly, so the data only appears to be overdispersed. Therefore, if your Poisson model initially violates the assumption of equidispersion, you should first make a number of adjustments to your Poisson model to check that it is actually overdispersed. This requires that you make six checks of your model/data: (a) Does your Poisson model include all important predictors?; (b) Does your data include outliers?; (c) Does your Poisson regression include all relevant interaction terms?; (d) Do any of your predictors need to be transformed?; (e) Does your Poisson model require more data and/or is your data too sparse?; and (f) Do you have missing values that are not missing at random (MAR)?
In the section, Procedure, we illustrate the SPSS Statistics procedure to perform a Poisson regression assuming that no assumptions have been violated. First, we introduce the example that is used in this guide.
The Director of Research of a small university wants to assess whether the experience of an academic and the time they have available to carry out research influence the number of publications they produce. Therefore, a random sample of 21 academics from the university is asked to take part in the research: 10 are experienced academics and 11 are recent academics. The number of hours they spent on research in the last 12 months and the number of peer-reviewed publications they generated are recorded.
To set up this study design in SPSS Statistics, we created three variables: (1) no_of_publications, which is the number of publications the academic published in peer-reviewed journals in the last 12 months; (2) experience_of_academic, which reflects whether the academic is experienced (i.e., has worked in academia for 10 years or more, and is therefore classified as an "Experienced academic") or has recently become an academic (i.e., has worked in academia for less than 3 years, but at least one year, and is therefore classified as a "Recent academic"); and (3) no_of_weekly_hours, which is the number of hours an academic has available each week to work on research.
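As a rough illustration of this layout outside SPSS Statistics, the sketch below stores a few rows in Python using the three variable names from this guide. The values are made up for illustration only (they are not the study data), and `is_count` is a hypothetical helper that checks a value is a non-negative whole number, as count data should be.

```python
# Hypothetical rows: one dict per academic, using the guide's variable names.
# The numbers are invented for illustration and are NOT the study data.
rows = [
    {"no_of_publications": 3, "experience_of_academic": "Experienced academic", "no_of_weekly_hours": 12},
    {"no_of_publications": 0, "experience_of_academic": "Recent academic", "no_of_weekly_hours": 4},
    {"no_of_publications": 2, "experience_of_academic": "Recent academic", "no_of_weekly_hours": 9},
]


def is_count(value):
    """Return True if value is a non-negative whole number (valid count data)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# Check the dependent variable in every row before modelling it as a count.
all_counts_valid = all(is_count(row["no_of_publications"]) for row in rows)
print(all_counts_valid)  # True
```

A check like this is only a first screen: it confirms the dependent variable looks like count data, not that a Poisson model is appropriate.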
The 13 steps below show you how to analyse your data using Poisson regression in SPSS Statistics when none of the five assumptions in the previous section, Assumptions, have been violated. At the end of these 13 steps, we show you how to interpret the results from your Poisson regression.
Click Analyze > Generalized Linear Models > Generalized Linear Models... on the main menu, as shown below:
You will be presented with the Generalized Linear Models dialogue box below:
Select Poisson loglinear, as shown below:
Note: Whilst it is standard to select Poisson loglinear in order to carry out a Poisson regression, you can also choose to run a custom Poisson regression by selecting Custom and then specifying the type of Poisson model you want to run using the Distribution:, Link function: and –Parameter– options.
Select the Response tab. You will be presented with the following dialogue box:
Transfer your dependent variable, no_of_publications, into the Dependent variable: box, as shown below:
Select the Predictors tab. You will be presented with the following dialogue box:
Transfer the categorical independent variable, experience_of_academic, into the Factors: box and the continuous independent variable, no_of_weekly_hours, into the Covariates: box, using the buttons, as shown below:
Note 1: If you have ordinal independent variables, you need to decide whether these are to be treated as categorical and entered into the Factors: box, or treated as continuous and entered into the Covariates: box. They cannot be entered into a Poisson regression as ordinal variables.
Note 2: Whilst it is typical to enter continuous independent variables into the Covariates: box, it is possible to enter ordinal independent variables instead. However, if you choose to do this, your ordinal independent variable will be treated as continuous.
Note 3: If you click on the button the following dialogue box will appear:
In the –Category Order for Factors– area you can choose between the Ascending, Descending and Use data order options. These are useful because SPSS Statistics automatically turns your categorical variables into dummy variables. Unless you are familiar with dummy variables, this can make it a little tricky to interpret the output from a Poisson regression for each of the groups of your categorical variables. Therefore, making changes to the options in the –Category Order for Factors– area can make it easier to interpret your output.
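To make the idea of dummy variables concrete, here is a minimal Python sketch. It assumes that the last category in the chosen sort order acts as the reference group (coded 0 throughout), which is our understanding of how factors are treated; `dummy_code` is a hypothetical helper, not part of SPSS Statistics, and it sorts by label text, whereas SPSS Statistics sorts by the underlying coded values.

```python
def dummy_code(values, order="ascending"):
    """Return (reference_level, columns) where columns maps each
    non-reference level to a list of 0/1 indicators."""
    levels = sorted(set(values), reverse=(order == "descending"))
    reference = levels[-1]  # assumption: last level in sort order is the reference
    columns = {
        level: [1 if value == level else 0 for value in values]
        for level in levels[:-1]
    }
    return reference, columns


groups = ["Experienced academic", "Recent academic", "Recent academic"]

reference, columns = dummy_code(groups, order="ascending")
print(reference)  # Recent academic
print(columns)    # {'Experienced academic': [1, 0, 0]}

# Reversing the category order swaps which group is the reference.
reference_desc, columns_desc = dummy_code(groups, order="descending")
print(reference_desc)  # Experienced academic
```

Changing the category order does not change the fitted model, only which group the coefficient for the factor is expressed relative to.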
Select the Model tab. You will be presented with the following dialogue box:
Keep the default of Main effects in the –Build Term(s)– area and transfer the categorical and continuous independent variables, experience_of_academic and no_of_weekly_hours, from the Factors and Covariates: box into the Model: box, as shown below:
Note 1: It is in this dialogue box that you build your Poisson model. In particular, you determine what main effects you have, as well as whether you expect there to be any interactions between your independent variables. If you suspect that you have interactions between your independent variables, including these in your model is important not only to improve the prediction of your model, but also to avoid issues of overdispersion, as highlighted in the Assumptions section earlier.
Whilst we provide an example for a very simple model with main effects only (for the categorical and continuous independent variables, experience_of_academic and no_of_weekly_hours), you can easily enter more complex models using the other options in the –Build Term(s)– area, depending on the type of main effects and interactions you have in your model.
Note 2: You can also build nested terms into your model by adding these into the Term: box in the –Build Nested Term– area. We do not have nested effects in this model, but there are many scenarios where you might have nested terms in your model.
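An interaction between a categorical and a continuous independent variable, as mentioned in Note 1, amounts to an extra column in the model: the product of the group's 0/1 dummy variable and the continuous value. A minimal sketch, with made-up values and a hypothetical `interaction_column` helper:

```python
def interaction_column(dummy, covariate):
    """Element-wise product of a 0/1 dummy column and a continuous column."""
    if len(dummy) != len(covariate):
        raise ValueError("columns must have the same length")
    return [d * x for d, x in zip(dummy, covariate)]


experienced = [1, 0, 0, 1]            # 1 = experienced, 0 = recent (illustrative)
weekly_hours = [12.0, 4.0, 9.0, 7.5]  # illustrative values, not the study data

product = interaction_column(experienced, weekly_hours)
print(product)  # [12.0, 0.0, 0.0, 7.5]
```

The product column is zero for the reference group, so its coefficient captures how much the effect of weekly hours differs between the two groups.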
Select the Estimation tab. You will be presented with the following dialogue box:
Keep the default options selected.
Note: There are a number of different options you can select within the –Parameter Estimation– area, including the ability to choose a different: (a) scale parameter method (i.e., an option other than the default in the Scale Parameter Method: box), which might be considered to deal with issues of overdispersion; and (b) covariance matrix (i.e., Robust estimator instead of Model-based estimator in the –Covariance Matrix– area), which presents another potential option (amongst other things) to deal with issues of overdispersion.
There are also a number of specifications you can make in the –Iterations– area in order to deal with issues of non-convergence in your Poisson model.
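One common way a scale parameter is used to deal with overdispersion is to multiply the model-based standard errors by the square root of the estimated dispersion (for instance, the Pearson chi-square divided by its degrees of freedom). The sketch below shows only that arithmetic. The standard error is a made-up number and `scaled_standard_error` is a hypothetical helper, so treat this as an illustration rather than a description of what SPSS Statistics does internally.

```python
import math


def scaled_standard_error(model_based_se, dispersion):
    """Inflate (or deflate) a model-based standard error by sqrt(dispersion)."""
    if dispersion <= 0:
        raise ValueError("dispersion must be positive")
    return model_based_se * math.sqrt(dispersion)


# 0.020 is a hypothetical standard error; 1.108 is the Pearson Value/df
# reported in the Goodness of Fit table of this guide.
se = scaled_standard_error(0.020, 1.108)
print(round(se, 4))  # 0.0211
```

With a dispersion close to 1, as here, the adjustment is small; with strong overdispersion it can noticeably widen confidence intervals.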
Select the Statistics tab. You will be presented with the following dialogue box:
Select Include exponential parameter estimates, as shown below:
Note 1: You can choose between the Wald and Likelihood ratio options based on factors such as sample size and the implications that this can have for the accuracy of statistical significance testing.
The Lagrange multiplier test can also be useful to determine whether the Poisson model is appropriate for your data (although this cannot be run using the Poisson regression procedure).
Note 2: You can also select a wide range of other options from the other tabs. These include options that are important when examining differences between the groups of your categorical variables as well as testing the assumptions of Poisson regression, as discussed in the Assumptions section earlier.
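SPSS Statistics does the estimation for you, but it can help to see roughly what fitting a Poisson loglinear model involves. The sketch below uses Newton-Raphson iterations on the Poisson log-likelihood in plain Python. It is an illustration under our own assumptions (the first column of `X` is an intercept, the data are a made-up toy set, and `fit_poisson` and `solve_linear_system` are hypothetical helpers), not a description of the algorithm SPSS Statistics uses.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poisson(X, y, max_iterations=25, tolerance=1e-10):
    """Estimate log-link Poisson coefficients; X[:, 0] must be all ones."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the intercept-only fit
    for _ in range(max_iterations):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector X'(y - mu) and information matrix X' diag(mu) X.
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve_linear_system(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tolerance:
            break
    return beta


# Toy data: intercept plus one 0/1 group indicator (not the study data).
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [1, 2, 3, 4, 6, 8]
beta = fit_poisson(X, y)
print([round(b, 4) for b in beta])  # [0.6931, 1.0986]
```

In this toy example the group means are 2 and 6, so the intercept is ln(2) and the group coefficient is ln(6/2) = ln(3), which is a handy sanity check on the fit.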
SPSS Statistics will generate quite a few tables of output for a Poisson regression analysis. In this section, we show you the eight main tables required to understand your results from the Poisson regression procedure, assuming that no assumptions have been violated.
The first table in the output is the Model Information table (as shown below). This confirms that the dependent variable is the "Number of publications", the probability distribution is "Poisson" and the link function is the natural logarithm (i.e., "Log"). If you are running a Poisson regression on your own data the name of the dependent variable will be different, but the probability distribution and link function will be the same.
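With a log link, the model is linear on the log scale, so a predicted count is the exponential of the linear predictor. A minimal sketch, using hypothetical coefficients (not the estimates from this example) and a hypothetical `predicted_count` helper:

```python
import math


def predicted_count(intercept, coefficients, values):
    """Predicted mean count under a log link: exp(b0 + b1*x1 + b2*x2 + ...)."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return math.exp(linear_predictor)


# Hypothetical coefficients for a 0/1 group indicator and weekly hours.
mu = predicted_count(0.0, [0.5, 0.04], [1, 10])
print(round(mu, 4))  # 2.4596
```

Because of the exponential, predicted counts are always positive, which is one reason the log link suits count data.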
The second table, Case Processing Summary, shows you how many cases (e.g., subjects) were included in your analysis (the "Included" row) and how many were not included (the "Excluded" row), as well as the percentage of both. You can think of the "Excluded" row as indicating cases (e.g., subjects) that had one or more missing values. As you can see below, there were 21 subjects in this analysis with no subjects excluded (i.e., no missing values).
The Categorical Variable Information table highlights the number and percentage of cases (e.g., subjects) in each group of each independent categorical variable in your analysis. In this analysis, there is only one categorical independent variable (also known as a "factor"), which was experience_of_academic. You can see that the groups are fairly balanced in numbers between the two groups (i.e., 10 versus 11). Highly unbalanced group sizes can cause problems with model fit, but we can see that there is no problem here.
The Continuous Variable Information table can provide a rudimentary check of the data for any problems, but is less useful than other descriptive statistics you can run separately before running the Poisson regression. The best you can get out of this table is to gain an understanding of whether there might be overdispersion in your analysis (i.e., Assumption #5 of Poisson regression). You can do this by considering the ratio of the variance (the square of the "Std. Deviation" column) to the mean (the "Mean" column) for the dependent variable. You can see these figures below:
The mean is 2.29 and the variance is 2.81 (1.67758²), which is a ratio of 2.81 ÷ 2.29 = 1.23. A Poisson distribution assumes a ratio of 1 (i.e., the mean and variance are equal). Therefore, we can see that before we add in any explanatory variables there is a small amount of overdispersion. However, we need to check this assumption when all the independent variables have been added to the Poisson regression. This is discussed in the next section.
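The arithmetic above can be reproduced in a few lines of Python. The mean and standard deviation are the figures quoted in this guide; `variance_to_mean_ratio` is a hypothetical helper name.

```python
def variance_to_mean_ratio(mean, std_deviation):
    """Ratio of the variance (std_deviation squared) to the mean."""
    variance = std_deviation ** 2
    return variance, variance / mean


variance, ratio = variance_to_mean_ratio(2.29, 1.67758)
print(round(variance, 2))  # 2.81
print(round(ratio, 2))     # 1.23
```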
The Goodness of Fit table provides many measures that can be used to assess how well the model fits. However, we will concentrate on the value in the "Value/df" column for the "Pearson Chi-Square" row, which is 1.108 in this example, as shown below:
A value of 1 indicates equidispersion whereas values greater than 1 indicate overdispersion and values below 1 indicate underdispersion. The most common type of violation of the assumption of equidispersion is overdispersion. With such a small sample size in this example a value of 1.108 is unlikely to be a serious violation of this assumption.
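For reference, the "Value/df" figure for the Pearson chi-square is the sum of squared differences between observed and fitted counts, each divided by the fitted count, all divided by the residual degrees of freedom. A minimal sketch with made-up observed and fitted values (not the study data) and a hypothetical `pearson_dispersion` helper:

```python
def pearson_dispersion(observed, fitted, n_parameters):
    """Pearson chi-square divided by its degrees of freedom (n - parameters)."""
    df = len(observed) - n_parameters
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / df


# Toy values: six observations, fitted means from a two-parameter model.
observed = [1, 2, 3, 4, 6, 8]
fitted = [2.0, 2.0, 2.0, 6.0, 6.0, 6.0]
value = pearson_dispersion(observed, fitted, n_parameters=2)
print(round(value, 3))  # 0.583
```

In this toy case the value is below 1, which would point towards underdispersion rather than overdispersion.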
The Omnibus Test table fits somewhere between this section and the next. It is a likelihood ratio test of whether all the independent variables collectively improve the model over the intercept-only model (i.e., with no independent variables added). With all the independent variables in our example model, the p-value is .006 (see the "Sig." column), indicating a statistically significant overall model.
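A likelihood ratio test of this kind compares twice the difference in log-likelihood between the full and intercept-only models against a chi-square distribution, with degrees of freedom equal to the number of added parameters. The sketch below uses hypothetical log-likelihood values (the guide does not report them) and, to stay within the standard library, only handles 1 or 2 degrees of freedom; `likelihood_ratio_test` is a hypothetical helper.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_full, df):
    """Return (statistic, p_value) for a likelihood ratio chi-square test."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_null))
    if df == 1:
        # Upper tail of a chi-square with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif df == 2:
        # Upper tail of a chi-square with 2 degrees of freedom.
        p_value = math.exp(-statistic / 2.0)
    else:
        raise ValueError("this sketch only handles df of 1 or 2")
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = likelihood_ratio_test(-40.0, -35.0, df=2)
print(statistic)          # 10.0
print(round(p_value, 4))  # 0.0067
```

For a model like the one in this guide, with one two-group factor and one covariate, we would expect 2 degrees of freedom, assuming one parameter per independent variable.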
Now that you know that the addition of all the independent variables generates a statistically significant model, you will want to know which specific independent variables are statistically significant. This is discussed in the next section.
The Tests of Model Effects table (as shown below) displays the statistical significance of each of the independent variables in the "Sig." column:
There is not usually any interest in the model intercept. However, we can see that the experience of the academic was not statistically significant (p = .644), but the number of hours worked per week was statistically significant (p = .030). This table is mostly useful for categorical independent variables because it is the only table that considers the overall effect of a categorical variable, unlike the Parameter Estimates table, as shown below:
This table provides both the coefficient estimates (the "B" column) of the Poisson regression and the exponentiated values of the coefficients (the "Exp(B)" column). It is usually the latter that are more informative. These exponentiated values can be interpreted in more than one way and we will show you one way in this guide. Consider, for example, the number of hours worked weekly (i.e., the "no_of_weekly_hours" row). The exponentiated value is 1.044. This means that the number of publications (i.e., the count of the dependent variable) will be 1.044 times greater for each extra hour worked per week. Another way of saying this is that there is a 4.4% increase in the number of publications for each extra hour worked per week. A similar interpretation can be made for the categorical variable.
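The conversion from an exponentiated coefficient to a percentage change is simple arithmetic, sketched below with the Exp(B) value of 1.044 from this guide. `percent_change` is a hypothetical helper, and the five-hour example is our own extension: because the model is multiplicative, the effect of several extra units is Exp(B) raised to that power.

```python
def percent_change(exp_b, units=1):
    """Percentage change in the expected count for `units` extra units."""
    return (exp_b ** units - 1.0) * 100.0


print(round(percent_change(1.044), 1))     # 4.4
print(round(percent_change(1.044, 5), 1))  # 24.0
```

Note that five extra hours gives roughly a 24.0% increase, slightly more than 5 × 4.4%, because the percentage changes compound rather than add.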
You could write up the results of the number of hours worked per week as follows:
A Poisson regression was run to predict the number of publications an academic published in the last 12 months based on the experience of the academic and the number of hours an academic spends each week working on research. For every extra hour worked per week on research, 1.044 (95% CI, 1.004 to 1.085) times more publications were published, a statistically significant result, p = .030.