Repeated measures logistic regression using generalized estimating equations (GEE)

Introduction

A repeated measures logistic regression is used to understand if there are differences between two or more independent groups (e.g., a "control group" and an "intervention group", "males" and "females") across two or more repeated measurements (e.g., multiple "time points" over a year, "multiple conditions/treatments" that subjects undergo) when the dependent variable is dichotomous (e.g., a fitness test is "passed" or "failed", cholesterol concentration is "above" or "below" a specific value after taking the new drug).

Note: If you have a similar study design, but your dependent variable is continuous rather than dichotomous, see our SPSS Statistics guide on the mixed ANOVA.

In the tabs below, we set out two examples where a repeated measures logistic regression could be used: (a) where the repeated measurements are multiple time points (e.g., "before" and "after" an exercise intervention, "every month" for a year after taking a new drug); and (b) where the repeated measurements are multiple conditions/treatments (e.g., subjects take "four doses" of a migraine drug to measure adverse drug reactions, "three types" of energy drink are imbibed by subjects to measure driving alertness):

  • The repeated measurements are multiple time points
  • The repeated measurements are multiple conditions/treatments

The repeated measurements are multiple time points

Your within-subjects factor is time.
Your between-subjects factor consists of conditions (also known as treatments).

Imagine that a researcher working for a consumer watchdog wants to help learner drivers understand how many hours of driving lessons they should purchase in order to pass their driving test. The researcher also wants to understand if there is a difference based on whether the learner has manual driving lessons (known as "stick shift" in the United States) or automatic driving lessons.

To achieve this, the researcher sets up an experiment where 100 learner drivers are randomly assigned to one of two treatment/experimental groups: "manual/stick shift driving lessons" and "automatic driving lessons". All 100 learners are given 40 hours of driving lessons. At the end of every 5 hours of driving, the instructor gives each learner a 30-minute "mock driving test", which they can either "pass" or "fail". Therefore, the learners take a total of 8 mock driving tests (i.e., 40 hours of lessons divided by 5 hours = 8 mock tests).

The researcher wanted learners to be tested after every extra 5 hours of driving because driving lessons are typically booked in "blocks" (e.g., blocks of 5, 10 or 20 hours of lessons). Therefore, the researcher wanted to use the smallest block (i.e., 5 hour blocks) because the goal of the research was to save learners money (i.e., learners are encouraged to buy bigger blocks of driving lessons with discounted prices, but this could still cost more overall if learner drivers do not need so many lessons).

In this example, the dichotomous dependent variable is "driving test", which has two groups: "pass" and "fail". The within-subjects factor (also known as a repeated measures independent variable) is "time", with "8 time points" (i.e., where learners are given a mock driving test at the end of each 5 hours of driving lessons). The between-subjects factor is "lesson type", which has two independent groups: "manual/stick shift" and "automatic" (i.e., the two groups are "independent" because a learner can only be in one of these two groups).

At the end of the experiment, the researcher starts by describing the proportion of "passes" and "fails" over the 8 mock driving tests between manual/stick shift and automatic learners. For example, in the 1st mock driving test, the "pass" rate was .0 (i.e., 0%) for manual/stick shift learners and .0 (i.e., 0%) for automatic learners. In other words, none of the learners passed the mock test after their first 5 hours of driving lessons, which is perhaps not surprising. By comparison, by the 8th mock driving test, the "pass" rate was .634 (i.e., 63.4%) for manual/stick shift learners and .701 (i.e., 70.1%) for automatic learners. Expressed another way, the "fail" rate was .366 (i.e., 36.6%) for manual/stick shift learners and .299 (i.e., 29.9%) for automatic learners. The researcher presents these "pass" and "fail" rates for all 8 mock driving tests in a table, which provides useful descriptive statistics to get a sense of the data.

Next, the researcher uses a repeated measures logistic regression using generalized estimating equations (GEE) to determine whether there is a difference in the proportion of "passes" over the 8 mock driving tests between manual/stick shift and automatic learners. In other words, is there a two-way interaction effect between "time" and "lesson type" in terms of the dependent variable, which is the proportion of learners who "pass" each mock driving test? If there is a two-way interaction effect, this suggests that the proportion of passes and fails is not the same over time for manual/stick shift and automatic learners. In other words, how learners improve after each 5 hours of extra driving lessons, assessed in terms of the proportion of "passes", is not the same for manual/stick shift learners compared to learners who had automatic lessons. The researcher can also use follow-up analyses to understand how pass rates might change over time for manual/stick shift and automatic learners. For example, the researcher may want to understand how pass rates changed for manual/stick shift and automatic learners for every extra 5 hours of driving lessons.

The repeated measurements are multiple conditions/treatments

Your within-subjects factor consists of conditions (also known as treatments).
Your between-subjects factor is a characteristic of your sample.

A national health service wants to review the adverse side effects of three migraine drugs that its doctors prescribe to patients with long-term (chronic) migraine. The health service wants to determine the proportion of patients who experience an "adverse drug reaction (ADR)" after taking the drugs. Whilst there are different types of adverse drug reaction (ADR), for the purpose of this example, we are referring to "Type A" reactions (e.g., see MHRA, 2025). The health service is also interested in differences in ADR between males and females.

To achieve this, a researcher from the health service sets up a study where 120 migraine patients are prescribed each of the three drugs for a 3-month period. For simplicity, the order in which the patients receive the three drugs is random to try to avoid "order effects" and there is a "washout period" between each trial to ensure that the effects of each drug are no longer present before the next drug is prescribed. If a patient experiences an ADR whilst taking a drug, the trial is stopped and the patient starts the washout period before starting the next drug.

In this example, the dichotomous dependent variable is "ADR", which has two groups: "no" and "yes". The within-subjects factor (also known as a repeated measures independent variable) is "condition/treatment", with "3 repeated measures" (i.e., where patients take 3 different migraine drugs, each for a 3-month period). The between-subjects factor is "gender", which has two independent groups: "males" and "females".

At the end of the experiment, the researcher starts by describing the proportion of male and female patients who experienced an ADR for each of the three migraine drugs. For example, when taking the first migraine drug, .012 (i.e., 1.2%) of males and .024 (i.e., 2.4%) of females experienced an ADR. The researcher presents the proportion of male and female patients who experience an ADR for each of the three migraine drugs in a table, which provides useful descriptive statistics to get a sense of the data.

Next, the researcher uses a repeated measures logistic regression using generalized estimating equations (GEE) to determine whether there is a difference in the proportion of male and female patients who experience an ADR when taking each of the three migraine drugs. In other words, is there a two-way interaction effect between "gender" and "drug" in terms of the dependent variable, which is the proportion of patients who experience an ADR (i.e., who state "yes" to experiencing an ADR)? If there is a two-way interaction effect, this suggests that the proportion of male and female patients who experience an ADR is not the same when taking each of the three migraine drugs. In other words, whether patients experience an ADR after taking each drug differs based on whether they are male or female and the drug that was taken. The researcher can also use follow-up analyses to understand in more detail how the proportion of patients who experience an ADR might differ based on their gender and which of the three migraine drugs they take.

Note: In the examples above, there is one between-subjects factor and one within-subjects factor, which is called a two-way mixed design. However, a repeated measures logistic regression can be used when you have one or more between-subjects factors and one or more within-subjects factors (e.g., a three-way mixed design that can have two between-subjects factors and one within-subjects factor). In this guide, we illustrate the two-way mixed design as an introduction to the use of a repeated measures logistic regression.

There are many methods that can be used when analysing data for a mixed design when the dependent variable is dichotomous. In this guide, we show how to carry out a repeated measures logistic regression using generalized estimating equations (GEE), which we will simply refer to as GEE for the remainder of this guide. However, we also plan to add guides to show how to carry out a repeated measures logistic regression using different methods such as generalized linear mixed models (GLMM). If you would like us to email you when these guides become available, please contact us.

Note: Both the GEE and GLMM methods can be used for a wide range of mixed designs and types of dependent variable (e.g., count, ordinal, nominal, dichotomous and continuous dependent variables). However, in this introductory guide, we illustrate the use of GEE for a two-way mixed design when the dependent variable is dichotomous (i.e., an ordinal or nominal variable with two groups/categories).

In the sections that follow, we start by setting out the basic requirements and assumptions that underpin a repeated measures logistic regression using GEE when you have a two-way mixed design. It is important that your study design, variables and data fit with these basic requirements and assumptions to ensure that a repeated measures logistic regression using GEE will give you accurate/valid results. Next, we set out the example that is used throughout this guide to illustrate the use of a repeated measures logistic regression using GEE for a two-way mixed design. Third, we show you how to set up your data in SPSS Statistics to carry out this type of analysis. Next, we set out the SPSS Statistics procedure to carry out a repeated measures logistic regression using GEE for a two-way mixed design, which uses the Generalized Estimating Equations procedure in SPSS Statistics. Finally, we explain how to interpret the SPSS Statistics output/results that are produced when running the Generalized Estimating Equations procedure.

Note: We do not currently have a premium version of this guide in the subscription part of our website. However, we plan to add a whole series of detailed guides to help with GEE and GLMM for repeated measures and mixed designs with count, ordinal, nominal, dichotomous and continuous dependent variables. If you would like us to email you when these guides become available, please contact us.

SPSS Statistics

Basic requirements and assumptions of a repeated measures logistic regression using generalized estimating equations (GEE)

When you choose to analyse your data using repeated measures logistic regression using GEE, part of the process involves checking to make sure that the data you want to analyse can actually be analysed using this type of statistical analysis. You need to do this because it is only appropriate to use repeated measures logistic regression using GEE if your data: (a) "meets" three basic requirements that are needed for a repeated measures logistic regression using GEE to be appropriate for your study design and how you measured your variables; and (b) "passes" three assumptions that are required to give you a valid result. These basic requirements and assumptions are set out below:

If your study design and variables do not meet all three of the requirements above, a repeated measures logistic regression using GEE would not be a suitable type of analysis. If you are unsure what alternatives are available, please contact us, letting us know which of these requirements were not met. Alternatively, if your study and variables fit with these three requirements, you can continue on to the three assumptions of a repeated measures logistic regression using GEE below:

You can partially check assumptions #1, #2 and #3 using SPSS Statistics. As for requirements #1, #2 and #3, these should be checked before assumptions #1, #2 and #3 because a repeated measures logistic regression will not be a suitable type of analysis if your study design and variables do not meet these requirements.

In the Procedure section, we illustrate the SPSS Statistics procedure to perform a repeated measures logistic regression using GEE assuming that assumptions #1, #2 and #3 have all been met, such that we have selected a suitable working correlation matrix, the model fits the data well and there is no missing data or the data is missing completely at random (MCAR). In the next section, we introduce the example that is used in this guide.
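
To give a sense of what a working correlation matrix looks like, the sketch below builds an AR(1) structure, which is the structure selected later in the Procedure section. Under AR(1), the assumed correlation between two measurements on the same subject is rho raised to the power of the number of time points separating them, so measurements that are closer together in time are assumed to be more strongly correlated. The function name ar1_working_correlation and the value rho = 0.5 are assumptions made purely for illustration. In practice, the correlation parameter is estimated from the data by the software.

```python
def ar1_working_correlation(n_times, rho):
    """Return an n_times x n_times AR(1) matrix with entries rho ** |j - k|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(j - k) for k in range(n_times)] for j in range(n_times)]


# 7 time points (months 0 to 6); rho = 0.5 is a made-up value, not an estimate.
matrix = ar1_working_correlation(7, 0.5)
for row in matrix:
    print("  ".join(f"{value:.3f}" for value in row))
```

Each row of the printed matrix corresponds to one time point. The diagonal is 1 and the values shrink as the gap between time points grows.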

Example used in this guide

Imagine that a sports scientist wants to determine whether a new, 6-month training programme will help military personnel to pass a particularly demanding fitness test.

To achieve this, the researcher sets up an experiment where 100 military personnel are: (a) randomly sampled from the military population of interest (for the purpose of this example: front-line troops); and (b) randomly assigned to one of two groups: the "control group" and the "intervention group". The control group continues with their usual training routines, whereas the intervention group is given the new training programme. The experiment lasts 6 months with the subjects (i.e., military personnel) undergoing 7 fitness tests: one fitness test before the experiment starts (i.e., at month 0) and another fitness test every month for six months (i.e., at months 1, 2, 3, 4, 5 and 6).

In this example, the dichotomous dependent variable is "fitness test", which has two categories: "fail" and "pass". The within-subjects factor (also known as a repeated measures independent variable) is "time", with "7 time points" (i.e., where subjects take the fitness test at months 0, 1, 2, 3, 4, 5 and 6). The between-subjects factor is "group", which has two independent groups: "control" and "intervention" (i.e., the two groups are "independent" because a subject can only be in one of these two groups).

At the end of the experiment, the researcher starts by describing the proportion of "passes" over the 7 fitness tests between the control group who continued with their usual training routine and the intervention group who undertook the new training programme. For example, the proportion of "passes" is shown in the table below:

                     Time in months
Group                0              1              2              3              4              5              6
Control group        .375 (37.5%)   .387 (38.7%)   .381 (38.1%)   .500 (50.0%)   .506 (50.6%)   .494 (49.4%)   .417 (41.7%)
Intervention group   .417 (41.7%)   .369 (36.9%)   .292 (29.2%)   .304 (30.4%)   .482 (48.2%)   .554 (55.4%)   .875 (87.5%)
Table: Proportion of "passes" over time in the control and intervention groups

In the table above, the "pass" rate at the start of the experiment (i.e., month 0) was similar between the two groups, with .375 (i.e., 37.5%) in the control group and .417 (i.e., 41.7%) in the intervention group, with almost no difference in month 1 (i.e., .387 or 38.7% in the control group versus .369 or 36.9% in the intervention group). Throughout the 6-month experiment, "pass" rates in the control group fluctuated, but remained between .375 (37.5%) and .506 (50.6%). By comparison, pass rates dropped in months 2 and 3 in the intervention group to .292 (29.2%) and .304 (30.4%) respectively, before rebounding in months 4 and 5 to .482 (48.2%) and .554 (55.4%) respectively, and finally seeing a jump in month 6 to .875 (87.5%). These differences are also highlighted in the line graph below, where the proportion of "passes" is displayed on the y-axis, time in months is displayed on the x-axis, and the two groups are represented by the blue line (control group) and red line (intervention group) on the graph:

Line graph showing the proportion of "passes" (y-axis) by time in months (x-axis) for the control group (blue line) and intervention group (red line)

Published with written permission from SPSS Statistics, IBM Corporation.

These descriptive statistics provide a useful insight into the data, but the researcher wants to learn more:

A repeated measures logistic regression using GEE can be used to try to answer these two questions. Whilst for the purpose of this basic guide, we only determine whether there is a two-way interaction effect, the researcher would typically use follow-up analyses to understand how pass and failure rates might change over time between the two groups. For example, if the new training programme is successful in helping military personnel to pass the fitness test relative to their usual training routine, the researcher may want to understand how pass rates changed for each extra month of training. Alternatively, the researcher may be able to learn if any difference in pass rates between the two groups occurred after a specific number of months of training.
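
Because a logistic regression works on the log-odds scale, it can help to look at the descriptive "pass" proportions from the table above on that scale. The sketch below converts each proportion to log-odds and computes the difference between the intervention and control groups at each month. If there were no two-way interaction on the log-odds scale, these differences would be roughly constant over time. This is a descriptive illustration only: it ignores the correlation between repeated measurements on the same subject and is not the output of the GEE procedure. The function name log_odds is our own.

```python
import math

months = [0, 1, 2, 3, 4, 5, 6]
# Proportions of "passes" copied from the descriptive table in this section.
control = [0.375, 0.387, 0.381, 0.500, 0.506, 0.494, 0.417]
intervention = [0.417, 0.369, 0.292, 0.304, 0.482, 0.554, 0.875]


def log_odds(p):
    """Convert a proportion p (0 < p < 1) to log-odds."""
    return math.log(p / (1.0 - p))


# Difference in log-odds (intervention minus control) at each month.
differences = [log_odds(i) - log_odds(c) for i, c in zip(intervention, control)]

for month, diff in zip(months, differences):
    print(f"month {month}: log-odds difference = {diff:+.2f}")
```

A positive value means the odds of passing were higher in the intervention group at that month, and a negative value means they were higher in the control group.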

Data setup in SPSS Statistics

In our example, which shows how to carry out a repeated measures logistic regression using GEE for a two-way mixed design, we created four variables in SPSS Statistics:

(1) The case identifier, subject_id, which provides an identifier for each participant in the study.

(2) The dichotomous dependent variable, passed_test, which has two categories: "Fail" and "Pass".

(3) The between-subjects factor, group, which has two categories: "Control" and "Intervention".

(4) The within-subjects factor, time_in_months, which is measured in months and includes 7 time points (i.e., 7 measurements of the dependent variable, with each measurement being one month apart).

To set up these variables, SPSS Statistics has a Variable View where you define the types of variables you are analysing and a Data View where you enter your data for these variables. First, we show you how to set up your four variables in the Variable View of SPSS Statistics. Next, we show you how to enter your data into the Data View.

Note: If you have already entered your data into SPSS Statistics, but the way you have set up your Variable View and Data View is different from the setup in our guide below, we will be adding a series of guides to show how to restructure existing datasets. This will include a guide dedicated to a two-way mixed design. If you would like us to email you when these guides become available, please contact us.

The "Variable View" in SPSS Statistics

At the end of the data setup process, your Variable View window will look like the one below, which illustrates the setup for your four variables:

variable view showing 'subject_id', 'passed_test', 'group', 'time_in_months'

Published with written permission from SPSS Statistics, IBM Corporation.

In the Variable View above, you will have entered each of your four variables on a separate row. It does not matter on which rows in the Variable View you enter each variable. The row order only determines the order of the columns in the Data View, as explained later. It will not affect your analysis.

In our example, we entered the case identifier, subject_id, on row 1, the dichotomous dependent variable, passed_test, on row 2, the between-subjects factor, group, on row 3, and the within-subjects factor, time_in_months, on row 4.

First, you need to give each variable a name in the cells under the name column (e.g., "passed_test" in row 2 to represent our dichotomous dependent variable, passed_test). There are certain "illegal" characters that cannot be entered into the name cell. Therefore, if you get an error message and you would like us to add an SPSS Statistics guide to explain what these illegal characters are, please contact us.

Note: For your own clarity, you can also provide a label for your variable in the label column. For example, the label we entered for the within-subjects factor, time_in_months, was "Training time (in months)".

The cells under the measure column should show how your four variables were measured. The case identifier, subject_id, on row 1, is a nominal variable, so nominal is entered under the measure column. The dependent variable, passed_test, on row 2, is an ordinal variable with two categories (i.e., it is a dichotomous variable), so ordinal is entered under the measure column. The between-subjects factor, group, on row 3, is also a nominal (dichotomous) variable, so nominal is entered under the measure column. Finally, the within-subjects factor, time_in_months, on row 4, is a continuous variable with 7 time points, so scale is entered under the measure column.

Note: We suggest changing the cell under the role column from input to none, but you do not have to make this change. We suggest that you do because there are certain analyses in SPSS Statistics where the input setting results in your variables being automatically transferred into certain fields of the dialogue boxes you are using. Since you may not want to transfer these variables, we suggest changing the input setting to none so that this does not happen automatically.

The cells under the values column should contain the information about the categories of your dichotomous dependent variable, passed_test, on row 2, and between-subjects factor, group, on row 3. To enter this information, click into the cell under the values column for your dichotomous dependent variable. A button with three dots will appear in the cell. Click on this button and the Value Labels dialogue box will appear.

Note: The appearance and functionality of the Value Labels dialogue box is slightly different based on whether you have: (a) SPSS Statistics versions 28 to 30 (and the subscription version of SPSS Statistics); or (b) version 27 or an earlier version of SPSS Statistics. If you are unsure which version of SPSS Statistics you are using, see our guide: Identifying your version of SPSS Statistics. The explanation below is relevant for versions 28 to 30 (and the subscription version), but if you would like us to demonstrate this for version 27 or an earlier version of SPSS Statistics, please contact us.

First, click on the Add button, which will allow you to start entering a "value" and a "label" for each category of your dichotomous dependent variable. Therefore, start by entering a "value" for the first category of your dichotomous dependent variable into the Value: box (e.g., "0"), followed by a "label" into the Label: box (e.g., "Fail"). Next, click on the Add button and repeat the process for the second category of your dichotomous dependent variable (i.e., we entered "1" into the Value: box and "Pass" into the Label: box). To complete the process, click on the OK button. The setup for our dichotomous dependent variable, passed_test, and between-subjects factor, group, is shown on the left- and right-hand sides of the Value Labels dialogue box below respectively:

value labels showing 'passed_test' and 'group'

Published with written permission from SPSS Statistics, IBM Corporation.

You have now successfully entered all the information that SPSS Statistics needs to know about your four variables into the Variable View. In the next section, we show you how to enter your data into the Data View.

The Data View in SPSS Statistics

Based on the file setup for your four variables in the Variable View above, the Data View should look as follows:

data view showing 'subject_id', 'passed_test', 'group', 'time_in_months'

Published with written permission from SPSS Statistics, IBM Corporation.

The four variables will be displayed in the columns based on the order you entered them into the Variable View. In our example, we first entered the case identifier, subject_id, so this appears in the first column, entitled subject_id. Similarly, the passed_test column, group column and time_in_months column represent the dichotomous dependent variable, passed_test, the between-subjects factor, group, and the within-subjects factor, time_in_months, respectively.

Now, you simply have to enter your data into the cells under each column. In a repeated measures logistic regression using GEE, each row (e.g., row 1) represents one data point (value) (e.g., the fitness test being "passed") for one cell of the design. A cell of the design is represented by one group of the between-subjects factor (e.g., the "control" group) and one group of the within-subjects factor (e.g., "month 0"). Therefore, in our example, a cell of the design would be the control group at month 0. To illustrate this further, see rows 1, 9 and 17 that are highlighted below:

data view showing rows 1, 9, 17 of 'subject_id', 'passed_test', 'group', 'time_in_months' highlighted

Published with written permission from SPSS Statistics, IBM Corporation.

Rows 1, 9 and 17 all provide data for subject 1 (i.e., "1" under the subject_id column) who was in the control group (i.e., "Control" under the group column). The data in rows 1, 9 and 17 shows that at month 0, month 1 and month 2 (i.e., "0", "1" and "2" under the time_in_months column), subject 1 failed the fitness test (i.e., "Fail" under the passed_test column at months 0, 1 and 2).

Therefore, each subject, whether they are in the control group or the intervention group (i.e., "Control" or "Intervention" under the group column) has a fitness test result (i.e., "Pass" or "Fail" under the passed_test column) at each of the 7 time points (i.e., at months 0 to 6 under the time_in_months column).
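
The layout described above is often called a "long" format, where each row holds one fitness test result for one subject at one time point. As a rough sketch of this structure, the example below converts two made-up subjects, each with 7 results stored side by side, into one row per subject per month using the same four variable names as this guide. The data values, the wide list and the to_long function are hypothetical and are not taken from the example dataset. The row order here (grouped by subject) is simply a choice made for this sketch.

```python
# Hypothetical wide-format records: one entry per subject, with one result
# per month from month 0 to month 6 (0 = "Fail", 1 = "Pass").
wide = [
    {"subject_id": 1, "group": "Control", "results": [0, 0, 0, 1, 0, 1, 0]},
    {"subject_id": 2, "group": "Intervention", "results": [0, 1, 0, 0, 1, 1, 1]},
]


def to_long(records):
    """Return one row per subject per time point, using the four variables."""
    rows = []
    for record in records:
        for month, result in enumerate(record["results"]):
            rows.append({
                "subject_id": record["subject_id"],
                "passed_test": result,
                "group": record["group"],
                "time_in_months": month,
            })
    return rows


long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

With 2 subjects and 7 time points, this produces 14 rows. In the same way, a dataset with 100 subjects and 7 time points would have 700 rows if no measurements were missing.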

Since the cells in the Data View will initially be empty, you need to click into the cells to enter your data. You will notice that when you click into the cells under your dichotomous dependent variable (e.g., passed_test) and between-subjects factor (e.g., group), SPSS Statistics will give you a drop-down option with the groups already populated (e.g., "Fail" or "0" and "Pass" or "1" for passed_test).

Your data is now set up correctly in SPSS Statistics. In the next section, we show you how to carry out a repeated measures logistic regression using GEE using SPSS Statistics.

SPSS Statistics procedure to carry out a repeated measures logistic regression using generalized estimating equations (GEE)

The 16 steps below show you how to use the Generalized Estimating Equations procedure in SPSS Statistics to carry out a repeated measures logistic regression using GEE to determine if there is a two-way interaction effect between our between-subjects factor, group, and within-subjects factor, time_in_months, in terms of our dichotomous dependent variable, passed_test. In our example, this will inform us if the proportion of subjects (participants) who "passed" the fitness test depends on both the group they were in (i.e., the control group or intervention group) and the length of time after the 6-month experiment started (i.e., in monthly increments from 0 to 6 months). We explain how to interpret if there was a two-way interaction effect in the Interpreting Results section later.
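
As a rough guide to what this procedure fits, the model with both main effects and their interaction can be written as follows when group and time_in_months are both treated as factors. Here, Y_it is 1 if subject i passed the fitness test at month t and 0 otherwise, G_i is 1 for the intervention group and 0 for the control group, and T_tm is 1 if month t equals m and 0 otherwise. The symbols and the choice of reference categories (control group and month 0) are assumptions made for illustration. Software may use different reference categories, which changes the individual coefficients.

```latex
\operatorname{logit}\bigl(\Pr(Y_{it} = 1)\bigr)
  = \beta_0
  + \beta_1 G_i
  + \sum_{m=1}^{6} \gamma_m T_{tm}
  + \sum_{m=1}^{6} \delta_m \, G_i \, T_{tm}
```

In this notation, the two-way interaction effect corresponds to the six coefficients \delta_1 to \delta_6. If all of them were zero, the difference between the two groups on the log-odds scale would be the same at every time point.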

  1. Click on Analyze > Generalized Linear Models > Generalized Estimating Equations... in the main menu, as shown below:
    Menu for a repeated measures logistic regression with generalized estimating equations (GEE) in SPSS Statistics

    Published with written permission from SPSS Statistics, IBM Corporation.


    You will be presented with the Generalized Estimating Equations dialogue box, as shown below:
    'Generalized Estimating Equations' dialogue box in SPSS Statistics. All variables displayed on the left

    Published with written permission from SPSS Statistics, IBM Corporation.

  2. Transfer the subject identifier, subject_id, into the Subject variables: box, and the continuous variable, time_in_months, into the Within-subject variables: box, using the right arrow buttons. Next, in the –Working Correlation Matrix– area, select AR(1) in the Structure: drop-down box, as shown below:

    Note: For the purpose of this example, we selected AR(1) as the working correlation matrix. However, as we mentioned in Assumption #2, there are a lot of considerations and practical steps that are required in order to choose a suitable working correlation matrix. If you would like to know more about this process, please contact us and we will email you when we have added a guide to help with this.

    'subject_id' and 'time_in_months' transferred on the right. Working correlation matrix changed to 'AR(1)'

    Published with written permission from SPSS Statistics, IBM Corporation.

  3. Click on the Type of Model tab (which will become highlighted). You will be presented with the following dialogue box, as shown below:
    'Type of Model' tab in 'Generalized Estimating Equations' dialogue box selected

    Published with written permission from SPSS Statistics, IBM Corporation.

  4. Select the Binary logistic option in the –Binary Response or Events/Trials Data– area, as shown below:
    'Binary logistic' selected on the right to run a binomial logistic regression model in SPSS Statistics with repeated measures

    Published with written permission from SPSS Statistics, IBM Corporation.

  5. Click on the Response tab (which will become highlighted). You will be presented with the following dialogue box, as shown below:
    'Response' tab in 'Generalized Estimating Equations' dialogue box selected

    Published with written permission from SPSS Statistics, IBM Corporation.

  6. Transfer the dependent variable, passed_test, into the Dependent Variable: box, as shown below:
    dependent variable 'passed_test' transferred on the right

    Published with written permission from SPSS Statistics, IBM Corporation.

    Note: When you transfer the dependent variable, passed_test, into the Dependent Variable: box, the Reference Category button will become active.

  7. In the –Type of Dependent Variable (Binomial Distribution Only)– area, click on the Reference Category button under the Binary option, as highlighted below:
    'Reference Category' button is highlighted on the right under the 'Binary' option

    Published with written permission from SPSS Statistics, IBM Corporation.


    You will be presented with the Generalized Estimating Equations: Reference Category dialogue box, as shown below:
    'Generalized Estimating Equations: Reference Category' dialogue box in SPSS. Default reference category is 'last (highest value)'

    Published with written permission from SPSS Statistics, IBM Corporation.

  8. In the –Reference Category– area, select the First (lowest value) option, as shown below:
    Reference category changed to 'first (lowest value)' in 'Generalized Estimating Equations: Reference Category' dialogue box

    Published with written permission from SPSS Statistics, IBM Corporation.

  9. Click on the Continue button and you will be returned to the Response tab of the Generalized Estimating Equations dialogue box, as shown below:
    'Response' tab in 'Generalized Estimating Equations' dialogue box with options in 'Reference Category' dialogue box now selected

    Published with written permission from SPSS Statistics, IBM Corporation.

  10. Click on the Predictors tab (which will become highlighted). You will be presented with the following dialogue box, as shown below:
    'Predictors' tab in 'Generalized Estimating Equations' dialogue box selected

    Published with written permission from SPSS Statistics, IBM Corporation.

  11. Transfer the between-subjects factor, group, and the within-subjects factor, time_in_months, into the Factors: box, using the Right arrow button, as shown below:
    Both factors, 'group' and 'time_in_months' transferred into the top right-hand side box, 'factors'

    Published with written permission from SPSS Statistics, IBM Corporation.

  12. Click on the Model tab (which will become highlighted). You will be presented with the following dialogue box, as shown below:
    'Model' tab in 'Generalized Estimating Equations' dialogue box selected

    Published with written permission from SPSS Statistics, IBM Corporation.

  13. In the –Specify Model Effects– area, select the between-subjects factor, group, and the within-subjects factor, time_in_months, in the Factors and Covariates: box. Next, in the Type: box in the –Build Term(s)– area, select the factorial option from the drop-down box (the default option is main effects). Finally, click on the right arrow button to transfer the main effects of group and time_in_months and their interaction term, group*time_in_months, into the Model: box, as shown below:
    main effects of 'group' and 'time_in_months' and interaction term 'group x time_in_months' transferred into 'Model' box on right

    Published with written permission from SPSS Statistics, IBM Corporation.

  14. Click on the Statistics tab (which will become highlighted). You will be presented with the following dialogue box, as shown below:
    'Statistics' tab in 'Generalized Estimating Equations' dialogue box selected

    Published with written permission from SPSS Statistics, IBM Corporation.

  15. Select the Include exponential parameter estimates and Working correlation matrix options in the –Print– area, as shown below:
    At bottom of 'Print' area, two options - 'include exponential parameter estimates' and 'working correlation matrix' - are selected

    Published with written permission from SPSS Statistics, IBM Corporation.

  16. Click on the OK button. This will generate the results.
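
Note: To give a sense of what the Include exponential parameter estimates option in Step 15 adds to the output, the short Python sketch below (which is not part of the SPSS Statistics procedure) converts a coefficient on the log-odds scale into an odds ratio, which is what an exponentiated estimate represents in a logistic model. It also converts a log-odds value into a probability. The function names and the coefficient value are hypothetical and are not taken from the output in this guide.

```python
import math


def odds_ratio(b: float) -> float:
    """Exponentiate a logistic regression coefficient (log-odds scale)."""
    return math.exp(b)


def probability(log_odds: float) -> float:
    """Inverse logit: convert log-odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficient, for illustration only (not from this guide's output).
b_example = 0.7
print(f"Exp(B) for B = {b_example}: {odds_ratio(b_example):.3f}")
print(f"Probability at log-odds of 0: {probability(0.0):.2f}")
```

An odds ratio above 1 indicates higher odds of the modelled outcome, and an odds ratio below 1 indicates lower odds. Assuming passed_test is coded so that "pass" has the higher value, choosing First (lowest value) as the reference category in Step 8 means the modelled outcome is passing the test.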

In the next section, we explain how to interpret the two-way interaction effect that was calculated when running the Generalized Estimating Equations procedure above.


Interpreting the results of a repeated measures logistic regression using generalized estimating equations (GEE)

The SPSS Statistics procedure in the previous section will generate quite a few tables of output. In this section, we explain two of these tables. In the first section below, we explain how to use the Model Information table, which is helpful to check that you have carried out the correct procedure in SPSS Statistics. In the second section, we explain how to interpret the Tests of Model Effects table, which shows whether there is a two-way interaction between the between-subjects factor (e.g., group) and within-subjects factor (e.g., time_in_months) in terms of your dependent variable (e.g., passed_test). We do this using our example. Therefore, we are trying to determine if differences in the proportion of military personnel who pass the fitness test are the result of: (a) the group they belonged to (i.e., the control group where they continued their current training routine or the intervention group where they underwent the new training programme); and (b) the amount of time they spent training (i.e., from the first fitness test at month 0 through to the last fitness test at month 6).

Note: As mentioned in the section, Example used in this guide, you would typically want to know more than whether there is a two-way interaction effect. For example, you may want to use confidence intervals (CI) and more specific contrasts to learn more about your data. Therefore, we plan to add guides to explain how to carry out and interpret these follow-up analyses using SPSS Statistics. If these guides are of interest, please contact us and we will email you when they become available.

Checking that you carried out the correct procedure in SPSS Statistics

First, it is good practice to check that you ran the correct procedure in SPSS Statistics, which simply means that you selected the options you wanted in the Procedure section and did not make any mistakes in your selections. You can do this by consulting the Model Information table, as shown below:

Note: In the image below, we have included the Model Information table twice because this will look slightly different depending on whether you: (a) only included the names of your variables under the name column of the Variable View when setting up your data, which reflects the Model Information table on the left below; or (b) included the names and labels of your variables under the name and label columns, as we did in the Data Setup section earlier, which reflects the Model Information table on the right below.

'Model Information' table for a repeated measures logistic regression with generalized estimating equations in SPSS.

Published with written permission from SPSS Statistics, IBM Corporation.

In the bullets that follow, we explain how to interpret each row of the Model Information table. We do this using the text in the table on the right above:

After consulting the Model Information table to check that you ran the correct procedure in SPSS Statistics, you can start to interpret the results from your repeated measures logistic regression using GEE analysis.

Determining if there is a two-way interaction effect

In this section, we explain how to interpret the two-way interaction of a repeated measures logistic regression using GEE. In our example, this is the two-way interaction between the between-subjects factor, group, and within-subjects factor, time_in_months, in terms of the dependent variable, passed_test.

You can learn whether there is a two-way interaction by consulting the Tests of Model Effects table, which shows the model that was run to analyse your data using a repeated measures logistic regression using GEE, as shown below:

'Tests of Model Effects' table for a repeated measures logistic regression with generalized estimating equations in SPSS.

Published with written permission from SPSS Statistics, IBM Corporation.


The Tests of Model Effects table includes a footnote stating what was included in the model that you created in Step 13, as highlighted below:

Factorial model in 'Tests of Model Effects' table highlighted for a repeated measures logistic regression with generalized estimating equations in SPSS.

Published with written permission from SPSS Statistics, IBM Corporation.


This confirms that a full factorial model was run to include the dependent variable, "Fitness test" (i.e., passed_test), the "Intercept", the two main effects of "Group" (i.e., group) and "Training time (in months)" (i.e., time_in_months), and the two-way interaction of "Group * Training time (in months)".
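
As a rough illustration of what a full factorial model contains, the Python sketch below lists every main effect and every interaction that can be formed from a set of factors. The function name is hypothetical, and the sketch only mirrors the idea behind the factorial option selected in Step 13; it is not how SPSS Statistics builds the model internally.

```python
from itertools import combinations


def factorial_terms(factors):
    """Return all main effects and all interactions among the given factors."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    return terms


# The two factors used in this guide.
print(factorial_terms(["group", "time_in_months"]))
# prints ['group', 'time_in_months', 'group*time_in_months']
```

With two factors, this gives the two main effects and the single two-way interaction. The intercept is included in addition to these terms.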

For the purpose of interpreting whether there was a two-way interaction effect between "group" and "time" in terms of the proportion of passes of the "fitness test", you need to consult the "Group * Training time (in months)" row, as highlighted below:

Two-way interaction effect in 'Tests of Model Effects' table highlighted for a repeated measures logistic regression with generalized estimating equations in SPSS.

Published with written permission from SPSS Statistics, IBM Corporation.


The "Group * Training time (in months)" row displays the Wald chi-square statistic (i.e., under the "Wald Chi-Square" column), the degrees of freedom (i.e., under the "df" column) and the p-value (i.e., the statistical significance value, under the "Sig." column) of the two-way interaction. For the purpose of this analysis, we are interested in the statistical significance value (i.e., p-value) of the two-way interaction, as highlighted under the "Sig." column below:

Two-way interaction effect in 'Tests of Model Effects' table highlighted for a repeated measures logistic regression with generalized estimating equations in SPSS.

Published with written permission from SPSS Statistics, IBM Corporation.


If the p-value under the "Sig." column is less than .05 (i.e., p < .05), the two-way interaction is statistically significant. If so, you can reject the null hypothesis of no two-way interaction effect and accept the alternative hypothesis that there is a two-way interaction effect.

Alternatively, if the p-value is greater than .05 (i.e., p > .05), the two-way interaction is not statistically significant. If so, you fail to reject the null hypothesis, meaning that there is not enough evidence to support the alternative hypothesis that there is a two-way interaction effect. It is important to note that this is not the same as accepting the null hypothesis. In other words, if the p-value is greater than .05, we cannot conclude that there is no two-way interaction effect. We can only state that there is insufficient evidence of a two-way interaction effect.
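
As a rough illustration of how the Wald chi-square statistic, the degrees of freedom and the p-value relate to one another, the Python sketch below computes a p-value from a Wald chi-square statistic and then applies the .05 decision rule described above. It assumes the statistic is referred to a chi-square distribution. The function names and the example statistic are hypothetical and are not taken from the Tests of Model Effects table in this guide.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution.

    Uses the series expansion of the regularized lower incomplete gamma
    function, which is adequate for the moderate values used here.
    """
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def interpret(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule for the two-way interaction."""
    if p_value < alpha:
        return "statistically significant: reject the null hypothesis"
    return "not statistically significant: fail to reject the null hypothesis"


# Hypothetical Wald chi-square statistic and degrees of freedom.
wald, df = 9.5, 3
p = chi2_sf(wald, df)
print(f"p = {p:.3f} -> {interpret(p)}")
```

Note that failing to reject the null hypothesis is reported as exactly that; the function never states that the null hypothesis is accepted.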

Since the p-value is .002 in our example (i.e., p = .002), the two-way interaction is statistically significant (i.e., a p-value of .002 is less than .05). This suggests that:

Whilst for the purpose of this introductory guide, we have only determined whether there is a two-way interaction effect, the researcher would typically use follow-up analyses to understand how pass rates might change over time between the two groups. For example, if the new training programme is successful in helping military personnel to pass the fitness test relative to their usual training routine, the researcher may want to understand how pass rates changed for each extra month of training. Alternatively, the researcher may be able to learn if any difference in pass rates between the two groups occurred after a specific number of months of training.
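
Before running any formal follow-up analysis, it can help to look at the observed pass rates in each group at each time point. The Python sketch below shows one way this might be done for data in long format (one row per person per time point). The records are made up purely for illustration, and the subject identifier and the 0/1 coding of passed_test are assumptions rather than details taken from this guide.

```python
from collections import defaultdict

# Hypothetical long-format records:
# (subject_id, group, time_in_months, passed_test), where 1 = pass and 0 = fail.
records = [
    (1, "control", 0, 0), (1, "control", 6, 0),
    (2, "control", 0, 0), (2, "control", 6, 1),
    (3, "intervention", 0, 0), (3, "intervention", 6, 1),
    (4, "intervention", 0, 1), (4, "intervention", 6, 1),
]


def pass_rates(rows):
    """Proportion of passes in each (group, time point) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [passes, total]
    for _subject, group, month, passed in rows:
        counts[(group, month)][0] += passed
        counts[(group, month)][1] += 1
    return {cell: passes / total for cell, (passes, total) in counts.items()}


for (group, month), rate in sorted(pass_rates(records).items()):
    print(f"{group:12s} month {month}: {rate:.2f}")
```

These are descriptive proportions only. They do not take into account the correlation between repeated measurements on the same person, which is what the GEE analysis is designed to handle.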

What type of follow-up analyses should be carried out will depend on the goals of your research. If you would like us to add a guide explaining the different types of follow-up analyses that can be carried out, please contact us.



Reference this article

Laerd Statistics (2025). Repeated measures logistic regression with generalized estimating equations (GEE) using SPSS Statistics. Statistical tutorials and software guides. Retrieved Month Day, Year, from https://statistics.laerd.com/spss-tutorials/repeated-measures-logistic-regression-with-generalized-estimating-equations-gee-using-spss-statistics.php

For example, if you viewed this guide on 10th May 2025, you would use the following reference:

Laerd Statistics (2025). Repeated measures logistic regression with generalized estimating equations (GEE) using SPSS Statistics. Statistical tutorials and software guides. Retrieved May 10, 2025, from https://statistics.laerd.com/spss-tutorials/repeated-measures-logistic-regression-with-generalized-estimating-equations-gee-using-spss-statistics.php
