Two-way ANOVA in Stata
Introduction
The two-way ANOVA compares the mean differences between groups that have been split on two independent variables (called factors). The primary purpose of a two-way ANOVA is to understand if there is an interaction between the two independent variables on the dependent variable.
For example, you could use a two-way ANOVA to understand whether there is an interaction between educational level and degree type on salary (i.e., your dependent variable would be "salary", measured on a continuous scale using US dollars, and your independent variables would be "educational level", which has three groups – "undergraduate", "master's" and "PhD" – and "degree type", which has five groups: "business studies", "psychology", "biological sciences", "engineering" and "law"). Alternatively, you could use a two-way ANOVA to understand whether there is an interaction between physical activity level and gender on blood cholesterol concentration in children (i.e., your dependent variable would be "blood cholesterol concentration", measured on a continuous scale in mmol/L, and your independent variables would be "physical activity level", which has three groups – "low", "moderate" and "high" – and "gender", which has two groups: "males" and "females").
Note: If you have three independent variables rather than two, you need a three-way ANOVA.
If you have a statistically significant interaction between your two independent variables on the dependent variable, you can follow up this result by determining whether there are any "simple main effects", and if there are, what these effects are (e.g., perhaps females with a university education had a greater interest in politics than males with a university education). We come back to "simple main effects" later.
In this "quick start" guide, we show you how to carry out a two-way ANOVA using Stata, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for a two-way ANOVA to give you a valid result. We discuss these assumptions next.
Assumptions
There are six "assumptions" that underpin the two-way ANOVA. If any of these six assumptions are not met, you cannot analyze your data using a two-way ANOVA because you will not get a valid result. Since assumptions #1, #2 and #3 relate to your study design and choice of variables, they cannot be tested for using Stata. However, you should decide whether your study meets these assumptions before moving on.
- Assumption #1: Your dependent variable should be measured at the continuous level. Examples of such continuous variables include height (measured in feet and inches), temperature (measured in °C), salary (measured in US dollars), revision time (measured in hours), intelligence (measured using IQ score), reaction time (measured in milliseconds), test performance (measured from 0 to 100), sales (measured in number of transactions per month), and so forth. If you are unsure whether your dependent variable is continuous (i.e., measured at the interval or ratio level), see our Types of Variable guide.
- Assumption #2: Your two independent variables should each consist of two or more categorical, independent (unrelated) groups. Examples of categorical variables include gender (e.g., 2 groups: male and female), ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist), and so forth.
- Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves. For example, there must be different participants in each group with no participant being in more than one group. If you do not have independence of observations, it is likely you have "related groups", which means you might need to use a two-way repeated measures ANOVA instead of the two-way ANOVA.
Fortunately, you can check assumptions #4, #5 and #6 using Stata. We suggest testing them in this order because, if a violation of an assumption cannot be corrected, you will no longer be able to use a two-way ANOVA. Do not be surprised if your data fails one or more of these assumptions; this is fairly typical when working with real-world data rather than textbook examples, which often only show you how to carry out a two-way ANOVA when everything goes well. However, don’t worry: even when your data fails certain assumptions, there is often a way to overcome this (e.g., transforming your data or using another statistical test instead). Just remember that if you do not check that your data meets these assumptions, or you test for them incorrectly, the results you get when running a two-way ANOVA might not be valid.
- Assumption #4: There should be no significant outliers. An outlier is simply a single case within your data set that does not follow the usual pattern (e.g., in a study of 100 students' IQ scores, where the mean score was 108 with only a small variation between students, one student had a score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The problem with outliers is that they can have a negative effect on the two-way ANOVA, reducing the accuracy of your results. Fortunately, when using Stata to run a two-way ANOVA on your data, you can easily detect possible outliers.
- Assumption #5: Your dependent variable should be approximately normally distributed for each combination of the groups of the two independent variables. Your data need only be approximately normal because the two-way ANOVA is quite "robust" to violations of normality, meaning that this assumption can be violated a little and the test will still provide valid results. You can test for normality in Stata using the Shapiro-Wilk test of normality.
- Assumption #6: There needs to be homogeneity of variances for each combination of the groups of the two independent variables. You can test this assumption in Stata using Levene's test for homogeneity of variances.
In practice, checking for assumptions #4, #5 and #6 will probably take up most of your time when carrying out a two-way ANOVA. However, it is not a difficult task, and Stata provides all the tools you need to do this.
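The three checks described above can be sketched in Stata roughly as follows. This is a minimal sketch under stated assumptions rather than a definitive procedure: it uses the variable names from the example in this guide (Int_Politics, Gender and Edu_Level), and cell is a hypothetical helper variable created here to identify each combination of groups.

```stata
* Hypothetical helper: one identifier per Gender x Edu_Level combination
egen cell = group(Gender Edu_Level), label

* Assumption #4: box plots for each combination of groups to spot possible outliers
graph box Int_Politics, over(Edu_Level) over(Gender)

* Assumption #5: Shapiro-Wilk test of normality within each combination
bysort cell: swilk Int_Politics

* Assumption #6: robvar reports Levene's test as the W0 statistic
robvar Int_Politics, by(cell)
```

Creating a single cell variable is a convenience choice: it lets the normality and variance checks run over the six combinations of groups rather than over each independent variable separately.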
In the section, Test Procedure in Stata, we illustrate the Stata procedure required to perform a two-way ANOVA assuming that no assumptions have been violated. First, we set out the example we use to explain the two-way ANOVA procedure in Stata.
Example
A researcher was interested in whether an individual's interest in politics was influenced by their level of education and gender. Therefore, the dependent variable was "interest in politics", and the two independent variables were "gender" and "level of education".
In particular, the researcher wanted to know whether there was an interaction between education level and gender. Put another way, was the effect of level of education on interest in politics different for males and females?
To answer this question, a random sample of 60 participants was recruited to take part in the study – 30 males and 30 females – equally split by level of education: school, college and university (i.e., 10 participants in each group). Each participant in the study completed a questionnaire that scored their interest in politics on a scale of 0 to 100, with higher scores indicating a greater interest in politics.
Participants' interest in politics was recorded in the variable, Int_Politics, their gender in the variable, Gender, and their level of education in the variable, Edu_Level. In variable terms, the researcher wanted to know if there was an interaction between Gender and Edu_Level on Int_Politics.
Setup in Stata
In Stata, we separated the individuals into their appropriate groups by using two columns representing the two independent variables, and labelled them Gender and Edu_Level. For Gender, we coded "Male" as 1 and "Female" as 2, and for Edu_Level, we coded "School" as 1, "College" as 2 and "University" as 3. The participants' interest in politics – the dependent variable – was entered under the variable name, Int_Politics. The setup for this example can be seen below:
Published with written permission from StataCorp LP.
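The coding scheme described above can also be attached to the data as value labels, so that Stata's output shows group names instead of numeric codes. The following is a sketch that assumes the variables Gender and Edu_Level already exist with the codes given above; gender_lbl and edu_lbl are hypothetical label names chosen for illustration.

```stata
* Attach value labels so output shows group names rather than codes
* (gender_lbl and edu_lbl are hypothetical label names)
label define gender_lbl 1 "Male" 2 "Female"
label values Gender gender_lbl

label define edu_lbl 1 "School" 2 "College" 3 "University"
label values Edu_Level edu_lbl

* Check the design: in this example each cell should hold 10 participants
tabulate Gender Edu_Level
```

The cross-tabulation is a quick way to confirm that the data were entered as intended before running the analysis.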
The scores for the independent variables, Edu_Level and Gender, as well as the scores for the dependent variable, Int_Politics, were then entered into the Data Editor (Edit) spreadsheet, as shown below:
Test Procedure in Stata
In this section, we show you how to analyze your data using a two-way ANOVA in Stata when the six assumptions in the previous section, Assumptions, have not been violated. You can carry out a two-way ANOVA using either code or Stata's graphical user interface (GUI), so first choose which of the two you want to use. After you have carried out your analysis, we show you how to interpret your results.
Code
In the first section below, we set out the code to carry out a two-way ANOVA. All code is entered into Stata's Command box.
The code to run a two-way ANOVA on your data takes the form:
anova DependentVariable FirstIndependentVariable##SecondIndependentVariable
Using our example where the dependent variable is Int_Politics and the two independent variables are Gender and Edu_Level, the required code would be:
anova Int_Politics Gender##Edu_Level
Therefore, enter the code and press the "Return/Enter" button on your keyboard.
You can see the Stata output that will be produced here. If there is a statistically significant interaction, you can carry out simple main effects. We discuss this later.
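It may help to see what the ## operator in the code above stands for. In Stata's factor-variable notation, A##B is shorthand for the two main effects plus their interaction (A, B and A#B), so the two commands in the sketch below should fit the same model. This is offered as an illustration using the example's variable names, not as an additional step in the procedure.

```stata
* Full factorial shorthand: both main effects plus the interaction
anova Int_Politics Gender##Edu_Level

* Equivalent model with each term written out explicitly
anova Int_Politics Gender Edu_Level Gender#Edu_Level
```

This is also why the interaction row in the output is labelled Gender#Edu_Level (with a single #) even though the command was typed with ##.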
Graphical user interface (GUI)
- Click Statistics > Linear models and related > ANOVA/MANOVA > Analysis of variance and covariance on the top menu as shown below:
You will be presented with the following anova - Analysis of variance and covariance dialogue box:
- Select the dependent variable, Int_Politics, from within the Dependent variable: drop-down box, and click on the three-dot button to the far right of the Model: drop-down box.
You will be presented with the following Create varlist with factor variables dialogue box:
- Keep the Factor variable option selected in the –Type of variable– area. In the –Add factor variable– area, select the option from within the Specification: drop-down box. You will be presented with a second Variables drop-down box, as shown below:
- For Variable 1:, select Gender under the Variables drop-down box and default under the Base drop-down box. For Variable 2:, select Edu_Level under the Variables drop-down box and default under the Base drop-down box. Next, click on the button, which will add the Model term, Gender##Edu_Level, to the Varlist: box.
Note: We have not ticked the check box under c. for either of our two independent variables, Gender or Edu_Level. This is because Assumption #2 of a two-way ANOVA is that both independent variables are "factorial variables" (i.e., categorical variables); that is, Gender has two categories (i.e., Male and Female), whilst Edu_Level has three categories (i.e., School, College and University).
- Click on the button. You will be presented with the anova - Analysis of variance and covariance dialogue box, but now with the Model term, Gender##Edu_Level, having been added in the Model: box, as shown below:
Click on the button. This will generate the Stata output for the two-way ANOVA, shown in the next section.
Output of the two-way ANOVA in Stata
If your data passed assumption #4 (i.e., there were no significant outliers), assumption #5 (i.e., your dependent variable was approximately normally distributed for each combination of the groups of the two independent variables) and assumption #6 (i.e., there was homogeneity of variances), which we explained earlier in the Assumptions section, you will only need to interpret the Stata output for the two-way ANOVA.
The Gender, Edu_Level and Gender#Edu_Level rows in the output above explain whether we have statistically significant effects for our two independent variables, Gender and Edu_Level, and for their interaction, Gender#Edu_Level.
We first look at the Gender#Edu_Level interaction because this is the most important result we are after. We can see from the Prob > F column that we have a statistically significant interaction (p = .0016). You may wish to report the results of Gender and Edu_Level as well. We can see from the output that there was no statistically significant difference in interest in politics between genders (p = .4987), but there were statistically significant differences between educational levels (p < .0005).
Finally, if you have a statistically significant interaction, you will also need to report simple main effects; that is, the effect of one of the independent variables at a particular level of the other independent variable. In our example, this would involve determining the mean difference in interest in politics between genders at each educational level, as well as between educational levels for each gender (e.g., perhaps females with a university education had a greater interest in politics than males with a university education). Alternatively, if you do not have a statistically significant interaction, you might report the main effects instead. Both the simple main effects and main effects can be calculated using Stata.
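One way to obtain simple main effects in Stata is with the contrast command after the ANOVA has been fitted, using the @ operator to request the effect of one factor within each level of the other. The sketch below is one possible approach and assumes the example's variable names; this guide does not state which commands were used to produce the simple main effects it reports, so treat the choice of contrast, margins and marginsplot as our assumption.

```stata
* Fit the model first; contrast and margins use the most recent estimates
anova Int_Politics Gender##Edu_Level

* Simple main effects of Gender at each level of Edu_Level
contrast Gender@Edu_Level

* Simple main effects of Edu_Level for each Gender
contrast Edu_Level@Gender

* Estimated cell means, then a profile plot of the interaction
margins Gender#Edu_Level
marginsplot
```

If you want to adjust for multiple comparisons, contrast accepts an mcompare() option (for example, mcompare(bonferroni)). Whether to adjust, and by which method, is a design decision for your study.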
Reporting the results of a two-way ANOVA
When you report the output of your two-way ANOVA, it is good practice to include:
- A. An introduction to the analysis you carried out.
- B. Information about your sample (including how many participants were in each of your groups if the group sizes were unequal or there were missing values).
- C. A statement of whether there was a statistically significant interaction between your two independent variables on the dependent variable (including the observed F-value [F], degrees of freedom [df], and significance level, or more specifically, the 2-tailed p-value [Prob > F]).
- D. If the interaction was statistically significant, a statement of which groups from the two independent variables showed statistically significant differences in terms of the dependent variable; that is, the "simple main effects" (indicating which groups were or were not statistically significantly different, including the relevant p-values).
Based on the Stata output above, we could report the results of this study as follows (N.B., we have also included an example of simple main effects):
- General
A two-way ANOVA was run on a sample of 60 participants to examine the effect of gender and education level on interest in politics. There was a significant interaction between the effects of gender and education level on interest in politics, F(2, 54) = 7.33, p = .0016. Simple main effects analysis showed that males were significantly more interested in politics than females when educated to university level (p = .002), but there were no differences between genders when educated to school (p = .465) or college level (p = .793).
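The values needed for a statement like the one above can also be pulled from Stata's stored results rather than copied by hand from the output table. The sketch below assumes that running test on the interaction term after anova leaves the F statistic, its degrees of freedom and the p-value in r(F), r(df), r(df_r) and r(p); check return list in your version of Stata to confirm these names before relying on them.

```stata
* Refit quietly, then test the interaction term to populate r()
quietly anova Int_Politics Gender##Edu_Level
test Gender#Edu_Level

* Assumed stored results: r(df) = numerator df, r(df_r) = residual df,
* r(F) = F statistic, r(p) = p-value
display "F(" r(df) ", " r(df_r) ") = " %5.2f r(F) ", p = " %6.4f r(p)
```

As a check on the reported degrees of freedom: with 60 participants and 2 x 3 = 6 combinations of groups, the residual degrees of freedom are 60 - 6 = 54, and the interaction has (2 - 1) x (3 - 1) = 2.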