Cohen's kappa using SPSS Statistics

Introduction

In research designs where you have two or more raters (also known as "judges" or "observers") who are responsible for measuring a variable on a categorical scale, it is important to determine whether the raters agree. Cohen's kappa (κ) is one such measure of inter-rater agreement for categorical scales when there are two raters (where κ is the lower-case Greek letter 'kappa').

There are many occasions when you need to determine the agreement between two raters. For example, the head of a local medical practice might want to determine whether two experienced doctors at the practice agree on when to send a patient to get a mole checked by a specialist. Both doctors look at the moles of 30 patients and decide whether to "refer" or "not refer" each patient to a specialist (i.e., where "refer" and "not refer" are two categories of a nominal variable, "referral decision"). The level of agreement between the two doctors across the 30 patients is analysed using Cohen's kappa. If the results show a very good strength of agreement between the two doctors, the head of the local medical practice can feel somewhat confident that both doctors are diagnosing patients in a similar manner. However, it is worth noting that even if two raters strongly agree, this does not necessarily mean that their decision is correct (e.g., both doctors could be misdiagnosing the patients, perhaps referring them too often when it is not necessary). This is something that you have to take into account when reporting your findings, but it cannot be measured using Cohen's kappa (when comparing the two doctors).
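
To make the doctors example concrete, the following is a minimal Python sketch of how unweighted Cohen's kappa is computed from paired ratings, using the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal proportions. The function name cohen_kappa and the 30 referral decisions are hypothetical and made up for illustration; they are not data from this guide, and this is not the procedure SPSS Statistics itself runs.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' paired categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(ratings_a)

    # Observed agreement: share of observations where both raters
    # chose the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each category, multiply the two raters'
    # marginal proportions, then sum over categories.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical decisions for 30 patients (illustrative only):
# 10 both refer, 2 only Doctor 1 refers, 1 only Doctor 2 refers,
# 17 both do not refer.
doctor_1 = ["refer"] * 12 + ["not refer"] * 18
doctor_2 = ["refer"] * 10 + ["not refer"] * 2 + ["refer"] * 1 + ["not refer"] * 17

kappa = cohen_kappa(doctor_1, doctor_2)
print(round(kappa, 3))  # 0.789 for these made-up data
```

Here the doctors agree on 27 of 30 patients (p_o = 0.9), but roughly 0.527 agreement would be expected by chance alone, which is why kappa is lower than the raw agreement rate.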

Note: There are variations of Cohen's kappa (κ) that are specifically designed for ordinal variables (called weighted kappa, κ_w) and for multiple raters (i.e., more than two raters).

This "quick start" guide shows you how to carry out Cohen's kappa using SPSS Statistics, as well as interpret and report the results from this test. However, before we introduce you to this procedure, you need to understand the different assumptions that your data must meet in order for Cohen's kappa to give you a valid result. We discuss these assumptions next.

Basic requirements and assumptions of Cohen's kappa

Cohen's kappa has five assumptions that must be met. If they are not met, you cannot use Cohen's kappa, but you may be able to use another statistical test instead. Therefore, before running a Cohen's kappa, you need to check that your study design meets the following five assumptions:

Assumption #1: The response (e.g., judgement) that is made by your two raters is measured on a categorical scale (i.e., either an ordinal or nominal variable) and the categories need to be mutually exclusive. For example, the two raters could be assessing whether a patient's mole was "normal" or "suspicious" (i.e., two categories); whether the quality of service provided by a customer service agent was "above average", "average" or "below average" (i.e., three categories); or whether a person's activity level should be considered "sedentary", "low", "medium" or "high" (i.e., four categories). "Mutually exclusive" means that no categories overlap (e.g., a rater could only consider a patient's mole to be normal OR suspicious; the mole cannot be normal and suspicious at the same time).
Assumption #2: The response data are paired observations of the same phenomenon, meaning that both raters assess the same observations. Take the example above of two experienced doctors who were asked to look at the moles of 30 patients and decide whether to "refer" or "not refer" each patient to a specialist. A single paired observation reflects the assessment of "Doctor 1" for "Patient 1" compared to the assessment of "Doctor 2" for "Patient 1" (i.e., they are comparing the same patient). With 30 patients in the study, this means that there are 30 paired observations.
Assumption #3: Each response variable must have the same number of categories and the crosstabulation must be symmetric (i.e., "square"), such as a 2x2, 3x3 or 4x4 crosstabulation. For example, a 2x2 crosstabulation means that the responses of both raters are measured on a dichotomous scale; that is, a nominal scale with two categories (e.g., no scarring vs scarring; more trustworthy vs less trustworthy; resuscitate vs do not resuscitate; and so forth). Similarly, a 3x3 crosstabulation would mean that the responses for both raters were measured on a scale with three categories (e.g., complete information recall vs some information recall vs no information recall), whilst a 4x4 crosstabulation involves a variable with four categories (e.g., undecided voter vs floating voter vs protest voter vs partisan voter).
Assumption #4: The two raters are independent (i.e., one rater's judgement does not affect the other rater's judgement). For example, if the two doctors in the example above discuss their assessment of the patients' moles before recording their response (i.e., "refer" or "not refer") or perhaps are simply in the same room when they make their assessment, this could influence the assessment they make. It is important that the potential for such bias is removed from the study design as much as is possible.
Assumption #5: The same two raters are used to judge all observations (e.g., patients). This has been referred to as having fixed or unique raters. If different raters were used for each observation (e.g., patient), Cohen's kappa is not the appropriate test to use. However, in this latter case, you could use Fleiss' kappa instead, which allows randomly chosen raters for each observation (e.g., patient).
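
Some of these requirements can be checked directly from the data before any analysis. The sketch below, in Python, builds a crosstabulation from a shared list of categories so that the table is always square (Assumption #3), and rejects unpaired ratings (Assumption #2) or responses outside the agreed categories (Assumption #1). The function name build_crosstab and the customer service ratings are hypothetical and purely illustrative. Assumptions #4 and #5 concern how the study was run and cannot be verified from the ratings alone.

```python
from collections import Counter


def build_crosstab(ratings_a, ratings_b, categories):
    """Square crosstabulation: rows are rater A, columns are rater B."""
    # Assumption #2: every observation must have a rating from both raters.
    if len(ratings_a) != len(ratings_b):
        raise ValueError("ratings are not paired: lengths differ")

    # Assumption #1: every response must be one of the agreed categories.
    unexpected = (set(ratings_a) | set(ratings_b)) - set(categories)
    if unexpected:
        raise ValueError(f"responses outside the agreed categories: {sorted(unexpected)}")

    # Assumption #3: using the same category list for rows and columns
    # keeps the table square even if a rater never uses a category.
    counts = Counter(zip(ratings_a, ratings_b))
    return [[counts[(row, col)] for col in categories] for row in categories]


# Hypothetical ratings of four customer service calls (illustrative only).
categories = ["above average", "average", "below average"]
rater_1 = ["above average", "average", "below average", "average"]
rater_2 = ["average", "average", "below average", "average"]

table = build_crosstab(rater_1, rater_2, categories)
for label, row in zip(categories, table):
    print(f"{label:>14}: {row}")
```

In this made-up example, the second rater never uses "above average", yet the table still has three rows and three columns because both dimensions are driven by the shared category list rather than by the values each rater happened to use.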

If your study design does not meet these five assumptions, you will not be able to run a Cohen's kappa. If you would like to know more about the characteristics of Cohen's kappa, including the null and alternative hypotheses it is testing, see our enhanced Cohen's kappa guide. In the section, Test Procedure in SPSS Statistics, we show you how to analyse your data using Cohen's kappa in SPSS Statistics. First, we introduce you to the example we use in this guide.