Creating an "ID" variable in SPSS Statistics

Introduction

An ID variable, also known as a case identifier or subject/participant ID, is often used in SPSS Statistics to distinguish between "cases" in your data set. Cases may be people (e.g., participants in your study), objects, animals or something else that you have measured.

In this guide, we show you how to create an ID variable using the Compute Variable... procedure in SPSS Statistics. First, we set out the example used in this guide.

Note: The data setup and procedure to create an ID variable are identical for all versions of SPSS Statistics. However, version 27 introduced a new look for the interface called "SPSS Light", replacing the previous look, "SPSS Standard", used in version 26 and earlier. Therefore, if you are using SPSS Statistics version 26 or earlier, the images in this guide will be blue rather than light grey.

SPSS Statistics

Data setup in SPSS Statistics

We want to carry out a multiple regression to predict maximal aerobic capacity (VO2max), which is an indicator of fitness and health, based on a person's age, weight, heart rate and gender. In this study, the dependent variable is "VO2max" and the four independent variables are "age", "weight", "heart rate" and "gender". Data were collected from the 100 participants who took part in the research.

Explanation: An ID variable is not used directly in calculations for a multiple regression analysis. However, when testing the assumptions of a multiple regression, such as detecting outliers in the data, it becomes easier to identify such outliers when an ID variable has been created.
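To illustrate the point outside of SPSS Statistics, the following is a minimal Python sketch rather than part of the SPSS procedure. The VO2max scores and the cut-off of two standard deviations are made up for illustration and are not taken from the data set used in this guide. The sketch only shows that an ID stays attached to a case after the data is sorted, whereas the row position does not.

```python
from statistics import mean, stdev

# Hypothetical VO2max scores (ml/min/kg); these are NOT the guide's data.
vo2max_scores = [55.79, 48.20, 51.35, 12.00, 49.90,
                 53.10, 47.60, 50.40, 52.30, 46.80]

# Attach a sequential ID (1, 2, 3, ...) to each case in entry order.
cases = [{"ID": i, "vo2max": v} for i, v in enumerate(vo2max_scores, start=1)]

# Sorting changes the row positions, but each case keeps its ID.
cases_sorted = sorted(cases, key=lambda c: c["vo2max"])

# Flag cases more than 2 standard deviations from the mean
# (an assumed, illustrative rule, not the guide's outlier test).
m = mean(vo2max_scores)
s = stdev(vo2max_scores)
outlier_ids = [c["ID"] for c in cases_sorted if abs(c["vo2max"] - m) > 2 * s]

print(outlier_ids)  # [4]
```

Although the flagged case sits in the first row after sorting, its ID still identifies it as the fourth case that was entered.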

To set up these variables, SPSS Statistics has a Variable View where you define the types of variables you are analysing and a Data View where you enter your data for these variables. The data setup in the Variable View is shown below:

Variable View showing five variables

Published with written permission from SPSS Statistics, IBM Corporation.

Each of our five variables is displayed on a separate row (i.e., the dependent variable, VO2max, on row 1, and the four independent variables – age, weight, heart_rate and gender – on rows 2, 3, 4 and 5, respectively).

After you use the Compute Variable... procedure in SPSS Statistics later in this guide to create an ID variable, ID, this sixth variable will appear in our Variable View, as highlighted below:

Variable View showing six variables, with the ID variable highlighted


On row 6, you can see that the ID variable, ID, is a nominal variable (i.e., nominal in the cell under the measure column) with no decimal places (i.e., "0" in the cell under the decimals column). We have also given it the label, "Participant ID" (i.e., in the cell under the label column), so that it is clear what the ID variable represents. In our example, the ID variable represents each of the 100 participants who took part in the research. If your ID variable represents something different (e.g., objects, animals or a series of variables), you may want to label it accordingly. Finally, the cell under the role column is set to none.

Note: We suggest changing the cell under the role column from input to none, but you do not have to make this change. We suggest it because there are certain analyses in SPSS Statistics where the input setting results in your variables being automatically transferred into certain fields in the dialogue boxes you use to carry out your analysis. Since you may not want to transfer these variables, changing the input setting to none prevents this from happening automatically.

Based on the file setup for the five variables in the Variable View above (i.e., without the ID variable), the Data View window will look as follows:

Data View showing five variables


Our five variables are displayed in the columns of the Data View in the order we entered them into the Variable View window. Therefore, in our example, we first entered the dependent variable, VO2max, so this appears in the first column, entitled vo2max, followed by the four independent variables from left to right: age in the second column, entitled age, and then weight, heart_rate and gender in the weight, heart_rate and gender columns, respectively.

Note: The setup above is known as wide format because there is one case per row. This means that each row contains the data for a single case; that is, each row contains the scores for a single participant for each of the variables that have been measured. If your data is set up differently (i.e., in long format), the SPSS Statistics procedure shown in this guide will not help you to create an ID variable. If this is the case, please contact us.
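The difference between the two layouts can be sketched in a few lines of Python. This is an illustration only: the rows, the "participant" and "time" column names, and the repeated-measures scenario are hypothetical and do not come from this guide's data set. It shows why numbering rows gives one number per case in wide format but not in long format.

```python
# Wide format: one row per case, so the row number doubles as a case number.
# (Illustrative values only.)
wide_rows = [
    {"vo2max": 55.79, "age": 27},
    {"vo2max": 48.20, "age": 34},
]
wide_ids = list(range(1, len(wide_rows) + 1))

# Long format (hypothetical repeated measures): each case spans several rows.
long_rows = [
    {"participant": "A", "time": 1},
    {"participant": "A", "time": 2},
    {"participant": "B", "time": 1},
    {"participant": "B", "time": 2},
]
row_numbers = list(range(1, len(long_rows) + 1))

# Numbering the rows yields four different values for only two participants,
# so a row number no longer identifies a case.
n_cases = len({row["participant"] for row in long_rows})
print(len(row_numbers), "row numbers for", n_cases, "cases")
```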


SPSS Statistics procedure to create an "ID" variable

In this section, we explain how to create an ID variable, ID, using the Compute Variable... procedure in SPSS Statistics. The following procedure will only work when you have set up your data in wide format where you have one case per row (i.e., your Data View has the same setup as our example, as explained in the note above):

  1. Click Transform > Compute Variable... on the main menu, as shown below:

    Note: Depending on your version of SPSS Statistics, you may not have the same options under the Transform menu as shown below, but all versions of SPSS Statistics include the same Compute Variable... menu option that you will use to create an ID variable.

    Compute Variable... menu option used to create a new ID variable



    You will be presented with the Compute Variable dialogue box, as shown below:
    'compute variable' dialogue box displayed


  2. Enter the name of the ID variable you want to create into the Target Variable: box. In our example, we have called this new variable, "ID", as shown below:
    ID variable entered into Target Variable box in top left


  3. Click on the Type & Label... button and you will be presented with the Compute Variable: Type and Label dialogue box, as shown below:
    empty 'compute variable: type and label' dialogue box


  4. Enter a more descriptive label for your ID variable into the Label: box in the –Label– area (e.g., "Participant ID"), as shown below:
    participant ID entered in 'compute variable: type and label' dialogue box


    Note: You do not have to enter a label for your new ID variable, but we prefer to make sure we know what a variable is measuring (this is especially useful when working with larger data sets with lots of variables). Therefore, we entered the label, "Participant ID", into the Label: box. This will be the label entered in the label column in the Variable View of SPSS Statistics when you complete the steps below.

  5. Click on the Continue button. You will be returned to the Compute Variable dialogue box, as shown below:
    ID variable entered


  6. Enter the numeric expression, $CASENUM, into the Numeric Expression: box, as shown below:
    $CASENUM entered into the Numeric Expression box


    Explanation: The numeric expression, $CASENUM, instructs SPSS Statistics to add a sequential number to each row of the Data View, starting with "1" in row 1, then "2" in row 2, "3" in row 3, and so forth. Therefore, since we have 100 participants in our example, the sequential numbers go from "1" in row 1 through to "100" in row 100.

    Note: Instead of typing in $CASENUM, you can click on "All" in the Function group: box, followed by "$Casenum" from the options that then appear in the Functions and Special Variables: box. Finally, click on the up arrow button. The numeric expression, $CASENUM, will appear in the Numeric Expression: box.

  7. Click on the OK button and the new ID variable, ID, will be added to our data set, as highlighted in the Data View window below:

data view with new 'nominal' ID variable highlighted



If you look under the ID column in the Data View above, you can see that a sequential number has been added to each row, starting with "1" in row 1, then "2" in row 2, "3" in row 3, and so forth. Since we have 100 participants in our example, the sequential numbers go from "1" in row 1 through to "100" in row 100.
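For readers who want to see the same idea expressed as code, the effect of the procedure above can be sketched in Python. This is an analogy under stated assumptions, not SPSS syntax: the helper name add_sequential_id is hypothetical, the first row uses the values quoted in this guide for participant 1, and the second and third rows are made up.

```python
# Row 1 uses the values quoted in the guide; rows 2 and 3 are made up.
rows = [
    {"vo2max": 55.79, "age": 27, "weight": 70.47, "heart_rate": 150, "gender": "male"},
    {"vo2max": 47.10, "age": 41, "weight": 82.30, "heart_rate": 162, "gender": "female"},
    {"vo2max": 50.25, "age": 35, "weight": 64.90, "heart_rate": 155, "gender": "male"},
]


def add_sequential_id(rows, name="ID"):
    """Return copies of the rows with a 1-based sequential number added.

    Hypothetical helper: it mimics numbering each row in order, which is
    what the guide describes $CASENUM doing in the Data View.
    """
    return [{name: n, **row} for n, row in enumerate(rows, start=1)]


data = add_sequential_id(rows)
ids = [row["ID"] for row in data]
print(ids)  # [1, 2, 3]
```

The helper returns new dictionaries rather than modifying the originals, so the source rows stay unchanged if the numbering needs to be redone.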

Therefore, participant 1 in row 1 had a VO2max of 55.79 ml/min/kg (i.e., in the cell under the vo2max column), was 27 years old (i.e., in the cell under the age column), weighed 70.47 kg (i.e., in the cell under the weight column), had an average heart rate of 150 (i.e., in the cell under the heart_rate column) and was male (i.e., in the cell under the gender column).

The new variable, ID, will also now appear in the Variable View of SPSS Statistics, as highlighted below:

variable view for new 'nominal' ID variable highlighted



The name of the new variable, "ID" (i.e., under the name column), reflects the name you entered into the Target Variable: box of the Compute Variable dialogue box in Step 2 above. Similarly, the label of the new variable, "Participant ID" (i.e., under the label column), reflects the label you entered into the Label: box in the –Label– area in Step 4 above.

You may also notice that we have made changes to the decimals, measure and role columns for our new variable, "ID". When the new variable is created, by default in SPSS Statistics the decimals column will be set to "2" (i.e., two decimal places), the measure column will show scale and the role column will show input. We changed the number of decimal places in the decimals column from "2" to "0" because an ID variable does not require any decimal places. Next, we changed the level of measurement in the measure column from the default entered by SPSS Statistics, scale, to nominal, because our new ID variable is a nominal variable and not a continuous variable (i.e., not a scale variable). Finally, we changed the cell under the role column from the default, input, to none, for the reasons given in the earlier note about the role column.
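The defaults and the changes described above can be summarised in a small Python sketch. The VariableProperties class is hypothetical and is not an SPSS Statistics API. It simply records the default settings stated in this guide (two decimal places, scale, input) and the adjusted settings we chose for the ID variable.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VariableProperties:
    """Hypothetical record of the Variable View columns discussed above."""
    name: str
    label: str = ""
    decimals: int = 2        # default described in the guide
    measure: str = "scale"   # default described in the guide
    role: str = "input"      # default described in the guide


# Settings immediately after the variable is created (Steps 2 and 4).
created = VariableProperties(name="ID", label="Participant ID")

# Settings after the manual changes made in the Variable View.
adjusted = replace(created, decimals=0, measure="nominal", role="none")

print(created)
print(adjusted)
```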

Referencing

Laerd Statistics (2025). Creating an "ID" variable in SPSS Statistics. Statistical tutorials and software guides. Retrieved from https://statistics.laerd.com/

