Measures of Spread

Introduction

A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population. It is usually used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data.

Why is it important to measure the spread of data?

There are many reasons why measuring the spread of data values is important, but one of the main reasons concerns its relationship with measures of central tendency. A measure of spread gives us an idea of how well the mean, for example, represents the data. If the spread of values in the data set is large, the mean is less representative of the data than it would be if the spread were small. This is because a large spread indicates that there are probably large differences between individual scores. Additionally, in research, it is often seen as positive if there is little variation in each data group, as it indicates that the values within the group are similar.
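As a rough illustration of this point, the sketch below compares two groups that share the same mean but differ greatly in spread. Both groups are invented for the example (they do not come from the text), and the gap between the largest and smallest value is used as a quick stand-in for spread.

```python
from statistics import mean

# Hypothetical scores, made up for this illustration only.
group_a = [48, 49, 50, 51, 52]   # values cluster tightly around the mean
group_b = [10, 30, 50, 70, 90]   # values lie far from the mean

mean_a, mean_b = mean(group_a), mean(group_b)

# Gap between the largest and smallest value in each group.
spread_a = max(group_a) - min(group_a)
spread_b = max(group_b) - min(group_b)

print(mean_a, spread_a)  # 50 4
print(mean_b, spread_b)  # 50 80
```

Both means are 50, yet a typical score in the first group is within 2 points of the mean, while scores in the second group can be 40 points away from it.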

We will be looking at the range, quartiles, variance, absolute deviation and standard deviation.

Range

The range is the difference between the highest and lowest scores in a data set and is the simplest measure of spread. So we calculate range as:

Range = maximum value - minimum value

For example, let us consider the following data set:

The maximum value is 85 and the minimum value is 23. This results in a range of 62, which is 85 minus 23. Whilst the range is a limited measure of spread, it does set the boundaries of the scores. This can be useful if you are measuring a variable that has either a critical low or high threshold (or both) that should not be crossed. The range will instantly tell you whether at least one value broke these critical thresholds. In addition, the range can be used to detect errors made when entering data. For example, if you have recorded the ages of school children in your study and your range is 7 to 123 years old, you know you have made a mistake!
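The calculation and the two uses described above can be sketched in a few lines of Python. The lists `scores` and `ages`, the helper names and the threshold values are all assumptions made for this example; `scores` is not the original data set, and was only chosen so that its minimum is 23 and its maximum is 85.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


def crosses_threshold(values, low=None, high=None):
    """Return True if the minimum is below `low` or the maximum is above `high`."""
    too_low = low is not None and min(values) < low
    too_high = high is not None and max(values) > high
    return too_low or too_high


# Hypothetical scores with the same extremes as the example in the text.
scores = [23, 37, 41, 56, 62, 70, 85]
print(value_range(scores))                         # 62
print(crosses_threshold(scores, low=20, high=90))  # False

# Hypothetical ages of school children containing a data-entry error.
ages = [7, 9, 11, 123, 8]
print(min(ages), max(ages))                        # 7 123
print(crosses_threshold(ages, low=4, high=18))     # True
```

Only the minimum and maximum are inspected, which is why the check is quick but says nothing about how the values in between are distributed.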

Quartiles and Interquartile Range

Quartiles tell us about the spread of a data set by breaking the data set into quarters, just like the median breaks it in half. For example, consider the marks of the 100 students below, which have been ordered from the lowest to the highest scores, and the quartiles highlighted in red.

[Table: marks of the 100 students in ascending order, from 1st (lowest) to 100th (highest), arranged as five Order/Score column pairs of 20 rows each. The individual scores are not reproduced here; those used in the calculation below are the 25th and 26th (45 and 45), the 50th and 51st (58 and 59), and the 75th and 76th (71 and 71).]

The first quartile (Q1) lies between the 25th and 26th students' marks, the second quartile (Q2) between the 50th and 51st students' marks, and the third quartile (Q3) between the 75th and 76th students' marks. Hence:

First quartile (Q1) = (45 + 45) ÷ 2 = 45
Second quartile (Q2) = (58 + 59) ÷ 2 = 58.5
Third quartile (Q3) = (71 + 71) ÷ 2 = 71

In the above example, we have an even number of scores (100 students) rather than an odd number (such as 99 students). This means that when we calculate each quartile, we take the two scores on either side of it, add them together and halve the result (hence Q1 = (45 + 45) ÷ 2 = 45). However, if we had an odd number of scores (say, 99 students), we would only need to take one score for each quartile (that is, the 25th, 50th and 75th scores). You should recognize that the second quartile is also the median.
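The rule described above can be sketched in Python as follows. This is one possible reading of the method, not a definitive implementation: the function name `quartiles` is hypothetical, the generalisation to sample sizes other than 100 and 99 is an assumption, and statistical packages often use different interpolation conventions that give slightly different answers. The demonstration data are simply the integers 1 to 100 and 1 to 99, not the students' marks.

```python
def quartiles(sorted_scores):
    """Return (Q1, Q2, Q3) for a list already sorted in ascending order.

    For the k-th quartile the rank k * n / 4 is computed. If it is a whole
    number, the quartile lies between that score and the next one, so the
    two are averaged. Otherwise the rank is rounded up and that single
    score is taken.
    """
    n = len(sorted_scores)
    results = []
    for k in (1, 2, 3):
        whole, remainder = divmod(k * n, 4)
        if remainder == 0:
            # Quartile falls between the whole-th and (whole + 1)-th scores.
            lower = sorted_scores[whole - 1]
            upper = sorted_scores[whole]
            results.append((lower + upper) / 2)
        else:
            # Round the rank up; index `whole` is the (whole + 1)-th score.
            results.append(sorted_scores[whole])
    return tuple(results)


# 100 values: averages of the 25th/26th, 50th/51st and 75th/76th values.
print(quartiles(list(range(1, 101))))  # (25.5, 50.5, 75.5)

# 99 values: the 25th, 50th and 75th values are taken directly.
print(quartiles(list(range(1, 100))))  # (25, 50, 75)
```

In both cases the middle value returned is the median, in line with the remark that the second quartile and the median coincide.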