Work through the steps below to select the appropriate statistical test for your research. If we do not have a study design that matches your own, contact us.
These statistical tests are used to: (a) determine whether an association or correlation between two or more variables exists; and (b) if such an association or correlation does exist, measure the strength and direction of the association/correlation. Even though we use the words "association" and "correlation", you will often see research questions that measure associations/correlations using the word "relationship", or sometimes not mention any of these words. Consider some of the following research questions:
|#||Research questions||Phrased differently|
|# 1||Is there an association between exam performance and time spent revising?||Is there a relationship between exam performance and time spent revising?|
|# 2||Is there an association between depression and length of unemployment?||Does the length of time a person is unemployed increase their level of depression?|
|# 3||Is there an association between maximal aerobic capacity and age?||Is your maximal aerobic capacity related to your age?|
|# 4||Is there an association between the likelihood to take up smoking and gender?||Are females more likely to take up smoking than males?|
|# 5||Is there an association between parental income and the decision to take a gap year?||Is there a relationship between parental income and the decision to take a gap year?|
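As a rough illustration of what "strength and direction" mean, the sketch below computes Pearson's correlation coefficient for research question #1. The data are made up, and the helper name `pearson_r` is an assumption introduced here. Pearson's correlation is only one of several measures of association; which one is appropriate depends on the types of variables you have.

```python
import math

def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: hours spent revising and exam mark (out of 100) for six students
revision_hours = [2, 4, 6, 8, 10, 12]
exam_marks = [45, 50, 62, 61, 75, 80]

r = pearson_r(revision_hours, exam_marks)
# The sign gives the direction; the size (between -1 and +1) gives the strength
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f} ({direction} association)")
```

A coefficient near +1 or -1 indicates a strong association, and a coefficient near 0 indicates a weak one.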
Irrespective of whether you want to predict a score or membership of a group, these statistical tests are based on there being a relationship between two or more variables (e.g., a relationship between revision time and exam performance). However, prediction goes further, and allows you to use the existence of these relationships to predict the value of one variable based on the value(s) of the other variable(s).
On the one hand, we might want to predict the value of a dependent variable (also known as the criterion, outcome or target variable), based on the value of one or more independent variables (also known as predictor variables, regressors or covariates). For example, can we predict the mark out of 100 that a student might achieve in an exam based on the number of hours they spent revising?
On the other hand, rather than predicting the value of a dependent variable, sometimes we are interested in predicting group membership. For example, can we predict whether somebody is likely to have heart disease (i.e., group 1: "yes" or group 2: "no"), based on their age, weight, gender and maximal aerobic capacity (an indicator of health and fitness)?
Furthermore, these statistical techniques allow you to understand the importance of each independent variable separately, and how much they contribute (as a percentage) to the predicted score. For example: How much does revision time explain exam performance (i.e., it could be just 5%; or perhaps 55%)? Consider some of the following research questions:
|# 1||Can cigarette consumption be predicted based on smoking duration?||How much of an individual's cigarette consumption can be explained by the amount of time they have smoked?|
|# 2||Can level of depression be predicted based on the length of time that a person is unemployed?||How much does unemployment length explain level of depression?|
|# 3||Can maximal aerobic capacity be predicted based on age, weight, heart rate and gender?||How much of an individual's maximal aerobic capacity can be explained by their age, weight, heart rate and gender?|
|# 4||Can exam performance be predicted based on revision time, lecture attendance, prior academic achievement and exam anxiety?||How much do revision time, lecture attendance, prior academic achievement and exam anxiety explain exam performance?|
|# 5||Can the likelihood of being asked for ID at a nightclub be predicted based on the customer's gender?||How much does gender explain the likelihood of being asked for ID at a nightclub?|
|# 6||Can the presence of heart disease be predicted based on age, weight, gender and maximal aerobic capacity?||How much do age, weight, gender and maximal aerobic capacity explain the presence of heart disease?|
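To make the two ideas above concrete (predicting a score, and stating how much of it is explained), here is a minimal sketch of a simple linear regression with one independent variable. The data are made up, and the function name `fit_simple_regression` is an assumption introduced for illustration. Predicting group membership (e.g., the presence of heart disease) needs a different technique and is not covered by this sketch.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up data: hours spent revising and exam mark (out of 100) for six students
revision_hours = [2, 4, 6, 8, 10, 12]
exam_marks = [45, 50, 62, 61, 75, 80]

intercept, slope, r_squared = fit_simple_regression(revision_hours, exam_marks)

# Predict the mark for a hypothetical student who revised for 7 hours
predicted_mark = intercept + slope * 7
print(f"Predicted mark after 7 hours: {predicted_mark:.1f}")
print(f"Variation in exam marks explained by revision time: {r_squared:.0%}")
```

Here `r_squared` plays the role of the "percentage explained" in the research questions above. With several independent variables you would use multiple regression instead.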
These statistical tests are used to: (a) determine whether there are differences between two or more groups of related and/or unrelated (independent) cases on a dependent variable; and (b) if such differences exist, determine where these differences lie (i.e., when you have three or more groups). The "cases" that you study could be people, animals, objects, organizations, and so forth. For example, is there a difference in the salary amongst male nurses in the United States of America based on ethnicity (i.e., where the dependent variable is "salary", and the independent variable, "ethnicity", consists of three unrelated, independent groups: "Caucasians", "African Americans" and "Hispanics")? Or is there a difference in daily cigarette use amongst heavy smokers before and after a hypnotherapy programme (i.e., where the dependent variable is "daily cigarette use", and the independent variable, "time", consists of two related groups: daily cigarette use "before" and "after" the hypnotherapy programme)?
Statistical tests that are used to determine whether there are differences between groups can be used for a wide range of study designs, from between-subjects designs that involve unrelated (independent) groups, to within-subjects designs that involve related groups, as well as mixed designs that have both related and unrelated groups. At their simplest, these involve testing for differences between just two related or unrelated groups, but they can be far more involved, incorporating multiple groups, multiple conditions/treatments and multiple dependent variables. Most often, you are comparing the mean scores between different groups.
However, at this point, don't worry if you're not familiar with these terms because they are explained later. The important point is that you decide whether "groups" might be the correct type of study design for your research. Therefore, consider some of the following research questions:
| ||No repeated measures||Repeated measures|
|Example #1||Is there a difference in salary between male and female doctors?||Is there a difference in daily cigarette use amongst heavy smokers before and after a hypnotherapy programme?|
|Example #2||Is there a difference in productivity amongst packers at a factory based on the use of background music?||Is there a difference in the time it takes a new brand of kettle to reach boiling point based on the number of uses?|
|Example #3||Is there a difference in the salary of male nurses in the United States based on ethnicity?||Is there a difference in C-Reactive Protein (a marker of heart disease) based on treatment type and time?|
|Example #4||Is there a difference in the average page views per visitor based on the three types of site design?||Is there a difference in productivity amongst packers in a factory based on the use of background music?|
|Example #5||Is there an interaction between gender and educational level on test anxiety amongst university students?||Is there a difference in ski performance when using three different coloured tints of goggle?|
|Example #6||Is there an interaction between unemployment length, age and the presence of a secondary household earner on depression amongst the unemployed?||Is there a difference in the dehydration of athletes between two competing isotonic sports drinks?|
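Whether your groups are unrelated or related changes how a difference in means is tested. The sketch below contrasts the two simplest cases using made-up data. It assumes an independent-samples t-test with equal variances for the unrelated groups and a paired t-test for the related groups; the function names are introduced here for illustration. It computes only the t statistic and degrees of freedom, because a p-value needs the t distribution, which a statistics package would normally supply.

```python
import math
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Student's t for two unrelated groups (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, n_a + n_b - 2

def paired_t(before, after):
    """t for two related groups: a one-sample t on the per-case differences."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    std_error = math.sqrt(variance(diffs) / n)
    return mean(diffs) / std_error, n - 1

# Made-up data, unrelated groups: salaries (thousands) of male and female doctors
male_salary = [82, 90, 78, 85, 88]
female_salary = [80, 84, 79, 83, 81]
t_ind, df_ind = independent_t(male_salary, female_salary)

# Made-up data, related groups: daily cigarette use before and after a programme
cigs_before = [30, 25, 40, 35, 28]
cigs_after = [22, 20, 31, 30, 25]
t_pair, df_pair = paired_t(cigs_before, cigs_after)

print(f"Independent groups: t({df_ind}) = {t_ind:.2f}")
print(f"Related groups:     t({df_pair}) = {t_pair:.2f}")
```

The paired version works on each person's own change, which is why the same five people measured twice give 4 degrees of freedom rather than 8.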
There are four general types of reliability: (a) test-retest reliability; (b) inter-rater reliability; (c) internal consistency; and (d) test comparisons. Test-retest reliability refers to whether a test produces consistent measurements when it is repeated on the same cases. Inter-rater reliability is the degree of agreement between two or more raters in the scores they assign to the same object. Internal consistency indicates how well the items on a test (e.g., a questionnaire) that measure the same underlying construct produce similar results. Test comparison reliability is useful when you want to know whether the results of one test (e.g., a newer test) are similar to the results of another test (e.g., an older and/or “gold standard” test). Examples of the types of studies where reliability testing may be appropriate are shown below:
|The reliability of a new protocol for assessing serum cholesterol concentration in the blood.||A serum cholesterol measurement was taken from a group of participants using a new protocol. Two weeks later the same participants had another serum cholesterol measurement taken using this same new protocol. The researcher wanted to know if this new protocol is reliable. That is, how similar are the serum cholesterol concentrations taken on these two separate occasions?|
|Are judges' scores of ice skating performance reliable?||Six judges watch 20 ice skating competitors' performances in an ice skating competition and rate their performances using the International Skating Union's (ISU) Judging System. The ratings for each of the six judges for each of the 20 ice skating competitors are analysed to determine if the judges assign similar scores to each of the performances.|
|The use of an existing questionnaire to assess depression in a new population.||An existing questionnaire with 30 Likert items that was designed to measure depression in middle-aged men was administered to middle-aged women. The scale reliability (internal consistency) of the questionnaire in this new population was assessed.|
|Is prediction of percentage body fat based on seven skinfold measurements a reliable method compared to DXA scans (the "gold standard")?||DXA scans are considered to be an accurate measure of a person's body fat percentage. However, they are a costly method to assess body fat percentage and other methods exist that are less costly. A researcher wanted to know whether one such alternative method (the skinfold method) was a reliable measure of body fat percentage. As such, the researcher wanted to compare the body fat results from both methods to establish whether the skinfold method is reliable. To do this, 20 participants had their body fat percentage assessed via a DXA scan (the "gold standard") and a method that takes seven skinfold measurements using skin calipers (i.e., the seven skinfold measurements are used to predict body fat percentage). These two tests were compared to assess whether the skinfold method was reliable.|
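For internal consistency, one widely used statistic is Cronbach's alpha. The text above does not name a specific statistic, so treat this choice as an assumption. The sketch below uses made-up responses from five people to four Likert items (far fewer than the 30 items in the questionnaire example); the function name `cronbach_alpha` is introduced for illustration.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items on the scale
    # Variance of each item across respondents
    item_vars = [variance([resp[i] for resp in responses]) for i in range(k)]
    # Variance of the respondents' total scores
    total_var = variance([sum(resp) for resp in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up data: five respondents, four Likert items scored 1 to 5
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items produce more similar results. The other three types of reliability call for different statistics.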
These statistical tests help you either (a) describe a variable or (b) compare a variable from a sample to a known distribution (i.e., either a population or hypothesized distribution/value). Therefore, one-sample tests are only appropriate when you are dealing with a single variable from one sample/group.
For example, you could be interested in a continuous variable such as "salary" (measured in US dollars) for "first year graduates in the United States", an ordinal variable such as "length of commute" (with four ordered categories: "<20 minutes", "20 to 40 minutes", "40 to 60 minutes" and "60 to 80 minutes") amongst "people working in London", a dichotomous variable such as "presence of heart disease" (with two categories: "Yes" or "No") in "men over 60 years old in the United States", or a multinomial variable such as "preferred mode of transport" (with three categories: "bus", "car" and "tram") amongst "parents travelling with their children in London".
In each of these examples there is one variable (e.g., "salary") and one sample/group (e.g., "first year graduates"). You may have studied multiple variables in your research (e.g., the "salary", "age", "gender" and "ethnicity" of first year graduates), but you only want to (a) describe one of these variables or (b) compare one of these variables from a sample to a known distribution. If you do have multiple variables, you could still use one-sample tests to consider (a) or (b), but you would have to analyse each variable separately (e.g., you would have to run a one-sample test for "salary", another one-sample test for "age", another for "gender", and so forth).
Important: If you do have multiple variables and want to understand possible associations/relationships or differences between these, or differences between any groups, one-sample tests are not appropriate. Instead, consider the Group Differences route if you are interested in differences between variables and groups, or the Associations & Correlation or Prediction & Relationships routes for associations/relationships between variables. These can be accessed using the tabs above.
Consider some of the following research questions for one-sample tests that are used to either (a) describe a variable or (b) compare a variable from a sample to a known distribution:
|Describe a variable||Compare a single variable in a sample to a known distribution|
|Sometimes we simply want to describe a variable (e.g., “salary”) for a given sample/group (e.g., “first year graduates”) in terms of frequencies (counts), measures of central tendency and measures of spread.||Sometimes you want to determine how a single variable in a sample compares to a known or hypothesized population (i.e., either a population or hypothesized distribution/value).|
|Research questions||Research questions|
|What is the average (mean) salary of first year graduates in US dollars? How much variation is there in this average (mean) salary?||Are the salaries of first year graduates drawn from a Normal distribution? Is the salary of the 70 participants in our study comparable to all first year graduate salaries?|
|What percentage of people working in London spend 60 to 80 minutes commuting each day? What is the most common (frequent) commuting time in London (<20 minutes, 20 to 40 minutes, 40 to 60 minutes or 60 to 80 minutes)?||Were the commuting times of the 250 London workers that took part in our research comparable to the distribution of all workers in London (i.e., how representative was our sample of the population)?|
|How many men in the United States over the age of 60 years old display a presence of heart disease? Are there more men who display a presence of heart disease than those who do not?||Imagine that 15% of the male population in the United States over the age of 60 years old have a presence of heart disease. Is the presence of heart disease for males over 60 years old sampled in Los Angeles different from the United States population?|
|Do parents prefer to use the bus, car or tram when travelling with their children in London? What is the proportion of parents who preferred each of the three types of transport?||A 2010 study showed the preferred modes of transport in London for parents travelling with their children. Were the same modes of transport similarly popular in 2016?|
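As one possible way to compare a dichotomous variable from a sample to a known value, the sketch below runs an exact two-sided binomial test for the heart disease question, using the 15% population figure from the table. The Los Angeles sample counts are made up, the function names are introduced for illustration, and the exact binomial test is an assumption rather than the only suitable one-sample test.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'yes' cases out of n when each has probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_two_sided(successes, n, p_null):
    """Two-sided exact binomial p-value.

    Sums the probabilities of every outcome that is no more likely,
    under the null proportion, than the outcome actually observed.
    """
    observed = binomial_pmf(successes, n, p_null)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        prob
        for prob in (binomial_pmf(k, n, p_null) for k in range(n + 1))
        if prob <= observed * tolerance
    )
    return min(1.0, total)

# Made-up sample: 200 men over 60 in Los Angeles, 44 with heart disease
n_sampled = 200
n_with_disease = 44

p_value = exact_binomial_two_sided(n_with_disease, n_sampled, 0.15)
print(f"Sample proportion = {n_with_disease / n_sampled:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the sample proportion differs from the population value. Continuous, ordinal and multinomial variables would need other one-sample tests.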