Essential 10

6. Outcome measures

An outcome measure (also known as a dependent variable or a response variable) is any variable recorded during a study (e.g. volume of damaged tissue, number of dead cells, specific molecular marker) to assess the effects of a treatment or experimental intervention. Outcome measures may be important for characterising a sample (e.g. baseline data) or for describing complex responses (e.g. ‘haemodynamic’ outcome measures including heart rate, blood pressure, central venous pressure, and cardiac output). Failure to disclose all the outcomes that were measured introduces bias in the literature because positive outcomes (e.g. those that are statistically significant) are reported more often [1-4].

Explicitly describe what was measured, especially when measures can be operationalised in different ways. For example, activity could be recorded as time spent moving or distance travelled. Where possible, outcome measures should be recorded in an unbiased manner (e.g. by an assessor blinded to the treatment allocation of each experimental group; see item 5 – Blinding). Specify how the outcome measure(s) assessed are relevant to the objectives of the study.
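To illustrate why the operational definition matters, the sketch below derives both versions of 'activity' from the same position track. It is only an illustration: the tracking data, the one-second sampling interval and the speed threshold used to classify an animal as 'moving' are hypothetical values chosen for this example and do not come from any study.

```python
import math

# Hypothetical tracking data: (time in s, x in cm, y in cm), one sample per second.
track = [
    (0, 0.0, 0.0),
    (1, 3.0, 4.0),   # moved 5 cm
    (2, 3.0, 4.0),   # stationary
    (3, 3.0, 4.0),   # stationary
    (4, 9.0, 12.0),  # moved 10 cm
]

# Assumed cut-off above which the animal counts as "moving".
SPEED_THRESHOLD_CM_PER_S = 1.0


def distance_travelled(samples):
    """Activity operationalised as total path length (cm)."""
    return sum(
        math.dist(a[1:], b[1:])
        for a, b in zip(samples, samples[1:])
    )


def time_spent_moving(samples, threshold):
    """Activity operationalised as time (s) with speed above the threshold."""
    total = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        speed = math.dist((x0, y0), (x1, y1)) / dt
        if speed > threshold:
            total += dt
    return total


distance_cm = distance_travelled(track)
moving_s = time_spent_moving(track, SPEED_THRESHOLD_CM_PER_S)
print(f"Distance travelled: {distance_cm:.1f} cm")
print(f"Time spent moving: {moving_s:.1f} s")
```

The two definitions summarise the same recording differently, and the second also depends on a threshold, so a report would need to state which definition was used and any parameters it relies on.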



  1. John LK, Loewenstein G and Prelec D (2012). Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science. doi: 10.1177/0956797611430953
  2. Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan AW, Cronin E, Decullier E, Easterbrook PJ, Von Elm E, Gamble C, Ghersi D, Ioannidis JP, Simes J and Williamson PR (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLOS ONE. doi: 10.1371/journal.pone.0003081
  3. Tsilidis KK, Panagiotou OA, Sena ES, Aretouli E, Evangelou E, Howells DW, Al-Shahi Salman R, Macleod MR and Ioannidis JP (2013). Evaluation of excess significance bias in animal studies of neurological diseases. PLOS Biology. doi: 10.1371/journal.pbio.1001609
  4. Sena ES, van der Worp HB, Bath PM, Howells DW and Macleod MR (2010). Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLOS Biology. doi: 10.1371/journal.pbio.1000344

Example 1 

“The following parameters were assessed: threshold pressure (TP; intravesical pressure immediately before micturition); post-void pressure (PVP; intravesical pressure immediately after micturition); peak pressure (PP; highest intravesical pressure during micturition); capacity (CP; volume of saline needed to induce the first micturition); compliance (CO; CP to TP ratio); frequency of voiding contractions (VC) and frequency of non-voiding contractions (NVCs).” [1] 



  1. Claudino MA, Leiria LOS, da Silva FH, Alexandre EC, Renno A, Mónica FZ, de Nucci G, Fertrin KY, Antunes E, Costa FF and Franco-Penteado CF (2015). Urinary Bladder Dysfunction in Transgenic Sickle Cell Disease Mice. PLOS ONE. doi: 10.1371/journal.pone.0133996

In a hypothesis-testing experiment, the primary outcome measure answers the main biological question. It is the outcome of greatest importance, identified in the planning stages of the experiment and used as the basis for the sample size calculation (see item 2 – Sample size). For exploratory studies, it is not necessary to identify a single primary outcome, and multiple outcomes are often assessed (see item 13 – Objectives).

In a hypothesis-testing study powered to detect an effect on the primary outcome measure, data on secondary outcomes are used to evaluate additional effects of the intervention. However, statistical analysis of secondary outcome measures may be underpowered, making the results and their interpretation less reliable [1,2]. Studies that claim to test a hypothesis but do not specify a pre-defined primary outcome measure, or that change the primary outcome measure after data were collected (also known as primary outcome switching), are liable to selectively report only statistically significant results, favouring more positive findings [3].
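The following sketch illustrates how a study sized for its primary outcome can be underpowered for a secondary one. It assumes a two-sided comparison of two equally sized groups and uses a normal approximation rather than an exact t-test calculation, so it gives slightly smaller group sizes than dedicated power software. The effect sizes, standard deviations and function names are hypothetical and chosen purely for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(effect, sd, alpha=0.05, power=0.8):
    """Animals per group for a two-sided, two-group comparison
    (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


def achieved_power(effect, sd, n, alpha=0.05):
    """Approximate power of the same comparison with n animals per group."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(effect) / (sd * sqrt(2 / n)) - z_alpha)


# Hypothetical primary outcome: expected difference of 1 SD between groups.
n_primary = n_per_group(effect=1.0, sd=1.0)

# Power for the primary outcome at that group size.
power_primary = achieved_power(effect=1.0, sd=1.0, n=n_primary)

# Hypothetical secondary outcome: expected difference of only 0.5 SD,
# analysed with the group size chosen for the primary outcome.
power_secondary = achieved_power(effect=0.5, sd=1.0, n=n_primary)

print(f"Animals per group (sized for primary outcome): {n_primary}")
print(f"Power for primary outcome: {power_primary:.2f}")
print(f"Power for secondary outcome: {power_secondary:.2f}")
```

Under these assumptions the group size that gives about 80% power for the primary outcome leaves well under 50% power for the smaller secondary effect, which is why findings on secondary outcomes are best interpreted with caution.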

Registering a protocol in advance protects the researcher against concerns about selective outcome reporting (also known as data dredging or p-hacking) and provides evidence that the primary outcome reported in the manuscript accurately reflects what was planned [4] (see item 19 – Protocol registration).

In studies using inferential statistics to test a hypothesis (e.g. t-test, ANOVA), if more than one outcome was assessed, explicitly identify the primary outcome measure and state whether it was defined as such prior to data collection and whether it was used in the sample size calculation. If there was no primary outcome measure, explicitly state so. 



  1. John LK, Loewenstein G and Prelec D (2012). Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science. doi: 10.1177/0956797611430953
  2. Landis SC, Amara SG, Asadullah K, Austin CP, Blumenstein R, Bradley EW, Crystal RG, Darnell RB, Ferrante RJ, Fillit H, Finkelstein R, Fisher M, Gendelman HE, Golub RM, Goudreau JL, Gross RA, Gubitz AK, Hesterlee SE, Howells DW, Huguenard J, Kelner K, Koroshetz W, Krainc D, Lazic SE, Levine MS, Macleod MR, McCall JM, Moxley RT, 3rd, Narasimhan K, Noble LJ, et al. (2012). A call for transparent reporting to optimize the predictive value of preclinical research. Nature. doi: 10.1038/nature11556
  3. Head ML, Holman L, Lanfear R, Kahn AT and Jennions MD (2015). The Extent and Consequences of P-Hacking in Science. PLOS Biology. doi: 10.1371/journal.pbio.1002106
  4. Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, Simonsohn U, Wagenmakers E-J, Ware JJ and Ioannidis JPA (2017). A manifesto for reproducible science. Nature Human Behaviour. doi: 10.1038/s41562-016-0021

Example 1 

“The primary outcome of this study will be forelimb function assessed with the staircase test. Secondary outcomes constitute Rotarod performance, stroke volume (quantified on MR imaging or brain sections, respectively), diffusion tensor imaging (DTI) connectome mapping, and histological analyses to measure neuronal and microglial densities, and phagocytic activity…The study is designed with 80% power to detect a relative 25% difference in pellet-reaching performance in the Staircase test.” [1] 

Example 2 

“The primary endpoint of this study was defined as left ventricular ejection fraction (EF) at the end of follow-up, measured by magnetic resonance imaging (MRI). Secondary endpoints were left ventricular end diastolic volume and left ventricular end systolic volume (EDV and ESV) measured by MRI, infarct size measured by ex vivo gross macroscopy after incubation with triphenyltetrazolium chloride (TTC) and late gadolinium enhancement (LGE) MRI, functional parameters serially measured by pressure volume (PV-)loop and echocardiography, coronary microvascular function by intracoronary pressure- and flow measurements and vascular density and fibrosis on histology. Based on a power calculation (estimated effect 7.5% [6], standard deviation of 5%, a power of 0.9 and alpha of 0.05) 8 pigs per group were needed.” [2] 



  1. Emmrich J, Neher J, Boehm-Sturm P, Endres M, Dirnagl U and Harms C (2018). Stage 1 Registered Report: Effect of deficient phagocytosis on neuronal survival and neurological outcome after temporary middle cerebral artery occlusion (tMCAo) [version 3; referees: 2 approved]. F1000Research. doi: 10.12688/f1000research.12537.3
  2. Jansen of Lorkeers SJ, Gho JMIH, Koudstaal S, van Hout GPJ, Zwetsloot PPM, van Oorschot JWM, van Eeuwijk ECM, Leiner T, Hoefer IE, Goumans M-J, Doevendans PA, Sluijter JPG and Chamuleau SAJ (2015). Xenotransplantation of Human Cardiomyocyte Progenitor Cells Does Not Improve Cardiac Function in a Porcine Model of Chronic Ischemic Heart Failure. Results from a Randomized, Blinded, Placebo Controlled Trial. PLOS ONE. doi: 10.1371/journal.pone.0143953