7. Statistical methods
The statistical analysis methods implemented will reflect the goals and the design of the experiment; they should be decided in advance, before data are collected (see item 19 – Protocol registration). Both exploratory and hypothesis-testing studies might use descriptive statistics to summarise the data (e.g. mean and SD, or median and range). In exploratory studies where no specific hypothesis was tested, reporting descriptive statistics is important for generating new hypotheses that may be tested in subsequent experiments, but it does not allow conclusions beyond the data. In addition to descriptive statistics, hypothesis-testing studies might use inferential statistics to test a specific hypothesis.
Reporting the analysis methods in detail is essential to ensure readers and peer-reviewers can assess the appropriateness of the methods selected and judge the validity of the output. The description of the statistical analysis should provide enough detail so that another researcher could re-analyse the raw data using the same method and obtain the same results. Make it clear which method was used for which analysis.
Analysing the data using different methods and selectively reporting those with statistically significant results constitutes p-hacking and introduces bias in the literature [1,2]. Report all analyses performed in full. Relevant information to describe the statistical methods includes:
- the outcome measures
- the independent variables of interest
- the nuisance variables taken into account in each statistical test (e.g. as blocking factors or covariates)
- what statistical analyses were performed and references for the methods used
- how missing values were handled
- adjustment for multiple comparisons
- the software package and version used, including computer code if available 
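As an illustration of the "adjustment for multiple comparisons" item in the list above, the following is a minimal sketch of the Holm step-down procedure, one common adjustment among several. The function name `holm_adjust` and the p-values are hypothetical and chosen only for this example; a study should report whichever adjustment it actually used, together with the software and version.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1); rank is 0-based.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four comparisons within one experiment.
raw = [0.010, 0.040, 0.030, 0.005]
adjusted = holm_adjust(raw)
print([round(p, 4) for p in adjusted])
```

Reporting both the raw and the adjusted p-values, and naming the procedure, lets readers see how many comparisons were made and how they were handled.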
The outcome measure is potentially affected by the treatments or interventions being tested, but also by other factors, such as the properties of the biological samples (sex, litter, age, weight, etc.), and technical considerations (cage, time of day, batch, experimenter, etc.). To reduce the risk of bias, some of these factors can be taken into account in the design of the experiment, for example by using blocking factors in the randomisation (see item 4 – Randomisation). Factors deemed to affect the variability of the outcome measure should also be handled in the analysis, for example as a blocking factor (e.g. batch of reagent or experimenter), or as a covariate (e.g. starting tumour size at point of randomisation).
Furthermore, to conduct the analysis appropriately, it is important to recognise the hierarchy that can exist in an experiment. The hierarchy can induce a clustering effect; for example, cage, litter or animal effects can occur where the outcomes measured for animals from the same cage/litter, or for cells from the same animal, are more similar to each other. This relationship has to be managed in the statistical analysis by including cage/litter/animal effects in the model or by aggregating the outcome measure to the cage/litter/animal level. Thus, describing the reality of the experiment and the hierarchy of the data, along with the measures taken in the design and the analysis to account for this hierarchy, is crucial to assessing whether the statistical methods used are appropriate.
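One of the two options described above, aggregating the outcome measure to the cage level, can be sketched as follows. The records, the cage labels and the helper `aggregate_by_cage` are hypothetical, and the sketch assumes that every animal in a cage received the same treatment, so that the cage is the experimental unit.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (cage, treatment, weight gain in g), one row per animal.
animals = [
    ("cage1", "control", 2.1), ("cage1", "control", 2.5),
    ("cage2", "control", 1.9), ("cage2", "control", 2.3),
    ("cage3", "treated", 3.0), ("cage3", "treated", 3.4),
    ("cage4", "treated", 2.8), ("cage4", "treated", 3.2),
]


def aggregate_by_cage(records):
    """Average the outcome within each cage; return {cage: (treatment, mean)}."""
    grouped = defaultdict(list)
    treatment_of = {}
    for cage, treatment, value in records:
        grouped[cage].append(value)
        # Assumption: the whole cage received the same treatment.
        treatment_of[cage] = treatment
    return {cage: (treatment_of[cage], mean(values))
            for cage, values in grouped.items()}


cage_means = aggregate_by_cage(animals)
n_units = len(cage_means)  # number of experimental units entering the analysis
print(n_units, cage_means)
```

Here eight animals yield four cage-level values, so the subsequent test is run with n = 4 rather than n = 8. The alternative, keeping the animal-level data and including cage as a term in the model, requires a modelling package and is not shown.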
For bespoke analysis, for example regression analysis with many terms, it is essential to describe the analysis pipeline in detail. This could include detailing the starting model and any model simplification steps.
When reporting descriptive statistics, explicitly state which measure of central tendency is reported (e.g. mean or median) and which measure of variability is reported (e.g. standard deviation, range, quartiles or interquartile range). Also describe any modification made to the raw data before analysis (e.g. relative quantification of gene expression against a house-keeping gene). For further guidance on statistical reporting, refer to the SAMPL (Statistical Analyses and Methods in the Published Literature) guidelines.
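A short sketch of how the descriptive measures named above might be computed and labelled, using only the Python standard library. The data values and the function name `describe` are hypothetical.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical outcome values for one experimental group.
values = [4.1, 5.0, 5.3, 5.8, 6.2, 6.9, 7.4, 12.5]


def describe(data):
    """Return labelled measures of central tendency and variability."""
    # quantiles() uses its default 'exclusive' method here.
    q1, _, q3 = quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": mean(data),
        "sd": stdev(data),      # sample SD (n - 1 denominator)
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "min": min(data),
        "max": max(data),
    }


summary = describe(values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Quartile definitions differ between software packages, which is one reason to state the package, the version and the exact measure reported.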
- Tsilidis KK, Panagiotou OA, Sena ES, Aretouli E, Evangelou E, Howells DW, Al-Shahi Salman R, Macleod MR and Ioannidis JP (2013). Evaluation of excess significance bias in animal studies of neurological diseases. PLoS Biol. doi: 10.1371/journal.pbio.1001609
- Head ML, Holman L, Lanfear R, Kahn AT and Jennions MD (2015). The Extent and Consequences of P-Hacking in Science. PLOS Biology. doi: 10.1371/journal.pbio.1002106
- British Ecological Society (2017). A guide to reproducible code in ecology and evolution. Available at: https://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf
- Lang TA and Altman DG (2015). Basic statistical reporting for articles published in biomedical journals: the "Statistical Analyses and Methods in the Published Literature" or the SAMPL Guidelines. Int J Nurs Stud. doi: 10.1016/j.ijnurstu.2014.09.006
“Analysis of variance was performed using the GLM procedure of SAS (SAS Inst., Cary, NC). Average pen values were used as the experimental unit for the performance parameters. The model considered the effects of block and dietary treatment (5 diets). Data were adjusted by the covariant of initial body weight. Orthogonal contrasts were used to test the effects of SDPP processing (UV vs no UV) and dietary SDPP level (3% vs 6%). Results are presented as least squares means. The level of significance was set at P < 0.05.” 
“All risk factors of interest were investigated in a single model. Logistic regression allows blocking factors and explicitly investigates the effect of each independent variable controlling for the effects of all others...As we were interested in husbandry and environmental effects, we blocked the analysis by important biological variables (age; backstrain; inbreeding; sex; breeding status) to control for their effect. (The role of these biological variables in barbering behavior, particularly with reference to barbering as a model for the human disorder trichotillomania, is described elsewhere…). We also blocked by room to control for the effect of unknown environmental variables associated with this design variable. We tested for the effect of the following husbandry and environmental risk factors: cage mate relationships (i.e. siblings, non-siblings, or mixed); cage type (i.e. plastic or steel); cage height from floor; cage horizontal position (whether the cage was on the side or the middle of a rack); stocking density; and the number of adults in the cage. Cage material by cage height from floor; and cage material by cage horizontal position interactions were examined, and then removed from the model as they were nonsignificant. N = 1959 mice were included in this analysis.” 
- Polo J, Rodríguez C, Ródenas J, Russell LE, Campbell JM, Crenshaw JD, Torrallardona D and Pujols J (2015). Ultraviolet Light (UV) Inactivation of Porcine Parvovirus in Liquid Plasma and Effect of UV Irradiated Spray Dried Porcine Plasma on Performance of Weaned Pigs. PLOS ONE. doi: 10.1371/journal.pone.0133008
- Garner JP, Dufour B, Gregg LE, Weisker SM and Mench JA (2004). Social and husbandry factors affecting the prevalence and severity of barbering ('whisker trimming') by laboratory mice. Applied Animal Behaviour Science. doi: 10.1016/j.applanim.2004.07.004
Hypothesis tests are based on assumptions about the underlying data. Describing how assumptions were assessed, and whether these assumptions are met by the data, enables readers to assess the suitability of the statistical approach used. If the assumptions are incorrect, the conclusions may not be valid. For example, the assumptions for data used in parametric tests (such as a t-test, Z-test, ANOVA, etc.) are that the data are continuous, the residuals from the analysis are normally distributed, the responses are independent, and that different groups have similar variances.
There are various tests for normality, for example the Shapiro-Wilk and Kolmogorov-Smirnov tests. However, these tests have to be used cautiously. If the sample size is small, they will struggle to detect non-normality; if the sample size is large, they will detect unimportant deviations. An alternative approach is to evaluate the data with visual plots, e.g. normal probability plots, box plots or scatterplots. If the residuals of the analysis are not normally distributed, the assumption may be satisfied using a data transformation, where the same mathematical function is applied to all data points to produce normally distributed data (e.g. natural logarithm, base-10 logarithm, square root).
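The sketch below illustrates the normal probability plot and the effect of a log transformation on a single, hypothetical right-skewed sample. It computes the plot coordinates with the standard library and summarises how straight the plot is with a correlation coefficient. That coefficient is only an informal aid used here in place of a figure; it is not a formal test, and the plotting positions `(i - 0.5) / n` are one convention among several.

```python
import math
from statistics import NormalDist, mean


def qq_points(residuals):
    """Pair each theoretical standard normal quantile with a sorted residual."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def pearson(pairs):
    """Pearson correlation between the two coordinates of the plotted points."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals_from_mean(values):
    """Residuals for a single group: each value minus the group mean."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical right-skewed measurements from one group.
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 13.0, 21.5]
logged = [math.log(v) for v in raw]  # natural logarithm applied to every value

r_raw = pearson(qq_points(residuals_from_mean(raw)))
r_log = pearson(qq_points(residuals_from_mean(logged)))
print(f"raw: {r_raw:.3f}  log-transformed: {r_log:.3f}")
```

For these made-up values the points lie closer to a straight line after the transformation. In a real analysis the residuals would come from the fitted model, and the plot itself should be inspected.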
Other types of outcome measures (binary, categorical, or ordinal) will require different methods of analysis, and each will have different sets of assumptions. For example, categorical data are summarised by counts and percentages or proportions, and are analysed by tests of proportions; these analysis methods assume that data are binary, ordinal or nominal, and independent.
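As one example of a test of proportions for a binary outcome, the following sketch implements a two-sided z-test comparing two independent proportions. The counts and the function name `two_proportion_z_test` are hypothetical. The test relies on a large-sample normal approximation and on independent observations; with small counts an exact method would usually be preferred.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value


# Hypothetical counts: animals showing the outcome / animals per group.
p_a, p_b, z, p_value = two_proportion_z_test(12, 40, 24, 40)
print(f"proportions: {p_a:.2f} vs {p_b:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

When reporting such a test, give the counts and denominators for each group alongside the percentages, so that the analysis can be repeated from the reported data.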
For each statistical test used (parametric or non-parametric), report the type of outcome measure and the methods used to test the assumptions of the statistical approach. If data were transformed, identify precisely the transformation used and which outcome measures it was applied to. Report any changes to the analysis if the assumptions were not met and an alternative approach was used (e.g. a non-parametric test was used which does not require the assumption of normality). If the relevant assumptions about the data were not tested, state this explicitly.
- Ruxton G and Colegrave N (2017). Experimental design for the life sciences. Fourth Edition. Oxford University Press. https://global.oup.com/academic/product/experimental-design-for-the-life-sciences-9780198717355?cc=us&lang=en&
“Model assumptions were checked using the Shapiro-Wilk normality test and Levene’s Test for homogeneity of variance and by visual inspection of residual and fitted value plots. Some of the response variables had to be transformed by applying the natural logarithm or the second or third root, but were back-transformed for visualization of significant effects.” 
“The effects of housing (treatment) and day of euthanasia on cortisol levels were assessed by using fixed-effects 2-way ANOVA. An initial exploratory analysis indicated that groups with higher average cortisol levels also had greater variation in this response variable. To make the variation more uniform, we used a logarithmic transform of each fish's cortisol per unit weight as the dependent variable in our analyses. This action made the assumptions of normality and homoscedasticity (standard deviations were equal) of our analyses reasonable.” 
- Nemeth M, Millesi E, Wagner K-H and Wallner B (2015). Sex-specific effects of diets high in unsaturated fatty acids on spatial learning and memory in guinea pigs. PLOS ONE. doi: 10.1371/journal.pone.0140485
- Keck VA, Edgerton DS, Hajizadeh S, Swift LL, Dupont WD, Lawrence C and Boyd KL (2015). Effects of habitat complexity on pair-housed zebrafish. J Am Assoc Lab Anim Sci. https://www.ncbi.nlm.nih.gov/pubmed/26224437