Editor’s Note:
In academic publishing, it is common – and important – for research findings to be followed by formal commentaries, replies, and further discussion. This process of critical exchange allows methods, interpretations, and conclusions to be examined openly, strengthening both the evidence base and public confidence in it.
The correspondence below was submitted to the Equine Veterinary Journal as a reply to Clayton et al.’s response to our published commentary on their study of noseband pressures during standing and chewing. Our commentary’s purpose was to contribute to that normal process of scientific discourse by clarifying methodological and interpretive issues that remain unresolved.
In their response, the authors addressed some of our points, but several methodological and interpretive issues remain unresolved. Following normal academic convention, we sought to clarify our concerns to promote transparency and support scientific progress. The letter reproduced here is that submission, which the journal declined to publish.
While we respect that decision, we believe that open scrutiny and dialogue are essential to scientific progress, particularly in areas that inform animal welfare policy. By publishing the correspondence here, we aim to make the record complete and accessible to readers who value transparency and rigour.
The full letter appears below, with detailed tables presented as collapsible sections for ease of reading. A PDF version, formatted as originally submitted, is available for download at the end of this article.
Dear Editor,
We write in response to Clayton et al. (2025a), who replied to our critique (Wilkins et al. 2025) of their noseband pressure study (Clayton et al. 2025b). Their response fundamentally mischaracterizes our original concerns while failing to address the majority of methodological issues we raised.
Clayton et al. (2025a) open their response by asserting that Wilkins et al. claim their ‘study of noseband pressures compromises equine welfare’ and that ‘this assertion will be refuted.’ This misrepresents our critique.
Our primary concern was not that the research process itself was harmful, but rather that the study’s conclusions, if used to inform policy without acknowledging significant methodological limitations, could compromise welfare by justifying tighter noseband regulations.
We specifically critiqued the study design, interpretation and generalizability of findings, not the ethical conduct of the research itself.
Our response raised 20 distinct methodological and interpretive concerns aimed at improving scientific rigour in this field and ensuring evidence-based welfare standards. Clayton et al.’s response to our commentary substantively addresses only three of these 20 concerns while ignoring fundamental issues regarding study design, statistical analysis, and reproducibility. This selective engagement undermines the scientific discourse necessary for advancing equine welfare research.
The extent to which each of our critiques was accurately represented and addressed is presented in the following tables that reveal a pattern of incomplete scientific engagement which generally fails to meet standards expected in peer-reviewed discourse.
- Table 1 deals with 16 issues raised in our commentary that were not addressed by Clayton et al. in their response,
- Table 2 deals with 2 issues that were partially addressed,
- Table 3 deals with 3 issues that were fully addressed in Clayton et al.’s response.
Table 1. Misrepresentation of the critique and responses to aspects of the critique that did not address the issue.
| Our actual critique | Clayton et al. representation of our critique | Clayton et al. response | Our final comments |
| Precautionary principle not applied: “Proposing to reevaluate the two-finger [noseband] rule based on a single study with methodological limitations is scientifically incautious and contradicts the precautionary principle that creates a risk of regression at a time of international progress in equine welfare.” | “Wilkins et al. claiming that our study of noseband pressures compromises equine welfare” | “We strongly reject the suggestion from Wilkins et al. that our study has compromised equine welfare.” | Clayton et al. misrepresent our position by recasting a policy-level concern as an allegation of welfare compromise during the study. Our critique was that it is scientifically incautious, and contrary to the precautionary principle, to propose regulatory change based on a single study with methodological limitations. |
| Evidence-based policy concern: “Given the welfare risks associated with nosebands and the absence of any evidence that they benefit horses, the recommendation to re-evaluate current standards based on a single study with a limited design, small sample of horses accustomed to the noseband, and drawing wide-ranging conclusions from non-significant findings is problematic.” | “Wilkins et al. are opposed to revisiting noseband regulations beyond what has ‘traditionally been deemed acceptable’” | “Basing international rules and regulations on what has ‘traditionally been deemed acceptable’ was adequate in the 20th century but not anymore. We have the equipment and expertise to do the necessary research and we are confident in the accuracy of our data. The FEI Equine Ethics and Wellbeing Commission has recommended the use of a tool to measure noseband tightness and our research provides the necessary evidence base to inform the development of this tool.” | Clayton et al. misrepresent our position by omitting the substance of our critique. Our point was not opposition to revisiting noseband regulations, but that it is problematic to base regulatory change on a single small-scale study with design limitations, a habituated cohort, and non-significant findings. When studying whether an existing guideline is appropriate, it is vital to include treatments in both directions, in this case both looser and tighter nosebands. Clayton et al. asked only whether nosebands can be made tighter. By doing so, the study fails to acknowledge that existing recommendations may already be too tight, thereby seriously limiting the conclusions that can be drawn from their research. Presenting such constrained evidence as a regulatory foundation risks embedding flawed assumptions into policy. |
| Measurement validity concern: “There are currently no peer-reviewed studies demonstrating that 20 s is sufficient to detect stress-induced changes in eye temperature in horses.” Observation duration concern: “The 20-s duration of the standing trials does not reflect real-world conditions, where horses wear nosebands for extended periods.” | “20 s is insufficient time to evaluate the behaviours.” | “A 20-s duration was chosen according to the principle of minimal tolerance, which ensures that a majority of participating horses can fulfil the conditions of the study.” | Clayton et al. cite a “minimal tolerance principle” to justify the 20-second observation period. This rationale does not engage with our critique that such a short duration is scientifically inadequate to detect meaningful behavioural or physiological responses. Our second concern is that the short observation period and tests for “standing” and “chewing” fail to reflect real-world conditions, and yet evidence generated under such constrained conditions has been used to inform welfare-relevant policy. |
| Baseline data concerns: “Also, baseline values and treatment-specific results are not reported, limiting meaningful comparison”. | – | – | Clayton et al. do not engage with the absence of proper baseline measurements. This omission undermines the validity of their comparisons and limits the interpretability of the findings. |
| Inappropriate statistical methodology: “Clayton et al. state: ‘Kruskal–Wallis and post hoc Mann–Whitney U analyses evaluated differences between noseband tightness levels’. This is unclear and, in our view, the approach appears flawed.” | – | – | Clayton et al. do not engage with the concern that independent-sample tests were applied to repeated-measures data. This omission leaves a fundamental flaw in the statistical analysis unaddressed and undermines confidence in the reported results. |
| Replicability concern: “Clayton et al. omits key information on bit positioning relative to the labial commissures and the spacing of noseband holes. These are essential for replicability and may affect how horses experience restrictive nosebands.” | [Wilkins et al. claim that] “Key information relating to bridle fit is missing.” | “As described, all bridles were fitted by three qualified bridle fitters, which is a robust approach not described in previous noseband studies from other research groups.” | Our concern is not whether fitting was competent, but that key details, such as bit positioning and noseband hole spacing, were omitted, preventing replicability and limiting interpretation of how horses experienced the nosebands. |
| Replicability concern: “Clayton et al. note that extreme tightness increases compressive forces on skin, tissues, and bone. However, they do not report the actual force used to achieve this tightness or explain why their study did not replicate it.” | [Wilkins et al. claim that] “The noseband was not tightened sufficiently to compress the underlying skin, tissues and bone.” | “We have not and will not expose horses to extreme pressures. The endpoint of 0 finger-equivalents was based on the simple concept that a noseband resting lightly on the skin is not causing high pressure or discomfort.” | Clayton et al. misrepresent our point as a call for extreme tightness. Our query was about transparency and reproducibility of the 0.0 finger setting. At this setting, the taper gauge cannot be inserted; however, 0 fingers is not a measure of actual tightness, since the noseband can be tightened more or less at this setting. Their claim that a noseband “resting lightly on the skin” does not cause pressure or discomfort is an unwarranted inference about subjective experience. We still do not know with any certainty how horses experience noseband pressure. |
| Replicability concern: 0.0 (zero) finger definition: “…using a handheld taper gauge in empirical research depends on consistent insertion (or resistance) under a tight noseband. A more robust design would use operators blinded to the study’s purpose to reduce bias.” | “A noseband resting against the face is described [by Wilkins et al.] as a tight noseband” | “The results of our study clearly indicate this is untrue. The nosebands were professionally and correctly fitted and were not tight as confirmed by the low pressures recorded on the nasal and mandibular bones…” | In their response, Clayton et al. focus on semantics around the use of the word “tight” but do not address our concern about replicability. Additionally, the low mean pressures they report over 20-s standing trials do not represent real-world conditions, where locomotion, rein tension and associated comfort behaviours alter loading on the noseband. |
| INTERPRETATION OF FINDINGS CONCERNS | |||
| Stress vs. welfare conflation: “We are concerned that Clayton et al. conflate stress with the welfare state and the subjective experience of discomfort.” | – | – | Clayton et al. do not engage with the fundamental critique that stress indicators, whether present or absent, do not, in isolation, reliably indicate welfare state. |
| Welfare concern: “Clayton et al state that a 0.0 finger equivalence was ‘the first setting at which the taper gauge could not be inserted’, but it omits whether any pressure was applied during insertion. This is a key omission, since Clayton et al. 2024 note elsewhere that nosebands can be tightened beyond this point, fully preventing gauge insertion and potentially causing harm.” | “A noseband resting against the face is described [by Wilkins et al.] as a tight noseband” | “At zero fingers’ tightness, the noseband lies lightly against the skin of the face but does not indent or compress the soft tissues. This degree of laxity is in contrast to previously published statements by Wilkins’ coauthors who have persistently claimed that a noseband adjusted to zero fingers is very tight. We believe that erroneous and misleading comments of this nature do a great disservice to equestrian sports.” | Pressure sensors only activate under load, i.e., if there is a reading, there is compression. The claim that a noseband at “zero fingers” does not compress soft tissues is contradicted by the study’s own data. For the standing trials, at 0.0 fingers, Clayton et al. report 45–70 N of force, with mean pressures of 7–11 kPa. Additionally, the mean values reported are mathematical artifacts of multiple layers of averaging: spatial averaging across contact sensors, temporal averaging across 20-second trials, and biological averaging across 8 different horses (standard deviations of 44–50% of mean values). This layering of averages creates a mathematical middle ground that doesn’t represent the horse’s experience at any specific moment. Furthermore, the contexts in which these averages have been calculated do not represent real life, where horses are expected to locomote for long periods and must perform comfort-seeking oral behaviours that require mouth opening. |
| Learned helplessness in habituated horses: “Research across species shows that physiological and behavioural responses can diminish following uncontrollable, inescapable aversive experiences. This adaptation, often termed ‘learned helplessness’, arises from prolonged aversive events. The dressage horses in the current study likely learned that facial pressures from nosebands are inescapable. Their muted responses, especially over a short period, may reflect this experience rather than an absence of aversion.” | [Wilkins et al. claim that:] “Horses may have been habituated to restrictive nosebands and learned that facial pressure from nosebands was inescapable putting them into a state of learned helplessness.” | “The horses were trained in nosebands adjusted to 2 finger equivalents, which is the tightness recommended by the authors of the letter. The suggestion that the horses were in a state of learned helplessness is unwarranted and speculative.” | Clayton et al. dismiss learned helplessness as “unwarranted and speculative” without providing a scientific rebuttal of this well-established mechanism. Their claim that horses were only exposed to nosebands at “2 finger equivalents” during training is itself speculative, given these horses’ extensive training histories. Learned helplessness is an expected effect of the prolonged use of a noseband, and as such should be addressed by any critical researcher. Further, we did not recommend a 2-finger tightness level. It is currently unknown which tightness level can be applied without a negative impact on horse welfare. For this reason, studies should include treatments in both directions, rather than only tighter. |
| Confounding effects of habituation: The decision to use horses habituated to restrictive nosebands warrants scrutiny. Prior exposure could mask signs of discomfort or stress that might be more apparent in non-habituated horses. | “Use of trained horses was not appropriate.” | “Use of experienced dressage horses is appropriate because this is the cohort to which the noseband tightness regulations informed by our study will apply.” | Clayton et al. justify their choice of experienced dressage horses on the basis of regulatory relevance. This does not address that prior experiences may affect both the behavioural and physiological responses that were measured. |
| Interpretation of “willingness” to eat: Wilkins et al. (2025) includes a detailed critique about food motivation, ingestion analgesia, and how animals may eat through pain as they navigate trade-offs. | – | – | This central conceptual concern remains unaddressed. |
| Choice of physiological measures: “Findings on the relationship between eye temperature and stress in horses (and other species) are inconsistent. Some studies show increases, others decreases, or no change at all.” And later, “…studies examining blink and half-blink rates (the latter not considered in the current study) have shown inconsistent results.” | – | – | Clayton et al. 2025 do not engage with Wilkins et al.’s detailed literature review showing inconsistent findings for eye temperature and blink rate, or with the discussion on duration of observation, although in their original article they “caution against reading too much into these simple tests”. |
| Claims about noseband functions: “The claim that a noseband stabilises the bridle lacks explanation and supporting data.” | – | – | These evidence gaps aren’t addressed. |
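The point made in Table 1 about layered averaging can be illustrated with a minimal Python sketch. The readings below are entirely hypothetical values invented for illustration (they are not data from Clayton et al. or any other study), and the names `readings`, `layered_mean` and `peak_cell` are illustrative assumptions. The sketch shows only that averaging across sensor cells, then time samples, then horses can yield a modest mean even when a single cell registers a high pressure.

```python
from statistics import mean

# HYPOTHETICAL pressures in kPa: horse -> time sample -> sensor cell.
# These numbers are invented for illustration; they are not study data.
readings = {
    "horse_a": [[2, 3, 120], [2, 4, 110]],
    "horse_b": [[5, 6, 7], [6, 5, 8]],
}


def layered_mean(data):
    """Average across cells, then across time samples, then across horses."""
    per_horse = []
    for samples in data.values():
        spatial_means = [mean(cells) for cells in samples]  # spatial averaging
        per_horse.append(mean(spatial_means))               # temporal averaging
    return mean(per_horse)                                  # averaging across horses


def peak_cell(data):
    """Largest single-cell reading at any moment for any horse."""
    return max(
        cell
        for samples in data.values()
        for cells in samples
        for cell in cells
    )


grand_mean = layered_mean(readings)
peak = peak_cell(readings)
print(f"layered mean: {grand_mean:.1f} kPa, peak cell: {peak} kPa")
```

With these invented values the layered mean is far below the highest single-cell reading, which is why reporting peak values alongside means may matter when interpreting pressure data.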
Table 2. Responses to aspects of the critique that partially addressed the issues.
| PARTIALLY ADDRESSED CONCERNS | |||
| Our actual critique | Clayton et al. representation of our critique | Clayton et al. response | Our final comments |
| Scientific and ethical principles of study design: “We also highlight core principles in animal welfare science: (a) the need for a valid baseline; (b) treatment randomisation; (c) distinction between open and closed economies in study design; and (d) whether trial duration allowed meaningful interpretation and application.” “Repeated trials and randomised tightness order would have strengthened the basis for claims about horses’ ‘willingness’ and lack of distress.” | “There was sequential, rather than randomised, order of tightening of the noseband.” | “As described previously, this is a limitation of our study that was required by the ethical committee overseeing our work.” | Clayton et al. acknowledge the lack of randomisation as a limitation, attributing it to ethical committee requirements. This partially addresses the point but does not engage with our broader concern that randomisation and repeated trials are core principles of robust experimental design. Without them, claims about horses’ “willingness” and lack of distress are weakened. Where ethical constraints determine study design, proceeding under such conditions rather than finding a sounder way to answer the research questions may be unethical. |
| Study design & reproducibility concern: “However, fig. 5 in that article shows a displacement of 14 mm, while the smallest dimension of the food was 17 mm. This implies that either the food was crushed or soft tissue deformation under the noseband occurred as the horse chewed.” | “Figure 5 shows oral displacement of only 14 mm which would be insufficient to ingest a treat with a minimal dimension of 17 mm.” | “The legend for figure 5 cites the source of this graph as a different publication in which horses were chewing pellets. It was used to illustrate the phases of the chewing cycle because the current study did not have kinematic capabilities.” | Clayton et al. clarify that Figure 5 was drawn from another study, but this does not resolve the discrepancy between oral displacement and food dimensions. We recommend that future work report the actual dimensions of food items to ensure accuracy and reproducibility. |
Table 3. Responses to aspects of the critique that fully addressed the issues.
| FULLY ADDRESSED CONCERNS | |||
| Our actual critique | Clayton et al. representation of our critique | Clayton et al. response | Our final comments |
| Unorthodox reporting concern: “The number of chewing cycles per 20 s test is not reported.” | “The number of chewing cycles per 20 s test is not reported.” | “Mean chewing frequency is reported to be 1.25 Hz from which it is easy to calculate the number of cycles per 20 s.” | We thank the authors for this clarification. |
| “Pressures up to 250 kPa were recorded on both nasal and mandibular sensors, yet only one cell in the 0.0 [zero] fingers equivalent scan exceeded 100 kPa. This is puzzling and highlights the selective presentation of pressure data, which is the study’s most relevant metric, in what is framed as a welfare-focused investigation.” | “Only one cell in the 0 finger equivalents scan exceeds 100 kPa and suggests this ‘highlights selective presentation of pressure data’.” | “The 0 finger-equivalent scans in the manuscript clearly show 24 cells with pressure >100 kPa and 10 cells with pressure >250 kPa.” | We thank the authors for this clarification but note that many questions about the averaging and reporting of pressure measures persist. |
| Statistical analysis, reproducibility and welfare concern: “While mean force increased with tightness in standing trials, peak pressure data, crucial for comparing with other studies, are absent.” | “Peak pressure data is crucial for comparing with other studies, are absent.” | “Peak pressure data are presented in Table 2. When standing, the pressure–time waveform has minimal variation as can be visualised in figure 2. A cyclic waveform is needed to extract repeated min and max values, and this is clearly not present when horses are standing still.” | We thank the authors for this clarification but note that many questions about the averaging and reporting of pressure measures persist. |
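The arithmetic behind the chewing-cycle clarification in Table 3 can be written out as a short worked calculation. It assumes that the reported mean chewing frequency of 1.25 Hz applies uniformly across the whole 20-s test, which is a simplification; the symbols $N$, $f$ and $t$ are introduced here for illustration only.

$$
N = f \times t = 1.25\ \mathrm{s^{-1}} \times 20\ \mathrm{s} = 25\ \text{cycles}
$$

This gives an average figure only; the count in any individual trial could differ if chewing frequency varied between horses or within a trial.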
In closing, Clayton et al.’s response to our commentary addresses some of our concerns, but many of the more substantive critiques are inadequately addressed or dismissed. While the authors provide clarifications on certain factual points, they largely sidestep the deeper methodological and conceptual issues we raised. Key concerns about study validity, statistical approach, and welfare interpretation remain unresolved.
To be clear, our commentary did not assert that conducting the Clayton et al. study itself harmed horses or compromised welfare during the research process, given the short treatment durations. Our primary focus was on methodological and interpretive shortcomings, and on the risk of inappropriate conclusions being used to justify regulatory changes without a robust evidence base.
We have set out our reasoning in detail, not to perpetuate a back-and-forth, but to ensure that readers understand that this is not a dispute between academics; it is about scientific rigour in animal welfare science. Weak evidence presented as decisive for policy undermines both horse welfare and public trust. On these points we will not engage further. Our aim, now as before, is to strengthen the quality of future research in this field.
Signed:
- Cristina Wilkins, School of Rural and Environmental Sciences, University of New England, Armidale, Australia
- Janne Winther Christensen, Department of Animal and Veterinary Sciences, Aarhus University, Aarhus, Denmark
- Orla Doherty, School of Veterinary Medicine, University College Dublin, Dublin, Ireland
- Kate Fenner, School of Agriculture and Food Sustainability, Faculty of Science, University of Queensland, St Lucia, Queensland, Australia
- Rafael Freire, Charles Sturt University, School of Animal and Veterinary Sciences, Wagga Wagga, New South Wales, Australia
- Cathrynne Henshall, Charles Sturt University, School of Animal and Veterinary Sciences, Wagga Wagga, New South Wales, Australia
- Paul McGreevy, Sydney School of Veterinary Science, Faculty of Science, University of Sydney, Sydney, New South Wales, Australia
Correspondence: Paul McGreevy, Sydney School of Veterinary Science, Faculty of Science, University of Sydney, Sydney, NSW 2006, Australia. Email: paul.mcgreevy@sydney.edu.au
FUNDING INFORMATION
There are no funders to report for this submission.
References:
Clayton HM, Murray R, Williams JM, Walker V, Fisher M, Fisher D, Nixon J, Mackechnie-Guire R. Response to comments on ‘Facial pressure beneath a cavesson noseband adjusted to different tightness levels during standing and chewing’. Equine Vet J. 2025;1–3. https://doi.org/10.1111/evj.70087
Wilkins C, Christensen JW, Doherty O, Fenner K, Freire R, Henshall C, McGreevy P. Comments on Clayton et al., (2024): Facial pressure beneath a cavesson noseband adjusted to different tightness levels during standing and chewing. Equine Vet J. 2025. https://doi.org/10.1111/evj.14548
Clayton HM, Murray R, Williams JM, Walker V, Fisher M, Fisher D, Nixon J, Mackechnie-Guire R. Facial pressure beneath a cavesson noseband adjusted to different tightness levels during standing and chewing. Equine Vet J. 2025;57(4):1127–37. https://doi.org/10.1111/evj.14451













