Have you ever trained your horse to do something completely by accident? Unfortunately, when we do teach something by mistake it is rarely a behaviour that we want. I’ve seen a number of horses that have been taught to rear by mistake, but few that have been taught flying changes by accident.
So, what is going on when the horse learns these unwanted behaviours? And why does it sometimes only take one or two repetitions for them to learn?
To answer these questions, we first need to understand operant conditioning.
Operant conditioning is the scientific term for a learning process that is really obvious – when a behaviour has good consequences, we learn to repeat it, whereas when a behaviour has bad consequences, we learn not to repeat it. It’s quite simple really, and it applies to horses, kids and partners!
Our horses are sponges for information; they are learning the whole time we are with them, whether we realise it or not. Given this, the better we understand how horses learn and how the different forms of operant conditioning work, the more likely we are to train the behaviours we want and avoid those we don’t want.
To describe how a behaviour becomes associated with a consequence, psychologists defined four different ‘quadrants’.
The famous four
The first division of the operant conditioning quadrants differentiates reinforcement and punishment.
It is the user’s ‘intention’ that defines this, not, as is often thought, the severity of the action taken. This is a key point in understanding learning theory and how we train our horses.
Let’s look at those two different intentions:
1. Reinforcement is used when we want to make a behaviour MORE likely to occur in the future.
2. Punishment is used when we want to make a behaviour LESS likely to occur in the future.
We use reinforcement a lot in horse training and it comes in many forms, from your voice, to tactile cues, comfort, rest and even food treats.
It is common to make the mistake of thinking punishment is only classed as ‘punishment’ if it is severe, but severity is irrelevant – it’s all about intention.
For example, if my horse kicks at me when I pick up a back foot, I could growl at him with my voice or I could hit him with a stick. While you may agree that growling is less severe than hitting with a stick, in reality, both are forms of punishment because I am trying to make the behaviour – kicking – less likely to occur in the future.
Those two main categories, reinforcement and punishment, are then each divided into positive and negative forms, which gives us the four quadrants.
These can also be thought of as addition (positive) and removal (negative) reinforcement and punishment. We can explain these as follows:
1. Positive reinforcement (addition reinforcement) is something you ADD to make a behaviour MORE LIKELY to occur in the future.
This is the REWARD segment. Good examples here are things you do after the desired behaviour has been performed: a scratch on the wither, a stroke on the neck, a rest, a soft word, or a treat.
2. Negative reinforcement (removal reinforcement) is something you TAKE AWAY or remove to make a behaviour MORE LIKELY to occur in the future.
This is the MOTIVATION segment. Good examples here are your voice (clucking to ask for trot), touch (asking to move away), and other pressure cues (legs, seat, reins and so on).
But remember that negative reinforcement is the same as ‘pressure-release’, so it must contain both the pressure and the release. The release or removal of pressure, whether it’s a tactile cue or simply your voice, is what’s meant by ‘negative’ in this equation.
Pressure that is not released, or not released in a timely fashion, can become positive punishment and lead to habituation/desensitisation to your cues.
3. Positive punishment (addition punishment) is something you ADD to make a behaviour LESS LIKELY to occur in the future.
Common examples include raising your voice at the horse or smacking it with the whip for performing an unwanted behaviour, such as biting or kicking.
4. Negative punishment (removal punishment) is something you TAKE AWAY to make a behaviour LESS LIKELY to occur in the future.
This is not used very often in horse training but it can be seen when a horse is ‘tied to the tree of knowledge’ to contemplate a misdemeanour or deprived of friends or food as a result of displaying unwanted behaviour.
Thus, as a rider, the choice is yours. You can use reinforcement and encourage those behaviours that you want to see, or you can use punishment and try to prevent your horse from continuing to display behaviours that you don’t want.
We know that horses (and people for that matter!) learn more readily from reinforcement than punishment and it’s worth discussing why this might be.
Punishment is necessarily very reactive. The horse must first do something wrong and then we correct that mistake.
This makes us, as riders or trainers, very reactive and it allows the horse to repeat unwanted behaviours, particularly when we are not very good at timing our corrections.
Punishment also relies on the horse knowing there is a correct answer or having to guess what they did wrong. It also encourages the rider or handler to assume that the horse knows this and is thus ‘misbehaving’.
This is a slippery slope. The moment we start making assumptions about what the horse is thinking, we head down that path of classifying the horse as ‘naughty’, ‘disrespectful’ or ‘stubborn’, or in some way ‘out to get us’, when the reality is more likely that we simply haven’t explained the lesson very well.
Punishment is unpredictable. If the horse does not know what they did wrong, does not yet understand the lesson, then the punishment arrives as a surprise, making training unpredictable and likely increasing anxiety as a result.
Let’s bust the three biggest myths around operant conditioning. These are:
Myth #1: Punishment and correction are different things
This is a common misconception probably because ‘punishment’ is a word that triggers strong emotions in us. However, once you understand that punishment/correction is anything done on your part that is intended to make a behaviour less likely to occur in the future, then severity is irrelevant in terms of the actual learning process.
Severity is also a very subjective measure, meaning that it is perceived differently by each of us and also by individual horses.
Where would the line between correction and punishment be crossed? At tapping the horse with the whip, or hitting it with a stick? At running towards it screaming like a banshee, or hitting it on the head? Regardless of severity, punishment is defined by the intention of the rider/handler to make a behaviour less likely to happen in future.
Myth #2: Negative reinforcement is bad
The word negative has given negative reinforcement a bad reputation so it’s best to think of it as subtraction or removal reinforcement, or pressure-release. Pressure comes in many forms, from your proximity or your voice to various levels of tactile cues.
However, the pressure is only half of the equation; the horse learns from the release/removal of pressure. When it is correctly timed, i.e., the pressure is released the moment the horse makes the desired response, the behaviour that the horse is more likely to repeat in the future is the one that earned the release.
Pressure-release is also the most common way of motivating a horse to perform a behaviour. For example, if we are lunging the horse at walk and we want it to trot, we might use the verbal cue of clucking (pressure). As soon as the horse trots, we stop clucking (release).
Myth #3: It is possible or practical to only use positive reinforcement
Something I hear a lot is: “I only use positive reinforcement”, often accompanied by a lovely photo of the person riding or leading their horse…
Positive reinforcement is excellent for shaping behaviour, but when used alone, the rider or handler has no way of motivating the horse to perform the desired behaviour. They must simply wait until the horse performs it and then add the reward (be it a scratch, a rest, a treat or some other item that the horse wants).
Depending on what you are trying to teach, the ‘waiting’ may take a very long time since the horse has many options.
A good example here is teaching the horse piaffe (a slow trot on the spot). I could stand with a piece of carrot in front of the horse for days before it decided to trot on the spot, but if I gave it some motivation to trot, perhaps clucking with my voice or gently tapping with the whip, then I could take that behaviour and shape it using rewards.
This is why combined reinforcement is almost always used when training horses. In combined reinforcement, pressure of some sort is used to motivate the horse, released when the correct behaviour is displayed and then a reward is added for greater effect.
Combined reinforcement = pressure–release–reward
It is also important not to confuse release and reward. The release is owed to the horse as part of the pressure-release sequence; the reward is an extra that comes after the behaviour has been performed. The release is not a reward.
The most concerning thing about riders claiming to ‘only use positive reinforcement’ is that if they don’t realise they are using pressure, then they will not be releasing it – you can’t stop doing something unless you first know you are doing it!
Best of both worlds
As riders, handlers or horse caregivers, we are always presented with a choice of which operant conditioning quadrant to use. By choosing combined reinforcement we can remain proactive, guide the horse towards the correct answer, reinforce those desirable behaviours and set the horse up for success.
Next time, we will take a closer look at the difference between proactive and reactive riders and show you how you can provide your horse with a predictable, safe learning environment.
In the meantime, to learn more about the practical application of equitation science, visit: https://www.kandooequine.com/