Operant Conditioning

A potent means of reinforcing and shaping voluntary behavior.

You may have heard of Pavlov's dog - a dog that was trained to drool at the sound of a bell. Pavlov's dog is an example of classical conditioning, where the trained behavior is automatic. That is, the dog automatically drools at the sight of food, and in this experiment that automatic behavior was transferred to the sound of a bell.

With operant conditioning, we're dealing with voluntary behavior. Operant conditioning involves providing reward structures through reinforcement schedules in order to elicit specific controlled behaviors.

Operant conditioning is ubiquitous in our society, and it can be incredibly powerful. Some would say dangerously so -- addictions such as gambling are due in part to the nature of such conditioning. But operant conditioning is also used extensively in games (in fact, it may be all that's necessary), and is what makes many websites, such as Reddit, so pleasurable (maybe even addictive!). After becoming familiar with it, you'll notice how pervasive it is, underlying many of the systems we enjoy using so much.

Reinforcement vs Punishment

Reinforcement is anything that encourages a behavior. Anything that reinforces is a reinforcer. Punishment, on the other hand, is anything that discourages a behavior. Anything that punishes is a punisher.

Both reinforcers and punishers can be divided into positive and negative. Positive just implies that something is being added or introduced; negative implies that something is being removed.

Positive Reinforcers

Positive reinforcers increase the likelihood of a behavior when they are presented. Positive reinforcers can range from candy and ice cream to a pat on the back or a promotion at your job. They are rewards in the traditional sense.

Negative Reinforcers

Negative reinforcers increase the likelihood of a behavior when they are removed. That is, they are displeasurable, or painful, or otherwise unappealing and undesired - so their removal is a reward. For example, an electric shock would be a negative reinforcer - its removal acts as a reward.

Positive Punishers

Positive punishers decrease the likelihood of a behavior when they are presented. Electric shocks and other physical pain are common positive punishers - but they also include things such as economic sanctions. The introduction of a negative reinforcer would also be considered positive punishment.

Negative Punishers

Negative punishers decrease the likelihood of a behavior when they are removed. Taking away a child's candy, for example. In other words, negative punishment is the removal of a positive reinforcer.

The Skinner Box

The Skinner Box is the prototypical operant conditioning experiment. You have a rat in a box with a lever. The target behavior to reinforce is lever pressing, and the reinforcer is a food pellet.

In general, the rat presses the lever, and is rewarded with a food pellet. The exact reward conditions depend on the reinforcement schedule, which are described below.
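
To make the setup concrete, here's a minimal sketch in Python of that basic loop - the names (skinner_box, always_reward) are my own, not from any particular library. The box watches lever presses and asks a schedule, represented as a plain function, whether to dispense a pellet.

```python
def skinner_box(schedule, presses):
    """Simulate a rat pressing the lever `presses` times.
    `schedule()` is called once per press and decides whether a
    food pellet is dispensed."""
    pellets = 0
    for press in range(1, presses + 1):
        if schedule():
            pellets += 1
            print(f"press {press}: food pellet dispensed")
    return pellets

# Stand-in schedule that rewards every single press; the actual
# reinforcement schedules are sketched after the next section.
always_reward = lambda: True

skinner_box(always_reward, presses=5)
```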

A basic Skinner Box.

Reinforcement Schedules

In operant conditioning, there are four basic patterns of reinforcement, known collectively as reinforcement schedules. Each reinforcement schedule has different impacts on response frequency; some are more effective than others.

Reinforcement schedules can vary in two ways:

Interval vs. Ratio

Interval - Reinforcement is applied at regular intervals (that is, after a certain amount of time has passed). For example, reinforcement is applied every 10 minutes. The size of this interval is negatively related to response rate - larger intervals mean lower response rates. However, the magnitude of the reward is positively related; bigger, better rewards mean higher response rates.

Ratio - Reinforcement is applied according to the subject's responses (that is, after a certain number of responses). For example, reinforcement is applied every 20 responses. Ratio reinforcement can be very effective, but it is limited by fatigue - and thus limited by the size of the required ratio.

Fixed vs Variable

Fixed - Reinforcement will occur reliably when an interval or ratio is satisfied. For example, the reinforcement is applied every 20 responses exactly.

Variable - Reinforcement occurs only on average at the specified interval or ratio. For example, the reinforcement is applied, on average, every 20 responses - it could happen at 18 responses, or 22 responses, and so on. This unpredictability makes variable reinforcement very powerful, as we'll see later.

From these we have four combinations of basic reinforcement schedules: fixed interval, fixed ratio, variable interval, and variable ratio. Their effects on response rate are shown in the accompanying graph.

Response amount by reinforcement schedule type. Each hatch mark designates a reinforcement. Image adapted from Wikipedia.

Each reinforcement schedule has different effects on responses:

Fixed Interval - The response rate increases gradually as the end of the interval approaches. In the Skinner Box, the rat presses the lever at a higher rate as the next reinforcement time approaches.

Variable Interval - The response rate remains steady, at a relatively low rate, since response amount isn't a factor. In the Skinner Box, the rat presses the lever at a relatively flat rate, since it cannot anticipate when the next reinforcement will be.

Fixed Ratio - The response increases rapidly as the required response quota is approached. There's a pause at the beginning of each action-reward cycle; initial responses are known not to cause immediate reward, so there's little incentive to start. This pause varies positively with the size of the ratio - smaller ratios mean shorter pauses. Extremely high ratios can lead to abulia, meaning the reinforced behavior may be extinguished. In the Skinner Box, the rat presses the lever at a higher rate the closer it gets to its required response amount.

Variable Ratio - The response rate remains steadily high. A variable ratio schedule is more powerful than a fixed ratio schedule whose ratio equals the variable schedule's average. In the Skinner Box, the rat presses the lever at a consistently high rate, since any press could be the one that brings the next reinforcement.
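
As a rough illustration (not a behavioral model, and with parameter choices of my own), each of the four schedules can be written as a small function that the skinner_box loop sketched earlier could call on every lever press. The randomness in the two variable schedules is what produces the unpredictability described above.

```python
import random
import time

def fixed_ratio(n):
    """Reinforce after exactly every n responses."""
    count = 0
    def on_press():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return on_press

def variable_ratio(n):
    """Reinforce every n responses on average: each press has a 1/n
    chance, so the number of presses needed varies around n."""
    def on_press():
        return random.random() < 1.0 / n
    return on_press

def fixed_interval(seconds):
    """Reinforce the first response after `seconds` have elapsed."""
    last = time.monotonic()
    def on_press():
        nonlocal last
        if time.monotonic() - last >= seconds:
            last = time.monotonic()
            return True
        return False
    return on_press

def variable_interval(seconds):
    """Reinforce the first response after an interval that averages
    `seconds` but varies from one reinforcement to the next."""
    last = time.monotonic()
    wait = random.uniform(0.5 * seconds, 1.5 * seconds)
    def on_press():
        nonlocal last, wait
        if time.monotonic() - last >= wait:
            last = time.monotonic()
            wait = random.uniform(0.5 * seconds, 1.5 * seconds)
            return True
        return False
    return on_press
```

Any of these can be handed to the loop sketched earlier, e.g. skinner_box(fixed_ratio(20), presses=100), which would dispense a pellet on every 20th press.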

Operant Conditioning in the Real World

Operant conditioning is everywhere! One of my favorite examples is video games. Most games can be reduced to Skinner boxes, with core game mechanics being based on simple reinforcement schedules. For example, let's look at World of Warcraft.

Not too long ago (and even now), WoW, as it's known, was a notoriously addictive game. Applying some basic principles of operant conditioning can give us some insight as to why.

If you're unfamiliar with the game, here's a brief overview: you have a character who runs around in a virtual world fighting monsters and completing quests. Your character acquires "experience points" by defeating these monsters and completing these quests.

Once you've acquired enough experience points, your character "levels up", which means they get stronger and may gain access to new abilities. Your character is also rewarded with items for defeating these monsters and completing these quests. These items can make your character even more powerful, among other things. Some items are extremely powerful and thus highly sought after.

Here, there are already two apparent reinforcement schedules. Leveling up is a fixed ratio schedule - you must kill a certain number of monsters or complete a certain number of quests before you level up again.

The item system is a variable ratio schedule. You know you can kill monsters and complete quests to get good items, but it's never a set amount.
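
To make the mapping explicit, here's a toy version of those two schedules side by side; all the numbers and names are invented for illustration, not actual WoW values. Leveling is a fixed ratio on kills, while rare drops are a variable ratio.

```python
import random

XP_PER_KILL = 100        # hypothetical values, not real game data
XP_PER_LEVEL = 1000      # fixed ratio: level up exactly every 10 kills
RARE_DROP_CHANCE = 0.02  # variable ratio: ~1 in 50 kills on average

xp, level = 0, 1

def kill_monster():
    """One 'response': defeat a monster and collect the rewards."""
    global xp, level
    xp += XP_PER_KILL
    if xp >= XP_PER_LEVEL:                   # fixed ratio schedule
        xp -= XP_PER_LEVEL
        level += 1
        print(f"Level up! Now level {level}")
    if random.random() < RARE_DROP_CHANCE:   # variable ratio schedule
        print("A rare item drops!")

for _ in range(100):
    kill_monster()
```

In this toy model the rare item shows up once every 50 kills on average, but any single kill might be the one - which is exactly what drives the loop described next.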

You may find yourself caught in the "just one more" loop, where you tell yourself - just one more monster - the next one might drop a good item - and then I'll stop. But then it doesn't, and you don't stop. Because maybe the next one has that good item.

I don't mean to say this is all there is to these games. But it seems (and this is very controversial) that operant conditioning mechanics may be all that's necessary to make a successful game. Ian Bogost's Cow Clicker is an infamous example and criticism of this (Wired has a great write-up on the game). For more on "behavioral game design", this article at Gamasutra is worth a read.

Sources