~ Operant Conditioning ~
In 1937, Burrhus Frederic Skinner, usually referred to as B.F. Skinner, coined the term operant conditioning (also referred to as 'instrumental conditioning') to describe his theory of learning and behavior, distinguishing it from the traditional Pavlovian method of classical conditioning. While both models are paradigms of associative learning, built on variations of a common theme of stimulus and response as the basic mechanisms of behavioral change, operant conditioning treats learning as an active process in which the learner's voluntary behavior is shaped by its consequences, whereas Pavlovian or classical conditioning is based upon involuntary, reflexive responses.
Skinner described operant behavior as behavior that operates upon the environment to generate consequences (reinforcement or punishment), which in turn strengthen or weaken that behavior, respectively.
According to Skinner, behavior can be analyzed in terms of a three-term contingency:
discriminative stimulus --- operant response --- reinforcer
also known as the ABCs of behavior:
Antecedent --- Behavior --- Consequence
whereby the Antecedent (A), or discriminative stimulus, is the person, object, or event that sets the occasion for a Behavior (B), or operant response, which in turn produces a Consequence (C) that makes the behavior more or less likely to occur again. Learning, then, results from the association between the discriminative stimulus, the behavioral response, and the consequence that follows it. Whether the response is reinforced or extinguished determines whether the behavior will continue.
Reinforcement following a behavior increases the probability that the behavior will continue, whether the reinforcement consists of presenting something pleasurable (positive) or removing something noxious (negative). According to Skinner, by reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. This process is known as shaping.
Differential reinforcement is a method that combines positive reinforcement with extinction (withholding reinforcement from a previously reinforced behavior so that it gradually stops), and it is a critical component of the shaping process. By reinforcing the target behavior while withholding reinforcement from the earlier approximations that were needed to arrive at it, the subject learns to differentiate the target from those prior approximations. We can see this in action by taking a look at the video of Porter, the driving dog.
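To make the shaping logic concrete, here is a minimal Python sketch, not a model of any real experiment. It assumes each response can be reduced to a single number (say, how close a dog's paw comes to the target movement), and the names `shape`, `start`, `step` and `target` are hypothetical. Only responses that meet the current criterion are reinforced; each time one is, the criterion rises, so earlier approximations stop paying off (extinction) while closer ones keep being reinforced.

```python
def shape(responses, start, step, target):
    """Differential reinforcement with a rising criterion (successive approximations).

    responses: sequence of numeric response magnitudes (hypothetical scale)
    start:     initial criterion a response must meet to be reinforced
    step:      how far the criterion rises after each reinforced response
    target:    final criterion; the criterion never rises above it
    Returns a list of (response, reinforced) pairs.
    """
    criterion = start
    log = []
    for r in responses:
        if r >= criterion:
            # Meets the current approximation: reinforce it...
            log.append((r, True))
            # ...then demand a closer approximation next time, capped at the target.
            criterion = min(criterion + step, target)
        else:
            # An earlier approximation no longer meets the bar: withhold reinforcement.
            log.append((r, False))
    return log


history = shape([1, 2, 2, 3, 3, 4, 5, 5], start=1, step=1, target=5)
for response, reinforced in history:
    print(response, "reinforced" if reinforced else "not reinforced")
```

In this run the second "2" and the second "3" go unreinforced, because by then the criterion has already moved past them; once the criterion reaches the target, every response at the target is reinforced.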
Reinforcers and Punishers
Skinner identified three types of operants, that is, responses from the environment that can follow a behavior: neutral operants, reinforcers, and punishers. Neutral operants are environmental responses that neither increase nor decrease the probability that a behavior will be repeated. Reinforcers and punishers each come in two forms: positive and negative reinforcers, and positive and negative punishers. In operant conditioning, the terms positive and negative, and reinforcement and punishment, carry meanings that differ from their everyday connotations.
The way to understand them is as follows: negative means to remove a stimulus, and positive means to present a stimulus. Reinforcers are used to increase behaviors; punishers are used to decrease behaviors. Therefore, negative reinforcement is the removal of an aversive stimulus in order to increase the target behavior. Positive reinforcement is the presentation of an appetitive (pleasant) stimulus in order to increase the target behavior. Negative punishment is the removal of an appetitive stimulus in order to decrease the target behavior, and positive punishment (the one that most often confuses people) is the presentation of an aversive stimulus in order to decrease the target behavior.
If we want little Johnny to eat more vegetables (the target behavior), we need to use reinforcement, because reinforcers increase the targeted behavior. We can use either positive or negative reinforcement. With positive reinforcement (the presentation of a pleasant stimulus), we tell little Johnny that if he eats all his veggies, he can have his favorite dessert. With negative reinforcement (the removal of an aversive stimulus), we tell little Johnny that if he eats all of his veggies, he doesn't have to take out the trash. If, however, little Johnny spends most of his waking moments on his electronics and we want to reduce the time he has his face buried in them, then we want to use punishment (to decrease the behavior). Like reinforcers, punishers can be positive or negative. To use negative punishment, we would tell little Johnny that we are going to take away his electronics or his router, preventing him from using them; to use positive punishment, we would tell little Johnny that if he continues to use his electronics, he will be required to take out the trash every night and clean the bathrooms every week.
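The four terms come down to two yes/no questions: is a stimulus presented or removed, and is the behavior meant to increase or decrease? The small Python sketch below encodes exactly that rule and applies it to little Johnny's four cases. The names `Change`, `Goal` and `classify` are hypothetical, chosen only for illustration.

```python
from enum import Enum


class Change(Enum):
    PRESENT = "present"  # a stimulus is added after the behavior ("positive")
    REMOVE = "remove"    # a stimulus is taken away after the behavior ("negative")


class Goal(Enum):
    INCREASE = "increase"  # reinforcement
    DECREASE = "decrease"  # punishment


def classify(change: Change, goal: Goal) -> str:
    """Name the procedure from what happens to the stimulus and the intended effect."""
    sign = "positive" if change is Change.PRESENT else "negative"
    kind = "reinforcement" if goal is Goal.INCREASE else "punishment"
    return f"{sign} {kind}"


# Little Johnny's four cases
print(classify(Change.PRESENT, Goal.INCREASE))  # dessert for veggies -> positive reinforcement
print(classify(Change.REMOVE, Goal.INCREASE))   # no trash duty for veggies -> negative reinforcement
print(classify(Change.REMOVE, Goal.DECREASE))   # electronics taken away -> negative punishment
print(classify(Change.PRESENT, Goal.DECREASE))  # extra chores -> positive punishment
```

Note that whether the stimulus is pleasant or aversive never has to be passed in: presenting something to increase a behavior only works if it is pleasant, and presenting something to decrease a behavior only works if it is aversive.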
Schedules of Reinforcement
Operant conditioning involves a complex set of rules and tools, including schedules of reinforcement, which specify when a behavior is reinforced. Schedules of reinforcement include Interval Schedules, which can be either fixed or variable, and Ratio Schedules, which can also be either fixed or variable. There is also the process of Extinction, in which reinforcement is discontinued altogether so that the targeted behavior gradually stops occurring.
Interval Schedules of Reinforcement are based upon a specified amount of time that must elapse before a response is reinforced: the first response made after the interval has passed earns the reinforcer. When the interval schedule is fixed, that amount of time is always the same. As a loose analogy, many newer cars have windshield wipers that can be set to wipe every 3 seconds or every 6 seconds; a steady, repeating interval like this illustrates a fixed interval. Variable intervals occur when the amount of time changes unpredictably from one reinforcer to the next, and they produce a steady rate of responding. If you were training your pet rat to press a lever to receive a food pellet and reinforced the first press after every 4 seconds, you would not get much steady responding, because your pet would quickly learn that presses made right after a pellet yield nothing; it would pause after each pellet and press mainly as the next one came due. If, however, you varied the amount of time instead, it would look something like this: after only 3 seconds of lever pressing the rat receives a food pellet, then 8 seconds later it receives another, then 9 seconds later, and then back to 3 seconds. In this way the rat will keep pressing the lever steadily, because it will be very difficult for it to learn when the next pellet will arrive.
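The logic of an interval schedule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not laboratory software: `IntervalSchedule`, `fixed_interval` and `variable_interval` are hypothetical names, time is measured in seconds from the start of the session, and the clock restarts at each reinforcement. Drawing variable intervals from an exponential distribution is one common way to make the wait unpredictable; it is an assumption here, not something the text specifies.

```python
import itertools
import random
from typing import Iterable, Iterator


class IntervalSchedule:
    """Reinforce the first response made after the current interval has elapsed.

    Presses made before the interval is up are not reinforced and do not reset
    the clock; the clock restarts only when a pellet is delivered.
    """

    def __init__(self, intervals: Iterable[float]):
        self._intervals: Iterator[float] = iter(intervals)
        self._last = 0.0                    # time of the last reinforcement (session start = 0)
        self._wait = next(self._intervals)  # time that must pass before a press pays off

    def press(self, t: float) -> bool:
        """Return True if a lever press at time t (seconds) earns a pellet."""
        if t - self._last >= self._wait:
            self._last = t
            self._wait = next(self._intervals)
            return True
        return False


def fixed_interval(seconds: float) -> IntervalSchedule:
    # The same wait every time.
    return IntervalSchedule(itertools.repeat(seconds))


def variable_interval(mean: float, rng: random.Random) -> IntervalSchedule:
    # Assumed choice: exponentially distributed waits averaging `mean` seconds.
    return IntervalSchedule(rng.expovariate(1 / mean) for _ in itertools.count())


# Fixed interval of 4 s; the rat presses once per second for 12 s.
fi = fixed_interval(4)
print([t for t in range(1, 13) if fi.press(t)])  # [4, 8, 12]

# The text's variable sequence: 3 s, then 8 s, then 9 s, then back to 3 s.
vi = IntervalSchedule(itertools.cycle([3, 8, 9]))
print([t for t in range(1, 24) if vi.press(t)])  # [3, 11, 20, 23]
```

Under the fixed interval, the presses at 1 to 3, 5 to 7 and 9 to 11 seconds earn nothing, which is why a rat on this schedule learns to pause after each pellet.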
With Ratio Schedules of Reinforcement, instead of time, the schedule is based on the number of responses. Carrying over the windshield wiper analogy, you could set your wipers to make 4 swipes and stop, then another 4 swipes and stop, and so on. This would illustrate a fixed ratio, since the pattern is defined by a fixed number of swipes rather than by elapsed time. Using the pet rat example, a variable ratio schedule might deliver a pellet after the rat presses the lever 3 times, then after 6 more presses, then after 4 more, with the required number changing unpredictably from one pellet to the next. This produces a high, steady rate of lever pressing, because your pet cannot readily figure out which press will pay off. Casinos exploit the same effect: slot machines pay out on a variable ratio schedule, so it is very difficult for players to figure out when they will win, and they keep pulling that lever, much the same as the pet rat.
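A ratio schedule only needs a response counter, since elapsed time plays no role. In the hypothetical Python sketch below, `RatioSchedule`, `fixed_ratio` and `variable_ratio` are illustrative names, and drawing each variable requirement uniformly from a range is an assumed choice; a real variable ratio schedule is usually described by its average requirement.

```python
import itertools
import random
from typing import Iterable, Iterator


class RatioSchedule:
    """Reinforce a response once the required number of responses has been made.

    The count restarts at each reinforcement; how long the presses take is ignored.
    """

    def __init__(self, requirements: Iterable[int]):
        self._requirements: Iterator[int] = iter(requirements)
        self._needed = next(self._requirements)  # presses required for the next pellet
        self._count = 0                          # presses since the last pellet

    def press(self) -> bool:
        """Register one lever press; return True if it earns a pellet."""
        self._count += 1
        if self._count >= self._needed:
            self._count = 0
            self._needed = next(self._requirements)
            return True
        return False


def fixed_ratio(n: int) -> RatioSchedule:
    # The same number of presses every time.
    return RatioSchedule(itertools.repeat(n))


def variable_ratio(low: int, high: int, rng: random.Random) -> RatioSchedule:
    # Assumed choice: each requirement drawn uniformly from low..high (inclusive).
    return RatioSchedule(rng.randint(low, high) for _ in itertools.count())


# Fixed ratio of 4: which of 12 presses pay off?
fr = fixed_ratio(4)
print([n for n in range(1, 13) if fr.press()])  # [4, 8, 12]

# The text's variable sequence: 3 presses, then 6 more, then 4 more.
vr = RatioSchedule(itertools.cycle([3, 6, 4]))
print([n for n in range(1, 17) if vr.press()])  # [3, 9, 13, 16]
```

Under the fixed ratio, the rat can learn that every fourth press pays; under the variable sequence, any given press might be the one that pays, which is what keeps the rat pressing.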
The Premack Principle
In 1959, experimental psychologist David Premack published a paper that successfully challenged the traditional operant account of what reinforcement is. In operant conditioning, reinforcement is based upon the contingency between a behavior and a stimulus, such that a behavior is strengthened by the consequence that immediately follows its occurrence. Premack's research demonstrated otherwise. In what has become known as the Premack Principle, reinforcers are understood not as stimuli but as responses, that is, as behaviors in their own right.
According to Premack, reinforcement is based upon the contingency between two behaviors of differing probability, rather than between a behavior and a stimulus. When one behavior is made contingent upon another, the more probable behavior serves as reinforcement for the less probable behavior. For a gamer, for example, granting several strife-free hours of gaming (a high-probability activity) contingent upon cleaning the room (a low-probability activity) is powerful reinforcement. High-probability behaviors are activities that are highly rewarding on their own and require no inducement, which is what makes them powerful reinforcers for low-probability activities.
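The Premack Principle reduces to a comparison of baseline probabilities, which a short Python sketch can make explicit. The activity names and numbers below are made up for illustration, and treating "probability" as the share of free time spent on an activity when nothing is required is an assumption of this sketch. `can_reinforce` and `premack_pairs` are hypothetical helper names.

```python
def can_reinforce(activity: str, target: str, baseline: dict[str, float]) -> bool:
    """True if `activity` is more probable at baseline than `target`."""
    return baseline[activity] > baseline[target]


def premack_pairs(baseline: dict[str, float]) -> list[tuple[str, str]]:
    """List every (reinforcer, target) pair allowed by the differential-probability rule."""
    return [
        (high, low)
        for high in baseline
        for low in baseline
        if can_reinforce(high, low, baseline)
    ]


# Hypothetical baseline: share of free time a teenage gamer spends on each activity.
baseline = {"gaming": 0.6, "homework": 0.1, "cleaning room": 0.05}

print(can_reinforce("gaming", "cleaning room", baseline))  # True: "clean first, then game"
print(can_reinforce("cleaning room", "gaming", baseline))  # False
print(premack_pairs(baseline))
```

The rule is relative, not absolute: homework is a low-probability activity next to gaming, yet in this made-up baseline it is still more probable than cleaning the room, so it could in principle reinforce room cleaning too.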
Want to know more? You can follow my blogs, attend a webinar or seminar, or take an e-course.