In the Skinner box the contingency between responses and the delivery of reinforcement can be changed so that more than one response is required to obtain the reward. A whole range of rules can govern the contingency between responses and reinforcement - these different types of rules are referred to as schedules of reinforcement. Most schedules of reinforcement can be divided into those in which the contingency depends on the number of responses made and those in which it depends on their timing.
Schedules that depend on the number of responses made are called ratio schedules. The ratio of the schedule is the number of responses required per reinforcement. The "classic" schedule, where one reinforcer is delivered for each response, is called a continuous reinforcement schedule - it has a ratio of 1. A schedule where two responses must be made for each reinforcer has a ratio of 2, and so on. A distinction is also made between schedules where exactly the same number of responses has to be made for each reinforcer - fixed-ratio schedules - and those where the number of responses required can differ for each reinforcer around some average value - variable-ratio schedules. A schedule where exactly 20 responses are required for each reinforcer is called a fixed-ratio 20 or FR20 schedule. One where on average 30 responses are required is called a variable-ratio 30 or VR30 schedule.
If the contingency between responses and reinforcement depends on time, the schedule is called an interval schedule. Reinforcing the first response an animal makes after the SD light has been on for 20 seconds, and ignoring responses made during those 20 seconds, would correspond to such a schedule. Where the interval which must elapse between the onset of the SD and the first reinforced response is the same for all reinforcers, the schedule is called a fixed-interval or FI schedule. Again, the intervals could also vary around some average - this is called a variable-interval or VI schedule. It is possible to combine these schedules in various ways and even to construct other basic types of schedule (e.g. ones where animals are reinforced for maintaining specified intervals between responses - differential reinforcement of low rate of response, or DRL, schedules). The important thing about these different schedules, however, is the differences in response patterns and learning that they produce. These differences may tell us about part of what is learned in operant conditioning. A summary of the basic types of schedule:
- Fixed ratio (FR n): reinforcement after exactly n responses.
- Variable ratio (VR n): reinforcement after a number of responses that varies around an average of n.
- Fixed interval (FI t): the first response made after t seconds have elapsed is reinforced.
- Variable interval (VI t): as FI, but the interval varies around an average of t seconds.
- DRL t: a response is reinforced only if a specified interval has elapsed since the previous response.
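To make these contingency rules concrete, here is a minimal Python sketch of how the four basic schedules decide whether a given response is reinforced. The class names and the uniform respond(now) interface are illustrative assumptions, not part of the original lecture or of any standard package.

```python
import random

class FixedRatio:
    """FR n: reinforce every n-th response."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, now):
        # 'now' is ignored for ratio schedules; kept for a uniform interface
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True          # deliver a reinforcer
        return False

class VariableRatio:
    """VR n: the required number of responses varies around a mean of n."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.required = random.randint(1, 2 * n - 1)

    def respond(self, now):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = random.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI t: reinforce the first response made at least t seconds after the last reinforcer."""
    def __init__(self, t):
        self.t = t
        self.last = 0.0

    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:
    """VI t: as FI, but the required interval varies around a mean of t seconds."""
    def __init__(self, t):
        self.t = t
        self.last = 0.0
        self.interval = random.uniform(0.0, 2.0 * t)

    def respond(self, now):
        if now - self.last >= self.interval:
            self.last = now
            self.interval = random.uniform(0.0, 2.0 * self.t)
            return True
        return False

# usage: query a schedule with the time of each response
schedule = FixedInterval(20.0)
print(schedule.respond(5.0), schedule.respond(21.0))   # False True
```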
The most characteristic response patterns are produced by FI and FR schedules. Responses in operant-conditioning experiments were traditionally recorded using a pen recorder (see figure in the Skinner-Box document): a pen was drawn across paper at a constant rate, the pen moved up a small amount each time the animal made a response, and a larger diagonal movement recorded the occurrence of each reinforcement. Animals produce constant-rate responding on FR schedules, with a distinct pause in responding after each reinforcement.
The rate of response is inversely proportional to the ratio requirement. The length of the post-reinforcement pause also increases as the ratio increases. The pattern of responding on FI schedules is quite different: after each reinforcement, animals respond at a gradually accelerating rate, which produces a 'scalloped' cumulative record.
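As a rough illustration of the recording method itself (not of the animal's behaviour), the cumulative record described above can be reconstructed from lists of response and reinforcement times; the function below is a hypothetical sketch.

```python
def cumulative_record(response_times, reinforcement_times):
    """Mimic a pen recorder: the paper advances with time, the pen steps up
    once per response, and reinforcer deliveries are marked separately."""
    points = [(0.0, 0)]                         # (time, cumulative responses)
    for i, t in enumerate(sorted(response_times), start=1):
        points.append((t, i))
    marks = sorted(reinforcement_times)         # where the diagonal blips go
    return points, marks
```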
The main feature of variable schedules is that, in animals, ratio schedules produce higher response rates than interval schedules for the same reinforcement density. For example, one animal might be trained on a variable-ratio schedule and the times at which it received reinforcement noted. These times could then be used to form a 'yoked' variable-interval schedule for another animal - an interval schedule where the interval between SD onset and the onset of a response-reinforcement contingency is determined by the times at which the first animal received each reinforcement. Typically the second animal produces much lower response rates on the yoked schedule even though the frequency of reinforcement received by the two animals is more or less the same.
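A minimal sketch of how such a yoked schedule could be constructed: the reinforcement times recorded from the ratio-trained animal are converted into the successive intervals of the second animal's interval schedule. The function name and data format are assumptions for illustration.

```python
def yoked_intervals(reinforcement_times):
    """Turn the times at which the ratio-trained animal was reinforced into the
    successive intervals of a yoked variable-interval schedule."""
    previous = 0.0
    intervals = []
    for t in sorted(reinforcement_times):
        intervals.append(t - previous)
        previous = t
    return intervals

# e.g. reinforcements at 12 s, 30 s and 55 s yield intervals of 12 s, 18 s and 25 s
print(yoked_intervals([12.0, 30.0, 55.0]))   # [12.0, 18.0, 25.0]
```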
The rate at which an animal responds on a schedule is one measure of the strength of the association it makes between response and reinforcement. For example, the interesting thing about FI schedules is that, in these terms, responses become more strongly associated with reinforcement as the time since SD onset increases. The animal does not learn the FI contingency precisely, but does learn an approximation which keeps the effort it expends to obtain reinforcement relatively low while maintaining a good chance of obtaining that reinforcement almost as soon as it becomes available. We can study the way animals choose between reinforcers by giving them the opportunity to respond on two schedules simultaneously, using a Skinner box with two levers and two SD lights. One lever may, for example, operate on a VR20 schedule while the other operates on a VR10. In a series of experiments with different combinations of schedules, the way in which animals allot their resources between reinforcers of different values can be discovered. It turns out that animals do not distribute their responses ideally - always making all of their responses to the richer schedule - but again distribute their responses in a way which serves to minimise responding given only approximate information about the different contingencies. Animals allot responses between schedules in proportion to the numbers of reinforcers they obtain on each schedule. This is known as the matching law - it has been studied not only by psychologists, but increasingly by economists, since this behavior of rats often corresponds to economic behavior in humans.
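The matching law can be written as a simple proportion: the fraction of responses allotted to one schedule equals the fraction of reinforcers obtained from it. A small sketch, with hypothetical reinforcer counts:

```python
def matching_fraction(reinforcers_a, reinforcers_b):
    """Matching law: responses_a / (responses_a + responses_b)
                   = reinforcers_a / (reinforcers_a + reinforcers_b)."""
    return reinforcers_a / (reinforcers_a + reinforcers_b)

# If an animal obtains 40 reinforcers on one lever and 20 on the other,
# matching predicts about two thirds of its responses will go to the first lever.
print(matching_fraction(40, 20))   # 0.666...
```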
The strength of associations can also be judged by resistance to extinction - that is, how long an animal will keep responding in the presence of an SD without reinforcement. One of the main features of partial schedules of reinforcement - schedules on which not every single response is reinforced - is that they produce learning which is much more resistant to extinction than continuous reinforcement.
In operant conditioning, appetitive and aversive events produce different patterns of learning. One can distinguish four different consequences of responding in operant conditioning: the presentation of an appetitive event (reward), the removal of an appetitive event (omission), the presentation of an aversive event (punishment), and the removal of an aversive event (escape or avoidance).
Although animals can learn all of these contingencies, it is very clear that they have quite different consequences in extinction. When a contingency fails to apply to a behavior actively produced by the animal, it quickly becomes clear that the contingency is no longer in operation. On the other hand, if behaving leads to aversive consequences then, in extinction, the animal is unlikely to produce the behavior and hence unlikely to discover that the contingency no longer applies.
Observing an animal while it is being trained in a Skinner box, it becomes clear that it has to learn the nature of the operant - this is known as response differentiation. By changing the response requirement, for example the force required to depress the lever, it turns out that animals learn response requirements very precisely. It is also clear, however, that they are not simply learning a set of muscle movements. Once a rat has learned to press one lever he does not have to relearn the whole process if he is extinguished and presented with a new learning task - we do not need to go through the whole training procedure again even if the lever is now on the opposite side of the Skinner box.
In addition to response differentiation the animal must also learn to discriminate the discriminative stimulus. This can be seen most clearly when a number of stimuli can be presented to the animal, only one of which is the true SD. If the factor which distinguishes the SD is its colour, then we soon see that the number of responses the animal makes to colours which differ slightly from the SD colour is far smaller than would be the case if the SD did not have to be distinguished in this way (see figure below).
According to Skinner, operant conditioning is not based on stimulus-response (S-R) associations, but rather on response-reinforcer (R-R) associations. Let us briefly look at some evidence. A rat is trained to make one type of response for one reinforcer - say chocolate drops - and a different response for a second reinforcer - food pellets. If the value of the first reinforcer is now reduced, for example by presenting it in the animal's home cage in conjunction with a chemical which makes the rat nauseous, then, when the rat returns to the Skinner box, it will produce much less of the first type of behavior. If the animal had learned a stimulus-response association - i.e. if being put in the Skinner box were simply associated with producing the first behavior - then we would not expect to see less of behavior one before the animal has obtained a reinforcer in the box again. If, on the other hand, the association is between response and reinforcement, then devaluation of the reinforcer would be expected to have just the effect observed on behavior. We can also demonstrate the R-R nature of operant associations by presenting additional reinforcers not contingent on responding on one of a pair of schedules. Imagine the same initial situation, except that now, instead of devaluing chocolate drops by pairing them with poison, we begin to present some chocolate drops to the animal in the Skinner box whether or not it has met the schedule contingency. The animal again makes fewer of the first 'chocolate-drop' responses. Again, the stimuli (the SD, being in the box and so on) have not changed, yet a change in the contingency between response and reinforcement has affected behavior.
Instrumental learning normally depends clearly on a contingency between response and reinforcement, but must this always be the case? Normally, if a contingency is not present - if responding has no effect on whether reinforcement is obtained - then no learning occurs. There is, however, the possibility that a contingency is perceived where, in fact, there is none. To truly assess the contingency between response and reinforcement we need to know both the chance of obtaining a reinforcer if we respond and the chance of obtaining a reinforcer if we do not respond. If we never evaluate the latter probability, because we are responding all the time, then we may attribute a contingency to responding where there is none. The opposite can also occur. An extreme example of this is 'learned helplessness'. In the first part of a learned helplessness experiment an animal is subjected to unavoidable shocks - there may be a potential path of escape, for example a wall to jump over, but escape is impossible, for example because the wall is too high. Soon the animal learns that escape is impossible and ceases attempting it. If the animal is now moved to a different situation in which escape is possible it will, nevertheless, fail to learn. Because it never performs escape behavior it cannot discover that the chances of being shocked when it makes an escape attempt are now different from those it experienced when not behaving. The lack of contingency perceived between behavior and shock is illusory. In these circumstances, then, conditioning is really being controlled by the contiguity of response and reinforcer, not their contingency. It should, however, be emphasised that, in general, the effectiveness of instrumental learning depends on contingency.
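The contingency being described here can be expressed as the difference between two conditional probabilities: the probability of the outcome given a response, minus the probability of the outcome given no response. A minimal sketch of this delta-P formulation, with hypothetical trial counts:

```python
def delta_p(outcome_given_response, response_trials,
            outcome_given_no_response, no_response_trials):
    """Contingency as delta-P: P(outcome | response) - P(outcome | no response).
    Zero means the outcome is independent of responding, however often the two
    happen to coincide (mere contiguity)."""
    p_r = outcome_given_response / response_trials
    p_nr = outcome_given_no_response / no_response_trials
    return p_r - p_nr

# An animal that responds on every trial never samples the second probability,
# so it may perceive a contingency where delta-P is in fact zero.
print(delta_p(30, 40, 30, 40))   # 0.0 - outcome independent of responding
```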
Finally, as we noted above, when one considers partial schedules of reinforcement, the animal learns approximations to the contingencies in operation, based on the samples of those contingencies he is exposed to. He does not learn the idealised contingency. If he did, then on FI schedules no responses would be produced until the interval had elapsed, and on concurrent schedules he would maximise rather than match.
This document is restructured from a lecture kindly provided by R.W. Kentridge.