What is Operant Conditioning?

A Brief Survey of Operant Behavior

It has long been known that behavior is affected by its consequences. We reward and punish people, for example, so that they will behave in different ways. A more specific effect of a consequence was first studied experimentally by Edward L. Thorndike in a well-known experiment. A cat enclosed in a box struggled to escape and eventually moved the latch which opened the door. When repeatedly enclosed in a box, the cat gradually ceased to do those things which had proved ineffective ("errors") and eventually made the successful response very quickly.

In operant conditioning, behavior is also affected by its consequences, but the process is not trial-and-error learning. It can best be explained with an example. A hungry rat is placed in a semi-soundproof box. For several days bits of food are occasionally delivered into a tray by an automatic dispenser. The rat soon goes to the tray immediately upon hearing the sound of the dispenser. A small horizontal section of a lever protruding from the wall has been resting in its lowest position, but it is now raised slightly so that when the rat touches it, it moves downward. In doing so it closes an electric circuit and operates the food dispenser. Immediately after eating the delivered food the rat begins to press the lever fairly rapidly. The behavior has been strengthened or reinforced by a single consequence. The rat was not "trying" to do anything when it first touched the lever and it did not learn from "errors."

To a hungry rat, food is a natural reinforcer, but the reinforcer in this example is the sound of the food dispenser, which was conditioned as a reinforcer when it was repeatedly followed by the delivery of food before the lever was pressed. In fact, the sound of that one operation of the dispenser would have had an observable effect even though no food was delivered on that occasion, but when food no longer follows pressing the lever, the rat eventually stops pressing. The behavior is said to have been extinguished.

An operant can come under the control of a stimulus. If pressing the lever is reinforced when a light is on but not when it is off, responses continue to be made in the light but seldom, if at all, in the dark. The rat has formed a discrimination between light and dark. When one turns on the light, a response occurs, but that is not a reflex response.

The lever can be pressed with different amounts of force, and if only strong responses are reinforced, the rat presses more and more forcefully. If only weak responses are reinforced, it eventually responds only very weakly. The process is called differentiation.

A response must first occur for other reasons before it is reinforced and becomes an operant. It may seem as if a very complex response would never occur to be reinforced, but complex responses can be shaped by reinforcing their component parts separately and putting them together in the final form of the operant.

Operant reinforcement not only shapes the topography of behavior, it maintains it in strength long after an operant has been formed. Schedules of reinforcement are important in maintaining behavior. If a response has been reinforced for some time only once every five minutes, for example, the rat soon stops responding immediately after reinforcement but responds more and more rapidly as the time for the next reinforcement approaches. (That is called a fixed-interval schedule of reinforcement.) If a response has been reinforced on the average every five minutes but unpredictably, the rat responds at a steady rate. (That is a variable-interval schedule of reinforcement.) If the average interval is short, the rate is high; if it is long, the rate is low.
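The two interval schedules can be written down as simple decision rules. The sketch below is only an illustration and is not taken from the experiments described: the class names, the use of seconds as the time unit, and the choice of an exponential distribution for the unpredictable intervals are all assumptions. It models the apparatus (when a response is reinforced), not the rat's rate of responding.

```python
import random


class FixedInterval:
    """Reinforce the first response made at least `interval` seconds
    after the previous reinforcement (hypothetical helper)."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reinforced = 0.0

    def respond(self, t):
        """Return True if a response at time t is reinforced."""
        if t - self.last_reinforced >= self.interval:
            self.last_reinforced = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the required interval is redrawn after each
    reinforcement so that it averages `mean_interval` seconds.
    Assumption: intervals are drawn from an exponential distribution."""

    def __init__(self, mean_interval, rng=None):
        self.mean_interval = mean_interval
        self.rng = rng if rng is not None else random.Random()
        self.last_reinforced = 0.0
        self.required = self._draw()

    def _draw(self):
        return self.rng.expovariate(1.0 / self.mean_interval)

    def respond(self, t):
        """Return True if a response at time t is reinforced."""
        if t - self.last_reinforced >= self.required:
            self.last_reinforced = t
            self.required = self._draw()
            return True
        return False


# A response once a minute for 30 minutes on a five-minute fixed interval.
fi = FixedInterval(300)
fi_outcomes = [fi.respond(t) for t in range(60, 1801, 60)]
print(sum(fi_outcomes), "of", len(fi_outcomes), "responses reinforced")
```

Under the fixed-interval rule, responses made early in the interval are never reinforced, which is consistent with the pause after reinforcement described above. Under the variable-interval rule no moment is distinguished from any other.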

If a response is reinforced when a given number of responses has been emitted, the rat responds more and more rapidly as the required number is approached. (That is a fixed-ratio schedule of reinforcement.) The number can be increased by easy stages up to a very high value; the rat will continue to respond even though a response is only very rarely reinforced. "Piece-rate pay" in industry is an example of a fixed-ratio schedule, and employers are sometimes tempted to "stretch" it by increasing the amount of work required for each unit of payment. When reinforcement occurs after an average number of responses but unpredictably, the schedule is called variable-ratio. It is familiar in gambling devices and systems which arrange occasional but unpredictable payoffs. The required number of responses can easily be stretched, and in a gambling enterprise such as a casino the average ratio must be such that the gambler loses in the long run if the casino is to make a profit.
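The ratio schedules, and the arithmetic behind the casino's profit, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, the variable-ratio rule is simplified to a constant probability of 1/mean_ratio on every response, and the stake and payoff figures are invented for the example.

```python
import random


class FixedRatio:
    """Reinforce every `ratio`-th response (hypothetical helper)."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0

    def respond(self):
        """Return True if this response is reinforced."""
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce unpredictably, once per `mean_ratio` responses on average.
    Assumption: each response has the same probability 1/mean_ratio."""

    def __init__(self, mean_ratio, rng=None):
        self.p = 1.0 / mean_ratio
        self.rng = rng if rng is not None else random.Random()

    def respond(self):
        """Return True if this response is reinforced."""
        return self.rng.random() < self.p


def expected_net_per_play(stake, payoff, mean_ratio):
    """Average gain per play for the gambler: one payoff per `mean_ratio`
    plays on average, minus the stake paid on every play."""
    return payoff / mean_ratio - stake


fr = FixedRatio(5)
print([fr.respond() for _ in range(10)])  # every fifth response is reinforced

# Invented figures: stake of 1, payoff of 8, one win in ten plays on average.
print(expected_net_per_play(stake=1, payoff=8, mean_ratio=10))  # negative
```

The gambler loses in the long run whenever the payoff divided by the average ratio is smaller than the stake. "Stretching" the ratio means raising `mean_ratio`, which makes that average outcome worse for the gambler.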

Reinforcers may be positive or negative. A positive reinforcer reinforces when it is presented; a negative reinforcer reinforces when it is withdrawn. Negative reinforcement is not punishment. Reinforcers always strengthen behavior; that is what "reinforced" means. Punishment is used to suppress behavior. It consists of removing a positive reinforcer or presenting a negative one. It often seems to operate by conditioning negative reinforcers. The punished person henceforth acts in ways which reduce the threat of punishment and which are incompatible with, and hence take the place of, the behavior punished.
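The distinctions in the paragraph above amount to a two-by-two table: the kind of stimulus (a positive or a negative reinforcer) against what is done with it (presented or withdrawn). The lookup below is merely one way of laying that table out; the function name and the string labels are assumptions made for the illustration.

```python
# Keys: (kind of reinforcer, what is done with it). Values follow the
# definitions given in the paragraph above.
CONTINGENCIES = {
    ("positive", "presented"): "positive reinforcement",
    ("negative", "withdrawn"): "negative reinforcement",
    ("positive", "withdrawn"): "punishment",
    ("negative", "presented"): "punishment",
}


def classify(reinforcer_kind, operation):
    """Name the contingency for a reinforcer kind and an operation
    (hypothetical helper)."""
    return CONTINGENCIES[(reinforcer_kind, operation)]


print(classify("negative", "withdrawn"))  # negative reinforcement
print(classify("negative", "presented"))  # punishment
```

Both kinds of reinforcement strengthen the behavior they follow; the two remaining cells are the ones used to suppress it.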

The human species is distinguished by the fact that its vocal responses can be easily conditioned as operants. There are many kinds of verbal operants because the behavior must be reinforced only through the mediation of other people, and they do many different things. The reinforcing practices of a given culture compose what is called a language. The practices are responsible for most of the extraordinary achievements of the human species. Other species acquire behavior from each other through imitation and modelling (they show each other what to do), but they cannot tell each other what to do. We acquire most of our behavior with that kind of help. We take advice, heed warnings, observe rules, and obey laws, and our behavior then comes under the control of consequences which would otherwise not be effective. Most of our behavior is too complex to have occurred for the first time without such verbal help. By taking advice and following rules we acquire a much more extensive repertoire than would be possible through a solitary contact with the environment.

Responding because behavior has had reinforcing consequences is very different from responding by taking advice, following rules, or obeying laws. We do not take advice because of the particular consequence that will follow; we take it only when taking other advice from similar sources has already had reinforcing consequences. In general, we are much more strongly inclined to do things if they have had immediate reinforcing consequences than if we have been merely advised to do them.

The innate behavior studied by ethologists is shaped and maintained by its contribution to the survival of the individual and species. Operant behavior is shaped and maintained by its consequences for the individual. Both processes have controversial features. Neither one seems to have any place for a prior plan or purposes. In both, selection replaces creation.

Personal freedom also seems threatened. It is only the feeling of freedom, however, which is affected. Those who respond because their behavior has had positively reinforcing consequences usually feel free. They seem to be doing what they want to do. Those who respond because the reinforcement has been negative and who are therefore avoiding or escaping from punishment are doing what they have to do and do not feel free. These distinctions do not involve the fact of freedom.

The experimental analysis of operant behavior has led to a technology often called behavior modification. It usually consists of changing the consequences of behavior, removing consequences which have caused trouble, or arranging new consequences for behavior which has lacked strength. Historically, people have been controlled primarily through negative reinforcement; that is, they have been punished when they have not done what is reinforcing to those who could punish them. Positive reinforcement has been less often used, partly because its effect is slightly deferred, but it can be as effective as negative reinforcement and has many fewer unwanted byproducts. For example, students who are punished when they do not study may study, but they may also stay away from school (truancy), vandalize school property, attack teachers, or stubbornly do nothing. Redesigning school systems so that what students do is more often positively reinforced can make a great difference.

(For further details, see my The Behavior of Organisms, my Science and Human Behavior, and Schedules of Reinforcement by C. F. Ferster and me.)

-- B. F. Skinner

http://www.bfskinner.org/Operant.asp

___

The theory of B.F. Skinner is based upon the idea that learning is a function of change in overt behavior. Changes in behavior are the result of an individual's response to events (stimuli) that occur in the environment. A response produces a consequence such as defining a word, hitting a ball, or solving a math problem. When a particular Stimulus-Response (S-R) pattern is reinforced (rewarded), the individual is conditioned to respond. The distinctive characteristic of operant conditioning relative to previous forms of behaviorism (e.g., Thorndike, Hull) is that the organism can emit responses instead of only having responses elicited by an external stimulus.

Reinforcement is the key element in Skinner's S-R theory. A reinforcer is anything that strengthens the desired response. It could be verbal praise, a good grade or a feeling of increased accomplishment or satisfaction. The theory also covers negative reinforcers -- any stimulus that results in the increased frequency of a response when it is withdrawn (different from aversive stimuli -- punishment -- which result in reduced responses). A great deal of attention was given to schedules of reinforcement (e.g., interval versus ratio) and their effects on establishing and maintaining behavior.

One of the distinctive aspects of Skinner's theory is that it attempted to provide behavioral explanations for a broad range of cognitive phenomena. For example, Skinner explained drive (motivation) in terms of deprivation and reinforcement schedules. Skinner (1957) tried to account for verbal learning and language within the operant conditioning paradigm, although this effort was strongly rejected by linguists and psycholinguists. Skinner (1971) deals with the issue of free will and social

http://tip.psychology.org/skinner.html