A Brief Survey of Operant Behavior
It has long been known that behavior is affected by its consequences. We reward
and punish people, for example, so that they will behave in different ways. A more specific effect of a consequence
was first studied experimentally by Edward L. Thorndike in a well-known experiment. A cat enclosed in a box struggled
to escape and eventually moved the latch which opened the door. When repeatedly enclosed in a box, the cat gradually
ceased to do those things which had proved ineffective ("errors") and eventually made the successful response very quickly.
In
operant conditioning, behavior is also affected by its consequences, but the process is not trial-and-error learning.
It can best be explained with an example. A hungry rat is placed in a semi-soundproof box. For several days bits
of food are occasionally delivered into a tray by an automatic dispenser. The rat soon goes to the tray immediately
upon hearing the sound of the dispenser. A small horizontal section of a lever protruding from the wall has been resting
in its lowest position, but it is now raised slightly so that when the rat touches it, it moves downward. In doing so
it closes an electric circuit and operates the food dispenser. Immediately after eating the delivered food the
rat begins to press the lever fairly rapidly. The behavior has been strengthened or reinforced by a single consequence.
The rat was not "trying" to do anything when it first touched the lever and it did not learn from "errors."
To a hungry
rat, food is a natural reinforcer, but the reinforcer in this example is the sound of the food dispenser, which was conditioned
as a reinforcer when it was repeatedly followed by the delivery of food before the lever was pressed. In fact, the sound
of that one operation of the dispenser would have had an observable effect even though no food was delivered on that occasion,
but when food no longer follows pressing the lever, the rat eventually stops pressing. The behavior is said to have
been extinguished.
An operant can come under the control of a stimulus. If pressing the lever is reinforced
when a light is on but not when it is off, responses continue to be made in the light but seldom, if at all, in the dark.
The rat has formed a discrimination between light and dark. When one turns on the light, a response occurs, but
that is not a reflex response.
The lever can be pressed with different amounts of force, and if only strong responses
are reinforced, the rat presses more and more forcefully. If only weak responses are reinforced, it eventually responds
only very weakly. The process is called differentiation.
A response must first occur for other reasons
before it is reinforced and becomes an operant. It may seem as if a very complex response would never occur to be reinforced,
but complex responses can be shaped by reinforcing their component parts separately and putting them together in the
final form of the operant.
Operant reinforcement not only shapes the topography of behavior, it maintains
it in strength long after an operant has been formed. Schedules of reinforcement are important in maintaining
behavior. If a response has been reinforced for some time only once every five minutes, for example, the rat soon stops
responding immediately after reinforcement but responds more and more rapidly as the time for the next reinforcement approaches.
(That is called a fixed-interval schedule of reinforcement.) If a response has been reinforced on the average
every five minutes but unpredictably, the rat responds at a steady rate. (That is a variable-interval schedule
of reinforcement.) If the average interval is short, the rate is high; if it is long, the rate is low.
If a response
is reinforced when a given number of responses has been emitted, the rat responds more and more rapidly as the required number
is approached. (That is a fixed-ratio schedule of reinforcement.) The number can be increased by easy stages up
to a very high value; the rat will continue to respond even though a response is only very rarely reinforced. "Piece-rate
pay" in industry is an example of a fixed-ratio schedule, and employers are sometimes tempted to "stretch" it by increasing
the amount of work required for each unit of payment. When reinforcement occurs after an average number of responses
but unpredictably, the schedule is called variable-ratio. It is familiar in gambling devices and systems which
arrange occasional but unpredictable payoffs. The required number of responses can easily be stretched, and in a gambling
enterprise such as a casino the average ratio must be such that the gambler loses in the long run if the casino is to make
a profit.
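The four schedules described above can be read as simple rules for deciding whether a given response is reinforced. The following is only a minimal sketch in Python, not an account of any actual apparatus. The class names, the `respond(t)` method, and the use of uniform random draws for the "unpredictable" schedules are all assumptions made for illustration. The code models the rule of each schedule only; it says nothing about how the rat's rate of responding changes.

```python
import random


class FixedRatio:
    """Reinforce every n-th response (hypothetical helper)."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        # t (time of the response) is ignored; only the count matters.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses averaging `mean`."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self._draw()

    def _draw(self):
        # Assumption: requirement drawn uniformly from 1..(2*mean - 1),
        # whose average is `mean`.
        self.required = self.rng.randint(1, 2 * self.mean - 1)
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `interval` time units
    have passed since the last reinforcement."""

    def __init__(self, interval):
        self.interval = interval
        self.last = 0.0

    def respond(self, t):
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the wait is unpredictable and averages `mean`."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.last = 0.0
        # Assumption: wait drawn uniformly from 0..2*mean.
        self.wait = rng.uniform(0, 2 * mean)

    def respond(self, t):
        if t - self.last >= self.wait:
            self.last = t
            self.wait = self.rng.uniform(0, 2 * self.mean)
            return True
        return False
```

Under these assumptions, `FixedRatio(5)` reinforces the fifth and tenth of ten successive responses, and `FixedInterval(300)` with time in seconds reinforces the first response made five minutes or more after the previous reinforcement. "Stretching" a ratio corresponds to gradually raising `n` or `mean`.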
Reinforcers may be positive or negative. A positive reinforcer reinforces when it is presented; a negative
reinforcer reinforces when it is withdrawn. Negative reinforcement is not punishment. Reinforcers always strengthen
behavior; that is what "reinforced" means. Punishment is used to suppress behavior. It consists of removing a
positive reinforcer or presenting a negative one. It often seems to operate by conditioning negative reinforcers.
The punished person henceforth acts in ways which reduce the threat of punishment and which are incompatible with, and hence
take the place of, the behavior punished.
The human species is distinguished by the fact that its vocal responses
can be easily conditioned as operants. There are many kinds of verbal operants because the behavior must be reinforced
only through the mediation of other people, and they do many different things. The reinforcing practices of a given
culture compose what is called a language. The practices are responsible for most of the extraordinary achievements
of the human species. Other species acquire behavior from each other through imitation and modelling (they show each
other what to do), but they cannot tell each other what to do. We acquire most of our behavior with that kind of help.
We take advice, heed warnings, observe rules, and obey laws, and our behavior then comes under the control of consequences
which would otherwise not be effective. Most of our behavior is too complex to have occurred for the first time without
such verbal help. By taking advice and following rules we acquire a much more extensive repertoire than would be possible
through a solitary contact with the environment.
Responding because behavior has had reinforcing consequences is very
different from responding by taking advice, following rules, or obeying laws. We do not take advice because of the particular
consequence that will follow; we take it only when taking other advice from similar sources has already had reinforcing consequences.
In general, we are much more strongly inclined to do things if they have had immediate reinforcing consequences than if we
have been merely advised to do them.
The innate behavior studied by ethologists is shaped and maintained by its contribution
to the survival of the individual and species. Operant behavior is shaped and maintained by its consequences for the
individual. Both processes have controversial features. Neither one seems to have any place for a prior plan or
purpose. In both, selection replaces creation.
Personal freedom also seems threatened. It is only the
feeling of freedom, however, which is affected. Those who respond because their behavior has had positively reinforcing
consequences usually feel free. They seem to be doing what they want to do. Those who respond because
the reinforcement has been negative and who are therefore avoiding or escaping from punishment are doing what they have
to do and do not feel free. These distinctions do not involve the fact of freedom.
The experimental analysis
of operant behavior has led to a technology often called behavior modification. It usually consists of changing the
consequences of behavior, removing consequences which have caused trouble, or arranging new consequences for behavior which
has lacked strength. Historically, people have been controlled primarily through negative reinforcement; that is, they
have been punished when they have not done what is reinforcing to those who could punish them. Positive reinforcement
has been less often used, partly because its effect is slightly deferred, but it can be as effective as negative reinforcement
and has many fewer unwanted byproducts. For example, students who are punished when they do not study may study, but
they may also stay away from school (truancy), vandalize school property, attack teachers, or stubbornly do nothing.
Redesigning school systems so that what students do is more often positively reinforced can make a great difference.
(For
further details, see my The Behavior of Organisms, my Science and Human Behavior, and Schedules of Reinforcement
by C. F. Ferster and me.)
-- B. F. Skinner
http://www.bfskinner.org/Operant.asp
___
The theory of B.F. Skinner is
based upon the idea that learning is a function of change in overt behavior. Changes in behavior are the result of an individual's
response to events (stimuli) that occur in the environment. A response produces a consequence such as defining a word, hitting
a ball, or solving a math problem. When a particular Stimulus-Response (S-R) pattern is reinforced (rewarded), the individual
is conditioned to respond. The distinctive characteristic of operant conditioning relative to previous forms of behaviorism
(e.g., Thorndike, Hull) is that the organism can emit responses instead of only producing responses elicited by
an external stimulus.
Reinforcement is the key element
in Skinner's S-R theory. A reinforcer is anything that strengthens the desired response. It could be verbal praise, a good
grade or a feeling of increased accomplishment or satisfaction. The theory also covers negative reinforcers -- any stimulus
that results in the increased frequency of a response when it is withdrawn (different from aversive stimuli -- punishment
-- which result in reduced responses). A great deal of attention was given to schedules of reinforcement (e.g. interval versus
ratio) and their effects on establishing and maintaining behavior.
One of the distinctive aspects
of Skinner's theory is that it attempted to provide behavioral explanations for a broad range of cognitive phenomena. For
example, Skinner explained drive (motivation) in terms of deprivation and reinforcement schedules. Skinner (1957) tried to
account for verbal learning and language within the operant conditioning paradigm, although this effort was strongly rejected
by linguists and psycholinguists. Skinner (1971) deals with the issue of free will and social
http://tip.psychology.org/skinner.html