Bob Kentridge 1995

Comparative Psychology: Lecture 6.

Edward Thorndike, puzzle-boxes and the law of effect.

You may recall that we left comparative psychology's development as an empirical response to Darwin's 'Descent of Man' very much on the verge of becoming a science. The importance of the problem and the practical difficulties had been recognised and, by the end of the century, serious efforts were being made to produce objective tests of animal intelligence. The focus of this work was now America, where the publication of William James' 'Principles of Psychology' (1890) inspired a growing number of graduate students. One, Edward Thorndike, attempted to develop some of the anecdotes on the mechanical problem-solving ability of cats and dogs collected by George Romanes into an objective experimental method. Thorndike devised a number of wooden crates which required various combinations of latches, levers, strings and treadles to open them. A dog or a cat would be put in one of these 'puzzle-boxes' and, sooner or later, would manage to escape from it. Thorndike's initial aim was to show that the anecdotal achievements of cats and dogs could be replicated in controlled, standardised circumstances. However, he soon realised that he could now measure animal intelligence using this equipment. His method was to set an animal the same task repeatedly, each time measuring the time it took to solve it. Thorndike could then compare these 'learning-curves' across different situations and different species.
[Figure: Thorndike learning curves]
Thorndike was particularly interested in discovering whether his animals could learn their tasks through imitation or observation. He compared the learning curves of cats who had been given the opportunity of observing others escaping from a box with those who had never seen the box being solved and found no difference in their rate of learning. He obtained the same null result with dogs and, even when he showed the animals the methods of opening a box by placing their paws on the appropriate levers and so on, he found no improvement. He fell back on a much simpler trial and error explanation of learning. Occasionally, quite by chance, an animal performs an action which frees it from the box. When the animal finds itself in the same position again it is more likely to perform the same action again. The reward of being freed from the box somehow strengthens an association between a stimulus, being in a certain position in the box, and an appropriate action. Reward acts to strengthen stimulus-response associations. The animal learns to solve the puzzle-box not by reflecting on possible actions and really puzzling its way out of it but by a quite mechanical development of actions originally made by chance. By 1910 Thorndike had formalised this notion into a 'law' of psychology - the law of effect. In full it reads:
"Of several responses made to the same situation those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections to the situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond."
It is worth quoting in full, first because it essentially drove comparative psychology in North America and Europe for fifty years and second because Thorndike maintained that, in combination with the law of exercise (the notion that associations are strengthened by use and weakened by disuse) and the concept of instinct, the law of effect could explain all of human behaviour in terms of the development of myriads of stimulus-response associations. It is worth briefly comparing trial and error learning with classical conditioning. In classical conditioning a neutral stimulus becomes associated with part of a reflex (either the US or the UR). In trial and error learning no reflex is involved. A reinforcing or punishing event (a type of stimulus) alters the strength of association between a neutral stimulus and a quite arbitrary response. The response is not part of any reflex.
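The law of effect can be illustrated with a toy simulation. What follows is only a minimal sketch under stated assumptions, not Thorndike's own procedure: the animal picks among a handful of arbitrary actions in proportion to the current strength of each stimulus-response bond, the one action that opens the box is strengthened, and actions that leave the animal confined are slightly weakened. The function name, the number of actions, the `gain` and `loss` values and the use of attempts per trial as a stand-in for escape time are all hypothetical choices made for illustration.

```python
import random

def simulate_puzzle_box(n_trials=30, n_actions=5, correct=0,
                        gain=0.5, loss=0.05, seed=1):
    """Toy trial-and-error learner (illustrative assumptions throughout).

    Returns (attempts_per_trial, final_bond_strengths).
    """
    rng = random.Random(seed)
    # One bond per possible action in the 'inside the box' situation.
    strength = [1.0] * n_actions
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            # Actions are emitted in proportion to their bond strength.
            action = rng.choices(range(n_actions), weights=strength)[0]
            if action == correct:
                # Satisfaction (escape) strengthens the bond.
                strength[action] += gain
                break
            # Discomfort (still confined) weakens the bond, down to a floor.
            strength[action] = max(0.1, strength[action] - loss)
        attempts_per_trial.append(attempts)
    return attempts_per_trial, strength

if __name__ == "__main__":
    curve, bonds = simulate_puzzle_box()
    print("attempts per trial:", curve)
    print("final bond strengths:", bonds)
```

Plotting the attempts per trial against trial number gives something like a learning curve. Because the choice of action is random, individual runs are noisy, much as the escape times of individual animals would be.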

J.B. Watson, learning and the lab rat.

The position that human behaviour could be explained entirely in terms of reflexes, stimulus-response associations, and the effects of reinforcers upon them, entirely excluding 'mental' terms like desires, goals and so on, was taken up by John Broadus Watson in his 1914 book 'Behavior: An Introduction to Comparative Psychology'. Watson had also been involved in the introduction of the most favoured subject in comparative psychology - the laboratory rat. One of the early jobs he used to fund his Ph.D. was as a caretaker, one of whose duties was to look after laboratory rats used in studies intended to mimic 'real-life' learning tasks such as navigating complex mazes (a scale-model of the Hampton-Court maze!). Watson became adept at taming rats and found he could train rats to open a puzzle-box like Thorndike's for a small food-reward. He also studied maze-learning but simplified the task dramatically. One type of maze is simply a long straight alley with food at the end. Watson found that once the animal was well trained at running this 'maze' it did so almost automatically. Once started by the stimulus of the maze, its behaviour became a series of associations between movements (or their kinaesthetic consequences) rather than stimuli in the outside world. This was made plain by shortening the alleyway - the well-trained rats then ran straight into the end wall. This was known as the kerplunk experiment. The development of well-controlled behavioural techniques by Watson also allowed him to explore animals' sensory abilities, for example their abilities to discriminate between similar stimuli, experimentally. Watson's theoretical position was even more extreme than Thorndike's - he would have no place for mentalistic concepts like pleasure or distress in his explanations of behaviour. He essentially rejected the law of effect, denying that pleasure or discomfort caused stimulus-response associations to be learned. 
For Watson, all that was important was the frequency of occurrence of stimulus-response pairings. Reinforcers might cause some responses to occur more often in the presence of particular stimuli, but they did not act directly to cause their learning. Watson could therefore reject the notion that some mental traces of stimuli and responses needed to be retained in an animal's mind until a reinforcer caused an association between them to be strengthened, which is a rather mentalistic consequence of the law of effect.

Human Behaviour and Little Albert.

Watson became an extremely influential force in American psychology, publishing his second book 'Psychology from the Standpoint of a Behaviorist' in 1919. His rejection of mentalism was total. He felt that thought was explicable as subvocalisation and that speech was simply another behaviour which might be learned by the law of effect. In 'Psychology from the Standpoint of a Behaviorist' he addresses a number of practical human problems such as education, the development of emotional reactions and the effects of factors like alcohol or drugs on human performance. He even suggests that thought processes might be investigated by monitoring movements in the larynx. Watson believed that mental illness was the result of 'habit distortion', which might be caused by fortuitous learning of inappropriate associations which then go on to influence a person's behaviour so that it becomes ever more abnormal. Watson tested part of this hypothesis on a baby in the hospital in which he worked. The baby, 'Little Albert', apparently showed no particular fears or phobias about anything apart from sudden loud sounds. For example, when Watson placed a tame white rat in Little Albert's lap the child happily played with the animal. On a subsequent occasion Watson placed the rat in Albert's lap and his assistant made a loud noise by striking a large steel bar directly behind Albert's head. One week later Albert was subjected to the same experience. After this, when Albert was shown the rat he began to fret, appearing anxious. Similar reactions were produced by other furry objects (a fur coat). Watson was keen to use this as evidence for the behavioural basis of phobias; apparently, however, Albert's reactions to the rat were quite mild. Nevertheless, one of the most widespread applications of conditioning has been in the treatment of phobias and other behaviour problems, and the case of Little Albert is often cited as the first experiment in this field.

The Fall and Rise of J.B. Watson.

Shortly after his experiment on Little Albert, Watson became romantically involved with one of his research assistants - Rosalie Rayner. At the time such behaviour was not tolerated in American academia and Watson was eventually forced to retire from research. He soon, however, found gainful employment with the J. Walter Thompson advertising agency where, using techniques from his behavioural psychology, he showed that people's preferences between rival products were not based on their sensory qualities but on their associations. He went on to develop the selling of products like Maxwell House Coffee, Pond's Cold Cream, Johnson's Baby Powder and Odorono (one of the first deodorants). By 1924 he was one of the four vice-presidents of this very successful agency.

Cognition and learning in the 1930s.

In the 1920s behaviourism began to wane in popularity somewhat. A number of studies in the Berkeley laboratory of Edward Tolman appeared both to show flaws in the law of effect and to require mental representations in their explanation. For example, rats were allowed to explore a maze in which there were three routes of different lengths between the starting position and the goal. The rats' behaviour when the maze was blocked implied that they must have some sort of mental map of the maze. The rats prefer the routes according to their shortness, so, when the maze is blocked at point A, stopping them from using the shortest route, they will choose the second shortest route. When, however, the maze is blocked at point B the rat does not retrace its steps and use route 2, which would be predicted according to the law of effect, but rather uses route 3. The rat must be recognising that block B will stop it from using route 2 by using some memory of the layout of the maze. Tolman's group also showed that animals could use knowledge they gained learning a maze by running to navigate it swimming, and that unexpected changes in the quality of reward could weaken learning even though the animal was still rewarded. This result was developed further by Crespi who, in 1942, showed that unexpected decreases in reward quantity caused rats temporarily to run a maze more slowly than normal while unexpected increases caused a temporary elevation in running speed - effects Crespi referred to as depression and elation!
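The contrast between the two accounts of the three-route maze can be made concrete with a small sketch. It is an illustration under assumptions, not a model taken from Tolman's papers: the segment names and lengths are invented, and the layout assumes that routes 1 and 2 share a final stretch in which block B sits, which is what makes route 2 useless when B is in place. The 'habit' chooser simply falls back on the next strongest (next shortest) route, as a pure law-of-effect account would suggest, while the 'map' chooser consults the layout.

```python
# Each route is a list of (segment_name, length) pairs.
# Names and lengths are hypothetical; routes 1 and 2 are assumed
# to share the final 'common' segment.
ROUTES = {
    1: [("direct", 4), ("common", 2)],
    2: [("detour2", 6), ("common", 2)],
    3: [("detour3", 12)],
}

BLOCK_A = "direct"   # assumed position of block A: on route 1 only
BLOCK_B = "common"   # assumed position of block B: on the shared stretch

def route_length(route):
    """Total length of a route."""
    return sum(length for _, length in ROUTES[route])

def habit_choice(blocked_segment):
    """Habit account: after route 1 fails, take the next strongest habit.

    The position of the block is ignored on purpose, since a chain of
    stimulus-response habits carries no knowledge of the layout.
    """
    ranked = sorted(ROUTES, key=route_length)
    return ranked[1]

def map_choice(blocked_segment):
    """Map account: take the shortest route that avoids the blocked segment."""
    open_routes = [
        r for r, segments in ROUTES.items()
        if all(name != blocked_segment for name, _ in segments)
    ]
    return min(open_routes, key=route_length)

if __name__ == "__main__":
    for label, block in (("A", BLOCK_A), ("B", BLOCK_B)):
        print("block", label, "habit:", habit_choice(block),
              "map:", map_choice(block))
```

The two choosers agree when the block is at A and disagree when it is at B, which is the pattern of results that the argument in the text turns on.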
[Figure: Tolman's maze experiment]
At the same time as this work was appearing in the USA the Polish psychologists Konorski and Miller began the first cognitive analyses of classical conditioning - the forerunners of the work of Rescorla, Wagner, Dickinson and Mackintosh which I described earlier. In Germany Wolfgang Koehler was studying insight and observation as mechanisms of learning in chimps. All of this work was quite problematic for behaviourism.

Operant Conditioning.

In 1938 Burrhus Frederic Skinner published what was arguably the most influential work on animal behaviour of the century, 'The Behavior of Organisms'. In the interim it had been shown that Tolman's results were sensitive to factors like the openness of his maze - if the rats could not see stimuli outside the maze they did not make appropriate choices when it was blocked, suggesting that they may have learned many stimulus-response associations in different parts of the maze, perhaps in sequence, rather than having internalised a map of it. Skinner resurrected the law of effect in more starkly behavioural terms and provided a technology which allowed sequences of behaviour produced over a long time to be studied objectively - a great improvement on the individual learning trials of Watson and Thorndike. Skinner's formulation of operant conditioning was based around the contingencies between three types of event he termed the discriminative stimulus (SD), the operant response (R) and the reinforcing stimulus (SR). The operant response is some behaviour which, if it is followed by a reinforcing stimulus, comes to occur more frequently. The discriminative stimulus is another neutral stimulus which is present when the contingency between the operant response and reinforcement is true - it serves to discriminate the conditions under which the operant response will be made. Skinner did not believe that operant conditioning was the result of stimulus-response learning - for Skinner the basic association in operant conditioning was between the operant response and the reinforcer; the discriminative stimulus served to signal when this association would be acted upon. Skinner's great technological contribution was the operant test chamber or Skinner box. The interior of a Skinner box typically contains a lever which an animal can press, a stimulus light and a place in which a reinforcer like food can be delivered. 
The animal's presses on the lever can be detected and recorded and a contingency between these presses, the state of the stimulus light and the delivery of reinforcement can be set up, all automatically. The lever allows the experimenter to measure the production of an operant - lever-pressing. The stimulus light serves as a discriminative stimulus and the food as a reinforcer. It is also possible to deliver other reinforcers such as water or to deliver punishers like electric shock through the floor of the chamber. Other types of response can be measured - nose-poking at a moving panel, or hopping on a treadle - both often used when testing birds rather than rats. It is also common to use more than one lever, discriminative stimulus or type of reinforcer in a Skinner box. The flexibility of this technology produced a flood of work on operant conditioning for nearly 50 years (some is still going on today, but not nearly as much as at the time of the peak of interest in the 50s and 60s). We will discuss the techniques and results of operant conditioning experiments next week and use them to try to understand what is learned in instrumental learning.
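The automatic contingency described above can be sketched in a few lines. This is a hypothetical toy, not a description of any real control software for operant chambers: the class and function names are invented, time is reduced to discrete steps, and the only contingency modelled is the simplest one, in which a lever press (R) is followed by food (SR) only while the stimulus light (SD) is on.

```python
from dataclasses import dataclass, field

@dataclass
class SkinnerBox:
    """Toy operant chamber: a press is reinforced only while the light is on."""
    light_on: bool = False          # state of the discriminative stimulus
    pellets: int = 0                # reinforcers delivered so far
    presses: list = field(default_factory=list)  # (time_step, light_on, reinforced)

    def press_lever(self, t):
        """Record a lever press at time step t and apply the contingency."""
        reinforced = self.light_on
        if reinforced:
            self.pellets += 1
        self.presses.append((t, self.light_on, reinforced))
        return reinforced

def run_session(steps):
    """Run a session given a list of (light_on, pressed) pairs, one per step."""
    box = SkinnerBox()
    for t, (light_on, pressed) in enumerate(steps):
        box.light_on = light_on
        if pressed:
            box.press_lever(t)
    return box

if __name__ == "__main__":
    session = [(False, True), (True, True), (True, False), (True, True)]
    box = run_session(session)
    print("press log:", box.presses)
    print("pellets delivered:", box.pellets)
```

The press log is the kind of record that lets behaviour be studied over a long session rather than one trial at a time. Other contingencies, extra levers or punishers could be added in the same style.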

Sources.

As usual I've drawn heavily on Bob Boakes' 'From Darwin to Behaviourism' for the historical material. The classic 'Theories of Learning' by G.H. Bower and E.R. Hilgard (my copy is the 5th edition from 1981, Englewood Cliffs, NJ: Prentice-Hall - the first edition was published in 1948) has very good discussions of the differences between Thorndike's and Watson's theoretical positions and the status of Tolman's work. There is also a little in the introductory chapter of Schwartz about Thorndike's early work.