The flashcards below were created by user
master.director2
on FreezingBlue Flashcards.
-
Define instrumental conditioning.
- A change in behaviour (learning) caused by a causal relationship between the behaviour and a biologically important stimulus (reinforcer).
- (Process whereby organisms learn to make responses in order to obtain or avoid important consequences)
-
Outline the difference between instrumental and Pavlovian conditioning.
- Pavlovian: response is a reflex produced in expectation of an outcome
- Instrumental: response is instrumental in receiving/determining the outcome
-
Who first demonstrated instrumental conditioning? How?
- Thorndike (1911)
- Cats learned to press a lever to get out of a cage (a "puzzle box")
- Over time, the time taken for them to escape became shorter and shorter
-
From this experiment, what did Thorndike observe? What law?
- Law of Effect
- Of the several responses made to the same situation (lying down, meowing, pulling the lever), those that are accompanied or closely followed by satisfaction become more firmly connected to the situation.
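- A minimal sketch of the Law of Effect as a strengthening rule. The update rule, the starting strengths and the `step` size are illustrative assumptions, not Thorndike's own formulation; the three responses are the ones named in the card above.

```python
def reinforce(strengths, response, satisfied, step=0.2):
    """Return updated situation-response strengths after one trial.

    Assumption: a satisfied response moves a fixed fraction of the way
    towards a maximum strength of 1.0; other responses are left unchanged.
    """
    updated = dict(strengths)
    if satisfied:
        updated[response] += step * (1.0 - updated[response])
    return updated


# Hypothetical starting strengths: all responses equally likely at first.
strengths = {"lying down": 0.2, "meowing": 0.2, "pulling lever": 0.2}

for _ in range(10):
    for response in list(strengths):
        # Only pulling the lever opens the cage, so only it is "satisfied".
        satisfied = response == "pulling lever"
        strengths = reinforce(strengths, response, satisfied)

print(strengths)
```

Only the response followed by escape gains strength, which is one way to picture why escape times shorten over trials.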
-
[Principles of reinforcement] In Pavlovian conditioning, irrespective of whether reinforcer is appetitive or aversive, conditioning results in increase in vigour of response. What are the different types of outcomes in instrumental conditioning?
- Positive reinforcer: response followed by pleasant outcome - probability of response increases
- Negative reinforcer: response followed by removal of unpleasant stimulus - response increases
- Punisher: response is followed by an unpleasant outcome (positive punishment) or by removal of a pleasant stimulus (negative punishment). Responses decrease.
-
What are the 2 different schedule types that arrange response-outcome relationships?
- Ratio: presence of reinforcer depends on number of responses
- Interval: presence of reinforcer depends on how much time has passed since the last reinforcer (irrespective of number of responses)
- Both these can be fixed or variable. (VR = variable ratio; FI = fixed interval).
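- The two schedule types can be sketched as small classes. This is an illustrative sketch only: the class names, the uniform distribution used for the variable versions, and the `seed` parameter are assumptions, not part of the flashcards.

```python
import random


class RatioSchedule:
    """Reinforcer depends on the number of responses since the last one."""

    def __init__(self, mean_ratio, variable=False, seed=0):
        self.mean_ratio = mean_ratio
        self.variable = variable
        self.rng = random.Random(seed)
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        if self.variable:
            # VR: requirement varies around the mean (uniform is an assumption).
            return self.rng.randint(1, 2 * self.mean_ratio - 1)
        return self.mean_ratio  # FR: same requirement every time

    def respond(self, now):
        """Register one response; `now` is ignored by ratio schedules."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class IntervalSchedule:
    """Reinforcer depends on the time elapsed since the last reinforcer."""

    def __init__(self, mean_interval, variable=False, seed=0):
        self.mean_interval = mean_interval
        self.variable = variable
        self.rng = random.Random(seed)
        self.last_reinforcer = 0.0
        self.required = self._draw()

    def _draw(self):
        if self.variable:
            # VI: interval varies around the mean (uniform is an assumption).
            return self.rng.uniform(0, 2 * self.mean_interval)
        return self.mean_interval  # FI: same interval every time

    def respond(self, now):
        """Reinforce the first response made once the interval has elapsed."""
        if now - self.last_reinforcer >= self.required:
            self.last_reinforcer = now
            self.required = self._draw()
            return True
        return False


def count_reinforcers(schedule, response_times):
    return sum(schedule.respond(t) for t in response_times)


slow = [float(t) for t in range(1, 61)]  # one response per second for 60 s
fast = [0.5 * i for i in range(1, 121)]  # two responses per second for 60 s

fr_slow = count_reinforcers(RatioSchedule(5), slow)
fr_fast = count_reinforcers(RatioSchedule(5), fast)
fi_slow = count_reinforcers(IntervalSchedule(10), slow)
fi_fast = count_reinforcers(IntervalSchedule(10), fast)
print(fr_slow, fr_fast, fi_slow, fi_fast)
```

In this sketch, doubling the response rate doubles the reinforcers earned on the fixed ratio schedule but leaves the fixed interval count unchanged.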
-
Ratio schedule or Interval - which is more common in nature?
- The interval schedule is more similar to what happens in nature
- because in nature, reinforcers are often organic, and organic resources deplete and only replenish after a certain time has elapsed, so there is a limit on the rate of reinforcement.
-
Describe the study that compared the performance on two types of reinforcement schedule.
- Matthews et al (1977)
- Rewarded lever pressing with monetary reward on VR schedule
- Every time a reward was delivered on the VR schedule, a reward became available for another participant, generating a yoked variable interval (VI) schedule with a similar rate of reinforcement
- The VR schedule yielded a much higher rate of responding than the yoked VI schedule
- (Probably because VR participants are aware that delivery depends on the number of responses, whereas VI participants respond at a high rate only once enough time has elapsed that they expect another reward soon)
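- A rough sketch of the yoking logic described above, not the procedure actually used by Matthews et al. The function names, the response timings, the uniform VR requirement and the rule that only one reward can be held at a time are all assumptions made for illustration.

```python
import random


def run_vr_master(n_responses, mean_ratio, response_gap, rng):
    """Return the times at which the VR participant earns a reward."""
    reward_times = []
    count = 0
    required = rng.randint(1, 2 * mean_ratio - 1)
    for i in range(1, n_responses + 1):
        count += 1
        if count >= required:
            reward_times.append(i * response_gap)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reward_times


def run_yoked_vi(setup_times, response_times):
    """Count rewards collected by the yoked participant.

    Each master reward "sets up" one reward for the yoked participant,
    which is held until the next response collects it.
    Assumption: only one reward can be held at a time.
    """
    collected = 0
    pending = False
    setups = iter(sorted(setup_times))
    next_setup = next(setups, None)
    for t in sorted(response_times):
        while next_setup is not None and next_setup <= t:
            pending = True
            next_setup = next(setups, None)
        if pending:
            collected += 1
            pending = False
    return collected


master_rewards = run_vr_master(600, 10, 0.5, random.Random(1))

fast_responses = [0.5 * i for i in range(1, 601)]  # two responses per second
slow_responses = [2.0 * i for i in range(1, 151)]  # one response every 2 s

fast_collected = run_yoked_vi(master_rewards, fast_responses)
slow_collected = run_yoked_vi(master_rewards, slow_responses)
print(len(master_rewards), fast_collected, slow_collected)
```

However fast the yoked participant responds, they cannot collect more rewards than the VR participant has earned, so extra responding on the VI schedule buys nothing beyond that ceiling.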
-
In his '__ __ __', Thorndike proposed that reinforcers strengthen an association between the __ present when the response is performed and the ___ itself (a __-__ (__) association), thereby establishing a __ whereby the __ provokes the ___.
- law of effect
- stimuli
- response
- stimulus-response (S-R)
- habit
- stimulus
- response
-
Whose study showed that different brain areas are activated depending on whether conditioning is Pavlovian or instrumental (habit)? Describe the study.
- O'Doherty et al (2004)
- Participants touch one of two simultaneously presented visual stimuli - one of them yields fruit juice reward
- Other participants just got yoked pairings of stimuli with outcome without making choice response
- fMRI: showed activity in dorsal striatum ONLY during instrumental learning (first group of participants), and not for Pavlovian
- Both groups showed activation in ventral striatum.
-
According to __-__ accounts, instrumental learning is supported by both a __-__ and a __ system.
- dual-system
- goal-directed
- habitual
-
What is the limitation/difference of S-R reinforcement compared to goal-directed learning/behaviour?
- S-R reinforcement establishes instrumental habits that are not mediated by knowledge of the outcome of the response
- So, it doesn't allow selection of instrumental action based on agent's current goal
- Goal-directed - representation of a causal relationship between action and outcome + representation of current incentive value of outcome.
-
Describe the study which showed how one can discriminate between instrumental action that is an S-R habit or goal-directed action.
- Klossek et al (2008): outcome devaluation paradigm
- Train children to touch one icon for one cartoon clip and another for another clip
- After this, one of the cartoons was devalued by prolonged exposure, inducing boredom
- Then, test the choice between touching two icons in extinction without cartoons
- If original training simply established stimulus-response habits with no knowledge of outcome, then children should perform both responses equally during extinction test
- But, if they learned which cartoon was produced by each response, they should show devaluation effect and should touch the icon with the more valuable outcome during the test.
- Older children (>27months) showed outcome devaluation effect
- Younger children (<27 months) performed both responses equally
- Importantly, the absence of a devaluation effect in younger children wasn't because cartoon devaluation failed: in two further control groups (younger and older), tested with the outcomes still available rather than in extinction, the younger children also chose the response producing the valued outcome.
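- The two predictions for the extinction test can be sketched numerically. This is an illustrative sketch, not the analysis used by Klossek et al: the icon and clip names, the strength and value numbers, and the proportional choice rule are all assumptions.

```python
def choice_probabilities(weights):
    """Turn non-negative weights into choice probabilities (assumed rule)."""
    total = sum(weights.values())
    return {response: weight / total for response, weight in weights.items()}


# After training, both icons have been rewarded equally often (hypothetical).
HABIT_STRENGTH = {"icon_A": 1.0, "icon_B": 1.0}
OUTCOME_OF = {"icon_A": "clip_A", "icon_B": "clip_B"}

# Devaluation: prolonged exposure lowers the value of clip_A (made-up number).
incentive_value = {"clip_A": 0.25, "clip_B": 1.0}


def habit_agent():
    # S-R habit: choice depends only on response strength, not on the outcome.
    return choice_probabilities(HABIT_STRENGTH)


def goal_directed_agent(values):
    # Goal-directed: choice depends on the current value of each
    # response's known outcome.
    return choice_probabilities(
        {response: values[outcome] for response, outcome in OUTCOME_OF.items()}
    )


habit_choice = habit_agent()
goal_choice = goal_directed_agent(incentive_value)
print(habit_choice)
print(goal_choice)
```

The habit agent touches both icons equally (the pattern seen in the younger children), while the goal-directed agent favours the icon whose cartoon is still valued (the pattern seen in the older children).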
-
Describe the study showing neural correlates of goal-directed and habit learning. For this slide, first just tell me how they managed to isolate one type of learning from another.
- De Wit et al (2009)
- Task: learn association between a stimulus printed on box and the outcome inside - after being presented with stimulus, participant must press either right or left button.
- 3 types of discrimination:
- 1. Control - stimulus and outcome different
- 2. Congruent - stimulus and outcome are same
- 3. Incongruent - stimulus of one problem is the same as outcome in the other
- The incongruent discrimination relies solely on the habit system, thus allowing isolation of different parts of brain used for each type of learning.
- Goal-directed learning cannot work in the incongruent task because it will activate S-O-R of one chain and O-R of another.
- e.g. cherry (S) -> pear (O) -> right (R) AND pear (S) -> cherry (O) -> left (R)
- In this case, if the participant is presented with the cherry stimulus, it will activate the S-O-R chain (cherry -> pear -> right), prompting a right button press, but will also activate the O-R association (cherry -> left), prompting a left button press.
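- The response conflict in the incongruent discrimination can be sketched as a lookup. The cherry/pear mapping comes from the example above; the function name and the fruit names in the control and congruent examples are hypothetical.

```python
def goal_directed_candidates(stimulus, outcome_of, response_for_outcome):
    """Return the set of responses activated via outcome representations."""
    candidates = set()
    # S -> O -> R: the stimulus retrieves its outcome, which retrieves
    # the response that earned that outcome.
    candidates.add(response_for_outcome[outcome_of[stimulus]])
    # O -> R: if the stimulus picture is itself an outcome in another problem,
    # it also retrieves the response that earned it there.
    if stimulus in response_for_outcome:
        candidates.add(response_for_outcome[stimulus])
    return candidates


# Incongruent: each stimulus is the outcome of the other problem.
incongruent = goal_directed_candidates(
    "cherry",
    outcome_of={"cherry": "pear", "pear": "cherry"},
    response_for_outcome={"pear": "right", "cherry": "left"},
)

# Control: stimulus and outcome are different pictures (hypothetical fruits).
control = goal_directed_candidates(
    "apple",
    outcome_of={"apple": "kiwi"},
    response_for_outcome={"kiwi": "right"},
)

# Congruent: stimulus and outcome are the same picture (hypothetical fruit).
congruent = goal_directed_candidates(
    "orange",
    outcome_of={"orange": "orange"},
    response_for_outcome={"orange": "left"},
)

print(incongruent, control, congruent)
```

Only the incongruent problem activates both buttons at once, which is why the goal-directed route cannot solve it and performance there is attributed to S-R habit.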

-
De Wit confirmed that the incongruent conditions engaged only the S-R habit response by doing what?
- Outcome devaluation phase
- 2 open boxes with outcomes inside presented
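- One box had a cross on it, showing that its outcome no longer produced points (while the other outcome was still valuable)
- Prediction: accuracy should be maintained in the control and congruent conditions, because goal-directed learning is sensitive to outcome devaluation (participants can select the response for the still-valued outcome), but accuracy should decrease in the incongruent condition, because S-R habits are not.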
- This is exactly what was found, confirming that the incongruent condition engaged S-R habit response.
-
What did De Wit's scanning results show about the neural correlates for each of the learning types then?
- Goal-directed: greater ventro-medial PFC activation
- Habit: greater dorso-medial PFC activation
- During the devaluation phase: ventromedial PFC (vmPFC) activation was higher in goal-directed control trials. (Positive correlation between accuracy in the outcome devaluation test and vmPFC activation.)
-
Describe the study which suggested that a particular brain region is sensitive to incentive value of reinforcers.
- Valentin et al (2007) (fMRI study)
- 1. Participants first learned an association between a visual stimulus and a food reward (chocolate or tomato), delivered with a probability of .4.
- Choosing the other visual stimulus (out of 2) would produce a common outcome (orange juice) with a probability of .3
- So, one action produced reward with an overall probability of .7 and the other with .3
- Also a control trial with tasteless liquid reward (neutral)
- Result: as the number of blocks/trials increased, participants made more high-probability choices for rewarding outcomes (but choices stayed the same for neutral outcomes)
- ALSO - medial orbitofrontal cortex (part of vmPFC) activation for rewarding outcomes.
- 2. After training in scanner, participants consumed one outcome to satiation (to the full) decreasing the incentive value of the outcome
- Then they did the same instrumental choice procedure
- Result: after devaluation by satiation, greater medial & central PFC activity in valued condition (compared to devalued)
- These two scanning results suggest that this region (vmPFC area) is sensitive to the incentive value of reinforcers.