Learning Pt3 PBS5

  1. Define instrumental conditioning.
    • A change in behaviour (learning) caused by a causal relationship between the behaviour and a biologically important stimulus (reinforcer).
    • (Process whereby organisms learn to make responses in order to obtain or avoid important consequences)
  2. Outline the difference between instrumental and Pavlovian conditioning.
    • Pavlovian: response is a reflex produced in expectation of an outcome
    • Instrumental: response is instrumental in receiving/determining the outcome
  3. Who first demonstrated instrumental conditioning? How?
    • Thorndike (1911)
    • Cats learned to operate a lever/latch to escape from a puzzle box
    • Over successive trials, the time taken for them to escape became shorter and shorter
  4. From this experiment, what did Thorndike observe? What law?
    • Law of Effect
    • Of the several responses made to the same situation (lying down, meowing, pulling the lever), those which were accompanied or closely followed by satisfaction will be more firmly connected to the situation.
  5. [Principles of reinforcement] In Pavlovian conditioning, irrespective of whether reinforcer is appetitive or aversive, conditioning results in increase in vigour of response. What are the different types of outcomes in instrumental conditioning?
    • Positive reinforcer: response followed by pleasant outcome - probability of response increases
    • Negative reinforcer: response followed by removal of unpleasant stimulus - response increases
    • Punisher: response is followed by an unpleasant outcome (positive punishment) or by removal of a pleasant stimulus (negative punishment). Responses decrease.
  6. What are the 2 different schedule types that arrange response-outcome relationships?
    • Ratio: delivery of the reinforcer depends on the number of responses made
    • Interval: delivery of the reinforcer depends on how much time has passed since the last reinforcer (irrespective of the number of responses)
    • Both these can be fixed or variable. (VR = variable ratio; FI = fixed interval).
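The difference between the two schedule types can be sketched as a toy simulation (FR5/FI10 and all parameter values here are illustrative assumptions, not from the lecture):

```python
import random

def simulate(schedule, steps=1000, p_respond=0.5, seed=0):
    """Count rewards earned under a fixed-ratio (FR5) vs fixed-interval
    (FI10) schedule. At each discrete step the subject responds with
    probability p_respond."""
    rng = random.Random(seed)
    rewards = 0
    count = 0    # responses since the last reward (ratio schedule)
    elapsed = 0  # steps since the last reward (interval schedule)
    for _ in range(steps):
        responded = rng.random() < p_respond
        count += responded
        elapsed += 1
        if schedule == "FR5" and responded and count >= 5:
            rewards += 1   # reinforcer depends only on response count
            count = 0
        elif schedule == "FI10" and responded and elapsed >= 10:
            rewards += 1   # first response after 10 steps is reinforced
            elapsed = 0
    return rewards
```

Raising the response rate raises the reward count under the ratio schedule, but under the interval schedule the reward rate is capped by elapsed time regardless of responding, which is the property question 7 attributes to natural reinforcers.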
  7. Ratio schedule or Interval - which is more common in nature?
    • Interval schedules are more similar to what happens in nature
    • because in nature, reinforcers are often organic, and organic resources deplete and only replenish after a certain time has elapsed, so there is a limit on the rate of reinforcement.
  8. Describe the study that compared the performance on two types of reinforcement schedule.
    • Matthews et al (1977)
    • Rewarded lever pressing with monetary reward on VR schedule
    • every time a reward was delivered on the VR schedule, a reward became available for another participant - generating a yoked variable interval (VI) schedule with a similar rate of reinforcement
    • the VR schedule yielded a much higher rate of responding than the yoked VI
    • (probably because of awareness that delivery depends on response - VI participants will only get really high response rates when enough time elapses that they expect another reward soon)
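The yoking arrangement above can be sketched as a toy simulation (the discrete-step structure and the `p_respond`/`vr_mean` values are assumptions for illustration, not Matthews et al.'s actual procedure):

```python
import random

def yoked_pair(steps=10000, vr_mean=20, p_respond=0.8, seed=1):
    """Each reward earned by the VR subject arms one reward for the
    yoked VI subject, who collects it with their next response, so the
    two subjects receive a similar rate of reinforcement.
    Returns (vr_rewards, vi_rewards)."""
    rng = random.Random(seed)
    vr_rewards = vi_rewards = 0
    responses_needed = rng.randint(1, 2 * vr_mean - 1)  # variable ratio
    vr_count = 0
    vi_available = 0
    for _ in range(steps):
        if rng.random() < p_respond:           # VR subject responds
            vr_count += 1
            if vr_count >= responses_needed:
                vr_rewards += 1
                vi_available += 1              # yoke: arm a VI reward
                vr_count = 0
                responses_needed = rng.randint(1, 2 * vr_mean - 1)
        if vi_available > 0 and rng.random() < p_respond:  # VI subject responds
            vi_rewards += 1
            vi_available -= 1
    return vr_rewards, vi_rewards
```

The VI subject can never collect more rewards than the VR subject has earned, which is what equates the reinforcement rates while leaving the response-reward contingency different.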
  9. In his '__ __ __', Thorndike proposed that reinforcers strengthen an association between __ present when response is performed and the ___ itself. __-__ (__), thereby establishing a __ whereby __ provoked ___.
    • law of effect
    • stimuli
    • response
    • stimulus-response (S-R)
    • habit
    • stimulus 
    • response
  10. Whose study showed that different brain areas are activated depending on whether conditioning is Pavlovian or instrumental (habit)? Describe the study.
    • O'Doherty et al (2004)
    • Participants touched one of two simultaneously presented visual stimuli - one of them yielded a fruit juice reward
    • Other participants just received yoked pairings of stimuli with the outcome, without making a choice response
    • fMRI: showed activity in dorsal striatum ONLY during instrumental learning (first group of participants), and not for Pavlovian
    • Both groups showed activation in ventral striatum.
  11. According to __-__ accounts, instrumental learning is supported by both a __-__ and a __ system.
    • dual-system
    • goal-directed
    • habitual
  12. What is the limitation/difference of S-R reinforcement compared to goal-directed learning/behaviour?
    • S-R reinforcement establishes instrumental habits that are not mediated by knowledge of the outcome of the response
    • So, it doesn't allow selection of instrumental action based on agent's current goal
    • Goal-directed - representation of a causal relationship between action and outcome + representation of current incentive value of outcome.
  13. Describe the study which showed how one can discriminate between instrumental action that is an S-R habit or goal-directed action.
    • Klossek et al (2008)
    • Outcome devaluation paradigm
    • Train children to touch one icon for one cartoon clip and another for another clip
    • After this, one of the cartoons was devalued by prolonged exposure, inducing boredom
    • Then, test the choice between touching two icons in extinction without cartoons
    • If original training simply established stimulus-response habits with no knowledge of outcome, then children should perform both responses equally during extinction test
    • But, if they learned which cartoon was produced by each response, they should show devaluation effect and should touch the icon with the more valuable outcome during the test. 
    • Older children (>27 months) showed the outcome devaluation effect 
    • Younger children (<27 months) performed both responses equally
    • Importantly, the absence of a devaluation effect in the younger children was not due to a failure of the cartoon devaluation itself: in two further control groups (younger and older) tested while the cartoons were still available (rather than in extinction), the younger children too chose the response producing the valued outcome.
  14. Describe the study showing neural correlates of goal-directed and habit learning. For this slide, first just tell me how they managed to isolate one type of learning from another.
    • De Wit et al (2009)
    • Task: learn the association between a stimulus printed on a box and the outcome inside - after being presented with the stimulus, the participant must press either the right or the left button.
    • 3 types of discrimination: 
    • 1. Control - stimulus and outcome different 
    • 2. Congruent - stimulus and outcome are same
    • 3. Incongruent - stimulus of one problem is the same as outcome in the other
    • The incongruent discrimination relies solely on the habit system, thus allowing isolation of different parts of brain used for each type of learning.
    • Goal-directed learning cannot solve the incongruent problems, because presenting a stimulus activates both the S-O-R chain of its own problem and the direct O-R link of the other problem.
    • eg. cherry (S) -> pear (O) -> right (R) AND pear (S) -> cherry (O) -> left (R)
    • Here, presenting the cherry stimulus activates the S-O-R chain (cherry -> pear -> right), calling for a right button press, but cherry is also the outcome of the other problem, so the O-R link (cherry -> left) simultaneously calls for a left button press.
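The conflict can be illustrated with the learned mappings above (the icon names and button assignments follow the example; this is an illustrative sketch, not de Wit et al.'s task code):

```python
# S -> O and O -> R links learned across the two incongruent problems
stimulus_to_outcome = {"cherry": "pear", "pear": "cherry"}
outcome_to_response = {"pear": "right", "cherry": "left"}

def goal_directed_candidates(stimulus):
    """Responses a goal-directed (S-O-R) system would activate:
    one via the stimulus's own S-O-R chain, and a competing one via
    the direct O-R link, because each stimulus is also the outcome
    of the other problem."""
    via_chain = outcome_to_response[stimulus_to_outcome[stimulus]]
    via_outcome_link = outcome_to_response[stimulus]
    return {via_chain, via_outcome_link}
```

For the cherry stimulus this returns both "right" (via cherry -> pear -> right) and "left" (via the cherry -> left outcome link), so the goal-directed system cannot settle on a unique response; only a direct S-R habit can.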
  15. De Wit confirmed that the incongruent conditions engaged only the S-R habit response by doing what?
    • Outcome devaluation phase
    • 2 open boxes with the outcomes inside were presented
    • one box had a cross through it, showing that its outcome no longer earned points (while the other outcome still did)
    • Prediction: accuracy should be maintained in the control and congruent conditions (because goal-directed learning is sensitive to outcome devaluation) but should decrease in the incongruent condition.
    • This is exactly what was found, confirming that the incongruent condition engaged S-R habit response.
  16. What did De Wit's scanning results show about the neural correlates for each of the learning types then?
    • Goal-directed: greater ventro-medial PFC activation
    • Habit: greater dorso-medial PFC activation
    • During devaluation phase: ventral medial PFC activation higher in goal-directed control trials. (Positive correlation between accuracy in outcome devaluation test and vmPFC activation).
  17. Describe the study which suggested that a particular brain region is sensitive to incentive value of reinforcers.
    • Valentin et al (2007) (fMRI study)
    • 1. Participants first learned an association between a visual stimulus and a food reward (chocolate or tomato juice), delivered with probability .4
    • Choosing the other visual stimulus (of the 2) produced only the common outcome (orange juice), delivered with probability .3
    • The action paired with the food also produced the common outcome, so one action was rewarded with overall probability .7 (.4 + .3) and the other with .3
    • Also a control trial with tasteless liquid reward (neutral)
    • Result: as the number of training blocks/trials increased, participants made more high-probability choices for the rewarding outcomes (but choice stayed the same for the neutral outcome)
    • ALSO: medial orbitofrontal cortex (part of vmPFC) activation in response to the rewarding outcome
    • 2. After training in the scanner, participants consumed one outcome to satiation (eating/drinking until full), decreasing the incentive value of that outcome
    • Then they did the same instrumental choice procedure
    • Result: after devaluation by satiation, greater medial and central OFC activity in the valued condition (compared to the devalued condition)
    • These two scanning results suggest that this region (vmPFC area) is sensitive to the incentive value of reinforcers.
Lec3 - Instrumental conditioning