Two-Armed Bandit Task

Technical Manual

Script Author: Katja Borchert, Ph.D. (katjab@millisecond.com), Millisecond

Created: January 29, 2018

Last Modified: January 24, 2025 by K. Borchert (katjab@millisecond.com), Millisecond

Script Copyright © Millisecond Software, LLC

Background

This script implements the 2-armed bandit task, a decision-making game in which participants trade off pursuing a known resource vs. exploring a new resource, as described in Knox et al. (2012).

References

Knox, W.B., Otto, A.R., Stone, P. & Love, B.C. (2012). The nature of belief-directed exploratory choice in human decision-making. Frontiers in Psychology, Volume 2, Article 298, 1-12

Duration

35 minutes

Description

Participants choose between 2 options ("bandits") that are tied to different payoff schedules. One option is always worth 10 points more than the other. On each trial, the payoff schedules change with a certain probability (flipRate). When they change, 20 points are added to the lower-paying option, which flips the relative payoffs. Participants are told that their final reward is tied to the number of times they selected the higher payoff option.

2 phases:
1. Passive Observation Phase: participants watch the computer select choices for 300 trials (default). Payoff changes are made explicit to participants by a text message on screen. Every 100 trials (starting at trial 200), participants are asked to estimate how many payoff reversals they expect to observe during the next set of 100 trials.
2. Active Game: participants play the game for 300 trials (default).
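The payoff rule above can be illustrated with a minimal Python sketch. It assumes that payoff values are never reset and only grow by 20 points on each flip; `apply_flip` is a hypothetical helper, not code from the Inquisit script. It shows why the 10-point gap is preserved while the higher option switches.

```python
# Minimal sketch of the payoff flip rule (assumption: payoff values are
# never reset; they only grow by 20 points on each flip).

def apply_flip(value_a: int, value_b: int, bonus: int = 20) -> tuple:
    """Add `bonus` points to whichever option currently pays less."""
    if value_a < value_b:
        return value_a + bonus, value_b
    return value_a, value_b + bonus


# Default start values (startA = 10, startB = 20): B pays 10 more than A.
a, b = 10, 20
a, b = apply_flip(a, b)  # A = 30, B = 20: A now pays 10 more
a, b = apply_flip(a, b)  # A = 30, B = 40: B pays 10 more again
print(a, b)  # 30 40
```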

Procedure

Two phases:
1. Passive Observation Phase: participants watch the computer select choices for 300 trials (default; Knox et al. (2012) ran 500 trials in each phase).
Every 100 trials (starting at trial 200), participants are asked to estimate how many reversals in payoff
they expect to observe during the next set of 100 trials.

Trial Sequence:
choice ("Choose") (2000 ms, editable) -> selected choice highlighted and total updated (1000 ms, editable)
If the payoff schedule has changed on that trial, the choice trial displays "Changed" in red font.


2. Active Game Phase: participants play the game for 300 trials (default)

Trial Sequence
choice ("Choose") (max. 1500 ms, editable) -> if no response: feedback, and the trial is repeated until a response is made ->
point update (1000 ms, editable) -> etc.
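The estimate schedule of the passive phase can be sketched as follows. This assumes a prompt at every multiple of 100 from trial 200 up to and including the last demo trial; `estimate_prompt_trials` is a hypothetical helper, and the script itself may place the prompts differently. Under this reading, a 500-trial passive phase (as in Knox et al., 2012) yields four estimates covering 400 trials.

```python
def estimate_prompt_trials(nr_demo_trials: int = 300, first: int = 200,
                           step: int = 100) -> list:
    """Demo trial numbers at which a flip estimate is requested.

    Assumes a prompt at every multiple of `step` from `first` up to and
    including the last demo trial; the actual script may differ.
    """
    return list(range(first, nr_demo_trials + 1, step))


print(estimate_prompt_trials())     # [200, 300]  (default nrDemoTrials = 300)
print(estimate_prompt_trials(500))  # [200, 300, 400, 500]
```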

The default flipRate is 0.075 (editable) => on each trial there is a 7.5% chance that the payoffs flip
(so you can expect to see about 7.5 flips every 100 trials)

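• in this script, the flipRate is a true probability: each trial flips independently with p = flipRate, so the actual number of flips per 100 trials varies around the expected 7.5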
• the first trial cannot be a flip trial

Stimuli

provided by Millisecond - can be edited under section Editable Stimuli

Instructions

provided by Millisecond - can be edited under section Editable Instructions

Summary Data

File Name: twoarmedbandittask_summary*.iqdat

Data Fields

Name Description
inquisit.version Inquisit version number
computer.platform Device platform: win | mac | ios | android
computer.touch 0 = device has no touchscreen capabilities; 1 = device has touchscreen capabilities
computer.hasKeyboard 0 = no external keyboard detected; 1 = external keyboard detected
startDate Date the session was run
startTime Time the session was run
subjectId Participant ID
groupId Group number
sessionId Session number
elapsedTime Session duration in ms
completed 0 = Test was not completed
1 = Test was completed
flipRate Parameter: p(flip) = the fixed probability with which the payoff schedules will switch and the former
lower payoff choice will be the higher payoff choice until the next flip
meanFlipRateEstimate The estimated flip rate, based on the mean estimated number of flips per 100 trials (PHASE 1)
meanFlipEstimate The mean estimated number of flips per 100 trials (PHASE 1)
sumFlipEstimate The total number of estimated flips across 400 trials (PHASE 1)
countFlipsPractice The total number of actual flips across 400 trials (PHASE 1)
propHigherPayOffs Proportion of trials on which the higher payoff option was selected (PHASE 2)
propExploit Proportion of selecting the option with the known higher payoff (PHASE 2)
(at the time of making the choice, the participant does not know whether a flip has occurred
and can only use the information gathered up to this point)
propExplore Proportion of selecting the option with the known lower payoff (PHASE 2)
(at the time of making the choice, the participant does not know whether a flip might have occurred
and can only use the information gathered up to this point)
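The summary values can be reconstructed from per-trial records with a short Python sketch. It assumes a hypothetical record layout (`choice` coded 0/1/2 as in the raw data, plus `selectedHigherPayOff`), excludes trials without a choice from the proportions, and assumes meanFlipRateEstimate is simply meanFlipEstimate divided by 100. The script's own computation may differ in these details.

```python
from statistics import mean
from typing import Dict, List


def summarize(phase2_trials: List[Dict[str, int]],
              phase1_estimates: List[int]) -> Dict[str, float]:
    """Compute summary values from per-trial records (hypothetical layout).

    Each phase-2 record holds `choice` (0 = none, 1 = exploit, 2 = explore)
    and `selectedHigherPayOff` (1/0). Trials without a choice are excluded.
    """
    made = [t for t in phase2_trials if t["choice"] != 0]
    if not made or not phase1_estimates:
        raise ValueError("need at least one choice and one flip estimate")
    n = len(made)
    mean_estimate = mean(phase1_estimates)
    return {
        "propHigherPayOffs": sum(t["selectedHigherPayOff"] for t in made) / n,
        "propExploit": sum(t["choice"] == 1 for t in made) / n,
        "propExplore": sum(t["choice"] == 2 for t in made) / n,
        "meanFlipEstimate": mean_estimate,
        "meanFlipRateEstimate": mean_estimate / 100,  # assumed: flips per trial
        "sumFlipEstimate": sum(phase1_estimates),
    }


trials = [
    {"choice": 1, "selectedHigherPayOff": 1},
    {"choice": 2, "selectedHigherPayOff": 0},
    {"choice": 0, "selectedHigherPayOff": 0},  # no response: excluded
    {"choice": 1, "selectedHigherPayOff": 1},
    {"choice": 1, "selectedHigherPayOff": 0},
]
summary = summarize(trials, [6, 9])
print(summary["propExploit"], summary["meanFlipRateEstimate"])  # 0.75 0.075
```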

Raw Data

File Name: twoarmedbandittask_raw*.iqdat

Data Fields

Name Description
build Inquisit version number
computer.platform Device platform: win | mac | ios | android
computer.touch 0 = device has no touchscreen capabilities; 1 = device has touchscreen capabilities
computer.hasKeyboard 0 = no external keyboard detected; 1 = external keyboard detected
date Date the session was run
time Time the session was run
subject Participant ID
group Group number
session Session number
blockcode The name of the current block (built-in Inquisit variable)
blocknum The number of the current block (built-in Inquisit variable)
trialcode The name of the currently recorded trial (built-in Inquisit variable)
trialnum The number of the currently recorded trial (built-in Inquisit variable)
trialnum is a built-in Inquisit variable; it counts all trials run,
even those that do not store data to the data file.
In this script, trial.selectBandit and trial.summary store data to the data file.
trial.summary stores a comprehensive summary of all relevant values at the end of each
'successful' (= a choice was made) trial.
flipRate Parameter: p(flip) = the fixed probability with which the payoff schedules will switch and the former
lower payoff choice will be the higher payoff choice (until the next flip)
countSelections Counts the number of choices A or B made (excludes trials with no responses)
attempts Running total of attempts for the current trial
(if the participant takes too long to decide, the current trial terminates and is repeated after feedback.
Those repeats are called 'attempts' in this script)
flip 1 = a flip of the payoffs occurred during this trial; 0 = no payoff change occurred
countFlipsPractice Counts the number of flips
valueA Stores the value of choice A (changes in values after flips are stored in trial.summary only)
valueB Stores the value of choice B (changes in values after flips are stored in trial.summary only)
trial.selectBandit: values represent the pre-flip values (pre-flip values are used to assess whether participants selected the known higher payoff)
trial.summary: values represent the post-flip values
higherPayOff Stores the choice with the higher payoff ("A" or "B")
! this variable is only updated in trial.summary, see valueA and valueB
selectionRT Stores the response time (in ms) of the last response, measured from the start of the last trial.selectBandit
previousArm Stores the previously (preceding trial) selected choice ("A" or "B")
response The participant's response during current trial
trial.selectBandit: 18 = E; 23 = I (keyboard scan codes); 0 = no response
trial.summary: 0 = no response
selectedArm Stores the selected choice ("A" or "B")
selectedHigherPayOff 1 = participant selected the currently higher payoff
(note: if there was a payoff flip, the participant was not aware of it at the time of responding)
0 = otherwise
choice 0 = no choice made
1 = exploitative choice (the choice selected was the one attached to the highest seen payoff up to this point)
2 = exploratory choice (participant selected the other choice)
highestSeenPayOff Stores the current (at time of data saving) highest payoff that participant has seen
trial.selectBandit: stores the highestSeenPayOff from the perspective of the participant at the time of responding
trial.summary: stores the highestSeenPayOff from the perspective of the participant at the end of the trial (after the total-points feedback)
(the highestSeenPayOff can change from trial.selectBandit to trial.summary depending on whether or not
there was a payoff flip)
consecutiveSameChoice Running total of selecting the same choice consecutively (resets after switch)
selectionChange 1 = participant changed selection
0 = otherwise
totalPoints Stores the current total points earned
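The exploit/explore coding of the choice field can be sketched in Python. This assumes the participant's knowledge is summarized by the payoff last observed for each arm; `classify_choice` and the `last_seen` dictionary are hypothetical bookkeeping, and the script may maintain highestSeenPayOff differently. With the default start values ties cannot occur, because A's values are always 10 plus a multiple of 20 and B's are always a multiple of 20.

```python
from typing import Dict, Optional


def classify_choice(selected: Optional[str], last_seen: Dict[str, int]) -> int:
    """Return 0 (no choice), 1 (exploitative) or 2 (exploratory).

    `last_seen` maps each arm ("A"/"B") to the payoff the participant last
    observed for it; the arm with the larger value is the known higher payoff.
    """
    if selected is None:
        return 0
    best_arm = max(last_seen, key=last_seen.get)
    return 1 if selected == best_arm else 2


last_seen = {"A": 30, "B": 20}           # A looks better so far
print(classify_choice("A", last_seen))   # 1 (exploit)
print(classify_choice("B", last_seen))   # 2 (explore)
last_seen["B"] = 40                      # choosing B revealed a flip
print(classify_choice("B", last_seen))   # 1 (B is now the known higher payoff)
print(classify_choice(None, last_seen))  # 0 (no response)
```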

Parameters

The procedure can be adjusted by setting the following parameters.

Name Description Default
flipRate P(flip) = the fixed probability with which the payoff schedules will switch and the former
lower payoff choice will be the higher payoff choice until the next flip (default: 0.075)
For the script to run without error messages, flipRate*1000 must be an integer.
0.075
startA The start value of option A 10
startB The start value of option B 20
nrDemoTrials The number of passive demo trials 300
nrTestTrials The number of active test trials
Knox et al. (2012) ran 500 trials in each phase
300
demoSelectionTime The time (in ms) it takes computer to make a choice during the passive demo trials 2000
demoResponseFeedback The feedback duration (in ms) during the passive demo trials 1000
readyDuration The duration (in ms) of the get-ready trial 3000
selectionTime The response timeout (in ms) of making a selection during a test trial 1500
feedbackDuration The duration (in ms) of the feedback during a test trial 1000
iti Inter-trial interval (in ms) 1000
leftKey The left response button - this key is attached to option A "E"
rightKey The right response button - this key is attached to option B "I"
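The flipRate constraint above can be checked before editing the script with a small Python sketch. `flip_rate_is_valid` is a hypothetical helper, not part of the script; the tolerance is needed because binary floating point cannot represent values such as 0.075 exactly, and the 0 to 1 range check reflects that flipRate is a probability.

```python
def flip_rate_is_valid(flip_rate: float, tol: float = 1e-9) -> bool:
    """Check the manual's constraint: flipRate * 1000 must be a whole number."""
    scaled = flip_rate * 1000
    return 0.0 <= flip_rate <= 1.0 and abs(scaled - round(scaled)) < tol


print(flip_rate_is_valid(0.075))   # True  (0.075 * 1000 = 75)
print(flip_rate_is_valid(0.0755))  # False (75.5 is not an integer)
```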