Two Armed Bandit Task

Licensing: Included with an Inquisit license.

Categories: Multi-Armed Bandit Task, Behavioral Economics, Decision Making, Games, Learning

Background

The Two-Armed Bandit Task by W. Bradley Knox and colleagues (2012) is a simplified variant of the Multi-Armed Bandit paradigm, designed to study how humans explore and plan ahead when rewards keep changing. The authors deliberately reduced the task from four slots (as used in the Four Armed Bandit Task) to two, arguing that the 4-arm environment may be too complex for participants, which makes it hard for researchers to adequately study human exploratory behavior under uncertainty.

In this simplified environment, participants choose between just two options, A and B. The payoffs are set so that one option is always worth 10 points more than the other. On each trial, the payoff schedule flips with a fixed probability (e.g. 7.5%): 20 points are added to the lower-paying option, which thus 'leapfrogs' the previously higher-paying one.
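The leapfrog payoff schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Inquisit script itself: the function name `simulate_leapfrog`, the starting payoffs of 10 and 20 points, and the fixed random seed are assumptions made for the example.

```python
import random

def simulate_leapfrog(n_trials=300, p_flip=0.075, start=(10, 20), jump=20, seed=0):
    """Return per-trial payoffs (A, B) and the trial indices on which a flip occurred.

    The starting payoffs and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    a, b = start
    payoffs, flips = [], []
    for t in range(n_trials):
        if rng.random() < p_flip:
            # The lower-paying option gains `jump` points and
            # leapfrogs the other option by 10 points.
            if a < b:
                a += jump
            else:
                b += jump
            flips.append(t)
        payoffs.append((a, b))
    return payoffs, flips

payoffs, flips = simulate_leapfrog()
print(f"{len(flips)} flips in {len(payoffs)} trials")
```

Because each flip adds 20 points to the option that trails by 10, the two options always differ by exactly 10 points and the better option changes at every flip.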

The results support Knox et al.'s claim that when an environment is too complex (like Daw's 4-armed task), humans seem to default to somewhat random, value-sensitive guessing to save mental effort. When the environment is simple enough to manage mentally, people actively track uncertainty and direct their exploration toward it. Even so, they do not engage in optimal long-term planning; instead, on every trial they simply choose the option they believe has the highest immediate payoff. In contrast, a mathematically optimal 'Ideal' player would more often sacrifice a known higher payoff in order to resolve uncertainty, which can yield higher payouts down the road.

Task Procedure

The two-armed bandit task is divided into two phases: (1) a Passive Observation Phase (POP) and (2) an Active Game (AG). During the POP, participants simply watch the game for 300 trials while the computer makes the choices. Whenever the payoffs switch (which happens with p = 7.5% on each trial), an alert is presented on screen. Beginning with trial 200, participants are asked to estimate how many payoff reversals they expect to observe during the next set of 100 trials. During the AG phase, participants actively play the game for 300 trials. Payoff switches are no longer announced to participants. Participants select option A or option B via key presses. If no response is made within 1500 ms, the trial is repeated.
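As a rough benchmark for the reversal estimate requested during the POP, the number of reversals in a block can be treated as a binomial count. The sketch below assumes that flips occur independently on each trial with a fixed probability of 7.5%; the helper name `reversal_stats` is hypothetical.

```python
import math

def reversal_stats(n_trials=100, p_flip=0.075):
    """Mean and standard deviation of the number of reversals,
    assuming independent flips with fixed probability p_flip."""
    mean = n_trials * p_flip
    sd = math.sqrt(n_trials * p_flip * (1 - p_flip))
    return mean, sd

mean, sd = reversal_stats()
print(f"expected reversals: {mean:.1f} (SD {sd:.2f})")
# prints "expected reversals: 7.5 (SD 2.63)"
```

Under these assumptions, a well-calibrated participant would expect about 7 or 8 reversals in the next 100 trials.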

What it Measures

Armed Bandit Tasks are a measure of adaptive decision-making under uncertainty.

Psychological domains

Decision-making: Response to potential rewards and losses over time
Risk-taking: Preference for high-reward/high-risk or low-reward/low-risk
Delay Discounting: Foregoing immediate rewards for better long-term outcomes
Executive Control: The ability of our prefrontal cortex to override automatic, reward-seeking behavior to execute an exploratory choice

Main Performance Metrics

Total: Absolute and relative final payout; measures of 'Reward Maximization'
Proportion of HighestPayOff: Proportion of trials on which the option with the highest payoff was selected; measure of 'Optimal Choice Making'
Exploration Rate: Proportion of choices in which participants selected a new option (relative to all choices made)
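The metrics above could be computed from trial-level data roughly as follows. This is a sketch under stated assumptions, not the scoring code of the Inquisit script: the record layout (`choice`, `payoff_a`, `payoff_b`) and the function name `summarize` are hypothetical, the relative payout is taken as the total divided by the maximum obtainable total, and a 'new option' is interpreted as a choice that differs from the previous trial's choice.

```python
def summarize(trials):
    """Compute summary metrics from a list of trial records.

    Each record is a dict with keys 'choice' ('A' or 'B'),
    'payoff_a' and 'payoff_b' (hypothetical field names).
    """
    total = 0
    max_total = 0
    best_choices = 0
    switches = 0
    prev_choice = None
    for t in trials:
        earned = t["payoff_a"] if t["choice"] == "A" else t["payoff_b"]
        best = max(t["payoff_a"], t["payoff_b"])
        total += earned
        max_total += best
        if earned == best:
            best_choices += 1
        # Assumption: a 'new option' is a switch away from the previous choice.
        if prev_choice is not None and t["choice"] != prev_choice:
            switches += 1
        prev_choice = t["choice"]
    n = len(trials)
    return {
        "total": total,
        "relative_total": total / max_total,
        "prop_highest_payoff": best_choices / n,
        "exploration_rate": switches / n,
    }

example = [
    {"choice": "A", "payoff_a": 20, "payoff_b": 10},
    {"choice": "A", "payoff_a": 20, "payoff_b": 10},
    {"choice": "B", "payoff_a": 20, "payoff_b": 30},
    {"choice": "B", "payoff_a": 40, "payoff_b": 30},
]
result = summarize(example)
print(result)
```

In this four-trial example the participant earns 100 of 110 possible points, picks the higher-paying option on 3 of 4 trials, and switches once.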

Psychiatric Conditions

Performance on Armed Bandit Tasks tends to differ in patients with the following psychiatric conditions:

Substance Use Disorders
Schizophrenia
Major Depressive Disorder
Obsessive-Compulsive Disorder (OCD)
Attention Deficit Hyperactivity Disorder (ADHD)

A decision-making game in which participants trade off pursuing one known resource against exploring one new resource, as described in Knox et al. (2012).

Duration: 35 minutes

(Requires Inquisit Lab)

(Run with Inquisit Web)

Last Updated

English (English): Jun 15, 2026, 3:29PM

References

Knox, W.B., Otto, A.R., Stone, P. & Love, B.C. (2012). The nature of belief-directed exploratory choice in human decision-making. Frontiers in Psychology, 2, Article 298, 1-12.