Multi-Armed Bandit Task

AKA: Explore/Exploit Trade-Off Task, Four Armed Bandit Task, Two Armed Bandit Task

Licensing: Included with an Inquisit license.

Categories: Behavioral Economics, Decision Making, Games, Learning

Background

Multi-Armed Bandit Tasks are behavioral paradigms designed to study how humans make decisions under uncertainty. They get their name from a hypothetical gambler deciding which lever to pull on a row of slot machines, notoriously known as "one-armed bandits". The core purpose of these tasks is to measure the explore-exploit trade-off: the constant, everyday dilemma between sticking with a known, rewarding option (exploitation) versus trying new or uncertain options in the hope of finding something better (exploration).

In a typical experiment, participants sit at a computer and are presented with two or more options (e.g., slot machines, cards, or doors). Each option has a hidden probability of yielding a reward (such as points or money), which is initially unknown to the participant. Participants are instructed to maximize their total reward over a fixed number of trials. To do so, they must sample different options to learn which one pays out best and then repeatedly choose that option, while occasionally checking the other options, whose payoffs are designed to change over time.
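To make this trial structure concrete, here is a minimal Python sketch of a generic "restless" bandit and a simulated chooser. It assumes binary rewards whose hidden probabilities drift by a small Gaussian step on every trial, and a hypothetical learner that updates one value estimate per option with a delta rule and picks options with a softmax rule. The names `RestlessBandit`, `softmax_choice`, and `run`, and the values of `n_arms`, `drift_sd`, `alpha`, and `beta`, are illustrative assumptions. The sketch is not meant to reproduce the payoff schedules of the Inquisit scripts or any published model.

```python
import math
import random


class RestlessBandit:
    """Hypothetical restless bandit: each arm pays 1 point with a hidden
    probability that drifts by a small Gaussian step after every trial."""

    def __init__(self, n_arms=4, drift_sd=0.05, seed=0):
        self.rng = random.Random(seed)
        self.drift_sd = drift_sd
        # Hidden reward probabilities, unknown to the chooser.
        self.p = [self.rng.uniform(0.2, 0.8) for _ in range(n_arms)]

    def pull(self, arm):
        reward = 1 if self.rng.random() < self.p[arm] else 0
        # Every arm's probability drifts, clipped to [0.05, 0.95].
        self.p = [min(0.95, max(0.05, q + self.rng.gauss(0.0, self.drift_sd)))
                  for q in self.p]
        return reward


def softmax_choice(values, beta, rng):
    """Pick an arm with probability proportional to exp(beta * value)."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def run(n_trials=300, alpha=0.3, beta=5.0, seed=1):
    """Simulate a delta-rule learner with softmax exploration.

    Returns (total reward, list of chosen arms)."""
    bandit = RestlessBandit(seed=seed)
    rng = random.Random(seed + 1)
    values = [0.5] * len(bandit.p)  # initial value estimates
    choices, total = [], 0
    for _ in range(n_trials):
        arm = softmax_choice(values, beta, rng)
        r = bandit.pull(arm)
        # Delta rule: move the chosen arm's estimate toward the observed reward.
        values[arm] += alpha * (r - values[arm])
        choices.append(arm)
        total += r
    return total, choices


if __name__ == "__main__":
    total, choices = run()
    switches = sum(a != b for a, b in zip(choices, choices[1:]))
    print(f"total reward: {total}, switches between arms: {switches}")
```

`beta` controls the explore/exploit balance in this sketch: with `beta=0` every choice is uniformly random (pure exploration), while larger values make the learner stick with whichever option currently has the highest estimate (more exploitation). Because the hidden probabilities drift, a learner that never re-samples the other options can keep choosing an option that has quietly become worse.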

Millisecond offers two armed bandit paradigms:

the classic Four Armed Bandit Task by Nathaniel Daw and colleagues (2006)
the Two Armed Bandit Task by W. Bradley Knox and colleagues (2012)


Test Variations

Four Armed Bandit Task

A decision-making game in which participants trade off exploiting known resources against exploring less familiar ones, as described in Daw et al. (2006).

Duration: 36 minutes

(Requires Inquisit Lab)

(Run with Inquisit Web)

Last updated:

English: Jun 4, 2026, 11:46 PM

Hebrew (עברית): Sep 2, 2025, 10:22 PM

Four Armed Bandit Task - keyboard

The Four Armed Bandit Task with keyboard input.

Duration: 36 minutes

(Requires Inquisit Lab)

(Run with Inquisit Web)

Last updated:

English: Jun 4, 2026, 11:46 PM

Two Armed Bandit Task

A decision-making game in which participants trade off exploiting one known resource against exploring one new resource, as described in Knox et al. (2012).

Duration: 35 minutes

(Requires Inquisit Lab)

(Run with Inquisit Web)

Last updated:

English: Jun 15, 2026, 3:29 PM

References


Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876-879.

Knox, W. B., Otto, A. R., Stone, P., & Love, B. C. (2012). The nature of belief-directed exploratory choice in human decision-making. Frontiers in Psychology, 2, Article 298, 1-12.

Blanco, Love, Cooper, McGeary, Knopik, & Maddox (2015). A frontal dopamine system for reflective exploratory behavior. Neurobiology of Learning and Memory, 123, 84-91.

Cogliati Dezza, I., Yu, A., Cleeremans, A., & Alexander, W. (2017). Learning the value of information and reward over time when solving exploration-exploitation problems. Scientific Reports, 7(1), 16919.

Warren, C., Wilson, R., Giltay, E., Van Noorden, M., Cohen, J., & Nieuwenhuis, S. (2017). The effect of atomoxetine on random and directed exploration in humans. PLOS ONE, 12(4), e0176034.

Addicott, M. A., Pearson, J. M., Sweitzer, M. M., Barack, D. L., & Platt, M. L. (2017). A primer on foraging and the explore/exploit trade-off for psychiatry research. Neuropsychopharmacology, 42(10), 1931-1939.

Gershman, S. (2018). Deconstructing the human algorithms for exploration. Cognition, 173, 34-42.