Multi-Armed Bandit Task
AKA: Explore/Exploit Trade-Off Task, Four-Armed Bandit Task, Two-Armed Bandit Task
Background
Multi-armed bandit tasks are behavioral paradigms designed to study how humans make decisions under uncertainty. They get their name from a hypothetical gambler deciding which lever to pull on a row of slot machines, colloquially known as "one-armed bandits". The core purpose of these tasks is to measure the explore-exploit trade-off: the everyday dilemma between sticking with a known, rewarding option (exploitation) and trying new or uncertain options in the hope of finding something better (exploration).
In a typical experiment, participants are seated at a computer and presented with two or more options (e.g., slot machines, cards, or doors). Each option has a hidden, initially unknown probability of yielding a reward (such as points or money). Participants are instructed to maximize their total reward over a fixed number of trials. To do so, they must sample the different options to find out which one pays best and then choose that option repeatedly, while occasionally rechecking the other options, whose payoffs are designed to fluctuate over the course of the task.
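The trial structure described above can be sketched as a small simulation. This is only an illustrative sketch, not the procedure from either cited paper: the function name `run_bandit`, the binary rewards, the random-walk drift of the hidden probabilities, and the epsilon-greedy choice rule with a fixed learning rate are all assumptions chosen for brevity.

```python
import random


def run_bandit(n_arms=4, n_trials=200, epsilon=0.1,
               learning_rate=0.1, drift_sd=0.02, seed=0):
    """Simulate a simple agent on a bandit with drifting reward probabilities.

    All parameter names and default values are hypothetical.
    Returns (total_reward, choices, counts).
    """
    rng = random.Random(seed)
    # Hidden reward probability of each option (unknown to the agent).
    probs = [rng.uniform(0.2, 0.8) for _ in range(n_arms)]
    estimates = [0.0] * n_arms   # agent's running value estimate per option
    counts = [0] * n_arms        # how often each option was chosen
    choices = []
    total_reward = 0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick any option at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the option with the highest estimate
            # (ties broken at random).
            best = max(estimates)
            arm = rng.choice([i for i, e in enumerate(estimates) if e == best])

        # Binary reward drawn from the chosen option's hidden probability.
        reward = 1 if rng.random() < probs[arm] else 0

        # Move the estimate a fixed fraction toward the observed reward,
        # so recent outcomes count more when payoffs drift.
        estimates[arm] += learning_rate * (reward - estimates[arm])
        counts[arm] += 1
        choices.append(arm)
        total_reward += reward

        # Let every hidden probability drift a little, clipped to [0, 1].
        probs = [min(1.0, max(0.0, p + rng.gauss(0.0, drift_sd)))
                 for p in probs]

    return total_reward, choices, counts


if __name__ == "__main__":
    total, choices, counts = run_bandit()
    print("total reward:", total)
    print("choices per option:", counts)
```

Setting `epsilon` to 0 yields an agent that never explores deliberately, while larger values trade immediate reward for more information about options whose payoffs may have changed.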
Millisecond offers two multi-armed bandit paradigms:
- the classic Four-Armed Bandit Task by Nathaniel Daw and colleagues (2006)
- the Two-Armed Bandit Task by W. Bradley Knox and colleagues (2012)
Select a task for additional background information on that paradigm.
Test Variations
A decision-making game in which participants trade off pursuing known resources against exploring new ones, as described in Daw et al. (2006).
The Four-Armed Bandit Task with keyboard input
A decision-making game in which participants trade off pursuing one known resource against exploring one new resource, as described in Knox et al. (2012).
References
Daw, N.D., O'Doherty, J.P., Dayan, P., Seymour, B., & Dolan, R.J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.
Knox, W.B., Otto, A.R., Stone, P., & Love, B.C. (2012). The nature of belief-directed exploratory choice in human decision-making. Frontiers in Psychology, 2, Article 298, 1-12.
Blanco, Love, Cooper, McGeary, Knopik, & Maddox (2015). A frontal dopamine system for reflective exploratory behavior. Neurobiology of Learning and Memory, 123, 84-91.
Cogliati Dezza, I., Yu, A., Cleeremans, A., & Alexander, W. (2017). Learning the value of information and reward over time when solving exploration-exploitation problems. Scientific Reports, 7(1), 16919.
Warren, C., Wilson, R., Giltay, E., Van Noorden, M., Cohen, J., & Nieuwenhuis, S. (2017). The effect of atomoxetine on random and directed exploration in humans. PLoS One, 12(4), e0176034.
Addicott, M.A., Pearson, J.M., Sweitzer, M.M., Barack, D.L., & Platt, M.L. (2017). A primer on foraging and the explore/exploit trade-off for psychiatry research. Neuropsychopharmacology, 42(10), 1931-1939.
Gershman, S. (2018). Deconstructing the human algorithms for exploration. Cognition, 173, 34-42.