# Questions tagged [multi-armed-bandit]

For questions related to the multi-armed bandit (MAB) problem, in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation.

13 questions

**0** votes · **0** answers · 4 views

### Is there a multi-agent version of EXP3?

The EXP3 algorithm as given in the figure below (taken from Adversarial Bandits and the Exp3 Algorithm) solves the adversarial bandit problem in the single-player case. What happens if there are ...
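For reference, a minimal single-player EXP3 sketch under the standard assumptions (rewards in [0, 1], importance-weighted updates); the reward callback, horizon, and mixing parameter below are illustrative, not taken from the question:

```python
import math
import random

def exp3(n_arms, rewards_fn, T, gamma=0.1):
    """Single-player EXP3: keep a weight per arm, sample from a mixture
    of the weight distribution and uniform exploration, and update the
    pulled arm with an importance-weighted reward estimate."""
    weights = [1.0] * n_arms
    total_reward = 0.0
    for t in range(T):
        total_w = sum(weights)
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        r = rewards_fn(arm, t)  # assumed to lie in [0, 1]
        total_reward += r
        # Only the pulled arm is updated; dividing by its probability
        # keeps the reward estimate unbiased.
        x_hat = r / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
    return total_reward
```

A multi-agent variant would need to specify how the adversary and the other players' actions enter `rewards_fn`, which is exactly what the question is asking about.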

**0** votes · **1** answer · 60 views

### Why am I getting better performance with Thompson sampling than with UCB or $\epsilon$-greedy in a multi-armed bandit problem? [closed]

I ran a test using 3 strategies for the multi-armed bandit problem: UCB, $\epsilon$-greedy, and Thompson sampling. The results for the rewards I got are as follows: Thompson sampling had the highest average ...
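For context, a minimal Thompson sampling sketch for Bernoulli rewards with Beta(1, 1) priors; the arm probabilities and horizon are made-up illustrations, not the question's setup:

```python
import random

def thompson_bernoulli(true_probs, T, seed=0):
    """Thompson sampling for Bernoulli bandits: sample a plausible
    success probability per arm from its Beta posterior, pull the
    argmax, then update that arm's posterior with the observed reward."""
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1] * n  # prior successes + 1
    beta = [1] * n   # prior failures + 1
    total = 0
    for _ in range(T):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total += reward
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total
```

Because exploration here scales with posterior uncertainty rather than a fixed schedule, Thompson sampling often outperforms $\epsilon$-greedy (which explores uniformly forever) on stationary Bernoulli problems, which may explain the result described above.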

**5** votes · **1** answer · 167 views

### How do I recognise a bandit problem?

I'm having difficulty understanding the distinction between a bandit problem and a non-bandit problem. An example of the bandit problem is an agent playing $n$ slot machines with the goal of ...


**2** votes · **2** answers · 55 views

### Are bandits considered an RL approach?

If a research paper uses multi-armed bandits (either in their standard or contextual form) to solve a particular task, can we say that they solved this task using a reinforcement learning approach? Or ...

**1** vote · **1** answer · 26 views

### How do we arrive at the formula for UCB action selection in the multi-armed bandit problem?

I came across the formula for Upper Confidence Bound action selection (while studying the multi-armed bandit problem), which looks like: $$ A_t \doteq \operatorname{argmax}_a \left[ Q_t(a) + c \sqrt{ \frac{\ln t}{N_t(a)} } \right] $$ ...
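The UCB rule from Sutton and Barto picks the arm with the highest value estimate plus an uncertainty bonus that shrinks as the arm is pulled more. A minimal sketch (the tie-handling for untried arms is one common convention, not the only one):

```python
import math

def ucb_select(q, counts, t, c=2.0):
    """UCB action selection: argmax_a [ Q_t(a) + c * sqrt(ln t / N_t(a)) ].
    q[a] is the current value estimate, counts[a] the pull count N_t(a),
    and t the current timestep. Untried arms are treated as maximally
    uncertain and pulled first."""
    best, best_val = None, float("-inf")
    for a in range(len(q)):
        if counts[a] == 0:
            return a  # an arm never tried has an unbounded bonus
        val = q[a] + c * math.sqrt(math.log(t) / counts[a])
        if val > best_val:
            best, best_val = a, val
    return best
```

Note how two arms with equal estimates differ only through the bonus term: the less-pulled arm wins, which is the exploration behaviour the formula is derived to guarantee.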

**1** vote · **0** answers · 33 views

### How do I determine the optimal policy in a bandit problem with missing contexts?

Suppose I learn an optimal policy $\pi(a|c)$ for a contextual multi-armed bandit problem, where the context $c$ is a composite of multiple context variables $c = c_1, c_2, c_3$. For example, the ...

**2** votes · **1** answer · 91 views

### Solving a Multi-Armed, “Multi-Bandit” Problem

This is the problem: I have 66 slot machines, and for each of them I have 7 possible actions/arms to choose from. At each trial, I have to choose one of the 7 actions for each and every one of the 66 slots. ...
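If the 66 slots are statistically independent, the "multi-bandit" factorizes into 66 separate 7-armed bandits, each solvable on its own. A sketch under that independence assumption, with a hypothetical `pull(slot, arm)` reward callback and one $\epsilon$-greedy learner per slot:

```python
import random

def per_slot_epsilon_greedy(n_slots, n_arms, pull, T, eps=0.1, seed=0):
    """One independent epsilon-greedy learner per slot machine.
    pull(slot, arm) is an assumed callback returning that trial's reward."""
    rng = random.Random(seed)
    Q = [[0.0] * n_arms for _ in range(n_slots)]  # value estimates
    N = [[0] * n_arms for _ in range(n_slots)]    # pull counts
    for _ in range(T):
        for s in range(n_slots):
            if rng.random() < eps:
                a = rng.randrange(n_arms)  # explore
            else:
                a = max(range(n_arms), key=lambda i: Q[s][i])  # exploit
            r = pull(s, a)
            N[s][a] += 1
            Q[s][a] += (r - Q[s][a]) / N[s][a]  # incremental sample average
    return Q
```

If the slots instead share structure (e.g. the same arm tends to be good everywhere), a contextual bandit with the slot index as context would let them pool data, which is the more interesting version of the question.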

**2** votes · **1** answer · 382 views

### How to implement a contextual reinforcement learning model?

In a reinforcement learning model, states depend on the previous actions chosen. In the case in which some of the states (but not all) are fully independent of the actions, but still obviously ...

**1** vote · **1** answer · 228 views

### Large action set in multi-armed bandits

Suppose one is using a multi-armed bandit, and one has relatively few "pulls" (i.e. timesteps) relative to the action set. For example, maybe there are 200 timesteps and 100 possible actions. ...

**1** vote · **0** answers · 29 views

### Name of a multi-armed bandit with only some levers available

In order to model a card game, as an exercise, I was thinking of an elementary setting as a multi-armed bandit, each lever being the distribution of expected rewards of a specific card. But, of course, ...

**2** votes · **1** answer · 59 views

### Programming a bandit to optimize donations

I'm developing a multi-armed bandit which learns the best information to display to persuade someone to donate to charity. Suppose I have treatments A, B, C, D (which are each one paragraph of text). ...

**5** votes · **1** answer · 163 views

### What is a weighted average in a non-stationary k-armed bandit problem?

In the book Reinforcement Learning: An Introduction (page 25), by Richard S. Sutton and Andrew G. Barto, there is a discussion of the k-armed bandit problem, where the expected reward from the bandits ...
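The "weighted average" in the non-stationary case refers to Sutton and Barto's constant step-size update, which weights recent rewards exponentially more than old ones (an exponential recency-weighted average); a minimal sketch:

```python
def constant_step_update(q, reward, alpha=0.1):
    """Constant step-size value update: Q_{n+1} = Q_n + alpha * (R_n - Q_n).
    Unrolling the recursion shows reward R_{n-k} carries weight
    alpha * (1 - alpha)**k, so old rewards decay geometrically -- useful
    when the arm's true reward distribution drifts over time."""
    return q + alpha * (reward - q)
```

With a sample-average update (step size 1/n) every reward counts equally forever; the constant-$\alpha$ version instead tracks a moving target, which is why the book recommends it for non-stationary bandits.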