- 블루투스 키보드
Multi-armed bandit problem 본문
Multi-armed bandit problem모릅니다 2011.01.07 02:35
레버가 여러 개 달린 슬롯머신을 말하는구나. bandit 이 뭔가 했네…
각 레버마다 다른 이득을 얻을 수 있고, 초기에는 어떤 값이 나올지 모르는데 하면 할 수록 시스템을 알아가게 되기 때문에
Exploration & Exploitation 측면으로 많이 접근한다.
위키피디아 링크는 아래와 같음.
1952년에 Herbert Robbins 가 Fomulate 했다.
Common bandit strategies 는 아래와 같다는데, 이건 좀 봐야 알겠네.
Common bandit strategies
Many strategies exist which provide an approximate solution to the bandit problem, and can be put into the three broad categories detailed below.
 Semi-uniform strategies
Semi-uniform strategies were the earliest (and simplest) strategies discovered to approximately solve the bandit problem. All those strategies have in common a greedy behavior where the best lever (based on previous observations) is always pulled except when a (uniformly) random action is taken.
- Epsilon-greedy strategy: The best lever is selected for a proportion 1 − ε of the trials, and another lever is randomly selected (with uniform probability) for a proportion ε. A typical parameter value might be ε = 0.1, but this can vary widely depending on circumstances and predilections.
- Epsilon-first strategy: A pure exploration phase is followed by a pure exploitation phase. For N trials in total, the exploration phase occupies εN trials and the exploitation phase (1 − ε)N trials. During the exploration phase, a lever is randomly selected (with uniform probability); during the exploitation phase, the best lever is always selected.
- Epsilon-decreasing strategy: Similar to the epsilon-greedy strategy, except that the value of ε decreases as the experiment progresses, resulting in highly explorative behaviour at the start and highly exploitative behaviour at the finish.
- Adaptive epsilon-greedy strategy based on value differences (VDBE): Similar to the epsilon-decreasing strategy, except that epsilon is reduced on basis of the learning progress instead of manual tuning (Tokic, 2010). High changes in the value estimates lead to a high epsilon (exploration); low value changes to a low epsilon (exploitation).
 Probability matching strategies
Probability matching strategies reflect the idea that the number of pulls for a given lever should match its actual probability of being the optimal lever.
 Pricing strategies
Pricing strategies establish a price for each lever. The lever of highest price is always pulled.