The K-Armed Bandit Problem
The K-Armed Bandit Problem is a classic illustration of the trade-off between exploration and exploitation. A machine learning system, or agent, is repeatedly presented with K options (the K arms) and must select one at each step. The agent therefore faces a dilemma: keep exploiting the action that has yielded the highest reward so far, or explore other actions that might yield even higher rewards from the environment over the long run.
Exploitation is repeating the action that has given us the highest reward so far.
Exploration is trying actions other than the current best one, in the hope of finding actions that lead to even higher returns from the environment.
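One common way to balance these two behaviors is the epsilon-greedy strategy: with a small probability epsilon the agent explores a random arm, and otherwise it exploits the arm with the highest estimated reward. Below is a minimal sketch of this idea; the function name and the list of estimated values are illustrative, not part of any particular library.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an arm index: explore with probability epsilon, else exploit.

    q_values: list of estimated rewards, one entry per arm.
    """
    if random.random() < epsilon:
        # Exploration: try a uniformly random arm.
        return random.randrange(len(q_values))
    # Exploitation: pick the arm with the highest estimated reward so far.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 the agent always exploits; with epsilon = 1 it always explores. In practice a small value such as 0.1 lets the agent mostly exploit while still occasionally sampling the other arms.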
Example of a K-Armed Bandit Problem
A typical example of the K-Armed Bandit problem is investing in the stock market. As the agent, you can choose from all the stocks currently listed on an exchange. Out of these, suppose you have so far invested in stocks X, Y, and Z, and Stock Y has given you the highest return.
This month, you have some money left over and want to invest it. You now face a dilemma: invest in Stock Y again, or explore stocks other than X, Y, and Z that might return more than Stock Y.
In this scenario, you are the agent, the stock exchange is the environment, the K stocks you are considering are the K arms, and the ROI from an invested stock is the agent's reward. The dilemma between reinvesting in Stock Y and exploring other stocks is precisely the K-Armed Bandit Problem.
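The stock scenario can be simulated end to end. The sketch below (the mean returns are made-up numbers for illustration) treats each stock as an arm with an unknown average monthly return, lets an epsilon-greedy agent invest repeatedly, and updates its estimate of each arm with an incremental sample average.

```python
import random

random.seed(0)

# Hypothetical mean monthly returns (%) for K = 5 stocks; the agent
# does not know these values and must estimate them by investing.
true_means = [1.0, 2.5, 1.5, 0.5, 3.0]
K = len(true_means)

q = [0.0] * K       # estimated return of each arm (stock)
counts = [0] * K    # how many times each arm was chosen
epsilon = 0.1

for step in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(K)                     # explore
    else:
        arm = max(range(K), key=lambda a: q[a])       # exploit
    # The realized ROI is noisy: the true mean plus random fluctuation.
    reward = random.gauss(true_means[arm], 1.0)
    counts[arm] += 1
    # Incremental sample-average update of the arm's estimated value.
    q[arm] += (reward - q[arm]) / counts[arm]

best = max(range(K), key=lambda a: q[a])
print(f"Agent's preferred stock: arm {best}, estimated return {q[best]:.2f}")
```

Because the agent keeps a small exploration rate, every arm is sampled occasionally, so its estimates converge toward the true means and it ends up mostly investing in the genuinely best stock.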
Subscribe to my free newsletter to upskill in ML with more bite-sized articles that break complex concepts into small, simple, consumable bites.