In my last post, I discussed the concept of probability matching in choice. A simple case of probability matching is one in which you are given a choice between two cups, only one of which will have an M&M in it. If the M&M is in the left cup 80% of the time, and the right cup 20% of the time, you can maximize the number of M&Ms you are likely to receive by always selecting the left cup. That way, you will get M&Ms 80% of the time. People (and most other animals) tend not to make this optimal response. Instead, we tend to probability match. That is, if the left cup has the M&M 80% of the time, then we pick the left cup 80% of the time. Matching pays off less often: if your picks are independent of where the M&M actually is, you find it only 68% of the time (0.8 × 0.8 + 0.2 × 0.2).
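To make the arithmetic behind the cup example concrete, here is a small sketch in Python. The function name expected_hit_rate and its parameters are my own labels for illustration, and the calculation assumes that your choice on each trial is independent of where the M&M actually is.

```python
def expected_hit_rate(p_left_reward, p_choose_left):
    """Chance of finding the M&M on a single trial.

    Assumes exactly one cup holds the M&M and that the choice is
    independent of the M&M's location (an illustrative assumption).
    """
    p_right_reward = 1 - p_left_reward
    p_choose_right = 1 - p_choose_left
    return p_choose_left * p_left_reward + p_choose_right * p_right_reward


# Always pick the left cup (maximizing).
maximizing = expected_hit_rate(p_left_reward=0.8, p_choose_left=1.0)

# Pick the left cup 80% of the time (probability matching).
matching = expected_hit_rate(p_left_reward=0.8, p_choose_left=0.8)

print(f"always left:          {maximizing:.2f}")
print(f"probability matching: {matching:.2f}")
```

Under these assumptions, always choosing the left cup comes out ahead of matching by 12 percentage points.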
There are many possible reasons for this behavior, and a paper by Wolfgang Gaissmaier and Lael Schooler in the December 2008 issue of Cognition suggests a new one. They find that people who tend to probability match are better able to detect changes in the environment than people who find the option that is most highly rewarded and stick with it.
One way to think about this is that there is always a tradeoff between exploring the world and exploiting it. Exploration is the process of searching for new things. The potential benefit of exploration is that you may discover rich new sources of reward. The danger of exploration is that you may spend a lot of time and energy and come up empty-handed. Exploitation is the process of drawing rewards from the world in known places. The benefit of exploitation is that you have a good idea of what you are going to get. The danger is that you may miss out on other opportunities that are more rewarding than the one you are currently exploiting.
This exploration-exploitation tradeoff occurs in almost every facet of our lives. If you watch the same TV show routinely, you are exploiting that show. If you sample different restaurants in the town you live in, you are exploring. If you play a musical instrument and stick to the same set of songs you have already learned, you are exploiting. If you deliberately schedule your vacations so that you always visit new places, you are exploring.
It has always been somewhat puzzling that people continue to explore in experiments demonstrating probability matching. The optimal behavior is to exploit the option that pays off most often. And narrowly within the context of the experiment, it is true that exploiting the best option in the study is the best thing to do. However, the world is dynamic. Things in the world change. A restaurant that used to be terrible might get a new chef and suddenly be excellent. A TV show that started off edgy and radical may slip into mediocrity.
If you evaluate the world at one time and then exploit after that, you run the risk of missing changes in the world. The cognitive system is structured to find a reasonable way to resolve the tradeoff between exploration and exploitation. If there is one option that is far superior to all of the others, then you will tend to pick it most of the time and to select other options every once in a while, just to make sure that the world hasn’t changed radically. If one option is only slightly better than another, then you sample the better option only slightly more often than the worse one. That behavior is useful, because a small decrease in the quality of the better option (or increase in the quality of the worse option) could flip the relative goodness of the options. And because you are doing a good job of managing the tradeoff between exploration and exploitation, you will notice. So, the unpredictability of human behavior really is a virtue.
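Here is a toy simulation of that idea. It is only a sketch under assumptions I made up for illustration, not the procedure used in the Gaissmaier and Schooler paper. There are two independent options (think of two restaurants). Option 0 pays off 60% of the time throughout, and option 1 improves from 20% to 90% halfway through. The "exploit" strategy samples both options for a while and then locks in the winner. The "match" strategy keeps choosing each option in proportion to a running estimate of how often it pays off. All of the names and numbers (simulate, alpha, sample_trials, the payoff rates) are hypothetical.

```python
import random


def simulate(strategy, n_trials=4000, change_at=2000, seed=0):
    """Return the reward rate earned after the environment changes."""
    rng = random.Random(seed)
    alpha = 0.05          # weight given to the newest outcome (assumed)
    sample_trials = 200   # initial sampling period for "exploit" (assumed)
    est = [0.5, 0.5]      # running payoff estimates used by "match"
    wins = [0, 0]         # payoff counts used by "exploit" while sampling
    locked = None         # the option "exploit" commits to
    post_rewards = 0

    for t in range(n_trials):
        # Option 1 improves at the change point; option 0 never changes.
        payoff = [0.6, 0.2] if t < change_at else [0.6, 0.9]

        if strategy == "exploit":
            if t < sample_trials:
                choice = t % 2  # alternate to evaluate both options once
            else:
                if locked is None:
                    locked = 0 if wins[0] >= wins[1] else 1
                choice = locked  # never look at the other option again
        else:
            # Choose in proportion to the current estimates.
            total = est[0] + est[1]
            p_first = est[0] / total if total > 0 else 0.5
            choice = 0 if rng.random() < p_first else 1

        reward = 1 if rng.random() < payoff[choice] else 0
        wins[choice] += reward
        est[choice] += alpha * (reward - est[choice])  # recency-weighted update

        if t >= change_at:
            post_rewards += reward

    return post_rewards / (n_trials - change_at)


exploit_rate = simulate("exploit")
match_rate = simulate("match")
print(f"after the change, exploit earns {exploit_rate:.2f} per trial")
print(f"after the change, match earns   {match_rate:.2f} per trial")
```

In this setup, the exploiter stops visiting option 1 once it has committed, so it has no way to learn that option 1 got better. The matcher keeps checking in on option 1 and shifts toward it after the change. Before the change the matcher pays a price for its exploring, which is exactly the tradeoff described above.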