Saturday, December 3, 2016

Monty Hall Problem - When Perception Does Not Align with Reality


This post is part of my series on Statistics and Probability topics from the book Think Stats by Professor Allen B. Downey.

I admit it: I have a weird relationship with Statistics and Probability. By weird, I mean that I love to get to know them, and they love to remain strangers. Now, I will not blame anything or anyone for my knowledge gap. That being said, I must say that the right approach to this interesting and important subject will take us much further with much less resistance.

On my journey to find this approach, I came across the interesting book “Think Stats” by Prof. Allen B. Downey of Franklin W. Olin College of Engineering. The premise of this book is that programming can be an excellent vessel to bring a person into the world of Statistics and Probability. It cannot replace a formal Statistics course, but it makes such a course much more accessible. More importantly, it makes Statistics tangible. I will write an article about this book in the near future. Today, I will discuss an interesting problem presented in this book that made me reconsider the way we perceive the world.

Monty Hall Problem

The Monty Hall problem is a probability puzzle based on the American game show “Let’s Make a Deal”. This puzzle is presented in chapter 5 of Think Stats. It can be summarized as follows:

The goal of the game is to select the door hiding the car. The player is presented with 3 doors. The car is hidden behind one door, while junk is hidden behind the remaining ones. After the player selects a door, Monty Hall (i.e., the host of the game show) opens one of the two remaining doors, and the door he opens never has the car behind it. Now, the player has to choose either to switch or to stay with the selected door. Afterward, Monty Hall opens the remaining doors and reveals the final result.

The question posed by the Monty Hall problem is whether the player should switch or stay with the selected door.

Perception

It is simple. We have two choices: our selected door and the remaining door. The car must be behind one of these two doors, so each door has a 50% chance of having the car. Therefore, the decision to stay or to switch is meaningless.

When we are presented with these choices, I think we also “feel” that behind every closed door there is either a wonderful new sports car or a pile of junk, so the chance of finding the car when opening each door is 50%. And, therefore, the decision to stay or to switch is meaningless.

These views are natural and straightforward. Are they correct?

Monte Carlo Simulation

Instead of getting into an endless debate, let’s have a look at a simulation first. As suggested by the book, I wrote a simple simulation of the game in Python. In this simulation, the car is placed behind a door picked at random from a uniform distribution. Without loss of generality, I let the player pick door 1. Depending on the actual position of the car, Monty Hall opens a door, which is always empty. In the code, his move is implicit: if the player's first pick hides the car, switching leads to an empty door; otherwise, the only door left to switch to is the car. Depending on a parameter passed to the simulation, the player either sticks with door 1 or switches to the remaining door. At the end, the simulation returns a boolean value indicating whether the player won.

import random

def MontyHall(switch, startDoor, verbose=False):
    """Play one round of the Monty Hall game.

    Args:
        switch: True if the player switches to the other closed door
        startDoor: index (0, 1, or 2) of the door the player picks first
        verbose: if True, print the details of the round
    Returns:
        True if the player wins the car, False if not,
        or None if startDoor is not a valid door index
    """
    # Reject an invalid door index before setting up the game.
    if startDoor < 0 or startDoor > 2:
        return None

    # Initialize the game and put the car behind a random door.
    carDoor = random.randrange(0, 3)
    selectedDoor = startDoor
    remainingDoors = [i for i in [0, 1, 2] if i != selectedDoor]

    if verbose:
        print("Start Door:%s - Car Door:%s - Switch:%s - Remaining Doors:%s"
              % (startDoor, carDoor, switch, remainingDoors))

    if selectedDoor == carDoor:
        # Both remaining doors are empty, so switching always loses.
        if switch:
            selectedDoor = random.choice(remainingDoors)
    else:
        # Monty must open the only other empty door, so switching wins.
        if switch:
            selectedDoor = carDoor

    won = selectedDoor == carDoor
    if verbose:
        print("selectedDoor:%s => %s" % (selectedDoor, "Won!" if won else "Lose!"))
    return won
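The function above never decides which door Monty Hall actually opens; it relies on the shortcut that switching away from the car always loses and switching away from an empty door always wins. As a cross-check, here is a minimal sketch that models the host's move explicitly. The function name `monty_hall_explicit` and its `rng` parameter are my own, hypothetical additions, not from the book; the only assumption about the rules is that Monty never opens the player's door or the car's door.

```python
import random

def monty_hall_explicit(switch, start_door, rng=random):
    """Play one round, explicitly choosing the door the host opens.

    Hypothetical variant of MontyHall(): the host opens a door that is
    neither the player's pick nor the door hiding the car.
    """
    doors = [0, 1, 2]
    car_door = rng.randrange(3)
    # Doors the host may open: not picked by the player, not hiding the car.
    openable = [d for d in doors if d != start_door and d != car_door]
    opened_door = rng.choice(openable)
    if switch:
        # Move to the single door that is neither picked nor opened.
        final_door = next(d for d in doors if d != start_door and d != opened_door)
    else:
        final_door = start_door
    return final_door == car_door

if __name__ == "__main__":
    rng = random.Random(2016)  # seeded only to make runs repeatable
    n = 10000
    stay = sum(monty_hall_explicit(False, 0, rng) for _ in range(n)) / n
    swap = sum(monty_hall_explicit(True, 0, rng) for _ in range(n)) / n
    print("stay:", stay, "switch:", swap)
```

With the host modelled this way, the estimates should land near 1/3 for staying and 2/3 for switching, just like the shortcut version.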

The simulation is repeated a large number of times to estimate the winning probability of staying and of switching.

def MontyHallSim(startDoor, verbose=False, ite=10000):
    """Estimate the win probability of staying vs. switching over `ite` games each."""
    # Play `ite` games in which the player stays with the first door.
    win = 0
    for i in range(ite):
        if MontyHall(False, startDoor, verbose):
            win += 1
    winProbNoSwitch = win / float(ite)
    print("No Switch: Win:%s - Prob:%s" % (win, winProbNoSwitch))

    # Play `ite` games in which the player always switches.
    win = 0
    for i in range(ite):
        if MontyHall(True, startDoor, verbose):
            win += 1
    winProbSwitch = win / float(ite)
    print("Switch: Win:%s - Prob:%s" % (win, winProbSwitch))

    # Ratio of the two estimated win probabilities.
    print("Relative risk: switch/no_switch = %s" % (winProbSwitch / winProbNoSwitch))

And here is the result:

After 10000 iterations:
No Switch: Win:3381 - Prob:0.3381
Switch: Win:6568 - Prob:0.6568
Relative risk: switch/no_switch = 1.94262052647

Surprisingly, we are about twice as likely to win the car if we switch to the remaining door.

Why?

Analytical Solution

There was nothing wrong with the simulation, as far as I am aware, nor is there anything wrong with the conclusion. Repeated simulations are independent of each other, and their results are identically distributed. Therefore, from a frequentist point of view, the resulting probability is perfectly valid.

In fact, from a Bayesian point of view, we can confirm the simulation result analytically. Let’s denote the three doors of the game as A, B, and C, and let $C_A$, $C_B$, $C_C$ be the events that the car is behind door A, B, or C, respectively. Assume that the player selects door A at the beginning, and Monty Hall opens door B. Let $M_B$ be the event that Monty Hall opens door B. We want to compare the probability of the car being behind door A with that of door C, given the evidence that Monty Hall opened door B.

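The probability of the car being behind door C, given the evidence from Monty Hall, is calculated as follows:
$$P(C_C \mid M_B) = \frac{P(M_B \mid C_C)\,P(C_C)}{P(M_B)} = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$$

The probability of the car being behind door A, given the evidence from Monty Hall, is:
$$P(C_A \mid M_B) = \frac{P(M_B \mid C_A)\,P(C_A)}{P(M_B)} = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}$$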

Prior to any evidence, the probability of the car being behind each door is 1/3. Also, because there are always two doors left for Monty Hall to choose from, the probability that he opens any particular one of them, without considering the actual location of the car, is 1/2. The tricky part is the likelihood, i.e., the probability that Monty Hall opens door B given where the car actually is. If the car is behind door C, Monty Hall has to open door B, so $P(M_B \mid C_C) = 1$. On the other hand, if the car is behind door A, Monty Hall can open either of the remaining doors at random, so $P(M_B \mid C_A) = 1/2$.
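These numbers can be checked mechanically. Below is a small sketch using Python's `fractions` module. It also spells out one step I glossed over: $P(M_B) = 1/2$ follows from the law of total probability, using the additional fact that Monty Hall never opens the door hiding the car, so $P(M_B \mid C_B) = 0$. The variable names are illustrative only.

```python
from fractions import Fraction

# The player picks door A; the host then opens door B.
doors = ["A", "B", "C"]
prior = {d: Fraction(1, 3) for d in doors}  # P(C_d) before any evidence

# Likelihood P(M_B | C_d) for a host who knows where the car is.
likelihood = {
    "A": Fraction(1, 2),  # car behind the player's door: host picks B or C at random
    "B": Fraction(0),     # host never opens the door hiding the car
    "C": Fraction(1),     # host is forced to open B
}

# Law of total probability: P(M_B) = sum over d of P(M_B | C_d) * P(C_d)
p_mb = sum(likelihood[d] * prior[d] for d in doors)

# Bayes' theorem: P(C_d | M_B) = P(M_B | C_d) * P(C_d) / P(M_B)
posterior = {d: likelihood[d] * prior[d] / p_mb for d in doors}

print("P(M_B) =", p_mb)
print("P(C_A | M_B) =", posterior["A"], " P(C_C | M_B) =", posterior["C"])
```

Using exact fractions rather than floats keeps the output directly comparable with the 1/3 and 2/3 derived above.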

As a result, because Monty Hall always knows the exact position of the car, his choice of door carries information, and the probability of the car being behind each unopened door changes after he opens one of them: door A stays at 1/3, while door C rises to 2/3. It should be noted that this result is only valid if Monty Hall knows the exact position of the car. Otherwise, $P(M_B \mid C_C) = P(M_B \mid C_A)$.
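To illustrate why the host's knowledge matters, here is a sketch that compares an informed host with an ignorant one. For the ignorant host, I assume that he opens door B or C uniformly at random, and that we only keep the games in which the opened door B happens to be empty; that is my reading of "otherwise", not something stated in the book. The helper `posterior_after_host_opens_b` is hypothetical.

```python
from fractions import Fraction

def posterior_after_host_opens_b(host_knows):
    """Posterior over the car's location after the player picks A and the
    host opens door B, which turns out to be empty."""
    doors = ["A", "B", "C"]
    prior = Fraction(1, 3)
    joint = {}
    for car in doors:
        if host_knows:
            # Informed host: opens a door other than A that hides no car.
            openable = [d for d in ["B", "C"] if d != car]
        else:
            # Ignorant host: opens B or C at random, car or not.
            openable = ["B", "C"]
        p_open_b = Fraction(1, len(openable)) if "B" in openable else Fraction(0)
        # The evidence is "host opened B and no car was behind it".
        p_evidence = p_open_b if car != "B" else Fraction(0)
        joint[car] = prior * p_evidence
    total = sum(joint.values())
    return {car: p / total for car, p in joint.items()}

print("informed host:", posterior_after_host_opens_b(True))
print("ignorant host:", posterior_after_host_opens_b(False))
```

With the informed host this gives 1/3 for door A and 2/3 for door C; with the ignorant host, doors A and C both end up at 1/2, which matches the intuition from the Perception section.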

Another explanation, from Numberphile, is also interesting. At the beginning, the probability of each door having the car is 1/3. However, after door B is eliminated, its probability is “pushed” to the remaining door C. In other words, door C now holds the combined probability of doors B and C, which is 2/3.
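The “pushing” argument can also be written as a direct enumeration. A minimal sketch, assuming the player starts on door 0 and each car position has probability 1/3: staying wins only when the first pick was right, and switching wins in every other case.

```python
from fractions import Fraction

start = 0  # the player's first pick
stay_win = Fraction(0)
switch_win = Fraction(0)
for car in range(3):
    p = Fraction(1, 3)  # each car position is equally likely
    if car == start:
        stay_win += p  # staying keeps the car
    else:
        # The host opens the one remaining empty door,
        # so the only door left to switch to hides the car.
        switch_win += p

print("stay:", stay_win, "switch:", switch_win)  # stay: 1/3 switch: 2/3
```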

Implication

Despite the confirmation from both the analysis and the simulation, I admit that it is still difficult to accept this valid solution. No matter what the fancy mathematics says, when facing a closed door with a serial killer (possibly) standing behind it, my mind will always tell me that the chance of dying is 50%. I guess our perception is much less accurate than we expect. Therefore, whenever I see a pattern or a cluster, or I am about to remark that an event is “likely” or “impossible”, I should probably ask statistics first.

But then again, there are “lies, damned lies, and statistics”.
