Maximum Entropy and Bayesian Inference for the Monty Hall Problem

We devise an approach to Bayesian statistics and its application to the analysis of the Monty Hall problem. We combine knowledge gained through applications of the Maximum Entropy Principle and Nash equilibrium strategies to provide results on Bayesian approaches that are specific to the Monty Hall problem. We use a model to describe Monty’s decision process and clarify that Bayesian inference results in an “irrelevant, therefore invariant” hypothesis. We discuss the advantages of Bayesian inference over frequentist inference in tackling the uneven prior probability Monty Hall variant. We demonstrate that the use of Bayesian statistics conforms to the Maximum Entropy Principle in information theory and that the Bayesian approach successfully resolves dilemmas in the uneven probability Monty Hall variant. Our findings have applications in decision making, information theory, bioinformatics, quantum game theory and beyond.


Introduction
The famous Monty Hall problem arises from the popular television game show Let's Make a Deal [1] [2]. In the game, a contestant faces three closed doors. One of the closed doors conceals a brand new car, whereas the other two doors conceal goats that are worthless. To start with, the show host, Monty, asks the contestant to choose a door as the initial guess. After the contestant chooses a door, Monty opens another door that conceals a goat. Then, Monty offers the contestant the option to stick with the door selected initially, or switch to the other closed door. An interesting question arises as to what are the advantages and disadvantages of frequentist and Bayesian statistical techniques in incomplete information games such as the Monty Hall problem, and what is the potential impact of these approaches on decision-making in a variety of scientific areas.
As a mathematical problem, it is important to clarify the rules of the game, which do not necessarily parallel realistic game show situations. For the Monty Hall problem, Monty is required to open a goat-yielding door not chosen by the contestant under all circumstances [1]-[3]. As such, even though the winning opportunity appears to be 50:50 for the two remaining doors, the counter-intuitive answer is that the contestant should switch. The update of probability, conditional upon the constraint that Monty cannot open the door selected by the contestant, is pivotal to recognizing the difference between the two closed doors [2]. When the contestant selects a door, there is a $1/3$ chance of being right and a $2/3$ chance of being wrong. After Monty opens a goat-yielding door, the door initially held by the contestant retains the same $1/3$ winning probability, whereas the other closed door now has a $2/3$ winning chance. Therefore, the contestant should switch to maximize the chance of winning the prize. In general, Bayesian inference indicates that the constraint mentioned above leads to dividing the probability space into two sets. One set contains the door initially chosen by the contestant, with prior probability $p_0$, and the other set contains the other doors, with probability $1 - p_0$. In the end, the decision depends merely on the comparison of $p_0$ and $1 - p_0$, since the goat-yielding actions are not supposed to change $p_0$. However, this conceivably simple Bayesian reasoning for the Monty Hall problem has been the subject of open debate [1].
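The 1/3 versus 2/3 split described above can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis: the function names `play_once` and `win_rate` are hypothetical, and the sketch assumes that Monty picks at random whenever two goat doors are available to him (this assumption does not affect the overall sticking and switching frequencies).

```python
import random


def play_once(rng, switch):
    """Play one standard Monty Hall game and report whether the contestant wins."""
    doors = [0, 1, 2]
    car = rng.choice(doors)      # car placed uniformly at random
    pick = rng.choice(doors)     # contestant's initial guess
    # Monty must open a goat door that is not the contestant's pick.
    # Assumption: when two such doors exist, he chooses between them at random.
    openable = [d for d in doors if d != pick and d != car]
    opened = rng.choice(openable)
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning frequency of always sticking or always switching."""
    rng = random.Random(seed)
    wins = sum(play_once(rng, switch) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stick :", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With a large number of trials the two estimates should settle near 1/3 and 2/3, respectively.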
The controversy focuses on the correct way of updating information between Bayesian and frequentist approaches to statistics [1]. Bayesian inference's assertion that the probability of the initially chosen door remains intact after Monty's goat-yielding action is precisely the point at issue. In fact, the frequentist approach indicates that the probability $p_0$ may change [1]. The frequentist inference entails that Monty makes his choice at random when given a selection of doors to open [1]-[9]. Frequentist and Bayesian inferences result in different answers for variants of the problem. For example, what if the prior probability of placing the car behind each door is not equal? What if, given a choice, Monty prefers to open one door over another? Given that the frequentist approach is used exclusively in the literature [1]-[11], it becomes necessary to develop Bayesian approaches for the Monty Hall problem.
In this paper, we discuss and analyze a few variations of the Monty Hall problem to clarify the difference between Bayesian and frequentist inferences. We model the problem as an incomplete information game in which Monty and the contestant have opposing interests [10]-[12]. The solution of the Monty Hall game involves the determination of Nash equilibrium strategies, one for the contestant and the other for Monty [13]. We employ the Maximum Entropy Principle [14] to devise the optimal strategy for Monty, which corresponds to an "irrelevant, therefore invariant" theorem [1] [13]. In contrast, the "coin-flipping" procedure [1] [2] is shown to be unjustified for an uneven variant of the Monty Hall problem. Our findings not only completely resolve the Monty Hall problem and its variants, but also have applications in decision making, information theory [14], bioinformatics [15]-[17], quantum game theory [11] [12] and beyond.

Bayes' Theorem
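Bayes' theorem serves as an approach to statistical inference by means of conditional probabilities. Bayes' theorem states that for two events $C$ and $O$, the probability of $C$ given $O$, $P(C \mid O)$, is

$$P(C \mid O) = \frac{P(O \mid C)\,P(C)}{P(O)}. \tag{1}$$

We define $C_A$, $C_B$, and $C_C$ to be the events in which the car is placed behind door A, B, and C, respectively, and $O_A$, $O_B$, and $O_C$ to be the events in which Monty opens door A, B, and C, respectively. With the contestant initially selecting door A, we define $q = P(O_B \mid C_A)$ (so that $P(O_C \mid C_A) = 1 - q$) as a bias parameter ranging from 0 to 1, reflecting the likelihood that Monty exhibits a bias in favor of selecting door B (or C) if the car is behind door A.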
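Given that the car is equally likely to be behind door A, B, or C, we have $P(C_A) = P(C_B) = P(C_C) = 1/3$. We evaluate the probability for Monty to open door B using the law of total probability,

$$P(O_B) = P(O_B \mid C_A)P(C_A) + P(O_B \mid C_B)P(C_B) + P(O_B \mid C_C)P(C_C) = \frac{q}{3} + 0 + \frac{1}{3} = \frac{1+q}{3}. \tag{2}$$

Substituting everything into Bayes' theorem, we obtain

$$P(C_A \mid O_B) = \frac{q}{1+q}, \qquad P(C_C \mid O_B) = \frac{1}{1+q}. \tag{3}$$

Similarly for Monty opening door C, we have

$$P(O_C) = \frac{1-q}{3} + \frac{1}{3} + 0 = \frac{2-q}{3}, \tag{4}$$

$$P(C_A \mid O_C) = \frac{1-q}{2-q}, \qquad P(C_B \mid O_C) = \frac{1}{2-q}. \tag{5}$$

The frequentist inference estimates Monty's decision procedure $q$ by means of long-run success frequencies [5]-[9], and subsequently determines whether to stick or switch by evaluating these conditional probabilities. As seen from Equation (3) and Equation (5), one can infer that switching is optimal for the contestant in that $P(C_C \mid O_B) \ge P(C_A \mid O_B)$ and $P(C_B \mid O_C) \ge P(C_A \mid O_C)$ for all $0 \le q \le 1$. Bayesian inference uses expected values to determine the optimal option:

$$E_{\text{stick}} = P(C_A \mid O_B)P(O_B) + P(C_A \mid O_C)P(O_C) = \frac{q}{3} + \frac{1-q}{3} = \frac{1}{3}, \tag{6}$$

$$E_{\text{switch}} = P(C_C \mid O_B)P(O_B) + P(C_B \mid O_C)P(O_C) = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}. \tag{7}$$

The calculated expected values for sticking and switching are independent of $q$, thereby supporting the argument that rational decision in the single case is not relevant to the degrees of belief about long-run success frequency [4]-[9]. For a single game, Monty's goat-yielding action is realized by selecting a discrete value of $q = 1$ or $0$. The frequentist approach typically uses a "coin-flipping" procedure for the bias parameter $q$. On the other hand, the expected value includes contributions from both $O_B$ and $O_C$. If $P(C_B) = P(C_C)$, then the frequentist and Bayesian approaches lead to the same result. However, if, e.g., $P(C_B) \neq P(C_C)$, then the two approaches yield different answers. To gain an in-depth understanding of the differences between frequentist and Bayesian inferences, we employ tools from information theory to calculate the conditional entropy.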
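The dependence of the posteriors on $q$, and the independence of the expected values from $q$, can be verified with exact rational arithmetic. The sketch below is illustrative only: `monty_posteriors` and `expected_stick` are hypothetical names, and it assumes equal priors with the contestant holding door A and $q = P(O_B \mid C_A)$.

```python
from fractions import Fraction


def monty_posteriors(q):
    """Posterior car probabilities when the contestant holds door A.

    q is P(O_B | C_A), Monty's chance of opening B when the car is behind A.
    """
    q = Fraction(q)
    third = Fraction(1, 3)
    # Law of total probability: Monty must open B if the car is behind C,
    # and must open C if the car is behind B.
    p_open_b = q * third + third
    p_open_c = (1 - q) * third + third
    return {
        "P(O_B)": p_open_b,
        "P(O_C)": p_open_c,
        "P(C_A|O_B)": q * third / p_open_b,
        "P(C_C|O_B)": third / p_open_b,
        "P(C_A|O_C)": (1 - q) * third / p_open_c,
        "P(C_B|O_C)": third / p_open_c,
    }


def expected_stick(q):
    """Expected winning chance of sticking, averaged over Monty's two actions."""
    p = monty_posteriors(q)
    return p["P(C_A|O_B)"] * p["P(O_B)"] + p["P(C_A|O_C)"] * p["P(O_C)"]


if __name__ == "__main__":
    for q in ("0", "1/4", "1/2", "1"):
        print(q, monty_posteriors(q)["P(C_A|O_B)"], expected_stick(q))
```

The single conditional probability $P(C_A \mid O_B)$ moves with $q$, while the expected value of sticking stays at 1/3 for every $q$ tried.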

Conditional Entropy
In information theory, Shannon entropy is the average amount of information contained in each event. If $P(x_i)$ is the probability of event $x_i$, the Shannon entropy is $S = -\sum_i P(x_i) \log_2 P(x_i)$. Shannon entropy also measures uncertainty about an event. The less likely an event is, the more information it provides when the event occurs, and vice versa.
The combined system $OC$ is characterized by a joint probability $P(O, C)$ and therefore by a joint entropy

$$S(O, C) = -\sum_{O, C} P(O, C) \log_2 P(O, C),$$

which is the uncertainty about the entire system. The conditional entropy is defined as

$$S(C \mid O) = -\sum_{O, C} P(O, C) \log_2 P(C \mid O),$$

where $P(C \mid O) = P(O, C)/P(O)$. In the literature on the Monty Hall problem, few have brought up the concept of conditional entropy. We utilize the Maximum Entropy Principle [14] to infer the rational decision by Monty from the information theory perspective. The Maximum Entropy Principle [14] refers to finding the distribution that represents the highest uncertainty. In the Monty Hall problem, the principle resembles the premise that Monty can follow a strategy that provides the contestant with minimal information. The use of the Maximum Entropy Principle reproduces every aspect of Bayesian inference and demonstrates the compatibility of Bayesian and entropy methods.
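As an illustration of the definition, the conditional entropy can be computed directly from a joint probability table. This is a sketch under stated assumptions: the helper `conditional_entropy` and the table `coin_flip_joint` are hypothetical, entropies are measured in bits, and the example table encodes the equal-prior game with the contestant on door A and Monty choosing at random ($q = 1/2$).

```python
from math import log2


def conditional_entropy(joint):
    """Return S(C|O) in bits for a joint table {(o, c): P(o, c)}."""
    # Marginal P(o), obtained by summing the joint over c.
    marginal = {}
    for (o, _c), p in joint.items():
        marginal[o] = marginal.get(o, 0.0) + p
    # S(C|O) = -sum P(o, c) * log2 P(c|o); zero-probability cells contribute 0.
    total = 0.0
    for (o, _c), p in joint.items():
        if p > 0:
            total -= p * log2(p / marginal[o])
    return total


# Keys are (door Monty opens, door hiding the car).
coin_flip_joint = {
    ("B", "A"): 1 / 6,  # car behind A, Monty opens B
    ("C", "A"): 1 / 6,  # car behind A, Monty opens C
    ("C", "B"): 1 / 3,  # car behind B, Monty must open C
    ("B", "C"): 1 / 3,  # car behind C, Monty must open B
}

if __name__ == "__main__":
    print(round(conditional_entropy(coin_flip_joint), 3))
```

For this table every posterior is a (1/3, 2/3) split, so the result is the entropy of that split, roughly 0.918 bits.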

Nash Equilibrium
What is the rational response in the case where the host adopts a strategy that minimizes the winning chance of the contestant [11]-[13]? To answer this question, we need to consider the connection between the strategies chosen by both the host and the contestant. In the Monty Hall problem, Monty's decision procedure is characterized by a bias parameter $q$, and the contestant's decision process is described by a parameter $p$, which is the probability of sticking. The Nash equilibrium recognizes those combinations of strategies for the contestant and host in which each is making the optimal choice given the choice that the other makes. A Nash equilibrium study allows for a thorough evaluation of not only optimal strategies but also sub-optimal strategies. Nash equilibrium strategies are self-enforcing because neither player can do better by unilaterally deviating from the Nash equilibrium. Our analysis of Nash equilibrium strategies succeeds in resolving the controversy over dilemmas appearing in the variants of the Monty Hall problem.

Results and Discussion
Preceding investigations [1]-[11] into the Monty Hall problem focused predominantly on the winning chances of sticking and switching, which arise from differences between free choice (unconditional probability) and restricted choice (conditional probability). Depicted in Figure 1 are results from the so-called free choice and restricted choice scenarios. Free choice refers to the case where Monty may open a goat-yielding door even if it is selected by the contestant [1]-[3]. It is then easy to show that the winning chance for sticking or switching strategies becomes 50:50. In contrast, with restricted choice, the winning chance for sticking and switching is $1/3$ and $2/3$, respectively [1].
The "coin-flipping" assumption has been used exclusively in the literature [1]-[11]. Coin-flipping entails that Monty chooses randomly from his options when door A conceals the car ($P(O_B \mid C_A) = P(O_C \mid C_A)$), which amounts to $q = 1/2$. In other words, Monty may open either B or C, and the approximation requires him to show no preference for either door. With the use of "coin-flipping", the subsequent probabilities can be extracted from Equation (3) or Equation (5). In either case ($O_B$ or $O_C$), the winning probability of sticking is $1/3$, and that of switching is $2/3$. Figure 2 displays the calculated conditional entropy as a function of the bias parameter,

$$S(q) = -\sum_{j \in \{B, C\}} P(O_j) \sum_{i} P(C_i \mid O_j) \log_2 P(C_i \mid O_j).$$

As seen in Figure 2, the maximum entropy of the symmetrical curve is at $q = 1/2$, whereas the minimum entropy endpoints are at $q = 0$ or $q = 1$. In the absence of information about the selection process used by Monty, the expected values of winning the prize for sticking or switching are independent of the process (see Equation (6) and Equation (7)), and coincide with the likelihood assessed using the frequentist approach with $q = 1/2$. Suppose that Monty chooses $q = 0$ (or $q = 1$) and informs the contestant accordingly before the game starts; we then have a strongly biased scenario as a variant of the original Monty Hall problem [6]. In this situation, there is a $1/3$ chance that the contestant knows exactly the location of the car, namely if Monty opens the non-preferred door. In the case that Monty opens the preferred door, the contestant has a 50:50 winning chance for either sticking or switching. It is important to note that there is a difference between the case where Monty informs the contestant of his decision process before a door is chosen and the case in which he informs the contestant after.
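The shape of the entropy curve described for Figure 2 can be reproduced with a few lines. The following is a sketch, not the authors' code: `binary_entropy` and `entropy_even_model` are hypothetical helpers, entropies are in bits, and the model assumes equal priors with the contestant on door A, so that after Monty acts only two doors remain possible and each posterior is a two-outcome distribution.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def entropy_even_model(q):
    """Conditional entropy S(q) for equal priors with the contestant on door A."""
    p_open_b = (1 + q) / 3
    p_open_c = (2 - q) / 3
    stick_given_b = q / (1 + q)        # P(C_A | O_B)
    stick_given_c = (1 - q) / (2 - q)  # P(C_A | O_C)
    return (p_open_b * binary_entropy(stick_given_b)
            + p_open_c * binary_entropy(stick_given_c))


# Scan q on a grid of step 0.01 and locate the entropy maximum.
grid = [i / 100 for i in range(101)]
q_at_max = max(grid, key=entropy_even_model)

if __name__ == "__main__":
    print("argmax q:", q_at_max)
    print("S(0), S(1/2), S(1):",
          entropy_even_model(0.0), entropy_even_model(0.5), entropy_even_model(1.0))
```

On this grid the maximum sits at $q = 1/2$, and both endpoints give 2/3 of a bit, reflecting the 1/3 chance that Monty's action reveals the car's location outright.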
In the latter case, Monty has gained additional information, which is not accounted for by frequentist inferences. The frequentist approach entails applying a fixed model to the inference [1]. Consequently, the "coin-flipping" assumption ($q = 1/2$) has been used exclusively in the literature for the solution of variants of the Monty Hall problem [1]-[11]. However, Monty is not compelled to flip a coin. In reality, Monty should adopt strategies to reduce the contestant's winning chance. For Bayesian inference, although the argument based on the partition of the probability space has been elaborated on before, there is a striking lack of analysis of its underlying ramifications. Converting Bayesian inference (the probability that the initially selected door conceals the car is invariant under the event of Monty opening a goat-yielding door) into conditional probability formalism, we have:

$$P(C_A \mid O_B) = P(C_A \mid O_C) = P(C_A).$$

This indicates that the events $C_A$ and $O_B$ are independent; so are $C_A$ and $O_C$. This independence also implies that $P(O_B \mid C_A) = P(O_B)$ and $P(O_C \mid C_A) = P(O_C)$. Using the law of total probability, we can calculate the bias parameter $q$:

$$P(O_B) = P(O_B \mid C_A)P(C_A) + P(O_B \mid C_C)P(C_C) = P(O_B)P(C_A) + P(C_C), \qquad q = P(O_B \mid C_A) = P(O_B) = \frac{P(C_C)}{1 - P(C_A)}. \tag{11}$$
There exists a bias parameter q that characterizes the result of Bayesian inference, which facilitates an assessment of epistemic and statistical probabilities for Bayesian and frequentist inferences [4]- [9].
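The bias parameter implied by the independence condition can be evaluated for any prior. The sketch below is illustrative: `bayesian_bias` and `posterior_selected` are hypothetical names, and the uneven prior (1/2, 1/3, 1/6) is used here only as an example input. It solves $P(O_{\text{opened}}) = q$ with Monty forced to open a given door whenever the car is behind the remaining one, and then confirms that the posterior of the selected door equals its prior.

```python
from fractions import Fraction


def bayesian_bias(priors, selected, opened):
    """q = P(Monty opens `opened` | car behind `selected`) under the
    independence condition P(C_selected | O_opened) = P(C_selected)."""
    (remaining,) = [d for d in priors if d not in (selected, opened)]
    # Monty is forced to open `opened` whenever the car is behind `remaining`,
    # so P(O_opened) = q * P(C_selected) + P(C_remaining). Setting this equal
    # to q and solving for q gives the ratio below.
    return priors[remaining] / (1 - priors[selected])


def posterior_selected(priors, selected, opened, q):
    """P(C_selected | O_opened) from Bayes' theorem for a given bias q."""
    (remaining,) = [d for d in priors if d not in (selected, opened)]
    p_opened = q * priors[selected] + priors[remaining]
    return q * priors[selected] / p_opened


even = {d: Fraction(1, 3) for d in "ABC"}
uneven = {"A": Fraction(1, 2), "B": Fraction(1, 3), "C": Fraction(1, 6)}

if __name__ == "__main__":
    for name, priors in (("even", even), ("uneven", uneven)):
        q = bayesian_bias(priors, "A", "B")
        print(name, q, posterior_selected(priors, "A", "B", q))
```

For equal priors the condition returns $q = 1/2$, which is the coin-flipping value; for unequal priors it generally does not.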
A few remarks are immediately in order. 1) As a conditional probability problem, it is important to remove ambiguities through the rules of the game [1]-[3]. The following rules are necessary: Monty by no means opens the door the contestant chose initially, and Monty always opens a door concealing a goat [1] [2]. The former is essential for the partition of the probability space. The latter is important as well, lest Monty limit the winning chance of the contestant to $1/3$ by ending the game prematurely [10]; 2) If the contestant initially chooses the correct door, Monty then faces a choice as to which door to open. The frequentist approach uses a "coin-flipping" method [1]. In the Bayesian approach, however, Monty can choose to open a door in such a way that no additional relevant information is given to the contestant; 3) For two non-selected doors with the same prior winning probability, the two approaches find the same solution. However, the reasoning behind each answer varies. In the frequentist inference, $q$ is assumed to be $1/2$ [1], whereas Bayesian inference determines the value of $q$ to be $1/2$. When the non-selected doors have different prior winning probabilities, the frequentist and Bayesian inferences lead to different results.
To illustrate the differences, we examine an uneven probability model in which the odds of a car being behind a door are not uniform. A case of an unequal prior model could occur as follows. Assume Monty rolls a standard, 6-sided die. If he rolls a 1, 2, or 3, then the car is placed behind door A. If he rolls a 4 or 5, the car is placed behind door B. If he rolls a 6, the car is placed behind door C. Therefore, the winning probabilities for doors A, B, and C are $1/2$, $1/3$, and $1/6$, respectively. The contestant consequently rolls the die to select a door as the contestant's initial choice. Shown in Figure 3 [...] the contestant switches to door B and wins. This counter-strategy limits the contestant's winning chance to $1/3$, instead of $1/2$ for a mixed strategy of randomly sticking or switching with equal probability.
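The two winning chances quoted above can be checked with a short calculation. This sketch rests on assumptions that go beyond the text: the contestant is taken to hold door A, $q = P(O_B \mid C_A)$ denotes Monty's bias, and the function name `win_probability` is hypothetical. The rule "stick after B is opened, switch after C is opened" is our reading of what a contestant who plans for $q = 1/2$ would do with priors (1/2, 1/3, 1/6).

```python
from fractions import Fraction

# Priors from the die roll; the contestant is assumed to hold door A.
P_A, P_B, P_C = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)


def win_probability(q, stick_if_b, stick_if_c):
    """Contestant's winning chance against a Monty with bias q = P(O_B | C_A).

    stick_if_b / stick_if_c are the probabilities of sticking with door A
    after Monty opens door B / door C.
    """
    q = Fraction(q)
    stick_if_b = Fraction(stick_if_b)
    stick_if_c = Fraction(stick_if_c)
    # Car behind A: sticking wins, whichever door Monty opens.
    win = P_A * (q * stick_if_b + (1 - q) * stick_if_c)
    # Car behind B: Monty must open C, and switching wins.
    win += P_B * (1 - stick_if_c)
    # Car behind C: Monty must open B, and switching wins.
    win += P_C * (1 - stick_if_b)
    return win


if __name__ == "__main__":
    # Contestant planned for q = 1/2; Monty answers with q = 0.
    print(win_probability(0, stick_if_b=1, stick_if_c=0))
    # Mixed strategy: stick or switch with equal probability.
    print(win_probability(0, stick_if_b="1/2", stick_if_c="1/2"))
```

Against $q = 0$ the coin-flipping plan wins only when the car is behind door B, whereas the evenly mixed strategy wins half of the time regardless of $q$.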
In fact, the case where $P(O_A) = 0$ allows for a straightforward evaluation of the Nash equilibrium strategy. We have

$$P(O_B) = \frac{1+3q}{6}, \quad P(C_A \mid O_B) = \frac{3q}{1+3q}, \quad P(C_C \mid O_B) = \frac{1}{1+3q},$$

$$P(O_C) = \frac{5-3q}{6}, \quad P(C_A \mid O_C) = \frac{3(1-q)}{5-3q}, \quad P(C_B \mid O_C) = \frac{2}{5-3q}.$$

As a consequence, if $q < 1/3$, the contestant should switch for $O_B$ and stick for $O_C$; whereas if $q > 1/3$, the contestant should stick for $O_B$ and switch for $O_C$. In either case the contestant's winning chance exceeds $1/2$; it equals $1/2$ only at $q = 1/3$. Consequently, Monty's dominant strategy is the bias parameter $q^* = 1/3$ (for which $P(C_A \mid O_B) = P(C_A \mid O_C) = 1/2$). The result of $q^* = 1/3$ is in stark contrast to the "coin-flipping" that assumes $q = 1/2$. Therefore, an extension of the "coin-flipping" to variants of the Monty Hall problem is unjustified. In contrast, using Bayesian inference (cf. the derivation of Equation (11)), we have

$$q = P(O_B) = \frac{P(C_C)}{1 - P(C_A)} = \frac{1/6}{1/2} = \frac{1}{3}.$$

Therefore, the rational decision procedure inferred from the Bayesian approach conforms to the Nash equilibrium strategy. Furthermore, we have $P(C_A \mid O_B) = P(C_A \mid O_C) = P(C_A)$. To further pursue this point, we calculate the conditional entropy as a function of the bias parameter $q$ for the initial unequal probability model,

$$S(q) = -\sum_{j} P(O_j) \sum_{i} P(C_i \mid O_j) \log_2 P(C_i \mid O_j).$$

As seen in Figure 4, the maximum entropy for each instance is at $q = 1/3$, $1/4$, and $2/5$ for $O_A$, $O_B$, and $O_C$, respectively. It is worth noting that the Bayesian approach yields the maximum entropy value of $q$, which corresponds to the rational strategy for Monty. We summarize in Table 1 the extracted values of conditional entropy for different bias parameters. The entropy at $q^*$ is larger than the corresponding value at the "coin-flipping" $q = 1/2$ for the uneven probability model. We have demonstrated that Bayesian inference corresponds to the Maximum Entropy solution for the calculated conditional entropy. As a result, the Bayesian approach can be employed to tackle the variants of the Monty Hall problem. In contrast, if the contestant miscalculates rational decisions by the "coin-flipping" model, we have demonstrated that there exists a counter-strategy for Monty to take advantage of the situation.
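The coincidence of the maximum entropy point and the equilibrium bias can be examined numerically for the die-roll priors. The sketch below is not the authors' code: the helper names are hypothetical, entropies are in bits, and it assumes the contestant holds door A with $q = P(O_B \mid C_A)$. It scans $q$ on a grid, locating both the entropy maximum and the value of $q$ that minimizes the winning chance of a contestant who knows $q$ and always picks the likelier door.

```python
from math import log2

P_A, P_B, P_C = 1 / 2, 1 / 3, 1 / 6  # die-roll priors; contestant holds A


def binary_entropy(p):
    """Entropy in bits of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def entropy_uneven(q):
    """Conditional entropy S(q) in bits, with q = P(O_B | C_A)."""
    p_open_b = q * P_A + P_C
    p_open_c = (1 - q) * P_A + P_B
    return (p_open_b * binary_entropy(q * P_A / p_open_b)
            + p_open_c * binary_entropy((1 - q) * P_A / p_open_c))


def best_response_win(q):
    """Winning chance of a contestant who knows q and picks the likelier door."""
    after_b = max(q * P_A, P_C)        # stick vs switch once B is open
    after_c = max((1 - q) * P_A, P_B)  # stick vs switch once C is open
    return after_b + after_c


grid = [i / 300 for i in range(301)]   # includes q = 1/3 and q = 1/2
q_max_entropy = max(grid, key=entropy_uneven)
q_min_win = min(grid, key=best_response_win)

if __name__ == "__main__":
    print("entropy is maximal at q =", q_max_entropy)
    print("best-response win is minimal at q =", q_min_win)
    print("S(1/2), S(1/3):", entropy_uneven(0.5), entropy_uneven(1 / 3))
```

Under these assumptions both scans single out $q = 1/3$, where the entropy reaches one bit and the informed contestant wins only half of the time; at $q = 1/2$ the entropy is slightly lower, close to the corresponding Table 1 entry.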

Conclusions
In conclusion, we have devised a Bayesian inference approach for a systematic exploration of rational decisions for variants of the Monty Hall problem. The method employs the Maximum Entropy Principle [14]. Our results show that the "coin-flipping" assumption is unfounded in several variants of the Monty Hall problem. In contrast, Bayesian inference considers many different perspectives in updating the probability to avoid inference bias.
The frequentist inference determines the winning probability using a single conditional probability. Bayesian inference estimates the winning probability using expected values that are weighted averages of individual conditional probabilities. Bayesian inference considers Monty's decision process with respect to the change of information at the various stages of the game. We examined a few variants of the Monty Hall problem and showed that the "coin-flipping" assumption was in general not consistent with maximum entropy solutions. Our analysis of the uneven prior probability Monty Hall variant reveals fallacies in the "coin-flipping" assumption, thereby providing convincing evidence that Bayesian inference is appropriate in tackling Monty Hall-like conditional probability problems. We believe that our findings shed light on the application of Bayesian inferences and the Maximum Entropy Principle in quantum Monty Hall problems [11] [12], which deserve further development in the future. We remark, before closing, that the approaches developed in this paper can be applied to a variety of emerging fields, notably Big Data and Bioinformatics. Bayesian inference has some appealing features, including the capability of describing complex data structures, characterizing uncertainty, and providing comprehensive estimates of parameter values and comparative assessments. Bayesian methodology can be employed as a comprehensible means of integrating all available sources of information and of accounting for missing data. There is also great benefit in using the Bayesian approach as a mechanism for integrating mathematical models and advanced computational algorithms.

Table 1
              Original model   $O_A$    $O_B$    $O_C$
$q = 1/2$     0.918            0.978    0.874    0.645
$q^*$         0.918            1.000    0.918    0.650

Figure 1. The probability distribution of prior (top panel), free-choice (bottom left), restricted choice with Bayesian inference, and restricted choice with frequentist approach (bottom right).

Figure 2. Calculated Shannon entropy as a function of the parameter $q$ for $P(C_A) = P(C_B) = P(C_C) = 1/3$. Insets: the winning probability of the contestant initially selecting the green door and Monty opening either the light or dark grey door, respectively.

Figure 3. Probability distribution changes from the prior to the restricted choice with the frequentist inference (left panel) and to that with Bayesian inference (right panel), respectively.

Table 1. Extracted Shannon entropy $S$ for the original model, and for the $O_A$, $O_B$, and $O_C$ instances of the uneven probability model.

Monty's option to open a door is restricted in that if the car is behind B or C, then the host has to open door C or B, respectively.

We have both $O_B$ and $O_C$, irrespective of whether in reality $O_B$ or $O_C$ occurs. In other words, Bayesian inference takes into account not only what happened but also what could happen.