Epsilon Intelligence
Epsilon Intelligence (EI) is based on Bayesian inference and a probabilistic model. The key part that distinguishes EI from ZI (Zero Intelligence) is the Bayesian inference.
If the math does not render properly, you might consider using the MathJax Plugin for GitHub.
General Idea:
EI observes each bet made by the other players and infers the distribution of their dice results. It then makes decisions exactly as ZI does.
Bayesian Inference:
It is somewhat paradoxical to do Bayesian inference here, because we would need to know the principles of the other players' decision-making, which is itself the ultimate purpose of this AI. To make the task feasible, we make assumptions about the other players' decision policies.
Simple assumption & shortcomings:
Our first idea would be the simple assumption made in ZI: the other players are simple-minded, will call "liar" once the bet exceeds a certain level $\lambda$, and otherwise will raise the bet to the best option. There is a fatal shortcoming, though. For example, suppose the bet is $[5,5]$; based on this principle, a player would never raise the bet to $[7,5]$, because $[6,5]$ is "clearly" the better choice, since the "payoff" of $[6,5]$ is always larger than that of $[7,5]$. But players do skip bets, and it does not always seem to be a bad idea. What goes wrong here, then? The problem lies in the definition of "payoff". It is not really a payoff; it is actually a reward function. And this reward function, as mentioned in the remarks of Zero Intelligence, is larger than what it is supposed to be: when a player raises the bet and is not questioned, that does NOT mean the end of the round. If the bet is not large enough or the number of players is small, the game continues and can come back to that player again. Therefore it makes sense for some players to skip bets, and when that happens, we cannot make Bayesian inference under the simple assumption.
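For concreteness, here is a minimal sketch of the deterministic responder implied by the simple assumption. The helper names (`z_score`, `reward`, `eligible_raises`) are hypothetical and not taken from the repository; the point is only that an argmax over the reward never skips to a dominated bet such as $[7,5]$ when $[6,5]$ is available.

```python
# Sketch of the "simple assumption": call "liar" once the bet's z-score exceeds
# lambda_, otherwise raise to the single bet with the largest reward.
# z_score, reward, eligible_raises are hypothetical callables, not repo code.

def simple_response(current_bet, my_dice, lambda_, eligible_raises, z_score, reward):
    """Return 'liar' or the raise with the largest reward."""
    if z_score(current_bet, my_dice) > lambda_:
        return "liar"
    # Deterministic argmax: a dominated bet like [7, 5] is never chosen when
    # [6, 5] has strictly larger reward, so this model never "skips" bets.
    return max(eligible_raises(current_bet), key=lambda b: reward(b, my_dice))
```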
Stochastic response:
How do we model players' response criteria, then? One way is to model the response as a stochastic rather than deterministic decision process. Instead of choosing the "best" response, the player chooses among all eligible responses according to a distribution. That is,
Suppose the set of all possible rollouts is $\Re$ with $R \in \Re$, and the current bet is $B$. For a bet $\hat{B}>B$, if the z-score of $\hat{B}$ perceived by this player exceeds $\lambda$, then $T(R,\hat{B})=0$; otherwise, we have: $$ T(R,\hat{B})=\frac{\phi(\omega(\hat{B}))}{\sum\limits_{B'>B}\phi(\omega(B'))},$$ where $T$ is the transition probability and $\phi>0$ is a utility function. The Bayesian inference formula, given that the player's response is $\hat{B}$, is: $$ p(R|\hat{B})=\frac{p_R\cdot T(R,\hat{B})}{\sum\limits_{R'\in \Re} p_{R'} \cdot T(R',\hat{B})},$$ where $p_R$ is the prior probability and $p(R|\hat{B})$ is the posterior probability.
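A minimal sketch of how these two formulas might be computed, assuming rollouts and eligible raises can be enumerated. The helpers `z_score`, `omega`, and `phi` are placeholders for the quantities above, not the repository's actual API.

```python
import numpy as np

def transition_probs(raises, rollout, lambda_, z_score, omega, phi):
    """T(R, B_hat) for every eligible raise B_hat > B, under one rollout R."""
    raw = np.array([0.0 if z_score(rollout, b) > lambda_ else phi(omega(b))
                    for b in raises])
    total = raw.sum()
    # If every raise is ruled out, the player calls "liar" and no raise is observed.
    return raw / total if total > 0 else raw

def posterior(prior, rollouts, raises, observed_bet, lambda_, z_score, omega, phi):
    """p(R | B_hat) proportional to p_R * T(R, B_hat), normalized over rollouts."""
    idx = raises.index(observed_bet)
    likelihood = np.array([transition_probs(raises, r, lambda_, z_score, omega, phi)[idx]
                           for r in rollouts])
    unnorm = np.asarray(prior) * likelihood
    return unnorm / unnorm.sum()
```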
Naive Bayes and Strong independence:
If a player makes two different bets $B_1$ and $B_2$, we can infer whether this player has a strong hand on certain denominations. Therefore, a player might want to conceal this information by intentionally making bets based not just on the current bet but also on previous bets, or even by faking it. We make the assumption here that the bets made by players depend solely on the current state, i.e., are independent of the previous bets. Then we have:
$$T(R,B_1,B_2)=T(R,B_1)\cdot T(R,B_2|B_1)=T(R,B_1)\cdot T(R,B_2)$$
This formula allows us to make the Bayesian inference incrementally, building on the previous posterior:
$$p(R|B_1,B_2)=\frac{p_R\cdot T(R,B_1,B_2)}{\sum\limits_{R'\in\Re}p_{R'}\cdot T(R',B_1,B_2)}=\frac{p_R\cdot T(R,B_1)\cdot T(R,B_2)}{\sum\limits_{R'\in\Re}p_{R'}\cdot T(R',B_1)\cdot T(R',B_2)}=\frac{p(R|B_1)\cdot T(R,B_2)}{\sum\limits_{R'\in\Re}p(R'|B_1)\cdot T(R',B_2)}.$$
The last equality holds because we can divide both the numerator and the denominator by $\sum\limits_{R'\in\Re}p_{R'}\cdot T(R',B_1).$
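The practical consequence is that the posterior after $B_1$ can simply be reused as the prior when $B_2$ arrives. A minimal sketch, assuming a hypothetical helper `likelihood_of(bet)` that returns the vector $T(R,\mathrm{bet})$ over all rollouts:

```python
import numpy as np

def update_belief(belief, observed_bet, likelihood_of):
    """One Bayesian step: belief holds p(R | bets so far); returns p(R | ..., observed_bet)."""
    unnorm = np.asarray(belief) * likelihood_of(observed_bet)
    return unnorm / unnorm.sum()

# Usage: fold a player's successive bets into a single belief vector.
# belief = prior.copy()
# for bet in (B1, B2):
#     belief = update_belief(belief, bet, likelihood_of)
```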
Remark:
With the above formula, we can update the player's belief about the other players' dice distributions at each step.