Zero Intelligence
AI Zero Intelligence (ZI) is based on statistical assumptions and z-scores. This AI is largely deterministic.
If the math format doesn't render properly, you might consider using the MathJax Plugin for Github.
The result of the player's own dice is known, while those of the others follow a binomial distribution, which can be approximated by the normal distribution. So the idea is to calculate the z-score of a bet: if the z-score exceeds a predetermined level (the call level), the AI calls "liar"; otherwise it raises the bet. We'll show the calculation through an example.
Let's say we have 5 dice, and the total number of dice the other players have is 36. Suppose our roll is one 2 and four 3's, represented by the array [0,1,4,0,0,0]. Now for the 36 dice owned by the others, the mean is [6,12,12,12,12,12] (remember, 1 is wild!) and the standard deviation is [s1,s2,s2,s2,s2,s2], where $s_1=\sqrt{36\cdot\tfrac{1}{6}\cdot\tfrac{5}{6}}=\sqrt{5}$ and $s_2=\sqrt{36\cdot\tfrac{1}{3}\cdot\tfrac{2}{3}}=2\sqrt{2}$ (a hidden die counts toward a non-wild face with probability $\tfrac{1}{3}$, since 1 is wild).
Now suppose someone bets sixteen 3's, represented as [16,3]. Its z-score is calculated as follows:
my_expectation = [0,1,4,0,0,0] + [6,12,12,12,12,12] = [6,13,16,12,12,12]
and z_score = (16 + 0.5 - 16)/s2, where 0.5 is the error adjustment.
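For concreteness, here is a minimal Python sketch of this z-score computation. It is only an illustration of the conventions described above (binomial counts, 1 wild, the 0.5 error adjustment, and a higher z-score meaning a less plausible bet); the function and variable names are made up for this sketch and are not taken from the repository.

```python
import math

WILD = 0  # index of face 1, which is wild

def bet_z_score(own_counts, n_unknown, bet_count, bet_face):
    """z-score of a bet [bet_count, bet_face] given our own roll.

    own_counts : list of 6 ints, counts of faces 1..6 in our own roll
    n_unknown  : total number of dice held by the other players
    """
    face = bet_face - 1                      # 0-based index of the bet face
    # A hidden die counts toward a non-wild face with prob 1/3 (its own face
    # or a wild 1); toward face 1 itself with prob 1/6.
    p = 1 / 6 if face == WILD else 1 / 3
    mean = n_unknown * p
    std = math.sqrt(n_unknown * p * (1 - p))
    # Our own contribution; wild 1's also count toward a non-wild bet face
    # (the worked example has no 1's, so this matches the vector sum above).
    own = own_counts[face] + (own_counts[WILD] if face != WILD else 0)
    # 0.5 is the error adjustment from the text; larger z = less plausible bet.
    return (bet_count + 0.5 - (own + mean)) / std

# Worked example: one 2 and four 3's, 36 unknown dice, bet of sixteen 3's.
print(bet_z_score([0, 1, 4, 0, 0, 0], 36, 16, 3))  # ≈ 0.177 = 0.5 / s2
```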
Now suppose the previous bet does not exceed the call level; then we need to raise the bet, for which we have to calculate the payoff of each legitimate bet. So what is the model that describes the payoff? First, we need to know how the next player will react to our bet.
To understand the behavior of the next player B, we need to make assumptions about B. Here we'll make a simple assumption (or "hold the belief", in game-theory jargon) that the next player B will call "liar" if the bet's z-score exceeds a certain level $\lambda$.
Now suppose that player A considers raising the bet to $[r,d]$. Let $y_0$ be player B's call threshold determined by B's call level $\lambda$: B calls "liar" on $[r,d]$ exactly when B's own count $x$ of $d$'s satisfies $x\leq r-y_0$. Let $Z$ be the number of $d$'s (counting wilds) among the dice held by neither A nor B. Now we can calculate the payoff.
Suppose player A's rollout is $[s,d]$ (i.e., $s$ of A's dice count as $d$), player B's rollout has $X$ $d$'s, and $y_0$ and $Z$ are defined as above. Then player A's payoff $\omega$ from raising the bet to $[r,d]$ is
$$ \omega([r,d])=-\sum_{x\leq r-y_0}\mathbb{P}_X(X=x)\mathbb{P}_Z(Z<r-s-x)$$
or
$$ \omega([r,d]) =-\mathbb{E}_X I_{\{X\leq r-y_0\}} F_Z(r-s-1-X)$$
noticing that, since $Z$ is integer-valued, $\mathbb{P}_Z(Z<r-s-x)=\mathbb{P}_Z(Z\leq r-s-1-x)=F_Z(r-s-1-x)$, where $F_Z$ is the cumulative distribution function of $Z$.
Proof: It's straightforward, so we only explain the first equation. Player A will lose a die only when
- Player B calls "liar", i.e., $x\leq r-y_0$
- The bet $[r,d]$ is bad, that is, $s+x+Z<r$
In this case, for a given $x$, the probability of losing a die is $\mathbb{P}_Z(s+x+Z<r)$, and the total expected payoff is the (negated) weighted sum over $x$.
The real payoff is actually less than the payoff computed here, because as the game continues there is still further chance for player A to lose dice.
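As an illustration, here is a small Python sketch that evaluates this payoff with exact binomial distributions (via scipy). The function name, and the way B's threshold $y_0$ is passed in, are assumptions made for the sketch rather than the repository's API.

```python
from scipy.stats import binom

def raise_payoff(s, r, n_b, n_rest, y0, wild_face=False):
    """Expected payoff for player A of raising the bet to [r, d].

    s      : number of d's (counting wilds) in A's own roll
    r      : proposed bet count
    n_b    : number of dice player B holds
    n_rest : number of dice held by neither A nor B
    y0     : B's call threshold -- B calls "liar" when its own count x <= r - y0
    """
    p = 1 / 6 if wild_face else 1 / 3   # a hidden die counts toward d w.p. 1/3
    X = binom(n_b, p)                   # B's count of d's
    Z = binom(n_rest, p)                # everyone else's count of d's
    payoff = 0.0
    for x in range(n_b + 1):
        if x <= r - y0:                                   # B calls "liar" ...
            payoff -= X.pmf(x) * Z.cdf(r - s - 1 - x)     # ... and the bet was bad
    return payoff
```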
The general calculation is feasible, since everything is finite, but tedious. We'll instead work with the standard univariate normal distribution.
Notice that the following discussion deviates from the game in two aspects:
- We use the normal distribution, as opposed to the binomial distribution
- We discuss the one-dimensional case, rather than the 6-dimensional one (corresponding to the six faces of the dice)
Suppose
On the other hand, if A raises
Now if we set
Now in the case
This result is surprisingly good. In practice, we found this value almost optimal.
The robustness of AI Zero Intelligence can be somewhat expected, since the strategy takes nothing but the initial information into consideration. Other players' moves have no impact on its response. Once the dice are cast, the responses of this AI are completely determined (different versions of this AI may contain a certain level of randomness).
We need a metric to measure this robustness. Since we don't really know what our strategy is up against, and it's usually the good strategies that survive till the end, we use the worst-case payoff as the metric for robustness, as opposed to the expectation and variance.
It's generally hard, if not impossible, to figure out what exactly the worst-case scenario is. However, we can give a lower bound for the worst case. In this case, suppose that player A is up against other players who know A's strategy and collude against A.
Is this scenario the worst for player A? Will the act of collusion in some way help player A mysteriously? That is to say, will the collusion, whether it's known to player A or not, affect player A's response in such a way that it actually helps A objectively? The answer is no, because player A's response is determined at the beginning of the game; no information arriving later in the game is taken into account. Furthermore, the only information considered by this AI is the total number of dice out there, with no reference to the number of players whatsoever.
Now we can try to get a sense of the AI's worst performance.
First of all, let's concisely describe the game between the two players. Player A's (AI ZI) rollout will be summarized by a single number $x$, and player B's by a single number $y$ (the one-dimensional analogues of the dice counts), modeled as independent draws from symmetric densities $f_X$ and $f_Y$. A bet is a single number $z$; the bet is good exactly when $z\leq x+y$, and, following the z-score rule, player A calls "liar" as soon as the bet reaches $z=x+\lambda\sigma$.
Now let's figure out what player B would do, given the rollout $y$. There are two cases:
- $ y>\lambda\sigma$: player B never needs to call; B can simply keep raising until player A calls "liar" at $z=x+\lambda\sigma<x+y$, a call that is wrong, so player B wins. In general, waiting costs B nothing in this case.
- $ y< \lambda\sigma$: this is the interesting case, analyzed below.
Let's imagine a scenario where $ z\ll 0$. Player B is obviously not going to call "liar" immediately; instead, player B will wait, because the longer player B waits, the better chance player B has. However, this kind of patience has its downside, namely the risk that player A calls "liar" first. When player A calls "liar", player A wins, due to the fact that $ z=x+\lambda\sigma> x+y$. Therefore, there is a trade-off between the advantage gained from waiting and the risk that player A makes the first move.
More precisely, suppose the current bet is $K$ and player B plans to call "liar" when the bet reaches some level $z\geq K$. We know the following:
- at time $ z=K$, player A hasn't made a move yet, so $ x+\lambda\sigma>K$
- The payoff of a win is 1 and that of a loss is 0. Player B, by calling at $z$, wins iff:
- $ x+\lambda\sigma >z$
- $ x+y<z$
Therefore, the expected payoff is: $$ \omega_B(z)=\mathbb{E}_X(I_{\{z-\lambda\sigma<X<z-y\}}| X>K-\lambda\sigma)=\frac{\mathbb{P}_X(z-\lambda\sigma<X<z-y)}{\mathbb{P}_X(X>K-\lambda\sigma)}, z\geq K$$
And we immediately notice that to maximize $\omega_B(z)$ over $z\geq K$, we only need to maximize the numerator $\mathbb{P}_X(z-\lambda\sigma<X<z-y)$. The interval $(z-\lambda\sigma,\ z-y)$ has fixed length $\lambda\sigma-y$, so for a symmetric, unimodal $f_X$ the probability is largest when the interval is centered at $0$, i.e., at $z=\frac{y+\lambda\sigma}{2}$. Hence the optimal choice is:
- $ z= \frac{y+\lambda\sigma}{2},$ when $ K\leq \frac{y+\lambda\sigma}{2}$
- $ z=K, $ when $ K> \frac{y+\lambda\sigma}{2}$
That is, if the current bet $K\leq \frac{y+\lambda\sigma}{2}$, player B shall wait till $z=\frac{y+\lambda\sigma}{2}$. If $K>\frac{y+\lambda\sigma}{2}$ for whatever reason, player B shall make a move immediately. In conclusion, player B shall call "liar" at $z=\frac{y+\lambda\sigma}{2}$.
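As a quick numerical sanity check of this call point, here is a short sketch assuming a standard-normal $f_X$; the parameter values are illustrative assumptions, not taken from the original text.

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters (assumed): y < lam * sigma, current bet K.
lam, sigma, y, K = 1.0, 1.0, 0.3, -1.0

z = np.linspace(K, 3.0, 2001)
numer = norm.cdf(z - y) - norm.cdf(z - lam * sigma)  # P(z - lam*sigma < X < z - y)
denom = norm.sf(K - lam * sigma)                     # P(X > K - lam*sigma)
omega_B = numer / denom

print(z[np.argmax(omega_B)], (y + lam * sigma) / 2)  # both ≈ 0.65
```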
To summarize, player A makes a move (calls "liar") when the bet reaches $z=x+\lambda\sigma$, and player B makes a move at $z=\frac{y+\lambda\sigma}{2}$ (when $y<\lambda\sigma$; otherwise B simply waits).
Now we are ready to compute the expected payoff of player A.
Player A will win in the following two cases:
- $ y<\lambda\sigma$ and player A calls "liar" first, that is, $ x+\lambda\sigma<\frac{y+\lambda\sigma}{2}$
- $ y<\lambda\sigma$, player B calls "liar" first, incorrectly: $ x+\lambda\sigma>\frac{y+\lambda\sigma}{2}$ and $ x+y\geq z$, where $ z=\frac{y+\lambda\sigma}{2}$
These conditions are equivalent to the sets:
- $ \Omega_1=\{(x,y)|\quad y<\lambda\sigma, 2x+\lambda\sigma<y\}$
- $ \Omega_2=\{(x,y)|\quad y<\lambda\sigma, 2x+\lambda\sigma>y,2x+y\geq \lambda\sigma\}$
And the expected payoff for player A is: $$ \omega_A(\lambda)=\int_{\Omega}f(x,y)dxdy=2\int_0^\infty f_X(x)\int_{\lambda\sigma-2x}^{\lambda\sigma}f_Y(y)dydx$$ where $ \Omega=\Omega_1\cup\Omega_2, f(x,y)=f_X(x)f_Y(y)$.
- $ \Omega_1$ and $ \Omega_2$ are mirror images of each other under $x\mapsto -x$; since $f_X$ is symmetric, they carry equal probability, which accounts for the factor of 2.
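For a sanity check of this decomposition, here is a short Monte Carlo sketch of the two-player game under the standard-normal model; the parameter values and function name are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_worst_case(lam, sigma=1.0, n=200_000):
    """Monte Carlo estimate of P(A wins) in the simplified two-player game."""
    x = rng.standard_normal(n)          # A's (normalized) count
    y = rng.standard_normal(n)          # B's (normalized) count
    a_call = x + lam * sigma            # A calls "liar" when the bet reaches this
    b_call = (y + lam * sigma) / 2      # B's optimal call point (for y < lam*sigma)
    # If y > lam*sigma, B simply waits and wins when A calls.
    a_wins = (y < lam * sigma) & (
        (a_call < b_call)                              # A calls first, correctly
        | ((a_call >= b_call) & (x + y >= b_call))     # B calls first, incorrectly
    )
    return a_wins.mean()

print(simulate_worst_case(1.0))  # should match omega_A(1.0) from the integral above
```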
Noticing that $ \omega_A(\lambda)$ is a lower bound for the expected payoff of this strategy, we can tweak the call level $\lambda$ to maximize this lower bound; the optimal $\lambda$ can then be found numerically.
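A possible numerical sketch of this tuning step, again assuming the standard-normal model with $\sigma=1$ (the search range and library choices are assumptions, not the repository's code):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

SIGMA = 1.0  # standard-normal model, as in the text

def omega_A(lam):
    """Lower bound on AI ZI's expected payoff against a colluding opponent."""
    t = lam * SIGMA
    inner = lambda x: norm.pdf(x) * (norm.cdf(t) - norm.cdf(t - 2 * x))
    return 2 * quad(inner, 0, np.inf)[0]

# Sweep the call level to locate the maximizing lambda numerically.
res = minimize_scalar(lambda l: -omega_A(l), bounds=(0.0, 3.0), method="bounded")
print(res.x, omega_A(res.x))
```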
If our strategy is somehow known to others, then, as shown above, this could have devastating consequences for its performance. One way to reduce the loss is to randomize the strategy. Suppose we adopt the same strategy, except that the call level $\lambda$ is drawn at random from a distribution with density $\psi$. Suppose player B plans to call "liar" at bet level $K$; player B wins in the following two cases:
- player A calls "liar" first, unsuccessfully: $$ \Omega_1=\{(x,\lambda) \mid x+\lambda\sigma<K,\ x+y>x+\lambda\sigma\}.$$
- player B calls "liar" first, successfully: $$ \Omega_2=\{(x,\lambda) \mid K<x+\lambda\sigma,\ x+y< K\}.$$
Summing the probabilities of these two events, we have the expected payoff $\omega_B$: $$ \omega_B(K)=\int_{-\infty}^{\frac{y}{\sigma}}\int_{-\infty}^{K-\lambda\sigma} \psi(\lambda)f_X(x)\,dx\,d\lambda+ \int_{-\infty}^{K-y}\int_{\frac{K-x}{\sigma}}^{+\infty}\psi(\lambda)f_X(x)\,d\lambda\, dx.$$
Therefore, player B will choose $K$ to maximize $\omega_B(K)$, while we can choose the randomization $\psi$ to limit player B's advantage.
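For a numerical feel, here is a Python sketch evaluating $\omega_B(K)$ for one illustrative choice of $\psi$; all distributions, parameters, and names here are assumptions made for the sketch, not taken from the repository.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

SIGMA = 1.0
psi = norm(loc=1.0, scale=0.25)    # assumed randomization of the call level
f_X = norm()                       # standard-normal model for x

def omega_B(K, y):
    """Player B's expected payoff when B plans to call "liar" at bet level K."""
    # A calls first (at x + lam*sigma < K) and the call fails (y > lam*sigma).
    term1 = quad(lambda lam: psi.pdf(lam) * f_X.cdf(K - lam * SIGMA),
                 -np.inf, y / SIGMA)[0]
    # B calls first (K < x + lam*sigma) and the call succeeds (x + y < K).
    term2 = quad(lambda x: f_X.pdf(x) * psi.sf((K - x) / SIGMA),
                 -np.inf, K - y)[0]
    return term1 + term2

# Example: B holds y = 0.3 and scans K for the most damaging call point.
Ks = np.linspace(-2, 3, 101)
vals = [omega_B(K, 0.3) for K in Ks]
print(Ks[int(np.argmax(vals))], max(vals))
```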