Zero Intelligence
AI Zero Intelligence (ZI) is based on statistical assumptions and z-scores. This AI is largely deterministic.
If the math format doesn't render properly, you might consider using the MathJax Plugin for Github.
The result of the player's own dice is known, while those of the others follow a binomial distribution, which can be approximated by the normal distribution. So the idea is to calculate the z-score of a bet: if the z-score exceeds a predetermined level (the call level), the AI calls "liar"; otherwise it raises the bet. We'll show the calculation through an example.
Let's say we have 5 dice, and the total number of dice the other players have is 36. Suppose our roll is one 2 and four 3's, represented by the array [0,1,4,0,0,0]. Now for the 36 dice owned by the others, the mean is [6,12,12,12,12,12] (remember, 1 is wild!) and the standard deviation is [s1,s2,s2,s2,s2,s2], where $s_1=\sqrt{36\cdot\tfrac{1}{6}\cdot\tfrac{5}{6}}=\sqrt{5}$ and $s_2=\sqrt{36\cdot\tfrac{1}{3}\cdot\tfrac{2}{3}}=2\sqrt{2}$ (a hidden die counts toward a non-wild face with probability $\tfrac{1}{3}$, since 1 is wild).
Now suppose someone bets sixteen 3's, represented as [16,3]. Its z-score is calculated as follows:
my_expectation = [0,1,4,0,0,0] + [6,12,12,12,12,12] = [6,13,16,12,12,12]
and z_score = (16 + 0.5 - 16)/s2, where 0.5 is the error adjustment.
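For concreteness, here is a minimal Python sketch of this z-score computation. It is only an illustration of the conventions described above (binomial counts, 1 wild, the 0.5 error adjustment, and a higher z-score meaning a less plausible bet); the function and variable names are made up for this sketch and are not taken from the repository.

```python
import math

WILD = 0  # index of face 1, which is wild

def bet_z_score(own_counts, n_unknown, bet_count, bet_face):
    """z-score of a bet [bet_count, bet_face] given our own roll.

    own_counts : list of 6 ints, counts of faces 1..6 in our own roll
    n_unknown  : total number of dice held by the other players
    """
    face = bet_face - 1                      # 0-based index of the bet face
    # A hidden die counts toward a non-wild face with prob 1/3 (its own face
    # or a wild 1); toward face 1 itself with prob 1/6.
    p = 1 / 6 if face == WILD else 1 / 3
    mean = n_unknown * p
    std = math.sqrt(n_unknown * p * (1 - p))
    # Our own contribution; wild 1's also count toward a non-wild bet face
    # (the worked example has no 1's, so this matches the vector sum above).
    own = own_counts[face] + (own_counts[WILD] if face != WILD else 0)
    # 0.5 is the error adjustment from the text; larger z = less plausible bet.
    return (bet_count + 0.5 - (own + mean)) / std

# Worked example: one 2 and four 3's, 36 unknown dice, bet of sixteen 3's.
print(bet_z_score([0, 1, 4, 0, 0, 0], 36, 16, 3))  # ≈ 0.177 = 0.5 / s2
```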
Now suppose the previous bet does not exceed the call level; then we need to raise the bet, for which we have to calculate the payoff of each legitimate bet. So what is the model that describes the payoff? First, we need to know how the next player will react to our bet.
To understand the behavior of the next player B, we need to make assumptions about B. Here we'll make a simple assumption (or "hold the belief", in game-theory jargon) that the next player B will call "liar" if the bet's z-score exceeds a certain level $\lambda$.
Now suppose that player A considers raising the bet to $[r,d]$. Let $y_0$ be player B's call threshold determined by B's call level $\lambda$: B calls "liar" on $[r,d]$ exactly when B's own count $x$ of $d$'s satisfies $x\leq r-y_0$. Let $Z$ be the number of $d$'s (counting wilds) among the dice held by neither A nor B. Now we can calculate the payoff.
Suppose player A's rollout is $[s,d]$ (i.e., $s$ of A's dice count as $d$), player B's rollout has $X$ $d$'s, and $y_0$ and $Z$ are defined as above. Then player A's payoff $\omega$ from raising the bet to $[r,d]$ is
$$ \omega([r,d])=-\sum_{x\leq r-y_0}\mathbb{P}_X(X=x)\mathbb{P}_Z(Z<r-s-x)$$
or
$$ \omega([r,d]) =-\mathbb{E}_X I_{\{X\leq r-y_0\}} F_Z(r-s-1-X)$$
noticing that, since $Z$ is integer-valued, $\mathbb{P}_Z(Z<r-s-x)=\mathbb{P}_Z(Z\leq r-s-1-x)=F_Z(r-s-1-x)$, where $F_Z$ is the cumulative distribution function of $Z$.
Proof: It's straightforward, so we only explain the first equation. Player A will lose a die only when
- Player B calls "liar", i.e., $x\leq r-y_0$
- The bet $[r,d]$ is bad, that is, $s+x+Z<r$
In this case, for a given $x$, the probability of losing a die is $\mathbb{P}_Z(s+x+Z<r)$, and the total expected payoff is the (negated) weighted sum over $x$.
The real payoff is actually less than the payoff computed here, because as the game continues there is still further chance for player A to lose dice.
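As an illustration, here is a small Python sketch that evaluates this payoff with exact binomial distributions (via scipy). The function name, and the way B's threshold $y_0$ is passed in, are assumptions made for the sketch rather than the repository's API.

```python
from scipy.stats import binom

def raise_payoff(s, r, n_b, n_rest, y0, wild_face=False):
    """Expected payoff for player A of raising the bet to [r, d].

    s      : number of d's (counting wilds) in A's own roll
    r      : proposed bet count
    n_b    : number of dice player B holds
    n_rest : number of dice held by neither A nor B
    y0     : B's call threshold -- B calls "liar" when its own count x <= r - y0
    """
    p = 1 / 6 if wild_face else 1 / 3   # a hidden die counts toward d w.p. 1/3
    X = binom(n_b, p)                   # B's count of d's
    Z = binom(n_rest, p)                # everyone else's count of d's
    payoff = 0.0
    for x in range(n_b + 1):
        if x <= r - y0:                                   # B calls "liar" ...
            payoff -= X.pmf(x) * Z.cdf(r - s - 1 - x)     # ... and the bet was bad
    return payoff
```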
The general calculation is feasible, since everything is finite, but tedious. We'll instead work with the standard univariate normal distribution.
Notice that the following discussion deviates from the game in two aspects:
- We use the normal distribution, as opposed to the binomial distribution
- We discuss the one-dimensional case, rather than the 6-dimensional one (corresponding to the six faces of the dice)
Suppose
On the other hand, if A raises
Now if we set
Now in the case
This result is surprisingly good. In practice, we found this value almost optimal.
The robustness of AI Zero Intelligence can be somewhat expected, since the strategy takes nothing but the initial information into consideration. Other players' moves have no impact on its response. Once the dice are cast, the responses of this AI are completely determined (different versions of this AI may contain a certain level of randomness).
We need a metric to measure this robustness. Since we don't really know what our strategy is up against, and it's usually the good strategies that survive till the end, we use the worst-case payoff as the metric for robustness, as opposed to the expectation and variance.
It's generally hard, if not impossible, to figure out what exactly the worst-case scenario is. However, we can give a lower bound for the worst case. In this case, suppose that player A is up against other players who know A's strategy and collude against A.
Is this scenario the worst for player A? Will the act of collusion in some way help player A mysteriously? That is to say, will the collusion, whether it's known to player A or not, affect player A's response in such a way that it actually helps A objectively? The answer is no, because player A's response is determined at the beginning of the game; no information arriving later in the game is taken into account. Furthermore, the only information considered by this AI is the total number of dice out there, with no reference to the number of players whatsoever.
Now we can try to get a sense of the AI's worst performance.
First of all, let's concisely describe the game between the two players. Player A's (AI ZI) rollout will be summarized by a single number $x$, and player B's by a single number $y$ (the one-dimensional analogues of the dice counts), modeled as independent draws from symmetric densities $f_X$ and $f_Y$. A bet is a single number $z$; the bet is good exactly when $z\leq x+y$, and, following the z-score rule, player A calls "liar" as soon as the bet reaches $z=x+\lambda\sigma$.
Now let's figure out what player B would do, given the rollout $y$. There are two cases:
- $ y>\lambda\sigma$: player B never needs to call; B can simply keep raising until player A calls "liar" at $z=x+\lambda\sigma<x+y$, a call that is wrong, so player B wins. In general, waiting costs B nothing in this case.
- $ y< \lambda\sigma$: this is the interesting case, analyzed below.
Let's imagine a scenario where $ z\ll 0$. Player B is obviously not going to call "liar" immediately; instead, player B will wait, because the longer player B waits, the better chance player B has. However, this kind of patience has its downside, namely the risk that player A calls "liar" first. When player A calls "liar", player A wins, due to the fact that $ z=x+\lambda\sigma> x+y$. Therefore, there is a trade-off between the advantage gained from waiting and the risk that player A makes the first move.
More precisely, suppose the current bet is $K$ and player B plans to call "liar" when the bet reaches some level $z\geq K$. We know the following:
- at time $ z=K$, player A hasn't made a move yet, so $ x+\lambda\sigma>K$
- The payoff of a win is 1 and that of a loss is 0. Player B, by calling at $z$, wins iff:
- $ x+\lambda\sigma >z$
- $ x+y<z$
Therefore, the expected payoff is: $$ \omega_B(z)=\mathbb{E}_X(I_{\{z-\lambda\sigma<X<z-y\}}| X>K-\lambda\sigma)=\frac{\mathbb{P}_X(z-\lambda\sigma<X<z-y)}{\mathbb{P}_X(X>K-\lambda\sigma)}, z\geq K$$
And we immediately notice that to maximize $\omega_B(z)$ over $z\geq K$, we only need to maximize the numerator $\mathbb{P}_X(z-\lambda\sigma<X<z-y)$. The interval $(z-\lambda\sigma,\ z-y)$ has fixed length $\lambda\sigma-y$, so for a symmetric, unimodal $f_X$ the probability is largest when the interval is centered at $0$, i.e., at $z=\frac{y+\lambda\sigma}{2}$. Hence the optimal choice is:
- $ z= \frac{y+\lambda\sigma}{2},$ when $ K\leq \frac{y+\lambda\sigma}{2}$
- $ z=K, $ when $ K> \frac{y+\lambda\sigma}{2}$
That is, if the current bet $K\leq \frac{y+\lambda\sigma}{2}$, player B shall wait till $z=\frac{y+\lambda\sigma}{2}$. If $K>\frac{y+\lambda\sigma}{2}$ for whatever reason, player B shall make a move immediately. In conclusion, player B shall call "liar" at $z=\frac{y+\lambda\sigma}{2}$.
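As a quick numerical sanity check of this call point, here is a short sketch assuming a standard-normal $f_X$; the parameter values are illustrative assumptions, not taken from the original text.

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters (assumed): y < lam * sigma, current bet K.
lam, sigma, y, K = 1.0, 1.0, 0.3, -1.0

z = np.linspace(K, 3.0, 2001)
numer = norm.cdf(z - y) - norm.cdf(z - lam * sigma)  # P(z - lam*sigma < X < z - y)
denom = norm.sf(K - lam * sigma)                     # P(X > K - lam*sigma)
omega_B = numer / denom

print(z[np.argmax(omega_B)], (y + lam * sigma) / 2)  # both ≈ 0.65
```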
To summarize, player A makes a move (calls "liar") when the bet reaches $z=x+\lambda\sigma$, and player B makes a move at $z=\frac{y+\lambda\sigma}{2}$ (when $y<\lambda\sigma$; otherwise B simply waits).
Now we are ready to compute the expected payoff of player A.
Player A will win in the following two cases:
- $ y<\lambda\sigma$ and player A calls "liar" first, that is, $ x+\lambda\sigma<\frac{y+\lambda\sigma}{2}$
- $ y<\lambda\sigma$, player B calls "liar" first, incorrectly: $ x+\lambda\sigma>\frac{y+\lambda\sigma}{2}$ and $ x+y\geq z$, where $ z=\frac{y+\lambda\sigma}{2}$
These conditions are equivalent to the sets:
- $ \Omega_1=\{(x,y)|\quad y<\lambda\sigma, 2x+\lambda\sigma<y\}$
- $ \Omega_2=\{(x,y)|\quad y<\lambda\sigma, 2x+\lambda\sigma>y,2x+y\geq \lambda\sigma\}$
And the expected payoff for player A is: $$ \omega_A(\lambda)=\int_{\Omega}f(x,y)dxdy=2\int_0^\infty f_X(x)\int_{\lambda\sigma-2x}^{\lambda\sigma}f_Y(y)dydx$$ where $ \Omega=\Omega_1\cup\Omega_2, f(x,y)=f_X(x)f_Y(y)$.
- $ \Omega_1$ and $ \Omega_2$ are mirror images of each other under $x\mapsto -x$; since $f_X$ is symmetric, they carry equal probability, which accounts for the factor of 2.
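For a sanity check of this decomposition, here is a short Monte Carlo sketch of the two-player game under the standard-normal model; the parameter values and function name are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_worst_case(lam, sigma=1.0, n=200_000):
    """Monte Carlo estimate of P(A wins) in the simplified two-player game."""
    x = rng.standard_normal(n)          # A's (normalized) count
    y = rng.standard_normal(n)          # B's (normalized) count
    a_call = x + lam * sigma            # A calls "liar" when the bet reaches this
    b_call = (y + lam * sigma) / 2      # B's optimal call point (for y < lam*sigma)
    # If y > lam*sigma, B simply waits and wins when A calls.
    a_wins = (y < lam * sigma) & (
        (a_call < b_call)                              # A calls first, correctly
        | ((a_call >= b_call) & (x + y >= b_call))     # B calls first, incorrectly
    )
    return a_wins.mean()

print(simulate_worst_case(1.0))  # should match omega_A(1.0) from the integral above
```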
Noticing that $ \omega_A(\lambda)$ is a lower bound for the expected payoff of this strategy, we can tweak the call level $\lambda$ to maximize this lower bound; the optimal $\lambda$ can then be found numerically.
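A possible numerical sketch of this tuning step, again assuming the standard-normal model with $\sigma=1$ (the search range and library choices are assumptions, not the repository's code):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

SIGMA = 1.0  # standard-normal model, as in the text

def omega_A(lam):
    """Lower bound on AI ZI's expected payoff against a colluding opponent."""
    t = lam * SIGMA
    inner = lambda x: norm.pdf(x) * (norm.cdf(t) - norm.cdf(t - 2 * x))
    return 2 * quad(inner, 0, np.inf)[0]

# Sweep the call level to locate the maximizing lambda numerically.
res = minimize_scalar(lambda l: -omega_A(l), bounds=(0.0, 3.0), method="bounded")
print(res.x, omega_A(res.x))
```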
If our strategy is somehow known to others, then, as shown above, this could have devastating consequences for its performance. One way to reduce the loss is to randomize the strategy. Suppose we adopt the same strategy, except that the call level $\lambda$ is drawn at random from a distribution with density $\psi$. Suppose player B plans to call "liar" at bet level $K$; player B wins in the following two cases:
- player A calls "liar" first, unsuccessfully: $$ \Omega_1=\{(x,\lambda) \mid x+\lambda\sigma<K,\ x+y>x+\lambda\sigma\}.$$
- player B calls "liar" first, successfully: $$ \Omega_2=\{(x,\lambda) \mid K<x+\lambda\sigma,\ x+y< K\}.$$
Summing the probabilities of these two events, we have the expected payoff $\omega_B$: $$ \omega_B(K)=\int_{-\infty}^{\frac{y}{\sigma}}\int_{-\infty}^{K-\lambda\sigma} \psi(\lambda)f_X(x)\,dx\,d\lambda+ \int_{-\infty}^{K-y}\int_{\frac{K-x}{\sigma}}^{+\infty}\psi(\lambda)f_X(x)\,d\lambda\, dx.$$
Therefore, player B will choose $K$ to maximize $\omega_B(K)$, while we can choose the randomization $\psi$ to limit player B's advantage.
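For a numerical feel, here is a Python sketch evaluating $\omega_B(K)$ for one illustrative choice of $\psi$; all distributions, parameters, and names here are assumptions made for the sketch, not taken from the repository.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

SIGMA = 1.0
psi = norm(loc=1.0, scale=0.25)    # assumed randomization of the call level
f_X = norm()                       # standard-normal model for x

def omega_B(K, y):
    """Player B's expected payoff when B plans to call "liar" at bet level K."""
    # A calls first (at x + lam*sigma < K) and the call fails (y > lam*sigma).
    term1 = quad(lambda lam: psi.pdf(lam) * f_X.cdf(K - lam * SIGMA),
                 -np.inf, y / SIGMA)[0]
    # B calls first (K < x + lam*sigma) and the call succeeds (x + y < K).
    term2 = quad(lambda x: f_X.pdf(x) * psi.sf((K - x) / SIGMA),
                 -np.inf, K - y)[0]
    return term1 + term2

# Example: B holds y = 0.3 and scans K for the most damaging call point.
Ks = np.linspace(-2, 3, 101)
vals = [omega_B(K, 0.3) for K in Ks]
print(Ks[int(np.argmax(vals))], max(vals))
```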