Original Reward Function
The Reward Function
From Coordinate to Cooperate or Compete: Abstract Goals and Joint Intentions in Social Interaction (full citation in the Citations section at the end of this page)
Context:
- the goal of the game is for each dot (agent) to secure a reward
- the blue and yellow dots each choose between going for a reward cell at the edge of the grid or the one in the center
- going to the edges represents cooperating; going for the center represents competing, since if one dot is in the middle the other cannot access its central reward cell
- moving costs a dot one point, while staying still and waiting costs nothing; this forces the dots to minimize the number of moves they spend
- the move cost also makes the central reward cell more attractive than the edge cells (a minimal sketch of this per-step reward follows this list)
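A minimal sketch of the per-step reward described above, in Python. The grid coordinates, reward magnitude, and action set are assumptions made for illustration; the original paper and our game may use different values.

```python
GOAL_REWARD = 10   # assumed payoff for securing a reward cell
MOVE_COST = 1      # every move costs one point
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0), "wait": (0, 0)}
REWARD_CELLS = {(0, 2), (4, 2), (2, 2)}  # two edge cells and the central cell (assumed layout)

def apply_action(position, action):
    dx, dy = MOVES[action]
    return (position[0] + dx, position[1] + dy)

def step_reward(position, action):
    """Reward one agent receives for taking `action` from `position`."""
    reward = 0.0
    if action != "wait":
        reward -= MOVE_COST              # moving costs a point; waiting is free
    if apply_action(position, action) in REWARD_CELLS:
        reward += GOAL_REWARD            # reaching any reward cell pays off
    return reward

print(step_reward((1, 2), "right"))  # -1 + 10 = 9: one move into the central cell
print(step_reward((1, 2), "wait"))   # 0: waiting costs nothing
```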
Cooperating
- a cooperative action is defined as one that maximizes reward for all agents
- this motivates the idea of a “group agent” whose reward is the sum of all agents’ rewards, each weighted by how much the group agent cares about that agent
- in the simplest case this is just the sum of what each agent receives; don’t overcomplicate this
- for our purposes we can weight each agent equally (see the sketch after this list)
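A minimal sketch of the equally weighted group-agent reward, assuming the simple weighted-sum form described above; the function and variable names are illustrative, not taken from the paper or the repo.

```python
def group_reward(individual_rewards, weights=None):
    """Group-agent reward: weighted sum of each agent's reward (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(individual_rewards)  # care about every agent equally
    return sum(w * r for w, r in zip(weights, individual_rewards))

# Example: both dots reach edge cells and net 9 points each after move costs.
print(group_reward([9, 9]))   # 18.0
```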
Competing
- agents have some default “level 0” behavior, which is then built upon each round based on the round prior
- from this, they can develop a recursively defined behavior, where each level best-responds to the level below it (see the level-k sketch after this list)
- the other agent is treated as part of the environment: its behavior is folded into the transition probabilities P(s'|s, a), since the other agent is part of the current state s
- just because these agents plan selfishly doesn’t mean they actively oppose each other; it is possible for the agents to develop norms that collectively help everyone
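A minimal sketch of that recursion, assuming level-0 is a uniform-random default policy and each higher level best-responds to a level-(k-1) model of the other agent. The action set, the names, and the user-supplied value function `my_value` are assumptions for illustration.

```python
ACTIONS = ["up", "down", "left", "right", "wait"]

def level0_policy(state):
    """Default 'level 0' behavior: uniform over the actions."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}

def best_response(state, other_policy, my_value):
    """Best action when the other agent's predicted policy is treated as part of the environment."""
    def expected_value(action):
        return sum(prob * my_value(state, action, other_action)
                   for other_action, prob in other_policy(state).items())
    return max(ACTIONS, key=expected_value)

def level_k_policy(k, my_value):
    """A level-k agent best-responds to a level-(k-1) model of the other agent."""
    if k == 0:
        return level0_policy
    opponent_model = level_k_policy(k - 1, my_value)
    def policy(state):
        best = best_response(state, opponent_model, my_value)
        return {a: (1.0 if a == best else 0.0) for a in ACTIONS}
    return policy

# level_k_policy(2, my_value) would give an agent that best-responds to an agent
# that best-responds to random play, given some value function my_value(s, a, a_other).
```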
Coordinating Cooperation and Competition:
- an agent can derive the intentions (I) of other agents from these planning functions
- in the case of cooperative planning, the agents are given the other agents’ actions by the group-agent plan
- in the case of competitive planning, the agents choose their actions using an expectation of what the other agents’ actions will be
- from there, simple Bayesian inference P(I|D) ∝ P(D|I)P(I), where D is the other agent’s observed behavior and the likelihood P(D|I) is given by the policy that intention I implies, lets an agent infer which intention another agent is acting under (see the posterior sketch after this list)
- from there, any sort of strategy (e.g. tit-for-tat learning) can be used to pick which action the agent should choose
- choice 1: tit-for-tat learning
- choice 2: Bush-Mosteller
- the basic idea behind this is that whether or not a person contributes depends on whether they think the group’s outcome met an expectation (aspiration) of theirs
- the initial condition p1 is drawn from the uniform density on [0, 1], independently for different players
- first, we define pt as the expected contribution that the player makes in round t; the actual contribution at is drawn from a truncated Gaussian distribution with mean pt and standard deviation 0.2, and if at falls outside the interval [0, 1] it is discarded and redrawn until it falls within [0, 1]; second, we introduce a threshold contribution value X, distinct from the aspiration level A, used to classify an action as either cooperative or defective
- X = 0.3 and 0.4 for conditional cooperative behavior
- beta = 0.4 (sensitivity of the update), A = 0.9 (aspiration level)
- we use Bush-Mosteller to decide whether to be competitive or cooperative, and then use Q-Learning to tune the exact contribution (see the Bush-Mosteller sketch after this list)
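A minimal sketch of the intention inference P(I|D) ∝ P(D|I)P(I) described above, in Python. The set of intentions, the prior, and the per-intention policies are illustrative assumptions rather than the paper's exact model.

```python
def intention_posterior(observed_actions, states, policies, prior):
    """Posterior over intentions given observed (state, action) data.

    policies[i](state) returns a dict mapping action -> probability under intention i.
    """
    posterior = dict(prior)
    for state, action in zip(states, observed_actions):
        for intention in posterior:
            posterior[intention] *= policies[intention](state).get(action, 1e-9)
    total = sum(posterior.values())
    return {i: p / total for i, p in posterior.items()}

# Example with two hypothetical intentions: "cooperate" heads toward an edge cell,
# "compete" heads toward the center.
policies = {
    "cooperate": lambda s: {"left": 0.8, "right": 0.1, "wait": 0.1},
    "compete":   lambda s: {"left": 0.1, "right": 0.8, "wait": 0.1},
}
prior = {"cooperate": 0.5, "compete": 0.5}
print(intention_posterior(["right", "right"], [None, None], policies, prior))
# -> posterior mass shifts strongly toward "compete"
```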
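And a minimal sketch of one Bush-Mosteller round using the parameters listed above (beta = 0.4, A = 0.9, X = 0.3). The stimulus form tanh(beta * (payoff - A)), the update rule, and the placeholder payoff are assumptions based on standard aspiration-based Bush-Mosteller models and may not match Ezaki et al.'s exact equations.

```python
import math
import random

BETA = 0.4   # sensitivity of the stimulus to the payoff-aspiration gap
A = 0.9      # aspiration level
X = 0.3      # threshold separating "cooperative" from "defective" contributions
SIGMA = 0.2  # std. dev. of the truncated Gaussian around the expected contribution

def draw_contribution(p):
    """Actual contribution: Gaussian with mean p, redrawn until it lands in [0, 1]."""
    while True:
        a = random.gauss(p, SIGMA)
        if 0.0 <= a <= 1.0:
            return a

def bush_mosteller_update(p, contribution, payoff):
    """Update the expected contribution p for the next round."""
    stimulus = math.tanh(BETA * (payoff - A))   # satisfied if payoff exceeds aspiration A
    cooperated = contribution >= X              # classify the action using threshold X
    if cooperated:
        # satisfied cooperation is reinforced; unsatisfied cooperation is weakened
        return p + (1 - p) * stimulus if stimulus >= 0 else p + p * stimulus
    else:
        # satisfied defection pushes p down further; unsatisfied defection pushes it back up
        return p - p * stimulus if stimulus >= 0 else p - (1 - p) * stimulus

# One illustrative round: p1 starts uniform on [0, 1].
p = random.random()
contribution = draw_contribution(p)
payoff = 1.0 - contribution   # placeholder payoff; the real public-goods payoff differs
p = bush_mosteller_update(p, contribution, payoff)
print(round(p, 3))
```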
How This Applies to Our Game
- actions are simply how much an agent decides to contribute
- tit-for-tat learning applies to the public goods game once we reduce the choices down to cooperate or defect: people are most successful in the public goods game if they all work together, but an individual does better by defecting and working alone than by cooperating with others who have themselves defected (a small worked payoff example follows this list)
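A small worked example of those payoff comparisons, using an assumed 4-player linear public goods game with endowment 1 and multiplier 1.6; the player count, endowment, and multiplier are illustrative, not values from the papers or our game.

```python
N, ENDOWMENT, MULTIPLIER = 4, 1.0, 1.6

def payoff(my_contribution, others_contributions):
    """Keep what you don't contribute, plus an equal share of the multiplied pot."""
    pot = (my_contribution + sum(others_contributions)) * MULTIPLIER
    return (ENDOWMENT - my_contribution) + pot / N

print(payoff(1.0, [1.0, 1.0, 1.0]))  # 1.6  everyone cooperates
print(payoff(0.0, [1.0, 1.0, 1.0]))  # 2.2  defect while others cooperate (best for the individual)
print(payoff(0.0, [0.0, 0.0, 0.0]))  # 1.0  everyone defects
print(payoff(1.0, [0.0, 0.0, 0.0]))  # 0.4  cooperate while others defect (worst outcome)
```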
How Ezaki Verified His Paper
- he defined a threshold X above which contributions were considered cooperative and below which they were considered defective
- he varied X (setting it to 0.3 and 0.4) and observed patterns of cooperation consistent with the experimental study he used to back up his claim (https://www.nature.com/articles/srep39275.pdf?platform=oscar&draft=collection)
Citations
- Kleiman-Weiner, M., Ho, M. K., Austerweil, J. L., Littman, M. L., & Tenenbaum, J. B. (2016). Coordinate to cooperate or compete: Abstract goals and joint intentions in social interaction. CogSci 2016. http://par.nsf.gov/biblio/10026426
- Ezaki, T., Horita, Y., Takezawa, M., & Masuda, N. (2016). Reinforcement learning explains conditional cooperation and its moody cousin. PLoS Computational Biology, 12(7), e1005034. https://doi.org/10.1371/journal.pcbi.1005034