Analysis Report
Question
How do different information dissemination (feedback) schemes influence the behavior of conditionally cooperating reinforcement learning agents in a public goods game?
Motivation
Per Kurzban and Houser (2005), around 63% of people are conditional cooperators in groups: their contributions depend on how much they think the other members of their group will contribute. Understanding how they gauge the cooperativeness of their group helps us figure out what drives people to contribute, and how the information available to a group can be organized to produce the most cooperative outcome.
Related Work
- Kleiman-Weiner, Max, Ho, Mark K., Austerweil, Joseph L., Littman, Michael L., & Tenenbaum, Joshua B. (2016). Coordinate to cooperate or compete: Abstract goals and joint intentions in social interaction. COGSCI. Retrieved from http://par.nsf.gov/biblio/10026426
The lower-level reinforcement learning function was taken from this paper. It studies how two agents in an environment behave when cooperating and competing; it does not cover in great detail how to switch between the cooperating and competing modes, instead leaving the choice of switching mechanism open.
- Ezaki T, Horita Y, Takezawa M, Masuda N (2016) Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin. PLoS Comput Biol 12(7): e1005034. https://doi.org/10.1371/journal.pcbi.1005034
The upper-level reinforcement learning function was taken from this paper. It studies how agents playing the public goods game switch between cooperating and competing using an aspiration-based model.
RL Model Specification
Cooperating
- a cooperative action is defined as one that maximizes reward for all agents
- this motivates the idea of a “group agent” whose reward is the sum of all agents’ rewards, weighted by how much the group agent cares about each agent
- for our purposes we weight each agent equally
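To make this concrete, here is a minimal sketch of the group-agent reward under the equal-weighting assumption (the function name and signature are illustrative, not taken from the codebase):

```python
import numpy as np

def group_agent_reward(agent_rewards, weights=None):
    """Reward of the hypothetical "group agent": a weighted sum of the
    individual agents' rewards. With equal weights (our case) this is
    simply the total reward across all agents."""
    agent_rewards = np.asarray(agent_rewards, dtype=float)
    if weights is None:
        weights = np.ones_like(agent_rewards)  # weigh every agent equally
    return float(np.dot(weights, agent_rewards))
```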
Competing
- agents have some default “level 0” behavior, which is then updated each round based on the previous round
- from this, they can develop a recursively defined behavior
- the other agent is treated as part of the environment and is factored in when calculating P(s'|s, a), since the other agent is part of the current state (s)
- just because these agents plan selfishly does not mean they actively oppose each other; it is possible for the agents to develop norms that collectively help everyone
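As one way to picture “the other agents are part of the environment”, here is a hedged sketch (not the formulation from Kleiman-Weiner et al.): a selfish planner picks the contribution that maximizes its own expected payoff, marginalizing over a predicted distribution of the other agents’ total contribution, e.g. estimated from their level-0 behavior in the previous round. All names, and the payoff form (multiplier 1.6, even split), are assumptions carried over from the game description below.

```python
import numpy as np

# discrete contribution levels: 0.0, 0.1, ..., 1.0
ACTIONS = np.round(np.arange(0.0, 1.01, 0.1), 1)

def selfish_best_response(predicted_others_total, n_players, multiplier=1.6):
    """Pick the contribution that maximizes this agent's own expected payoff,
    treating the other agents as a fixed part of the environment.

    predicted_others_total: dict {total contribution of the others: probability},
    e.g. estimated from the previous round's observations.
    """
    def expected_payoff(a):
        return sum(
            prob * (multiplier * (a + others) / n_players - a)
            for others, prob in predicted_others_total.items()
        )
    return max(ACTIONS, key=expected_payoff)
```

Because the per-capita return here (1.6 / 25) is below 1, this purely selfish planner always free-rides; conditional cooperation only emerges from the aspiration-based switching described below.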
Coordinating Cooperating and Competing:
- An agent can derive the intentions of other agents from these functions (I)
- in the case of cooperative planning, agents are given the other agents’ actions by the group-agent function
- in the case of competitive planning, agents take their actions using an expectation of what the other agents’ actions will be
- From there, use Bayesian inference, P(I|D) ∝ P(D|I)P(I), where the likelihood P(D|I) is given by the policy, to infer an agent’s intention (I) from its observed behavior (D)
- From there, any strategy (e.g. tit-for-tat learning) can be used to pick which action the agent should choose
- We will use the Bush-Mosteller approach:
- the basic idea is that whether or not a person contributes depends on whether they think the group’s outcome met an expectation (aspiration) of theirs
- the initial condition p1 is drawn from the uniform density on [0, 1], independently for each player
- we define pt as the expected contribution the player makes in round t; the actual contribution at is drawn from a truncated Gaussian with mean pt and standard deviation 0.2 (if at falls outside [0, 1], it is discarded and redrawn until it falls within [0, 1])
- a threshold contribution value X, distinct from the aspiration level A, determines whether an action is regarded as cooperative or defective
- X = 0.3 and 0.4 give conditionally cooperative behavior
- β = 0.4, A = 0.9
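Below is a minimal sketch of this aspiration-based update, assuming the standard Bush-Mosteller form: the stimulus is s = tanh[β(π − A)], where π is the round payoff and A the aspiration level, the expected contribution p is reinforced toward or away from the last action depending on the sign of the stimulus, and at ≥ X counts as cooperation. The exact equations should be checked against Ezaki et al. (2016); everything here is a reconstruction, not the project’s implementation.

```python
import numpy as np

rng = np.random.default_rng()

def draw_contribution(p):
    """Draw the actual contribution from a Gaussian with mean p and sd 0.2,
    redrawing until it falls inside [0, 1] (truncated Gaussian)."""
    while True:
        a = rng.normal(loc=p, scale=0.2)
        if 0.0 <= a <= 1.0:
            return a

def bush_mosteller_update(p, a, payoff, A=0.9, beta=0.4, X=0.3):
    """One aspiration-based update of the expected contribution p.

    s > 0 means the round payoff exceeded the aspiration A; the last action
    counts as cooperation if a >= X and as defection otherwise (assumption).
    """
    s = np.tanh(beta * (payoff - A))            # stimulus in (-1, 1)
    if a >= X:                                  # cooperated
        p_next = p + (1 - p) * s if s >= 0 else p + p * s
    else:                                       # defected
        p_next = p - p * s if s >= 0 else p - (1 - p) * s
    return float(np.clip(p_next, 0.0, 1.0))
```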
How This Applies to Our Game
- actions are simply how much an agent decides to contribute
Information Dissemination Schemes: Base Game
- agents are not exposed to any information whatsoever
Treatment 1
- agents are exposed to the sum of contributions for each round
Treatment 2
- agents are exposed to the individual contributions of every other agent in each round
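In code terms, the three schemes amount to three different observation functions over a round’s contribution vector; a hypothetical sketch (the scheme labels and function name are ours):

```python
def make_observation(contributions, agent_index, scheme):
    """What agent `agent_index` is shown at the end of a round.

    contributions: list of every player's contribution this round.
    scheme: "base", "treatment1", or "treatment2".
    """
    if scheme == "base":
        return None                                    # no information at all
    if scheme == "treatment1":
        return sum(contributions)                      # only the round's total
    if scheme == "treatment2":
        # every other agent's individual contribution
        return [c for i, c in enumerate(contributions) if i != agent_index]
    raise ValueError(f"unknown scheme: {scheme}")
```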
Method
Which Parameters are Used (for each type of game)
- A = 0.9, β = 0.4, X = 0.3
- A = 0.9, β = 0.4, X = 0.4
- A = 0.9, β = 0.4, X = 0.5
Our modeling policies/experimental set-up: Base Game
- agents are not exposed to any information whatsoever
Treatment 1
- agents are exposed to the sum of contributions for each round
Treatment 2
- agents are exposed to the individual contributions of every other agent in each round
The parameters varied:
- A in range(0, 2.1, 0.1),
- β in range(0, 3.1, 0.1),
- X = 0.3, 0.4, 0.5,
- I = {Base Game, Treatment 1, Treatment 2}
Total number of games: 21 × 31 × 3 × 3 = 5,859. Each game is run 100 times.
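As a sketch, the full factorial grid (21 A-values × 31 β-values × 3 X-values × 3 schemes = 5,859 game conditions) could be built as follows; `play_game` is a hypothetical runner, not an existing function in the repository:

```python
import itertools
import numpy as np

A_values = np.round(np.arange(0.0, 2.1, 0.1), 1)     # 21 values: 0.0 .. 2.0
beta_values = np.round(np.arange(0.0, 3.1, 0.1), 1)  # 31 values: 0.0 .. 3.0
X_values = [0.3, 0.4, 0.5]
schemes = ["base", "treatment1", "treatment2"]

grid = list(itertools.product(A_values, beta_values, X_values, schemes))
assert len(grid) == 21 * 31 * 3 * 3                   # 5,859 game conditions

# for A, beta, X, scheme in grid:
#     for run in range(100):                 # each condition simulated 100 times
#         play_game(A, beta, X, scheme)      # hypothetical game runner
```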
Base Game sanity check and sensitivity analysis
- Play with 5 players only: examine the contribution per round per player
- 4 fixed agents with maximum contributions and one reinforcement learning agent
- Vary number of players
How Each Game Is Run
- 25 players, 12 rounds
- 3 unconditional cooperators (contribute from uniform distribution in [0.8, 1])
- 5 unconditional free-riders (contribute from uniform distribution in [0, 0.2])
- 17 conditional cooperators trained on the reward function we’ve designated
- contributions are in [0, 1] and discrete with intervals of 0.1 (e.g. 0.1, 0.2, 0.3 are valid but 0.01 is not)
- p is initialized as a random number from the uniform distribution on [0, 1] (as in the Ezaki paper)
- the reinforcement learning strategies for cooperation and competition are trained to produce what each considers the best first-round contribution, one for the cooperating case and one for the competing case
- at the end of each round, the contributions are multiplied by 1.6 and evenly redistributed to the players (as in the Ezaki paper and the supporting Nature paper)
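A minimal sketch of one round under these settings (discrete 0.1 contribution levels, the 1.6 multiplier, even redistribution); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng()

MULTIPLIER = 1.6
LEVELS = np.round(np.arange(0.0, 1.01, 0.1), 1)   # valid contribution levels

def snap_to_grid(x):
    """Round a contribution to the nearest 0.1 increment in [0, 1]."""
    return float(LEVELS[np.argmin(np.abs(LEVELS - x))])

def play_round(contributions):
    """Multiply the pot by 1.6, split it evenly, and return each player's payoff."""
    contributions = np.asarray(contributions, dtype=float)
    pot = MULTIPLIER * contributions.sum()
    return pot / len(contributions) - contributions   # even share minus own contribution

# example draws for the fixed agent types:
coop_contribution = snap_to_grid(rng.uniform(0.8, 1.0))   # unconditional cooperator
rider_contribution = snap_to_grid(rng.uniform(0.0, 0.2))  # unconditional free-rider
```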
Number of simulations: 50 to 100 per game, i.e. up to 100 × 21 × 31 × 3 × 3 game runs in total.
Outcome Measures Examined
- contribution behavior of individual agents (examined with time series plots per game)
- contribution behavior of all conditionally cooperating agents in each game (examined with time series plots per game)
Validation Technique
- when plotting each agent’s average contribution against the average contribution of the other agents, for the condition where individual contributions are visible, we should see a graph similar to the conditional-cooperation plot in Ezaki et al.
What Data Is Collected (for each condition):
- each agent’s contributions in each round for all games
- their type: unconditional cooperator, free-rider, or conditional cooperator (reciprocator)
- game condition (A, β, X, I)
- the mean/sd of contributions for each agent for a particular game condition
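A hypothetical layout for this data, one row per agent per round per run, which makes the summaries and plots below straightforward (all column names are ours):

```python
import pandas as pd

record = {
    "run": 0, "round": 1, "agent_id": 7,
    "agent_type": "conditional",     # "cooperator", "free_rider", or "conditional"
    "A": 0.9, "beta": 0.4, "X": 0.3, "scheme": "treatment1",
    "contribution": 0.5,
}
df = pd.DataFrame([record])

# mean/sd of contributions per agent for a given game condition
summary = (df.groupby(["A", "beta", "X", "scheme", "agent_id"])["contribution"]
             .agg(["mean", "std"]))
```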
Exploratory analysis: What Data Is Plotted:
- time series plot for each agent in each game (line graph, x-axis is time, y-axis is contribution)
- average agent contribution vs. other agents’ contribution (plot from Ezaki) for each information dissemination scheme
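A sketch of the per-agent time series plot with matplotlib; the trajectories below are random placeholder data standing in for one game’s contributions (25 agents × 12 rounds), not simulation output:

```python
import matplotlib.pyplot as plt
import numpy as np

# placeholder data: one row per agent, one column per round
contributions = np.random.default_rng(0).uniform(0.0, 1.0, size=(25, 12))

fig, ax = plt.subplots()
for series in contributions:
    ax.plot(range(1, 13), series, alpha=0.5)
ax.set_xlabel("Round")
ax.set_ylabel("Contribution")
ax.set_title("Per-agent contributions over the 12 rounds of one game")
plt.show()
```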
Statistical modeling:
ANOVA / t-tests comparing the different treatments, etc.
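For instance, a one-way ANOVA across the three information schemes, with pairwise t-tests as follow-ups, could be run with SciPy; the arrays below are placeholder values standing in for per-run mean contributions, not real results:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# placeholder per-run mean contributions under each information scheme
base       = rng.uniform(0.0, 0.4, size=100)
treatment1 = rng.uniform(0.1, 0.5, size=100)
treatment2 = rng.uniform(0.2, 0.6, size=100)

f_stat, p_value = stats.f_oneway(base, treatment1, treatment2)   # one-way ANOVA
t_stat, p_pair = stats.ttest_ind(base, treatment2)               # pairwise follow-up
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.3g}; base vs. treatment 2: p={p_pair:.3g}")
```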
Results
- tbd