June 11 Notes - opendigital/RL-collective-action GitHub Wiki

Welcome to the RL-collective-action wiki!

Research question: what policy enables a deep RL agent to act like a conditionally cooperative human?

Basic game design: see Horita et al. (2017)

Theoretical motivation: advance existing RL research and model the behavior of conditional cooperators (CCs)

Practical motivation: a reward function that produces CC behavior would let a researcher vary the inputs given to the agent and the game settings, and then simulate how CC behavior changes in response

Notes:

  • We should calculate the Nash equilibrium outcome of the game
    • Note: when playing with CCs, the Nash outcome will not be realized
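The game itself is specified in Horita et al. (2017) and is not reproduced in these notes, so the following is only a sketch under an assumed game: a linear public goods game in which each player keeps whatever they do not contribute and receives an equal share of the pooled contributions times a multiplier. The functions `payoff` and `pure_nash_equilibria` and all parameter values are hypothetical. The sketch finds pure-strategy Nash equilibria by brute force, keeping every contribution profile in which no player can gain by deviating alone.

```python
from itertools import product


def payoff(contributions: tuple[int, ...], i: int, endowment: int, multiplier: float) -> float:
    """Assumed linear public goods payoff for player i: keep what you don't
    contribute, plus an equal share of the multiplied common pot."""
    n = len(contributions)
    return endowment - contributions[i] + multiplier * sum(contributions) / n


def pure_nash_equilibria(n_players: int, endowment: int, multiplier: float) -> list[tuple[int, ...]]:
    """Enumerate every profile of integer contributions and keep those where
    no single player can raise their own payoff by changing only their choice."""
    choices = range(endowment + 1)  # contributions 0..endowment
    equilibria = []
    for profile in product(choices, repeat=n_players):
        stable = True
        for i in range(n_players):
            current = payoff(profile, i, endowment, multiplier)
            for alt in choices:
                deviation = profile[:i] + (alt,) + profile[i + 1:]
                # Small tolerance guards against float rounding in the comparison.
                if payoff(deviation, i, endowment, multiplier) > current + 1e-9:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria


# Hypothetical settings: 4 players, endowment 4, multiplier 1.6 (multiplier / n < 1).
print(pure_nash_equilibria(n_players=4, endowment=4, multiplier=1.6))  # [(0, 0, 0, 0)]
```

Under these assumed settings, each unit a player contributes returns only multiplier / n = 0.4 to that player, so contributing nothing is a dominant strategy and full defection is the only equilibrium. The brute-force check works for any small game with discrete actions once `payoff` is replaced by the actual game's payoff rule.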

Policy options:

  • Focus on regret: CC agents regret competing; after some point, they switch
    • Reward is some function of money saved and regret accrued
    • Regret accrues from not cooperating when the group is moving in a cooperative direction, and from not competing when the group is moving in a competitive direction
  • Weigh both individual money saved and the group's progress toward the collective goal
    • Reward is some function of money saved and the change in group cooperation over time
    • The agent becomes more cooperative as group cooperation increases and more competitive as it decreases
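Both policy options leave the exact reward function open. The sketch below shows one possible form for each, under stated assumptions: the agent makes a binary choice each round (cooperate or compete), `money_saved` is its individual monetary outcome for that round, and the group's "direction" is the change in the fraction of the group cooperating between rounds. The names `cooperation_trend`, `RegretReward`, `trend_weighted_reward`, `regret_weight` and `trend_weight` are hypothetical, as are the linear weightings and the rule that accrued regret resets once the agent follows the trend.

```python
from dataclasses import dataclass


def cooperation_trend(prev_rate: float, curr_rate: float) -> float:
    """Change in the group's cooperation rate between two rounds (positive = more cooperative)."""
    return curr_rate - prev_rate


@dataclass
class RegretReward:
    """Option 1 (hypothetical form): money saved minus a penalty for accrued regret."""
    regret_weight: float = 1.0  # hypothetical trade-off between money and regret
    accrued_regret: int = 0     # consecutive rounds spent acting against the group trend

    def __call__(self, money_saved: float, cooperated: bool, trend: float) -> float:
        # Regret: not cooperating while the group moves toward cooperation,
        # or not competing while the group moves toward competition.
        regretful = (trend > 0 and not cooperated) or (trend < 0 and cooperated)
        # Assumption: regret keeps growing while the agent goes against the trend,
        # so the penalty eventually outweighs the money saved and pushes it to switch.
        # It resets once the agent follows the trend (or the trend is flat).
        self.accrued_regret = self.accrued_regret + 1 if regretful else 0
        return money_saved - self.regret_weight * self.accrued_regret


def trend_weighted_reward(money_saved: float, cooperated: bool, trend: float,
                          trend_weight: float = 1.0) -> float:
    """Option 2 (hypothetical form): money saved plus a bonus for moving with the group.

    Cooperating earns +trend_weight * trend and competing earns -trend_weight * trend,
    so cooperation pays when group cooperation rises and competition pays when it falls.
    """
    alignment = 1.0 if cooperated else -1.0
    return money_saved + trend_weight * trend * alignment


trend = cooperation_trend(prev_rate=0.25, curr_rate=0.75)  # group became more cooperative: 0.5
regret = RegretReward(regret_weight=2.0)
print(regret(money_saved=10.0, cooperated=False, trend=trend))  # 8.0 (one round of regret)
print(regret(money_saved=10.0, cooperated=False, trend=trend))  # 6.0 (two rounds in a row)
print(trend_weighted_reward(money_saved=10.0, cooperated=True, trend=trend))  # 10.5
```

The first option only penalizes the agent, and the penalty grows the longer it resists the group, which matches the idea of switching "after some point". The second rewards following the group in proportion to how fast cooperation is changing, so a small shift in the group produces only a weak push.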