qertanime.blogg.se - Sequential gaming

The empirical Schelling diagrams for Cleanup (A) and Harvest (B) (from ) show the payoff that an individual agent can expect if it follows a defecting/exploitative strategy (red) versus a cooperative strategy (blue), given the number of other agents that are cooperating. In Harvest, for example, if individual agents employ an exploitative strategy by greedily consuming too many apples, the collective reward of all agents is reduced.
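To make the Schelling diagrams described above concrete, here is a minimal sketch that computes the two curves for a textbook N-player public goods game. This is an assumed stand-in, not how the SSD curves are produced: the Cleanup and Harvest diagrams are empirical, whereas this toy has a closed-form payoff rule. The function name `schelling_rows` and the parameters `multiplier` and `cost` are hypothetical.

```python
from typing import List, Tuple


def schelling_rows(n_agents: int, multiplier: float, cost: float = 1.0) -> List[Tuple[int, float, float]]:
    """Payoffs for a focal agent in an N-player public goods game (toy model).

    Returns one (k, cooperate_payoff, defect_payoff) row for each number k of
    *other* agents that cooperate, k = 0 .. n_agents - 1.
    """
    rows = []
    for k in range(n_agents):
        # A cooperator pays `cost`; all contributions are multiplied and shared by all N agents.
        cooperate = multiplier * cost * (k + 1) / n_agents - cost
        # A defector pays nothing but still receives its share of the others' contributions.
        defect = multiplier * cost * k / n_agents
        rows.append((k, cooperate, defect))
    return rows


if __name__ == "__main__":
    # Assumed parameters: 5 agents, pot multiplied by 3 (1 < multiplier < n_agents).
    for k, coop, defect in schelling_rows(n_agents=5, multiplier=3.0):
        print(f"{k} other cooperators: cooperate {coop:+.2f}, defect {defect:+.2f}")
```

With 1 < multiplier < n_agents, the defect curve lies above the cooperate curve for every k, so defecting always pays more individually. Yet the all-cooperate payoff (the right end of the cooperate curve, multiplier - 1 per agent under cost = 1) beats the all-defect payoff (the left end of the defect curve, 0). That gap between individual and collective incentives is what a Schelling diagram is meant to expose.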

The implemented environments are structured to be compatible with OpenAI's Gym environments as well as RLlib's multi-agent environment interface.

Implemented Games

Cleanup: A public goods dilemma in which agents get a reward for consuming apples, but must use a cleaning beam to clean a river in order for apples to grow. While an agent is cleaning the river, other agents can exploit it by consuming the apples that appear.

Harvest: A tragedy-of-the-commons dilemma in which apples regrow at a rate that depends on the number of nearby apples.
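The Harvest and Cleanup resource dynamics described above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the repo's code: the neighborhood radius, the spawn probabilities, and the linear pollution rule are hypothetical values chosen to mirror the described behavior, and the function names (`harvest_spawn_prob`, `cleanup_spawn_prob`, `regrow_harvest`) are invented for this example.

```python
import random
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# Illustrative constants (assumptions, not taken from the repo).
HARVEST_RADIUS = 2
HARVEST_SPAWN_PROBS = [0.0, 0.005, 0.02, 0.05]  # indexed by min(nearby apple count, 3)
CLEANUP_MAX_SPAWN_PROB = 0.05
CLEANUP_DEPLETION_THRESHOLD = 0.4    # waste fraction at which apples stop growing
CLEANUP_RESTORATION_THRESHOLD = 0.0  # waste fraction at or below which growth is maximal


def harvest_spawn_prob(cell: Cell, apples: Set[Cell]) -> float:
    """Regrowth probability for an empty cell, from the apples within HARVEST_RADIUS (Euclidean)."""
    r, c = cell
    nearby = sum(
        1
        for (ar, ac) in apples
        if (ar, ac) != cell and (ar - r) ** 2 + (ac - c) ** 2 <= HARVEST_RADIUS ** 2
    )
    return HARVEST_SPAWN_PROBS[min(nearby, len(HARVEST_SPAWN_PROBS) - 1)]


def cleanup_spawn_prob(waste_cells: int, river_cells: int) -> float:
    """Apple spawn probability that falls linearly as the river gets more polluted."""
    waste_density = waste_cells / river_cells
    if waste_density >= CLEANUP_DEPLETION_THRESHOLD:
        return 0.0
    if waste_density <= CLEANUP_RESTORATION_THRESHOLD:
        return CLEANUP_MAX_SPAWN_PROB
    span = CLEANUP_DEPLETION_THRESHOLD - CLEANUP_RESTORATION_THRESHOLD
    return CLEANUP_MAX_SPAWN_PROB * (CLEANUP_DEPLETION_THRESHOLD - waste_density) / span


def regrow_harvest(empty_cells: List[Cell], apples: Set[Cell], rng: random.Random) -> Set[Cell]:
    """One regrowth step; every empty cell is judged against the apples present before the step."""
    new_apples = {cell for cell in empty_cells if rng.random() < harvest_spawn_prob(cell, apples)}
    return apples | new_apples


if __name__ == "__main__":
    apples = {(0, 0), (0, 1), (1, 0)}
    print(harvest_spawn_prob((1, 1), apples))  # three apples nearby
    print(harvest_spawn_prob((9, 9), apples))  # isolated cell
    print(cleanup_spawn_prob(waste_cells=2, river_cells=10))
```

In this sketch an empty cell with no apples within the radius has zero regrowth probability, so a patch that agents strip bare never recovers, which is the tragedy-of-the-commons pressure in Harvest. In the Cleanup sketch, apple growth stops entirely once waste reaches the depletion threshold, so some agent has to spend time cleaning the river before anyone can eat.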

This repo is an open-source implementation of DeepMind's Sequential Social Dilemma (SSD) multi-agent game-theoretic environments. SSDs can be thought of as spatially and temporally extended analogues of Prisoner's Dilemma-like games. The reward structure poses a dilemma because strategies that are individually optimal in the short term lead to poor long-term outcomes for the group.
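For reference, here is the standard two-player Prisoner's Dilemma that the SSD analogy points to, written as (row player, column player) payoffs. R is the reward for mutual cooperation, P the punishment for mutual defection, T the temptation to defect against a cooperator, and S the sucker's payoff for cooperating against a defector. These symbols are the textbook convention, not notation from this repo:

```latex
\begin{array}{c|cc}
  & C & D \\ \hline
C & (R,\ R) & (S,\ T) \\
D & (T,\ S) & (P,\ P)
\end{array}
\qquad
T > R > P > S, \qquad 2R > T + S
```

Because T > R and P > S, defecting is the better reply whatever the other player does, yet mutual cooperation (R, R) leaves both players better off than mutual defection (P, P). The second inequality, used for the repeated game, ensures that taking turns exploiting each other does not beat steady cooperation. In an SSD, "cooperate" and "defect" are not single moves but policies that unfold over time in a gridworld, such as cleaning the river versus only eating apples in Cleanup.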