Reinforcement learning (RL) — an artificial intelligence (AI) training technique that uses rewards or punishments to drive agents toward goals — has a persistent problem: the agents it produces tend not to generalize beyond the environments they were trained in.
As OpenAI explains, prior work in reinforcement learning environments has focused on procedurally generated mazes, community projects like the General Video Game AI framework, and games like Sonic the Hedgehog, with generalization measured by training and testing agents on different sets of levels.
To investigate this overfitting, OpenAI developed two additional environments: CoinRun-Platforms and RandomMazes.
To validate CoinRun, CoinRun-Platforms, and RandomMazes, OpenAI trained 9 agents, each with a different number of training levels.
In both CoinRun-Platforms and RandomMazes, the agents overfit strongly at every training-set size.
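The protocol OpenAI describes — train on a fixed set of procedurally generated levels, then measure performance on held-out levels — can be sketched in a few lines. The following is an assumption-laden toy, not OpenAI's implementation: `train_agent` simply memorizes its level seeds instead of running real RL training, the reward model is invented purely for illustration, and the sweep sizes are illustrative rather than the actual 9-agent schedule.

```python
import random

def train_agent(level_seeds):
    """Stub for RL training: this toy 'agent' simply memorizes its
    training level seeds (a stand-in for real training on game levels)."""
    return {"seen": set(level_seeds)}

def mean_reward(agent, level_seeds, rng):
    """Toy reward model: perfect score on memorized levels,
    chance-level performance (uniform in [0, 1)) on unseen ones."""
    scores = [1.0 if s in agent["seen"] else rng.random()
              for s in level_seeds]
    return sum(scores) / len(scores)

rng = random.Random(0)
held_out = range(1_000_000, 1_000_200)  # fixed set of test levels

# Sweep over training-set sizes; a large train/test gap signals
# that the agent memorized levels rather than learned the task.
results = {}
for n in [100, 1_000, 4_000, 16_000]:
    agent = train_agent(range(n))
    train_score = mean_reward(agent, range(n), rng)
    test_score = mean_reward(agent, held_out, rng)
    results[n] = train_score - test_score  # generalization gap
```

In a real study the gap would shrink as the number of training levels grows; here it stays large by construction, mimicking an agent that only memorizes.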
The results provide valuable insight into the challenges underlying generalization in reinforcement learning, OpenAI said.