This approach combines the strengths of two algorithms that are already well known in reinforcement learning and are also believed to exist in humans and rodents.
“Model-based” algorithms learn a model of the environment that can then be simulated to produce estimates of future reward, while “model-free” algorithms learn future reward estimates directly from experience in the environment.
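The contrast can be made concrete with a minimal sketch (not from the paper; the toy chain environment, learning rates, and function names here are illustrative): a model-free temporal-difference learner updates value estimates from sampled transitions, while a model-based planner rolls its model forward to compute the same values.

```python
# Toy deterministic chain: states 0..3, reward 1.0 on entering terminal state 3.
N_STATES = 4

def step(s):
    """Environment dynamics: always move one state to the right."""
    s2 = min(s + 1, N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 and s != s2 else 0.0
    return s2, r

alpha, gamma = 0.5, 0.9

# Model-free: TD(0) learns V(s) directly from experienced transitions.
V = [0.0] * N_STATES
for _ in range(200):                      # 200 episodes of experience
    s = 0
    while s != N_STATES - 1:
        s2, r = step(s)
        V[s] += alpha * (r + gamma * V[s2] - V[s])   # TD update
        s = s2

# Model-based: simulate the (here, known) model forward to value a state.
def model_based_value(s, depth=10):
    if s == N_STATES - 1 or depth == 0:
        return 0.0
    s2, r = step(s)                       # one-step lookahead through the model
    return r + gamma * model_based_value(s2, depth - 1)

print(round(V[0], 2), round(model_based_value(0), 2))   # both approach 0.81
```

Both routes agree on this tiny problem, but they trade off differently: the model-free learner is cheap per decision yet must relearn from scratch if rewards change, whereas the planner adapts immediately at the cost of simulating ahead.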
Model-based algorithms are flexible but computationally expensive, while model-free algorithms are computationally cheap but inflexible. The approach described here gets the best of both: because its reward estimate is computed as a simple weighted sum over learned predictions, it is computationally efficient, much like a model-free algorithm. More generally, a major future task will be to examine how the brain integrates different types of learning.
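The "simple weighted sum" idea can be sketched as follows (a hypothetical illustration in the style of a successor-representation computation, reusing the toy chain above; the matrices and names are assumptions, not the paper's implementation): if a learned matrix `M` stores expected discounted future state occupancies, then value is just `M` times the reward vector.

```python
import numpy as np

gamma = 0.9

# Transition matrix for a 4-state chain (state 3 absorbing).
P = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)

# One-step reward for leaving each state (reward arrives on the 2 -> 3 transition).
R = np.array([0.0, 0.0, 1.0, 0.0])

# M[s, s'] = expected discounted occupancy of s' starting from s.
# Closed form for this illustration: M = (I - gamma * P)^-1.
M = np.linalg.inv(np.eye(4) - gamma * P)

# Value is a weighted sum of rewards -- one matrix-vector product.
V = M @ R
print(np.round(V, 2))   # V(0) is about 0.81, matching full model-based planning
```

The efficiency claim is visible here: evaluating a state costs one dot product rather than a simulated rollout, and if the rewards `R` change, the new values follow from a single recomputation of `M @ R`.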
While we posed this model as an alternative to model-based and model-free learning in the brain, a more realistic view is that the brain coordinates many types of learning simultaneously as it learns and plans.
Understanding how these learning algorithms are combined is an important step towards understanding human and animal brains, and could provide key insights for designing equally complex, multifaceted AI.