Deep Reinforcement Learning for Optimising Non-pharmaceutical Interventions in Epidemics: A Systematic Review

Raphael Ibraimoh; Mohammed Saraee

doi:10.9734/jamcs/2026/v41i12091

Published: 2026-01-17

Page: 72-95

Issue: 2026 - Volume 41 [Issue 1]

Raphael Ibraimoh *

School of Science, Engineering and Environment, University of Salford, UK.

Mohammed Saraee

School of Science, Engineering and Environment, University of Salford, UK.

*Author to whom correspondence should be addressed.

Abstract

Aims: This review aims to evaluate the application of Deep Reinforcement Learning (DRL) for optimizing non-pharmaceutical interventions (NPIs), such as lockdowns and mobility restrictions, during epidemic outbreaks. The focus is on understanding how DRL addresses uncertainty and balances health-economic trade-offs compared to traditional static approaches.

Study Design: Systematic literature review.

Place and Duration of Study: School of Science, Engineering and Environment, University of Salford, between January 2020 and March 2025.

Methodology: A systematic review was conducted following PRISMA-S and Kitchenham guidelines. Literature searches were performed across five major databases (Scopus, Web of Science Core Collection, PubMed/MEDLINE, IEEE Xplore, and arXiv) to identify studies published between January 2020 and March 2025. To be included, a study had to apply DRL to NPI strategies (lockdowns and/or travel restrictions) and report quantitative outcomes. In total, 30 eligible studies were analyzed for algorithmic design, reward structures, and performance metrics.

Results: The review found that DRL consistently outperformed static heuristic-based policies in simulation environments. DRL-driven strategies recommended earlier, adaptive, and layered interventions, leading to improved epidemic control. Multi-objective DRL frameworks demonstrated superior trade-offs between infection suppression and economic impact compared to single-objective models. However, key limitations were identified, including data scarcity, inconsistent reward engineering, and limited integration of socio-economic factors.

Conclusion: DRL offers a principled and adaptive approach for dynamic epidemic policy optimization, outperforming static strategies in simulation studies. Nevertheless, real-world implementation remains challenging due to data limitations and the complexity of integrating socio-economic and behavioral dimensions. Future research should prioritize safety-aware DRL, transparent reward design, and multi-domain integration to ensure practical applicability and public trust.

Keywords: Deep reinforcement learning, COVID-19, lockdown, travel restriction, multi-objective policy, pandemic control, PRISMA-S

How to Cite

Ibraimoh, Raphael, and Mohammed Saraee. 2026. “Deep Reinforcement Learning for Optimising Non-Pharmaceutical Interventions in Epidemics: A Systematic Review”. Journal of Advances in Mathematics and Computer Science 41 (1):72-95. https://doi.org/10.9734/jamcs/2026/v41i12091.
