Deep Reinforcement Learning for Optimising Non-pharmaceutical Interventions in Epidemics: A Systematic Review
Raphael Ibraimoh*
School of Science, Engineering and Environment, University of Salford, UK.
Mohammed Saraee
School of Science, Engineering and Environment, University of Salford, UK.
*Author to whom correspondence should be addressed.
Abstract
Aims: This review evaluates the application of Deep Reinforcement Learning (DRL) to optimizing non-pharmaceutical interventions (NPIs), such as lockdowns and mobility restrictions, during epidemic outbreaks. It focuses on how DRL handles uncertainty and balances health-economic trade-offs compared with traditional static approaches.
Study Design: Systematic literature review.
Place and Duration of Study: School of Science, Engineering and Environment, University of Salford, between January 2020 and March 2025.
Methodology: A systematic review was conducted following the PRISMA-S and Kitchenham guidelines. Literature searches were performed across five major databases (Scopus, Web of Science Core Collection, PubMed/MEDLINE, IEEE Xplore, and arXiv) to identify studies published between January 2020 and March 2025. Studies were eligible if they applied DRL to NPI strategies (lockdowns and/or travel restrictions) and reported quantitative outcomes. In total, 30 eligible studies were analyzed for algorithmic design, reward structure, and performance metrics.
Results: The review found that DRL consistently outperformed static, heuristic-based policies in simulation environments. DRL-driven strategies recommended earlier, adaptive, and layered interventions, leading to improved epidemic control. Multi-objective DRL frameworks achieved better trade-offs between infection suppression and economic impact than single-objective models. However, key limitations were identified, including data scarcity, inconsistent reward engineering, and limited integration of socio-economic factors.
Conclusion: DRL offers a principled and adaptive approach for dynamic epidemic policy optimization, outperforming static strategies in simulation studies. Nevertheless, real-world implementation remains challenging due to data limitations and the complexity of integrating socio-economic and behavioral dimensions. Future research should prioritize safety-aware DRL, transparent reward design, and multi-domain integration to ensure practical applicability and public trust.
Keywords: Deep reinforcement learning, COVID-19, lockdown, travel restriction, multi-objective policy, pandemic control, PRISMA-S
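The health-economic trade-off that multi-objective DRL frameworks optimize can be made concrete with a small sketch. The example below is not taken from any reviewed study: it assumes a discrete-time SIR model, three hypothetical NPI levels with illustrative contact-rate multipliers and daily economic costs, and a weighted-sum scalarization of infection and economic costs into a single reward. The names `step`, `scalarized_reward`, and `rollout`, and all parameter values (`beta`, `gamma`, `w_health`, `w_econ`), are hypothetical. A DRL agent would learn the `policy` function from this reward signal; here the policies are hand-written only to show how the reward scores them.

```python
from dataclasses import dataclass

# Hypothetical NPI levels: 0 = no restriction, 1 = partial distancing, 2 = lockdown.
# Contact-rate multipliers and daily economic costs are illustrative assumptions.
CONTACT_MULTIPLIER = (1.0, 0.6, 0.3)
ECONOMIC_COST = (0.0, 0.3, 1.0)


@dataclass
class SIRState:
    s: float  # susceptible fraction of the population
    i: float  # infected fraction
    r: float  # recovered fraction


def step(state: SIRState, action: int, beta: float = 0.3, gamma: float = 0.1):
    """Advance a discrete-time SIR model by one day under NPI level `action`."""
    effective_beta = beta * CONTACT_MULTIPLIER[action]  # NPIs scale the contact rate
    new_infections = effective_beta * state.s * state.i
    recoveries = gamma * state.i
    next_state = SIRState(
        s=state.s - new_infections,
        i=state.i + new_infections - recoveries,
        r=state.r + recoveries,
    )
    return next_state, new_infections


def scalarized_reward(new_infections: float, action: int,
                      w_health: float = 100.0, w_econ: float = 1.0) -> float:
    """Weighted-sum scalarization of two costs, so the reward is never positive."""
    health_cost = w_health * new_infections
    econ_cost = w_econ * ECONOMIC_COST[action]
    return -(health_cost + econ_cost)


def rollout(policy, days: int = 100, **reward_weights) -> float:
    """Undiscounted episode return of a policy mapping an SIRState to an NPI level."""
    state = SIRState(s=0.99, i=0.01, r=0.0)
    total = 0.0
    for _ in range(days):
        action = policy(state)
        state, new_inf = step(state, action)
        total += scalarized_reward(new_inf, action, **reward_weights)
    return total


# Two static policies and one adaptive, threshold-based rule (all hypothetical).
never_restrict = lambda st: 0
always_lockdown = lambda st: 2
adaptive = lambda st: 2 if st.i > 0.05 else (1 if st.i > 0.01 else 0)

for name, pol in [("never", never_restrict), ("lockdown", always_lockdown),
                  ("adaptive", adaptive)]:
    print(name, round(rollout(pol), 2))
```

Changing `w_health` and `w_econ` shifts which policy scores best, which is the sense in which reward weights encode policy priorities. Multi-objective methods that approximate a Pareto front, rather than fixing one weighting in advance, are one way to avoid committing to a single trade-off.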