SHAP-Based Explainable Machine Learning for Predicting Loan Default among Women-Owned Microenterprises in Kenya
Maurice Wanyonyi *
Department of Mathematics and Statistics, University of Embu, Kenya and Research and Development, African Institute for Capacity Development, Nairobi, Kenya.
Jacqueline Akelo Gogo
Department of Mathematics and Statistics, University of Embu, Kenya and Research and Development, African Institute for Capacity Development, Nairobi, Kenya.
Dennis Muchuki Kinini
Department of Mathematics and Statistics, University of Embu, Kenya.
Jonathan Ndolo Mbithi
Department of Mathematics and Statistics, University of Embu, Kenya.
John Kiluyi Wafula
Department of Assets, Office of Auditor General, Nairobi, Kenya.
Isaac Wafula
Department of Mathematics and Statistics, University of Embu, Kenya.
*Author to whom correspondence should be addressed.
Abstract
Background: Access to credit remains a major challenge for women-owned microenterprises in sub-Saharan Africa, yet existing loan default prediction models often lack transparency and interpretability. This study aimed to develop and evaluate a SHAP-based explainable machine learning framework for predicting loan default among women borrowers using Savings and Credit Cooperative Organizations (SACCO) lending data in Kenya.
Methods: We analysed 3,880 loan records from Kenyan SACCOs, comprising variables across five domains: demographic characteristics, business attributes, loan terms, group-lending mechanisms, and savings/credit history. Six machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, XGBoost, and LightGBM) were implemented and compared. The Synthetic Minority Oversampling Technique (SMOTE) was used to address class imbalance (82.5% non-default versus 17.5% default). Model performance was assessed using classification metrics, probability calibration (Brier score), computational efficiency, and 95% bootstrap confidence intervals. Model interpretability was evaluated using SHapley Additive exPlanations (SHAP), with Random Forest selected as the primary interpretable model.
Results: Ensemble learning methods outperformed traditional approaches across most performance metrics. Random Forest achieved the highest predictive performance, with an F1 score of 0.903 (95% CI: 0.891–0.914) and a recall of 0.999 (95% CI: 0.997–1.000), correctly identifying 99.9% of actual defaulters. The gradient boosting models (LightGBM, XGBoost, Gradient Boosting) demonstrated superior probability calibration (Brier = 0.147) and computational efficiency, requiring less than 0.6 seconds of training time and producing model sizes below 0.25 MB. SHAP analysis identified loan amount, loan term, and interest rate as the dominant predictors of default, while training attendance and group enforcement mechanisms were associated with reduced default risk.
Implications for Practice: The proposed SHAP-based framework provides financial institutions with transparent, actionable insights for credit risk assessment. By identifying modifiable loan-design factors and protective behavioral mechanisms, SACCOs can enhance portfolio quality, implement risk-based lending strategies, and expand inclusive access for women entrepreneurs without sacrificing predictive accuracy.
Conclusion: Accurate and interpretable loan default prediction is feasible in microfinance settings using SHAP-based explainable machine learning. Loan characteristics are the primary drivers of default risk, while behavioral and group-based factors contribute to repayment performance. The framework supports improved lending decisions, enhanced financial inclusion, and better portfolio quality management for women-owned enterprises in developing economies.
Keywords: Women-owned microenterprises, SACCO lending, loan default prediction, credit risk assessment, explainable artificial intelligence, SHAP, machine learning, Random Forest, LightGBM, XGBoost, class imbalance, financial inclusion
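Illustrative sketch: The abstract reports a Brier score for probability calibration and 95% bootstrap confidence intervals for classification metrics such as F1. The following is a minimal sketch of how those quantities are commonly computed, using only the Python standard library. It is not the authors' implementation: the function names, the percentile bootstrap variant, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration and are not taken from the study data.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = default), from confusion counts."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample records with replacement,
    recompute the metric each time, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data (NOT from the study): 1 = default, 0 = non-default.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

brier = brier_score(y_true, y_prob)
f1 = f1_score(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, f1_score)
print(f"Brier = {brier:.3f}, F1 = {f1:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

A lower Brier score indicates better-calibrated probabilities. The width of the bootstrap interval shrinks as the number of records grows, so an interval from eight toy records is far wider than one from a dataset of several thousand loans.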