Evaluating SMOTE-based Class Imbalance Handling in Software Defect Prediction
Abdulbari Alamri *
Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Saudi Arabia.
Muhammad Bilal
Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Saudi Arabia.
*Author to whom correspondence should be addressed.
Abstract
Class imbalance remains a persistent challenge in software defect prediction because defective modules are usually far outnumbered by non-defective ones. This imbalance can reduce a model's ability to identify the minority class, and it can make reported performance misleading when evaluation relies on measures that are insensitive to the minority class. This paper evaluates the effect of the Synthetic Minority Over-sampling Technique (SMOTE) on defect prediction performance in a focused, leakage-safe empirical setting. The study compares Logistic Regression, Random Forest, and k-Nearest Neighbors on the AEEEM_JDT benchmark dataset using imbalance-aware metrics. SMOTE improved threshold-based performance for Random Forest and k-Nearest Neighbors, whereas Logistic Regression showed almost no improvement in F1-score and a decline in MCC. Random Forest improved from 0.566 to 0.627 in F1-score and from 0.499 to 0.535 in MCC, and k-Nearest Neighbors improved from 0.533 to 0.628 in F1-score and from 0.476 to 0.523 in MCC. However, PR-AUC decreased for all three models after SMOTE, indicating that better classification at a selected operating point did not necessarily translate into better ranking quality across thresholds. The results suggest that SMOTE should be treated as a conditional imbalance-handling technique rather than a universally beneficial preprocessing step: its usefulness depends on the classifier and on whether the practical objective is fixed-threshold defect detection or risk ranking.
Keywords: Software defect prediction, class imbalance, SMOTE, stratified cross-validation, F1-score, MCC, PR-AUC
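To make the term "leakage-safe" concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes binary labels (1 = defective), numeric feature vectors, and a simplified SMOTE that interpolates between a minority sample and one of its nearest minority neighbours. All function names (smote, stratified_folds, oversample_training_fold, f1_and_mcc) are hypothetical and introduced only for illustration. The point it shows is that synthetic samples are generated from the training fold alone, so the held-out fold contains only original modules.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Simplified SMOTE: build n_new synthetic points, each on the line
    segment between a minority sample and one of its k nearest minority
    neighbours. Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def stratified_folds(labels, n_folds, rng):
    """Split indices into folds that keep roughly the original class ratio."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def oversample_training_fold(X, y, test_idx, rng):
    """Apply SMOTE to the training part only; the test fold stays untouched."""
    held_out = set(test_idx)
    train_X = [X[i] for i in range(len(X)) if i not in held_out]
    train_y = [y[i] for i in range(len(y)) if i not in held_out]
    minority = [x for x, lab in zip(train_X, train_y) if lab == 1]
    n_new = train_y.count(0) - len(minority)  # balance the two classes
    synthetic = smote(minority, n_new, rng=rng)
    train_X = train_X + synthetic
    train_y = train_y + [1] * len(synthetic)
    test_X = [X[i] for i in test_idx]
    test_y = [y[i] for i in test_idx]
    return train_X, train_y, test_X, test_y


def f1_and_mcc(y_true, y_pred):
    """F1-score and Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

In use, one would loop over the folds returned by stratified_folds, call oversample_training_fold for each, fit a classifier on the balanced training data, and score its predictions on the untouched test fold with f1_and_mcc. Oversampling before the split would instead let synthetic points derived from test modules enter training, which tends to inflate the reported scores.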