Performance Analysis of Gaussian–Edge Feature Extraction Pipelines for Imbalanced Image Classification
Elif Kozan *
Management School, Lancaster University, Lancaster, LA1 4YX, UK and Department of Statistics, Faculty of Science, Ege University, Izmir 35100, Türkiye.
*Author to whom correspondence should be addressed.
Abstract
Modern supervised learning pipelines for image classification rely heavily on preprocessing decisions, including feature extraction, noise reduction, and class imbalance handling. In high-dimensional and heterogeneous image data, these choices can substantially influence classifier behavior, yet the combined effects of Gaussian smoothing and classical edge detection operators remain insufficiently characterized within statistical learning pipelines. This study presents a systematic performance analysis of Gaussian–edge feature extraction pipelines for imbalanced binary image classification. Gaussian filtering is combined with four widely used edge operators (Sobel, Prewitt, Laplacian of Gaussian, and Canny) and evaluated using classical learning algorithms, including Logistic Regression, Support Vector Machines, Random Forests, and Gradient Boosting. To address class imbalance, both class-weighted learning and SMOTE oversampling are examined. Using a heterogeneous bee image dataset as an illustrative case study, the results demonstrate that preprocessing choices have a pronounced impact on minority class performance, while overall accuracy alone provides a misleading assessment under severe imbalance. First-order gradient operators (Sobel and Prewitt), particularly when combined with Gradient Boosting and SMOTE, consistently yield more balanced and interpretable performance than second-order or threshold-based detectors. The best-performing pipeline achieves a macro F1-score of approximately 0.55 and a minority class (Apis) F1-score of approximately 0.25, highlighting the importance of preprocessing design in classical image classification pipelines and providing interpretable baselines for imbalanced learning scenarios where complex deep-learning architectures may be impractical.
Keywords: Gaussian smoothing, edge-based feature extraction, imbalanced image classification, classical machine learning, SMOTE oversampling, statistical image analysis
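The abstract's point that overall accuracy can mislead under severe imbalance, and its reliance on macro F1 and minority class F1, can be illustrated with a minimal sketch. The labels, class proportions, and the always-majority classifier below are hypothetical toy values chosen for illustration only; they are not taken from the bee dataset or from the study's results, and the helper names (`per_class_f1`, `macro_f1`, `accuracy`) are assumptions introduced here.

```python
def per_class_f1(y_true, y_pred, label):
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: F1 is taken as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 90 majority samples and 10 minority ("apis") samples.
y_true = ["other"] * 90 + ["apis"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["other"] * 100

print(f"accuracy      = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1      = {macro_f1(y_true, y_pred):.3f}")
print(f"minority F1   = {per_class_f1(y_true, y_pred, 'apis'):.3f}")
```

In this toy setting the classifier reaches 90% accuracy while never identifying a single minority sample, so its minority F1 is zero and its macro F1 falls below 0.5. This is the kind of gap that motivates reporting macro and per-class F1 alongside accuracy.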