A Hybrid Multi-factor Forensic Data Analytics Framework for Suspicious Cyber Activity Detection
Himanshu Shukla*
Department of Information Technology, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand – 263145, India.
Shikha Goswami
Department of Information Technology, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand – 263145, India.
Rajeev Singh
Department of Information Technology, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand – 263145, India.
Govind Verma
Department of Information Technology, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand – 263145, India.
*Author to whom correspondence should be addressed.
Abstract
Background: The convergence of cloud, mobile, and enterprise networks improves operational efficiency and connectivity for organisations. However, it also increases cybersecurity risks, making multi-layered defences essential against threats such as credential attacks, data exfiltration, DDoS attacks, and insider threats.
Aims: This study examines whether integrating domain-expert rule logic with ensemble machine learning classifiers yields a more dependable and operationally robust mechanism for identifying suspicious cyber activity in authentication-intensive environments than any single detection strategy used alone.
Study Design: This comparative experimental study evaluated four detection configurations—rule-based scoring, Decision Tree (DT), Random Forest (RF), and a hybrid fusion model—against a binary-labelled cybercrime forensic dataset sourced from Kaggle.
Place and Duration of Study: Department of Information Technology, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, India; January 2025 – May 2026.
Methodology: A dataset of 7,400 records with eleven attributes was pre-processed through missing-value imputation, label encoding, and SMOTE + Tomek Links resampling to address class imbalance. Login_Attempts and the hour of the day were retained as primary predictors. DT and RF classifiers were trained alongside a rule-based multi-factor scoring model, and their outputs were fused through a logical-OR strategy to form the hybrid model. Performance was assessed using Accuracy, Precision, Recall, and confusion-matrix statistics, with Recall treated as the governing metric given the asymmetric cost of undetected attacks.
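The logical-OR fusion described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the rule thresholds (max_attempts, off_hours, score_threshold), the function names, and the sample classifier outputs are all hypothetical assumptions introduced for the example.

```python
from typing import List, Sequence


def rule_based_flag(login_attempts: int, hour: int,
                    max_attempts: int = 5,
                    off_hours: range = range(0, 6),
                    score_threshold: int = 1) -> int:
    """Hypothetical multi-factor rule score; all thresholds are assumptions."""
    score = 0
    if login_attempts > max_attempts:   # factor 1: excessive login attempts
        score += 1
    if hour in off_hours:               # factor 2: activity at unusual hours
        score += 1
    return int(score >= score_threshold)


def fuse_or(*flag_vectors: Sequence[int]) -> List[int]:
    """Flag a record as suspicious if ANY detector flags it (logical OR)."""
    if len({len(v) for v in flag_vectors}) != 1:
        raise ValueError("all detectors must score the same records")
    return [int(any(flags)) for flags in zip(*flag_vectors)]


# Illustrative records as (Login_Attempts, hour of day); values are made up.
records = [(2, 14), (9, 13), (1, 3), (3, 10)]
rule_flags = [rule_based_flag(a, h) for a, h in records]

# Stand-ins for Decision Tree and Random Forest predictions (hypothetical).
dt_flags = [0, 0, 0, 1]
rf_flags = [0, 1, 0, 0]

hybrid_flags = fuse_or(rule_flags, dt_flags, rf_flags)
print(hybrid_flags)
```

Because a record is flagged when any single detector fires, the fused output can only add alerts relative to each component. This is consistent with a fusion that raises Recall at the cost of more false positives.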
Results: The Hybrid Model achieved the highest Recall of 97.20%, substantially outperforming the Rule-Based model (88.80%), Decision Tree (5.88%), and Random Forest (5.88%). The Decision Tree and Random Forest recorded the highest Accuracy (95.20% and 95.00%, respectively) and Precision (100% and 98.35%), whereas the Hybrid Model produced a False Negative count of only 13, the lowest among all configurations. These findings suggest that recall-optimised fusion is an appropriate detection paradigm for security-critical applications.
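The metrics reported above follow from confusion-matrix counts in the usual way. The sketch below uses counts that are hypothetical and not taken from this study; it only illustrates how a detector can score high Accuracy and Precision while its Recall stays low on an imbalanced test set.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, Precision, and Recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall is the share of true attacks that were detected.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical test set: 100 attacks, 900 benign records.
# A conservative detector that rarely raises an alert:
conservative = confusion_metrics(tp=5, fp=0, fn=95, tn=900)
# A recall-oriented detector that alerts liberally:
recall_oriented = confusion_metrics(tp=95, fp=200, fn=5, tn=700)

print(conservative)
print(recall_oriented)
```

In this made-up example the conservative detector has the higher Accuracy and perfect Precision yet misses 95 of 100 attacks, which is why Recall and the False Negative count are the more informative measures when missed attacks are costly.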
Conclusion: Fusing domain-driven rule logic with supervised ensemble learning through a logical-OR strategy substantially improves minority-class attack detection. The proposed framework reduces missed attacks and demonstrates potential for deployment within real-world forensic cybersecurity pipelines, notwithstanding an elevated false-positive count that may be addressed through alert prioritisation in operational environments.
Keywords: Intrusion detection systems, digital forensics, forensic data analytics, hybrid machine learning, rule-based detection, Random Forest, Decision Tree, class imbalance, SMOTE–Tomek Links, cyber threat detection, suspicious activity detection