Comparative Analysis of Random Forest and Hybrid ARIMA-Random Forest Models for Student Enrollment Forecasting in Higher Education
Rachelle Tapio *
University of Science and Technology of Southern Philippines, Cagayan De Oro City, Philippines.
Dennis Tarepe
University of Science and Technology of Southern Philippines, Cagayan De Oro City, Philippines.
*Author to whom correspondence should be addressed.
Abstract
Aims: This study evaluates the predictive accuracy of a Random Forest model and a Hybrid ARIMA-Random Forest model for forecasting student enrollment trends in higher education. Accurate forecasting is crucial for institutional planning, resource allocation, and decision-making. The study also examines whether combining statistical time series forecasting (ARIMA) with machine learning (Random Forest) improves prediction accuracy.
Study Design: A comparative forecasting study using historical enrollment data from 1949 to 2024.
Place and Duration of Study: Conducted at a higher education institution in Misamis Occidental, Philippines, using 75 years of enrollment records.
Methodology: The dataset was split into 80% training and 20% testing. The Random Forest model captured nonlinear relationships in enrollment trends, while the ARIMA(2,1,1) model identified long-term patterns and seasonality. A Hybrid ARIMA-Random Forest model combined the predictions of both models to improve accuracy. Forecasting performance was assessed using Mean Absolute Percentage Error (MAPE). The analysis was conducted in Python using relevant statistical and machine learning libraries.
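As an illustration of the evaluation steps described above, the following is a minimal Python sketch of an ordered 80/20 split, the MAPE metric, and one possible way to combine two sets of forecasts. It is not the authors' implementation. The function names are hypothetical, the split is assumed to be chronological, and the equal-weight average in combine_forecasts is an assumption, since the abstract does not state how the ARIMA and Random Forest predictions were combined. The numbers in the usage lines are invented for illustration and are not study data.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_frac: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split a series in time order: the first train_frac of the
    observations for training, the remainder for testing.
    (Assumes an ordered split; the abstract only states 80%/20%.)"""
    cut = int(len(series) * train_frac)
    return list(series[:cut]), list(series[cut:])


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.
    Requires non-zero actual values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def combine_forecasts(
    arima_pred: Sequence[float], rf_pred: Sequence[float], weight: float = 0.5
) -> List[float]:
    """Weighted average of two forecasts (hypothetical combination rule;
    weight=0.5 gives a simple mean of the two models)."""
    return [weight * a + (1.0 - weight) * r for a, r in zip(arima_pred, rf_pred)]


if __name__ == "__main__":
    # Invented values for illustration only, not data from the study.
    actual = [3000.0, 3200.0, 3100.0]
    arima_pred = [2800.0, 3300.0, 3000.0]
    rf_pred = [3100.0, 3000.0, 3250.0]
    hybrid_pred = combine_forecasts(arima_pred, rf_pred)
    print("ARIMA MAPE:", mape(actual, arima_pred))
    print("RF MAPE:", mape(actual, rf_pred))
    print("Hybrid MAPE:", mape(actual, hybrid_pred))
```

In a fuller pipeline, arima_pred and rf_pred would come from models fitted on the training portion only, and MAPE would be computed on the held-out test portion.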
Results: The Random Forest model had a MAPE of 15.62%, while ARIMA recorded 16.95%. The Hybrid model achieved the lowest MAPE of 14.52%, indicating superior accuracy. Projected enrollments for 2025-2029 suggest a stable trend between 3,065 and 3,408 students, with seasonal variations.
Conclusion: The Hybrid ARIMA-Random Forest model outperformed both standalone models, demonstrating that integrating statistical and machine learning approaches enhances forecasting accuracy. These findings support data-driven decision-making for university administrators in faculty recruitment, admissions planning, and financial management. Future research should explore additional machine learning approaches, particularly deep learning and neural networks, to further enhance predictive accuracy.
Keywords: Enrollment forecasting, machine learning, time series analysis, random forest, ARIMA, hybrid models, predictive analytics, higher education planning