Web Document Classification Using Naïve Bayes

A. B. Adetunji

Department of Computer Science and Engineering, Faculty of Engineering and Technology, Ladoke Akintola University of Technology (LAUTECH), Nigeria.

J. P. Oguntoye *

Department of Computer Science and Engineering, Faculty of Engineering and Technology, Ladoke Akintola University of Technology (LAUTECH), Nigeria.

O. D. Fenwa

Department of Computer Science and Engineering, Faculty of Engineering and Technology, Ladoke Akintola University of Technology (LAUTECH), Nigeria.

N. O. Akande

Department of Physical Sciences, College of Science and Engineering, Landmark University Omu-Aran, Nigeria.

*Author to whom correspondence should be addressed.


Abstract

The World Wide Web has become a huge collection of documents, and the number of documents available increases daily. Correctly classifying this vast collection into categories, so that any document of interest can be located easily, is a challenge that researchers have been trying to solve for decades, using a variety of algorithms on different platforms. In this paper, a university website was used as a case study, and WEKA (Waikato Environment for Knowledge Analysis), a machine learning workbench that provides a general-purpose environment for automatic classification, regression, clustering and feature selection, was used as the machine learning platform. Running Naïve Bayes with 10-fold cross-validation on the selected web data correctly classified 77% of instances in a reported time of zero seconds, with a relative absolute error of 68.9937%. This shows the ability of the Naïve Bayes algorithm to classify large numbers of web documents accurately in a short time.
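As a rough illustration of the evaluation described above, the following Python sketch trains a Naïve Bayes text classifier and scores it with k-fold cross-validation, reporting the proportion of correctly classified instances and a relative absolute error (RAE). It is not the authors' WEKA setup: the multinomial bag-of-words variant with Laplace smoothing, the function names (`train_multinomial_nb`, `predict_proba`, `cross_validate`), the fold assignment, and the use of training-fold class frequencies as the baseline predictor are all assumptions made for illustration. The RAE follows the idea behind WEKA's figure: the summed absolute error of the predicted class probabilities divided by that of a predictor that always outputs the class prior, times 100. WEKA's own prior estimate may differ in detail.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: the multinomial bag-of-words model is an assumption;
# the paper does not state which WEKA Naive Bayes variant or features it used.


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing.

    docs is a list of token lists and labels the matching class names.
    """
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab}
    return classes, vocab, log_prior, log_lik


def predict_proba(model, doc):
    """Return P(class | doc), normalised with the log-sum-exp trick."""
    classes, vocab, log_prior, log_lik = model
    scores = {}
    for c in classes:
        # Tokens never seen in training are ignored.
        scores[c] = log_prior[c] + sum(log_lik[c][t] for t in doc if t in vocab)
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def cross_validate(docs, labels, k=10, seed=1):
    """k-fold CV returning (accuracy, relative absolute error in percent)."""
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    all_classes = sorted(set(labels))
    correct, abs_err, prior_abs_err = 0, 0.0, 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        model = train_multinomial_nb([docs[i] for i in train], [labels[i] for i in train])
        # Baseline (assumption): class frequencies of the training fold.
        counts = Counter(labels[i] for i in train)
        prior = {c: counts[c] / len(train) for c in all_classes}
        for i in test:
            probs = predict_proba(model, docs[i])
            if max(probs, key=probs.get) == labels[i]:
                correct += 1
            # Compare predicted and baseline distributions with the 0/1 truth.
            for c in all_classes:
                actual = 1.0 if labels[i] == c else 0.0
                abs_err += abs(probs.get(c, 0.0) - actual)
                prior_abs_err += abs(prior[c] - actual)
    return correct / len(docs), 100.0 * abs_err / prior_abs_err


if __name__ == "__main__":
    # Toy, made-up pages; not the paper's university data set.
    docs = [["admission", "apply", "fees"]] * 10 + [["research", "lab", "paper"]] * 10
    labels = ["admissions"] * 10 + ["research"] * 10
    accuracy, rae = cross_validate(docs, labels, k=10)
    print(f"accuracy={accuracy:.0%}  RAE={rae:.2f}%")
```

An RAE of 100% would mean the classifier's probability estimates are, in total, no closer to the true labels than the prior baseline's, so a figure such as 68.9937% indicates a reduction in error relative to that baseline rather than a misclassification rate.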

Keywords: Machine learning, Naïve Bayes, web document classification, Waikato Environment for Knowledge Analysis (WEKA)


How to Cite

Adetunji, A. B., J. P. Oguntoye, O. D. Fenwa, and N. O. Akande. 2018. “Web Document Classification Using Naïve Bayes”. Journal of Advances in Mathematics and Computer Science 29 (6):1-11. https://doi.org/10.9734/JAMCS/2018/34128.
