Centroid-Based and Bayesian Algorithms Performance

Ghaleb Al-Gaphari *

Computer Faculty, Sana’a University, P. O. Box 1247, Sana’a, Yemen.

Fadl M. Ba-Alwi

Computer Faculty, Sana’a University, P. O. Box 1247, Sana’a, Yemen.

Saeed Abdullah M. Al Dobai

Computer Faculty, Sana’a University, P. O. Box 1247, Sana’a, Yemen.

*Author to whom correspondence should be addressed.


Abstract

Because the amount of textual information available on the web is estimated in terabytes, an efficient algorithm is needed to summarize it. Such an algorithm would speed up reading, information access, and decision making. This paper compares the performance of a Bayesian classifier (BC) and a centroid-based algorithm (CBA) on the Arabic text summarization (ATS) problem. Both algorithms are implemented in software. The CBA extracts the most important sentences in a document or a set of documents (a cluster). It first computes the similarity between each pair of sentences and evaluates the centrality of each sentence in the cluster using a centrality graph, then extracts the most central sentences for inclusion in the summary. The Bayesian algorithm, by contrast, classifies each sentence as belonging in or out of the summary according to its feature vector. Both algorithms are evaluated by human participants and by automatic metrics, using the Arabic NEWSWIRE-a corpus as the data set. The F-measure is 0.7199 for the CBA and 0.623 for the Bayesian algorithm, so the CBA outperforms the Bayesian algorithm. The results also show that the CBA is more robust than the BC: its low average deviation means that it gives similar results whether or not bugs are present. It can compress a text to 25% of its original size without losing the main idea of the original. This property distinguishes it from other algorithms used for the same purpose. It also outperforms all the other techniques considered in this paper when used for Arabic text summarization.
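
The centrality-based extraction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity over raw term counts, the similarity threshold of 0.1, and the names `tokenize`, `cosine`, and `summarize` are all assumptions made for illustration. Only the 25% compression ratio comes from the abstract.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # \w matches Unicode word characters, so Arabic text is handled too.
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (assumed measure).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, ratio=0.25, threshold=0.1):
    """Keep the most central sentences, in their original order.

    ratio: fraction of sentences to keep (0.25 follows the abstract).
    threshold: hypothetical cutoff below which two sentences are not
    connected in the centrality graph.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    centrality = [0.0] * n
    # Build the graph implicitly: each sufficiently similar pair adds
    # its similarity to the centrality score of both sentences.
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                centrality[i] += sim
                centrality[j] += sim
    keep = max(1, math.ceil(n * ratio))
    top = sorted(range(n), key=lambda i: centrality[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    cluster = [
        "the cat sat on the mat",
        "the cat ate the fish",
        "stock markets fell sharply",
        "the cat sat and ate",
    ]
    print(summarize(cluster))
```

In this toy cluster the off-topic sentence shares no terms with the others, so its centrality is zero and it is never selected. A realistic system for Arabic would also need stemming and stop-word removal, which this sketch omits.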

Keywords: Text Summarization, Text Mining and Centrality Concept


How to Cite

Al-Gaphari, Ghaleb, Fadl M. Ba-Alwi, and Saeed Abdullah M. Al Dobai. 2014. “Centroid-Based and Bayesian Algorithms Performance”. Journal of Advances in Mathematics and Computer Science 4 (12):1642-64. https://doi.org/10.9734/BJMCS/2014/7897.
