Feature-based Model for Extraction and Classification of High Quality Questions in Online Forum

Bolanle Ojokoh *

Department of Computer Science, Federal University of Technology, P.M.B. 704, Akure, Nigeria.

Tobore Igbe

Department of Computer Science, Federal University of Technology, P.M.B. 704, Akure, Nigeria.

Ayobami Araoye

Department of Computer Science, Federal University of Technology, P.M.B. 704, Akure, Nigeria.

*Author to whom correspondence should be addressed.


Abstract

Aims: To design and implement a classification-based model that uses specific features to identify and extract high-quality questions in a thread.

Study Design: The study design is divided into three modules: preprocessing, configuration, and question classification.

Place and Duration of Study: Department of Computer Science, Federal University of Technology, Akure, between June 2016 and December 2016.

Methodology: This research proposes a way of identifying, extracting and classifying questions in order to promote high-quality answers in an online forum. A major limitation of existing question extraction and classification in forums is the restricted set of categories considered, such as Who, What, Where, Which, Why and How, which is not sufficient to capture all possible questions. In this work, a number of parameters were proposed and aggregated using fuzzy logic for context-based spam detection and removal, in order to improve question identification and classification. Part-of-speech (POS) tagging was applied to analyse the structure of each extracted sentence based on the presence and position of predefined question tags; this addresses issues such as case sensitivity, grammatical construction and synonyms. Question classification is carried out with Naïve Bayes, and semantic relationships between extracted questions are identified with a cosine similarity model. Experiments were performed on a dataset constructed from the ResearchGate website.
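
As an illustration of the cosine similarity step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes each question is represented by raw term counts over lowercase word tokens; the abstract does not specify the actual representation or weighting, and the function names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(question_a, question_b):
    """Cosine similarity of two questions using raw term-count vectors.

    Assumption: a simple bag-of-words representation; the paper may use
    a different weighting scheme.
    """
    vec_a = Counter(tokenize(question_a))
    vec_b = Counter(tokenize(question_b))
    # Dot product over the terms the two questions share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty question is treated as unrelated to everything
    return dot / (norm_a * norm_b)
```

Under these assumptions, two questions that differ only in letter case and punctuation score close to 1, while questions with no words in common score 0.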

Results: We fed questions extracted from the ResearchGate website into the system. The output consists of the corresponding POS tags and the category into which each question is classified. The number of questions extracted from the website depends on the number of questions available in a forum. The system correctly extracted and classified 3015 questions at 80% POS tag occurrence.

Conclusion: Our approach to question identification and classification was effective and covers more question categories than a fixed set of question words. It can be applied to any question answering system.

Keywords: Question, online forum, ResearchGate, Naïve Bayes, spam filtering


How to Cite

Ojokoh, Bolanle, Tobore Igbe, and Ayobami Araoye. 2017. “Feature-Based Model for Extraction and Classification of High Quality Questions in Online Forum”. Journal of Advances in Mathematics and Computer Science 22 (1):1-21. https://doi.org/10.9734/BJMCS/2017/32541.
