Title: Automatic Text Classification using Naïve Bayesian Algorithm on Arabic Language
Abstract: Text classification is a supervised technique that uses labeled training
data to learn a classification model and then automatically classifies the remaining
texts into these class labels. In this research we present an effective approach to
classifying documents from Arabic text collections. Our approach uses a probabilistic
framework, applying the naïve Bayesian algorithm to classify Arabic text. Our data set
consists of 600 documents distributed equally across 6 classes (Architecture, Economy,
Health and Medicine, Politics, Science, and Sports), with 100 documents per class. We
used 25% of the documents in each class to train our system, and the remaining 75%
were automatically classified using the naïve Bayesian classifier. We evaluated system
performance with the accuracy measure for several numbers of test documents. Accuracy
varies from class to class (from 41% to 100%). The overall average accuracy achieved
by our system across all classes is 57.19%, and the best results were achieved for the
first class, reaching 88.38%.
Authors: Ghassan Kanaan, Riyad Al-Shalabi, and Omar Al-Azzam
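To make the setup described in the abstract concrete, here is a minimal sketch of a naïve Bayes text classifier with a per-class 25%/75% train/test split and per-class accuracy. The abstract does not say which naïve Bayes variant, smoothing scheme, or Arabic preprocessing (tokenization, stemming, stop-word removal) the authors used. This sketch therefore assumes a multinomial model with add-one (Laplace) smoothing over already-tokenized documents. All function names and the toy Arabic documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Assumed multinomial naive Bayes: log priors plus add-one smoothed log likelihoods."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(labels)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_lik, vocab


def classify(tokens, model):
    """Return the class with the highest posterior log score; unseen words are ignored."""
    log_prior, log_lik, vocab = model
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(log_lik[c][w] for w in tokens if w in vocab)
    return max(scores, key=scores.get)


def split_per_class(docs, labels, train_fraction=0.25):
    """Take the first train_fraction of each class for training, the rest for testing."""
    by_class = defaultdict(list)
    for d, label in zip(docs, labels):
        by_class[label].append(d)
    train, test = [], []
    for label, class_docs in by_class.items():
        k = int(len(class_docs) * train_fraction)
        train += [(d, label) for d in class_docs[:k]]
        test += [(d, label) for d in class_docs[k:]]
    return train, test


def accuracy_per_class(model, test):
    """Fraction of each class's test documents that receive the correct label."""
    correct, total = Counter(), Counter()
    for tokens, label in test:
        total[label] += 1
        if classify(tokens, model) == label:
            correct[label] += 1
    return {c: correct[c] / total[c] for c in total}


# Hypothetical toy data: 4 pre-tokenized documents per class.
sports = [["فريق", "مباراة", "هدف"], ["مباراة", "هدف"], ["فريق", "لاعب"], ["هدف", "فريق"]]
economy = [["سوق", "أسعار", "اقتصاد"], ["أسعار", "سوق"], ["اقتصاد", "بنك"], ["سوق", "اقتصاد"]]
docs = sports + economy
labels = ["sports"] * 4 + ["economy"] * 4

train, test = split_per_class(docs, labels, train_fraction=0.25)
model = train_naive_bayes([d for d, _ in train], [l for _, l in train])
acc = accuracy_per_class(model, test)
overall = sum(acc.values()) / len(acc)  # unweighted average over classes
print(acc, overall)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. Averaging the per-class accuracies gives an overall figure comparable in spirit to the abstract's "overall average accuracy", although how the authors aggregated their results is not specified.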