Title: An Algorithm for Extracting the Root for the Arabic Language
Abstract: Stemming is one of many tools used in information retrieval (IR)
to combat the vocabulary mismatch problem, in which query words do not match
document words. Stemming in Arabic does not fit the usual mold: most stemming
research on other languages relies only on removing prefixes and suffixes
from the word, but Arabic words contain infixes as well. In this paper we
introduce a root-based algorithm that handles all three kinds of affixes
(prefixes, suffixes, and infixes) according to the morphological pattern of
the word. We thus extend the stemming concept to the removal of every kind
of affix, including infixes.
Authors: Sameh Ghawanmeh, Riyad Al-Shalabi, Ghassan Kanaan, Khalid Khanfar, and Saif Rabab’ah
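
To illustrate the general idea of pattern-driven root extraction mentioned in the abstract, the following is a minimal sketch and not the authors' algorithm. It assumes the common convention of writing morphological patterns with the placeholder letters ف, ع, ل for the three root consonants. The pattern list, the prefix list, and the function names match_pattern and extract_root are hypothetical and chosen only for this example; a real system would need a far larger pattern inventory and proper handling of suffixes, weak letters, and diacritics.

```python
# Illustrative sketch only: pattern-based root extraction for Arabic words.
# Assumption: patterns use the placeholder letters ف ع ل for root slots,
# and every other letter in a pattern is a fixed affix letter (e.g. an infix).

ROOT_SLOTS = "فعل"

# Hypothetical, deliberately tiny inventories for demonstration.
PATTERNS = ["فاعل", "مفعول", "فعال", "فعل"]
PREFIXES = ["ال"]  # the definite article


def match_pattern(word, pattern):
    """Return the root letters if word fits pattern, otherwise None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in ROOT_SLOTS:
            root.append(w)   # a root slot: keep the word's letter
        elif w != p:
            return None      # a fixed pattern letter must match exactly
    return "".join(root)


def extract_root(word):
    """Try the word as-is, then with a known prefix removed."""
    candidates = [word]
    for prefix in PREFIXES:
        # Only strip a prefix if at least three letters remain.
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            candidates.append(word[len(prefix):])
    for candidate in candidates:
        for pattern in PATTERNS:
            root = match_pattern(candidate, pattern)
            if root is not None:
                return root
    return None


if __name__ == "__main__":
    for w in ["كاتب", "مكتوب", "كتاب", "الكاتب"]:
        print(w, "->", extract_root(w))
```

In this sketch the infix ا in كاتب and the infix و in مكتوب are discarded because they occupy fixed positions in the matched pattern, so all four sample words reduce to the same root كتب. Trying patterns in a fixed order is a simplification; ambiguous words could match more than one pattern.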