Title: Evaluation of Search Methodologies for DARPA GALE Distillation Engines
Abstract: Effective search methodologies are needed to meet the human-like accuracy requirements of the distillation engines used in the Defense Advanced Research Projects Agency (DARPA) Global Autonomous Language Exploitation (GALE) project. This paper presents an evaluation of two search methods for the distillation engine. The Gospels of the Holy Bible, in English and Arabic, are used as a test corpus to determine which search method is more effective. Fifty English queries are issued against each corpus and the results are tabulated. Statistical methods are then applied to compare the “hits” from each method with a baseline result set, which is generated by issuing a human-translated version of the English queries against the Arabic corpus. Analysis shows that searching in the end user’s native language generates more hits than searching in the document’s language, even though the former requires translating the entire document. This finding is contrary to the authors’ expectations. Possible causes are therefore discussed, and further research is called for to determine whether full-document translation is indeed more effective than query translation alone.
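The abstract does not name the statistical methods used to compare each method's hits with the baseline. As a minimal sketch, assuming each query's results are recorded as a set of verse identifiers, the comparison could start from per-query overlap counts. The names `matched_hits`, `recall`, and `wins` are hypothetical and are not taken from the paper; a paired significance test over the per-query wins (for example, a sign test) would be a natural next step, but that is a swapped-in technique rather than the authors' stated method.

```python
from typing import Dict, Set

# Hypothetical data layout (an assumption, not from the paper):
# query id -> set of verse identifiers returned for that query.
Hits = Dict[str, Set[str]]


def matched_hits(method: Hits, baseline: Hits) -> Dict[str, int]:
    """Count, per baseline query, how many of a method's hits also appear in the baseline."""
    return {q: len(method.get(q, set()) & base) for q, base in baseline.items()}


def recall(method: Hits, baseline: Hits) -> float:
    """Fraction of all baseline hits (pooled over queries) that the method recovers."""
    total = sum(len(base) for base in baseline.values())
    if total == 0:
        return 0.0
    return sum(matched_hits(method, baseline).values()) / total


def wins(a: Hits, b: Hits, baseline: Hits) -> int:
    """Number of queries on which method a matches strictly more baseline hits than method b."""
    ma, mb = matched_hits(a, baseline), matched_hits(b, baseline)
    return sum(1 for q in baseline if ma[q] > mb[q])
```

Given hypothetical result sets for query translation and document translation, `recall` summarizes how much of the baseline each method recovers, and `wins` gives the per-query tallies that a paired test would consume. Hits outside the baseline are ignored here, so this sketch measures agreement with the baseline rather than raw hit counts.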
Authors: Christopher Armstrong and Houman Younessi