Home            Contact us            FAQs
    
      Journal Home      |      Aim & Scope     |     Author(s) Information      |      Editorial Board      |      MSP Download Statistics

     Research Journal of Applied Sciences, Engineering and Technology

    Abstract
2014(Vol.8, Issue:14)
Article Information:

Rule-and Dictionary-based Solution for Variations in Written Arabic Names in Social Networks, Big Data, Accounting Systems and Large Databases

Ahmad B.A. Hassanat and Ghada Awad Altarawneh
Corresponding Author:  Ahmad B.A. Hassanat 
Submitted: ‎May ‎19, ‎2014
Accepted: June ‎18, ‎2014
Published: October 10, 2014
Abstract:
This study investigates the problem that some Arabic names can be written in multiple ways. When someone searches for only one form of a name, neither exact nor approximate matching is appropriate for returning the multiple variants of the name. Exact matching requires the user to enter all forms of the name for the search and approximate matching yields names not among the variations of the one being sought. In this study, we attempt to solve the problem with a dictionary of all Arabic names mapped to their different (alternative) writing forms. We generated alternatives based on rules we derived from reviewing the first names of 9.9 million citizens and former citizens of Jordan. This dictionary can be used for both standardizing the written form when inserting a new name into a database and for searching for the name and all its alternative written forms. Creating the dictionary automatically based on rules resulted in at least 7% erroneous acceptance errors and 7.9% erroneous rejection errors. We addressed the errors by manually editing the dictionary. The dictionary can be of help to real world-databases, with the qualification that manual editing does not guarantee 100% correctness.

Key words:  Arabic names, Arabic names standardization, database, name entity recognition, NLP, ,
Abstract PDF HTML
Cite this Reference:
Ahmad B.A. Hassanat and Ghada Awad Altarawneh, . Rule-and Dictionary-based Solution for Variations in Written Arabic Names in Social Networks, Big Data, Accounting Systems and Large Databases. Research Journal of Applied Sciences, Engineering and Technology, (14): 1630-1638.
ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Submit Manuscript
   Information
   Sales & Services
Home   |  Contact us   |  About us   |  Privacy Policy
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved