2014 (Vol. 7, Issue: 6)
Article Information:

Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair

Deepa Gupta, Vani Raveendran and Rahul Kumar Yadav
Corresponding Author:  Deepa Gupta 

Key words:  Cost calculation, Natural Language Processing (NLP), non-official data, normal distribution, official data, parallel corpus collection, semi-official data, sentential alignment
Vol. 7 , (6): 1187-1198
Submitted: March 19, 2013
Accepted: May 10, 2013
Published: February 15, 2014

Creating parallel corpora and aligning them efficiently at the sentence level remains a challenge for structurally distinct languages with a relatively low degree of correlation. This work emphasizes the importance of domain-biased parallel data collection and presents a structured methodology for obtaining such data for the English-Hindi language pair. Because the two languages are structurally distinct, the collected data is also aligned at the sentence level. The study therefore has two aspects: collecting parallel corpora from different domains, and aligning the extracted parallel corpus at the sentence level. The proposal is intended to help Natural Language Processing researchers improve the accuracy, precision and robustness of their systems. This is possible only when abundant parallel corpora are available, preferably organized by domain and aligned at least at the sentence level. The algorithm was developed for the English-Hindi pair, but because it is generic, the approach scales to other similarly structured language pairs.
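The abstract does not spell out the alignment algorithm, but the keywords (cost calculation, normal distribution) point toward a length-based aligner. Purely as an illustration, here is a minimal sketch of the classic Gale-Church style length-based dynamic program. It is not necessarily the authors' method. The constants `C` and `S2` and the bead priors in `PRIORS` are hypothetical placeholders that would have to be estimated from English-Hindi data.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical parameters (assumptions, not from the paper):
# C  = expected target characters per source character,
# S2 = variance of that ratio per character.
C = 1.0
S2 = 6.8

# Hypothetical prior probabilities for each bead type
# (number of source sentences, number of target sentences).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.005,
    (0, 1): 0.005,
    (2, 1): 0.05,
    (1, 2): 0.05,
}


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(l1: int, l2: int) -> float:
    """-log P(delta | match), with delta assumed normally distributed."""
    mean = (l1 + l2 / C) / 2.0
    if mean == 0:
        return 0.0
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))  # two-tailed probability
    return -math.log(max(p, 1e-12))         # avoid log(0)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices) via min-cost DP."""
    n, m = len(src), len(tgt)
    ls = [len(s) for s in src]
    lt = [len(t) for t in tgt]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                total = (cost[i][j] - math.log(prior)
                         + length_cost(sum(ls[i:ni]), sum(lt[j:nj])))
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Trace back the cheapest path from (n, m) to (0, 0).
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

The dynamic program runs in O(n·m) time over the two sentence lists. Lengths are counted in characters (Python code points), which for Devanagari text may not track English lengths linearly. This is one more reason the constants above would need tuning for English-Hindi.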
  Cite this Reference:
Deepa Gupta, Vani Raveendran and Rahul Kumar Yadav, 2014. Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair.  Research Journal of Applied Sciences, Engineering and Technology, 7(6): 1187-1198.
ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2015. MAXWELL Scientific Publication Corp., All rights reserved