Abstract
Article Information:
Linear Reranking Model for Chinese Pinyin-to-Character Conversion
Xinxin Li, Xuan Wang, Lin Yao and Muhammad Waqas Anwar
Corresponding Author: Xinxin Li
Submitted: January 31, 2013
Accepted: February 25, 2013
Published: February 05, 2014
Abstract:
Pinyin-to-character conversion is an important task in Chinese natural language processing. Previous work has mainly focused on n-gram language models and machine learning approaches, sometimes supplemented with hand-crafted or automatically derived rule-based post-processing. A word n-gram language model cannot solve two problems: out-of-vocabulary word recognition and long-distance grammatical constraints. In this study, we propose a linear reranking model that attempts to address these problems. Our model uses a minimum error learning method to combine different sub-models, which include word and character n-gram language models, a part-of-speech tagging model and a dependency model. The impact of the different sub-models on conversion is evaluated experimentally and analyzed. Results on the Lancaster Corpus of Mandarin Chinese show that our new model outperforms the word n-gram language model.
Key words: Dependency model, minimum error learning method, part-of-speech tagging, word n-gram model
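The linear combination of sub-model scores described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature names, the toy scores, the candidate labels and the hand-set weights are hypothetical. In the study the weights are learned with a minimum error learning method, which this sketch does not implement.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names, one per sub-model named in the abstract.
FEATURES = ("word_lm", "char_lm", "pos", "dep")

Candidate = Tuple[str, Dict[str, float]]


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of sub-model log-scores for one candidate."""
    return sum(weights[name] * features[name] for name in FEATURES)


def rerank(candidates: List[Candidate], weights: Dict[str, float]) -> List[Candidate]:
    """Return candidates sorted from highest to lowest linear score."""
    return sorted(candidates, key=lambda c: linear_score(c[1], weights), reverse=True)


# Toy n-best list with made-up log-scores (not data from the study).
candidates: List[Candidate] = [
    ("candidate_a", {"word_lm": -10.0, "char_lm": -14.0, "pos": -6.0, "dep": -9.0}),
    ("candidate_b", {"word_lm": -11.0, "char_lm": -12.0, "pos": -5.0, "dep": -4.0}),
]

# Baseline: only the word n-gram score counts.
baseline_weights = {"word_lm": 1.0, "char_lm": 0.0, "pos": 0.0, "dep": 0.0}
# Reranker: hand-set weights standing in for learned ones.
rerank_weights = {"word_lm": 1.0, "char_lm": 0.5, "pos": 0.5, "dep": 0.5}

baseline_best = rerank(candidates, baseline_weights)[0][0]
reranked_best = rerank(candidates, rerank_weights)[0][0]
print(baseline_best, reranked_best)
```

In this toy setup the word n-gram score alone prefers one candidate, while the combined score prefers the other, which is the kind of correction a reranker is meant to make.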
Cite this Reference:
Xinxin Li, Xuan Wang, Lin Yao and Muhammad Waqas Anwar. Linear Reranking Model for Chinese Pinyin-to-Character Conversion. Research Journal of Applied Sciences, Engineering and Technology, (5): 975-980.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459