Abstract
Article Information:
A Grammatical Evolution Approach for Content Extraction of Electronic Commerce Website
Wei Qing-jin and Peng Jian-sheng
Corresponding Author: Wei Qing-jin
Submitted: July 26, 2012
Accepted: September 12, 2012
Published: March 11, 2013
Abstract:
Web content extraction, the problem of identifying and extracting information of interest from Web pages, plays an important role in integrating data from different sources for advanced information-based services. This paper investigates an approach, based on the Grammatical Evolution (GE) method, for extracting electronic commerce information from Web pages without any given template. Although much prior research has used XPath to extract the content of Web pages, the complexity of the XPath grammar makes it too difficult for evolutionary tools to process automatically. Hence, a reduced language integrating XPath and DOM techniques is defined in BNF grammar form and used by the GE to generate parsing solutions. Moreover, a fitness evaluation method is proposed, based on the fuzzy membership of the two parts of the chromosome. Finally, empirical results on several real Web pages show that the proposed technique can segment data records and extract data from them accurately, automatically and flexibly.
Key words: DOM, grammatical evolution, web content extraction, XPath
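
To illustrate how Grammatical Evolution can turn an integer chromosome into an extraction path through a BNF grammar, here is a minimal sketch of the usual genotype-to-phenotype mapping. The grammar, the tag and attribute names, and the function name `map_genome` are hypothetical and chosen only for illustration; they are not the reduced language or the fitness function of the paper.

```python
from typing import Dict, List, Optional

# Hypothetical reduced, XPath-like grammar in BNF form (an assumption for
# illustration, not the grammar defined in the paper).
GRAMMAR: Dict[str, List[List[str]]] = {
    "<path>": [["//", "<step>"], ["//", "<step>", "/", "<step>"]],
    "<step>": [["<tag>"], ["<tag>", "[", "<pred>", "]"]],
    "<tag>": [["div"], ["li"], ["span"], ["td"]],
    "<pred>": [["@class='price'"], ["@class='title'"], ["1"]],
}


def map_genome(genome: List[int], start: str = "<path>",
               max_wraps: int = 2) -> Optional[str]:
    """Map a list of integer codons to a sentence of GRAMMAR.

    The leftmost non-terminal is always expanded first. When a non-terminal
    has several productions, the next codon selects one via
    codon % number_of_productions. The genome may be reused (wrapped) at most
    max_wraps times; if codons run out, the individual is invalid (None).
    """
    symbols = [start]          # pending symbols, leftmost first
    output: List[str] = []     # terminals emitted so far
    used = 0                   # number of codons consumed
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:         # terminal symbol: emit it
            output.append(sym)
            continue
        choices = GRAMMAR[sym]
        if len(choices) == 1:          # no choice to make, no codon consumed
            production = choices[0]
        else:
            if used >= limit:          # out of codons, mapping fails
                return None
            codon = genome[used % len(genome)]
            used += 1
            production = choices[codon % len(choices)]
        symbols = production + symbols
    return "".join(output)


if __name__ == "__main__":
    print(map_genome([0, 1, 3, 0]))
```

In this sketch an evolutionary loop would vary the integer list and score each resulting path with a fitness function; that scoring step is omitted here because the abstract does not give its details.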
Cite this Reference:
Wei Qing-jin and Peng Jian-sheng. A Grammatical Evolution Approach for Content Extraction of Electronic Commerce Website. Research Journal of Applied Sciences, Engineering and Technology, (07): 2426-2432.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459