     Research Journal of Applied Sciences, Engineering and Technology


An Efficient Technique to Implement Similarity Measures in Text Document Clustering using Artificial Neural Networks Algorithm

K. Selvi¹ and R.M. Suresh²
¹Sathyabama University
²Sri Muthukumaran Institute of Technology, Chennai, India
Research Journal of Applied Sciences, Engineering and Technology  2014  23:2320-2328
http://dx.doi.org/10.19026/rjaset.8.1235  |  © The Author(s) 2014
Received: September 18, 2014  |  Accepted: October 17, 2014  |  Published: December 20, 2014

Abstract

Artificial neural networks can address a diverse range of problems, including pattern recognition (using both supervised and unsupervised methods), optimization, associative memory and process control. Problem identified: in recent years, finding the required information in massive quantities of data has become a challenging task. Similarity evaluation is a central element in understanding the variables and perceptions that drive behavior and mediate concern. This study proposes Artificial Neural Network algorithms for computing similarity measures. To apply singular value decomposition (SVD), the frequency of word pairs in the given document is first established. The components involved are:

(1) Tokenization: splitting a stream of text into words, phrases, symbols or other meaningful elements.
(2) Stop words: words that are filtered out before or after natural language data is processed.
(3) Porter stemming: an algorithm used mainly as part of a term normalization process, typically carried out when setting up information retrieval systems.
(4) WordNet: a lexical database for the English language.

Based on Artificial Neural Networks, the core of this work extends the proposed n-gram algorithm, in which the items of an n-gram may be phonemes, syllables, letters, words or base pairs, depending on the application. Future work will apply the same similarity measures in other neural network algorithms to achieve improved results.
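The preprocessing steps (1) to (3) in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `STOP_WORDS` is a tiny hypothetical sample list, and `crude_stem` is a simplified suffix stripper standing in for the Porter stemmer (it does not implement Porter's actual rules).

```python
import re

# Hypothetical, deliberately tiny stop-word list (real systems use larger curated lists).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "of", "on", "or", "the", "to", "with"}

# Ordered (suffix, replacement) rules. This is a crude stand-in for the
# Porter stemmer, NOT the Porter algorithm itself.
SUFFIX_RULES = [("ational", "ate"), ("ization", "ize"), ("ness", ""),
                ("ing", ""), ("ies", "y"), ("es", ""), ("s", "")]


def tokenize(text):
    """Split a stream of text into lowercase tokens made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Apply the first matching suffix rule, keeping a stem of at least 3 characters."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)] + replacement
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("Clustering the documents by similarity measures"))
    # prints ['cluster', 'document', 'similarity', 'measur']
```

A fuller pipeline would normally swap `crude_stem` for a complete Porter stemmer implementation (NLTK provides `nltk.stem.PorterStemmer`, for instance) and use a larger stop-word list.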
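The abstract states that word-pair frequencies are established before singular value decomposition is applied. The sketch below shows one possible way to build such a word-pair frequency matrix. Treating each sentence as the co-occurrence window, and the helper names `word_pair_counts` and `pair_matrix`, are assumptions for illustration; the paper's exact counting scheme is not given here.

```python
from collections import Counter
from itertools import combinations


def word_pair_counts(sentences):
    """Count how often each unordered pair of distinct words co-occurs in a sentence.

    `sentences` is a list of token lists (e.g. the output of a preprocessing step).
    Each pair is stored as an alphabetically ordered tuple.
    """
    counts = Counter()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def pair_matrix(counts):
    """Arrange pair counts into a symmetric vocabulary-by-vocabulary matrix (list of lists)."""
    vocab = sorted({w for pair in counts for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (a, b), c in counts.items():
        matrix[index[a]][index[b]] = c
        matrix[index[b]][index[a]] = c
    return vocab, matrix


if __name__ == "__main__":
    sentences = [["neural", "network", "similarity"],
                 ["neural", "network"],
                 ["similarity", "measure"]]
    vocab, matrix = pair_matrix(word_pair_counts(sentences))
    print(vocab)
    for row in matrix:
        print(row)
```

The resulting matrix is the kind of input an SVD routine (for example `numpy.linalg.svd`) would then factorize; that step is left out so the sketch runs without third-party dependencies.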
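For the n-gram component, the following sketch extracts contiguous n-grams over any sequence (the letters of a string or a list of words, matching the abstract's note that items depend on the application) and compares two sequences by the cosine similarity of their n-gram counts. Cosine similarity and the default `n=2` are assumed choices for illustration; the abstract does not say which similarity function the proposed ANN-based algorithm uses.

```python
from collections import Counter
from math import sqrt


def ngrams(items, n):
    """Return contiguous n-grams (as tuples) over a sequence of letters, words, etc."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors stored as Counters."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    # Sequences too short to form any n-gram get similarity 0.0.
    return dot / norm if norm else 0.0


def ngram_similarity(x, y, n=2):
    """Compare two sequences by the cosine similarity of their n-gram counts."""
    return cosine_similarity(Counter(ngrams(x, n)), Counter(ngrams(y, n)))


if __name__ == "__main__":
    # Letter bigrams: only "ht" is shared, out of 4 bigrams in each word.
    print(ngram_similarity("night", "nacht"))  # prints 0.25
    # Word bigrams over tokenized text.
    print(ngram_similarity("neural network model".split(),
                           "neural network layer".split()))
```

Because `ngrams` only slices the input, the same functions work unchanged for character-level and word-level comparisons.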

Keywords:

Artificial neural networks, natural language processing, Porter stemming, similarity measure, WordNet

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved