IJRET® PUBLICATIONS
RULE BASED PSEUDO N-GRAM MODEL FOR TELUGU SCRIPT
N. Swapna, B. Padmaja Rani
Abstract: With the increasingly widespread use of computers and the internet in India, a large amount of information in Indian languages is becoming available on the web. Automatic information processing and retrieval is therefore becoming an urgent need in the Indian context. This paper presents a new Rule based Pseudo N-gram model for the Telugu language. The Rule based Pseudo N-gram approach provides a set of rules for extracting root words by removing inflections that the Pseudo N-gram fails to recognize. The Pseudo N-gram, which strips a word from the end, acts as a preliminary stage for the Rule based Pseudo N-gram. We composed five rules to define the Rule based Pseudo N-gram. The rules are based on the morphology, grammar rules, and word derivation structure of Telugu. Telugu is one of the old and traditional languages of India; it belongs to the Dravidian language family and has its own script. It is an official language of the states of Telangana and Andhra Pradesh. Telugu is a morphologically rich language with high word conflation. In view of these complexities, we propose a Rule based Pseudo N-gram that provides a reasonable alternative to word based models and can also be used for text categorization. We conducted experiments on randomly selected Telugu documents and found that the accuracy of the Rule based Pseudo N-gram is up to 97.8%.
Keywords: Rule Based Pseudo N-Gram, Pseudo N-Gram, Text Categorization, Morphology, Grammar Rule.
DOI: https://doi.org/10.15623/ijret.2017.0601002
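The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the five Telugu rules, so the lexicon, the single suffix rule, and the function names `pseudo_ngrams`, `strip_to_root`, and `rule_based_root` are all hypothetical, and English toy words stand in for Telugu. The first stage strips a word from the end until a known root appears; the second stage applies suffix rewrite rules only to words the first stage could not resolve.

```python
def pseudo_ngrams(word, min_len=2):
    """Yield the word, then ever shorter prefixes, dropping one
    character from the end each time (the end-stripping stage)."""
    for end in range(len(word), min_len - 1, -1):
        yield word[:end]


def strip_to_root(word, lexicon, min_len=2):
    """Return the longest end-stripped prefix found in the lexicon,
    or None when no prefix is a known root."""
    for candidate in pseudo_ngrams(word, min_len):
        if candidate in lexicon:
            return candidate
    return None


def rule_based_root(word, lexicon, rules, min_len=2):
    """Try end-stripping first; if that fails, apply suffix rewrite
    rules (suffix, replacement) and accept the first known root.
    Falls back to the unchanged word when nothing matches."""
    root = strip_to_root(word, lexicon, min_len)
    if root is not None:
        return root
    for suffix, replacement in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in lexicon:
                return candidate
    return word


# Hypothetical toy data (assumption): not the paper's lexicon or rules.
LEXICON = {"walk", "story"}
RULES = [("ies", "y")]

print(rule_based_root("walking", LEXICON, RULES))  # walk
print(rule_based_root("stories", LEXICON, RULES))  # story
```

Here "walking" is resolved by end-stripping alone, while "stories" never shortens to "story" and is recovered only by the rewrite rule, which mirrors the abstract's description of rules handling inflections the Pseudo N-gram misses. For Telugu text, a Python `str` is indexed by Unicode code point, so stripping one code point at a time could separate a consonant from its dependent vowel sign; a real system would probably strip by syllable unit instead, which this sketch does not attempt.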