IJRET® PUBLICATIONS
LANGUAGE IDENTIFICATION USING G-LDA
Shubham Saini, Bhavesh Kasliwal, Shraey Bhatia
Abstract: Language identification plays an important role in natural language processing applications as one of the pre-processing steps. Various mechanisms are in use today that achieve this task with high recognition rates. Recent years have seen rapid growth in international communication, which has led to a need for systems capable of correctly identifying the languages of documents. Possible applications of language identification include information retrieval, web crawlers, text mining and email filtering. The paper uses a process called G-LDA [1], which combines concepts from Latent Dirichlet Allocation (LDA) and genetic evolution techniques. It involves constructing a set of words that occur with high frequency in a given document. The method was tested on the Leipzig Corpora. The phrases evolved over successive generations showed significant improvement.
Keywords: Language Identification, Latent Dirichlet Allocation, Gibbs Sampling, Genetic Algorithm, Topic Modeling, Breeding, Fitness, Roulette Wheel
DOI: https://doi.org/10.15623/ijret.2013.0211008
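
The keywords mention fitness and roulette-wheel selection. As a rough illustration only, the following Python sketch shows generic fitness-proportionate (roulette-wheel) selection over candidate word sets. The fitness function (summed in-document word frequency), the toy document, and the names phrase_fitness and roulette_select are assumptions made for this example; they are not taken from the paper and may differ from what G-LDA actually does.

```python
import random
from collections import Counter


def phrase_fitness(phrase, word_counts):
    # Assumed fitness (not from the paper): total frequency of the
    # phrase's words in the document. Unseen words count as 0.
    return sum(word_counts[w] for w in phrase)


def roulette_select(population, fitnesses, rng):
    # Fitness-proportionate selection: each individual is picked with
    # probability fitness / total fitness.
    total = sum(fitnesses)
    if total == 0:
        # No fitness signal at all: fall back to a uniform choice.
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off


# Toy data, purely illustrative.
document = "the cat and the dog and the bird".split()
word_counts = Counter(document)
population = [("the", "and"), ("cat", "dog"), ("the", "bird")]
fitnesses = [phrase_fitness(p, word_counts) for p in population]

rng = random.Random(0)  # seeded so the run is repeatable
parent = roulette_select(population, fitnesses, rng)
```

In a full genetic algorithm, parents chosen this way would then be bred (crossover and mutation) to form the next generation; that step is omitted here because the abstract does not describe it.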