IMPROVEMENT OF TELUGU OCR BY SEGMENTATION OF TOUCHING CHARACTERS
J. Bharathi, P. Chandrasekhar Reddy
Abstract: Reported success rates for Telugu OCR systems are 84-87% for font sizes from 12 to 20 and 95.4-98.5% for sizes from 15 to 35. Issues mentioned in the literature include noise and confusion characters. Studies by the authors indicate that touching characters constitute about 1-2% of the total characters in printed books set in normal-size fonts (14 pt). The editable output of an OCR system contains additional errors due to incorrect code selection, which emphasizes the need to identify touching characters. Identifying them is a challenge because the touching may occur at different places, depending on orthography and rules of grammar. A complete strategy for identification, segmentation and recognition is proposed, along with syllable models for segmentation. The effect of normalization methods at the preprocessing stage on the identification of touching characters and on the recognition rates of normal characters is studied. A new algorithm is proposed for segmenting touching conjunct consonants. The use of an augmented database shows clear improvement in recognition rates. Touching characters are identified and segmented with an 83% success rate, improving the overall performance of the OCR system for Telugu.
Keywords: Telugu OCR, Touching characters, Syllable model, Non Linear Normalization, Hausdorff distance, Augmented Database
DOI: https://doi.org/10.15623/ijret.2014.0310054