IJRET


FOCUSED WEB CRAWLING USING NAMED ENTITY RECOGNITION FOR NARROW DOMAINS

Sameendra Samarawickrama, Lakshman Jayaratne

Abstract: In recent years the World Wide Web (WWW) has grown to such an extent that generic web crawlers can no longer keep up with it. As a result, focused web crawlers, which target only a particular domain, have gained popularity. However, these crawlers rely on lexical terms and ignore the information contained in named entities, which can be a very good source of information when crawling narrow domains. In this paper we discuss a new approach to focused crawling based on named entities for narrow domains. We conducted focused-crawling experiments in three narrow domains: baseball, football and American politics. A classifier based on the centroid algorithm, trained on web pages collected manually from online news articles for each domain, guides the crawler. Our results show that at any time during the crawl, the collection built by our crawler is better, in terms of harvest ratio, than that built by a traditional focused crawler based on lexical terms. This held for all three domains considered.

Keywords: web mining, focused crawling, named entity, classification

DOI: https://doi.org/10.15623/ijret.2013.0203023
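
The abstract names two technical ingredients: a centroid-based classifier that decides whether a page is on topic, and the harvest ratio used to compare crawls. The following is a minimal sketch of both, not the authors' implementation. It assumes that named entities have already been extracted from each page by some upstream tool (not shown), that pages are compared by cosine similarity, and that harvest ratio means the fraction of crawled pages judged relevant. The function names and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter


def vectorize(entities):
    """Turn a list of named-entity strings into a unit-length count vector."""
    counts = Counter(entities)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {e: c / norm for e, c in counts.items()} if norm else {}


def centroid(vectors):
    """Average the training vectors component-wise to get the domain centroid."""
    total = Counter()
    for vec in vectors:
        for entity, weight in vec.items():
            total[entity] += weight
    return {entity: weight / len(vectors) for entity, weight in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(e, 0.0) for e, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_relevant(page_entities, domain_centroid, threshold=0.3):
    """Label a page relevant if it is close enough to the centroid.

    The threshold value is an assumption chosen for illustration only.
    """
    return cosine(vectorize(page_entities), domain_centroid) >= threshold


def harvest_ratio(relevance_flags):
    """Fraction of crawled pages that were judged relevant."""
    if not relevance_flags:
        return 0.0
    return sum(relevance_flags) / len(relevance_flags)


# Toy training data for a hypothetical baseball domain.
training_pages = [["Yankees", "Red Sox", "MLB"], ["MLB", "Yankees"]]
baseball_centroid = centroid([vectorize(p) for p in training_pages])

crawled_pages = [["Yankees", "MLB"], ["Senate", "Congress"]]
flags = [is_relevant(p, baseball_centroid) for p in crawled_pages]
print(harvest_ratio(flags))
```

In a focused crawler, a relevance score like this would typically also be used to prioritize which outgoing links to follow next; that scheduling step is left out of the sketch.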

Copyright © 2012-2018 IJRET Journal All rights reserved