VITALIZED BI-LEVEL WEB CRAWLER FOR REMOVAL OF REDUNDANT CONTENT IN DEEP WEB INTERFACE

Supriya.H.S

Abstract: Search engines are used to find relevant data among trillions of web pages stored on many different servers. A conventional search engine can only search information on the shallow (surface) web. The deep web is a huge store of hidden information that is not indexed by automated search engines, and locating deep web content is a challenging task. A deep web crawler can harvest this content efficiently and return accurate results for a user query quickly. This paper proposes a vitalized bi-level web crawler that analyzes deep web interfaces and removes redundant content from its database. In the first level, to avoid visiting a huge number of pages, the crawler performs a site-based search for core pages with the help of search engines, ranking the sites so that those most relevant to a given query are prioritized. In the second level, the crawler achieves fast in-site searching through adaptive link ranking, which excavates the most relevant links. The web contains many copies of equivalent content or equivalent web pages, so duplicate and near-duplicate content occurs very frequently. Redundant content in the deep web is therefore removed by parsing the content of one web page and comparing the parsed content with the content of other web pages, which saves storage space and the bandwidth the crawler would otherwise spend crawling duplicate pages.

Keywords: Search Engine, Deep Web, Web Crawler, Site Ranking, Duplicate Content

DOI: https://doi.org/10.15623/ijret.2016.0516019
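
The abstract does not spell out how sites and links are ranked or how parsed pages are compared, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes ranking by counting query terms in a short description (used here for both the site level and the link level), and duplicate detection by comparing 3-word shingles of the parsed page text with a Jaccard similarity threshold of 0.8. The names `parse_content`, `rank`, and `PageStore`, and the threshold value, are hypothetical and chosen for illustration.

```python
import hashlib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def parse_content(html):
    """Reduce a page to a list of lowercase word tokens."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return re.findall(r"[a-z0-9]+", " ".join(extractor.parts).lower())


def shingles(tokens, k=3):
    """Set of k-word windows; short pages fall back to a single shingle."""
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank(candidates, query):
    """Assumed scorer: order (url, description) pairs by the number of
    query terms found in the description, most relevant first."""
    terms = set(query.lower().split())

    def score(item):
        return len(terms & set(re.findall(r"[a-z0-9]+", item[1].lower())))

    return sorted(candidates, key=score, reverse=True)


class PageStore:
    """Keeps one copy of each page; rejects exact and near duplicates."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.hashes = set()         # digests of normalized page text
        self.pages = {}             # url -> shingle set

    def add(self, url, html):
        """Return True if the page was stored, False if it was redundant."""
        tokens = parse_content(html)
        digest = hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()
        if digest in self.hashes:
            return False  # exact duplicate after normalization
        page_shingles = shingles(tokens)
        for stored in self.pages.values():
            if jaccard(page_shingles, stored) >= self.threshold:
                return False  # near duplicate of a stored page
        self.hashes.add(digest)
        self.pages[url] = page_shingles
        return True
```

In this sketch a crawler would call `rank` once on candidate sites and again on the links found inside each chosen site, then pass every fetched page through `PageStore.add` and keep only those for which it returns `True`. Comparing each new page against every stored page is linear in the store size; a production crawler would more likely use fingerprinting such as MinHash or SimHash, which the abstract neither confirms nor rules out.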