Abstract
				As the deep web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate deep-web interfaces. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. We propose a two-stage framework, namely SmartCrawler, for efficiently harvesting deep-web interfaces. In the first stage, SmartCrawler performs site-based searching for center pages with the help of search engines, avoiding visiting a large number of pages. To achieve more accurate results for a focused crawl, SmartCrawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, SmartCrawler achieves fast in-site searching by excavating the most relevant links with adaptive link ranking. To eliminate bias toward visiting some highly relevant links in hidden web directories, we design a link tree data structure to achieve wider coverage for a website. Our experimental results on a set of representative domains show the agility and accuracy of our proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
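
The link tree idea from the abstract can be illustrated with a small sketch. The abstract does not specify how the tree is built or traversed, so everything below is an assumption made for illustration: links are grouped by their first path segment (a one-level tree), and links are then taken round-robin across directories so that no single directory dominates the crawl. The function names `build_link_tree` and `pick_balanced` and the example URLs are hypothetical and do not come from SmartCrawler itself.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


def build_link_tree(urls):
    """Group URLs by first path segment (assumed one-level link tree)."""
    tree = defaultdict(deque)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Pages directly under the site root share the "" bucket.
        directory = segments[0] if len(segments) > 1 else ""
        tree[directory].append(url)
    return tree


def pick_balanced(tree, budget):
    """Take links round-robin across directories, up to `budget` links."""
    queues = [deque(q) for q in tree.values()]  # copy so the tree is untouched
    picked = []
    while queues and len(picked) < budget:
        for q in queues:
            if len(picked) == budget:
                break
            picked.append(q.popleft())
        # Drop directories that have no links left.
        queues = [q for q in queues if q]
    return picked


if __name__ == "__main__":
    links = [
        "http://example.com/books/a",
        "http://example.com/books/b",
        "http://example.com/books/c",
        "http://example.com/cars/x",
        "http://example.com/about",
    ]
    for link in pick_balanced(build_link_tree(links), budget=3):
        print(link)
```

With a budget of three, this sketch returns one link from each of the three directories rather than three links from `books`, which is the kind of wider in-site coverage the abstract attributes to the link tree. A fuller version would presumably order links inside each directory by the adaptive link-ranking score. That step is omitted here because the abstract does not describe it.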
				
				Modules
				
				
				Algorithms
				page ranking
				
				Modification
				web mining