【Abstract】 With the growth of online information and the spread of programming skills, search engines have become a common tool for browsing the Internet. Most search engines use crawler technology as their core module and return results for user queries by keyword. However, the explosive growth of online information makes it difficult to find and locate specific information. To address this difficulty, this paper builds on Python and the Scrapy framework and takes tourism websites as the crawling target. By analyzing the operating mechanism, functional units, and algorithms of existing Web crawlers, it attempts to build a more targeted Web crawler and uses it to collect the target data for the topic. The paper briefly presents the principles of crawler technology and the current state of several key technologies, introduces the crawler project, and focuses on cookies and the Robots protocol (robots.txt), both of which strongly influence this study. It then explains the key role that a NoSQL database, represented by MongoDB, plays in storing the target information, and walks through the program development process with emphasis on implementation details. The paper also discusses key problems in modern crawler development and the practical solutions adopted here. To get around websites' access restrictions, it mainly relies on rotating Cookies and disguising the User-Agent. For the problems of duplicate resource addresses (URLs) and multithreading, it analyzes the solutions that Scrapy provides. Finally, the crawler's results are tested and visualized, and the remaining problems and possible improvements are discussed.
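The abstract names two mechanisms without showing them: dropping duplicate requests, and rotating the Cookie and User-Agent sent with each request. Below is a minimal standard-library sketch of both ideas under stated assumptions; it is not the paper's code and not Scrapy's API. The names `canonicalize`, `fingerprint`, `Scheduler`, `USER_AGENTS`, and `COOKIE_POOL` are hypothetical, and the User-Agent strings and session cookies are placeholders. Scrapy's own duplicate filter (`RFPDupeFilter`) rests on a similar principle, hashing a request fingerprint that also covers the request body.

```python
import hashlib
import itertools
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical pool of browser User-Agent strings to rotate through.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

# Hypothetical pool of session cookies (placeholder values only).
COOKIE_POOL = [
    {"sessionid": "aaa"},
    {"sessionid": "bbb"},
]


def canonicalize(url):
    """Normalize a URL so trivially different forms map to one key:
    lowercase scheme and host, sort query parameters, drop the fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", query, ""))


def fingerprint(method, url):
    """SHA-1 hex digest of the HTTP method plus the canonical URL."""
    data = f"{method.upper()} {canonicalize(url)}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


class Scheduler:
    """Toy scheduler: skips requests it has already seen and attaches
    the next User-Agent and Cookie from the pools to each new request."""

    def __init__(self):
        self.seen = set()
        self._ua = itertools.cycle(USER_AGENTS)
        self._cookies = itertools.cycle(COOKIE_POOL)

    def schedule(self, url, method="GET"):
        fp = fingerprint(method, url)
        if fp in self.seen:
            return None  # duplicate request: drop it without using a UA/cookie
        self.seen.add(fp)
        cookie = next(self._cookies)
        return {
            "method": method.upper(),
            "url": canonicalize(url),
            "headers": {
                "User-Agent": next(self._ua),
                "Cookie": "; ".join(f"{k}={v}" for k, v in cookie.items()),
            },
        }
```

For example, scheduling `https://Example.com/hotels?city=bj&page=2#top` and then `https://example.com/hotels?page=2&city=bj` returns `None` on the second call, because both canonicalize to the same URL. Cycling through a fixed pool is the simplest rotation policy; random choice or per-site pools would be equally reasonable designs.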
Python-Based Tourism System: Research on a Python-Based Data Crawler for Tourism Websites