
Introduction to Python Crawlers (34): The Scrapy Framework and the Role of Each Architecture Module

Editor: Python

Scrapy is a fast, high-level screen scraping and web crawling framework for Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing.

1. A first look at Scrapy

As we know, writing a crawler by hand is hard work: it involves sending requests, parsing data, dealing with anti-crawler mechanisms, handling asynchronous requests, and so on. Doing all of this manually every time is tedious. The Scrapy framework already encapsulates these basics, so we can use them directly, which is very convenient.
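To make the "by hand" effort concrete, here is a minimal sketch of two of the steps listed above (sending a request and parsing data) using only the Python standard library. The names `LinkParser`, `fetch`, and `extract_links`, and the `demo-crawler/0.1` User-Agent string, are illustrative assumptions and are not part of Scrapy. Scheduling, retries, and asynchronous requests are left out entirely; a framework such as Scrapy provides those.

```python
import urllib.request
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag (the data-parsing step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url, timeout=10):
    """Send one blocking request and return the decoded body (the request step)."""
    # Hypothetical User-Agent; some sites reject requests that do not send one.
    request = urllib.request.Request(url, headers={"User-Agent": "demo-crawler/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    # Parse an in-memory page so the example runs without network access.
    sample = '<html><body><a href="/page/1">one</a><a href="/page/2">two</a></body></html>'
    print(extract_links(sample))
```

Even this small sketch handles only one page at a time; deciding which link to visit next, avoiding duplicates, and storing results would all need more hand-written code.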

2. Scrapy architecture

The following two figures give an overview of the architecture.

As the figures show, Scrapy is made up of a number of components. Let's look at the role of each one.

3. Scrapy components

  1. Scrapy Engine: The core of the Scrapy framework. It handles communication and data transfer among the Spider, Item Pipeline, Downloader, and Scheduler.

  2. Spider (crawler): Sends the links to be crawled to the engine. The engine later passes back the data fetched by the other modules, and the spider parses out the data it wants. This part is written by the developer, because which links to crawl and which data on a page is needed are decisions for the programmer.

  3. Scheduler: Receives requests from the engine, queues them in a certain order, and decides the order in which they are dispatched.

  4. Downloader: Receives download requests from the engine, fetches the corresponding data from the network, and returns it to the engine.

  5. Item Pipeline: Saves the data passed on by the Spider. Where exactly the data is stored depends on the developer's own needs.

  6. Downloader Middlewares: Middleware that can extend the communication between the downloader and the engine.

  7. Spider Middlewares: Middleware that can extend the communication between the engine and the spider.
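The division of labour described above can be sketched as a toy model in plain Python. This is not Scrapy's actual API: the classes below, `FAKE_SITE`, and `crawl` are hypothetical names chosen for illustration, the "download" is a dictionary lookup instead of a network call, and both kinds of middleware are omitted. The sketch only shows how an engine might move requests and items between the other components.

```python
from collections import deque


class Scheduler:
    """Queues requests from the engine and returns them in FIFO order."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def enqueue(self, url):
        if url not in self._seen:  # drop duplicate requests
            self._seen.add(url)
            self._queue.append(url)

    def next_request(self):
        return self._queue.popleft() if self._queue else None


class Downloader:
    """Stands in for real downloading by looking the URL up in a dict."""

    def __init__(self, site):
        self._site = site

    def fetch(self, url):
        return self._site.get(url, {"title": None, "links": []})


class Spider:
    """The developer-written part: where to start and what to extract."""

    start_urls = ["/index"]

    def parse(self, url, response):
        if response["title"] is not None:
            yield {"url": url, "title": response["title"]}  # an item
        for link in response["links"]:
            yield link  # a follow-up request


class ItemPipeline:
    """Saves items; here they are simply kept in a list."""

    def __init__(self):
        self.saved = []

    def process_item(self, item):
        self.saved.append(item)


class Engine:
    """Passes requests and data between the other components."""

    def __init__(self, spider, scheduler, downloader, pipeline):
        self.spider = spider
        self.scheduler = scheduler
        self.downloader = downloader
        self.pipeline = pipeline

    def run(self):
        for url in self.spider.start_urls:
            self.scheduler.enqueue(url)
        while True:
            url = self.scheduler.next_request()
            if url is None:
                break
            response = self.downloader.fetch(url)
            for result in self.spider.parse(url, response):
                if isinstance(result, dict):
                    self.pipeline.process_item(result)  # item -> pipeline
                else:
                    self.scheduler.enqueue(result)      # request -> scheduler


# Hypothetical three-page site used instead of real network access.
FAKE_SITE = {
    "/index": {"title": "Home", "links": ["/a", "/b"]},
    "/a": {"title": "Page A", "links": ["/b"]},
    "/b": {"title": "Page B", "links": ["/index"]},
}


def crawl(site):
    pipeline = ItemPipeline()
    Engine(Spider(), Scheduler(), Downloader(site), pipeline).run()
    return pipeline.saved


if __name__ == "__main__":
    for item in crawl(FAKE_SITE):
        print(item)
```

In this model only `Spider` and `ItemPipeline` contain site-specific logic, which mirrors the point made in the list: the developer writes the crawling and storage rules, while the framework supplies the engine, scheduler, and downloader.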

That is the end of this article. Related content is updated daily.

For more, follow the WeChat (VX) official account "Operation and maintenance home" to get the latest articles.
