
Python crawler programming ideas (153): crawling data from multiple URLs with Scrapy


In the previous cases, the crawler fetched the page for only one URL, but in practice you usually need to crawl multiple URLs. To do that, add the URLs to the spider's start_urls variable; when the crawler runs, it fetches every URL listed in start_urls. The following code adds 2 URLs to start_urls; after running the MultiUrlSpider crawler, both of these pages are fetched.

import scrapy

class MultiUrlSpider(scrapy.Spider):
    name = 'MultiUrlSpider'
    start_urls = [
        'https://www.jd.com',
        'https://www.taobao.com'
    ]
    ...
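The same multi-URL setup can also be expressed without start_urls. As a variant that is not part of the original article, the sketch below overrides start_requests() and yields one Request per URL; the logging in parse() is only there to show that each URL produces its own response.

import scrapy

class MultiUrlSpider(scrapy.Spider):
    name = 'MultiUrlSpider'

    def start_requests(self):
        # Yield one Request per URL; equivalent to listing them in start_urls.
        for url in ['https://www.jd.com', 'https://www.taobao.com']:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Called once for each page that is fetched.
        self.logger.info('Fetched %s (%d bytes)', response.url, len(response.body))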

The following example uses a text file (urls.txt) to supply multiple URLs. The spider class reads the contents of urls.txt and stores the URLs in the start_urls variable. When run, it fetches the page for every URL in urls.txt and outputs the number of blog posts on each page. (The URLs supplied in this example point to geekori.com blog list pages; if you use other URLs, you need to modify the page-parsing logic accordingly.)

import scrapy

class MultiUrlSpider(scrapy.Spider):
    name = 'MultiUrlSpider'
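    # The article's listing is cut off above; the rest of this class is a
    # minimal sketch of what the text describes. It assumes urls.txt holds
    # one URL per line, and '.blog_title' is a placeholder CSS selector for
    # the post-title elements on a geekori.com blog list page.

    # Read urls.txt when the class is defined and store the URLs in the
    # start_urls variable, as described above.
    with open('urls.txt', encoding='utf-8') as f:
        start_urls = [line.strip() for line in f if line.strip()]

    def parse(self, response):
        # Output the number of blog posts found on the fetched list page.
        posts = response.css('.blog_title')
        print(f'{response.url}: {len(posts)} blog posts')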
