
Scraping Data with Python: Beginner Level

Editor: Python

Putting a web crawler to use

I. Prepare the environment

1. Install the PyCharm IDE
2. Install the required dependency, Scrapy

II. Create the project with scrapy startproject

[Figure: the project structure after creation]

III. Create the spider in the project's spiders directory
1. First change into the directory

2. Create the spider
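The two creation steps above can be sketched as follows. This is only an illustration: the project name book is an assumption (the article never states it), the helper setup_commands is hypothetical, and the spider name and domain are taken from the spider code later in the article. The sketch only builds and prints the commands; typing the printed scrapy commands in a terminal has the same effect as passing the lists to subprocess.run.

```python
import sys

# Invoke Scrapy through the current interpreter ("python -m scrapy ...").
SCRAPY = [sys.executable, "-m", "scrapy"]


def setup_commands(project="book", spider="bookTest", domain="book.douban.com"):
    """Return (command, working_directory) pairs for the two creation steps.

    `project` is a hypothetical name; `spider` and `domain` match the
    spider shown later in the article.
    """
    return [
        # Step II: create the project skeleton in the current directory.
        (SCRAPY + ["startproject", project], "."),
        # Step III: generate the spider from inside the new project.
        (SCRAPY + ["genspider", spider, domain], project),
    ]


for command, cwd in setup_commands():
    # Print the equivalent terminal command without the interpreter prefix.
    print("in", cwd, "run: scrapy", " ".join(command[3:]))
```

To actually execute a step from Python, each pair can be passed on as subprocess.run(command, cwd=cwd).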

[Figure: the result after the spider is created]

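IV. Configuration
1. Edit the settings file
1) Change ROBOTSTXT_OBEY = True to ROBOTSTXT_OBEY = False
2) Uncomment the pipeline configuration (ITEM_PIPELINES)

3) Modify the default request headers (DEFAULT_REQUEST_HEADERS)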
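A minimal sketch of what the three settings changes might look like in settings.py. The pipeline path book.pipelines.BookPipeline assumes a project named book, which the article does not confirm, and the article does not say which headers were changed, so the User-Agent entry here is an assumed, placeholder value.

```python
# settings.py (fragment)

# 1) Do not consult robots.txt before crawling.
ROBOTSTXT_OBEY = False

# 2) Enable the item pipeline. The dotted path is hypothetical and must
#    match your own project and pipeline class names.
ITEM_PIPELINES = {
    "book.pipelines.BookPipeline": 300,
}

# 3) Default headers sent with every request. The User-Agent value is a
#    placeholder; substitute the string of a real browser.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (placeholder browser string)",
}
```

The number 300 is the pipeline's order value; pipelines with lower values run first.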

2. Add the fields to scrape in items.py (the spider below fills bookname, author, and jj)

3. Write the spider code in bookTest.py

import scrapy

from ..items import BookItem


class BooktestSpider(scrapy.Spider):
    name = 'bookTest'
    allowed_domains = ['book.douban.com']
    # Crawl the first 10 pages; the `start` offset advances by 20 per page.
    start_urls = [
        'https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4?start=' + str(page * 20) + '&type=T'
        for page in range(10)
    ]

    def parse(self, response):
        # Each <li> under the subject list is one book entry.
        lies = response.xpath('//ul[@class="subject-list"]/li')
        for li in lies:
            # extract_first() returns None when nothing matches.
            bookname = li.xpath(".//div[@class='info']//a/@title").extract_first()
            author = li.xpath(".//div[@class='pub']/text()").extract_first()
            # jj: first paragraph text of the entry (presumably the book's summary)
            jj = li.xpath(".//p/text()").extract_first()
            item = BookItem()
            item['bookname'] = bookname
            item['author'] = author
            item['jj'] = jj
            yield item
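The start URLs in the spider contain a percent-encoded tag and a start offset. As a standalone illustration using only the standard library, the sketch below rebuilds the same ten URLs; the helper name build_start_urls and its parameters are hypothetical and not part of the article's code.

```python
from urllib.parse import quote

TAG = "小说"  # the tag that appears percent-encoded in the spider's URLs


def build_start_urls(tag=TAG, pages=10, page_size=20):
    """Build one listing URL per page, stepping `start` by `page_size`."""
    base = "https://book.douban.com/tag/" + quote(tag)
    return [f"{base}?start={page * page_size}&type=T" for page in range(pages)]


urls = build_start_urls()
print(urls[0])   # first page, start=0
print(urls[-1])  # tenth page, start=180
```

Changing the tag argument would target a different listing in the same way, assuming the site uses the same URL scheme for other tags.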

4. Write the pipeline code to save the data
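The article's own pipeline code is not reproduced, so the following is only one possible sketch, not the author's implementation. It writes each item as one JSON object per line. The class name BookPipeline and the output file books.jsonl are assumptions; the method names open_spider, close_spider, and process_item are the hooks Scrapy calls on a pipeline.

```python
import json


class BookPipeline:
    """Save every scraped item as one JSON object per line."""

    output_path = "books.jsonl"  # hypothetical output file name

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.output_path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.file.close()

    def process_item(self, item, spider):
        # Copy the item into a plain dict, trimming whitespace on strings
        # and leaving missing values (None) untouched.
        record = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in dict(item).items()
        }
        # ensure_ascii=False keeps Chinese text readable in the file.
        self.file.write(json.dumps(record, ensure_ascii=False) + "\n")
        # Return the item so any later pipeline still receives it.
        return item
```

For Scrapy to use the class, it has to be listed in ITEM_PIPELINES in the settings file.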

V. Finally, run the spider
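Running the spider amounts to typing scrapy crawl bookTest in a terminal inside the project directory. For readers who prefer to start it from PyCharm, a small launcher script is one option; the sketch below is a hedged example in which the function names crawl_command and run_crawl are hypothetical and the optional output file is an assumption.

```python
import subprocess
import sys


def crawl_command(spider_name="bookTest", output_file=None):
    """Build the command line for 'scrapy crawl <spider_name>'."""
    command = [sys.executable, "-m", "scrapy", "crawl", spider_name]
    if output_file:
        # -o asks Scrapy to also export the scraped items to this file.
        command += ["-o", output_file]
    return command


def run_crawl(project_dir=".", **kwargs):
    """Run the crawl from the project directory and return the exit code."""
    completed = subprocess.run(crawl_command(**kwargs), cwd=project_dir, check=False)
    return completed.returncode
```

Calling run_crawl() from a script placed next to scrapy.cfg starts the crawl; an exit code of 0 indicates that Scrapy finished without a command-level error.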
