
Scraping JD Product Images with Python 3

Editor: Python

 

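Preface

This post uses Python 3 to scrape product images from JD (京東) search results and save the image files to a local folder.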


I. Matching the HTML with a regular expression

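Search URL, where key is the URL-encoded product name and i is the page index:
url="https://search.jd.com/Search?keyword="+key+"&wq="+key+"&page="+str(i*2-1)
Regular expression that captures each image link from the returned HTML:
'data-lazy-img="(.*?)"'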
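
To see what the pattern captures, here is a minimal sketch that runs it on a hand-written HTML fragment. The fragment and its image paths are made up for illustration; the markup of the real JD results page may differ and can change over time.

```python
import re

# Hypothetical fragment imitating lazy-loaded product images; not real JD markup.
sample_html = (
    '<img width="220" data-lazy-img="//img.example.com/n7/jfs/a.jpg" />'
    '<img width="220" data-lazy-img="//img.example.com/n7/jfs/b.jpg" />'
)

# Same pattern as above: a non-greedy capture of the attribute value.
pat = 'data-lazy-img="(.*?)"'
image_paths = re.findall(pat, sample_html)
print(image_paths)
```

The non-greedy `.*?` stops at the first closing quote, so each attribute value is captured separately instead of one match spanning several tags.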

II. Code

1. Import the libraries

import urllib.request
import re
import requests 

2. Add the request header

headers = ("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0")
opener =urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)
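
Installing a global opener affects every later urllib.request.urlopen() call. As an alternative sketch, the same User-Agent can be attached to a single request through urllib.request.Request. The helper name make_request and the sample keyword are hypothetical, and nothing is sent over the network here.

```python
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) "
              "Gecko/20100101 Firefox/80.0")

def make_request(url):
    """Build a Request carrying the User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

req = make_request("https://search.jd.com/Search?keyword=test")
# Request stores header names capitalized, i.e. as "User-agent".
print(req.get_header("User-agent"))
```

The resulting object can be passed to urllib.request.urlopen() in place of a plain URL string.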

3. Set the product

keyname = "洋河"  # product name to search for
key = urllib.request.quote(keyname)
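
The search URL needs the product name percent-encoded as UTF-8. The sketch below prints the encoded form and assembles the URL pattern shown earlier. build_search_url is a hypothetical helper name, and quote is imported from urllib.parse, which is where the function is documented.

```python
from urllib.parse import quote

def build_search_url(keyname, page):
    """Return the search URL for a 1-based result page (hypothetical helper)."""
    key = quote(keyname)  # percent-encode the UTF-8 bytes of the keyword
    # The article maps page i to the query value i*2-1: 1 -> 1, 2 -> 3, 3 -> 5.
    return ("https://search.jd.com/Search?keyword=" + key
            + "&wq=" + key + "&page=" + str(page * 2 - 1))

print(quote("洋河"))
print(build_search_url("洋河", 2))
```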

4. Get the image links and save the images locally

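for i in range(1, 2):  # pages to crawl
    url = "https://search.jd.com/Search?keyword=" + key + "&wq=" + key + "&page=" + str(i*2-1)
    data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    # print(data)  # uncomment to inspect the raw HTML
    pat = 'data-lazy-img="(.*?)"'
    imagelist = re.compile(pat).findall(data)
    # enumerate from 1 so that the first match (index 0) is not skipped
    for j, src in enumerate(imagelist, start=1):
        b1 = src.replace('/n7', '/n0')  # swap the /n7 path segment for /n0
        newurl = "http:" + b1
        print(newurl)
        r = requests.get(newurl, stream=True)
        # the file name reads "page i, image j"
        with open('C:/Users/lishu/Desktop/tensorflow/pc/yh/' + "第" + str(i) + "頁第" + str(j) + "張" + ".jpg", 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
        # report success only after the file has been written
        print("Page " + str(i) + ", image " + str(j) + " saved")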
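
The link handling inside the loop can be isolated into a small function, which makes it easy to check without touching the network. This is only a sketch: to_full_url is a hypothetical name and the sample path is invented.

```python
def to_full_url(lazy_src):
    """Turn a captured data-lazy-img value into a full image URL."""
    # Swap the /n7 path segment for /n0, as the loop above does.
    path = lazy_src.replace('/n7', '/n0')
    # The loop prefixes "http:", which implies the captured value
    # is protocol-relative, i.e. starts with "//".
    return "http:" + path

print(to_full_url("//img.example.com/n7/jfs/a.jpg"))
```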

5. Full code

import urllib.request
import re
import requests

keyname = "洋河"  # product name to search for
key = urllib.request.quote(keyname)

# send a browser-like User-Agent with every urllib request
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)

for i in range(1, 2):  # pages to crawl
    url = "https://search.jd.com/Search?keyword=" + key + "&wq=" + key + "&page=" + str(i*2-1)
    data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    pat = 'data-lazy-img="(.*?)"'
    imagelist = re.compile(pat).findall(data)
    # enumerate from 1 so that the first match (index 0) is not skipped
    for j, src in enumerate(imagelist, start=1):
        b1 = src.replace('/n7', '/n0')  # swap the /n7 path segment for /n0
        newurl = "http:" + b1
        print(newurl)
        r = requests.get(newurl, stream=True)
        # the file name reads "page i, image j"
        with open('C:/Users/lishu/Desktop/tensorflow/pc/yh/' + "第" + str(i) + "頁第" + str(j) + "張" + ".jpg", 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
        # report success only after the file has been written
        print("Page " + str(i) + ", image " + str(j) + " saved")

 


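Summary

This post mainly addresses the case where urllib.request.urlretrieve() could not save a file to a path containing Chinese characters: the images are downloaded with requests.get() instead and written to a local file.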
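
The saving step is ordinary binary file writing, so it can be tried out in isolation. The following is a minimal sketch, assuming a file system that accepts Unicode names: save_chunks is a hypothetical helper, a temporary directory replaces the real download folder, and the byte chunks stand in for what r.iter_content() would yield.

```python
import tempfile
from pathlib import Path

def save_chunks(chunks, target):
    """Write an iterable of byte chunks to target, creating parent folders."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
    return target

# Stand-in for r.iter_content(chunk_size=...); no network access is needed.
fake_chunks = [b"\xff\xd8", b"image-bytes", b"\xff\xd9"]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_chunks(fake_chunks, Path(tmp) / "圖片" / "第1頁第1張.jpg")
    size = saved.stat().st_size
    print(saved.name, size)
```

In the crawler, the first argument would be r.iter_content(chunk_size=8192) and the second the destination path.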

