
Fetching URL resources with the Python requests package (1)


requests is the most widely used third-party Python package for fetching URL resources. It makes GET and POST requests, API testing, and similar tasks straightforward.

Installing requests

Installation needs little explanation: install the package directly with pip.

pip install requests

Using requests

Before use, import the package with import requests, then call the get() method to send a GET request. The code is as follows:

import requests
# Fetch the tag list from the Douban movie homepage
url = 'https://movie.douban.com/j/search_tags?type=movie&source=index'
r = requests.get(url)
r.encoding = 'utf-8'
data = r.json()
print(data)

Running this code raises the error shown below. Douban expects the request headers to include a browser User-Agent, which indicates that the request comes from a browser. Without it, the response body is not valid JSON, so r.json() fails.

 ...
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
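
Before calling r.json(), it can help to look at what the server actually sent back. The following is a minimal sketch of one way to do that, not part of the original code. The helper name parse_json_or_explain is made up for illustration, and it only assumes an object that offers the ok, status_code, and text attributes and the json() method of a requests response.

```python
def parse_json_or_explain(response):
    """Return (data, None) on success, or (None, reason) if the body is unusable.

    `response` is expected to behave like the object returned by requests.get().
    """
    if not response.ok:
        # 4xx/5xx replies usually carry an error page or nothing, not JSON.
        return None, f"HTTP status {response.status_code}"
    try:
        return response.json(), None
    except ValueError:
        # The JSON decode errors raised by r.json() are ValueError subclasses.
        return None, f"body is not JSON: {response.text[:60]!r}"


# Possible usage with the request from this article:
#   data, problem = parse_json_or_explain(requests.get(url))
#   print(data if problem is None else problem)
```

Returning a reason instead of raising keeps the sketch short; in a larger script, calling r.raise_for_status() and letting the exception propagate may be the better choice.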

How do you find a browser's User-Agent string?

Open the browser and press F12 (or open the developer tools from the browser menu). Select the Network tab, click any request, look at its Headers, and copy the value of User-Agent.
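
To see what requests sends when no User-Agent is supplied, the request can be built without being sent. This is a small sketch rather than part of the original article. The shortened 'Mozilla/5.0 (placeholder)' value is a stand-in for the string copied from the browser.

```python
import requests

url = 'https://movie.douban.com/j/search_tags?type=movie&source=index'
session = requests.Session()

# prepare_request() builds the request but does not send it,
# so nothing goes over the network here.
plain = session.prepare_request(requests.Request('GET', url))
custom = session.prepare_request(
    requests.Request('GET', url, headers={'User-Agent': 'Mozilla/5.0 (placeholder)'})
)

print(plain.headers['User-Agent'])   # a "python-requests/<version>" string
print(custom.headers['User-Agent'])  # the value supplied above
```

The default value identifies the client as a script, which is presumably why a site that checks for a browser turns the request away.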

The revised code is as follows:

import requests
# Add browser information to the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.71'
}
# Fetch the tag list from the Douban movie homepage
url = 'https://movie.douban.com/j/search_tags?type=movie&source=index'
r = requests.get(url, headers=headers)
r.encoding = 'utf-8'
data = r.json()
print(data)

After running it, the output is as follows:

{'tags': ['hot', 'newest', 'Douban high score', 'Popular film', 'Chinese', 'Europe and the United States', 'South Korea', 'Japan']}
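
Since r.json() returns an ordinary Python dict, the tags can be read like any other dictionary value. The sketch below uses a sample copied from the output shown above (in this article's translated form, with stray spaces trimmed) instead of a live response.

```python
# Sample data taken from the translated output in this article, not fetched live.
data = {'tags': ['hot', 'newest', 'Douban high score', 'Popular film',
                 'Chinese', 'Europe and the United States', 'South Korea', 'Japan']}

# .get() with a default avoids a KeyError if the key is ever missing.
tags = data.get('tags', [])
for index, tag in enumerate(tags, start=1):
    print(index, tag)
```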

That covers the basic use of the requests package. The Headers information for the URL above is visible in the browser's developer tools. When fetching a URL resource, it is usual to analyze the corresponding request headers first and then write the code.

  1. General (basic information). Request Method: GET means this is a GET request, so the code calls requests.get().

  2. Response Headers (response header information). Content-Type: application/json; charset=utf-8 means the returned content is JSON encoded as UTF-8, so the code sets r.encoding = 'utf-8' and calls data = r.json() to parse it. When the header declares a charset, requests already takes r.encoding from it, so the explicit assignment mainly makes the choice visible.

  3. Request Headers (request header information). The main fields here are Cookie, User-Agent, and token. A Cookie usually stores authentication information such as a user identifier, and requests carrying the same cookie are generally treated as coming from the same user. Some sites pass authentication information in a token instead.
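
The three groups above can be mapped onto requests calls roughly as follows. This is an illustrative sketch under stated assumptions: the cookie name sessionid, the token value, and passing the token in an Authorization: Bearer header are all hypothetical, since the article does not say how a given site expects them. The request is only prepared, not sent.

```python
import requests

url = 'https://movie.douban.com/j/search_tags?type=movie&source=index'

session = requests.Session()
# Request Headers: values set on the session go out with every request.
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (placeholder browser string)',
    'Authorization': 'Bearer PLACEHOLDER_TOKEN',  # hypothetical token header
})
session.cookies.set('sessionid', 'PLACEHOLDER')   # hypothetical cookie name and value

# General: the method chosen here corresponds to "Request Method: GET".
# prepare_request() builds the request without sending it.
prepared = session.prepare_request(requests.Request('GET', url))
print(prepared.method)
for name in ('User-Agent', 'Authorization', 'Cookie'):
    print(name, '=', prepared.headers.get(name))

# Response Headers: extract the charset from a Content-Type value.
content_type = {'content-type': 'application/json; charset=utf-8'}
charset = requests.utils.get_encoding_from_headers(content_type)
print(charset)
```

Using a Session keeps the headers and cookies in one place, which is convenient when several requests to the same site follow.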

Summary

This section covered the basics of sending a GET request with the requests package, along with how to find the browser's header information and what the basic header fields mean. I hope it helps.

