
Python Crawler Learning Notes: the Crawling Process and the requests Module


Python Crawler Learning

Day 1

HTTP/HTTPS Protocols

  1. HTTP protocol

HTTP is the protocol that governs data exchange between a client and a server.

Common request header information:

  • User-Agent: identifies the client making the request. For example, when Chrome sends a request, the User-Agent string includes the browser, the current operating system, and other details.
  • Connection: whether to close the connection or keep it alive after the request completes; typical values are keep-alive and close.
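To see what these request headers look like in practice, the sketch below builds (but does not send) a request with the requests module and inspects the headers it would transmit. The User-Agent string "MyCrawler/1.0" is an illustrative value, not a real browser's.

```python
import requests

# Build a GET request carrying custom headers, then "prepare" it
# to see exactly what would be sent over the wire.
req = requests.Request(
    "GET",
    "https://www.sogou.com/",
    headers={"User-Agent": "MyCrawler/1.0", "Connection": "close"},
)
prepared = req.prepare()

print(prepared.headers["User-Agent"])  # MyCrawler/1.0
print(prepared.headers["Connection"])  # close
```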

Common response header information:

  • Content-Type: the type of the data the server sends back to the client.
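To observe a Content-Type response header without depending on a live website, the sketch below starts a throwaway local HTTP server (an assumption made only for this illustration) and reads the header from the requests response object:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve a tiny HTML page and declare its type in the response headers.
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


# Port 0 asks the OS for any free port; server_port tells us which one.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

response = requests.get(f"http://127.0.0.1:{server.server_port}/")
print(response.headers["Content-Type"])  # text/html; charset=utf-8
server.shutdown()
```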
  2. HTTPS protocol

HTTPS is secure HTTP (Hypertext Transfer Protocol); the security comes from data encryption.

Encryption schemes:

  • Symmetric key encryption
  • Asymmetric key encryption

With asymmetric key encryption alone, there is no guarantee that the public key the client receives actually came from the server (an attacker could substitute its own).
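The defining property of symmetric key encryption is that the same shared key both encrypts and decrypts. The toy sketch below uses XOR to illustrate that property only; XOR like this is not a secure cipher.

```python
# Toy symmetric cipher: XOR each byte of the data with a repeating key.
# Applying the SAME key twice returns the original data, which is the
# essence of symmetric key encryption (and its key-distribution problem:
# both sides must somehow share this key safely).
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = b"secret"
ciphertext = xor_cipher(b"hello server", key)
plaintext = xor_cipher(ciphertext, key)  # same key reverses the operation
print(plaintext)  # b'hello server'
```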

  • Certificate key encryption (HTTPS)

The server first submits its public key to a certificate authority (CA). After auditing, the CA digitally signs the public key, wraps it into a certificate, and the server sends that certificate to the client. Once the client verifies the certificate, it can trust that the public key really was provided by the server.
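The certificate flow above can be sketched as follows. This is only a conceptual toy: real HTTPS certificates use asymmetric signatures (e.g. RSA or ECDSA) and the client holds only the CA's public key, whereas this sketch stands in HMAC for the signature and assumes the client shares the CA secret purely for illustration.

```python
import hashlib
import hmac

ca_secret = b"ca-private-secret"  # assumption: stand-in for the CA's signing key
server_public_key = b"server-public-key-bytes"  # illustrative key material

# CA side: "sign" the server's public key and bundle both into a certificate.
signature = hmac.new(ca_secret, server_public_key, hashlib.sha256).hexdigest()
certificate = {"public_key": server_public_key, "signature": signature}

# Client side: recompute the signature over the certificate's public key
# and compare before trusting that the key came from the server.
expected = hmac.new(ca_secret, certificate["public_key"], hashlib.sha256).hexdigest()
trusted = hmac.compare_digest(expected, certificate["signature"])
print(trusted)  # True
```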

The requests module

Python offers two modules for network requests: urllib (older and more cumbersome) and requests (concise and efficient).

Features of requests:

  • Very powerful
  • Simple and convenient
  • Very efficient

Purpose of requests:

  • Simulates a browser sending a request.

Coding workflow with the requests module:

It strictly follows the steps a browser takes when sending a request.

  • Specify the URL (uniform resource locator)
  • Initiate the request with the requests module
  • Get the response data
  • Persist the data to storage

Environment installation: pip install requests

Example code:

  • Crawl the Sogou search home page
import requests

if __name__ == "__main__":
    # step 1: specify the URL
    url = "https://www.sogou.com/"
    # step 2: initiate the request; get() returns a Response object
    response = requests.get(url=url)
    # step 3: get the response data; .text returns it as a string
    page_text = response.text
    print(page_text)
    # step 4: persistent storage
    with open('./sougou.html', 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print('Finished crawling the data!')
