
Python Crawlers (Part 1): Getting to Know the requests Library


Python Crawlers (Part 1)

  • Python crawlers
    • Understanding Python crawlers
    • Front-end crawling with the requests library
      • 1. Obtaining the response object
      • 2. HTTP status codes
      • 3. Encoding
      • 4. Binary output and decoded text output
      • 5. Data storage

Python crawlers

Understanding Python crawlers

Web crawler: a program that fetches the data behind a web page's front end, extracts the parts you need, and saves them.

Front-end crawling with the requests library

1. Obtaining the response object

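import requests  # import the requests library
url = 'https://www.example.com'  # placeholder URL; replace it with the page you want to fetch
res = requests.get(url)  # fetch the front-end data at url; the URL must be complete, including http:// or https://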
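
The URL passed to requests.get must include http:// or https://; when the scheme is missing, requests raises requests.exceptions.MissingSchema. A minimal sketch of checking for this before making a request, using only the standard library. The helper name has_http_scheme and the example URLs are hypothetical, chosen for illustration:

```python
from urllib.parse import urlsplit


def has_http_scheme(url: str) -> bool:
    """Return True if url starts with http:// or https:// and names a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical example URLs, for illustration only.
for candidate in ("https://www.example.com", "www.example.com"):
    print(candidate, has_http_scheme(candidate))
```

A check like this lets a crawler reject or repair a bad URL up front instead of handling the exception after the call.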

2. HTTP status codes

res.status_code  # 200 means success

A status code is a three-digit number that a web server uses to report the status of its response under the Hypertext Transfer Protocol (HTTP).

Status code series and their meanings:

  • 1XX: Informational. The request has been received and processing continues; the code tells the client what to do next. HTTP/1.0 defines no 1xx status codes, so a server must not send a 1xx response to an HTTP/1.0 client except under experimental conditions.
  • 2XX: Success. The request was received, understood, and accepted by the server. The most common codes are 200 and 201.
  • 3XX: Redirection. The client must take further action to complete the request. The address to request next is given in the Location header of the response. The most common codes are 301 and 302.
  • 4XX: Client error. The request appears to contain an error that prevents the server from processing it. Common codes are 401 and 404.
  • 5XX: Server error. The server hit an error or abnormal state while processing the request, or recognized that it cannot complete the request with its current hardware and software resources. Common codes are 500 and 503.
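
The series a status code belongs to is determined by its first digit, so a crawler can branch on it without listing every code. A minimal sketch using the standard library's http.HTTPStatus for the reason phrase; the names SERIES and describe_status are hypothetical, chosen for illustration:

```python
from http import HTTPStatus

# Meaning of each series, keyed by the first digit of the status code.
SERIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return the series name plus the standard reason phrase, if known."""
    series = SERIES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # not a registered status code
        phrase = "unregistered"
    return f"{code}: {series} ({phrase})"


for code in (200, 301, 404, 503):
    print(describe_status(code))
```

With requests, the value to pass in would be res.status_code.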

3. Encoding

res.encoding  # encoding taken from the HTTP headers; used to decode res.text
res.apparent_encoding  # encoding detected from the response body; usually accurate

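Common encodings for Chinese text are UTF-8, UTF-16, GBK, GB2312, and GB18030 (the names are case-insensitive).
The most commonly used are UTF-8 and GB2312.
Python 3 source files and most Linux systems default to UTF-8; Chinese-language Windows defaults to GBK (code page 936), a superset of GB2312.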
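
Why the encoding matters: the same bytes decode correctly only with the encoding they were written in. A minimal sketch, with no network access, that tries a declared encoding and falls back to another one when decoding fails; the helper name decode_body and the sample text are hypothetical. With requests, a comparable step is assigning res.encoding = res.apparent_encoding before reading res.text.

```python
def decode_body(raw: bytes, declared: str, fallback: str) -> str:
    """Decode raw with the declared encoding, or with fallback if that fails."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Bytes as a GBK-encoded page would send them (sample text, for illustration).
raw = "中文".encode("gbk")

# Decoding these bytes as UTF-8 fails, so the GBK fallback is used.
text = decode_body(raw, "utf-8", "gbk")
print(text)
```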

4. Binary output and decoded text output

res.content  # response body as raw bytes (binary stream)
res.text  # response body decoded to str using res.encoding

res.text returns a str and is generally used to save text, while res.content returns bytes and is used to save binary data such as videos and images.
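
The difference shows up when saving: bytes need a binary file mode, and str needs a text mode with an explicit encoding. A minimal sketch under those assumptions; the helper name save_body, the file names, and the stand-in values for res.content and res.text are hypothetical, and no request is made:

```python
import os
import tempfile


def save_body(path: str, body) -> None:
    """Write bytes in binary mode and str in UTF-8 text mode."""
    if isinstance(body, bytes):
        with open(path, "wb") as f:
            f.write(body)
    else:
        with open(path, "w", encoding="utf-8") as f:
            f.write(body)


# Stand-ins for res.content and res.text; no network request is made here.
fake_content = b"\x89PNG\r\n\x1a\n"
fake_text = "<html>中文</html>"

folder = tempfile.mkdtemp()
save_body(os.path.join(folder, "image.png"), fake_content)
save_body(os.path.join(folder, "page.html"), fake_text)
```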

5. Data storage

with open('myFirst.txt', 'w', encoding='utf-8') as f:  # open(file_name, mode): file_name is the file name, mode is the open mode
    f.write(res.text)  # write the document content
# The with block closes the file automatically on exit

That is the first step of a crawler: fetching the raw, unrendered page source. The next step is to extract the parts we want from it; see my next article on crawlers.

File modes for open() and their descriptions:

  • r: Open the file read-only. The file pointer is placed at the beginning of the file. This is the default mode.
  • rb: Open the file in binary format, read-only. The file pointer is placed at the beginning of the file.
  • r+: Open the file for reading and writing. The file pointer is placed at the beginning of the file.
  • rb+: Open the file in binary format for reading and writing. The file pointer is placed at the beginning of the file.
  • w: Open the file for writing only. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
  • wb: Open the file in binary format for writing only. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
  • w+: Open the file for reading and writing. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
  • wb+: Open the file in binary format for reading and writing. If the file already exists, its original content is deleted and writing starts from the beginning. If the file does not exist, a new file is created.
  • a: Open the file for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
  • ab: Open the file in binary format for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
  • a+: Open the file for reading and writing. If the file already exists, the file pointer is placed at the end of the file and the file is opened in append mode. If the file does not exist, a new file is created for reading and writing.
  • ab+: Open the file in binary format for reading and appending. If the file already exists, the file pointer is placed at the end of the file. If the file does not exist, a new file is created for reading and writing.
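
The practical difference between w and a is easiest to see by writing to the same file twice. A minimal sketch using a temporary file; the file name and the sample strings are hypothetical:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w", encoding="utf-8") as f:  # creates the file
    f.write("first\n")

with open(path, "a", encoding="utf-8") as f:  # appends after the existing content
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

with open(path, "w", encoding="utf-8") as f:  # deletes the old content, then writes
    f.write("third\n")

with open(path, "r", encoding="utf-8") as f:
    after_rewrite = f.read()

print(repr(after_append))
print(repr(after_rewrite))
```

For a crawler that saves one page per run, w is usually what you want; a suits a log or a file that accumulates results across runs.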