
My Python Course Design Defense Experience

Editor: Python

This semester our Python course design required us to write a program and then defend it. Today I did my defense online. My topic was a web crawler that uses XPath to crawl Baidu pictures. Below is a general description of how the defense went and the questions the teacher asked.
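To give a feel for what "using XPath" means here, the following is a minimal sketch that runs without any network access. It uses the standard library's xml.etree.ElementTree, which supports only a small subset of XPath, as a stand-in for lxml. The HTML snippet, its paths, and the alt texts are made up for illustration and are not taken from any real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet that imitates a picture list on a web page.
SAMPLE = """
<html><body>
  <div class="slist">
    <ul>
      <li><a href="/a.html"><img src="/uploads/a.jpg" alt="cat"/></a></li>
      <li><a href="/b.html"><img src="/uploads/b.jpg" alt="dog"/></a></li>
    </ul>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Step 1: select every <li> inside the <div class="slist"> list.
items = root.findall(".//div[@class='slist']/ul/li")

# Step 2: from each <li>, follow a relative path to the <img> and read its attributes.
images = [
    (li.find("./a/img").get("src"), li.find("./a/img").get("alt"))
    for li in items
]
print(images)
```

With lxml the same idea is usually written with attribute steps such as `./a/img/@src`, which ElementTree does not support, so the sketch reads the attributes with `.get()` instead.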

Question 1

First, the teacher asked only one question about the crawler itself: what headers is used for. I said it is a simple way to get around anti-crawling, and that without it the browser would block the crawler. The correct answer is that the server, not the browser, blocks the crawler. He then asked where the User-Agent value can be found. I probably talked a lot about the front end at this point, and I had even written a small front-end page to explain some HTML basics.
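As a small illustration of what the header does, here is a sketch that builds a request and inspects its User-Agent without sending anything. It uses the standard library's urllib.request as a stand-in for requests, and the URL and User-Agent string are placeholders chosen for this example. In most browsers the real value can be copied from the developer tools (F12), under the request headers in the Network tab.

```python
from urllib.request import Request

# Hypothetical URL and User-Agent string, used only for illustration.
URL = "https://example.com/"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Build the request object; nothing is sent over the network here.
req = Request(URL, headers=HEADERS)

# urllib stores header names capitalized, hence the key "User-agent".
print(req.get_header("User-agent"))
```

The server only sees the headers it receives, so a request that carries a browser-like User-Agent looks like a browser to a simple check. Without the header, requests identifies itself as python-requests plus its version number, which some servers reject.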

Question 2

The teacher then asked about some basics, for example how to write files. I had not paid attention to this for a long time, so I did not answer very well.

Because I use with open to write the file, the teacher asked what with is for and what it does. This really was a blind spot for me: with automatically calls the close() method when the block ends.
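The following sketch shows that behaviour, assuming a scratch file in a temporary folder (the file name demo.bin is made up). The first form uses with; the second writes out by hand roughly what with does for a file object.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# Form 1: with closes the file when the block exits, even if an exception is raised.
with open(path, "wb") as fp:
    fp.write(b"hello")
closed_after_with = fp.closed  # True: close() was already called for us

# Form 2: roughly the same thing written manually with try/finally.
fp2 = open(path, "ab")
try:
    fp2.write(b" world")
finally:
    fp2.close()

# Read the file back to check that both writes landed.
with open(path, "rb") as fp3:
    content = fp3.read()
print(closed_after_with, content)
```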

Then the teacher asked what 'wb' means: write to the file in binary mode. Other modes such as w are also worth reviewing.
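For review, here is a short sketch of the most common modes, again using a made-up scratch file. Text modes ('w', 'a', 'r') work with str, and adding 'b' switches to bytes, which is what downloaded image data needs.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# 'w': text write, creates the file or truncates an existing one.
with open(path, "w", encoding="utf-8") as f:
    f.write("first\n")

# 'a': text append, keeps the existing content.
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")

# 'r': text read, returns str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# 'rb': binary read, returns bytes.
with open(path, "rb") as f:
    raw = f.read()

# 'wb': binary write, truncates and expects bytes (for example image data).
with open(path, "wb") as f:
    f.write(b"\x89PNG")
with open(path, "rb") as f:
    overwritten = f.read()

print(repr(text), type(raw).__name__, overwritten)
```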

I also use the os module here. Several students before me were also asked about a few operations of the os module.
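A few os operations that fit this kind of program are sketched below. The folder and file names are hypothetical and live in a temporary directory, so nothing on a real drive is changed.

```python
import os
import tempfile

# Hypothetical folder layout inside a temporary directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "course_design", "debug")

# makedirs creates missing parent folders; exist_ok=True makes repeat calls harmless.
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)

# os.path.join builds a path with the separator of the current platform.
file_path = os.path.join(target, "note.txt")
with open(file_path, "w", encoding="utf-8") as f:
    f.write("ok")

# Inspect what was created.
names = os.listdir(target)
print(os.path.exists(target), os.path.isdir(target), names, os.path.basename(file_path))
```

os.mkdir, by contrast, creates only the last folder in the path and raises an error if that folder already exists, which is why it is usually paired with an os.path.exists check.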

Finally, because the defense was online, there were only five questions, three of them about the code. It went OK. The source code is attached below.

# Import modules
import os

import requests
from lxml import etree

# Use the os module to create the output folder if it does not exist
# (raw string, so the backslashes are not treated as escape sequences)
save_dir = r"G:\python curriculum design \debug"
if not os.path.exists(save_dir):
    os.mkdir(save_dir)

url = 'https://pic.netbian.com/4kdongwu/'  # URL of the page to crawl
# Get around simple anti-crawling: use headers to disguise the crawler as a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.62'
}
response = requests.get(url=url, headers=headers)  # Use requests.get to fetch the HTML page
page_text = response.text  # Text content of the HTML page
tree = etree.HTML(page_text)  # Parse the HTML text into an object that supports XPath queries
li_list = tree.xpath('//div[@class="slist"]/ul/li')  # Locate the list items that hold the pictures
for li in li_list:
    img_src = 'http://pic.netbian.com' + li.xpath('./a/img/@src')[0]  # Locate the picture again under the li tag
    img_name = li.xpath('./a/img/@alt')[0] + '.jpg'  # Picture name with a .jpg extension
    img_name = img_name.encode('iso-8859-1').decode('gbk')  # Fix garbled characters in the name
    img_data = requests.get(url=img_src, headers=headers).content  # Download the picture bytes
    img_path = os.path.join(save_dir, img_name)  # Path where the picture is saved
    with open(img_path, 'wb') as fp:  # Write the downloaded bytes into the folder
        fp.write(img_data)
    print(img_name, 'over')
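The encode('iso-8859-1').decode('gbk') step in the code above is the least obvious line, so here is a small offline sketch of why it works. It assumes the page is actually encoded as GBK while the text was decoded as ISO-8859-1, and the sample string is made up for the demonstration.

```python
# Hypothetical picture title; any text that GBK can encode would do.
original = "高清动物"

# What the server sends: the title encoded as GBK bytes.
raw_bytes = original.encode("gbk")

# What the crawler sees if those bytes are wrongly decoded as ISO-8859-1.
garbled = raw_bytes.decode("iso-8859-1")

# The repair: turn the garbled text back into the same bytes, then decode as GBK.
# ISO-8859-1 maps every byte to exactly one character, so no information is lost.
repaired = garbled.encode("iso-8859-1").decode("gbk")

print(repr(garbled))
print(repaired)
```

An alternative under the same assumption is to set response.encoding = 'gbk' before reading response.text, so that the names come out correctly in the first place.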

Over the last few days I have had linear algebra, advanced mathematics, C language, and college physics. Today I just finished my oral defense and meditation exam. I have been very, very busy, so please forgive the slow updates. I will update more over the summer vacation. Thank you for your support.