When using bs4 to parse a local HTML file, an encoding error occurred. The code is as follows:
# -*- coding: utf-8 -*-
#@Time : 2022/2/20 17:46
#@File : bs4 data parsing.py
#@software : PyCharm
# bs4 data parsing
# How data parsing works: 1. locate the tag; 2. extract the data stored in the tag or in the tag's attributes
# How bs4 works: 1. instantiate a BeautifulSoup object and load the page source into it
# 2. call the BeautifulSoup object's attributes and methods to locate tags and extract data
# Environment setup: pip install bs4, then pip install lxml
from bs4 import BeautifulSoup
# Object instantiation
#1. Load the data from a local HTML file into this object
with open('./sogou.html', 'r', encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, 'lxml')
print(soup)
#2. Load page source fetched from the Internet into the object
An error occurred: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 7819: illegal multibyte sequence
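A likely cause, assuming a Windows console whose default encoding is GBK: the file itself is read correctly as UTF-8, but print() has to encode the text for the console, and GBK has no mapping for the non-breaking space U+00A0. The minimal sketch below reproduces the failure without any HTML file. The helper name `can_encode` is hypothetical and only used for illustration.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


nbsp = '\xa0'  # non-breaking space, often produced by &nbsp; in HTML

print(can_encode(nbsp, 'gbk'))    # GBK has no mapping for U+00A0
print(can_encode(nbsp, 'utf-8'))  # UTF-8 can represent U+00A0
```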
Solution:
import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8') # Change the default encoding of standard output
with open('./sogou.html', 'r', encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, 'lxml')
# stdout now encodes as UTF-8, so print(soup) works; no extra decode('utf-8') call is needed
print(soup)
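As a possible alternative to wrapping sys.stdout.buffer, the existing stream can be switched to UTF-8 in place. This is only a sketch and assumes Python 3.7 or newer, where text streams provide reconfigure(). The function name `force_utf8_stdout` is hypothetical.

```python
import sys


def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 in place; return True if that was possible."""
    # Assumption: Python 3.7+, where io.TextIOWrapper provides reconfigure().
    reconfigure = getattr(sys.stdout, 'reconfigure', None)
    if reconfigure is None:
        return False  # stdout was replaced by an object without reconfigure()
    reconfigure(encoding='utf-8')
    return True


if force_utf8_stdout():
    # With UTF-8 output the non-breaking space can be encoded
    print('a\xa0b')
```

Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python should have a similar effect without changing the script.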