
When reading a file, can you skip characters that cannot be decoded and keep reading? (Python)


I have recently been learning to write a crawler for web novels, and one of the pages comes back with garbled characters. The page declares gb2312 as its encoding. I tried gb2312, gbk, and utf-8 in turn, and none of them decodes the page correctly. Because I crawl the text page by page, every error means a missing chapter, which is a real problem.
Is there a way to simply ignore the characters that cannot be decoded and write out the rest of the extracted content?
The download code is as follows:

import aiohttp
import bs4

# `semaphore` is assumed to be defined elsewhere in the script (not shown)

# download
async def download(url, name):
    async with semaphore:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as reques:
                reques.encoding = 'gbk'
                page = bs4.BeautifulSoup(await reques.text(), 'html.parser')
                div = page.find('div', class_="read_chapterDetail")
                p = div.find_all('p')
                # Open the file in binary write mode
                with open(f'{name}.txt', mode='wb') as f:
                    for i in p:
                        text = i.text + '\n'
                        f.write(text.encode('utf-8'))
                print(f'{name} Download complete !')
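One possible approach, sketched here under a few assumptions: Python's decoders accept an `errors` argument, so undecodable bytes can be dropped (`'ignore'`) or replaced with U+FFFD (`'replace'`) instead of raising an error. The helper names `decode_lenient` and `read_text_lenient` are made up for this example, and the choice of `gb18030` (a superset of gb2312 and gbk) is a guess at what the page really uses, not something confirmed by the question.

```python
import os
import tempfile


def decode_lenient(raw: bytes, encoding: str = "gb18030", errors: str = "ignore") -> str:
    """Decode bytes, skipping (or replacing) sequences the codec rejects.

    Hypothetical helper for illustration; gb18030 is an assumed encoding.
    """
    return raw.decode(encoding, errors=errors)


def read_text_lenient(path: str, encoding: str = "gb18030", errors: str = "ignore") -> str:
    """Read a text file without aborting on undecodable bytes."""
    with open(path, encoding=encoding, errors=errors) as f:
        return f.read()


# Simulated page bytes: valid GBK text with one stray byte (0xFF) in the middle.
broken = "第一章 开始".encode("gbk") + b"\xff" + "结束".encode("gbk")

skipped = decode_lenient(broken)                   # stray byte is dropped
marked = decode_lenient(broken, errors="replace")  # stray byte becomes U+FFFD

# The same idea applies when the bytes come from a file on disk.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(broken)
from_file = read_text_lenient(tmp.name)
os.remove(tmp.name)
```

In the download coroutine above, this would roughly mean swapping `await reques.text()` for `decode_lenient(await reques.read())`. aiohttp's `text()` also takes `encoding` and `errors` arguments, so `await reques.text(encoding='gb18030', errors='ignore')` should behave the same way. Assigning `reques.encoding` looks like a requests-style idiom and, as far as I can tell, has no effect in aiohttp. Using `'replace'` rather than `'ignore'` leaves a visible marker where text was lost, which can make it easier to find the damaged chapters later.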
