程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

How does python regular match Chinese characters

編輯:Python

Regular expressions match Chinese characters, which are very common in practical applications.
For example: crawler web page text extraction, validation of user input standards, etc.
Take the following text string as an example, match all Chinese characters in the string asstr.

import reastr = '''aaaaaWhen when, see Nanxue snow, I am with plum blossomTwo white heads'''

Two methods are described below (the environment of this article is python3)
1. Use Unicode encoding to match Chinese
Common Chinese Unicode encoding range: \u4e00-\u9fa5
Implement matching code: re.findall('[\u4e00-\u9fa5]', astr)

import reastr = '''aaaaaWhen when, see Nanxue snow, I am with plum blossomTwo white heads'''res = re.findall('[\u4e00-\u9fa5]', astr)print(res)

Matches:

Second,Directly use Chinese characters to achieve Chinese matching
If you haven’t used it, you may not know it. Chinese matching can also be done in this way
The matching code is: re.findall('[一-羥]', asstr)

import reastr = '''aaaaaWhen when, see Nanxue snow, I am with plum blossomTwo white heads'''res = re.findall('[一-羥]', astr)print(res)

Match result:

Note: Actually here"The Unicode code corresponding to "1" is "\u4e00", and the Unicode code corresponding to "徥" (yù) is "\u9fa5".

Common non-English characters Unicode encoding range:
u4e00-u9fa5 (Chinese)
u0800-u4e00 (Japanese)
uac00-ud7ff (Korean)


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved