
Highly recommended! Regular expression matching with re, a treasure trove in Python

Editor: Python

Python's re module (re is short for Regular Expression) provides a variety of regular expression matching operations.

It is a very useful tool for text analysis, complex string processing and information extraction. The following is a summary of the commonly used features of the re module.

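One. Predefined characters

\d Matches any decimal digit, 0-9 (for str patterns in Python 3 this also covers other Unicode decimal digits unless re.ASCII is set)
\D Matches any non-digit character (this includes the underscore)
\s Matches any whitespace character (space, TAB, line break, etc.)
\S Matches any non-whitespace character (this includes the underscore)
\w Matches any word character: letters, Chinese characters, digits (a-z, A-Z, 0-9) and the underscore
\W Matches any character that \w does not match (the underscore is not included, because \w matches it)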
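
The predefined classes can be checked quickly with re.findall. The following is a minimal sketch; the sample string is made up for illustration.

```python
import re

sample = "Tel_42 中文\tok"

# \d picks out the decimal digits
digits = re.findall(r"\d", sample)
# \w+ matches runs of letters, Chinese characters, digits and underscores
words = re.findall(r"\w+", sample)
# \W matches everything \w does not: here the space and the tab
non_words = re.findall(r"\W", sample)
# \s matches the whitespace characters
spaces = re.findall(r"\s", sample)

print(digits)     # ['4', '2']
print(words)      # ['Tel_42', '中文', 'ok']
print(non_words)  # [' ', '\t']
print(spaces)     # [' ', '\t']
```

Note that the underscore ends up inside the \w+ match and never in the \W results.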

Two. Special characters

$: Matches the end of the string, or the end of each line in re.M mode (placed at the end of the regular expression)
^: Matches the beginning of the string, or the beginning of each line in re.M mode (placed at the start of the regular expression)
*: The preceding character may appear 0 or more times (0 to unlimited) (greedy matching)
+: The preceding character may appear 1 or more times (1 to unlimited) (greedy matching)
?: The preceding character may appear 0 or 1 times. Placed after another quantifier (*, +, ? or {n,m}), it switches that quantifier from greedy mode to reluctant (non-greedy) mode
Note: .* is greedy, .*? is non-greedy
.: Matches any single character except the line break "\n"
|: Matches either the expression on its left or the expression on its right
[ ]: Represents a character set. There are three cases
[abc]: Matches any single character listed in the set
[a-z0-9]: Matches any single character in the given ranges. The set can be negated by putting ^ at the front, e.g. [^a-z0-9]
[2-9][1-3]: Sets can be combined in sequence, here one character from 2-9 followed by one character from 1-3
{ }: Specifies how many times the preceding character is repeated. There are the following cases:
{n,m}: The preceding character appears at least n times and at most m times
{n,}: The preceding character appears at least n times, with no upper limit
{,m}: The preceding character appears at most m times, with a minimum of 0
{n}: The preceding character must appear exactly n times
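
The difference between greedy and non-greedy matching, and the behaviour of sets, repetition counts and alternation, is easiest to see on small inputs. A short sketch with made-up sample strings:

```python
import re

text = "<b>one</b><b>two</b>"

# .* is greedy: it runs to the last </b> in the string
greedy = re.findall(r"<b>.*</b>", text)
# .*? is non-greedy: it stops at the first </b> it can
lazy = re.findall(r"<b>.*?</b>", text)

# {2,3}: two or three digits, preferring three when available
runs = re.findall(r"\d{2,3}", "1 22 333 4444")
# [^0-9] is a negated set: any character that is not a digit
letters = re.findall(r"[^0-9]+", "a1bc22d")
# | matches either alternative
either = re.findall(r"cat|dog", "cat, dog, bird")

print(greedy)   # ['<b>one</b><b>two</b>']
print(lazy)     # ['<b>one</b>', '<b>two</b>']
print(runs)     # ['22', '333', '444']
print(letters)  # ['a', 'bc', 'd']
print(either)   # ['cat', 'dog']
```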

Three. Backslash description

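If the text to be matched contains a backslash, the backslash must be escaped in the pattern (written as \\). Writing the pattern as a raw string, r'...', stops Python from interpreting the backslashes before the re module sees them.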
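
As a small illustration of backslash escaping, the sketch below splits a Windows-style path on its backslashes. The path is a made-up sample value; re.escape is a standard helper in the re module that escapes special characters for you.

```python
import re

path = r"C:\temp\new"   # raw string: the text contains two real backslashes

# In a normal string literal, one literal backslash in the pattern takes four characters
parts_plain = re.split("\\\\", path)
# In a raw string, two are enough
parts_raw = re.split(r"\\", path)
# re.escape() builds a pattern that matches the given text literally
literal = re.search(re.escape(r"\temp"), path)

print(parts_plain)      # ['C:', 'temp', 'new']
print(parts_raw)        # ['C:', 'temp', 'new']
print(literal.group())  # \temp
```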

Four. Grouping

(): Grouping characters. They divide the matched content into groups so that the data in each group can be retrieved quickly. In a regular expression each pair of parentheses "()" forms one group, and you can extract just the content matched inside the "()".
group: Returns the content matched by the specified group (group() or group(0) returns the whole match)
groups: Returns a tuple containing the content matched by all the groups
groupdict: Returns a dictionary of group names and the content they matched. The groups must be named with (?P<name>...)
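
The three accessors can be compared on a single match object. A minimal sketch; the group names year and month and the sample text are chosen only for illustration.

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2023-07 (beta)")

print(m.group())         # 2023-07  (the whole match)
print(m.group(1))        # 2023     (first group, by number)
print(m.group("month"))  # 07       (second group, by name)
print(m.groups())        # ('2023', '07')
print(m.groupdict())     # {'year': '2023', 'month': '07'}
```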

Five. Common methods

match: Matches only at the beginning of the target text and returns None if the beginning does not match
search: Scans the entire target text and returns the first match
findall: Scans the entire target text and returns a list of all substrings that match the rule. If there is no match, it returns an empty list
split
re.split(pattern, string[, maxsplit=0, flags=0])
pattern.split(string[, maxsplit=0]) (the method of a compiled pattern object)
Effect: Splits the string at every part that matches the regular expression and returns the pieces as a list
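
The sketch below runs the four methods against one made-up string to show how they differ, including the compiled-pattern form of split.

```python
import re

text = "abc123def456"

at_start = re.match(r"\d+", text)      # None: the text does not begin with a digit
first = re.search(r"\d+", text)        # first run of digits anywhere in the text
all_runs = re.findall(r"\d+", text)    # every run of digits
nothing = re.findall(r"x+", text)      # no match gives an empty list

pattern = re.compile(r"\d+")           # compiled pattern object
pieces = pattern.split(text)           # split at every run of digits
one_cut = pattern.split(text, maxsplit=1)  # split at most once

print(at_start)       # None
print(first.group())  # 123
print(all_runs)       # ['123', '456']
print(nothing)        # []
print(pieces)         # ['abc', 'def', '']
print(one_cut)        # ['abc', 'def456']
```

The trailing empty string in pieces appears because the text ends with a match.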

Six. The flags parameter of the regular expression functions

The flags include:

re.I: Ignore case
re.L: Makes \w, \W, \b, \B and case-insensitive matching depend on the current locale (in Python 3 it can only be used with bytes patterns)
re.M: Multi-line mode: ^ and $ also match at the beginning and end of each line
re.S: Makes '.' match any character, including line breaks (note: by default '.' does not match line breaks)
re.U: Makes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database (already the default for str patterns in Python 3)
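
The effect of the most frequently used flags can be seen on a two-line string. This is a small sketch with a made-up sample; flags are combined with the | operator.

```python
import re

text = "Hello\nworld"

ignore_case = re.findall(r"hello", text, re.I)       # case is ignored
single = re.findall(r"^\w+", text)                   # ^ matches only at the start of the string
multi = re.findall(r"^\w+", text, re.M)              # ^ also matches after each line break
no_dotall = re.search(r"Hello.world", text)          # '.' cannot match "\n" by default
dotall = re.search(r"Hello.world", text, re.S)       # with re.S it can
combined = re.findall(r"^WORLD$", text, re.I | re.M)

print(ignore_case)           # ['Hello']
print(single)                # ['Hello']
print(multi)                 # ['Hello', 'world']
print(no_dotall)             # None
print(repr(dotall.group()))  # 'Hello\nworld'
print(combined)              # ['world']
```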
Before using regular expressions in Python, first import the re module with the following command:
import re
Example 1: Detailed walkthrough
For example:
'(\d)(a)\1' means: the first character matched is a digit, the second is the character a, and the third, \1, must repeat the same digit that the first group matched. In other words, the first group is referenced once.
For instance, "9a9" is matched, but "9a8" is not, because the third character, \1, can only be 9.
'(\d)(a)\2' means: the first character matched is a digit, the second is a, and the third, \2, must be identical to what the second group () matched.
For instance, "8aa" is matched, but "8ab" and "7a7" are not. The third character must be a copy of the second group, because \2 refers to the content matched by the second group.
print(re.match(r'(\w{3}).', "abceeeabc456abc789").group())    # abce: three word characters plus any one character
print(re.match(r'(\w{3}).*', "abceeeabc456abc789").group())   # .* is greedy: the whole string is matched
print(re.match(r'(\w{3}).*?', "abceeeabc456abc789").group())  # .*? is non-greedy: only abc is matched
print(re.search(r'(\d{3})', "abceeeabc456abc789").group())    # 456: the first run of three digits
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").groups())   # ('abc', '456', 'abc')
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(1))   # abc
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(2))   # 456
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(3))   # abc
print(re.search(r'(\w{3})(\d+)(\2)', "abceeeabcs456456abc456789abc").groups())  # ('bcs', '456', '456')
print(re.search(r'(\w{3})(\d+)(\2)', "abceeeabcs456456abc456789abc").group(1))  # bcs
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789abc").group(1))     # abc
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789abc").group(2))     # abc
print(re.search(r'(\w{3})(.*?)(\2)', "abceeeabc456abc789").group())       # abc: the non-greedy group 2 matches the empty string, so \2 is empty as well
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789").group(1, 2))     # ('abc', 'abc')
print(re.findall(r'\d+', 'one11two22three33four44'))  # ['11', '22', '33', '44']
print(re.split(r'\W+', '192.168.1.1'))    # \W+ matches runs of non-word characters (here the dots), so the result is ['192', '168', '1', '1']
print(re.split(r'(\W+)', '192.168.1.1'))  # With parentheses the separator becomes a group, so the dots are returned too: ['192', '.', '168', '.', '1', '.', '1']
print(re.split(r'(\W+)', '192.168.1.1', maxsplit=1))  # maxsplit=1 allows at most one split: ['192', '.', '168.1.1']
str1 = '''goodjobisgood:
testisgood
welldone
'''
res1 = re.findall(r'good(.*?)done', str1)
# Without the re.S flag, '.' does not match "\n", so a match can only be found within a single line.
# If a line has no match, matching starts over on the next line and never crosses lines.
# With the re.S flag, the regular expression treats the string as a whole:
# "\n" is handled like any other character and the match is made across the entire string.
res2 = re.findall(r'good(.*?)done', str1, re.S)
print(res1)  # []
print(res2)  # ['jobisgood:\ntestisgood\nwell']
Example 2: Web page information matching
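str1 = '<p>this is a herf<a href="www.baidu.com">goodjob</a></p>'
# Numbered groups: group 1 is the URL, group 2 is the link text
find = re.search(r'<a href="(.+)">(\w+)</a>', str1)
# Named groups: the same pattern, but groupdict() can now return the values by name
find = re.search(r'<a href="(?P<url>.+)">(?P<name>\w+)</a>', str1)
print(find.groups())     # ('www.baidu.com', 'goodjob')
print(find.group(1))     # www.baidu.com
print(find.group(2))     # goodjob
print(find.groupdict())  # {'url': 'www.baidu.com', 'name': 'goodjob'}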
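Example 3: Date matching
date1 = input("Please enter the date: ")
result1 = re.match(r'^(\d{4}-\d{1,2}-\d{1,2})$', date1)  # match() returns None if the input is not in YYYY-M-D form
if result1:
    print(result1.group())
else:
    print("Invalid date format")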
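
The pattern in Example 3 checks only the shape of the input, so a string such as 2023-02-30 passes. One possible refinement, sketched below, is to hand the captured numbers to the standard datetime class and let it reject impossible dates. The names DATE_RE and is_valid_date are hypothetical helpers, not part of the re module.

```python
import re
from datetime import datetime

# Hypothetical helper pattern: year, month and day captured as separate groups
DATE_RE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")

def is_valid_date(text):
    """Return True if text looks like YYYY-M-D and names a real calendar date."""
    m = DATE_RE.fullmatch(text)   # the whole string must match the pattern
    if m is None:
        return False
    year, month, day = map(int, m.groups())
    try:
        datetime(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

print(is_valid_date("2023-2-28"))   # True
print(is_valid_date("2023-02-30"))  # False
print(is_valid_date("2023/02/28"))  # False
```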
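Example 4: Email address matching
re_email = r'^[a-zA-Z0-9_]{1,20}@(163|162|gmail|yahoo)\.com$'  # 1 to 20 name characters, then one of the listed domains
email_address = input('Please enter email address: ')
res = re.search(re_email, email_address)
print(res)
print(email_address)
print(type(res))  # a match object on success, NoneType on failure
# search() returns None when nothing matches, so check before calling group()
if res:
    print(res.group())
else:
    print('Not a supported email address')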
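Example 5: Mobile phone number matching
phone = input("Please enter your mobile number: ")
result2 = re.match(r'1[35678]\d{9}$', phone)  # 11 digits in total; $ rejects extra trailing characters
if result2:
    print(result2.group())
else:
    print("Invalid mobile number")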
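
For use inside a larger program, the same mobile-number rule can be wrapped in a function that returns a boolean instead of reading from input(). This is a minimal sketch; MOBILE_RE and is_mobile are hypothetical names, and re.fullmatch is used so that the entire string has to match.

```python
import re

# Hypothetical helper: the same rule as Example 5, compiled once
MOBILE_RE = re.compile(r"1[35678]\d{9}")

def is_mobile(text):
    """Return True if text is exactly 11 digits in the form 1[35678]xxxxxxxxx."""
    return MOBILE_RE.fullmatch(text) is not None

print(is_mobile("13812345678"))   # True
print(is_mobile("138123456789"))  # False: one digit too many
print(is_mobile("12812345678"))   # False: second digit not in 3, 5, 6, 7, 8
```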


