程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
 程式師世界 >> 數據庫知識 >> Oracle數據庫 >> Oracle教程 >> Oracle正則表達式實戰

Oracle正則表達式實戰

編輯:Oracle教程

Oracle正則表達式實戰


介紹

Oracle 10g introduced support for regular expressions in SQL andPL/SQL with the following functions.

Oracle 10g開始支持在SQL和PLSQL中使用以下正則表達式:

REGEXP_INSTR - Similar to INSTR except it uses a regular expression rather than a literal as the search string. 類似INSTR函數REGEXP_LIKE - Similar to LIKE except it uses a regular expression as the search string. REGEXP_LIKE is really an operator, not a function. 類似LIKE條件 REGEXP_REPLACE - Similar to REPLACE except it uses a regular expression as the search string. 類似REPLACE函數REGEXP_SUBSTR - Returns the string matching the regular expression. Not really similar to SUBSTR. 返回匹配正則表達式的字符串,和SUBSTR有點類似

Oracle 11g introduced two new features related to regularexpressions.

11g開始引入2個新特性:

REGEXP_COUNT - Returns the number of occurrences of the regular expression in the string. 返回符合正則表達式的字符串出現的次數。 Sub-expression support was added to all regular expression functions by adding a parameter to each function to specify the sub-expression in the pattern match.子表達式在所有正則表達式函數都支持,可通過增加一個參數實現。

Learning to write regular expressions takes a little time. If youdon't do it regularly, it can be a voyage of discovery each time. The generalrules for writing regular expressions are available here.You can read the Oracle Regular Expression Support here.

Rather than trying to repeat the formal definitions, I'll presenta number of problems I've been asked to look at over the years, where asolution using a regular expression has been appropriate.

此處不重復正則表達式的定義,代之以一組問題導向的正則表達式應用實例:

Example 1 : REGEXP_SUBSTR

The data in a column is free text, but may include a 4 digit year.

數據在字段中以自由文本存放,但是可能包含4個精度的年份數據。

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('FALL 2014');
INSERT INTO t1 VALUES ('2014 CODE-B');
INSERT INTO t1 VALUES ('CODE-A 2014 CODE-D');
INSERT INTO t1 VALUES ('ADSHLHSALK');
INSERT INTO t1 VALUES ('FALL 2004');
INSERT INTO t1 VALUES ('FALL 2015');
COMMIT;
 
SELECT * FROM t1;
 
DATA
---------------------------------------------------------------------
FALL 2014
2014 CODE-B
CODE-A 2014 CODE-D
ADSHLHSALK
FALL 2004
 
5 rows selected.
 
SQL>

If we needed to return rows containing a specific year we coulduse the LIKE operator (WHERE data LIKE '%2014%'),but how do we return rows using a comparison (<, <=, >, >=,<>)?

One way to approach this is to pull out the 4 figure year andconvert it to a number, so we don't accidentally do an ASCII comparison. That'spretty easy using regular expressions.

如果我們需要返回包含指定年份的數據我們可以使用LIKE操作符(…),但是如何通過不等操作符返回行?一條路是抽出4個數字的年份並轉換為數字。通過正則表達式可以很容易實現。

We can identify digits using the "\d" or"[0-9]" operators. We want a group of four of them, which isrepresented by the "{4}" operator. So our regular expression will be"\d{4}" or "[0-9]{4}". The REGEXP_SUBSTR functionreturns the string matching the regular expression, so that can be used toextract the text of interest. We then just need to convert it to a number andperform our comparison.

我們使用\d或者[0-9]來識別數字。我們需要4個一組,可以使用{4}表示。至此,我們的正則表達式為:\d{4}或者[0-9]{4}。REGEXP_SUBSTR函數返回匹配指定正式表達式的字符串,所以可以用來提取我們感興趣的文本。然後我們只需將其轉換為數字並執行比較即可。

SELECT *
FROM   t1
WHERE  TO_NUMBER(REGEXP_SUBSTR(data, '\d{4}')) >= 2014;
 
DATA
---------------------------------------------------------------------
FALL 2014
2014 CODE-B
CODE-A 2014 CODE-D
FALL 2015
 
4 rows selected.
 
SQL>

Example 2 : REGEXP_SUBSTR

Given a source string, how do we split it up into separatecolumns, based on changes of case and alpha-to-numeric, such that this.

給定一個元字符串,要求按照指定規則(基於字母大小寫和字母到數字的變化)分割為多個列:

ArtADB1234567e9876540 

Becomes this. 分割後:

Art ADB 1234567 e 9876540

The source data is set up like this. 元數據如下:

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('ArtADB1234567e9876540');
COMMIT;

The first part of the string is an initcap word, so it starts witha capital letter between "A" and "Z". We identify a singlecharacter using the "[]" operator, and ranges are represented using"-", like "A-Z", "a-z" or "0-9". So ifwe are looking for a single character that is a capital letter, we need to lookfor "[A-Z]". That needs to be followed by lower case letters, whichwe now know is "[a-z]", but we need 1 or more of them, which issignified by the "+" operator. So to find an initcap word, we need tosearch for "[A-Z][a-z]+". Since we want the first occurrence of this,we can use the following.

字符串第一部分為大寫字母,可能為A-Z。我們使用[]操作符識別單個字符,至於范圍則用“-”,例如“A-Z”,"a-z"或"0-9"。所以如果我們需要找大寫的首字母則用“[A-Z]”。其後緊跟著的是若干小寫字母,可以用+表示若干(1個或多個)。組合起來的正則表達式即為:[A-Z][a-z]+,這樣拆分出的第一列方法有了。

REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1)

The second part of the string is a group of 1 or more uppercaseletters. We know we need to use the "[A-Z]+" pattern, but we need tomake sure we don't get the first capital letter, so we look for the secondoccurrence.

第二部分是一組包含1個或多個大寫字母。我們知道需要用模式:[A-Z]+,但是為了不和第一部分沖突,我們指明匹配其第2次出現的文本。

REGEXP_SUBSTR(data, '[A-Z]+', 1, 2)

The next part is the first occurrence of a group of numbers.

下一部分是一組純數字。

REGEXP_SUBSTR(data, '[0-9]+', 1, 1)

The next part is a group of lower case letters. We don't to pickup those from the initcap word, so we must look for the second occurrence oflower case letters.

下一部分是一組小寫字母,同樣考慮了不和第一部分沖突:

REGEXP_SUBSTR(data, '[a-z]+', 1, 2)

Finally, we have a group of numbers, which is the secondoccurrence of this pattern.

最後,是一組數字:

REGEXP_SUBSTR(data, '[0-9]+', 1, 2)

Putting that all together, we have the following query, whichsplits the data into separate columns.

將以上每一部分正則表達式的輸出分別作為獨立字段:

COLUMN col1 FORMAT A15
COLUMN col2 FORMAT A15
COLUMN col3 FORMAT A15
COLUMN col4 FORMAT A15
COLUMN col5 FORMAT A15
 
SELECT REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1) col1,
       REGEXP_SUBSTR(data, '[A-Z]+', 1, 2) col2,
       REGEXP_SUBSTR(data, '[0-9]+', 1, 1) col3,
       REGEXP_SUBSTR(data, '[a-z]+', 1, 2) col4,
       REGEXP_SUBSTR(data, '[0-9]+', 1, 2) col5
FROM   t1;
 
COL1          COL2          COL3            COL4        COL5
---------   ----------    ----------    -----------   ------------
Art           ADB          1234567         e             9876540
 
1 row selected.
 
SQL>

Example 3 : REGEXP_SUBSTR

We need to pull out a group of characters from a "/"delimited string, optionally enclosed by double quotes. The data looks likethis.

我們需要從一個字符串(含有分隔字符/和雙引號” ”)中提取一組字符,原始數據如下:

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('978/955086/GZ120804/10-FEB-12');
INSERT INTO t1 VALUES ('97/95508/BANANA/10-FEB-12');
INSERT INTO t1 VALUES ('97/95508/"APPLE"/10-FEB-12');
COMMIT;

We are looking for 1 or more characters that are not"/", which we do using "[^/]+". The "^" in thebrackets represents NOT and "+" means 1 or more. We also want toremove optional double quotes, so we add that as a character we don't want,giving us "[^/"]+". So if we want the data from the thirdcolumn, we need the third occurrence of this pattern.

我們要找1個或多個非“/“字符,可以使用”[^/]+“。^在方括號中表示NOT。我們還需要移除可選的雙引號所以需要使用[^/”]+。所以如果我們需要獲取第3次出現的字符串:

SELECT REGEXP_SUBSTR(data, '[^/"]+', 1, 3) AS element3
FROM   t1;
 
ELEMENT3
---------------------------------------------------------------------
GZ120804
BANANA
APPLE
 
3 rows selected.
 
SQL>

Example 4 : REGEXP_REPLACE

We need to take an initcap string and separate the words. The datalooks like this.

我們需要提取首字母大寫的字符串並將其分離。原始數據如下:

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('SocialSecurityNumber');
INSERT INTO t1 VALUES ('HouseNumber');
COMMIT;

We need to find each uppercase character "[A-Z]". Wewant to keep that character we find, so we will make that pattern asub-expression "([A-Z])", allowing us to refer to it later. For eachmatch, we want to replace it with a space, plus the matching character. Thespace is pretty obvious, but we need to use "\1" to signify the textmatching the first sub expression. So we will replace the matching pattern witha space and itself, " \1". We don't want to replace the first letterof the string, so we will start at the second occurrence.

我們需要使用[A-Z]找到每個大寫字符。我們需要保留找到的字符,所以我們使用一個子表達式([A-Z]),以便後續對其引用。對於每一個匹配,我們想使用一個空格替換,加上匹配到的字符。空格是相當明顯的,但我們需要使用”\1”表示第一個子表達式匹配的文本。所以我們替換匹配模式使用一個空格和其自身,即”\1”。我們不想替換字符串的第一個字母,所以我們從第2個字符開始:

SELECT REGEXP_REPLACE(data, '([A-Z])', ' \1', 2) AS hyphen_text
FROM   t1;
  
HYPHEN_TEXT
--------------------------------------------------------------------
Social Security Number
House Number
 
2 rows selected.
 
SQL>

Example 5 : REGEXP_INSTR

We have a specific pattern of digits (9 99:99:99) and we want toknow the location of the pattern in our data.

我們有一個指定數字模式(999:99:99)並且我們想知道模式在我們數據中所處位置。

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('1 01:01:01');
INSERT INTO t1 VALUES ('.2 02:02:02');
INSERT INTO t1 VALUES ('..3 03:03:03');
COMMIT;

We know we are looking for groups of numbers, so we can use"[0-9]" or "\d". We know the amount of digits in eachgroup, which we can indicate using the "{n}" operator, so we simplydescribe the pattern we are looking for.

我們知道我們正在找一組數字,所以使用"[0-9]"或"\d"。我們知道每一組數字的數量,所以可以使用{n}操作符,所以我們簡單描述一下模式:

SELECT REGEXP_INSTR(data, '[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}') AS string_loc_1,
       REGEXP_INSTR(data, '\d \d{2}:\d{2}:\d{2}') AS string_loc_2
FROM   t1;
 
STRING_LOC_1 STRING_LOC_2
------------ ------------
           1            1
           2            2
           3            3
 
3 rows selected.
 
SQL>

Example 6 : REGEXP_LIKE andREGEXP_SUBSTR

We have strings containing parentheses. We want to return the textwithin the parentheses for those rows that contain parentheses.

我們有包含在括號內的字符串。我們想只想返回括號內的字符串。

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('This is some text (with parentheses) in it.');
INSERT INTO t1 VALUES ('This text has no parentheses.');
INSERT INTO t1 VALUES ('This text has (parentheses too).');
COMMIT;

The basic pattern for text between parentheses is"\(.*\)". The "\" characters are escapes for theparentheses, making them literals. Without the escapes they would be assumed todefine a sub-expression. That pattern alone is fine to identify the rows of interestusing a REGEXP_LIKE operator,but it is not appropriate in a REGEXP_SUBSTR, as itwould return the parentheses also. To omit the parentheses we need to include asub-expression inside the literal parentheses "\((.*)\)". We can then REGEXP_SUBSTR using thefirst sub expression.

匹配括號內文本的模式基本寫法為:“\(.*\)”。\是轉義字符,使跟在其後的字符變為字面值。但是這個模式用在REGEXP_SUBSTR時會連括號一起返回。為了忽略括號我們需要在字面括號內部包含子表達式:"\((.*)\)".

COLUMN with_parentheses FORMAT A20
COLUMN without_parentheses FORMAT A20
 
SELECT data,
       REGEXP_SUBSTR(data, '\(.*\)') AS with_parentheses,

REGEXP_SUBSTR(data, '\((.*)\)', 1, 1, 'i', 1) AS without_parentheses
FROM   t1
WHERE  REGEXP_LIKE(data, '\(.*\)');
 
DATA                                               WITH_PARENTHESES     WITHOUT_PARENTHESES
-------------------------------------------------- -------------------- --------------------
This is some text (with parentheses) in it.        (with parentheses)   with parentheses
This text has (parentheses too).                   (parentheses too)    parentheses too
 
2 rows selected.
 
SQL>

 

注意:REGEXP_SUBSTR(data,'\((.*)\)', 1, 1, 'i', 1) 中最後的i代碼不區分大小寫,最後1個“1”代表返回哪個子表達式匹配的文本。(范圍0-9)

Example 7 : REGEXP_COUNT

We need to know how many times a block of 4 digits appears intext. The data looks like this.

我們需要知道4個數字的塊在字符串中出現的次數。看原始數據:

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('1234');
INSERT INTO t1 VALUES ('1234 1234');
INSERT INTO t1 VALUES ('1234 1234 1234');
COMMIT;

We can identify digits using "\d" or "[0-9]"and the "{4}" operator signifies 4 of them, so using"\d{4}" or "[0-9]{4}" with the REGEXP_COUNT functionseems to be a valid option.

我們可以用表達式:\d 或[0-9]和{4}操作符識別4個數字的塊。

SELECT REGEXP_COUNT(data, '[0-9]{4}') AS pattern_count_1,
       REGEXP_COUNT(data, '\d{4}') AS pattern_count_2
FROM   t1;
 
PATTERN_COUNT_1 PATTERN_COUNT_2
--------------- ---------------
              1               1
              2               2
              3               3
 
3 rows selected.
 
SQL>

Example 8 : REGEXP_LIKE

We need to identify invalid email addresses. The data looks likethis.

我們需要校驗郵箱地址,原始數據如下:

DROP TABLE t1;
CREATE TABLE t1 (
  data VARCHAR2(50)
);
 
INSERT INTO t1 VALUES ('[email protected]');
INSERT INTO t1 VALUES ('me@example');
INSERT INTO t1 VALUES ('@example.com');
INSERT INTO t1 VALUES ('[email protected]');
INSERT INTO t1 VALUES ('me.me@ example.com');
INSERT INTO t1 VALUES ('[email protected]');
COMMIT;

The following test gives us email addresses that approximate toinvalid email address formats.

下列測試給我們近似不合法的郵箱。

SELECT data
FROM   t1
WHERE  NOT REGEXP_LIKE(data, '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', 'i');
 
DATA
--------------------------------------------------
me@example
@example.com
me.me@ example.com
 
3 rows selected.
 
SQL>

 

-----------------------------

Dylan Presents.

  1. 上一頁:
  2. 下一頁:
Copyright © 程式師世界 All Rights Reserved