Oracle REGEXP_INSTR function

Last update on August 07 2024 11:50:12 (UTC/GMT +8 hours)

Description

The Oracle REGEXP_INSTR function is used to return the location of a regular expression pattern in a string. This function allows you to find a substring in a string using regular expression pattern matching. If no match is found, then the function returns 0.

Uses of Oracle REGEXP_INSTR Function

Locate Substring: Identify the position of a substring within a string using regular expressions.

Pattern Matching: Use various regular expression patterns to find specific character sequences.

Custom Search Position: Specify a starting position within the string for the search.

Multiple Occurrences: Search for the nth occurrence of a pattern within the string.

Return Options: Choose whether to return the position of the first character of the match or the character following the match.

Subexpression Targeting: Retrieve the position of specific subexpressions within the matched pattern.

Case Sensitivity: Perform case-sensitive or case-insensitive searches based on match parameters.

Multiline Support: Handle multi-line strings with appropriate match parameters.

Whitespace Handling: Include or ignore whitespace characters in pattern matching.

Syntax:

REGEXP_INSTR (source_char, pattern
              [, position
                 [, occurrence
                    [, return_opt
                       [, match_param
                          [, subexpr]
                       ]
                    ]
                 ]
              ]

Parameter:

Name	Description	Data Types
source_char	The string to search.	CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern	pattern is the regular expression.	CHAR, VARCHAR2, NCHAR, or NVARCHAR2.
position	Optional. The position is a positive integer indicating the character of source_char where the search will start. If omitted, it defaults to 1 which is the first position in the string.
occurrence	Optional. It is a positive integer indicating which occurrence of a pattern in source_char search for. The default is 1, meaning that this function searches for the first occurrence of a pattern.
return_opt	If specify 0, then it returns the position of the first character of the occurrence. This is the default and if specify 1, then it returns the position of the character following the occurrence.
match_param	Optional. It lets you allow to modify the matching behavior for the REGEXP_COUNT function.
subexpr	Optional. The subexpr is an integer from 0 to 9 indicating which subexpression in a pattern is the target of the function. If the value zero, the position of the entire substring that matches the pattern is returned. If greater than zero, the position of the substring fragment that corresponds to subexpression number subexpr in the matched substring is returned., the function returns zero if the pattern does not have at least subexpr subexpressions. A null subexpr value returns NULL. The default value for subexpr is zero.

Here is the list of pattern formatting operator:

Operator	Description
\d	A digit character.
\D	A nondigit character.
\w	A word character.
\W	A nonword character.
\s	A whitespace character.
\S	A non-whitespace character.
\A	Matches only at the beginning of a string, or before a newline character at the end of a string.
\Z	Matches only at the end of a string.
*?	Matches the preceding pattern element 0 or more times (nongreedy).
+?	Matches the preceding pattern element 1 or more times (nongreedy).
??	Matches the preceding pattern element 0 or 1 time (nongreedy).
{n}?	Matches the preceding pattern element exactly n times (nongreedy).
{n,}?	Matches the preceding pattern element at least n times (nongreedy).
{n,m}?	Matches the preceding pattern element at least n but not more than mtimes (nongreedy).

Here is the list of Regular Expression Operators and Metasymbols:

Operator	Description
\	The backslash character can have four different meanings depending on the context. It can: Stand for itself, Quote the next character, Introduce an operator, Do nothing
*	Matches zero or more occurrences
+	Matches one or more occurrences
?	Matches zero or one occurrence
\|	Alternation operator for specifying alternative matches
^	Matches the beginning of a string by default. In multiline mode, it matches the beginning of any line anywhere within the source string.
$	Matches the end of a string by default. In multiline mode, it matches the end of any line anywhere within the source string.
.	Matches any character in the supported character set except NULL
[ ]	Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A non-matching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list. To specify a right bracket (]) in the bracket expression, place it first on the list (after the initial circumflex (^), if any). To specify a hyphen in the bracket expression, place it first on the list (after the initial circumflex (^), if any), last in the list, or as an ending range point in a range expression.
( )	Grouping expression, treated as a single subexpression
{m}	Matches exactly m times
{m,}	Matches at least m times
{m,n}	Matches at least m times but no more than n times
\n	The backreference expression (n is a digit between 1 and 9) matches the nth subexpression enclosed between '(' and ')' preceding the \n
[..]	Specifies one collation element, and can be a multicharacter element (for example, [.ch.] in Spanish)
[: :]	Specifies character classes (for example, [:alpha:]). It matches any character within the character class.
[==]	Specifies equivalence classes. For example, [=a=] matches all characters having the base letter 'a'.

Here is the list of values for match_param:

Value	Description
'c'	Perform case-sensitive matching.
'i'	Perform case-insensitive matching.
'n'	Allows the period character (.) to match the newline character. By default, the period is a wildcard.
'm'	expression is assumed to have multiple lines, where ^ is the start of a line and $ is the end of a line, regardless of the position of those characters in expression. By default, an expression is assumed to be a single line.
'x'	Whitespace characters are ignored. By default, whitespace characters are matched like any other character.

Applies to

Oracle 12c, Oracle 11g

Examples: Oracle REGEXP_INSTR function

The following example findS the position of the first 't' character in a string.

SELECT REGEXP_INSTR ('The web development Tutorial', 't') FROM dual;

Sample Output:

REGEXP_INSTR('THEWEBDEVELOPMENTTUTORIAL','T')
---------------------------------------------
                                           19

The above example will return 2 because it is counting the number of occurrences of 't' in the string. Here we did not specify a match_parameter value, the REGEXP_COUNT function will perform a case-sensitive search which means that the 'T' characters will not be included in the count.

In the following example, we have provided start_position of 1, an occurrence of 1, a return_option of 0, and a match_parameter of 'i', the query will return 1 as the result. This time, the function will search for both 't' and 'T' values and return the first occurrence.

SELECT REGEXP_INSTR ('The web development Tutorial', 't',1,1,0,'i') FROM dual;

Sample Output:

REGEXP_INSTR('THEWEBDEVELOPMENTTUTORIAL','T',1,1,0,'I')
-------------------------------------------------------
                                                      1

The following example examines the string, looking for occurrences of one or more non-blank characters. Searching started at the first character in the string and returns the starting position (default) of the third occurrence of one or more non-blank characters.

SELECT
REGEXP_INSTR('The web development tutorial',
'[^ ]+', 1, 3) "REGEXP_INSTR"
FROM DUAL;

Sample Output:

REGEXP_INSTR
------------
           9

The following example examines the string, looking for occurrences of words beginning with t,e, or w, regardless of case, followed by any three alphabetic characters .Searching begins at the third character in the string and returns the position in the string of the character following the second occurrence with t,e, or w, regardless of case.

SELECT
REGEXP_INSTR('The web development tutorial',
'[s|r|p][[:alpha:]]{3}', 3, 2, 1, 'i') "REGEXP_INSTR"
FROM DUAL;

Sample Output:

REGEXP_INSTR
------------
          29

Previous: REGEXP_COUNT
Next: NLS_CHARSET_DECL_LEN