MySQL REGEXP operator
Introduction to MySQL REGEXP Operator
The MySQL REGEXP (Regular Expression) operator is a powerful tool for pattern matching and string searching within the SQL environment. Leveraging regular expressions, you can perform complex searches and matches that go beyond the capabilities of standard SQL wildcard characters. Whether you're filtering records based on intricate patterns or validating data formats, REGEXP offers unparalleled flexibility.
This guide will delve into the REGEXP operator's syntax, usage, and advanced applications, making it an invaluable resource for database developers and administrators.
This function is useful in -
- Pattern matching: It allows us to search for strings that match a specific pattern defined by a regular expression.
- Regular expression patterns: Meta-characters and quantifiers can be used to match characters, ranges of characters, or repeat characters in regular expressions.
- Complex pattern matching: Regular expressions allow you to define complex patterns that are difficult or impossible to express with simple wildcard characters (%) or underscores (_).
Syntax:
expr REGEXP pat
Argument:
| Name | Description | 
|---|---|
| expr | The string expression or column to be evaluated. | 
| pat | The regular expression pattern to search for within expr. | 
Note:
- MySQL uses the C escape syntax, meaning you need to double any \ in your patterns.
- The REGEXP operator is case-insensitive unless used with binary strings or the BINARY keyword.
MySQL uses the POSIX regular expression syntax, which supports a variety of metacharacters and constructs. Below are some common metacharacters:
- '.': Matches any single character.
- '^': Matches the start of a string.
- '$': Matches the end of a string.
- '[]': Defines a character class to match any single character within the brackets.
- '|': Acts as an OR operator within patterns.
- '*', '+', '?': Quantifiers to specify the number of occurrences.
- MySQL primarily uses POSIX regex, which is simpler but less powerful compared to Perl-Compatible Regular Expressions (PCRE) found in some other database systems. Understanding these differences can be crucial when migrating or working with multiple database systems.
MySQL Version: 8.0
MySQL REGEXP operator: Basic usage
Sample table: author
The following MySQL statement returns all rows from the author table where the country column contains either 'USA' or 'UK'.
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'country' column matches the regular expression pattern 'USA|UK'
WHERE country REGEXP 'USA|UK';
Explanation:
- SELECT *: This clause tells the database to return all columns from the selected rows.
- FROM author: Specifies the table (author) from which to retrieve the data.
- WHERE country REGEXP 'USA|UK': Filters the rows based on the country column. The REGEXP operator checks if the country column matches the regular expression 'USA|UK'.
- REGEXP: A regular expression operator used to match patterns within strings.
- 'USA|UK': The pattern to match. The pipe symbol (|) acts as an OR operator in regular expressions, meaning the pattern matches either 'USA' or 'UK'.
Output:
aut_id|aut_name |country|home_city| ------+---------------+-------+---------+ AUT001|William Norton |UK |Cambridge| AUT003|William Anthony|UK |Leeds | AUT006|Thomas Merton |USA |New York | AUT007|Piers Gibson |UK |London | AUT008|Nikolai Dewey |USA |Atlanta | AUT010|Joseph Milton |USA |Houston | AUT014|C. J. Wilde |UK |London | AUT015|Butler Andre |USA |Florida |
Example of MySQL REGEXP operator using(^) find from beginning
The following MySQL statement will find the author’s name beginning with ‘w’. The ‘^’ is used to match the beginning of the name.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that starts with 'w'
-- The REGEXP operator is used for pattern matching
-- The '^' symbol signifies the beginning of a string
-- Therefore, this query will return rows where the 'aut_name' column starts with 'w'
WHERE aut_name REGEXP '^w';
Explanation:
- This SQL statement is designed to retrieve all records from the author table where the aut_name (author's name) starts with the lowercase letter 'w'.
- The caret symbol (^) matches the beginning of the string. In regex terms, it ensures that the match must start from the beginning of the aut_name value.
- The letter 'w' specifies that the pattern is looking for any name that starts with 'w'.
Output:
mysql> SELECT * FROM author 
    -> WHERE aut_name REGEXP '^w'; 
+--------+-----------------+---------+-----------+
| aut_id | aut_name        | country | home_city |
+--------+-----------------+---------+-----------+
| AUT001 | William Norton  | UK      | Cambridge | 
| AUT002 | William Maugham | Canada  | Toronto   | 
| AUT003 | William Anthony | UK      | Leeds     | 
+--------+-----------------+---------+-----------+
3 rows in set (0.13 sec)
Example of MySQL REGEXP operator using (^) with binary operator
The following statement will find the author’s name beginning with ‘w’ exactly in lower case, because for case sensitivity BINARY operator has been used. Here no row has been fetched.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that starts with 'w'
-- The REGEXP BINARY operator is used for case-sensitive pattern matching
-- The '^' symbol signifies the beginning of a string
-- Therefore, this query will return rows where the 'aut_name' column starts with a lowercase 'w'
WHERE aut_name REGEXP BINARY '^w';
Explanation:
- This query retrieves rows from the author table where the aut_name column starts with the lowercase letter 'w'.
- The REGEXP BINARY operator ensures that the pattern matching is case-sensitive, meaning it will only match names that start with a lowercase 'w' and not an uppercase 'W'.
- The regular expression pattern '^w' is used to match names starting with 'w', and the BINARY keyword makes the match case-sensitive.
Output:
mysql> SELECT * FROM author 
    -> WHERE aut_name REGEXP BINARY '^w';
Empty set (0.05 sec)
Example of MySQL REGEXP operator using ($) searching from the end
The following statement will find the author’s name ending with ‘on’. The ‘$’ character have been used to match the ending of the name.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that ends with 'on'
-- The REGEXP operator is used for pattern matching
-- The '$' symbol signifies the end of a string
-- Therefore, this query will return rows where the 'aut_name' column ends with 'on'
WHERE aut_name REGEXP "on$";
Explanation:
- This query retrieves rows from the author table where the aut_name column ends with the letters 'on'.
- The REGEXP operator uses the regular expression pattern "on$", where the "$" character indicates that the pattern should match the end of the string.
- This is useful for finding names that end with a specific substring.
Sample Output:
mysql> SELECT * FROM author 
    -> WHERE aut_name REGEXP "on$" ;
+--------+----------------+---------+-----------+
| aut_id | aut_name       | country | home_city |
+--------+----------------+---------+-----------+
| AUT001 | William Norton | UK      | Cambridge | 
| AUT006 | Thomas Merton  | USA     | New York  | 
| AUT007 | Piers Gibson   | UK      | London    | 
| AUT010 | Joseph Milton  | USA     | Houston   | 
+--------+----------------+---------+-----------+
4 rows in set (0.00 sec)
Example of MySQL REGEXP operator searching within the string
The following statement will find the author’s name containing a ‘t’.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern containing the letter 't'
-- The REGEXP operator is used for pattern matching
-- The pattern "t" specifies that the letter 't' should appear anywhere in the 'aut_name'
-- This query will return rows where the 'aut_name' column contains at least one 't'
WHERE aut_name REGEXP "t";
Explanation:
- This query retrieves rows from the author table where the aut_name column contains the letter 't' anywhere within the name.
- The REGEXP operator with the pattern "t" checks for the presence of 't' in any position in the string.
- This is useful for finding names that include a specific character.
Output:
mysql> SELECT * FROM author 
    -> WHERE aut_name REGEXP "t";
+--------+----------------------+-----------+-----------+
| aut_id | aut_name             | country   | home_city |
+--------+----------------------+-----------+-----------+
| AUT001 | William Norton       | UK        | Cambridge | 
| AUT003 | William Anthony      | UK        | Leeds     | 
| AUT004 | S.B.Swaminathan      | India     | Bangalore | 
| AUT005 | Thomas Morgan        | Germany   | Arnsberg  | 
| AUT006 | Thomas Merton        | USA       | New York  | 
| AUT010 | Joseph Milton        | USA       | Houston   | 
| AUT011 | John Betjeman Hunter | Australia | Sydney    | 
| AUT015 | Butler Andre         | USA       | Florida   | 
+--------+----------------------+-----------+-----------+
8 rows in set (0.00 sec)
Using REGEXP with character classes
The following MySQL statement returns true(1) or false(0)from the given expression where the text contains three consecutive digits
Code:
-- Evaluate whether the string 'It12345contains3consicutive67digits' contains a sequence of exactly three digits
SELECT 'It12345contains3consicutive67digits' 
-- The REGEXP operator is used for pattern matching
-- The pattern '[0-9]{3}' specifies that we are looking for any sequence of exactly three digits
-- The result will be 1 if the string contains a sequence of three digits, otherwise it will be 0
REGEXP '[0-9]{3}';
Explanation:
This query checks if the string 'It12345contains3consicutive67digits' contains any sequence of exactly three consecutive digits. The REGEXP operator with the pattern '[0-9]{3}' searches for three digits in a row anywhere in the string. If such a sequence is found, the result is 1 (true); otherwise, the result is 0 (false).
- REGEXP '[0-9]{3}':
- The REGEXP operator is used to apply a regular expression pattern to the specified string.
- '[0-9]{3}' is the regular expression pattern:
- [0-9] specifies any digit from 0 to 9.
- {3} is a quantifier that specifies exactly three consecutive occurrences of the preceding element (in this case, digits).
Output:
'It12345contains3consicutive67digits' REGEXP '[0-9]{3}'|
-------------------------------------------------------+
                                                      1|
Example of MySQL REGEXP operator searching specific character
The following statement will find the author’s name containing a ‘z’ or ‘v’ or ‘y’.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that includes any of the characters 'z', 'v', or 'y'
-- The REGEXP operator is used for pattern matching
-- The pattern "[zvy]" specifies that we are looking for any occurrence of 'z', 'v', or 'y' in the 'aut_name'
-- This query will return rows where the 'aut_name' column contains at least one of these characters
WHERE aut_name REGEXP "[zvy]";
Explanation:
- This query retrieves rows from the author table where the aut_name column contains any of the characters 'z', 'v', or 'y' at any position within the name.
- The square brackets [ ] define a character class, which matches any one of the characters inside the brackets.
- The REGEXP operator with the pattern "[zvy]" checks for the presence of these specific characters in the aut_name column.
- This is useful for finding names that include any of a set of specified characters.
Output:
mysql> SELECT * FROM author 
    -> WHERE aut_name REGEXP "[zvy]";
+--------+-----------------+---------+-----------+
| aut_id | aut_name        | country | home_city |
+--------+-----------------+---------+-----------+
| AUT003 | William Anthony | UK      | Leeds     | 
| AUT008 | Nikolai Dewey   | USA     | Atlanta   | 
| AUT012 | Evan Hayek      | Canada  | Vancouver | 
+--------+-----------------+---------+-----------+
3 rows in set (0.00 sec)
Example of MySQL REGEXP operator searching using a range of characters
The following statement will find the author’s name containing characters from ‘x’ to ‘z’.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that includes any character from 'x' to 'z'
-- The REGEXP operator is used for pattern matching
-- The pattern "[x-z]" specifies that we are looking for any occurrence of characters between 'x' and 'z' in the 'aut_name'
-- This query will return rows where the 'aut_name' column contains at least one character in this range
WHERE aut_name REGEXP "[x-z]";
Explanation:
- This query effectively searches for and retrieves rows from the author table where the aut_name column contains any character from 'x' to 'z'.
- The REGEXP operator with the pattern [x-z] allows for flexible pattern matching within SQL queries, enabling you to find specific character ranges within text data.
Output:
mysql> SELECT * FROM author 
    -> WHERE aut_name REGEXP "[x-z]";
+--------+-----------------+---------+-----------+
| aut_id | aut_name        | country | home_city |
+--------+-----------------+---------+-----------+
| AUT003 | William Anthony | UK      | Leeds     | 
| AUT008 | Nikolai Dewey   | USA     | Atlanta   | 
| AUT012 | Evan Hayek      | Canada  | Vancouver | 
+--------+-----------------+---------+-----------+
3 rows in set (0.00 sec)
Example of MySQL REGEXP operator searching a specific length of string
The following statement will find the author’s name containing exactly 12 characters. Use ‘^’ and ‘$’ match the beginning and ending of the name and twelve instances of ‘.’ have been used for maintaining twelve characters.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that specifies exactly 12 characters
-- The REGEXP operator is used for pattern matching
-- The pattern '^............$' specifies that we are looking for 'aut_name' values that are exactly 12 characters long
-- This query will return rows where the 'aut_name' column contains exactly 12 characters
WHERE aut_name REGEXP '^............$';
Explanation:
- This query is designed to find and retrieve rows from the author table where the aut_name column contains a string exactly 12 characters long.
- The REGEXP operator allows for flexible pattern matching within SQL queries, in this case ensuring precise string length.
- REGEXP '^............$' specifies the regular expression pattern:
- ^ and $ anchor the pattern to the beginning and end of the string, respectively.
- . matches any single character.
- {12} specifies that exactly 12 characters (.) are expected between the anchors.
Output:
mysql> SELECT * FROM author 
    -> WHERE aut_name REGEXP '^............$';
+--------+--------------+---------+-----------+
| aut_id | aut_name     | country | home_city |
+--------+--------------+---------+-----------+
| AUT007 | Piers Gibson | UK      | London    | 
| AUT015 | Butler Andre | USA     | Florida   | 
+--------+--------------+---------+-----------+
2 rows in set (0.01 sec)
Example of MySQL REGEXP operator searching a defined length of string
The following statement will find the author’s name containing exactly 12 characters. Use ‘^’ and ‘$’ match the beginning and ending of the name and ‘{12}’ have been after ‘.’ for repeating ‘.’ twelve times.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that specifies exactly 12 characters
-- The REGEXP operator is used for pattern matching
-- The pattern '^.{12}$' specifies that we are looking for 'aut_name' values that are exactly 12 characters long
-- This query will return rows where the 'aut_name' column contains exactly 12 characters
WHERE aut_name REGEXP '^.{12}$';
Explanation:
- This query is designed to find and retrieve rows from the author table where the aut_name column contains a string exactly 12 characters long.
- WHERE aut_name REGEXP '^.{12}$':
- The WHERE clause filters the rows based on a condition.
- aut_name is the column being evaluated.
- REGEXP '^.{12}$' specifies the regular expression pattern:
- ^ and $ anchor the pattern to the beginning and end of the string, respectively.
- .{12} matches any sequence of exactly 12 characters.
- Together, ^.{12}$ ensures that the entire aut_name value is exactly 12 characters long.
Output:
mysql> SELECT * FROM author
    -> WHERE aut_name REGEXP '^.{12}$'; 
+--------+--------------+---------+-----------+
| aut_id | aut_name     | country | home_city |
+--------+--------------+---------+-----------+
| AUT007 | Piers Gibson | UK      | London    | 
| AUT015 | Butler Andre | USA     | Florida   | 
+--------+--------------+---------+-----------+
2 rows in set (0.00 sec)
Example of MySQL REGEXP operator using Lookaheads:
The following statement will match author names where 'w' is followed by 'i' but 'i' is not included in the match.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that includes 'w' followed by 'i'
-- The REGEXP operator is used for pattern matching
-- The pattern 'w(?=i)' specifies that we are looking for 'aut_name' values where 'w' is immediately followed by 'i'
-- This query will return rows where the 'aut_name' column contains 'wi' as a substring
WHERE aut_name REGEXP 'w(?=i)';
Explanation:
- This query uses a regular expression with a positive lookahead to find and retrieve rows from the author table where the aut_name column contains the specific substring 'wi'.
- The positive lookahead (?=i) ensures that 'w' is immediately followed by 'i', allowing for precise substring matching within SQL queries.
- WHERE aut_name REGEXP 'w(?=i)':
- The WHERE clause filters the rows based on a condition.
- aut_name is the column being evaluated.
- REGEXP 'w(?=i)' specifies the regular expression pattern:
- w matches the character 'w'.
- (?=i) is a positive lookahead assertion, ensuring that immediately after 'w', there must be 'i'.
- Together, w(?=i) ensures that 'aut_name' values containing 'wi' as a substring are selected.
Output:
aut_id|aut_name |country|home_city| ------+---------------+-------+---------+ AUT001|William Norton |UK |Cambridge| AUT002|William Maugham|Canada |Toronto | AUT003|William Anthony|UK |Leeds | AUT014|C. J. Wilde |UK |London |
Example of MySQL REGEXP operator using Lookbehinds:
The following query finds names where 'n' is preceded by 'a' but 'a' is not included in the match.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern that includes 'n' preceded by 'a'
-- The REGEXP operator is used for pattern matching
-- The pattern '(?<=a)n' specifies that we are looking for 'aut_name' values where 'n' is preceded by 'a'
-- This query will return rows where the 'aut_name' column contains 'an' as a substring
WHERE aut_name REGEXP '(?<=a)n';
Explanation:
- This query uses a regular expression with a positive lookbehind to find and retrieve rows from the author table where the aut_name column contains the specific substring 'an'.
- The positive lookbehind (?<=a) ensures that 'n' is preceded by 'a', allowing for precise substring matching within SQL queries.
- WHERE aut_name REGEXP '(?<=a)n':
- The WHERE clause filters the rows based on a condition.
- aut_name is the column being evaluated.
- REGEXP '(?<=a)n' specifies the regular expression pattern:
- (?<=a) is a positive lookbehind assertion, ensuring that immediately before 'n', there must be 'a'.
- n matches the character 'n'.
- Together, (?<=a)n ensures that 'aut_name' values containing 'an' as a substring are selected.
Output:
aut_id|aut_name |country |home_city| ------+--------------------+---------+---------+ AUT003|William Anthony |UK |Leeds | AUT004|S.B.Swaminathan |India |Bangalore| AUT005|Thomas Morgan |Germany |Arnsberg | AUT011|John Betjeman Hunter|Australia|Sydney | AUT012|Evan Hayek |Canada |Vancouver| AUT015|Butler Andre |USA |Florida |
Example of MySQL REGEXP operator combining Patterns with Groups and Quantifiers:
The following query matches names starting with 'W' or 'T' and ending with 'n' or 'y'.
Sample table: author
Code:
-- Select all columns (*) from the 'author' table
SELECT * 
-- From the 'author' table
FROM author 
-- Where the 'aut_name' column matches a pattern defined by the REGEXP operator
-- The pattern '^(W|T).*(n|y)$' specifies a regular expression pattern
-- This query will return rows where the 'aut_name' column starts with 'W' or 'T' and ends with 'n' or 'y'
WHERE aut_name REGEXP '^(W|T).*(n|y)$';
Explanation:
- This query uses a regular expression (REGEXP) to find and retrieve rows from the author table where the aut_name column meets specific criteria:
- It starts with either 'W' or 'T'.
- It ends with either 'n' or 'y'.
- WHERE aut_name REGEXP '^(W|T).*(n|y)$':
- The WHERE clause filters the rows based on a condition.
- aut_name is the column being evaluated.
- REGEXP '^(W|T).*(n|y)$' specifies the regular expression pattern:
- ^ asserts the beginning of the string.
- (W|T) matches either 'W' or 'T' at the beginning of the string.
- .* matches any sequence of characters (zero or more).
- (n|y) matches either 'n' or 'y' at the end of the string.
- $ asserts the end of the string.
- Together, ^(W|T).*(n|y)$ ensures that 'aut_name' values start with 'W' or 'T' and end with 'n' or 'y'.
Output:
aut_id|aut_name |country|home_city| ------+---------------+-------+---------+ AUT001|William Norton |UK |Cambridge| AUT003|William Anthony|UK |Leeds | AUT005|Thomas Morgan |Germany|Arnsberg | AUT006|Thomas Merton |USA |New York |
Tips to Optimize Performance with REGEXP:
Regular expressions are powerful but can be computationally expensive, especially on large datasets.
- Limit Search Scope: Whenever possible, reduce the number of rows to be scanned by adding more specific `WHERE` clauses or using indexes.
- Avoid Full Table Scans: Ensure that queries using `REGEXP` do not trigger full table scans. Use indexes or limit the search to specific columns.
- Use BINARY: Use the `BINARY` keyword to enforce case sensitivity, which can sometimes improve performance by reducing the number of matches.
- Profile Queries: Use the MySQL `EXPLAIN` statement to understand how your `REGEXP` queries are executed and identify potential bottlenecks.
Frequently Asked Questions (FAQ) - MySQL REGEXP Operator
1. What is the MySQL REGEXP operator?
The MySQL REGEXP (Regular Expression) operator is used for pattern matching within SQL queries. It allows users to search for strings that match specific patterns defined by regular expressions.
2. How does MySQL REGEXP differ from wildcards (%) in SQL?
While SQL wildcards like '%' match any sequence of characters (including zero), MySQL REGEXP uses regular expressions, which offer more complex pattern matching capabilities. REGEXP can match specific characters, character ranges, or enforce precise string conditions.
3. Is MySQL REGEXP case-sensitive?
MySQL REGEXP is case-insensitive by default, unless used with binary strings or the BINARY keyword.
. What are some common metacharacters used in MySQL REGEXP?
Metacharacters in MySQL REGEXP include:
- '.' (matches any single character)
- '^' (matches the start of a string)
- '$' (matches the end of a string)
- '[]' (defines a character class)
- '|' (acts as an OR operator within patterns)
- '*', '+', '?' (quantifiers to specify occurrences)
4. Can MySQL REGEXP match complex patterns?
Yes, MySQL REGEXP supports complex pattern matching that extends beyond simple wildcard characters (% or _). This includes matching specific character sets, ranges, and repetitions within strings.
5. What are some tips for optimizing performance when using MySQL REGEXP?
To optimize performance:
- Limit the search scope with specific WHERE clauses or indexes.
- Avoid triggering full table scans by narrowing down the query.
- Use the BINARY keyword for case-sensitive searches when necessary.
- Profile queries using EXPLAIN to identify and address performance bottlenecks.
6. What MySQL version supports REGEXP?
REGEXP has been supported in MySQL since version 3.23.
Video Presentation:
