PostgreSQL SUBSTRING() function
SUBSTRING() function
The PostgreSQL SUBSTRING() function is a powerful tool for text manipulation, allowing users to extract a sequence of characters from a given string. This function is essential for various data processing tasks, such as parsing parts of strings, cleaning data, or extracting meaningful segments from longer texts. It offers flexibility with optional parameters to define starting positions and the length of the substring, making it adaptable to a wide range of use cases, from simple extractions to complex data transformations.
Uses of SUBSTRING() Function
- Extracting Specific Segments: Extract specific parts of a string based on position and length.
- Data Cleaning: Clean and format strings by extracting and modifying portions of text.
- Text Parsing: Parse and analyze text data by isolating segments of interest.
- Dynamic Data Processing: Process strings dynamically based on variable start positions and lengths.
- Regular Expression Matching: Use with regular expressions to extract complex patterns.
- Conditional Logic: Implement conditional logic in queries based on the presence of substrings.
- Column Data Manipulation: Apply transformations and extractions to column data in tables.
Syntax:
substring(string [from <str_pos>] [for <ext_char>])
Parameters
Name | Description |
---|---|
string: | The source string from which the substring will be extracted. |
from <str_pos>: | Optional. Specifies the position within the string to start extraction. Positioning starts at 1. If omitted, the extraction starts from the beginning of the string. |
for <ext_char>: | Optional. Defines the number of characters to extract. If omitted, the extraction continues until the end of the string from the starting position. |
PostgreSQL Version: 9.3
Compatibility and Version Differences:
The `SUBSTRING()` function is available in PostgreSQL from version 9.3 and continues to be supported in later versions with consistent behavior. In newer PostgreSQL versions, additional features like support for more complex regular expressions and improvements in text processing performance have been introduced. Always refer to the latest PostgreSQL documentation for the most current features and best practices.
Visual Presentation of PostgreSQL SUBSTRING() function
Example: PostgreSQL SUBSTRING() function
The following PostgreSQL statement returns 5 characters starting from the 4th position from the string 'w3resource', that is ‘esour’.
Code:
-- This query extracts a substring from the string 'w3resource'.
-- It starts at the 4th character and extracts 5 characters from that position.
SELECT substring('w3resource' from 4 for 5);
Explanation:
- Function: The substring() function is used to extract a portion of a string based on specified criteria.
- Source String: 'w3resource' is the string from which a substring will be extracted.
- Start Position: The from 4 clause indicates that the extraction begins at the 4th character of the string. In this case, the 4th character is 'e'.
- Length of Substring: The for 5 clause specifies that 5 characters will be extracted starting from the 4th character.
- Result: The query returns the substring 'esour', which is composed of the 4th through 8th characters of the original string 'w3resource'.
Sample Output:
substring ----------- esour (1 row)
Extracting and Replacing Substrings:
Code:
-- Select a hardcoded string 'Phone: (123) 456-7890' and alias it as original_string
SELECT 'Phone: (123) 456-7890' AS original_string,
-- Extract a substring starting from the 8th character and spanning 13 characters from the original string
substring('Phone: (123) 456-7890' FROM 8 FOR 13) AS extracted_number;
Explanation:
- Selecting and Aliasing the Original String:
- 'Phone: (123) 456-7890': A static string representing a formatted phone number.
- AS original_string: This alias gives a descriptive name to the static string in the output, labeling it as "original_string".
- Extracting a Substring:
- substring('Phone: (123) 456-7890' FROM 8 FOR 13): Uses the substring() function to extract a portion of the string.
- FROM 8: Specifies that the extraction starts at the 8th character of the string, which is '('.
- FOR 13: Indicates that 13 characters will be extracted starting from the 8th character.
- AS extracted_number: The result of the substring extraction is labeled as "extracted_number" in the output for clarity.
Sample Output:
original_string |extracted_number| ---------------------+----------------+ Phone: (123) 456-7890|(123) 456-789 |
Usage with Regular Expressions:
Highlight how SUBSTRING() can be used with regular expressions for more complex string extraction scenarios.
The following example extracts the part of the email before the @ symbol using a regular expression.
Code:
-- Extract a substring from the string '[email protected]' using a regular expression pattern to isolate the username part
SELECT substring('[email protected]' FROM '([^@]+)') AS username;
Explanation:
- substring('[email protected]' FROM '([^@]+)'): This usage of the substring() function leverages a regular expression to match and extract a specific part of the string.
- '[email protected]': The string from which a portion is to be extracted. This example is a typical email address.
- '([^@]+)': The regular expression pattern used to specify the portion of the string to extract.
- [^@]: A character class that matches any character except @.
- +: Quantifier that matches one or more of the preceding character class (i.e., any sequence of characters that are not @).
- (): Parentheses create a capturing group around the matched characters, which the substring function extracts.
- Aliasing the Result:
- AS username: The result of the substring extraction is labeled as "username" in the output for clarity.
Sample Output:
username| --------+ hello |
PostgreSQL SUBSTRING() function using Column :
Sample Table: employees.
If we want to display the employee_id, first name and 1st 4 characters of first_name for those employees who belong to the department which department_id is below 50 from employees table, the following SQL can be executed:
Code:
-- Select the employee ID, first name, and the first 4 characters of the first name
SELECT employee_id, first_name,
-- Use the substring function to extract the first 4 characters of the first name
substring(first_name, 1, 4) AS "1st 4 characters of first_name"
-- From the employees table
FROM employees
-- Filter the results to include only employees in departments with an ID less than 50
WHERE department_id < 50;
Explanation:
- Selecting Columns:
- employee_id: The query retrieves the unique identifier for each employee from the employees table.
- first_name: The query includes the full first name of the employee.
- substring(first_name, 1, 4): This function extracts the first 4 characters of the first_name and labels the result as "1st 4 characters of first_name" in the output.
- Source Table:
- FROM employees: Specifies the employees table as the source of the data. This table is assumed to contain columns such as employee_id, first_name, and department_id.
- Filtering the Data:
- WHERE department_id < 50: This condition filters the rows to include only employees who belong to departments with an ID less than 50. This implies the query focuses on a subset of departments, possibly the lower-numbered or certain specific ones.
Sample Output:
employee_id | first_name | 1st 4 characters of first_name -------------+------------+-------------------------------- 114 | Den | Den 115 | Alexander | Alex 116 | Shelli | Shel 117 | Sigal | Siga 118 | Guy | Guy 119 | Karen | Kare 178 | Kimberely | Kimb 200 | Jennifer | Jenn 201 | Michael | Mich 202 | Pat | Pat 203 | Susan | Susa (11 rows)
Here is another example:
Sample Table: employees.
If we want to display the employee_id,last name and the position of the substring 'an' in the last_name column for those employees who have a substring 'an' in their last_name column from employees table, the following SQL can be executed :
Code:
-- Select the employee ID, last name, and the position of the substring 'an' within the last name
SELECT employee_id, last_name,
-- Use the position function to find where the substring 'an' starts within the last name
position('an' IN last_name) AS "Position of 'an'"
-- From the employees table
FROM employees
-- Filter the results to include only those rows where the substring 'an' is found in the last name
-- This is checked by extracting a substring of length 2 from the position where 'an' starts and comparing it to 'an'
WHERE substring(last_name, position('an' IN last_name), 2) = 'an';
Explanation:
- Selecting Columns:
- employee_id: Retrieves the unique identifier for each employee from the employees table.
- last_name: Includes the full last name of the employee.
- position('an' IN last_name) "Position of 'an'": Uses the position() function to find the starting index of the substring 'an' in the last_name and labels this output as "Position of 'an'".
- Source Table:
- FROM employees: Specifies the employees table as the source from which to retrieve the data.
- Filtering the Data:
- WHERE substring(last_name, position('an' IN last_name), 2) = 'an': This condition filters the rows to include only employees whose last_name contains the substring 'an'.
- position('an' IN last_name): Finds the position where 'an' starts within the last_name.
- substring(last_name, position('an' IN last_name), 2): Extracts a substring of 2 characters starting at the position where 'an' is found.
- The filter checks if this extracted substring is equal to 'an', ensuring the presence of the substring 'an' in the last_name.
Sample Output:
employee_id | last_name | Position of 'an' -------------+------------+------------------ 102 | De Haan | 6 112 | Urman | 4 123 | Vollman | 6 127 | Landry | 2 136 | Philtanker | 6 160 | Doran | 4 167 | Banda | 2 178 | Grant | 3 182 | Sullivan | 7 184 | Sarchand | 6 199 | Grant | 3 (11 rows)
Related Functions and Comparisons
SELECT LEFT('PostgreSQL', 4); -- Outputs 'Post'
SELECT RIGHT('PostgreSQL', 4); -- Outputs 'SQL'
SELECT POSITION('gre' IN 'PostgreSQL'); -- Outputs 5
Frequently Asked Questions (FAQ) - PostgreSQL SUBSTRING() Function
1. What is the PostgreSQL SUBSTRING() function?
The PostgreSQL SUBSTRING() function is used to extract a sequence of characters from a given string. It is a versatile tool for handling and manipulating text data within a PostgreSQL database, allowing users to isolate and work with specific segments of strings.
2. How does the PostgreSQL SUBSTRING() function work?
The SUBSTRING() function extracts a portion of a string based on specified criteria, such as starting position and length. It can be tailored to extract characters from any point in the string and for any number of characters.
3. What are the main uses of the PostgreSQL SUBSTRING() function?
- Text Parsing: Breaking down strings into more manageable parts.
- Data Cleaning: Removing or isolating unnecessary parts of a string.
- Data Extraction: Pulling out meaningful segments, such as a username from an email or a specific code from a longer identifier.
4. Can the PostgreSQL SUBSTRING() function handle regular expressions?
Yes, the SUBSTRING() function can use regular expressions to perform more complex extractions. This is particularly useful for patterns where simple positional extraction is insufficient.
5. Is the starting position parameter mandatory?
No, the starting position is optional. If omitted, the extraction begins at the first character of the string.
6. What happens if the length parameter is not specified?
If the length parameter is omitted, the function will extract characters from the starting position to the end of the string.
7. What versions of PostgreSQL support the SUBSTRING() function?
The SUBSTRING() function is supported in PostgreSQL from version 9.3 onwards. Subsequent versions have maintained support with enhancements for performance and additional features.
8. How does PostgreSQL SUBSTRING() compare with other string functions like PostgreSQL LEFT() and PostgreSQL RIGHT()?
- LEFT(): Extracts a specified number of characters from the start of the string.
- RIGHT(): Extracts a specified number of characters from the end of the string.
- SUBSTRING(): Offers more flexibility by allowing extraction from any position and for a variable length, including the use of regular expressions.
9. What are some practical scenarios where PostgreSQL SUBSTRING() is useful?
- Extracting usernames from email addresses.
- Isolating area codes from phone numbers.
- Pulling out date components from timestamp strings.
- Retrieving specific codes or identifiers embedded in longer strings.
Previous: POSITION function
Next: TRIM function
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics