Linux regular expressions

Last update on November 13 2023 10:30:05 (UTC/GMT +8 hours)

Introduction

Regular expressions are a very powerful tool in Linux. They can be used with a variety of programs like bash, vi, rename, grep, sed, and more.

This session introduces you to the basics of regular expressions.

regex versions

There are three different versions of regular expression syntax:

BRE: Basic Regular Expressions
ERE: Extended Regular Expressions
PCRE: Perl Compatible Regular Expressions

Depending on the tool being used, one or more of these syntaxes can be used.

For example, the grep tool has the -E option to force a string to be read as ERE, while -G forces BRE and -P forces PCRE.

Note that grep also has -F to force the string to be read literally.

The sed tool also has options to choose a regex syntax.

Read the manual of the tools you use!
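
As a rough illustration of how these options change the meaning of one and the same pattern, the sketch below feeds the string a+ to grep with -G, -E and -F. The three sample lines are made up for this example, and the behaviour shown for -G assumes GNU grep.

```shell
# Sample input (hypothetical): three lines that each contain the letter a.
sample=$(printf '%s\n' 'aa' 'a+' 'ba')

# BRE (-G): the plus sign is an ordinary character, so only "a+" matches.
bre=$(printf '%s\n' "$sample" | grep -G 'a+')

# ERE (-E): the plus sign means "one or more", so every line with an a matches.
ere=$(printf '%s\n' "$sample" | grep -E 'a+')

# Fixed string (-F): no regex at all, the text a+ is searched literally.
fixed=$(printf '%s\n' "$sample" | grep -F 'a+')

printf 'BRE:\n%s\n' "$bre"
printf 'ERE:\n%s\n' "$ere"
printf 'fixed:\n%s\n' "$fixed"
```

The same pattern selects one line as a BRE or a fixed string, and all three lines as an ERE.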

grep

print lines matching a pattern

grep is a popular Linux tool to search for lines that match a certain pattern. Below are some examples of the simplest regular expressions.

This is the contents of the text file. This file contains four lines (or four newline characters).

datasoft @ datasoft-linux ~$ cat names.txt
Sachin
Sourav
Rahul
Binod

When grepping for a single character, only the lines containing that character are returned.

datasoft @ datasoft-linux ~$ grep c names.txt
Sachin
 datasoft @ datasoft-linux ~$ grep l names.txt
Rahul
 datasoft @ datasoft-linux ~$ grep o names.txt
Sourav
Binod

The pattern matching in this example should be very straightforward; if the given character occurs on a line, then grep will return that line.

concatenating characters

When the pattern consists of two characters, a line only matches if it contains those characters next to each other and in the same order.

This example demonstrates that hi will match Sachin but not Sourav or Rahul, and that Bi will match Binod but not Sachin or Sourav.

datasoft @ datasoft-linux ~$ grep a names.txt
Sachin
Sourav
Rahul
 datasoft @ datasoft-linux ~$ grep hi names.txt
Sachin
 datasoft @ datasoft-linux ~$ grep Bi names.txt
Binod
 datasoft @ datasoft-linux ~$

one or the other

PCRE and ERE both use the pipe symbol to signify OR. In this example we grep for lines containing the letter i or the letter u.

 datasoft @ datasoft-linux ~$ cat names.txt
Sachin
Sourav
Rahul
Binod
 datasoft @ datasoft-linux ~$ grep -E 'i|u' names.txt
Sachin
Sourav
Rahul
Binod
 datasoft @ datasoft-linux ~$ grep -E 'i|o' names.txt
Sachin
Sourav
Binod

Note that we use the -E switch of grep to force interpretation of our string as an ERE.

We need to escape the pipe symbol in a BRE to get the same logical OR.

 datasoft @ datasoft-linux ~$ grep -G 'i|u' names.txt
 datasoft @ datasoft-linux ~$ grep -G 'i\|u' names.txt
Sachin
Sourav
Rahul
Binod

one or more

The * signifies zero, one or more occurrences of the previous character and the + signifies one or more occurrences of the previous character.

datasoft @ datasoft-linux ~$ cat abc1.txt
11
101
1001
10001

 datasoft @ datasoft-linux ~$ grep -E '0*' abc1.txt
11
101
1001
10001

 datasoft @ datasoft-linux ~$ grep -E '0+' abc1.txt
101
1001
10001
 datasoft @ datasoft-linux ~$
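
The '0*' pattern matches every line because zero occurrences also count as a match. A small sketch that counts matching lines with grep -c makes the difference visible; it recreates the sample file in a temporary location (the mktemp file is an assumption of this sketch, not part of the session above).

```shell
# Recreate the sample file from the text in a temporary location.
tmp=$(mktemp)
printf '%s\n' 11 101 1001 10001 > "$tmp"

# 0* also accepts zero occurrences, so all four lines match.
star=$(grep -c -E '0*' "$tmp")

# 0+ needs at least one zero, so the line 11 drops out.
plus=$(grep -c -E '0+' "$tmp")

# With context around it, 10*1 still matches 11, while 10+1 does not.
star_ctx=$(grep -c -E '10*1' "$tmp")
plus_ctx=$(grep -c -E '10+1' "$tmp")

echo "0*: $star  0+: $plus  10*1: $star_ctx  10+1: $plus_ctx"
rm -f "$tmp"
```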

match the end of a string

For the following examples, we will use this file.

datasoft @ datasoft-linux ~$ cat names.txt
Sachin
Sourav
Rahul
Binod
datasoft @ datasoft-linux ~$

The two examples below show how to use the dollar character to match the end of a string.

datasoft @ datasoft-linux ~$ grep n$ names.txt
Sachin
 datasoft @ datasoft-linux ~$ grep d$ names.txt
Binod

match the start of a string

The caret character (^) will match a string at the start (or the beginning) of a line.

Given the same file as above, here are two examples.

datasoft @ datasoft-linux ~$ grep ^Sac names.txt
Sachin
 datasoft @ datasoft-linux ~$ grep ^S names.txt
Sachin
Sourav

Both the dollar sign and the caret are called anchors in a regex.
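
The two anchors can also be combined in one pattern. The following sketch, which rebuilds names.txt in a temporary file, shows a pattern that constrains both ends of the line and a pattern that must match the whole line. The -x option used for comparison is a standard grep option that was not used in the examples above.

```shell
tmp=$(mktemp)
printf '%s\n' Sachin Sourav Rahul Binod > "$tmp"

# Starts with S and ends with n: only Sachin qualifies.
both=$(grep '^S.*n$' "$tmp")

# ^Rahul$ matches a line that is exactly Rahul.
exact=$(grep '^Rahul$' "$tmp")

# grep -x does the same without writing the anchors.
exact_x=$(grep -x 'Rahul' "$tmp")

echo "$both"
echo "$exact"
echo "$exact_x"
rm -f "$tmp"
```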

separating words

Regular expressions use a \b sequence to reference a word separator. Take for example this file:

 datasoft @ datasoft-linux ~$ cat summer.txt
The sun shine very brightly.
It is sunny day.
Is the flower beautiful?

Simply grepping for day would also match lines where day is only part of a longer word, such as today or daylight.

datasoft @ datasoft-linux ~$ grep day summer.txt
It is sunny day.

Surrounding the searched word with spaces is not a good solution (because other characters can be word separators). The example below shows how to use \b to find only the searched word:

datasoft @ datasoft-linux ~$ grep '\bday\b' summer.txt
It is sunny day.
 datasoft @ datasoft-linux ~$

Note that grep also has a -w option to grep for words.

datasoft @ datasoft-linux ~$ cat summer.txt
The sun shine very brightly.
It is sunny day.
Is the flower beautiful?
 datasoft @ datasoft-linux ~$ grep -w day summer.txt
It is sunny day.
 datasoft @ datasoft-linux ~$
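
Because summer.txt contains the word day only once, the effect of the word boundary is easier to see on input where day also appears inside longer words. The sketch below uses three made-up lines for that purpose; the \b syntax is assumed to be available as in GNU grep.

```shell
tmp=$(mktemp)
printf '%s\n' 'It is sunny day.' 'Today is Sunday.' 'Enjoy the daylight.' > "$tmp"

# A plain search finds day inside Today, Sunday and daylight as well.
plain=$(grep -c 'day' "$tmp")

# -w only accepts day as a whole word; the full stop after it is fine.
word=$(grep -w 'day' "$tmp")

# \b...\b gives the same result here (GNU grep syntax).
boundary=$(grep '\bday\b' "$tmp")

echo "plain matches: $plain"
echo "$word"
echo "$boundary"
rm -f "$tmp"
```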

grep features

Sometimes it is easier to combine a simple regex with grep options than it is to write a more complex regex. These options were discussed before:

grep -i
grep -v
grep -w
grep -A5
grep -B5
grep -C5
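
As a short reminder of what some of these options do when combined with a simple regex, here is a sketch on the names.txt data (recreated in a temporary file for this example).

```shell
tmp=$(mktemp)
printf '%s\n' Sachin Sourav Rahul Binod > "$tmp"

# -i ignores case: a lower-case s still finds Sachin and Sourav.
ignore_case=$(grep -i '^s' "$tmp")

# -v inverts the match: every line that does not start with S.
inverted=$(grep -v '^S' "$tmp")

# -A1 prints one line of trailing context after each match.
after=$(grep -A1 'Sourav' "$tmp")

printf '%s\n' "-i:" "$ignore_case" "-v:" "$inverted" "-A1:" "$after"
rm -f "$tmp"
```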

preventing shell expansion of a regex

The dollar sign is a special character, both for the regex and for the shell (remember variables and embedded shells). It is therefore advisable to always quote the regex; this prevents shell expansion.

datasoft @ datasoft-linux ~$ grep 'l$' names.txt
Rahul
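
To see what the shell would do to an unprotected pattern, the sketch below stores the same text once in single quotes and once in double quotes. The variable x and its value are hypothetical and only serve to trigger an expansion.

```shell
# A hypothetical shell variable whose name happens to follow the dollar sign.
x=oops

# Single quotes: the shell passes the text through untouched.
single='n$x'

# Double quotes: the shell expands $x before grep would ever see the pattern.
double="n$x"

echo "single quotes: $single"
echo "double quotes: $double"
```

Only the single-quoted version reaches the command exactly as typed, which is why single quotes are the safer default for a regex.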

rename

the rename command

On Debian Linux the /usr/bin/rename command is a link to /usr/bin/prename installed by the perl package.

 datasoft @ datasoft-linux ~$ dpkg -S $(readlink -f $(which rename))
perl: /usr/bin/prename

Red Hat derived systems do not install the same rename command, so this section does not describe rename on Red Hat (unless you copy the perl script manually).

There is often confusion on the internet about the rename command because solutions that work fine in Debian (and Ubuntu, xubuntu, Mint, ...) cannot be used in Red Hat (and CentOS, Fedora, ...).

perl

The rename command is actually a perl script that uses perl regular expressions. The complete manual for these can be found by typing perldoc perlrequick (after installing perldoc).

datasoft @ datasoft-linux ~$ sudo apt-get install perl-doc
[sudo] password for datasoft: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  groff
The following NEW packages will be installed:
  perl-doc
0 upgraded, 1 newly installed, 0 to remove and 364 not upgraded.
Need to get 7,266 kB of archives.
After this operation, 13.1 MB of additional disk space will be used.
Get:1 http://in.archive.ubuntu.com/ubuntu/ trusty/main perl-doc all 5.18.2-2ubuntu1 [7,266 kB]
Fetched 7,266 kB in 2min 7s (57.0 kB/s)                                        
Selecting previously unselected package perl-doc.
(Reading database ... 171435 files and directories currently installed.)
Preparing to unpack .../perl-doc_5.18.2-2ubuntu1_all.deb ...
Adding 'diversion of /usr/bin/perldoc to /usr/bin/perldoc.stub by perl-doc'
Unpacking perl-doc (5.18.2-2ubuntu1) ...
Processing triggers for man-db (2.6.7.1-1) ...
Setting up perl-doc (5.18.2-2ubuntu1) ...

datasoft @ datasoft-linux ~$ perldoc perlrequick

well known syntax

The most common use of rename is to search for filenames that contain a certain string and to replace that string with another one.

This is often presented as s/string/other string/ as seen in this example:

datasoft @ datasoft-linux ~$ ls
abc1                                names.txt
abc1.txt                            out_and_err
ABC.png                             part1
abc.txt                             part2
ajax-php-mysql-user-interface.html  part3
allfiles.txt                        Pictures
count                               png
cricket.txt                         pqr.txt
Desktop                             pqr.txt~
Documents                           Public
Downloads                           sample.txt
etcfiles.txt                        sqlite3
examples.desktop                    sqlite-amalgamation-3080500 (2)
file1.txt                           sqlite-amalgamation-3080500.zip
file2                               sqlite-shell-linux-x86-3080500.zip
FileA                               summer.png
FileB                               Summer.png
foo                                 summer.txt
football.txt                        summer.txt~
lebel1.txt                          Templates
lebel1.txt~                         temp.txt.bz2
lebel2.txt                          test1
lebel.txt                           test10
linux-command-past-date.png         test2
mno.txt                             text2
Music                               typescript
MyDir                               Untitled 1.odt
MyDir1                              Untitled Document~
MyDirA                              Videos
Myfile1.doc                         wrong.txt
MYFILE1.doc                         wrong.txtclear
MYFILE2.doc                         xyz.txt
MyTest                              xyz.txt~
 datasoft @ datasoft-linux ~$ rename 's/txt/text/' *
 datasoft @ datasoft-linux ~$ ls
abc1                                names.text
abc1.text                           out_and_err
ABC.png                             part1
abc.text                            part2
ajax-php-mysql-user-interface.html  part3
allfiles.text                       Pictures
count                               png
cricket.text                        pqr.text
Desktop                             pqr.text~
Documents                           Public
Downloads                           sample.text
etcfiles.text                       sqlite3
examples.desktop                    sqlite-amalgamation-3080500 (2)
file1.text                          sqlite-amalgamation-3080500.zip
file2                               sqlite-shell-linux-x86-3080500.zip
FileA                               summer.png
FileB                               Summer.png
foo                                 summer.text
football.text                       summer.text~
lebel1.text                         Templates
lebel1.text~                        temp.text.bz2
lebel2.text                         test1
lebel.text                          test10
linux-command-past-date.png         test2
mno.text                            TXT2
Music                               typescript
MyDir                               Untitled 1.odt
MyDir1                              Untitled Document~
MyDirA                              Videos
Myfile1.doc                         wrong.text
MYFILE1.doc                         wrong.textclear
MYFILE2.doc                         xyz.text
MyTest                              xyz.text~

And here is another example that uses rename with the well-known syntax to change the extensions of the same files once more:

datasoft @ datasoft-linux ~$ ls
abc1                                names.text
abc1.text                           out_and_err
ABC.png                             part1
abc.text                            part2
ajax-php-mysql-user-interface.html  part3
allfiles.text                       Pictures
count                               png
cricket.text                        pqr.text
Desktop                             pqr.text~
Documents                           Public
Downloads                           sample.text
etcfiles.text                       sqlite3
examples.desktop                    sqlite-amalgamation-3080500 (2)
file1.text                          sqlite-amalgamation-3080500.zip
file2                               sqlite-shell-linux-x86-3080500.zip
FileA                               summer.png
FileB                               Summer.png
foo                                 summer.text
football.text                       summer.text~
lebel1.text                         Templates
lebel1.text~                        temp.text.bz2
lebel2.text                         test1
lebel.text                          test10
linux-command-past-date.png         test2
mno.text                            TXT2
Music                               typescript
MyDir                               Untitled 1.odt
MyDir1                              Untitled Document~
MyDirA                              Videos
Myfile1.doc                         wrong.text
MYFILE1.doc                         wrong.textclear
MYFILE2.doc                         xyz.text
MyTest                              xyz.text~
 datasoft @ datasoft-linux ~$ rename 's/text/txt/' *.text
 datasoft @ datasoft-linux ~$ ls
abc1                                names.txt
abc1.txt                            out_and_err
ABC.png                             part1
abc.txt                             part2
ajax-php-mysql-user-interface.html  part3
allfiles.txt                        Pictures
count                               png
cricket.txt                         pqr.text~
Desktop                             pqr.txt
Documents                           Public
Downloads                           sample.txt
etcfiles.txt                        sqlite3
examples.desktop                    sqlite-amalgamation-3080500 (2)
file1.txt                           sqlite-amalgamation-3080500.zip
file2                               sqlite-shell-linux-x86-3080500.zip
FileA                               summer.png
FileB                               Summer.png
foo                                 summer.text~
football.txt                        summer.txt
lebel1.text~                        Templates
lebel1.txt                          temp.text.bz2
lebel2.txt                          test1
lebel.txt                           test10
linux-command-past-date.png         test2
mno.txt                             TXT2
Music                               typescript
MyDir                               Untitled 1.odt
MyDir1                              Untitled Document~
MyDirA                              Videos
Myfile1.doc                         wrong.textclear
MYFILE1.doc                         wrong.txt
MYFILE2.doc                         xyz.text~
MyTest                              xyz.txt
 datasoft @ datasoft-linux ~$

These two examples appear to work because the strings we used only exist at the end of the filename. Remember that file extensions have no meaning in the bash shell.

The next example shows what can go wrong with this syntax.

datasoft @ datasoft-linux ~$ touch xyz.txt
 datasoft @ datasoft-linux ~$ rename 's/xyz/problem/' xyz.txt
 datasoft @ datasoft-linux ~$ ls
abc1                                names.txt
abc1.txt                            out_and_err
ABC.png                             part1
abc.txt                             part2
ajax-php-mysql-user-interface.html  part3
allfiles.txt                        Pictures
count                               png
cricket.txt                         pqr.text~
Desktop                             pqr.txt
Documents                           problem.txt
Downloads                           Public
etcfiles.txt                        sample.txt
examples.desktop                    sqlite3
file1.txt                           sqlite-amalgamation-3080500 (2)
file2                               sqlite-amalgamation-3080500.zip
FileA                               sqlite-shell-linux-x86-3080500.zip
FileB                               summer.png
foo                                 Summer.png
football.txt                        summer.text~
lebel1.text~                        summer.txt
lebel1.txt                          Templates
lebel2.txt                          temp.text.bz2
lebel.txt                           test1
linux-command-past-date.png         test10
mno.txt                             test2
Music                               TXT2
MyDir                               typescript
MyDir1                              Untitled 1.odt
MyDirA                              Untitled Document~
Myfile1.doc                         Videos
MYFILE1.doc                         wrong.textclear
MYFILE2.doc                         wrong.txt
MyTest                              xyz.text~
datasoft @ datasoft-linux ~$

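Only the first occurrence of the searched string is replaced. Here xyz occurs only once, so nothing goes wrong yet; in a filename that contains the search string more than once, the later occurrences would be left untouched.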

a global replace

The syntax used in the previous example can be described as s/regex/replacement/. This is simple and straightforward: you enter a regex between the first two slashes and a replacement string between the last two.

This example expands this syntax only a little, by adding a modifier.

datasoft @ datasoft-linux ~$ rename -n 's/TXT2/txt/g' aTXT2.TXT
aTXT2.TXT renamed as atxt.TXT
 datasoft @ datasoft-linux ~$

The syntax we use now can be described as s/regex/replacement/g where s signifies substitute and g stands for global.

Note that this example used the -n switch to show what is being done (instead of actually renaming the file).
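
The effect of the g modifier can be tried without touching any files. The sketch below is not the rename command itself: it pipes a hypothetical filename through sed, which accepts the same s/regex/replacement/ form for this simple case.

```shell
# Hypothetical file name in which txt occurs twice.
name='atxt.txt'

# Without a modifier only the first txt is replaced.
first=$(printf '%s\n' "$name" | sed 's/txt/problem/')

# With g every occurrence is replaced.
all=$(printf '%s\n' "$name" | sed 's/txt/problem/g')

echo "$first"   # aproblem.txt
echo "$all"     # aproblem.problem
```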

case insensitive replace

Another modifier that can be useful is i. This example shows how to replace a case insensitive string with another string.

datasoft @ datasoft-linux ~$ rename 's/.TXT$/.txt/i' *.TXT
 datasoft @ datasoft-linux ~$ ls *.txt
abc1.txt      file1.txt     mno.txt     wrong.txt
abc.txt       football.txt  names.txt   xyz.txt
allfiles.txt  lebel1.txt    pqr.txt
cricket.txt   lebel2.txt    sample.txt
etcfiles.txt  lebel.txt     summer.txt
 datasoft @ datasoft-linux ~$

renaming extensions

Command line Linux has no knowledge of MS-DOS like extensions, but many end users and graphical applications do use them.

Here is an example on how to use rename to only rename the file extension. It uses the dollar sign to mark the ending of the filename.

datasoft @ datasoft-linux ~$ ls *.txt
abc1.txt      file1.txt     mno.txt     wrong.txt
abc.txt       football.txt  names.txt   xyz.txt
allfiles.txt  lebel1.txt    pqr.txt
cricket.txt   lebel2.txt    sample.txt
etcfiles.txt  lebel.txt     summer.txt
datasoft @ datasoft-linux ~$ rename 's/.txt$/.TXT/' *.txt
 datasoft @ datasoft-linux ~$ ls *.TXT
abc1.TXT      file1.TXT     mno.TXT     wrong.TXT
abc.TXT       football.TXT  names.TXT   xyz.TXT
allfiles.TXT  lebel1.TXT    pqr.TXT
cricket.TXT   lebel2.TXT    sample.TXT
etcfiles.TXT  lebel.TXT     summer.TXT
 datasoft @ datasoft-linux ~$

Note that the dollar sign in the regex means at the end. Without the dollar sign this command would fail on a file such as really.txt.txt, because only the first .txt in the name would be replaced.
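
On systems where the Perl rename script is not installed, a similar end-of-name replacement can be sketched with a plain shell loop. This is a different technique from rename: it relies on the ${f%.txt} parameter expansion of the shell, and the temporary directory and file names are made up for the example.

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
touch "$dir/abc.txt" "$dir/really.txt.txt" "$dir/notes.md"

for f in "$dir"/*.txt; do
    # ${f%.txt} strips .txt only from the end of the name,
    # which plays the role of the $ anchor in the regex.
    mv -- "$f" "${f%.txt}.TXT"
done

result=$(ls "$dir")
echo "$result"
rm -rf "$dir"
```

Only the trailing extension changes, so really.txt.txt ends up as really.txt.TXT and notes.md is left alone.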

sed

stream editor

The stream editor, or sed for short, uses regular expressions to edit a stream of text.

In this example, sed is used to replace a string.

datasoft @ datasoft-linux ~$ echo Sunday
Sunday
 datasoft @ datasoft-linux ~$ echo Sunday | sed 's/Sun/Mon/'
Monday
 datasoft @ datasoft-linux ~$

The slashes can be replaced by a couple of other characters, which can be handy in some cases to improve readability.

datasoft @ datasoft-linux ~$ echo Sunday
Sunday
 datasoft @ datasoft-linux ~$ echo Sunday | sed 's:Sun:Mon:'
Monday
 datasoft @ datasoft-linux ~$ echo Sunday | sed 's_Sun_Mon_'
Monday
 datasoft @ datasoft-linux ~$ echo Sunday | sed 's|Sun|Mon|'
Monday
 datasoft @ datasoft-linux ~$
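
A typical case where another delimiter helps is a string that itself contains slashes. The sketch below performs the same replacement twice on a sample path (chosen only for illustration), once with escaped slashes and once with the pipe as delimiter.

```shell
path='/usr/local/bin'

# With slashes as the delimiter, every slash in the path needs a backslash.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With a pipe as the delimiter, the path can be written as-is.
readable=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

echo "$escaped"
echo "$readable"
```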

in-place editing

While sed is meant to be used in a stream, it can also edit a file in place with the -i option.

datasoft @ datasoft-linux ~$ echo Sunday > today
 datasoft @ datasoft-linux ~$ cat today
Sunday
 datasoft @ datasoft-linux ~$ sed -i 's/Sun/Mon/' today
 datasoft @ datasoft-linux ~$ cat today
Monday
 datasoft @ datasoft-linux ~$
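
Since -i overwrites the file, it can be useful to keep a copy of the original. One way to do this, sketched here in a temporary directory, is to attach a suffix to the option; the .bak suffix is an arbitrary choice for this example.

```shell
dir=$(mktemp -d)
echo Sunday > "$dir/today"

# -i.bak edits the file in place and keeps the original as today.bak.
sed -i.bak 's/Sun/Mon/' "$dir/today"

edited=$(cat "$dir/today")
backup=$(cat "$dir/today.bak")
echo "edited: $edited, backup: $backup"
rm -rf "$dir"
```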

simple back referencing

The ampersand character can be used to reference the searched (and found) string.

In this example, the ampersand is used to double the occurrence of the found string.

datasoft @ datasoft-linux ~$ echo Sunday | sed 's/Sun/&&/'
SunSunday
 datasoft @ datasoft-linux ~$ echo Sunday | sed 's/day/&&/'
Sundayday
 datasoft @ datasoft-linux ~$

back referencing

Parentheses (often called round brackets) are used to group sections of the regex so they can later be referenced. In a BRE, which sed uses by default, the parentheses have to be escaped with a backslash.

Consider this simple example:

datasoft @ datasoft-linux ~$ echo Sunday | sed 's_\(Sun\)_\1ny_'
Sunnyday
 datasoft @ datasoft-linux ~$ echo Sunday | sed 's_\(Sun\)_\1ny \1_'
Sunny Sunday

a dot for any character

In a regex a simple dot can signify any character.

datasoft @ datasoft-linux ~$ echo 2014-08-09 | sed 's/....-..-../YYYY-MM-DD/'
YYYY-MM-DD
 datasoft @ datasoft-linux ~$ echo mnop-qr-st | sed 's/....-..-../YYYY-MM-DD/'
YYYY-MM-DD
 datasoft @ datasoft-linux ~$

multiple back referencing

When more than one pair of parentheses is used, each of them can be referenced separately by consecutive numbers.

datasoft @ datasoft-linux ~$ echo 2014-08-11 | sed 's/\(....\)-\(..\)-\(..\)/\1+\2+\3/'
2014+08+11
 datasoft @ datasoft-linux ~$ echo 2014-04-01 | sed 's/\(....\)-\(..\)-\(..\)/\3:\2:\1/'
01:04:2014
 datasoft @ datasoft-linux ~$

This feature is called grouping.
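
The dots in the date examples accept any character, so input that is not a date is rewritten too. A slightly stricter variant is sketched below, assuming a sed that supports the -E option for ERE syntax (GNU and BSD sed both do); the digit classes are an addition of this sketch.

```shell
# Same date rewrite as in the text, but the dots are replaced by digit classes
# and -E removes the need for backslashes before the parentheses.
iso='2014-04-01'
swapped=$(printf '%s\n' "$iso" | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3:\2:\1/')

# Input that is not made of digits is now left alone.
other=$(printf '%s\n' 'mnop-qr-st' | sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3:\2:\1/')

echo "$swapped"
echo "$other"
```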

white space

The \s can refer to white space such as a space or a tab.

This example looks for white spaces (\s) globally and replaces them with 1 space.

datasoft @ datasoft-linux ~$ echo -e 'today\tis\thot'
today	is	hot
 datasoft @ datasoft-linux ~$ echo -e 'today\tis\thot' | sed 's_\s_ _g'
today is hot
 datasoft @ datasoft-linux ~$
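
When several white space characters follow each other, replacing them one by one leaves several spaces behind. A possible refinement, sketched here with the POSIX class [[:space:]] and the + quantifier (both assumptions of this example rather than part of the session above), collapses each run into a single space.

```shell
# Two tabs and three spaces between the words (sample text made up here).
line=$(printf 'today\t\tis   hot')

# s/\s/ /g in the text replaces each white space character one by one.
# Adding + (with -E) collapses a whole run into a single space instead.
squeezed=$(printf '%s\n' "$line" | sed -E 's/[[:space:]]+/ /g')

echo "$squeezed"
```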

optional occurrence

A question mark signifies that the previous character is optional.

The example below searches for three consecutive zeros, but the third zero is optional.

datasoft @ datasoft-linux ~$ cat abc1.txt
11
101
1001
10001

 datasoft @ datasoft-linux ~$ grep -E '000?' abc1.txt
1001
10001
 datasoft @ datasoft-linux ~$ cat abc1.txt | sed 's/000\?/A/'
11
101
1A1
1A1

 datasoft @ datasoft-linux ~$

exactly n times

You can demand an exact number of times the previous character has to occur. This example wants exactly three zeros.

datasoft @ datasoft-linux ~$ cat abc1.txt
11
101
1001
10001

 datasoft @ datasoft-linux ~$ grep -E '0{3}' abc1.txt
10001
 datasoft @ datasoft-linux ~$ cat abc1.txt | sed 's/0\{3\}/A/'
11
101
1001
1A1

 datasoft @ datasoft-linux ~$

between n and m times

And here we demand a minimum of 2 and a maximum of 3 occurrences.

datasoft @ datasoft-linux ~$ cat abc1.txt
11
101
1001
10001

 datasoft @ datasoft-linux ~$ grep -E '0{2,3}' abc1.txt
1001
10001
 datasoft @ datasoft-linux ~$ grep '0\{2,3\}' abc1.txt
1001
10001
 datasoft @ datasoft-linux ~$ cat abc1.txt | sed 's/0\{2,3\}/A/'
11
101
1A1
1A1

 datasoft @ datasoft-linux ~$
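
To see which part of a line a quantifier actually consumed, grep can print only the matched text with -o. The following sketch assumes a grep that supports -o (GNU and BSD grep do) and reuses the values from abc1.txt.

```shell
# 0{2,3} is greedy: it takes three zeros when they are available.
greedy=$(echo 10001 | grep -oE '0{2,3}')

# With only two zeros, the minimum is enough for a match.
minimum=$(echo 1001 | grep -oE '0{2,3}')

# A single zero does not reach the minimum, so grep prints nothing.
none=$(echo 101 | grep -oE '0{2,3}' || true)

echo "greedy: $greedy, minimum: $minimum, none: '$none'"
```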

bash history

The bash shell can also interpret some regular expressions. This example shows how to manipulate the exclamation mark history feature of the bash shell.

 datasoft @ datasoft-linux ~$ mkdir history
 datasoft @ datasoft-linux ~$ cd history/
datasoft @ datasoft-linux ~/history$ touch lebel1 lebel2 lebel3
 datasoft @ datasoft-linux ~/history$ ls -l lebel1
-rw-rw-r-- 1 datasoft datasoft 0 Aug 12 17:59 lebel1
 datasoft @ datasoft-linux ~/history$ !l
ls -l lebel1
-rw-rw-r-- 1 datasoft datasoft 0 Aug 12 17:59 lebel1
 datasoft @ datasoft-linux ~/history$ !l:s/1/3
ls -l lebel3
-rw-rw-r-- 1 datasoft datasoft 0 Aug 12 17:59 lebel3
 datasoft @ datasoft-linux ~/history$

This also works with the history numbers in bash.

 datasoft @ datasoft-linux ~/history$ history 6
 1924  l:s/3/3 :s/l/3
 1925  clear
 1926  l:s/3/3 :s/l/3 :s/l/3
 1927  l:s/3/3 :s/l/3 :s/l/3 :s/1/3
 1928  clear
 1929  history 6
 datasoft @ datasoft-linux ~/history$ !1929
history 6
 1927  l:s/3/3 :s/l/3 :s/l/3 :s/1/3
 1928  clear
 1929  history 6
 1930  l:s/3/3 :s/l/3 :s/l/3 :s/1/3
 1931  clear
 1932  history 6


 datasoft @ datasoft-linux ~/history$ !1929:s/1/2
bash: :s/1/2: substitution failed
 datasoft @ datasoft-linux ~/history$
