Advanced string search in text files using regular expressions in Linux

In addition to word and phrase searches, you can use grep to search for complex text patterns called “regular expressions.” A regular expression (or “regexp”) is a string of ordinary and special characters that describes the set of strings it matches.

Technically speaking, the word or phrase patterns are regular expressions — just very simple ones. In a regular expression, most characters, including letters and numbers, represent themselves. For example, the regexp pattern 1 matches the string ‘1’, and the pattern boy matches the string ‘boy’.
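
To see this in practice, the sketch below writes a few made-up words to a temporary file and searches them for the literal pattern boy. Because grep matches a pattern anywhere in a line, lines such as cowboy and boys match as well. The sample words and the variable name are only for illustration.

```shell
# Hypothetical sample data, one word per line, in a temporary file
words=$(mktemp)
printf '%s\n' boy toy cowboy boys > "$words"

# 'boy' contains no metacharacters, so each character matches itself;
# grep keeps every line that contains the string anywhere
literal=$(grep boy "$words")
printf '%s\n' "$literal"    # -> boy, cowboy, boys (one per line)

rm -f "$words"
```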

A number of reserved characters, called metacharacters, do not represent themselves in a regular expression; instead, they have special meanings that are used to build complex patterns. These metacharacters are ., *, [, ], ^, $, and \.
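
The following sketch runs one pattern per metacharacter against a small, hypothetical sample file, with the lines each pattern matches noted in the comments. It assumes grep's default basic regular expression syntax and a shell where mktemp is available.

```shell
# Hypothetical sample lines in a temporary file
sample=$(mktemp)
printf '%s\n' cat cot coot c.t scatter > "$sample"

# .    any single character               -> cat, cot, c.t, scatter
dot=$(grep 'c.t' "$sample")
# *    zero or more of the previous item  -> cot, coot
star=$(grep 'co*t' "$sample")
# [ ]  any one character from the set     -> cat, cot, scatter
bracket=$(grep 'c[ao]t' "$sample")
# ^ $  anchor to the start and end of line -> cat, cot, c.t
anchored=$(grep '^c.t$' "$sample")
# \    makes the next metacharacter literal -> c.t
escaped=$(grep 'c\.t' "$sample")

rm -f "$sample"
```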

These metacharacters have the same meaning in grep on virtually every Linux distribution, because they are part of the POSIX basic regular expression syntax that grep uses by default. A separate article covers the special meaning of each metacharacter in more detail, with examples of its usage. Without further ado, here are five hands-on examples that use regular expressions.

Ex 1: Matching Lines That Do Not Contain a Regexp

To output all lines in a text that do not match a given pattern, use grep with the ‘-v’ option. This option inverts the sense of matching, selecting all non-matching lines.

To output all lines in ‘/usr/dict/words’ that are not exactly four characters long, type:

$ grep -v '^....$' /usr/dict/words

To output all lines in ‘access_log’ that do not contain the string ‘https’, type:

$ grep -v https access_log
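
Here is a self-contained sketch of the first command that uses a short, made-up word list instead of ‘/usr/dict/words’ (on many current distributions the system word list, if installed, lives at /usr/share/dict/words). It also shows ‘-c’, which counts lines instead of printing them.

```shell
# Hypothetical stand-in for a word list, one word per line
words=$(mktemp)
printf '%s\n' tree apple dog bird zebra > "$words"

# Lines that are exactly four characters long
four=$(grep '^....$' "$words")          # -> tree, bird
# -v inverts the match: every line that is NOT four characters long
not_four=$(grep -v '^....$' "$words")   # -> apple, dog, zebra
# -c combined with -v counts the non-matching lines
count=$(grep -vc '^....$' "$words")     # -> 3

rm -f "$words"
```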

Ex 2: Matching Lines That Only Contain Certain Characters

To match lines that contain only certain characters, use the regexp ‘^[characters]*$’, where characters are the ones to match. Because ‘*’ also matches zero occurrences, this pattern matches empty lines too.
To output lines in ‘/usr/dict/words’ that only contain vowels, type:

$ grep -i '^[aeiou]*$' /usr/dict/words

The ‘-i’ option makes the match case-insensitive, so both uppercase and lowercase vowels are matched.
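
A minimal sketch on hypothetical sample data: since ‘*’ allows zero repetitions, an empty line counts as “only vowels.” Switching to extended syntax with ‘-E’ and using ‘+’ (one or more) excludes it.

```shell
# Hypothetical sample words, including an empty line and mixed case
words=$(mktemp)
printf '%s\n' a '' AI eye iou Ouija > "$words"

# Only vowels, any case; * allows zero vowels, so the empty line matches
vowels=$(grep -i '^[aeiou]*$' "$words")     # -> a, (empty line), AI, iou
# -E enables +, which requires at least one vowel
strict=$(grep -iE '^[aeiou]+$' "$words")    # -> a, AI, iou

rm -f "$words"
```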

Ex 3: Finding Phrases Regardless of Spacing

One way to search for a phrase that might occur with extra spaces between words, or across a line or page break, is to remove all linefeeds and extra spaces from the input, and then grep that.
To do this, pipe the input to tr with ‘\r\n:\>\|-’ as an argument to the ‘-d’ option, which deletes carriage returns and line feeds along with the :, >, |, and - characters; pipe that to the fmt filter with the ‘-u’ option, which outputs the text with uniform spacing; and pipe that to grep with the pattern to search for.

To search across line breaks for the string ‘at the same time as’ in the file ‘docs’, type:

$ cat docs | tr -d '\r\n:\>\|-' | fmt -u | grep 'at the same time as'
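
Deleting line breaks outright glues two words together when a line break is the only thing separating them (same, a newline, then time becomes sametime), and fmt then re-wraps its output at 75 columns by default, so a phrase can still end up split across two lines. As a minimal alternative sketch, swapping in a different technique rather than the tr -d and fmt approach above, the commands below squeeze every run of whitespace, newlines included, into a single space so the whole input becomes one line before grep sees it. The sample file contents are made up.

```shell
# Hypothetical file where the phrase is split across lines
# and padded with extra spaces
docs=$(mktemp)
printf 'We ran the job at the same\n   time as   the backup.\n' > "$docs"

# tr -s '[:space:]' ' ' turns each run of spaces, tabs, carriage returns
# and newlines into one space; grep -o prints only the matched text
# rather than the whole (now single, long) line
found=$(tr -s '[:space:]' ' ' < "$docs" | grep -o 'at the same time as')

rm -f "$docs"
```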

Ex 4: Finding Patterns in Certain Contexts

To search for a pattern that only occurs in a particular context, grep for the context in which it should occur, and pipe the output to another grep to search for the actual pattern.

For example, this is useful for finding a pattern only in the quoted lines of an email message, which begin with a ‘>’ character.

To list lines from the file ‘archive’ that contain the word ‘narrative’ only when it is quoted, type:

$ grep '^>' archive | grep narrative

You can also reverse the order and use the ‘-v’ option to output all lines containing a given pattern that are not in a given context.

To list lines from the file ‘archive’ that contain the word ‘narrative’, but not when it is quoted, type:

$ grep narrative archive | grep -v '^>'
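
Both directions can be tried on a tiny, made-up archive. The sketch also shows that the first pipeline can be collapsed into a single regexp, ‘^>.*narrative’, where ‘.*’ matches anything between the quote marker and the word.

```shell
# Hypothetical email archive: quoted lines start with '>'
archive=$(mktemp)
printf '%s\n' \
  '> The narrative was too long.' \
  'I disagree; the narrative works.' \
  '> Cut the intro.' > "$archive"

# Context first, then pattern: quoted lines that mention "narrative"
quoted=$(grep '^>' "$archive" | grep narrative)
# Same result from one regexp: '>' at line start, anything, then the word
quoted_one=$(grep '^>.*narrative' "$archive")
# Pattern first, then drop the quoted context with -v
unquoted=$(grep narrative "$archive" | grep -v '^>')

rm -f "$archive"
```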

Ex 5: Using a List of Regexps to Match From

You can keep a list of regexps in a file, and use grep to search text for any of the patterns in the file. To do this, specify the name of the file containing the regexps to search for as an argument to the ‘-f’ option.

This can be useful, for example, if you need to search a given text for a number of words — keep each word on its own line in the regexp file.

To output all lines in ‘/usr/dict/words’ containing any of the words listed in the file ‘forbidden-words’, type:

$ grep -f forbidden-words /usr/dict/words

To output all lines in ‘/usr/dict/words’ that do not contain any of the words listed in ‘forbidden-words’, regardless of case, type:

$ grep -v -i -f forbidden-words /usr/dict/words
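
A short sketch with a made-up pattern file illustrates three behaviors: each line of the file is a separate regexp (so ‘^dog’ is anchored to the start of a line), a plain word such as cat also matches inside longer words like concatenate unless ‘-w’ restricts matches to whole words, and ‘-v -i’ keeps only the lines that match none of the patterns in any letter case. The file names and contents are hypothetical.

```shell
# Hypothetical pattern file: one regexp per line
patterns=$(mktemp)
printf '%s\n' cat '^dog' > "$patterns"

# Hypothetical word list
words=$(mktemp)
printf '%s\n' cat Catalog concatenate dogma hotdog bird > "$words"

# Lines matching ANY pattern in the file (case-sensitive)
any=$(grep -f "$patterns" "$words")          # -> cat, concatenate, dogma
# -w keeps only matches that form whole words
whole=$(grep -w -f "$patterns" "$words")     # -> cat
# -v -i: lines matching NONE of the patterns, ignoring case
none=$(grep -v -i -f "$patterns" "$words")   # -> hotdog, bird

rm -f "$patterns" "$words"
```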

Summary

In this article, we walked through five advanced examples of using the grep command in Linux to search for strings in text files, and we saw how regular expressions let grep run complex searches. By now, you can see how powerful the Linux command line is for parsing and managing text data.

Matt Zand is the founder of High School Technology Services, DC Web Makers, and Coding Bootcamps. He has written extensively on advanced topics in web design, mobile app development, and blockchain. He is a senior editor at Touchstone Words, where he writes and reviews coding and technology articles. He is also a senior instructor and developer living in Washington, DC. You can follow him on LinkedIn.