Regular Expressions
Lesson 2 What is a regular expression?
Objective Describe regular expressions.

Define Regular Expression in Unix

You should already be familiar with the grep command, which has this general form:
% grep pattern file

grep searches one or more files for the specified pattern and displays all lines containing that pattern. The pattern argument is a regular expression[1]. Using regular expressions to perform searches is often called pattern matching[2].
In the simplest cases, a regular expression is a literal string[3] or sequence of characters. For example, bat is a regular expression that describes the literal characters b, a, and t, strung together. It describes a pattern found in words like bat, bath, or acrobat.
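As a minimal sketch of a literal-string search, the commands below build a small sample file and search it for bat. The file name words.txt and its contents are made up for illustration.

```shell
# Create a hypothetical sample file with one word per line.
printf '%s\n' bat bath acrobat bet boat > words.txt

# Print every line that contains the literal string "bat".
literal_matches=$(grep 'bat' words.txt)
echo "$literal_matches"
```

Only the lines bat, bath, and acrobat are printed, because each contains the characters b, a, and t in sequence; bet and boat do not.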
Regular expressions can also include special characters that let you perform “wildcard searches.” For example, the regular expression b[ae]t matches either bat or bet. The brackets ([ ]) are an example of metacharacters[4], which together make up regular expression syntax.
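The bracket expression can be tried out in the same way. This is a rough sketch; the file name bracket_words.txt and the word list are assumptions chosen for the example.

```shell
# Create a hypothetical sample file with one word per line.
printf '%s\n' bat bet bit boat abet > bracket_words.txt

# [ae] matches exactly one character, either a or e, between b and t.
# The single quotes keep the shell from interpreting the brackets.
bracket_matches=$(grep 'b[ae]t' bracket_words.txt)
echo "$bracket_matches"
```

The lines bat, bet, and abet are printed. The lines bit and boat are skipped because neither contains b followed by a single a or e and then t.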
Do not confuse regular expressions with the wildcard patterns used to match file names. File-matching wildcards are used by the shell to match the names of files. By contrast, regular expressions are used by programs to search the contents of files. Some common programs that use regular expression syntax are grep, more, and vi. Beginners often are confused because some special characters, such as * and [ ], are used both as regular expression metacharacters and as file name wildcards.
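The difference is easiest to see with the * character. The sketch below uses a scratch directory named glob_demo and file names invented for this example. In a file name wildcard, * stands for any run of characters; in a regular expression, * means "zero or more of the preceding character."

```shell
# Set up a hypothetical scratch directory with three empty files.
mkdir -p glob_demo
touch glob_demo/bat.txt glob_demo/bet.txt glob_demo/boot.txt

# File name wildcard: the shell expands b*t.txt to every file name that
# starts with b and ends with t.txt, whatever comes in between.
glob_names=$(cd glob_demo && echo b*t.txt)
echo "$glob_names"

# Regular expression: grep reads bo*t as b, then zero or more o's, then t.
printf '%s\n' bt bot boot bat > glob_demo/words.txt
regex_lines=$(grep 'bo*t' glob_demo/words.txt)
echo "$regex_lines"
```

The wildcard expands to all three file names, including bat.txt and bet.txt. The regular expression matches the lines bt, bot, and boot but not bat, because the a in bat is not an o.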


Searching Strings in Files

Sometimes you want to search a file for a string of characters, for example to find the record of a student whose family name is Gould in the freshman file from the last section. The grep command can be used to do this. The name "grep" comes from the g/re/p command of ed, a Unix line editor. The g/re/p means "globally search for a regular expression and print all lines containing it."
Regular expressions are a set of UNIX rules for describing one or more strings with a single character string. Like file name wildcards, they save typing by letting one short pattern stand for many strings, although the two notations are distinct and follow different rules. The level of support for regular expressions also differs from tool to tool. The grep command searches a file or files for lines that match a certain pattern. The syntax is:
$ grep [options] character-string file(s)
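As an illustration of this syntax, the sketch below searches for the family name Gould using the standard -n, -i, and -c options. The file freshman_sample.txt and its records are invented stand-ins, since the layout of the actual freshman file is not shown in this lesson.

```shell
# Hypothetical stand-in for the freshman file; the records are made up.
printf '%s\n' 'Gould Mary 19' 'Smith John 18' 'gould Peter 20' > freshman_sample.txt

# -n prefixes each matching line with its line number in the file.
numbered=$(grep -n 'Gould' freshman_sample.txt)
echo "$numbered"

# -i ignores case and -c prints a count of matching lines instead of the lines.
match_count=$(grep -ic 'gould' freshman_sample.txt)
echo "$match_count"
```

The first search is case-sensitive, so it reports only line 1. The second ignores case and counts both the Gould and gould records.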

In the next lesson, you will learn how quotes affect the shell’s interpretation of regular expressions.
[1]regular expression: A regular expression describes a pattern using literal characters and optional metacharacters known as regular expression syntax.
[2]pattern matching: Pattern matching is the task of using regular expressions to search for text.
[3]string: A string is a sequence of characters.
[4]metacharacter: A metacharacter is a character with special meaning in regular expressions and is not treated literally. Examples include the * and . metacharacters.