Regular Expressions
Lesson 4: Matching the occurrences of a pattern
Objective: Create regular expressions using the ., *, and [] metacharacters.

Matching Occurrences of a Pattern

Regular expression syntax provides several metacharacters for matching the occurrence of patterns. The following table shows the most common types:

Character   Definition
.           Match any single character (except a newline)
*           Match zero or more occurrences of the character preceding the *
[ ]         Match any one character within the brackets, or one character within a range, such as [0-9], [A-Z], or [a-z]

We will use the following paragraph to demonstrate the ., *, and [] metacharacters.
Sample Paragraph to be edited using vi
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh love, how much I adore you. Do you know the extent of my love?
Oh, by the way, I think I lost my gloves somewhere out in that field of clover.
Did you see them? I can only hope love is forever. I live for you. It's hard to get back in the groove.

Any Single Character (.)

/l.ve/
The dot (.) matches any single character except a newline. Vi searches for an l, followed by any single character, followed by a v and an e. In the sample paragraph, this finds both love and live, including the love inside lovely, gloves, and clover.
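To experiment outside vi, Python's re module can stand in for the search, since it treats ., *, and [] the same way for patterns like these. The minimal sketch below assumes the sample paragraph is held as a list of lines (the name SAMPLE is only for illustration) and collects every match of l.ve in order.

```python
import re

# The sample paragraph, one list entry per line.
SAMPLE = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh love, how much I adore you. Do you know the extent of my love?",
    "Oh, by the way, I think I lost my gloves somewhere out in that field of clover.",
    "Did you see them? I can only hope love is forever. I live for you. It's hard to get back in the groove.",
]

# An l, then any single character (the dot), then a v and an e.
pattern = re.compile(r"l.ve")

# Every matched substring, line by line, left to right.
matches = [m.group() for line in SAMPLE for m in pattern.finditer(line)]
print(matches)  # ['love', 'love', 'love', 'love', 'love', 'love', 'live']
```

Note that Lovers does not match: the l in the pattern is lowercase, and regular expressions are case-sensitive.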

Zero or More of the Preceding Character (*)

/o*ve/
The asterisk (*) matches zero or more of the preceding character. [1] It is as though the asterisk were glued to the character directly before it and controls only that character. In this case, the asterisk is glued to the letter o, so it matches as many consecutive o's as appear in the text, or none at all. Vi searches for zero or more occurrences of the letter o followed by a v and an e, finding a match in love, loooove, lve, and so forth. Because zero o's also qualifies, the pattern matches any ve in the text, such as the one in live or forever.
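The sketch below, again using Python's re module as a stand-in for vi, shows exactly which part of each word o*ve matches. The word list mixes words from the sample paragraph with the made-up strings loooove and lve; first_match is a hypothetical helper name.

```python
import re

# o* applies only to the o: zero or more o's, then a v and an e.
pattern = re.compile(r"o*ve")

words = ["love", "loooove", "lve", "groove", "live", "forever", "picnic"]


def first_match(word):
    """Return the leftmost substring matched by o*ve, or None."""
    m = pattern.search(word)
    return m.group() if m else None


results = {word: first_match(word) for word in words}
for word, matched in results.items():
    print(f"{word:8} -> {matched}")
```

The match never includes the l, and live and forever match only their ve, because o* is satisfied by zero o's.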

A Set of Characters ([])

/[Ll]ove/
The square brackets match any one character from the set they enclose. Vi searches for either an uppercase or a lowercase l followed by an o, v, and e, so it finds both Lovers and love.
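A minimal sketch in Python's re module, assuming a few words taken from the sample paragraph, contrasts the bracketed set with a plain lowercase pattern.

```python
import re

# Words taken from the sample paragraph.
words = ["Lovers", "love", "gloves", "live"]

# [Ll] matches exactly one character: an uppercase L or a lowercase l.
with_set = [w for w in words if re.search(r"[Ll]ove", w)]

# Without the brackets, the capitalized Lovers is missed.
without_set = [w for w in words if re.search(r"love", w)]

print(with_set)     # ['Lovers', 'love', 'gloves']
print(without_set)  # ['love', 'gloves']
```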

A Range of Characters ( [ - ] )

/ove[a-z]/
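A dash between two characters inside square brackets defines a range, and the brackets then match any one character in that range. Vi searches for an o, v, and e, followed by any character in the ASCII range from a through z. In the sample paragraph, this matches Lovers, lovely, gloves, and clover, but not the love in "love?", because ? is not a lowercase letter. Since the range follows ASCII order, the lower character must come first: [a-z] is valid, but [z-a] is not.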
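The sketch below, assuming words copied from the sample paragraph with their trailing punctuation, applies ove[a-z] in Python's re module and checks whether a reversed range is accepted. is_valid_pattern is a hypothetical helper; Python's re module is used only as an example of a regex engine that rejects [z-a], and vi and grep report the error in their own way.

```python
import re

# Words as they appear in the sample paragraph, punctuation included.
words = ["Lovers", "lovely", "gloves", "clover", "love?", "groove."]

# ove followed by exactly one lowercase letter, a through z.
matching = [w for w in words if re.search(r"ove[a-z]", w)]
print(matching)  # ['Lovers', 'lovely', 'gloves', 'clover']


def is_valid_pattern(expr):
    """Return True if Python's re module accepts expr."""
    try:
        re.compile(expr)
    except re.error:
        return False
    return True


print(is_valid_pattern(r"ove[a-z]"))  # True
print(is_valid_pattern(r"ove[z-a]"))  # False: the range is reversed
```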


Note the similarity between these metacharacters and the shell’s file name wildcards. The shell uses a question mark (?) to match a single character, whereas regular expressions use a dot (.). In file name matching, the shell’s * matches any string of zero or more characters by itself, whereas in a regular expression * matches zero or more occurrences of the preceding character. Brackets work the same way for sets and ranges in either situation.
You can combine metacharacters. For example, you can use .* to match zero or more instances of any character.
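Python's fnmatch module implements shell-style wildcards, so a short sketch can put the two notations side by side. The file names below are hypothetical, and shell_matches and regex_matches are illustrative helper names. Because a file name wildcard must match the whole name, the regular expressions are applied with re.fullmatch.

```python
import fnmatch
import re

# Hypothetical file names.
names = ["sizzle", "swizzle", "zzle", "puzzle"]


def shell_matches(pattern):
    """Names matched by a shell-style wildcard (case-sensitive)."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def regex_matches(pattern):
    """Names matched in full by a regular expression."""
    return [n for n in names if re.fullmatch(pattern, n)]


print(shell_matches("s?zzle"), regex_matches(r"s.zzle"))   # shell ? corresponds to regex .
print(shell_matches("s*zzle"), regex_matches(r"s.*zzle"))  # shell * corresponds to regex .*
print(regex_matches(r"s*zzle"))  # regex s* means zero or more s characters
```

The last line matches only zzle, which shows why s*zzle as a regular expression is not the same as s*zzle as a file name wildcard.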
The following examples describe some grep commands that search the file /usr/dict/words. This system file is a standard part of many versions of UNIX and contains a list of words, one per line. On many current systems, the same file is located at /usr/share/dict/words.

Using ., *, and [ ] in Regular Expressions

1) In this command, the regular expression matches a pattern of five characters, "s.tte". The dot (.) means any character can occur in the second position.

2) The * matches zero or more occurrences of the character preceding it. Hence, this regular expression finds patterns that do not necessarily begin with the letter "s".

3) By adding the dot (.), this expression matches the letter s, followed by zero or more characters of any kind, followed by zzle. This time, only two lines match: sizzle and swizzle.

4) This expression uses [] to match one character in a range, uppercase A to H. The full expression matches all lines that contain VA, VB, VC, and so on through VH, anywhere in the line.

5) Without quotes, the shell treats the [] as a file name wildcard. If files named VA, VB, VC, and so on exist in the current directory, the shell replaces the pattern with those names before grep runs. The first argument, VA, is then treated as the pattern, and the remaining ones are treated as file names. Quoting the regular expression prevents this expansion.
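The commands described in steps 1 to 4 can be imitated with a small grep-like function in Python. This is a sketch under stated assumptions: WORDS is a short hypothetical stand-in for /usr/dict/words, grep is a hypothetical helper that keeps lines containing a match anywhere, and only s.tte comes from the text. The patterns s*zzle, s.*zzle, and V[A-H] are assumed to be consistent with the descriptions, not copied from the original commands.

```python
import re

# Hypothetical stand-in for /usr/dict/words; the real file is much longer.
WORDS = ["settee", "setter", "sitter", "sizzle", "swizzle", "puzzle", "VAX", "VHF", "VT100"]


def grep(pattern, lines):
    """Return the lines containing a match for pattern anywhere, as grep does."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


print(grep(r"s.tte", WORDS))    # step 1: any character in the second position
print(grep(r"s*zzle", WORDS))   # step 2: the s is optional, so puzzle matches too
print(grep(r"s.*zzle", WORDS))  # step 3: s, then anything, then zzle
print(grep(r"V[A-H]", WORDS))   # step 4: V followed by one of A through H
```

Passing the pattern as a Python string sidesteps the shell entirely, so the expansion problem in step 5 cannot occur here.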


In the next lesson, you will learn how to use the caret (^) and dollar sign ($) metacharacters to match the position of a pattern.


[1] Do not confuse this metacharacter with the shell wildcard (*). They are entirely different: the shell asterisk matches zero or more of any character, whereas the regular expression asterisk matches zero or more occurrences of the preceding character.