Perl Regular Expression Character Classes

Summary: in this tutorial, you will learn how to use character classes to build regular expressions that represent whole classes of characters.

In the previous Perl regular expression tutorial, we built regular expressions from literal strings, for example /world/. However, the regular expression engine also lets you build patterns that match not just a single character sequence but a whole class of characters, for example digits, whitespace, and word characters.

You use square brackets [] to denote a character class in a regular expression. Inside the square brackets, you list the set of characters that are allowed at that position; the class as a whole matches exactly one of them.

Take a look at the following example:

#!/usr/bin/perl
use warnings;
use strict;

my @words = ('dog', 'fog', 'chicken');

for (@words) {
    print("$_\n") if (/[dfr]og/); # matches dog, fog, and rog
}

The regular expression /[dfr]og/ matches any string that contains dog, fog, or rog.

The regular expression below:

/elem[012345]/; # matches 'elem0', 'elem1'... or 'elem5'

matches elem0 to elem5.

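Instead of listing every character, you can use the range operator (-) within a character class. A range always spans single characters, so [0-5] means the same as [012345], for example:

/elem[0-5]/; # matches 'elem0', 'elem1'... or 'elem5'
/[a-z]1/;    # matches a1, b1,... to z1

Note that a range cannot describe a multi-digit number: [0-1000] is equivalent to [01], so matching elem0 to elem1000 takes more than a single character class.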
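A range inside a character class works on single characters, not on numbers. The sketch below, which uses a hypothetical list of element names, shows what [0-1000] really matches and one possible way to accept elem0 through elem1000. It relies on anchors, grouping, and a quantifier, so treat it as an illustration rather than the only way to do it.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample names, chosen only for illustration.
my @names = ('elem0', 'elem1', 'elem7', 'elem42', 'elem1000', 'elem1001');

# [0-1000] is the range 0-1 followed by three literal 0 characters,
# so it behaves like [01] and matches exactly one character.
my @single = grep { /^elem[0-1000]$/ } @names;

# One possible pattern for elem0 .. elem1000: one to three digits, or 1000.
# Names with leading zeros, such as elem007, would also be accepted.
my @upto1000 = grep { /^elem(?:\d{1,3}|1000)$/ } @names;

print "[0-1000]      : @single\n";
print "elem0-elem1000: @upto1000\n";
```

With this sample list, the first pattern keeps only elem0 and elem1, while the second keeps everything except elem1001.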

To make the regular expressions more readable, Perl provides useful predefined abbreviations for common character classes as shown below:

  • \d matches a digit, from 0 to 9 [0-9]
  • \s matches a whitespace character, that is, a space, tab, newline, carriage return, or form feed [ \t\n\r\f]
  • \w matches a “word” character (alphanumeric or _) [0-9a-zA-Z_].

And the following are negative predefined character classes:

  • \D matches any non-digit character [^0-9]
  • \S matches any non-whitespace character [^ \t\n\r\f]
  • \W matches any non-word character [^0-9a-zA-Z_]
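To see the shorthand classes side by side, here is a small sketch that labels each character of a sample string. The classify sub and the sample string are hypothetical and serve only as an illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: label a single character by the shorthand class it matches.
sub classify {
    my ($char) = @_;
    return 'digit'      if $char =~ /\d/;  # tested before \w because digits are word characters too
    return 'word'       if $char =~ /\w/;
    return 'whitespace' if $char =~ /\s/;
    return 'other';                        # at this point the character matches \D, \W and \S
}

my $sample = "a1 _\t-";

# Print the character code rather than the character itself,
# so that whitespace stays visible in the output.
for my $char (split //, $sample) {
    printf "%3d => %s\n", ord($char), classify($char);
}
```

The order of the tests matters: every digit is also a word character, so checking \w first would never report a digit.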

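Since version 5.14, Perl provides the /a modifier, which restricts the matches of \d, \s, and \w to characters in the ASCII range. Without it, these classes can also match Unicode characters, such as digits from other scripts.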
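The effect of the /a modifier is easiest to see with a digit from outside the ASCII range. The following sketch assumes Perl 5.14 or later and uses the Arabic-Indic digit three (U+0663) as an arbitrary test character; the variable names are illustrative.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# ARABIC-INDIC DIGIT THREE (U+0663), written as an escape so the source stays ASCII.
my $arabic_three = "\x{0663}";

# Without /a, \d follows Unicode rules and accepts this character as a digit.
my $unicode_match = $arabic_three =~ /\d/ ? 1 : 0;

# With /a, \d is limited to the ASCII digits 0-9.
my $ascii_match = $arabic_three =~ /\d/a ? 1 : 0;

print "\\d matches U+0663: $unicode_match\n";
print "\\d with /a matches U+0663: $ascii_match\n";
```

If you validate input that must contain only the digits 0 to 9, adding /a (or writing [0-9] explicitly) avoids accepting digits from other scripts.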

The following example demonstrates how to use predefined character classes:

#!/usr/bin/perl
use warnings;
use strict;

my $time = localtime();
print $time, "\n";
print "Match found\n" if ($time =~ /\d\d:\d\d:\d\d/);

The output of the program is:

Fri Dec 28 11:11:13 2012
Match found

We called the localtime() function in scalar context to get the current date and time as a formatted string. The regular expression /\d\d:\d\d:\d\d/ matched the time portion, which is in hh:mm:ss format.

In this tutorial, you’ve learned how to use character classes in regular expressions to match whole classes of characters.
