Joe's Notes on Perl #5 - Regular Expressions (Regex)

Coming up is something that could be considered infuriating but very useful, if not the whole point of Perl.

Regular expressions (Regex), in effect, scan a scalar looking for patterns of text using specific rules.

I’m not going to do too many examples because of how variable Regexes can be.

This will be you in no time! – XKCD

General – Negation – Operators – Anchors – Capture – Substitute – Switches

General syntax: $string =~ /pattern/

Here is an example of how a Regex works:

use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# =~ binds the string to the pattern; true if "cats" appears anywhere in it
if ($string =~ /cats/) {
  print "Found \"cats\"!\n";
} else {
  print "String not found\n";
}

Matches are case sensitive unless you say otherwise, which I’ll cover later.

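Character classes in square brackets match a single character from a set or range.

[A-Z] will find any one uppercase character from A to Z
[a-zA-Z] will find any one letter from A to Z regardless of case
[1-9] will look for a single digit between 1 and 9
[A-GQLX-Z] will find letters A to G, Q, L, and X to Z (no commas or spaces, as those would be matched literally)
Scalars work in place of text, eg /$scalar/ searches for whatever the scalar contains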
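
A quick sketch of character classes and a scalar inside a pattern. The sample text and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $text = "Milly sat on box 7";

# [A-Z] matches one uppercase letter; the first one here is the "M" in Milly
my ($upper) = $text =~ /([A-Z])/;

# [1-9] matches one digit between 1 and 9
my ($digit) = $text =~ /([1-9])/;

# A scalar interpolates into the pattern, so this searches for "box"
my $word  = "box";
my $found = $text =~ /$word/ ? "yes" : "no";

print "$upper $digit $found\n";   # prints "M 7 yes"
```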

To find special characters, escape them with a backslash. Eg \. finds a literal full stop and \? a literal question mark.
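
As a rough illustration of escaping, here is a sketch with a made-up price string. quotemeta is a Perl built-in that does the escaping for you. The variable names are only for this example:

```perl
use strict;
use warnings;

# Single quotes so Perl doesn't try to interpolate $5 as a variable
my $price = 'Total: $5.00';

# \$ and \. are literal characters here, not an anchor and a wildcard
my $literal = $price =~ /\$5\.00/ ? "match" : "no match";

# Unescaped, "." matches any character, so "5x00" gets through too
my $loose = '5x00' =~ /5.00/ ? "match" : "no match";

# quotemeta escapes every special character in a string for you
my $needle = quotemeta('$5.00');
my $quoted = $price =~ /$needle/ ? "match" : "no match";

print "$literal, $loose, $quoted\n";   # prints "match, match, match"
```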

So as you can see, you can precisely configure what to search for. However, with Perl things are even easier than that:

\w for any word character (letter, digit or underscore)
\d for any digit
\s for any whitespace
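
A small sketch of the three shorthands, using an invented line of text:

```perl
use strict;
use warnings;

my $line = "Order 66 shipped";

my ($first_word) = $line =~ /(\w+)/;    # first run of word characters: "Order"
my ($number)     = $line =~ /(\d+)/;    # first run of digits: "66"
my @parts        = split /\s+/, $line;  # break the line apart on whitespace

print "$first_word, $number, ", scalar(@parts), " parts\n";   # prints "Order, 66, 3 parts"
```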

To alternate between matches use vertical bars. Eg:

Socks|Poppy|Milly will look for “Socks”, “Poppy” or “Milly”
(Socks Poppy)|Milly will look for “Socks Poppy” or “Milly”
(Socks|Poppy) Milly will look for “Socks Milly” or “Poppy Milly”
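
To see grouping and alternation together, here is a sketch that filters a made-up list of names (the list and the "Rex" entry are only there for illustration):

```perl
use strict;
use warnings;

my @names = ("Socks Milly", "Poppy Milly", "Socks Poppy", "Rex Milly");

# Either "Socks" or "Poppy", then a space, then "Milly"
my @matched = grep { /(Socks|Poppy) Milly/ } @names;

print join(", ", @matched), "\n";   # prints "Socks Milly, Poppy Milly"
```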

To look for repeating strings:

(Socks){4} will look for “SocksSocksSocksSocks”
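
A short sketch of counted repeats. The {2,3} range form is an extra that the list above doesn't mention, and the variable names are invented:

```perl
use strict;
use warnings;

my $four = "Socks" x 4;   # "SocksSocksSocksSocks"
my $two  = "Socks" x 2;   # "SocksSocks"

my $four_ok = $four =~ /(Socks){4}/ ? 1 : 0;   # four repeats present
my $two_ok  = $two  =~ /(Socks){4}/ ? 1 : 0;   # only two, so no match

# {2,3} means between two and three repeats
my $range_ok = $two =~ /(Socks){2,3}/ ? 1 : 0;

print "$four_ok $two_ok $range_ok\n";   # prints "1 0 1"
```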

Negation

Say you want to match anything except a certain letter, word character, digit or whitespace:

[^b] finds any character that isn’t a b
\W finds any non-word character
\D finds any non-digit
\S finds any non-whitespace character
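
The negated forms are handy for stripping things out. A sketch with made-up sample strings:

```perl
use strict;
use warnings;

my $phone = "Tel: (555) 010-9999";

# \D is any non-digit, so deleting every \D leaves only the digits
(my $digits = $phone) =~ s/\D//g;

# [^b] is any character except b; this grabs the first run of non-b characters
my ($not_b) = "bbbabb" =~ /([^b]+)/;

# \S is any non-whitespace character; runs of them are the "words"
my @chunks = "  two   words " =~ /\S+/g;

print "$digits $not_b ", scalar(@chunks), "\n";   # prints "5550109999 a 2"
```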

Operators

When doing searches, controlling the scope of your search is vital so that you don’t match stuff that is irrelevant.

Use “.” as a wildcard, and tack the other operators on to a character, class or group to modify the search.

“.” Matches any single character (except a newline)
“+” Matches 1 or more
“*” Matches 0 or more
“?” Matches 0 or 1, ie optionally
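
One way to get a feel for the operators is to run the same few test words through each. The word list is invented for this sketch:

```perl
use strict;
use warnings;

my @tests = ("ct", "cat", "caat", "cut");

my @plus     = grep { /ca+t/ } @tests;   # one or more a:   cat, caat
my @star     = grep { /ca*t/ } @tests;   # zero or more a:  ct, cat, caat
my @question = grep { /ca?t/ } @tests;   # zero or one a:   ct, cat
my @dot      = grep { /c.t/  } @tests;   # any one char:    cat, cut

print "+ : @plus\n";
print "* : @star\n";
print "? : @question\n";
print ". : @dot\n";
```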

Anchors

“^” Match at the beginning of the string (or of each line with /m)
“$” Match at the end of the string (or of each line with /m)
\b defines a word boundary, eg /\bapples\b/
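
A sketch of the three anchors against some made-up lines. Note how \b keeps “pineapples” out:

```perl
use strict;
use warnings;

my @lines = ("apples are nice", "pineapples too", "I like apples");

my @starts = grep { /^apples/ }     @lines;   # begins with "apples"
my @ends   = grep { /apples$/ }     @lines;   # ends with "apples"
my @whole  = grep { /\bapples\b/ }  @lines;   # "apples" as a whole word

print "starts: @starts\n";             # prints "starts: apples are nice"
print "ends: @ends\n";                 # prints "ends: I like apples"
print "whole: ", scalar(@whole), "\n"; # prints "whole: 2"
```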

Capturing Data back from a match

Say you want a certain portion of the data back from a match:

use strict;
use warnings;

my $string = "Copious cats can carefully creep";

#1
if ($string =~ /(\w+)/) {
  print $1."\n"; # Prints "Copious"
}
#2
if ($string =~ /(\w+)\s(\w+)/) {
  print $1.$2."\n"; # Prints "Copiouscats"
}

#1: This will match a word character (\w) at least once (+), so it captures the first word. The $1 refers to whatever the first set of brackets captured (in this example “Copious”). This could then be stored to a scalar or whatever your needs be.

#2: Will match the first word (\w+), a single whitespace character (\s) and then the second word (\w+). $2 is the second set of brackets and returns “cats”.
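
If you want the captures in scalars of your own, one option is to run the match in list context. A minimal sketch, where $first and $second are just names picked for the example:

```perl
use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# In list context a match returns its captures, so they can be named directly
my ($first, $second) = $string =~ /(\w+)\s(\w+)/;

print "$first / $second\n";   # prints "Copious / cats"
```

If the pattern doesn't match, the list is empty and both scalars stay undefined, so it's worth checking before using them.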

Substitutions – s/pattern/replacement/

Regexes are great for finding text and replacing it with something else.

use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# s/// returns true if a replacement was made
if ($string =~ s/cat/dog/) {
  print $string."\n";
}

The above will print Copious dogs can carefully creep

Works with any of the above as well:

use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# Replace every c, C, i or I with Q (no commas inside the class)
if ($string =~ s/[cCiI]/Q/g) {
  print $string."\n";
}

Will print QopQous Qats Qan Qarefully Qreep

Swap text around by capturing:

use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# $1 holds "cats" and $2 holds "can"; the replacement swaps them
if ($string =~ s/(cats)\s(can)/$2 $1/g) {
  print $string."\n";
}

Prints Copious can cats carefully creep

Regex Switches

Regex can be unwieldy sometimes; restrain it with switches.

/i – Case Insensitive

Case insensitive search: $string =~ /pattern/i

use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# /i ignores case, so "Cats" in the pattern matches "cats" in the string
if ($string =~ /Cats/i) {
  print "Found a cat!\n";
} else {
  print "String not found\n";
}

This will match “cats” regardless of case, so “Cats”, “CATS” and “cats” all count.

/g – Global

Probably the most useful switch. It will look along the entire string for every match rather than stopping at the first.

use strict;
use warnings;

my $string = "Copious cats can carefully creep";

my $counter = 0;
# With /g each pass of the loop picks up after the previous match
while ($string =~ /[cC]/g) {
  $counter++;
}
print "There were $counter \"C\"s\n";
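
The loop isn't the only way to use /g. In list context it hands back every match at once. A sketch, with @c_words and $count as made-up names:

```perl
use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# List context: every captured match comes back in one go
my @c_words = $string =~ /\b([cC]\w*)/g;

# Counting idiom: the empty list in the middle forces list context
my $count = () = $string =~ /[cC]/g;

print scalar(@c_words), " words, $count Cs\n";   # prints "5 words, 5 Cs"
```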

/e

When substituting, /e evaluates the replacement as Perl code instead of treating it as plain text. You could push to an array or do a summation.

use strict;
use warnings;

my $string = "Copious cats can carefully creep";
my @array;

# push returns the new length of @array, and that becomes the replacement text
if ($string =~ s/(\w+)/push(@array, $1)/eg) {
  print $string."\n"; # Prints "1 2 3 4 5"
}
print @array;

The last line will print Copiouscatscancarefullycreep (the array), while $string itself has become 1 2 3 4 5
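
For the summation idea, here is a minimal sketch. The sample sentence and the doubling are invented just to show the replacement being run as code:

```perl
use strict;
use warnings;

my $string = "3 cats and 4 dogs";
my $total  = 0;

# Copy the string, then substitute on the copy.
# The replacement is code: add to the running total, then return the doubled number.
(my $doubled = $string) =~ s/(\d+)/$total += $1; $1 * 2/eg;

print "$doubled (total was $total)\n";   # prints "6 cats and 8 dogs (total was 7)"
```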

/m – Multiline

If you have a string that has \n newline characters in it, use /m so that ^ and $ match at the start and end of every line, not just of the whole string. If doing a global search use /mg.
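
A sketch of the difference, using a made-up three-line string:

```perl
use strict;
use warnings;

my $text = "cats creep\ndogs dash\ncats climb";

# Without /m, ^ only matches at the very start of the string
my @without = $text =~ /^(\w+)/g;

# With /m, ^ also matches straight after each newline
my @with = $text =~ /^(\w+)/mg;

print scalar(@without), " vs ", scalar(@with), "\n";   # prints "1 vs 3"
```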

/s – Single line

If you have a multi line string, /s lets “.” match newline characters too, so a pattern like /a.*b/s can span lines as if the string were a single line.
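
A small sketch with an invented three-line string shows what /s changes:

```perl
use strict;
use warnings;

my $text = "start\nmiddle\nend";

# Without /s, "." refuses to cross a newline, so this fails
my $plain = $text =~ /start.*end/ ? 1 : 0;

# With /s, "." matches newlines too and the pattern spans all three lines
my $single = $text =~ /start.*end/s ? 1 : 0;

print "$plain $single\n";   # prints "0 1"
```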

/x

/x will let you embed whitespace and comments into Regexes, great for testing.
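
For example, a date pattern gets a lot easier to read when spread out. This is only a sketch, and the date string and variable names are made up:

```perl
use strict;
use warnings;

my $date = "2024-01-31";

# With /x, whitespace in the pattern is ignored and # starts a comment
my ($year, $month, $day) = $date =~ /
    ^
    (\d{4})   # four digit year
    -
    (\d{2})   # two digit month
    -
    (\d{2})   # two digit day
    $
/x;

print "$day.$month.$year\n";   # prints "31.01.2024"
```

One catch: because whitespace is ignored under /x, a literal space has to be written as \s or escaped with a backslash.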

Right, so you’ve got that yeah? Me neither. Just fiddle around with Perl and get to know (and hate) Regexes.

Enjoy!

