Coming up is something that can be infuriating but is very useful, and is arguably the whole point of Perl.
Regular expressions (regexes) scan a scalar for patterns of text using specific rules.
I’m not going to do too many examples because of how varied regexes can be.
General – Negation – Operators – Anchors – Capture – Substitute – Switches
General syntax: $string =~ /pattern/
Here is an example of how a regex works:
    use strict;
    my $string = "Copious cats can carefully creep";
    # The inner quotes must be escaped or the string ends early
    if ($string =~ /cats/) {
        print "Found \"cats\"!\n";
    } else {
        print "String not found\n";
    }
Matches are done in a case sensitive manner unless you say otherwise, which I’ll cover later.
Square brackets define a character class, which matches any one character from a set or range:
- [A-Z] will find uppercase characters from A to Z
- [a-zA-Z] will find all characters from A to Z regardless of case
- [1-9] will look for digits from 1 to 9
- [A-GQLX-Z] will find letters A to G, Q, L, and X to Z (don’t separate the items with commas or spaces, as those would be matched literally)
- Scalars work in place of text, e.g. /$word/ searches for whatever $word contains
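As a rough sketch of the classes above in action (the variable names here are made up for illustration), the following pulls out the first uppercase letter, the first run of letters, and checks for a word held in a scalar:

```perl
use strict;
use warnings;

my $string = "Copious cats can carefully creep";

# [A-Z] matches one uppercase letter; a match in list context returns the captures
my ($upper) = $string =~ /([A-Z])/;

# [a-zA-Z]+ matches a run of letters in either case
my ($letters) = $string =~ /([a-zA-Z]+)/;

# A scalar is interpolated into the pattern before matching
my $animal = "cats";
my $found  = $string =~ /$animal/ ? "yes" : "no";

print "$upper $letters $found\n";   # prints "C Copious yes"
```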
To find special characters, escape them with a backslash:
- \^
- \$
- \\
- \.
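To see why escaping matters, here is a small sketch using a made-up price string. It also uses \Q...\E, which escapes every special character in an interpolated scalar so you don’t have to do it by hand:

```perl
use strict;
use warnings;

my $price = 'Total: $4.50';

# Unescaped, "." matches any character, so the pattern also matches "4x50"
my $loose = "4x50" =~ /4.50/ ? 1 : 0;

# Escaped, "\." only matches a literal dot and "\$" a literal dollar sign
my $strict = $price =~ /\$4\.50/ ? 1 : 0;

# \Q...\E escapes everything inside the interpolated scalar
my $needle = '$4.50';
my $quoted = $price =~ /\Q$needle\E/ ? 1 : 0;

print "$loose $strict $quoted\n";   # prints "1 1 1"
```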
So as you can see, you can configure precisely what to search for. Perl also provides shorthand classes that make things even easier:
- \w for any word character (a letter, digit or underscore)
- \d for any digit
- \s for any whitespace
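A quick sketch of the three shorthand classes, using an invented sentence. Each class is followed by + so that it grabs a whole run of characters rather than just one:

```perl
use strict;
use warnings;

my $string = "Socks is 12 years old";

my ($word)   = $string =~ /(\w+)/;    # first run of word characters
my ($number) = $string =~ /(\d+)/;    # first run of digits
my @parts    = split /\s+/, $string;  # split on runs of whitespace

print "$word $number ", scalar(@parts), "\n";   # prints "Socks 12 5"
```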
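To alternate between matches use vertical bars, e.g.:
- Socks|Poppy|Milly will look for “Socks”, “Poppy” or “Milly”
- (Socks Poppy)|Milly will look for “Socks Poppy” or “Milly”
- (Socks|Poppy) Milly will look for “Socks Milly” or “Poppy Milly” (spaces inside the brackets are matched literally, so leave them out)
To look for repeating strings, put a count in curly braces:
- (Socks){4} will look for “SocksSocksSocksSocks”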
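Alternation and repetition can be tried out with something like the sketch below. The list of names is invented for the example, and grep simply keeps the entries that match each pattern:

```perl
use strict;
use warnings;

my @names = ("Socks Milly", "Poppy Milly", "Milly", "SocksSocksSocksSocks");

# Either "Socks" or "Poppy", then a space, then "Milly"
my @pair = grep { /(Socks|Poppy) Milly/ } @names;

# "Socks" repeated exactly four times in a row
my @repeat = grep { /(Socks){4}/ } @names;

print "Pair: ", join(", ", @pair), "\n";       # Pair: Socks Milly, Poppy Milly
print "Repeat: ", join(", ", @repeat), "\n";   # Repeat: SocksSocksSocksSocks
```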
Negation
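Say you want to match anything except a particular character, word character, digit or whitespace:
- [^b] matches any character that is not a b
- \W matches any non-word character
- \D matches any non-digit
- \S matches any non-whitespace character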
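Here is a small sketch of the negated classes against a made-up string, so you can see what each one picks up first:

```perl
use strict;
use warnings;

my $string = "cat 42!";

my ($not_c)     = $string =~ /([^c])/;  # first character that is not "c"
my ($non_word)  = $string =~ /(\W)/;    # first non-word character (the space)
my ($non_digit) = $string =~ /(\D+)/;   # leading run of non-digits
my ($non_space) = $string =~ /(\S+)/;   # leading run of non-whitespace

print "[$not_c] [$non_word] [$non_digit] [$non_space]\n";
# prints "[a] [ ] [cat ] [cat]"
```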
Operators
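When doing searches, controlling the scope of your search is vital so that you don’t match irrelevant text.
Tack these quantifiers on to a character, class or group to control how many times it may match. The dot is a wildcard rather than a quantifier, but it is usually used alongside them.
- “.” Matches any single character (except a newline)
- “+” Matches 1 or more times
- “*” Matches 0 or more times
- “?” Matches 0 or 1 times (optional)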
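The difference between these is easiest to see side by side. This sketch runs each one against three invented words; the ^ and $ anchors (covered in the Anchors section) are only there to force the whole word to match:

```perl
use strict;
use warnings;

my @words = ("ct", "cat", "caat");

my @plus     = grep { /^ca+t$/ } @words;  # one or more "a"
my @star     = grep { /^ca*t$/ } @words;  # zero or more "a"
my @optional = grep { /^ca?t$/ } @words;  # zero or one "a"
my @dot      = grep { /^c.t$/  } @words;  # exactly one character between c and t

print "+ : @plus\n";       # + : cat caat
print "* : @star\n";       # * : ct cat caat
print "? : @optional\n";   # ? : ct cat
print ". : @dot\n";        # . : cat
```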
Anchors
- “^” Match at the beginning of the string (or of each line when using /m)
- “$” Match at the end of the string (or of each line when using /m)
- \b defines a word boundary, e.g. /\bapples\b/
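A short sketch of the anchors, using a made-up string where “apples” appears both on its own and inside “pineapples”:

```perl
use strict;
use warnings;

my $string = "pineapples and apples";

my $starts = $string =~ /^apples/ ? 1 : 0;  # the string does not start with "apples"
my $ends   = $string =~ /apples$/ ? 1 : 0;  # but it does end with it

# In list context, /g returns every match
my @anywhere = $string =~ /apples/g;        # also finds the one inside "pineapples"
my @whole    = $string =~ /\bapples\b/g;    # only the standalone word

print "$starts $ends ", scalar(@anywhere), " ", scalar(@whole), "\n";
# prints "0 1 2 1"
```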
Capturing Data back from a match
Say you want a certain portion of the data back from a match:
    use strict;
    my $string = "Copious cats can carefully creep";
    #1
    if ($string =~ /(\w+)/) {
        print $1."\n"; #Prints "Copious"
    }
    #2
    if ($string =~ /(\w+)\s(\w+)/) {
        print $1.$2."\n"; #Prints "Copiouscats"
    }
#1: \w matches a single word character and + means one or more, so (\w+) captures the first whole word. $1 holds whatever the first set of brackets captured (in this example “Copious”). This could then be stored in a scalar or used however you need.
#2: This captures the first word, skips one whitespace character (\s), then captures the second word. $2 holds whatever the second set of brackets captured, here “cats”, so the line prints “Copiouscats”.
Substitutions – s/pattern/replacement/
Regexes are great for finding text and replacing it with something else.
    use strict;
    my $string = "Copious cats can carefully creep";
    # s/// returns true if it replaced anything
    if ($string =~ s/cat/dog/) {
        print $string."\n";
    }
The above will print Copious dogs can carefully creep
Substitution works with any of the pattern syntax above as well:
    use strict;
    my $string = "Copious cats can carefully creep";
    # No commas inside the class, or commas would be replaced too
    if ($string =~ s/[cCiI]/Q/g) {
        print $string."\n";
    }
Will print QopQous Qats Qan Qarefully Qreep
Swap text around by capturing:
    use strict;
    my $string = "Copious cats can carefully creep";
    # $1 and $2 in the replacement refer to the two captured words
    if ($string =~ s/(cats)\s(can)/$2 $1/g) {
        print $string."\n";
    }
Prints Copious can cats carefully creep
Regex Switches
Regexes can be unwieldy sometimes, so rein them in with switches (also called modifiers), which go after the closing slash.
/i – Case Insensitive
Case insensitive search: $string =~ /pattern/i
    use strict;
    my $string = "Copious cats can carefully creep";
    if ($string =~ /Cats/i) {
        print "Found a cat!\n";
    } else {
        print "String not found\n";
    }
This will match “cats” regardless of case, so “Cats”, “CATS” and “cats” all count. Here it prints Found a cat!
/g – Global
Probably the most useful switch: it keeps matching along the entire string instead of stopping at the first match.
    use strict;
    my $string = "Copious cats can carefully creep";
    my $counter = 0;
    # In a while loop, each pass picks up where the last match ended
    while ($string =~ /[cC]/g) {
        $counter++;
    }
    print "There were $counter \"C\"s\n";
Prints There were 5 "C"s
/e
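When substituting, /e evaluates the replacement as Perl code and substitutes in the result. You could push to an array or do a summation.
    use strict;
    my $string = "Copious cats can carefully creep";
    my @array;
    # push returns the new length of @array, and /e uses that as the replacement
    if ($string =~ s/(\w+)/push(@array, $1)/eg) {
        print $string."\n";
    }
    print @array;
The array prints as Copiouscatscancarefullycreep (print puts no separator between the elements). Note that $string itself ends up as 1 2 3 4 5, because push returns the new element count and that is what gets substituted in.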
/m – Multiline
If you have a string that has \n newline characters in it, use /m so that ^ and $ match at the start and end of every line rather than only at the start and end of the whole string. If doing a global search use /mg.
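As a minimal sketch of the difference, this grabs the first word of each line from a made-up three-line string, with and without /m:

```perl
use strict;
use warnings;

my $text = "cats creep\ndogs dash\ncats climb";

# Without /m, ^ only matches at the very start of the string
my @without_m = $text =~ /^(\w+)/g;

# With /m, ^ matches at the start of every line
my @with_m = $text =~ /^(\w+)/mg;

print scalar(@without_m), ": @without_m\n";   # 1: cats
print scalar(@with_m), ": @with_m\n";         # 3: cats dogs cats
```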
/s – Single line
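/s treats a multi-line string as one single line: “.” will also match \n newline characters, so a pattern such as /a.*b/s can match across line breaks. Without /s, “.” stops at each newline.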
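A small sketch of /s, using an invented two-line string where the two words being searched for sit on different lines:

```perl
use strict;
use warnings;

my $text = "cats creep\nand dogs dash";

# Without /s, "." will not cross the newline, so this fails
my $without_s = $text =~ /creep.*dogs/ ? 1 : 0;

# With /s, "." matches the newline too, so this succeeds
my $with_s = $text =~ /creep.*dogs/s ? 1 : 0;

print "$without_s $with_s\n";   # prints "0 1"
```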
/x
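/x lets you spread a regex over several lines and embed whitespace and # comments in it, which is great for documenting and testing complex patterns. Literal spaces in the pattern must then be written as \s or escaped.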
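For instance, the two-word capture from earlier could be laid out like this with /x. This is just one way to format it, and the variable names are made up:

```perl
use strict;
use warnings;

my $string = "Copious cats can carefully creep";

my ($first, $second) = $string =~ /
    (\w+)   # first word
    \s+     # whitespace between the words
    (\w+)   # second word
/x;

print "$first, $second\n";   # prints "Copious, cats"
```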
Right, so you’ve got that yeah? Me neither. Just fiddle around with Perl and get to know (and hate) Regexes.
Enjoy!