2. Regular expressions
We usually talk about regular expressions and
pattern matching in the context of scripting
languages such as Perl or shell scripts.
Let us look at pattern matching using regular
expressions in the Perl scripting language.
http://arraylist.blogspot.com
3. Regular expressions
Pattern matching in Perl uses the match
operator, which accepts different delimiters, such as
m// or m:: or m,,
Example – m/simple/
Here the pattern “simple” is matched against $_
$_ is the default scalar variable in Perl.
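A minimal sketch of the match operator working on $_ with each of the three delimiter styles. The two test strings are made up for illustration.

```perl
use strict;
use warnings;

my @results;
for ('a simple test', 'nothing here') {
    # Inside the loop each string is aliased to $_, so no =~ is needed.
    my $slashes = (m/simple/) ? 1 : 0;
    my $colons  = (m:simple:) ? 1 : 0;
    my $commas  = (m,simple,) ? 1 : 0;
    push @results, "$slashes$colons$commas";
}
print "@results\n";   # prints "111 000"
```

All three forms give the same answer; only the delimiter differs.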
4. Regular expressions
To match a metacharacter literally, precede it
with a backslash (\) in the pattern.
Metacharacters ^ $ ( ) | @ [ { ? . + *
So to match the text $10 we write m/\$10/
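A short sketch of matching a literal $10, assuming a made-up sentence as input. The second form uses Perl's \Q...\E quoting, which is not mentioned on the slide but escapes every metacharacter in an interpolated variable.

```perl
use strict;
use warnings;

# Single quotes keep $10 as literal text in the string.
my $text = 'The ticket costs $10 today';

# Escape the dollar sign by hand.
my $escaped = ($text =~ m/\$10/) ? 1 : 0;

# Or let \Q...\E escape the metacharacters in a variable.
my $literal = '$10';
my $quoted  = ($text =~ m/\Q$literal\E/) ? 1 : 0;

print "$escaped $quoted\n";   # prints "1 1"
```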
5. Regular expressions
If we use // as the delimiters, we can omit
the leading m from the match operator.
So m/simple/ can be written as /simple/
To match against the contents of a variable, simply use
/$varname/
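As an illustration, the sketch below matches the contents of $varname with and without the leading m. The sample sentence is an assumption.

```perl
use strict;
use warnings;

my $varname = 'simple';
my $text    = 'this is a simple sentence';

# The variable is interpolated into the pattern before matching.
my $with_m    = ($text =~ m/$varname/) ? 'match' : 'no match';
my $without_m = ($text =~ /$varname/)  ? 'match' : 'no match';

print "$with_m, $without_m\n";   # prints "match, match"
```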
6. Regular expressions
Metacharacters . ^ $ ( ) | @ [ { ? + *
. matches any single character except a newline
Example /d.t/ matches dot, dit, d t
If we want . to behave as a non-metacharacter
we precede it with a \
Thus /d\.t/ matches only d.t
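The difference between the dot and the escaped dot can be checked on a small, made-up word list:

```perl
use strict;
use warnings;

my @words = ('dot', 'dit', 'd t', 'd.t', 'dt');

# Unescaped dot: any one character between d and t.
my @any = grep { /d.t/ } @words;

# Escaped dot: only a literal period between d and t.
my @literal = grep { /d\.t/ } @words;

print join(', ', @any), "\n";       # prints "dot, dit, d t, d.t"
print join(', ', @literal), "\n";   # prints "d.t"
```

Note that dt is rejected by both patterns, because the dot must consume exactly one character.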
7. Regular expressions
Metacharacters . ^ $ ( ) | @ [ { ? + *
Special characters
\n – newline
\r – carriage return
\t – tab
\f – formfeed
Special characters keep the same meaning
inside // during pattern matching
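A small sketch, assuming a made-up tab-separated record, showing that the escapes mean the same thing in a double-quoted string and in a pattern:

```perl
use strict;
use warnings;

my $record = "name\tvalue\r\n";

my $has_tab  = ($record =~ /name\tvalue/) ? 1 : 0;   # tab between the fields
my $has_crlf = ($record =~ /\r\n/)        ? 1 : 0;   # carriage return + newline
my $has_ff   = ($record =~ /\f/)          ? 1 : 0;   # no formfeed in the record

print "$has_tab $has_crlf $has_ff\n";   # prints "1 1 0"
```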
8. Regular expressions
Quantifiers tell the regex how many times a
pattern should be matched.
“+” matches the preceding character one or more
times
Example /go+d/ matches god and good but not gd
“*” matches the preceding character 0 or more times
Example /hik*e/ matches hike, hie – matches k 0
or more times between hi and e
“?” matches the preceding character 0 or 1 times but
not more.
Example /^h?ello$/ matches hello or ello but not
hhello (without the ^ and $ anchors, /h?ello/
would still find hello inside hhello)
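The three quantifiers can be tried on a few candidate words. In this sketch the ^ and $ anchors are added so that each pattern has to match the whole word; the word lists are made up.

```perl
use strict;
use warnings;

# + : one or more o between g and d
my @plus = grep { /^go+d$/ } qw(gd god good goood);

# * : zero or more k between hi and e
my @star = grep { /^hik*e$/ } qw(hie hike hikke hive);

# ? : zero or one h before ello
my @optional = grep { /^h?ello$/ } qw(ello hello hhello);

print "@plus\n";       # prints "god good goood"
print "@star\n";       # prints "hie hike hikke"
print "@optional\n";   # prints "ello hello"
```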
9. Regular expressions
{} matches the preceding character a specified number of times
/a{5,10}/ - matches the character a at least 5
times, but no more than 10 times
/a{5,}/ - matches 5 or more times.
/a{0,2}/ - matches 0 to 2 times.
/a{5}/ - matches exactly five times
.* - matches anything between two sets of characters
/hello.*world/ matches “hello Joe welcome to the
world”
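A sketch of the counted quantifiers, assuming runs of the letter a of length 0 to 12 as test input. The anchors and the capturing parentheses around .* are additions for the demonstration.

```perl
use strict;
use warnings;

# Which run lengths does a{5,10} accept when the whole string must match?
my @accepted;
for my $n (0 .. 12) {
    my $run = 'a' x $n;
    push @accepted, $n if $run =~ /^a{5,10}$/;
}
print "@accepted\n";   # prints "5 6 7 8 9 10"

# a{5} means exactly five.
my $five = ('aaaaa'  =~ /^a{5}$/) ? 1 : 0;
my $six  = ('aaaaaa' =~ /^a{5}$/) ? 1 : 0;
print "$five $six\n";   # prints "1 0"

# .* soaks up everything between hello and world.
my ($middle) = 'hello Joe welcome to the world' =~ /hello(.*)world/;
print "[$middle]\n";    # prints "[ Joe welcome to the ]"
```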
10. Regular expressions
Square brackets [] and character classes
[abcd] – match any one of the characters a, b, c, d
[a-d] – means the same thing
[Aa] – match uppercase A or lowercase a
[0-9] – match a digit
[A-Za-z]{5} - match any run of 5 alphabetic
characters
[^a-z] - match any character that is not a lowercase
letter - a leading ^ negates the class
[*!\@#\$%&()] - match any of these characters (in Perl,
escape @ and $ so they are not interpolated)
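A few character classes in action. The sample strings are made up, and the /g modifier (return every match) and the capturing parentheses are additions used only to show what was matched.

```perl
use strict;
use warnings;

# Any one of a, b, c, d somewhere in the string.
my $any_abcd = ('xcz' =~ /[abcd]/) ? 1 : 0;

# Either case of the first letter.
my $either_case = ('Apple' =~ /[Aa]pple/ && 'apple' =~ /[Aa]pple/) ? 1 : 0;

# First run of five letters.
my ($five_letters) = 'ab3 hello9' =~ /([A-Za-z]{5})/;

# Negated class: everything that is not a lowercase letter.
my @not_lower = 'aB3 c!' =~ /[^a-z]/g;

print "$any_abcd $either_case $five_letters\n";   # prints "1 1 hello"
print join('|', @not_lower), "\n";                # prints "B|3| |!"
```

The last line shows that [^a-z] also matches digits, spaces, and punctuation, not just capital letters.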
11. Regular expressions
Special Character classes
\w – match a word character, same as [A-Za-z0-9_] for ASCII text
\W – match a non-word character
\d – match a digit [0-9]
\D – match a non-digit
\s – match a whitespace character
\S – match a non-whitespace character
Example - /\d{3}/ - match 3 digits
/\s\w+\s/ - match a word surrounded
by white space
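The shorthand classes can be sketched as follows. The input strings are assumptions, and the capturing parentheses and /g modifier are used only to show what each class picked up.

```perl
use strict;
use warnings;

# First three consecutive digits.
my ($digits) = 'room 4217' =~ /(\d{3})/;

# First word with whitespace on both sides.
my ($word) = 'say hello world' =~ /\s(\w+)\s/;

# \w covers letters, digits, and the underscore.
my $identifier = ('max_value1' =~ /^\w+$/) ? 1 : 0;

# \D and \S are the negated forms of \d and \s.
my @non_digits = 'a1b2' =~ /\D/g;
my @tokens     = 'one  two' =~ /\S+/g;

print "$digits $word $identifier\n";   # prints "421 hello 1"
print "@non_digits | @tokens\n";       # prints "a b | one two"
```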
12. Regular expressions
Alternation and Anchors
Alternation uses | which means “or”
E.g. /tea|coffee/ checks if the string contains tea or
coffee
Grouping with alternation
E.g. /(fr|bl|cl)og/ checks if the string contains frog, blog
or clog
Anchors let you say where in the string a pattern
must match
^ - caret, e.g. /^tea/ matches tea only if it occurs
at the beginning of the line
$ - dollar sign, e.g. /sample$/ matches sample only
at the end of the line.
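Alternation, grouping, and the two anchors can be tried on small lists. The sample phrases are made up for illustration.

```perl
use strict;
use warnings;

# Alternation: either word anywhere in the string.
my @drinks = grep { /tea|coffee/ } ('green tea', 'black coffee', 'water');

# Grouping: one of three prefixes followed by og.
my @ogs = grep { /(fr|bl|cl)og/ } qw(frog blog clog dog);

# Anchors: tea at the start, sample at the end.
my @starts = grep { /^tea/ }    ('tea time', 'green tea');
my @ends   = grep { /sample$/ } ('a sample', 'sample text');

print join(', ', @drinks), "\n";   # prints "green tea, black coffee"
print "@ogs\n";                    # prints "frog blog clog"
print "@starts | @ends\n";         # prints "tea time | a sample"
```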
13. Regular expressions
Substitution
Syntax – s///
s/searchstring/replacementstring/
E.g. $_ = "lies does not make sense";
s/lies/truth/; gives "truth does not make sense"
Instead of / you can use # as the delimiter of the
substitution operator
Example: s#lies#truth#;
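A sketch of both forms of the substitution, first on $_ and then on a named variable with the =~ binding operator. The variable names are assumptions.

```perl
use strict;
use warnings;

# Substitution on the default variable $_.
$_ = "lies does not make sense";
s/lies/truth/;
my $default_result = $_;

# The same substitution with # delimiters, bound to a named variable.
# s/// returns the number of substitutions made.
my $sentence = "lies does not make sense";
my $count    = ($sentence =~ s#lies#truth#);

print "$default_result\n";      # prints "truth does not make sense"
print "$sentence ($count)\n";   # prints "truth does not make sense (1)"
```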