This document provides an overview of regular expressions (RegEx). It explains what RegEx is, some basic RegEx syntax like character classes and anchors, and common uses of RegEx like searching logs and programming. Useful regular expressions are also included, such as for social security numbers, phone numbers, and email addresses. Questions are welcomed at the end.
3. What is RegEx?
“In computing, a regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations.” - Wikipedia
4. “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” - Jamie Zawinski
5. Why RegEx?
• Tools use it: Nessus, Burp, W3AF
• Most programming languages support it
• Excellent tool to have in the toolbox
6. RegEx Basics: Literal Matches
Literal Matches
‘bat’ matches ‘bat’
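12 special characters: \ ^ $ . | ? * + ( ) [ {
These must be escaped with a backslash to be matched literally, e.g. ‘\$’ matches a literal ‘$’ and ‘\\’ matches a literal backslash
• .
Matches any single character except a newline
‘.at’ matches ‘bat’, ‘cat’, and ‘hat’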
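A short sketch of literal matching, escaping, and the dot, using Python's standard `re` module (the sample strings are made up). `re.escape()` is a standard helper that adds the needed backslashes to an arbitrary literal string.

```python
import re

# A literal pattern matches the same characters anywhere in the text
literal = re.search(r"bat", "combat")

# '$' and '.' are special, so they need a backslash to be matched literally
price = "Total: $5.00"
escaped = re.search(r"\$5\.00", price)
unescaped = re.search(r"$5.00", price)   # unescaped '$' acts as an end anchor, so this finds nothing

# re.escape() inserts the backslashes for you
auto = re.search(re.escape("$5.00"), price)

# The dot matches any single character except a newline
words = ["bat", "cat", "hat", "at", "\nat"]
dot_matches = [w for w in words if re.fullmatch(r".at", w)]
print(dot_matches)   # ['bat', 'cat', 'hat']
```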
7. RegEx Basics: Character Classes
Character Classes
• -- [ ]
‘[bc]at’ will match ‘bat’ or ‘cat’
• --[^ ]
‘[^A-Z]’ will match any single character that is not a capital letter from A to Z
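Both forms of character class can be checked with Python's `re` module; the word list and sample string below are made up for illustration. Note that a class always consumes exactly one character.

```python
import re

# [bc] matches exactly one character, which must be 'b' or 'c'
words = ["bat", "cat", "hat", "bcat"]
class_matches = [w for w in words if re.fullmatch(r"[bc]at", w)]
print(class_matches)   # ['bat', 'cat']

# [^A-Z] matches any single character that is not an uppercase letter A-Z
non_upper = re.findall(r"[^A-Z]", "RegEx 101")
print(non_upper)   # ['e', 'g', 'x', ' ', '1', '0', '1']
```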
8. RegEx Basics: Shorthand Character Classes
Shorthand Character Classes
• \d
Same as [0-9]
• \D
Same as [^0-9]
• \w
Same as [0-9A-Za-z_]
• \W
Same as [^0-9A-Za-z_]
• \s
Whitespace, such as tab, line feed, form feed, carriage return, and space
• \S
Any character that is not whitespace
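A sketch of the shorthand classes in Python's `re` module (sample strings are made up). One hedge: the "same as [0-9]" equivalences describe ASCII behaviour. In Python 3, patterns applied to `str` are Unicode-aware by default, so `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is set; other tools and languages differ.

```python
import re

sample = "Room 42_b\tfloor 3"
digits = re.findall(r"\d", sample)        # ['4', '2', '3']
word_runs = re.findall(r"\w+", sample)    # ['Room', '42_b', 'floor', '3']
spaces = re.findall(r"\s", sample)        # [' ', '\t', ' ']

# The negated shorthands match everything their lowercase partner does not
phone_digits = re.sub(r"\D", "", "(555) 010-1234")   # '5550101234'
non_word = re.findall(r"\W", "a-b c")                # ['-', ' ']
non_space = re.findall(r"\S", "a b")                 # ['a', 'b']

# Unicode vs ASCII: re.ASCII restricts \d to [0-9]
arabic_three = "\u0663"   # ARABIC-INDIC DIGIT THREE
unicode_hit = re.fullmatch(r"\d", arabic_three)           # a match
ascii_hit = re.fullmatch(r"\d", arabic_three, re.ASCII)   # None
```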
9. RegEx Basics: Anchors
Anchors
• ^
Beginning of line
‘rpm -qa|grep ^ao’ lists all installed packages whose names start with ‘ao’
• $
End of line
‘[0-9][0-9][0-9]$’ matches any line that ends with three consecutive digits
• \b
Word boundary
‘\bW.n\w*\b’ looks for whole words that begin with ‘W’, followed by any character, then ‘n’, then zero or more word characters
‘Win’, ‘Windows’, ‘Won’, ‘Wonton’, ‘Winter’, ‘Wonderland’, and ‘Wonder’ all match
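The three anchors can be sketched with Python's `re` module. The package list is hypothetical (a stand-in for real `rpm -qa` output), and the log text is made up. The word-boundary example writes "zero or more characters" as `\w*`, because `n*` on its own repeats only the letter ‘n’; the last line shows what that narrower pattern finds.

```python
import re

# Hypothetical package names standing in for `rpm -qa` output
packages = ["aopalliance-1.0", "bash-5.1", "ao-lib", "libao-1.2"]
starts_with_ao = [p for p in packages if re.search(r"^ao", p)]
# ['aopalliance-1.0', 'ao-lib']

# With re.MULTILINE, ^ and $ match at every line, like grep's line-by-line view
log = "Error code 404\nPort 8080 open\nTotal: 123"
line_endings = re.findall(r"[0-9][0-9][0-9]$", log, re.MULTILINE)   # ['404', '123']
string_end = re.findall(r"[0-9][0-9][0-9]$", log)                   # ['123']

# \b marks the edge of a word; \w* means zero or more word characters
text = "Win Windows Won Wonton Winter Wonderland Wonder Wn Twin"
w_words = re.findall(r"\bW.n\w*\b", text)
# ['Win', 'Windows', 'Won', 'Wonton', 'Winter', 'Wonderland', 'Wonder']

# By contrast, n* repeats only the 'n', so the longer words fail
n_star = re.findall(r"\bW.n*\b", text)   # ['Win', 'Won', 'Wn']
```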
11. RegEx Basics: Groups
Groups
• --( )
Defines the scope and precedence of operators
‘Write(ln)?’ matches ‘Write’ and ‘Writeln’
• -- |
OR
‘Gr(a|e)y’ matches ‘Gray’ and ‘Grey’
‘(ITSO|OITS)’ matches ‘ITSO’ or ‘OITS’
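A Python sketch (standard `re` module, made-up inputs) showing why the parentheses matter: they set the scope of `?` and `|`, and they also capture the matched text, which can be read back with `.group(n)`.

```python
import re

candidates = ["Write", "Writeln", "Writel"]

# ( ) makes ? apply to "ln" as a unit
write_forms = [w for w in candidates if re.fullmatch(r"Write(ln)?", w)]
# ['Write', 'Writeln']

# Without the group, ? applies only to the final 'n'
no_group = [w for w in candidates if re.fullmatch(r"Writeln?", w)]
# ['Writeln', 'Writel']

# | inside a group limits the alternatives to that part of the pattern
grays = [w for w in ["Gray", "Grey", "Groy"] if re.fullmatch(r"Gr(a|e)y", w)]
# ['Gray', 'Grey']

# Groups also capture what they matched
captured = re.fullmatch(r"Gr(a|e)y", "Grey").group(1)   # 'e'

# Without parentheses, | splits the whole pattern: 'Gra' OR 'ey'
ungrouped = re.findall(r"Gra|ey", "Gray Grey")   # ['Gra', 'ey']
```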
12. RegEx Basics: Quantification
Quantification
Shows how often a token or group is allowed to occur
• ?
Zero or one
‘a?’ will match the empty string and ‘a’
• *
Zero or more
‘a*’ will match the empty string, ‘a’, and ‘aaaaaaaaa’
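The `?` and `*` quantifiers in Python's `re` module, with made-up inputs. `re.fullmatch()` is used so the whole string must satisfy the pattern; the last lines show a common pitfall when `*` is used with `re.search()`.

```python
import re

# ? : the preceding token may appear zero times or once
optional = [s for s in ["", "a", "aa"] if re.fullmatch(r"a?", s) is not None]
# ['', 'a']

# * : zero or more repetitions
star = [s for s in ["", "a", "aaaaaaaaa", "ab"] if re.fullmatch(r"a*", s) is not None]
# ['', 'a', 'aaaaaaaaa']

# A common use of ?: accept both spellings of a word
spellings = re.findall(r"colou?r", "color colour colouur")   # ['color', 'colour']

# Caution: because * allows zero repetitions, search() succeeds even with no 'a'
empty = re.search(r"a*", "xyz")   # an empty match at position 0
```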
13. RegEx Basics: Quantification (Cont.)
Quantification
• +
One or more
‘a+’ will match ‘a’ and ‘aaaaaaaaaaaa’
• {min,max}
Minimum and maximum
‘a{3,7}’ will match between 3 and 7 consecutive ‘a’ characters
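A sketch of `+` and bounded repetition in Python's `re` module (inputs made up). It also shows that quantifiers are greedy, taking as many repetitions as the bounds allow, and the related `{n}` and `{n,}` forms that Python supports.

```python
import re

# + : one or more, so the empty string is rejected
plus = [s for s in ["", "a", "aaaaaaaaaaaa"] if re.fullmatch(r"a+", s) is not None]
# ['a', 'aaaaaaaaaaaa']

# {3,7}: between 3 and 7 repetitions
bounded = [n for n in range(10) if re.fullmatch(r"a{3,7}", "a" * n) is not None]
# [3, 4, 5, 6, 7]

# Greedy: search() grabs the maximum allowed, 7 of the 10 'a' characters
longest = re.search(r"a{3,7}", "a" * 10).group()   # 'aaaaaaa'

# {n} means exactly n; {n,} means n or more
exact = re.fullmatch(r"[0-9]{3}", "404") is not None      # True
at_least = re.fullmatch(r"[0-9]{3,}", "40") is not None   # False
```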