Looking for Patterns - Finding
them with Regular
Expressions
Presented by Keith Wright
One Course Source
keith@OneCourseSource.com
From http://xkcd.com/1171/
If this is how you think of regular expressions now…
Regular expressions…
REGULAR EXPRESSIONS ARE…
➢Strings used to search for patterns in text
➢More powerful than wildcards
➢Available in many programming languages and
programs
➢Also known as "regexp", "RegEx", and "RE"
RE DOS AND DON'TS…
Do this…
✔ Input Validation
✔ Data Extraction
✔ Data Elimination
✔ Search/Replace
Don't do this…
✗ Parsing
✗ Allow publicly available searches
✗ Use where better tools exist
✗ Use where a procedure would be better
RE ARE AVAILABLE IN…AND MORE!
 .NET
 C#
 Delphi
 Java
 JavaScript
 Perl
 PCRE
 PHP
 Python
 Ruby
 Tcl
 PowerShell
POSIX PROGRAMS USING RE
awk
pattern scanning and
processing language
find
utility to search for files
grep
utility to print lines
matching a pattern
sed
stream editor for filtering
and transforming text
POSIX PROGRAMS SUPPORT RE…
Basic Regular Expressions (BRE)
Character classes [ ]
Named Character classes
[[:digit:]]
Asterisk *
Dot .
Caret ^
Dollar $
Backslashed Braces \{ \}
Backslashed Parens \( \)
Extended Regular Expressions (ERE)
Question mark ?
Plus sign +
Pipe symbol |
Braces { }
Parentheses ( )
All other BRE
grep [options] 'pattern' [file…]
grep is a command-line tool for
printing lines that match a pattern
Useful for demonstrating how
regular expressions work
By default, grep interprets regular
expressions as BRE
egrep, or grep -E, interprets
regular expressions as ERE
• --color=auto highlights the part of the
line that matched the pattern
• -i is used to make grep case-insensitive
• -c is used to have grep report a count
of the lines that matched
• -v is used to print the lines that don't
match the pattern
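
The effect of the -i, -c, and -v options can be approximated in Python. This is only a rough sketch built on Python's re module, not grep itself; the helper name grep_lines and the sample lines are made up for illustration.

```python
import re

def grep_lines(pattern, lines, ignore_case=False, invert=False):
    """Hypothetical helper: return the lines that match `pattern`, loosely like grep."""
    flags = re.IGNORECASE if ignore_case else 0        # similar to grep -i
    regex = re.compile(pattern, flags)
    # invert=True keeps the lines that do NOT match, similar to grep -v
    return [line for line in lines if bool(regex.search(line)) != invert]

sample = ["Error: disk full", "warning: low memory", "error: timeout"]

matched = grep_lines("error", sample)                       # case-sensitive search
matched_i = grep_lines("error", sample, ignore_case=True)   # similar to -i
count_i = len(matched_i)                                    # similar to -c
not_matched = grep_lines("error", sample, invert=True)      # similar to -v
```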
BASIC RE LITERALS
Alphanumeric characters and
non-regular expression
characters match themselves
Regular expression characters
will match themselves if
preceded by the backslash
character
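
A small Python sketch of the two rules above. Python's re module uses Perl-style syntax rather than BRE, so treat it as an illustration of the idea; the sample text is invented.

```python
import re

text = "price: $5.00 (approx)"

# Ordinary characters match themselves.
plain = re.search("price", text)

# "$" and "." are regular expression characters, so they are backslashed here.
escaped = re.search(r"\$5\.00", text)

# re.escape adds a backslash in front of every special character in a string.
auto = re.search(re.escape("(approx)"), text)
```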
RE DOT (PERIOD)
The dot . will match any single
character
To match the dot itself, it must be
preceded by a backslash
The RE .* matches any run of
characters, so it can match an
entire string
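
The dot rules can be tried out with a short Python sketch (sample strings are made up). One difference worth knowing: by default Python's dot does not match a newline.

```python
import re

words = "bat bit b.t bt"

# The dot matches any single character, so "bt" (nothing in between) is skipped.
any_char = re.findall(r"b.t", words)

# A backslashed dot matches only a literal dot.
literal_dot = re.findall(r"b\.t", words)

# .* matches the whole string (there is no newline in it).
whole = re.fullmatch(r".*", "anything at all")
```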
RE CHARACTER CLASSES
Character classes match a single
character in the list or range enclosed
by brackets [ ]
If the first character enclosed is the
caret ^, then the list or range is
negated
To match the right square bracket ] it
must be the first character enclosed.
To not match it, it must be the first
character after the caret, as in [^]]
To match a hyphen, it can be the first
or last character enclosed. To not
match it, it must be the first
character after the caret, as in [^-]
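
A minimal Python sketch of the bracket rules above. Python's re module follows the same placement rules for ] and - inside brackets; the sample strings are invented.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")      # a list of characters
digits = re.findall(r"[0-9]", "a1b22")        # a range of characters
non_digits = re.findall(r"[^0-9]", "a1b22")   # leading caret negates the range

# A right square bracket placed first inside the brackets is taken literally.
brackets = re.findall(r"[]x]", "a]x[")

# A hyphen placed last inside the brackets is taken literally.
hyphens = re.findall(r"[a-]", "a-b")
```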
RE NAMED CHARACTER CLASSES
Named character classes must
be enclosed in brackets like
[[:xdigit:]]
Many are available: [:alnum:],
[:alpha:], [:cntrl:], [:digit:],
[:graph:], [:lower:], [:print:],
[:punct:], [:space:], [:upper:],
and [:xdigit:]
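
Python's re module does not support the POSIX named classes, so the sketch below spells a few of them out as plain ranges. The mapping is hand-written, covers ASCII only, and the names POSIX_EQUIV and named_class are assumptions for illustration.

```python
import re

# Hand-written, ASCII-only stand-ins for a few POSIX named classes (assumption).
POSIX_EQUIV = {
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "upper": "A-Z",
    "lower": "a-z",
}

def named_class(name):
    """Build a bracket expression such as [0-9A-Fa-f] from a class name."""
    return "[" + POSIX_EQUIV[name] + "]"

# Roughly what [[:xdigit:]]+ would find in this sample text.
hex_runs = re.findall(named_class("xdigit") + "+", "run #1aF3 with 42")
```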
RE CARET ANCHOR
The character after the caret
character ^ must appear at the
beginning of the text
If used as the first character in
square brackets, it negates the list
or range of characters
If preceded by the backslash, the
caret character loses its special
meaning
RE DOLLAR SIGN ANCHOR
The character before the dollar
sign character $ must appear at
the end of the text
If not at the end of the regular
expression, then the dollar sign
loses its special meaning
When combined with the caret
character ^, the dollar sign
character $ requires the pattern to
match the entire text
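
Both anchors can be tried in Python, whose ^ and $ behave the same way for single-line text like this. A minimal sketch with made-up sample strings:

```python
import re

lines = ["ab", "abc", "cab", "a^b"]

starts = [s for s in lines if re.search(r"^ab", s)]         # must begin with "ab"
ends = [s for s in lines if re.search(r"ab$", s)]           # must end with "ab"
whole = [s for s in lines if re.search(r"^ab$", s)]         # must be exactly "ab"
literal_caret = [s for s in lines if re.search(r"\^b", s)]  # backslashed caret is literal
```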
RE REPETITION
Basic Regular Expressions
* preceding item repeated zero or more
times or \{0,\}
\+ preceding item repeated one or more
times or \{1,\}
\? preceding item is optional or \{0,1\}
\{n\} preceding item repeated exactly n
times
\{n,\} preceding item repeated n or more
times
\{,m\} preceding item matched at most m
times
\{n,m\} preceding item matched at least n
times, but not more than m times
Extended Regular Expressions
* preceding item repeated zero or more
times or {0,}
+ preceding item repeated one or more
times or {1,}
? preceding item is optional or {0,1}
{n} preceding item repeated exactly n
times
{n,} preceding item repeated n or more
times
{,m} preceding item matched at most m
times
{n,m} preceding item matched at least n
times, but not more than m times
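
The ERE forms of the repetition operators are written the same way in Python's re module, so the table can be checked with a short sketch. The helper name full and the sample strings are made up for illustration.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]

def full(pattern):
    """Hypothetical helper: list the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = full(r"a*")          # zero or more
plus = full(r"a+")          # one or more
optional = full(r"a?")      # zero or one
exactly_2 = full(r"a{2}")   # exactly n
at_least_2 = full(r"a{2,}") # n or more
at_most_2 = full(r"a{,2}")  # at most m
from_2_to_3 = full(r"a{2,3}")  # at least n, at most m
```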
RE ASTERISK
The asterisk * will match zero or
more of the item that precedes it
The asterisk is equivalent to the
BRE \{0,\} and the ERE {0,}
expressions for zero or more
A single item followed by an
asterisk will always match, because
zero occurrences also count
To match an asterisk, it can be
preceded by a backslash
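
Why a starred item always matches is easy to see in a small Python sketch (sample strings invented): with no x in the text, x* still succeeds by matching zero characters.

```python
import re

# No "x" in the text, yet the search succeeds with an empty match at position 0.
empty = re.search(r"x*", "abc")

# When the item is present, the asterisk takes as many repeats as it can.
greedy = re.search(r"ab*", "abbbc")

# A backslashed asterisk matches a literal asterisk.
literal = re.search(r"\*", "2*3")
```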
RE PLUS SIGN
In BRE, the backslashed plus sign \+
will match one or more of the item
that precedes it
In ERE, the plus sign + will match one
or more of the item that precedes it
The plus sign is equivalent to the
BRE \{1,\} and the ERE {1,}
expressions for one or more
In BRE, the plus sign matches itself. In
ERE, to match a plus sign, it can be
preceded by a backslash
RE QUESTION MARK
In BRE, the backslashed
question mark \? optionally
matches the item that
precedes it
In ERE, the question mark ? will
optionally match the item that
precedes it
The question mark is equivalent
to the BRE \{0,1\} and the ERE
{0,1} expressions for zero to one
In BRE, the question mark
matches itself. In ERE, to match
a question mark, it can be
preceded by a backslash
RE GROUPING
In BRE, the backslashed parentheses \( and \) are
used to create groups of characters that may
repeat as specified by repetition expressions
In ERE, the parentheses ( and ) are used to create
groups of characters that may repeat as specified
by repetition expressions
In BRE, the parentheses will match themselves, and
in ERE they can be matched if backslashed
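
Python's re module uses the ERE-style unbackslashed parentheses, so it can show the difference a group makes to repetition. A minimal sketch with invented sample strings:

```python
import re

# With a group, the whole "ab" repeats.
grouped = re.fullmatch(r"(ab)+", "ababab")

# Without a group, only the "b" repeats, so "ababab" is not matched in full.
ungrouped = re.fullmatch(r"ab+", "ababab")

# Backslashed parentheses match literal parentheses.
literal_parens = re.search(r"\(ab\)", "x(ab)y")
```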
RE ALTERNATION
In ERE, the pipe symbol | can
be used to perform alternation
Alternation allows for two or
more alternatives to match as
separated by the pipe symbol |
In BRE, the pipe symbol | will
match itself, and in ERE it will
match if backslashed
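
Alternation works the same way in Python's re module as in ERE. The sketch below uses made-up sample text and shows alternatives at the top level, inside a group, and a backslashed literal pipe.

```python
import re

# Any one of the alternatives may match.
animals = re.findall(r"cat|dog|bird", "dog, cat, fish, bird")

# Inside a group, the alternatives apply only to that part of the pattern.
greys = [m.group() for m in re.finditer(r"gr(a|e)y", "gray grey groy")]

# A backslashed pipe matches a literal pipe symbol.
fields = re.split(r"\|", "a|b|c")
```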
PERL US POSTAL CODE EXAMPLE
^\d{5}((-|\s)?\d{4})?$
^ - Starts with
\d{5} - exactly five digits
()? - optional group (two)
-|\s - hyphen or whitespace
\d{4} - exactly four digits
$ - Ends with
To use the perl debugger
type:
perl -d -e1
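
The same postal code pattern also runs unchanged in Python, which is used here only as a convenient way to try it. The names ZIP_RE and is_us_zip are made up for this sketch.

```python
import re

# The postal code pattern from the slide, with its backslashes.
ZIP_RE = re.compile(r"^\d{5}((-|\s)?\d{4})?$")

def is_us_zip(text):
    """Hypothetical helper: True if text looks like a 5-digit or 5+4 postal code."""
    return ZIP_RE.search(text) is not None

results = {code: is_us_zip(code)
           for code in ["12345", "12345-6789", "12345 6789",
                        "123456789", "1234", "12345-678"]}
```

The last two samples fail because one has too few leading digits and the other leaves the optional group incomplete, so $ cannot match.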
PERL CHARACTER SEQUENCES
\w Alphanumeric and _ (word
characters)
\W Not word characters
\d Digit characters
\D Not digit characters
\s Whitespace characters
\S Not whitespace characters
\b Word boundaries
• grep supports the perl character
sequences in ERE except \d
and \D
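
Python's re module accepts the same backslash sequences, so they can be tried there. A minimal sketch with an invented sample sentence:

```python
import re

text = "Order 66: ship_to Mars!"

words = re.findall(r"\w+", text)        # runs of word characters (letters, digits, _)
non_words = re.findall(r"\W+", text)    # runs of everything else
digits = re.findall(r"\d+", text)       # runs of digits
spaces = re.findall(r"\s", text)        # single whitespace characters

# \b needs a word boundary on both sides; "ship_to" is one word, so "ship" alone fails.
ship = re.findall(r"\bship\b", text)
mars = re.findall(r"\bMars\b", text)
```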
PYTHON PROTOCOL EXAMPLE
(mailto:|(news|(ht|f)tp(s?))://){1}
(){1} - group repeats only once
mailto: - mailto followed by a
colon
| - separates alternatives
news|(ht|f)tp - news, http or ftp
(ht|f)tp(s?) - optional s added
:// - added to news, http, https,
ftp, or ftps
• To start the python shell type:
python
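
Once the python shell is running, the protocol pattern can be tried as in the sketch below. The names PROTOCOL_RE and protocol_prefix and the sample URLs are made up for illustration.

```python
import re

# The protocol pattern from the slide.
PROTOCOL_RE = re.compile(r"(mailto:|(news|(ht|f)tp(s?))://){1}")

def protocol_prefix(url):
    """Hypothetical helper: return the protocol prefix at the start of url, or None."""
    match = PROTOCOL_RE.match(url)   # match() only looks at the start of the string
    return match.group(0) if match else None

prefixes = [protocol_prefix(u) for u in [
    "http://example.com",
    "https://example.com",
    "ftp://example.com",
    "ftps://example.com",
    "news://example.com",
    "mailto:keith@OneCourseSource.com",
    "gopher://example.com",
]]
```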
USE THE LIBRARY
RegExLib.com
The Regular Expression Library
Comes with a cheat sheet
A Regular Expression tester
Search thousands of rated expressions
You don't have to reinvent the wheel!
From http://xkcd.com/208/
About One Course Source
➢Online public classes (Linux, Programming & Security)
➢Custom corporate classes
➢Develop custom training programs
www.OneCourseSource.com


Editor's Notes

  1. In ed or vi, g/re/p was the command to do a global search for a regular expression and print the matching lines
  2. Backslash example: echo 'xyz^abzzz' | grep '\^ab'
  3. # Source: http://neilk.net/blog/2000/06/01/abigails-regex-to-test-for-prime-numbers/
     # Source: Abigail -- perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/'
     sub is_prime {
         if ((1 x shift) !~ /^1?$|^(11+?)\1+$/) {
             return 1;
         } else {
             return 0;
         }
     }
  4. sub is_what {
         if ((1 x shift) !~ /^1?$|^(11+?)\1+$/) {
             return 1;
         } else {
             return 0;
         }
     }
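
For comparison, the same regular expression can be tried in Python. This is a hedged translation of the Perl in the notes, not part of the original source; the names PRIME_RE and is_prime are chosen here. The number n is written as a string of n ones, and the pattern matches when that string is empty, a single 1, or a group of two or more ones repeated at least twice, which means n is not prime.

```python
import re

# Same pattern as in the Perl notes: matches unary strings of NON-prime length.
PRIME_RE = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n):
    """Return True if n is prime, using the unary-string trick (slow, for fun only)."""
    return PRIME_RE.search("1" * n) is None

primes_below_20 = [n for n in range(20) if is_prime(n)]
```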