Regular Expressions
Boot Camp
Presented by Chris Schiffhauer
www.schiffhauer.com
twitter.com/PaulyGlott
What are Regular Expressions?
• Regular expressions are an extension of wildcards (e.g. *.doc).
• Code that manipulates text needs to locate strings that match complex patterns.
• A regular expression is a shorthand for a pattern.
• \w+ is a concise way to say “match any non-empty string of alphanumeric characters” (strictly, \w also matches the underscore).
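To make the wildcard comparison concrete, here is a minimal sketch in Python. Python's re module is assumed for illustration; the deck itself mentions .NET, which spells these particular patterns the same way. The file name and sentence are made up. The standard fnmatch module handles the shell-style wildcard, and re handles the equivalent regular expression.

```python
import fnmatch
import re

# Shell-style wildcard: "*" stands for any run of characters.
WILDCARD_HIT = fnmatch.fnmatchcase("report.doc", "*.doc")

# Equivalent regular expression: ".*" is any run of characters, "\." a literal dot.
REGEX_HIT = re.fullmatch(r".*\.doc", "report.doc") is not None

# \w+ finds every non-empty run of word characters (letters, digits, underscore).
WORDS = re.findall(r"\w+", "Regex 101: find_words, fast!")

print(WILDCARD_HIT, REGEX_HIT)  # True True
print(WORDS)  # ['Regex', '101', 'find_words', 'fast']
```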
Finding Nemo
• nemo   Find “nemo”
  • When ignoring case, will match “Nemo”, “NEMO”, or “nEmO”.
  • Will also match characters 9-12 of “Johnny Mnemonic”, or “Finding Nemo 2”.
• \bnemo\b   Find “nemo” as a whole word
  • \b is a code that says “match the position at the beginning or end of any word”.
  • When ignoring case, will only match complete words spelled “nemo”, in any combination of upper- and lowercase letters.
• \bnemo\b.*\b2\b   Find text with the word “nemo” followed later by the word “2”
• The special characters that give regular expressions their power are already making them hard for humans to read.
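A minimal Python sketch of the three searches on this slide, run case-insensitively with re.IGNORECASE (Python's way of ignoring case). The list of titles is invented for illustration. Python reports 0-based, end-exclusive spans, so characters 9-12 of “Johnny Mnemonic” show up as (8, 12).

```python
import re

TITLES = ["Finding Nemo", "Johnny Mnemonic", "NEMO", "Finding Nemo 2"]

# Plain "nemo": also matches inside "Mnemonic".
SUBSTRING_HITS = [t for t in TITLES if re.search(r"nemo", t, re.IGNORECASE)]

# \bnemo\b: word boundaries on both sides exclude "Mnemonic".
WHOLE_WORD_HITS = [t for t in TITLES if re.search(r"\bnemo\b", t, re.IGNORECASE)]

# The word "nemo", followed later by the word "2".
SEQUEL_HITS = [t for t in TITLES if re.search(r"\bnemo\b.*\b2\b", t, re.IGNORECASE)]

# 0-based, end-exclusive span of "nemo" inside "Johnny Mnemonic".
MNEMONIC_SPAN = re.search(r"nemo", "Johnny Mnemonic", re.IGNORECASE).span()

print(WHOLE_WORD_HITS)  # ['Finding Nemo', 'NEMO', 'Finding Nemo 2']
print(MNEMONIC_SPAN)    # (8, 12)
```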
Determining the Validity of Phone Numbers
• \b\d\d\d-\d\d\d-\d\d\d\d   Find a ten-digit US phone number
  • \d   Matches any single digit.
  • -   Literal hyphen (has no special meaning).
• \b\d{3}-\d{3}-\d{4}   A better way to find the same number
  • {3}   Follows \d to mean “repeat the preceding element three times”.
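A quick Python check (Python's re module is assumed for illustration; the sample text is made up) that the spelled-out form and the {n} form find exactly the same numbers:

```python
import re

TEXT = "Call 480-555-1212 or 602-555-0199; ignore 12-34-5678."

# Long form: every digit spelled out with \d.
LONG_FORM = re.findall(r"\b\d\d\d-\d\d\d-\d\d\d\d", TEXT)

# Short form: {3} and {4} repeat the preceding \d.
SHORT_FORM = re.findall(r"\b\d{3}-\d{3}-\d{4}", TEXT)

print(SHORT_FORM)  # ['480-555-1212', '602-555-0199']
```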
Special Characters
• \ba\w*\b   Find words that start with the letter a
  • \b   The beginning of a word.
  • a   The letter “a”.
  • \w*   Any number of repetitions of alphanumeric characters.
  • \b   The end of a word.
• \d+   Find repeated strings of digits
  • +   Similar to *, but requires at least one repetition.
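A minimal Python sketch of both patterns (re module assumed; the sentence is invented). Matching is case-sensitive by default, so the capitalized “An” is not picked up:

```python
import re

TEXT = "An apple and a banana cost 42 cents, about 1000 in total."

# \ba\w*\b: a word boundary, the letter "a", any word characters, a word boundary.
A_WORDS = re.findall(r"\ba\w*\b", TEXT)

# \d+ needs at least one digit, so it never returns an empty match.
NUMBERS = re.findall(r"\d+", TEXT)

print(A_WORDS)  # ['apple', 'and', 'a', 'about']
print(NUMBERS)  # ['42', '1000']
```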
Special Characters, continued
• \b\w{6}\b   Find six-letter words
• .   Match any character except newline
• \w   Match any word character (letter, digit, or underscore)
• \s   Match any whitespace character
• \d   Match any digit
• \b   Match the beginning or end of a word
• ^   Match the beginning of the string
• $   Match the end of the string
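A short Python sketch (re module assumed; inputs invented) of the six-letter-word pattern and of the “except newline” rule for the dot. re.DOTALL is Python's flag for letting . also match a newline.

```python
import re

# Exactly six word characters between two word boundaries.
SIX_LETTER = re.findall(r"\b\w{6}\b", "Python regexes handle tricky inputs")

# "." matches any character except a newline...
DOT_HITS = re.findall(r"a.b", "a-b a\nb")
# ...unless re.DOTALL is given.
DOT_ALL_HITS = re.findall(r"a.b", "a-b a\nb", re.DOTALL)

print(SIX_LETTER)    # ['Python', 'handle', 'tricky', 'inputs']
print(DOT_HITS)      # ['a-b']
print(DOT_ALL_HITS)  # ['a-b', 'a\nb']
```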
Beginnings and Endings
• ^\d{3}-\d{3}-\d{4}$   Validate an entire string as a phone number
  • ^   The beginning of the string.
  • $   The end of the string.
  • In .NET, use RegexOptions.Multiline to make ^ and $ match at the beginning and end of each line.
• ^\$1000$   Find “$1000” as the entire string
  • ^   The beginning of the string.
  • \$   Escaped “$” (literal “$”).
  • 1000   Literal “1000”.
  • $   The end of the string.
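A minimal Python sketch of anchored validation; re.MULTILINE is Python's counterpart to RegexOptions.Multiline, and the sample strings are invented. One caveat: without the multiline option, $ (in Python as in .NET) also matches just before a final newline, so “480-555-1212\n” still passes this check.

```python
import re

PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")
PRICE = re.compile(r"^\$1000$")  # \$ is a literal dollar sign

def is_phone(s: str) -> bool:
    """True when the whole string is a phone number."""
    return PHONE.search(s) is not None

LINES = "480-555-1212\nnot a phone\n602-555-0199"

# Without MULTILINE, ^ and $ anchor to the whole string, so nothing matches.
WHOLE_STRING = re.findall(r"^\d{3}-\d{3}-\d{4}$", LINES)
# With MULTILINE, ^ and $ also match at each line break.
PER_LINE = re.findall(r"^\d{3}-\d{3}-\d{4}$", LINES, re.MULTILINE)

print(is_phone("480-555-1212"), is_phone("Call 480-555-1212"))  # True False
print(PER_LINE)  # ['480-555-1212', '602-555-0199']
print(bool(PRICE.search("$1000")), bool(PRICE.search("$10000")))  # True False
```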
Wash, Rinse, Repeat
• *   Repeat any number of times (including zero)
• +   Repeat one or more times
• ?   Repeat zero or one time
• {n}   Repeat n times
• {n,m}   Repeat at least n, but not more than m times
• {n,}   Repeat at least n times
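Each quantifier can be checked with Python's re.fullmatch, which succeeds only when the pattern covers the whole input. A minimal sketch (re module assumed; the helper full and the test strings are invented for illustration):

```python
import re

def full(pattern: str, text: str) -> bool:
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

RESULTS = {
    "a* on ''": full(r"a*", ""),                  # zero repetitions are fine
    "a+ on ''": full(r"a+", ""),                  # needs at least one
    "colou?r on 'color'": full(r"colou?r", "color"),
    "colou?r on 'colour'": full(r"colou?r", "colour"),
    "a{3} on 'aaaa'": full(r"a{3}", "aaaa"),      # exactly three, so four fails
    "a{2,3} on 'aaa'": full(r"a{2,3}", "aaa"),
    "a{2,} on 'aaaaa'": full(r"a{2,}", "aaaaa"),  # no upper limit
}

for case, ok in RESULTS.items():
    print(f"{case}: {ok}")
```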
Wash, Rinse, Repeat, continued
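• \b\w{5,6}\b   Find all five- and six-letter words
  • \w{5,6}   A word of at least 5, but not more than 6, characters.
• \+\d{1,3}\s\d{3}-\d{3}-\d{4}   Find phone numbers formatted for international calling
  • \+   A literal “+”. (A \b in front would only match a “+” that directly follows a letter or digit, so it is left out.)
  • \s   Whitespace.
• \d{3}-\d{2}-\d{4}   Find Social Security numbers
• ^\w*   Find the first word in the string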
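A minimal Python sketch of the patterns on this slide (re module assumed; all sample numbers are fictitious). The international pattern is written without a leading \b, as in the slide text above.

```python
import re

TEXT = "Call +1 480-555-1212. SSN 123-45-6789."

INTL = re.findall(r"\+\d{1,3}\s\d{3}-\d{3}-\d{4}", TEXT)
SSN = re.findall(r"\d{3}-\d{2}-\d{4}", TEXT)

# Seven-letter "Regular" and four-letter "here" are excluded.
FIVE_OR_SIX = re.findall(r"\b\w{5,6}\b", "Regular words rarely matter here")

# ^\w* anchored at the start returns the first word.
FIRST_WORD = re.match(r"^\w*", "first word here").group()

print(INTL, SSN)     # ['+1 480-555-1212'] ['123-45-6789']
print(FIVE_OR_SIX)   # ['words', 'rarely', 'matter']
print(FIRST_WORD)    # first
```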
Character Classes
• [aeiou]   Matches any vowel
• [.?!]   Matches punctuation at the end of a sentence
  • .   Literal “.”, which loses its special meaning because it is inside brackets
  • ?   Literal “?”
• \(?\d{3}[) ]\s?\d{3}[- ]\d{4}   Matches a 10-digit phone number
  • \(?   Zero or one left parenthesis.
  • [) ]   A right parenthesis or a space.
  • Will also match “480) 555-1212”.
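A minimal Python sketch of these character classes (re module assumed; the helper is_phone and the inputs are invented). It also shows the weakness noted on the slide, and that a fully hyphenated number is not accepted, because [) ] requires “)” or a space after the area code.

```python
import re

VOWELS = re.findall(r"[aeiou]", "regex")
END_PUNCT = re.findall(r"[.?!]", "Really? Yes. Wow!")  # "." and "?" are literal here

PHONE = re.compile(r"\(?\d{3}[) ]\s?\d{3}[- ]\d{4}")

def is_phone(s: str) -> bool:
    """True when the whole string matches the phone pattern."""
    return PHONE.fullmatch(s) is not None

print(is_phone("(480) 555-1212"))  # True
print(is_phone("480 555 1212"))    # True
print(is_phone("480) 555-1212"))   # True: the unbalanced form slips through
print(is_phone("480-555-1212"))    # False: "-" is not in [) ]
```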
Negation
• \W   Match any character that is NOT alphanumeric (or underscore)
• \S   Match any character that is NOT whitespace
• \D   Match any character that is NOT a digit
• \B   Match a position that is NOT a word boundary
• [^x]   Match any character that is NOT “x”
• [^aeiou]   Match any character that is NOT one of the characters “aeiou”
• \S+   Runs of characters that contain no whitespace
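A minimal Python sketch of each negated class (re module assumed; inputs invented). \B is zero-width like \b, so the search below succeeds only where “nemo” starts in the middle of a word.

```python
import re

NON_SPACE_RUNS = re.findall(r"\S+", "split  on\twhitespace\n")
NON_DIGITS = re.findall(r"\D", "a1b2")
NOT_WORD = re.findall(r"\W", "a-b c")
NOT_VOWEL = re.findall(r"[^aeiou]", "queue")

# \B: "nemo" must start at a position that is not a word boundary.
INSIDE = re.search(r"\Bnemo", "Mnemonic")  # found: "n" follows "M"
AT_START = re.search(r"\Bnemo", "nemo")    # None: "nemo" starts a word

print(NON_SPACE_RUNS)  # ['split', 'on', 'whitespace']
```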
Alternatives
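• |   Pipe symbol separates alternatives
• \b\d{5}-\d{4}\b|\b\d{5}\b   Five- and nine-digit ZIP Codes
  • \b\d{5}-\d{4}\b   The leftmost alternative is tried first: nine-digit ZIP Codes.
  • \b\d{5}\b   Second: five-digit ZIP Codes.
• \b\d{5}\b|\b\d{5}-\d{4}\b   Only matches five-digit ZIP Codes: on a nine-digit code the five-digit alternative already succeeds on the first five digits, so the second alternative is never reached
• (\(\d{3}\)|\d{3})\s?\d{3}[- ]\d{4}   Ten-digit phone numbers
  • (\(\d{3}\)|\d{3})   Matches “(480)” or “480”.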
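A minimal Python sketch showing why the order of alternatives matters (re module assumed; the ZIP Codes and phone numbers are sample values). Because the phone pattern contains a capturing group, re.findall would return only that group, so finditer with group(0) is used to get whole matches.

```python
import re

TEXT = "Ship to 85281-1234 or 10001."

# Nine-digit alternative first: both ZIP formats come back whole.
NINE_FIRST = re.findall(r"\b\d{5}-\d{4}\b|\b\d{5}\b", TEXT)
# Five-digit alternative first: it wins on "85281-1234" and truncates it.
FIVE_FIRST = re.findall(r"\b\d{5}\b|\b\d{5}-\d{4}\b", TEXT)

PHONE = r"(\(\d{3}\)|\d{3})\s?\d{3}[- ]\d{4}"
PHONES = [m.group(0) for m in re.finditer(PHONE, "(480) 555-1212 and 602 555-0199")]

print(NINE_FIRST)  # ['85281-1234', '10001']
print(FIVE_FIRST)  # ['85281', '10001']
print(PHONES)      # ['(480) 555-1212', '602 555-0199']
```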
Grouping
Parentheses delimit a subexpression to allow repetition or special treatment.
• (\d{1,3}\.){3}\d{1,3}   A simple IP address finder
  • (\d{1,3}\.)   A one- to three-digit number followed by a literal period.
  • {3}   Repeats the preceding group three times.
  • Also matches invalid IP addresses like “999.999.999.999”.
• ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)   A better IP address finder (each part is restricted to 0-255)
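A minimal Python sketch comparing the two finders with fullmatch (re module assumed; the helper name OCTET and the addresses are illustrative). Building the better pattern from a named octet piece makes the repetition easier to read, and the assembled string is identical to the one on the slide.

```python
import re

SIMPLE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# One octet: 200-249, 250-255, or 0-199 (with optional leading 0 or 1).
OCTET = r"(2[0-4]\d|25[0-5]|[01]?\d\d?)"
BETTER = re.compile(rf"({OCTET}\.){{3}}{OCTET}")  # {{3}} is a literal {3}

for ip in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "999.999.999.999"]:
    print(ip, SIMPLE.fullmatch(ip) is not None, BETTER.fullmatch(ip) is not None)
```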
Backreferences
Backreferences search for a recurrence of previously matched text that has been captured by a group.
• \b(\w+)\b\s*\1\b   Find repeated words
  • (\w+)   Captures a string of at least one word character as group 1.
  • \s*   Finds any amount of whitespace.
  • \1   Finds a repetition of the captured text.
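A minimal Python sketch of the repeated-word finder (re module assumed; the helper repeated_words and the sentences are invented). The trailing \b matters: without it, “the theme” would be reported because “theme” starts with “the”.

```python
import re

REPEATED = re.compile(r"\b(\w+)\b\s*\1\b")

def repeated_words(text: str) -> list:
    """Return the word captured in group 1 for each doubled word."""
    return [m.group(1) for m in REPEATED.finditer(text)]

print(repeated_words("This is is a test test of the regex."))  # ['is', 'test']
print(repeated_words("the theme"))  # []
```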
Backreferences, continued
Automatic numbering of groups can be overridden by specifying an explicit name or number.
• \b(?<Word>\w+)\b\s*\k<Word>\b   Capture the repeated word in a named group
  • (?<Word>\w+)   Names this capture group “Word”.
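Python's re module spells named groups differently from the .NET syntax on this slide: (?P<Word>...) defines the group and (?P=Word) refers back to it. A minimal sketch with that translation (the sample sentence is made up):

```python
import re

# .NET: \b(?<Word>\w+)\b\s*\k<Word>\b  ->  Python: (?P<Word>...) and (?P=Word)
NAMED = re.compile(r"\b(?P<Word>\w+)\b\s*(?P=Word)\b")

match = NAMED.search("Paris in the the spring")
print(match.group("Word"))  # the
print(match.group(0))       # the the
```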
Captures and Lookarounds
• Captures
  • (exp)   Match “exp” and capture it in an automatically numbered group.
  • (?<name>exp)   Match “exp” and capture it in a group named name.
  • (?:exp)   Match “exp”, but do not capture it.
• Lookarounds match a position, like ^ or \b, and never match any text.
  • (?=exp)   Match any position preceding a suffix “exp”.
  • (?<=exp)   Match any position following a prefix “exp”.
  • (?!exp)   Match any position after which the suffix “exp” isn’t found.
  • (?<!exp)   Match any position before which the prefix “exp” isn’t found.
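A minimal Python sketch of the three capture forms and of a lookaround being zero-width (re module assumed; inputs invented). The named group uses Python's (?P<name>...) spelling rather than the .NET (?<name>...) form.

```python
import re

# Numbered groups: both parts are captured.
NUMBERED = re.match(r"(\d+)-(\d+)", "12-34").groups()
# (?:...) groups without capturing, so only the second part is kept.
NON_CAPTURING = re.match(r"(?:\d+)-(\d+)", "12-34").groups()
# Named group, Python spelling.
AREA = re.match(r"(?P<area>\d{3})-", "480-555-1212").group("area")

# A lookahead only asserts a position: the match itself is empty.
AHEAD = re.search(r"(?=ing)", "sing")

print(NUMBERED, NON_CAPTURING, AREA)       # ('12', '34') ('34',) 480
print(repr(AHEAD.group()), AHEAD.start())  # '' 1
```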
Positive Lookaround
• \b\w+(?=ing\b)   The beginning of words ending with “ing”
  • (?=ing)   “Zero-width positive lookahead assertion”: matches a position that precedes a given suffix.
• (?<=\bre)\w+\b   The end of words starting with “re”
  • (?<=\bre)   “Zero-width positive lookbehind assertion”: matches a position following a prefix.
• (?<=\d)\d{3}\b   3 digits at the end of a word, preceded by a digit
• (?<=\s)\w+(?=\s)   Alphanumeric strings bounded by whitespace
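A minimal Python sketch of the four positive lookarounds (re module assumed; sentences invented). Python only accepts fixed-width lookbehinds; (?<=\bre) qualifies because \b has zero width.

```python
import re

ING_STEMS = re.findall(r"\b\w+(?=ing\b)", "walking and talking")
RE_TAILS = re.findall(r"(?<=\bre)\w+\b", "rewrite the report and reread")
LAST_THREE = re.findall(r"(?<=\d)\d{3}\b", "ID 12345 and 999")
BOUNDED = re.findall(r"(?<=\s)\w+(?=\s)", "first middle last")

print(ING_STEMS)   # ['walk', 'talk']
print(RE_TAILS)    # ['write', 'port', 'read']
print(LAST_THREE)  # ['345']
print(BOUNDED)     # ['middle']
```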
Negative Lookaround
• \b\w*q[^u]\w*\b   Words with “q” followed by NOT “u”
  • [^u]   Always consumes a character, so a word ending in “q” cannot match. “Iraq” does not match.
• \b\w*q(?!u)\w*\b   Search for words with “q” not followed by “u”
  • (?!u)   “Zero-width negative lookahead assertion”: succeeds when “u” does not follow, without consuming a character. “Iraq” matches.
• (?<![a-z ])\w{7}   7 alphanumerics not preceded by a lowercase letter or space
  • (?<![a-z ])   “Zero-width negative lookbehind assertion”
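A minimal Python sketch contrasting [^u] with (?!u) (re module assumed; all inputs invented). It also exposes a second problem with the [^u] version: the class happily consumes the space after “Iraq”, so the match runs on into the next word.

```python
import re

TEXT = "Iraq quit Qatar tranq"

# [^u] must consume a character: "Iraq" alone fails...
WITH_CLASS_ALONE = re.findall(r"\b\w*q[^u]\w*\b", "Iraq")
# ...and in running text it swallows the following space and word.
WITH_CLASS = re.findall(r"\b\w*q[^u]\w*\b", TEXT)
# (?!u) only checks the next position; matching is case-sensitive, so "Qatar" is skipped.
WITH_LOOKAHEAD = re.findall(r"\b\w*q(?!u)\w*\b", TEXT)

# Seven word characters not preceded by a lowercase letter or a space.
AFTER_COLON = re.findall(r"(?<![a-z ])\w{7}", "id:ABCDEFG")
AFTER_SPACE = re.findall(r"(?<![a-z ])\w{7}", "x ABCDEFG")

print(WITH_CLASS)      # ['Iraq quit']
print(WITH_LOOKAHEAD)  # ['Iraq', 'tranq']
```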
Greedy and Lazy
By default, regular expressions are “greedy”: when a quantifier can accept a range of repetitions, it matches as many characters as possible.
• a.*b   The longest string starting with “a” and ending with “b”
  • An input of “aabab” will match the entire string.

Quantifiers can be made lazy by adding a question mark.
• a.*?b   The shortest string starting with “a” and ending with “b”, counted from the leftmost position where a match can start
  • An input of “aabab” will match “aab” and then “ab”.
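The “aabab” example can be reproduced directly with Python's re.findall (re module assumed for illustration):

```python
import re

GREEDY = re.findall(r"a.*b", "aabab")
LAZY = re.findall(r"a.*?b", "aabab")

print(GREEDY)  # ['aabab']
print(LAZY)    # ['aab', 'ab']
```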
Greedy and Lazy, continued
• *?   Repeat any number of times, but as few as possible.
• +?   Repeat one or more times, but as few as possible.
• ??   Repeat zero or one time, but as few as possible.
• {n,m}?   Repeat at least n and at most m times, as few as possible.
• {n,}?   Repeat at least n times, but as few as possible.
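A minimal Python sketch of each lazy quantifier matched at the start of “aaaa” (re module assumed; the helper first_match is illustrative), plus a common practical use: stopping each tag at the first “>”.

```python
import re

def first_match(pattern: str, text: str) -> str:
    """Return the text of the match anchored at position 0."""
    return re.match(pattern, text).group()

LAZY_RESULTS = {
    "*?": first_match(r"a*?", "aaaa"),         # as few as possible: zero
    "+?": first_match(r"a+?", "aaaa"),         # at least one
    "??": first_match(r"a??", "aaaa"),         # prefers zero
    "{2,3}?": first_match(r"a{2,3}?", "aaaa"),  # stops at the minimum, 2
    "{2,}?": first_match(r"a{2,}?", "aaaa"),    # stops at the minimum, 2
}

TAGS = re.findall(r"<.+?>", "<b>bold</b>")

print(LAZY_RESULTS)  # {'*?': '', '+?': 'a', '??': '', '{2,3}?': 'aa', '{2,}?': 'aa'}
print(TAGS)          # ['<b>', '</b>']
```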
