Regular Expressions for the
Web Application Developer
             By Andrew Kandels
Regular Expressions
Regular expressions provide a concise, flexible means for
matching strings of text, such as words or patterns of
characters.
POSIX (Portable Operating System Interface)
• Traditional Unix regular expression syntax
• PHP’s ereg_ functions
• Basic and extended versions

PCRE (Perl Compatible Regular Expressions)
• Perl 5 Extended Features
• Native C Extension
• Generally Faster
• Optimization Qualifiers

Used by:
• Programming languages
• Apache and other servers
Why Use Them?
•   Input Validation
•   Input Filtering
•   Search and Replace
•   Parsing and Data Extraction
•   Dynamic Recursion
•   Automation
In PHP, POSIX = Deprecated
The ereg_* functions were deprecated in PHP 5.3 and removed in
PHP 7.0.
Switching to preg_* is generally painless. The main pain points:

•   Different matching criteria (greed)
•   preg_* requires delimiters
•   Different characters require escape sequences
•   preg_* favors option modifiers (such as /i) over separate functions (such as eregi)
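
A minimal sketch of what such a migration might look like. The input value and the pattern are made up for illustration, and the old call is shown only in a comment because the ereg_* functions no longer exist in current PHP:

```php
<?php
// Hypothetical input, used only for illustration.
$username = 'Andrew';

// Old POSIX style (no longer available), for comparison only:
//   eregi('^[a-z]+$', $username)

// PCRE equivalent: wrap the pattern in delimiters and replace the
// case-insensitive function eregi() with the /i modifier.
$isAlpha = preg_match('/^[a-z]+$/i', $username) === 1;

var_dump($isAlpha);
```

Note that preg_match() returns 1 or 0 (or false on error), so comparing against 1 keeps the result a strict boolean.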
Anatomy of a PHP Regular Expression


                           /foo/i
• Delimiters: the two / characters
• Pattern to match: foo
• Options/modifiers: i
preg_replace(
   '/(href|src)=\'([^\']*)\'/i',
   '\1="\2"',
   $str
);
PHP Regular Expressions

• Must use a delimiter, e.g. ! @ # /
• Use PHP’s single-quoted strings (no variable interpolation, fewer backslashes to escape)

preg_match                      Match against a pattern and
                                extract text
preg_replace                    Like str_replace with a pattern
                                (and sub-patterns)
preg_match_all                  Like preg_match, but returns an
                                array of every match and a count
preg_split                      Like explode() but with a
                                pattern
preg_quote                      Escapes text for use in a regular
                                expression
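
The five functions above can be sketched side by side. The sample string and patterns below are assumptions chosen only to keep the example short:

```php
<?php
// Hypothetical sample data.
$str = 'apple, banana;cherry';

// preg_match: finds the first match only; $first[0] is the matched text.
preg_match('/[a-z]+/', $str, $first);

// preg_match_all: finds every match and returns how many were found.
$count = preg_match_all('/[a-z]+/', $str, $all);

// preg_replace: like str_replace, but the search term is a pattern.
$clean = preg_replace('/[,;]\s*/', ' | ', $str);

// preg_split: like explode(), but the separator is a pattern.
$parts = preg_split('/[,;]\s*/', $str);

// preg_quote: escapes regex metacharacters in literal text.
// The second argument is the delimiter that also needs escaping.
$needle  = preg_quote('1+1=2?', '/');
$literal = preg_match('/' . $needle . '/', 'Is 1+1=2? Yes.');

print_r([$first[0], $count, $clean, $parts, $literal]);
```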
Modifiers and Options
i   PCRE_CASELESS – Ignores case

m   PCRE_MULTILINE – ^ and $ also match at the start and end of each line

s   PCRE_DOTALL – Dots (.) also match new lines

U   PCRE_UNGREEDY – Don’t be greedy (quantifiers match as little as possible)
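
A small sketch of how each modifier changes a result. The sample text and patterns are assumptions for illustration, not taken from the slides:

```php
<?php
// Hypothetical two-line sample text.
$text = "First line\nsecond LINE";

// i: without it "line$" cannot match the upper-case "LINE" at the end.
$plain    = preg_match('/line$/', $text);
$caseless = preg_match('/line$/i', $text);

// m: ^ normally matches only at the start of the subject;
// with /m it also matches at the start of every line.
$singleLine = preg_match('/^second/', $text);
$multiLine  = preg_match('/^second/m', $text);

// s: the dot normally stops at a newline; with /s it crosses it.
$dotStops   = preg_match('/First.*second/', $text);
$dotCrosses = preg_match('/First.*second/s', $text);

// U: quantifiers become non-greedy.
preg_match('/<.+>/', '<b>bold</b>', $greedy);
preg_match('/<.+>/U', '<b>bold</b>', $lazy);

print_r([$plain, $caseless, $singleLine, $multiLine,
         $dotStops, $dotCrosses, $greedy[0], $lazy[0]]);
```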
Performance Killers

Slow-downs in performance generally come from:

• Alternation, the pipe/OR operator (|)
  Use [abcd] when possible over (a|b|c|d)
• Dots that span multiple lines (PCRE_DOTALL or /s)
• Backtracking (recursion): (\d+)\d*
  Use explicit lengths, e.g. \d{5}, when possible

It’s not that slow!
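
As a rough illustration of the rewrites suggested above, the sketch below shows that the cheaper forms give the same results. It does not measure speed, and the sample inputs (including the five-digit code) are assumptions:

```php
<?php
// Alternation and a character class find the same characters;
// the class avoids trying each branch in turn.
$viaAlternation = preg_match_all('/(?:a|b|c|d)/', 'abcdxyz');
$viaClass       = preg_match_all('/[abcd]/', 'abcdxyz');

// An explicit length leaves the engine nothing to backtrack over,
// unlike an open-ended pair such as (\d+)\d*.
$fixedLength = preg_match('/^\d{5}$/', '55401');

print_r([$viaAlternation, $viaClass, $fixedLength]);
```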
Sub-Patterns

Sub-Patterns allow you to extract relevant text from searches:




• For preg_replace, use either \1 or $1 in your replacement string
• Sub-patterns are numbered left to right, by the position of each left parenthesis “(”
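
The slide's original example did not survive extraction, so here is a stand-in sketch. The date string and patterns are assumptions chosen to show extraction, replacement references, and the numbering of nested groups:

```php
<?php
// Hypothetical input.
$date = '2011-03-15';

// $m[0] is the whole match; $m[1], $m[2], $m[3] are the sub-patterns.
preg_match('/(\d{4})-(\d{2})-(\d{2})/', $date, $m);

// In the replacement string, $1 (or \1) refers to sub-pattern 1, etc.
$us = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', $date);

// Nested groups are numbered by their opening parenthesis:
// group 1 is the outer pair, groups 2 and 3 are inside it.
preg_match('/((\d+)-(\d+))-\d+/', $date, $n);

print_r([$m, $us, $n]);
```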
Named Sub-Patterns




(?P<name>pattern)
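
A short sketch of the syntax in use, assuming a made-up address and the group names `user` and `host`:

```php
<?php
// Hypothetical input and group names.
$address = 'someone@example.com';

preg_match('/(?P<user>[^@\s]+)@(?P<host>[^@\s]+)/', $address, $m);

// Named groups appear under their name and under their number.
echo $m['user'], ' at ', $m['host'], "\n";
```

Named keys tend to keep calling code readable when a pattern later gains or loses a group, since the numeric indexes shift but the names do not.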
Lookaheads
Are zero-width, so they won’t move your cursor (no characters are consumed) and the lookahead itself isn’t captured as a sub-pattern.




                            (?=pattern)
                   Pattern can be any valid regex
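
The slide's example image is missing, so the following is a stand-in sketch with assumed inputs. It shows that the lookahead text is checked but not consumed, and one common use (thousands separators):

```php
<?php
// "foo" only when followed by "bar"; "bar" is not part of the match.
preg_match('/foo(?=bar)/', 'foobar', $m);
$matched = $m[0];

// No "bar" after "foo", so nothing matches.
$noMatch = preg_match('/foo(?=bar)/', 'foobaz');

// Insert a comma at every position that is followed by one or more
// groups of exactly three digits up to the end of the string.
$pretty = preg_replace('/\B(?=(?:\d{3})+$)/', ',', '1234567');

print_r([$matched, $noMatch, $pretty]);
```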
Lookbehinds




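   (?<=pattern)   must be preceded by pattern
   (?<!pattern)   must NOT be preceded by pattern
Accepts only basic regex: the pattern must have a fixed length (no * or +)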
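
A stand-in sketch for the missing slide example, using an assumed price string. It contrasts a positive lookbehind with the negative form:

```php
<?php
// Hypothetical input.
$text = 'cost $40 or 30 euros';

// Positive lookbehind: digits that come right after a dollar sign.
preg_match_all('/(?<=\$)\d+/', $text, $dollars);

// Negative lookbehind: whole numbers NOT preceded by a dollar sign.
// The \b stops the match from starting in the middle of "40".
preg_match_all('/(?<!\$)\b\d+/', $text, $others);

print_r([$dollars[0], $others[0]]);
```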
Multi-Line Processing




                     /msU
(Multi-line, include newlines with dots, non-greedy)
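
A minimal sketch of the three modifiers working together, assuming a small made-up HTML fragment. The `#` delimiter is used so the slash in `</p>` needs no escaping:

```php
<?php
// Hypothetical fragment; the first paragraph spans two lines.
$html = "<p>one\nparagraph</p>\n<p>two</p>";

// m: ^ and $ match at line boundaries
// s: the dot also matches the newline inside the first paragraph
// U: .* stops at the first closing tag instead of the last
$count = preg_match_all('#^<p>(.*)</p>$#msU', $html, $m);

print_r($m[1]);
```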
Once-Only Sub-Patterns

Written (?>pattern). Eliminates slow backtracking (recursion) from wildcard searching.




       Fewer scans = more speed.
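
The original example is missing from the transcript. As a stand-in, this sketch uses PCRE's atomic group syntax `(?>...)` with assumed inputs to show what "once-only" means: the group keeps what it matched and never gives characters back.

```php
<?php
// Normal group: \d+ first takes "12345", then backtracks one digit
// so the trailing 5 can match.
$normal = preg_match('/\d+5/', '12345');

// Once-only group: (?>\d+) takes all the digits and refuses to give
// any back, so there is no 5 left and the match fails quickly.
$atomic = preg_match('/(?>\d+)5/', '12345');

// When nothing needs to be given back, the result is the same,
// but the engine has fewer combinations to try on a failed match.
$same = preg_match('/(?>\d+)foo/', '123foo');

print_r([$normal, $atomic, $same]);
```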
Greedy

By default, PCRE quantifiers are greedy: they return the biggest match.




        100,000 runs took 0.2791 seconds
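
The benchmarked code is not in the transcript. The sketch below is an assumed example of the behavior being described, not the author's test case:

```php
<?php
// Hypothetical input with two bold elements.
$html = '<b>one</b> and <b>two</b>';

// Greedy .* runs to the LAST closing tag, swallowing everything
// between the first <b> and the final </b>.
preg_match('#<b>(.*)</b>#', $html, $m);

echo $m[1], "\n";
```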
Non-Greedy with Modifier

The /U modifier returns the SMALLEST match.




       100,000 runs took 0.2638 seconds
               (a little better, and it’s right)
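
Again as an assumed stand-in for the missing slide code, the same kind of input with the /U modifier, plus the per-quantifier `?` form that has the same effect on a single quantifier:

```php
<?php
// Hypothetical input with two bold elements.
$html = '<b>one</b> and <b>two</b>';

// /U flips every quantifier in the pattern to non-greedy.
preg_match('#<b>(.*)</b>#U', $html, $viaModifier);

// Without /U, a trailing ? makes just this one quantifier non-greedy.
preg_match('#<b>(.*?)</b>#', $html, $viaQuestionMark);

print_r([$viaModifier[1], $viaQuestionMark[1]]);
```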
Restrictive Wild-Carding

No greedy flag needed, faster without broad wild-cards.




         100,000 runs took 0.2271 seconds
                (fastest yet, no options needed)
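
A sketch of the restrictive approach, with an assumed input since the slide's code is missing. A negated character class replaces the broad dot, so the match cannot run past the next tag:

```php
<?php
// Hypothetical input with two bold elements.
$html = '<b>one</b> and <b>two</b>';

// [^<]* matches anything except "<", so it stops at the first
// closing tag on its own, with no modifier and no backtracking
// across the rest of the string.
preg_match('#<b>([^<]*)</b>#', $html, $m);

echo $m[1], "\n";
```

One trade-off of this form: it will not match content that itself contains a `<`, such as nested tags.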
grep

Use grep -E or egrep for extended regular expressions (+, ?, |)
and advanced functionality.

-A n         Print the next n lines after each match.
-B n         Print the previous n lines before each match.
-i           Ignore case
-m n         Stop after n matches
-r           Recursively search the file system
-n           Show line numbers
-v           Only show lines that don’t match
sed

Use -r (-E on OS X / FreeBSD) for extended regular expressions.
The End

  Web: http://andrewkandels.com

  Mail: akandels@gmail.com

Twitter: @andrewkandels
