Regular Expressions for the
Web Application Developer
             By Andrew Kandels
Regular Expressions
Regular expressions provide a concise, flexible means for
matching strings of text, such as words or patterns of
characters.
POSIX (Portable Operating System Interface)
• Traditional Unix regular expression syntax
• PHP’s ereg_ functions
• Basic and extended versions

PCRE (Perl Compatible Regular Expressions)
• Perl 5 extended features
• Native C extension
• Generally faster
• Optimization qualifiers

Used by:
• Programming languages
• Apache and other servers
Why Use Them?
•   Input Validation
•   Input Filtering
•   Search and Replace
•   Parsing and Data Extraction
•   Dynamic Recursion
•   Automation
In PHP, POSIX = Deprecated
The ereg_* functions were deprecated in PHP 5.3 and removed in PHP 7.0.
Switching to preg_* is generally painless. Pain points:

•   Different matching rules (greediness)
•   preg_* requires delimiters
•   Different characters require escape sequences
•   preg_* favors option modifiers over separate functions (e.g. the i modifier instead of eregi)
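A minimal sketch of what such a migration can look like. The legacy call, the variable names, and the address are hypothetical and only serve as illustration:

```php
<?php
// Legacy POSIX call (no longer available in PHP 7+), shown for comparison only:
//   eregi('^[a-z0-9.]+@example\.com$', $email)

// Hypothetical input used for illustration.
$email = 'Jane.Doe@Example.com';

// PCRE version: wrap the pattern in delimiters (/) and replace the
// case-insensitive eregi() variant with the i modifier.
$isMatch = preg_match('/^[a-z0-9.]+@example\.com$/i', $email) === 1;

var_dump($isMatch); // bool(true)
```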
Anatomy of a PHP Regular Expression


                           /foo/i
• Delimiters
• Pattern to match
• Options/modifiers
preg_replace(
   '/(href|src)=\'([^\']*)\'/i',  // attribute name, then a single-quoted value
   '\1="\2"',                      // rewrite the value with double quotes
   $str
);
PHP Regular Expressions

• Must use a delimiter, such as / # ! or @
• Use PHP’s single-quoted strings, so $ and most backslashes need no extra escaping

preg_match        Match against a pattern and extract text
preg_replace      Like str_replace, but with a pattern (and sub-patterns)
preg_match_all    Like preg_match, but returns an array of every match and the match count
preg_split        Like explode(), but with a pattern
preg_quote        Escapes text for use in a regular expression
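The five functions above can be sketched side by side as follows. The sample string and variable names are made up for illustration:

```php
<?php
$text = 'apple, banana;cherry';

// preg_match: first match only; returns 1 on a match, 0 otherwise.
preg_match('/[a-z]+/', $text, $first);
echo $first[0], "\n"; // apple

// preg_match_all: every match; the return value is the number of matches.
$count = preg_match_all('/[a-z]+/', $text, $all);
echo $count, "\n"; // 3

// preg_replace: like str_replace, but the search term is a pattern.
$dashed = preg_replace('/[,;]\s*/', ' - ', $text);
echo $dashed, "\n"; // apple - banana - cherry

// preg_split: like explode(), but the separator is a pattern.
$parts = preg_split('/[,;]\s*/', $text);

// preg_quote: escape literal text before embedding it in a pattern.
$literal = preg_quote('v1.0', '/');
echo $literal, "\n"; // v1\.0
```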
Modifiers and Options
i   PCRE_CASELESS – Ignores case

m   PCRE_MULTILINE – ^ and $ also match at the start and end of each line

s   PCRE_DOTALL – The dot (.) also matches newlines

U   PCRE_UNGREEDY – Quantifiers are non-greedy by default
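A short sketch of how the i, m and s modifiers change a match, using a made-up two-line string:

```php
<?php
$text = "First line\nsecond line";

// i: case-insensitive matching.
var_dump(preg_match('/FIRST/i', $text));        // int(1)

// Without m, ^ only matches at the very start of the subject.
var_dump(preg_match('/^second/', $text));       // int(0)
// With m, ^ and $ also match at line boundaries.
var_dump(preg_match('/^second/m', $text));      // int(1)

// Without s, the dot stops at a newline.
var_dump(preg_match('/line.second/', $text));   // int(0)
// With s, the dot matches newlines too.
var_dump(preg_match('/line.second/s', $text));  // int(1)
```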
Performance Killers

Slow-downs in performance generally come from:

• Alternation, the pipe/OR operator (|)
  Use [abcd] when possible over (a|b|c|d)
• Multi-line matching with the dot (PCRE_DOTALL or /s)
• Recursion (backtracking): (\d+)\d*
  Use explicit lengths when possible, such as \d{4}

It’s not that slow!
Sub-Patterns

Sub-Patterns allow you to extract relevant text from searches:




• For preg_replace, use either \1 or $1 in your replacement string
• Sub-patterns are numbered from left to right by their opening parenthesis “(”
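As an illustration of the numbering, here is a small sketch that assumes a date in YYYY-MM-DD form (the input and variable names are hypothetical):

```php
<?php
$date = '2024-05-17';

// Groups are numbered by their opening parenthesis: 1 = year, 2 = month, 3 = day.
if (preg_match('/(\d{4})-(\d{2})-(\d{2})/', $date, $m)) {
    // $m[0] is the whole match; $m[1] to $m[3] are the sub-patterns.
    echo $m[1], "\n"; // 2024
}

// In a replacement string, refer to the groups as $1, $2, $3 (or \1, \2, \3).
$us = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', $date);
echo $us, "\n"; // 05/17/2024
```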
Named Sub-Patterns




(?P<name>pattern)
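A possible use of the named syntax, again with a hypothetical date string. The group names year, month and day are chosen for this example only:

```php
<?php
$date = '2024-05-17';
$pattern = '/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/';

if (preg_match($pattern, $date, $m)) {
    // Named groups appear under their name in the match array...
    echo $m['year'], ' ', $m['month'], ' ', $m['day'], "\n"; // 2024 05 17
    // ...and are still available by number as well.
    var_dump($m[1] === $m['year']); // bool(true)
}
```

Names keep the calling code readable when groups are later added or reordered.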
Lookaheads
Are zero-width, so they won’t move your cursor or be included in the matched text.




                            (?=pattern)
                   Pattern can be any valid regex
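A minimal sketch of a lookahead, assuming a made-up price list. Only numbers followed by " USD" should match, and the currency itself should stay out of the result:

```php
<?php
$prices = '100 USD, 200 EUR';

// (?= USD) checks what follows the digits without consuming it.
preg_match_all('/\d+(?= USD)/', $prices, $m);

print_r($m[0]); // contains only "100"; " USD" is not part of the match
```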
Lookbehinds




   (?<!pattern)
Accepts only limited regex: the pattern generally must have a fixed length (no * or +)
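A short sketch of both lookbehind forms. The negative form (?<!pattern) is the one shown above; the positive form (?<=pattern) is added here for comparison. The sample strings are hypothetical:

```php
<?php
// Negative lookbehind: replace "bar" only when it is NOT preceded by "foo".
$result = preg_replace('/(?<!foo)bar/', 'BAR', 'foobar bazbar');
echo $result, "\n"; // foobar bazBAR

// Positive lookbehind: digits that ARE preceded by a dollar sign.
preg_match_all('/(?<=\$)\d+/', 'cost $15, qty 3', $m);
print_r($m[0]); // contains only "15"
```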
Multi-Line Processing




                     /msU
(Multi-line, include newlines with dots, non-greedy)
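One way the three modifiers can work together, sketched on a made-up HTML fragment in which a paragraph spans two lines:

```php
<?php
$html = "<p>one\ntwo</p>\n<p>three</p>";

// m: ^ and $ match at each line boundary
// s: the dot also matches the newline inside the first paragraph
// U: .* stops at the first closing tag instead of the last one
preg_match_all('/^<p>(.*)<\/p>$/msU', $html, $m);

print_r($m[1]); // two entries: "one\ntwo" and "three"
```

Without U, the s modifier would let .* run from the first opening tag to the last closing tag and return a single oversized match.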
Once-Only Sub-Patterns

Eliminates slow recursion from wildcard searching.




       Fewer scans = more speed.
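In PCRE a once-only (atomic) sub-pattern is written (?>pattern). The sketch below uses made-up strings to show the behavior: once the group has matched, it does not give characters back.

```php
<?php
// Ordinary group: a+ first takes "aaa", then gives one "a" back so "ab" can match.
var_dump(preg_match('/(a+)ab/', 'aaab'));   // int(1)

// Once-only group: whatever a+ took is kept, so no "a" is left for "ab".
var_dump(preg_match('/(?>a+)ab/', 'aaab')); // int(0)

// Typical use: fail quickly on input that cannot match,
// without retrying every shorter run of word characters.
$subject = str_repeat('a', 50) . '!';
var_dump(preg_match('/^(?>\w+):/', $subject)); // int(0)
```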
Greedy

By default, PCRE quantifiers are greedy: they match as much as possible.




        100,000 runs took 0.2791 seconds
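A small sketch of the greedy default on a hypothetical HTML snippet (this is not the benchmark code behind the timing above):

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Greedy .* runs to the LAST </b>, so the capture spans both tags.
preg_match('/<b>(.*)<\/b>/', $html, $m);

echo $m[1], "\n"; // one</b> and <b>two
```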
Non-Greedy with Modifier

The /U modifier returns the SMALLEST match.




       100,000 runs took 0.2638 seconds
               (a little better, and it’s right)
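The same hypothetical snippet with the U modifier, as a sketch. A lazy quantifier (.*?) without the modifier gives the same result here:

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// With U, .* stops at the FIRST </b>.
preg_match('/<b>(.*)<\/b>/U', $html, $withModifier);
echo $withModifier[1], "\n"; // one

// Same effect for a single quantifier: make it lazy with a trailing ?.
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);
echo $lazy[1], "\n"; // one
```

Note that U inverts greediness for the whole pattern, so under U a trailing ? makes a quantifier greedy again.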
Restrictive Wild-Carding

No greedy flag needed, faster without broad wild-cards.




         100,000 runs took 0.2271 seconds
                (fastest yet, no options needed)
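A sketch of the restrictive version on the same hypothetical snippet: a negated character class replaces the broad dot wildcard, so no modifier is needed.

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// [^<]* can never run past the next "<", so it stops at the first closing tag.
preg_match('/<b>([^<]*)<\/b>/', $html, $m);

echo $m[1], "\n"; // one
```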
grep

Use grep -E or egrep for extended regular expressions (+, ?, |)
and advanced functionality.

-A n         Print the next n lines after each match.
-B n         Print the previous n lines before each match.
-i           Ignore case
-m n         Stop after n matches
-r           Recursively search the file system
-n           Show line numbers
-v           Only show lines that don’t match
sed

Use -r (-E on OS X / FreeBSD) for extended regular expressions.
The End

  Web: http://andrewkandels.com

  Mail: akandels@gmail.com

Twitter: @andrewkandels
