Advanced Regular Expressions Redux
Scope

• medium to advanced
• 30 minutes
• performance / backtracking irrelevant
• no compatibility charts (yet)
TOC

• basic matching, quantifiers
• character classes, types, properties, anchors
• groups, options, replace string
• look-ahead/behind
• subexpressions
RE overview

              match "foo"           replace with "bar"

Perl          /foo/ (on $_)         s/foo/bar/ (on $_)
JavaScript    /foo/                 "foolish".replace(/foo/, "bar")
Vi            /foo/                 :s/foo/bar/
TextMate      ⌘-F, Find: foo        ⌘-F, Find: foo, Replace: bar
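The JavaScript row of the table can be tried directly. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```javascript
// Match: test() reports whether the pattern occurs anywhere in the string.
const hasFoo = /foo/.test("foolish");

// Replace: without the g flag only the first occurrence is replaced.
const replaced = "foolish".replace(/foo/, "bar");
const firstOnly = "foo foo".replace(/foo/, "bar");
const everyFoo = "foo foo".replace(/foo/g, "bar");

console.log(hasFoo, replaced, firstOnly, everyFoo);
```

Perl's s/foo/bar/ behaves like the version without g; s/foo/bar/g corresponds to the last line.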
Quantifiers
• classic greedy: ?, *, +
• specific: {1,5}, {,5} (the open lower bound is engine-dependent)
  •   ? == {0,1}
  •   * == {0,}
  •   + == {1,}
• non-greedy: ??, *?, +?, {5,7}?
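The equivalences above can be checked mechanically. The sketch below assumes JavaScript's regex flavor and uses made-up sample strings; it leaves out {,5} because JavaScript treats that form as literal text.

```javascript
const samples = ["", "a", "aa", "aaa"];

// Each pair should accept exactly the same inputs.
const pairs = [
  [/^a?$/, /^a{0,1}$/],
  [/^a*$/, /^a{0,}$/],
  [/^a+$/, /^a{1,}$/],
];
const allEquivalent = pairs.every(([sugar, braces]) =>
  samples.every((s) => sugar.test(s) === braces.test(s))
);

// Greedy takes as much as allowed, non-greedy as little as allowed.
const greedy = "aaaa".match(/a{2,3}/)[0];
const lazy = "aaaa".match(/a{2,3}?/)[0];

console.log(allEquivalent, greedy, lazy);
```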
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

              /reveal(.*)plain/
             /reveal(.*?)plain/
                  /t.{2,3}t/
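A small sketch of how these three patterns behave in JavaScript. It assumes the sample text is joined into a single line, because . does not cross newlines by default; the variable names are illustrative.

```javascript
const text =
  "This reveals that plain text is in fact the " +
  "technical user's way to regard a file or a " +
  "sequence of bytes. In this sense, there is no " +
  "plain text.";

// Greedy: (.*) runs to the LAST "plain" in the text.
const greedyGroup = text.match(/reveal(.*)plain/)[1];

// Non-greedy: (.*?) stops at the FIRST "plain".
const lazyGroup = text.match(/reveal(.*?)plain/)[1];

// Bounded quantifier: a "t", two or three characters, another "t".
const firstBounded = text.match(/t.{2,3}t/)[0];

console.log(JSON.stringify(lazyGroup), greedyGroup.length, firstBounded);
```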
Character Classes / Properties
• [0-9a-z] (classes)
  •   \+420[0-9]{9} = simplified Czech phone number
  •   don’t: [A-z0-]
• [a-z&&[^j-n]] == [a-io-z]
• \p{Upper} (properties)
  •   works great on Unicode text (Latin, Katakana)
• [:alnum:], [:^space:] (POSIX bracket)
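A hedged JavaScript sketch of the points above. JavaScript has no POSIX brackets, and the && intersection is emulated here with a negative look-ahead, which is a substitute technique rather than the syntax shown on the slide. The property names \p{Lu} and \p{Script=Katakana} are the JavaScript spellings and need the u flag.

```javascript
// Simplified Czech phone number: the leading + must be escaped.
const czPhone = /^\+420[0-9]{9}$/;
const phoneOk = czPhone.test("+420123456789");

// Why [A-z] is a "don't": the range also covers [ \ ] ^ _ ` between Z and a.
const sloppyMatchesUnderscore = /[A-z]/.test("_");
const strictMatchesUnderscore = /[A-Za-z]/.test("_");

// Emulated [a-z&&[^j-n]]: a lowercase letter that is not in j-n.
const notJtoN = /^(?:(?![j-n])[a-z])+$/;
const intersectionOk = notJtoN.test("abc") && !notJtoN.test("jam");

// Unicode properties: uppercase letters (written as escapes for Ž and Á).
const upperOk = /^\p{Lu}+$/u.test("\u017D\u00C1K");
const katakanaOk = /^\p{Script=Katakana}+$/u.test("カタカナ");

console.log(phoneOk, sloppyMatchesUnderscore, strictMatchesUnderscore,
  intersectionOk, upperOk, katakanaOk);
```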
Character Types
• . == anything (apart from newline)
• \s == space == [\t\n\v\f\r ]
  •   more in Unicode
• \w == word char == approximately [0-9a-zA-Z_]
  •   is complicated in Unicode
• \d == digit == [0-9]
  •   \h == hexadecimal digit == [0-9a-fA-F] (in TextMate/Oniguruma; Perl uses \h for horizontal whitespace)
• \S, \W, \D == [^\s], [^\w], [^\d]
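The shorthand types can be exercised on a short string. This sketch assumes JavaScript, which has no \h, so the hexadecimal class is spelled out; the sample string is invented for illustration.

```javascript
const sample = "tab\there, id_42 = 0xBEEF";

const words = sample.match(/\w+/g);       // runs of word characters
const digits = sample.match(/\d+/g);      // runs of decimal digits
const spaces = sample.match(/\s/g).length; // the tab plus three spaces
const hex = sample.match(/0x([0-9a-fA-F]+)/)[1];

// The uppercase form is the negation of the lowercase one.
const sameNegation =
  JSON.stringify(sample.match(/\S+/g)) ===
  JSON.stringify(sample.match(/[^\s]+/g));

// The dot does not match a newline unless the s flag is set.
const dotCrossesNewline = /a.b/.test("a\nb");

console.log(words, digits, spaces, hex, sameNegation, dotCrossesNewline);
```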
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

           /\b[\w&&[^aA]]+\b/
              /\W{2,}\w+\b/
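A sketch of what the two patterns select, run on fragments of the sample text. JavaScript lacks the && class intersection without newer flags, so the first pattern is rewritten with a negative look-ahead; that rewrite is an assumption of this example, not the slide's syntax.

```javascript
// Whole words that contain neither "a" nor "A".
const noA = /\b(?:(?![aA])\w)+\b/g;
const wordsWithoutA = "This reveals that plain text is in fact".match(noA);

// Two or more non-word characters followed by a word:
// in practice, a word that follows punctuation plus a space.
const afterPunctuation =
  "sequence of bytes. In this sense, there is no".match(/\W{2,}\w+\b/g);

console.log(wordsWithoutA, afterPunctuation);
```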
Anchors

• ^ - beginning (line, string)
• $ - end (line, string)
• \b - word boundary ~ \w\W (almost)
  •   \b.{5}\b != \W\w{5}\W
• zero width!
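"Zero width" means an anchor matches a position, not a character. A minimal JavaScript sketch with invented sample strings:

```javascript
// Replacing every \b inserts a marker without consuming any text.
const marked = "foo bar".replace(/\b/g, "|");

// \b needs no neighbouring character; \W does.
const boundaryAtEdges = /\b\w{5}\b/.test("hello");
const nonWordAtEdges = /\W\w{5}\W/.test("hello");

// \W consumes the surrounding characters, \b does not.
const consumed = " hello ".match(/\W\w{5}\W/)[0].length;
const notConsumed = " hello ".match(/\b\w{5}\b/)[0].length;

// The dot in \b.{5}\b may also match spaces.
const dotSpan = "a b c".match(/\b.{5}\b/)[0];

// ^ and $ match only at the string ends unless the m flag is set.
const withoutM = /^two$/.test("one\ntwo");
const withM = /^two$/m.test("one\ntwo");

console.log(marked, boundaryAtEdges, nonWordAtEdges, consumed, notConsumed,
  dotSpan, withoutM, withM);
```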
Options
• /foo/imsx
  •   i - case insensitive
  •   m - multiline (in Perl and JavaScript, ^ and $ also match at the start and end of each line)
  •   s - single line (. matches newlines)
  •   x - extended!
  •   g - global
• can be written inline
  •   (?imsx-imsx)
  •   (?imsx-imsx:...)

    (?x-i)
    # this is cool
    (
      foo   # my important value
      |     # don't forget the alternative
      bar
    )       # result equals (foo|bar)
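A sketch of the flags in JavaScript, which supports i, m, s and g but not x. The last part imitates extended mode by assembling the pattern from commented string pieces; that workaround is an assumption of this example, not a feature of the engine.

```javascript
const caseInsensitive = /FOO/i.test("foo");

// m: ^ and $ also match at line boundaries.
const lineAnchors = !/^b$/.test("a\nb") && /^b$/m.test("a\nb");

// s: the dot also matches a newline.
const dotAll = !/a.b/.test("a\nb") && /a.b/s.test("a\nb");

// g: apply to every match, not just the first.
const globalReplace = "foo foo".replace(/foo/g, "bar");

// Poor man's x flag: build the pattern from commented fragments.
const alternation = new RegExp(
  "(" +
    "foo" + // my important value
    "|" +   // don't forget the alternative
    "bar" +
  ")"
);

console.log(caseInsensitive, lineAnchors, dotAll, globalReplace,
  alternation.source);
```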
Groups/Replacing
• (...) - matched group
• $1 - $9
  •   alternatively \1 - \9 (not recommended)
• nested groups ordered by left bracket
• (?:...) - non-captured group
  •   useful for (?:foo)+ or (?:foo|bar)
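A short JavaScript sketch of group numbering and replacement references. The date string and variable names are invented for illustration.

```javascript
// $1..$3 in the replacement string refer to the captured groups.
const swapped = "2009-04-22".replace(/(\d+)-(\d+)-(\d+)/, "$3.$2.$1");

// Nested groups are numbered by the position of their opening bracket.
const nested = /((a)(b))/.exec("ab").slice(1);

// A non-capturing group can be repeated without taking a group number.
const nonCapturing = /(?:foo)+(bar)/.exec("foofoobar");
const onlyGroup = nonCapturing[1];
const groupCount = nonCapturing.length - 1;

console.log(swapped, nested, onlyGroup, groupCount);
```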
Example
"foobarman".replace(
  /(?:f)((o)+)(bar)|(baz|man)/g,
  '$1, $2, $3, $4, $5')

    • foobar                       • man
      •   1 -- oo                    •   1 --
      •   2 -- o                     •   2 --
      •   3 -- bar                   •   3 --
      •   4 --                       •   4 -- man
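The group table above can be reproduced with matchAll, which returns one entry per match. This is a sketch; unset groups are shown as empty strings to mirror the table.

```javascript
const pattern = /(?:f)((o)+)(bar)|(baz|man)/g;

const rows = [..."foobarman".matchAll(pattern)].map((m) => ({
  match: m[0],
  // Groups that did not take part in the match are undefined.
  groups: m.slice(1).map((g) => g ?? ""),
}));

console.log(rows);
```

Note that group 2 holds only the last repetition of (o)+, which is why it is "o" rather than "oo".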
Look-ahead/behind
• defines custom zero-width anchors

                    positive   negative
          ahead     (?=...)    (?!...)
          behind    (?<=...)   (?<!...)
Example

zdenek@gooddata.com
   /.*?@gooddata/


zdenek@gooddata.com
 /.*?(?=@gooddata)/
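A sketch of the difference in JavaScript: the plain pattern consumes "@gooddata", the look-ahead only checks for it. The extra look-behind and negative examples use invented inputs and assume an engine with look-behind support (ES2018 or later).

```javascript
const email = "zdenek@gooddata.com";

// Without look-ahead the "@gooddata" part belongs to the match.
const consumed = email.match(/.*?@gooddata/)[0];

// With look-ahead the match stops just before it.
const userOnly = email.match(/.*?(?=@gooddata)/)[0];

// Positive look-behind: text that follows "@", up to the first dot.
const domain = email.match(/(?<=@)[^.]+/)[0];

// Negative look-ahead: an address whose domain does not start with "gooddata".
const external = /^\w+@(?!gooddata)/;
const isExternal = external.test("bob@example.com") && !external.test(email);

// Negative look-behind: digits not preceded by "$" or another digit.
const plainNumbers = "cost 42 and $17".match(/(?<![$\d])\d+/g);

console.log(consumed, userOnly, domain, isExternal, plainNumbers);
```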
Recursive RE

• very important!
  •   quote & bracket matching
  •   technically not part of regular grammar
• two styles
  •   \g<name> or \g<n> - TextMate
  •   (?R) - Perl
Example
(?x:
  \(                # match the initial opening parenthesis

  # Now make a named group 'balanced' which
  # matches a balanced substring.
  (?<balanced>
    [^()]           # A balanced substring is either something
                    # that is not a parenthesis:
    |               # …or a parenthesised string:
    \(              # A parenthesised string begins with an opening parenthesis
    \g<balanced>*   # …followed by a sequence of balanced substrings
    \)              # …and ends with a closing parenthesis
  )*                # Look for a sequence of balanced substrings
  \)                # Finally, the outer closing parenthesis
)

or: \(([^()]|(?R))*\)
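JavaScript regular expressions support neither \g<name> nor (?R), so the sketch below swaps in a different technique: a depth-counting scan that returns the same first balanced substring the recursive patterns would find. The function name firstBalanced is hypothetical.

```javascript
// Returns the first balanced "(...)" substring, or null if there is none.
function firstBalanced(input) {
  // Try each "(" as a possible start, like a regex engine retrying positions.
  for (
    let start = input.indexOf("(");
    start !== -1;
    start = input.indexOf("(", start + 1)
  ) {
    let depth = 0;
    for (let i = start; i < input.length; i++) {
      if (input[i] === "(") depth++;
      else if (input[i] === ")") depth--;
      // Depth back at zero: the opening bracket at `start` is closed.
      if (depth === 0) return input.slice(start, i + 1);
    }
  }
  return null;
}

console.log(firstBalanced("f(a(b)c)d)"), firstBalanced("x(()"));
```

The counter plays the role of the recursion stack, which is the part a strictly regular grammar cannot express.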

Advanced Regular Expressions Redux

  • 2. Scope • medium to advanced • 30 minutes • performance / backtracking irrelevant • no compatibility charts (yet)
  • 3. TOC • basic matching, quantifiers • character classes, types, properties, anchors • groups, options, replace string • look-ahead/behind • subexpressions
  • 5. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 6. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 7. RE overview match “foo” replace with “bar” Perl /foo/ (on $_) s/foo/bar/ (on $_) Javascript /foo/ “foolish”.replace(/foo/, “bar”) Vi /foo/ :s/foo/bar/ TextMate ⌘-F, Find: foo ⌘-F Find: foo, Replace: bar
  • 10. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5}
  • 11. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1}
  • 12. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,}
  • 13. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,} • + == {1,}
  • 14. Quantifiers • classic greedy: ?, *, + • specific:{1,5}, {,5} • ? == {0,1} • * == {0,} • + == {1,} • non-greedy: ??, *?, +?, {5,7}?
  • 15. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 16. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 17. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 18. Example This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text. /reveal(.*)plain/ /reveal(.*?)plain/ /t.{2,3}t/
  • 19. Character Classes / Properties
  • 20. Character Classes / Properties • [0-9a-z] (classes)
  • 21. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr.
  • 22. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-]
  • 23. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z]
  • 24. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties)
  • 25. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties) • works great on Unicode text (Latin,Katakana)
  • 26. Character Classes / Properties • [0-9a-z] (classes) • +420[0-9]{9} = simplified czech phone nr. • don’t: [A-z0-] • [a-z&&[^j-n]] == [a-io-z] • p{Upper} (properties) • works great on Unicode text (Latin,Katakana) • [:alnum:], [:^space:] (POSIX bracket)
  • 28. Character Types • . == anything (apart from newline)
  • 29. Character Types • . == anything (apart from newline) • s == space == [tnvfr ] • more in unicode
  • 30. Character Types • . == anything (apart from newline) • s == space == [tnvfr ] • more in unicode • w == word char == cca [0-9a-zA-Z_] • is complicated in unicode
  • 31. Character Types • . == anything (apart from newline) • s == space == [tnvfr ] • more in unicode • w == word char == cca [0-9a-zA-Z_] • is complicated in unicode • d == digit == [0-9] • h == hexadecimal digit == [0-9a-fA-F]
  • 32. Character Types
    • . == anything (apart from newline)
    • \s == space == [\t\n\v\f\r ]
      • more in Unicode
    • \w == word char == roughly [0-9a-zA-Z_]
      • is complicated in Unicode
    • \d == digit == [0-9]
    • \h == hexadecimal digit == [0-9a-fA-F] (Oniguruma/TextMate; in Perl \h means horizontal whitespace)
    • \S \W \D == [^\s] [^\w] [^\d]
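To see the shorthand types and their capitalised negations side by side, here is a small JavaScript sketch (assuming Node.js; the sample string is invented). JavaScript has no `\h` for hex digits, so the class is spelled out.

```javascript
// One word char of each kind, two kinds of whitespace, and a dash.
const sample = "a1_ \t-";

const words = sample.match(/\w/g);    // letters, digits, underscore
const spaces = sample.match(/\s/g);   // space and tab
const digits = sample.match(/\d/g);   // digits only
const nonWords = sample.match(/\W/g); // capital letter reverses the class

// No \h in JavaScript: write the hex-digit class explicitly.
const hex = /^[0-9a-fA-F]+$/;

console.log(words);    // [ 'a', '1', '_' ]
console.log(digits);   // [ '1' ]
console.log(nonWords.length, spaces.length); // 3 2
console.log(hex.test("c0ffee"), hex.test("tea")); // true false
console.log(/^.$/.test("\n")); // false: "." skips the newline by default
```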
  • 34. Example
    This reveals that plain text is in fact the technical user's way to regard a file or a sequence of bytes. In this sense, there is no plain text.
    • /\b[\w&&[^aA]]+\b/
    • /\W{2,}\w+\b/
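A hedged JavaScript version of these two patterns (assuming Node.js). The `&&` intersection is avoided by using the double negation `[^\WaA]`, which reads "not a non-word character and not a or A"; this rewrite is a substitution made for the sketch, not the slide's own notation.

```javascript
const text =
  "This reveals that plain text is in fact the technical user's way to " +
  "regard a file or a sequence of bytes. In this sense, there is no plain text.";

// Whole words that contain neither "a" nor "A".
const wordsWithoutA = text.match(/\b[^\WaA]+\b/g);

// Two or more non-word characters followed by a word:
// in this text that is punctuation plus space before a word.
const afterPunctuation = text.match(/\W{2,}\w+\b/g);

console.log(wordsWithoutA.slice(0, 4)); // [ 'This', 'text', 'is', 'in' ]
console.log(afterPunctuation);          // [ '. In', ', there' ]
```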
  • 39. Anchors
    • ^ - beginning (line, string)
    • $ - end (line, string)
    • \b - word boundary ~ \w\W (almost)
      • \b.{5}\b != \W\w{5}\W
    • zero width!
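"Zero width" means an anchor matches a position, not a character. A short JavaScript sketch (assuming Node.js; the sample strings are invented) makes that visible and shows why `\b.{5}\b` and `\W\w{5}\W` are not interchangeable.

```javascript
// \b matches positions only, so replacing it inserts text without removing any.
const marked = "foo bar".replace(/\b/g, "|");
console.log(marked); // |foo| |bar|

// \b needs no neighbouring character, so it works at the string edges.
const withBoundary = "hello".match(/\b.{5}\b/);
// \W must consume a real non-word character on both sides.
const withNonWord = "hello".match(/\W\w{5}\W/);
console.log(withBoundary[0], withNonWord); // hello null

// When neighbours exist, \W swallows them while \b leaves them alone.
console.log(JSON.stringify(" hello ".match(/\W\w{5}\W/)[0])); // " hello "
console.log(JSON.stringify(" hello ".match(/\b.{5}\b/)[0]));  // "hello"
```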
  • 43. Options
    • /foo/imsx
      • i - case insensitive
      • m - multiline (^, $ match at the start/end of every line, not only of the string/file)
      • s - single line (. matches newlines)
      • x - extended! (whitespace and # comments in the pattern are ignored)
      • g - global
    • can be written inline
      • (?imsx-imsx)
      • (?imsx-imsx:...)
    • example:
        (?x-i)   # this is cool
        (
          foo    # my important value
          |      # don't forget the alternative
          bar
        )        # result equals (foo|bar)
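The effect of each option is easiest to see by toggling it. The sketch below uses JavaScript flags (assuming a reasonably recent Node.js); JavaScript has no `x` flag, so the extended-mode example from the slide is not reproduced here.

```javascript
const lines = "Foo\nbar";

// i: case insensitive
console.log(/foo/.test(lines), /foo/i.test(lines));           // false true

// m: ^ and $ also match at line breaks inside the string
console.log(/^bar$/.test(lines), /^bar$/m.test(lines));       // false true

// s: "." also matches the newline
console.log(/Foo.bar/.test(lines), /Foo.bar/s.test(lines));   // false true

// g: replace every match instead of only the first one
console.log("a-a-a".replace(/a/, "b"), "a-a-a".replace(/a/g, "b")); // b-a-a b-b-b
```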
  • 48. Groups/Replacing
    • (...) - matched group
      • $1 - $9
      • alternatively \1 - \9 (not recommended)
    • nested groups ordered by left bracket
    • (?:...) - non-captured group
      • useful for (?:foo)+ or (?:foo|bar)
  • 51. Example
    "foobarman".replace( /(?:f)((o)+)(bar)|(baz|man)/g, '$1, $2, $3, $4, $5')
      group    match "foobar"    match "man"
      1        oo                (empty)
      2        o                 (empty)
      3        bar               (empty)
      4        (empty)           man
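The group numbering in this example can be checked by listing the captures of each match. This is a small sketch (assuming Node.js 12 or later for `matchAll`; the `rows` variable is made up) rather than the slide's own `replace` call.

```javascript
// (?:f) is not counted; groups are numbered by their opening bracket:
// 1 = ((o)+), 2 = (o), 3 = (bar), 4 = (baz|man)
const re = /(?:f)((o)+)(bar)|(baz|man)/g;

const rows = [];
for (const m of "foobarman".matchAll(re)) {
  // Groups that did not take part in a match are undefined; show them as "".
  rows.push([m[0], m[1], m[2], m[3], m[4]].map((g) => (g === undefined ? "" : g)));
}

console.log(rows);
// [ [ 'foobar', 'oo', 'o', 'bar', '' ],
//   [ 'man', '', '', '', 'man' ] ]
```

Note that group 2 holds only the last repetition of `(o)`, which is why it is `o` while group 1 is `oo`.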
  • 53. Look-ahead/behind
    • defines custom zero-width anchors
                positive    negative
      ahead     (?=...)     (?!...)
      behind    (?<=...)    (?<!...)
  • 54. Example
    • zdenek@gooddata.com   /.*?@gooddata/       (matches "zdenek@gooddata")
    • zdenek@gooddata.com   /.*?(?=@gooddata)/   (matches "zdenek"; the look-ahead is not consumed)
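All four look-around forms can be exercised in JavaScript (assuming a Node.js version with look-behind support). The e-mail address is the one from the slide; the other sample strings are invented for illustration.

```javascript
const email = "zdenek@gooddata.com";

// Plain pattern: the "@gooddata" part is consumed and ends up in the match.
const consumed = email.match(/.*?@gooddata/)[0];

// Positive look-ahead: "@gooddata" must follow but stays outside the match.
const ahead = email.match(/.*?(?=@gooddata)/)[0];

// Positive look-behind: a word that is preceded by "@".
const behind = email.match(/(?<=@)\w+/)[0];

// Negative look-ahead: "foo" not followed by "bar".
const notAhead = [/foo(?!bar)/.test("foobar"), /foo(?!bar)/.test("foobaz")];

// Negative look-behind: numbers that do not follow a dollar sign.
const notBehind = "price: $42, qty: 7".match(/(?<!\$)\b\d+\b/g);

console.log(consumed, ahead, behind); // zdenek@gooddata zdenek gooddata
console.log(notAhead, notBehind);     // [ false, true ] [ '7' ]
```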
  • 55. Recursive RE
    • very important!
      • quote & bracket matching
    • technically not part of regular grammar
    • two styles
      • \g<name> or \g<n> - TextMate
      • (?R) - Perl
  • 57. Example
      (?x:
        \(                 # match the initial opening parenthesis
                           # Now make a named group 'balanced' which
                           # matches a balanced substring.
        (?<balanced>
          [^()]            # A balanced substring is either something
                           # that is not a parenthesis:
          |                # …or a parenthesised string:
          \(               # A parenthesised string begins with an opening parenthesis
          \g<balanced>*    # …followed by a sequence of balanced substrings
          \)               # …and ends with a closing parenthesis
        )*                 # Look for a sequence of balanced substrings
        \)                 # Finally, the outer closing parenthesis
      )
    or: \(([^()]|(?R))*\)
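JavaScript's RegExp has neither `\g<name>` nor `(?R)`, so the recursive pattern above cannot be run there as written. As a stand-in, the sketch below does the same job with a plain depth counter instead of a regular expression; `firstBalanced` is a hypothetical helper name, and this is a swapped-in technique, not a translation of the pattern.

```javascript
// Hypothetical helper: returns the first balanced "(...)" substring, or null.
function firstBalanced(text) {
  // Try every "(" as a possible start, like a regex engine moving its start position.
  for (let start = text.indexOf("("); start !== -1; start = text.indexOf("(", start + 1)) {
    let depth = 0;
    for (let i = start; i < text.length; i++) {
      if (text[i] === "(") depth += 1;      // one level deeper
      else if (text[i] === ")") depth -= 1; // one level back up
      if (depth === 0) return text.slice(start, i + 1); // outer bracket closed
    }
  }
  return null; // no balanced pair found
}

console.log(firstBalanced("f(a(b)c)d)")); // (a(b)c)
console.log(firstBalanced("((a)"));       // (a)
console.log(firstBalanced("no brackets")); // null
```

The counter plays the role of the recursion stack: each nested `\g<balanced>` call in the pattern corresponds to one increment of `depth`.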

Editor's Notes

  1. escaping???
  4. examples! possessive (?+, *+, ++)
  10. unicode compat table!
  17. notice the space at the end, capital reverses
  22. how about /g??