Deeper down the rabbit hole
                         Advanced Regular Expressions


                     Jakob Westhoff <jakob@php.net>
                            @jakobwesthoff



                               PHPBarcamp.at
                                May 3, 2010




http://westhoffswelt.de        jakob@westhoffswelt.de     slide: 1 / 26
About Me



        Jakob Westhoff

              PHP developer for several years
              Computer science student at the TU Dortmund

              Co-Founder of the PHP Usergroup Dortmund
              Active in different Open Source projects




Asking the audience


         Who already works with regular expressions?

         Regular expressions like this:

      /[a-zA-Z]+/

         Or like this:

      (?P<image>(?:none|inherit)|(?:url\(\s*(?:'|")?
            (?:\\['"\)]|\\[^\\'"\)]|[^'"\\)])*(?:'|")?\s*\)
            ))




Goals of this session



         Learn advanced techniques to use in (PCRE) regular
         expressions
               Assertions
               Once only subpatterns
               Conditional subpatterns
               Pattern recursion
               ...
         Learn how to handle Unicode in your regular expressions




What Regular Expressions are...




         In theoretical computer science:
               Express regular languages
               Languages which can be described by deterministic finite state
               automata
               Type 3 grammars in the Chomsky hierarchy




What Regular Expressions are...



         In practical day to day usage:
   “[...] regular expressions provide concise and flexible means for
   identifying strings of text of interest, such as particular characters,
   words, or patterns of characters.”

                                                          – Wikipedia [1]




Building Blocks of a Regular Expression


         Basic structure of every regular expression

                                  /[a-z]+/im
         Delimiter
               The same freely chosen character at both ends (must be
               escaped wherever it occurs inside the expression)
               Bracket pairs such as ( and ) are also allowed in PCRE
         Expression
         Modifier
               A sequence of characters providing processing instructions
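
A minimal sketch of the three building blocks in PHP. The subject strings and variable names are assumptions made for illustration; only preg_match() itself comes from PHP.

```php
<?php
// The same expression written with three different delimiters.
$subject = "Hello\nworld";

// Modifier "i": case-insensitive. Modifier "m": ^ and $ also match at line breaks.
$withSlashes  = preg_match('/^[a-z]+$/im', $subject, $a);
$withHashes   = preg_match('#^[a-z]+$#im', $subject, $b);
$withBrackets = preg_match('(^[a-z]+$)im', $subject, $c);

// A delimiter that occurs inside the expression has to be escaped ...
$escaped = preg_match('/^\/usr\/bin$/', '/usr/bin');
// ... or avoided by choosing a different delimiter.
$avoided = preg_match('#^/usr/bin$#', '/usr/bin');
```

Picking a delimiter that does not occur in the expression usually keeps the pattern easier to read than escaping it.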




Getting everybody up to speed



         ., .*, .+, .?, .{1,2} - Arbitrary characters and repetitions
         ^, $ - Start and end of subject (or line in multiline mode)
         foo|bar - Logical Or
         (foo)(bar) - Subpattern grouping
         /(foo|bar)baz(\1)/ - Backreferences
         [a-z], [^a-z] - Character classes
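
A short sketch of how a backreference behaves, assuming the pattern is meant to be read with \1 pointing at the first group. The input strings are made up for the example.

```php
<?php
// \1 must repeat exactly the text that group 1 captured.
$pattern = '/(foo|bar)baz(\1)/';

// Group 1 captures "foo", so \1 also has to be "foo".
$same = preg_match($pattern, 'foobazfoo', $m);

// Group 1 captures "foo", but "bar" follows, so there is no match.
$different = preg_match($pattern, 'foobazbar');
```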




Grouping Without Subpattern Creation




         Grouping might be needed without creating a subpattern

                            /(?:foobar)*/
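
The difference can be seen in the result array. This is a small sketch; the trailing baz and the input string are assumptions added to make the match visible.

```php
<?php
// Capturing group: the repeated group shows up as an extra entry.
preg_match('/(foobar)*baz/', 'foobarfoobarbaz', $captured);

// Non-capturing group: same overall match, no extra entry.
preg_match('/(?:foobar)*baz/', 'foobarfoobarbaz', $plain);

$capturedCount = count($captured); // whole match plus group 1
$plainCount    = count($plain);    // whole match only
```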




Subpattern identification

         Subpatterns are numbered by their opening parenthesis
         /(foo(bar)(baz))/
           1   foobarbaz
           2   bar
           3   baz


         Matches available from within PHP

      $matches = array(
        0 => "foobarbaz",
        1 => "foobarbaz",
        2 => "bar",
        3 => "baz",
      )
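
The array above can be reproduced with preg_match(). The variable names are chosen for illustration.

```php
<?php
// Index 0 holds the whole match; 1 to 3 follow the order of the opening parentheses.
$found = preg_match('/(foo(bar)(baz))/', 'foobarbaz', $matches);
```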




Subpattern Naming

         PCRE allows custom naming

      /(?P<firstname>[A-Za-z]+) (?P<lastname>[A-Za-z]+)/


         Result with input Jakob Westhoff

      array(
        0 => 'Jakob Westhoff',
        'firstname' => 'Jakob',
        1 => 'Jakob',
        'lastname' => 'Westhoff',
        2 => 'Westhoff',
      )
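
A sketch of how this result is obtained and used in PHP. Note that every named subpattern is still available under its number as well.

```php
<?php
$pattern = '/(?P<firstname>[A-Za-z]+) (?P<lastname>[A-Za-z]+)/';
$found   = preg_match($pattern, 'Jakob Westhoff', $matches);

// Access by name instead of by position.
$first = $matches['firstname'];
$last  = $matches['lastname'];
```

Named access keeps working when further groups are added in front, which is the main practical benefit over numeric indexes.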




Assertions

         Formulate assertions about the text at the current position
         without consuming it

         Example

                             /foo(?=foo)/
         Input

                               foofoofoo
         Match

                               The first foo (and, when matching repeatedly,
                               the second); the last foo is never matched
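
A sketch that lists every position where the example pattern matches, using preg_match_all() with PREG_OFFSET_CAPTURE. The variable names are assumptions.

```php
<?php
// Collect all matches together with their byte offsets in the subject.
preg_match_all('/foo(?=foo)/', 'foofoofoo', $m, PREG_OFFSET_CAPTURE);

// Each entry of $m[0] is [matched text, offset]; keep only the offsets.
$offsets = array_column($m[0], 1);

// The assertion is not part of the match: each hit is just "foo".
$texts = array_column($m[0], 0);
```

The foo at offset 6 is missing from the result because nothing follows it, so the assertion fails there.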



Negative Assertions




         Negative assertions are possible

         foo not followed by another foo

                              /foo(?!foo)/
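
A sketch of the negative assertion on the input foofoofoo, again with variable names chosen for illustration.

```php
<?php
// Only a foo that is NOT followed by another foo may match.
preg_match_all('/foo(?!foo)/', 'foofoofoo', $m, PREG_OFFSET_CAPTURE);

// Keep the offsets of all hits.
$offsets = array_column($m[0], 1);
```

Here only the last foo (offset 6) qualifies, which is the exact complement of the positive assertion /foo(?=foo)/.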




Backward Assertions


         bar preceded by foo

                              /(?=foo)bar/ ?
                              No: (?=foo) looks ahead, so this can never match
         Backward assertion

                              /(?<=foo)bar/

         Negative backward assertion
         bar not preceded by foo

                              /(?<!foo)bar/
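
A sketch comparing the three patterns from this slide. The subject strings are assumptions made for the example.

```php
<?php
// Lookahead in front of bar: the next characters would have to be
// "foo" and "bar" at the same time, so this never matches.
$lookahead = preg_match('/(?=foo)bar/', 'foobar');

// Backward assertion: bar matches, foo is checked but not consumed.
$behind = preg_match('/(?<=foo)bar/', 'foobar', $m, PREG_OFFSET_CAPTURE);

// Negative backward assertion: only the bar after "baz" qualifies.
preg_match_all('/(?<!foo)bar/', 'foobar bazbar', $n, PREG_OFFSET_CAPTURE);
$negativeOffsets = array_column($n[0], 1);
```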



Inner workings of the PCRE matcher


          PCRE uses backtracking to find matches

          Pattern: /\d+foo/
          Subject: 123456789bar

      1   Eat up all the digits: 123456789
      2   Try to match foo
      3   Backtrack one digit and try to match foo again
      4   Repeat step 3 until a match is found or no digit is left to
          give back
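
The steps above can be sketched as a hand-written simulation. This is not how PCRE is implemented, and the real matcher may apply shortcuts; the function simulateDigitsThenFoo() is a hypothetical helper that only mirrors the list for an attempt starting at offset 0.

```php
<?php
// Simulates /\d+foo/ for one attempt at the start of the subject.
function simulateDigitsThenFoo(string $subject): array
{
    $log = [];
    // Step 1: greedily take as many leading digits as possible.
    $digits = strspn($subject, '0123456789');

    // Steps 2 to 4: try foo, then give back one digit at a time.
    for ($taken = $digits; $taken >= 1; $taken--) {
        $rest  = substr($subject, $taken, 3);
        $log[] = sprintf('digits "%s", next "%s"', substr($subject, 0, $taken), $rest);
        if ($rest === 'foo') {
            return [true, $log];
        }
    }
    return [false, $log];
}

[$matched, $log] = simulateDigitsThenFoo('123456789bar');
$attempts = count($log);

// The real engine agrees that there is no match.
$real = preg_match('/\d+foo/', '123456789bar');
```

In this simulation every one of the nine digits is given back once before the attempt fails, which is the work a once-only subpattern avoids.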




Once only subpattern


         Once-only subpatterns prevent backtracking into a subpattern
         once it has acquired a match.

         Applying a once-only subpattern to the example shown before

                               /(?>\d+)foo/
         After matching the digits and determining that the following
         string is not foo, the attempt fails without giving back digits
               123456789bar

         Can massively improve regex speed if used correctly
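
A sketch showing that a once-only subpattern changes what can match, not only how fast. The second pair of patterns and all subject strings are assumptions added for illustration.

```php
<?php
// Both variants fail on the slide's subject, the once-only one just fails sooner.
$plainFail  = preg_match('/\d+foo/', '123456789bar');
$atomicFail = preg_match('/(?>\d+)foo/', '123456789bar');

// Both variants succeed when foo really follows the digits.
$atomicOk = preg_match('/(?>\d+)foo/', '123foo');

// Without the once-only group, \d+ gives back the final 5 so the literal 5 can match.
$backtracks = preg_match('/\d+5/', '12345');

// With it, the digits are never given back, so the literal 5 finds nothing.
$noBacktrack = preg_match('/(?>\d+)5/', '12345');
```

The last pair is the reason for the "if used correctly" caveat: a once-only subpattern placed carelessly can turn a matching pattern into a failing one.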




Conditional subpattern


         If statement equivalent in PCRE

               /(?(condition)yes-pattern|no-pattern)/
         Conditions can be references to earlier subpatterns or assertions

         Numbers need to be followed by foo, while everything else
         needs to be followed by bar

                          /(?(?<=\d)foo|bar)/
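
A sketch of the conditional syntax with the two kinds of condition PCRE accepts: an assertion (here a lookbehind) and a reference to a capture group. Both patterns and all inputs are assumptions chosen to express "digits are followed by foo, anything else by bar".

```php
<?php
// Condition is an assertion: if a digit precedes, expect foo, otherwise bar.
$byAssertion = '/(?(?<=\d)foo|bar)/';
$a1 = preg_match($byAssertion, '123foo'); // digit before foo
$a2 = preg_match($byAssertion, 'xbar');   // no digit before bar
$a3 = preg_match($byAssertion, '123bar'); // digit before bar
$a4 = preg_match($byAssertion, 'xfoo');   // no digit before foo

// Condition is a group reference: if group 1 took part in the match, expect foo.
$byGroup = '/^(\d+)?(?(1)foo|bar)$/';
$g1 = preg_match($byGroup, '123foo');
$g2 = preg_match($byGroup, 'bar');
$g3 = preg_match($byGroup, '123bar');
$g4 = preg_match($byGroup, 'foo');
```

The group-reference variant is anchored with ^ and $ so that it cannot skip the digits and match a later bar on its own.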




Unicode: Characters, code points and graphemes



         Unicode is made up of code points
               The letter a: U+0061
               The mark ` (combining grave accent): U+0300

         One character might consist of multiple code points
               The letter a with the mark ` (à): U+0061 U+0300

         Some of these combinations also exist as single code points
               The letter à: U+00E0
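The difference between the two spellings of the same character can be made visible with Python's unicodedata module. This is a small sketch; the helper name code_points is made up for this illustration.

```python
import unicodedata

def code_points(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

decomposed = "a\u0300"    # letter a followed by the combining grave accent
precomposed = "\u00e0"    # the same character as a single code point

for label, text in (("decomposed", decomposed), ("precomposed", precomposed)):
    print(label, len(text), code_points(text))

# Both render as the same character, yet they differ code point by code point.
print(decomposed == precomposed)
```

The decomposed form has length 2, the precomposed form length 1, and a plain comparison treats them as different strings.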




Unicode: Pattern matching


         Unicode processing is enabled using the u modifier
         PCRE works on UTF-8 encoded strings
         Each code point is handled as one character

         Match a specific Unicode code point: \x{FFFF}

         Remember the letter a with the mark ` (à)

                          /\x{0061}\x{0300}/u
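The same code point matching can be sketched in Python. Note the differences from PCRE: Python writes the escape as \u0300 instead of \x{0300}, and needs no u modifier because str patterns always work on code points. The sample word is an assumption chosen for this illustration.

```python
import re

# Matches the letter a followed by the combining grave accent, code point by code point.
a_with_grave = re.compile(r"\u0061\u0300")

decomposed = "voil\u0061\u0300"   # ends in a + combining grave accent
precomposed = "voil\u00e0"        # ends in the single code point U+00E0

print(bool(a_with_grave.search(decomposed)))    # the two code points are found
print(bool(a_with_grave.search(precomposed)))   # the single code point is not

# Each code point counts as one character, so "." consumes only the base letter.
print(re.match(r"voil.", decomposed).end(), len(decomposed))
```

The pattern finds the two code point spelling only, which is exactly the problem the extended Unicode sequence escape addresses.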




Unicode: Extended unicode sequences



         How to match both the single and the multi code point character?
               Remember: à = U+0061 U+0300 or U+00E0
         Using the escape for extended Unicode sequences: \X

         \X is equivalent to (?>\P{M}\p{M}*)
               Wait. What? → Unicode character properties
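Python's standard re module has neither the extended sequence escape nor property escapes, so the idea "one non-mark code point followed by any number of marks" is emulated below with unicodedata. This is a sketch, not PCRE's implementation: the helper names are made up, and a mark at the very start of the text simply opens its own sequence.

```python
import unicodedata

def is_mark(ch):
    """True for code points whose general category is M (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")

def extended_sequences(text):
    """Split text into sequences of one non-mark followed by its marks."""
    sequences = []
    for ch in text:
        if sequences and is_mark(ch):
            sequences[-1] += ch      # a mark is attached to its base
        else:
            sequences.append(ch)     # a non-mark starts a new sequence
    return sequences

print(len(extended_sequences("a\u0300")))   # two code points, one sequence
print(len(extended_sequences("\u00e0")))    # one code point, one sequence
```

Both spellings of à count as a single sequence, which is what makes this kind of matching independent of how the character was encoded.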




Unicode: Character properties

         Every Unicode code point has certain properties assigned
         Characters may be matched by these properties
         Escapes \p and \P are used for this:
               \p{xx}: All code points with the property xx
               \P{xx}: All code points without the property xx

         Possible properties:
               L: Letter
               M: Mark
               P: Punctuation
               Sc: Currency symbol
               ...
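The property codes listed above are Unicode general categories, which Python exposes through unicodedata.category. The sketch below emulates the two escapes with a prefix test, since Python's standard re module does not support them directly; the helper names and the sample string are assumptions made for this illustration.

```python
import unicodedata

def with_property(text, prop):
    """Code points whose general category starts with prop (the positive escape)."""
    return [ch for ch in text if unicodedata.category(ch).startswith(prop)]

def without_property(text, prop):
    """Code points whose general category does not start with prop (the negated escape)."""
    return [ch for ch in text if not unicodedata.category(ch).startswith(prop)]

sample = "\u00e0: 5\u20ac, $6"   # à: 5€, $6

print(with_property(sample, "L"))    # letters
print(with_property(sample, "P"))    # punctuation
print(with_property(sample, "Sc"))   # currency symbols
```

A one-letter code such as L covers all its subcategories (Lu, Ll, ...), while a two-letter code such as Sc selects exactly one of them.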




Pattern Recursion



         Recursion in regular expressions?
         Possible with PCRE

         Validate BB-Code using PCRE

                          [b]Hello [i]World[/i]![/b]




BB-Code Recursion Example

                          [b]Hello [i]World[/i]![/b]



         Recursive regular expression pattern

   (
       [^[]*
         \[(b|i)\]
           (?:[^[]+|(?R))
         \[/\1\]
       [^[]*
   )x
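Python's standard re module has no (?R), so the sketch below spells out what the recursive pattern does as a plain recursive function: optional text, an opening [b] or [i] tag, then either text or a nested element, the matching closing tag, and optional text again. The names match_element and is_valid_bbcode are made up for this illustration, and anchoring the check to the whole string is an assumption.

```python
import re

TEXT = re.compile(r"[^\[]*")          # any run of non-bracket characters
OPEN_TAG = re.compile(r"\[(b|i)\]")   # opening tag, group 1 holds the tag name

def match_element(source, pos=0):
    """Return the offset after one element starting at pos, or None."""
    pos = TEXT.match(source, pos).end()
    opening = OPEN_TAG.match(source, pos)
    if opening is None:
        return None
    closing = f"[/{opening.group(1)}]"       # plays the role of the backreference
    pos = opening.end()
    text_end = TEXT.match(source, pos).end()
    if text_end > pos and source.startswith(closing, text_end):
        pos = text_end                       # first alternative: plain text
    else:
        pos = match_element(source, pos)     # second alternative: the recursion
        if pos is None or not source.startswith(closing, pos):
            return None
    pos += len(closing)
    return TEXT.match(source, pos).end()

def is_valid_bbcode(source):
    """True if the whole string is one well-nested element."""
    return match_element(source) == len(source)

print(is_valid_bbcode("[b]Hello [i]World[/i]![/b]"))
print(is_valid_bbcode("[b]Hello [i]World[/b]![/i]"))
```

The first call succeeds and the second fails, because the closing tag must always match the tag opened at the same nesting level.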




Do NOT Parse Using Regular Expressions


         Even though this is possible, you do NOT want to do it
               It is not maintainable
               It is nearly impossible to find errors
               Extracting useful information (building an AST) is not possible

         Use regular expressions for
               Matching patterns (not recursive structures)
               Tokenizing strings
               Validating tightly restricted input values
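The division of labour recommended above can be sketched as follows: a regular expression only splits the input into tokens, while a small stack keeps track of the nesting and builds a tree. This is one possible arrangement, not the author's code; the function name parse_bbcode and the dictionary-based node layout are assumptions made for this illustration.

```python
import re

# One token is either an opening/closing [b] or [i] tag, or a run of plain text.
TOKEN = re.compile(r"\[(/?)(b|i)\]|([^\[]+)")

def parse_bbcode(source):
    """Return a nested tree of the input; raise ValueError on bad nesting."""
    root = {"tag": None, "children": []}
    stack = [root]
    pos = 0
    while pos < len(source):
        token = TOKEN.match(source, pos)
        if token is None:
            raise ValueError(f"unexpected character at offset {pos}")
        closing, tag, text = token.groups()
        if text is not None:
            stack[-1]["children"].append(text)
        elif not closing:
            node = {"tag": tag, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)               # descend into the new element
        else:
            if stack[-1]["tag"] != tag:
                raise ValueError(f"unexpected [/{tag}] at offset {pos}")
            stack.pop()                      # element closed, go back up
        pos = token.end()
    if len(stack) != 1:
        raise ValueError(f"unclosed [{stack[-1]['tag']}]")
    return root

print(parse_bbcode("[b]Hello [i]World[/i]![/b]"))
```

Unlike a single recursive pattern, this version reports where the nesting went wrong and leaves behind a structure that later code can walk.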




Thanks for listening


          Questions, comments or annotations?


         Slides: http://westhoffswelt.de/portfolio.htm
                Contact: Jakob Westhoff <jakob@php.net>
                          Twitter: @jakobwesthoff

      Please leave comments and vote at: http://joind.in/1620





SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Kürzlich hochgeladen (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Deeper Down Rabbit Hole Advanced Regex

  • 1. Deeper down the rabbit hole Advanced Regular Expressions Jakob Westhoff <jakob@php.net> @jakobwesthoff PHPBarcamp.at May 3, 2010 http://westhoffswelt.de jakob@westhoffswelt.de slide: 1 / 26
  • 2. About Me: Jakob Westhoff. PHP developer for several years; computer science student at the TU Dortmund; co-founder of the PHP Usergroup Dortmund; active in different Open Source projects.
  • 3. Asking the audience: Who already works with regular expressions? Regular expressions like this: /[a-zA-Z]+/ Or like this: (?P<image>(?:none|inherit)|(?:url\(\s*(?:'|")?(?:\\['")]|\\[^'")]|[^'")])*(?:'|")?\s*\)))
  • 6. Goals of this session: Learn advanced techniques to use in (PCRE) regular expressions: assertions, once-only subpatterns, conditional subpatterns, pattern recursion, ... Learn how to handle Unicode in your regular expressions.
  • 9. What Regular Expressions are... In theoretical computer science: they express regular languages, that is, languages which can be described by deterministic finite state automata (Type 3 grammars in the Chomsky hierarchy).
  • 12. What Regular Expressions are... In practical day-to-day usage: “[...] regular expressions provide concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.” (Wikipedia [1])
  • 17. Building Blocks of a Regular Expression: Basic structure of every regular expression: /[a-z]+/im. Delimiter (here /): a pair of equal characters of arbitrary choice (must be escaped inside the expression); may also be ( and ) in PCRE. Expression (here [a-z]+). Modifier (here im): a sequence of characters providing processing instructions.
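The delimiter/expression/modifier structure can be tried out directly with PHP's preg functions. The following is a minimal sketch; the subjects and variable names are made up for illustration, and the modifiers i (case-insensitive) and m (multiline) are taken from the slide's example.

```php
<?php
// The same expression written with three different delimiter pairs.
$subject = "Hello\nworld";

$withSlashes  = preg_match_all('/^[a-z]+$/im', $subject, $a);
$withHashes   = preg_match_all('#^[a-z]+$#im', $subject, $b);
$withBrackets = preg_match_all('(^[a-z]+$)im', $subject, $c);

var_dump($withSlashes, $withHashes, $withBrackets); // all three report 2 matches
print_r($a[0]);                                     // Hello, world

// A delimiter used inside the expression has to be escaped,
// which is a good reason to pick a delimiter the expression does not contain.
$escaped   = preg_match('/^\/usr\/bin$/', '/usr/bin');
$unescaped = preg_match('#^/usr/bin$#', '/usr/bin');
var_dump($escaped, $unescaped); // both 1
```

Choosing # or bracket-style delimiters is mostly a readability decision for patterns that are full of slashes, such as paths and URLs.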
  • 24. Getting everybody up to speed: ., .*, .+, .?, .{1,2} - arbitrary characters and repetitions; ^, $ - start and end of subject (or line in multiline mode); foo|bar - logical or; (foo)(bar) - subpattern grouping; /(foo|bar)baz(\1)/ - backreferences; [a-z], [^a-z] - character classes.
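Of these basics, the backreference is the one that most often surprises: \1 repeats the text the first group actually captured, not the group's pattern. A small sketch using the slide's pattern (the subjects are chosen for illustration):

```php
<?php
// \1 must repeat exactly what group 1 captured.
$pattern = '/(foo|bar)baz(\1)/';

$same = preg_match($pattern, 'foobazfoo', $m); // group 1 = foo, \1 = foo
$diff = preg_match($pattern, 'foobazbar');     // \1 would have to be foo again

var_dump($same); // int(1)
var_dump($diff); // int(0)
print_r($m);     // whole match, then "foo" twice
```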
  • 35. Grouping Without Subpattern Creation: Grouping might be needed without creating a subpattern: /(?:foobar)*/
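The difference shows up in the result array. This sketch compares the slide's non-capturing pattern with its capturing counterpart; the subject is an assumption made for the example.

```php
<?php
preg_match('/(foobar)*/', 'foobarfoobar', $capturing);
preg_match('/(?:foobar)*/', 'foobarfoobar', $nonCapturing);

print_r($capturing);    // whole match plus the last repetition of group 1
print_r($nonCapturing); // whole match only
```

Besides keeping the result array small, non-capturing groups keep the numbering of the remaining subpatterns stable when a group is added purely for repetition or alternation.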
  • 37. Subpattern identification: Subpatterns are numbered by their opening parenthesis. For /(foo(bar)(baz))/: 1 = foobarbaz, 2 = bar, 3 = baz. Matches available from within PHP: $matches = array(0 => "foobarbaz", 1 => "foobarbaz", 2 => "bar", 3 => "baz");
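The numbering can be checked by running the slide's pattern through preg_match; index 0 always holds the complete match.

```php
<?php
preg_match('/(foo(bar)(baz))/', 'foobarbaz', $matches);
print_r($matches);
// 0 => foobarbaz (whole match)
// 1 => foobarbaz (outer group, its parenthesis opens first)
// 2 => bar
// 3 => baz
```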
  • 43. Subpattern Naming: PCRE allows custom naming: /(?P<firstname>[A-Za-z]+) (?P<lastname>[A-Za-z]+)/ Result with input Jakob Westhoff: array(0 => 'Jakob Westhoff', 'firstname' => 'Jakob', 1 => 'Jakob', 'lastname' => 'Westhoff', 2 => 'Westhoff')
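A short sketch of how the named entries are typically consumed from PHP; the output formatting is only an illustration.

```php
<?php
$pattern = '/(?P<firstname>[A-Za-z]+) (?P<lastname>[A-Za-z]+)/';

if (preg_match($pattern, 'Jakob Westhoff', $matches) === 1) {
    // Named groups are available by name and by number.
    $formatted = sprintf('%s, %s', $matches['lastname'], $matches['firstname']);
    echo $formatted, "\n";
}
```

Accessing groups by name keeps the calling code valid when further groups are later inserted into the pattern.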
  • 45. Assertions: Formulate assertions on the subject string without consuming it. Example: /foo(?=foo)/ Input: foofoofoo. Match: the first and the second foo, since each is followed by another foo; the text inside the assertion is only looked at, not consumed.
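Because the assertion does not consume anything, the second foo can serve as the lookahead of the first match and still be matched itself. A sketch that records the offsets of all matches:

```php
<?php
$count = preg_match_all('/foo(?=foo)/', 'foofoofoo', $m, PREG_OFFSET_CAPTURE);

var_dump($count); // int(2)
foreach ($m[0] as [$text, $offset]) {
    echo "$text at offset $offset\n"; // offsets 0 and 3
}
```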
  • 54. Negative Assertions: Negative assertions are possible as well. foo not followed by another foo: /foo(?!foo)/
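Applied to the same input as before (an assumption, the slide gives no subject here), the negative assertion selects exactly the foo that the positive one skipped:

```php
<?php
$count = preg_match_all('/foo(?!foo)/', 'foofoofoo', $m, PREG_OFFSET_CAPTURE);

var_dump($count); // int(1)
print_r($m[0]);   // only the last foo, at offset 6
```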
  • 56. Backward Assertions: bar preceded by foo: /(?=foo)bar/? No: a forward assertion tests the very position where bar has to start, so this pattern can never match. Backward assertion: /(?<=foo)bar/. Negative backward assertion, bar not preceded by foo: /(?<!foo)bar/
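The three variants can be compared side by side. This is a sketch with made-up subjects; the patterns are the ones from the slide.

```php
<?php
// Forward assertion in front of bar: requires "foo" and "bar" at the same position.
$forward = preg_match('/(?=foo)bar/', 'foobar');

// Backward assertion: bar has to be preceded by foo.
$behind = preg_match('/(?<=foo)bar/', 'foobar', $m, PREG_OFFSET_CAPTURE);

// Negative backward assertion: bar must not be preceded by foo.
$notBehindFoo = preg_match('/(?<!foo)bar/', 'foobar');
$notBehindBaz = preg_match('/(?<!foo)bar/', 'bazbar');

var_dump($forward);      // int(0)
var_dump($behind);       // int(1), only "bar" is part of the match
var_dump($notBehindFoo); // int(0)
var_dump($notBehindBaz); // int(1)
```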
  • 60. Inner workings of the PCRE matcher: PCRE uses backtracking to find matches. Pattern: /\d+foo/ Subject: 123456789bar. (1) Eat up all the digits: 123456789. (2) Try to match foo. (3) Backtrack one digit and try to match foo again. (4) Repeat step 3 until a match is found or the subject's beginning is reached.
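The steps above can be replayed with a few lines of PHP. This is a deliberately simplified model, not PCRE's actual implementation: traceBacktracking is a hypothetical helper that only covers the attempt starting at offset 0, whereas a real matcher would afterwards retry from the following offsets and may apply optimizations that skip some of this work.

```php
<?php
// Hypothetical helper: replays the described steps for /\d+foo/ at offset 0 only.
function traceBacktracking(string $subject): array
{
    // Step 1: the greedy \d+ eats up all leading digits.
    $digits = strspn($subject, '0123456789');
    $log = [];

    // Steps 2-4: try "foo", then give back one digit at a time.
    // \d+ has to keep at least one digit.
    for ($kept = $digits; $kept >= 1; $kept--) {
        $matched = substr($subject, $kept, 3) === 'foo';
        $log[] = sprintf(
            '\d+ = "%s", foo %s',
            substr($subject, 0, $kept),
            $matched ? 'matches' : 'fails'
        );
        if ($matched) {
            break;
        }
    }

    return $log;
}

$log = traceBacktracking('123456789bar');
echo implode("\n", $log), "\n";
echo count($log), " attempts at offset 0\n";
```

For the slide's subject every one of the nine attempts fails, which is the wasted work the once-only subpattern is meant to avoid.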
  • 66. Once only subpattern: Once-only subpatterns prevent backtracking once a certain pattern has acquired a match. Applying a once-only pattern to the shown example: /(?>\d+)foo/. After matching the digits of 123456789bar and determining that the following string is not foo, the matcher gives up on this attempt instead of giving digits back one by one. Can massively improve regex speed if used correctly.
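A once-only subpattern can also change what a pattern matches, which is the part to watch when "used correctly". The following sketch uses a made-up pattern where the normal version only succeeds because of backtracking:

```php
<?php
$subject = '12345';

// Greedy \d+ first takes 12345, then gives back the 5 so the literal 5 can match.
$backtracking = preg_match('/\d+5/', $subject);

// The once-only group never gives a digit back, so no 5 is left to match.
$onceOnly = preg_match('/(?>\d+)5/', $subject);

var_dump($backtracking); // int(1)
var_dump($onceOnly);     // int(0)

// The slide's example keeps its meaning; only the failing case becomes cheaper.
var_dump(preg_match('/(?>\d+)foo/', '123foo'));       // int(1)
var_dump(preg_match('/(?>\d+)foo/', '123456789bar')); // int(0)
```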
  • 70. Conditional subpattern: If-statement equivalent in PCRE: /(?(condition)yes-pattern|no-pattern)/ Conditions can be references to a captured subpattern or assertions. If digits come next, they need to be followed by foo; everything else needs to be bar: /(?(?=\d)\d+foo|bar)/
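Both kinds of condition can be sketched as follows. The anchors ^ and $ and the subjects are additions for this example so that the results are easy to read; the second pattern expresses the same rule with a reference to capture group 1 instead of an assertion.

```php
<?php
// Condition is an assertion: "is the next character a digit?"
$byAssertion = '/^(?(?=\d)\d+foo|bar)$/';

// Condition is a group reference: "did group 1 take part in the match?"
$byReference = '/^(\d+)?(?(1)foo|bar)$/';

$results = [];
foreach (['123foo', 'bar', '123bar', 'foo'] as $subject) {
    $results[$subject] = [
        preg_match($byAssertion, $subject),
        preg_match($byReference, $subject),
    ];
    printf("%-7s => %d / %d\n", $subject, ...$results[$subject]);
}
```

Only 123foo and bar are accepted by either pattern: digits commit the match to the foo branch, everything else to the bar branch.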
  • 77. Unicode: Characters, code points and graphemes: Unicode consists of different code points. The letter a: U+0061. The combining grave accent: U+0300. One character might consist of multiple code points: the letter a with the grave accent (à) is U+0061 U+0300. Some of these combinations also exist as single code points: the letter à is U+00E0.
  • 84. Unicode: Pattern matching. Unicode processing is enabled using the u modifier. PCRE works on UTF-8 encoded strings, and each code point is handled as one character. Match a specific Unicode code point by its hexadecimal number: \x{FFFF} Remember the letter a with a grave accent (à): /\x{0061}\x{0300}/u
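As a small sketch of matching by code point: the pattern below only accepts the two code point form of the character. The variable names are illustrative, and the "\u{...}" string escape assumes PHP 7.0 or later.

```php
<?php
$decomposed = "a\u{0300}";  // U+0061 followed by U+0300
$precomposed = "\u{00E0}";  // single code point U+00E0

// Single quotes keep the backslashes, so PCRE itself interprets \x{...}.
// The lowercase u modifier switches the engine to UTF-8 mode.
$pattern = '/^\x{0061}\x{0300}$/u';

$matchesDecomposed = preg_match($pattern, $decomposed);   // 1
$matchesPrecomposed = preg_match($pattern, $precomposed); // 0
```

Note that the modifier is the lowercase u; the uppercase U is a different modifier that inverts quantifier greediness.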
  • 90. Unicode: Extended Unicode sequences. How to match both the single and the multi code point form of a character? Remember: à = U+0061 U+0300 or U+00E0. Use the escape for extended Unicode sequences: \X \X is equivalent to (?>\P{M}\p{M}*) Wait. What? → Unicode character properties
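A brief sketch of \X in PHP, assuming PHP 7.0 or later for the "\u{...}" escape; the sample strings are made up for illustration.

```php
<?php
$decomposed = "a\u{0300}";  // two code points
$precomposed = "\u{00E0}";  // one code point

// \X consumes a base character together with any following marks,
// so both representations count as exactly one "character".
$single = '/^\X$/u';

$decomposedIsOne = preg_match($single, $decomposed);   // 1
$precomposedIsOne = preg_match($single, $precomposed); // 1

// Counting user-perceived characters instead of code points.
$characters = preg_match_all('/\X/u', $decomposed . 'b' . $precomposed); // 3
```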
  • 95. Unicode: Character properties. Every Unicode code point has certain properties assigned, and characters may be matched by these properties. The escapes \p and \P are used for this: \p{xx} matches all code points with the property xx, \P{xx} matches all code points without the property xx. Possible properties: L: Letter, M: Mark, P: Punctuation, Sc: Currency symbol, ...
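The property escapes can be tried out with a sketch like the following. The sample strings and variable names are assumptions for illustration, and "\u{...}" again assumes PHP 7.0 or later.

```php
<?php
// Sc: currency symbols. U+20AC is the euro sign, U+0024 the dollar sign.
$prices = "5 \u{20AC} and 6 \u{0024}";
preg_match_all('/\p{Sc}/u', $prices, $currency);
// $currency[0] now holds the euro sign and the dollar sign

// L: letters, including non-ASCII ones such as U+00F6 and U+00DF.
$label = "Gr\u{00F6}\u{00DF}e: 42 cm";
preg_match_all('/\p{L}+/u', $label, $words);
// $words[0] now holds the two words, without ":" and the number

// M: marks. Removing them leaves only the base letter.
$base = preg_replace('/\p{M}/u', '', "a\u{0300}"); // "a"

// \P negates: anything that is not a letter.
$hasNonLetter = preg_match('/\P{L}/u', 'abc!');    // 1
```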
  • 99. Pattern Recursion. Recursion in regular expressions? Possible with PCRE. Example: validate BB-Code using PCRE: [b]Hello [i]World[/i]![/b]
  • 102. BB-Code Recursion Example. Input: [b]Hello [i]World[/i]![/b] Recursive regular expression pattern: ( [^[]* \[(b|i)\] (?:[^[]+|(?R)) \[/\1\] [^[]* )
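The following is a sketch of how such a recursive pattern might be used to validate a whole string in PHP. It is a variation on the slide's pattern, not the original: to allow the ^ and $ anchors, the tagged section is wrapped in its own group and the recursion targets that group with (?1) instead of the whole pattern with (?R). The tag name therefore moves to group 2. The x modifier makes PCRE ignore the whitespace used for readability.

```php
<?php
// Group 1: one tagged section, optionally surrounded by plain text.
// Group 2: the tag name, reused by the backreference \2 in the closing tag.
// (?1) recurses into group 1 to allow a nested tagged section.
$pattern = '~^( [^[]* \[(b|i)\] (?: [^[]+ | (?1) ) \[/\2\] [^[]* )$~x';

$nested = preg_match($pattern, '[b]Hello [i]World[/i]![/b]');    // 1
$simple = preg_match($pattern, '[b]bold[/b]');                   // 1
$misnested = preg_match($pattern, '[b]Hello [i]World[/b]![/i]'); // 0
```

Like the pattern on the slide, this sketch accepts either plain text or a single nested section between a pair of tags, not arbitrary mixtures of both.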
  • 108. Do NOT Parse Using Regular Expressions. Even though this is possible, you do NOT want to do it: it is not maintainable; it is nearly impossible to find errors; useful information extraction (building an AST) is not possible. Use regular expressions for: matching patterns (not recursive structures); tokenizing strings; validating strictly restricted input values.
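One way to follow this advice for the BB-Code case is to let the regular expression only tokenize and to track the nesting in ordinary code. The sketch below assumes PHP 7.1 or later; the function name isBalancedBbCode is hypothetical and the approach is one possible design, not the one from the talk.

```php
<?php
// Hypothetical helper: checks that [b] and [i] tags are properly nested.
function isBalancedBbCode(string $input): bool
{
    // Tokenizing: the pattern only finds tags, it knows nothing about nesting.
    preg_match_all('~\[(/?)(b|i)\]~', $input, $tokens, PREG_SET_ORDER);

    $stack = [];
    foreach ($tokens as [, $closing, $tag]) {
        if ($closing === '') {
            $stack[] = $tag;  // opening tag: remember it
        } elseif (array_pop($stack) !== $tag) {
            return false;     // closing tag does not match the last opened tag
        }
    }

    return $stack === [];     // every opened tag must have been closed
}

$nested = isBalancedBbCode('[b]Hello [i]World[/i]![/b]');    // true
$misnested = isBalancedBbCode('[b]Hello [i]World[/b]![/i]'); // false
```

Because the nesting logic lives in a plain loop, it can report which tag failed or build a tree, which the recursive pattern cannot do.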
  • 116. Thanks for listening. Questions, comments or annotations? Slides: http://westhoffswelt.de/portfolio.htm Contact: Jakob Westhoff <jakob@php.net> Twitter: @jakobwesthoff Please leave comments and vote at: http://joind.in/1620