Regular Expressions
  How not to turn one problem into two.




                       Carl Brown
                       CarlB@PDAgent.com
“Common Wisdom”

   “Some people, when confronted
   with a problem, think ‘I know, I'll
   use regular expressions.’ Now
      they have two problems.”



*See http://regex.info/blog/2006-09-15/247 for source.
What is a ‘Regular
  Expression’?
“...a concise and flexible means for
‘matching’ (specifying and recognizing) strings
of text, such as particular characters, words,
or patterns of characters” (So says Wikipedia)
“... a way of extracting substrings from text in a
‘usefully fuzzy’ way” (So says me)
...so for example?

Pull out the host from a URL string:
  http://([^/]*)/
Find the date in a string:
  ([0-9][0-9]*[-/][0-9][0-9]*[-/][0-9][0-9]*)
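A minimal Python sketch of the two patterns above. Python is used only because the deck mentions it later, and the sample strings are made up for illustration:

```python
import re

# Sample inputs invented for this illustration.
url = "http://www.example.com/some/page.html"
note = "Invoice sent on 2012-03-14 to the client"

# Host: everything after "http://" up to the next "/".
host = re.search(r"http://([^/]*)/", url).group(1)

# Date: three runs of digits separated by "-" or "/".
date = re.search(r"([0-9][0-9]*[-/][0-9][0-9]*[-/][0-9][0-9]*)", note).group(1)

print(host)
print(date)
```

The parentheses mark the part to pull out; everything outside them is just a landmark.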
But they’re a Pain to
        Use
       Aren’t they?
Two Kinds of (OOish)
    Languages
 Some languages, like Perl or Ruby, have
 Regex built into their strings, so they get used
 often.
 Most others, like Cocoa, Java, and Python, have
 Regular Expression Objects that are
 complicated and a Pain in the Ass
Ruby


string.sub(/pattern/, "replacement")
Cocoa (Apple)
+[NSRegularExpression regularExpressionWithPattern:(NSString *)
pattern options:(NSRegularExpressionOptions)options error:(NSError
**) error]

-[NSRegularExpression replaceMatchesInString:(NSMutableString *)
string options:(NSMatchingOptions)options range:(NSRange)range
withTemplate:(NSString *)template]


                    NSRegularExpressionOptions?

                        NSMatchingOptions?

                      Why do I need a Range?

                      What’s a template string?

                   Is it really worth it?
Cocoa (sane)


 #import "NSString+PDRegex.h"

 [string stringByReplacingRegexPattern:@"pattern"
 withString:@"replacement" caseInsensitive:NO];




*See https://github.com/carlbrown/RegexOnNSString/
Python
              (an aside)


import re

re.match("pattern", "a pattern")   # no match: match() only tries at the start of the string

re.search("pattern", "a pattern")  # matches fine: search() scans the whole string
But Regexes are
impossible to maintain...
         Aren’t they?
But what about?
    (?<!(=)|(="")|(='))(((http|ftp|
    https)://)|(www\.))+[\w]+(\.[\w]+)
    ([\w\-\.@?^=%&amp;:/~+#]*[\w\-@?
    ^=%&amp;/~+#])?(?!.*/a>)

        *That* guy has two problems

   Well, actually, he has n! problems, where
n is the number of hyperlinks in the input string
How to keep that from
happening (my advice)
 Limit yourself to only the basic metacharacters.
 Favor clarity over brevity.
 Take more, smaller bites.
 Beware of greedy matching.
The Basic Characters
       A Phrasebook
PhraseBook pt 1
^.*
 “the junk to the left of what I want”
 This breaks down as ^ (the beginning of the string)
 followed by .* any number of any character.
.*$
 “the junk to the right of what I want”
 This breaks down as any number of any character .*
 followed by $ (the end of the string)
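A short Python sketch of both phrases in use. The log line and the "ERROR" landmark are made up for illustration:

```python
import re

# Made-up input: a timestamp, a level, and a message.
line = "2012-03-14 10:22:01 ERROR disk full"

# "^.*" in front of a landmark removes the junk to the left of it.
right_part = re.sub(r"^.*ERROR ", "", line)

# ".*$" behind a landmark removes the junk to the right of it.
left_part = re.sub(r" ERROR.*$", "", line)

print(right_part)
print(left_part)
```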
PhraseBook pt 2
[0-9][0-9]*
 “a number with at least one digit”
 The brackets ([ and ]) mean “any one of the characters contained
 within the brackets”. So this means one character in the range 0-9 (so 0 1 2
 3 4 5 6 7 8 or 9) followed by zero or more characters from that same range.

[^A-Za-z]
 “any character that’s not a letter”
 The ^ as the first character inside the brackets means “not” so
 instead of meaning “any letter” it means “anything not a letter”.
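A quick Python sketch of both bracket phrases; the sample text is invented:

```python
import re

text = "Room 42, floor 7"  # made-up sample

# One digit followed by zero or more further digits.
numbers = re.findall(r"[0-9][0-9]*", text)

# Delete every character that is not an ASCII letter.
letters_only = re.sub(r"[^A-Za-z]", "", text)

print(numbers)
print(letters_only)
```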
PhraseBook pt 3
\.
 “a literal period” (e.g. to match the dot in .com)

\*
 “a literal * ” (e.g. to match an asterisk)

\( \) or \[ \]
 “literal parentheses/brackets” (in Cocoa, at least)

( …stuff… )
 “stuff I want to refer to later as $1” (in Cocoa, at least)
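A Python sketch of escaped versus unescaped characters. The file name is made up, and note one difference from the Cocoa phrasing above: Python's re.sub spells the first captured group \1 rather than $1.

```python
import re

filename = "report(final).v2.txt"  # made-up sample

# An escaped dot matches a real dot; a bare "." would match any character.
extension = re.search(r"\.([A-Za-z][A-Za-z]*)$", filename).group(1)

# Escaped parentheses match real ones; the unescaped pair captures.
label = re.search(r"\(([^)]*)\)", filename).group(1)

# Captured groups can be reused in the replacement (Python: \1, \2).
swapped = re.sub(r"^([^.]*)\.(.*)$", r"\2.\1", "left.right")

print(extension, label, swapped)
```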
PhraseBook pt 4
    There is no...


       Part 4
But what about?
* Cheat Sheet from http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
There is no...

       Part 4

Clarity > Brevity
 (Really true of any language)
Choose the clearest
      way:

[A-Za-z0-9_] instead of \w

[^A-Za-z0-9_] instead of \W
Choose the
     consistent way:
OSX:~$ grep '^root::*' /etc/passwd

         root:*:0:0:System Administrator:/var/root:/bin/sh

OSX:~$ grep '^root:+' /etc/passwd

OSX:~$


OSX:~$ grep '^root:.*' /etc/passwd

root:*:0:0:System Administrator:/var/root:/bin/sh

OSX:~$ grep '^root:.*?' /etc/passwd

OSX:~$
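A small Python sketch suggesting why the basic spelling costs nothing. Python's re module is just one example of an engine that does understand +, and the passwd line is copied from the grep output above. Where + is supported, both spellings match the same text:

```python
import re

line = "root:*:0:0:System Administrator:/var/root:/bin/sh"

portable = re.search(r"^root::*", line)   # basic spelling: one ":" then zero or more
shorthand = re.search(r"^root:+", line)   # shorthand: needs an engine that knows "+"

same = portable.group(0) == shorthand.group(0)
print(portable.group(0), same)
```

So sticking to the basic form loses nothing here, and it keeps working in tools whose default syntax treats + and ? as ordinary characters.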
Except when you
      can’t

http://\([^/][^/]*\)/ => \1    (POSIX/sed)

http://([^/][^/]*)/ => $1      (perl/cocoa)
Take Smaller Bites
The less you do at a time, the safer each step is
Which is clearer?

NSString *domainName = [myHTMLString
stringByReplacingRegexPattern:
@"^.*href=[\"']http://(.*)/.*$"
withString:@"$1" caseInsensitive:YES];
Which is clearer?
   NSString *leftRemoved = [myHTMLString
   stringByReplacingRegexPattern: @"^.*href=[\"']"
   withString:@"" caseInsensitive:YES];

   NSString *myURL = [leftRemoved
   stringByReplacingRegexPattern: @"[\"'].*$" withString:@""
   caseInsensitive:NO];
   NSString *hostAndPath = [myURL
   stringByReplacingRegexPattern: @"^.*http://"
   withString:@"" caseInsensitive:YES];

   NSString *domainName = [hostAndPath
   stringByReplacingRegexPattern: @"/.*$" withString:@""
   caseInsensitive:NO];

Bonus: This one can be stepped through with the debugger :-)
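The same four small bites, sketched in Python for readers without a Cocoa project handy. The HTML string and the variable names are made up for illustration; the patterns are the ones from the slide:

```python
import re

# Made-up input for illustration.
html = '<p>See <a href="http://www.example.com/docs/index.html">the docs</a></p>'

# Step 1: drop everything up to and including href=" (or href=').
left_removed = re.sub(r"^.*href=[\"']", "", html, flags=re.IGNORECASE)

# Step 2: drop the closing quote and everything after it.
my_url = re.sub(r"[\"'].*$", "", left_removed)

# Step 3: drop the scheme.
host_and_path = re.sub(r"^.*http://", "", my_url, flags=re.IGNORECASE)

# Step 4: drop the first "/" and everything after it.
domain_name = re.sub(r"/.*$", "", host_and_path)

# Every intermediate value can be inspected on its own.
for step in (left_removed, my_url, host_and_path, domain_name):
    print(step)
```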
But isn’t that slower?


 Yes.
 But it doesn’t matter how fast you get the
 wrong answer.
Beware Greedy
     Matching
Remember this?
  NSString *domainName = [myHTMLString
  stringByReplacingRegexPattern:
  @"^.*href=[\"']http://(.*)/.*$" withString:@"$1"
  caseInsensitive:YES];

What does it do if given:
  <a href="http://1.example.com/">This is a link</a>
  but <a href="http://2.example.com/">This is a
  link, too.</a>
What you meant was:

 After ‘http://’ up to but not including the next ‘/’
 Which is:

   http://([^/][^/]*)/
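A Python sketch comparing the greedy one-liner with the bounded pattern on the two-link input from the earlier slide. This is a port for illustration, not the Cocoa code itself; Python writes the group reference as \1 instead of $1:

```python
import re

html = ('<a href="http://1.example.com/">This is a link</a> but '
        '<a href="http://2.example.com/">This is a link, too.</a>')

# Greedy: "^.*" runs to the LAST href, and "(.*)" runs to the LAST "/".
greedy = re.sub(r"^.*href=[\"']http://(.*)/.*$", r"\1", html,
                flags=re.IGNORECASE)

# Bounded: one or more non-slash characters, so it stops at the next "/".
bounded = re.search(r"http://([^/][^/]*)/", html).group(1)

print(repr(greedy))
print(repr(bounded))
```

The greedy version returns the second host plus trailing markup, because the last "/" in the input is the one in the closing tag. The bounded version returns just the first host.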
Remember this?
    (?<!(=)|(="")|(='))(((http|ftp|
    https)://)|(www\.))+[\w]+(\.[\w]+)
    ([\w\-\.@?^=%&amp;:/~+#]*[\w\-@?
    ^=%&amp;/~+#])?(?!.*/a>)



   Well, actually, he has n! problems, where
n is the number of hyperlinks in the input string
So if you had
<p>Today’s Links:</p>
<UL>
    <LI><A HREF="http://example.com/1">Link 1</A></LI>
    <LI><A HREF="http://example.com/2">Link 2</A></LI>
    <LI><A HREF="http://example.com/3">Link 3</A></LI>
    <LI><A HREF="http://example.com/4">Link 4</A></LI>
    <LI><A HREF="http://example.com/5">Link 5</A></LI>
    <LI><A HREF="http://example.com/6">Link 6</A></LI>
</UL>
And tried to use:
(?<!(=)|(="")|(='))(((http|ftp|
https)://)|(www\.))+[\w]+(\.[\w]+)
([\w\-\.@?^=%&amp;:/~+#]*[\w\-@?
^=%&amp;/~+#])?(?!.*/a>)
It would have to:
<p>Today’s Links:</p>
<UL>
    <LI><A HREF="http://example.com/1">Link 1</A></LI>
    <LI><A HREF="http://example.com/2">Link 2</A></LI>
    <LI><A HREF="http://example.com/3">Link 3</A></LI>
    <LI><A HREF="http://example.com/4">Link 4</A></LI>
    <LI><A HREF="http://example.com/5">Link 5</A></LI>
    <LI><A HREF="http://example.com/6">Link 6</A></LI>
</UL>
But what are they
    good for?
Encoding/decoding metadata from image file
names.
Renaming files on the command line (@2x?)
Grabbing the user’s first name from a Full
Name string (careful of Locales*)
Stripping crap I don’t want out of user input
(trailing spaces, anyone?)
//.*\[.* *release *\] *;

*See http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
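A few of these everyday uses, sketched in Python. The file names and the "<name>_<width>x<height>.png" naming scheme are hypothetical, invented only for this example:

```python
import re

# Strip trailing spaces from user input.
cleaned = re.sub(r" *$", "", "Carl Brown   ")

# Derive the @2x name for an image file.
retina = re.sub(r"\.png$", "@2x.png", "icon.png")

# Decode metadata from a hypothetical "<name>_<width>x<height>.png" scheme.
m = re.search(r"^([^_]*)_([0-9][0-9]*)x([0-9][0-9]*)\.png$", "banner_320x50.png")
name, width, height = m.group(1), int(m.group(2)), int(m.group(3))

print(repr(cleaned), retina, name, width, height)
```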
Questions?
      CarlB@PDAgent.com

        @CarlAllenBrown

 www.escortmissions.com (Blog)

  www.PDAgent.com (Company)

   https://github.com/carlbrown

http://www.slideshare.net/carlbrown

Using Regular Expressions and Staying Sane

  • 1. Regular Expressions How not to turn one problem into two. Carl Brown CarlB@PDAgent.com
  • 2. “Common Wisdom” “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” *See http://regex.info/blog/2006-09-15/247 for source.
  • 3. What is a ‘Regular Expression’? “...a concise and flexible means for ‘matching’ (specifying and recognizing) strings of text, such as particular characters, words, or patterns of characters” (So says Wikipedia) “... a way of extracting substrings from text in a ‘usefully fuzzy’ way” (So says me)
  • 4. ...so for example? Pull out the host from a URL string: http://([^/]*)/ find the date in a string ([0-9][0-9]*[-/][0-9][0-9]*[-/][0-9][0-9]*)
  • 5. But they’re a Pain to Use Aren’t they?
  • 6. Two Kinds of (OOish) Languages Some languages, like perl or ruby, have Regex built into their strings, so they get used often. Most others, like Cocoa, Java, and Python, have Regular Expression Objects that are complicated and a Pain in the Ass.
  • 8. Cocoa (Apple) +[NSRegularExpression regularExpressionWithPattern:(NSString *) pattern options:(NSRegularExpressionOptions)options error:(NSError **) error] -[NSRegularExpression replaceMatchesInString:(NSMutableString *) string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)template]
  • 9. Cocoa (Apple) +[NSRegularExpression regularExpressionWithPattern:(NSString *) pattern options:(NSRegularExpressionOptions)options error:(NSError **) error] -[NSRegularExpression replaceMatchesInString:(NSMutableString *) string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)template] NSRegularExpressionOptions? NSMatchingOptions? Why do I need a Range? What’s a template string?
  • 10. Cocoa (Apple) +[NSRegularExpression regularExpressionWithPattern:(NSString *) pattern options:(NSRegularExpressionOptions)options error:(NSError **) error] -[NSRegularExpression replaceMatchesInString:(NSMutableString *) string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)template] NSRegularExpressionOptions? NSMatchingOptions? Why do I need a Range? What’s a template string? Is it really worth it?
  • 11. Cocoa (sane) #import "NSString+PDRegex.h" [string stringByReplacingRegexPattern:@"pattern" withString:@"replacement" caseInsensitive:NO]; *See https://github.com/carlbrown/RegexOnNSString/
  • 12. Python (an aside) import re re.match("pattern", "a pattern") # no match re.search("pattern", "a pattern") # matches fine
  • 13. But Regex’s are impossible to maintain... Aren’t they?
  • 16. But what about? (?<!(=)|(="")|(='))(((http|ftp|https)://)|(www\.))+[\w]+(\.[\w]+)([\w-.@?^=%&amp;:/~+#]*[\w-@?^=%&amp;/~+#])?(?!.*/a>) *That* guy has two problems. Well, actually, he has n! problems, where n is the number of hyperlinks in the input string.
  • 17. How to keep that from happening (my advice) Limit yourself to only the basic meta-characters. Favor clarity over brevity. Take more, smaller bites. Beware of greedy matching.
  • 18. The Basic Characters A Phrasebook
  • 20. PhraseBook pt 1 ^.* “the junk to the left of what I want” This breaks down as ^ (the beginning of the string) followed by .* any number of any character.
  • 21. PhraseBook pt 1 ^.* “the junk to the left of what I want” This breaks down as ^ (the beginning of the string) followed by .* any number of any character. .*$ “the junk to the right of what I want” This breaks down as any number of any character .* followed by $ (the end of the string)
  • 22. PhraseBook pt 2 [0-9][0-9]* “a number with at least one digit” The brackets ([ and ]) mean “any of the characters contained within the brackets”. So this means 1 character of 0-9 (so 0 1 2 3 4 5 6 7 8 or 9) followed by zero or more characters from that same set.
  • 23. PhraseBook pt 2 [0-9][0-9]* “a number with at least one digit” The brackets ([ and ]) mean “any of the characters contained within the brackets”. So this means 1 character of 0-9 (so 0 1 2 3 4 5 6 7 8 or 9) followed by zero or more characters from that same set. [^A-Za-z] “any character that’s not a letter” The ^ as the first character inside the brackets means “not” so instead of meaning “any letter” it means “anything not a letter”.
  • 24. PhraseBook pt 3 \. “a literal period” (e.g. to match the dot in .com)
  • 25. PhraseBook pt 3 \. “a literal period” (e.g. to match the dot in .com) \* “a literal * ” (e.g. to match an asterisk)
  • 26. PhraseBook pt 3 \. “a literal period” (e.g. to match the dot in .com) \* “a literal * ” (e.g. to match an asterisk) \( \) or \[ \] “literal parentheses/brackets” (in Cocoa, at least)
  • 27. PhraseBook pt 3 \. “a literal period” (e.g. to match the dot in .com) \* “a literal * ” (e.g. to match an asterisk) \( \) or \[ \] “literal parentheses/brackets” (in Cocoa, at least) ( …stuff… ) “stuff I want to refer to later as $1” (in Cocoa, at least)
  • 29. PhraseBook pt 4 There is no... Part 4
  • 30. But what about? * Cheat Sheet from http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/
  • 32. There is no... Part 4 But what about?
  • 33. Clarity > Brevity (Really true of any language)
  • 34. Choose the clearest way: [A-Za-z0-9_] instead of \w [^A-Za-z0-9_] instead of \W
  • 36. Choose the consistent way: OSX:~$ grep '^root::*' /etc/passwd root:*:0:0:System Administrator:/var/root:/bin/sh OSX:~$ grep '^root:+' /etc/passwd OSX:~$
  • 37. Choose the consistent way: OSX:~$ grep '^root::*' /etc/passwd root:*:0:0:System Administrator:/var/root:/bin/sh OSX:~$ grep '^root:+' /etc/passwd OSX:~$ OSX:~$ grep '^root:.*' /etc/passwd root:*:0:0:System Administrator:/var/root:/bin/sh OSX:~$ grep '^root:.*?' /etc/passwd OSX:~$
  • 38. Except when you can’t http://\([^/][^/]*\)/ => \1 (POSIX/sed) http://([^/][^/]*)/ => $1 (perl/cocoa)
  • 39. Take Smaller Bites The less you do at a time, the safer each step is
  • 40. Which is clearer? NSString *domainName = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']http://(.*)/.*$" withString:@"$1" caseInsensitive:YES];
  • 41. Which is clearer? NSString *leftRemoved = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']" withString:@"" caseInsensitive:YES]; NSString *myURL = [leftRemoved stringByReplacingRegexPattern: @"[\"'].*$" withString:@"" caseInsensitive:NO]; NSString *hostAndPath = [myURL stringByReplacingRegexPattern: @"^.*http://" withString:@"" caseInsensitive:YES]; NSString *domainName = [hostAndPath stringByReplacingRegexPattern: @"/.*$" withString:@"" caseInsensitive:NO]; Bonus: This one can be stepped through with the debugger :-)
  • 42. But isn’t that slower? Yes.
  • 43. But isn’t that slower? Yes. But it doesn’t matter how fast you get the wrong answer.
  • 44. Beware Greedy Matching Remember this? NSString *domainName = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']http://(.*)/.*$" withString:@"$1" caseInsensitive:YES];
  • 45. Beware Greedy Matching Remember this? NSString *domainName = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']http://(.*)/.*$" withString:@"$1" caseInsensitive:YES]; What does it do if given: <a href="http://1.example.com/">This is a link</a> but <a href="http://2.example.com/">This is a link, too.</a>
  • 46. Beware Greedy Matching Remember this? NSString *domainName = [myHTMLString stringByReplacingRegexPattern: @"^.*href=[\"']http://(.*)/.*$" withString:@"$1" caseInsensitive:YES]; What does it do if given: <a href="http://1.example.com/">This is a link</a> but <a href="http://2.example.com/">This is a link, too.</a>
  • 47. What you meant was: After ‘http://’ up to but not including the next ‘/’
  • 48. What you meant was: After ‘http://’ up to but not including the next ‘/’ Which is: http://([^/][^/]*)/
  • 49. Remember this? (?<!(=)|(="")|(='))(((http|ftp|https)://)|(www\.))+[\w]+(\.[\w]+)([\w-.@?^=%&amp;:/~+#]*[\w-@?^=%&amp;/~+#])?(?!.*/a>) Well, actually, he has n! problems, where n is the number of hyperlinks in the input string.
  • 50. So if you had <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 51. And tried to use: (?<!(=)|(="")|(='))(((http|ftp|https)://)|(www\.))+[\w]+(\.[\w]+)([\w-.@?^=%&amp;:/~+#]*[\w-@?^=%&amp;/~+#])?(?!.*/a>)
  • 52. It would have to: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 53. And then: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 54. And then: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 55. And then: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 56. And then: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 57. And then: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 58. And then: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 59. And so on: <p>Today’s Links:</p> <UL> <LI><A HREF=”http://example.com/1”>Link 1</A></LI> <LI><A HREF=”http://example.com/2”>Link 2</A></LI> <LI><A HREF=”http://example.com/3”>Link 3</A></LI> <LI><A HREF=”http://example.com/4”>Link 4</A></LI> <LI><A HREF=”http://example.com/5”>Link 5</A></LI> <LI><A HREF=”http://example.com/6”>Link 6</A></LI> </UL>
  • 60. But what are they good for? Encoding/decoding metadata from image file names.
  • 61. But what are they good for? Encoding/decoding metadata from image file names. Renaming files on the command line (@2x?)
  • 62. But what are they good for? Encoding/decoding metadata from image file names. Renaming files on the command line (@2x?) Grabbing the user’s first name from a Full Name string (careful of Locales*) *See http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
  • 63. But what are they good for? Encoding/decoding metadata from image file names. Renaming files on the command line (@2x?) Grabbing the user’s first name from a Full Name string (careful of Locales) Stripping crap I don’t want out of user input (trailing spaces, anyone?)
  • 64. But what are they good for? Encoding/decoding metadata from image file names. Renaming files on the command line (@2x?) Grabbing the user’s first name from a Full Name string (careful of Locales) Stripping crap I don’t want out of user input (trailing spaces, anyone?) //.*\[.* *release *\] *;
  • 65. Questions? CarlB@PDAgent.com @CarlAllenBrown www.escortmissions.com (Blog) www.PDAgent.com (Company) https://github.com/carlbrown http://www.slideshare.net/carlbrown
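The greedy-matching trap from slides 44 to 48 can be sketched in a few lines. This is a minimal illustration in Python rather than the deck's Cocoa category (a choice made only so it runs anywhere); the input is the two-link string from slide 45 with straight quotes, and the variable names are hypothetical.

```python
import re

# Two links on one line, as on slide 45.
html = ('<a href="http://1.example.com/">This is a link</a> but '
        '<a href="http://2.example.com/">This is a link, too.</a>')

# Greedy: (.*) runs as far right as it can, then backs up only until
# a "/" is found, so the capture ends at the LAST slash in the string.
greedy = re.search(r'http://(.*)/', html)

# What was meant: after "http://", up to but not including the next "/".
bounded = re.search(r'http://([^/][^/]*)/', html)

print(greedy.group(1))   # first host plus the HTML between the links
print(bounded.group(1))  # prints 1.example.com

# The bounded pattern also works for pulling out every host at once.
all_hosts = re.findall(r'http://([^/][^/]*)/', html)
print(all_hosts)         # prints ['1.example.com', '2.example.com']
```

The negated character class needs no engine-specific features, which is why it fits the "stick to the basics" advice better than a non-greedy `.*?`.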

Editor's Notes

  1. This is not a talk about every possible thing you can do with regular expressions. In fact, it’s exactly the opposite. This is about how to do a useful thing and do it without going crazy.
  2.
  3. So before I get too far, how many of you know what a regular expression is? How many have used them before? How many feel comfortable with them?
  4. So here’s a quick example, just so those of you who haven’t touched them have an idea what I’m talking about between now and when we dig into examples later on.
  5. Well, it depends...you see...
  6. I’m saying OOish because I have issues with perl’s OO, but that’s another talk. I went from Basic to Pascal to C to perl (to C++ to Lisp to Java to Ruby to Objective-C). I started learning perl in 1989 or so, and it was exactly what I needed at the time - it was a language that was really good at exactly what C made very painful: String handling. I have better alternatives than perl now, but it taught me regex’s.
  7. This is an example of a usage in a language where a Regex is a first-class citizen.
  8. This is a WTF. And it brings to mind a bunch of questions...
  9. and the most often asked question in Cocoa Regex...
  10.
  11. This is better (but you have to do the #import).
  12. re.match in python implicitly anchors you to the beginning of a string. This is hideous.
  13. Well, I’d say no. I use them all the time.
  14. This is an actual regex I found in a program I was once asked to find the performance problem in.
  15. This is unmaintainable, and worse...
  16. We’ll come back to this one later
  17.
  18.
  19. Let me do a quick phrasebook first.
  20. Let me do a quick phrasebook first.
  21. You can (and should) put whatever characters you are looking for in square brackets. If you omit the first [0-9] you might match nothing. Likewise, in the second part [^0-9] means “anything that isn’t a number”.
  22. You can (and should) put whatever characters you are looking for in square brackets. If you omit the first [0-9] you might match nothing. Likewise, in the second part [^0-9] means “anything that isn’t a number”.
  23. Anything else that you see that’s special (like ‘^’ or ‘\\’) gets matched with a ‘\’ in front of it, too.
  24. Anything else that you see that’s special (like ‘^’ or ‘\\’) gets matched with a ‘\’ in front of it, too.
  25. Anything else that you see that’s special (like ‘^’ or ‘\\’) gets matched with a ‘\’ in front of it, too.
  26. Anything else that you see that’s special (like ‘^’ or ‘\\’) gets matched with a ‘\’ in front of it, too.
  27. I mean it, I’m done.
  28. But there’s all these other characters...
  29.
  30.
  31. Can you tell the difference between ‘w’ and ‘W’ every time, without looking? Can you promise you’ll never get confused about whether ‘w’ means ‘word’ or ‘whitespace’?
  32. Maximize the utility of your investment. There is a ‘+’ operator that *sometimes* means “one or more”, like ::*. + works in Cocoa, but not in grep. If you stick to the ones that are the same everywhere, you will get more use out of it and be less confused. Same with .*? to handle greedy matching.
  33. Maximize the utility of your investment. There is a ‘+’ operator that *sometimes* means “one or more”, like ::*. + works in Cocoa, but not in grep. If you stick to the ones that are the same everywhere, you will get more use out of it and be less confused. Same with .*? to handle greedy matching.
  34.
  35.
  36.
  37. Note - regex’s don’t parse HTML/XML “correctly” so be careful
  38.
  39.
  40. You get the HTML between the links, don’t you?
  41. You get the HTML between the links, don’t you?
  42. You get the HTML between the links, don’t you?
  43. Although you can use .*? at least on some platforms
  44. Although you can use .*? at least on some platforms
  45. This code was used in production on a project I was asked to consult on in a Content Management System (of sorts) to detect links that should be clickable on a web page, but weren’t, and make them clickable.
  46. And the customer fed that Content Management System a big list of links
  47. Note it’s looking at http followed by :// followed by stuff, then anything, then /A.
  48. The regex library grabs the longest string it can, first, to see if that’s a match (because it’s supposed to be greedy)
  49. then, when that doesn’t match, the next longest string
  50. and so on
  51.
  52.
  53. and then, when it’s exhausted the shortest string for that beginning match,
  54. It does it again for the next beginning match it finds
  55. and so on there. BAD IDEA.
  56. When I’m doing Core Data on the iPhone, the images go in a directory (NEVER in the DB!!), and I put info I might need (like when I should refresh it) in the image name, so I can do maintenance without having to ask the DB.
  57.
  58.
  59. And coming up next, my current favorite to use in Xcode’s search project box...
  60. Which, of course, means the price just went up 25%. Once you get comfortable with them, you start to see chances to use them everywhere.
  61.
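The "take smaller bites" advice from slides 39 to 41 translates to other regex libraries as well. Below is a rough Python sketch of the same four steps using `re.sub` from the standard library; the function name `host_from_anchor` and the sample input are hypothetical, and the sketch assumes a single link on a single line (as note 37 warns, regexes do not really parse HTML).

```python
import re

def host_from_anchor(html: str) -> str:
    """Pull the host out of one <a href="http://..."> tag, one small step at a time."""
    # Step 1: remove the junk to the left, through href= and the opening quote.
    left_removed = re.sub(r'^.*href=["\']', '', html, flags=re.IGNORECASE)
    # Step 2: remove the closing quote and the junk to the right of it.
    url = re.sub(r'["\'].*$', '', left_removed)
    # Step 3: remove the scheme.
    host_and_path = re.sub(r'^.*http://', '', url, flags=re.IGNORECASE)
    # Step 4: remove the first slash and everything after it.
    return re.sub(r'/.*$', '', host_and_path)

print(host_from_anchor('<a href="http://example.com/1">Link 1</a>'))  # prints example.com
```

Each intermediate value has a name, so a breakpoint or a print after any step shows exactly which bite went wrong.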