hacker 102
 code4lib 2010 preconference
Asheville, NC, USA 2010-02-21
iv. regular expressions

JavaScript
if all language looked like “aabaaaabbbabaababa”, it’d be easy to parse
parsing “aabaaaabbbabaababa”
  •   there are two elements, “a” and “b”
  •   either may occur in any order, any number of times
  •   so one regex covers it: /([ab]+)/
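The same idea in Python, the language of the starter code later on. Python has no /…/ literals, so the pattern goes in a raw string passed to the standard re module. A small sketch; the second and third test strings are made up for illustration.

```python
import re

s = "aabaaaabbbabaababa"

# /([ab]+)/ in Python: one or more 'a's or 'b's, remembered as group 1
m = re.search(r'([ab]+)', s)
print(m.group(1))  # the whole string: every character is an 'a' or a 'b'

# fullmatch succeeds only if the *entire* string fits the pattern
print(bool(re.fullmatch(r'[ab]+', s)))         # True
print(bool(re.fullmatch(r'[ab]+', "aabcab")))  # False: 'c' is not in [ab]

# search on a mixed string finds the first run of a's and b's
print(re.search(r'([ab]+)', "xxabbay").group(1))  # abba
```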
• [] denotes a character “class”: any one of the characters inside
• // delimits a regex literal (in JavaScript)
• + denotes “one or more of the previous thing”
• () denotes “remember this matched group” (a capturing group)
• /[ab]/ # an ‘a’ or a ‘b’
• /[ab]+/ # one or more ‘a’s or ‘b’s
• /([ab]+)/ # a remembered group of one or more ‘a’s or ‘b’s
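A quick way to see the difference between the three patterns above, again in Python (the test string "aab" is arbitrary): all of them match, but only the parenthesized one remembers a group.

```python
import re

text = "aab"

# /[ab]/ matches a single character: the first 'a'
print(re.match(r'[ab]', text).group())      # a
# /[ab]+/ keeps going while characters are 'a' or 'b'
print(re.match(r'[ab]+', text).group())     # aab
# without parentheses nothing is remembered...
print(re.match(r'[ab]+', text).groups())    # ()
# ...with parentheses the matched text is kept as group 1
print(re.match(r'([ab]+)', text).groups())  # ('aab',)
```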
to Firebug!
• [a-z] is any lowercase character from a to z
• [0-9] is any digit
• + is one or more of the previous thing
• ? is zero or one of the previous thing
• | is “or”, e.g. a|b is ‘a’ or ‘b’ (inside brackets | is literal: [a|b] matches ‘a’, ‘|’, or ‘b’)
• * is zero or more of the previous thing
• . matches any character (except a newline, by default)
• [^a-z] is anything *but* [a-z]
• [a-zA-Z0-9] is any of a-z, A-Z, 0-9
• {5} matches exactly 5 of the preceding thing
• {2,} matches at least 2 of the preceding thing
• {2,6} matches from 2 to 6 of the preceding thing
• \d is like [0-9] (any digit)
• \S is any non-whitespace character
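Each rule in the list above can be checked in a line or two. This Python sketch wraps re.search in a small helper, first_match (a name made up for this example); the comment next to each pair is the substring the pattern finds. Note the [a|b] row: inside brackets the | is just another character.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

examples = [
    (r'[a-z]+', 'ABC def'),           # def
    (r'[0-9]+', 'Vol. 70'),           # 70
    (r'colou?r', 'color'),            # color   (? : the 'u' is optional)
    (r'cat|dog', 'hotdog'),           # dog     (| : alternation)
    (r'[a|b]', 'x|y'),                # |       (inside [], '|' is literal)
    (r'ab*', 'abbbc'),                # abbb    (* : zero or more b's)
    (r'a.c', 'a-c'),                  # a-c     (. : any char but newline)
    (r'[^a-z]+', 'abc123def'),        # 123
    (r'[a-zA-Z0-9]+', '  Hi5! '),     # Hi5
    (r'[0-9]{4}', 'Vol. 95 (2009)'),  # 2009
    (r'b{2,}', 'abaabbbab'),          # bbb
    (r'a{2,6}', 'a' * 8),             # aaaaaa  (greedy, but stops at 6)
    (r'\d+', 'v. 88 (2002)'),         # 88
    (r'\S+', '   TITLE:'),            # TITLE:
]

for pattern, text in examples:
    print(f'{pattern:15} {text!r:20} -> {first_match(pattern, text)!r}')
```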
try this

• visit any web page
• open the Firebug console (or any browser’s developer console)
• title = window.document.title
• try regexes to match parts of the title
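In the browser you would try these with JavaScript’s String match method on title. The same patterns in Python look like this; the title string is a hypothetical stand-in for window.document.title.

```python
import re

# Hypothetical stand-in for window.document.title; any title string works.
title = "code4lib 2010 preconference, Asheville, NC (2010-02-21)"

# every run of four digits in a row
print(re.findall(r'\d{4}', title))        # ['2010', '2010']
# a capital letter followed by one or more lowercase letters
print(re.findall(r'[A-Z][a-z]+', title))  # ['Asheville']
# an ISO-style date, with year, month and day remembered as groups
m = re.search(r'(\d{4})-(\d{2})-(\d{2})', title)
print(m.groups())                         # ('2010', '02', '21')
```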
almost every language has regex support
try Unix “grep”
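grep prints every line of a file that matches a pattern, e.g. grep -E '^[A-Z .]+:' journals-carol-bean.txt. A rough Python analogue, as a sketch: grep() here is a made-up helper, and Python’s regex dialect is close to, but not identical to, grep’s extended syntax.

```python
import re

def grep(pattern, lines):
    """Yield each line that contains a match for pattern (roughly what grep does)."""
    regex = re.compile(pattern)  # compile once, reuse for every line
    for line in lines:
        if regex.search(line):   # a match anywhere in the line counts
            yield line

# a few lines from Carol's data, standing in for a file
sample = [
    "TITLE: ABA journal.",
    "CURRENT VOL.: Vol. 95 (2009) -",
    "(Bound and on Hein)",
]

for line in grep(r'^[A-Z .]+:', sample):
    print(line)
```

Because grep() accepts any iterable of lines, an open file works too; file lines keep their trailing newline, so print them with end=''.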
v. glue it together

Python
problem: Carol’s data
TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
CURRENT VOL.: Vol. 95 (2009) -
OTHER LIBRARIES:
      Miami:v. 68 (1982) -
      USDC: v. 88 (2002) -
      Birm.:v. 89 (2003) -
(Formerly: American Bar Association Journal)
(Bound and on Hein)


TITLE: Administrative law review.
BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008)
CURRENT VOL.: Vol. 61 (2009) -
(Bound and on Hein)
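Most of the useful content in these records is volume/year pairs. One possible pattern for them, as a sketch: it assumes every pair looks like “Vol. 70 (1984)” or “v. 68 (1982)”, as in the sample above. (?:…) groups the Vol/v spellings without remembering them, so findall returns only (volume, year) pairs.

```python
import re

# "Vol. 70 (1984)" or "v. 68 (1982)": (?:Vol|v) groups the two spellings
# without remembering them; (\d+) is the volume, ([\d/]+) the year(s)
re_vol = re.compile(r'(?:Vol|v)\. (\d+) \(([\d/]+)\)')

lines = [
    "BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)",
    "BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008)",
    "CURRENT VOL.: Vol. 95 (2009) -",
    "Miami:v. 68 (1982) -",
]

for line in lines:
    # findall returns one (volume, year) tuple per match
    print(re_vol.findall(line))
```

The first line gives [('70', '1984'), ('94', '2008')]; an open range such as “Vol. 95 (2009) -” simply yields one pair, and the all-caps “VOL.” in the tag is not mistaken for a volume because the pattern is case-sensitive.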
starter code for you
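#!/usr/bin/env python3
import re

# a "tag" is a run of capitals, spaces, and periods ending in a colon,
# e.g. "TITLE:", "BD. HOLDINGS:", "CURRENT VOL.:" (and also "USDC:")
re_tag = re.compile(r'([A-Z .]+):')
# the title is everything after "TITLE: "
re_title = re.compile(r'TITLE: (.*)')

with open('journals-carol-bean.txt') as f:
    for line in f:
        line = line.strip()
        if line == "":
            continue  # skip blank lines
        m1 = re_tag.match(line)  # match() only looks at the start of the line
        m2 = re_title.match(line)
        print("\n->", line, "<-")
        if m1 or m2:
            print("MATCH")
        if m1:
            print('tag:', m1.groups())
        if m2:
            print('title:', m2.groups())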
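One possible next step for the starter code, sketched under assumptions: treat each TITLE line as the start of a new record, store the four tags seen in the sample as fields, and keep every other line (library holdings, notes) in an 'extra' list for later. parse_records and the 'extra' key are names invented for this example, and the sample is a shortened copy of Carol’s data.

```python
import re

# the field tags seen in the sample data; | means "or" (alternation)
re_field = re.compile(r'(TITLE|BD\. HOLDINGS|CURRENT VOL\.|OTHER LIBRARIES):\s*(.*)')

def parse_records(lines):
    """Return one dict per journal; each TITLE line starts a new record.

    Tagged lines become dict entries; untagged lines are kept, in order,
    under 'extra' for later handling.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = re_field.match(line)
        if m and m.group(1) == 'TITLE':
            records.append({'TITLE': m.group(2), 'extra': []})
        elif not records:
            continue  # ignore anything before the first TITLE
        elif m:
            records[-1][m.group(1)] = m.group(2)
        else:
            records[-1]['extra'].append(line)
    return records

sample = """TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
CURRENT VOL.: Vol. 95 (2009) -
OTHER LIBRARIES:
      Miami:v. 68 (1982) -
      USDC: v. 88 (2002) -
(Formerly: American Bar Association Journal)
(Bound and on Hein)

TITLE: Administrative law review.
BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008)
CURRENT VOL.: Vol. 61 (2009) -
(Bound and on Hein)
"""

for record in parse_records(sample.splitlines()):
    print(record)
```

To run it on the real file, pass the open file instead of the sample, e.g. records = parse_records(f) inside the with block. Lines like “USDC: v. 88 (2002) -” end up in 'extra' because USDC is not one of the listed tags, which is probably what you want here.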
