Hacker102 - RegExes w/JavaScript and Python

A basic introduction to regular expressions (regexes) using JavaScript and Python, developed for the code4lib 2010 conference preconference "Hacker 101/102".

hacker 102
code4lib 2010 preconference
Asheville, NC, USA 2010-02-21

iv. regular expressions: JavaScript

if all language looked like “aabaaaabbbabaababa”, it’d be easy to parse

parsing “aabaaaabbbabaababa”
• there are two elements, “a” and “b”
• either may occur in any order
• /([ab]+)/
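JavaScript writes the pattern as a literal between slashes; Python has no regex literal, so the same pattern goes into a string handed to the re module. A minimal sketch of the slide's pattern in Python, using a raw string (r'...') so that backslashes in later patterns reach the regex engine untouched:

```python
import re

# Python has no /.../ literal: the pattern is an ordinary (raw) string
pattern = re.compile(r'([ab]+)')

s = "aabaaaabbbabaababa"

# fullmatch() succeeds only if the whole string fits the pattern
m = pattern.fullmatch(s)
print(m.group(1) == s)                    # True: every character is an 'a' or a 'b'

# one stray character breaks a full match...
print(pattern.fullmatch("aabxba"))        # None
# ...but search() still finds the first run of a's and b's
print(pattern.search("aabxba").group(1))  # aab
```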

• [] denotes a set of “elements”, i.e. a character “class”
• // demarcates a regex (a regex literal in JavaScript)
• + denotes “one or more of the previous thing”
• () denotes “remember this matched group” (a capturing group)
• /[ab]/ # an ‘a’ or a ‘b’
• /[ab]+/ # one or more ‘a’s or ‘b’s
• /([ab]+)/ # a group of one or more ‘a’s or ‘b’s
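To see how the three patterns differ, here is a small Python sketch run against a made-up string, "xxaabbyy": the class alone matches one character, the + stretches it into a run, and the parentheses let you ask for that run (and where it sits) afterwards.

```python
import re

text = "xxaabbyy"  # made-up sample string

# [ab] : a single character, either 'a' or 'b'
one = re.search(r'[ab]', text).group()
# [ab]+ : as many a's and b's in a row as possible
run = re.search(r'[ab]+', text).group()
# ([ab]+) : the same run, remembered as group 1 along with its position
m = re.search(r'([ab]+)', text)

print(one, run, m.group(1), m.span(1))  # a aabb aabb (2, 6)
```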

to firebug!

• [a-z] is any lowercase character from a to z
• [0-9] is any digit
• + is one or more of the previous thing
• ? is zero or one of the previous thing
• | is “or”, e.g. a|b is ‘a’ or ‘b’ (inside brackets | is literal, so [a|b] matches ‘a’, ‘|’ or ‘b’)
• * is zero or more of the previous thing
• . matches any character except a newline
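The same operators behave identically in Python's re module for these simple cases. A short sketch with made-up sample words, including the [a|b] pitfall:

```python
import re

# ? : zero or one, so the 'u' is optional
print([bool(re.fullmatch(r'colou?r', w)) for w in ("color", "colour", "colouur")])
# [True, True, False]

# * : zero or more, so "ac" matches as well as "abbbc"
print([bool(re.fullmatch(r'ab*c', w)) for w in ("ac", "abbbc")])  # [True, True]

# | : alternation between whole alternatives
print(re.findall(r'cat|dog', "cat, dog, bird"))  # ['cat', 'dog']

# inside [...] the | is just another character to match
print(re.findall(r'[a|b]', "a|b"))               # ['a', '|', 'b']

# [0-9]+ : runs of digits
print(re.findall(r'[0-9]+', "Vol. 95 (2009)"))   # ['95', '2009']

# . matches anything except a newline (by default)
print(re.findall(r'a.c', "abc a-c a\nc"))         # ['abc', 'a-c']
```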

• [^a-z] is anything *but* [a-z]
• [a-zA-Z0-9] is any of a-z, A-Z, 0-9
• {5} matches exactly 5 of the preceding thing
• {2,} matches at least 2 of the preceding thing
• {2,6} matches from 2 to 6 of the preceding thing
• \d is like [0-9] (any digit)
• \S is any non-whitespace character
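A Python sketch applying these to one line of the journal data used later in the deck. Note the last line: without its backslash, [d] is an ordinary class that matches only the letter ‘d’.

```python
import re

line = "CURRENT VOL.: Vol. 95 (2009) -"

print(re.findall(r'\d{4}', line))          # ['2009']: exactly four digits
print(re.findall(r'\d{2,}', line))         # ['95', '2009']: two or more digits
print(re.findall(r'[A-Z]{2,6}', line))     # ['CURREN', 'VOL']: at most six per match
print(re.findall(r'[^a-zA-Z0-9 ]', line))  # ['.', ':', '.', '(', ')', '-']
print(re.findall(r'\S+', line)[:3])        # ['CURRENT', 'VOL.:', 'Vol.']

# without the backslash, [d] is just the letter 'd'
print(re.findall(r'[d]', "d 7"), re.findall(r'\d', "d 7"))  # ['d'] ['7']
```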

try this
• visit any web page
• open firebug console
• title = window.document.title
• try regexes to match parts of the title
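The console exercise carries over to Python almost unchanged. In this sketch the title string is made up, standing in for window.document.title:

```python
import re

# Hypothetical stand-in for window.document.title in the browser
title = "Hacker 102 - regexes w/JavaScript, Python"

print(re.search(r'\d+', title).group())  # 102
print(re.findall(r'[A-Z][a-z]+', title))  # ['Hacker', 'Java', 'Script', 'Python']
print(re.split(r'\s+-\s+', title))        # ['Hacker 102', 'regexes w/JavaScript, Python']
```

The [A-Z][a-z]+ result shows a common surprise: "JavaScript" splits in two because the pattern stops at the next capital letter.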

almost every language has regex support

try unix “grep”
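grep prints each line of its input that contains a match for a pattern. As a rough Python analogue (not a replacement: grep's regex dialects differ from Python's in places), a hypothetical grep() helper might look like this, tried on three lines of the journal data:

```python
import re

def grep(pattern, lines):
    """Return the lines containing a match for pattern anywhere in them.
    A rough imitation of grep's basic behaviour, using Python regex syntax."""
    regex = re.compile(pattern)
    # search() looks anywhere in the line; match() would only try the start
    return [line for line in lines if regex.search(line)]

sample = ["TITLE: ABA journal.", "USDC: v. 88 (2002) -", "(Bound and on Hein)"]
print(grep(r'\(\d{4}\)', sample))  # ['USDC: v. 88 (2002) -']
```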

v. glue it together: Python

problem: Carol’s data

TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
CURRENT VOL.: Vol. 95 (2009) -
OTHER LIBRARIES:
Miami:v. 68 (1982) -
USDC: v. 88 (2002) -
Birm.:v. 89 (2003) -
(Formerly: American Bar Association Journal)
(Bound and on Hein)

TITLE: Administrative law review.
BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60
(2008)
CURRENT VOL.: Vol. 61 (2009) -
(Bound and on Hein)
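The holdings lines share a regular shape, a volume number followed by a year in parentheses, so one pattern can pull out both. A sketch, assuming volumes are written either "Vol." or "v." and that a year may be a span like 1969/1970 (the only variants in this sample):

```python
import re

# "Vol. 70 (1984)" or "v. 68 (1982)"; the year may be a span such as 1969/1970
re_vol = re.compile(r'(?:Vol\.|v\.)\s*(\d+)\s*\((\d{4}(?:/\d{4})?)\)')

print(re_vol.findall("BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008)"))
# [('22', '1969/1970'), ('60', '2008')]
print(re_vol.findall("Miami:v. 68 (1982) -"))
# [('68', '1982')]
```

Where a holdings statement wraps onto a second line, as "Vol. 60" / "(2008)" does above, the two lines would need joining before this pattern can see the pair.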

starter code for you

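#!/usr/bin/env python3
import re

# a "tag" is a run of capitals, spaces and periods followed by a colon,
# e.g. "TITLE:", "BD. HOLDINGS:", "CURRENT VOL.:"
re_tag = re.compile(r'([A-Z .]+):')
# everything after "TITLE: " is the journal title
re_title = re.compile(r'TITLE: (.*)')

with open('journals-carol-bean.txt') as f:
    for line in f:
        line = line.strip()
        if line == "":
            continue  # skip blank lines between records
        # match() only tries the pattern at the start of the line
        m1 = re_tag.match(line)
        m2 = re_title.match(line)
        print("\n->", line, "<-")
        if m1 or m2:
            print("MATCH")
        if m1:
            print('tag:', m1.groups())
        if m2:
            print('title:', m2.groups())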
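One possible next step beyond the starter code is to collect each journal's lines into a record. A sketch under stated assumptions: the field names are taken from the two sample records only, each record is assumed to begin with a TITLE: line, and parse_records() is a hypothetical helper name. Known fields are matched explicitly rather than with the starter's re_tag, because re_tag would also treat library lines such as "USDC: v. 88 ..." as tags.

```python
import re

# Assumed field names, taken from the sample records; real data may have more
KNOWN_TAGS = ("TITLE", "BD. HOLDINGS", "CURRENT VOL.", "OTHER LIBRARIES")
re_field = re.compile(
    r'(' + '|'.join(re.escape(tag) for tag in KNOWN_TAGS) + r'):\s*(.*)')

def parse_records(lines):
    """Group lines into one dict per journal, starting a new dict at each TITLE:."""
    records = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # blank lines separate records
        m = re_field.match(line)
        if m and m.group(1) == "TITLE":
            current = {"TITLE": m.group(2)}
            records.append(current)
        elif current is None:
            continue                      # ignore anything before the first TITLE:
        elif m:
            current[m.group(1)] = m.group(2)
        else:
            # e.g. "Miami:v. 68 (1982) -" or "(Bound and on Hein)"
            current.setdefault("notes", []).append(line)
    return records

sample = """TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
OTHER LIBRARIES:
Miami:v. 68 (1982) -
(Bound and on Hein)

TITLE: Administrative law review.
CURRENT VOL.: Vol. 61 (2009) -""".splitlines()

for record in parse_records(sample):
    print(record)
```

Against the real file this would be called as parse_records(open('journals-carol-bean.txt')). A wrapped holdings line such as the lone "(2008)" in the second record would land in "notes" rather than being joined to the line above it.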