
Hacker 102 - regexes w/Javascript, Python

  1. hacker 102, code4lib 2010 preconference, Asheville, NC, USA, 2010-02-21
  2. iv. regular expressions (JavaScript)
  3. if all language looked like “aabaaaabbbabaababa” it’d be easy to parse
  4. parsing “aabaaaabbbabaababa”
     • there are two elements, “a” and “b”
     • either may occur in any order
     • /([ab]+)/
  5. • [] denotes “elements” or “class”
     • // demarcates regex
     • + denotes “one or more of previous thing”
     • () denotes “remember this matched group”
     • /[ab]/ # an ‘a’ or a ‘b’
     • /[ab]+/ # one or more ‘a’s or ‘b’s
     • /([ab]+)/ # a group of one or more ‘a’s or ‘b’s
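The three patterns on this slide can be tried outside the browser as well. The following is a minimal sketch using Python's `re` module rather than the JavaScript `/.../` literals shown on the slide; the patterns themselves are unchanged, and the second input string is a made-up sample.

```python
import re

text = "aabaaaabbbabaababa"

# [ab] matches a single 'a' or 'b'
one = re.search(r"[ab]", text)

# [ab]+ matches one or more of them in a row
run = re.search(r"[ab]+", text)

# ([ab]+) also remembers what matched, as group 1
grouped = re.search(r"([ab]+)", text)

# With other characters around, the group picks out just the a/b run
embedded = re.search(r"([ab]+)", "xx aab yy")

print(one.group(0))
print(run.group(0))
print(grouped.group(1))
print(embedded.group(1))
```

Because the whole sample string consists of “a” and “b”, the `+` versions consume all of it, while the bare class matches only the first character.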
  6. to firebug!
  7. • [a-z] is any lowercase character from a to z
     • [0-9] is any digit
     • + is one or more of previous thing
     • ? is zero or one of previous thing
     • | is or, e.g. a|b is ‘a’ or ‘b’ (inside [], | is a literal character)
     • * is zero to many of previous thing
     • . matches any character (except a newline, by default)
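Each operator on this slide can be checked with a one-line experiment. This is a rough sketch in Python's `re` module (the slide itself uses JavaScript); the holdings string comes from the sample data later in the deck, and the other inputs are made-up samples.

```python
import re

holdings = "Vol. 70 (1984) - Vol. 94 (2008)"

# [0-9]+ : runs of one or more digits
digits = re.findall(r"[0-9]+", holdings)

# [a-z]+ : runs of lowercase letters only, so the capital "V" is skipped
lower = re.findall(r"[a-z]+", "Vol. 70")

# ? : the "u" may appear zero or one time, but not twice
optional_u = [w for w in ("color", "colour", "colouur")
              if re.fullmatch(r"colou?r", w)]

# * : an "a" followed by zero or more "b"s
star = re.findall(r"ab*", "a ab abbb")

# | : alternation between "a" and "b" ...
either = re.findall(r"a|b", "a|b")
# ... but inside brackets, | is just another literal character
in_class = re.findall(r"[a|b]", "a|b")

# . : any character except a newline (by default)
dot = re.findall(r"V.l", "Vol Val V\nl")

for result in (digits, lower, optional_u, star, either, in_class, dot):
    print(result)
```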
  8. • [^a-z] is anything *but* [a-z]
     • [a-zA-Z0-9] is any of a-z, A-Z, 0-9
     • {5} matches exactly 5 of the preceding thing
     • {2,} matches at least 2 of the preceding thing
     • {2,6} matches from 2 to 6 of preceding thing
     • [\d] is like [0-9] (any digit)
     • [\S] is any non-whitespace
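The negated class, the counted repeats, and the backslash shorthands can be exercised the same way. This is a small sketch in Python's `re` module, assuming plain ASCII input; the strings are taken from the sample data later in the deck or made up for illustration.

```python
import re

holdings = "Vol. 70 (1984) - Vol. 94 (2008)"

# [^a-z]+ : runs of anything that is not a lowercase letter
not_lower = re.findall(r"[^a-z]+", "abc123def")

# {4} : exactly four digits in a row, which here picks out the years
years = re.findall(r"[0-9]{4}", holdings)

# {2,} : at least two digits in a row
two_or_more = re.findall(r"[0-9]{2,}", holdings)

# {2,6} : between two and six "a"s; a lone "a" does not qualify,
# and a run of seven is cut off after six
two_to_six = re.findall(r"a{2,6}", "a aa aaaaaaa")

# \d : shorthand for a digit (same as [0-9] on ASCII text)
shorthand_digits = re.findall(r"\d+", "Vol. 95 (2009)")

# \S : any non-whitespace character, so \S+ splits on whitespace
chunks = re.findall(r"\S+", "CURRENT VOL.: Vol. 95")

for result in (not_lower, years, two_or_more, two_to_six,
               shorthand_digits, chunks):
    print(result)
```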
  9. try this
     • visit any web page
     • open firebug console
     • title = window.document.title
     • try regexes to match parts of the title
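The same exercise works away from the browser. Here is a minimal sketch in Python, assuming the page title is already in a string; the title used is this deck's own, standing in for whatever `window.document.title` would return in the console.

```python
import re

# Stand-in for window.document.title; any page title would do
title = "Hacker 102 - regexes w/Javascript, Python"

# The first run of digits anywhere in the title
number = re.search(r"[0-9]+", title).group(0)

# The leading word: letters only, anchored at the start by re.match
first_word = re.match(r"[A-Za-z]+", title).group(0)

# Every capitalized word: one capital followed by lowercase letters
capitalized = re.findall(r"[A-Z][a-z]+", title)

# Everything after the dash, remembered with a group
after_dash = re.search(r"- (.*)", title).group(1)

print(number)
print(first_word)
print(capitalized)
print(after_dash)
```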
  10. most every language has regex support
  11. try unix “grep”
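To get a feel for what grep does with a pattern, here is a rough Python analogue. It is only a sketch of the idea (keep the lines where the regex matches somewhere), not how grep itself is implemented; the `grep` function name is hypothetical and the lines come from the sample data later in the deck.

```python
import re

def grep(pattern, lines):
    """Return the lines in which `pattern` matches anywhere."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

lines = [
    "TITLE: ABA journal.",
    "BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)",
    "CURRENT VOL.: Vol. 95 (2009) -",
]

# Lines that start with the TITLE tag
title_lines = grep(r"^TITLE:", lines)

# Lines that mention a year from 2000 to 2009 in parentheses
recent_lines = grep(r"\(200[0-9]\)", lines)

print(title_lines)
print(recent_lines)
```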
  12. v. glue it together (Python)
  13. problem: Carol’s data
  14. TITLE: ABA journal.
      BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
      CURRENT VOL.: Vol. 95 (2009) -
      OTHER LIBRARIES:
            Miami:v. 68 (1982) -
            USDC: v. 88 (2002) -
            Birm.:v. 89 (2003) -
      (Formerly: American Bar Association Journal)
      (Bound and on Hein)

      TITLE: Administrative law review.
      BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008)
      CURRENT VOL.: Vol. 61 (2009) -
      (Bound and on Hein)
  15. starter code for you
  16. #!/usr/bin/env python
      import re

      # a tag is a run of capitals, spaces, and periods followed by a colon
      re_tag = re.compile(r'([A-Z .]+):')
      # a title line; the group captures everything after "TITLE: "
      re_title = re.compile(r'TITLE: (.*)')

      for line in open('journals-carol-bean.txt'):
          line = line.strip()
          m1 = re_tag.match(line)
          m2 = re_title.match(line)
          if line == "":
              continue
          print("\n->", line, "<-")
          if m1 or m2:
              print("MATCH")
          if m1:
              print('tag:', m1.groups())
          if m2:
              print('title:', m2.groups())
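One possible next step from the starter code is to collect the tagged lines into one record per journal. The following is a sketch under stated assumptions, not the solution worked out in the session: `parse_records` and `re_field` are hypothetical names, the tag pattern is the starter code's with a second group added for the value, and a trimmed copy of the sample data stands in for the file.

```python
import re

# Trimmed sample from the data slide; a real run would read the file instead
SAMPLE = """\
TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
CURRENT VOL.: Vol. 95 (2009) -

TITLE: Administrative law review.
BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60 (2008)
CURRENT VOL.: Vol. 61 (2009) -
"""

# Group 1 is the tag (as in the starter code), group 2 is the rest of the line
re_field = re.compile(r"([A-Z .]+):\s*(.*)")

def parse_records(lines):
    """Group tagged lines into dicts, starting a new dict at each TITLE."""
    records = []
    for line in lines:
        m = re_field.match(line.strip())
        if not m:
            continue  # blank lines and untagged notes are skipped
        tag, value = m.group(1), m.group(2)
        if tag == "TITLE":
            records.append({})
        if records:  # ignore any tagged line that precedes the first TITLE
            records[-1][tag] = value
    return records

records = parse_records(SAMPLE.splitlines())
for record in records:
    print(record)
```

Note that an all-capitals library line such as “USDC: v. 88 (2002) -” would also match the tag pattern and end up as a field of its own, so the full data needs a little more care than this trimmed sample.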

Basic introduction to regexes using JavaScript and Python. Developed for code4lib 2010 conference preconf "Hacker 101/102".
