Python Web Interaction

Dev8D presentation showing my top 10 Python libraries for interacting with the web.


  1. Rob Sanderson
      rsanderson@lanl.gov
      azaroth42@gmail.com
      @azaroth42
      Digital Library Prototyping Team
      Los Alamos National Laboratory, USA
      http://www.flickr.com/photos/42311564@N00/2355590274/
      Python for Web Interaction. Rob Sanderson. Dev8D, Feb 24-27 2010, London
  2. Overview: Top 10 Libraries for Web Interaction
      • urllib
      • urllib2
      • urlparse
      • httplib
      • lxml
      • rdflib
      • json/simplejson
      • mod_python, mod_wsgi
      • bpython
  3. urllib
      >>> import urllib
      >>> urllib.quote('~azaroth/s?q=http://foo.com/')
      '%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/'
      >>> urllib.unquote('%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/')
      '~azaroth/s?q=http://foo.com/'
      >>> fh = urllib.urlopen('http://www.google.com/')
      >>> html = fh.read()
      >>> fh.close()
      >>> fh.getcode()
      200
      >>> fh.headers.dict['content-type']
      'text/html; charset=ISO-8859-1'
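The session above is Python 2. As a rough sketch of the same quoting round trip on Python 3, where these functions live in `urllib.parse`, the snippet below leaves out the network call so that it runs offline. On Python 3.7 and later, `quote()` leaves `~` unescaped, so its output differs slightly from the slide.

```python
from urllib.parse import quote, unquote

original = '~azaroth/s?q=http://foo.com/'

# quote() percent-encodes reserved characters such as '?', '=' and ':'.
# By default '/' is treated as safe and left alone.
quoted = quote(original)

# Passing safe='' also encodes '/', which is useful when the whole string
# is going to be embedded in a single query parameter.
quoted_strict = quote(original, safe='')

# unquote() reverses either form.
restored = unquote(quoted)
restored_strict = unquote(quoted_strict)

print(quoted)
print(quoted_strict)
print(restored == original)
```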
  4. urllib2
      >>> import urllib2
      >>> ph = urllib2.ProxyHandler({'http': 'http://proxyout.lanl.gov:8080/'})
      >>> opener = urllib2.build_opener(ph)
      >>> urllib2.install_opener(opener)
      >>> # From now on, all requests will go through the proxy
      >>> r = urllib2.Request('http://www.google.com/')
      >>> r.add_header('Referer', 'http://www.somewhere.net')  # the HTTP header is spelled 'Referer'
      >>> fh = urllib2.urlopen(r)
      >>> html = fh.read()
      >>> fh.close()
      >>> # fh is the same as urllib's for headers/status
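On Python 3 the same pieces are found in `urllib.request`. The following is a minimal sketch, not the presenter's code: the proxy address `proxy.example.org` and the target URL are placeholders chosen for illustration, and the request is built but never sent so that the snippet runs without network access.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
proxy = urllib.request.ProxyHandler({'http': 'http://proxy.example.org:8080/'})
opener = urllib.request.build_opener(proxy)

# Headers can be passed up front or added later with add_header().
req = urllib.request.Request(
    'http://www.example.org/',
    headers={'Accept': 'text/html'},
)
req.add_header('Referer', 'http://www.somewhere.net')

# opener.open(req) would send the request through the proxy. It is not
# called here, so nothing leaves the machine.
print(req.get_method())
print(req.get_header('Referer'))
```

Using the opener directly (`opener.open(req)`) rather than `install_opener()` keeps the proxy setting local to one call site instead of changing the behaviour of every later `urlopen()`.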
  5. urlparse
      >>> import urlparse
      >>> pr = urlparse.urlparse('https://www.google.com/search?q=foo&bar=bz#frag')
      >>> pr.scheme
      'https'
      >>> pr.hostname
      'www.google.com'
      >>> pr.path
      '/search'
      >>> pr.query
      'q=foo&bar=bz'
      >>> pr.fragment
      'frag'
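The parsed result is a named tuple, so it can also be edited and reassembled. The sketch below uses the Python 3 location of these functions (`urllib.parse`); the step of rewriting the `q` parameter is an illustration added here and is not on the slide.

```python
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode

pr = urlparse('https://www.google.com/search?q=foo&bar=bz#frag')

# parse_qs() turns the query string into a dict of lists, because a
# parameter may legally appear more than once.
params = parse_qs(pr.query)

# Change one parameter, then rebuild the URL from the modified parts.
params['q'] = ['python']
new_query = urlencode(params, doseq=True)
rebuilt = urlunparse(pr._replace(query=new_query))

print(params)
print(rebuilt)
```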
  6. httplib
      >>> import httplib
      >>> cxn = httplib.HTTPConnection('www.google.com')
      >>> hdrs = {'Accept': 'application/rdf+xml'}
      >>> path = "/search?q=some+search+query"
      >>> cxn.request("HEAD", path, headers=hdrs)
      >>> resp = cxn.getresponse()
      >>> resp.status
      200
      >>> resp_hdrs = dict(resp.getheaders())
      >>> resp_hdrs['content-type']  # :(
      'text/html; charset=ISO-8859-1'
      >>> data = resp.read()
      >>> cxn.close()
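A sketch of the same HEAD request on Python 3, where `httplib` is named `http.client`. To keep it runnable offline, it talks to a throwaway server on localhost instead of www.google.com; the `HeadHandler` class and the `head_request` helper are hypothetical names introduced only for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeadHandler(BaseHTTPRequestHandler):
    """Tiny local stand-in for a remote server (illustration only)."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def head_request(host, port, path, headers):
    """Send a HEAD request and return (status, content type)."""
    cxn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        cxn.request("HEAD", path, headers=headers)
        resp = cxn.getresponse()
        resp.read()  # a HEAD response has no body, so this is empty
        # getheader() looks the name up case-insensitively.
        return resp.status, resp.getheader("Content-Type")
    finally:
        cxn.close()


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HeadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, content_type = head_request(
    "127.0.0.1",
    server.server_port,
    "/search?q=some+search+query",
    {"Accept": "application/rdf+xml"},
)

server.shutdown()
server.server_close()

print(status, content_type)
```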
  7. lxml
      $ easy_install lxml
      >>> import StringIO
      >>> from lxml import etree
      >>> et = etree.XML('<a b="B"> A <c>C</c> </a>')
      >>> et.text
      ' A '
      >>> et.attrib['b']
      'B'
      >>> for elem in et.iterchildren():
      ...     print elem
      ...
      <Element c at 16d1ed0>
      >>> html = etree.parse(StringIO.StringIO("<html><p>hi"), parser=etree.HTMLParser())
      >>> html.xpath('/html/body/p')
      [<Element p at 16e00f0>]
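The element API shown above (`.text`, `.attrib`, iterating over children) is modelled on the standard library's `xml.etree.ElementTree`. As a sketch that runs without installing anything, the snippet below swaps lxml for ElementTree on Python 3. It does not cover the parts that are specific to lxml, such as `HTMLParser` and full XPath support.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a b="B"> A <c>C</c> </a>')

# Text before the first child lives on the parent's .text ...
leading_text = root.text

# ... and text after a child lives on that child's .tail.
child = root.find('c')
trailing_text = child.tail

# Iterating over an element yields its direct children.
child_tags = [elem.tag for elem in root]

print(repr(leading_text), root.attrib['b'], child_tags, repr(trailing_text))
```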
  8. rdflib
      $ easy_install rdflib
      >>> import rdflib as rdf
      >>> inp = rdf.URLInputSource('http://xmlns.com/foaf/spec/20100101.rdf')
      >>> inp2 = rdf.StringInputSource("<a> <b> <c> .")
      >>> graph = rdf.ConjunctiveGraph()
      >>> graph.parse(inp)
      >>> sparql = "SELECT ?l WHERE {?w rdfs:label ?l . }"
      >>> res = graph.query(sparql, initNs={'rdfs': rdf.RDFS.RDFSNS})
      >>> res.selected[0]
      rdf.Literal(u'Given name')
      >>> nt = graph.serialize(format='nt')
  9. json / simplejson
      >>> try: import simplejson as json
      ... except ImportError: import json
      ...
      >>> data = {'o': (True, None, 1.0), "ints": [1, 2, 3]}
      >>> json.dumps(data)
      '{"o": [true, null, 1.0], "ints": [1, 2, 3]}'
      >>> json.dumps(data, separators=(',', ':'))  # compact
      '{"o":[true,null,1.0],"ints":[1,2,3]}'
      >>> json.loads('[1,2,"foo",null]')
      [1, 2, u'foo', None]
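One detail the session above implies but does not spell out: a round trip through JSON is not always the identity, because JSON has no tuple type. A small sketch on Python 3 using the standard `json` module; `sort_keys=True` is added here only to make the output order deterministic.

```python
import json

data = {'o': (True, None, 1.0), 'ints': [1, 2, 3]}

# Compact separators drop the spaces after ',' and ':'.
compact = json.dumps(data, separators=(',', ':'), sort_keys=True)

# Decoding gives back a list where the tuple used to be.
decoded = json.loads(compact)

print(compact)
print(decoded['o'], type(decoded['o']).__name__)
print(decoded == data)
```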
  10. mod_python, mod_wsgi
      import cgitb
      from mod_python import apache
      from mod_python.util import FieldStorage

      def handler(req):
          try:
              form = FieldStorage(req)  # dict-like object for query
              path = req.uri
              req.status = 200
              req.content_type = "text/plain"
              req.send_http_header()
              req.write(path)
          except:
              req.content_type = "text/html"
              cgitb.Hook(file=req).handle()
          return apache.OK
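The slide names mod_wsgi but only shows a mod_python handler. As a rough WSGI counterpart, assuming Python 3: mod_wsgi looks for a callable named `application` by default, and the request arrives as an `environ` dict rather than a `req` object. `PATH_INFO` is used here as an approximation of `req.uri`, and `call_app` is a hypothetical helper that exercises the app without a web server.

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    # Dict of lists parsed from the query string, the rough equivalent of
    # FieldStorage for GET parameters. Unused here, as on the slide.
    form = parse_qs(environ.get('QUERY_STRING', ''))
    path = environ.get('PATH_INFO', '')

    # WSGI response bodies are bytes, not text.
    body = path.encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


def call_app(app, path, query=''):
    """Hypothetical helper: invoke a WSGI app directly, no server needed."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = dict(headers)

    environ = {
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': path,
        'QUERY_STRING': query,
    }
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(application, '/some/path', 'q=foo')
print(status, headers['Content-Type'], body)
```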
  11. bpython
      $ easy_install bpython
      $ bpython
