SlideShare a Scribd company logo
1 of 3
Download to read offline
XML namespaces and XPath with Python
March 2, 2016
1 XML namespaces and XPath with Python
Thomas Aglassinger http://www.roskakori.at
2 XML
• eXtensible Markup Language
• a blueprint for other file formats
• can represent sequences and hierarchies
• text based (binary somewhat possible using e.g. UUEncode)
• human readable
• somewhat verbose
• supports a Document Object Model (DOM)
2.1 XML with Python
• xml: part if the standard library
• xml.etree.ElementTree - XML as pythonic Trees
• xml.dom.mindom - DOM, warts and all
• xml.sax - sequential parsing of large documents
• works, but has limited support for namespaces, XPath etc.
• lxml: available from http://lxml.de/
• Python wrapper to C based XML libraries
• full support for namespaces, XPath, schemas etc
• universally used for “serious” XML processing
2.2 Example XML file
<?xml version="1.0" encoding="utf-8"?>
<people:list xmlns:people="https://www.example.org/xml/people">
<people:updated date="2016-02-16" />
<people:person name="Alice" phone="0650/12345678" size="172" />
<people:person name="Bob" phone="0654/23456789" size="167" />
<people:person name="B¨arbel" phone="0699/34567890" size="182" />
<people:person name="G¨unther" size="172">
<people:note>Ask for phone number.</people:note>
</people:person>
</people:list>
1
2.3 XML namespaces
In our example
xmlns:people="https://www.example.org/xml/people"
assigns the shortcut people to the namespace identified by https://www.example.org/xml/people.
2.4 XPath
XPath is a query language to find nodes in XML documents. Examples:
• /people:list/people:person - all person elements in the document
• /people:list/people:person[@phone] - all person elements in the document with a phone attribute
Tutorial: http://www.w3schools.com/xsl/xpath intro.asp
3 Extract information from XML
3.1 Read the document root
Compute the path to our example XML file:
In [1]: import os.path
people_xml_path = os.path.join(’examples’, ’people.xml’)
Build the document root from the file:
In [2]: from lxml import etree
people_root = etree.parse(people_xml_path)
3.2 Setup the namespace
In [3]: NAMESPACES = {
’people’: ’https://www.example.org/xml/people’,
}
3.3 Find persons and print details
In [4]: # Find persons matching XPath.
person_elements = people_root.xpath(
’/people:list/people:person[@phone]’,
namespaces=NAMESPACES)
# Print name and phone of persons found.
for person_element in person_elements:
print(
person_element.attrib[’name’] + ’: ’ +
person_element.attrib[’phone’])
Alice: 0650/12345678
Bob: 0654/23456789
B¨arbel: 0699/34567890
2
3.4 Examining XML elements
Elements have a tag, where namespaces are represente using the Clark notation {namespace}tag:
In [5]: person_element.tag
Out[5]: ’{https://www.example.org/xml/people}person’
XML attributes are a simlpe dictionary:
In [6]: person_element.attrib
Out[6]: {’phone’: ’0699/34567890’, ’name’: ’B¨arbel’, ’size’: ’182’}
3.5 Text nodes
Print notes about persons without a phone:
In [7]: note_elements_for_persons_without_phone = 
people_root.xpath(
’/people:list/people:person[not(@phone)]/people:note’,
namespaces=NAMESPACES)
for note_element in note_elements_for_persons_without_phone:
person_element = note_element.getparent()
person_name = person_element.attrib[’name’]
note_text = note_element.text
print(person_name + ’: ’ + note_text)
G¨unther: Ask for phone number.
Use getparent() to access the enclosing XML element (as seen above).
4 Summary
• XML will be around for the foreseeable future so learn to deal with it.
• Use lxmlfor any serious XML work in Python.
• Namespaces and XPath can be taimed.
3

More Related Content

What's hot

機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門hoxo_m
 
情報検索における評価指標の最新動向と新たな提案
情報検索における評価指標の最新動向と新たな提案情報検索における評価指標の最新動向と新たな提案
情報検索における評価指標の最新動向と新たな提案Mitsuo Yamamoto
 
ホモトピー型理論入門
ホモトピー型理論入門ホモトピー型理論入門
ホモトピー型理論入門k h
 
コンピュータシステムの理論と実装1
コンピュータシステムの理論と実装1コンピュータシステムの理論と実装1
コンピュータシステムの理論と実装1H T
 
Real Number Modeling (RNM) 超・初級編
Real Number Modeling (RNM) 超・初級編Real Number Modeling (RNM) 超・初級編
Real Number Modeling (RNM) 超・初級編Wataru Yamamoto
 
MLaPP 9章 「一般化線形モデルと指数型分布族」
MLaPP 9章 「一般化線形モデルと指数型分布族」MLaPP 9章 「一般化線形モデルと指数型分布族」
MLaPP 9章 「一般化線形モデルと指数型分布族」moterech
 
多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズム多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズムMatsuiRyo
 
About chtMultiRegionFoam
About chtMultiRegionFoam About chtMultiRegionFoam
About chtMultiRegionFoam 守淑 田村
 
x86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTx86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTtakesako
 
PHP AST 徹底解説
PHP AST 徹底解説PHP AST 徹底解説
PHP AST 徹底解説do_aki
 
Lie-Trotter-Suzuki分解、特にフラクタル分解について
Lie-Trotter-Suzuki分解、特にフラクタル分解についてLie-Trotter-Suzuki分解、特にフラクタル分解について
Lie-Trotter-Suzuki分解、特にフラクタル分解についてMaho Nakata
 
指数時間アルゴリズム入門
指数時間アルゴリズム入門指数時間アルゴリズム入門
指数時間アルゴリズム入門Yoichi Iwata
 
数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツShuyo Nakatani
 
(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習Ichigaku Takigawa
 
Simple perceptron by TJO
Simple perceptron by TJOSimple perceptron by TJO
Simple perceptron by TJOTakashi J OZAKI
 

What's hot (20)

機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門機械学習のためのベイズ最適化入門
機械学習のためのベイズ最適化入門
 
情報検索における評価指標の最新動向と新たな提案
情報検索における評価指標の最新動向と新たな提案情報検索における評価指標の最新動向と新たな提案
情報検索における評価指標の最新動向と新たな提案
 
ホモトピー型理論入門
ホモトピー型理論入門ホモトピー型理論入門
ホモトピー型理論入門
 
コンピュータシステムの理論と実装1
コンピュータシステムの理論と実装1コンピュータシステムの理論と実装1
コンピュータシステムの理論と実装1
 
Real Number Modeling (RNM) 超・初級編
Real Number Modeling (RNM) 超・初級編Real Number Modeling (RNM) 超・初級編
Real Number Modeling (RNM) 超・初級編
 
1次式とノルムで構成された最適化問題とその双対問題
1次式とノルムで構成された最適化問題とその双対問題1次式とノルムで構成された最適化問題とその双対問題
1次式とノルムで構成された最適化問題とその双対問題
 
MLaPP 9章 「一般化線形モデルと指数型分布族」
MLaPP 9章 「一般化線形モデルと指数型分布族」MLaPP 9章 「一般化線形モデルと指数型分布族」
MLaPP 9章 「一般化線形モデルと指数型分布族」
 
圏とHaskellの型
圏とHaskellの型圏とHaskellの型
圏とHaskellの型
 
競プロでGo!
競プロでGo!競プロでGo!
競プロでGo!
 
多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズム多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズム
 
About chtMultiRegionFoam
About chtMultiRegionFoam About chtMultiRegionFoam
About chtMultiRegionFoam
 
B-link-tree
B-link-treeB-link-tree
B-link-tree
 
x86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNTx86x64 SSE4.2 POPCNT
x86x64 SSE4.2 POPCNT
 
PHP AST 徹底解説
PHP AST 徹底解説PHP AST 徹底解説
PHP AST 徹底解説
 
Lie-Trotter-Suzuki分解、特にフラクタル分解について
Lie-Trotter-Suzuki分解、特にフラクタル分解についてLie-Trotter-Suzuki分解、特にフラクタル分解について
Lie-Trotter-Suzuki分解、特にフラクタル分解について
 
指数時間アルゴリズム入門
指数時間アルゴリズム入門指数時間アルゴリズム入門
指数時間アルゴリズム入門
 
数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ数式をnumpyに落としこむコツ
数式をnumpyに落としこむコツ
 
(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習(2018.3) 分子のグラフ表現と機械学習
(2018.3) 分子のグラフ表現と機械学習
 
Simple perceptron by TJO
Simple perceptron by TJOSimple perceptron by TJO
Simple perceptron by TJO
 
実用Brainf*ckプログラミング
実用Brainf*ckプログラミング実用Brainf*ckプログラミング
実用Brainf*ckプログラミング
 

Similar to XML namespaces and XPath with Python (20)

CrashCourse: XML technologies
CrashCourse: XML technologiesCrashCourse: XML technologies
CrashCourse: XML technologies
 
Xml parsing
Xml parsingXml parsing
Xml parsing
 
1 xml fundamentals
1 xml fundamentals1 xml fundamentals
1 xml fundamentals
 
Xml iet 2015
Xml iet 2015Xml iet 2015
Xml iet 2015
 
Extracting data from xml
Extracting data from xmlExtracting data from xml
Extracting data from xml
 
paper about xml
paper about xmlpaper about xml
paper about xml
 
XMl
XMlXMl
XMl
 
Advanced Web Programming Chapter 12
Advanced Web Programming Chapter 12Advanced Web Programming Chapter 12
Advanced Web Programming Chapter 12
 
Xml presentation
Xml presentationXml presentation
Xml presentation
 
Full xml
Full xmlFull xml
Full xml
 
Xml and xml processor
Xml and xml processorXml and xml processor
Xml and xml processor
 
Xml and xml processor
Xml and xml processorXml and xml processor
Xml and xml processor
 
Creating xml publisher documents with people code
Creating xml publisher documents with people codeCreating xml publisher documents with people code
Creating xml publisher documents with people code
 
CTDA Workshop on XML and MODS
CTDA Workshop on XML and MODSCTDA Workshop on XML and MODS
CTDA Workshop on XML and MODS
 
XML
XMLXML
XML
 
Web data management (chapter-1)
Web data management (chapter-1)Web data management (chapter-1)
Web data management (chapter-1)
 
Files in Python.pptx
Files in Python.pptxFiles in Python.pptx
Files in Python.pptx
 
Files in Python.pptx
Files in Python.pptxFiles in Python.pptx
Files in Python.pptx
 
XML Presentation-2
XML Presentation-2XML Presentation-2
XML Presentation-2
 
DITA and Translation Best Praticices
DITA and Translation Best PraticicesDITA and Translation Best Praticices
DITA and Translation Best Praticices
 

More from roskakori

Expanding skill sets - Broaden your perspective on design
Expanding skill sets - Broaden your perspective on designExpanding skill sets - Broaden your perspective on design
Expanding skill sets - Broaden your perspective on designroskakori
 
Django trifft Flutter
Django trifft FlutterDjango trifft Flutter
Django trifft Flutterroskakori
 
Multiple django applications on a single server with nginx
Multiple django applications on a single server with nginxMultiple django applications on a single server with nginx
Multiple django applications on a single server with nginxroskakori
 
Helpful pre commit hooks for Python and Django
Helpful pre commit hooks for Python and DjangoHelpful pre commit hooks for Python and Django
Helpful pre commit hooks for Python and Djangoroskakori
 
Startmeeting Interessengruppe NLP NLU Graz
Startmeeting Interessengruppe NLP NLU GrazStartmeeting Interessengruppe NLP NLU Graz
Startmeeting Interessengruppe NLP NLU Grazroskakori
 
Helpful logging with python
Helpful logging with pythonHelpful logging with python
Helpful logging with pythonroskakori
 
Helpful logging with Java
Helpful logging with JavaHelpful logging with Java
Helpful logging with Javaroskakori
 
Einführung in Kommunikation und Konfliktmanagement für Software-Entwickler
Einführung in Kommunikation und Konfliktmanagement für Software-EntwicklerEinführung in Kommunikation und Konfliktmanagement für Software-Entwickler
Einführung in Kommunikation und Konfliktmanagement für Software-Entwicklerroskakori
 
Analyzing natural language feedback using python
Analyzing natural language feedback using pythonAnalyzing natural language feedback using python
Analyzing natural language feedback using pythonroskakori
 
Microsoft SQL Server with Linux and Docker
Microsoft SQL Server with Linux and DockerMicrosoft SQL Server with Linux and Docker
Microsoft SQL Server with Linux and Dockerroskakori
 
Migration to Python 3 in Finance
Migration to Python 3 in FinanceMigration to Python 3 in Finance
Migration to Python 3 in Financeroskakori
 
Introduction to pygments
Introduction to pygmentsIntroduction to pygments
Introduction to pygmentsroskakori
 
Lösungsorientierte Fehlerbehandlung
Lösungsorientierte FehlerbehandlungLösungsorientierte Fehlerbehandlung
Lösungsorientierte Fehlerbehandlungroskakori
 
Erste-Hilfekasten für Unicode mit Python
Erste-Hilfekasten für Unicode mit PythonErste-Hilfekasten für Unicode mit Python
Erste-Hilfekasten für Unicode mit Pythonroskakori
 
Introduction to trader bots with Python
Introduction to trader bots with PythonIntroduction to trader bots with Python
Introduction to trader bots with Pythonroskakori
 
Open source projects with python
Open source projects with pythonOpen source projects with python
Open source projects with pythonroskakori
 
Python builds mit ant
Python builds mit antPython builds mit ant
Python builds mit antroskakori
 
Kanban zur Abwicklung von Reporting-Anforderungen
Kanban zur Abwicklung von Reporting-AnforderungenKanban zur Abwicklung von Reporting-Anforderungen
Kanban zur Abwicklung von Reporting-Anforderungenroskakori
 

More from roskakori (18)

Expanding skill sets - Broaden your perspective on design
Expanding skill sets - Broaden your perspective on designExpanding skill sets - Broaden your perspective on design
Expanding skill sets - Broaden your perspective on design
 
Django trifft Flutter
Django trifft FlutterDjango trifft Flutter
Django trifft Flutter
 
Multiple django applications on a single server with nginx
Multiple django applications on a single server with nginxMultiple django applications on a single server with nginx
Multiple django applications on a single server with nginx
 
Helpful pre commit hooks for Python and Django
Helpful pre commit hooks for Python and DjangoHelpful pre commit hooks for Python and Django
Helpful pre commit hooks for Python and Django
 
Startmeeting Interessengruppe NLP NLU Graz
Startmeeting Interessengruppe NLP NLU GrazStartmeeting Interessengruppe NLP NLU Graz
Startmeeting Interessengruppe NLP NLU Graz
 
Helpful logging with python
Helpful logging with pythonHelpful logging with python
Helpful logging with python
 
Helpful logging with Java
Helpful logging with JavaHelpful logging with Java
Helpful logging with Java
 
Einführung in Kommunikation und Konfliktmanagement für Software-Entwickler
Einführung in Kommunikation und Konfliktmanagement für Software-EntwicklerEinführung in Kommunikation und Konfliktmanagement für Software-Entwickler
Einführung in Kommunikation und Konfliktmanagement für Software-Entwickler
 
Analyzing natural language feedback using python
Analyzing natural language feedback using pythonAnalyzing natural language feedback using python
Analyzing natural language feedback using python
 
Microsoft SQL Server with Linux and Docker
Microsoft SQL Server with Linux and DockerMicrosoft SQL Server with Linux and Docker
Microsoft SQL Server with Linux and Docker
 
Migration to Python 3 in Finance
Migration to Python 3 in FinanceMigration to Python 3 in Finance
Migration to Python 3 in Finance
 
Introduction to pygments
Introduction to pygmentsIntroduction to pygments
Introduction to pygments
 
Lösungsorientierte Fehlerbehandlung
Lösungsorientierte FehlerbehandlungLösungsorientierte Fehlerbehandlung
Lösungsorientierte Fehlerbehandlung
 
Erste-Hilfekasten für Unicode mit Python
Erste-Hilfekasten für Unicode mit PythonErste-Hilfekasten für Unicode mit Python
Erste-Hilfekasten für Unicode mit Python
 
Introduction to trader bots with Python
Introduction to trader bots with PythonIntroduction to trader bots with Python
Introduction to trader bots with Python
 
Open source projects with python
Open source projects with pythonOpen source projects with python
Open source projects with python
 
Python builds mit ant
Python builds mit antPython builds mit ant
Python builds mit ant
 
Kanban zur Abwicklung von Reporting-Anforderungen
Kanban zur Abwicklung von Reporting-AnforderungenKanban zur Abwicklung von Reporting-Anforderungen
Kanban zur Abwicklung von Reporting-Anforderungen
 

Recently uploaded

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

XML namespaces and XPath with Python

  • 1. XML namespaces and XPath with Python March 2, 2016 1 XML namespaces and XPath with Python Thomas Aglassinger http://www.roskakori.at 2 XML • eXtensible Markup Language • a blueprint for other file formats • can represent sequences and hierarchies • text based (binary somewhat possible using e.g. UUEncode) • human readable • somewhat verbose • supports a Document Object Model (DOM) 2.1 XML with Python • xml: part if the standard library • xml.etree.ElementTree - XML as pythonic Trees • xml.dom.mindom - DOM, warts and all • xml.sax - sequential parsing of large documents • works, but has limited support for namespaces, XPath etc. • lxml: available from http://lxml.de/ • Python wrapper to C based XML libraries • full support for namespaces, XPath, schemas etc • universally used for “serious” XML processing 2.2 Example XML file <?xml version="1.0" encoding="utf-8"?> <people:list xmlns:people="https://www.example.org/xml/people"> <people:updated date="2016-02-16" /> <people:person name="Alice" phone="0650/12345678" size="172" /> <people:person name="Bob" phone="0654/23456789" size="167" /> <people:person name="B¨arbel" phone="0699/34567890" size="182" /> <people:person name="G¨unther" size="172"> <people:note>Ask for phone number.</people:note> </people:person> </people:list> 1
  • 2. 2.3 XML namespaces In our example xmlns:people="https://www.example.org/xml/people" assigns the shortcut people to the namespace identified by https://www.example.org/xml/people. 2.4 XPath XPath is a query language to find nodes in XML documents. Examples: • /people:list/people:person - all person elements in the document • /people:list/people:person[@phone] - all person elements in the document with a phone attribute Tutorial: http://www.w3schools.com/xsl/xpath intro.asp 3 Extract information from XML 3.1 Read the document root Compute the path to our example XML file: In [1]: import os.path people_xml_path = os.path.join(’examples’, ’people.xml’) Build the document root from the file: In [2]: from lxml import etree people_root = etree.parse(people_xml_path) 3.2 Setup the namespace In [3]: NAMESPACES = { ’people’: ’https://www.example.org/xml/people’, } 3.3 Find persons and print details In [4]: # Find persons matching XPath. person_elements = people_root.xpath( ’/people:list/people:person[@phone]’, namespaces=NAMESPACES) # Print name and phone of persons found. for person_element in person_elements: print( person_element.attrib[’name’] + ’: ’ + person_element.attrib[’phone’]) Alice: 0650/12345678 Bob: 0654/23456789 B¨arbel: 0699/34567890 2
  • 3. 3.4 Examining XML elements Elements have a tag, where namespaces are represente using the Clark notation {namespace}tag: In [5]: person_element.tag Out[5]: ’{https://www.example.org/xml/people}person’ XML attributes are a simlpe dictionary: In [6]: person_element.attrib Out[6]: {’phone’: ’0699/34567890’, ’name’: ’B¨arbel’, ’size’: ’182’} 3.5 Text nodes Print notes about persons without a phone: In [7]: note_elements_for_persons_without_phone = people_root.xpath( ’/people:list/people:person[not(@phone)]/people:note’, namespaces=NAMESPACES) for note_element in note_elements_for_persons_without_phone: person_element = note_element.getparent() person_name = person_element.attrib[’name’] note_text = note_element.text print(person_name + ’: ’ + note_text) G¨unther: Ask for phone number. Use getparent() to access the enclosing XML element (as seen above). 4 Summary • XML will be around for the foreseeable future so learn to deal with it. • Use lxmlfor any serious XML work in Python. • Namespaces and XPath can be taimed. 3