Large output in XML with Unicode and namespace
Thomas Aglassinger
http://roskakori.at
We wanted to write this:
●   Large
●   XML
●   Unicode
●   Namespaces
We already knew how to read XML.
●   xml.dom.minidom.parse()
●   xml.etree.ElementTree.parse()
●   xml.sax.parse()
●   lxml.etree.parse()
Image: http://encyclopediadramatica.se/File:Bill_Nye_Expert.jpg
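
As a reminder of the reading side, here is a minimal sketch that parses the same small document with the three standard library parsers listed above. The sample document and the `TextCollector` handler are made up for illustration; lxml is left out because it is a third-party package.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Made-up sample document with a non-ASCII character, encoded as UTF-8.
SAMPLE = '<orders><order id="1">Käse</order></orders>'.encode("utf-8")

# DOM: the whole document becomes a tree of node objects.
dom = minidom.parse(io.BytesIO(SAMPLE))
dom_text = dom.getElementsByTagName("order")[0].firstChild.data

# ElementTree: also a tree, but with a lighter element API.
tree = ET.parse(io.BytesIO(SAMPLE))
etree_text = tree.getroot().find("order").text


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects all character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


# SAX: events are pushed to a handler while the input is read.
collector = TextCollector()
xml.sax.parse(io.BytesIO(SAMPLE), collector)
sax_text = "".join(collector.chunks)
```

All three deliver the element text as a Unicode string; only the SAX variant avoids holding the whole document in memory.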
So we went to the Python Library...
Image: http://upload.wikimedia.org/wikipedia/commons/2/2b/Melk_-_Abbey_-_Library.jpg
...and we were like:
Image: http://www.apple.com/switch/stories/ellenfeiss.html (in 2002)
xml.dom.minidom
●   Explicit attribute for namespace
●   Verbose way to add attributes
●   Many lines of code
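
The code from the original slide is not part of this transcript, so the following is only a sketch of what the three remarks look like in practice. The namespace URI `http://example.com/shop` and the element and attribute names are assumptions chosen for illustration.

```python
from xml.dom import minidom

SHOP_NS = "http://example.com/shop"  # made-up namespace URI

document = minidom.Document()
orders = document.createElementNS(SHOP_NS, "shop:orders")
# minidom does not write the namespace declaration on its own,
# so it has to be added as an explicit attribute.
orders.setAttribute("xmlns:shop", SHOP_NS)
document.appendChild(orders)

order = document.createElementNS(SHOP_NS, "shop:order")
# Attributes are set one call at a time.
order.setAttribute("id", "1")
order.setAttribute("customer", "Jörg")
orders.appendChild(order)
order.appendChild(document.createTextNode("Käse"))

# With an encoding, toxml() returns bytes on Python 3.
xml_bytes = document.toxml(encoding="utf-8")
```

Even this tiny document takes about ten statements, and the whole tree exists in memory before the first byte is written.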
xml.dom.minidom
●   “Users [...] who would like to write less code
    for processing XML files should consider using
    the xml.etree.ElementTree module instead”
    (The Python Standard Library, Chapter 19, Structured Markup)
Image: http://www.destructoid.com/blogs/Sevre/femshep-5-a-space-opera-208844.phtml
xml.etree
●   Clark notation instead of XPath
●   Requires Python 2.7
●   Generally shorter, but with wider lines
●   Similar issues with lxml.etree
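
A rough ElementTree counterpart, again with a made-up namespace URI and made-up element names rather than the code from the slide. Names are written in Clark notation, `{uri}localname`, and `register_namespace()` (added in Python 2.7, which may be what "Requires Python 2.7" refers to) picks the prefix used in the output.

```python
import io
import xml.etree.ElementTree as ET

SHOP_NS = "http://example.com/shop"  # made-up namespace URI
ET.register_namespace("shop", SHOP_NS)

# Clark notation: "{namespace-uri}local-name".
orders = ET.Element("{%s}orders" % SHOP_NS)
order = ET.SubElement(orders, "{%s}order" % SHOP_NS, {"id": "1", "customer": "Jörg"})
order.text = "Käse"

buffer = io.BytesIO()
ET.ElementTree(orders).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()
```

Fewer statements than with minidom, but each one is longer because the full URI is part of every name, and the tree is still built in memory first.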
Memory issues
●   So far, the XML document is built in memory
●   Won't work well for large sets of data
●   We need a streaming interface
codecs.open() and write()
●   Manual escaping
●   It just doesn't feel “right”
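
A sketch of the do-it-yourself approach, assuming a made-up order list and namespace URI. It streams nicely, but every value has to pass through `escape()` or `quoteattr()` by hand, and nothing checks that the tags match. On Python 3 the built-in `open()` with `encoding=` does the same job as `codecs.open()`.

```python
import codecs
import os
import tempfile
from xml.sax.saxutils import escape, quoteattr

# Made-up data; note the characters that need escaping.
ORDERS = [(1, "Käse & Brot"), (2, "1 < 2")]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "orders.xml")
    with codecs.open(path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="utf-8"?>\n')
        out.write('<shop:orders xmlns:shop="http://example.com/shop">\n')
        for order_id, item in ORDERS:
            # Forgetting escape() or quoteattr() here silently breaks the file.
            out.write("  <shop:order id=%s>%s</shop:order>\n"
                      % (quoteattr(str(order_id)), escape(item)))
        out.write("</shop:orders>\n")
    with codecs.open(path, "r", encoding="utf-8") as written_file:
        written = written_file.read()
```

Memory usage stays flat no matter how many orders are written, which is the one thing the tree-based modules could not offer.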
saxutils.XMLGenerator
●   No support for <x/>, only <x></x>
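
The behaviour can be reproduced with a few lines. The helper name `render_empty_element` is made up for this sketch. Note that the `short_empty_elements` option shown at the end was added in Python 3.2, so it was not available in the Python 2 versions the talk targets.

```python
import io
from xml.sax.saxutils import XMLGenerator


def render_empty_element(**generator_options):
    """Write a single element without content and return the result."""
    out = io.StringIO()
    generator = XMLGenerator(out, encoding="utf-8", **generator_options)
    generator.startElement("x", {})
    generator.endElement("x")
    return out.getvalue()


# Default behaviour: separate start and end tag.
long_form = render_empty_element()

# Python 3.2 and later only: collapse empty elements.
short_form = render_empty_element(short_empty_elements=True)
```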
Lack of basic validation
●   Are all tags closed?
●   In the correct order?
●   Has a namespace been
    registered before usage?
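
To illustrate the first two questions with the standard library: in this small sketch, `XMLGenerator` is asked to close a tag that was never opened and writes it without complaint.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.StringIO()
generator = XMLGenerator(out, encoding="utf-8")
generator.startElement("a", {})
generator.endElement("b")  # wrong element name, yet no exception is raised
broken = out.getvalue()
```

The result is not well-formed XML, and the error only shows up later when somebody tries to read the file.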
So we had to go all kinky...
Image: http://mylittlefacewhen.com/f/3781/
...and write yet another XML module
Image: http://www.110pounds.com/?p=6880
Before you judge too harshly:
It just writes XML!
loxun
●   Defaults to UTF-8 output
●   Compact namespace syntax
●   Compact attribute syntax
●   Supports with-statement
●   Streaming interface for low memory usage
●   No dependencies on other modules
●   Pure Python 2.5+
●   Optimizes <x></x> to simply <x/>
Raises XmlError if...
●   ...you add references to undefined namespaces
●   ...you forget to close tags (elements)
●   ...you build non-well-formed documents
●   ...you add non-ASCII characters in 8-bit strings
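
How such checks can work in a streaming writer is sketched below. This is not loxun's code or API: `SketchXmlWriter`, `SketchXmlError` and all method names are hypothetical, and the byte string check is a Python 3 rendering of the "8-bit strings" rule. The idea is simply a stack of open tags plus a dictionary of registered namespace prefixes.

```python
import io
from xml.sax.saxutils import escape, quoteattr


class SketchXmlError(Exception):
    """Raised instead of silently writing XML that is not well-formed."""


class SketchXmlWriter(object):
    """Hypothetical streaming writer; the names are NOT loxun's API."""

    def __init__(self, out):
        self._out = out        # binary stream, always written as UTF-8
        self._open_tags = []   # stack of elements that still need closing
        self._namespaces = {}  # prefix -> URI, registered so far
        self._pending = []     # namespaces waiting for the next start tag
        self._write('<?xml version="1.0" encoding="utf-8"?>')

    def __enter__(self):
        return self

    def __exit__(self, error_type, error, traceback):
        if error_type is None:
            self.close()
        return False

    def _write(self, text):
        self._out.write(text.encode("utf-8"))

    def _check_prefix(self, name):
        prefix, colon, _ = name.partition(":")
        if colon and prefix not in self._namespaces:
            raise SketchXmlError("namespace %r is not registered" % prefix)

    def add_namespace(self, prefix, uri):
        self._namespaces[prefix] = uri
        self._pending.append((prefix, uri))

    def _open(self, name, attributes, ending):
        attributes = attributes or {}
        for qualified_name in [name] + list(attributes):
            self._check_prefix(qualified_name)
        parts = ["<" + name]
        for prefix, uri in self._pending:
            parts.append(" xmlns:%s=%s" % (prefix, quoteattr(uri)))
        self._pending = []
        for key, value in attributes.items():
            parts.append(" %s=%s" % (key, quoteattr(value)))
        self._write("".join(parts) + ending)

    def start_tag(self, name, attributes=None):
        self._open(name, attributes, ">")
        self._open_tags.append(name)

    def tag(self, name, attributes=None):
        self._open(name, attributes, "/>")  # empty element, short form

    def text(self, value):
        if isinstance(value, bytes):
            try:
                value = value.decode("ascii")
            except UnicodeDecodeError:
                raise SketchXmlError("byte strings must be plain ASCII")
        self._write(escape(value))

    def end_tag(self):
        if not self._open_tags:
            raise SketchXmlError("there is no open tag left to close")
        self._write("</%s>" % self._open_tags.pop())

    def close(self):
        if self._open_tags:
            raise SketchXmlError("unclosed tags: " + ", ".join(self._open_tags))


buffer = io.BytesIO()
with SketchXmlWriter(buffer) as xml:
    xml.add_namespace("shop", "http://example.com/shop")
    xml.start_tag("shop:orders")
    xml.start_tag("shop:order", {"id": "1"})
    xml.text("Käse & Brot")
    xml.end_tag()
    xml.tag("shop:note")
    xml.end_tag()
result = buffer.getvalue().decode("utf-8")
```

Because only the stack of open tags and the namespace table are kept, memory usage does not grow with the size of the output. Leaving the with-block while a tag is still open raises `SketchXmlError`.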
Available from:
●   http://pypi.python.org/pypi/loxun/
●   https://github.com/roskakori/loxun
●   Open Source

$ sudo pip install loxun
Downloading/unpacking loxun
 Downloading loxun-1.3.zip
 Running setup.py egg_info for package loxun

Installing collected packages: loxun
  Running setup.py install for loxun

Successfully installed loxun

Code examples for this talk:
https://gist.github.com/3067859
Try loxun for:
Large output in XML with Unicode and namespace
Also writes small ASCII files!
