 © Quanticate 2018
Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance
Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness
Turning XML into XLS with Groovy
Nick Burch
Turning XML to XLS, on the JVM,
without losing your sanity, with Groovy!

What is Groovy?

Groovy, now Apache Groovy, is a JVM-based language

Optionally typed, dynamic language

Many features inspired by Python, Ruby and Smalltalk

Java friendly: can use Java classes and libraries, and can also
contribute classes back for use from Java

Seamless integration with Java, with a similar syntax

A lot less boilerplate than Java! But Java is learning...

Fun with PDF XML

You can attach comments in a PDF

These have text, along with colour, and optionally some
standard Dublin Core-esque metadata

Best done with Acrobat or open source tools

Surprisingly popular with many business sectors, 
Quanticate’s included, mostly for good reason!

Good news: Acrobat can export the comments as XML, in the XFDF format

Groovy and XML

http://groovy-lang.org/processing-xml.html

Very easy and lightweight way to start processing XML

Can initially treat XML as a big map, and access elements 
just with dots

Multiple children of the same type treated as a list

Can access attributes as properties with the @ prefix

For simple XML, very short, succinct code
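
The points above can be sketched roughly as follows. This is only an illustration: the library document is made up and not from the talk, and the import assumes Groovy 3 or later, where XmlSlurper lives in groovy.xml (on Groovy 2.x it is groovy.util.XmlSlurper and needs no import).

```groovy
import groovy.xml.XmlSlurper

// Hypothetical sample document, invented for this illustration
def text = '''
<library name="Demo">
  <shelf>
    <book id="1"><title>First</title></book>
    <book id="2"><title>Second</title></book>
  </shelf>
</library>
'''

def library = new XmlSlurper().parseText(text)

// The parsed result stands for the root element, so children are reached with dots
def books = library.shelf.book

// Repeated children of the same name behave like a list
def bookCount = books.size()
def firstTitle = books[0].title.text()
def allTitles = books.collect { it.title.text() }

// Attributes are read as properties with the @ prefix
def libraryName = library.@name.text()
def secondId = books[1].@id.toInteger()

println "${libraryName}: ${bookCount} books, ${allTitles}"
```

Note that attribute values come back as text, so a numeric comparison needs an explicit conversion such as toInteger().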

And in code...

XmlSlurper and XmlParser are the two main entry points

// books is an XML string defined elsewhere
def response = new XmlSlurper().parseText(books)
def firstAuthor = response.value.books.book[0].author
assert firstAuthor.text() == 'Manuel De Cervantes'
assert firstAuthor.@id == 1

Groovy and advanced XML

http://groovy-lang.org/processing-xml.html

However... you still have a full XML DOM on hand

Can run arbitrary DOM and XPath queries

Feels a bit like jQuery etc: friendly, friendly, advanced
selector, back to friendly again!

Call selector method, then find or findAll with a closure
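
A minimal sketch of that "friendly, selector, friendly" pattern, using only the find, findAll and depthFirst methods of XmlSlurper results. The catalog document is invented for the example, and the import assumes Groovy 3 or later (drop it on Groovy 2.x).

```groovy
import groovy.xml.XmlSlurper

// Hypothetical sample document, invented for this illustration
def text = '''
<catalog>
  <section name="fiction">
    <book year="1605"><title>Old Novel</title></book>
    <book year="2001"><title>New Novel</title></book>
  </section>
  <section name="reference">
    <book year="1999"><title>Handbook</title></book>
  </section>
</catalog>
'''

def catalog = new XmlSlurper().parseText(text)

// Friendly dots to reach the books, then findAll with a closure as the selector
def recent = catalog.section.book.findAll { it.@year.toInteger() > 1990 }
def recentTitles = recent.collect { it.title.text() }

// find returns the first match only, and the dots keep working on the result
def fiction = catalog.section.find { it.@name == 'fiction' }
def fictionCount = fiction.book.size()

// depthFirst() walks every node, whatever its depth, a little like XPath's //
def allTitles = catalog.depthFirst().findAll { it.name() == 'title' }*.text()
```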

Groovy and lists

http://docs.groovy-lang.org/next/html/documentation/working-with-collections.html

Define simply inline, e.g. def stuff = ["1", "b", "Test"]

Run something for each, e.g. stuff.each { l -> println l }

Filter and transform with the grep and collect methods

Can run any method on all entries with the *. (spread) syntax, e.g.
stuff*.trim().sort().unique()
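
Put together, the list features above might look like this. The values are invented for the illustration; each, grep, collect and the spread operator are standard Groovy.

```groovy
// Define a list inline (values invented for this illustration)
def stuff = [' b ', '1', 'Test', '1 ']

// Run something for each entry
def seen = []
stuff.each { l -> seen << l }

// grep filters: here with a regular expression, but a closure or a class also works
def words = stuff.grep(~/\s*[A-Za-z]+\s*/)

// collect transforms every entry into a new list
def lengths = stuff.collect { it.length() }

// *. calls the method on every entry; sort and unique then act on the resulting list
def tidy = stuff*.trim().sort().unique()
```

The spread call produces a new list, so sort() and unique() here leave the original stuff untouched.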

Groovy and Strings

http://docs.groovy-lang.org/latest/html/documentation/#all-strings

Can be regular Java Strings, or GStrings with extra
methods and functionality; Groovy sorts this out for you

Use single quotes (or triple single quotes) to force a plain Java String

Use triple quotes for multi-line strings

In GStrings, can interpolate with ${...}, e.g.
"Hello ${world}" or "1+3 is ${1+3}" or "List is ${l.size()} big"
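
A short sketch of the quoting rules, reusing the names world and l from the examples above. The values assigned to them are assumptions made for the illustration.

```groovy
def world = 'World'
def l = [1, 2, 3]

// Single quotes: a plain java.lang.String, no interpolation
def plain = 'Hello ${world}'

// Double quotes with ${...}: a GString with the expression values filled in
def greeting = "Hello ${world}"
def sum = "1+3 is ${1+3}"
def size = "List is ${l.size()} big"

// Triple quotes span several lines; stripIndent() drops the common leading whitespace
def multi = """\
    Dear ${world},
    the list holds ${l.size()} entries""".stripIndent()
```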

PoC to process the XML

File xmlFile = new File(args[0])
def xml = new XmlSlurper().parse(xmlFile)
// Process all the annotations
// - Find everything under /xfdf/annots/freetext
// - If that has no subject, it is the Domain
// - If it does, grab the text from /contents-richtext/body/p
def freetext = xml.annots.freetext
println "There are ${freetext.size()} annotations"

PoC to process the XML

def lastDomain = "N/A"
freetext.each { ft ->
   def body = ft."contents-richtext".body
   def content = body.text() // Get all the text of the P etc. entries under the body
   if (!ft['@page'].isEmpty()) {
      if (!ft['@subject'].isEmpty()) {
         println "P ${ft['@page']} - D ${lastDomain} - S ${ft['@subject']} - V ${content}"
      } else {
         // No subject: this annotation names the Domain for the ones that follow
         lastDomain = content
      }
   }
}
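
To try the same idea without a real Acrobat export, here is a self-contained sketch. The sample is a hand-written, simplified XFDF-like document, and the domain and subject values in it are invented, so treat it as an assumption rather than real Acrobat output. The import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper

// Hand-written, simplified XFDF-like sample (an assumption, not real Acrobat output)
def sample = '''
<xfdf>
  <annots>
    <freetext page="0">
      <contents-richtext><body><p>DM</p></body></contents-richtext>
    </freetext>
    <freetext page="0" subject="AGE">
      <contents-richtext><body><p>Age in years</p></body></contents-richtext>
    </freetext>
    <freetext page="1" subject="SEX">
      <contents-richtext><body><p>Sex</p><p>(coded)</p></body></contents-richtext>
    </freetext>
  </annots>
</xfdf>
'''

def xml = new XmlSlurper().parseText(sample)
def rows = []
def lastDomain = 'N/A'

xml.annots.freetext.each { ft ->
    // Element names containing a hyphen have to be quoted
    def content = ft.'contents-richtext'.body.text()
    if (ft.@subject.isEmpty()) {
        // No subject attribute: treat the text as the current domain
        lastDomain = content
    } else {
        rows << [page: ft.@page.text(), domain: lastDomain,
                 subject: ft.@subject.text(), value: content]
    }
}

rows.each { println "P ${it.page} - D ${it.domain} - S ${it.subject} - V ${it.value}" }
```

Collecting the rows into a list of maps, rather than printing straight away, makes it easier to hand them on to a spreadsheet export later.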

Apache POI

https://poi.apache.org/

Pure Java library for reading and writing most Microsoft 
Office file formats

Especially strong on spreadsheets (XLS and XLSX)

More closely aligned to the file formats than the
applications, which can sometimes cause surprises (e.g.
where Excel doesn’t store what you thought it did...)

Simple XLS Exporter

import org.apache.poi.hssf.usermodel.HSSFWorkbook
import org.apache.poi.ss.usermodel.*

// dname and dvars are assumed to come from the XML processing step
Workbook wb = new HSSFWorkbook()
Sheet s = wb.createSheet("Variables")
dvars.eachWithIndex { v, idx ->
   Row r = s.createRow(idx)
   r.createCell(0).setCellValue(dname)
   r.createCell(1).setCellValue(v.variable)
   r.createCell(2).setCellValue(v.pages.join(" "))
   r.createCell(3).setCellValue(v.comments.join(" : "))
}
(0..3).each { col -> s.autoSizeColumn(col) }
new File("output.xls").withOutputStream { out -> wb.write(out) }
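
For experimenting, a self-contained variant of the exporter could look like the sketch below. Several things here are assumptions and not from the talk: POI is pulled in with Grape's @Grab (the talk used Gradle), the version is only an example, the dname and dvars values are invented stand-ins, and the workbook is written to memory so it can be read back and checked.

```groovy
// Assumption: POI is fetched with Grape here for brevity; the version is only an example
@Grab('org.apache.poi:poi:5.2.3')
import org.apache.poi.hssf.usermodel.HSSFWorkbook
import org.apache.poi.ss.usermodel.Row
import org.apache.poi.ss.usermodel.Sheet
import org.apache.poi.ss.usermodel.Workbook

// Hypothetical stand-ins for the values produced by the XML processing step
def dname = 'DM'
def dvars = [
    [variable: 'AGE', pages: ['1', '4'], comments: ['Age in years']],
    [variable: 'SEX', pages: ['1'],      comments: ['Coded', 'See codelist']]
]

Workbook wb = new HSSFWorkbook()
Sheet s = wb.createSheet('Variables')
dvars.eachWithIndex { v, idx ->
    Row r = s.createRow(idx)
    r.createCell(0).setCellValue(dname)
    r.createCell(1).setCellValue(v.variable)
    r.createCell(2).setCellValue(v.pages.join(' '))
    r.createCell(3).setCellValue(v.comments.join(' : '))
}

// Write to memory rather than output.xls, so the result can be read straight back
def buffer = new ByteArrayOutputStream()
wb.write(buffer)
wb.close()

// Re-open the bytes and read the last row, as a check on what was actually stored
Workbook check = new HSSFWorkbook(new ByteArrayInputStream(buffer.toByteArray()))
Sheet back = check.getSheet('Variables')
def rowCount = back.lastRowNum + 1
def lastRow = back.getRow(back.lastRowNum)
def readBack = (0..3).collect { lastRow.getCell(it).stringCellValue }
check.close()
```

autoSizeColumn is left out of this sketch because it measures text with fonts, which may not be available on a headless machine.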

Fancy filterable headers

// headerHeight and makeHeaderStyle are defined elsewhere in the exporter
public static Sheet headerSheet(Workbook wb, String name, List<String> headers) {
   CellStyle csHeader = makeHeaderStyle(wb, headerHeight)
   Sheet s = wb.createSheet(name)
   Row r = s.createRow(0)
   r.setHeightInPoints(headerHeight + 1)
   headers.eachWithIndex { col, idx ->
      Cell c = r.createCell(idx)
      c.setCellValue(col)
      c.setCellStyle(csHeader)
   }
   // Filter drop-downs on the header row, across all the header columns
   s.setAutoFilter(new CellRangeAddress(0, 0, 0, headers.size() - 1))
   return s
}

The Real Thing

Final Requirements were a bit more involved, and there 
were more edge cases than initially expected...

XML Processing code: ~150 lines

XLS Export code: ~150 lines

I’m sure a Groovy expert could get that shorter without 
affecting readability or maintainability!

Uses Gradle to fetch Apache POI and build a single Shadow JAR
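
The talk does not show the build file, so the following build.gradle is only a guess at its general shape. The plugin and library versions are illustrative assumptions, and the Main-Class name XfdfToXls is hypothetical.

```groovy
// build.gradle (Groovy DSL): plugin and library versions below are illustrative assumptions
plugins {
    id 'groovy'
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}

repositories {
    mavenCentral()
}

dependencies {
    implementation 'org.codehaus.groovy:groovy:3.0.9'
    implementation 'org.codehaus.groovy:groovy-xml:3.0.9'   // XmlSlurper
    implementation 'org.apache.poi:poi:5.2.3'               // HSSF, for .xls output
}

shadowJar {
    // Hypothetical entry point; replace with the real script or class name
    manifest { attributes 'Main-Class': 'XfdfToXls' }
}
```

Running the shadowJar task should then bundle the compiled Groovy code and its dependencies into one runnable JAR.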

Learning Groovy

Groovy in Action book

http://groovy-lang.org/learn.html

Getting Started and Module Guides
http://groovy-lang.org/documentation.html

Stack Overflow questions

Groovy docs, especially for the enhanced Java classes, e.g.
http://docs.groovy-lang.org/docs/groovy-2.4.13/html/groovy-jdk/java/lang/String.html

Any Questions?

WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 

Turning XML to XLS on the JVM, without losing your Sanity, with Groovy

  • 1. Turning XML into XLS with Groovy (Nick Burch). © Quanticate 2018. Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance. Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness
  • 2. Turning XML to XLS, on the JVM, without losing your sanity, with Groovy!
  • 3. What is Groovy?
      - Groovy, now Apache Groovy, is a JVM-based language
      - Optionally typed, dynamic language
      - Many features inspired by Python, Ruby, Smalltalk
      - Java friendly: can use Java classes and libraries, but can also provide classes back for use in Java
      - Seamless integration with Java, similar syntax
      - A lot less boilerplate than Java! But Java is learning...
  • 4. Fun with PDF XML
      - You can attach comments in a PDF
      - These have text, along with colour, and optionally some standard Dublin Core-esque metadata
      - Best done with Acrobat or open source tools
      - Surprisingly popular with many business sectors, Quanticate's included, mostly for good reason!
      - Good news: Acrobat can export them as XML, in the XFDF format
  • 6. Groovy and XML
      - http://groovy-lang.org/processing-xml.html
      - Very easy and lightweight way to start processing XML
      - Can initially treat XML as a big map, and access elements just with dots
      - Multiple children of the same type are treated as a list
      - Can access attributes as properties with the @ prefix
      - For simple XML, very short, succinct code
  • 7. And in code...
      - XmlSlurper and XmlParser are the two main entry points
        def response = new XmlSlurper().parseText(books)
        def firstAuthor = response.value.books.book[0].author
        assert firstAuthor.text() == 'Manuel De Cervantes'
        assert firstAuthor.@id == 1
  • 8. Groovy and advanced XML
      - http://groovy-lang.org/processing-xml.html
      - However... you still have a full XML DOM on hand
      - Can run arbitrary DOM and XPath queries
      - Feels a bit like jQuery etc.: friendly, friendly, advanced selector, back to friendly again!
      - Call a selector method, then find or findAll with a closure
  • 9. Groovy and lists
      - http://docs.groovy-lang.org/next/html/documentation/working-with-collections.html
      - Define simply inline, e.g. def stuff = ["1", "b", "Test"]
      - Run something for each, e.g. stuff.each { l -> println l }
      - Filter and transform with the grep and collect methods
      - Can run any method on all entries with the * syntax, e.g. stuff*.trim().sort().unique()
  • 10. Groovy and Strings
      - http://docs.groovy-lang.org/latest/html/documentation/#all-strings
      - Can be regular Java strings, or GStrings with extra methods and functionality; Groovy sorts this out for you
      - Use triple single quotes to force a Java string
      - Use triple quotes for multi-line strings
      - In GStrings, can interpolate with ${...}, e.g. "Hello ${world}" or "1+3 is ${1+3}" or "List is ${l.size()} big"
  • 11. PoC to process the XML
        File xmlFile = new File(args[0])
        def xml = new XmlSlurper().parse(xmlFile)
        // Process all the Annotations
        //  - Find everything under /xfdf/annots/freetext
        //  - If that has no Subject, is the Domain
        //  - If that does, grab text from /contents-richtext/body/p
        def freetext = xml.annots.freetext
        println "There are ${freetext.size()} annotations"
  • 12. PoC to process the XML
        def lastDomain = "N/A"
        freetext.each { ft ->
           def body = ft."contents-richtext".body
           def content = body.text() // Get all text of entries of P etc under the body
           if (ft['@page']) {
              if (! ft['@subject'].isEmpty()) {
                 // Has a Subject: report it against the most recent Domain
                 println "P ${ft['@page']} - D ${lastDomain} - S ${ft['@subject']} - V ${content}"
              } else {
                 // No Subject: this annotation names the Domain for those that follow
                 lastDomain = content
              }
           }
        }
  • 13. Apache POI
      - https://poi.apache.org/
      - Pure Java library for reading and writing most Microsoft Office file formats
      - Especially strong on spreadsheets (XLS and XLSX)
      - More closely aligned to the file formats than the applications, which can sometimes cause surprises (e.g. where Excel doesn't store what you thought it did...)
  • 14. Simple XLS Exporter
        // Imports needed from Apache POI; dvars and dname come from the XML processing step
        import org.apache.poi.hssf.usermodel.HSSFWorkbook
        import org.apache.poi.ss.usermodel.*

        Workbook wb = new HSSFWorkbook()
        Sheet s = wb.createSheet("Variables")
        // One row per variable: domain, variable name, pages, comments
        dvars.eachWithIndex { v, idx ->
            Row r = s.createRow(idx)
            r.createCell(0).setCellValue(dname)
            r.createCell(1).setCellValue(v.variable)
            r.createCell(2).setCellValue(v.pages.join(" "))
            r.createCell(3).setCellValue(v.comments.join(" : "))
        }
        (0..3).each { col -> s.autoSizeColumn(col) }
        (new File("output.xls")).withOutputStream { out -> wb.write(out) }
  • 15. Fancy filterable headers
        public static Sheet headerSheet(Workbook wb, String name, List<String> headers) {
            CellStyle csHeader = makeHeaderStyle(wb, headerHeight)
            Sheet s = wb.createSheet(name)
            Row r = s.createRow(0)
            r.setHeightInPoints(headerHeight+1)
            headers.eachWithIndex { col, idx ->
                Cell c = r.createCell(idx)
                c.setCellValue(col)
                c.setCellStyle(csHeader)
            }
            s.setAutoFilter(new CellRangeAddress(0, 0, 0, headers.size()-1))
            return s
        }
  • 16. The Real Thing
      - Final requirements were a bit more involved, and there were more edge cases than initially expected...
      - XML processing code: ~150 lines
      - XLS export code: ~150 lines
      - I'm sure a Groovy expert could get that shorter without affecting readability or maintainability!
      - Uses Gradle to fetch Apache POI and build a single Shadow Jar
  • 21. Learning Groovy
      - Groovy in Action book
      - http://groovy-lang.org/learn.html
      - Getting Started and Module Guides: http://groovy-lang.org/documentation.html
      - Stack Overflow questions
      - Groovy Docs, especially for the enhanced Java classes, e.g. http://docs.groovy-lang.org/docs/groovy-2.4.13/html/groovy-jdk/java/lang/String.html
  • 23. Any Questions?
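
The GPath navigation from slides 6 to 8 and the PoC from slides 11 and 12 can be tried without Acrobat by parsing a small inline document. This is a minimal sketch, assuming Groovy 3 or later (where XmlSlurper lives in the groovy.xml package; on Groovy 2.x the import can be dropped). The sample XML is a hypothetical, heavily simplified stand-in for a real XFDF export, and the domain and variable values in it are invented for illustration.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical, simplified stand-in for an XFDF export (real files carry more attributes)
def sample = '''<xfdf>
  <annots>
    <freetext page="0"><contents-richtext><body><p>DM</p></body></contents-richtext></freetext>
    <freetext page="0" subject="AGE"><contents-richtext><body><p>Age in years</p></body></contents-richtext></freetext>
    <freetext page="1" subject="SEX"><contents-richtext><body><p>Sex</p></body></contents-richtext></freetext>
  </annots>
</xfdf>'''

def xml = new XmlSlurper().parseText(sample)

// Dots walk down the tree; repeated children behave like a list
def freetext = xml.annots.freetext
assert freetext.size() == 3

// Attributes use the @ prefix; a missing attribute yields empty text
assert freetext[1].@subject.text() == 'AGE'
assert freetext[0].@subject.text() == ''

// find and findAll take a closure, much like a selector
def onFirstPage = freetext.findAll { it.@page.text() == '0' }
def sex = freetext.find { it.@subject.text() == 'SEX' }
assert onFirstPage.size() == 2
assert sex.'contents-richtext'.body.text() == 'Sex'

// Same idea as the PoC: an annotation without a subject names the domain
// for the annotations that follow it
def lastDomain = 'N/A'
def rows = []
freetext.each { ft ->
    def content = ft.'contents-richtext'.body.text()
    def subject = ft.@subject.text()
    if (subject) {
        rows << [page: ft.@page.text(), domain: lastDomain, variable: subject, comment: content]
    } else {
        lastDomain = content
    }
}
rows.each { println "P ${it.page} - D ${it.domain} - S ${it.variable} - V ${it.comment}" }
```

Collecting the results into a list of maps, instead of printing straight away as the PoC does, is a choice made here so that the rows could later be handed to an exporter.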
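
The list and string features from slides 9 and 10 can be sketched in a few lines. The values below are made up for illustration (slightly messier than the slide's list, so that trim() and unique() have something to do), and the variable names are not from the talk.

```groovy
// Lists are defined inline
def stuff = ['1', ' b', 'Test ', 'b']

// each runs a closure once per entry
stuff.each { l -> println "Entry: '${l}'" }

// grep filters, collect transforms
def letters = stuff.grep { it.trim() ==~ /[A-Za-z]+/ }
def lengths = stuff.collect { it.trim().length() }
assert letters == [' b', 'Test ', 'b']
assert lengths == [1, 1, 4, 1]

// The spread operator (*.) calls a method on every entry and gathers the results
def tidy = stuff*.trim().sort().unique()
assert tidy == ['1', 'Test', 'b']

// Single quotes give a plain java.lang.String with no interpolation
def world = 'World'
def plain = 'Hello ${world}'
assert plain instanceof String
assert plain.contains('$')

// Double quotes with ${...} give a GString, evaluated when it is turned into text
def greeting = "Hello ${world}"
def sum = "1+3 is ${1 + 3}"
def big = "List is ${stuff.size()} big"
assert greeting instanceof GString
assert greeting.toString() == 'Hello World'
assert sum.toString() == '1+3 is 4'
assert big.toString() == 'List is 4 big'

// Triple double quotes span several lines and still interpolate
def multi = """Line one
Line two for ${world}"""
assert multi.readLines().size() == 2
```

Note that the spread call returns a new list, so sort() and unique() here do not modify stuff itself.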