Slides of a presentation given at the EuroGeographics KEN workshop on INSPIRE Data Harmonization, Paris, Oct 8-9, 2013: http://www.eurogeographics.org/event/inspire-ken-schema-transformation-workshop. Describes the Stetl ETL framework and cases of INSPIRE transformation. A video recording of this presentation is available at https://www.youtube.com/watch?v=vjdpYBm4AaM (the first part covers XSLT; Stetl for INSPIRE starts about halfway).
Stetl for INSPIRE Data Transformation
1. INSPIRE Transformation with Stetl
A Lightweight Python Framework for Geospatial ETL
Just van den Broecke
EuroGeographics - KEN Workshop
Paris, Oct 8, 2013
www.justobjects.nl
2. About Me
Independent Open Source Geospatial Professional
Secretary OSGeo Dutch Local Chapter
Member of the Dutch OpenGeoGroep
Just van den Broecke
just@justobjects.nl
www.justobjects.nl
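27. From Local National Data to INSPIRE Download (DL) Services
(Diagram: Dutch Addresses+Buildings source GML is processed with NLExtract and Stetl; Stetl publishes INSPIRE Addresses to a deegree blobstore, served as INSPIRE GML by deegree WFS, and generates INSPIRE GML files for an Atom Feed.)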
49. Example: XsltFilter in Python
# Stetl-internal modules
from util import Util, etree
from filter import Filter
from packet import FORMAT

log = Util.get_log("xsltfilter")


class XsltFilter(Filter):
    """Transform an etree document packet with an XSLT script."""

    # Constructor
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section,
                        consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
        # Path of the XSLT script, from the 'script' config property
        self.xslt_file_path = self.cfg.get('script')
        # Parse and compile the XSLT file only once, not per packet
        # (etree.parse accepts a file path and closes the file itself)
        self.xslt_doc = etree.parse(self.xslt_file_path)
        self.xslt_obj = etree.XSLT(self.xslt_doc)

    def invoke(self, packet):
        # Nothing to transform: pass the packet on unchanged
        if packet.data is None:
            return packet
        return self.transform(packet)

    def transform(self, packet):
        # Apply the compiled XSLT; the result is again an etree document
        packet.data = self.xslt_obj(packet.data)
        log.info("XSLT Transform OK")
        return packet
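The XsltFilter above compiles its XSLT once in the constructor and then applies it to every packet in `transform()`, passing empty packets through. A minimal, stdlib-only sketch of that same pattern follows. `Packet` and `TagRenameFilter` are hypothetical stand-ins, not Stetl classes, and a simple tag-renaming spec in a made-up `old:new,...` syntax takes the place of the XSLT script so the example runs without lxml.

```python
import xml.etree.ElementTree as ET


class Packet:
    """Hypothetical stand-in for a Stetl Packet: just a data payload here."""

    def __init__(self, data=None):
        self.data = data


class TagRenameFilter:
    """Hypothetical filter following the XsltFilter pattern:
    prepare the transformation once, apply it to each packet."""

    def __init__(self, mapping_spec):
        # Parse the (hypothetical) 'old:new,old2:new2' spec only once
        self.mapping = dict(
            pair.split(":", 1) for pair in mapping_spec.split(",") if pair
        )

    def invoke(self, packet):
        # Pass empty packets through untouched, as XsltFilter does
        if packet.data is None:
            return packet
        return self.transform(packet)

    def transform(self, packet):
        # Rename every element whose tag appears in the mapping
        for elem in packet.data.iter():
            elem.tag = self.mapping.get(elem.tag, elem.tag)
        return packet
```

For example, `TagRenameFilter("cities:towns,city:town").invoke(Packet(doc))` renames the `cities`/`city` elements of an ElementTree `doc` and returns the same packet.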
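50. Your Own Components
Stetl concepts

Step 1 - Define Class

# my/myfilter.py
# Imports as in the XsltFilter example; adjust to the installed Stetl package layout if needed
from util import Util
from filter import Filter
from packet import FORMAT

log = Util.get_log("myfilter")


class MyFilter(Filter):
    # Constructor: declare the packet formats consumed and produced
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section,
                        consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

    def invoke(self, packet):
        # Just log and pass the packet on unchanged
        log.info("CALLING MyFilter OK!!!!")
        return packet

Step 2 - Configure Class

[etl]
chains = input_xml_file|my_filter|output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

# My custom component
[my_filter]
class = my.myfilter.MyFilter

[output_std]
class = outputs.standardoutput.StandardXmlOutput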
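The `chains` line wires config sections together, left to right, with `|`. A minimal sketch of how such a chain might be built and run follows, under stated assumptions: all classes are hypothetical, a plain dict registry replaces Stetl's loading of each section's `class` property, and a packet that reaches the end of the chain with no data is taken to mean end of stream. This is an illustration, not Stetl's implementation.

```python
class Packet:
    """Hypothetical packet: a data payload passed along the chain."""

    def __init__(self, data=None):
        self.data = data


class Component:
    def __init__(self, name):
        self.name = name

    def invoke(self, packet):
        return packet


class ListInput(Component):
    """Hypothetical input: emits one list item per packet, then None."""

    def __init__(self, name, items):
        super().__init__(name)
        self.items = list(items)

    def invoke(self, packet):
        packet.data = self.items.pop(0) if self.items else None
        return packet


class UpperFilter(Component):
    """Hypothetical filter: upper-cases string data."""

    def invoke(self, packet):
        if packet.data is not None:
            packet.data = packet.data.upper()
        return packet


class CollectOutput(Component):
    """Hypothetical output: collects all data it receives."""

    def __init__(self, name):
        super().__init__(name)
        self.collected = []

    def invoke(self, packet):
        if packet.data is not None:
            self.collected.append(packet.data)
        return packet


def build_chain(chain_str, registry):
    """Split 'a|b|c' and look up each section name in the registry."""
    return [registry[name.strip()] for name in chain_str.split("|")]


def run_chain(components):
    """Push fresh packets through all components until no data is left."""
    while True:
        packet = Packet()
        for comp in components:
            packet = comp.invoke(packet)
        if packet.data is None:  # assumed end-of-stream rule
            break
```

Keeping each component ignorant of its neighbours is what lets a custom class such as `MyFilter` be dropped into a chain purely through configuration.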
51. Data Structures
Stetl concepts
• Components exchange Packets
• Packet contains data and status
• Data formats, e.g.:
  - xml_line_stream
  - etree_doc
  - etree_element (feature)
  - etree_element_array
  - string
  - any
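Because every component declares what it consumes and produces, a chain can be checked before any data flows. A minimal sketch of such a check follows, assuming a hypothetical rule that adjacent formats must match unless either side is `any`; the `FORMAT` enum, the `Packet` status flag and `check_chain` are illustrations, not Stetl's actual classes.

```python
from enum import Enum


class FORMAT(Enum):
    """Hypothetical enum of the data formats listed on the slide."""
    xml_line_stream = "xml_line_stream"
    etree_doc = "etree_doc"
    etree_element = "etree_element"
    etree_element_array = "etree_element_array"
    string = "string"
    any = "any"


class Packet:
    """Hypothetical packet: data plus a simple status flag."""

    def __init__(self, data=None, fmt=FORMAT.any):
        self.data = data
        self.format = fmt
        self.end_of_stream = False


def compatible(produces, consumes):
    """Assumed rule: formats must match, unless either side is 'any'."""
    return FORMAT.any in (produces, consumes) or produces == consumes


def check_chain(components):
    """components: list of (name, consumes, produces) tuples, in chain order."""
    for (name1, _, prod), (name2, cons, _) in zip(components, components[1:]):
        if not compatible(prod, cons):
            raise ValueError(
                f"{name1} produces {prod.value}, but {name2} consumes {cons.value}"
            )
```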
53. Cases - The Netherlands
• INSPIRE Download Services
  - publish to deegree store (WFS)
  - generate GML files (for Atom Feed)
• National GML Datasets
  - GML to PostGIS (Top10NL, BGT)
54. Top10NL Extract
Parameter Substitution

[etl]
chains = input_sql_pre|schema_name_filter|output_postgres,
         input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
         input_sql_post|schema_name_filter|output_postgres

# SQL files to execute before loading the GML
[input_sql_pre]
class = inputs.fileinput.StringFileInput
file_path = sql/drop-tables.sql,sql/create-schema.sql

# SQL file to execute after loading the GML
[input_sql_post]
class = inputs.fileinput.StringFileInput
file_path = sql/delete-duplicates.sql

# Generic filter to substitute Python format-string values like {schema} in a string
[schema_name_filter]
class = filters.stringfilter.StringSubstitutionFilter
# format args: {schema} is the schema name
format_args = schema:{schema}

[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = {database}
host = {host}
port = {port}
user = {user}
password = {password}
schema = {schema}

# Read the source GML file(s) and produce gml:featureMember elements
[input_big_gml_files]
class = inputs.fileinput.XmlElementStreamerFileInput
file_path = {gml_files}
element_tags = featureMember
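Substitution happens at two levels here: `{database}`, `{schema}` etc. in the config are filled in from values supplied at run time, and the `schema_name_filter` then applies `format_args` to the SQL text. A minimal sketch of both steps follows, assuming plain `str.format` semantics and a hypothetical whitespace-separated `key:value` syntax for `format_args`; the function names are hypothetical and this is not Stetl's code.

```python
import configparser

# Trimmed-down, hypothetical excerpt of the config above
CONFIG_TEXT = """
[schema_name_filter]
format_args = schema:{schema}

[output_postgres]
database = {database}
schema = {schema}
"""


def substitute(config_text, args):
    """Step 1: fill {name} placeholders in the config text from a dict."""
    return config_text.format_map(args)


def parse_format_args(spec):
    """Hypothetical parser: 'schema:top10nl' -> {'schema': 'top10nl'}."""
    return dict(pair.split(":", 1) for pair in spec.split())


def apply_string_substitution(text, format_args):
    """Step 2: what a string-substitution filter might do to SQL text."""
    return text.format(**format_args)
```

Usage: load `substitute(CONFIG_TEXT, {"schema": "top10nl", "database": "gis"})` with `configparser.ConfigParser().read_string(...)`, then pass the parsed `format_args` to `apply_string_substitution` for each SQL file. One config thus serves any target schema or database.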
57. Cases - INSPIRE Transforms
• Simple: Dutch Admin Borders to AU (Administrative Units)
• Advanced: Dutch Addresses to AD (Addresses)
58. INSPIRE - XSLT STRUCTURE
(Diagram, per theme: CP, AU, GN)
• Locally specific XSL:
  - Local <theme> GML to INSPIRE SpatialDataset
  - Local <theme> GML to INSPIRE GML
• Generic, reusable XSLT scripts, called by all via an XSLT template call:
  - Generate <theme> INSPIRE GML
59. XSLT - 3 MAIN STEPS/SCRIPTS
1. Generate Spatial Dataset GML Container (specific)
2. Extract data values from local OGR simple feature data (specific)
3. Call XSLT template per Theme Feature type (generic)
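The split between specific and generic steps can be mirrored in plain Python to show why the generic part is reusable across themes. This is an analogue, not XSLT and not the presenter's scripts: ElementTree replaces XSLT, and the namespace `urn:example:inspire-sketch`, the function names and the local field names `naam`/`code` are hypothetical. The output is far simpler than real INSPIRE GML.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; real INSPIRE schemas use their own namespaces
NS = "urn:example:inspire-sketch"


def generate_feature(parent, feature_type, values):
    """Step 3 (generic): build one feature element from a dict of values.
    Plays the role of the reusable, per-theme XSLT template."""
    feat = ET.SubElement(parent, f"{{{NS}}}{feature_type}")
    for key, val in values.items():
        ET.SubElement(feat, f"{{{NS}}}{key}").text = str(val)
    return feat


def local_to_inspire(local_features, feature_type, extract):
    """Steps 1-2 (specific): create the dataset container, then extract
    values from each local feature and hand them to the generic step."""
    dataset = ET.Element(f"{{{NS}}}SpatialDataset")  # step 1
    for local in local_features:
        values = extract(local)  # step 2
        generate_feature(dataset, feature_type, values)  # step 3
    return dataset


def extract_au(local):
    """Hypothetical, locally specific mapping for Dutch admin borders."""
    return {"name": local["naam"], "nationalCode": local["code"]}
```

Only `extract_au` would change per country or data source; `generate_feature` stays the same, which is the point of keeping step 3 generic.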
67. Project Status - Sept 21, 2013
• v1.0.4 installable via PyPI
• Documentation on www.stetl.org
• Real world transforms done
• Seeking feedback, support and contributors