ILOSTAT, the new database of labour statistics, was designed around a number of key ideas. Three of them played a fundamental role: reducing the burden on data providers by supporting as many data channels as possible, being metadata driven, and adopting every applicable standard.
With these in mind, we developed a bi-directional interface that allows data and metadata to be disseminated from, and collected into, ILOSTAT through SDMX datasets and related artefacts.
The implementation project had to overcome several issues, especially on the conceptual side.
In this presentation we look at how the software architecture for the interface was defined, the concepts that make up the ILOSTAT concept scheme, how the interface deals with descriptive metadata (a crucial resource in ILOSTAT), the definition of the scope of the DSD with its pros and cons, and the implementation of a virtual registry and versioning system.
(Presented at SDMX Global Conference 2013, Paris)
SDMX interface for ILOSTAT
1. SDMX Global Conference - Paris, September 2013.
ILO Department of Statistics
Edgardo Greising
greising@ilo.org
2.
I. Introduction
II. Design
III. Software Architecture
IV. Data Collection & Dissemination
V. Next Steps
3.
SDMX has been “around” since 2002
LABORSTA’s information model drawbacks
Lack of resources
Waiting for the standard to mature
ILOSTAT design in 2010
New information model following SDMX COG
SDMX included as part of ILOSTAT project
ILOSTAT development in 2011
SDMX interface for data collection and dissemination
4.
ILOSTAT modules
Data collection
Data cleaning process
Data dissemination
Workflow control
Metadata
5.
ILOSTAT concepts scheme
Dimensions
Collection
Country
Frequency
Survey
Represented Variable (OBS_VALUE)
Classification Type (1..6)
Time
Attributes
Note Types
Value Status
Unit of measure
Unit multiplier
Time format
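The concept scheme listed above can be pictured as a small data structure. This is only a sketch: the class and the identifier strings (`COLLECTION`, `CLASSIF_1`, `NOTE_TYPES` and so on) are illustrative assumptions, not the actual ILOSTAT or SDMX artefact IDs. Following the speaker notes, the Represented Variable is modelled as the primary measure `OBS_VALUE` rather than as an ordinary dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptScheme:
    """Illustrative container for the dimensions and attributes on the slide."""
    dimensions: tuple
    attributes: tuple
    primary_measure: str = "OBS_VALUE"  # carries the Represented Variable


def build_ilostat_scheme(n_classifications: int = 6) -> ConceptScheme:
    # The slide allows between 1 and 6 classification-type dimensions.
    if not 1 <= n_classifications <= 6:
        raise ValueError("classification types must be between 1 and 6")
    classif = tuple(f"CLASSIF_{i}" for i in range(1, n_classifications + 1))
    # Hypothetical identifiers, in the order the slide lists the concepts.
    dimensions = ("COLLECTION", "COUNTRY", "FREQ", "SURVEY") + classif + ("TIME",)
    attributes = ("NOTE_TYPES", "VALUE_STATUS", "UNIT_MEASURE",
                  "UNIT_MULT", "TIME_FORMAT")
    return ConceptScheme(dimensions, attributes)


# An indicator broken down by two classifications (e.g. sex and age).
scheme = build_ilostat_scheme(2)
print(scheme.dimensions)
```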
6.
Data Structure Definition
Scope of the DSD
1. One general DSD?
Easy to maintain but huge and volatile
2. One DSD per topic? (~20 topics)
Still too big and volatile.
3. One DSD per indicator? (~100 indicators, e.g. Employment by sex and age)
OK for dissemination
Too many useless entries in country specific code lists
4. One DSD per Questionnaire table (indicator + country)?
OK. But …
How to maintain ~100 indicators x ~200 countries = 20,000 DSDs?
Solution: Virtual Registry & Versioning module
7.
Virtual Registry
Key factors:
1. ILOSTAT is metadata driven
2. The ILOSTAT Information Model is very similar to the SDMX Information Model
All SDMX artifacts are considered to exist «virtually».
The SDMX connector creates and delivers any requested artifact «on-the-fly»
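The idea of a registry that stores nothing and builds each artifact on request might be sketched as follows. The shape of the metadata dictionary, the ID format and the dimension names are all hypothetical; the slides do not describe how the SDMX connector is implemented.

```python
class VirtualRegistry:
    """Holds no stored artifacts; builds each one from metadata when requested."""

    def __init__(self, metadata):
        # Hypothetical shape: {(indicator, country): {"classifications": [...]}}
        self._metadata = metadata

    def get_data_structure(self, indicator, country):
        entry = self._metadata.get((indicator, country))
        if entry is None:
            raise KeyError(f"no metadata for {indicator}/{country}")
        # The structure is assembled here, at request time, never persisted.
        return {
            "id": f"{country}_{indicator}",  # illustrative ID format
            "dimensions": ["COLLECTION", "COUNTRY", "FREQ", "SURVEY",
                           *entry["classifications"], "TIME"],
        }


registry = VirtualRegistry({("EMP", "FRA"): {"classifications": ["SEX", "AGE"]}})
print(registry.get_data_structure("EMP", "FRA"))
```

With this approach a change in the metadata is reflected in the next artifact requested, which is why roughly 20,000 data structures do not need to be maintained by hand.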
8.
Versioning
Automatic for data structures and related data flows
Version increases with any change in the structural metadata (code lists, classification versions, required notes, etc.)
Process:
The data structure is generated with the default 1.0 version and full references
The result is serialized to an in-memory buffer and a SHA-1 hash is computed
The hash is compared to the one stored in the database:
• If no stored hash exists, the new hash is stored and the version is initialized at 1
• If the hashes are equal, the current version is returned
• If the hashes differ, the version is incremented and the new hash is stored
The resulting version number is passed to the actual structure generation process, to be included in the returned flow
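The versioning steps above can be sketched in a few lines. The dictionary stands in for the database table, and the class name, method name and integer version numbers are assumptions made for illustration; only the use of a SHA-1 hash and the three comparison outcomes come from the slide.

```python
import hashlib


class VersionStore:
    """In-memory stand-in for the database table of hashes and versions."""

    def __init__(self):
        self._rows = {}  # structure id -> (SHA-1 hex digest, version number)

    def resolve_version(self, structure_id: str, serialized: bytes) -> int:
        digest = hashlib.sha1(serialized).hexdigest()
        row = self._rows.get(structure_id)
        if row is None:
            # No stored hash: store it and initialize the version at 1.
            self._rows[structure_id] = (digest, 1)
            return 1
        stored_digest, version = row
        if stored_digest == digest:
            # Nothing changed: return the current version.
            return version
        # Structure changed: increment the version and store the new hash.
        self._rows[structure_id] = (digest, version + 1)
        return version + 1


store = VersionStore()
first = store.resolve_version("DSD_A", b"<structure codes='1 2 3'/>")
same = store.resolve_version("DSD_A", b"<structure codes='1 2 3'/>")
changed = store.resolve_version("DSD_A", b"<structure codes='1 2 3 4'/>")
print(first, same, changed)
```

Because the hash is taken over the serialized structure generated with a fixed default version, the version number itself never influences the comparison.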
9.
Descriptive metadata (metacontent)
ILOSTAT includes many notes at different levels
All the notes are coded and classified by Note_Type
MSD usage was avoided for simplicity
Notes are included in the DSD/DF as coded attributes
Attachment level:
Currently: all notes are attached at Observation_value level; the actual level is determined by the attribute name.
Future: notes attached at the proper level (requires a format change)
Only for collection:
A special “Free_Text” note type allows non-coded annotations to be captured
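To illustrate how notes attached to an observation can still carry their real level through the attribute name, here is a minimal sketch. The attribute names (`NOTE_SOURCE`, `NOTE_INDICATOR`, `NOTE_OBS`) and the level labels are invented for this example; the slides do not give the actual ILOSTAT naming convention.

```python
# Hypothetical mapping from attribute name to the level the note belongs to.
INTENDED_LEVEL = {
    "NOTE_SOURCE": "dataset",
    "NOTE_INDICATOR": "series",
    "NOTE_OBS": "observation",
}


def notes_by_level(observation_attributes):
    """Group the coded notes attached to one observation by intended level."""
    grouped = {}
    for name, code in observation_attributes.items():
        level = INTENDED_LEVEL.get(name)
        if level is None:
            continue  # not a note attribute (e.g. value status, unit)
        grouped.setdefault(level, []).append(code)
    return grouped


obs = {"NOTE_SOURCE": "S1", "VALUE_STATUS": "A", "NOTE_OBS": "N7"}
print(notes_by_level(obs))
```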
10.
Java EE application based on the following components:
SDMXsource
Oracle Application Development Framework (ADF)
ILOSTAT Taskflow Library (also used for the ILOSTAT Website)
11.
Dissemination
Standard SDMX RESTful API (partial)
http://www.ilo.org/ilostat/sdmx/ws/rest/...
Collection
Triggered by an APEX interface for a given file
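For the dissemination side, a data query in the standard SDMX RESTful syntax has the form `data/{flow}/{key}`, where the key lists one position per dimension separated by dots, `+` joins alternative codes, and an empty position means "all values". The helper below builds such a URL. The base URL, dataflow ID and codes are placeholders, and since the slide describes the ILOSTAT implementation as partial, which queries that service accepts is not something this sketch can confirm.

```python
from urllib.parse import urlencode


def build_data_query(base_url, flow_ref, key_parts, **params):
    """Build an SDMX REST data query URL.

    key_parts holds one list of codes per dimension, in DSD order;
    an empty list leaves that dimension unconstrained.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url


# Placeholder endpoint and identifiers, for illustration only.
print(build_data_query(
    "https://example.org/sdmx/rest/",
    "DF_EXAMPLE",
    [["FRA", "DEU"], [], ["A"]],
    startPeriod="2010",
))
```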
12.
Set up provision agreements
International Organizations
Countries’ NSO & MoL
Develop new interfaces
JSON
SDMX 2.1
SDMX-RI Gateway
End-user access tools
ILO Information & Knowledge Management Gateway
ILOSTAT country profile report
Grapher tool
Mobile
Excel add-in
Capacity building + Tools
13.
E-mail:
greising@ilo.org
Skype:
egreising
Twitter:
egreising
LinkedIn:
http://www.linkedin.com/in/egreising
Editor's Notes
Introduction: SDMX has been «around» the ILO Department of Statistics since 2002. The difficulties in implementing it were: (1) LABORSTA’s information model was very hard to match to the SDMX IM. Different DB schemas for each type of series, missing keys and multiple unstructured descriptive metadata were some of the characteristics of the LABORSTA system that made SDMX mapping too difficult. (2) Lack of resources to undertake the project, and the existence of other priorities. (3) The perception that the SDMX standard was not mature enough to be adopted. ILOSTAT design started in 2010, and the new information model was conceived taking into consideration the SDMX Content-Oriented Guidelines. At the same time, SDMX was considered an integral part of the ILOSTAT project. ILOSTAT development started in October 2011 and included the concept of multiple data channels for collection and dissemination, SDMX being one of them in both directions. As ILOSTAT is a metadata-driven information system, and its information model was designed following SDMX standard recommendations, the implementation of SDMX resulted in an interface for the data flows established from SDMX to the ILOSTAT data warehouse and vice versa.
The system can be split into three main modules that map to the three main stages of the data compilation process: Data collection, Data processing and Data dissemination. The Data collection module comprises the design and build of the data collection instruments, which vary according to the data channel to be used. Currently ILOSTAT collects data through Excel questionnaires and csv files. The SDMX connector will be released very soon and will allow data to be uploaded through SDMX data flows. An electronic questionnaire (online web form) is on the roadmap for this year as well. Some of the activities in this stage include sending e-mails with questionnaires and reminders, answering questions from countries, uploading the data, etc. Once data is collected, regardless of the means used, it goes through an exhaustive consistency checking and correction process. This is the stage where some descriptive metadata (in the form of footnotes and annotations) is coded based on free text provided by the countries. Each weekend (or whenever the amount of data incorporated justifies it) an automatic process computes a set of derived or calculated indicators based on those indicators that have passed the consistency rules and are considered “ready for dissemination”. After this process, all these indicators (collected or calculated) are moved to the dissemination database. The last module comprises the tools for data dissemination. It includes a dynamic website, data download in csv, pdf and Excel formats and SDMX web services (very soon). The Workflow control module tracks the evolution of the questionnaires and questionnaires’ tables through the overall process, and the Metadata module provides the tools and procedures for general metadata maintenance.
An indicator is the combination of one Represented variable broken down by one or more classifications. The Represented variable concept is assumed by the OBS_VALUE concept, the primary measure. Note Types attributes can be Mandatory or Conditional, depending on structural metadata information in ILOSTAT.