4. Data Repository (Warehousing) Architecture Francesco Rizzo - Eurostat [Architecture diagram; labels: NSI, Eurostat, Pull Requestor (PULL), eDAMIS (PUSH), Data Input, SDMX Registry (register, query), Intermediate storage, Verification / Conversion to SDMX, Received data in SDMX-ML, Loader, Warehouse storage, Eurobase, Dissemination, XSL for SDMX-ML]
6. PUSH & PULL There are normally two different approaches to exchanging data: PUSH and PULL.
7. PUSH & PULL PUSH mode means that the data provider takes action to send the data to the party collecting the data. PULL mode implies that the data provider makes the data available via the Internet. The data consumer then fetches the data on his own initiative.
8.
9. SDMX SDMX promotes a “data sharing” model to facilitate low-cost, high-quality statistical data and metadata exchange. Data Providers publish the availability of data/metadata to Data Consumers, and the latter are responsible for fetching the data/metadata at will.
17. The Collective Intelligence Collective intelligence is a shared or group intelligence that emerges from the collaboration and competition of many individuals.
18.
19. We are already in the future Thank You for your attention [email_address]
Editor's notes
The Single Entry Point allows both push and pull methods. eDAMIS is now able to recognise and deliver SDMX-ML files. An SDMX-ML module for representing validation rules was developed, and it is used by the eDAMIS validation engine. The conceptual extension to the SDMX-IM and its XML implementation will be submitted to the SDMX sponsoring organisations for conformance testing and approval.
The pull approach involves the following steps:
Step 1: When new data are available, the NSI should either create an SDMX-ML file containing the new data, or do nothing if the NSI WS builds SDMX-ML messages upon request.
Step 2: The NSI should add a new feed entry, including an SDMX-ML Query message describing the new Dataset, to the NSI feed.
The Pull Requestor reads the new feed entry and either retrieves the SDMX-ML file from the specified URL, if the file resides at a URL, or uses the Query Message included in the feed to query the NSI WS, if the data are prepared by the NSI WS.
The Pull Requestor forwards the SDMX-ML dataset to the rest of the modules within the Eurostat production environment.
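The choice the Pull Requestor makes when it reads a feed entry can be sketched as follows. This is a minimal illustration, not Eurostat's implementation: the `FeedEntry` fields and the `fetch_url` / `query_ws` callables are hypothetical names standing in for the HTTP download and the NSI WS call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FeedEntry:
    """One entry of the NSI feed (field names are assumptions)."""
    dataset_id: str
    query_message: str               # SDMX-ML Query message describing the dataset
    file_url: Optional[str] = None   # set only if the NSI pre-built an SDMX-ML file


def pull(entry: FeedEntry,
         fetch_url: Callable[[str], str],
         query_ws: Callable[[str], str]) -> str:
    """Return the SDMX-ML dataset announced by a feed entry.

    If the entry points at a ready-made file, download it; otherwise
    send the embedded Query message to the NSI web service.
    """
    if entry.file_url is not None:
        return fetch_url(entry.file_url)
    return query_ws(entry.query_message)
```

Passing the two transport functions in as arguments keeps the decision logic separate from any particular HTTP or web-service client, so the same function covers both variants of Step 1.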
UN Statistical Commission 02/2008: SDMX is recognized as “the preferred standard for exchange and sharing of data and metadata in the global statistical community”.
Further involving national and international statistical agencies.
Importance of capacity building and outreach (webpage, seminars, workshops, training…).
More freely available IT tools.
The focus of SDMX is the exchange and sharing of aggregated statistical data (also called macro data) rather than observations of individual statistical units (also called microdata), but in many cases it can be used for microdata. The basic structure of aggregated statistical data is a multidimensional table (also called a cube). Often, but not always, one of the dimensions is time (or reference time), i.e. the time periods or points in time to which the aggregate data refer. The time dimension is often equidistant, having a certain frequency, e.g. yearly or monthly. In such cases, the multidimensional table may be viewed as consisting of a number of time series, each of them defined by one specific value (member) of each of the other dimensions.
Data sharing based on SDMX reduces the reporting burden of organisations. It allows them to publish data once and let their counterparties “pull” data and related metadata as required. This is achieved with the availability of: an abstract Information Model capable of supporting any time-series and cross-sectional data, structural metadata and reference metadata (SDMX-IM); standardised XML schemas derived from the model (SDMX-ML); and the use of web-services technology.
The data sharing process is based on an architecture of central registry services. Registry services provide visibility into the data and metadata existing within the community and support the access and use of this data and metadata by providing a set of triggers for automated processing.
The use of standards for all data, metadata, and the registry services themselves is ubiquitous, permitting a high level of automation within a data-sharing community.
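The view of a cube as a set of time series, described in the notes above, can be illustrated with a short sketch. The observation layout (a `dims` dictionary plus a `value`) and the dimension name `TIME_PERIOD` are assumptions made for the example, not something prescribed by the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SeriesKey = Tuple[Tuple[str, str], ...]


def to_time_series(observations: List[dict],
                   time_dim: str = "TIME_PERIOD") -> Dict[SeriesKey, Dict[str, float]]:
    """Group cube observations into time series.

    Each series is identified by one member of every non-time
    dimension; its content maps time periods to observed values.
    """
    series: Dict[SeriesKey, Dict[str, float]] = defaultdict(dict)
    for obs in observations:
        dims = obs["dims"]
        # The series key is every dimension except time, in a stable order.
        key = tuple(sorted((d, v) for d, v in dims.items() if d != time_dim))
        series[key][dims[time_dim]] = obs["value"]
    return dict(series)
```

For instance, observations that share the same members of two hypothetical dimensions `GEO` and `SEX` but differ in `TIME_PERIOD` end up in one series keyed by that `GEO`/`SEX` pair.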
Data sharing: a group of partners agree on providing access to their data according to standard processes, formats and technologies. Data providers can: notify the hub of new sets of data and corresponding structural metadata (measures, dimensions, code lists, etc.); make data available directly from their systems through a querying system. Data users can: browse the hub to define a dataset of interest via the above structural metadata; retrieve the dataset from the NSIs.
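The hub roles listed above can be sketched as a small in-memory catalogue. The class and method names are hypothetical; the point of the sketch is only that the hub stores structural metadata and a way to query the provider, while the data themselves stay with the NSI until a user retrieves them.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


class Hub:
    """Toy data-sharing hub: holds metadata, never the data."""

    def __init__(self) -> None:
        # dataset id -> (dimension names, callable that queries the provider)
        self._catalogue: Dict[str, Tuple[Set[str], Callable[[], str]]] = {}

    def notify(self, dataset_id: str, dimensions: Iterable[str],
               query: Callable[[], str]) -> None:
        """Provider side: announce a dataset and how to query it."""
        self._catalogue[dataset_id] = (set(dimensions), query)

    def browse(self, dimension: str) -> List[str]:
        """User side: list datasets that carry a given dimension."""
        return sorted(ds for ds, (dims, _) in self._catalogue.items()
                      if dimension in dims)

    def retrieve(self, dataset_id: str) -> str:
        """User side: fetch the dataset from the provider's own system."""
        _, query = self._catalogue[dataset_id]
        return query()
```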