Session IV - Quality issues and generalized business processes - Bruno, Infante, Rocco, Scannapieco, On the Design and Implementation of a Generalized Process for Business Statistics - Pierre Lavallée, Discussion
1. On the Design and Implementation
of a Generalized Process for
Business Statistics
BY BRUNO, INFANTE, RUOCCO AND SCANNAPIECO
DISCUSSION
PIERRE LAVALLÉE
ROME, 19-20 NOVEMBER 2018
2. Content
1. Context
2. Questions and issues
3. GSBPM applied at Istat
4. Generalized Process for Business Statistics
5. GPBS Use Cases
6. Some answers to questions and issues
7. Conclusion
3. 1. Context
Since 2014, Istat has been involved in a significant revision of its
statistical production
Main objective: “enrich the supply and quality of the
information produced, while improving the effectiveness
and efficiency of the overall activity”
Generalized Process for Business Statistics (GPBS):
Designing and implementing a standardized system to support
business statistics
Integrating in a single environment the different steps of business
surveys
4. 1. Context (cont’d)
Generic Statistical Business Process Model (GSBPM):
Describes and defines the set of business processes needed to
produce official statistics
Provides a standard framework and harmonized terminology to
help statistical organizations to share methods and components,
within and between different surveys
5. 2. Questions and issues
1. How can we manage the integration of ad-hoc legacy applications in a standard
environment built on generalized functionalities? How to cope with data processing
parameters edited by end users at runtime, through graphical interactive interfaces?
2. How can we implement a user-friendly workflow, easy to configure and manage by
all types of users, regardless of their knowledge of software applications? Which is the most
suitable environment?
3. As the process chain has a low degree of complexity, is it better to develop an in-house
solution or to evaluate a commercial product (e.g. enterprise service bus,
orchestrator, etc.) to manage the workflow?
4. The generalized statistical production process can be potentially abstracted as a
“data-driven” workflow, in line with the scientific nature of Istat’s processes. Can we
rely on the characteristics of data-driven workflows for designing and implementing
more efficient solutions for the GPBS?
5. Concerning the integration of various statistical services in the workflow, how to
design structural metadata to describe the data structures passed as input/output
to services and/or accessed as central data repository?
6. We do have process metadata describing for instance the process flow that could
have an “active” way in being interpreted for the execution. We do however have
also other kinds of process metadata related for instance to the naming of services
and processes themselves. How do we manage such kinds of process metadata?
Questions already answered in May 2018…
6. 3. GSBPM applied at Istat
GPBS project uses the GSBPM framework to standardise:
methods and tools for some phases of the statistical production
process;
the workflow to manage the different phases of the production process
Current situation:
Highly customized methods and tools
Some processes tied to specific persons
“The strong connection between people and processes is a barrier
towards sharing and standardization.”
Consequences:
1. Duplicated work and inefficiencies in the production process
2. No reuse of tools and competencies
7. 3. GSBPM applied at Istat (cont’d)
First step:
Subset of GSBPM sub-processes, focusing on the Process and Analyze phases and
their interdependencies
These sub-processes correspond to the standard methodological tasks
8. 3. GSBPM applied at Istat (cont’d)
2nd step:
Subset of Business Statistics surveys (5 short term statistics and 3
structural business statistics) to start the modelling phase
For each selected survey:
1. Interviewed the referring person, to obtain current description of the
processes
2. Modelled and designed the future integrated system built with
generalized tools, supporting the different process steps
3. Missing item: Discussion with referring persons on implementation
steps. Necessary to get the “buy in” of these persons/projects
9. 3. GSBPM applied at Istat (cont’d)
Preliminary result:
To reduce duplications, need to harmonise GSBPM sub-processes: ‘Review &
validate’ and ‘Edit & impute’
10. 4. GPBS
Concepts:
Data Service
Process step
Statistical Service: a software providing one or more statistical business
functions (extract sample, calculate weights, perform error checking,
etc.) that can be invoked as a service
→ Performs the methodological tasks
Orchestrator
Metadata Management:
Structural metadata: data set name, variable names, variable types, unit
identifier, classification identifier, etc.
Referential metadata: information objects necessary to run the process
Process Metadata: No formal definition?…
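As an illustration of the structural metadata listed above (data set name, variable names, variable types, unit identifier, classification identifier), here is a minimal sketch of how such a description could be written down and used to check a record passed to a statistical service. The class names, the type vocabulary and the example data set are all hypothetical; nothing here is taken from the GPBS design itself.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from metadata type names to Python types.
PYTHON_TYPES = {"string": str, "integer": int, "decimal": float}


@dataclass(frozen=True)
class Variable:
    name: str
    dtype: str                               # must be a key of PYTHON_TYPES
    classification_id: Optional[str] = None  # identifier of a code list, if any


@dataclass(frozen=True)
class DatasetStructure:
    dataset_name: str
    unit_identifier: str   # name of the variable that identifies the unit
    variables: tuple       # tuple of Variable objects

    def validate(self, record: dict) -> list:
        """Return the problems found in one record (empty list if none)."""
        problems = []
        for var in self.variables:
            if var.name not in record:
                problems.append(f"missing variable: {var.name}")
            elif not isinstance(record[var.name], PYTHON_TYPES[var.dtype]):
                problems.append(f"wrong type for {var.name}: expected {var.dtype}")
        return problems


# Illustrative data set description; every name below is made up.
turnover = DatasetStructure(
    dataset_name="monthly_turnover",
    unit_identifier="enterprise_id",
    variables=(
        Variable("enterprise_id", "string"),
        Variable("nace_code", "string", classification_id="NACE_REV2"),
        Variable("turnover", "decimal"),
    ),
)

print(turnover.validate({"enterprise_id": "A1", "nace_code": "47.11", "turnover": 1250.0}))
print(turnover.validate({"enterprise_id": "A1", "turnover": "n/a"}))
```

A description of this kind could be attached to each input and output of a service, so that the orchestrator can check a data set before passing it on.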
11. 4. GPBS (cont’d)
Assessment of GPBS performance:
Identification of indicators to assess (i) the efficiency and (ii) the
quality improvement before and after the adoption of GPBS
tools
Main expected outcomes:
Decrease of manual revisions, overlapping steps and overediting
“The reduction of the time lag to produce the final estimates should
encourage the survey managers to use GPBS tools, instead of their
own procedures”
Not necessarily the case for repeated surveys. We would rather say
“The reduction of implementation costs and time to produce […]”.
12. 4. GPBS (cont’d)
Assessment of GPBS performance (cont’d):
Target improvements to measure relate to the six dimensions of
the ESS Quality Assurance Framework, especially accuracy,
reliability, coherence and comparability
In case improvements are difficult to assess by explicit indicators:
collect survey staff feedback and opinions concerning specific
topics
Likely to happen in several survey steps
At the end, there might not be several real measurable
improvements, but rather a “general improvement” of methods,
tools and workflow…
13. 5. GPBS Use Cases
5.1 Detection of influential suspicious units
GSBPM sub-process: “Review & validate”
Statistical Services can be invoked either in a
synchronous or asynchronous mode
Synchronous: “Not suitable for huge amount of data”
Asynchronous: “Suitable for large amount of data”
→ Why?
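One plausible reading of the synchronous/asynchronous distinction, sketched below, is that a synchronous caller stays blocked for the whole duration of the computation, while an asynchronous caller receives a job handle at once and collects the result later. This is an assumption about what the authors mean, not their stated rationale. The function `detect_influential_units` and its threshold rule are hypothetical stand-ins, not an actual detection method.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def detect_influential_units(values, threshold):
    """Stand-in statistical service: flag positions whose value exceeds a threshold."""
    time.sleep(0.01)  # pretend the computation takes a while
    return [i for i, v in enumerate(values) if v > threshold]


values = [10, 12, 950, 11, 13, 1200]

# Synchronous call: the caller is blocked until the result comes back.
sync_result = detect_influential_units(values, threshold=500)

# Asynchronous call: the caller gets a handle immediately and is free
# to do other work until it actually needs the result.
with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(detect_influential_units, values, 500)
    # ... the caller could serve other requests here ...
    async_result = job.result()  # blocks only at this point

print(sync_result, async_result)
```

Under this reading, the longer the service runs on a large data set, the more a blocked caller (or an open connection that may time out) becomes a problem, which would explain the recommendation.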
14. 5. GPBS Use Cases (cont’d)
5.2 Manual revision of influential suspicious units
Orchestration
“Assuming the output of the influential values detection
[…] is a subset of units to be manually reviewed, the
survey staff may:
1. Use the available information (other sources or historical survey
data) to check the coherence of survey data;
2. Contact the respondents to verify collected data.”
The 2nd step should rather be:
2. If time and budget are sufficient, contact the respondents to
verify collected data. Otherwise, flag data to be imputed
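The revised rule can be sketched as a small routing function. The action labels, the idea of counting the budget as a number of possible re-contacts, and the choice to accept units found coherent with other sources are all assumptions made for illustration.

```python
def plan_manual_review(units, recontact_budget):
    """Assign an action to each suspicious unit.

    units: list of (unit_id, coherent_with_other_sources) pairs.
    recontact_budget: how many respondents can still be contacted
    (a hypothetical way of expressing "time and budget").
    """
    plan = {}
    for unit_id, coherent in units:
        if coherent:
            # Step 1: other sources or historical data confirm the value.
            plan[unit_id] = "accept"
        elif recontact_budget > 0:
            # Step 2: resources allow a follow-up with the respondent.
            plan[unit_id] = "contact_respondent"
            recontact_budget -= 1
        else:
            # Otherwise: leave the value to the imputation step.
            plan[unit_id] = "flag_for_imputation"
    return plan


suspicious = [("U1", True), ("U2", False), ("U3", False)]
plan = plan_manual_review(suspicious, recontact_budget=1)
print(plan)
```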
15. 5. GPBS Use Cases (cont’d)
5.2 Manual revision of influential suspicious units (cont’d)
In the technical remarks, we have:
“The Process Orchestrator should perform the
following tasks: Avoid overlapping of different users
revisions (transactional support in coordination with
the RDBMS);“
→ What does RDBMS mean?
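If RDBMS is read as "relational database management system", the quoted task could be met by letting the database decide which reviewer obtains a given unit. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table `review_queue`, its columns and the reviewer names are hypothetical, and a production system would rely on its own RDBMS and locking strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_queue (unit_id TEXT PRIMARY KEY, reviewer TEXT)")
conn.execute("INSERT INTO review_queue VALUES ('U1', NULL)")
conn.commit()


def claim_unit(conn, unit_id, reviewer):
    """Try to reserve a unit for one reviewer; return True on success."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE review_queue SET reviewer = ? "
            "WHERE unit_id = ? AND reviewer IS NULL",
            (reviewer, unit_id),
        )
    # The update touches a row only if nobody had claimed the unit yet.
    return cur.rowcount == 1


first = claim_unit(conn, "U1", "alice")
second = claim_unit(conn, "U1", "bob")
print(first, second)
```

Because the check and the assignment happen in a single statement inside a transaction, two users cannot both end up revising the same unit.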
16. 5. GPBS Use Cases (cont’d)
5.3 Business demographic events
GSBPM sub-process “Create frame & select sample”
“It would be useful to store and centrally manage all changes
transmitted by the respondents of different surveys, in order to
reduce the time gap between the Register and survey data.”
Good point
However, keep in mind that a repeated overlapping survey
cannot be the only source of updates to the Business Register. This
can create serious biases in the results of such surveys
In practice, this problem seldom occurs because the Business Register
has several update sources
17. 5. GPBS Use Cases (cont’d)
5.3 Business demographic events (cont’d)
Different update schedules:
Immediate updates for short-term surveys
Different update schedules for structural statistics
Adopted solution: hold two versions of key variables
“Current” version: updated as soon as new information is received
“Frozen” version: stable for a certain period and is updated from the
current version periodically
Challenge: “design the most appropriate data architecture to
standardize the link between the ‘current’ version and the ‘frozen’
one.”
18. 5. GPBS Use Cases (cont’d)
5.3 Business demographic events (cont’d)
“Frozen” version:
Survey specific
Corresponds to the extracted frame for each survey
Statistical Service:
1. Update “current” version
2. Quality check before updating “frozen” version
3. Update “frozen” version
This approach is likely to create coherence and scheduling
problems
Ex: Once the “frozen” version is updated, how can a given survey go
back for specific tasks such as nonresponse adjustments or calibration?
19. 5. GPBS Use Cases (cont’d)
5.3 Business demographic events (cont’d)
Proposed approach:
1. Always have a “current” version of the Business Register
2. As part of the frame creation process of a specific survey,
extract a “frozen” version (frame) from the Business Register
3. Keep the “frozen” version during the whole survey process
4. Update the “current” version of the Business Register using survey
feedback (being careful with possible bias problems)
5. Extract a new “frozen” version (frame) for the next occurrence
of the survey
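The five steps above can be sketched with a register that holds only the "current" version and hands out independent copies as frames. The class `BusinessRegister`, its methods and the example units are hypothetical and serve only to show that a frame extracted at step 2 is unaffected by the feedback applied at step 4.

```python
import copy


class BusinessRegister:
    """Holds only the 'current' version; frames are independent copies."""

    def __init__(self, units):
        self.current = dict(units)  # unit_id -> dict of attributes

    def extract_frame(self, in_scope=lambda attrs: True):
        """Steps 2 and 5: return a 'frozen' copy of the in-scope units."""
        return copy.deepcopy(
            {uid: attrs for uid, attrs in self.current.items() if in_scope(attrs)}
        )

    def apply_feedback(self, unit_id, **changes):
        """Step 4: update the 'current' version with survey feedback."""
        self.current[unit_id].update(changes)


register = BusinessRegister({
    "E1": {"nace": "47.11", "employees": 12},
    "E2": {"nace": "10.71", "employees": 3},
})

frame_t1 = register.extract_frame()          # frame for the first occurrence
register.apply_feedback("E1", employees=40)  # feedback received during the survey
frame_t2 = register.extract_frame()          # frame for the next occurrence

# Units whose attributes differ between the two frames.
changed = {uid for uid in frame_t2 if frame_t1.get(uid) != frame_t2[uid]}
print(frame_t1["E1"], frame_t2["E1"], changed)
```

Comparing two consecutive frames, as in the last step, also yields the list of changes that the stakeholders of a survey would need to be told about.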
20. 5. GPBS Use Cases (cont’d)
5.3 Business demographic events (cont’d)
Simple approach that solves the coherence and
scheduling problems
No need to “evaluate the best timing that reduces the trade-off
between timeliness and accuracy of the stored information and
statistical outputs to be built from the Register.”
“The second issue concerning the workflow, arises from the need
to inform and prioritize the different stakeholders about data
changes and updates.”
The stakeholders only need to be informed about changes and
updates between two consecutive “frozen” versions (i.e., two frames) for
two occurrences of their survey
21. 6. Some answers to questions and issues
May come back to questions 5 and 6…
5. Concerning the integration of various statistical services in the
workflow, how to design structural metadata to describe the data
structures passed as input/output to services and/or accessed as
central data repository?
SAS® already has PROC options that are more or less “structural
metadata”. They could perhaps serve as examples for the GPBS
Ex: Some generalized software developed by Statistics Canada looks like SAS®
PROCs.
Look at developments by other members of the UNECE High-Level
Group for the Modernisation of Official Statistics. The Common Statistical
Production Architecture (CSPA) can help in standardising calls to other
software (statistical services)
22. 6. Some answers to questions and issues
(cont’d)
6. We do have process metadata describing for instance the process
flow that could have an “active” way in being interpreted for the
execution. We do however have also other kinds of process
metadata related for instance to the naming of services and
processes themselves. How do we manage such kinds of process
metadata?
Again, SAS® PROCs can be used as examples of “process
metadata” for the GPBS
23. 7. Conclusion
Significant developments in the GPBS
Clearer view of the specific processes and workflows of
business statistics in accordance with the GSBPM
Key step: Discussion with referring persons on
implementation steps. Necessary to get the “buy in” of
these persons