Data quality and regulations are perpetual drivers for Data Governance and Stewardship solutions that systematically monitor the execution of data policy. And yet, there is a long road ahead to achieve Trust in Data. It is still a relatively unknown topic or comes with trauma from past failed attempts; there is no political framework with executive champions, leading to reactive rather than proactive behavior, and software support is marginal.
Data Governance and Stewardship needs automated business semantics management at its core to achieve wide adoption and a confluence of Data Trust between the business and IT communities in the organization.
In this lecture, we start by reviewing the 'C' (Communication) in ICT and reflect on a dilemma: what is the most important quality of data, truth or trust? We review the wide spectrum of business semantics. We visit the different phases of data pain as a company grows, and we map each phase onto this spectrum of semantics.
Next, we introduce the principles and framework for business semantics management to support data governance and stewardship focusing on the structural (what), processual (how) and organizational (who) components. We illustrate with stories from the field.
Business Semantics for Data Governance and Stewardship
1. Business Semantics for Data Governance & Stewardship
Dr. Pieter De Leenheer
Sloan Hall
Stanford University
Feb 4, 2015
2. Overview
• ICT: from Truth to Trust
• The Spectrum of Business Semantics
• Situation Map
• Business Semantics Governance & Stewardship
– Principles
– Operating Framework
• Reflection and Questions
4. La Trahison des Images (The Treachery of Images) (2)
https://deleenheer.wordpress.com/2009/12/15/magrittes-flirting-with-semantics/
5. What we talk about when we talk about no Data Governance
The Problem
• “Who approved this?”
• “I wish these guys spoke our language.”
• “I can’t understand this report!”
• “I’ve never seen this code! Who introduced this?”
• “This doesn’t seem right. Are we sure this data is correct?”
• “This rule is different in our country!”
• “This is an exception to the rule!”
6. Glossary Search
• How frequently do you look up a word for your business?
• To what purpose?
– Clarification
– Differentiation
• What are your main sources?
• Hierarchy-based navigation or keyword-based search?
• Authoritative truth or trust?
7. From Truth to Trust: Behind the Curtains
https://www.research.ibm.com/visual/projects/history_flow/results.htm
8. Overview
• ICT: from Truth to Trust
• The Spectrum of Business Semantics
• Situation Map
• Business Semantics Governance & Stewardship
– Principles
– Operating Framework
• Reflection and Questions
9. Spectrum of Business Semantics
Welty, C., Lehmann, F., Gruninger, G., and Uschold, M. (1999). Ontology: Expert systems all over again? Invited panel at AAAI-99: The National Conference on Artificial Intelligence, Austin, Texas, USA.
10. The Big ‘Metadata’ Bang
Catalogue and text files
• The start of an organization’s data management
• Represented by shared folders with lists of things such as products, customers and templates
• First ‘clouds’ of metadata
– Naturally emerge as a by-product
– For human consumption
– Locally understood
• From this point, exponential expansion:
– in volume
– in consumers (receivers)
– in producers (senders)
– in entropy
11. Glossary
• List of terms and definitions
e.g., http://web.stanford.edu/dept/pres-provost/cgi-bin/dg/wordpress/data-governance-and-stewardship-materials/
13. Taxonomy
• Formalized representation of a “thesaurus”
• Generalize and specialize properties and relations
– generalize Vendor and Customer, which have similar properties, into Party
– specialize Location into Home Address and Office Address because they have different properties
• Classify a thing as a Term, Data Element or System
– E.g., “customer” vs. “CUST_TBL” vs. “CRM”, to determine ownership
• Inheritance-based reasoning such as syllogisms
– Premise: “John Doe” is a lead
– Premise: All leads receive a mortgage offering
– Conclusion: “John Doe” receives a mortgage offering
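A minimal Python sketch of the taxonomy mechanisms above, assuming a simple single-parent hierarchy: generalization (Vendor and Customer under Party), specialization (Location into Home Address and Office Address), and the John Doe syllogism. The `Concept` class, all property names, and the placement of Lead under Party are hypothetical illustrations, not a product's data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Concept:
    """A node in a toy taxonomy (hypothetical); `parent` is the more general concept."""
    name: str
    parent: Optional[Concept] = None
    properties: set[str] = field(default_factory=set)

    def all_properties(self) -> set[str]:
        # Inheritance: own properties plus those of every ancestor.
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties

    def is_a(self, other: Concept) -> bool:
        # Walk up the generalization chain looking for `other`.
        node: Optional[Concept] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Generalization: Vendor and Customer share properties, so both specialize Party.
# (Property names are invented for illustration.)
party = Concept("Party", properties={"name", "address"})
vendor = Concept("Vendor", party, {"payment_terms"})
customer = Concept("Customer", party, {"credit_limit"})
lead = Concept("Lead", party, {"lead_source"})  # assumption: Lead is a kind of Party

# Specialization: Home Address and Office Address differ in their properties.
location = Concept("Location", properties={"street", "city"})
home_address = Concept("Home Address", location, {"residents"})
office_address = Concept("Office Address", location, {"floor"})

# A fact and a universal premise ("All leads receive a mortgage offering").
instance_of = {"John Doe": lead}
universal_rules = [(lead, "receives a mortgage offering")]


def conclude(individual: str) -> list[str]:
    """Syllogism: if X is a C, and every C has property P, then X has P."""
    concept = instance_of[individual]
    return [f"{individual} {p}" for c, p in universal_rules if concept.is_a(c)]


print(sorted(vendor.all_properties()))  # ['address', 'name', 'payment_terms']
print(conclude("John Doe"))             # ['John Doe receives a mortgage offering']
```

Because rules attach to concepts rather than individuals, adding a new lead only requires a new `instance_of` entry; the conclusion follows from the hierarchy.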
15. Logical constraints
• Modal Logic:
– context determines meaning, truthfulness, validity
– plausibility vs. necessity
• Modalities determine:
– who owns a term per region, process and function
– where and how to enforce terms
– what the definition of a term is
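One way to make these modalities concrete, sketched in Python under stated assumptions: each context is a hypothetical (region, function) pair, a proposition is necessary if it holds in every context and possible (plausible) if it holds in at least one, and a term's definition and owner are looked up per context. The contexts, definitions and steward names are invented; reading "plausibility" as modal possibility is an interpretation, not the lecture's formal definition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Context:
    """A context in which a term is interpreted (hypothetical dimensions)."""
    region: str
    function: str


# Hypothetical per-context facts about the term "customer".
CONTEXTS = [
    Context("EU", "Sales"),
    Context("EU", "Marketing"),
    Context("US", "Sales"),
]
DEFINITION = {
    Context("EU", "Sales"): "party with at least one signed contract",
    Context("EU", "Marketing"): "party that opted in to communication",
    Context("US", "Sales"): "party with at least one signed contract",
}
OWNER = {
    Context("EU", "Sales"): "EU Sales Steward",
    Context("EU", "Marketing"): "EU Marketing Steward",
    Context("US", "Sales"): "US Sales Steward",
}


def necessarily(prop: Callable[[Context], bool]) -> bool:
    """Necessity: the proposition holds in every context."""
    return all(prop(c) for c in CONTEXTS)


def possibly(prop: Callable[[Context], bool]) -> bool:
    """Possibility (plausibility): the proposition holds in at least one context."""
    return any(prop(c) for c in CONTEXTS)


def requires_contract(c: Context) -> bool:
    # Does the context's definition of "customer" require a signed contract?
    return "signed contract" in DEFINITION[c]


print(possibly(requires_contract))     # True: holds in both Sales contexts
print(necessarily(requires_contract))  # False: Marketing defines it differently
print(OWNER[Context("US", "Sales")])   # US Sales Steward
```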
18. Overview
• ICT: from Truth to Trust
• The Spectrum of Business Semantics
• Situation Map
• Business Semantics Governance & Stewardship
– Principles
– Operating Framework
• Reflection and Questions
19. Situating an organization’s level of glossary need
For each organization size: characterizing events, business needs, and technology support status.
• Size 1 to 50
– Characterizing events: first term-and-condition templates; first products and customers
– Business needs: a catalogue of items such as customers, products and offerings
– Technology support status: spreadsheets, databases
• Size 51 to 100
– Characterizing events: first customer segmentation; lead engine set up; business functions defined
– Business needs: as the catalogues grow in size, transform loose descriptions and definitions in text files into a glossary of terms
– Technology support status: shared file folders (for lead, prospect, customer, product, offering)
• Size 101 to 500
– Characterizing events: business functions populated; inter-functional business processes develop; product and customer data volumes grow
– Business needs: a thesaurus for comparing glossaries, differentiation of customer types, pricing models, reporting templates; local data analytics and storage
– Technology support status: spreadsheets, MediaWiki, tools for functional processes such as Salesforce, SDLC and ServiceNow; forecasting tools, reporting tools, databases
• Size 501 to 1000
– Characterizing events: invested growth; mergers and acquisitions take place; first signs of corrupt data reports on the board table
– Business needs: transforming thesauri into taxonomies, data models and architecture frames; ISO/ACORD/BCBS standardization
– Technology support status: MediaWikis go viral without proper alignment between them; first metadata tools in IT to align certain functions, while the business is limited to spreadsheets
• Size 1001 plus
– Characterizing events: global operations; one or more red flags: legal (regulatory compliance breached), organizational (CxO fired), reputational (fraud), financial loss (penalties, debt)
– Business needs: reporting standards transformed into corporate data policies, rules and data quality requirements; modalities as to who defines them, and how and where to enforce them, have been set; the need for a CDO function is mentioned but meets resistance from the CIO/CTO; Big Data opportunities loom beyond the data nebula
– Technology support status: a platform with several data management systems (Informatica, IBM, Oracle) scarred by M&A; lineage fragmented and not properly validated by the business; a data governance organization theorized (or failed before), so no one takes accountability, with no functional descriptions or enterprise-wide championship; glossaries’ usefulness implodes as their number increases; the enterprise data model is common ground for IT but useless to the business. Validation is urgent.
20. Overview
• ICT: from Truth to Trust
• The Spectrum of Business Semantics
• Situation Map
• Business Semantics Governance & Stewardship
– Principles
– Operating Framework
• Reflection and Questions
21. Principles of Business Semantics
• Democracy
• Emergence
• Perspective rendering
• Perspective unification
• Validation
http://www.academia.edu/874733/Business_semantics_management_A_case_study_for_competency-centric_HRM
22. Principles at work in the Situation Map
• Emergence is a continuous principle at work
• Unification and rendering are continuously in flux, but at two different frequencies (business vs. IT)
• Validation is limited to technical lineage
• Democracy and business validation (socio-technical) are lacking
• Governance (defining) and stewardship (enforcing) are reactive rather than proactive
• Lack of tools
23. Overview
• Communication: from Truth to Trust
• The Spectrum of Business Semantics
• Situation Map
• Business Semantics Governance & Stewardship
– Principles
– Operating Framework
• Reflection and Questions
24. Gradually Build Trust
based on Stewardship and Validation
• What?
– Qualitative metadata: e.g., definitions for address, codes, mappings, classifications, etc.
• Who?
– Roles and responsibilities for people
• How?
– Collaborative workflows to orchestrate people in achieving high-quality metadata
– Start simple, obtain buy-in, set up a council
– Measure maturity and trust
– Separate stewardship from integration
[Figure: Data Governance Council: Governance Operating Model. Dimensions: Roles & Responsibilities; Processes & Workflow; Asset Types & Traceability. Layers: Data Governance Organization; Data Stewardship Activities; IT / Operational Data Management Activities. Relations between layers: establishes & drives; aligns & coordinates; reports & escalates; monitors & remediates. Activities shown: Business & Data Definitions, Business Traceability, Semantic Modeling, Mapping Specifications, Policy Management, Business Rules, Data Quality Rules, Data Quality Reporting, Issue Management, Reference Data Crosswalks, Master Data Stewardship, Data Quality Profiling, DQ Defect Resolution, Data Quality Development, Data Modeling, Metadata Lineage, Metadata Scanning, Reference Data Authoring, Data Integration, Hierarchy Management.]
...
26. Global Data Governance
• Objective
– n Enterprise service buses => 1 Global Information Market Place
• Challenges
– Data Service = data sharing agreement across organization silos, policies, regulations and semantic assumptions. E.g., Address
– No clear balance between data ownership and control:
• responsibilities are not set
• for each data point: increasing exposure to risk regarding quality and policy compliance
• Service is more about trust, because truth is relative
34. What is to be governed?
Data Governance Questions
• What does the term “address” mean?
• How is the term “address” represented?
• In what system are data elements on “address” recorded?
• What views does a data sharing agreement include?
• With which policy does my data sharing agreement comply?
• In what country is my term “address” classified?
• …
Collibra Traceability Paths
Term has attributes definition, description, etc.
Term is represented by Data Element
Data Element has system of record System
Data Sharing Agreement groups Data View
…
Business Term ≠ Data Element
https://compass.collibra.com/display/COOK/Asset+Types+and+Traceability+Requirements
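The traceability paths above can be read as a small graph of typed relations. The Python sketch below uses hypothetical asset names (the term Address, the data element ADR_LIN, the system CRM) and follows relation chains to answer two of the governance questions. The "complies with" relation is an assumption standing in for the paths elided by "…", and nothing here reflects Collibra's actual API.

```python
from collections import defaultdict

# Hypothetical asset graph. Relation names mirror the traceability paths on the
# slide; "complies with" is an assumed extra relation for the policy question.
TRIPLES = [
    ("Address", "has definition", "Place where a party can be reached"),
    ("Address", "is represented by", "ADR_LIN"),  # business term != data element
    ("ADR_LIN", "has system of record", "CRM"),
    ("Address Sharing Agreement", "groups", "Customer Address View"),
    ("Address Sharing Agreement", "complies with", "Privacy Policy"),
]

# Index outgoing edges: subject -> relation -> list of objects.
graph: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
for subject, relation, obj in TRIPLES:
    graph[subject][relation].append(obj)


def follow(start: str, *path: str) -> list[str]:
    """Follow a chain of relations from `start`, returning every reached asset."""
    frontier = [start]
    for relation in path:
        frontier = [o for s in frontier for o in graph[s][relation]]
    return frontier


# "In what system are data elements on 'address' recorded?"
print(follow("Address", "is represented by", "has system of record"))  # ['CRM']
# "With which policy does my data sharing agreement comply?"
print(follow("Address Sharing Agreement", "complies with"))  # ['Privacy Policy']
```

Keeping the term and its data element as separate nodes is what lets ownership and definitions live on the business side while systems of record stay on the IT side.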
38. How is it to be governed?
• Status Types and Workflows
– For Domains, Terms, Users, and later for Issues and Data Sharing Agreements
[Figure: Business Semantics Glossary status lifecycle. Statuses: Candidate, In Progress, Under Review, Accepted, In Revision, Rejected, Deprecated. A term enters as a request on the domain page; transitions are labeled with the workflow numbers below.]
Workflows
1 Propose Business Term
2 Edit Business Term
3 Onboarding Business Term
4 Deprecate Business Term
5 Reactivate Business Term
https://compass.collibra.com/display/COOK/Lifecycle%3A+Workflows+and+Status+Types
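The lifecycle can be sketched as a finite state machine. The statuses below come from the slide, but the diagram's arrows could not be recovered, so the transition table is an assumption: Propose creates a Candidate, Edit opens a term for work, Onboarding is modeled as hypothetical submit/accept/reject steps, and Deprecate and Reactivate move a term out of and back into Accepted.

```python
# Statuses are from the slide; these transitions are an ASSUMPTION,
# not the original diagram. Keys are (action, current status).
TRANSITIONS = {
    ("edit", "Candidate"): "In Progress",        # Edit Business Term
    ("submit", "In Progress"): "Under Review",   # Onboarding (hypothetical step)
    ("accept", "Under Review"): "Accepted",      # Onboarding (hypothetical step)
    ("reject", "Under Review"): "Rejected",      # Onboarding (hypothetical step)
    ("edit", "Accepted"): "In Revision",         # Edit Business Term
    ("submit", "In Revision"): "Under Review",
    ("deprecate", "Accepted"): "Deprecated",     # Deprecate Business Term
    ("reactivate", "Deprecated"): "Accepted",    # Reactivate Business Term
}


class BusinessTerm:
    def __init__(self, name: str) -> None:
        # Propose Business Term: a request on the domain page creates a candidate.
        self.name = name
        self.status = "Candidate"
        self.history = ["Candidate"]

    def apply(self, action: str) -> str:
        """Move to the next status, rejecting transitions the table does not allow."""
        key = (action, self.status)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} a term in status {self.status!r}")
        self.status = TRANSITIONS[key]
        self.history.append(self.status)
        return self.status


term = BusinessTerm("Address")
for action in ["edit", "submit", "accept", "deprecate", "reactivate"]:
    term.apply(action)
print(term.history)
```

An explicit transition table makes the governance rules inspectable: a steward cannot accept a term that was never reviewed, because that pair simply has no entry.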
42. Questions for the Audience
We presume the starting point is a glossary.
• Is it possible to establish data governance without a glossary?
• What factors would make it impossible?
• Do you know of cases where it has been achieved without one?
Editor's Notes
In 2009 I published my dissertation on community-based ontology evolution and the principles of business semantics management.
That was one year after we founded Collibra as a spin-off of the lab where I did my research between 2003 and 2008.
At the time of writing my dissertation, I had only two validations, in two industries: HR competency management and the automotive industry.
Much of my work would provide the foundation for our tools and methods for data governance (DG).
Six years later we have more than 50 customers, and I think it’s the right time to dig up the theory and see whether it is still valid for what we now call data governance.
Magritte playfully illustrates the semantic dimensions of our perceived reality when we try to communicate it. Semiotics introduces an indeterminacy between a sign and its interpretation.
Is this a pipe? Or is it just an image of a pipe? Or, further still, is it a projection of the image of a pipe in our eye?
Magritte offered us 18 guidelines on how to interpret his semantic puzzle paintings. But a surrealist artist, by principle, leaves more than enough room for interpretation. When dealing with sensitive data, however, we have to be precise.
What do these terms mean?
How and where are they stored?
What is the health of the data backing up these concepts?
DG = identify people, establish responsibility and operationalise processes.
Without data governance, data quality cannot be managed well. Globalization and the growth of data services make the quality and truth of data relative, so we increasingly have to rely on trust.
This semiotic problem confronts us every day, from the moment we wake up: reading signs, interpreting and explaining the terms of a contract.
We too easily assume that others, namely those with the proper authority, have carefully crafted the business semantics for us, so that we can simply go to a search bar and enter a keyword, e.g., in a dictionary, WordNet or Wikipedia.
Yet aren’t you surprised when the search turns up too many results, or when you sense a growing unease about trusting the source?
Only when enough people are involved and their perspectives are taken into account can a meaningful agreement be reached.
1. December 3, 2001: The initial version of the evolution entry, 526 words long, is posted by a user named "Dmerrill." It offers links to pages for creationism and intelligent design but makes no mention of controversy.
2. July 13, 2002: An anonymous user redefines evolution as "a controversial theory some scientists present as a scientific explanation." Within two hours, it is changed to read "the commonly accepted scientific theory."
3. October 1, 2002: "Graft," shown in yellowish green, makes his debut. He will make 79 edits over three years and spend hours hashing out the content on discussion pages with pro- and antievolution editors. A biology grad student at Harvard University, Graft has edited more than 250 Wikipedia entries.
4. August 9, 2004: A black line appears whenever the entire entry is deleted by a vandal. (Entries are also defaced with nonsense or vulgarities.) Editing Wikipedia has become such a popular pastime that, even with more than 1 million entries, about half of all vandalisms are corrected within five minutes.
5. March 29, 2005: The entry reaches its longest point, 5,611 words. That evening, 888 words are excised, causing a cliff-like drop in the graph. The deleted text, a cynical passage about creationists, was cut by proevolution editors who insist on a neutral point of view.
6. September 19, 2005: A week before the intelligent design trial in Dover, Pennsylvania, begins, an edit war erupts when "Jlefler" writes that "a strong scientific and layman community advocate creationism." The phrase is removed or reapplied eight times in one hour, leaving a narrow yellow zigzag.
Catalogue: List of all countries
Glossary: list of terms + definitions
Thesaurus: add homonymous, synonymous, meronymous, hypernymous and hyponymous relations.
Taxonomy:
generalize and specialize properties and relations
inheritance-based reasoning such as syllogisms
Frames
Modal Logic: context determines meaning, truthfulness, validity
Modalities on who owns a term per region, process, function
Modalities as to where and how to enforce terms
The eye of the beholder: customer in sales vs. marketing
Other dimensions of context: $(c, t)$, where $c$ is a vector $c = \langle d_1, d_2, \ldots \rangle$
E.g., we may specialize Location into Home Address and Office Address because they have different properties (emergence). We may generalize concepts with similar properties, such as Vendor and Customer.
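To illustrate how thesaurus relations help compare glossaries, here is a Python sketch with a hypothetical thesaurus and two hypothetical glossaries (sales and finance). Synonymy aligns differently named terms, hypernymy is derived as the inverse of hyponymy, and homonymy shows up as the same term with different definitions. The matching rule is a simplification for illustration, not a prescribed method.

```python
# A toy thesaurus keyed by (term, relation); all entries are hypothetical.
THESAURUS = {
    ("client", "synonym"): {"customer"},
    ("customer", "synonym"): {"client"},
    ("party", "hyponym"): {"customer", "vendor"},  # customer and vendor are kinds of party
    ("address", "meronym"): {"street", "city"},    # street and city are parts of an address
}


def related(term: str, relation: str) -> set[str]:
    return THESAURUS.get((term, relation), set())


def hypernyms(term: str) -> set[str]:
    """Hypernymy is the inverse of hyponymy."""
    return {broader for (broader, rel), narrower in THESAURUS.items()
            if rel == "hyponym" and term in narrower}


def compare(a: dict[str, str], b: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Align two glossaries (term -> definition) using the thesaurus."""
    result: dict[str, list[tuple[str, str]]] = {"equivalent": [], "possible homonyms": []}
    for term, definition in a.items():
        if term in b:
            # Same spelling: equivalent only if the definitions agree.
            key = "equivalent" if b[term] == definition else "possible homonyms"
            result[key].append((term, term))
        # Different spelling: aligned through a synonym relation.
        for syn in sorted(related(term, "synonym") & set(b)):
            result["equivalent"].append((term, syn))
    return result


sales = {"client": "party with a signed contract", "account": "a customer organization"}
finance = {"customer": "party with a signed contract", "account": "a ledger record of transactions"}
print(compare(sales, finance))
print(hypernyms("customer"))
```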
- Key objective: n Enterprise service buses => one Global Information Hub
- Challenges:
- different semantic assumptions, policies and rules
- sharing is strictly controlled
Set up the operating model for the business semantics glossary
Import all existing IBM BG content
Split out the true business glossary terms from (critical) data elements (address line vs. ADR_LIN) and deploy the data dictionary accordingly
Build the REST integration
Load Policies and Rules
Set up the operating model for data sharing agreements (user, view, request, rule)
Integrate with the Hub
Extend User management with Worker Master
Rule the global data hub with Collibra
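The step of splitting business glossary terms from data elements could start with a naming heuristic, sketched here in Python. The assumption (ours, not the plan's) is that physical data element names such as ADR_LIN or CUST_TBL are upper-case snake-case identifiers, while business terms such as "address line" are natural-language phrases; `split_glossary` is a hypothetical helper.

```python
import re

# Hypothetical heuristic: physical names are UPPER_SNAKE_CASE identifiers.
PHYSICAL_NAME = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")


def split_glossary(entries: list[str]) -> tuple[list[str], list[str]]:
    """Return (business_terms, data_elements) for a flat list of imported names."""
    terms: list[str] = []
    elements: list[str] = []
    for name in entries:
        # Route each name to one bucket based on the naming pattern.
        (elements if PHYSICAL_NAME.match(name) else terms).append(name)
    return terms, elements


imported = ["address line", "ADR_LIN", "customer", "CUST_TBL", "Party"]
print(split_glossary(imported))
# (['address line', 'customer', 'Party'], ['ADR_LIN', 'CUST_TBL'])
```

A heuristic like this would also classify a system name such as CRM, or an acronym used as a business term, as a data element, so its output is a starting point for steward review rather than a final classification.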
Defined as a list of terms. By now, it should be more than just a list of definitions (metadata) for terms; it should also cover policies, rules, tables, systems, etc., as we will see. Also, a list says nothing about how it comes to be.