Digital Enterprise Research Institute    www.deri.ie

                              Wikipedia (DBpedia):
                           Crowdsourced Data Curation
                      Edward Curry, Andre Freitas, Seán O'Riain

 ed.curry@deri.org
 http://www.deri.org/
 http://www.EdwardCurry.org/
 Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Speaker Profile

            Research Scientist at the Digital Enterprise Research Institute (DERI)
                Leading international web science research organization
            Researching how the web of data is changing the way businesses work and interact with information
                Projects include studies of enterprise linked data, community-based data curation, semantic data analytics, and semantic search
                Investigating its use in the pharmaceutical, oil & gas, financial, advertising, media, manufacturing, health care, ICT, and automotive industries
            Invited speaker at the 2010 MIT Sloan CIO Symposium, addressing an audience of more than 600 CIOs

Overview

            Curation Background
                   The Business Need for Curated Data
                   What is Data Curation?
                   Data Quality and Curation
                   How to Curate Data


            Wikipedia (DBpedia) Case Study

            Best Practices from Case Study Learning
The Business Need

            Knowledge workers need:
                Access to the right information
                Confidence in that information

            Working with incomplete, inaccurate, or wrong information can have disastrous consequences

The Problems with Data

            Flawed Data
                Affects 25% of critical data in the world's top companies (Gartner)

            Data Quality
                Recent banking crisis (Economist, Dec '09)
                Inaccurate figures made it difficult to manage operations (investment exposure and risk)
                       – “asset are defined differently in different programs”
                       – “numbers did not always add up”
                       – “departments do not trust each other's figures”
                       – “figures … not worth the pixels they were made of”

What is Data Curation?

            Digital Curation
                Selection, preservation, maintenance, collection, and archiving of digital assets

            Data Curation
                Active management of data over its life-cycle

            Data Curators
                Ensure data is trustworthy, discoverable, accessible, reusable, and fit for use
                       – Museum cataloguers of the Internet age

What is Data Curation?

            Data Governance
                Convergence of data quality, data management, business process management, and risk management

            Data Curation is a complementary activity
                Part of the organization's overall data governance strategy

            Data Curator = Data Steward?
                Terms overlap between communities

Data Quality and Curation

            What is Data Quality?
                Desirable characteristics for an information resource
                Described as a series of quality dimensions
                       – Discoverability, Accessibility, Timeliness, Completeness, Interpretation, Accuracy, Consistency, Provenance & Reputation

            Data curation can be used to improve these quality dimensions

Data Quality and Curation

            Discoverability & Accessibility
                Curate to streamline search by storing and classifying data in an appropriate and consistent manner

            Accuracy
                Curate to ensure data correctly represents the “real-world” values it models

            Consistency
                Curate to ensure data is created and maintained using standardized definitions, calculations, terms, and identifiers

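The consistency dimension can be illustrated with a small check. This is a minimal Python sketch, not a method from the chapter: the controlled vocabulary `CANONICAL_TERMS`, the helper `check_consistency`, and the `country` field are hypothetical. Known variants are mapped to one standardized term; values the vocabulary does not cover are reported to a curator rather than guessed.

```python
# Hypothetical controlled vocabulary: lower-cased variants -> one standard term.
CANONICAL_TERMS = {
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def check_consistency(records, field):
    """Map `field` in each record to its standard term.

    Returns (normalized, issues): `normalized` holds copies of the records,
    `issues` lists (index, raw_value) pairs the vocabulary does not cover.
    """
    normalized, issues = [], []
    for i, record in enumerate(records):
        raw = record.get(field, "")
        canonical = CANONICAL_TERMS.get(raw.strip().lower())
        if canonical is None:
            issues.append((i, raw))  # leave the value for a curator to decide
            normalized.append(dict(record))
        else:
            normalized.append({**record, field: canonical})
    return normalized, issues


records = [{"country": "USA"}, {"country": "U.S."}, {"country": "Atlantis"}]
normalized, issues = check_consistency(records, "country")
print(issues)  # [(2, 'Atlantis')]
```

Reporting unknown values instead of silently dropping or rewriting them keeps the final decision with a human curator.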
Data Quality and Curation

            Provenance & Reputation
                Curate to track the source of data and determine its reputation
                Curate to include the objectivity of the source/producer
                       – Is the information unbiased, unprejudiced, and impartial?
                       – Or does it come from a reputable but partisan source?

            Other dimensions are discussed in the book chapter

How to Curate Data

            Data curation is a large field with sophisticated techniques and processes

            This section provides a high-level overview of:
                Should you curate data?
                Types of curation
                Setting up a curation process

            Additional detail and references are available in the book chapter

Should You Curate Data?

            Curation can have multiple motivations
                Improving accessibility, quality, consistency, …

            Will the data benefit from curation?
                Identify the business case
                Determine whether the potential return supports the investment

            Not all enterprise data should be curated
                Curation suits knowledge-centric data rather than transactional operations data

Types of Data Curation

            There are multiple approaches to curating data, and no single correct way
                Who?
                       – Individual Curators
                       – Curation Departments
                       – Community-based Curation
                How?
                       – Manual Curation
                       – (Semi-)Automated
                       – Sheer Curation

Types of Data Curation – Who?

            Individual Data Curators
                Suitable for small quantities of infrequently changing data
                       – (<1,000 records)
                       – Minimal curation effort (minutes per record)

Types of Data Curation – Who?

            Curation Departments
                Curation experts work with subject matter experts to curate data within a formal process
                       – Can handle a large curation effort (thousands of records)

            Limitations
                Scalability: can struggle with large quantities of dynamic data (more than a million records)
                Availability: the post-hoc nature delays the availability of curated data

Types of Data Curation – Who?

            Community-Based Data Curation
                Decentralized approach to data curation
                Crowdsourcing the curation process
                       – Leverages a community of users to curate data
                Wisdom of the community (crowd)
                Can scale to millions of records

Types of Data Curation – How?

            Manual Curation
                Curators directly manipulate data
                Can tie users up with low-value-add activities

            (Semi-)Automated Curation
                Algorithms can (semi-)automate curation activities such as data cleansing, record de-duplication, and classification
                Can be supervised or approved by human curators

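Semi-automated de-duplication can be sketched as follows, assuming records are dicts with a `name` field. The function names and the 0.9 similarity threshold are hypothetical choices; `difflib.SequenceMatcher` from the Python standard library stands in for whatever matcher a real pipeline would use. The algorithm only proposes candidate pairs; records are dropped only for pairs a human curator has approved.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def candidate_duplicates(records, threshold=0.9):
    """Propose (i, j, score) pairs whose names look alike.

    The algorithm only proposes; a human curator approves or rejects.
    """
    candidates = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
        if score >= threshold:
            candidates.append((i, j, round(score, 2)))
    return candidates


def drop_approved_duplicates(records, approved_pairs):
    """Keep the first record of each curator-approved pair, drop the second."""
    to_drop = {j for _, j in approved_pairs}
    return [r for k, r in enumerate(records) if k not in to_drop]


records = [
    {"name": "Digital Enterprise Research Institute"},
    {"name": "digital  enterprise research institute"},
    {"name": "Protein Data Bank"},
]
print(candidate_duplicates(records))  # [(0, 1, 1.0)]
```

Separating "propose" from "apply" is what makes the process semi-automated: the curator's approval list is the only input that changes the data.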
Types of Data Curation – How?

            Sheer Curation, or Curation at Source
                Curation activities are integrated into the normal workflow of those creating and managing data
                Can be as simple as vetting or “rating” the results of a curation algorithm
                Results can be available immediately

            Blended Approaches: Best of Both
                Sheer curation + post-hoc curation department
                Allows immediate access to curated data
                Ensures quality control with expert curation

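A blended approach might be wired up roughly as follows. This sketch assumes Python; the `Suggestion` and `BlendedCuration` classes, the +1/-1 rating scheme, and the three-vote minimum are all hypothetical. User ratings (sheer curation) publish or withdraw an algorithm's suggestion immediately, while thinly rated or contested suggestions are queued for a post-hoc curation department.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    """A result proposed by a curation algorithm, e.g. an auto-assigned tag."""
    record_id: str
    value: str
    votes: list = field(default_factory=list)  # +1 / -1 ratings from users


@dataclass
class BlendedCuration:
    """Sheer curation (user ratings) backed by a post-hoc expert review queue."""
    min_votes: int = 3
    published: dict = field(default_factory=dict)
    review_queue: list = field(default_factory=list)

    def rate(self, s, vote):
        """Record a user's rating made inside their normal workflow."""
        s.votes.append(vote)
        # Net-positive results are available immediately; others are withdrawn.
        if sum(s.votes) > 0:
            self.published[s.record_id] = s.value
        else:
            self.published.pop(s.record_id, None)

    def flag_for_experts(self, suggestions):
        """Queue thinly rated or non-unanimous suggestions for expert curators."""
        for s in suggestions:
            contested = abs(sum(s.votes)) < len(s.votes)
            if len(s.votes) < self.min_votes or contested:
                self.review_queue.append(s.record_id)


curation = BlendedCuration()
tag_a = Suggestion("r1", "oil & gas")
tag_b = Suggestion("r2", "pharmaceutical")
for vote in (+1, +1, +1):
    curation.rate(tag_a, vote)
for vote in (+1, -1):
    curation.rate(tag_b, vote)
curation.flag_for_experts([tag_a, tag_b])
print(curation.published)     # {'r1': 'oil & gas'}
print(curation.review_queue)  # ['r2']
```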
Setting up a Curation Process

            Five steps to set up a curation process:
               1 - Identify what data you need to curate
               2 - Identify who will curate the data
               3 - Define the curation workflow
               4 - Identify appropriate data-in & data-out formats
               5 - Identify the artifacts, tools, and processes needed to support the curation process

Wikipedia

              The World's Largest Open Digital Curation Community

Wikipedia

            Open-source encyclopedia
            Collaboratively built by a large community
                Challenges existing models of content creation
                More than 19,000,000 articles
                270+ languages, 3,200,000+ articles in English
                More than 157,000 active contributors
            Studies show its accuracy and stylistic formality are equivalent to resources developed in expert-based closed communities
                e.g., the Columbia and Britannica encyclopedias

Wikipedia

            MediaWiki
                Wiki platform behind Wikipedia
                       – Widespread and popular technology
                Wikis can also support data curation
                       – Lowers entry barriers for collaborative data curation
            Widely used inside organizations
                Intellipedia, covering 16 U.S. intelligence agencies
                Wiki Proteins, curated protein data for knowledge discovery and annotation

Wikipedia

            A decentralized environment supports the creation of high-quality information with:
                Social organization
                Artifacts, tools & processes for cooperative work coordination

            Wikipedia's collaboration dynamics highlight good practices

Wikipedia – Social Organization

            Any user can edit its contents
                Without prior registration

            This does not lead to a chaotic scenario
                In practice, a highly scalable approach to high-quality content creation on the Web

            Relies on a simple but highly effective way to coordinate its curation process
            Curation is an activity of Wikipedia admins
                Responsible for information quality standards

Wikipedia – Social Organization

            Four main types of accounts:
                Anonymous users
                       – Identified by their associated IP address
                Registered users
                       – Users with an account on the Wikipedia website
                Administrators/Editors
                       – Registered users with additional permissions in the system
                       – Access to curation tools
                Bots
                       – Programs that perform repetitive tasks

Wikipedia – Social Organization

            Incentives
                Improvement of one's reputation
                Sense of efficacy
                       – Contributing effectively to a meaningful project
                Over time, editors' focus typically changes
                       – From curating a few articles on specific topics
                       – To a more global curation perspective
                       – Enforcing quality assessment of Wikipedia as a whole

Wikipedia – Artifacts, Tools & Processes

            Wiki Article Editor (Tool)
                WYSIWYG or markup text editor
            Talk Pages (Tool)
                Public arena for discussions around Wikipedia resources
            Watchlists (Tool)
                Help curators actively monitor the integrity and quality of the resources they contribute to
            Permission Mechanisms (Tool)
                Users with administrator status can perform critical actions such as removing pages and granting administrative permissions to new users

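The permission mechanism can be pictured as a role-to-rights table. This is a deliberately simplified, hypothetical Python sketch: the `Role` names, the rights strings, and `grant_admin` are illustrative and are not MediaWiki's actual rights model. It covers the four account types listed earlier and the two critical administrator actions named on this slide.

```python
from enum import Enum


class Role(Enum):
    ANONYMOUS = "anonymous"
    REGISTERED = "registered"
    ADMIN = "administrator"
    BOT = "bot"


# Hypothetical, simplified rights table; real wiki permissions are finer-grained.
RIGHTS = {
    Role.ANONYMOUS: {"edit"},
    Role.REGISTERED: {"edit", "watch"},
    Role.BOT: {"edit"},
    Role.ADMIN: {"edit", "watch", "delete_page", "grant_admin"},
}


def can(role, action):
    """True if accounts with this role may perform the action."""
    return action in RIGHTS[role]


def grant_admin(actor, target):
    """Promote a registered user to administrator; only admins may do this."""
    if not can(actor, "grant_admin"):
        raise PermissionError(f"{actor.value} users cannot grant administrator status")
    if target is not Role.REGISTERED:
        raise ValueError("only registered users can be promoted")
    return Role.ADMIN


print(can(Role.ANONYMOUS, "edit"))               # True
print(can(Role.REGISTERED, "delete_page"))       # False
print(grant_admin(Role.ADMIN, Role.REGISTERED))  # Role.ADMIN
```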
Wikipedia – Artifacts, Tools & Processes

            Automated Editing (Tool)
                Bots are automated or semi-automated tools that perform repetitive tasks over content
            Page History and Restore (Tool)
                Historical trail of changes to a Wikipedia resource
            Guidelines, Policies & Templates (Artifact)
                Define curation guidelines for editors to assess article quality
            Dispute Resolution (Process)
                Mechanism for resolving disputes between editors over article contents
            Article Editing, Deletion, Merging, Redirection, Transwiking, Archival (Process)
                Describe the curation actions over Wikipedia resources

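Page history and restore can be sketched as an append-only list of revisions. In this hypothetical Python sketch (the `Page` and `Revision` classes are illustrative, not MediaWiki's data model), restoring an old version appends a new revision that copies it, so the restore itself stays in the historical trail.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    author: str  # user name, or IP address for anonymous users
    text: str
    summary: str = ""


@dataclass
class Page:
    """A page that keeps every revision, so any earlier state can be restored."""
    title: str
    history: list = field(default_factory=list)

    @property
    def current(self):
        return self.history[-1].text if self.history else ""

    def edit(self, author, text, summary=""):
        rev = Revision(len(self.history) + 1, author, text, summary)
        self.history.append(rev)
        return rev

    def restore(self, number, author):
        """Revert by appending a copy of an old revision; nothing is deleted."""
        old = next((r for r in self.history if r.number == number), None)
        if old is None:
            raise ValueError(f"no revision {number}")
        return self.edit(author, old.text, f"Restored revision {number}")


page = Page("Data curation")
page.edit("alice", "Data curation is the active management of data.")
page.edit("203.0.113.5", "spam")
page.restore(1, "bob")
print(page.current)  # Data curation is the active management of data.
print([(r.number, r.author) for r in page.history])
# [(1, 'alice'), (2, '203.0.113.5'), (3, 'bob')]
```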
Wikipedia – DBpedia

            DBpedia Knowledge Base
                Inherits a massive volume of curated Wikipedia data
                Built using properties from Wikipedia infoboxes
                Indirectly uses the wiki as a data curation platform

            DBpedia provides direct access to the data
                3.4 million entities and 1 billion RDF triples
                Comprehensive data infrastructure
                       – Concept URIs, definitions, and basic types

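The idea of building a knowledge base from infobox properties can be illustrated with a much-simplified Python sketch. This is not the DBpedia extraction framework: the regular expression, the function name `infobox_to_ntriples`, and the sample infobox are hypothetical, and every value is emitted as a plain literal. The URIs follow DBpedia's `http://dbpedia.org/resource/` and `http://dbpedia.org/property/` naming patterns.

```python
import re

RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"

# Hypothetical pattern for simple "| key = value" infobox lines.
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$")


def infobox_to_ntriples(title, wikitext):
    """Emit one N-Triples line per infobox field, with a plain literal object."""
    subject = f"<{RESOURCE}{title.replace(' ', '_')}>"
    triples = []
    for line in wikitext.splitlines():
        match = FIELD.match(line.strip())
        if match:
            key, value = match.groups()
            # Escape backslashes and quotes as N-Triples literals require.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{PROPERTY}{key}> "{literal}" .')
    return triples


sample = """{{Infobox organization
| name = Example Institute
| city = Galway
}}"""
for triple in infobox_to_ntriples("Example Institute", sample):
    print(triple)
```

Real infoboxes contain nested templates, links, and multi-line values that this pattern ignores; DBpedia additionally maps infobox keys onto a shared ontology, which this sketch does not attempt.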
Best Practices from Case Study Learning

            Social Best Practices
                Participation
                Engagement
                Incentives
                Community Governance Models

            Technical Best Practices
                Data Representation
                Human and Automated Curation
                Track Provenance

Social Best Practices

            Participation
                Stakeholder involvement, for both data producers and consumers, must occur early in the project
                       – Provides insight into basic questions of what they want to do, for whom, and what it will provide
                White papers are an effective means to present these ideas and solicit opinions from the community
                       – Can be used to establish an informal 'social contract' for the community

Social Best Practices

            Engagement
                Outreach activities are essential for promotion and feedback
                Typically, fewer than 5% of consumers become contributors
                Social communication and networking forums are useful
                       – However, the majority of the community may not communicate using these media
                       – Communication by email remains important

Social Best Practices

            Incentives
                Sheer curation needs a line of sight from the data curation activity to tangible exploitation benefits
                Lack of awareness of the value proposition will slow the emergence of collaborative contributions
                Recognizing contributing curators through a formal feedback mechanism
                       – Reinforces the contribution culture
                       – Directly increases output quality

Social Best Practices

            Community Governance Models
                An effective governance structure is vital to the success of the community
                Internal communities and consortia perform well when they leverage traditional corporate and democratic governance models
                Open communities need to engage the community within the governance process
                       – They follow less orthodox approaches using meritocratic and autocratic principles

Technical Best Practices

            Data Representation
                Must be robust and standardized to encourage community usage and tool development
                Support for legacy data formats, and the ability to translate data forward to support new technologies and standards
            Human & Automated Curation
                Balancing the two will improve data quality
                Automated curation should always defer to, and never override, human curation edits
                       – Automate validation of data deposition and entry
                       – Target the community at focused curation tasks

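The rule that automated curation defers to human edits can be enforced per field. This Python sketch assumes each field remembers whether a human or a bot set it last; `Record`, `human_edit`, `bot_edit`, and `validate_entry` are hypothetical names. A bot edit is skipped whenever a human has already curated that field, and a simple automated check flags required fields that are missing at data entry.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A curated record whose fields remember who last set them."""
    values: dict = field(default_factory=dict)
    edited_by: dict = field(default_factory=dict)  # field name -> "human" | "bot"


def human_edit(record, name, value):
    """Human curation edits always apply."""
    record.values[name] = value
    record.edited_by[name] = "human"


def bot_edit(record, name, value):
    """Apply an automated edit unless a human curator already set this field."""
    if record.edited_by.get(name) == "human":
        return False  # defer: never override a human curation edit
    record.values[name] = value
    record.edited_by[name] = "bot"
    return True


def validate_entry(values, required):
    """Automated entry check: list required fields that are missing or blank."""
    return [k for k in required if not str(values.get(k, "")).strip()]


record = Record()
bot_edit(record, "industry", "energy")         # applied
human_edit(record, "industry", "oil & gas")    # curator corrects it
print(bot_edit(record, "industry", "energy"))  # False
print(record.values["industry"])               # oil & gas
print(validate_entry({"name": "ACME", "industry": " "}, ("name", "industry", "country")))
# ['industry', 'country']
```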
Technical Best Practices

            Track Provenance
                All curation activities should be recorded and maintained as part of the data provenance effort
                       – Especially where human curators are involved
                Users can have different perspectives on provenance
                       – A scientist may need to evaluate the fine-grained experiment description behind the data
                       – For a business analyst, the 'brand' of the data provider can be sufficient for determining quality

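A provenance log that serves both perspectives might look like this. The schema (`ProvenanceEntry`, `ProvenanceLog`) is a hypothetical Python sketch: every curation activity is recorded with its agent, agent type, source, and a free-text detail; `full_trail` returns the fine-grained history a scientist might inspect, and `providers` returns only the provider 'brand' a business analyst might need.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEntry:
    """One recorded curation activity (immutable once created)."""
    record_id: str
    activity: str    # e.g. "create", "correct", "merge"
    agent: str       # curator or bot identifier
    agent_type: str  # "human" or "bot"
    source: str      # data provider the value came from
    detail: str      # fine-grained description, e.g. method or experiment
    at: datetime


class ProvenanceLog:
    def __init__(self):
        self._entries = []

    def record(self, record_id, activity, agent, agent_type, source, detail=""):
        """Append a timestamped entry; the log only ever appends."""
        entry = ProvenanceEntry(record_id, activity, agent, agent_type,
                                source, detail, datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def full_trail(self, record_id):
        """Fine-grained view, e.g. for a scientist checking method details."""
        return [e for e in self._entries if e.record_id == record_id]

    def providers(self, record_id):
        """Coarse view: only the provider 'brands' behind a record."""
        return sorted({e.source for e in self.full_trail(record_id)})


log = ProvenanceLog()
log.record("r1", "create", "importer-bot", "bot", "Provider A", "bulk load")
log.record("r1", "correct", "alice", "human", "Provider B", "unit fixed from mg to g")
print([e.activity for e in log.full_trail("r1")])  # ['create', 'correct']
print(log.providers("r1"))                         # ['Provider A', 'Provider B']
```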
Conclusions

        Data curation can ensure the quality of data and its fitness for use
        Pre-competitive data can be shared without conferring a commercial advantage
        Pre-competitive data communities
                Common curation tasks are carried out once, in the public domain
                Reduce cost and increase the quantity and quality of data

Acknowledgements

        Collaborators Andre Freitas & Seán O'Riain

        Insight from Thought Leaders
               Evan Sandhaus (Semantic Technologist), Rob Larson (Vice President Product Development and Management), and Gregg Fenton (Director Emerging Platforms) from the New York Times
               Krista Thomas (Vice President, Marketing & Communications), Tom Tague (OpenCalais initiative Lead) from Thomson Reuters
               Antony Williams (VP of Strategic Development) from ChemSpider
               Helen Berman (Director), John Westbrook (Product Development) from the Protein Data Bank
               Nick Lynch (Architect with AstraZeneca) from the Pistoia Alliance

        The work presented has been funded by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).

Further Information

The Role of Community-Driven Data Curation for Enterprises
Edward Curry, Andre Freitas, & Seán O'Riain

  In David Wood (ed.), Linking Enterprise Data, Springer, 2010.
  Available free at:
  http://3roundstones.com/led_book/led-curry-et-al.html

Crowdsourcing Approaches for Smart City Open Data Management
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 

Wikipedia (DBpedia): Crowdsourced Data Curation

  • 1. Wikipedia (DBpedia): Crowdsourced Data Curation
      Edward Curry, Andre Freitas, Seán O'Riain
      Digital Enterprise Research Institute, www.deri.ie
      ed.curry@deri.org
      http://www.deri.org/
      http://www.EdwardCurry.org/
      Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  • 2. Speaker Profile
      - Research Scientist at the Digital Enterprise Research Institute (DERI)
          - Leading international web science research organization
      - Researching how the web of data is changing the way businesses work and interact with information
          - Projects include studies of enterprise linked data, community-based data curation, semantic data analytics, and semantic search
          - Investigating utilization within the pharmaceutical, oil & gas, financial, advertising, media, manufacturing, health care, ICT, and automotive industries
      - Invited speaker at the 2010 MIT Sloan CIO Symposium to an audience of more than 600 CIOs
  • 3. Overview
      - Curation Background
          - The Business Need for Curated Data
          - What is Data Curation?
          - Data Quality and Curation
          - How to Curate Data
      - Wikipedia (DBpedia) Case Study
      - Best Practices from Case Study Learning
  • 4. The Business Need
      - Knowledge workers need:
          - Access to the right information
          - Confidence in that information
      - Working with incomplete, inaccurate, or wrong information can have disastrous consequences
  • 5. The Problems with Data
      - Flawed Data
          - Affects 25% of critical data in the world's top companies (Gartner)
      - Data Quality
          - Recent banking crisis (The Economist, Dec '09)
          - Inaccurate figures made it difficult to manage operations (investment exposure and risk)
              - "asset are defined differently in different programs"
              - "numbers did not always add up"
              - "departments do not trust each other's figures"
              - "figures … not worth the pixels they were made of"
  • 6. What is Data Curation?
      - Digital Curation
          - Selection, preservation, maintenance, collection, and archiving of digital assets
      - Data Curation
          - Active management of data over its life-cycle
      - Data Curators
          - Ensure data is trustworthy, discoverable, accessible, reusable, and fit for use
              - Museum cataloguers of the Internet age
  • 7. What is Data Curation?
      - Data Governance
          - Convergence of data quality, data management, business process management, and risk management
      - Data Curation is a complementary activity
          - Part of the overall data governance strategy for an organization
      - Data Curator = Data Steward?
          - Overlapping terms between communities
  • 8. Data Quality and Curation
      - What is Data Quality?
          - Desirable characteristics for an information resource
          - Described as a series of quality dimensions
              - Discoverability, Accessibility, Timeliness, Completeness, Interpretation, Accuracy, Consistency, Provenance & Reputation
      - Data curation can be used to improve these quality dimensions
  • 9. Data Quality and Curation
      - Discoverability & Accessibility
          - Curate to streamline search by storing and classifying data in an appropriate and consistent manner
      - Accuracy
          - Curate to ensure data correctly represents the "real-world" values it models
      - Consistency
          - Curate to ensure data is created and maintained using standardized definitions, calculations, terms, and identifiers
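As a rough illustration of curating for consistency, the sketch below flags records whose identifiers or terms deviate from an agreed standard so that a curator can review them. The identifier format, the currency vocabulary, and the field names are hypothetical assumptions chosen for the example, not standards taken from the chapter.

```python
import re

# Hypothetical standards for illustration only: a controlled vocabulary of
# currency terms and a fixed customer-identifier format.
ALLOWED_CURRENCIES = {"EUR", "USD", "GBP"}
CUSTOMER_ID_PATTERN = re.compile(r"CUST-\d{6}")


def consistency_issues(record: dict) -> list:
    """Return the consistency problems a curator should look at for one record."""
    issues = []
    # Identifier check: the whole ID must follow the standardized format.
    if not CUSTOMER_ID_PATTERN.fullmatch(record.get("customer_id") or ""):
        issues.append("non-standard identifier")
    # Term check: only terms from the controlled vocabulary are accepted.
    if record.get("currency") not in ALLOWED_CURRENCIES:
        issues.append("unknown currency term")
    return issues


records = [
    {"customer_id": "CUST-000042", "currency": "EUR"},
    {"customer_id": "cust42", "currency": "Euro"},
]
for record in records:
    print(record["customer_id"], consistency_issues(record))
```

The checks only report problems; deciding how to fix a record is left to the curator, which keeps the sketch in line with curation as active, human-directed management.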
  • 10. Data Quality and Curation
      - Provenance & Reputation
          - Curate to track the source of data and determine its reputation
          - Curate to include the objectivity of the source/producer
              - Is the information unbiased, unprejudiced, and impartial?
              - Or does it come from a reputable but partisan source?
      Other dimensions are discussed in the book chapter.
  • 11. How to Curate Data
      - Data curation is a large field with sophisticated techniques and processes
      - This section provides a high-level overview of:
          - Should you curate data?
          - Types of curation
          - Setting up a curation process
      Additional detail and references are available in the book chapter.
  • 12. Should You Curate Data?
      - Curation can have multiple motivations
          - Improving accessibility, quality, consistency, …
      - Will the data benefit from curation?
          - Identify the business case
          - Determine whether the potential return supports the investment
      - Not all enterprise data should be curated
          - Curation suits knowledge-centric data rather than transactional operations data
  • 13. Types of Data Curation
      - There are multiple approaches to curating data and no single correct way
      - Who?
          - Individual Curators
          - Curation Departments
          - Community-based Curation
      - How?
          - Manual Curation
          - (Semi-)Automated
          - Sheer Curation
  • 14. Types of Data Curation – Who?
      - Individual Data Curators
          - Suitable for small, infrequently changing quantities of data
              - (<1,000 records)
              - Minimal curation effort (minutes per record)
  • 15. Types of Data Curation – Who?
      - Curation Departments
          - Curation experts working with subject matter experts to curate data within a formal process
              - Can deal with large curation efforts (thousands of records)
      - Limitations
          - Scalability: can struggle with large quantities of dynamic data (millions of records)
          - Availability: the post-hoc nature creates a delay before curated data is available
  • 16. Types of Data Curation – Who?
      - Community-Based Data Curation
          - Decentralized approach to data curation
          - Crowdsourcing the curation process
              - Leverages a community of users to curate data
          - Wisdom of the community (crowd)
          - Can scale to millions of records
  • 17. Types of Data Curation – How?
      - Manual Curation
          - Curators directly manipulate data
          - Can tie users up with low-value-add activities
      - (Semi-)Automated Curation
          - Algorithms can (semi-)automate curation activities such as data cleansing, record de-duplication, and classification
          - Can be supervised or approved by human curators
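(Semi-)automated curation can be sketched as an algorithm that proposes changes and a human curator who approves them. The example below uses the standard-library `difflib.SequenceMatcher` ratio as a stand-in duplicate detector; the 0.85 threshold, the sample names, and the `approve` callback are illustrative assumptions rather than a technique prescribed by the chapter.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] after trivial normalization (case, outer spaces)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def propose_duplicates(names, threshold=0.85):
    """Automated step: suggest likely duplicate pairs without merging anything."""
    proposals = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            proposals.append((a, b, round(score, 2)))
    return proposals


def apply_decisions(proposals, approve):
    """Human step: keep only the pairs that a curator's `approve` callback accepts."""
    return [(a, b) for a, b, score in proposals if approve(a, b, score)]


proposals = propose_duplicates(["Acme Ltd", "ACME Ltd.", "Globex Corp"])
print(proposals)  # [('Acme Ltd', 'ACME Ltd.', 0.94)]
approved = apply_decisions(proposals, lambda a, b, score: score >= 0.9)
print(approved)
```

Separating "propose" from "apply" is the design point: the algorithm never changes data on its own, so a curator can supervise or veto every suggestion.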
  • 18. Types of Data Curation – How?
      - Sheer Curation, or Curation at Source
          - Curation activities are integrated into the normal workflow of those creating and managing data
          - Can be as simple as vetting or "rating" the results of a curation algorithm
          - Results can be available immediately
      - Blended Approaches: Best of Both
          - Sheer curation + post-hoc curation department
          - Allows immediate access to curated data
          - Ensures quality control with expert curation
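One possible reading of the blended approach is sketched below: people rate algorithm output as part of their normal work (sheer curation), positively rated results are usable at once, and everything else is routed to a post-hoc curation department. The +1/-1 rating scheme, the `CandidateRecord` type, and the routing rule are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateRecord:
    """A result produced by a curation algorithm and rated by people in their workflow."""
    record_id: str
    # Hypothetical rating scheme: +1 = looks right, -1 = looks wrong.
    ratings: list = field(default_factory=list)

    def score(self) -> int:
        return sum(self.ratings)


def triage(records):
    """Hypothetical blended rule: positively rated results are available immediately;
    unrated or disputed results are queued for expert (post-hoc) curation."""
    available_now = [r.record_id for r in records if r.score() > 0]
    expert_queue = [r.record_id for r in records if r.score() <= 0]
    return available_now, expert_queue


records = [
    CandidateRecord("r1", [1, 1]),
    CandidateRecord("r2", [1, -1]),
    CandidateRecord("r3"),
]
print(triage(records))  # (['r1'], ['r2', 'r3'])
```

An unrated record scores 0 and therefore goes to the expert queue, so nothing skips quality control merely because nobody looked at it.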
  • 19. Setting up a Curation Process
      - 5 steps to set up a curation process:
          1. Identify what data you need to curate
          2. Identify who will curate the data
          3. Define the curation workflow
          4. Identify appropriate data-in and data-out formats
          5. Identify the artifacts, tools, and processes needed to support the curation process
  • 20. Wikipedia
      The World's Largest Open Digital Curation Community
  • 21. Wikipedia
      - Open-source encyclopedia
          - Collaboratively built by a large community
          - Challenges existing models of content creation
      - More than 19,000,000 articles
          - 270+ languages, 3,200,000+ articles in English
      - More than 157,000 active contributors
      - Studies show its accuracy and stylistic formality are equivalent to resources developed in expert-based closed communities
          - i.e., the Columbia and Britannica encyclopedias
  • 22. Wikipedia
      - MediaWiki
          - Wiki platform behind Wikipedia
              - Widespread and popular technology
      - Wikis can also support data curation
          - They lower entry barriers for collaborative data curation
      - Widely used inside organizations
          - Intellipedia, covering 16 U.S. intelligence agencies
          - Wiki Proteins, curated protein data for knowledge discovery and annotation
  • 23. Wikipedia
      - Decentralized environment supports the creation of high-quality information with:
          - Social organization
          - Artifacts, tools & processes for cooperative work coordination
      - Wikipedia collaboration dynamics highlight good practices
  • 24. Wikipedia – Social Organization
    - Any user can edit its contents
      - Without prior registration
    - This does not lead to a chaotic scenario
      - In practice, it is a highly scalable approach to high-quality content creation on the Web
      - It relies on a simple but highly effective way of coordinating its curation process
    - Curation is the activity of Wikipedia admins
      - They are responsible for information quality standards
  • 25. Wikipedia – Social Organization
    - Four main types of accounts:
      - Anonymous users: identified by their associated IP address
      - Registered users: users with an account on the Wikipedia website
      - Administrators/Editors: registered users with additional permissions in the system and access to curation tools
      - Bots: programs that perform repetitive tasks
  • 26. Wikipedia – Social Organization
  • 27. Wikipedia – Social Organization
    - Incentives
      - Improvement of one's reputation
      - Sense of efficacy: contributing effectively to a meaningful project
    - Over time, the focus of editors typically changes
      - From curating a few articles on specific topics
      - To a more global curation perspective, enforcing quality assessment of Wikipedia as a whole
  • 28. Wikipedia – Artifacts, Tools & Processes
    - Wiki Article Editor (Tool)
      - WYSIWYG or markup text editor
    - Talk Pages (Tool)
      - Public arena for discussions around Wikipedia resources
    - Watchlists (Tool)
      - Help curators actively monitor the integrity and quality of the resources they contribute to
    - Permission Mechanisms (Tool)
      - Users with administrator status can perform critical actions, such as removing pages and granting administrative permissions to new users
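The account types from slide 25 and the permission mechanisms above can be sketched as a role-to-rights table. This is a deliberately simplified, hypothetical model: the action names (`edit`, `discuss`, `watch`, `delete_page`, `grant_admin`) and the `grant_admin` helper are illustrative assumptions, and real MediaWiki user groups and rights are far more fine-grained.

```python
from enum import Enum


class Account(Enum):
    ANONYMOUS = "anonymous"          # identified only by IP address
    REGISTERED = "registered"
    ADMINISTRATOR = "administrator"
    BOT = "bot"                      # program performing repetitive tasks


# Hypothetical, heavily simplified permission table; real MediaWiki
# user groups and rights are far more fine-grained.
PERMISSIONS = {
    Account.ANONYMOUS: {"edit", "discuss"},
    Account.REGISTERED: {"edit", "discuss", "watch"},
    Account.ADMINISTRATOR: {"edit", "discuss", "watch", "delete_page", "grant_admin"},
    Account.BOT: {"edit"},
}


def can(account: Account, action: str) -> bool:
    return action in PERMISSIONS[account]


def grant_admin(actor: Account, target: Account) -> Account:
    # Only administrators may promote a registered user to administrator.
    if not can(actor, "grant_admin"):
        raise PermissionError(f"{actor.value} may not grant admin rights")
    if target is not Account.REGISTERED:
        raise ValueError("only registered users can be promoted")
    return Account.ADMINISTRATOR
```

Keeping everyday editing open to all account types while reserving a few critical actions for administrators reflects the slides' point that open editing need not be chaotic.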
  • 29. Wikipedia – Artifacts, Tools & Processes
    - Automated Editing (Tool)
      - Bots are automated or semi-automated tools that perform repetitive tasks over content
    - Page History and Restore (Tool)
      - Historical trail of changes to a Wikipedia resource
    - Guidelines, Policies & Templates (Artifact)
      - Define curation guidelines that editors use to assess article quality
    - Dispute Resolution (Process)
      - Mechanism for resolving disputes between editors over article contents
    - Article Editing, Deletion, Merging, Redirection, Transwiking, Archival (Process)
      - Describe the curation actions over Wikipedia resources
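Page history and restore can be sketched as an append-only list of revisions in which a restore adds a new revision copying an earlier one, so the full trail is never rewritten. The sketch below assumes hypothetical `Page` and `Revision` classes and simple sequential revision ids; it is not MediaWiki's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    rev_id: int
    author: str        # account name, or IP address for anonymous users
    text: str
    summary: str
    timestamp: datetime


@dataclass
class Page:
    title: str
    history: list = field(default_factory=list)  # append-only trail of revisions

    def edit(self, author: str, text: str, summary: str = "") -> Revision:
        rev = Revision(len(self.history) + 1, author, text, summary,
                       datetime.now(timezone.utc))
        self.history.append(rev)
        return rev

    @property
    def current(self) -> str:
        return self.history[-1].text if self.history else ""

    def restore(self, rev_id: int, author: str) -> Revision:
        # Restoring never rewrites history: it appends a new revision that
        # copies the text of the chosen earlier one.
        for rev in self.history:
            if rev.rev_id == rev_id:
                return self.edit(author, rev.text, f"Restored revision {rev_id}")
        raise KeyError(rev_id)
```

After a vandalised edit, `page.restore(1, "admin")` makes revision 1's text current again while the vandalism stays visible in `page.history`, which is what lets watchlist users audit what happened.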
  • 30. Wikipedia – DBpedia
    - DBpedia knowledge base
      - Inherits a massive volume of curated Wikipedia data
      - Built from the properties in Wikipedia infoboxes
      - Indirectly uses the wiki as a data curation platform
    - DBpedia provides direct access to the data
      - 3.4 million entities and 1 billion RDF triples
      - Comprehensive data infrastructure: concept URIs, definitions, and basic types
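To make "built from infobox properties" concrete, the sketch below turns one infobox's key/value pairs into RDF triples and serialises them as N-Triples. It is a rough approximation, not DBpedia's extraction framework. It assumes the `http://dbpedia.org/resource/` and `http://dbpedia.org/property/` namespaces, camelCased property names, and that a value consisting of a single wiki link points to another resource. Real extraction also handles templates, units, datatypes, URI encoding, and ontology mappings.

```python
import re

DBR = "http://dbpedia.org/resource/"   # DBpedia resource namespace
DBP = "http://dbpedia.org/property/"   # DBpedia raw infobox property namespace

# Matches a value that is exactly one wiki link, e.g. [[Dublin]] or [[Dublin|the city]].
WIKILINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


class URI(str):
    """Marks a string as a URI rather than a literal."""


def resource_uri(title: str) -> URI:
    # Page titles become resource URIs, with spaces replaced by underscores.
    return URI(DBR + title.strip().replace(" ", "_"))


def property_uri(key: str) -> URI:
    # Approximation: "birth_place" -> .../property/birthPlace (camelCase).
    parts = [p for p in re.split(r"[\s_]+", key.strip()) if p]
    return URI(DBP + parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:]))


def infobox_to_triples(page_title: str, infobox: dict) -> list:
    subject = resource_uri(page_title)
    triples = []
    for key, raw in infobox.items():
        value = raw.strip()
        if not value:
            continue  # empty infobox fields yield no triple
        link = WIKILINK.match(value)
        obj = resource_uri(link.group(1)) if link else value
        triples.append((subject, property_uri(key), obj))
    return triples


def to_ntriples(triples: list) -> str:
    lines = []
    for s, p, o in triples:
        if isinstance(o, URI):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)
```

Because every triple derives from an infobox that Wikipedia editors already curate, fixes made on the wiki flow into the knowledge base on re-extraction. This is the sense in which DBpedia indirectly uses the wiki as its curation platform.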
  • 32. Wikipedia – DBpedia
  • 33. Overview
    - Curation Background
      - The Business Need for Curated Data
      - What is Data Curation?
      - Data Quality and Curation
      - How to Curate Data
    - Wikipedia (DBpedia) Case Study
    - Best Practices from Case Study Learning
  • 34. Best Practices from Case Study Learning
    - Social Best Practices
      - Participation
      - Engagement
      - Incentives
      - Community Governance Models
    - Technical Best Practices
      - Data Representation
      - Human and Automated Curation
      - Track Provenance
  • 35. Social Best Practices
    - Participation
      - Involvement of stakeholders (data producers and consumers) must occur early in the project
        - Provides insight into basic questions: what they want to do, for whom, and what it will provide
      - White papers are an effective means to present these ideas and to solicit opinion from the community
        - Can be used to establish an informal "social contract" for the community
  • 36. Social Best Practices
    - Engagement
      - Outreach activities are essential for promotion and feedback
      - Typically, fewer than 5% of consumers become contributors
      - Social communication and networking forums are useful
        - However, the majority of the community may not communicate using these media
        - Communication by email remains important
  • 37. Social Best Practices
    - Incentives
      - Sheer curation needs a line of sight from the data curation activity to tangible exploitation benefits
      - Lack of awareness of the value proposition will slow the emergence of collaborative contributions
      - Recognizing contributing curators through a formal feedback mechanism
        - Reinforces the contribution culture
        - Directly increases output quality
  • 38. Social Best Practices
    - Community Governance Models
      - An effective governance structure is vital to the success of the community
      - Internal communities and consortia perform well when they leverage traditional corporate and democratic governance models
      - Open communities need to engage the community within the governance process
        - They follow less orthodox approaches using meritocratic and autocratic principles
  • 39. Technical Best Practices
    - Data Representation
      - Must be robust and standardized to encourage community usage and tool development
      - Support legacy data formats, and provide the ability to translate data forward to new technologies and standards
    - Human & Automated Curation
      - Balancing the two improves data quality
      - Automated curation should always defer to, and never override, human curation edits
        - Automate the validation of data deposition and entry
        - Target the community at focused curation tasks
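The rule that automated curation defers to human edits, together with automated validation at entry time, can be sketched as a small update policy. This is one hypothetical way to implement it. It assumes every stored value carries a `source` tag of `"human"` or `"bot"` and that validators are plain predicate functions keyed by field name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FieldValue:
    value: str
    source: str  # "human" or "bot"


def apply_edit(record: Dict[str, FieldValue], name: str, value: str, source: str,
               validators: Optional[Dict[str, Callable[[str], bool]]] = None) -> bool:
    """Apply one edit; return False if it was withheld to protect a human edit."""
    # Automated validation at deposition/entry time, for humans and bots alike.
    check = (validators or {}).get(name)
    if check is not None and not check(value):
        raise ValueError(f"invalid value for {name!r}: {value!r}")
    current = record.get(name)
    # Automation defers to human curation: a bot never overwrites a human value.
    if source == "bot" and current is not None and current.source == "human":
        return False
    record[name] = FieldValue(value, source)
    return True
```

A bot may fill in or refresh values it (or another bot) wrote. Once a human curator has set a field, however, later automated edits are withheld, and only another human edit can change that field.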
  • 40. Technical Best Practices
    - Track Provenance
      - All curation activities should be recorded and maintained as part of a data provenance effort
        - Especially where human curators are involved
      - Users can have different perspectives on provenance
        - A scientist may need to evaluate the fine-grained experiment description behind the data
        - For a business analyst, the "brand" of the data provider can be sufficient for determining quality
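A provenance log that records every curation activity can serve both perspectives on this slide: the full trail for a scientist and just the providers for a business analyst. The sketch below is a hypothetical design loosely modelled on the entity/activity/agent vocabulary of W3C PROV. The field names, the `provider` "brand" field, and the query helpers are assumptions, not part of any standard API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    entity: str        # the data item that was curated
    activity: str      # e.g. "create", "update", "merge"
    agent: str         # curator account or bot name
    agent_type: str    # "human" or "software"
    provider: str      # organisation ("brand") responsible for the data
    timestamp: datetime
    detail: str = ""   # optional fine-grained description, e.g. experimental method


class ProvenanceLog:
    """Append-only record of every curation activity (hypothetical design)."""

    def __init__(self) -> None:
        self._events = []

    def record(self, entity, activity, agent, agent_type, provider, detail=""):
        event = ProvenanceEvent(entity, activity, agent, agent_type, provider,
                                datetime.now(timezone.utc), detail)
        self._events.append(event)
        return event

    def full_trail(self, entity: str) -> list:
        # Fine-grained view, e.g. for a scientist checking how data was produced.
        return [e for e in self._events if e.entity == entity]

    def providers(self, entity: str) -> list:
        # Coarse view, e.g. for a business analyst who only needs the brand.
        return sorted({e.provider for e in self._events if e.entity == entity})

    def human_activities(self, entity: str) -> list:
        # Curation steps performed by people, which the slide singles out.
        return [e for e in self.full_trail(entity) if e.agent_type == "human"]
```

Both views read from the same append-only events, so neither audience needs a separate provenance store.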
  • 41. Conclusions
    - Data curation can ensure the quality of data and its fitness for use
    - Pre-competitive data can be shared without conferring a commercial advantage
    - Pre-competitive data communities
      - Common curation tasks are carried out once, in the public domain
      - Reduces cost and increases quantity and quality
  • 42. Acknowledgements
    - Collaborators: Andre Freitas & Seán O'Riain
    - Insight from thought leaders
      - Evan Sandhaus (Semantic Technologist), Rob Larson (Vice President, Product Development and Management), and Gregg Fenton (Director, Emerging Platforms) from the New York Times
      - Krista Thomas (Vice President, Marketing & Communications) and Tom Tague (OpenCalais Initiative Lead) from Thomson Reuters
      - Antony Williams (VP of Strategic Development) from ChemSpider
      - Helen Berman (Director) and John Westbrook (Product Development) from the Protein Data Bank
      - Nick Lynch (Architect with AstraZeneca) from the Pistoia Alliance
    - The work presented has been funded by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
  • 43. Further Information
    The Role of Community-Driven Data Curation for Enterprises
    Edward Curry, Andre Freitas, & Seán O'Riain
    In David Wood (ed.), Linking Enterprise Data, Springer, 2010.
    Available free at: http://3roundstones.com/led_book/led-curry-et-al.html