SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
.nju.edu.cn




                RELIN: Relatedness and Informativeness-based
                    Centrality for Entity Summarization




                     Gong Cheng1, Thanh Tran2, Yuzhong Qu1
1 State Key Laboratory for Novel Software Technology, Nanjing University, China

          2 Institute AIFB, Karlsruhe Institute of Technology, Germany

                               gcheng@nju.edu.cn



                           Presented at ISWC2011
Motivation
                                                                                            ws .nju.edu.cn

        DBpedia describes 3.64M entities with 1B RDF triples.
            1B/3.64M = 281 RDF triples per entity
        A piece of lengthy entity description is unacceptable in tasks that require quick
        identification of the underlying entity.




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           2 of 30
Entity search --- find entities that match an information need
                                                                     ws .nju.edu.cn




Gong Cheng (程龚) gcheng@nju.edu.cn                                    3 of 30
Pay-as-you-go data integration --- judge whether two entities denote the same
                                                                                    ws .nju.edu.cn




                                     sameAs?




Gong Cheng (程龚) gcheng@nju.edu.cn                                                   4 of 30
Motivation
                                                                                            ws .nju.edu.cn

        DBpedia describes 3.64M entities with 1B RDF triples.
            1B/3.64M = 281 RDF triples per entity
        A piece of lengthy entity description is unacceptable in tasks that require quick
        identification of the underlying entity.
        Problem: to summarize lengthy entity descriptions




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           5 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   6 of 30
Data graph
                                    ws .nju.edu.cn




Gong Cheng (程龚) gcheng@nju.edu.cn   7 of 30
Feature set
                                    ws .nju.edu.cn




Gong Cheng (程龚) gcheng@nju.edu.cn   8 of 30
Entity summarization
                                                 ws .nju.edu.cn

        Entity summarization = feature ranking
        Entity summary = k top-ranked features




Gong Cheng (程龚) gcheng@nju.edu.cn                9 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   10 of 30
Centrality-based ranking: concepts
                                                                                 ws .nju.edu.cn

        Widely applied to text summarization and ontology summarization
        By constructing a graph
            Nodes: data elements to be ranked
            Edges: connecting related nodes
        and then, measuring node centrality
            e.g. degree, PageRank, …



                     f2
                                       f1       f4
                                                                Relatednesss ≥ threshold

                                                         f5
   Relatednesss < threshold
                                                 f3



Gong Cheng (程龚) gcheng@nju.edu.cn                                               11 of 30
PageRank
                                                                                ws .nju.edu.cn

        Simulating a random surfer’s behavior who navigates from node to node
        Two types of action
            Following a random edge (with a uniform probability distribution)
            Jumping at random (with a uniform probability distribution)
        Ranking based on the stationary distribution of such a Markov chain




                     f2
                                     f1               f4

                                                                 f5

                                                       f3



Gong Cheng (程龚) gcheng@nju.edu.cn                                               12 of 30
Centrality-based ranking for entity summarization: problems
                                                                                            ws .nju.edu.cn

        How to define a good feature
            Not only capturing the main themes of the entity description
            But also distinguishing the entity from others
        Loss of information
            Float-valued function  boolean-valued function




                     f2
                                      f1               f4
                                                                           Relatednesss ≥ threshold

                                                                  f5
   Relatednesss < threshold
                                                        f3



Gong Cheng (程龚) gcheng@nju.edu.cn                                                          13 of 30
RELIN: concepts
                                                                                               ws .nju.edu.cn

        An extension of PageRank
            Following a random edge (           )
            within a complete graph, with a probability proportional to the relatedness between the
            two associated nodes, i.e. no threshold needed
            Jumping at random (          )
            with a probability proportional to the amount of information carried by the target that
            helps to identify the entity




Gong Cheng (程龚) gcheng@nju.edu.cn                                                              14 of 30
RELIN: RELatedness and INformativeness-based centrality
                                                                                                ws .nju.edu.cn

        Two kinds of action
            Relational move --- more likely to a feature that carries related information about the
            theme currently under investigation
            Informational jump --- more likely to a feature that provides a large amount of new
            information for clarifying the identity of the underlying entity
        Two non-uniform probability distributions




Gong Cheng (程龚) gcheng@nju.edu.cn                                                               15 of 30
Formalization
                                                                                                    ws .nju.edu.cn

        Actions (given the current feature fq)
            P(M|fq): the probability of performing a relational move from fq
            P(J|fq): the probability of performing an informational jump from fq
            subject to P(M|fq) + P(J|fq) = 1
        Targets for actions (given FS the feature set)
            P(fp|fq,M): the probability of performing a relational move from fq to fp
            P(fp|fq,J): the probability of performing an informational jump from fq to fp
            subject to      P f p | f q , M 1 and      P f p | fq , J 1
                           f p FS                         f p FS
        Result
            x(t): |FS|-dimensional vector
            xp(t): the probability that the surfer visits fp at step t
            Finally,
                 xp t 1                  xq t   P M | fq P f p | fq , M   P J | fq P f p | fq , J
                                    f q FS

            and
                 lim x t       x
                 t




Gong Cheng (程龚) gcheng@nju.edu.cn                                                                   16 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   17 of 30
Actions
                                        ws .nju.edu.cn

        P(M|fq) = 1 – λ
        P(J|fq) = λ
        λ: to be tuned in experiments




Gong Cheng (程龚) gcheng@nju.edu.cn       18 of 30
Relatedness --- P(fp|fq,M)
                                                                                      ws .nju.edu.cn

        Relatedness between features (i.e. property-value pairs) combines
            Relatedness between properties (i.e. resources)
            Relatedness between values (i.e. resources)
        Relatedness between resources = relatedness between resource names
            URI: label or local name
            Literal: lexical form
        Distributional relatedness between resource names
            More related = more often co-occur in certain contexts (e.g. documents)
        Estimated via “pointwise mutual information + Google”

                                                                      Hits si , s j
                                                      P si , s j
                                                                              N


                                                                   Hits s j
                                                      P sj
                                                                      N




Gong Cheng (程龚) gcheng@nju.edu.cn                                                     19 of 30
Informativeness --- P(fp|fq,J)
                                                                                           ws .nju.edu.cn

        Self-information

        o: informational jump from fq to fp
        P(fp|fq): the probability that fp belongs to a feature set given fq also does so
        Estimated via a statistical analysis of the data set




        Approximation: P(fp|fq) = P(fp)




Gong Cheng (程龚) gcheng@nju.edu.cn                                                          20 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   21 of 30
Experiments
                                    ws .nju.edu.cn

        Intrinsic evaluation
        Extrinsic evaluation




Gong Cheng (程龚) gcheng@nju.edu.cn   22 of 30
Intrinsic evaluation --- design
                                                                                ws .nju.edu.cn

        Task
            To manually construct ideal entity summaries as the gold standard
        Participants
            24 students majoring in computer science
        Test cases
            149 entity descriptions randomly selected from DBpedia 3.4
        Assignment
            4.43 participants per entity description
        Output
            Top-5 features
            Top-10 features




Gong Cheng (程龚) gcheng@nju.edu.cn                                               23 of 30
Intrinsic evaluation --- results
                                                                          ws .nju.edu.cn

        Metric: overlap between summaries

        Agreement between participants about ideal summaries
            2.91 when k=5
            7.86 when k=10
        Quality of summaries computed under different approach settings




             Baselines
              Ours




Gong Cheng (程龚) gcheng@nju.edu.cn                                         24 of 30
Extrinsic evaluation --- design
                                                                                 ws .nju.edu.cn

        Task
            To manually confirm entity mappings by using summaries
        Participants
            19 students majoring in computer science
        Test cases
            47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009)
            Gold-standard judgments based on owl:sameAs links
                 24 correct and 23 incorrect
        Assignment
            3.62 participants per pair, per approach setting
        Output
            Judgment: correct or incorrect




Gong Cheng (程龚) gcheng@nju.edu.cn                                                25 of 30
Extrinsic evaluation --- results
                                                                                         ws .nju.edu.cn

        Metrics
            Accuracy of the judgments
                  1.0 = consistent with the gold standard
                  0.0 = inconsistent
            Time spent
                  Normalized by the average time per judgment spent by the participant
                  1.0 = medium efficiency
                  Smaller value = higher efficiency


        Results




Gong Cheng (程龚) gcheng@nju.edu.cn                                                        26 of 30
Discussion
                                                                                      ws .nju.edu.cn

        Automatically computed summaries are still not as good as handcrafted ones.

                                                                         k=5   k=10
         Agreement between ideal summaries                              2.91 7.86
         Agreement between computed summaries and ideal summaries       2.40 4.88

        User-specific notion of informativeness
            Longitude and latitude are highly informative, but …
        Information redundancy
            Longitude + latitude = point
            What if multiple sources …
        Summarization = what + how (to present)




Gong Cheng (程龚) gcheng@nju.edu.cn                                                     27 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   28 of 30
Conclusions
                                                                                            ws .nju.edu.cn

        Problem of entity summarization
            Extractive
            About identifying the entity that underlies a lengthy description
        The RELIN model
            Variant of the random surfer model
            Non-uniform probability distributions
            Informativeness + relatedness
        Implementation
            Based on linguistic and information theory concepts
            Using information captured by the labels of nodes and edges in the data graph
        Experiments
            Closer to handcrafted ideal summaries
            Assisting users in confirming entity mappings more accurately




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           29 of 30
Future work --- application-specific entity summarization
                                                                ws .nju.edu.cn




                                    sameAs?




Gong Cheng (程龚) gcheng@nju.edu.cn                               30 of 30
Related work --- summarization
                                                                                 ws .nju.edu.cn



   Paradigm               Approach             Measure             Model
                                                                        RELIN
         Extractive                                              - Relatedness
    - Text                  Centrality-based     PageRank-like   - Informativeness
    - Ontology                                                   - Non-uniform
                                                                 probability distribution



                                                     Others           PageRank
      Non-extractive
                                               - Degree          - Relatedness
    - Database               Centroid-based
                                               - Betweenness     - Uniform probability
    - Graph
                                               -…                distribution




Gong Cheng (程龚) gcheng@nju.edu.cn                                                31 of 30
Related work --- ranking
                                                                                         ws .nju.edu.cn

        Different goals --- to best identify the underlying entity
            B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE
            Internet Comput. 2005.
            R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010.
            T. Franz. TripleRank: Ranking Semantic Web Data By Tensor Decomposition. ISWC
            2009.
            …
        Exploitation of data semantics at different levels --- use labels of nodes and edges
            T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009.
            X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007.
            …




Gong Cheng (程龚) gcheng@nju.edu.cn                                                       32 of 30

Weitere ähnliche Inhalte

Andere mochten auch

Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyViewGong Cheng
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Gong Cheng
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationGong Cheng
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析Gong Cheng
 
Towards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataTowards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataGong Cheng
 
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsGong Cheng
 

Andere mochten auch (6)

Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyView
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析
 
Towards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataTowards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web Data
 
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
 

Mehr von Gong Cheng

Towards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondTowards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondGong Cheng
 
从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探Gong Cheng
 
知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法Gong Cheng
 
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Gong Cheng
 
知识图谱中的关联搜索
知识图谱中的关联搜索知识图谱中的关联搜索
知识图谱中的关联搜索Gong Cheng
 
面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探Gong Cheng
 
知识图谱中的实体关联搜索
知识图谱中的实体关联搜索知识图谱中的实体关联搜索
知识图谱中的实体关联搜索Gong Cheng
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationGong Cheng
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference reviewGong Cheng
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationGong Cheng
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGong Cheng
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析Gong Cheng
 
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Gong Cheng
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic DataGong Cheng
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Gong Cheng
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Gong Cheng
 
Towards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachTowards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachGong Cheng
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryGong Cheng
 
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...Gong Cheng
 
Term Dependence on the Semantic Web
Term Dependence on the Semantic WebTerm Dependence on the Semantic Web
Term Dependence on the Semantic WebGong Cheng
 

Mehr von Gong Cheng (20)

Towards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondTowards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and Beyond
 
从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探
 
知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法
 
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
 
知识图谱中的关联搜索
知识图谱中的关联搜索知识图谱中的关联搜索
知识图谱中的关联搜索
 
面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探
 
知识图谱中的实体关联搜索
知识图谱中的实体关联搜索知识图谱中的实体关联搜索
知识图谱中的实体关联搜索
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference review
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity Summarization
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the Web
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析
 
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic Data
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
 
Towards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachTowards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based Approach
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary Repository
 
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
 
Term Dependence on the Semantic Web
Term Dependence on the Semantic WebTerm Dependence on the Semantic Web
Term Dependence on the Semantic Web
 

Kürzlich hochgeladen

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Kürzlich hochgeladen (20)

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization

  • 1. .nju.edu.cn RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization Gong Cheng1, Thanh Tran2, Yuzhong Qu1 1 State Key Laboratory for Novel Software Technology, Nanjing University, China 2 Institute AIFB, Karlsruhe Institute of Technology, Germany gcheng@nju.edu.cn Presented at ISWC2011
  • 2. Motivation ws .nju.edu.cn DBpedia describes 3.64M entities with 1B RDF triples. 1B/3.64M = 281 RDF triples per entity A piece of lengthy entity description is unacceptable in tasks that require quick identification of the underlying entity. Gong Cheng (程龚) gcheng@nju.edu.cn 2 of 30
  • 3. Entity search --- find entities that match an information need ws .nju.edu.cn Gong Cheng (程龚) gcheng@nju.edu.cn 3 of 30
  • 4. Pay-as-you-go data integration --- judge whether two entities denote the same ws .nju.edu.cn sameAs? Gong Cheng (程龚) gcheng@nju.edu.cn 4 of 30
  • 5. Motivation ws .nju.edu.cn DBpedia describes 3.64M entities with 1B RDF triples. 1B/3.64M = 281 RDF triples per entity A piece of lengthy entity description is unacceptable in tasks that require quick identification of the underlying entity. Problem: to summarize lengthy entity descriptions Gong Cheng (程龚) gcheng@nju.edu.cn 5 of 30
  • 6. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 6 of 30
  • 7. Data graph ws .nju.edu.cn Gong Cheng (程龚) gcheng@nju.edu.cn 7 of 30
  • 8. Feature set ws .nju.edu.cn Gong Cheng (程龚) gcheng@nju.edu.cn 8 of 30
  • 9. Entity summarization ws .nju.edu.cn Entity summarization = feature ranking Entity summary = k top-ranked features Gong Cheng (程龚) gcheng@nju.edu.cn 9 of 30
  • 10. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 10 of 30
  • 11. Centrality-based ranking: concepts ws .nju.edu.cn Widely applied to text summarization and ontology summarization By constructing a graph Nodes: data elements to be ranked Edges: connecting related nodes and then, measuring node centrality e.g. degree, PageRank, … f2 f1 f4 Relatednesss ≥ threshold f5 Relatednesss < threshold f3 Gong Cheng (程龚) gcheng@nju.edu.cn 11 of 30
  • 12. PageRank ws .nju.edu.cn Simulating a random surfer’s behavior who navigates from node to node Two types of action Following a random edge (with a uniform probability distribution) Jumping at random (with a uniform probability distribution) Ranking based on the stationary distribution of such a Markov chain f2 f1 f4 f5 f3 Gong Cheng (程龚) gcheng@nju.edu.cn 12 of 30
  • 13. Centrality-based ranking for entity summarization: problems ws .nju.edu.cn How to define a good feature Not only capturing the main themes of the entity description But also distinguishing the entity from others Loss of information Float-valued function  boolean-valued function f2 f1 f4 Relatednesss ≥ threshold f5 Relatednesss < threshold f3 Gong Cheng (程龚) gcheng@nju.edu.cn 13 of 30
  • 14. RELIN: concepts ws .nju.edu.cn An extension of PageRank Following a random edge ( ) within a complete graph, with a probability proportional to the relatedness between the two associated nodes, i.e. no threshold needed Jumping at random ( ) with a probability proportional to the amount of information carried by the target that helps to identify the entity Gong Cheng (程龚) gcheng@nju.edu.cn 14 of 30
  • 15. RELIN: RELatedness and INformativeness-based centrality ws .nju.edu.cn Two kinds of action Relational move --- more likely to a feature that carries related information about the theme currently under investigation Informational jump --- more likely to a feature that provides a large amount of new information for clarifying the identity of the underlying entity Two non-uniform probability distributions Gong Cheng (程龚) gcheng@nju.edu.cn 15 of 30
  • 16. Formalization ws .nju.edu.cn Actions (given the current feature fq) P(M|fq): the probability of performing a relational move from fq P(J|fq): the probability of performing an informational jump from fq subject to P(M|fq) + P(J|fq) = 1 Targets for actions (given FS the feature set) P(fp|fq,M): the probability of performing a relational move from fq to fp P(fp|fq,J): the probability of performing an informational jump from fq to fp subject to P f p | f q , M 1 and P f p | fq , J 1 f p FS f p FS Result x(t): |FS|-dimensional vector xp(t): the probability that the surfer visits fp at step t Finally, xp t 1 xq t P M | fq P f p | fq , M P J | fq P f p | fq , J f q FS and lim x t x t Gong Cheng (程龚) gcheng@nju.edu.cn 16 of 30
  • 17. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 17 of 30
  • 18. Actions ws .nju.edu.cn P(M|fq) = 1 – λ P(J|fq) = λ λ: to be tuned in experiments Gong Cheng (程龚) gcheng@nju.edu.cn 18 of 30
  • 19. Relatedness --- P(fp|fq,M) ws .nju.edu.cn Relatedness between features (i.e. property-value pairs) combines Relatedness between properties (i.e. resources) Relatedness between values (i.e. resources) Relatedness between resources = relatedness between resource names URI: label or local name Literal: lexical form Distributional relatedness between resource names More related = more often co-occur in certain contexts (e.g. documents) Estimated via “pointwise mutual information + Google” Hits si , s j P si , s j N Hits s j P sj N Gong Cheng (程龚) gcheng@nju.edu.cn 19 of 30
  • 20. Informativeness --- P(fp|fq,J) ws .nju.edu.cn Self-information o: informational jump from fq to fp P(fp|fq): the probability that fp belongs to a feature set given fq also does so Estimated via a statistical analysis of the data set Approximation: P(fp|fq) = P(fp) Gong Cheng (程龚) gcheng@nju.edu.cn 20 of 30
  • 21. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 21 of 30
  • 22. Experiments ws .nju.edu.cn Intrinsic evaluation Extrinsic evaluation Gong Cheng (程龚) gcheng@nju.edu.cn 22 of 30
  • 23. Intrinsic evaluation --- design ws .nju.edu.cn Task To manually construct ideal entity summaries as the gold standard Participants 24 students majoring in computer science Test cases 149 entity descriptions randomly selected from DBpedia 3.4 Assignment 4.43 participants per entity description Output Top-5 features Top-10 features Gong Cheng (程龚) gcheng@nju.edu.cn 23 of 30
  • 24. Intrinsic evaluation --- results ws .nju.edu.cn Metric: overlap between summaries Agreement between participants about ideal summaries 2.91 when k=5 7.86 when k=10 Quality of summaries computed under different approach settings Baselines Ours Gong Cheng (程龚) gcheng@nju.edu.cn 24 of 30
  • 25. Extrinsic evaluation --- design ws .nju.edu.cn Task To manually confirm entity mappings by using summaries Participants 19 students majoring in computer science Test cases 47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009) Gold-standard judgments based on owl:sameAs links 24 correct and 23 incorrect Assignment 3.62 participants per pair, per approach setting Output Judgment: correct or incorrect Gong Cheng (程龚) gcheng@nju.edu.cn 25 of 30
  • 26. Extrinsic evaluation --- results ws .nju.edu.cn Metrics Accuracy of the judgments 1.0 = consistent with the gold standard 0.0 = inconsistent Time spent Normalized by the average time per judgment spent by the participant 1.0 = medium efficiency Smaller value = higher efficiency Results Gong Cheng (程龚) gcheng@nju.edu.cn 26 of 30
  • 27. Discussion ws .nju.edu.cn Automatically computed summaries are still not as good as handcrafted ones. k=5 k=10 Agreement between ideal summaries 2.91 7.86 Agreement between computed summaries and ideal summaries 2.40 4.88 User-specific notion of informativeness Longitude and latitude are highly informative, but … Information redundancy Longitude + latitude = point What if multiple sources … Summarization = what + how (to present) Gong Cheng (程龚) gcheng@nju.edu.cn 27 of 30
  • 28. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 28 of 30
  • 29. Conclusions ws .nju.edu.cn Problem of entity summarization Extractive About identifying the entity that underlies a lengthy description The RELIN model Variant of the random surfer model Non-uniform probability distributions Informativeness + relatedness Implementation Based on linguistic and information theory concepts Using information captured by the labels of nodes and edges in the data graph Experiments Closer to handcrafted ideal summaries Assisting users in confirming entity mappings more accurately Gong Cheng (程龚) gcheng@nju.edu.cn 29 of 30
  • 30. Future work --- application-specific entity summarization ws .nju.edu.cn sameAs? Gong Cheng (程龚) gcheng@nju.edu.cn 30 of 30
  • 31. Related work --- summarization ws .nju.edu.cn Paradigm Approach Measure Model RELIN Extractive - Relatedness - Text Centrality-based PageRank-like - Informativeness - Ontology - Non-uniform probability distribution Others PageRank Non-extractive - Degree - Relatedness - Database Centroid-based - Betweenness - Uniform probability - Graph -… distribution Gong Cheng (程龚) gcheng@nju.edu.cn 31 of 30
  • 32. Related work --- ranking ws .nju.edu.cn Different goals --- to best identify the underlying entity B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE Internet Comput. 2005. R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010. T. Franz. TripleRank: Ranking Semantic Web Data By Tensor Decomposition. ISWC 2009. … Exploitation of data semantics at different levels --- use labels of nodes and edges T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009. X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007. … Gong Cheng (程龚) gcheng@nju.edu.cn 32 of 30