Graphinder Semantic Search
Relational Keyword Search over Data Graphs
Thanh Tran, Lei Zhang, Veli Bicer, Yongtao Ma
Researcher: www.sites.google.com/site/kimducthanh
Co-Founder: www.graphinder.com
Agenda
• Introduction
• Graphinder: Overview
• Keyword Query Translation
• Keyword Query Result Ranking
• Keyword Query Rewriting
– Suggesting correct and meaningful queries
– Auto-complete as user types
INTRODUCTION
Motivation: lots of structured data
Semantic Search: use information about entities and
relationships explicitly given in structured data to provide
relevant answers for complex questions asked using
intuitive interfaces
“singles written by freddie, who is
member of the band queen”
“single written by freddie queen”

MusicBrainz
Single

Artist
Queen

Person

Queen
Elizabeth 1

<x, type, Single>
<Freddie Mercury, writer, x>
<Freddie Mercury, type, Artist>
<Freddie Mercury, member, Queen>
<Queen, type, Band>

DBpedia
Freddie
Mercury

Brian
May
writer

Liar

1971

single

<x, type, Single>
<x, writtenBy, Freddy>

Links

<Freddy, same-as, Freddie Mercury>
Entity Semantic Search: find relevant entity, return
structured data summary, facts, related entities
Relational Semantic Search: find relevant entities
involved in a relationship, return entity summaries…
Semantic Search Problem: understand user inputs as
entities and relationships and find relevant answers

“single written by freddie queen”
“singles written by freddie, who is
member of the band queen”
Single

Artist

Queen

Freddie
Mercury

Brian
May
writer

Person

Queen
Elizabeth 1

Liar

1971

single

Query Translation: What are possible
connections (schema-level) between
recognized entities and relationships?
1)
<x, type, Single>
<Freddie Mercury, writer, x>
<Freddie Mercury, member, Queen>
2)
….
Query Answering: What are actual
connections (data-level) between
recognized entities and relationships?
1)
<Liar Liar, type, Single>
<Freddie Mercury, writer, Liar Liar>
<Freddie Mercury, member, Queen>
2)
…
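The difference between the two steps can be illustrated with a small sketch. It assumes a toy in-memory set of triples built from the example above (using the node label Liar from the diagram, plus an illustrative member edge for Brian May), variables written with a leading "?", and a hypothetical helper named match. It is not Graphinder's actual implementation; it only shows how a translated query (schema-level patterns) is evaluated against data-level connections.

```python
# Toy data graph as (subject, predicate, object) triples; illustrative only.
TRIPLES = {
    ("Liar", "type", "Single"),
    ("Freddie Mercury", "writer", "Liar"),
    ("Freddie Mercury", "type", "Artist"),
    ("Freddie Mercury", "member", "Queen"),
    ("Brian May", "member", "Queen"),
    ("Queen", "type", "Band"),
}


def is_var(term):
    """Variables are written with a leading question mark, e.g. ?x."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


# Translated query 1) from the slide, with x written as ?x.
QUERY = [
    ("?x", "type", "Single"),
    ("Freddie Mercury", "writer", "?x"),
    ("Freddie Mercury", "member", "Queen"),
]

answers = list(match(QUERY, TRIPLES))
print(answers)
```

Query translation produces the pattern list; query answering is the call to match, which binds ?x to concrete data elements.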
Relational Semantic Search at Facebook: recognizes entities and
relationships via LMs, uses manually specified template (grammar) to
find possible connections between them and computes answers via
resulting translated queries
“my friends, who is member of queen”
[start]
my friends, who is member of [id:Queen1]
friends(x,me), member(x,Queen1)
[user-head]
my friends
friends(x,me)

[user-filter]
who is member of [id:1]
member(x,Queen1)
[who]
who
-

[member-vp]
is member of [id:1]
member(x,Queen1)
[member-of-v]
is member of
member()

friends

member

{band}
[id:Queen1]
Queen1

queen

Grammar: set of production rules,
capturing all possible connections,
i.e. the search space of all parse trees
[start] → [users]
[users] → my friends
friends(x, me)
[…] → is member of [bands]
member(x, $1)
[bands] → {band}
$1
…
Grammar-based Query Translation:
which combination of production
rules results in a parse tree that
connects the recognized entities and
relationships?
OVERVIEW
Graphinder Semantic Search: a translation-based approach
for relational keyword search over data graphs

Single

Artist

Person

Queen

Queen Elizabeth 1

Freddie Mercury

Brian
May

Liar

1971

single

writer

Sem. Auto-completion

Query Translation
- Entity + Relationships
- Multi-source
- Domain-independent
- Low manual effort
Graphinder: selected publications
• On-demand, domain-independent, relational keyword search over data graphs
– Structure index for data graphs (TKDE13b)
– Top-k exploration of translation candidates (ICDE09)
– Index-based materialization of graphs (CIKM11a)
– Ranking results using structured relevance model (SRM) (CIKM11b)

• Multi-source
– Deduplication using inferred type information: TYPifier (ICDE13),
TYPimatch (WSDM13)
– On-the-fly deduplication using SRM (WWW11)
– Ranking with deduplication (ISWC13)
– Routing keyword queries to relevant data graphs (TKDE13a)
– Hermes: keyword search over heterogeneous data graphs
(SIGMOD09)

• Semantic auto-completion
– Computing valid query rewrites for given keywords (VLDB14)
QUERY TRANSLATION
0) Query Translation: constructing pseudo schema graph
representing all possible connections between data elements
• Structure index for data graph: nodes are groups of data elements that share the same structure pattern
• Parameters: structure pattern with edge labels L and paths of maximum length n
• Pseudo schema
– Node groups all instances that have the same set of properties
– Structure pattern: all properties, i.e. all outgoing paths with n = 1, L = all edge labels
• Algorithm:
– Start with one single partition/node representing all instances
– Split until all nodes are “stable”, i.e., all contained instances share the same structure pattern
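The partition-refinement idea can be sketched as follows for the pseudo-schema case (n = 1, L = all edge labels). The edge list is hypothetical toy data (including the placeholder node "Song B"), and the function name pseudo_schema is an assumption for illustration; the published structure index handles longer paths and is more involved.

```python
from collections import defaultdict

# Hypothetical toy data graph: (subject, edge label, object).
EDGES = [
    ("Freddie Mercury", "writer", "Liar"),
    ("Freddie Mercury", "member", "Queen"),
    ("Brian May", "writer", "Song B"),
    ("Brian May", "member", "Queen"),
    ("Queen Elizabeth 1", "marital status", "single"),
]


def pseudo_schema(edges):
    """Group nodes that share the same set of outgoing edge labels."""
    nodes = {s for s, _, _ in edges} | {o for _, _, o in edges}
    out_labels = defaultdict(set)
    for s, label, _ in edges:
        out_labels[s].add(label)

    # Start with one single partition holding all instances.
    partition = [frozenset(nodes)]
    # Split each block by every edge label until all blocks are stable.
    for label in sorted({label for _, label, _ in edges}):
        refined = []
        for block in partition:
            has = frozenset(n for n in block if label in out_labels[n])
            lacks = block - has
            refined.extend(b for b in (has, lacks) if b)
        partition = refined
    return partition


groups = pseudo_schema(EDGES)
for group in groups:
    print(sorted(group))
```

After one pass over all labels, every block contains only nodes with an identical set of outgoing labels, which is the "stable" condition for n = 1.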

Single

Artist
Queen

Freddie
Mercury

Brian
May

Person

Queen
Elizabeth 1

Liar

single

writer

member

Artist

producer
Thing12

writer
Single

marital status
Person

Value2
1) Query Translation: constructing search space
representing all possible interpretations of query keywords
“written by freddie queen single”
Freddie
Mercury

Queen
Elizabeth 1

Artist

Freddie
Mercury

producer

Band

Queen

Data
Index

single

writer

member

Queen

Single

Single

Schema
Index

marital status

writer

Keyword Interpretation: use inverted
index and LM-based ranking function to
return relevant schema and data
elements

Person

Literal

Queen
Elizabeth 1

single

Search Space Construction: augment
pseudo schema with query-specific
keyword matching elements
• All possible connections of predicates
applicable to recognized query
keywords
Top-k Subgraph Exploration
Result Retrieval & Ranking
2) Query Translation: score-directed algorithm for finding
top-k subgraphs connecting keyword matching elements
“written by freddie queen single”

member
Artist

Freddie
Mercury


producer
Band

Queen

marital status

writer
Single

Person

Literal

Queen
Elizabeth 1

single

<x, type, Single>
<Queen, producer, x>
<Freddie Mercury, writer, x>
<Queen, type, Band>
<Freddie Mercury, type, Artist>

• Algorithm: score-directed top-k Steiner graph search
• Start: explore all distinct paths starting from keyword elements
• Every iteration
– One-step expansion of the current path with the highest score
– When a connecting element is found, merge paths and add the resulting graph to the candidate list
• Top-k termination: lowest score in the candidate list > highest possible score that can be achieved with paths in the queues yet to be explored
• Termination: all paths of maximum length d have been explored
• Final step: mapping rules to translate the Steiner graph into a structured query
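A much simplified sketch of such an exploration is given below. It makes several assumptions that are not in the slides: scores are expressed as non-negative costs (lower cost means higher score), each keyword maps to exactly one element, edges are explored in both directions, and the edge costs are invented so that the example reproduces the subgraph shown above. The function name top_k_subgraphs is hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical search space: (source, label, target, cost).
EDGES = [
    ("Freddie Mercury", "type", "Artist", 1.0),
    ("Queen", "type", "Band", 1.0),
    ("Artist", "writer", "Single", 1.0),
    ("Band", "producer", "Single", 1.0),
    ("Artist", "member", "Band", 3.0),
    ("Queen Elizabeth 1", "type", "Person", 1.0),
    ("Person", "marital status", "single", 1.0),
]


def top_k_subgraphs(edges, keyword_elements, k=1, max_len=3):
    """Return up to k cheapest subgraphs connecting all keyword elements."""
    graph = defaultdict(list)
    for u, label, v, cost in edges:
        graph[u].append((v, (u, label, v), cost))
        graph[v].append((u, (u, label, v), cost))  # explore both directions

    # Queue entries: (cost, tie-breaker, keyword index, node path, edge set).
    queue, counter = [], 0
    for i, element in enumerate(keyword_elements):
        heapq.heappush(queue, (0.0, counter, i, (element,), frozenset()))
        counter += 1

    reached = defaultdict(dict)  # node -> {keyword index: (cost, path edges)}
    candidates = {}              # edge set of a connecting subgraph -> cost

    while queue:
        ranked = sorted(candidates.values())
        # Top-k termination: no unexplored path can beat the k-th candidate.
        if len(ranked) >= k and ranked[k - 1] <= queue[0][0]:
            break
        cost, _, i, path, path_edges = heapq.heappop(queue)
        node = path[-1]
        if i not in reached[node]:
            reached[node][i] = (cost, path_edges)
            if len(reached[node]) == len(keyword_elements):
                # Connecting element found: merge the paths into one graph.
                parts = reached[node].values()
                subgraph = frozenset().union(*(e for _, e in parts))
                total = sum(c for c, _ in parts)
                candidates[subgraph] = min(total, candidates.get(subgraph, float("inf")))
        if len(path) - 1 < max_len:  # one-step expansion, bounded by max_len
            for nxt, edge, step in graph[node]:
                if nxt not in path:
                    entry = (cost + step, counter, i, path + (nxt,), path_edges | {edge})
                    heapq.heappush(queue, entry)
                    counter += 1
    return sorted(candidates.items(), key=lambda item: item[1])[:k]


best = top_k_subgraphs(EDGES, ["Freddie Mercury", "Queen", "Single"], k=1)
for subgraph, cost in best:
    print(cost, sorted(subgraph))
```

With these costs, the cheapest connecting subgraph links Freddie Mercury and Queen to Single via writer and producer, which corresponds to the triple patterns listed on the slide once Single is replaced by the variable x.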
RESULT RANKING
Ranking Using Structured LMs: Keyword query is short and
ambiguous, while structured data provide rich structure
information: ranking based on LMs capturing both content and
structure

• Structured LMs for structured results r
• Structured LM for queries using structured pseudo-relevant feedback results FR (relevance model)
• Compute distance between query and result LMs

\[ RM_r(v) = P(v \mid r) \]

\[ RM_{F_R}(v) = P(v \mid F_R) \]

\[ Score(r) = \sum_{v \in V} RM_{F_R}(v) \, \log RM_r(v) \]
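A minimal sketch of this scoring, assuming plain unigram models with add-one smoothing over a shared vocabulary (the smoothing, the token lists and the function names are illustrative assumptions, not taken from the slides):

```python
import math
from collections import Counter


def unigram_lm(tokens, vocabulary, alpha=1.0):
    """Smoothed unigram model P(v | text) over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}


def score(query_model, result_model):
    """Negative cross-entropy: sum over v of RM_FR(v) * log RM_r(v)."""
    return sum(p * math.log(result_model[v]) for v, p in query_model.items())


# Hypothetical pseudo-relevant feedback text and two candidate results.
feedback = "freddie mercury queen brian may queen".split()
candidates = {
    "r1": "freddie mercury queen".split(),
    "r2": "queen elizabeth monarch".split(),
}

vocabulary = set(feedback).union(*candidates.values())
rm_fr = unigram_lm(feedback, vocabulary)
scores = {name: score(rm_fr, unigram_lm(tokens, vocabulary))
          for name, tokens in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A candidate whose term distribution is closer to the feedback-based query model receives a higher (less negative) score.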
Relevance Models
Query: freddie queen
[Figure: feedback documents F and candidate documents, both illustrated with the terms Mercury, Brian May, Protest, Raid, Clash, Bank, West]
• Term probabilities of the query model are based on the feedback documents
• Ranking behaves like a similarity search between pseudo-relevant feedback documents and corpus documents
Structured Relevance Models
Query: queen single
[Figure: structured data yields feedback results F and candidate results, both illustrated with the terms Mercury, Brian May, Protest, Raid, Clash, Bank, West]
• Term probabilities of the query model are based on pseudo-relevant structured data
• Ranking behaves like a similarity search between pseudo-relevant structured results and structured result candidates
Ranking: construct edge-specific query model for each unique e
from feedback resources FR, edge-specific model for every
candidate r, and finally, compute distance
Formula annotations: for all resources r in FR; prob. of observing term v in value of property e of resource r; importance of resource r w.r.t. query

v         RMname  RMcomment  RMx
Mercury   .091    .01        …
Brian     .082    .01        …
Champion  .081    .02        …
Protest   .001    .042       …
Raid      .006    .014       …
…         …       …          …

v         RMname  RMcomment  RMx
Mercury   .073    .01        …
Brian     .052    .01        …
…         …       …          …
QUERY REWRITING
Query Rewriting: find syntactically and semantically valid
rewrites to suggest as user types
single from freddy mercury que
Freddie
Mercury

Queen
Elizabeth 1

Queen

single

writer

Single

Data
Index
Schema
Index

Benefits:
- Higher selectivity of query terms (quality)
- Reduced number of query terms (efficiency)
- Better search experience…
Freddie
Mercury

Data
Index

Queen

writer

Single

Schema
Index

Challenges: many rewrite candidates, some are
semantically not “valid” in the relational setting
single (marital status) writer “freddie mercury” queen
(the queen of UK)

Token rewriting via syntactic distance
Keyword Interpretation:
- Imprecise / fuzzy matching
1) single from freddie mercury queen
- Match every keyword
…
Token rewriting via semantic distance
1) single writer freddie mercury queen
…

Query segmentation
1) single writer “freddie mercury” queen
…

Keyword / Key Phrase Interpretation:
- Precise matching
- Match keyword and key phrases
Search Space Construction
Search Space Construction
Result Retrieval & Ranking
Probabilistic Model for Query Rewriting: the rank of a
query rewrite (suggestion) S is based on the
probability of observing S in the data, given the query
Based on Bayes' Theorem

Probability
users write
spelling errors
/ semantically
related query
independent of
data D

single writer freddy mercury que

1) single writer freddie mercury queen
2) single writer freddrick mercury monarch
3) song writer freddrick mercury head of state

Constant
given query Q
and data D

Single

Artist

Person

Queen

Queen Elizabeth 1

Token Rewriting: S is
ranked high when prob
that query Q can be
observed in S is high

Query Segmentation: S is
ranked high when prob that
S can be observed in the
data D is high
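The formula itself did not survive in the transcript. Based on the annotations above (Bayes' theorem, the rewriting probability being independent of the data D, and the denominator being constant for a given Q and D), a plausible reconstruction of the ranking model is the following; treat it as an assumption rather than the authors' exact notation.

```latex
\[
P(S \mid Q, D)
  \;=\; \frac{P(Q \mid S, D)\, P(S \mid D)}{P(Q \mid D)}
  \;\propto\; P(Q \mid S)\, P(S \mid D)
\]
```

Here P(Q | S) is the token rewriting model and P(S | D) is the query segmentation model described in the two annotations.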

Freddie Mercury

Brian
May

Liar

writer

1971

single
Token Rewriting
• Modeling token rewriting P(Q|S)

Split: |
Concatenate: +

• Independence assumption

• Modeling syntactic and semantic differences

P(q|t): is high when q is
syntactically and
semantically close to t
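A rough sketch of the syntactic part of this model is shown below. It assumes a one-to-one alignment between query tokens and suggestion tokens (no split or concatenate), uses Levenshtein distance with an exponential decay as an unnormalized stand-in for P(q|t), and ignores semantic distance entirely. These choices, the decay parameter and the function names are assumptions for illustration.

```python
import math


def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def token_likelihood(q, t, decay=1.0):
    """Unnormalized P(q|t): high when q is syntactically close to t."""
    return math.exp(-decay * levenshtein(q, t))


def rewrite_likelihood(query_tokens, suggestion_tokens, decay=1.0):
    """Unnormalized P(Q|S) under the token independence assumption."""
    if len(query_tokens) != len(suggestion_tokens):
        raise ValueError("this sketch assumes a one-to-one token alignment")
    return math.prod(token_likelihood(q, t, decay)
                     for q, t in zip(query_tokens, suggestion_tokens))


query = "single writer freddy mercury que".split()
s1 = "single writer freddie mercury queen".split()
s2 = "single writer freddrick mercury monarch".split()
print(rewrite_likelihood(query, s1), rewrite_likelihood(query, s2))
```

Under these assumptions the first suggestion scores higher than the second, since "freddie" and "queen" are closer to the typed tokens than "freddrick" and "monarch".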

single writer freddy mercury que
1) single writer “freddie mercury” queen
2) single writer “freddrick mercury” monarch
3) single writer “freddrick mercury” head of state

single | writer | freddie + mercury | queen
Query Segmentation
• Modeling query segmentation P(S|D)
single writer freddie mercury que

α = concatenate?
α = split?
where PD(αiti+1|t1α1t2…αi-1ti)
stands for P(αiti+1|t1α1t2…αi-1ti,D).

Single

Artist

single writer freddie

Queen Elizabeth 1

Freddie
Mercury

Brian
May

Liar

writer

• Nth order Markov assumption

Person

Queen

1971

single
Estimating Probability of Segmentation
• Maximum likelihood estimation (MLE)

where C(ti…tj) denotes the count of occurrences of the token sequence ti…tj

Segmentation in structured data setting
• Concatenate two segments si and sj when they co-occur in the data
• Split when si and sj are connected (si ↭ sj), i.e., when the two data
elements ni and nj mentioning si and sj are connected in the data
single writer freddie mercury queen

Single

Artist

α = concatenate?
α = split?

single writer freddie

Person

Queen

Freddie
Mercury

Brian
May
writer

Queen
Elizabeth 1

Liar

1971

single
Estimating Probability of Segmentation Case 1: previous
segment si has length equal or more than context N
• Two cases: (1) l(si) ≥ N; (2) l(si) < N
• (1) When the previously induced segment si has length equal or
more than N, i.e. l(si) ≥ N, it suffices to focus on si (N) to predict
the next action αi on ti+1
freddie j. mercury

queen

freddie j. mercury

queen

• Estimation of probability

where C(st) denotes the count of co-occurrences of the sequence st in D and
C(s ↭ t) is the count of all occurrences of token t connected to segment s
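The estimation formula is missing from the transcript. The sketch below assumes one natural reading of the two counts: the probability of concatenating is C(st) / (C(st) + C(s ↭ t)) and the probability of splitting is the complement. This normalization, the toy node labels and the function names are assumptions, not the authors' stated estimator.

```python
def contains(label, segment):
    """True if the token list `segment` occurs contiguously in `label`."""
    tokens = label.lower().split()
    n = len(segment)
    return any(tokens[i:i + n] == segment for i in range(len(tokens) - n + 1))


def action_probabilities(segment, token, labels, edges):
    """Estimate P(concatenate) and P(split) for `token` after `segment`."""
    joined = segment + [token]
    # C(st): data elements whose label mentions the sequence s t.
    c_concat = sum(contains(text, joined) for text in labels.values())
    # C(s <-> t): connected element pairs mentioning s and t respectively.
    c_split = sum(
        (contains(labels[a], segment) and contains(labels[b], [token]))
        or (contains(labels[b], segment) and contains(labels[a], [token]))
        for a, b in edges
    )
    total = c_concat + c_split
    if total == 0:
        return {"concatenate": 0.0, "split": 0.0}
    return {"concatenate": c_concat / total, "split": c_split / total}


# Hypothetical toy data: node labels and undirected connections.
LABELS = {"n1": "Freddie Mercury", "n2": "Queen",
          "n3": "Queen Elizabeth 1", "n4": "Liar"}
CONNECTIONS = [("n1", "n2"), ("n1", "n4")]

p_name = action_probabilities(["freddie"], "mercury", LABELS, CONNECTIONS)
p_band = action_probabilities(["freddie", "mercury"], "queen", LABELS, CONNECTIONS)
print(p_name, p_band)
```

In this toy graph "freddie" and "mercury" co-occur in one label, so concatenation is preferred, while "queen" only appears in an element connected to "Freddie Mercury", so a split is preferred.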
Estimating Probability of Segmentation Case 2: previous
segment si has length less than context N
• (2) When the previous segment si has length less than N, i.e. l(si) <
N, the action αi on the next token ti+1 depends on si and Pi(N), the
set of segments that precede si that together with si, contains at
most N tokens in total, i.e.,
single

writer

freddie

mercury single

writer

freddie

mercury

• Estimation of probability

where C(P ↭ s) denotes the count of all occurrences of the segment s
connected to all segments in P
EXPERIMENTAL RESULTS &
CONCLUSIONS
• Graphinder, a relational keyword search approach for suggesting query completions, translating queries and ranking results
• Keyword translation performance
– Query translation and index-based approaches are at least one order of magnitude faster than online in-memory search (bidirectional)
– Query translation is comparable with index-based approaches, but requires less space
• Keyword translation result quality
– According to a recent benchmark, our ranking consistently outperforms all existing ranking systems in precision, recall and MAP (10% - 30% improvement)
• Effect of query rewriting
– Better user experience
– Improves efficiency by reducing number of query terms
– Improves quality / selectivity of query terms
– …depends on complexity of queries and underlying keyword search engine
• Tight integration of query suggestion and translation
• From research prototypes to Graphinder, a powerful, flexible, low upfront-cost semantic search system
Thanks!

Tran Duc Thanh
tran.du.th@gmail.com
http://sites.google.com/site/kimducthanh/
References (1)
– [VLDB14] Yongtao Ma, Thanh Tran
Probabilistic Query Rewriting for Efficient and Effective Keyword Search on
Graph Data
In International Conference on Very Large Data Bases (VLDB'14). Hangzhou,
China, September, 2014
– [ISWC13] Daniel Herzig, Roi Blanco, Peter Mika and Thanh Tran
Federated Entity Search Using On-the-Fly Consolidation
In International Semantic Web Conference (ISWC'13). Sydney, Australia, October,
2013
– [ICDE13] Yongtao Ma, Thanh Tran
TYPifier: Inferring the Type Semantics of Structured Data
In International Conference on Data Engineering (ICDE'13). Brisbane, Australia, April,
2013
– [WSDM13] Yongtao Ma, Thanh Tran
TYPiMatch: Type-specific Unsupervised Learning of Keys and Key Values for
Heterogeneous Web Data Integration
In International Conference on Web Search and Data Mining (WSDM'13). Rome,
Italy, February, 2013
– [TKDE12a] Thanh Tran, Günter Ladwig, Sebastian Rudolph
Managing Structured and Semi-structured RDF Data Using Structure Indexes
In Transactions on Knowledge and Data Engineering journal.
– [TKDE12b] Thanh Tran, Lei Zhang
Keyword Query Routing
In Transactions on Knowledge and Data Engineering journal.
References (2)
– [WWW12] Daniel Herzig, Thanh Tran
Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration
In Proceedings of 21st International World Wide Web Conference (WWW'12). Lyon,
France, April, 2012
– [CIKM11a] Günter Ladwig, Thanh Tran
Index Structures and Top-k Join Algorithms for Native Keyword Search Databases
In Proceedings of 20th ACM Conference on Information and Knowledge
Management (CIKM'11). Glasgow, UK, October, 2011
– [CIKM11b] Veli Bicer, Thanh Tran
Ranking Support for Keyword Search on Structured Data using Relevance Models
In Proceedings of 20th ACM Conference on Information and Knowledge
Management (CIKM'11). Glasgow, UK, October, 2011
– [SIGIR11] Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey
Pound, Henry S. Thompson, Thanh Tran Duc
Repeatable and Reliable Search System Evaluation using Crowdsourcing
In Proceedings of 34th Annual International ACM SIGIR Conference (SIGIR'11),
Beijing, China, July, 2011
– [ICDE09] Duc Thanh Tran, Haofen Wang, Sebastian Rudolph, Philipp Cimiano
Top-k Exploration of Query Graph Candidates for Efficient Keyword Search on RDF
In Proceedings of the 25th International Conference on Data Engineering (ICDE'09).
Shanghai, China, March 2009
– [SIGMOD09] Haofen Wang, Thomas Penin, Kaifeng Xu, Junquan Chen, Xinruo Sun,
Linyun Fu, Yong Yu, Thanh Tran, Peter Haase, Rudi Studer
Hermes: A Travel through Semantics in the Data Web
In Proceedings of SIGMOD Conference 2009. Providence, USA, June-July, 2009
BACKUP

More Related Content

Similar to Graphinder Semantic Search Over Data Graphs

Big data search
Big data search Big data search
Big data search Thanh Tran
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic DataGong Cheng
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczIoan Toma
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...Holistic Benchmarking of Big Linked Data
 
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...Abhimanyu Lad
 
Data mining and warehouse by dr D. R. Patil sir
Data mining and warehouse by dr D. R. Patil sirData mining and warehouse by dr D. R. Patil sir
Data mining and warehouse by dr D. R. Patil sirchaudharipruthvirajr
 
Effective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataEffective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataRoi Blanco
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph GeneratorLDBC council
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Margaret Wang
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetupJoshua Bae
 
Domain Identification for Linked Open Data
Domain Identification for Linked Open DataDomain Identification for Linked Open Data
Domain Identification for Linked Open DataSarasi Sarangi
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jMax De Marzi
 
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...Università degli Studi di Milano-Bicocca
 
DBtrends Semantics 2016
DBtrends Semantics 2016DBtrends Semantics 2016
DBtrends Semantics 2016Edgard Marx
 

Similar to Graphinder Semantic Search Over Data Graphs (20)

Big data search
Big data search Big data search
Big data search
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic Data
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
 
Stack_Overflow-Network_Graph
Stack_Overflow-Network_GraphStack_Overflow-Network_Graph
Stack_Overflow-Network_Graph
 
Social (1)
Social (1)Social (1)
Social (1)
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
 
Data mining and warehouse by dr D. R. Patil sir
Data mining and warehouse by dr D. R. Patil sirData mining and warehouse by dr D. R. Patil sir
Data mining and warehouse by dr D. R. Patil sir
 
BoTLRet: A Template-based Linked Data Information Retrieval
 BoTLRet: A Template-based Linked Data Information Retrieval BoTLRet: A Template-based Linked Data Information Retrieval
BoTLRet: A Template-based Linked Data Information Retrieval
 
Web mining
Web miningWeb mining
Web mining
 
Effective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataEffective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF data
 
B 4 gravty
B 4 gravtyB 4 gravty
B 4 gravty
 
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014:  Social Network Benchmark (SNB) Graph GeneratorFOSDEM 2014:  Social Network Benchmark (SNB) Graph Generator
FOSDEM 2014: Social Network Benchmark (SNB) Graph Generator
 
Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461Data.Mining.C.8(Ii).Web Mining 570802461
Data.Mining.C.8(Ii).Web Mining 570802461
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
Domain Identification for Linked Open Data
Domain Identification for Linked Open DataDomain Identification for Linked Open Data
Domain Identification for Linked Open Data
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4j
 
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
DaCENA Personalized Exploration of Knowledge Graphs Within a Context. Seminar...
 
DBtrends Semantics 2016
DBtrends Semantics 2016DBtrends Semantics 2016
DBtrends Semantics 2016
 

Recently uploaded

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Benefits Of Flutter Compared To Other Frameworks

Graphinder Semantic Search Over Data Graphs

  • 1. Graphinder Semantic Search Relational Keyword Search over Data Graphs Thanh Tran, Lei Zhang, Veli Bicer, Yongtao Ma Researcher: www.sites.google.com/site/kimducthanh Co-Founder: www.graphinder.com
  • 2. Agenda • Introduction • Graphinder: Overview • Keyword Query Translation • Keyword Query Result Ranking • Keyword Query Rewriting – Suggesting correct and meaningful queries – Auto-complete as user types
  • 4. Motivation: lots of structured data
  • 5. Semantic Search: use information about entities and relationships explicitly given in structured data to provide relevant answers for complex questions asked using intuitive interfaces. Example queries: “singles written by freddie, who is member of the band queen”; “single written by freddie queen”. MusicBrainz: <x, type, Single> <Freddie Mercury, writer, x> <Freddie Mercury, type, Artist> <Freddie Mercury, member, Queen> <Queen, type, Band>. DBpedia: <x, type, Single> <x, writtenBy, Freddy>. Links: <Freddy, same-as, Freddie Mercury>. Diagram labels: Single, Artist, Queen, Person, Queen Elizabeth 1, Freddie Mercury, Brian May, writer, Liar, 1971, single
  • 6. Entity Semantic Search: find relevant entity, return structured data summary, facts, related entities
  • 7. Relational Semantic Search: find relevant entities involved in a relationship, return entity summaries…
  • 8. Semantic Search Problem: understand user inputs as entities and relationships and find relevant answers “single written by freddie queen” “singles written by freddie, who is member of the band queen” Single Artist Queen Freddie Mercury Brian May writer Person Queen Elizabeth 1 Liar 1971 single Query Translation: What are possible connections (schema-level) between recognized entities and relationships? 1) <x, type, Single> <Freddie Mercury, writer, x> <Freddie Mercury, member, Queen> 2) …. Query Answering: What are actual connections (data-level) between recognized entities and relationships? 1) <Liar Liar, type, Single> <Freddie Mercury, writer, Liar Liar> <Freddie Mercury, member, Queen> 2) …
  • 9. Relational Semantic Search at Facebook: recognizes entities and relationships via LMs, uses manually specified template (grammar) to find possible connections between them and computes answers via resulting translated queries. Example query: “my friends, who is member of queen”. Parse tree (node, matched text: semantics): [start] my friends, who is member of [id:Queen1]: friends(x,me), member(x,Queen1); [user-head] my friends: friends(x,me); [user-filter] who is member of [id:1]: member(x,Queen1); [who] who: -; [member-vp] is member of [id:1]: member(x,Queen1); [member-of-v] is member of: member(); {band} [id:Queen1]: Queen1. Recognized terms: friends, member, queen. Grammar: set of production rules, capturing all possible connections, i.e. the search space of all parse trees: [start] → [users]; [users] → my friends, friends(x, me); […] → is member of [bands], member(x, $1); [bands] → {band}, $1; … Grammar-based Query Translation: which combination of production rules results in a parse tree that connects the recognized entities and relationships?
  • 11. Graphinder Semantic Search: a translation-based approach for relational keyword search over data graphs Single Artist Person Queen Queen Elizabeth 1 Freddie Mercury Brian May Liar 1971 single writer Sem. Auto-completion Query Translation - Entity + Relationships - Multi-source - Domain-independent - Low manual effort
  • 12. Graphinder: selected publications • On-demand, domain-independent, relational keyword search over data graphs – Structure index for data graphs (TKDE13b) – Top-k exploration of translation candidates (ICDE09) – Index-based materialization of graphs (CIKM11a) – Ranking results using structured relevance model (SRM) (CIKM11b) • Multi-source – Deduplication using inferred type information: TYPifier (ICDE13), TYPimatch (WSDM13) – On-the-fly deduplication using SRM (WWW11) – Ranking with deduplication (ISWC13) – Routing keyword queries to relevant data graphs (TKDE13a) – Hermes: keyword search over heterogeneous data graphs (SIGMOD09) • Semantic auto-completion – Computing valid query rewrites for given keywords (VLDB14)
  • 14. 0) Query Translation: constructing pseudo schema graph representing all possible connections between data elements • Structure index for data graph: nodes are groups of data elements that share the same structure pattern • Parameters: structure pattern with edge labels L and paths of maximum length n • Pseudo schema – Node groups all instances that have the same set of properties – Structure pattern: all properties, i.e. all outgoing paths with n = 1, L = all edge labels • Algorithm: – Start with one single partition/node representing all instances – Split until all nodes are “stable”, i.e., all contained instances share the same structure pattern. Diagram labels: Single, Artist, Queen, Freddie Mercury, Brian May, Person, Queen Elizabeth 1, Liar, single, writer, member, Artist, producer, Thing12, writer, Single, marital status, Person, Value2
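
The pseudo schema construction on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the structure index from the cited work: it covers only the n = 1 case, where the structure pattern is the set of outgoing edge labels, and it groups instances directly by that set, which for n = 1 should give the same partition as repeated splitting. The function name `build_pseudo_schema` and the toy triples are hypothetical.

```python
from collections import defaultdict

def build_pseudo_schema(triples):
    """Group instances by their set of outgoing edge labels (n = 1, L = all labels)."""
    pattern = defaultdict(set)
    for s, p, o in triples:
        pattern[s].add(p)
        pattern[o]  # touch the key so objects without outgoing edges get the empty pattern
    # One pseudo-schema node per distinct structure pattern.
    groups = defaultdict(set)
    for inst, labels in pattern.items():
        groups[frozenset(labels)].add(inst)
    # Lift every data edge to an edge between pseudo-schema nodes.
    node_of = {inst: frozenset(labels) for inst, labels in pattern.items()}
    edges = {(node_of[s], p, node_of[o]) for s, p, o in triples}
    return dict(groups), edges

# Toy data loosely modelled on the slide's example graph (invented for illustration).
triples = [
    ("Freddie Mercury", "writer", "Liar"),
    ("Freddie Mercury", "member", "Queen"),
    ("Brian May", "writer", "Liar"),
    ("Brian May", "member", "Queen"),
    ("Queen", "producer", "Liar"),
    ("Queen Elizabeth 1", "marital status", "single"),
]
groups, schema_edges = build_pseudo_schema(triples)
```

Here the two artists end up in one node because they share the pattern {writer, member}, while "Liar" and "single" share the empty pattern.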
  • 15. 1) Query Translation: constructing search space representing all possible interpretations of query keywords. Example query: “written by freddie queen single”. Keyword Interpretation: use inverted index and LM-based ranking function to return relevant schema and data elements. Search Space Construction: augment pseudo schema with query-specific keyword matching elements • All possible connections of predicates applicable to recognized query keywords. Diagram labels: Data Index, Schema Index, Freddie Mercury, Queen Elizabeth 1, Artist, producer, Band, Queen, single, writer, member, Single, marital status, Person, Literal, Top-k Subgraph Exploration, Result Retrieval & Ranking
  • 16. 2) Query Translation: score-directed algorithm for finding top-k subgraphs connecting keyword matching elements. Example query: “written by freddie queen single”. Example translation: <x, type, Single> <Queen, producer, x> <Freddie Mercury, writer, x> <Queen, type, Band> <Freddie Mercury, type, Artist> • Algorithm: score-directed top-k Steiner graph search • Start: explore all distinct paths starting from keyword elements • Every iteration: one-step expansion of the current path with the highest score; when a connecting element is found, merge paths and add the resulting graph to the candidate list • Top-k termination: lowest score of the candidate list > highest possible score that can be achieved with paths in the queues yet to be explored • Termination: all paths of maximum length d have been explored • Final step: mapping rules to translate Steiner graph to structured query. Diagram labels: member, Artist, Freddie Mercury, producer, Band, Queen, marital status, writer, Single, Person, Literal, Queen Elizabeth 1, single
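
The exploration loop on this slide can be illustrated with a simplified sketch. It is not the published top-k algorithm: it uses additive edge costs (lower cost standing in for higher score), expands the cheapest path first, merges paths whenever an element has been reached from every keyword, and simply runs until all paths of length at most d are explored instead of applying the top-k termination bound. The names `top_k_subgraphs` and `undirected`, and the edge costs, are assumptions made for illustration.

```python
import heapq
from itertools import product

def top_k_subgraphs(adj, keyword_elements, k=3, d=3):
    """Return up to k (edge set, cost) pairs, cheapest first.

    adj: node -> list of (neighbor, edge cost).
    keyword_elements: one set of matching graph elements per keyword.
    """
    m = len(keyword_elements)
    heap, tie = [], 0
    for i, elems in enumerate(keyword_elements):
        for n in sorted(elems):
            heap.append((0.0, tie, i, (n,)))  # (cost, tie-breaker, keyword index, path)
            tie += 1
    heapq.heapify(heap)
    reached = {}     # node -> {keyword index: [(cost, path), ...]}
    candidates = {}  # edge set -> cheapest cost seen for that subgraph
    while heap:
        cost, _, i, path = heapq.heappop(heap)
        node = path[-1]
        by_kw = reached.setdefault(node, {})
        by_kw.setdefault(i, []).append((cost, path))
        if len(by_kw) == m:
            # Connecting element: merge this path with paths from all other keywords.
            others = [by_kw[j] for j in range(m) if j != i]
            for combo in product(*others):
                paths = [path] + [p for _, p in combo]
                edges = frozenset(frozenset(e) for p in paths for e in zip(p, p[1:]))
                total = cost + sum(c for c, _ in combo)
                if total < candidates.get(edges, float("inf")):
                    candidates[edges] = total
        if len(path) - 1 < d:
            # One-step expansion of the current path, avoiding cycles.
            for nb, c in adj.get(node, []):
                if nb not in path:
                    heapq.heappush(heap, (cost + c, tie, i, path + (nb,)))
                    tie += 1
    return sorted(candidates.items(), key=lambda kv: kv[1])[:k]

def undirected(edges):
    """Build a symmetric adjacency map from (a, b, cost) triples."""
    adj = {}
    for a, b, c in edges:
        adj.setdefault(a, []).append((b, c))
        adj.setdefault(b, []).append((a, c))
    return adj

# Toy search space with invented costs: writer and member edges are cheaper than producer.
adj = undirected([
    ("Freddie Mercury", "Single", 1.0),
    ("Freddie Mercury", "Queen", 1.0),
    ("Queen", "Single", 2.0),
])
keywords = [{"Freddie Mercury"}, {"Queen"}, {"Single"}]
best_edges, best_cost = top_k_subgraphs(adj, keywords, k=1)[0]
```

In this toy graph the cheapest connecting subgraph joins all three keyword elements through "Freddie Mercury". A real implementation would add the score bound from the slide so that exploration can stop before every path is expanded.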
  • 18. Ranking Using Structured LMs: Keyword query is short and ambiguous, while structured data provide rich structure information: ranking based on LMs capturing both content and structure • Structured LMs for structured results r: $RM_r(v) = P(v \mid r)$ • Structured LM for queries using structured pseudo-relevant feedback results $F_R$ (relevance model): $RM_{F_R}(v) = P(v \mid F_R)$ • Compute distance between query and result LMs: $Score(r) = \sum_{v \in V} RM_{F_R}(v) \log RM_r(v)$
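
As a small illustration of the scoring step, the sketch below reads the slide's score as the sum over vocabulary terms v of RM_FR(v) · log RM_r(v) and evaluates it on toy term distributions. The probability floor `eps` for terms missing from a result model is an assumption added so the logarithm is defined; the function name and the numbers are invented.

```python
import math

def score(query_model, result_model, eps=1e-9):
    """Sum of RM_FR(v) * log RM_r(v) over the query model's terms.

    Values closer to zero mean the result model is closer to the query model.
    eps is an assumed floor for terms the result model has never seen.
    """
    return sum(p * math.log(result_model.get(v, eps)) for v, p in query_model.items())

# Toy models (invented numbers).
query_rm = {"mercury": 0.6, "brian": 0.4}
result_a = {"mercury": 0.5, "brian": 0.5}
result_b = {"mercury": 0.1, "protest": 0.9}

ranked = sorted(
    [("a", result_a), ("b", result_b)],
    key=lambda item: score(query_rm, item[1]),
    reverse=True,
)
```

Result `a` ranks first because it assigns probability mass to both query-model terms.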
  • 19. Relevance Models • Term probabilities of the query model are based on documents • Ranking behaves like similarity search between pseudo-relevant feedback documents and corpus documents. Diagram labels: Query (freddie queen), F Documents, Candidate Documents; terms shown for both document sets: Mercury, Brian, May, Protest, Raid, Clash, Bank, West
  • 20. Structured Relevance Models • Term probabilities of the query model are based on pseudo-relevant structured data • Ranking behaves like similarity search between pseudo-relevant structured results and structured result candidates. Diagram labels: Structured Data, Query (queen single), F Results, Candidate Results; terms shown for both result sets: Mercury, Brian, May, Protest, Raid, Clash, Bank, West
  • 21. Ranking: construct edge-specific query model for each unique e from feedback resources FR, edge-specific model for every candidate r, and finally, compute distance. Formula annotations: for all resources r in FR; prob of observing term v in value of property e of resource r; importance of resource r w.r.t. query.
      First table:
      v        | RMname | RMcomment | RMx
      Mercury  | .091   | .01       | …
      Brian    | .082   | .01       | …
      Champion | .081   | .02       | …
      Protest  | .001   | .042      | …
      Raid     | .006   | .014      | …
      …        | …      | …         | …
      Second table:
      v        | RMname | RMcomment | RMx
      Mercury  | .073   | .01       | …
      Brian    | .052   | .01       | …
      …        | …      | …         | …
  • 23. Query Rewriting: find syntactically and semantically valid rewrites to suggest as user types. Example input: “single from freddy mercury que”. Benefits: - Higher selectivity of query terms (quality) - Reduced number of query terms (efficiency) - Better search experience… Challenges: many rewrite candidates, some are not semantically “valid” in the relational setting, e.g. single (marital status) writer “freddie mercury” queen (the queen of UK). Token rewriting via syntactic distance: 1) single from freddie mercury queen … Token rewriting via semantic distance: 1) single writer freddie mercury queen … Query segmentation: 1) single writer “freddie mercury” queen … Keyword Interpretation: - Imprecise / fuzzy matching - Match every keyword. Keyword / Key Phrase Interpretation: - Precise matching - Match keyword and key phrases. Diagram labels: Data Index, Schema Index, Freddie Mercury, Queen Elizabeth 1, Queen, single, writer, Single, Search Space Construction, Result Retrieval & Ranking
  • 24. Probabilistic Model for Query Rewriting: the rank of a query rewrite (suggestion) S is based on the probability of observing S in the data, given the query. Based on Bayes' Theorem. Formula annotations: the probability that users write spelling errors / semantically related queries is independent of data D; one term is constant given query Q and data D. Example query: single writer freddy mercury que. Suggestions: 1) single writer freddie mercury queen 2) single writer freddrick mercury monarch 3) song writer freddrick mercury head of state. Token Rewriting: S is ranked high when prob that query Q can be observed in S is high. Query Segmentation: S is ranked high when prob that S can be observed in the data D is high. Diagram labels: Single, Artist, Person, Queen, Queen Elizabeth 1, Freddie Mercury, Brian May, Liar, writer, 1971, single
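
The slide's formula did not survive extraction. The following is a reconstruction from the annotations on the slide (Bayes' theorem, the query being independent of D given S, and a term that is constant for a given Q and D) and from the P(Q|S) and P(S|D) factors named on the next two slides. It should be read as a plausible reading rather than the authors' exact notation.

```latex
P(S \mid Q, D)
  = \frac{P(Q \mid S, D)\, P(S \mid D)}{P(Q \mid D)}
  \;\propto\; \underbrace{P(Q \mid S)}_{\text{token rewriting}}
  \;\cdot\; \underbrace{P(S \mid D)}_{\text{query segmentation}}
```

The first step is Bayes' theorem; the second drops the denominator P(Q|D), which does not depend on S, and replaces P(Q|S,D) with P(Q|S) under the stated independence from D.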
  • 25. Token Rewriting • Modeling token rewriting P(Q|S) Split: | Concatenate: + • Independence assumption • Modeling syntactic and semantic differences P(q|t): is high when q is syntactically and semantically close to t single writer freddy mercury que 1) single writer “freddie mercury” queen 2) single writer “freddrick mercury” monarch 3) single writer “freddrick mercury” head of state single | writer | freddie + mercury | queen
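
Under the independence assumption, P(Q|S) factorizes into per-token terms P(q|t). The sketch below illustrates only that factorization. As a stand-in for P(q|t) it uses the string similarity ratio from Python's `difflib`, which is an uncalibrated syntactic score and not a probability, and it ignores the semantic distance the slide also mentions. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from math import prod

def token_score(q, t):
    """Stand-in for P(q|t): syntactic similarity in [0, 1] (an assumption, not a probability)."""
    return SequenceMatcher(None, q, t).ratio()

def rewrite_score(query_tokens, suggestion_tokens):
    """Product of per-token scores, mirroring the independence assumption."""
    if len(query_tokens) != len(suggestion_tokens):
        raise ValueError("this sketch only aligns token lists of equal length")
    return prod(token_score(q, t) for q, t in zip(query_tokens, suggestion_tokens))

query = ["single", "writer", "freddy", "mercury", "que"]
suggestion_1 = ["single", "writer", "freddie", "mercury", "queen"]
suggestion_2 = ["single", "writer", "freddrick", "mercury", "monarch"]

scores = {
    "suggestion_1": rewrite_score(query, suggestion_1),
    "suggestion_2": rewrite_score(query, suggestion_2),
}
```

With a purely syntactic score, "monarch" shares no characters with "que" and the second suggestion scores zero; a semantic component would be needed for it to rank at all.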
  • 26. Query Segmentation • Modeling query segmentation P(S|D), where $P_D(\alpha_i t_{i+1} \mid t_1 \alpha_1 t_2 \dots \alpha_{i-1} t_i)$ stands for $P(\alpha_i t_{i+1} \mid t_1 \alpha_1 t_2 \dots \alpha_{i-1} t_i, D)$ • Nth order Markov assumption. Example: “single writer freddie mercury que”; after the prefix “single writer freddie”: α = concatenate? α = split? Diagram labels: Single, Artist, Person, Queen, Queen Elizabeth 1, Freddie Mercury, Brian May, Liar, writer, 1971, single
  • 27. Estimating Probability of Segmentation • Maximum likelihood estimation (MLE), where C(ti…tj) denotes the count of occurrences of the token sequence ti…tj • Segmentation in structured data setting – Concatenate two segments si and sj when they co-occur in the data – Split when si and sj are connected (si ↭ sj), i.e., when the two data elements ni and nj mentioning si and sj are connected in the data. Example: “single writer freddie mercury queen”; after the prefix “single writer freddie”: α = concatenate? α = split? Diagram labels: Single, Artist, Person, Queen, Freddie Mercury, Brian May, writer, Queen Elizabeth 1, Liar, 1971, single
  • 28. Estimating Probability of Segmentation Case 1: previous segment si has length equal to or greater than context N • Two cases: (1) l(si) ≥ N; (2) l(si) < N • (1) When the previously induced segment si has length equal to or greater than N, i.e. l(si) ≥ N, it suffices to focus on si (N) to predict the next action αi on ti+1. Example: freddie j. mercury queen • Estimation of probability, where C(st) denotes the count of co-occurrences of the sequence st in D and C(s ↭ t) is the count of all occurrences of token t connected to segment s
  • 29. Estimating Probability of Segmentation Case 2: previous segment si has length less than context N • (2) When the previous segment si has length less than N, i.e. l(si) < N, the action αi on the next token ti+1 depends on si and Pi(N), the set of segments preceding si that, together with si, contain at most N tokens in total. Example: single writer freddie mercury • Estimation of probability, where C(P ↭ s) denotes the count of all occurrences of the segment s connected to all segments in P
  • 31. Graphinder, a relational keyword search approach for suggesting query completions, translating queries and ranking results • Keyword translation performance – Query translation and index-based approaches at least one order of magnitude faster than online in-memory search (bidirectional) – Query translation comparable with index-based approaches, but less space • Keyword translation result quality – According to recent benchmark, our ranking consistently outperforms all existing ranking systems in precision, recall and MAP (10% - 30% improvement) • Effect of query rewriting – Better user experience – Improves efficiency by reducing number of query terms – Improves quality / selectivity of query terms – …depends on complexity of queries and underlying keyword search engine • Tight integration of query suggestion and translation • From research prototypes to Graphinder, a powerful, flexible, low upfront-cost semantic search system
  • 33. References (1) – [VLDB14] Yongtao Ma, Thanh Tran Probabilistic Query Rewriting for Efficient and Effective Keyword Search on Graph Data In International Conference on Very Large Data Bases (VLDB'14). Hangzhou, China, September, 2014 – [ISWC13] Daniel Herzig, Roi Blanco, Peter Mika and Thanh Tran Federated Entity Search Using On-the-Fly Consolidation In International Semantic Web Conference (ISWC'13). Sydney, Australia, October, 2013 – [ICDE13] Yongtao Ma, Thanh Tran TYPifier: Inferring the Type Semantics of Structured Data In International Conference on Data Engineering (ICDE'13). Brisbane, Australia, April, 2013 – [WSDM13] Yongtao Ma, Thanh Tran TYPiMatch: Type-specific Unsupervised Learning of Keys and Key Values for Heterogeneous Web Data Integration In International Conference on Web Search and Data Mining (WSDM'13). Rome, Italy, February, 2013 – [TKDE12a] Thanh Tran, Günter Ladwig, Sebastian Rudolph Managing Structured and Semi-structured RDF Data Using Structure Indexes In Transactions on Knowledge and Data Engineering journal. – [TKDE12b] Thanh Tran, Lei Zhang Keyword Query Routing In Transactions on Knowledge and Data Engineering journal.
  • 34. References (2) – [WWW12] Daniel Herzig, Thanh Tran Heterogeneous Web Data Search Using Relevance-based On The Fly Data Integration In Proceedings of 21st International World Wide Web Conference (WWW'12). Lyon, France, April, 2012 – [CIKM11a] Günter Ladwig, Thanh Tran Index Structures and Top-k Join Algorithms for Native Keyword Search Databases In Proceedings of 20th ACM Conference on Information and Knowledge Management (CIKM'11). Glasgow, UK, October, 2011 – [CIKM11b] Veli Bicer, Thanh Tran Ranking Support for Keyword Search on Structured Data using Relevance Models In Proceedings of 20th ACM Conference on Information and Knowledge Management (CIKM'11). Glasgow, UK, October, 2011 – [SIGIR11] Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh Tran Duc Repeatable and Reliable Search System Evaluation using Crowdsourcing In Proceedings of 34th Annual International ACM SIGIR Conference (SIGIR'11), Beijing, China, July, 2011 – [ICDE09] Duc Thanh Tran, Haofen Wang, Sebastian Rudolph, Philipp Cimiano Top-k Exploration of Query Graph Candidates for Efficient Keyword Search on RDF In Proceedings of the 25th International Conference on Data Engineering (ICDE'09). Shanghai, China, March 2009 – [SIGMOD09] Haofen Wang, Thomas Penin, Kaifeng Xu, Junquan Chen, Xinruo Sun, Linyun Fu, Yong Yu, Thanh Tran, Peter Haase, Rudi Studer Hermes: A Travel through Semantics in the Data Web In Proceedings of SIGMOD Conference 2009. Providence, USA, June-July, 2009

Editor's Notes

  1. Construct the query model from structured data elements that are close to the query. Index resources in the data graph, where resources are treated as documents and attributes and attribute values are indexed as document terms. Use a standard inverted index implementation and IR search engine to retrieve resources for a given keyword query. An initial run of the query yields the F results.
  2. Query model: the probability of terms in the query model is estimated using the F resources: intuitively, the probability of a term is estimated as the probability of observing this term in the F resources (based on the probability of observing the term in the e-value of r, and the probability of e). Weight by the importance of that resource: a resource is more important if query terms are more likely to be observed in that resource, compared to other resources in F. Edge-specific resource model: probability of observing term v in the e-value of r, smoothed with the probability of observing term v in all values of r. The score of a resource is calculated based on the cross-entropy of the edge-specific RM and the edge-specific ResM, aggregated over every e; alpha controls the importance of edges. Instead of single entities, ranking complex graphs comprising multiple entities, called Joined Result Tuples (JRTs): model complex results as a geometric mean of the entity models. Ranking aggregated JRTs: the cross-entropy between the edge-specific RM (query model) and the geometric mean of the combined edge-specific ResMs. The proposed ranking function is monotonic with respect to the individual resource scores (a necessary property for using top-k algorithms). A language model is constructed for every attribute of the resource to capture the probability of a word being observed via repeated sampling from the content of a specific attribute of r. Lambda controls the weight of the edge-specific attribute: a small value means less emphasis on the terms of the attribute and more emphasis on the terms of the entire resource (terms in all attributes). Pe is the probability of observing a word v in the edge-specific attribute a; P* is the probability of observing a word v in all attributes of r. Consider the co-occurrences of a word and query words in the content of a specific attribute a. The sampling process we implement is i.i.d. sampling: query words and w are i.i.d. sampled from a unigram distribution a, i.e. representing content of the specific attribute a, then sample v from a, and then sample k times query words from a distribution representing the content of all attributes of r
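
The smoothing described in note 2 can be illustrated with a short sketch. The notes only say that the edge-specific estimate is smoothed with the estimate over all attribute values and that lambda weights the edge-specific part; the linear interpolation form used here, the function name, and the toy resource are assumptions.

```python
from collections import Counter

def edge_specific_model(resource, edge, lam=0.7):
    """Assumed form: P(v | r, e) = lam * Pe(v) + (1 - lam) * P*(v).

    resource: dict mapping an attribute (edge) name to the list of tokens in its value.
    Pe is estimated from the tokens of the given attribute, P* from all attributes of r.
    """
    edge_counts = Counter(resource.get(edge, []))
    all_counts = Counter(tok for toks in resource.values() for tok in toks)
    n_edge = sum(edge_counts.values())
    n_all = sum(all_counts.values())
    model = {}
    for v in all_counts:
        p_e = edge_counts[v] / n_edge if n_edge else 0.0
        p_all = all_counts[v] / n_all
        model[v] = lam * p_e + (1 - lam) * p_all
    return model

# Toy resource (invented for illustration).
resource = {
    "name": ["freddie", "mercury"],
    "comment": ["singer", "of", "queen"],
}
name_model = edge_specific_model(resource, "name", lam=0.5)
```

With lam = 0.5, "mercury" receives 0.5 · 0.5 + 0.5 · 0.2 = 0.35, while "queen", which appears only in the comment, still receives 0.1 from the resource-wide estimate. Lowering lam shifts weight toward the terms of the entire resource, as the note describes.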