Poster Final


Gireeshma Reddy

R.I.T. B. Thomas Golisano College of Computing and Information Sciences
Using the GeX Approach for Approximate Matching on Graph Databases
Gireeshma Bokka Reddy (gr9334@rit.edu)
Advisor: Prof. Carlos R. Rivero
Rochester Institute of Technology
References:
1. F. Mandreoli, R. Martoglia, W. Penzo. Approximating expressive queries on graph-modeled data: The GeX approach.
2. C. Stark, B. Breitkreutz, A. Breitkreutz, M. Tyers, T. Reguly. BioGRID: a general repository for interaction datasets.
INTRODUCTION:
The growing popularity of social networking
websites has increased the need for graph
databases, because the relationships between
data items play an important role in such
applications. This project applies approximate
matching, using the GeX approach, to such databases.
II. APPROXIMATE NODE MATCHING:
[Figure: results for approximate node matching; images not recovered]
CONCLUSION:
The GeX top-k query answering algorithm is very
accurate, but the time taken to compute the
result increases considerably as the size of the
dataset increases. Future work can include
approximating the edge labels as well.
BACKGROUND:
Two methods can be adopted to match
patterns in a graph database:
1. Exact subgraph matching.
2. Approximate subgraph matching.
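To make the distinction concrete, the following is a minimal sketch that contrasts the two modes for a single node label. It assumes labels are plain strings, uses a normalized Levenshtein distance as the label distance, and introduces a hypothetical `threshold` parameter; none of these choices is taken from the poster.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def label_distance(a: str, b: str) -> float:
    """Normalized distance in [0, 1]; 0 means the labels are identical."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


def exact_match(query_label: str, data_label: str) -> bool:
    """Exact matching accepts only identical labels."""
    return query_label == data_label


def approximate_match(query_label: str, data_label: str,
                      threshold: float = 0.3) -> bool:
    """Approximate matching accepts labels within a distance threshold.

    The threshold is a hypothetical tuning parameter, not a GeX setting.
    """
    return label_distance(query_label, data_label) <= threshold
```

With these definitions, two labels that differ in one of four characters are rejected by `exact_match` but accepted by `approximate_match`, since their normalized distance is 0.25.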
DATASET DESCRIPTION:
• The data was obtained from the BioGRID
website, an updated interaction
repository.
• The dataset consists of protein and genetic
interactions.
• More than 116,000 interactions from Saccharomyces
cerevisiae (yeast), Caenorhabditis elegans
(roundworm), Drosophila melanogaster (fly)
and Homo sapiens (humans) are available
for download in CSV format.
THE GeX TOP-K QUERY ANSWERING ALGORITHM:
CURSOR INITIALIZATION:
The following steps are performed on each
edge of the graph database:
1. Check whether the edge matches any of the query edges.
2. Store the matching edges as cursors.
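The two initialization steps can be sketched as follows. This is an illustration under stated assumptions rather than the GeX implementation: edges are modeled as `(source, label, target)` triples, a data edge is taken to match a query edge when their edge labels are equal, and a cursor is represented simply as the list of data edges collected for one query edge. The name `initialize_cursors` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source node label, edge label, target node label)
Edge = Tuple[str, str, str]


def initialize_cursors(data_edges: List[Edge],
                       query_edges: List[Edge]) -> Dict[int, List[Edge]]:
    """Scan every data edge once and store it under each query edge it matches.

    The result maps the index of a query edge to its list of candidate
    data edges (its "cursor" in this simplified model).
    """
    cursors: Dict[int, List[Edge]] = defaultdict(list)
    for data_edge in data_edges:
        for idx, query_edge in enumerate(query_edges):
            # Assumed matching criterion: identical edge labels.
            if data_edge[1] == query_edge[1]:
                cursors[idx].append(data_edge)
    return dict(cursors)
```

Query edges without any matching data edge get no entry in the returned dictionary, so a caller can detect unanswerable queries by comparing its size with the number of query edges.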
CURSOR ACCESS AND SOLUTION BUILDING:
The following steps are performed on the
stored cursors:
1. The scoring function is calculated for each
combination of the cursors, and the combinations
are sorted by score in ascending order.
2. The top-k combinations in the sorted order
are returned as the final results.
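A minimal sketch of these two steps is given below. It assumes cursors are lists of candidate edges keyed by query-edge index, that every combination is enumerated exhaustively, and that a lower score is better (consistent with the ascending sort). The function name `top_k_solutions` and the `score` callback are hypothetical; the actual GeX algorithm may access cursors far more selectively.

```python
import heapq
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# (source node label, edge label, target node label)
Edge = Tuple[str, str, str]


def top_k_solutions(cursors: Dict[int, List[Edge]],
                    score: Callable[[Sequence[Edge]], float],
                    k: int) -> List[Tuple[float, Tuple[Edge, ...]]]:
    """Score every combination of one candidate per cursor and keep the k lowest.

    Returns (score, combination) pairs sorted by ascending score.
    """
    # Fix the cursor order so each combination lines up with the query edges.
    ordered = [cursors[i] for i in sorted(cursors)]
    scored = ((score(combo), combo) for combo in product(*ordered))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])
```

Using `heapq.nsmallest` avoids sorting all combinations when k is small, although the number of combinations still grows with the product of the cursor sizes, which is in line with the running-time growth noted in the conclusion.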
RESULTS:
I. EXACT NODE MATCHING:
[Figure panels, listed twice in the source: Input, Solution 1, Solution 2, Solution 3; images not recovered]
The nodes are represented as follows:
[Figure labels: Approximate match, Exact match; graphics not recovered]
THE SCORING FUNCTION:
For a query $q = (N_q, E_q, L^N_q, L^E_q, V, C)$, the scoring function is given by

\[
S(\varepsilon) =
\underbrace{\frac{\alpha}{|N_q|} \sum_{n \in N_q} d_L\bigl(\lambda(n), \lambda(f(n))\bigr)}_{(1)}
+ \underbrace{\frac{\beta}{2\,|E_q|} \sum_{e \in E_q} \left( d_L\bigl(\lambda(e), \lambda(g(e))\bigr) + \frac{c(g(e))}{MC} \right)}_{(2)}
+ \underbrace{\frac{\gamma}{|C|} \sum_{c \in C} \bigl(1 - s(c)\bigr)}_{(3)}
\]

Part (1) measures the syntactic, semantic and
structural relationship between a query node
and its data node.
Part (2) measures the semantic and structural
approximation between a query edge and its
corresponding edge.
Part (3) measures the traversal.
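As a worked illustration of the scoring function S(ε), the sketch below evaluates its three parts from precomputed quantities. The reading of the symbols is an assumption, since the poster does not define them: d_L is taken as a label distance in [0, 1], c(g(e)) as the cost of the data edge or path mapped to query edge e, MC as the maximum cost, and s(c) as a satisfaction degree in [0, 1]. The function name `embedding_score` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def embedding_score(node_dists: Sequence[float],
                    edge_terms: Sequence[Tuple[float, float]],
                    constraint_sats: Sequence[float],
                    max_cost: float,
                    alpha: float, beta: float, gamma: float) -> float:
    """Evaluate S = part (1) + part (2) + part (3).

    node_dists:      d_L(lambda(n), lambda(f(n))) for each query node n
    edge_terms:      (d_L(lambda(e), lambda(g(e))), c(g(e))) for each query edge e
    constraint_sats: s(c) for each c in C
    max_cost:        MC, used to normalize c(g(e))
    An empty component contributes 0 (an assumption made to avoid
    dividing by zero).
    """
    part1 = 0.0
    if node_dists:
        part1 = alpha / len(node_dists) * sum(node_dists)

    part2 = 0.0
    if edge_terms:
        part2 = beta / (2 * len(edge_terms)) * sum(
            dist + cost / max_cost for dist, cost in edge_terms)

    part3 = 0.0
    if constraint_sats:
        part3 = gamma / len(constraint_sats) * sum(
            1.0 - s for s in constraint_sats)

    return part1 + part2 + part3
```

Under this reading, an embedding in which every label distance and cost is 0 and every s(c) is 1 scores 0, so lower scores correspond to closer matches.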
