SlideShare a Scribd company logo
1 of 18
Benchmarking graph databases on the problem 
of community detection 
Sotirios Beis, Symeon Papadopoulos, Yiannis Kompatsiaris 
Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI) 
ADBIS 2014 Conference 
Ohrid, Republic of Macedonia, September 7-10, 2014
Important Note 
• This is an update of the actual presentation given in 
September 2014. After the presentation, we received 
comments about the implementation of our benchmark, 
which led us to update the implementation and rerun the 
experiments. We would like to thank @lgarulli and 
@OrientDB for their contribution. 
• The updated paper is now available on: 
http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf 
while the original is still available on the Springer site: 
http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1 
#2
#3 
Overview 
• Problem formulation 
• Related work 
• Workload description 
• Evaluation 
• Conclusions
Motivation 
• The rapid growth of Online Social Networks contributes 
to the creation of high-volume and velocity graph data 
– Twitter had 145M users in 2010 and 300M in 2011 
http://dstevenwhite.com/2011/12/29/social-media-growth-2006-2011/ 
• The need for massive graph data management and 
processing systems is constantly increasing 
– Many methods and technologies available (RDBMS, OODBMS 
and graph databases) 
• Benchmarks to evaluate candidate solutions are 
absolutely essential! 
– We had to choose one in the context of the SocialSensor 
research project 
#4
Problem Formulation 
• Given a set of database management systems assess 
the performance of each system with respect to a 
common mining task: community detection 
• Many DBMS available to use 
– Graph databases selected, as they are designed to store 
and manage efficiently big graph data 
– Benchmarked systems: Titan, OrientDB and Neo4j 
• Many operations used in databases to benchmark 
– Workloads that simulate common operations in OSNs 
employed 
#5
Related Work (1) 
• Evaluation between different DBMS 
– Giatsoglou et al. focused on the Social Tagging System use 
case scenario 
– Angles et al. and Armstrong et al. proposed a synthetic 
graph generator with OSN characteristics and use this data 
for evaluation 
– Grossniklaus et al. used a workloads that cover a wide 
variety of graph data use case 
– Vicknair et al. utilized query workload that simulates 
typical operations performed in provenance system 
#6
Related Work (2) 
• Evaluation between graph DBMS 
– Bader et al. proposed a benchmark with four operations: 
insert, h-hops node and edge selection, while Dominguez 
et al. reports the implementation. 
– A similar workload is proposed by Jouili et al. emphasizing 
the effects of increasing multiple concurrent users. 
– Ciglan et al. proposed a benchmark focusing on graph 
traversal operations. 
– Dayarathna et al. used similar workload, focusing on graph 
databases server and cloud environments. 
#7
Workloads Description (1) 
Our Contribution 
•Clustering Workload (CW) 
– Very important due to its numerous applications, such as 
topic detection, photo clustering and event detection. 
– Until now most community detection algorithms used 
main memory to store the graph and perform the 
required operations  fast execution time, unable to 
manage big graphs. 
– We propose an implementation of Louvain Method on top 
of graph databases employing cache techniques  take 
advantage of both graph database capabilities and in-memory 
execution speed. 
#8
Workload Description (2) 
Supplementary Workloads 
•Massive Insertion Workload (MIW) 
– Populates a graph with bulk load operations. 
– Simulates batch creation of a graph. 
•Single Insertion Workload (SIW) 
– Every object insertion is committed directly. 
– Simulates the growth of a OSN. 
•Query Workload (QW) 
– FindNeighbors query (FN)  Finds the neighbors of all nodes 
• simulates the friends retrieval a Facebook user 
– FindAdjacentNodes query (FA)  Finds the adjacent nodes of all edges 
• used to find whether two user joined the same Facebook group 
– FindShortestPath (FS)  Finds the shortest paths between 100 nodes 
• used to find the level two LinkedIn users are connected 
#9
Experimental Study 
• Datasets 
– Synthetic for CW, generated with LFR generator. 
– Real for MIW, SIW & QW, from SNAP dataset collection. 
• Evaluation Measure: execution time (second) 
#10
Results: Comparison wrt CW 
#11
Results: Cache size impact 
#12 
Titan 
Neo4j 
OrientDB
Results: comparison wrt MIW and QW 
#13
Results: Comparison wrt SIW 
YouTube LiveJournal 
#14
Conclusions 
SUMMARY 
• OrientDB is the most efficient solution for CW, while Titan is 
the winner in SIW 
• Neo4j is the fastest graph database for MIW and QW 
• Titan and OrientDB have competitive performance in MIW 
and QW respectively  don't scale as good as Neo4j 
FUTURE WORK 
• Test with bigger graphs. 
• Run the benchmark employing the distributed version of 
Titan and OrientDB and other graph databases (Sparksee 
already added). 
• Improve performance of the implemented community 
detection method. 
#15
References (1) 
Evaluation between different DBMS 
Giatsoglou, M., Papadopoulos, S., Vakali, A.: Massive graph management for the web 
and web 2.0. In Vakali, A., Jain, L., eds.: New Directions in Web Data 
Management 1. Volume 331 of Studies in Computational Intelligence. Springer 
Berlin Heidelberg (2011) 19-58 
Angles, R., Prat-Perez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: Benchmarking database 
systems for social network applications. In: First International Workshop on 
Graph Data Management Experiences and Systems. GRADES '13, New York, NY, 
USA, ACM (2013) 15:1-15:7 
Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database 
benchmark based on the facebook social graph. (2013) 
Grossniklaus, M., Leone, S., Zaschke, T.: Towards a benchmark for graph data 
management and processing. (2013) 
Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a 
graph database and a relational database: A data provenance perspective. In: 
Proceedings of the 48th Annual Southeast Regional Conference. ACM SE '10, 
New York, NY, USA, ACM (2010) 42:1-42:6 
#16
References (2) 
Evaluation between graph DBMS 
Bader, D.A., Feo, J., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann, 
B., Meuse, T., Robinson, E.: Hpc scalable graph analysis benchmark (2009) 
Dominguez-Sal, D., Urbn-Bayes, P., Gimnez-Va, A., Gmez-Villamor, S., Martnez- 
Bazn, N., Larriba-Pey, J.: Survey of graph database performance on the hpc 
scalable graph analysis benchmark. In Shen, H., Pei, J., zsu, M., Zou, L., Lu, J., 
Ling, T.W., Yu, G., Zhuang, Y., Shao, J., eds.:Web-Age Information Management. 
Volume 6185 of Lecture Notes in Computer Science. Springer Berlin Heidelberg 
(2010) 37-48 
Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph 
databases. In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th 
International Conference on. (April 2012) 186-189 
Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In: Social 
Computing (SocialCom), 2013 International Conference on. (Sept 2013) 708- 
715 
Dayarathna, M., Suzumura, T.: Xgdbench: A benchmarking platform for graph stores 
in exascale clouds. In: Cloud Computing Technology and Science (Cloud-Com), 
2012 IEEE 4th International Conference on. (Dec 2012) 363-370 
#17
Source: https://github.com/socialsensor/graphdb-benchmarks 
#18 
Questions 
Further contact: 
sotbeis@iti.gr / papadop@iti.gr 
Acknowledgement: 
www.socialsensor.eu

More Related Content

What's hot

FOSS4G Firenze 2022 참가기
FOSS4G Firenze 2022 참가기FOSS4G Firenze 2022 참가기
FOSS4G Firenze 2022 참가기SANGHEE SHIN
 
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataGraphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataNeo4j
 
디지털트윈 기술 및 스마트시티 적용 사례
디지털트윈 기술 및  스마트시티 적용 사례 디지털트윈 기술 및  스마트시티 적용 사례
디지털트윈 기술 및 스마트시티 적용 사례 SANGHEE SHIN
 
게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for Unreal
게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for Unreal게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for Unreal
게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for UnrealKyu-sung Choi
 
Introduction to Geographic Information System (GIS)
Introduction to Geographic Information System (GIS)Introduction to Geographic Information System (GIS)
Introduction to Geographic Information System (GIS)Shashank Singh
 
Neo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use CasesNeo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use CasesNeo4j
 
Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Ismail El Gayar
 
GeoServer, an introduction for beginners
GeoServer, an introduction for beginnersGeoServer, an introduction for beginners
GeoServer, an introduction for beginnersGeoSolutions
 
Geocoding for beginners
Geocoding for beginnersGeocoding for beginners
Geocoding for beginnersAkansha Mishra
 
IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...
IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...
IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...Senturus
 
Open Source GIS
Open Source GISOpen Source GIS
Open Source GISJoe Larson
 
spatial data infrastructure : issues and concepts
spatial data infrastructure : issues and conceptsspatial data infrastructure : issues and concepts
spatial data infrastructure : issues and conceptsDesconnets Jean-Christophe
 
Geographical information system
Geographical information system Geographical information system
Geographical information system Maharshi Dave
 
Chapter one gis
Chapter one gisChapter one gis
Chapter one gisGokul Saud
 
PostGIS and Spatial SQL
PostGIS and Spatial SQLPostGIS and Spatial SQL
PostGIS and Spatial SQLTodd Barr
 
Sharing Geospatial Intelligence and Services
Sharing Geospatial Intelligence and ServicesSharing Geospatial Intelligence and Services
Sharing Geospatial Intelligence and ServicesGovCloud Network
 
Geospatial digital twin
Geospatial digital twinGeospatial digital twin
Geospatial digital twinDany Laksono
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021Dendej Sawarnkatat
 

What's hot (20)

History of GIS.pdf
History of GIS.pdfHistory of GIS.pdf
History of GIS.pdf
 
FOSS4G Firenze 2022 참가기
FOSS4G Firenze 2022 참가기FOSS4G Firenze 2022 참가기
FOSS4G Firenze 2022 참가기
 
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your DataGraphs in Automotive and Manufacturing - Unlock New Value from Your Data
Graphs in Automotive and Manufacturing - Unlock New Value from Your Data
 
디지털트윈 기술 및 스마트시티 적용 사례
디지털트윈 기술 및  스마트시티 적용 사례 디지털트윈 기술 및  스마트시티 적용 사례
디지털트윈 기술 및 스마트시티 적용 사례
 
게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for Unreal
게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for Unreal게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for Unreal
게임엔진과 공간정보 3D 콘텐츠 융합 : Cesium for Unreal
 
Introduction to Geographic Information System (GIS)
Introduction to Geographic Information System (GIS)Introduction to Geographic Information System (GIS)
Introduction to Geographic Information System (GIS)
 
Neo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use CasesNeo4j GraphTalk Helsinki - Introduction and Graph Use Cases
Neo4j GraphTalk Helsinki - Introduction and Graph Use Cases
 
Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)
 
GeoServer, an introduction for beginners
GeoServer, an introduction for beginnersGeoServer, an introduction for beginners
GeoServer, an introduction for beginners
 
Geocoding for beginners
Geocoding for beginnersGeocoding for beginners
Geocoding for beginners
 
IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...
IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...
IBM Cognos Analytics Reporting vs. Dashboarding: Matching Tools to Business R...
 
Open Source GIS
Open Source GISOpen Source GIS
Open Source GIS
 
GIS file types
GIS file typesGIS file types
GIS file types
 
spatial data infrastructure : issues and concepts
spatial data infrastructure : issues and conceptsspatial data infrastructure : issues and concepts
spatial data infrastructure : issues and concepts
 
Geographical information system
Geographical information system Geographical information system
Geographical information system
 
Chapter one gis
Chapter one gisChapter one gis
Chapter one gis
 
PostGIS and Spatial SQL
PostGIS and Spatial SQLPostGIS and Spatial SQL
PostGIS and Spatial SQL
 
Sharing Geospatial Intelligence and Services
Sharing Geospatial Intelligence and ServicesSharing Geospatial Intelligence and Services
Sharing Geospatial Intelligence and Services
 
Geospatial digital twin
Geospatial digital twinGeospatial digital twin
Geospatial digital twin
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021
 

Viewers also liked

OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityCurtis Mosters
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesCurtis Mosters
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandrajohnrjenson
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersSymeon Papadopoulos
 
Addressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraAddressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraNakul Jeirath
 
OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0Orient Technologies
 
Analysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseAnalysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseMayuree Srikulwong
 
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Sebastian Verheughe
 
Tackling a 1 billion member social network
Tackling a 1 billion member social networkTackling a 1 billion member social network
Tackling a 1 billion member social networkArtur Bańkowski
 
OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)Luca Garulli
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQLLuigi Dell'Aquila
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesCambridge Semantics
 
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)Curtis Mosters
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jArangoDB Database
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQLLuca Garulli
 

Viewers also liked (20)

OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionality
 
OrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databasesOrientDB vs Neo4j - and an introduction to NoSQL databases
OrientDB vs Neo4j - and an introduction to NoSQL databases
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandra
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DB
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Addressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandraAddressing performance issues in titan+cassandra
Addressing performance issues in titan+cassandra
 
OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0OrientDB Distributed Architecture v2.0
OrientDB Distributed Architecture v2.0
 
MongoDB vs OrientDB
MongoDB vs OrientDBMongoDB vs OrientDB
MongoDB vs OrientDB
 
Analysis on Cloud Computing Database
Analysis on Cloud Computing DatabaseAnalysis on Cloud Computing Database
Analysis on Cloud Computing Database
 
La especificacion MoReq2: Modelo de Requisitos para ERMS
La especificacion MoReq2: Modelo de Requisitos para ERMSLa especificacion MoReq2: Modelo de Requisitos para ERMS
La especificacion MoReq2: Modelo de Requisitos para ERMS
 
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
 
Tackling a 1 billion member social network
Tackling a 1 billion member social networkTackling a 1 billion member social network
Tackling a 1 billion member social network
 
OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)OrientDB document or graph? Select the right model (old presentation)
OrientDB document or graph? Select the right model (old presentation)
 
OrientDB
OrientDBOrientDB
OrientDB
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL
OrientDB - the 2nd generation  of  (Multi-Model) NoSQLOrientDB - the 2nd generation  of  (Multi-Model) NoSQL
OrientDB - the 2nd generation of (Multi-Model) NoSQL
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational Databases
 
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
Arda Maps - master's thesis defense - Verteidigung der Masterarbeit (German)
 
Graph db
Graph dbGraph db
Graph db
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQL
 

Similar to Benchmarking graph databases on the problem of community detection

EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsIRJET Journal
 
YandongWang_Resume
YandongWang_ResumeYandongWang_Resume
YandongWang_Resumeyandong wang
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representationsMarco Quartulli
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKAbhi Jit
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxelisarosa29
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...miyurud
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentationPaolo Missier
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...IJEACS
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneResearch Data Alliance
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
Indexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchIndexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchTill Blume
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONSDATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONSijdms
 
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...Yongyao Jiang
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slidesARDC
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportAhmad El Tawil
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsGeoffrey Fox
 

Similar to Benchmarking graph databases on the problem of community detection (20)

EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
YandongWang_Resume
YandongWang_ResumeYandongWang_Resume
YandongWang_Resume
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
 
Pemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptxPemanfaatan Big Data Dalam Riset 2023.pptx
Pemanfaatan Big Data Dalam Riset 2023.pptx
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentation
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOne
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
Observlets
Observlets Observlets
Observlets
 
Indexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchIndexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data search
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONSDATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
DATABASE SYSTEMS PERFORMANCE EVALUATION FOR IOT APPLICATIONS
 
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me...
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slides
 
Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 

More from Symeon Papadopoulos

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...Symeon Papadopoulos
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionSymeon Papadopoulos
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationSymeon Papadopoulos
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Symeon Papadopoulos
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingSymeon Papadopoulos
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSymeon Papadopoulos
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualitySymeon Papadopoulos
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentSymeon Papadopoulos
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetSymeon Papadopoulos
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionSymeon Papadopoulos
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterSymeon Papadopoulos
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Symeon Papadopoulos
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Symeon Papadopoulos
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceSymeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Symeon Papadopoulos
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsSymeon Papadopoulos
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsSymeon Papadopoulos
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Symeon Papadopoulos
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015Symeon Papadopoulos
 

More from Symeon Papadopoulos (20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia content
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air Quality
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering Detection
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016
 
Multimedia Privacy
Multimedia PrivacyMultimedia Privacy
Multimedia Privacy
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging Performance
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online Discussions
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 
CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015CERTH/CEA LIST at MediaEval Placing Task 2015
CERTH/CEA LIST at MediaEval Placing Task 2015
 

Recently uploaded

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Benchmarking graph databases on the problem of community detection

  • 1. Benchmarking graph databases on the problem of community detection Sotirios Beis, Symeon Papadopoulos, Yiannis Kompatsiaris Centre for Research and Technology Hellas, Information Technologies Institute (CERTH-ITI) ADBIS 2014 Conference Ohrid, Republic of Macedonia, September 7-10, 2014
  • 2. Important Note • This is an update of the actual presentation given in September 2014. After the presentation, we received comments about the implementation of our benchmark, which led us to update the implementation and rerun the experiments. We would like to thank @lgarulli and @OrientDB for their contribution. • The updated paper is now available on: http://mklab.iti.gr/files/beis_adbis2014_corrected.pdf while the original is still available on the Springer site: http://link.springer.com/chapter/10.1007/978-3-319-10518-5_1 #2
  • 3. #3 Overview • Problem formulation • Related work • Workload description • Evaluation • Conclusions
  • 4. Motivation • The rapid growth of Online Social Networks contributes to the creation of high-volume and velocity graph data – Twitter had 145M users in 2010 and 300M in 2011 http://dstevenwhite.com/2011/12/29/social-media-growth-2006-2011/ • The need for massive graph data management and processing systems is constantly increasing – Many methods and technologies available (RDBMS, OODBMS and graph databases) • Benchmarks to evaluate candidate solutions are absolutely essential! – We had to choose one in the context of the SocialSensor research project #4
  • 5. Problem Formulation • Given a set of database management systems assess the performance of each system with respect to a common mining task: community detection • Many DBMS available to use – Graph databases selected, as they are designed to store and manage efficiently big graph data – Benchmarked systems: Titan, OrientDB and Neo4j • Many operations used in databases to benchmark – Workloads that simulate common operations in OSNs employed #5
  • 6. Related Work (1) • Evaluation between different DBMS – Giatsoglou et al. focused on the Social Tagging System use case scenario – Angles et al. and Armstrong et al. proposed a synthetic graph generator with OSN characteristics and use this data for evaluation – Grossniklaus et al. used a workloads that cover a wide variety of graph data use case – Vicknair et al. utilized query workload that simulates typical operations performed in provenance system #6
  • 7. Related Work (2) • Evaluation between graph DBMS – Bader et al. proposed a benchmark with four operations: insert, h-hops node and edge selection, while Dominguez et al. reports the implementation. – A similar workload is proposed by Jouili et al. emphasizing the effects of increasing multiple concurrent users. – Ciglan et al. proposed a benchmark focusing on graph traversal operations. – Dayarathna et al. used similar workload, focusing on graph databases server and cloud environments. #7
  • 8. Workloads Description (1) Our Contribution •Clustering Workload (CW) – Very important due to its numerous applications, such as topic detection, photo clustering and event detection. – Until now most community detection algorithms used main memory to store the graph and perform the required operations  fast execution time, unable to manage big graphs. – We propose an implementation of Louvain Method on top of graph databases employing cache techniques  take advantage of both graph database capabilities and in-memory execution speed. #8
  • 9. Workload Description (2) Supplementary Workloads •Massive Insertion Workload (MIW) – Populates a graph with bulk load operations. – Simulates batch creation of a graph. •Single Insertion Workload (SIW) – Every object insertion is committed directly. – Simulates the growth of a OSN. •Query Workload (QW) – FindNeighbors query (FN)  Finds the neighbors of all nodes • simulates the friends retrieval a Facebook user – FindAdjacentNodes query (FA)  Finds the adjacent nodes of all edges • used to find whether two user joined the same Facebook group – FindShortestPath (FS)  Finds the shortest paths between 100 nodes • used to find the level two LinkedIn users are connected #9
  • 10. Experimental Study • Datasets – Synthetic for CW, generated with LFR generator. – Real for MIW, SIW & QW, from SNAP dataset collection. • Evaluation Measure: execution time (second) #10
  • 12. Results: Cache size impact #12 Titan Neo4j OrientDB
  • 13. Results: comparison wrt MIW and QW #13
  • 14. Results: Comparison wrt SIW YouTube LiveJournal #14
  • 15. Conclusions SUMMARY • OrientDB is the most efficient solution for CW, while Titan is the winner in SIW • Neo4j is the fastest graph database for MIW and QW • Titan and OrientDB have competitive performance in MIW and QW respectively  don't scale as good as Neo4j FUTURE WORK • Test with bigger graphs. • Run the benchmark employing the distributed version of Titan and OrientDB and other graph databases (Sparksee already added). • Improve performance of the implemented community detection method. #15
  • 16. References (1) Evaluation between different DBMS Giatsoglou, M., Papadopoulos, S., Vakali, A.: Massive graph management for the web and web 2.0. In Vakali, A., Jain, L., eds.: New Directions in Web Data Management 1. Volume 331 of Studies in Computational Intelligence. Springer Berlin Heidelberg (2011) 19-58 Angles, R., Prat-Perez, A., Dominguez-Sal, D., Larriba-Pey, J.L.: Benchmarking database systems for social network applications. In: First International Workshop on Graph Data Management Experiences and Systems. GRADES '13, New York, NY, USA, ACM (2013) 15:1-15:7 Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: Linkbench: a database benchmark based on the facebook social graph. (2013) Grossniklaus, M., Leone, S., Zaschke, T.: Towards a benchmark for graph data management and processing. (2013) Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a graph database and a relational database: A data provenance perspective. In: Proceedings of the 48th Annual Southeast Regional Conference. ACM SE '10, New York, NY, USA, ACM (2010) 42:1-42:6 #16
  • 17. References (2) Evaluation between graph DBMS Bader, D.A., Feo, J., Gilbert, J., Kepner, J., Koester, D., Loh, E., Madduri, K., Mann, B., Meuse, T., Robinson, E.: Hpc scalable graph analysis benchmark (2009) Dominguez-Sal, D., Urbn-Bayes, P., Gimnez-Va, A., Gmez-Villamor, S., Martnez- Bazn, N., Larriba-Pey, J.: Survey of graph database performance on the hpc scalable graph analysis benchmark. In Shen, H., Pei, J., zsu, M., Zou, L., Lu, J., Ling, T.W., Yu, G., Zhuang, Y., Shao, J., eds.:Web-Age Information Management. Volume 6185 of Lecture Notes in Computer Science. Springer Berlin Heidelberg (2010) 37-48 Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph databases. In: Data Engineering Workshops (ICDEW), 2012 IEEE 28th International Conference on. (April 2012) 186-189 Jouili, S., Vansteenberghe, V.: An empirical comparison of graph databases. In: Social Computing (SocialCom), 2013 International Conference on. (Sept 2013) 708- 715 Dayarathna, M., Suzumura, T.: Xgdbench: A benchmarking platform for graph stores in exascale clouds. In: Cloud Computing Technology and Science (Cloud-Com), 2012 IEEE 4th International Conference on. (Dec 2012) 363-370 #17
  • 18. Source: https://github.com/socialsensor/graphdb-benchmarks #18 Questions Further contact: sotbeis@iti.gr / papadop@iti.gr Acknowledgement: www.socialsensor.eu