Sep 2010 
Scaling to Millions of Concurrent SPARQL Queries on the Cloud
OWLIM Replication Cluster @ Amazon EC2
Goals 
• Test the scalability of OWLIM RC on a really large cluster
• Can we break the million queries per hour barrier?
OWLIM Replication Cluster @ AWS Sep 2010 #2
INTRODUCTION 
Berlin SPARQL Benchmark (BSBM) 
• http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/
• Evaluates the performance of RDF query engines in an e-commerce use case
– searching products and navigating related information
• Randomized query mixes (25 SPARQL queries) are evaluated continuously
• Different dataset sizes and numbers of concurrent clients
– 25M, 100M and 200M triples
Benchmarking AWS 
• Extensive performance tests of EC2 instances 
– I/O, CPU, Network 
– BSBM (SPARQL), RDF materialisation 
• High Memory EC2 instances offer (surprisingly) good performance for RDF-related processing
– Comparable to local non-virtualised hardware 
Benchmarking AWS – testbeds
Testbed        CPU cores        RAM (GB)   Virtualisation
Local-L        2×2.4 GHz        8          ESX
Local-XL       4×2.9 GHz        12         No
Local-3XL      8×3.3 GHz        48         No
L              2×2 ECU*         7.5        Xen
XL             4×2 ECU*         15         Xen
High-Mem XL    2×3.25 ECU*      17         Xen
High-Mem 2XL   4×3.25 ECU*      34         Xen
High-Mem 4XL   8×3.25 ECU*      68         Xen
High-CPU XL    8×2.5 ECU*       7          Xen
* 1 ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
Benchmarking AWS – BSBM 100M results
[Chart: query mixes / hour (axis 0-5000) vs. concurrent clients (1, 4, 16, 32, 64); series: Local-L, L-ub, Local-XL, XL-ub, HM-XL-ub, HM-2XL-ub, Local-3XL, Local-3XL-SSD, HM-4XL-ub, HC-XL-ub]
Benchmarking AWS – RDF materialisation
[Chart: materialisation time (sec), axis 0-6000; datasets: UMBEL, DBP-SKOS]
OWLIM Replication Cluster 
• Improves scalability with respect to concurrent user requests
• How does it work?
– Each write request is multiplexed to all repository instances
– Each read request is dispatched to one instance only
– To ensure load-balancing, read requests are sent to the instance with the shortest execution queue
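The dispatch rules above can be sketched in a few lines of Python. This is an illustrative toy, not OWLIM's implementation: the `Replica` and `Master` classes, the in-memory queues and the node names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one repository instance (Slave node)."""
    name: str
    queue: list = field(default_factory=list)  # pending read requests
    log: list = field(default_factory=list)    # write requests applied so far


class Master:
    """Toy dispatcher: writes go to every replica, reads go to exactly one."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def write(self, update):
        # Multiplex the write so that every replica applies the same update.
        for replica in self.replicas:
            replica.log.append(update)

    def read(self, query):
        # Send the read to the replica with the shortest execution queue.
        # Ties are broken by position, because min() returns the first minimum.
        target = min(self.replicas, key=lambda r: len(r.queue))
        target.queue.append(query)
        return target.name


master = Master(Replica(f"slave-{i}") for i in range(3))
master.write("INSERT DATA { <urn:a> <urn:b> <urn:c> }")
targets = [master.read(f"query-{i}") for i in range(6)]
print(targets)
```

Because nothing drains the queues in this toy, six reads over three replicas end up spread evenly, two per replica. A real dispatcher would also remove a query from the queue once the replica has answered it.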
OWLIM CLUSTER ON EC2 – BENCHMARKS
AWS testbed setup 
• OWLIM Replication Cluster 
– One Master node, 10-100 Slave nodes 
– 100 million triples / 16GB database size 
• BSBM 100M dataset 
– Each cluster node has a replica of the database 
– 1000 concurrent BSBM clients 
• Amazon EC2 
– Master node – HM-2XL (34 GB RAM, 4×3.25 ECU)
– Slave nodes – HM-XL (17 GB RAM, 2×3.25 ECU)
– Ubuntu (x64) 
Total QMpH (Query Mix per Hour)
[Chart: total QMpH (axis 0-250,000) vs. cluster size (10-100 HM-XL nodes); BSBM-100M, 1000 concurrent clients]
Total QMpH – summary 
• (almost) Linear scalability of the cluster
• 20 nodes handle more than 1 million SPARQL queries per hour (40,000 QMpH)
– 1 Query Mix = 25 SPARQL queries
• 100 nodes handle 5 million SPARQL queries per hour (200,000 QMpH)
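The conversion behind these figures is a single multiplication. A minimal sketch follows; the function and variable names are made up for illustration, and the numbers are the ones quoted on the slide.

```python
QUERIES_PER_MIX = 25  # from the slide: 1 Query Mix = 25 SPARQL queries


def queries_per_hour(qmph: int) -> int:
    """Convert Query Mixes per Hour (QMpH) into individual SPARQL queries per hour."""
    return qmph * QUERIES_PER_MIX


twenty_nodes = queries_per_hour(40_000)     # 20-node cluster
hundred_nodes = queries_per_hour(200_000)   # 100-node cluster
print(twenty_nodes, hundred_nodes)  # prints "1000000 5000000"
```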
QMpH per cluster node
[Chart: QMpH per node (axis 1800-2400) vs. cluster size (10-100 HM-XL nodes); BSBM-100M, 1000 concurrent clients, with a power trendline]
QMpH per cluster node – summary 
• Low parallelisation overhead
– Only 10% deterioration in QMpH per cluster node when the cluster grows 10 times (from 10 to 100 nodes)
– Cluster nodes handle 2,000-2,300 QMpH (a standalone HM-XL node on EC2 handles ~2,500 QMpH)
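As a rough cross-check of these numbers, the sketch below compares per-node throughput in the cluster with the standalone figure. It assumes that the upper end of the quoted range (2,300 QMpH) belongs to the 10-node cluster and the lower end (2,000 QMpH) to the 100-node cluster. The slide gives only the range, so that mapping and all names in the code are assumptions.

```python
STANDALONE_QMPH = 2_500  # approximate standalone HM-XL figure from the slide

# Assumed mapping of the quoted 2,000-2,300 range onto cluster sizes.
PER_NODE_QMPH = {10: 2_300, 100: 2_000}


def efficiency(per_node_qmph: float) -> float:
    """Per-node throughput in the cluster relative to a standalone node."""
    return per_node_qmph / STANDALONE_QMPH


def total_qmph(nodes: int) -> int:
    """Cluster-wide QMpH implied by the per-node figure."""
    return nodes * PER_NODE_QMPH[nodes]


# Relative loss of per-node throughput when growing from 10 to 100 nodes.
drop = 1 - PER_NODE_QMPH[100] / PER_NODE_QMPH[10]

for nodes in (10, 100):
    print(nodes, total_qmph(nodes), round(efficiency(PER_NODE_QMPH[nodes]), 2))
print(f"per-node drop from 10 to 100 nodes: {drop:.0%}")
```

Under these assumptions the 100-node total comes out at 200,000 QMpH, which matches the total reported for the 100-node cluster. The rounded endpoints give a per-node drop of about 13%, in the same range as the 10% on the slide, which was presumably derived from unrounded measurements.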
What about the cost? 
• 100,000 SPARQL queries per $1 on AWS
– ~4,000 Query Mixes / $
• 1 Query Mix = 25 SPARQL queries
– EC2 pricing
• Master node (on-demand HM-2XL) – $1.00/hour
• Slave node (on-demand HM-XL) – $0.50/hour
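The cost figure can be approximated from the prices above. The sketch assumes the hourly bill is one Master plus N Slaves at the quoted on-demand rates, with no storage or traffic charges. The slides do not spell out how the cost was computed, so treat this as one plausible reading; the function names are illustrative.

```python
MASTER_USD_PER_HOUR = 1.00  # on-demand HM-2XL, from the slide
SLAVE_USD_PER_HOUR = 0.50   # on-demand HM-XL, from the slide
QUERIES_PER_MIX = 25        # 1 Query Mix = 25 SPARQL queries


def cluster_cost_per_hour(slaves: int) -> float:
    """Assumed hourly cost: one Master plus the given number of Slaves."""
    return MASTER_USD_PER_HOUR + slaves * SLAVE_USD_PER_HOUR


def query_mixes_per_dollar(slaves: int, total_qmph: float) -> float:
    """Query Mixes obtained per USD of instance time."""
    return total_qmph / cluster_cost_per_hour(slaves)


# 100 Slaves at the reported 200,000 QMpH.
mixes = query_mixes_per_dollar(100, 200_000)
queries = mixes * QUERIES_PER_MIX
print(f"{mixes:.0f} Query Mixes / $, {queries:.0f} SPARQL queries / $")
```

With 100 Slaves the assumed bill is $51 per hour, which gives about 3,900 Query Mixes (about 98,000 SPARQL queries) per dollar, close to the rounded 4,000 and 100,000 quoted above.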
What about the cost (2)
[Chart: Query Mixes per 1 USD (QMpH/$, axis 3400-4600) vs. cluster size (10-100 nodes)]
DETAILED CLUSTER METRICS 
Cluster monitoring 
• Amazon CloudWatch provides instance level monitoring for EC2
– CPU load, Bandwidth utilisation, I/O, …
– Minimum granularity of monitoring periods – 1 minute
• OWLIM Cluster metrics
– Monitor Master and a random Slave for ~180 min
– Many test runs
• a single run takes a few minutes
– Idle CPU/IO/Network on diagram is the time between test runs
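For readers who want to collect comparable metrics today, the sketch below builds the request parameters for a 180-minute window at 1-minute granularity. It is not the tooling used for the slides. The dictionaries follow the keyword arguments of the `get_metric_statistics` call on boto3's CloudWatch client, the metric names are standard `AWS/EC2` metrics, and the instance ID is a placeholder. No AWS call is made here.

```python
from datetime import datetime, timedelta, timezone

# Standard AWS/EC2 metrics roughly matching the charts: CPU, network, disk I/O.
METRICS = ["CPUUtilization", "NetworkIn", "NetworkOut",
           "DiskReadBytes", "DiskWriteBytes"]


def cloudwatch_request(instance_id, metric_name, minutes=180, now=None):
    """Build keyword arguments for a CloudWatch get_metric_statistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,               # seconds, i.e. 1-minute granularity
        "Statistics": ["Average"],
    }


# "i-0123456789abcdef0" is a placeholder instance ID, not a real node.
requests = [cloudwatch_request("i-0123456789abcdef0", m) for m in METRICS]
print(len(requests), requests[0]["MetricName"], requests[0]["Period"])
```

Each dictionary could then be passed as `client.get_metric_statistics(**request)` to a boto3 CloudWatch client. One-minute datapoints generally require detailed monitoring to be enabled on the instance.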
CPU load (Master)
[Chart: CPU load (%) of the Master node, axis 0-80%, over time (0-185 min)]
CPU load (Slave)
[Chart: CPU load (%) of a random Slave node, axis 0-120%, over time (0-155 min)]
Network traffic (Master)
[Chart: inbound and outbound network traffic (MB/s) of the Master node, axis 0-35 MB/s, over time (0-185 min)]
Network traffic (Slave)
[Chart: inbound and outbound network traffic (MB/s) of a random Slave node, axis 0.00-0.12 MB/s, over time (0-156 min)]
I/O (Slave)
[Chart: disk read and disk write (MB/s) of a random Slave node, axis 0.00-3.50 MB/s, over time (0-170 min)]
Q & A 
Questions? 
@ontotext 