Predicting SPARQL Query Execution Time and Suggesting SPARQL Queries Based on Query History
Rakebul Hasan
  
Context
• Assisting human users and software agents in:
  – Querying Semantic Web data
    • Understanding query behavior: predicting query performance
      – Workload management, query scheduling, query optimization
    • Constructing and refining queries: suggesting alternatives
  – Consuming Semantic Web data
    • Understanding reasoning of Semantic Web software agents: explaining reasoning
      – Transparency, trust, scrutability, decision effectiveness, decision efficiency, user satisfaction
  
Outline
• Predicting SPARQL query execution time
• Suggesting similar SPARQL queries from query history
  
PREDICTING SPARQL QUERY EXECUTION TIME
  
• Accurately predicting query performance enables effective:
  – workload management
  – query scheduling
  – query optimization
  
Understanding performance of computer programs
Insight. [Knuth] Use scientific method to understand performance.
  
Scientific method applied to analysis of algorithms
• A framework for predicting performance and comparing algorithms.
• Scientific method
  – Observe some feature of the natural world.
  – Hypothesize a model that is consistent with the observations.
  – Predict events using the hypothesis.
  – Verify the predictions by making further observations.
  – Validate by repeating until the hypothesis and observations agree.
• Principles
  – Experiments must be reproducible.
  – Hypotheses must be falsifiable.
• Feature of the natural world. Computer itself.
Slide credit: Robert Sedgewick
  
Example: 3-Sum
• 3-SUM. Given N distinct integers, how many triples sum to exactly zero?
• 3-SUM brute-force algorithm. Check all the possible triples.
• How much time does it take?
Slide credit: Robert Sedgewick
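
A minimal Python sketch of the brute-force algorithm. It includes a timer, so running times T(N) can be collected for the plots on the following slides. The function names are illustrative, and the input sizes in the demo are kept small because Python is far slower than the Java code Sedgewick timed.

```python
import time
from itertools import combinations


def three_sum_count(nums):
    """Brute force: check all N-choose-3 triples and count those summing to 0.

    The number of triples grows as ~N^3 / 6, so the running time is cubic.
    """
    return sum(1 for a, b, c in combinations(nums, 3) if a + b + c == 0)


def timed_three_sum(nums):
    """Return (count, elapsed wall-clock seconds) for one run."""
    start = time.perf_counter()
    count = three_sum_count(nums)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    import random

    random.seed(0)
    for n in (50, 100, 200):
        data = random.sample(range(-10**6, 10**6), n)   # N distinct integers
        count, secs = timed_three_sum(data)
        print(f"N={n:4d}  triples={count:3d}  T(N)={secs:.4f} s")
```

For example, `three_sum_count([30, -40, -20, -10, 40, 0, 10, 5])` returns 4. The four triples are (30, -40, 10), (30, -20, -10), (-40, 40, 0) and (-10, 0, 10).
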
  
Data analysis
• Standard plot. Plot running time T(N) vs. input size N.
Slide credit: Robert Sedgewick
  
Data analysis
• Log-log plot. Plot running time lg(T(N)) vs. input size lg N.
• Regression. Fit straight line through data points: a N^b.
• Hypothesis. The running time is about 1.006 × 10^−10 × N^2.999 seconds.
Slide credit: Robert Sedgewick
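
The regression step takes only a few lines of Python. On the log-log plot the model a N^b becomes the straight line lg(T(N)) = b lg N + c, with a = 2^c, so ordinary least squares on the logged data recovers both constants. The check below is a sketch only. Its inputs are noise-free points generated from the hypothesis on this slide, not measurements, and serve just to confirm that the fit recovers a and b.

```python
import math


def fit_power_law(ns, times):
    """Fit T(N) ~ a * N**b by least squares on lg(T(N)) = b * lg(N) + c.

    Returns (a, b) with a = 2**c.
    """
    xs = [math.log2(n) for n in ns]
    ys = [math.log2(t) for t in times]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope of the line on the log-log plot
    c = mean_y - b * mean_x       # intercept on the log-log plot
    return 2 ** c, b


# Synthetic, noise-free points generated from the hypothesis itself.
ns = [250, 500, 1000, 2000, 4000]
times = [1.006e-10 * n ** 2.999 for n in ns]
a, b = fit_power_law(ns, times)
print(f"a = {a:.4g}, b = {b:.4f}")
```

With real timings the points scatter around the line, and the slope b is the quantity of interest: a slope near 3 is consistent with the cubic brute-force algorithm.
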
  
Prediction and validation
• Hypothesis. The running time is about 1.006 × 10^−10 × N^2.999 seconds.
• Predictions.
  – 51.0 seconds for N = 8000.
  – 408.1 seconds for N = 16000.
• Observations.
Validates the hypothesis
Slide credit: Robert Sedgewick
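
Evaluating the hypothesis at the two input sizes reproduces the predicted values on this slide. This is a small arithmetic check. The function name is illustrative, and the constants are the ones in the hypothesis.

```python
def predicted_seconds(n, a=1.006e-10, b=2.999):
    """Power-law running-time hypothesis T(N) = a * N**b, in seconds."""
    return a * n ** b


for n in (8000, 16000):
    print(f"N = {n:5d}: {predicted_seconds(n):.1f} s")
# N =  8000: 51.0 s
# N = 16000: 408.1 s
```

Doubling N multiplies the prediction by 2^2.999, which is about 8. That is why the second prediction is roughly eight times the first.
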
  
Understanding performance of database queries
• Ganapathi et al. predict performance metrics of database queries prior to query execution using machine learning.
• Gupta et al. use machine learning for predicting query execution time ranges.

Ganapathi et al.: Predicting multiple metrics for queries: Better decisions enabled by machine learning. In Proc. of the 2009 IEEE ICDE.
Gupta et al.: PQR: Predicting query execution times for autonomous workload management. In Proc. of the 2008 ICAC.
  
Predicting SPARQL query execution time
• Key challenge. Feature engineering
  – Representing SPARQL queries as feature vectors
    • Each dimension of the vector is a feature
  
Configuration
• Apache Jena TDB
  – With DBpedia 3.8 dataset
• Training, validation, and test queries: randomly selected from the DBpedia SPARQL Benchmark (DBPSB) query dataset
  – 3600 training, 1200 validation, 1200 test
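
One way to draw a reproducible 3600/1200/1200 random split. This is a sketch under stated assumptions: the query pool is a Python list of at least 6000 distinct query strings, and the function name and seed are hypothetical. The slides do not say how the random selection was actually implemented.

```python
import random


def split_queries(queries, n_train=3600, n_valid=1200, n_test=1200, seed=42):
    """Randomly partition a query pool into disjoint train/validation/test sets.

    Assumes `queries` is a list of distinct query strings (hypothetical input).
    """
    needed = n_train + n_valid + n_test
    if len(queries) < needed:
        raise ValueError(f"need at least {needed} queries, got {len(queries)}")
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    sample = rng.sample(queries, needed)
    return (sample[:n_train],
            sample[n_train:n_train + n_valid],
            sample[n_train + n_valid:])
```

Sampling all 6000 queries in one call and then slicing the result guarantees that the three sets never overlap.
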
  
Jena ARQ query processing
• A SPARQL query in ARQ goes through several stages of processing:
  – String to Query (parsing)
  – Translation from Query to a SPARQL algebra expression
  – Optimization of the algebra expression
  – Query plan determination and low-level optimization
  – Evaluation of the query plan
  
SPARQL algebra features
• SPARQL Algebra¹

¹ http://www.w3.org/TR/sparql11-query/#sparqlQuery
  
SPARQL algebra features

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name ?nick WHERE {
   ?x foaf:mbox <mailto:person@server.com> .
   ?x foaf:name ?name
   OPTIONAL { ?x foaf:nick ?nick }
}

(distinct
  (project (?name ?nick)
    (leftjoin
      (bgp
        (triple ?x foaf:mbox <mailto:person@server.com>)
        (triple ?x foaf:name ?name))
      (bgp
        (triple ?x foaf:nick ?nick)))))

triple  bgp  join  leftjoin  …  project  distinct  …  depth
3       2    0     1         …  1        1         …  4
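
The algebra features amount to counting each operator in a query's algebra tree and recording the tree's depth. In this sketch, three things are assumptions: the tuple-based tree, the `node` helper, and the exact operator list. The slide elides most dimensions with "…". In ARQ the tree would come from the algebra translation stage rather than being written by hand.

```python
from collections import Counter

# Hypothetical subset of SPARQL algebra operators used as feature dimensions;
# the full list is elided on the slide.
OPERATORS = ["triple", "bgp", "join", "leftjoin", "union", "filter",
             "project", "distinct"]


def node(op, *children):
    """Build an algebra-tree node as (operator, [children])."""
    return (op, list(children))


def algebra_features(tree):
    """Count each operator in the tree, then append the tree depth."""
    counts = Counter()

    def walk(n, depth):
        op, children = n
        counts[op] += 1
        # Depth of the deepest node below this one (the root has depth 0).
        return max([depth] + [walk(c, depth + 1) for c in children])

    max_depth = walk(tree, 0)
    return [counts[op] for op in OPERATORS] + [max_depth]


# Algebra tree of the example query (operands such as variables omitted).
example = node("distinct",
               node("project",
                    node("leftjoin",
                         node("bgp", node("triple"), node("triple")),
                         node("bgp", node("triple")))))
print(dict(zip(OPERATORS + ["depth"], algebra_features(example))))
# {'triple': 3, 'bgp': 2, 'join': 0, 'leftjoin': 1, 'union': 0, 'filter': 0, 'project': 1, 'distinct': 1, 'depth': 4}
```
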
  
Experiment 1
• Model: Support Vector Machine regression
• Evaluation measure: R²
  – Measures how well future samples are likely to be predicted by the model.
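
For reference, the standard coefficient of determination can be computed directly. The slides do not spell out the formula, so this sketch uses the usual definition, 1 − SS_res / SS_tot. It is undefined when all actual values are equal, because SS_tot is then zero.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 is a perfect fit; 0.0 is no better than always predicting the mean
    of `actual`; negative values are worse than that baseline.
    """
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot
```

On this scale, the 0.004492 reported for Experiment 1 means the model explains almost none of the variance in the test execution times.
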
  
Experiment 1
• Test dataset R² = 0.004492

Log-scale plot of predicted vs. actual execution times for the test queries.
  
Experiment 1
Some of the long-running queries share structurally similar basic graph patterns.
{
  dbpedia:1549_Mikko ?p ?uri .
  ?uri rdf:type ?x
}
Challenge. How do we represent basic graph patterns as vectors?
  
Basic Graph Pattern Features
• Infinite number of possibilities to write a basic graph pattern (BGP)
• Even using only the set of literal values and the set of resources appearing in the RDF graph:
  – Exponential number of possibilities
  – A graph with n triples has 2^n subsets of triples
• Feature vector with an exponential number of dimensions
  – Not feasible
  
Basic Graph Pattern Features
• Pattern graph = RDF graph constructed from all the BGPs in a query
  – Replace variables with a fixed symbol '?'
• Cluster the training queries based on pattern graph similarities
• Create a vector with similarity scores between the pattern graph of the query and the pattern graphs of the queries at the cluster centers
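
The pattern-graph construction fits in a few lines. This is a minimal sketch that assumes each BGP arrives as a list of (subject, predicate, object) strings with prefixed names left unexpanded. A real implementation would take the triples from a SPARQL parser.

```python
def is_variable(term):
    """SPARQL variables start with '?' or '$'."""
    return term.startswith("?") or term.startswith("$")


def pattern_graph(bgps):
    """Merge all BGPs of a query into one set of triples, replacing every
    variable with the fixed symbol '?'."""
    return {tuple("?" if is_variable(t) else t for t in triple)
            for bgp in bgps for triple in bgp}


mikko = pattern_graph([[("dbpedia:1549_Mikko", "?p", "?uri"),
                        ("?uri", "rdf:type", "?x")]])
print(sorted(mikko))
```

Because all variables collapse to the same symbol, two queries that differ only in variable names yield identical pattern graphs. Queries that differ in one resource, such as dbpedia:1549_Mikko and dbpedia:Radu_Sabo, share every other triple.
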
  
• Graph Edit Distance
  – Minimum amount of distortion needed to transform one graph into another
  – Compute similarity by inverting the distance
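
A sketch of how a query's BGP feature vector could be assembled from distances to the cluster centers. The slides say only that the distance is inverted. The 1 / (1 + d) form is an assumption, chosen because it stays finite when d = 0. `triple_set_distance` is a hypothetical stand-in for illustration, not graph edit distance.

```python
def similarity(distance):
    """Turn a non-negative distance into a similarity in (0, 1].

    1 / (1 + d) is one common way to invert a distance (an assumption here).
    """
    return 1.0 / (1.0 + distance)


def bgp_features(query_graph, centers, distance):
    """One dimension per cluster center: similarity between the query's
    pattern graph and that center's pattern graph."""
    return [similarity(distance(query_graph, c)) for c in centers]


def triple_set_distance(g1, g2):
    """Crude stand-in for graph edit distance, for illustration only:
    the number of triples present in one pattern graph but not the other."""
    return len(set(g1) ^ set(g2))
```

The distance function is a parameter, so an approximate graph edit distance can be plugged in without changing how the vector is built.
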
  
• Graph Edit Distance
  – Usually computed using A* search
    • Exponential running time
  – Bipartite matching based approximate graph edit distance
    • Previous research shows very accurate results on classification problems
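
A compact sketch of the bipartite idea. It builds a square cost matrix of node substitutions, deletions and insertions, folds each node's local edge structure into its costs, and solves the whole thing as one assignment problem. Several choices are assumptions, not details from the slides: unit edit costs, edge labels compared as (direction, predicate) pairs, and brute-force assignment in place of the Hungarian algorithm. Because edges are costed locally at both endpoints, the result is an approximation and can exceed the exact distance.

```python
from collections import Counter
from itertools import permutations

INF = float("inf")


def _graph(triples):
    """Return the sorted node list and, per node, its incident edge labels
    as (direction, predicate) pairs."""
    nodes = sorted({term for s, _, o in triples for term in (s, o)})
    incident = {node: [] for node in nodes}
    for s, p, o in triples:
        incident[s].append(("out", p))
        incident[o].append(("in", p))
    return nodes, incident


def _edge_cost(a, b):
    """Unit-cost matching of two edge-label multisets: equal labels match for
    free; every other edge is substituted, inserted or deleted at cost 1."""
    common = sum((Counter(a) & Counter(b)).values())
    return max(len(a), len(b)) - common


def approx_ged(triples1, triples2):
    """Approximate graph edit distance between two pattern graphs."""
    nodes1, inc1 = _graph(triples1)
    nodes2, inc2 = _graph(triples2)
    n, m = len(nodes1), len(nodes2)
    size = n + m
    # (n+m) x (n+m) cost matrix:
    #   [ substitutions (n x m)             | deletions, diagonal only (n x n) ]
    #   [ insertions, diagonal only (m x m) | zeros (m x n)                    ]
    cost = [[0.0] * size for _ in range(size)]
    for i, u in enumerate(nodes1):
        for j, v in enumerate(nodes2):
            cost[i][j] = (u != v) + _edge_cost(inc1[u], inc2[v])
        for j in range(n):
            cost[i][m + j] = 1 + len(inc1[u]) if i == j else INF
    for i, v in enumerate(nodes2):
        for j in range(m):
            cost[n + i][j] = 1 + len(inc2[v]) if i == j else INF
    # Solve the assignment exactly by brute force; acceptable only for tiny
    # pattern graphs (the Hungarian algorithm runs in polynomial time).
    return min(sum(cost[row][col] for row, col in enumerate(perm))
               for perm in permutations(range(size)))
```

For the pattern graphs of the dbpedia:1549_Mikko and dbpedia:Radu_Sabo queries, `approx_ged` returns 1, the cost of relabeling one node.
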
  
• Clustering training queries
  – k-medoids clustering algorithm with approximate edit distance as the distance function
    • Selects data points as cluster centers
    • Arbitrary distance function
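
Here is a simple alternating ("Voronoi iteration") form of k-medoids, shown because it accepts any distance function. This is a sketch, not necessarily the variant used in the experiments, which could be PAM or another. Seeding with the first k points is an arbitrary, hypothetical choice, and points are assumed to be distinct.

```python
def k_medoids(points, k, dist, max_iter=100):
    """Cluster `points` into k groups whose centers are actual data points.

    Alternates assignment and medoid-update steps. `dist` can be any distance
    function, for example an approximate graph edit distance between pattern
    graphs. Initial medoids are the first k points (an arbitrary choice).
    """
    medoids = list(range(k))

    def assign(meds):
        clusters = {m: [] for m in meds}
        for i, p in enumerate(points):
            nearest = min(meds, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        return clusters

    for _ in range(max_iter):
        clusters = assign(medoids)
        new_medoids = []
        for m, members in clusters.items():
            members = members or [m]
            # New medoid: the member with the smallest total distance to the rest.
            best = min(members,
                       key=lambda c: sum(dist(points[c], points[o]) for o in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids], assign(medoids)
```

The centers are always real training queries, unlike k-means centroids. That is what makes the "similarity to each cluster center" features computable with a graph distance.
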
  
Experiment 2
• Model: Support Vector Machine regression
• Test dataset R² = 0.124204
• K = 10
[Figure panels: Algebra features; Algebra + BGP features]
  
Multiple Regressions
• We train different SVM regressions for different time ranges.
• The variance in the y-axis is smaller for each regression, which makes it easier to fit a curve.
  
• Different time ranges
  – Clustering the execution time ranges
    • We use the x-means clustering algorithm, which automatically estimates the number of clusters
  – 5 clusters found in the training dataset
  – Each cluster contains queries with similar execution times
  
• Predicting execution time range
  – Predict the corresponding clusters for unseen queries.
  – How?
    • Train an SVM classifier with the found clusters as labels
    • Classify unseen queries: accuracy of 96% on the test dataset
• This means we can accurately predict time ranges
  
• Predicting execution time
  – Different SVM regressions for different time ranges.
  – Use the regression corresponding to the time-range cluster of an unseen query
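
The two-stage prediction can be sketched as a small wrapper. A classifier picks the time-range cluster, and the regressor trained only on that cluster's queries predicts the time. The class names here are hypothetical. The toy stand-in models keep the sketch dependency-free. scikit-learn's `SVC` and `SVR` follow the same `fit`/`predict` convention and could be passed as the factories instead.

```python
class TwoStagePredictor:
    """Classify a query into an execution-time-range cluster, then predict its
    time with the regressor trained on that cluster only."""

    def __init__(self, make_classifier, make_regressor):
        self.make_classifier = make_classifier
        self.make_regressor = make_regressor

    def fit(self, X, y, range_labels):
        self.classifier = self.make_classifier().fit(X, range_labels)
        self.regressors = {}
        for label in set(range_labels):
            idx = [i for i, lab in enumerate(range_labels) if lab == label]
            self.regressors[label] = self.make_regressor().fit(
                [X[i] for i in idx], [y[i] for i in idx])
        return self

    def predict(self, X):
        labels = self.classifier.predict(X)
        return [self.regressors[lab].predict([x])[0] for x, lab in zip(X, labels)]


class MeanRegressor:
    """Toy stand-in regressor: always predicts the training mean."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean] * len(X)


class NearestCentroidClassifier:
    """Toy stand-in classifier: label of the closest class centroid."""
    def fit(self, X, labels):
        groups = {}
        for x, lab in zip(X, labels):
            groups.setdefault(lab, []).append(x)
        self.centroids = {lab: [sum(col) / len(xs) for col in zip(*xs)]
                          for lab, xs in groups.items()}
        return self

    def predict(self, X):
        def d2(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids, key=lambda lab: d2(x, self.centroids[lab]))
                for x in X]
```

A misclassified query gets its time from the wrong range's regressor. This is why the 96% classification accuracy matters for the final R².
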
  
Experiment 3
• Test dataset R² = 0.83862
[Figure panels: Algebra + BGP features; Multiple regressions]
  
Predicting with nearest neighbors regression
• The k-nearest neighbors algorithm (k-NN) is often successful in cases where the decision boundary is irregular.
• We train a k-NN with
  – Euclidean distance as the distance function
  – Distance weighting: weighted by the inverse of the distance
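
A minimal sketch of distance-weighted k-NN regression as described: Euclidean distance, with each neighbor weighted by 1 / distance. A linear scan stands in for the k-d tree search. The rule for an exact match (distance 0, where the inverse weight is undefined) is an assumption: such a query returns the average of the matching neighbors' values.

```python
import math


def knn_regress(train_X, train_y, query, k=2):
    """Distance-weighted k-NN regression with Euclidean distance."""
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y))[:k]
    exact = [y for d, y in neighbors if d == 0.0]
    if exact:                      # 1/0 is undefined: fall back to exact matches
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)
```

For instance, with training points [0.0] → 10.0 and [1.0] → 20.0, the query [0.25] gets weights 4 and 4/3, giving a prediction of 12.5.
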
  
• k-dimensional tree (k-d tree) data structure to search the nearest neighbors
  – a space-partitioning data structure for organizing points in a k-dimensional space
• Complexity of a search: O(log N) operations on average (O(N) in the worst case)
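
A compact k-d tree with 1-nearest-neighbor search, in pure Python. This is a sketch. Median splits and cycling through the axes are the textbook choices, not necessarily those of the library used in the experiments. Extending it to k neighbors would mean keeping a bounded heap of the best k candidates instead of a single best.

```python
import math


class KDNode:
    __slots__ = ("point", "value", "axis", "left", "right")

    def __init__(self, point, value, axis, left, right):
        self.point, self.value, self.axis = point, value, axis
        self.left, self.right = left, right


def build_kdtree(items, depth=0):
    """items: list of (point, value) pairs; points are equal-length tuples."""
    if not items:
        return None
    axis = depth % len(items[0][0])          # cycle through the coordinates
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2                    # median split keeps the tree balanced
    point, value = items[mid]
    return KDNode(point, value, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (distance, point, value) of the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point, node.value)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    if abs(diff) < best[0]:     # splitting plane closer than best match: check far side
        best = nearest(far, query, best)
    return best
```

The pruning test is what gives the average-case speedup. A whole subtree is skipped whenever the splitting plane is farther away than the best match found so far.
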
  
Experiment 4
• Test dataset R² = 0.837
• k = 2 for k-NN (selected by cross-validation)
[Figure panels: Multiple regressions; k-NN]
  
• Future work
  – Training data with broad coverage
    • DBpedia SPARQL benchmark query templates
      – Berlin: 5 templates
      – DBPSB: 20 templates
  – Fine tuning with more cross-validation
  
SUGGESTING SPARQL QUERIES
  
Suggesting SPARQL queries based on query history
• Use the same features
• Construct a k-d tree for nearest neighbor search
• Top M neighbors for a query are the top M suggestions for that query
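
The suggestion step reduces to returning the M nearest past queries in feature space. In this sketch, a linear scan with `heapq.nsmallest` stands in for the k-d tree. The (query_text, feature_vector) layout of the history and the function name are hypothetical. If the history already contains the query itself, the caller would want to drop it from the results.

```python
import heapq
import math


def suggest(query_vec, history, m=3):
    """Return the m past queries whose feature vectors are closest
    (Euclidean) to `query_vec`, nearest first.

    history: list of (query_text, feature_vector) pairs.
    """
    return [text for _, text in heapq.nsmallest(
        m, ((math.dist(vec, query_vec), text) for text, vec in history))]
```
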
  
Example
SELECT DISTINCT ?uri
WHERE {
  dbpedia:1549_Mikko ?p ?uri .
  ?uri rdf:type ?x
}

Suggestion 1
SELECT DISTINCT ?uri
WHERE {
  dbpedia:Radu_Sabo ?p ?uri .
  ?uri rdf:type ?x
}

Suggestion 2
SELECT DISTINCT ?uri
WHERE {
  dbpedia:Hafar_Al-Batin ?p ?uri .
  ?uri rdf:type ?x
}

Suggestion 3
SELECT DISTINCT ?uri
WHERE {
  dbpedia:Maurice_D._G._Scott ?p ?uri .
  ?uri rdf:type ?x
}
  
• Future work
  – Query construction and refinement workflow
    • How to use the query suggestions?
  – Evaluating the suggestions
    • User study
  
Thank you
  

 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Predicting SPARQL query execution time and suggesting SPARQL queries based on query history

  • 1. Predicting SPARQL Query Execution Time and Suggesting SPARQL Queries Based on Query History. Rakebul Hasan
  • 2. Context • Assisting human users and software agents in: – Querying Semantic Web data • Understanding query behavior: predicting query performance – Workload management, query scheduling, query optimization • Constructing and refining queries: suggesting alternatives – Consuming Semantic Web data • Understanding the reasoning of Semantic Web software agents: explaining reasoning – Transparency, trust, scrutability, decision effectiveness, decision efficiency, user satisfaction
  • 3. Outline • Predicting SPARQL query execution time • Suggesting similar SPARQL queries from query history
  • 4. PREDICTING SPARQL QUERY EXECUTION TIME
  • 5. • Accurately predicting query performance enables effective – workload management – query scheduling – query optimization
  • 6. Understanding performance of computer programs. Insight [Knuth]: use the scientific method to understand performance.
  • 7. Scientific method applied to analysis of algorithms • A framework for predicting performance and comparing algorithms. • Scientific method – Observe some feature of the natural world. – Hypothesize a model that is consistent with the observations. – Predict events using the hypothesis. – Verify the predictions by making further observations. – Validate by repeating until the hypothesis and observations agree. • Principles – Experiments must be reproducible. – Hypotheses must be falsifiable. • Feature of the natural world: the computer itself. Slide credit: Robert Sedgewick
  • 8. Example: 3-SUM • 3-SUM: given N distinct integers, how many triples sum to exactly zero? • 3-SUM brute-force algorithm: check all possible triples. • How much time does it take? Slide credit: Robert Sedgewick
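The brute-force algorithm can be sketched in Python as follows. It examines each of the C(N, 3) triples exactly once, which is why its running time grows roughly as N³; the input list is just an illustrative example.

```python
from itertools import combinations

def three_sum_count(nums):
    """Count triples (by position) whose values sum to exactly zero.

    Brute force: every one of the C(N, 3) triples is checked once,
    so the running time grows cubically with N.
    """
    return sum(1 for a, b, c in combinations(nums, 3) if a + b + c == 0)

print(three_sum_count([30, -40, -20, -10, 40, 0, 10, 5]))  # prints 4
```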
  • 9. Data analysis • Standard plot: plot running time T(N) vs. input size N. Slide credit: Robert Sedgewick
  • 10. Data analysis • Log-log plot: plot running time lg(T(N)) vs. input size lg N. • Regression: fit a straight line $\lg T(N) = b \lg N + c$ through the data points, which corresponds to the power law $T(N) = a N^{b}$ with $a = 2^{c}$. • Hypothesis: the running time is about $1.006 \times 10^{-10} \times N^{2.999}$ seconds. Slide credit: Robert Sedgewick
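The regression step can be sketched as an ordinary least-squares fit of lg T(N) against lg N: the slope is the exponent b, and 2 raised to the intercept is the constant a. The timings in the usage lines are synthetic values generated from an exact cubic, not the measurements behind the slide.

```python
import math

def fit_power_law(ns, times):
    """Fit T(N) = a * N**b by least squares on the points (lg N, lg T(N))."""
    xs = [math.log2(n) for n in ns]
    ys = [math.log2(t) for t in times]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    # The slope of the line in log-log space is the exponent b.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The intercept c satisfies lg T = b * lg N + c, so a = 2**c.
    c = mean_y - b * mean_x
    return 2 ** c, b

# Synthetic timings from T(N) = 1e-10 * N**3 (illustrative, not measured).
ns = [250, 500, 1000, 2000, 4000]
a, b = fit_power_law(ns, [1e-10 * n ** 3 for n in ns])
```

With real, noisy timings the fitted exponent would come out close to, but not exactly, 3, as in the 2.999 on the slide.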
  • 11. Prediction and validation • Hypothesis: the running time is about $1.006 \times 10^{-10} \times N^{2.999}$ seconds. • Predictions: – 51.0 seconds for N = 8000. – 408.1 seconds for N = 16000. • Observations: the measured running times validate the hypothesis. Slide credit: Robert Sedgewick
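Evaluating the hypothesis at the two input sizes reproduces the predicted times. Since T(2N)/T(N) = 2^2.999, or about 8, doubling N should multiply the running time by roughly eight, which is what the two predictions show.

```python
def predicted_time(n, a=1.006e-10, b=2.999):
    """Running time in seconds predicted by the hypothesis T(N) = a * N**b."""
    return a * n ** b

for n in (8000, 16000):
    print(n, round(predicted_time(n), 1))
# prints:
# 8000 51.0
# 16000 408.1
```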
  • 12. Understanding performance of database queries • Ganapathi et al. predict performance metrics of database queries prior to query execution using machine learning. • Gupta et al. use machine learning to predict query execution time ranges. Ganapathi et al.: Predicting multiple metrics for queries: Better decisions enabled by machine learning. In Proc. of the 2009 IEEE ICDE. Gupta et al.: PQR: Predicting query execution times for autonomous workload management. In Proc. of the 2008 ICAC.
  • 13. Predicting SPARQL query execution time • Key challenge: feature engineering – Representing SPARQL queries as feature vectors • Each dimension of the vector is a feature
  • 14. Configuration • Apache Jena TDB – With the DBpedia 3.8 dataset • Training, validation, and test queries: randomly selected from the DBpedia SPARQL Benchmark (DBPSB) query dataset – 3600 training, 1200 validation, 1200 test
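A random split with these sizes might look like the following. The shuffle-then-slice approach and the fixed seed are assumptions chosen for reproducibility, not details given on the slides, and the query identifiers are hypothetical placeholders for DBPSB queries.

```python
import random

def split_queries(queries, n_train=3600, n_valid=1200, n_test=1200, seed=42):
    """Randomly partition queries into disjoint training, validation and test sets."""
    if len(queries) < n_train + n_valid + n_test:
        raise ValueError("not enough queries for the requested split")
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test

# Hypothetical query identifiers standing in for the DBPSB queries.
train, valid, test = split_queries([f"q{i}" for i in range(6000)])
```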
  • 15. Jena ARQ query processing • A SPARQL query in ARQ goes through several stages of processing: – String to Query (parsing) – Translation from Query to a SPARQL algebra expression – Optimization of the algebra expression – Query plan determination and low-level optimization – Evaluation of the query plan
  • 16. SPARQL algebra features • SPARQL Algebra¹ ¹ http://www.w3.org/TR/sparql11-query/#sparqlQuery
  • 17. SPARQL algebra features. Example query: PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?nick WHERE { ?x foaf:mbox <mailto:person@server.com> . ?x foaf:name ?name OPTIONAL { ?x foaf:nick ?nick } } Algebra expression: (distinct (project (?name ?nick) (leftjoin (bgp (triple ?x foaf:mbox <mailto:person@server.com>) (triple ?x foaf:name ?name)) (bgp (triple ?x foaf:nick ?nick))))) Feature vector: one dimension per algebra operator (triple, bgp, join, leftjoin, …, project, distinct, …) plus the depth of the algebra tree; for this query, triple = 3, bgp = 2, join = 0, leftjoin = 1, project = 1, distinct = 1.
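Extracting such features can be sketched as a walk over the algebra tree that counts each operator and records the depth of the tree. The nested-tuple encoding of the tree (with variable lists as Python lists so they are treated as leaves), the particular operator list, and counting depth from 1 at the root are all assumptions made for this illustration; in ARQ the tree would come from translating the parsed query.

```python
from collections import Counter

# Hypothetical subset of SPARQL algebra operators used as feature dimensions.
OPERATORS = ["triple", "bgp", "join", "leftjoin", "filter", "union",
             "project", "distinct"]

def algebra_features(tree):
    """Return operator counts (in OPERATORS order) followed by the tree depth.

    A node is a tuple (operator, child, child, ...); anything else (variable,
    IRI, list of projected variables) is a leaf and adds no count or depth.
    """
    counts = Counter()

    def walk(node):
        if not isinstance(node, tuple):
            return 0
        counts[node[0]] += 1
        return 1 + max((walk(child) for child in node[1:]), default=0)

    depth = walk(tree)
    return [counts[op] for op in OPERATORS] + [depth]

# The example algebra expression, encoded as nested tuples.
expr = ("distinct",
        ("project", ["?name", "?nick"],
         ("leftjoin",
          ("bgp",
           ("triple", "?x", "foaf:mbox", "<mailto:person@server.com>"),
           ("triple", "?x", "foaf:name", "?name")),
          ("bgp",
           ("triple", "?x", "foaf:nick", "?nick")))))

print(algebra_features(expr))  # [3, 2, 0, 1, 0, 0, 1, 1, 5]
```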
  • 18. Experiment 1 • Model: Support Vector Machine (SVM) regression • Evaluation measure: R² (coefficient of determination) • Measures how well future samples are likely to be predicted by the model.
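R² compares the model's squared errors with those of a baseline that always predicts the mean: R² = 1 − SS_res / SS_tot. A value of 1 is a perfect fit, 0 is no better than the mean, and on held-out test data it can even be negative. A minimal implementation:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
print(r_squared(actual, actual))     # perfect predictions: 1.0
print(r_squared(actual, [2.5] * 4))  # always predicting the mean: 0.0
```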
  • 19. Experiment 1 • Test dataset R² = 0.004492 (plot: log-scale plot of predicted vs. actual execution times for the test queries)
  • 20. Experiment 1 Some of the long-running queries share structurally similar basic graph patterns, e.g. { dbpedia:1549_Mikko ?p ?uri . ?uri rdf:type ?x } Challenge: how do we represent basic graph patterns as vectors?
  • 21. Basic Graph Pattern Features • There are infinitely many ways to write a basic graph pattern (BGP) • Even restricting BGPs to the literal values and resources that appear in the RDF graph leaves an exponential number of possibilities – A graph with n triples has $2^{n}$ subsets of triples • A feature vector with an exponential number of dimensions – Not feasible
  • 22. Basic Graph Pattern Features • Pattern graph = the RDF graph constructed from all the BGPs in a query – Replace variables with a fixed symbol '?' • Cluster the training queries based on pattern graph similarities • Create a vector of similarity scores between the pattern graph of the query and the pattern graphs of the queries at the cluster centers.
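Building a pattern graph can be sketched as collecting the triple patterns of every BGP in a query and replacing each variable with '?'. Representing triple patterns as tuples of strings and detecting variables by a leading '?' or '$' are simplifying assumptions; a real implementation would take the BGPs from a SPARQL parser such as Jena ARQ. Because an RDF graph is a set of triples, patterns that become identical after normalization collapse into one triple.

```python
def pattern_graph(bgps):
    """Merge the triple patterns of all BGPs into one graph, variables replaced by '?'.

    bgps: iterable of BGPs, each an iterable of (subject, predicate, object) strings.
    Returns a set of triples, i.e. an RDF-style graph.
    """
    def normalize(term):
        return "?" if term.startswith(("?", "$")) else term

    return {
        tuple(normalize(term) for term in triple)
        for bgp in bgps
        for triple in bgp
    }

# The long-running query from Experiment 1: one BGP with two triple patterns.
query_bgps = [[("dbpedia:1549_Mikko", "?p", "?uri"), ("?uri", "rdf:type", "?x")]]
print(sorted(pattern_graph(query_bgps)))
# [('?', 'rdf:type', '?'), ('dbpedia:1549_Mikko', '?', '?')]
```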
  • 23. • Graph Edit Distance – Minimum amount of distortion needed to transform one graph into another – Compute similarity by inverting the distance
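Turning distances into the BGP feature vector can be sketched as below: one dimension per cluster center, each holding the similarity between the query's pattern graph and that center. The slides only say that similarity is obtained by inverting the distance, so the specific form 1 / (1 + d), which stays finite when d = 0, is an assumption, and `toy_distance` is a hypothetical stand-in for a real graph edit distance.

```python
from typing import Callable, List, Sequence, TypeVar

G = TypeVar("G")

def similarity(distance: float) -> float:
    """Map a non-negative distance to a similarity in (0, 1]; distance 0 gives 1."""
    return 1.0 / (1.0 + distance)

def bgp_feature_vector(graph: G, centers: Sequence[G],
                       distance: Callable[[G, G], float]) -> List[float]:
    """One dimension per cluster center: similarity between the query and that center."""
    return [similarity(distance(graph, center)) for center in centers]

def toy_distance(g1, g2):
    """Hypothetical stand-in for graph edit distance: number of unshared triples."""
    return len(set(g1) ^ set(g2))

centers = [{("?", "rdf:type", "?")},
           {("?", "rdf:type", "?"), ("?", "rdfs:label", "?")}]
query = {("?", "rdf:type", "?")}
print(bgp_feature_vector(query, centers, toy_distance))  # [1.0, 0.5]
```

With K cluster centers the vector has K dimensions, and it can be appended to the algebra features.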
  • 24. • Graph Edit Distance – Usually computed using A* search • Exponential running time – Approximate graph edit distance based on bipartite matching • Previous research shows very accurate results on classification problems
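The usual idea behind the bipartite approximation is to replace the search over complete edit paths with a single assignment problem between the nodes of the two graphs. Each cell of the cost matrix estimates the cost of substituting, deleting, or inserting one node together with its incident edges. The sketch below assumes unit label costs, ignores edge labels, and uses the degree difference as the local edge estimate. To stay within the standard library it solves the assignment by exhaustive search, which is only viable for tiny graphs; the bipartite approach instead solves that step in cubic time with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations

INF = float("inf")

def approx_ged(labels1, edges1, labels2, edges2):
    """Assignment-based estimate of the graph edit distance between two graphs.

    labels*: dict mapping node -> label; edges*: list of (u, v) pairs, treated
    as undirected and unlabeled (a simplification). Unit costs are assumed.
    """
    def degrees(labels, edges):
        deg = {node: 0 for node in labels}
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        return deg

    nodes1, nodes2 = list(labels1), list(labels2)
    deg1, deg2 = degrees(labels1, edges1), degrees(labels2, edges2)
    n, m = len(nodes1), len(nodes2)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i, u in enumerate(nodes1):
        for j, v in enumerate(nodes2):
            # Substitution block: label mismatch plus degree difference as a
            # cheap local estimate of the edge edits this match implies.
            cost[i][j] = (labels1[u] != labels2[v]) + abs(deg1[u] - deg2[v])
        for k in range(n):
            # Deletion block: node u can only be deleted via its own diagonal cell.
            cost[i][m + k] = 1 + deg1[u] if k == i else INF
    for j, v in enumerate(nodes2):
        for k in range(m):
            # Insertion block: node v can only be inserted via its own diagonal cell.
            cost[n + k][j] = 1 + deg2[v] if k == j else INF
    # The remaining dummy-to-dummy block stays 0.

    # Exhaustive search over assignments: only viable for tiny graphs.
    return min(
        sum(cost[row][col] for row, col in enumerate(perm))
        for perm in permutations(range(size))
    )

g1 = ({"s": "dbpedia:1549_Mikko", "o": "?"}, [("s", "o")])
g2 = ({"s": "dbpedia:Radu_Sabo", "o": "?"}, [("s", "o")])
print(approx_ged(*g1, *g2))  # 1.0
```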
  • 25. • Clustering Training Queries – k-medoids clustering algorithm with the approximate edit distance as the distance function • Selects data points as cluster centers • Works with an arbitrary distance function
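A minimal k-medoids sketch shows why the method suits this setting. It only ever calls the distance function, so an approximate edit distance can be plugged in directly, and its centers are always actual training items. This is the simple alternating variant, not the full PAM swap search; the random initialization and iteration limit are assumptions, and the one-dimensional numbers in the usage line stand in for pattern graphs.

```python
import random

def k_medoids(items, k, dist, max_iter=100, seed=0):
    """Cluster items around k medoids (indices into items) using any distance."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(items)), k)
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(len(items)):
            nearest = min(medoids, key=lambda m: dist(items[i], items[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance to its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members, key=lambda c: sum(dist(items[c], items[o]) for o in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids), clusters

medoids, clusters = k_medoids([1, 2, 3, 100, 101, 102], k=2, dist=lambda a, b: abs(a - b))
print(medoids)  # [1, 4]
```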
  • 26. Experiment 2 • Model: Support Vector Machine regression • Test dataset R² = 0.124204 • K = 10 (plots: Algebra features; Algebra + BGP features)
  • 27. Multiple Regressions • We train different SVM regressions for different time ranges. • The variance along the y-axis is smaller for each regression, so it is easier to fit a curve.
  • 28. • Different time ranges – Clustering the execution time ranges • We use the x-means clustering algorithm, which automatically estimates the number of clusters – 5 clusters found in the training dataset – Each cluster contains queries with similar execution times
  • 29. • Predicting the execution time range – Predict the corresponding clusters for unseen queries. – How: • Train an SVM classifier with the found clusters as labels • Classify unseen queries: accuracy of 96% on the test dataset • This means we can accurately predict time ranges
  • 30. • Predicting execution time – Different SVM regressions for different time ranges. – For an unseen query, use the regression that corresponds to its predicted time-range cluster
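The two-stage scheme can be sketched independently of the model type: group the training queries by their time-range cluster, train one regressor per group, then route each unseen query through the classifier to its cluster's regressor. In the slides both stages are SVMs; here the models are plain callables, and the threshold classifier and mean-value "regressor" in the usage lines are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Regressor = Callable[[Vector], float]

def fit_per_cluster(X: List[Vector], times: List[float], clusters: List[int],
                    train_regressor: Callable[[List[Vector], List[float]], Regressor]
                    ) -> Dict[int, Regressor]:
    """Train one regressor on the queries of each execution-time cluster."""
    groups = defaultdict(lambda: ([], []))
    for x, t, c in zip(X, times, clusters):
        groups[c][0].append(x)
        groups[c][1].append(t)
    return {c: train_regressor(xs, ts) for c, (xs, ts) in groups.items()}

def predict_time(x: Vector, classify: Callable[[Vector], int],
                 regressors: Dict[int, Regressor]) -> float:
    """Stage 1: predict the time-range cluster; stage 2: apply that cluster's regressor."""
    return regressors[classify(x)](x)

# Hypothetical stand-ins for the trained SVM classifier and SVM regressions.
def train_mean(xs, ts):
    mean = sum(ts) / len(ts)
    return lambda x: mean

def classify(x):
    return 0 if x[0] < 10 else 1

regressors = fit_per_cluster([[1.0], [2.0], [50.0], [60.0]], [0.1, 0.3, 5.0, 7.0],
                             [0, 0, 1, 1], train_mean)
```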
  • 31. Experiment 3 • Test dataset R² = 0.83862 (plots: Algebra + BGP features; Multiple regressions)
  • 32. Predicting with nearest neighbors regression • The k-nearest neighbors algorithm (k-NN) is often successful in cases where the decision boundary is irregular. • We train a k-NN with – Euclidean distance as the distance function – Distance weighting: each neighbor is weighted by the inverse of its distance
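An inverse-distance-weighted k-NN regressor can be sketched in a few lines: find the k training queries closest to the new query in Euclidean distance and average their execution times, giving closer neighbors more weight. The default k = 2 follows Experiment 4; returning an exact match directly is a design choice here to avoid dividing by zero.

```python
import math

def knn_predict(train_x, train_y, query, k=2):
    """Inverse-distance-weighted k-NN regression with Euclidean distance."""
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # An exact match would get infinite weight, so return its target directly.
    for d, y in neighbors:
        if d == 0:
            return y
    weights = [1 / d for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

train_x = [[0.0], [1.0], [3.0]]
train_y = [10.0, 20.0, 40.0]
print(knn_predict(train_x, train_y, [2.0]))  # equidistant from 1.0 and 3.0: 30.0
```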
  • 33. • k-dimensional tree (k-d tree) data structure to search for the nearest neighbors – A space-partitioning data structure for organizing points in a k-dimensional space • Complexity of a search: O(log N) operations on average, for a balanced tree with randomly distributed points
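A compact k-d tree with nearest-neighbor search can be sketched as follows. Each level splits the points at the median along one coordinate, cycling through the dimensions. The search descends toward the query first and only visits the far side of a split when the splitting plane is closer than the best match found so far; this pruning is what gives the logarithmic average case. The nested-dict node layout is an illustrative choice.

```python
import math

def build_kdtree(points, depth=0):
    """Build a k-d tree as nested dicts; split axes cycle through the coordinates."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    diff = target[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Search the other side only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, best)
    return best

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # (8, 1)
```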
  • 34. Experiment 4 • Test dataset R² = 0.837 • k = 2 for k-NN (selected by cross-validation) (plots: Multiple regressions; k-NN)
  • 35. • Future work – Training data with broad coverage • DBpedia SPARQL benchmark query templates – Berlin: 5 templates – DBPSB: 20 templates – Fine-tuning with more cross-validation
  • 37. Suggesting SPARQL queries based on query history • Use the same features • Construct a k-d tree for nearest neighbor search • The top M neighbors of a query are the top M suggestions for that query
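Ranking the query history by distance in feature space can be sketched with a brute-force scan and a heap. The k-d tree plays the same role more efficiently for large histories. The query identifiers and two-dimensional feature vectors in the usage lines are hypothetical.

```python
import heapq
import math

def suggest(history, query_vector, m=3):
    """Return the ids of the m past queries whose feature vectors are closest.

    history: list of (query_id, feature_vector) pairs from the query log.
    """
    closest = heapq.nsmallest(
        m, history, key=lambda item: math.dist(item[1], query_vector)
    )
    return [query_id for query_id, _ in closest]

history = [("q1", [1.0, 0.0]), ("q2", [0.9, 0.1]), ("q3", [0.0, 1.0]), ("q4", [1.0, 0.2])]
print(suggest(history, [1.0, 0.05], m=2))  # ['q1', 'q2']
```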
  • 38. Example Query: SELECT DISTINCT ?uri WHERE { dbpedia:1549_Mikko ?p ?uri . ?uri rdf:type ?x } Suggestion 1: SELECT DISTINCT ?uri WHERE { dbpedia:Radu_Sabo ?p ?uri . ?uri rdf:type ?x } Suggestion 2: SELECT DISTINCT ?uri WHERE { dbpedia:Hafar_Al-Batin ?p ?uri . ?uri rdf:type ?x } Suggestion 3: SELECT DISTINCT ?uri WHERE { dbpedia:Maurice_D._G._Scott ?p ?uri . ?uri rdf:type ?x }
  • 39. • Future work – Query construction and refinement workflow • How to use the query suggestions? – Evaluating the suggestions • User study