Results of the second worldwide evaluation campaign for semantic tools

© the SEALS Project
http://www.seals-project.eu/
2nd SEALS Yardsticks for Ontology Management

•  Conformance and interoperability results
•  Scalability results
•  Conclusions
Conformance evaluation

•  Ontology language conformance
   –  The ability to adhere to existing ontology language specifications
•  Goal: to evaluate the conformance of semantic technologies with regard to ontology representation languages

Step 1 (Import + Export): O1 is imported into Tool X (becoming O1') and exported again (O1'').
O1 = O1'' + α - α'
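In code, one such import + export step looks roughly as follows. This is a minimal sketch using the OWL API (4.x) as the tool under test; the file names are placeholders, and the axiom-level set difference is a simplification of how α and α' are measured, not the SEALS harness itself.

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.formats.RDFXMLDocumentFormat;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;

import java.io.File;
import java.util.HashSet;
import java.util.Set;

public class ConformanceStep {
    public static void main(String[] args) throws Exception {
        // Import the original ontology O1 into the tool (here: the OWL API).
        OWLOntologyManager m1 = OWLManager.createOWLOntologyManager();
        OWLOntology o1 = m1.loadOntologyFromOntologyDocument(new File("O1.rdf"));

        // Export the tool's internal model (O1') back to a file.
        File exported = new File("O1-exported.rdf");
        m1.saveOntology(o1, new RDFXMLDocumentFormat(), IRI.create(exported));

        // Re-import the exported file as O1'' and diff it against O1 at the
        // axiom level to approximate the information added and lost.
        OWLOntologyManager m2 = OWLManager.createOWLOntologyManager();
        OWLOntology o1bis = m2.loadOntologyFromOntologyDocument(exported);

        Set<OWLAxiom> added = new HashSet<>(o1bis.getAxioms()); // in O1'' but not in O1
        added.removeAll(o1.getAxioms());
        Set<OWLAxiom> lost = new HashSet<>(o1.getAxioms());     // in O1 but not in O1''
        lost.removeAll(o1bis.getAxioms());

        System.out.println("added: " + added.size() + ", lost: " + lost.size());
    }
}
```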
Metrics

•  Execution informs about the correct execution:
   –  OK: no execution problem
   –  FAIL: some execution problem
   –  Platform Error (P.E.): platform exception
•  Information added or lost, in terms of triples, axioms, etc.: Oi = Oi' + α - α'
•  Conformance informs whether the ontology has been processed correctly, with no addition or loss of information (is Oi = Oi'?):
   –  SAME if Execution is OK and Information added and Information lost are void
   –  DIFFERENT if Execution is OK but Information added or Information lost are not void
   –  NO if Execution is FAIL or P.E.
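The verdict follows directly from these definitions; a minimal sketch (the enum and method names are ours, not part of the SEALS API):

```java
// Illustrative helper mirroring the metric definitions above.
public class ConformanceVerdict {
    enum Execution { OK, FAIL, PLATFORM_ERROR }
    enum Conformance { SAME, DIFFERENT, NO }

    static Conformance verdict(Execution exec, int added, int lost) {
        if (exec != Execution.OK) return Conformance.NO;   // FAIL or P.E.
        return (added == 0 && lost == 0)
                ? Conformance.SAME        // alpha and alpha' are void
                : Conformance.DIFFERENT;  // some addition or loss of information
    }
}
```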
Interoperability evaluation

•  Ontology language interoperability
   –  The ability to interchange ontologies and use them
•  Goal: to evaluate the interoperability of semantic technologies in terms of their ability to interchange ontologies and use them

Step 1 (Import + Export by Tool X): O1 → O1' → O1'', with O1 = O1'' + α - α'
Step 2 (Import + Export by Tool Y): O1'' → O1''' → O1'''', with O1'' = O1'''' + β - β'
Interchange: O1 = O1'''' + α - α' + β - β'
Metrics

•  Execution informs about the correct execution:
   –  OK: no execution problem
   –  FAIL: some execution problem
   –  Platform Error (P.E.): platform exception
   –  Not Executed (N.E.): second step not executed
•  Information added or lost, in terms of triples, axioms, etc.: Oi = Oi' + α - α'
•  Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information (is Oi = Oi'?):
   –  SAME if Execution is OK and Information added and Information lost are void
   –  DIFFERENT if Execution is OK but Information added or Information lost are not void
   –  NO if Execution is FAIL, N.E., or P.E.
Test suites used

Name                                  Definition                 Nº Tests
RDF(S) Import Test Suite              Manual                     82
OWL Lite Import Test Suite            Manual                     82
OWL DL Import Test Suite              Keyword-driven generator   561
OWL Full Import Test Suite            Manual                     90
OWL Content Pattern                   Expressive generator       81
OWL Content Pattern Expressive        Expressive generator       81
OWL Content Pattern Full Expressive   Expressive generator       81
Tools evaluated

•  1st Evaluation Campaign (tool logos in the original slide)
•  2nd Evaluation Campaign (tool logos in the original slide)
Evaluation Execution

•  Evaluations automatically performed with the SEALS Platform
   –  http://www.seals-project.eu/
•  Evaluation materials available
   –  Test Data
   –  Results
   –  Metadata

(diagram: test suites enter the SEALS Platform, which produces raw results and test suite interpretations for the conformance, interoperability, and scalability evaluations)
Dynamic result visualization

(screenshot of the SEALS result visualization in the original slide)
RDF(S) conformance results

•  Jena and Sesame behave identically (no problems)
•  The behaviour of the OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4) has significantly changed
   –  They transform ontologies to OWL 2
   –  Some problems
      •  Fewer in newer versions
•  Protégé OWL improves
OWL Lite conformance results

•  Jena and Sesame behave identically (no problems)
•  The OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4) improve
   –  They transform ontologies to OWL 2
•  Protégé OWL improves
OWL DL conformance results

•  Jena and Sesame behave identically (no problems)
•  OWL API and Protégé 4 improve
•  NeOn Toolkit worsens
•  Protégé OWL behaves identically
•  Robustness increases
Content pattern conformance results

•  New issues identified in the OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4)
•  New issue identified in Protégé 4
•  No new issues
Interoperability results

(result matrices for the 1st and 2nd Evaluation Campaigns in the original slide)

•  Same analysis as in conformance
•  OWL DL: new issue found in interchanges from Protégé 4 to Protégé OWL
•  Conclusions:
   –  RDF-based tools have no interoperability problems
   –  OWL-based tools have no interoperability problems with OWL Lite but have some with OWL DL
   –  Tools based on the OWL API cannot interoperate using RDF(S) (they convert ontologies into OWL 2)
2nd SEALS Yardsticks for Ontology Management

•  Conformance and interoperability results
•  Scalability results
•  Conclusions
Scalability evaluation

Step 1 (Import + Export): O1 is imported into Tool X (O1') and exported again (O1'').
O1 = O1'' + α - α'
Execution settings

Test suites:
•  Real World: complex ontologies from biological and medical domains
•  Real World NCI: Thesaurus subsets (1.5-2 times bigger)
•  LUBM: synthetic ontologies

Execution environment:
•  Win7 64-bit, Intel Core 2 Duo CPU, 2.40 GHz, 4.00 GB RAM (Real World Ontologies test collections)
•  WinServer 64-bit, AMD Dual Core, 2.60 GHz (4 processors), 8.00 GB RAM (LUBM Ontologies test collection)

Constraint:
•  30 min threshold per test case
Real World Scalability Test Suite

(times in seconds)

Test  Size MB  Triples  Protégé OWL  Protégé4 v.41  Protégé4 v.42  OWL API v.310  OWL API v.324  NeOn v.232  NeOn v.252  Jena v.270  Sesame v.265
RO1   0.2      3K       5            2              2              2              2              3           2           3           2
RO2   0.6      4K       2            2              2              2              2              2           2           3           1
RO3   1        11K      11           3              4              12             5              7           7           8           2
RO4   3        31K      4            5              5              5              4              5           5           5           3
RO5   4        82K      8            8              10             7              7              12          7           8           4
RO6   6        92K      8            9              12             9              9              11          14          9           4
RO7   10       135K     10           11             11             11             10             13          11          10          4
RO8   10       167K     14           9              8              8              9              11          11          12          4
RO9   20       270K     22           20             24             18             16             19          19          18          7
R10   24       315K     68           21             24             19             18             26          20          19          8
R11   26       346K     162          25             19             22             21             27          22          22          9
R12   40       407K     -            24             22             26             23             28          30          26          9
R13   44       646K     -            36             33             35             34             44          40          37          13
R14   46       671K     -            30             27             28             28             35          37          41          13
R15   84       864K     -            34             26             32             26             36          33          69          21
R16   117      1623K    -            -              -              -              -              -           -           102         33
Real World NCI Scalability Test Suite

(times in seconds)

Test  Size MB  Triples  Protégé OWL  Protégé4 v.41  Protégé4 v.42  OWL API v.310  OWL API v.324  NTK v.232  NTK v.252  Jena v.270  Sesame v.265
NO1   0.5      3.6K     10           5              6              4              3              4          4          4           2
NO2   0.6      4.3K     4            3              3              3              3              3          3          3           2
NO3   1        11K      5            4              4              4              4              4          4          3           2
NO4   4        31K      9            5              8              5              5              6          5          5           3
NO5   11       82K      13           7              10             8              8              9          8          9           5
NO6   14       109K     17           8              10             9              10             10         10         10          5
NO7   18       135K     19           9              12             10             10             12         12         11          5
NO8   23       167K     23           10             14             11             11             13         13         14          7
NO9   38       270K     37           15             16             15             13             18         17         20          9
N10   44       314K     74           16             18             16             17             21         19         23          10
N11   48       347K     136          17             19             16             18             21         20         24          10
N12   56       407K     -            20             22             19             19             26         24         30          13
N13   89       646K     -            29             28             28             29             39         35         47          18
N14   92       671K     -            28             32             28             29             39         35         49          21
N15   118      864K     -            34             36             34             36             48         45         63          26
N16   211      1540K    -            61             61             62             71             83         100        282         41
LUBM Test Suite

(times in seconds; values such as 1M52 are minutes and seconds)

Test  Size MB  Protégé OWL  Protégé4 v.41  Protégé4 v.42  OWL API v.310  OWL API v.324  NTK v.232  NTK v.252  Jena v.270  Sesame v.265
LO1   8        29           20             25             15             29             11         16         17          5
LO2   19       1M52         19             30             18             30             16         22         30          8
LO3   28       2M59         17             28             27             40             20         26         42          10
LO4   39       4M05         24             33             33             41             28         39         47          12
LO5   51       17M27        36             40             -              54             -          54         59          14
LO6   60       22M43        41             45             -              60             -          1M04       1M03        16
LO7   72       26M32        1M1            53             -              1M18           -          1M28       1M17        19
LO8   82       -            1M16           59             -              1M3            -          -          1M27        20
LO9   92       -            1M37           1M8            -              2M12           -          -          1M39        23
L10   105      -            2M2            1M31           -              2M53           -          -          1M48        27
L11   116      -            3M18           -              -              -              -          -          2M02        33
L12   129      -            4M59           -              -              -              -          -          2M15        35
L13   143      -            7M21           -              -              -              -          -          2M33        40
L14   153      -            9M07           -              -              -              -          -          2M4         42
L15   162      -            11M23          -              -              -              -          -          2M52        43
L16   174      -            14M09          -              -              -              -          -          3M02        44
L17   184      -            17M            -              -              -              -          -          3M2         46
L18   197      -            23M05          -              -              -              -          -          3M34        51
L19   251      -            27M21          -              -              -              -          -          3M49        1M12
LUBM Test Suite (II)

Test  Size MB  Protégé4 v.41  Jena v.270  Sesame v.265
L20   263      -              4M05        1M11
L21   284      -              4M17        1M03
L22   242      -              4M18        1M07
L23   251      -              4M36        1M03
L24   263      -              4M56        1M07
L25   284      -              5M31        1M17
L26   297      -              5M35        1M18
L27   307      -              5M46        1M22
L28   317      -              6M09        1M27
L29   330      -              6M13        1M3
L30   340      -              6M23        1M3
L31   354      -              8M03        1M35
L32   363      -              8M07        1M31
L33   375      -              9M19        1M33
L34   386      -              -           1M3
L35   399      -              -           1M39

Test  Size MB  Sesame v.265
L36   412      1M44
L37   421      1M45
L38   430      1M49
L39   441      1M49
L40   453      1M55
L41   467      2M05
L42   480      2M04
L43   489      2M14
L44   498      2M13
L45   510      2M23

LUBM EXTENDED TEST SUITE

Test  Size MB  Sesame v.265
Le46  598      2M49
Le47  705      16M58
Le48  802      -
Le49  906      -
Le50  1,001    -
Le51  1,105    -
Le52  1,205    -
Le53  1,302    -
Le54  1,404    -
Le55  1,514    -
2nd SEALS Yardsticks for Ontology Management

•  Conformance and interoperability results
•  Scalability results
•  Conclusions
Conclusions – Test data

•  Test suites are not exhaustive
   –  The new test suites helped detect new issues
•  A more expressive test suite does not imply detecting more issues
•  We used existing ontologies as input for the test data generator
   –  This requires a prior analysis of the ontologies to detect defects
   –  We found ontologies with issues that we had to correct
Conclusions – Results

•  Tools have improved their conformance, interoperability, and robustness
•  Development decisions have a high influence
   –  The OWL API radically changed the way it deals with RDF ontologies
      •  We need tools for easy evaluation
      •  We need stronger regression testing
•  The automated generator defined test cases that a person would never have thought of, but which identified new tool issues
•  Using bigger ontologies for conformance and interoperability testing makes it much more difficult to find problems in the tools
Evaluating Storage and
   Reasoning Systems
Index
•    Evaluation scenarios
•    Evaluation descriptions
•    Test data
•    Tools
•    Results
•    Conclusion
Advanced reasoning system

•  Description logic based system (DLBS)
•  Standard reasoning services
   –  Classification
   –  Class satisfiability
   –  Ontology satisfiability
   –  Logical entailment
Existing evaluations

•  Datasets
   –  Synthetic generation
   –  Hand-crafted ontologies
   –  Real-world ontologies
•  Evaluations
   –  KRSS benchmark
   –  TANCS benchmark
   –  Gardiner dataset
Evaluation criteria
•  Interoperability
   –  the capability of the software product to interact with one or more
      specified systems
   –  a system must
       •  conform to the standard input formats
       •  be able to perform standard inference services
•  Performance
   –  the capability of the software to provide appropriate
      performance, relative to the amount of resources used, under
      stated conditions
Evaluation metrics

•  Interoperability
  –  Number of tests passed without parsing errors
  –  Number of inference tests passed
•  Performance
  –  Loading time
  –  Inference time
Class satisfiability evaluation
•  Standard inference service that is widely used in
   ontology engineering
•  The goal: to assess both the DLBS's interoperability and
   performance
•  Input
   –  OWL ontology
   –  One or several class IRIs
•  Output
   –  TRUE: the evaluation outcome coincides with the expected result
   –  FALSE: the evaluation outcome differs from the expected outcome
   –  ERROR: indicates an I/O error
   –  UNKNOWN: indicates that the system is unable to compute the
      inference in the given timeframe
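One such test case can be sketched with the OWL API and any OWLReasonerFactory, here HermiT purely as an example (the ontology path, class IRI, and expected value are placeholders standing in for test-case metadata; ERROR and UNKNOWN handling, e.g. I/O failures and timeouts, is omitted):

```java
import org.semanticweb.HermiT.ReasonerFactory;        // example reasoner only
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

import java.io.File;

public class ClassSatisfiabilityTest {
    public static void main(String[] args) throws Exception {
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        OWLOntology onto = m.loadOntologyFromOntologyDocument(new File("test.owl"));

        // The class IRI and expected result come from the test case.
        OWLClass cls = m.getOWLDataFactory()
                        .getOWLClass(IRI.create("http://example.org/onto#SomeClass"));
        boolean expected = true;

        OWLReasoner reasoner = new ReasonerFactory().createReasoner(onto);
        boolean outcome = reasoner.isSatisfiable(cls);

        // TRUE if the outcome coincides with the expected result, FALSE otherwise.
        System.out.println(outcome == expected ? "TRUE" : "FALSE");
        reasoner.dispose();
    }
}
```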
Ontology satisfiability evaluation
•  Standard inference service typically carried out before
   performing any other reasoning task
•  The goal: to assess both the DLBS's interoperability and
   performance
•  Input
   –  OWL ontology
•  Output
   –  TRUE: the evaluation outcome coincides with the expected result
   –  FALSE: the evaluation outcome differs from the expected outcome
   –  ERROR: indicates an I/O error
   –  UNKNOWN: indicates that the system is unable to compute the
      inference in the given timeframe
Classification evaluation
•  Inference service that is typically carried out after
   testing ontology satisfiability and prior to
   performing any other reasoning task
•  The goal: to assess both the DLBS's interoperability
   and performance
•  Input
   –  OWL ontology
•  Output
   –  OWL ontology
   –  ERROR: indicates an I/O error
   –  UNKNOWN: indicates that the system is unable to
      compute the inference in the given timeframe
Logical entailment evaluation
•  Standard inference service that is the basis for query
   answering
•  The goal: to assess both the DLBS's interoperability and
   performance
•  Input
   –  2 OWL ontologies
•  Output
   –  TRUE: the evaluation outcome coincides with the expected result
   –  FALSE: the evaluation outcome differs from the expected outcome
   –  ERROR: indicates an I/O error
   –  UNKNOWN: indicates that the system is unable to compute the
      inference in the given timeframe
Storage and reasoning systems
           evaluation component

•  The SRS component is intended to evaluate
   description logic based systems (DLBSs)
   –  implementing the OWL API 3, the de-facto standard for DLBSs
   –  implementing the SRS SEALS DLBS interface
•  SRS supports test data in all syntactic formats
   supported by the OWL API 3
•  SRS saves the evaluation results and
   interpretations in MathML 3 format
DLBS interface
•  Java methods to be implemented by system
   developers
  –  OWLOntology loadOntology(IRI iri)
  –  boolean isSatisfiable(OWLOntology onto, OWLClass
     class)
  –  boolean isSatisfiable(OWLOntology onto)
  –  OWLOntology classifyOntology(OWLOntology onto)
  –  URI saveOntology(OWLOntology onto, IRI iri)
  –  boolean entails(OWLOntology onto1, OWLOntology
     onto2)
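Written out as a Java interface this reads as follows. It is a reconstruction for readability: the interface name is ours, and the `class` parameter above is renamed to `cls` because `class` is a reserved word in Java.

```java
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.model.OWLOntology;

import java.net.URI;

// The six methods a DLBS provider implements to plug into the SRS component.
public interface DLBS {
    OWLOntology loadOntology(IRI iri);
    boolean isSatisfiable(OWLOntology onto, OWLClass cls); // class satisfiability
    boolean isSatisfiable(OWLOntology onto);               // ontology satisfiability
    OWLOntology classifyOntology(OWLOntology onto);
    URI saveOntology(OWLOntology onto, IRI iri);
    boolean entails(OWLOntology onto1, OWLOntology onto2); // logical entailment
}
```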
Testing Data

•  The ontologies from the Gardiner evaluation suite
   –  Over 300 ontologies of varying expressivity and size
•  Various versions of the GALEN ontology
•  Various ontologies that have been created in EU-funded
   projects, such as SEMINTEC, VICODI and AEO
•  155 entailment tests from the OWL 2 test cases repository
Evaluation setup
•  3 DLBSs
   –  FaCT++: C++ implementation of the FaCT OWL DL reasoner
   –  HermiT: Java-based OWL DL reasoner utilizing novel hypertableau
      algorithms
   –  jcel: Java-based OWL 2 EL reasoner
   –  FaCT++C: evaluated without the OWL prepareReasoner() call
   –  HermiTC: evaluated without the OWL prepareReasoner() call

•  2 AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ machines
   with 2 GB of main memory
   –  DLBSs were allowed to allocate up to 1 GB
Evaluation results: Classification
(ALT: average loading time; ART: average reasoning time)

          FaCT++   HermiT   jcel
ALT, ms   68       506      856
ART, ms   15320    167808   2144
TRUE      160      145      16
FALSE     0        0        0
ERROR     47       33       4
UNKNOWN   3        32       0
Evaluation results: Class satisfiability

           FaCT++    HermiT     jcel
ALT, ms    1047      255        438
ART, ms    21376     517043     1113
TRUE       157       145        15
FALSE      1         0          0
ERROR      36        35         5
UNKNOWN    16        30         0
Evaluation results: Ontology
                satisfiability
            FaCT++    HermiT    jcel
ALT, ms     1315      410       708
ART, ms     25175     249802    1878
TRUE        134       146       16
FALSE       0         0         0
ERROR       45        33        4
UNKNOWN     0         31        0
Evaluation results: Entailment
          FaCT++   HermiT
ALT, ms   14       33
ART, ms   1        20673
TRUE      46       119
FALSE     67       14
ERROR     34       9
UNKNOWN   0        3
Evaluation results: Non entailment
          FaCT++   HermiT
ALT, ms   47       92
ART, ms   5        127936
TRUE      7        7
FALSE     0        1
ERROR     3        1
UNKNOWN   0        1
Comparative evaluation:
              Classification
             FaCT++C   HermiTC
ALT, ms      309       207
ART, ms      3994      2272
TRUE         112       112
Comparative evaluation: Class
          satisfiability
          FaCT++C   HermiTC
ALT, ms   333       225
ART, ms   216       391
TRUE      113       113
Comparative evaluation: Ontology
           satisfiability
          FaCT++C   HermiTC
ALT, ms   333       225
ART, ms   216       391
TRUE      113       113
Comparative evaluation: Entailment
          FaCT++C   HermiTC
ALT, ms   7         7
ART, ms   2         24
TRUE      1         1
Comparative evaluation: Non-
           Entailment
          FaCT++C   HermiTC
ALT, ms   22        18
ART, ms   2         43
TRUE      4         4
Comparative evaluation:
           Classification
        FaCT++C HermiTC FaCT++ HermiT jcel
ALT, ms 398     355     1471    771    856
ART, ms 11548   1241    36650   2817   2144
TRUE    16      16      16      16     16
Comparative evaluation: Class
          satisfiability
        FaCT++C HermiTC FaCT++ HermiT jcel
ALT, ms 382     342     532     1062   438
ART, ms 159     223     7603    3437   1113
TRUE    15      15      15      15     15
Comparative evaluation:
        Ontology satisfiability
        FaCT++C HermiTC FaCT++ HermiT jcel
ALT, ms 360     365     1389    1262   708
ART, ms 11548   202     36650   4790   1878
TRUE    16      16      16      16     16
Challenging ontologies:
                 Classification
(LT: loading time; RT: reasoning time)

Ontology        Mosquito-  GALEN  mged   go     worm-
                anatomy                         anatomy
Classes         1864       2749   229    19528  6731
Relations       2          413    102    1      5
FaCT++C, LT ms  3760       663    189    4362   783
FaCT++C, RT ms  9568       9970   355    28041  45739
HermiTC, LT ms  510        609    273    4328   973
HermiTC, RT ms  944        12623  27974  12698  2491
Challenging ontologies:
                 Classification
Ontology        plans   information  human   Fly-     emap
                                             anatomy
Classes         118     121          8342    6326     13731
Relations       263     197          1       3        1
FaCT++C, LT ms  67      106          3186    662      1965
FaCT++C, RT ms  661     126          132607  5016     156714
HermiTC, LT ms  67      95           1192    746      1311
HermiTC, RT ms  115576  7064         3842    6564     7097
Challenging ontologies: Class
                  satisfiability
Ontology     not        GALEN         mged   go          plans
             GALEN
Class        Digestion  Trimethoprim  Thing  GO_0042447  schedule
Classes      3087       2749          229    19528       118
Relations    413        413           102    1           263
FaCT++C, LT  1130       652           174    4351        78
FaCT++C, RT  3215       1065          160    1465        79
HermiTC, LT  1087       680           358    3961        67
HermiTC, RT  11210      9108          4333   2776        3459
Challenging ontologies: Ontology
               satisfiability
Ontology     not     GALEN  mged  go     plans
             GALEN
Classes      3087    2749   229   19528  118
Relations    413     413    102   1      263
FaCT++C, LT  992     618    189   4383   67
FaCT++C, RT  3047    1057   170   1413   74
HermiTC, LT  1166    590    346   4371   69
HermiTC, RT  11562   9408   3197  2687   1827
Conclusion
•  Errors:
   –  datatypes not supported by the systems
   –  syntax-related: a system was unable to
      register a role or a concept
   –  expressivity errors
•  Execution time is dominated by a small
   number of hard problems
SEALS Ontology Matching
     Evaluation campaign
  … also known as OAEI 2011.5
Ontology Matching

(example in the original slide: two conference ontologies; the first with classes Person, Author, CommitteeMember, Reviewer, PCMember, Document, Paper and Review, and relations writes and reviews; the second with classes People, Author, Doc and Paper, and relations writes and reviews)

Resulting alignment:
< Author, Author, =, 0.97 >
< Paper, Paper, =, 0.94 >
< reviews, reviews, =, 0.91 >
< writes, writes, =, 0.7 >
< Person, People, =, 0.8 >
< Document, Doc, =, 0.7 >
< Reviewer, Review, =, 0.6 >
…
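Each line above is one correspondence: a pair of entities, the relation asserted between them, and a confidence value. As a data structure this could be sketched as follows (illustrative only; the type and field names are ours, not a specific alignment API, and records require Java 16+):

```java
// One alignment cell: two matched entities, a relation, and a confidence in [0,1].
public record Correspondence(String entity1, String entity2,
                             String relation, double confidence) {}

// e.g. new Correspondence("Person", "People", "=", 0.8)
```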
OAEI & SEALS

•  OAEI: Ontology Alignment Evaluation Initiative
   –  Organized as an annual campaign from 2005 to 2012
   –  Included in the Ontology Matching workshop at ISWC
   –  Different tracks (evaluation scenarios) organized by
      different researchers

•  Starting in 2010: support from SEALS
   –  OAEI 2010, OAEI 2011, and OAEI 2011.5
OAEI 2011.5 participants

(participant logos in the original slide)
OAEI tracks
(Jose Aguirre, Jerome Euzenat; INRIA Grenoble)

•  Benchmark
   –  Matching different versions of the same ontology
   –  Scalability: size vs. runtimes
•  Conference
•  MultiFarm
•  Anatomy
•  Large BioMed
OAEI tracks
(Ondřej Šváb-Zamazal, Vojtěch Svátek; Prague University of Economics)

•  Benchmark
•  Conference
   –  Same domain, different ontologies
   –  Manually generated reference alignment
•  MultiFarm
•  Anatomy
•  Large BioMed
OAEI tracks
(Christian Meilicke, Cassia Trojahn; University Mannheim, INRIA Grenoble)

•  Benchmark
•  Conference
•  MultiFarm: Multilingual Ontology Matching
   –  Based on Conference
   –  Test cases for Spanish, German, French, Russian,
      Portuguese, Czech, Dutch, Chinese
•  Anatomy
•  Large BioMed
OAEI tracks
(Christian Meilicke, Heiner Stuckenschmidt; University Mannheim)

•  Benchmark
•  Conference
•  MultiFarm
•  Anatomy
   –  Matching mouse on human anatomy
   –  Runtimes
•  Large BioMed
OAEI tracks
(Ernesto Jimenez Ruiz, Bernardo Cuenca Grau, Ian Horrocks; University of Oxford)

•  Benchmark
•  Conference
•  MultiFarm
•  Anatomy
•  Large BioMed
   –  Very large dataset (FMA-NCI)
   –  Includes coherence analysis
Detailed results

http://oaei.ontologymatching.org/2011.5/results/index.html
Questions?

Write a mail to Christian Meilicke
christian@informatik.uni-mannheim.de
IWEST 2012 workshop located at ESWC 2012

Semantic Search Systems
  Evaluation Campaign
Two-phase approach
•  Semantic search tool evaluation demands a
   user-in-the-loop phase
   –  usability criterion

•  Two phases:
   –  User-in-the-loop
   –  Automated
Evaluation criteria by phase
Each phase addresses a different subset of criteria.

•  Automated phase: query expressiveness,
   scalability, performance

•  User-in-the-loop phase: usability, query
   expressiveness
Participants

Tool              Description                                                UITL  Auto
K-Search          Form-based                                                 x     x
Ginseng           Natural language with constrained vocabulary and grammar   x
NLP-Reduce        Natural language for full English questions, sentence
                  fragments, and keywords                                    x
Jena Arq          SPARQL query engine. Automated phase baseline                    x
RDF.Net Query     SPARQL-based                                                     x
Semantic Crystal  Graph-based                                                x
Affective Graphs  Graph-based                                                x
Usability Evaluation Setup

•  Data: Mooney Natural Language Learning Data

•  Subjects: 20 (10 expert users; 10 casual users)
   –  Each subject evaluated the 5 participating tools

•  Task: formulate 5 questions in each tool's interface

•  Data collected: success rate, input time, number of
   attempts, response time, user satisfaction
   questionnaires, demographics
Questions

1) Give me all the capitals of the USA?  (1 concept, 1 relation)

2) What are the cities in states through which the
   Mississippi runs?  (2 concepts, 2 relations)

3) Which states have a city named Columbia with a city
   population over 50,000?  (comparative)

4) Which lakes are in the state with the highest point?  (superlative)

5) Tell me which rivers do not traverse the
   state with the capital Nashville?  (negation)
Automated Evaluation Setup

•  Data: EvoOnt dataset
   –  Five sizes: 1K, 10K, 100K, 1M, 10M triples

•  Task: answer 10 questions per dataset size

•  Data collected: ontology load time, query time, number
   of results, result list

•  Analyses: precision, recall, f-measure, mean query time,
   mean time per result, etc.
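For reference, the standard definitions behind the precision/recall/f-measure analyses, where R is the set of results a tool returns for a question and C the gold-standard set of correct answers (our notation, not from the slides):

```latex
\mathrm{precision} = \frac{|C \cap R|}{|R|}, \qquad
\mathrm{recall} = \frac{|C \cap R|}{|C|}, \qquad
F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
```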
Configuration
•  All tools executed on the SEALS Platform
•  Each tool executed within a Virtual Machine

              Linux                      Windows
OS            Ubuntu 10.10 (64-bit)      Windows 7 (64-bit)
Num CPUs      2                          4
Memory (GB)   4                          4
Tools         Arq v2.8.2 and Arq v2.9.0  RDF Query v0.5.1-beta
FINDINGS - USABILITY
Graph-based tools most liked
(highest ranks and average SUS scores)

(chart: System Usability Scale "SUS" questionnaire scores by user type, casual vs. expert, for Semantic-Crystal, Affective-Graphs, K-Search, Ginseng, Nlp-Reduce)

•  Perceived by expert users as intuitive, allowing them
   to easily formulate more complex queries.

•  Casual users enjoyed the fun and visually-appealing
   interfaces, which created a pleasant search experience.
Form-based approach most liked by casual users

(chart: extended questionnaire scores for "The system's query language was easy to understand and use", by user type)

•  Perceived by casual users as a midpoint between NL and
   graph-based.
•  Allows more complex queries than NL does.
•  Less complicated, and lower query input time, than the
   graph-based approach.
•  Together with graph-based: most liked by expert users.
Casual users liked the controlled-NL approach

(chart: SUS questionnaire scores by user type)

•  Casuals:
   –  liked guidance through suggestions
   –  prefer to be 'controlled' by the language model,
      allowing only valid queries

•  Experts:
   –  found it restrictive and frustrating
   –  prefer more flexibility and expressiveness rather
      than support and restriction
Free-NL challenge: the habitability problem

(chart: answer-found rate by user type)

•  Free-NL is liked for its simplicity, familiarity,
   naturalness, and the low query input time required.
•  It faces the habitability problem: a mismatch between
   users' query terms and the tools' ones.
•  This led to the lowest success rate, the highest number
   of trials to get a satisfying answer, and in turn very
   low user satisfaction.
FINDINGS - AUTOMATED
Overview
•  K-Search couldn't load the ontologies
   –  external ontology import not supported
   –  cyclic relations with concepts in remote ontologies not
      supported
•  Non-NL tools transform queries a priori

•  Native SPARQL tools exhibit differences in query
   approach (see load and query times)
Ontology load time

[Figure: ontology load time (ms, log scale) vs. dataset size (thousands of triples) for Arq v2.8.2, Arq v2.9.0, and RDF Query v0.5.1-beta]

•  RDF Query loads the ontology on-the-fly; load times are therefore independent of dataset size.

•  Arq loads the ontology into memory.

89
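A minimal timing sketch of the in-memory pattern described above, assuming the current Apache Jena packages (the evaluated Arq v2.x builds shipped under the older com.hp.hpl.jena namespace) and a local test-suite file as input:

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.RDFDataMgr;

    public class LoadTimer {
        public static void main(String[] args) {
            long start = System.nanoTime();
            // Arq-style bulk load: the whole ontology is parsed into an in-memory
            // model, so load time grows with dataset size (RDF Query, by contrast,
            // reads the ontology on the fly and pays this cost at query time).
            Model model = RDFDataMgr.loadModel(args[0]); // assumption: local RDF/OWL file
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(model.size() + " triples loaded in " + ms + " ms");
        }
    }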
Query time

[Figure: mean query time (ms, log scale) vs. dataset size (thousands of triples) for Arq v2.8.2, Arq v2.9.0, and RDF Query v0.5.1-beta]

•  RDF Query loads the ontology on-the-fly; query times therefore incorporate load time. Expensive for more than one query in a session.

•  Arq loads the ontology into memory.

•  Query times largely independent of dataset size.

90
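A companion sketch for the query-time measurement, under the same assumptions as the load-time sketch: the ontology is loaded once outside the timer (the Arq pattern), so only query execution is measured; for an on-the-fly engine such as RDF Query the load cost would fall inside the timed region instead.

    import org.apache.jena.query.Query;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.RDFDataMgr;

    public class QueryTimer {
        public static void main(String[] args) {
            Model model = RDFDataMgr.loadModel(args[0]); // load once, outside the timer
            Query query = QueryFactory.create(           // placeholder query, not from the benchmark
                    "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }");
            long start = System.nanoTime();
            try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
                ResultSetFormatter.consume(qe.execSelect()); // force full result evaluation
            }
            System.out.println("query time: " + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }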
SEALS Semantic Web Service Tools
Evaluation Campaign 2011

Semantic Web Service Discovery
Evaluation Results

91
Evaluation of SWS Discovery

•  Finding Web Services based on their semantic descriptions

•  For a given goal and a given set of service descriptions, the tool returns the match degree between the goal and each service (see the interface sketch below)

•  Measurement services are provided via the SEALS platform to measure the rate of matching correctness

92
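The slide states the tool contract in prose only; a hypothetical Java interface capturing it could look as follows (DiscoveryTool and match are illustrative names, not the SEALS API):

    import java.net.URI;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the discovery contract described above.
    interface DiscoveryTool {
        // For one goal and a set of semantic service descriptions, return the
        // degree of match (e.g. a value in [0,1]) for each service.
        Map<URI, Double> match(URI goal, List<URI> serviceDescriptions);
    }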
Campaign Overview
http://www.seals-project.eu/seals-evaluation-campaigns/2nd-seals-evaluation-campaigns/
semantic-web-service-tools-evaluation-campaign-2011

•  Goal
   –  Which ontology/annotation is the best: WSMO-Lite, OWL-S or SAWSDL?

•  Assumptions:
   –  Same corresponding Test Collections (TCs)
   –  Same corresponding matchmaking algorithms (tools)
   –  The corresponding tools will belong to the same provider
   –  The level of performance of a tool for a specific TC is of secondary importance

93
Campaign Overview
http://www.seals-project.eu/seals-evaluation-campaigns/2nd-seals-evaluation-campaigns/
semantic-web-service-tools-evaluation-campaign-2011

Given that a tool T can apply the same corresponding matchmaking algorithm M to corresponding test collections, say, TC1, TC2 and TC3, we would like to compare the performance (e.g. Precision, Recall) among M(TC1), M(TC2) and M(TC3); a minimal sketch of these two measures follows.

94
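The two measures reduce to the standard set definitions; a minimal helper sketch (our own illustration, not campaign code):

    import java.util.List;
    import java.util.Set;

    final class RetrievalMetrics {
        // Precision = |returned ∩ relevant| / |returned|
        static double precision(List<String> returned, Set<String> relevant) {
            if (returned.isEmpty()) return 0.0;
            long hits = returned.stream().filter(relevant::contains).count();
            return (double) hits / returned.size();
        }

        // Recall = |returned ∩ relevant| / |relevant|
        static double recall(List<String> returned, Set<String> relevant) {
            if (relevant.isEmpty()) return 0.0;
            long hits = returned.stream().filter(relevant::contains).count();
            return (double) hits / relevant.size();
        }
    }

Comparing, per goal, precision over M(TC1) against precision over M(TC2) then gives the per-collection comparison the campaign is after.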
Background: S3 Challenge
http://www-ags.dfki.uni-sb.de/~klusch/s3/index.html

     T1    T2    ……    Tn          TI    TII    ……    TXV
                                                               ……
     M1    M2    ……    Mn          MI    MII    ……    MXV

          TCa (e.g. owl-s)          TCb (e.g. sawsdl)          ……

95
Background: S3 Challenge
http://www-ags.dfki.uni-sb.de/~klusch/s3/index.html

     1st Evaluation Campaign (2010)

     T1    T2    ……    Tn          TI    TII    ……    TXV
                                                               ……
     M1    M2    ……    Mn          MI    MII    ……    MXV

          TCa (e.g. owl-s)          TCb (e.g. sawsdl)          ……

96
Background: SWS Challenge
http://sws-challenge.org/wiki/index.php/Scenario:_Shipment_Discovery

            T1                       TI                      Ta

            M1                       MI                      Ma          ……

  Formalism1 (e.g. ocml)    FormalismI (e.g. owl-s)    Formalisma

                 Goal descriptions (e.g. plain text)

97
SEALS 2nd
SWS Discovery Evaluation

          T1              T2              T3          ……

                           M

  TC1 (e.g. owl-s)   TC2 (e.g. sawsdl)   TC3 (e.g. wsmo-lite)   ……

98
SEALS Test Collections

•  WSMO-LITE-TC (1080 services, 42 goals)
   http://seals.s32.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4b
   http://seals.s32.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4g

•  SAWSDL-TC (1080 services, 42 goals)
   http://seals.s32.at/tdrs-web/testdata/persistent/SAWSDL-TC/3.0-1b
   http://seals.s32.at/tdrs-web/testdata/persistent/SAWSDL-TC/3.0-1g

•  OWLS-TC (1083 services, 42 goals)
   http://seals.s32.at/tdrs-web/testdata/persistent/OWLS-TC/4.0-11b
   http://seals.s32.at/tdrs-web/testdata/persistent/OWLS-TC/4.0-11g

99
Metrics – Galago (1)

100

Metrics – Galago (2)

101
SWS Discovery Evaluation Workflow

102
SWS Tool Deployment
Wrapper for SEALS platform

103
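The deployment slide is figure-only; purely as an illustration, a wrapper of this kind typically adapts the tool's lifecycle and discovery call to the platform. All names below are hypothetical, not taken from the SEALS documentation:

    import java.net.URI;
    import java.util.Map;

    // Hypothetical wrapper shape; the real SEALS interface is not shown on the slide.
    interface SealsDiscoveryWrapper {
        void initialize(URI testCollection);  // e.g. point the tool at a TDRS test-data URL
        Map<URI, Double> discover(URI goal);  // delegate to the wrapped matchmaker
        void shutdown();                      // release the tool's resources
    }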
Tools

   WSMO-LITE-TC        SAWSDL-TC          OWLS-TC
   WSMO-LITE-OU1       SAWSDL-OU1
                       SAWSDL-URJC2       OWLS-URJC2
                       SAWSDL-M03         OWLS-M03

1. Ning Li, The Open University
2. Ziji Cong et al., University of Rey Juan Carlos
3. Matthias Klusch et al., German Research Center for Artificial Intelligence

104
Evaluation Execution

•  Evaluation workflow was executed on the SEALS Platform
•  All tools were executed within a Virtual Machine

                   Windows
   OS              Windows 7 (64-bit)
   Num CPUs        4
   Memory (GB)     4
   Tools           WSMO-LITE-OU, SAWSDL-OU

106
Partial Evaluation Results
WSMO-LITE vs. SAWSDL

   WSMO-LITE-OU        SAWSDL-OU

              M

   WSMO-LITE-TC        SAWSDL-TC

107
[Table: per-goal Precision/Recall results, WSMO-LITE-OU vs. SAWSDL-OU]
* This table only shows the results that are different

108
Analysis

•  Out of 42 goals, only 19 have different results in terms of Precision and Recall

•  On 17 out of 19 occasions, WSMO-Lite improves discovery precision over SAWSDL through specializing service semantics

•  WSMO-Lite performs worse than SAWSDL in 6 of the 19 occasions on discovery recall, while performing the same for the other 13 occasions

109
Analysis

•  Goal #17: novel_author_service.wsdl (Education domain)
   http://seals.s32.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4b/suite/17/component/GoalDocument/

•  Services chosen from SAWSDL but not WSMO-Lite (Economy domain):
   •  romanticnovel_authormaxprice_service.wsdl
   •  romanticnovel_authorprice_service.wsdl
   •  romanticnovel_authorrecommendedprice_service
   •  short-story_authorprice_service.wsdl
   •  science-fiction-novel_authorprice_service.wsdl
   •  sciencefictionbook_authorrecommendedprice_service.wsdl
   •  ……….

110
Lessons Learned

•  WSMO-LITE-OU tends to perform better than SAWSDL-OU in terms of precision, but slightly worse in recall.

•  The only feature of WSMO-Lite used against SAWSDL was the service category (based on TC domains).
   –  Services were filtered by service category in WSMO-LITE-OU and not in SAWSDL-OU (see the sketch below)

•  Further tests with additional tools and measures are needed for any conclusive results about WSMO-Lite vs. SAWSDL (many tools are not available yet)

111
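The category pre-filter mentioned above can be pictured as follows; a hypothetical sketch (all names are ours, not WSMO-LITE-OU code):

    import java.util.List;

    final class CategoryFilter {
        record Service(String uri, String category) {}

        // Keep only services whose service category matches the goal's domain;
        // SAWSDL-OU, lacking this annotation, would skip this step.
        static List<Service> byCategory(List<Service> services, String goalCategory) {
            return services.stream()
                    .filter(s -> goalCategory.equals(s.category()))
                    .toList();
        }
    }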
Conclusions

•  This has been the first SWS evaluation campaign in the community focusing on the impact of the service ontology/annotation on performance

•  This comparison has been facilitated by the generation of WSMO-LITE-TC as a counterpart of SAWSDL-TC and OWLS-TC in the SEALS repository

•  The current comparison only involves 2 ontologies/annotations (WSMO-Lite and SAWSDL)

•  Raw and interpretation results are available in RDF via the SEALS repository (public access)

112

Weitere ähnliche Inhalte

Ähnlich wie Seals 2nd campaign results

Artificial intelligence in qa
Artificial intelligence in qaArtificial intelligence in qa
Artificial intelligence in qaTaras Lytvyn
 
Формальная верификация как средство тестирования (в Java)
Формальная верификация как средство тестирования (в Java)Формальная верификация как средство тестирования (в Java)
Формальная верификация как средство тестирования (в Java)SQALab
 
Key Test Design Techniques
Key Test Design TechniquesKey Test Design Techniques
Key Test Design TechniquesTechWell
 
A tutorial on EMF-IncQuery
A tutorial on EMF-IncQueryA tutorial on EMF-IncQuery
A tutorial on EMF-IncQueryIstvan Rath
 
Make good use of explortary testing
Make good use of explortary testingMake good use of explortary testing
Make good use of explortary testinggaoliang641
 
Testing artifacts test cases
Testing artifacts   test casesTesting artifacts   test cases
Testing artifacts test casesPetro Chernii
 
Continous Delivery Toronto Presentation
Continous Delivery Toronto PresentationContinous Delivery Toronto Presentation
Continous Delivery Toronto PresentationXebiaLabs
 
Esem2014 presentation
Esem2014 presentationEsem2014 presentation
Esem2014 presentationTanja Vos
 
TAP-Harness + friends
TAP-Harness + friendsTAP-Harness + friends
TAP-Harness + friendsSteve Purkis
 
"Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,...
"Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,..."Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,...
"Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,...Vladimir Ivanov
 
Is Advanced Verification for FPGA based Logic needed
Is Advanced Verification for FPGA based Logic neededIs Advanced Verification for FPGA based Logic needed
Is Advanced Verification for FPGA based Logic neededchiportal
 
Using Lag Variables in Oracle Clinical Procedures
Using Lag Variables in Oracle Clinical ProceduresUsing Lag Variables in Oracle Clinical Procedures
Using Lag Variables in Oracle Clinical ProceduresPerficient
 
How to Actually DO High-volume Automated Testing
How to Actually DO High-volume Automated TestingHow to Actually DO High-volume Automated Testing
How to Actually DO High-volume Automated TestingTechWell
 
&lt;p>Software Testing&lt;/p>
&lt;p>Software Testing&lt;/p>&lt;p>Software Testing&lt;/p>
&lt;p>Software Testing&lt;/p>Atul Mishra
 

Ähnlich wie Seals 2nd campaign results (20)

test
testtest
test
 
test
testtest
test
 
Artificial intelligence in qa
Artificial intelligence in qaArtificial intelligence in qa
Artificial intelligence in qa
 
Формальная верификация как средство тестирования (в Java)
Формальная верификация как средство тестирования (в Java)Формальная верификация как средство тестирования (в Java)
Формальная верификация как средство тестирования (в Java)
 
Key Test Design Techniques
Key Test Design TechniquesKey Test Design Techniques
Key Test Design Techniques
 
A tutorial on EMF-IncQuery
A tutorial on EMF-IncQueryA tutorial on EMF-IncQuery
A tutorial on EMF-IncQuery
 
Make good use of explortary testing
Make good use of explortary testingMake good use of explortary testing
Make good use of explortary testing
 
Testing artifacts test cases
Testing artifacts   test casesTesting artifacts   test cases
Testing artifacts test cases
 
Continous Delivery Toronto Presentation
Continous Delivery Toronto PresentationContinous Delivery Toronto Presentation
Continous Delivery Toronto Presentation
 
Esem2014 presentation
Esem2014 presentationEsem2014 presentation
Esem2014 presentation
 
TAP-Harness + friends
TAP-Harness + friendsTAP-Harness + friends
TAP-Harness + friends
 
"Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,...
"Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,..."Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,...
"Formal Verification in Java" by Shura Iline, Vladimir Ivanov @ JEEConf 2013,...
 
Why do a designed experiment
Why do a designed experimentWhy do a designed experiment
Why do a designed experiment
 
Is Advanced Verification for FPGA based Logic needed
Is Advanced Verification for FPGA based Logic neededIs Advanced Verification for FPGA based Logic needed
Is Advanced Verification for FPGA based Logic needed
 
Using Lag Variables in Oracle Clinical Procedures
Using Lag Variables in Oracle Clinical ProceduresUsing Lag Variables in Oracle Clinical Procedures
Using Lag Variables in Oracle Clinical Procedures
 
AutoTest.ppt
AutoTest.pptAutoTest.ppt
AutoTest.ppt
 
AutoTest.ppt
AutoTest.pptAutoTest.ppt
AutoTest.ppt
 
AutoTest.ppt
AutoTest.pptAutoTest.ppt
AutoTest.ppt
 
How to Actually DO High-volume Automated Testing
How to Actually DO High-volume Automated TestingHow to Actually DO High-volume Automated Testing
How to Actually DO High-volume Automated Testing
 
&lt;p>Software Testing&lt;/p>
&lt;p>Software Testing&lt;/p>&lt;p>Software Testing&lt;/p>
&lt;p>Software Testing&lt;/p>
 

Kürzlich hochgeladen

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Kürzlich hochgeladen (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Seals 2nd campaign results

  • 1. Results  of  the  second  worldwide   evalua3on  campaign  for   seman3c  tools   ©  the  SEALS  Project   h>p://www.seals-­‐project.eu/  
  • 2. 2nd  SEALS  Yards3cks  for   Ontology  Management  
  • 3. 2nd  SEALS  Yards3cks  for  Ontology   Management   •  Conformance  and  interoperability  results   •  Scalability  results   •  Conclusions   3
  • 4. Conformance  evalua3on   •  Ontology  language  conformance   –  The  ability  to  adhere  to  exis3ng  ontology  language   specifica3ons   •  Goal:  to  evaluate  the  conformance  of  seman3c   technologies  with  regards  to  ontology  representa3on   languages   Tool X O1 O1’ O1’’ Step 1: Import + Export O1 = O1’’ + α - α’ 4
  • 5. Metrics   •  Execu9on  informs  about  the  correct  execu3on:     –  OK.  No  execu3on  problem   –  FAIL.  Some  execu3on  problem   –  Pla+orm  Error  (P.E.)  PlaQorm  excep3on   •  Informa9on  added  or  lost  in  terms  of  triples,  axioms,  etc.   Oi = Oi’ + α - α’ •  Conformance  informs  whether  the  ontology  has  been   processed  correctly  with  no  addi3on  or  loss  of   informa3on:   –  SAME  if  Execu'on  is  OK  and  Informa'on  added  and   Informa'on  lost  are  void   –  DIFFERENT  if  Execu'on  is  OK  but  Informa'on  added  or   Oi = Oi’ ? Informa'on  lost  are  not  void   –  NO  if  Execu'on  is  FAIL  or  P.E.   5
  • 6. Interoperability  evalua3on   •  Ontology  language  interoperability   –  The  ability  to  interchange  ontologies  and  use  them   •  Goal:  to  evaluate  the  interoperability  of  seman3c  technologies  in   terms  of  the  ability  that  such  technologies  have  to  interchange   ontologies  and  use  them   Tool X Tool Y O1 O1’ O1’’ O1’’’ O1’’’’ Step 1: Import + Export Step 2: Import + Export O1 = O1’’ + α - α’ O1’’=O1’’’’ + β - β’ Interchange O1 = O1’’’’ + α - α’ + β - β’ 6
  • 7. Metrics   •  Execu9on  informs  about  the  correct  execu3on:     –  OK.  No  execu3on  problem   –  FAIL.  Some  execu3on  problem   –  Pla+orm  Error  (P.E.)  PlaQorm  excep3on   –  Not  Executed.  (N.E.)  Second  step  not  executed   •  Informa9on  added  or  lost  in  terms  of  triples,  axioms,  etc.   Oi = Oi’ + α - α’ •  Interchange  informs  whether  the  ontology  has  been   interchanged  correctly  with  no  addi3on  or  loss  of   informa3on:   –  SAME  if  Execu'on  is  OK  and  Informa'on  added  and  Informa'on   lost  are  void   –  DIFFERENT  if  Execu'on  is  OK  but  Informa'on  added  or   Informa'on  lost  are  not  void   Oi = Oi’ ? –  NO  if  Execu'on  is  FAIL,  N.E.,  or  P.E.   7
  • 8. Test  suites  used   Name   Defini9on   Nº  Tests   RDF(S)  Import  Test  Suite   Manual   82   OWL  Lite  Import  Test  Suite   Manual   82   OWL  DL  Import  Test  Suite   Keyword-­‐driven  generator   561   OWL  Full  Import  Test  Suite   Manual   90   OWL  Content  Pa>ern   Expressive  generator   81   OWL  Content  Pa>ern  Expressive   Expressive  generator   81   OWL  Content  Pa>ern  Full  Expressive   Expressive  generator   81   8
  • 9. Tools  evaluated   1st  Evalua3on   Campaign   2nd  Evalua3on   Campaign   9
  • 10. Evalua3on  Execu3on   •  Evalua3ons  automa3cally  performed  with  the  SEALS   PlaQorm   –  h>p://www.seals-­‐project.eu/   SEALS •  Evalua3on  materials  available   Test Suite Test Suite Test Suite Raw Result –  Test  Data   –  Results   Test Suite Interpretation –  Metadata   Conformance Interoperability Scalability 10
  • 12. RDF(S)  conformance  results   •  Jena  and  Sesame  behave   iden3cally  (no  problems)   •  The  behaviour  of  the  OWL  API-­‐ based  tools  (NeOn  Toolkit,  OWL   API  and  Protégé  4)  has   significantly  changed   –  Transform  ontologies  to  OWL  2   –  Some  problems   •  Less  in  newer  versions   •  Protégé  OWL  improves   12
  • 13. OWL  Lite  conformance  results   •  Jena  and  Sesame  behave   iden3cally  (no  problems)   •  The  OWL  API-­‐based  tools  (NeOn   Toolkit,  OWL  API  and  Protégé  4)   improve   –  Transform  ontologies  to  OWL  2   •  Protégé  OWL  improves   13
  • 14. OWL  DL  conformance  results   •  Jena  and  Sesame  behave   iden3cally  (no  problems)   •  OWL  API  and  Protégé  4  improve   •  NeOn  Toolkit    worsenes   •  Protégé  OWL  behaves   iden3cally   •  Robustness  increases   14
  • 15. Content  pa>ern  conformance  results   •  New  issues  iden3fied  in   the  OWL  API-­‐based  tools   (NeOn  Toolkit,  OWL  API   and  Protégé  4)   •  New  issue  iden3fied  in   Protégé  4   •  No  new  issues   15
  • 16. Interoperability  results   1st  Evalua3on   2nd  Evalua3on   Campaign   Campaign   •  Same  analysis  as  in   conformance   •  OWL  DL:  New  issue  found   in  interchanges  from   Protégé  4  to  Protégé  OWL   •  Conclusions:   –  RDF-­‐based  tool  have  no   interoperability  problems   –  OWL-­‐based  tools  have  no   interoperability  problems   with  OWL  Lite  but  have   some  with  OWL  DL.   –  Tools  based  on  the  OWL   API  cannot  interoperate   using  RDF(S)  (they   convert  ontologies  into   OWL  2)   04.08.2010 16
  • 17. 2nd  SEALS  Yards3cks  for  Ontology   Management   •  Conformance  and  interoperability  results   •  Scalability  results   •  Conclusions   17
  • 18. Scalability  evalua3on   Tool X O1 O1’ O1’’ Step 1: Import + Export O1 = O1’’ + α - α’ 18
  • 19. Execu3on  se^ngs   Test  suites:   •  Real  World.  Complex  ontologies  from  biological  and   medical  domains   •  Real  World  NCI.  Thesaurus  subsets  (1.5-­‐2  3mes  bigger)   •  LUBM.  Synthe3c  ontologies   Execu9on  Environment:   •  Win7-­‐64bit,  Intel  Core  2  Duo  CPU,  2.40GHz,  4.00  GB  RAM   (Real  World  Ontologies  Test  Collec'ons)   •  WinServer-­‐64bit,  AMD  Dual  Core,  2.60  GHz  (4  Processors),   8.00  GB  RAM  (LUBM  Ontologies  Test  Collec'on)   Constraint:   •  30  min  threshold  per  test  case   19
  • 20. Real  World  Scalability  Test  Suite   Test   Size   Triples   Protégé   Protégé4   Protégé OWL  API   OWL  API   Neon     Neon   Jena  v. Sesame   MB   OWL     v.41   4  v.42   v.310   v.324   v.232   v.252   270   v.265   RO1   0.2   3K   5  (sec)   2   2   2   2   3   2   3   2   RO2   0.6   4K   2   2   2   2   2   2   2   3   1   RO3   1   11K   11   3   4   12   5   7   7   8   2   RO4   3   31K   4   5   5   5   4   5   5   5   3   RO5   4   82K   8   8   10   7   7   12   7   8   4   RO6   6   92K   8   9   12   9   9   11   14   9   4   RO7   10   135K   10   11   11   11   10   13   11   10   4   RO8   10   167K   14   9   8   8   9   11   11   12   4   RO9   20   270K   22   20   24   18   16   19   19   18   7   R10   24   315K   68   21   24   19   18   26   20   19   8   R11   26   346K   162   25   19   22   21   27   22   22   9   R12   40   407K   -­‐   24   22   26   23   28   30   26   9   R13   44   646K   -­‐   36   33   35   34   44   40   37   13   R14   46   671K   -­‐   30   27   28   28   35   37   41   13   R15   84   864K   -­‐   34   26   32   26   36   33   69   21   R16   117   1623K   -­‐   -­‐   -­‐   -­‐   -­‐   -­‐   -­‐   102   33   20
  • 21. Real  World  NCI  Scalability  Test  Suite   Test   Size   Triples   Protégé   Protégé4   Protégé4   OWL  API   OWL  API   NTK  v. NTK  v. Jena  v. Sesame   MB   OWL     v.41   v.42   v.310   v.324   232   252   270   v.265   NO1   0.5   3.6K   10  (sec)   5   6   4   3   4   4   4   2   NO2   0.6   4.3K   4   3   3   3   3   3   3   3   2   NO3   1   11K   5   4   4   4   4   4   4   3   2   NO4   4   31K   9   5   8   5   5   6   5   5   3   NO5   11   82K   13   7   10   8   8   9   8   9   5   NO6   14   109K   17   8   10   9   10   10   10   10   5   NO7   18   135K   19   9   12   10   10   12   12   11   5   NO8   23   167K   23   10   14   11   11   13   13   14   7   NO9   38   270K   37   15   16   15   13   18   17   20   9   N10   44   314K   74   16   18   16   17   21   19   23   10   N11   48   347K   136   17   19   16   18   21   20   24   10   N12   56   407K   -­‐   20   22   19   19   26   24   30   13   N13   89   646K   -­‐   29   28   28   29   39   35   47   18   N14   92   671K   -­‐   28   32   28   29   39   35   49   21   N15   118   864K   -­‐   34   36   34   36   48   45   63   26   N16   211   1540K   -­‐   61   61   62   71   83   100   282   41   21
  • 22. LUBM  Test  Suite   Test   Size   Protégé   Protégé4   Protégé4   OWL  API   OWL  API   NTK  v. NTK  v. Jena  v. Sesame   MB   OWL     v.41   v.42   v.310   v.324   232   252   270   v.265   LO1   8   29   20   25   15   29   11   16   17   5   LO2   19   1M52   19   30   18   30   16   22   30   8   LO3   28   2M59   17   28   27   40   20   26   42   10   LO4   39   4M05   24   33   33   41   28   39   47   12   LO5   51   17M27   36   40   -­‐   54   -­‐   54   59   14   LO6   60   22M43   41   45   -­‐   60   -­‐   1M04   1M03   16   LO7   72   26M32   1M1   53   -­‐   1M18   -­‐   1M28   1M17   19   LO8   82   -­‐   1M16   59   -­‐   1M3   -­‐   -­‐   1M27   20   LO9   92   -­‐   1M37   1M8   -­‐   2M12   -­‐   -­‐   1M39   23   L10   105   -­‐   2M2   1M31   -­‐   2M53   -­‐   -­‐   1M48   27   L11   116   -­‐   3M18   -­‐   -­‐   -­‐   -­‐   -­‐   2M02   33   L12   129   -­‐   4M59   -­‐   -­‐   -­‐   -­‐   -­‐   2M15   35   L13   143   -­‐   7M21   -­‐   -­‐   -­‐   -­‐   -­‐   2M33   40   L14   153   -­‐   9M07   -­‐   -­‐   -­‐   -­‐   -­‐   2M4   42   L15   162   -­‐   11M23   -­‐   -­‐   -­‐   -­‐   -­‐   2M52   43   L16   174   -­‐   14M09   -­‐   -­‐   -­‐   -­‐   -­‐   3M02   44   L17   184   -­‐   17M   -­‐   -­‐   -­‐   -­‐   -­‐   3M2   46   L18   197   -­‐   23M05   -­‐   -­‐   -­‐   -­‐   -­‐   3M34   51   L19   251   -­‐   27M21   -­‐   -­‐   -­‐   -­‐   -­‐   3M49   1M12   22
  • 23. LUBM  Test  Suite  (II)   Test   Size  ,   Protégé4   Jena  v. Sesame   Test   Size  ,   Sesame  v. Test   Size  ,   Sesame  v. MB   v.41   270   v.265   MB   265   MB   265   L20   263   -­‐   4M05   1M11   L36   412   1M44   Le51   1,105   -­‐   L21   284   -­‐   4M17   1M03   L37   421   1M45   Le52   1,205   -­‐   L22   242   -­‐   4M18   1M07   L38   430   1M49   Le53   1,302   -­‐   L23   251   -­‐   4M36   1M03   L39   441   1M49   Le54   1,404   -­‐   L24   263   -­‐   4M56   1M07   L40   453   1M55   Le55   1,514   -­‐   L25   284   -­‐   5M31   1M17   L41   467   2M05   L26   297   -­‐   5M35   1M18   L42   480   2M04   L27   307   -­‐   5M46   1M22   L43   489   2M14   L28   317   -­‐   6M09   1M27   L44   498   2M13   L29   330   -­‐   6M13   1M3   L45   510   2M23   L30   340   -­‐   6M23   1M3   LUBM  EXTENDED  TEST  SUITE   L31   354   -­‐   8M03   1M35   Le46   598   2M49   L32   363   -­‐   8M07   1M31   16M58   Le47   705   L33   375   -­‐   9M19   1M33   Le48   802   -­‐   L34   386   -­‐   -­‐   1M3   Le49   906   -­‐   L35   399   -­‐   -­‐   1M39   Le50   1,001   -­‐   23
  • 24. 2nd  SEALS  Yards3cks  for  Ontology   Management   •  Conformance  and  interoperability  results   •  Scalability  results   •  Conclusions   24
  • 25. Conclusions  –  Test  data   •  Test  suites  are  not  exhaus3ve   –  The  new  test  suites  helped  detec3ng  new  issues   •  A  more  expressive  test  suite  does  not  imply   detec3ng  more  issues   •  We  used  exis3ng  ontologies  as  input  for  the  test   data  generator   –  Requires  a  previous  analysis  of  the  ontologies  to   detect  defects     –  We  found  ontologies  with  issues  that  we  had  to   correct   25
  • 26. Conclusions  -­‐  Results   •  Tools  have  improved  their  conformance,  interoperability,   and  robustness   •  High  influence  of  development  decisions     –  the  OWL  API  radically  changed  the  way  of  dealing  with  RDF   ontologies     •  need  tools  for  easy  evalua3on   •  need  stronger  regression  tes3ng   •  The  automated  genera3or  defined  test  cases  that  a  person   would  have  never  though  about  but  which  iden3fied  new   tool  issues   •  using  bigger  ontologies  for  conformance  and   interoperability  tes3ng  makes  much  more  difficult  to  find   problems  in  the  tools   26
  • 27. Evaluating Storage and Reasoning Systems
  • 28. Index •  Evaluation scenarios •  Evaluation descriptions •  Test data •  Tools •  Results •  Conclusion
  • 29. Advanced  reasoning  system   •  Descrip3on  logic  based  system  (DLBS)   •  Standard  reasoning  services   –  Classifica3on   –  Class  sa3sfiability   –  Ontology  sa3sfiability   –  Logical  entailment  
  • 30. Exis3ng  evalua3ons   •  Datasets   –   Synthe3c  genera3on   –   Hand  craked  ontologies   –   Real-­‐world  ontologies   •  Evalua3ons   –  KRSS  benchmark   –  TANCS  benchmark   –  Gardiner  dataset   04.08.2010 30
  • 31. Evaluation criteria •  Interoperability –  the capability of the software product to interact with one or more specified systems –  a system must •  conform to the standard input formats •  be able to perform standard inference services •  Performance –  the capability of the software to provide appropriate performance, relative to the amount of resources used, under stated conditions
  • 32. Evaluation metrics •  Interoperability –  Number of tests passed without parsing errors –  Number of inference tests passed •  Performance –  Loading time –  Inference time
  • 33. Class satisfiability evaluation •  Standard inference service that is widely used in ontology engineering •  The goal: to assess both DLBS s interoperability and performance •  Input –  OWL ontology –  One or several class IRIs •  Output –  TRUE the evaluation outcome coincide with expected result –  FALSE the evaluation outcome differ from expected outcome –  ERROR indicates IO error –  UNKNOWN indicates that the system is unable to compute inference in the given timeframe
  • 35. Ontology satisfiability evaluation •  Standard inference service typically carried out before performing any other reasoning task •  The goal: to assess both DLBS s interoperability and performance •  Input –  OWL ontology •  Output –  TRUE the evaluation outcome coincide with expected result –  FALSE the evaluation outcome differ from expected outcome –  ERROR indicates IO error –  UNKNOWN indicates that the system is unable to compute inference in the given timeframe
  • 37. Classification evaluation •  Inference service that is typically carried out after testing ontology satisfiability and prior to performing any other reasoning task •  The goal: to assess both DLBS s interoperability and performance •  Input –  OWL ontology •  Output –  OWL ontology –  ERROR indicates IO error –  UNKNOWN indicates that the system is unable to compute inference in the given timeframe
  • 39. Logical entailment evaluation •  Standard inference service that is the basis for query answering •  The goal: to assess both DLBS s interoperability and performance •  Input –  2 OWL ontologies •  Output –  TRUE the evaluation outcome coincide with expected result –  FALSE the evaluation outcome differ from expected outcome –  ERROR indicates IO error –  UNKNOWN indicates that the system is unable to compute inference in the given timeframe
  • 41. Storage and reasoning systems evaluation component •  SRS component is intended to evaluate the description logic based systems (DLBS) –  Implementing OWL-API 3 de-facto standard for DLBS –  Implementing SRS SEALS DLBS interface •  SRS supports test data in all syntactic formats supported by OWL-API 3 •  SRS saves the evaluation results and interpretations in MathML 3 format
  • 42. DLBS interface •  Java methods to be implemented by system developers –  OWLOntology loadOntology(IRI iri) –  boolean isSatisfiable(OWLOntology onto, OWLClass class) –  boolean isSatisfiable(OWLOntology onto) –  OWLOntology classifyOntology(OWLOntology onto) –  URI saveOntology(OWLOntology onto, IRI iri) –  boolean entails(OWLOntology onto1, OWLOntology onto2)
  • 43. Testing Data •  The ontologies from the Gardiner evaluation suite. –  Over 300 ontologies of varying expressivity and size. •  Various versions of the GALEN ontology •  Various ontologies that have been created in EU funded projects, such as SEMINTEC, VICODI and AEO •  155 entailment tests from OWL 2 test cases repository
  • 44. Evaluation setup •  3  DLBSs   –  FaCT++  C++  implementa3on  of  FaCT  OWL  DL  reasoner   –  HermiT  Java  based  OWL  DL  reasoner  u3lizing  novel  hypertableau   algorithms   –  Jcel  Java  based  OWL  2  EL  reasoner   –  FaCT++C    evaluated  without  OWL  prepareReasoner()  call   –  HermiTC  evaluated  without  OWL  prepareReasoner()  call   •  2  AMD  Athlon(tm)  64  X2  Dual  Core  Processor  4600+  machines   with  2GB  of  main  memory     –  DLBSs  were  allowed  to  allocate  up  to  1  GB  
  • 45. Evaluation results: Classification FaCT++ HermiT jcel ALT, ms 68 506 856 ART, ms 15320 167808 2144 TRUE 160 145 16 FALSE 0 0 0 ERROR 47 33 4 UNKNOWN 3 32 0
  • 46. Evaluation results: Class satisfiability FaCT++ HermiT jcel ALT, ms 1047 255 438 ART, ms 21376 517043 1113 TRUE 157 145 15 FALSE 1 0 0 ERROR 36 35 5 UNKNOWN 16 30 0
  • 47. Evaluation results: Ontology satisfiability FaCT++ HermiT jcel ALT, ms 1315 410 708 ART, ms 25175 249802 1878 TRUE 134 146 16 FALSE 0 0 0 ERROR 45 33 4 UNKNOWN 0 31 0
  • 48. Evaluation results: Entailment FaCT++ HermiT ALT, ms 14 33 ART, ms 1 20673 TRUE 46 119 FALSE 67 14 ERROR 34 9 UNKNOWN 0 3
  • 49. Evaluation results: Non entailment FaCT++ HermiT ALT, ms 47 92 ART, ms 5 127936 TRUE 7 7 FALSE 0 1 ERROR 3 1 UNKNOWN 0 1
  • 50. Comparative evaluation: Classification FaCT++C HermiTC ALT, ms 309 207 ART, ms 3994 2272 TRUE 112 112
  • 51. Comparative evaluation: Class satisfiability FaCT++C HermiTC ALT, ms 333 225 ART, ms 216 391 TRUE 113 113
  • 52. Comparative evaluation: Ontology satisfiability FaCT++C HermiTC ALT, ms 333 225 ART, ms 216 391 TRUE 113 113
  • 53. Comparative evaluation: Entailment FaCT++C HermiTC ALT, ms 7 7 ART, ms 2 24 TRUE 1 1
  • 54. Comparative evaluation: Non- Entailment FaCT++C HermiTC ALT, ms 22 18 ART, ms 2 43 TRUE 4 4
  • 55. Comparative evaluation: Classification FaCT++C HermiTC FaCT++ HermiT jcel ALT, ms 398 355 1471 771 856 ART, ms 11548 1241 36650 2817 2144 TRUE 16 16 16 16 16
  • 56. Comparative evaluation: Class satisfiability FaCT++C HermiTC FaCT++ HermiT jcel ALT, ms 382 342 532 1062 438 ART, ms 159 223 7603 3437 1113 TRUE 15 15 15 15 15
  • 57. Comparative evaluation: Ontology satisfiability FaCT++C HermiTC FaCT++ HermiT jcel ALT, ms 360 365 1389 1262 708 ART, ms 11548 202 36650 4790 1878 TRUE 16 16 16 16 16
  • 58. Challenging ontologies: Classification Ontology Mosquito GALEN mged go worm- -anatomy anatomy Classes 1864 2749 229 19528 6731 Relations 2 413 102 1 5 FaCT++C,LT ms 3760 663 189 4362 783 FaCT++C,RT ms 9568 9970 355 28041 45739 HermiTC,LT ms 510 609 273 4328 973 HermiTC,RT ms 944 12623 27974 12698 2491
  • 59. Challenging ontologies: Classification Ontology plans information human Fly- emap anato my Classes 118 121 8342 6326 13731 Relations 263 197 1 3 1 FaCT++C, LT ms 67 106 3186 662 1965 FaCT++C, RT ms 661 126 132607 5016 156714 HermiTC, LT ms 67 95 1192 746 1311 HermiTC, RT ms 115576 7064 3842 6564 7097
  • 60. Challenging ontologies: Class satisfiability Ontology not GALEN mged go plans GALEN Class Digestion Trimetho Thing GO_0042 schedule prim 447 Classes 3087 2749 229 19528 118 Relations 413 413 102 1 263 FaCT++C, LT 1130 652 174 4351 78 FaCT++C, RT 3215 1065 160 1465 79 HermiTC, LT 1087 680 358 3961 67 HermiTC, RT 11210 9108 4333 2776 3459
  • 61. Challenging ontologies: Ontology satisfiability Ontology not GALEN mged go plans GALEN Classes 3087 2749 229 19528 118 Relations 413 413 102 1 263 FaCT++C, LT 992 618 189 4383 67 FaCT++C, RT 3047 1057 170 1413 74 HermiTC, LT 1166 590 346 4371 69 HermiTC, RT 11562 9408 3197 2687 1827
  • 62. Conclusion •  Errors: –  datatypes not supported in the systems –  syntax related : a system was unable to register a role or a concept –  expressivity errors •  Execution time is dominated by small number of hard problems
  • 63. SEALS  Ontology  Matching   Evalua3on  campaign   …  also  known  as  OAEI  2011.5   6/26/12 63
  • 64. Ontology  Matching   Person   People   Author   Author   <  Author,  Author,  =,  0.97  >   writes   Commi>eeMember   <  Paper,  Paper,  =,  0.94  >   Reviewer   <  reviews,  reviews,  =,  0.91  >   <  writes,  writes,  =,  0.7  >   PCMember   <  Person,  People,  =,  0.8  >   reviews   <  Document,  Doc,  =,  0.7  >   <  Reviewer,  Review,  =,  0.6  >   reviews   …   Doc   Document   Paper   Paper   writes   Review   6/26/12 64
  • 65. OAEI  &  SEALS   •  OAEI  :  Ontology  Alignment  Evalua3on  Ini3a3ve   –  Organized  as  annual  campaign  from  2005  to  2012   –  Included  in  Ontology  Matching  workshop  at  ISWC   –  Different  tracks  (evalua3on  scenarios)  organized  by   different  researchers   •  Star3ng  in  2010:  Support  from  SEALS   –  OAEI  2010,  OAEI  2011,  and  OAEI  2011.5   6/26/12 65
  • 67. Jose  Aguirre   OAEI  tracks   Jerome    Euzenat   INRIA  Grenoble   •  Benchmark   –  Matching  different  versions  of  the  same  ontology   –  Scalability:     Size    run3mes   •  Conference   •  Mul3Farm   •  Anatomy   •  Large  BioMed   6/26/12 67
  • 68. Ondřej  Šváb-­‐Zamazal   OAEI  tracks   Vojtěch  Svátek   Prague  University   of  Economics   •  Benchmark   •  Conference   –  Same  domain,  different  ontology   –  Manually  generated  reference  alignment   •  Mul3Farm   •  Anatomy   •  Large  BioMed   6/26/12 68
  • 69. Chris3an  Meilicke,   OAEI  tracks   Cassia  Trojahn   University  Mannheim   INRIA  Grenoble   •  Benchmark   •  Conference   •  Mul3Farm:  Mul3lingual  Ontology  Matching   –  Based  on  Conference   –  Testcases  for  Spanish,  German,   French,  Russian,  Portuguese,   Czech,  Dutch,  Chinese   •  Anatomy   •  Large  BioMed   6/26/12 69
  • 70. Chris3an  Meilicke,   OAEI  tracks   Heiner  Stuckenschmidt   University  Mannheim   •  Benchmark   •  Conference   •  Mul3Farm   •  Anatomy   –  Matching  mouse   on  human  anatomy   –  Run3mes   •  Large  BioMed   6/26/12 70
  • 71. Ernesto  Jimenez  Ruiz   OAEI  tracks   Bernardo  Cuenca  Grau   Ian  Horrocks   University  of  Oxford   •  Benchmark   •  Conference   •  Mul3Farm   •  Anatomy   •  Large  BioMed   –  Very  large  dataset  (FMA-­‐NCI)   –  Includes  coherence  analysis   6/26/12 71
  • 72. Detailed  results   h>p://oaei.ontologymatching.org/2011.5/ results/index.html   6/26/12 72
  • 73. Ques3ons?   Write  a  mail  to  Chris3an  Meilicke   chris3an@informa3k.uni-­‐mannheim.de   6/26/12 73
  • 74. IWEST  2012  workshop  located  at  ESWC  2012   Seman3c  Search  Systems   Evalua3on  Campaign   6/26/12 74
  • 75. Two  phase  approach   •  Seman3c  search  tools  evalua3on  demands  a   user-­‐in-­‐the-­‐loop  phase   –  usability  criterion   •  Two  phases:   –  User-­‐in-­‐the-­‐loop   –  Automated   6/26/12 75
  • 76. Evalua3on  criteria  by  phase   Each  phase  will  address  a  different  subset  of   criteria.   •  Automated  phase:  query  expressiveness,   scalability,  performance   •  User-­‐in-­‐the-­‐loop  phase:  usability,  query   expressiveness   6/26/12 76
  • 77. Par3cipants   Tool   Descrip9on   UITL   Auto   K-­‐Search   Form-­‐based   x   x   Ginseng   Natural  language  with  constrained  vocabulary  and   x   grammar   NLP-­‐Reduce   Natural  language  for  full  English  ques3ons,  sentence   x   fragments,  and  keywords.   Jena  Arq   SPARQL  query  engine.  Automated  phase  baseline   x   RDF.Net  Query   SPARQL-­‐based   x   Seman3c  Crystal   Graph-­‐based   x   Affec3ve  Graphs   Graph-­‐based   x   6/26/12 77
  • 78. Usability  Evalua3on  Setup   •  Data:  Mooney  Natural  Language  Learning  Data   •  Subjects:    20  (10  expert  users;  10  casual  users)   –  Each  subject  evaluated  the  5  par3cipa3ng  tools   •  Task:  Formulate  5  ques3ons  in  each  tool’s  interface     •  Data  Collected:    success  rate,  input  3me,  number  of   a>empts,  response  3me,  user  sa3sfac3on   ques3onnaires,  demographics   04.08.2010 78
  • 79. 1  concept,   1  rela3on   Ques3ons   1)  Give  me  all  the  capitals  of  the  USA?   2  concepts,  2  rela3ons   2)  What  are  the  ci9es  in  states  through  which  the   Mississippi  runs?   compara3ve   3)  Which  states  have  a  city  named  Columbia  with  a  city   popula3on  over  50,000?   superla3ve   4)  Which  lakes  are  in  the  state  with  the  highest  point?   5)  Tell  me  which  rivers  do  not  traverse  the   nega3on              state  with  the  capital  Nashville?   04.08.2010 79
  • 80. Automated  Evalua3on  Setup   •  Data:  EvoOnt  dataset   –  Five  sizes:  1K  10K  100K  1M  10M  triples   •  Task:  Answer  10  ques3ons  per  dataset  size   •  Data  Collected:    ontology  load  3me,  query  3me,  number   of  results,  result  list   •  Analyses:  precision,  recall,  f-­‐measure,  mean  query  3me,   mean  3me  per  result,  etc   04.08.2010 80
  • 81. Configura3on   •  All  tools  executed  on  SEALS  PlaQorm   •  Each  tool  executed  within  a  Virtual  Machine   Linux   Windows   OS   Ubuntu  10.10  (64-­‐bit)   Windows  7  (64-­‐bit)   Num  CPUs   2   4   Memory  (GB)   4   4   Tools   Arq  v2.8.2  and  Arq  v2.9.0   RDF  Query  v0.5.1-­‐beta   6/26/12 81
  • 83. Graph-­‐based  tools  most  liked     (highest  ranks  and  average  SUS  scores)   Tool 100.0 Semantic-Crystal •  Perceived  by  expert  users   System Usability Scale "SUS" Questionnaire score Affective-Graphs K-Search Ginseng Nlp-Reduce 80.0 as  intui9ve  allowing  them   to  easily  formulate  more   60.0 complex  queries.   40.0 •  Casual  users  enjoyed  the   fun  and  visually-­‐appealing   20.0 interfaces  which  created  a   17 pleasant  search   .0 experience.     Casual Expert UserType 04.08.2010 83
  • 84. Form-­‐based  approach  most  liked  by  casual   users   •  Perceived  by  casual  users  as   Tool 5 Extended Questionnaire Question "The system's query Semantic-Crystal language was easy to understand and use" score Affective-Graphs K-Search Ginseng Nlp-Reduce midpoint  between  NL  and   4 graph-­‐based.   •  Allow  more  complex  queries   3 than  the  NL  does.   •  Less  complicated  and  less   2 61 query  input  3me  than  the   graph-­‐based.     1 17 •  Together  with  graph-­‐based:   Casual Expert most  liked  by  expert  users   UserType 04.08.2010 84
  • 85. Casual  Users  liked  Controlled-­‐NL  approach   •  Casuals:     Tool •  liked  guidance  through   100.0 Semantic-Crystal System Usability Scale "SUS" Questionnaire score Affective-Graphs sugges3ons.   K-Search Ginseng Nlp-Reduce 80.0 •  Prefer  to  be  ‘controlled’  by  the   language  model,  allowing  only   60.0 valid  queries.   40.0 •  Experts:     •  restric3ve  and  frustra3ng.   20.0 •  Prefer  to  have  more  flexibility   and  expressiveness  rather  than   .0 17 support  and  restric3on.   Casual Expert UserType 04.08.2010 85
  • 86. Free-­‐NL  challenge:  habitability  problem   1.0 Tool Semantic-Crystal Affective-Graphs •  Free-­‐NL  liked  for  its  simplicity,   K-Search .8 Ginseng Nlp-Reduce familiarity,  naturalness  and  low   query  input  3me  required.   Answer found rate 42 96 .6 •  Facing  habitability  problem:   mismatch  between  users  query   98 .4 terms  and  tools  ones.   .2 99 •  Lead  to  lowest  success  rate,   highest  number  of  trials  to  get   .0 97 Casual Expert UserType a  sa3sfying  answer,  and  in  turn   very  low  user  sa3sfac3on.   04.08.2010 86
  • 88. Overview   •  K-­‐Search  couldn’t  load  the  ontologies   –  external  ontology  import  not  supported   –  cyclic  rela3ons  with  concepts  in  remote  ontologies  not   supported   •  Non-­‐NL  tools  transform  queries  a  priori   •  Na3ve  SPARQL  tools  exhibit  differences  in  query   approach  (see  load  and  query  3mes)     6/26/12 88
  • 89. Ontology  load  3me   Arq v2.8.2 ontology load time Arq v2.9.0 ontology load time 100000 RDF Query v0.5.1-beta ontology load time •  RDF  Query  loads   ontology  on-­‐the-­‐fly.   Load  3mes  therefore   independent  of   Time (ms) 10000 dataset  size.   •  Arq  loads  ontology   1000 into  memory.     1 10 100 1000 Dataset size (thousands of triples) 6/26/12 89
  • 90. Query  3me   Arq v2.8.2 mean query time •  RDF  Query  loads   Arq v2.9.0 mean query time ontology  on-­‐the-­‐fly.   100000 RDF Query v0.5.1-beta mean query time Query  3mes  therefore   incorporate  load  3me.     •  Expensive  for  more   than  one  query  in  a   Time (ms) 10000 session.   •  Arq  loads  ontology   into  memory.     1000 •  Query  3mes  largely   independent  of   dataset  size   1 10 100 1000 Dataset size (thousands of triples) 6/26/12 90
  • 91. SEALS  Seman3c  Web  Service  Tools   Evalua3on  Campaign  2011   Seman9c  Web  Service  Discovery   Evalua9on  Results   04.08.2010 6/26/1204.08.2010 91
  • 92. Evalua3on  of  SWS  Discovery   •  Finding  Web  Services  based  on  their  seman3c   descrip3ons     •  For  a  given  goal,  and  a  given  set  of  service   descrip3ons,  the  tool  returns  the  match  degree   between  the  goal  and  each  service     •  Measurement  services  are  provided  via  the  SEALS   plaQorm  to  measure  the  rate  of  matching   correctness   92 92
  • 93. Campaign Overview http://www.seals-project.eu/seals-evaluation-campaigns/2nd-seals-evaluation-campaigns/ semantic-web-service-tools-evaluation-campaign-2011 •   Goal   –  Which  ontology/annota3on  is  the  best:  WSMO-­‐Lite,  OWL-­‐S  or   SAWSDL?   •  Assump3ons:   –  Same  corresponding  Test  Collec3ons  (TCs)   –  Same  corresponding  Matchmaking  algorithms  (Tools)   –  The  corresponding  tools  will  belong  to  the  same   provider   –  The  level  of  performance  of  a  tool  for  a  specific  TC  is   of  secondary  importance     93 93
  • 94. Campaign Overview http://www.seals-project.eu/seals-evaluation-campaigns/2nd-seals-evaluation-campaigns/ semantic-web-service-tools-evaluation-campaign-2011 Given  that  a  tool  T  can  apply  the  same  corresponding   matchmaking  algorithm  M  to  corresponding  test   collec3ons,  say,  TC1,  TC2  and  TC3,  we  would  like  to   compare  the  performance  (e.g.  Precision,  Recall)   among  MTC1,  MTC2  and  MTC3   94 94
  • 95. Background:  S3  Challenge   h>p://www-­‐ags.d•i.uni-­‐sb.de/~klusch/s3/index.html     T1   T2   ……   Tn   TI   TII   ……   TXV   ……   M1   M2   ……   Mn   MI   MII   ……   MXV   TCa  (e.g  owl-­‐s)   TCb  (e.g.  sawsdl)   ……   95 95
• 96. Background: S3 Challenge – 1st Evaluation Campaign (2010)
   http://www-ags.dfki.uni-sb.de/~klusch/s3/index.html
   [Diagram: the same per-collection tool/matchmaker layout as the previous slide, here denoting the setting of the 1st Evaluation Campaign (2010)]
• 97. Background: SWS Challenge
   http://sws-challenge.org/wiki/index.php/Scenario:_Shipment_Discovery
   [Diagram: tools T1, TI, Ta with matchmakers M1, MI, Ma, each using a different formalism (Formalism1, e.g. OCML; FormalismI, e.g. OWL-S; Formalisma), evaluated against common goal descriptions (e.g. plain text)]
• 98. SEALS 2nd SWS Discovery Evaluation
   [Diagram: tools T1, T2, T3, … sharing the same matchmaking algorithm M, applied across TC1 (e.g. OWL-S), TC2 (e.g. SAWSDL), TC3 (e.g. WSMO-Lite), …]
• 99. SEALS Test Collections
   •  WSMO-LITE-TC (1080 services, 42 goals)
      http://seals.sti2.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4b
      http://seals.sti2.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4g
   •  SAWSDL-TC (1080 services, 42 goals)
      http://seals.sti2.at/tdrs-web/testdata/persistent/SAWSDL-TC/3.0-1b
      http://seals.sti2.at/tdrs-web/testdata/persistent/SAWSDL-TC/3.0-1g
   •  OWLS-TC (1083 services, 42 goals)
      http://seals.sti2.at/tdrs-web/testdata/persistent/OWLS-TC/4.0-11b
      http://seals.sti2.at/tdrs-web/testdata/persistent/OWLS-TC/4.0-11g
• 100. Metrics – Galago (1)
   [Image-only slide: metric definitions]
• 101. Metrics – Galago (2)
   [Image-only slide: metric definitions, continued]
• 102. SWS Discovery Evaluation Workflow
   [Image-only slide: workflow diagram]
• 103. SWS Tool Deployment
   •  Wrapper for the SEALS platform (a hypothetical sketch follows below)
• 104. Tools
   •  WSMO-LITE-TC: WSMO-LITE-OU [1]
   •  SAWSDL-TC: SAWSDL-OU [1], SAWSDL-URJC [2], SAWSDL-M0 [3]
   •  OWLS-TC: OWLS-URJC [2], OWLS-M0 [3]
   [1] Ning Li, The Open University
   [2] Ziji Cong et al., University of Rey Juan Carlos
   [3] Matthias Klusch et al., German Research Center for Artificial Intelligence
• 106. Evaluation Execution
   •  The evaluation workflow was executed on the SEALS Platform
   •  All tools were executed within a virtual machine:
      –  OS: Windows 7 (64-bit)
      –  CPUs: 4
      –  Memory: 4 GB
      –  Tools: WSMO-LITE-OU, SAWSDL-OU
• 107. Partial Evaluation Results – WSMO-LITE vs. SAWSDL
   [Diagram: the same matchmaking algorithm M, embedded in WSMO-LITE-OU and SAWSDL-OU, applied to WSMO-LITE-TC and SAWSDL-TC respectively]
• 108. [Image-only slide: per-goal results table for WSMO-LITE-OU vs. SAWSDL-OU]
   *  This table only shows the results that are different
• 109. Analysis
   •  Out of 42 goals, only 19 have different results in terms of precision and recall
   •  In 17 of these 19 cases, WSMO-Lite improves discovery precision over SAWSDL by specializing the service semantics
   •  WSMO-Lite performs worse than SAWSDL on discovery recall in 6 of the 19 cases, and performs the same in the other 13
• 110. Analysis
   •  Goal #17: novel_author_service.wsdl (Education domain)
      http://seals.sti2.at/tdrs-web/testdata/persistent/WSMO-LITE-TC-SWRL/1.0-4b/suite/17/component/GoalDocument/
   •  Services chosen from SAWSDL but not WSMO-Lite (Economy domain):
      –  romanticnovel_authormaxprice_service.wsdl
      –  romanticnovel_authorprice_service.wsdl
      –  romanticnovel_authorrecommendedprice_service
      –  short-story_authorprice_service.wsdl
      –  science-fiction-novel_authorprice_service.wsdl
      –  sciencefictionbook_authorrecommendedprice_service.wsdl
      –  …
• 111. Lessons Learned
   •  WSMO-LITE-OU tends to perform better than SAWSDL-OU in terms of precision, but slightly worse in recall.
   •  The only WSMO-Lite feature used beyond SAWSDL was the service category (based on the TC domains).
      –  Services were filtered by service category in WSMO-LITE-OU but not in SAWSDL-OU (see the sketch after this list)
   •  Further tests with additional tools and measures are needed for any conclusive results about WSMO-Lite vs. SAWSDL (many tools are not yet available)
• 112. Conclusions
   •  This has been the first SWS evaluation campaign in the community to focus on the impact of the service ontology/annotation on performance
   •  The comparison was facilitated by the generation of WSMO-LITE-TC as a counterpart of SAWSDL-TC and OWLS-TC in the SEALS repository
   •  The current comparison only involves two ontologies/annotations (WSMO-Lite and SAWSDL)
   •  Raw and interpretation results are available in RDF via the SEALS repository (public access)