This document discusses bootstrapping the analysis of large-scale web service networks. It begins with the background and challenges of web service analysis. The proposed analysis roadmap includes generating a reference ontology from WSDL web services, annotating web services, matching services and generating a network, and applying social network analysis algorithms. The evaluation of the matching scheme and of the network properties is also discussed. Future work includes benchmarking other annotation methods and investigating additional network properties.
1. Bootstrapping the Analysis of Large-scale Web Service Networks
Shahab Mokarizadeh, Royal Institute of Technology, Sweden
Peep Kungas, Tartu University, Estonia
Mihhail Matskin, Royal Institute of Technology, Sweden
IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
2. Background
Why web service analysis?
Identifying missing but valuable web services (still to be implemented)
Discovering correlations among public, governmental and private sector web services
Discovering the most/least exploited concepts, web services and web service providers
…..
Initial challenge:
The vast majority of available services are not semantically annotated, and many do not come with any documentation at all!
3. Analysis Roadmap
• Generate a reference ontology
  • Initially only WSDL web services
• Web service annotation
• Web service matching & network generation
• Apply social network analysis algorithms
  • Information diffusion among web service communities
  • Analysis of the impact of services/concepts on other services or concepts
4. Reminder: WSDL Structure
[Figure: WSDL structure. Image from: Web Services and Security, Marco Cova, 1/17/2006]
5. Ontology Learning from WSDL Interfaces [1]
Ontology learning input:
- Message part names of input/output parameters
- XML Schema leaf element names of complex types
Ontology learning pipeline:
- Information Elicitation: Term Extraction, Syntactic Refinement
- Ontology Discovery: Pattern-based Semantic Analysis, Term Disambiguation, Class and Relation Determination
- Ontology Organization: Adding Relations
Output: Reference Ontology
[1] "Ontology Learning for Cost-Effective Large-scale Semantic Annotation of XML Schemas and Web Service Interfaces", in Proc. EKAW 2010, LNAI 6317, pp. 401-410, 2010.
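The pipeline lists Term Extraction and Syntactic Refinement as its first stage but does not say how raw identifiers such as userPostalAddress or addr_line are split into words. The sketch below is a minimal, hypothetical version of that step. It assumes identifiers are split on underscores, hyphens, camelCase boundaries and digit runs, and then lower-cased. refine_term is an illustrative name and is not the method described in [1].

```python
import re

# Assumed splitting rule: an acronym followed by a capitalised word ("XMLParser"),
# a capitalised or lower-case word, a run of capitals, or a run of digits.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def refine_term(term: str) -> list[str]:
    """Split a raw WSDL/XSD identifier into lower-case word tokens (hypothetical helper)."""
    tokens: list[str] = []
    # Underscores, hyphens and whitespace are treated as hard separators.
    for chunk in re.split(r"[_\-\s]+", term):
        tokens.extend(m.group(0).lower() for m in _TOKEN.finditer(chunk))
    return tokens
```

For example, refine_term("userPostalAddress") yields ["user", "postal", "address"], and refine_term("online_usr") yields ["online", "usr"]. Abbreviations such as "usr" survive this step unchanged; resolving them is left to the later disambiguation stage.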
6. Annotation Heuristics [2]
entity_reference ← synset{…}
where entity_reference is a concept in the ontology and the synset holds its instances in the ontology (terms).
Example:
Password ← {password, pwd, strPassword, authPassword, pass}
Address ← {addr, address1, postal_address}
[2] P. Küngas and M. Dumas, "Cost-Effective Semantic Annotation of XML Schemas and Web Service Interfaces", in Proc. IEEE Conference on Services Computing, 2009, pp. 372-379.
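Read as a data structure, the heuristic maps each concept to its synset, and annotating a term means finding the synset that contains it. The sketch below inverts the two example synsets into a term-to-concept index. The case-insensitive lookup and the names SYNSETS, build_index and annotate are assumptions made for illustration and are not taken from [2].

```python
from typing import Optional

# Example synsets from this slide: concept -> set of terms that instantiate it.
SYNSETS: dict[str, set[str]] = {
    "Password": {"password", "pwd", "strPassword", "authPassword", "pass"},
    "Address": {"addr", "address1", "postal_address"},
}


def build_index(synsets: dict[str, set[str]]) -> dict[str, str]:
    """Invert concept -> synset into a case-insensitive term -> concept index."""
    index: dict[str, str] = {}
    for concept, terms in synsets.items():
        for term in terms:
            index[term.lower()] = concept
    return index


def annotate(term: str, index: dict[str, str]) -> Optional[str]:
    """Return the concept whose synset contains `term`, or None if no synset does."""
    return index.get(term.lower())
```

With index = build_index(SYNSETS), annotate("pwd", index) returns "Password" and annotate("zip", index) returns None, because "zip" belongs to neither example synset.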
7. Web Service Matching Scheme
Web service matching is simplified to instance matching: the basic elements of web service input and output parameters (ontological instances) are matched.
Rule-based matching scheme:
- A matching rule reveals the existence of some kind of semantic relation between two given instances.
8. Instance Matching Rules (1)
Rule-1: Same concept. Example: (addr, addr_line):
{addr, addr_line} instanceOf Address
Rule-2: Synonym concepts. Example: (loc, place):
{loc} instanceOf Location,
{place} instanceOf Place,
Place isSynonymOf Location
Rule-3: Subclass concepts. Example: (loc, city):
{loc} instanceOf Location,
{city} instanceOf City,
City isSubClassOf Location
9. Instance Matching Rules (2)
Rule-4: Rule-2 + Rule-3.
Example: (bidUId, id):
{bidUId} instanceOf BidUniqueCode,
{id} instanceOf ContractIdentifier,
BidUniqueCode isSynonymOf ContractIdentifier
Rule-5: Interrelated by an ontological relation (other than isSynonymOf).
Example:
Person hasPropertyXXX FirstName
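A toy version of the five rules helps make the matching scheme concrete. The ontology below is hypothetical and assembled from the slide examples. The relation name hasFirstName stands in for the slide's hasPropertyXXX. Rule-4 is described only as "Rule 2 + Rule 3", so the sketch interprets it, as an assumption, as a synonym link followed by a subclass link: place matches city because Place isSynonymOf Location and City isSubClassOf Location. Only direct subclass links are checked. The rules are tried in order, so the lowest-numbered rule that applies is reported.

```python
from typing import Optional

# Hypothetical toy ontology assembled from the slide examples.
INSTANCE_OF = {
    "addr": "Address", "addr_line": "Address",
    "loc": "Location", "place": "Place", "city": "City",
    "person": "Person", "first_name": "FirstName",
}
SYNONYMS = {frozenset({"Place", "Location"})}                # isSynonymOf (symmetric)
SUBCLASS_OF = {("City", "Location")}                         # (subclass, superclass)
OTHER_RELATIONS = {("Person", "hasFirstName", "FirstName")}  # hypothetical relation name


def is_synonym(a: str, b: str) -> bool:
    return frozenset({a, b}) in SYNONYMS


def is_subclass(a: str, b: str) -> bool:
    """True if either concept is a direct subclass of the other."""
    return (a, b) in SUBCLASS_OF or (b, a) in SUBCLASS_OF


def synonyms_of(c: str) -> set[str]:
    return {x for pair in SYNONYMS if c in pair for x in pair if x != c}


def match_terms(t1: str, t2: str) -> Optional[int]:
    """Return the number of the first rule (1-5) that matches two terms, or None."""
    c1, c2 = INSTANCE_OF.get(t1), INSTANCE_OF.get(t2)
    if c1 is None or c2 is None:          # unannotated terms cannot be matched
        return None
    if c1 == c2:                          # Rule-1: same concept
        return 1
    if is_synonym(c1, c2):                # Rule-2: synonym concepts
        return 2
    if is_subclass(c1, c2):               # Rule-3: subclass concepts
        return 3
    # Rule-4 (assumed reading): a synonym of one concept is a sub/superclass of the other.
    if any(is_subclass(s, c2) for s in synonyms_of(c1)) or any(
        is_subclass(c1, s) for s in synonyms_of(c2)
    ):
        return 4
    # Rule-5: any other ontological relation between the two concepts.
    if any((c1, c2) in {(s, o), (o, s)} for s, _, o in OTHER_RELATIONS):
        return 5
    return None
```

Under these assumptions, match_terms("loc", "city") returns 3, match_terms("place", "city") returns 4, and match_terms("addr", "city") returns None.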
10. Evaluating the Matching Scheme (1)
1- Classical approach (Precision, Recall, F-measure)
1. A golden annotation/ontology is needed for comparison.
2. Identify:
True Positives (TP): annotations common to the golden and the generated ontology
False Positives (FP): annotations made only by the generated ontology
False Negatives (FN): annotations made by the golden ontology but not discovered by the generated ontology
3. Compute:
$Precision = \frac{TP}{TP + FP}$, $Recall = \frac{TP}{TP + FN}$, $F\text{-}measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$
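If each annotation is represented as a (term, concept) pair, the three steps reduce to set operations. The sketch below follows that idea. The pair representation and the name evaluate are assumptions for illustration, and the scores fall back to 0 when a denominator is empty.

```python
def evaluate(golden: set[tuple[str, str]],
             generated: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Precision, recall and F-measure of generated vs. golden (term, concept) annotations."""
    tp = len(golden & generated)   # annotations found in both
    fp = len(generated - golden)   # annotations made only by the generated ontology
    fn = len(golden - generated)   # golden annotations that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

For example, with four golden annotations and three generated ones, two of which agree, TP = 2, FP = 1 and FN = 2. This gives a precision of 2/3, a recall of 1/2 and an F-measure of 4/7.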
11. Evaluating the Matching Scheme (2)
2- Tracking the performance of the matching scheme in the network model
• Generate a semantic network model from the annotated web service corpus.
• Track the performance of the exploited annotation & matching scheme through the network properties. Small web service (WSDL) networks have been observed to exhibit:
  • the small-world model
  • the scale-free model
  • degree correlation on nodes?
12. Web Service Network Models
2- Projecting matching scheme accuracy in the network model
[Figure: an annotated web service model (columns: Operations, Parameters, Concepts) and the semantic network derived from it]
WS1 - WS3: web services
OP1 - OP3: web service operations
P1 - P6: basic elements of input/output parameters
C1 - C5: ontological concepts representing the parameters
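The figure shows services, operations, parameters and concepts, but the extracted text does not preserve its edges, so the exact linking rule cannot be recovered here. The sketch below assumes one common construction: a directed edge from service A to service B whenever a concept that annotates an output element of A also annotates an input element of B. Only exact concept equality (Rule-1) is used. Operation and build_service_network are hypothetical names, and the network in the slides may link concepts rather than services.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    name: str
    inputs: set[str] = field(default_factory=set)   # concepts annotating input elements
    outputs: set[str] = field(default_factory=set)  # concepts annotating output elements


def build_service_network(services: dict[str, list[Operation]]) -> dict[str, set[str]]:
    """Directed adjacency: A -> B if some output concept of A is an input concept of B."""
    produced = {ws: set().union(*(op.outputs for op in ops)) for ws, ops in services.items()}
    consumed = {ws: set().union(*(op.inputs for op in ops)) for ws, ops in services.items()}
    return {
        a: {b for b in services if b != a and produced[a] & consumed[b]}
        for a in services
    }
```

Replacing the set intersection with a call to a rule-based matcher, such as the instance matching rules on slides 8 and 9, would let Rules 2-5 add edges as well.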
13. Evaluating Network Properties
Small-worldness
Small-world networks have the following characteristics:
1. $L_{random} \leq L_{actual}$ (L: average shortest path length)
2. $C_{random} \ll C_{actual}$ (C: clustering coefficient)
In other words, with $S_{index}$ as the small-worldness index:
$\gamma = C_{actual}/C_{random} > 1$, $\lambda = L_{actual}/L_{random} \geq 1$, $S_{index} = \gamma/\lambda > 1$
Small-worldness scales linearly with network size.
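The small-worldness test needs C and L for both the actual network and a random baseline. Below is a minimal, dependency-free sketch for undirected graphs stored as adjacency sets. The function names are hypothetical. Node pairs with no connecting path are skipped when averaging L; this is an assumption, since the slides do not say how disconnected components were handled. The random-baseline values are passed in as inputs rather than generated.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj: dict[int, set[int]]) -> float:
    """Average local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count the edges that exist among this node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path(adj: dict[int, set[int]]) -> float:
    """Mean BFS distance over ordered pairs of distinct nodes joined by a path."""
    dist_sum, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    if pairs == 0:
        raise ValueError("graph has no connected pairs of nodes")
    return dist_sum / pairs


def small_world_index(c_act: float, l_act: float, c_rand: float, l_rand: float) -> float:
    """S_index = gamma / lambda, with gamma = C_actual/C_random and lambda = L_actual/L_random."""
    return (c_act / c_rand) / (l_act / l_rand)
```

With the h25 row of the small-worldness table (C = 0.259 against 0.0348, L = 2.4256 against 2.4756), small_world_index gives about 7.60. This agrees with the reported 7.5769 up to rounding of the tabulated inputs.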
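14. Evaluating Network Properties
Scale-free networks
The degree distribution is fitted to a power-law function $y = c \cdot x^{-\alpha}$, where y is the number of nodes with M links and x = M.
[Figure: log-log plot of the number of nodes with M links against the number of links M; many nodes have few links, and a few nodes have many links]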
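Taking logarithms turns the power law into a straight line, log y = log c - α log x, so a quick fit can be done by least squares on the log-log degree histogram. The sketch below does exactly that. fit_power_law is a hypothetical name, and the slides do not state which fitting method was used. Least squares on a log-log histogram is a crude estimator. Maximum-likelihood fitting is generally preferred when the exponent itself matters.

```python
import math
from collections import Counter


def fit_power_law(degrees: list[int]) -> tuple[float, float]:
    """Least-squares fit of log(count) = log(c) - alpha * log(degree); returns (c, alpha).

    Degree 0 is skipped because log(0) is undefined.
    """
    hist = Counter(d for d in degrees if d > 0)
    points = [(math.log(k), math.log(n)) for k, n in hist.items()]
    if len(points) < 2:
        raise ValueError("need at least two distinct positive degrees")
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    intercept = my - slope * mx
    return math.exp(intercept), -slope
```

As a check, degree counts of 4096, 1024, 256, 64 and 16 for degrees 1, 2, 4, 8 and 16 follow y = 4096 · x^(-2) exactly. For that input the fit returns c ≈ 4096 and α ≈ 2.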
15. Evaluating Network Properties
Assortativity of Node Degree (Degree Correlation on Nodes)
Positive correlation: vertices with many connections tend to be connected to other vertices that also have many links. Observed in social networks, e.g. networks of actors.
Negative correlation: vertices preferentially attach to vertices with few connections. Observed in technological and biological networks, e.g. the Internet and protein interaction networks.
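One standard way to quantify this is Newman's degree assortativity coefficient r. It is the Pearson correlation between the degrees at the two ends of each edge, where every undirected edge is counted in both directions. Positive r indicates assortative mixing and negative r indicates disassortative mixing. The sketch below is a plain-Python version for an undirected edge list without self-loops. The function name is hypothetical, and the slides do not say how the correlation was computed.

```python
import math


def degree_assortativity(edges: list[tuple[int, int]]) -> float:
    """Newman's degree assortativity r for an undirected edge list (no self-loops)."""
    degree: dict[int, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each edge contributes both (deg u, deg v) and (deg v, deg u).
    xs: list[int] = []
    ys: list[int] = []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when every edge end has the same degree")
    return cov / math.sqrt(var_x * var_y)
```

A star, where one hub is linked only to leaves, is perfectly disassortative (r = -1). A triangle plus a separate single edge is perfectly assortative (r = 1), because degree-2 nodes link only to degree-2 nodes and degree-1 nodes link only to degree-1 nodes.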
16. Experimental Datasets
SOATrader dataset: 1,000,000 terms from the SOATrader collection of 15,000 WSDLs, gathered from different repositories on the Web between 2005 and 2007.
SOATrader: (http://www.soatrader.com/web-services)
ASSAM dataset [3]: 146 WSDLs collected by Hess et al. and annotated with the ASSAM tool. We use all unique terms (approx. 375) with any frequency from this collection.
ASSAM: http://www.andreas-hess.info/projects/annotator/
[3] A. Heß and N. Kushmerick, "Machine Learning for Annotating Semantic Web Services", AAAI Spring Symposium on Semantic Web Services, 2004.
17. Golden Ontology
SOATrader dataset: the golden annotation was handcrafted by the authors from the top 2000 recurrent terms.
ASSAM: we use the golden annotation developed by the ASSAM developers, which they used as the reference ontology in their experiments with the ASSAM web service annotation tool.
18. Evaluation Result (1)
Precision, Recall, F-Measure
[Figure: bar chart of Recall, Precision and F-Measure (y-axis from 0 to 0.6) for Rule-1, Rules 1-4 and Rules 1-5 on the Top2000 and ASSAM datasets]
19. Dataset for Network Evaluation
Ideal: use the entire dataset of WSDL/XSD elements from the SOATrader collection (approx. 1,000,000 terms) and the ASSAM collection (approx. 10,000 terms).
Problems with a large dataset:
- The larger the dataset, the bigger the ontology, and the harder it is to verify and improve the quality of the annotation.
- It is neither cost-effective (in human and computation cost) nor scalable for analysis purposes.
Proposal: limit the SOATrader experimental dataset using four arbitrarily chosen thresholds on the minimum frequency of occurrence of a term: 10, 15, 20 and 25 (h10, h15, h20, h25). Together these cover the 30,000 most recurrent (unique) terms.
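A minimal sketch of the frequency-threshold filtering is shown below. It assumes the corpus is available as a flat list of term occurrences, with one entry per occurrence across all WSDLs. threshold_subsets is a hypothetical helper. Any term that passes a higher threshold also passes every lower one, so the resulting subsets are nested: h25 ⊆ h20 ⊆ h15 ⊆ h10.

```python
from collections import Counter


def threshold_subsets(term_occurrences: list[str],
                      thresholds: tuple[int, ...] = (10, 15, 20, 25)) -> dict[str, set[str]]:
    """Map each label hN to the unique terms that occur at least N times."""
    counts = Counter(term_occurrences)
    return {f"h{h}": {t for t, n in counts.items() if n >= h} for h in thresholds}
```

For a corpus where "addr" occurs 30 times, "zip" 20 times, "pwd" 12 times and "city" 5 times, h10 keeps {addr, zip, pwd}, h15 and h20 keep {addr, zip}, and h25 keeps only {addr}.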
20. Annotation Progress
Threshold               h25      h20      h15      h10
Learned ontology size   4523     5614     7378     11610
Annotated elements      588057   596625   621336   663618
Total elements          998916   998916   998916   998916
Percentage of total     59%      60%      62%      66%
21. Analysis of Small Worldness
Dataset           Network                Model      L        C        Sindex
Entire SOATrader  Syntactic              Actual     3.283    0.2968   591.08
                                         Random-ER  3.9229   0.00062
h25               Generated              Actual     2.4256   0.259    7.5769
                                         Random-ER  2.4756   0.0348
h20               Generated              Actual     2.3882   0.2811   8.8148
                                         Random-ER  2.4851   0.0331
h15               Generated              Actual     2.3724   0.2805   8.2753
                                         Random-ER  2.3396   0.0334
h10               Generated              Actual     2.5322   0.2449   18.2709
                                         Random-ER  2.7662   0.0146
Top2000           Golden                 Actual     2.1895   0.3761   2.8404
                                         Random-ER  1.8852   0.1146
                  Generated              Actual     2.08475  0.3209   3.3878
                                         Random-ER  2.0667   0.0939
ASSAM             Golden                 Actual     4.5653   0.2147   3.1464
                                         Random-ER  3.546    0.05304
                  Generated (Rule-1)     Actual     3.0592   0.4803   21.4835
                                         Random-ER  3.8451   0.0281
                  Generated (Rules 1-4)  Actual     2.5732   0.4057   8.5288
                                         Random-ER  3.1267   0.0578
23. Plot of Degree Distribution
[Figure: out-degree distribution of random annotation vs. out-degree distribution of actual annotation]
24. Conclusion & Future Work
The performance of a web service annotation scheme can be tracked through the properties of web service network models.
An efficient matching scheme eliminates, or at least minimizes, deviation from the small-worldness conditions, shows a strong negative degree correlation and follows the scale-free model.
A major threat:
- Network theories are incomplete: e.g. power laws emerge so commonly that they are hard to rely on!
- The evaluated dataset may not represent the model governing the whole picture.
Future work:
- Benchmarking other WS annotation & matching methods
- Investigating other network properties
25. Thanks!
We would be grateful for your questions, critiques and suggestions.
SHAHABM@KTH.SE
26. Backup Slides
27. What Is Going To Be Annotated?
Note: we annotate ONLY the basic elements of web service input and output parameters (message part names and XML Schema basic element names).
WSDL → Semantic Annotation → Ontology
    <wsdl:types>
      <complexType name="Address">
        <sequence>
          ……
          <element name="Zip" type="string"/>
          …..
          <element name="City" type="string"/>
        </sequence>
      </complexType>
      (…)
    </wsdl:types>
Ontology: Address, with the relations hasZipCode (to ZipCode) and hasCityName (to CityName)
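The two kinds of annotated items can be extracted from a WSDL 1.1 file with Python's standard xml.etree.ElementTree module. The namespace URIs below are the standard WSDL 1.1 and XML Schema ones. The rule for what counts as a basic element is an assumption made for illustration: a named xsd:element that neither defines an inline complexType nor refers to a complexType declared in the same document. SAMPLE_WSDL is a hypothetical document modelled on the Address example.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical WSDL 1.1 document modelled on the Address example.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="urn:example">
  <types>
    <xsd:schema targetNamespace="urn:example">
      <xsd:complexType name="Address">
        <xsd:sequence>
          <xsd:element name="Zip" type="xsd:string"/>
          <xsd:element name="City" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:element name="userAddress" type="tns:Address"/>
    </xsd:schema>
  </types>
  <message name="GetAddressRequest">
    <part name="userId" type="xsd:string"/>
  </message>
  <message name="GetAddressResponse">
    <part name="userAddress" element="tns:userAddress"/>
  </message>
</definitions>
"""


def extract_terms(wsdl_text: str) -> dict[str, list[str]]:
    """Collect message part names and basic XML Schema element names from a WSDL 1.1 text."""
    root = ET.fromstring(wsdl_text.strip())
    parts = [p.get("name") for p in root.iter(f"{{{WSDL_NS}}}part") if p.get("name")]
    complex_types = {
        ct.get("name") for ct in root.iter(f"{{{XSD_NS}}}complexType") if ct.get("name")
    }
    elements = []
    for el in root.iter(f"{{{XSD_NS}}}element"):
        name = el.get("name")
        type_local = el.get("type", "").split(":")[-1]  # drop the namespace prefix
        inline_complex = el.find(f"{{{XSD_NS}}}complexType") is not None
        if name and not inline_complex and type_local not in complex_types:
            elements.append(name)
    return {"parts": parts, "elements": elements}
```

For the sample document, extract_terms returns the parts ["userId", "userAddress"] and the basic elements ["Zip", "City"]. The schema element userAddress is skipped because its type refers to the complex type Address.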
28. Example of Generated Ontology
Input terms: "userId", "username", "Zip", "addr_line", "userPostalAddress", "online_usr", ….
[Figure: generated ontology with the classes OnlineUser, User, PostalAddress, Address, UserName, UserIdentifier, PostalCode, ZipCode and AddressLine, connected by the relations isSubClassOf, hasAddress, hasName, hasIdentifier, hasAddressLine, hasZipCode and isSynonymOf]