OPAL – Real Understanding of Real Estate Forms
Xiaonan Guo
Oxford University Computing Laboratory, DIADEM group
Diversity in Web Form Design 2 May 27, 2011 OPAL - ontology-based web pattern analysis with logic
Outline
OPAL – Ontology-based web Pattern Analysis with Logic
- Overview
- Data models and Mappings
  - Browser, Segmentation, Annotation, and Domain Model
  - Segmentation and Phenomenological Mapping
- Analysis and Evaluation
- Future Work
Overview
[Diagram labels: Domain-independent; Hierarchical modeling; Domain knowledge + Declarative definition; Domain-aware; Form analysis; OPAL]
Browser Model
    html_element(e_320_input,320,321,319,input,d1).
    html_attr(e_320_input_name,e_320_input,name,"location",d1).
    html_attr(e_320_input_type,e_320_input,type,"radio",d1).
    ...
    box(e_320_input,106,253,120,267,14,14).
    css_attr(e_320_input,bottom,"auto").
    css_attr(e_320_input,clear,"none").
    css_attr(e_320_input,color,"rgb(0,0,0)").
    ...
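To make the fact format concrete, here is a minimal Python sketch of how such browser-model facts could be held and queried. The reading of the `box` arguments as left, top, right, bottom, width, height is an assumption (it is consistent with 120 - 106 = 14 and 267 - 253 = 14, but the slide does not name the fields), and `Box` and `attrs_of` are hypothetical names.

```python
from typing import Dict, NamedTuple

class Box(NamedTuple):
    # Assumed argument order; the slide does not name these fields.
    left: int
    top: int
    right: int
    bottom: int
    width: int
    height: int

# Facts copied from the slide, stored as plain tuples.
html_attr = [
    ("e_320_input_name", "e_320_input", "name", "location", "d1"),
    ("e_320_input_type", "e_320_input", "type", "radio", "d1"),
]
box = {"e_320_input": Box(106, 253, 120, 267, 14, 14)}

def attrs_of(element: str) -> Dict[str, str]:
    """Collect the HTML attributes recorded for one element."""
    return {name: value for _, el, name, value, _ in html_attr if el == element}

radio = attrs_of("e_320_input")
b = box["e_320_input"]
```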
Segmentation Model
- Grouping of related form elements, e.g. fields and labels
- Achieved via Segmentation Mapping, which
  - groups form fields
  - assigns labels to fields and groups
Segmentation Model – groups
Form elements are grouped if
- they occur in sequence
- they have similarities in attribute values or appearances
- their least common ancestor contains no other elements

    group(Es) :- similarFieldSequence(Es),
        leastCommonAncestor(A,Es), not hasAdditionalField(A,Es).
    leastCommonAncestor(A,Es) :- commonAncestor(A,Es),
        not ( child(C,A), commonAncestor(C,Es) ).
    partOf(E,A) :-
        group(Es), member(E,Es), leastCommonAncestor(A,Es).
Segmentation Model – groups
    group([e_320_input,e_326_input,
        e_332_input,e_338_input,e_344_input]).
    leastCommonAncestor(
        e_319_td,[e_320_input,e_326_input,...,e_344_input]).
    partOf(e_320_input,e_319_td).
    partOf(e_326_input,e_319_td).
    partOf(e_332_input,e_319_td).
    partOf(e_338_input,e_319_td).
    partOf(e_344_input,e_319_td).
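The grouping rules can be read procedurally. The following Python sketch, under stated assumptions, computes the least common ancestor of a field sequence and rejects it as a group when another field sits below the same ancestor. The similarity test (`similarFieldSequence`) is skipped, and the tree is hypothetical: only `e_319_td` and the five inputs come from the slide, the outer nodes and the placement of `e_358_select` are made up.

```python
from typing import Dict, List, Optional

# Hypothetical DOM fragment as a child -> parent map.
parent: Dict[str, str] = {
    "e_320_input": "e_319_td",
    "e_326_input": "e_319_td",
    "e_332_input": "e_319_td",
    "e_338_input": "e_319_td",
    "e_344_input": "e_319_td",
    "e_319_td": "e_318_tr",
    "e_358_select": "e_318_tr",
    "e_318_tr": "e_297_tbody",
}

def ancestors(node: str) -> List[str]:
    """Proper ancestors of node, nearest first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def least_common_ancestor(elements: List[str]) -> Optional[str]:
    """Deepest node that is an ancestor of every element."""
    common = set(ancestors(elements[0]))
    for e in elements[1:]:
        common &= set(ancestors(e))
    # Nearest ancestor of the first element that all others share.
    for a in ancestors(elements[0]):
        if a in common:
            return a
    return None

def has_additional_field(anc: str, elements: List[str], fields: List[str]) -> bool:
    """True if some field outside `elements` also sits below `anc`."""
    return any(f not in elements and anc in ancestors(f) for f in fields)

fields = ["e_320_input", "e_326_input", "e_332_input",
          "e_338_input", "e_344_input", "e_358_select"]
radios = fields[:5]
lca = least_common_ancestor(radios)
is_group = lca is not None and not has_additional_field(lca, radios, fields)
part_of = [(e, lca) for e in radios] if is_group else []
```

With this tree the five radio buttons form a group under `e_319_td`, while the same sequence would be rejected at `e_318_tr` because the select box also lives there.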
Segmentation Model – labels
Texts are assigned to fields and groups using
- Field: HTML <label>, greatest unique ancestor
- Segment: text-field alignment in groups
- Page: visual alignment
Segmentation Model – labels
Field: HTML <label>
    hasBasicLabel(E,L,T) :-
        html_element(E,_,_,_,input,_), html_attr(_,E,id,ID,_),
        html_element(N,_,_,_,label,_), html_attr(_,N,for,ID,_),
        child(L,N), html_text(L,T,_).
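The same join between an input's `id` and a label's `for` attribute can be sketched in Python with the standard-library HTML parser. This is an illustration only: the markup is hypothetical, and `LabelForParser` is a made-up helper, not part of OPAL.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class LabelForParser(HTMLParser):
    """Collect input ids and the text of <label for=...> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.input_ids: List[str] = []
        self.label_text: Dict[str, str] = {}
        self._current_for: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and "id" in a:
            self.input_ids.append(a["id"])
        elif tag == "label" and "for" in a:
            self._current_for = a["for"]
            self.label_text[self._current_for] = ""

    def handle_data(self, data):
        # Text is recorded only while inside a <label for=...>.
        if self._current_for is not None:
            self.label_text[self._current_for] += data

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None

# Hypothetical markup, not taken from the analysed page.
page = '<label for="buy">To Buy:</label><input type="radio" id="buy">'
p = LabelForParser()
p.feed(page)
basic_labels = {i: p.label_text[i].strip() for i in p.input_ids if i in p.label_text}
```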
Segmentation Model – labels
Field: greatest unique ancestor
    greatestUniqueAncestor(A,E) :- uniqueDescendant(E,A),
        not ( parent(P,A), uniqueDescendant(E,P) ).
    hasBasicLabel(E,L,T) :-
        group(E),
        greatestUniqueAncestor(A,E),
        descendant(L,A), html_text(L,T,_).
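One way to read the greatest-unique-ancestor rule: climb from a field as long as the next ancestor still contains no other field, then treat any text below the node reached as a label candidate. The Python sketch below shows this for a single field; the tree, the made-up node ids `td_a`, `td_b`, `tr_1`, and the helper names are all assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical tree (child -> parent); only the e_... and t_... ids
# appear on the slides, their placement here is made up.
parent: Dict[str, str] = {
    "t_354": "td_a", "e_358_select": "td_a", "td_a": "tr_1",
    "e_515_select": "td_b", "td_b": "tr_1",
}
fields: Set[str] = {"e_358_select", "e_515_select"}
texts: Dict[str, str] = {"t_354": "Min Beds"}

def descendants(node: str) -> List[str]:
    """All nodes strictly below node."""
    kids = [c for c, p in parent.items() if p == node]
    return kids + [d for k in kids for d in descendants(k)]

def unique_descendant(field: str, anc: str) -> bool:
    """True if `field` is the only form field below `anc`."""
    return [d for d in descendants(anc) if d in fields] == [field]

def greatest_unique_ancestor(field: str) -> Optional[str]:
    """Highest ancestor whose only field is `field`, or None."""
    best, node = None, field
    while node in parent and unique_descendant(field, parent[node]):
        node = parent[node]
        best = node
    return best

gua = greatest_unique_ancestor("e_358_select")
labels = [(t, texts[t]) for t in descendants(gua) if t in texts]
```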
Segmentation Model – labels
Segment: text-field alignment in groups
    hasLabel(E,L,T) :-
        partOf(E,G), leastCommonAncestor(G,Es), group(Es),
        hasNoLabel(Es), textLists(LLs,G),
        sameLength(Es,LLs),
        labelOneToOne(E,Ls,Es,LLs),
        member(L,Ls),
        html_text(L,T,_).
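A simplified reading of the segment-level rule: when the fields of a group have no labels yet and the group contains exactly as many texts as fields, pair them by position. The sketch below assumes that `labelOneToOne` pairs by document order, which the slide does not spell out, and flattens the text lists to one text per field. The ids are those shown in the example facts of this deck.

```python
from typing import Dict, List

def label_one_to_one(fields: List[str], texts: List[str]) -> Dict[str, str]:
    """Pair the i-th field with the i-th text when the counts match."""
    if len(fields) != len(texts):
        return {}  # counts differ, so no safe pairing exists
    return dict(zip(fields, texts))

group_fields = ["e_320_input", "e_326_input", "e_332_input",
                "e_338_input", "e_344_input"]
group_texts = ["t_322", "t_328", "t_334", "t_340", "t_346"]
pairs = label_one_to_one(group_fields, group_texts)
```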
Segmentation Model – labels
Page: visual alignment
    hasLabel(E,L,LText) :-
        hasTextBox(F,B), descendantTextOf(L,B),
        html_text(L,_,_,_,_), node_text(L,LText,true).
Segmentation Model – labels
    hasLabel(e_320_input,t_322,"Nailsea / Backwell").
    hasLabel(e_358_select,t_354,"Min. beds").
    hasLabel(e_319_td,t_316,"Area: ").
Segmentation Model – labels
    hasLabel(e_304_input,t_302,"To Buy:").
    hasLabel(e_308_input,t_306,"To Rent:").
    hasLabel(e_320_input,t_322,"Nailsea / Backwell").
    hasLabel(e_326_input,t_328,"Portishead / Pill").
    hasLabel(e_338_input,t_340,"Yatton / Congresbury").
    hasLabel(e_332_input,t_334,"Clevedon").
    hasLabel(e_344_input,t_346,"Bristol / Weston-super-mare").
    hasBasicLabel(e_358_select,t_354,"Min Beds").
    hasBasicLabel(e_515_select,t_400,"Min Price").
    hasBasicLabel(e_705_select,t_594,"Max Price").
    hasBasicLabel(e_788_select,t_784,"View Order").
    hasLabel(e_319_td,t_316,"Area").
    hasLabel(e_297_tbody,t_290,"Find a property to buy or rent…").
Annotation Model
- Obtained from the Browser model
- Relying on domain-specific knowledge, represents
  - linguistic annotations
  - machine-learning-based classifications
Annotation Model
    annotation(attrclob_d1_5560,attrclob_d1,80,87,"Nailsea").
    annotationFeature(attrclob_d1_5560,"majorType","location").
    annotationFeature(attrclob_d1_5560,"minorType","district_county_etc").
    annotation(elclob_d1_1707,elclob_d1,2028,2038,"Min. price").
    annotationFeature(elclob_d1_1707,"modifier","min").
    annotationFeature(elclob_d1_1707,"minorType","price").
    annotationFeature(elclob_d1_1707,"majorType","reform.label").
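The annotation facts look like character spans over a text block with attached features: 87 - 80 and 2038 - 2028 equal the lengths of "Nailsea" and "Min. price". That reading is an assumption. Under it, a rough Python sketch of producing such spans follows, where a plain dictionary lookup stands in for the real annotators, which the slides do not describe. `GAZETTEER` and `annotate` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: surface string -> features (values from the slide).
GAZETTEER: Dict[str, Dict[str, str]] = {
    "Nailsea": {"majorType": "location"},
    "Min. price": {"majorType": "reform.label", "minorType": "price",
                   "modifier": "min"},
}

Span = Tuple[int, int, str, Dict[str, str]]

def annotate(clob: str) -> List[Span]:
    """Return (start, end, text, features) for each gazetteer hit."""
    hits: List[Span] = []
    for surface, features in GAZETTEER.items():
        start = clob.find(surface)
        while start != -1:
            hits.append((start, start + len(surface), surface, features))
            start = clob.find(surface, start + 1)
    # Sort by start offset so spans come out in text order.
    return sorted(hits, key=lambda h: h[0])

text = "Area: Nailsea / Backwell   Min. price"
spans = annotate(text)
```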
Domain Model
- Describes conceptual entities on forms as in the domain ontology
- Achieved via phenomenological mapping, which
  - correlates labels with annotations
  - classifies form elements
Domain Model – Ontology
    purpose = {buy, rent, combined}
    priceType = {min, max, approximate, range}
Domain Model – Classification
Form elements are annotated as follows:
    hasAnnotation(E,A) :-
        hasLabel(E,_,T),
        annotation(Aid,_,_,_,T),
        annotationFeature(Aid,_,A).
Form elements are classified with Concept C and Facets C_F:
    C(X) :- leafSegment(X), hasAnnotation(X,A), Clabel(A).
    C_F(X,F) :- C(X), hasAnnotation(X,F), C_FLabel(F).
    C_F(X,F) :- C(X), hasValueAnnotation(X,F), C_FValue(F).
Domain Model – Classification
    priceElement(e_515_select,e_286_form).
    priceType(e_515_select,"min").
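The classification step can be sketched in Python as a join from labels to annotations to feature values, followed by a lookup in the ontology. Several things here are assumptions: the label text is set equal to the annotation text so that the exact-match join fires, the `leafSegment` check is omitted, and the tables `CONCEPT_LABEL` and `FACET_LABELS` are hypothetical stand-ins for the ontology.

```python
from typing import Dict, List, Set, Tuple

# Facts in the shape used on the slides; label text chosen to match
# the annotation text exactly (an assumption).
has_label: List[Tuple[str, str, str]] = [("e_515_select", "t_400", "Min. price")]
annotation: List[Tuple[str, str, int, int, str]] = [
    ("elclob_d1_1707", "elclob_d1", 2028, 2038, "Min. price"),
]
annotation_feature: List[Tuple[str, str, str]] = [
    ("elclob_d1_1707", "modifier", "min"),
    ("elclob_d1_1707", "minorType", "price"),
]

def has_annotation(element: str) -> Set[str]:
    """Feature values of annotations whose text equals a label of element."""
    texts = {t for e, _, t in has_label if e == element}
    ids = {a[0] for a in annotation if a[4] in texts}
    return {v for aid, _, v in annotation_feature if aid in ids}

# Hypothetical ontology tables: the annotation that marks the concept
# and the admissible facet values.
CONCEPT_LABEL = {"priceElement": "price"}
FACET_LABELS = {"priceType": {"min", "max", "approximate", "range"}}

def classify(element: str) -> Dict[str, Set[str]]:
    """Assign the price concept and its facet to an element, if annotated."""
    annos = has_annotation(element)
    result: Dict[str, Set[str]] = {}
    if CONCEPT_LABEL["priceElement"] in annos:
        result["priceElement"] = {element}
        result["priceType"] = annos & FACET_LABELS["priceType"]
    return result

price = classify("e_515_select")
```

For `e_515_select` this yields the price concept with facet `min`, mirroring the two derived facts shown on the slide.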
Domain Model
[Figure, built up over several steps. Node labels: AreaBranch Element (several), Area-Branch Segment, Geographic Segment, Price Element(min), Price Element(max), Price Segment, Real-Estate Form]
Analysis and Evaluation
- UK Real Estate Domain
- 50 domain web forms (sampled from over 2800)
- Tested Domain-independent and Domain-aware
Analysis and Evaluation – Real-Estate
[Figure panels: Domain independent, Domain aware]
Conclusion and Future Work
- Improve structural segmentation
  - Visual segmentation and labeling
  - Ontology-guided segmentation
- Accelerate domain adaptation
  - Calling for machine learning for ontology creation
- Enhance ambiguity resolution
  - Necessitating probabilistic logic in the future
- Interactive form filling / probing
Thank you very much!

Weitere ähnliche Inhalte

Ähnlich wie OPAL Presentation

Dotnet a survey of xml tree patterns
Dotnet  a survey of xml tree patternsDotnet  a survey of xml tree patterns
Dotnet a survey of xml tree patternsEcway Technologies
 
A survey of xml tree patterns
A survey of xml tree patternsA survey of xml tree patterns
A survey of xml tree patternsecway
 
Java a survey of xml tree patterns
Java  a survey of xml tree patternsJava  a survey of xml tree patterns
Java a survey of xml tree patternsecwayerode
 
Android a survey of xml tree patterns
Android  a survey of xml tree patternsAndroid  a survey of xml tree patterns
Android a survey of xml tree patternsecway
 
Tracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracingNetworks
 
X Som Graduation Presentation
X Som   Graduation PresentationX Som   Graduation Presentation
X Som Graduation PresentationGiorgio Orsi
 
Expressive Query Answering For Semantic Wikis
Expressive Query Answering For  Semantic WikisExpressive Query Answering For  Semantic Wikis
Expressive Query Answering For Semantic WikisJie Bao
 
Aggregation for searching complex information spaces
Aggregation for searching complex information spacesAggregation for searching complex information spaces
Aggregation for searching complex information spacesMounia Lalmas-Roelleke
 
PhD Presentation
PhD PresentationPhD Presentation
PhD Presentationmskayed
 
Taxonomy management in eZ Publish, the new eZ Tags datatype
Taxonomy management in eZ Publish, the new eZ Tags datatypeTaxonomy management in eZ Publish, the new eZ Tags datatype
Taxonomy management in eZ Publish, the new eZ Tags datatypeIvo Lukač
 
Development of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrievalDevelopment of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrievalAmjad Ali
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)Jie Bao
 
Towards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational DatabaseTowards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational Databaseijbuiiir1
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLTakeshi Morita
 
Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007Gong Cheng
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...Amit Sheth
 
Ontology engineering of automatic text processing methods
Ontology engineering of automatic text processing methodsOntology engineering of automatic text processing methods
Ontology engineering of automatic text processing methodsIJECEIAES
 
Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Infrrd
 

Ähnlich wie OPAL Presentation (20)

Dotnet a survey of xml tree patterns
Dotnet  a survey of xml tree patternsDotnet  a survey of xml tree patterns
Dotnet a survey of xml tree patterns
 
A survey of xml tree patterns
A survey of xml tree patternsA survey of xml tree patterns
A survey of xml tree patterns
 
Java a survey of xml tree patterns
Java  a survey of xml tree patternsJava  a survey of xml tree patterns
Java a survey of xml tree patterns
 
Android a survey of xml tree patterns
Android  a survey of xml tree patternsAndroid  a survey of xml tree patterns
Android a survey of xml tree patterns
 
A survey of xml tree patterns
A survey of xml tree patternsA survey of xml tree patterns
A survey of xml tree patterns
 
Tracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a Nutshell
 
X Som Graduation Presentation
X Som   Graduation PresentationX Som   Graduation Presentation
X Som Graduation Presentation
 
Expressive Query Answering For Semantic Wikis
Expressive Query Answering For  Semantic WikisExpressive Query Answering For  Semantic Wikis
Expressive Query Answering For Semantic Wikis
 
Aggregation for searching complex information spaces
Aggregation for searching complex information spacesAggregation for searching complex information spaces
Aggregation for searching complex information spaces
 
PhD Presentation
PhD PresentationPhD Presentation
PhD Presentation
 
A survey of xml tree patterns
A survey of xml tree patternsA survey of xml tree patterns
A survey of xml tree patterns
 
Taxonomy management in eZ Publish, the new eZ Tags datatype
Taxonomy management in eZ Publish, the new eZ Tags datatypeTaxonomy management in eZ Publish, the new eZ Tags datatype
Taxonomy management in eZ Publish, the new eZ Tags datatype
 
Development of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrievalDevelopment of a new indexing technique for XML document retrieval
Development of a new indexing technique for XML document retrieval
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)
 
Towards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational DatabaseTowards Ontology Development Based on Relational Database
Towards Ontology Development Based on Relational Database
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
 
Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 
Ontology engineering of automatic text processing methods
Ontology engineering of automatic text processing methodsOntology engineering of automatic text processing methods
Ontology engineering of automatic text processing methods
 
Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...Learning from similarity and information extraction from structured documents...
Learning from similarity and information extraction from structured documents...
 

Mehr von Giorgio Orsi

Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseGiorgio Orsi
 
Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)Giorgio Orsi
 
Joint Repairs for Web Wrappers
Joint Repairs for Web WrappersJoint Repairs for Web Wrappers
Joint Repairs for Web WrappersGiorgio Orsi
 
SAE: Structured Aspect Extraction
SAE: Structured Aspect ExtractionSAE: Structured Aspect Extraction
SAE: Structured Aspect ExtractionGiorgio Orsi
 
wadar_poster_final
wadar_poster_finalwadar_poster_final
wadar_poster_finalGiorgio Orsi
 
Query Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological DatabasesQuery Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological DatabasesGiorgio Orsi
 
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014Giorgio Orsi
 
Deos 2014 - Welcome
Deos 2014 - WelcomeDeos 2014 - Welcome
Deos 2014 - WelcomeGiorgio Orsi
 
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description LogicsHeuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description LogicsGiorgio Orsi
 
Datalog and its Extensions for Semantic Web Databases
Datalog and its Extensions for Semantic Web DatabasesDatalog and its Extensions for Semantic Web Databases
Datalog and its Extensions for Semantic Web DatabasesGiorgio Orsi
 
AMBER WWW 2012 Poster
AMBER WWW 2012 PosterAMBER WWW 2012 Poster
AMBER WWW 2012 PosterGiorgio Orsi
 
AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)Giorgio Orsi
 
OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)
OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)
OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)Giorgio Orsi
 
Querying UML Class Diagrams - FoSSaCS 2012
Querying UML Class Diagrams - FoSSaCS 2012Querying UML Class Diagrams - FoSSaCS 2012
Querying UML Class Diagrams - FoSSaCS 2012Giorgio Orsi
 
OPAL: automated form understanding for the deep web - WWW 2012
OPAL: automated form understanding for the deep web - WWW 2012OPAL: automated form understanding for the deep web - WWW 2012
OPAL: automated form understanding for the deep web - WWW 2012Giorgio Orsi
 
Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...Giorgio Orsi
 

Mehr von Giorgio Orsi (20)

Web Data Extraction: A Crash Course
Web Data Extraction: A Crash CourseWeb Data Extraction: A Crash Course
Web Data Extraction: A Crash Course
 
Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)Fairhair.ai – alan turing institute june '17 (public)
Fairhair.ai – alan turing institute june '17 (public)
 
Joint Repairs for Web Wrappers
Joint Repairs for Web WrappersJoint Repairs for Web Wrappers
Joint Repairs for Web Wrappers
 
SAE: Structured Aspect Extraction
SAE: Structured Aspect ExtractionSAE: Structured Aspect Extraction
SAE: Structured Aspect Extraction
 
diadem-vldb-2015
diadem-vldb-2015diadem-vldb-2015
diadem-vldb-2015
 
wadar_poster_final
wadar_poster_finalwadar_poster_final
wadar_poster_final
 
Query Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological DatabasesQuery Rewriting and Optimization for Ontological Databases
Query Rewriting and Optimization for Ontological Databases
 
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
ROSeAnn: Reconciling Opinions of Semantic Annotators VLDB 2014
 
Deos 2014 - Welcome
Deos 2014 - WelcomeDeos 2014 - Welcome
Deos 2014 - Welcome
 
Perv a ds-rr13
Perv a ds-rr13Perv a ds-rr13
Perv a ds-rr13
 
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description LogicsHeuristic Ranking in Tightly Coupled Probabilistic Description Logics
Heuristic Ranking in Tightly Coupled Probabilistic Description Logics
 
Datalog and its Extensions for Semantic Web Databases
Datalog and its Extensions for Semantic Web DatabasesDatalog and its Extensions for Semantic Web Databases
Datalog and its Extensions for Semantic Web Databases
 
AMBER WWW 2012 Poster
AMBER WWW 2012 PosterAMBER WWW 2012 Poster
AMBER WWW 2012 Poster
 
AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)AMBER WWW 2012 (Demonstration)
AMBER WWW 2012 (Demonstration)
 
DIADEM WWW 2012
DIADEM WWW 2012DIADEM WWW 2012
DIADEM WWW 2012
 
OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)
OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)
OPAL: a passe-partout for web forms - WWW 2012 (Demonstration)
 
Querying UML Class Diagrams - FoSSaCS 2012
Querying UML Class Diagrams - FoSSaCS 2012Querying UML Class Diagrams - FoSSaCS 2012
Querying UML Class Diagrams - FoSSaCS 2012
 
OPAL: automated form understanding for the deep web - WWW 2012
OPAL: automated form understanding for the deep web - WWW 2012OPAL: automated form understanding for the deep web - WWW 2012
OPAL: automated form understanding for the deep web - WWW 2012
 
Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...Nyaya: Semantic data markets: a flexible environment for knowledge management...
Nyaya: Semantic data markets: a flexible environment for knowledge management...
 
Table Recognition
Table RecognitionTable Recognition
Table Recognition
 

Kürzlich hochgeladen

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Kürzlich hochgeladen (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

OPAL Presentation

  • 1. OPAL – Real Understanding of Real Estate Forms. Xiaonan Guo, Oxford University Computing Laboratory, DIADEM group
  • 2. Diversity in Web Form Design 2 May 27, 2011 OPAL - ontology-based web pattern analysis with logic
  • 5. Diversity in Web Form Design
  • 7. Outline. OPAL – Ontology-based web Pattern Analysis with Logic: Overview; Data models and Mappings (Browser, Segmentation, Annotation, and Domain Model; Segmentation and Phenomenological Mapping); Analysis and Evaluation; Future Work
  • 8. Overview (OPAL): domain-independent hierarchical modeling; domain knowledge + declarative definition; domain-aware form analysis
  • 9. Overview
  • 10. Browser Model
  • 13. Browser Model
      html_element(e_320_input,320,321,319,input,d1).
      html_attr(e_320_input_name,e_320_input,name,"location",d1).
      html_attr(e_320_input_type,e_320_input,type,"radio",d1).
      ...
      box(e_320_input,106,253,120,267,14,14).
      css_attr(e_320_input,bottom,"auto").
      css_attr(e_320_input,clear,"none").
      css_attr(e_320_input,color,"rgb(0,0,0)").
      ...
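The browser-model facts above can be read as rows of relations. The following is a minimal Python sketch of that reading, not the OPAL implementation. The field names are assumptions: the slide does not say what the arguments mean, so treating the fourth argument of html_element as the parent position and the box arguments as x1, y1, x2, y2, width, height is a guess (the numbers are at least consistent with it). HtmlElement, Box, attrs_of and box_is_consistent are hypothetical names.

```python
from typing import NamedTuple


class HtmlElement(NamedTuple):
    node: str    # node identifier, e.g. "e_320_input"
    start: int   # assumed: opening position in document order
    end: int     # assumed: closing position in document order
    parent: int  # assumed: position of the parent node
    tag: str
    doc: str


class Box(NamedTuple):
    node: str
    x1: int
    y1: int
    x2: int
    y2: int
    width: int   # assumed meaning of the sixth argument
    height: int  # assumed meaning of the seventh argument


# Facts copied from the slide.
elements = [HtmlElement("e_320_input", 320, 321, 319, "input", "d1")]
html_attrs = [
    ("e_320_input_name", "e_320_input", "name", "location", "d1"),
    ("e_320_input_type", "e_320_input", "type", "radio", "d1"),
]
boxes = [Box("e_320_input", 106, 253, 120, 267, 14, 14)]


def attrs_of(node: str) -> dict:
    """Collect the HTML attributes of one node into a dict."""
    return {name: value for _, n, name, value, _ in html_attrs if n == node}


def box_is_consistent(b: Box) -> bool:
    """Check the assumed reading: width = x2 - x1 and height = y2 - y1."""
    return b.width == b.x2 - b.x1 and b.height == b.y2 - b.y1
```

Under these assumptions, attrs_of("e_320_input") collects the name and type attributes of the radio button, and box_is_consistent confirms that the last two box arguments match the coordinate differences.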
  • 14. Segmentation Model: grouping of related form elements (e.g. fields, labels), achieved via the Segmentation Mapping, which groups form fields and assigns labels to fields and groups.
  • 15. Segmentation Model – groups. Form elements are grouped if (1) they occur in sequence, (2) they have similarities in attribute values or appearance, and (3) their least common ancestor contains no other elements.
      group(Es) :- similarFieldSequence(Es), leastCommonAncestor(A,Es), not hasAdditionalField(A,Es).
      leastCommonAncestor(A,Es) :- commonAncestor(A,Es), not ( child(C,A), commonAncestor(C,Es) ).
      partOf(E,A) :- group(Es), member(E,Es), leastCommonAncestor(A,Es).
  • 16. Segmentation Model – groups
      group([e_320_input,e_326_input,e_332_input,e_338_input,e_344_input]).
      leastCommonAncestor(e_319_td,[e_320_input,e_326_input,...,e_344_input]).
      partOf(e_320_input,e_319_td).
      partOf(e_326_input,e_319_td).
      partOf(e_332_input,e_319_td).
      partOf(e_338_input,e_319_td).
      partOf(e_344_input,e_319_td).
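The grouping rules can be traced on a toy DOM. The sketch below is a Python rendering under stated assumptions, not the Datalog program from the slides: only e_319_td and the five inputs come from the slide, while tr_x, td_y, table_x and the placement of e_358_select are hypothetical. The similarFieldSequence condition is left out, so only the ancestor-based conditions are checked.

```python
# Child -> parent links of a toy DOM (partly hypothetical, see above).
parent = {
    "tr_x": "table_x",
    "e_319_td": "tr_x",
    "td_y": "tr_x",
    "e_320_input": "e_319_td",
    "e_326_input": "e_319_td",
    "e_332_input": "e_319_td",
    "e_338_input": "e_319_td",
    "e_344_input": "e_319_td",
    "e_358_select": "td_y",
}


def ancestors(node):
    """Proper ancestors of a node, nearest first."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def least_common_ancestor(nodes):
    """Lowest node that is an ancestor of every node in `nodes`, or None."""
    common = set(ancestors(nodes[0]))
    for n in nodes[1:]:
        common &= set(ancestors(n))
    for a in ancestors(nodes[0]):  # nearest first, so the first hit is the lowest
        if a in common:
            return a
    return None


def is_group(nodes, all_fields):
    """True if the least common ancestor contains no field outside `nodes`."""
    lca = least_common_ancestor(nodes)
    if lca is None:
        return False
    inside = {f for f in all_fields if lca in ancestors(f)}
    return inside == set(nodes)


def part_of(nodes):
    """Map every member of a group to the group's least common ancestor."""
    lca = least_common_ancestor(nodes)
    return {n: lca for n in nodes}


radios = ["e_320_input", "e_326_input", "e_332_input", "e_338_input", "e_344_input"]
all_fields = radios + ["e_358_select"]
```

On this toy tree the five radio buttons form a group below e_319_td, whereas the first two alone do not, because their least common ancestor also contains the other three fields.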
  • 17. Segmentation Model – labels. Texts are assigned to fields and groups at three scopes. Field: HTML <label>, greatest unique ancestor. Segment: text-field alignment in groups. Page: visual alignment.
  • 18. Segmentation Model – labels (Field: HTML <label>)
      hasBasicLabel(E,L,T) :- html_element(E,_,_,_,input,_), html_attr(_,E,id,ID,_), html_element(N,_,_,_,label,_), html_attr(_,N,for,ID,_), child(L,N), html_text(L,T,_).
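The rule above joins an input's id attribute with a label element's for attribute and takes the label's text children. A minimal Python sketch of the same join follows; all node ids and the "minprice" identifier are invented for illustration, and the relations are simplified to dicts and tuples.

```python
# Hypothetical facts in the style of the browser model.
tags = {"e_10_input": "input", "e_8_label": "label"}
attrs = [
    ("e_10_input", "id", "minprice"),
    ("e_8_label", "for", "minprice"),
]
children = {"e_8_label": ["t_9"]}
texts = {"t_9": "Min Price"}


def has_basic_label():
    """Return (field, text_node, text) for inputs linked via <label for=...>."""
    results = []
    for field, tag in tags.items():
        if tag != "input":
            continue
        field_ids = [v for n, k, v in attrs if n == field and k == "id"]
        for ident in field_ids:
            for node, key, value in attrs:
                # A label element whose "for" attribute names the field's id.
                if key == "for" and value == ident and tags.get(node) == "label":
                    for text_node in children.get(node, []):
                        if text_node in texts:
                            results.append((field, text_node, texts[text_node]))
    return results
```

Calling has_basic_label() on these facts pairs e_10_input with the text node t_9.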
  • 19. Segmentation Model – labels (Field: greatest unique ancestor)
      greatestUniqueAncestor(A,E) :- uniqueDescendant(E,A), not ( parent(P,A), uniqueDescendant(E,P) ).
      hasBasicLabel(E,L,T) :- group(E), greatestUniqueAncestor(A,E), descendant(L,A), html_text(L,T,_).
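The greatest unique ancestor can be sketched in Python as follows. The slide does not define uniqueDescendant, so the sketch assumes it means "E is the only form field below A". For brevity it works on single fields rather than groups, and the toy tree and node names are hypothetical.

```python
# Child -> parent links of a hypothetical form fragment.
parent = {
    "div_1": "form_0",
    "div_2": "form_0",
    "span_1": "div_1",
    "input_1": "div_1",
    "t_1": "span_1",
    "input_2": "div_2",
}
fields = {"input_1", "input_2"}
texts = {"t_1": "Postcode"}


def ancestors(node):
    """Proper ancestors of a node, nearest first."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out


def unique_descendant(field, a):
    """Assumed meaning: `field` is the only form field below `a`."""
    return {f for f in fields if a in ancestors(f)} == {field}


def greatest_unique_ancestor(field):
    """Highest ancestor that still contains no other field than `field`."""
    best = None
    for a in ancestors(field):  # walk upwards until another field appears
        if not unique_descendant(field, a):
            break
        best = a
    return best


def basic_labels(field):
    """Text nodes below the greatest unique ancestor of `field`."""
    a = greatest_unique_ancestor(field)
    return [(t, texts[t]) for t in sorted(texts) if a in ancestors(t)]
```

Here input_1 gets div_1 as its greatest unique ancestor, since form_0 also contains input_2, and the text "Postcode" below div_1 becomes its basic label.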
  • 20. Segmentation Model – labels (Segment: text-field alignment in groups)
      hasLabel(E,L,T) :- partOf(E,G), leastCommonAncestor(G,Es), group(Es), hasNoLabel(Es), textLists(LLs,G), sameLength(Es,LLs), labelOneToOne(E,Ls,Es,LLs), member(L,Ls), html_text(L,T,_).
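The one-to-one alignment in this rule can be pictured as pairing the i-th field of a group with the i-th list of text nodes. This Python sketch assumes that both sequences are in document order and that every text list holds a single node; how textLists builds those lists is not shown on the slide. The node ids are taken from the label facts listed later in the deck, and label_one_to_one is a hypothetical helper name.

```python
def label_one_to_one(fields, text_lists, labelled):
    """Pair the i-th field with the i-th text list; {} if a guard of the rule fails."""
    if any(f in labelled for f in fields):  # corresponds to hasNoLabel(Es)
        return {}
    if len(fields) != len(text_lists):      # corresponds to sameLength(Es,LLs)
        return {}
    return dict(zip(fields, text_lists))


fields = ["e_320_input", "e_326_input", "e_332_input", "e_338_input", "e_344_input"]
text_lists = [["t_322"], ["t_328"], ["t_334"], ["t_340"], ["t_346"]]
pairs = label_one_to_one(fields, text_lists, labelled=set())
```

With five fields and five text lists the pairing succeeds. Dropping one text list makes the lengths differ, so no labels are assigned.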
  • 21. Segmentation Model – labels (Page: visual alignment)
      hasLabel(E,L,LText) :- hasTextBox(E,B), descendantTextOf(L,B), html_text(L,_,_,_,_), node_text(L,LText,true).
  • 23. Segmentation Model – labels
      hasLabel(e_320_input,t_322,"Nailsea / Backwell").
      hasLabel(e_358_select,t_354,"Min. beds").
  • 24. Segmentation Model – labels
      hasLabel(e_319_td,t_316,"Area: ").
  • 25. Segmentation Model – labels
      hasLabel(e_304_input,t_302,"To Buy:").
      hasLabel(e_308_input,t_306,"To Rent:").
      hasLabel(e_320_input,t_322,"Nailsea / Backwell").
      hasLabel(e_326_input,t_328,"Portishead / Pill").
      hasLabel(e_338_input,t_340,"Yatton / Congresbury").
      hasLabel(e_332_input,t_334,"Clevedon").
      hasLabel(e_344_input,t_346,"Bristol / Weston-super-mare").
      hasBasicLabel(e_358_select,t_354,"Min Beds").
      hasBasicLabel(e_515_select,t_400,"Min Price").
      hasBasicLabel(e_705_select,t_594,"Max Price").
      hasBasicLabel(e_788_select,t_784,"View Order").
      hasLabel(e_319_td,t_316,"Area").
      hasLabel(e_297_tbody,t_290,"Find a property to buy or rent…").
  • 26. Overview
  • 27. Annotation Model: obtained from the Browser Model. Relying on domain-specific knowledge, it represents linguistic annotations and machine-learning-based classifications.
  • 28. Annotation Model
      annotation(attrclob_d1_5560,attrclob_d1,80,87,"Nailsea").
      annotationFeature(attrclob_d1_5560,"majorType","location").
      annotationFeature(attrclob_d1_5560,"minorType","district_county_etc").
      annotation(elclob_d1_1707,elclob_d1,2028,2038,"Min. price").
      annotationFeature(elclob_d1_1707,"modifier","min").
      annotationFeature(elclob_d1_1707,"minorType","price").
      annotationFeature(elclob_d1_1707,"majorType","reform.label").
  • 29. Domain Model: describes conceptual entities on forms as in the domain ontology. Achieved via the phenomenological mapping, which correlates labels with annotations and classifies form elements.
  • 30. Domain Model – Ontology: purpose={buy, rent, combined}
  • 31. Domain Model – Ontology: priceType={min, max, approximate, range}
  • 32. Domain Model – Classification. Form elements are annotated as follows:
      hasAnnotation(E,A) :- hasLabel(E,_,T), annotation(Aid,_,_,_,T), annotationFeature(Aid,_,A).
    Form elements are classified with Concept C and Facets C_F:
      C(X) :- leafSegment(X), hasAnnotation(X,A), Clabel(A).
      C_F(X,F) :- C(X), hasAnnotation(X,F), C_FLabel(F).
      C_F(X,F) :- C(X), hasValueAnnotation(X,F), C_FValue(F).
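The classification rules can be followed end to end on the price example. The Python sketch below instantiates concept C as a price element and facet C_F as the price type, under several assumptions. The label text is set to "Min. price" so that it joins exactly with the annotation span (elsewhere in the deck the label is printed as "Min Price"). CONCEPT_LABELS and FACET_LABELS are hypothetical stand-ins for Clabel and C_FLabel. The set of leaf segments is passed in by hand.

```python
# Label and annotation facts in the style of the slides (label text adjusted, see above).
has_label = [("e_515_select", "t_400", "Min. price")]
annotation = [("elclob_d1_1707", "elclob_d1", 2028, 2038, "Min. price")]
annotation_feature = [
    ("elclob_d1_1707", "modifier", "min"),
    ("elclob_d1_1707", "minorType", "price"),
    ("elclob_d1_1707", "majorType", "reform.label"),
]

CONCEPT_LABELS = {"price"}                             # hypothetical Clabel
FACET_LABELS = {"min", "max", "approximate", "range"}  # values of priceType


def has_annotation():
    """Join labels with annotations on equal text, then with their features."""
    out = set()
    for element, _, text in has_label:
        for aid, _, _, _, span in annotation:
            if span != text:
                continue
            for fid, _, value in annotation_feature:
                if fid == aid:
                    out.add((element, value))
    return out


def price_element(leaf_segments):
    """Concept rule: a leaf segment carrying a concept annotation."""
    return {e for e, a in has_annotation()
            if e in leaf_segments and a in CONCEPT_LABELS}


def price_type(leaf_segments):
    """Facet rule: a classified element carrying a facet annotation."""
    classified = price_element(leaf_segments)
    return {(e, a) for e, a in has_annotation()
            if e in classified and a in FACET_LABELS}
```

With e_515_select as the only leaf segment, the sketch classifies it as a price element with price type "min".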
  • 33. Domain Model – Classification
      priceElement(e_515_select,e_286_form).
      priceType(e_515_select,"min").
  • 34-40. Domain Model (diagram, built up step by step over seven slides): AreaBranch Elements, Price Element(min) and Price Element(max) are shown first; then Area-Branch Segment, Geographic Segment, Price Segment, and finally Real-Estate Form are added.
  • 41. Overview
  • 42. Analysis and Evaluation: UK real estate domain; 50 domain web forms (sampled from over 2800); tested domain-independent and domain-aware.
  • 43. Analysis and Evaluation - Real-Estate: Domain independent
  • 44. Analysis and Evaluation - Real-Estate: Domain independent, Domain aware
  • 45. Conclusion and Future Work. Improve structural segmentation (visual segmentation and labeling; ontology-guided segmentation). Accelerate domain adaptation (calling for machine learning for ontology creation). Enhance ambiguity resolution (necessitating probabilistic logic in the future). Interactive form filling / probing.
  • 46. Thank you very much!