100.00.2009OVGU Präsentation
/22Katrin Krieger – Creating Semantic Fingerprints for Web Documents 1
Creating Semantic Fingerprints
for Web Resources
Katrin Krieger, Jens Schneider, Christian Nywelt, Dietmar Rösner
Otto-von-Guericke Universität Magdeburg (Germany)
Motivation
• Automatic extraction of information and generating formal
semantic descriptions are important aspects of Semantic Web
research
query
compare
combine
http://mehmetveysiadam.com
Semantic Fingerprints (SF)
• Semantic signatures of Web documents
• Representing concepts to be found in documents as well as
relationships between these concepts
• Graph structures with concepts as nodes and relationships as
edges
• Can be used to compute semantic relatedness, e.g. in e-learning
scenarios
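The graph view above can be sketched as a small data structure. This is only an illustration: the class name, the edge labels, and the use of Jaccard overlap as a relatedness measure are assumptions and are not taken from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticFingerprint:
    """Undirected graph: concepts are nodes, relationships are labelled edges."""
    concepts: set[str] = field(default_factory=set)
    # Each edge is stored as (frozenset({a, b}), label); labels are made up here.
    relations: set[tuple[frozenset, str]] = field(default_factory=set)

    def add_relation(self, a: str, b: str, label: str) -> None:
        self.concepts.update((a, b))
        self.relations.add((frozenset((a, b)), label))


def relatedness(x: SemanticFingerprint, y: SemanticFingerprint) -> float:
    """Jaccard overlap of the concept sets (an assumed measure, not the authors')."""
    union = x.concepts | y.concepts
    return len(x.concepts & y.concepts) / len(union) if union else 0.0
```

Two fingerprints that share one of three distinct concepts would score 1/3 under this assumed measure.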
Desired Properties of Semantic Fingerprints
P1 Concepts are distinct and unambiguous
P2 Concepts are connected through relationships
P3 Documents with similar content will
yield similar SF
P4 A SF covers all essential concepts
belonging to a document
General Idea
• Hypothesis: semantically related concepts of a domain are
connected through relationships
• This information is inherent in Linked Open Data (LOD) datasets, which we
can exploit to disambiguate concepts
• This information is sufficient to build semantic fingerprints
How to automatically obtain a Semantic Fingerprint
1. Extract keywords from Web document
2. Create nodes by mapping keywords to semantic concepts
3. Add edges by finding relations
4. Remove irrelevant nodes and edges
5. Identify all connected subgraphs
6. Choose semantic fingerprint from connected subgraphs
Extracting Keywords and Mapping to Concepts
• Use Natural Language Processing (NLP) tools to extract nouns and
noun phrases
• Query dataset to find concepts whose labels correspond with
keywords
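The label lookup can be illustrated with a toy in-memory dataset. The concept identifiers and labels below are invented, and a real system would query an LOD dataset instead; exact case-insensitive matching is also an assumption about what "correspond" means.

```python
# Toy stand-in for an LOD dataset: concept identifier -> labels (all invented).
LABELS = {
    "ex:Haskell_(language)": ["Haskell"],
    "ex:Haskell_Curry": ["Haskell", "Haskell Curry"],
    "ex:Fold_(function)": ["fold", "reduce"],
    "ex:Fold_(geology)": ["fold"],
}


def candidate_concepts(keywords, labels=LABELS):
    """Return, per keyword, every concept with a case-insensitively equal label."""
    result = {}
    for kw in keywords:
        needle = kw.casefold()
        result[kw] = sorted(
            concept
            for concept, names in labels.items()
            if any(name.casefold() == needle for name in names)
        )
    return result
```

A keyword such as "fold" yields several candidate concepts at this stage, which is why a later disambiguation step is needed.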
Result of step #1
Disconnected graph with n
concepts per keyword
Find relationships
• Expand each node and search for neighboring concepts to “grow”
the graph (BFS) up to a certain path length n
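A bounded breadth-first expansion might look like the sketch below. The `neighbours` dictionary is a hypothetical stand-in for asking the dataset which concepts are linked to a given concept, and `max_len` plays the role of the path length n.

```python
from collections import deque


def expand(seeds, neighbours, max_len):
    """Breadth-first expansion from the seed concepts, at most max_len hops.

    neighbours: dict mapping a concept to the concepts it is linked to
    (assumed here; a real system would query the dataset).
    Returns the set of undirected edges discovered.
    """
    edges = set()
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_len:
            continue  # do not grow beyond the path-length limit
        for nxt in neighbours.get(node, ()):
            edges.add(frozenset((node, nxt)))
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return edges
```

With two seeds that share a neighbour, a limit of one hop is already enough to connect them through that neighbour.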
Result of Step #2
• Graph with connected subgraphs
Removing irrelevant nodes and edges
Which nodes and edges are really relevant for the semantic
fingerprint?
Heuristics:
• Path length
• Number of connecting paths
• Occurrences in paths
• Number of corresponding keywords
• Interconnection property
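The slide names these heuristics without defining them. As one hypothetical path-based criterion (an assumption, not the authors' rule), the sketch below drops intermediate nodes that are dead ends, that is, nodes that do not correspond to a keyword and do not lie between two other nodes.

```python
def prune_dead_ends(edges, keep):
    """Repeatedly drop nodes outside `keep` that have fewer than two neighbours.

    edges: iterable of (a, b) pairs; keep: concepts mapped directly from keywords.
    """
    edges = {frozenset(e) for e in edges}
    while True:
        degree = {}
        for e in edges:
            for node in e:
                degree[node] = degree.get(node, 0) + 1
        dead = {n for n, d in degree.items() if d < 2 and n not in keep}
        if not dead:
            return edges
        # Remove every edge touching a dead-end node, then re-check degrees.
        edges = {e for e in edges if not (e & dead)}
```

The loop repeats because removing one dead end can turn its former neighbour into a new dead end.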
Identifying subgraphs and picking the SF
• Identify subgraphs by performing BFS
• Determine which of the subgraphs is the semantic fingerprint
• Cover as many keywords as possible
• Number of concepts in the subgraph
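Both steps can be sketched as follows. The function names and the `concept_keywords` mapping are illustrative, and using the larger concept count as a tie-breaker is an assumption, since the slide does not say in which direction that criterion counts.

```python
from collections import deque


def connected_subgraphs(nodes, edges):
    """Split an undirected graph into connected node sets using BFS."""
    adjacent = {n: set() for n in nodes}
    for a, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacent:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            for nxt in adjacent[queue.popleft()] - seen:
                seen.add(nxt)
                component.add(nxt)
                queue.append(nxt)
        components.append(component)
    return components


def pick_fingerprint(components, concept_keywords):
    """Prefer the subgraph covering most keywords; break ties by concept count."""
    def score(component):
        covered = {kw for c in component for kw in concept_keywords.get(c, ())}
        return (len(covered), len(component))
    return max(components, key=score)
```

Isolated candidate concepts, such as an unrelated sense of an ambiguous keyword, end up in their own small subgraph and lose against the subgraph that covers more keywords.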
Evaluation
P1 Concepts are distinct and unambiguous
P2 Concepts are connected through relationships
P3 Documents with similar content will
yield similar SF
P4 A SF covers all essential concepts
belonging to a document
Quantitative Evaluation
• P3: Documents with similar content will yield similar SF
• Extraction of 11 different KW lists from real world e-learning
documents
• Generation of SF for all KW lists
• Generation of SF for all $(|KW_i| - k)$-tuple subsets of each $KW_i$,
with $|KW_i|$ denoting the number of keywords in $KW_i$ and k varied from 1 to 4
• Comparison of SF of original KW lists with varied KW lists
• Number of contained concepts
• Number of common concepts
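The subset generation and the comparison can be sketched as below. The function names are hypothetical, fingerprints are reduced to plain sets of concepts, and skipping empty subsets is an assumption.

```python
from itertools import combinations


def reduced_keyword_lists(keywords, max_k=4):
    """Yield (k, subset) for every subset with k = 1..max_k keywords removed."""
    for k in range(1, max_k + 1):
        size = len(keywords) - k
        if size < 1:
            break  # assumed: an empty keyword list is not evaluated
        for subset in combinations(keywords, size):
            yield k, subset


def common_concepts(original_sf, varied_sf):
    """Share of the original fingerprint's concepts that survive in the variant."""
    original, varied = set(original_sf), set(varied_sf)
    if not original:
        return 0.0
    return len(original & varied) / len(original)
```

For a list of five keywords this produces 5 + 10 + 10 + 5 = 30 reduced lists, each of which would get its own fingerprint to compare with the original.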
Quantitative Evaluation (2)
• The number of concepts across all 1992 SF varies from 0 to 22
• SF with 14-16 concepts make up 8.3%
• SF with 10-13 concepts make up 20.8%
• Grouping into bins
• The majority of SF with one KW less still have ≥90% KW in
common with the original SF
Qualitative Evaluation
• P1: Concepts are distinct and unambiguous
• P4: A SF covers all essential concepts belonging to a document
• Evaluation with human reviewers:
• The reviewers rated the behavior of our algorithm as comprehensible and
the fingerprints as suitable for the keyword lists
• The reviewers also found that some concepts seem to be more important
than others.
Conclusion
• New method to create a formal semantic description of
a document
• Exploits inherent properties and structures in LOD
datasets
• No need for other methods such as statistics
Open Issues
• Runtime is rather high, and the method is expensive in computing
resources
• Not all semantic relations from the documents are also
in the dataset
• Scalability
Outlook
• Exploit DOM structure of the document
• Add weights to keywords
• Investigate other data structures and adapted expansion
algorithms
• Study other methods to capture semantic relationships from text
Thank you for your attention.
What are your questions?
img src: https://flic.kr/p/6DBVxb
katrin.krieger@ovgu.de
SF for KW = {“haskell”, “fold”, “higher order function”, “prove”}
In use
• Slideshare Connector
• StackOverflow Connector
• Freebase Connector
• DBpedia Connector
• LectureSlide Connector
• Educational metadata
• REST-based Web Service (Codename: Guinan)
