Fourth International Conference on Topic Maps – Research and Applications (TMRA 2008)




Towards an Automatic Semantic Integration of
Information
Dr. Jörg Wurzer, iQser AG
Prof. Dr. Stefan Smolnik, European Business School (EBS)

Leipzig, October 16, 2008
Agenda

• Status quo and motivation


• New paradigm: information access by context


• Proof of Concept at EADS


• Technical architecture


• Analysis process & queries


• Further research & questions
Motivation

• The quantity of digital information is still growing. IDC 2008: 60% per year


• Information is dispersed over documents and various applications/databases


• Growing need for creating knowledge based on available information


  • Profound knowledge for management decisions, completing tasks and
    business processes, development of new products, sales and marketing
    campaigns


• Topic Maps can adopt the new results of research in semantic technologies
Today's solution I: full-text search

• Advantages: easy to use, generally accepted, users are highly experienced with it


• Disadvantages:


  • Result quality depends on the keyword selection


  • Results are presented as long document lists, which users have to assess
    manually


  • The result set does not necessarily consider the user’s intention


  • Each application has its own search functionality (no standards)
Today's solution II: directory hierarchy

• Advantages: content such as documents can be organized by meaning,
  context, and applicability


• Disadvantages:


  • A manually created hierarchy provides a static view of the content, but in
    practice, users need different views, for example by customer, project,
    or product


  • Documents are usually needed in several contexts; in this case, the
    documents are stored redundantly; problem: every copy has to be edited
    when a document changes


  • Directory hierarchies often reflect the current state of knowledge; however,
    some documents cannot be included appropriately in the hierarchy
New paradigm: access content in any context

• Automatically created topic maps of all content object types


  • Multiple links between the content objects establish a semantic,
    non-hierarchical network; links are created semantically


  • The user chooses his focus of interest; a topic map provides the related
    content; example: customers are linked to projects, contracts, products,
    employees, and service calls.


  • Exploring the available data by navigating through a topic map


  • The content could be located in heterogeneous sources and could be
    stored in different formats or data models; even external content could be
    included
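The navigation idea on this slide can be illustrated with a minimal sketch. It is not the iQser implementation; the class `TopicMap`, its methods, and the sample objects are hypothetical names chosen for illustration, and links are assumed to be symmetric.

```python
from collections import defaultdict


class TopicMap:
    """Hypothetical, tiny link network between typed content objects."""

    def __init__(self):
        # content object -> set of directly linked content objects
        self._links = defaultdict(set)

    def link(self, a, b):
        # Assumption: links are symmetric, so either end can be the focus.
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, focus, object_type=None):
        # Return the objects linked to the focus, optionally filtered by type.
        hits = self._links.get(focus, set())
        if object_type is not None:
            hits = {h for h in hits if h[0] == object_type}
        return sorted(hits)


# Content objects are modeled as (semantic type, name) tuples.
tm = TopicMap()
customer = ("customer", "ACME")
tm.link(customer, ("project", "Rollout 2008"))
tm.link(customer, ("contract", "C-17"))
tm.link(("project", "Rollout 2008"), ("employee", "J. Doe"))

print(tm.related(customer))             # everything linked to the customer
print(tm.related(customer, "project"))  # only the linked projects
```

Choosing a different focus, for example the project, returns a different neighborhood of the same network, which is the contrast to a single static directory hierarchy.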
Proof of Concept of iQser Middleware at EADS

• Division: Defence and Communication Systems


• Requirements:


  • Analysis of unstructured data of military information


     • Automatically created network of content objects


     • Automatically created network of main concepts


  • All links between documents have to be justified


  • Benchmark: a system with a manually created ontology
Application screenshot (modified data due to confidentiality)
Results

• The created topic map provides transparent relations between documents


• The terms tree provides users with an overview of the document base’s
  content as well as of related fundamental facts


  • In the PoC for EADS, the concept tree shows that “Biber” is a bridge tank,
    and it shows the location of the anti-missile defense


• The tree’s information quality as well as the topic map’s quality is high and
  can compete with that of a manually created ontology
Uniform Information Layer (UIL)

• Single point of access for all content object types


  • Connector for each type of structured and unstructured content from any
    source (document, database, application): transforms data into a
    semantically typed generic content object and stores modified data back.


  • No redundantly stored data


• Searching across heterogeneous sources including the web is possible


  • Users can specify search queries by means of attributes
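A connector as described on this slide could look roughly like the following sketch. The names `ContentObject`, `EmailConnector`, `read`, and `write_back`, the `mail:` identifier scheme, and the raw record layout are all assumptions made for illustration; the slides do not specify the actual interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """Generic, semantically typed content object (hypothetical layout)."""
    source_id: str       # identifies the record in its original source
    semantic_type: str   # e.g. "email", "customer", "project"
    attributes: dict = field(default_factory=dict)


class EmailConnector:
    """Hypothetical connector for one source type: a mailbox."""

    def __init__(self, mailbox):
        self._mailbox = mailbox  # dict: message id -> raw record

    def read(self, message_id):
        # Transform the source-specific record into a generic content object.
        raw = self._mailbox[message_id]
        return ContentObject(
            source_id=f"mail:{message_id}",
            semantic_type="email",
            attributes={
                "sender": raw["from"],
                "recipient": raw["to"],
                "subject": raw["subj"],
            },
        )

    def write_back(self, obj):
        # Store modified data in the original source; no second copy is kept.
        message_id = obj.source_id.split(":", 1)[1]
        self._mailbox[message_id].update({
            "from": obj.attributes["sender"],
            "to": obj.attributes["recipient"],
            "subj": obj.attributes["subject"],
        })


mailbox = {"42": {"from": "a@example.org", "to": "b@example.org", "subj": "Offer"}}
connector = EmailConnector(mailbox)
email = connector.read("42")
email.attributes["subject"] = "Offer (revised)"
connector.write_back(email)
print(mailbox["42"]["subj"])
```

Because every connector returns the same generic object shape, a search over attributes does not need to know which source an object came from.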
Architecture of iQser Semantic Middleware
Analysis process

• All content changes (and changes of the topic map) trigger an event


• All user actions are tracked


• All changes, or a specified number of user actions, trigger the analysis process


  • Combination of three analysis methods: Syntax Analyzer, Pattern Analyzer,
    Semantic Analyzer


  • More analyzers could be included according to customers' needs


• Pairs of content objects can have n relations with calculated weights
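The triggering logic and the "n relations per pair" structure can be sketched as follows. This is an illustration under assumptions, not the middleware's code: the class `AnalysisTrigger`, the threshold of three user actions, and the analyzer signature (a callable returning `(other_id, relation, weight)` tuples) are hypothetical.

```python
from collections import defaultdict


class AnalysisTrigger:
    """Hypothetical dispatcher: runs all analyzers when an event arrives."""

    def __init__(self, analyzers, action_threshold=3):
        self.analyzers = analyzers              # callables: obj_id -> [(other_id, relation, weight)]
        self.action_threshold = action_threshold
        self.pending_actions = 0
        self.relations = defaultdict(dict)      # (id_a, id_b) -> {relation name: weight}
        self.runs = 0

    def on_content_change(self, obj_id):
        # Every content change triggers the analysis immediately.
        self._analyze(obj_id)

    def on_user_action(self, obj_id):
        # User actions are only counted; the analysis runs once the
        # threshold is reached (assumed here: on the last touched object).
        self.pending_actions += 1
        if self.pending_actions >= self.action_threshold:
            self.pending_actions = 0
            self._analyze(obj_id)

    def _analyze(self, obj_id):
        self.runs += 1
        for analyzer in self.analyzers:
            for other_id, relation, weight in analyzer(obj_id):
                pair = tuple(sorted((obj_id, other_id)))
                # One pair can carry several named relations, each with a weight.
                self.relations[pair][relation] = weight


# Two stand-in analyzers that always report the same finding.
def syntax_stub(obj_id):
    return [("doc-2", "same-project-id", 1.0)]


def pattern_stub(obj_id):
    return [("doc-2", "similar-text", 0.4)]


trigger = AnalysisTrigger([syntax_stub, pattern_stub])
trigger.on_content_change("doc-1")
print(dict(trigger.relations))
```

Keeping the analyzers behind a common callable signature is one way to allow further analyzers to be plugged in later.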
Syntax Analyzer

• Each content object can have multiple key attributes defined in the content
  provider


  • Examples: full name of a person, sender and recipient of an email, project
    ID


• The Syntax Analyzer checks whether these key attributes are related to
  attributes of other content objects in the data pool
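As a rough illustration of key-attribute matching, the sketch below links an object to every other object that holds the same value in any attribute. The function name `syntax_links`, the data layout, and the use of exact string equality are assumptions; the slides do not say how "related" is decided.

```python
def syntax_links(obj_id, objects, key_attributes):
    """Find objects whose attributes match a key attribute of obj_id.

    objects:        id -> dict of attribute name -> value
    key_attributes: id -> list of attribute names declared as keys
    """
    links = []
    for key in key_attributes.get(obj_id, []):
        value = objects[obj_id].get(key)
        if value is None:
            continue
        for other_id, attrs in objects.items():
            if other_id == obj_id:
                continue
            for other_key, other_value in attrs.items():
                # Assumption: two attributes are related when their values are equal.
                if other_value == value:
                    links.append((other_id, f"{key}={other_key}"))
    return links


objects = {
    "person-1": {"full_name": "Ada Lovelace"},
    "mail-7": {"sender": "Ada Lovelace", "recipient": "Charles Babbage"},
    "mail-8": {"sender": "Charles Babbage", "recipient": "Ada Lovelace"},
}
key_attributes = {"person-1": ["full_name"]}

print(syntax_links("person-1", objects, key_attributes))
```

Each link records which attribute pair matched, which gives a simple justification for the link.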
Pattern Analyzer

• The Pattern Analyzer extracts the meaningful words according to significance


• Transforms a selected set of words into a data query; the result is a list of
  similar content objects


  • The similarity is described by a weight between 0 and 1


• The Pattern Analyzer considers the context of used words in a text; it
  therefore reflects the different use of words in different contexts
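The slides do not name the significance measure or the similarity function. The sketch below swaps in TF-IDF for word significance and the share of matched query words as the weight, purely as stand-ins; the function names and sample documents are hypothetical, and the context sensitivity mentioned on the slide is not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def significant_words(doc_id, docs, top_k=5):
    """Rank the words of one document by TF-IDF and keep the top_k."""
    doc_freq = Counter(w for text in docs.values() for w in set(tokenize(text)))
    term_freq = Counter(tokenize(docs[doc_id]))
    n = len(docs)
    score = {w: tf * math.log(n / doc_freq[w]) for w, tf in term_freq.items()}
    ranked = sorted(score, key=lambda w: (-score[w], w))
    # Words that occur in every document score 0 and are dropped.
    return [w for w in ranked[:top_k] if score[w] > 0]


def similar_objects(doc_id, docs, top_k=5):
    """Use the significant words as a query; weight = share of query words matched."""
    query = set(significant_words(doc_id, docs, top_k))
    results = []
    for other_id, text in docs.items():
        if other_id == doc_id or not query:
            continue
        weight = len(query & set(tokenize(text))) / len(query)
        if weight > 0:
            results.append((other_id, weight))
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = {
    "a": "bridge tank biber crosses the river",
    "b": "the biber bridge tank is deployed",
    "c": "the quarterly sales report is ready",
}
print(similar_objects("a", docs))
```

By construction the weight lies between 0 and 1, matching the range given on the slide.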
Semantic Analyzer

• Background: the meaning of words and sentences in a language is not
  defined abstractly but indirectly manifested in the daily use of language


• The Semantic Analyzer evaluates the tracked user actions


  • If two content objects are selected, edited, or created in a sequence, the
    Semantic Analyzer creates a link between these objects


  • The weight of such a link will grow if the same sequence of content objects
    occurs again


  • The weights of content object links can shrink if a weight has a value larger
    than 1


  • The topic map is self-optimizing considering the customers’ interests
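One possible reading of the grow and shrink rules is sketched below. It assumes that a link is undirected, that each repeated sequence adds a fixed step, and that "shrink" means rescaling all weights once any weight exceeds 1. None of these details are confirmed by the slides, and `SemanticAnalyzer`, `track`, and `step` are hypothetical names.

```python
from collections import defaultdict


class SemanticAnalyzer:
    """Hypothetical usage-based linker driven by tracked user actions."""

    def __init__(self, step=0.5):
        self.step = step                     # assumed increment per observed sequence
        self.weights = defaultdict(float)    # (id_a, id_b) -> weight
        self._previous = None

    def track(self, obj_id):
        """Record one user action (select, edit, or create) on obj_id."""
        if self._previous is not None and self._previous != obj_id:
            pair = tuple(sorted((self._previous, obj_id)))
            self.weights[pair] += self.step   # create or strengthen the link
            self._rescale()
        self._previous = obj_id

    def _rescale(self):
        # Assumed shrink rule: once a weight exceeds 1, divide all weights
        # by the maximum, so rarely used links lose weight relative to others.
        top = max(self.weights.values())
        if top > 1.0:
            for pair in self.weights:
                self.weights[pair] /= top


analyzer = SemanticAnalyzer()
for obj_id in ["a", "b", "c", "b", "c"]:
    analyzer.track(obj_id)

print(dict(analyzer.weights))
```

In this run the link between `b` and `c` is used three times and ends at 1.0, while the link between `a` and `b`, used once, is scaled down to one third.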
Querying associated information

• Users can specify search queries aiming at a precise result by means of


  • attributes


  • semantic types


  • relations (context search)


• All changes in the data pool and in the topic map can be used to trigger or
  control a process
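The three query criteria can be combined as in the following sketch. The function `query`, its parameters, and the in-memory data layout are assumptions for illustration and do not reflect the middleware's actual query API.

```python
def query(objects, links, semantic_type=None, attributes=None, related_to=None):
    """Filter content objects by type, attribute values, and relation.

    objects: id -> {"type": str, "attrs": dict}
    links:   set of frozenset({id_a, id_b}) for linked pairs
    """
    hits = []
    for obj_id, obj in objects.items():
        if semantic_type is not None and obj["type"] != semantic_type:
            continue
        if attributes and any(obj["attrs"].get(k) != v for k, v in attributes.items()):
            continue
        # Context search: keep only objects linked to the given object.
        if related_to is not None and frozenset((obj_id, related_to)) not in links:
            continue
        hits.append(obj_id)
    return hits


objects = {
    "cust-1": {"type": "customer", "attrs": {"name": "ACME"}},
    "proj-1": {"type": "project", "attrs": {"status": "open"}},
    "proj-2": {"type": "project", "attrs": {"status": "closed"}},
    "proj-3": {"type": "project", "attrs": {"status": "open"}},
}
links = {frozenset(("cust-1", "proj-1")), frozenset(("cust-1", "proj-2"))}

# All open projects in the context of customer cust-1.
print(query(objects, links, semantic_type="project",
            attributes={"status": "open"}, related_to="cust-1"))
```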
Further research

• Developing more applications as concrete use cases based on the iQser
  Semantic Middleware


• Developing and evaluating additional analysis methods


• Implementing complex queries with multiple contexts
Thank you!
Dr. Jörg Wurzer
+49 172 6680073
www.iqser.com
joerg.wurzer@iqser.net
Technical details

• Hardware: Pentium(R) Dual Core 3 GHz, 2 GB RAM


• Software: Windows XP 2002 SP3, JBoss 4.0.4 GA, Sun JDK 1.5_12


• JBoss JVM heap size configuration: -Xms128m -Xmx512m


• 3 GB of data (Word, Excel, PowerPoint, Plain Text, HTML) are indexed and
  analyzed in 14 hours


• More than 70% of CPU time was spent in I/O waits


• Memory consumption stayed below 400 MB
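For orientation, the indexing figures above translate into an average throughput of roughly 62 KB/s. The short calculation below assumes binary units (1 GB = 1024 MB) and a constant rate over the 14 hours, neither of which is stated on the slide.

```python
data_mb = 3 * 1024          # 3 GB expressed in MB (binary units assumed)
duration_s = 14 * 60 * 60   # 14 hours in seconds

throughput_kb_s = data_mb * 1024 / duration_s
print(f"{throughput_kb_s:.1f} KB/s")
```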
