STL: A Similarity Measure Based on Semantic, Terminological and Linguistic Information. Nitish Aggarwal, joint work with Tobias Wunner and Mihael Arcan. DERI, NUI Galway. firstname.lastname@deri.org. Friday, 19th Aug, 2011, DERI Friday Meeting.
Overview: Motivation & Applications; Why STL? (Semantic, Terminology, Linguistic); Evaluation; Conclusion and future work.
Motivation & Applications: Semantic Annotation. Similarity between corpus data and ontology concepts. Example: "SAP AG held €1615 million in short-term liquid assets (2009)" is annotated with “dbpedia:SAP_AG”, “xEBR:LiquidAssets” at “dbpedia:year:2009”.
Motivation & Applications: Semantic Search. Similarity between query and index object. Example queries: "SAP liquid asset in 2010", "Current asset of SAP last year", "Net cash of SAP in 2010", "SAP total amount received in 2010". Index object: “dbpedia:SAP_AG”, “xEBR:liquid asset” at “dbpedia:year:2010”.
Motivation & Applications: Ontology Matching & Alignment. Similarity between ontology concepts. [Figure: IFRS and xEBR concept hierarchies side by side, with candidate links marked "Similarity = ?". Concepts shown: ifrs:StatementOfFinancialPosition, xebr:KeyBalanceSheet, Assets, ifrs:Assets, ifrs:BiologicalAssets, xebr:SubscribedCapitalUnpaid, ifrs:CurrentAssets, ifrs:NonCurrentAssets, xebr:FixedAssets, xebr:CurrentAssets, ifrs:PropertyPlantAndEquipment, xebr:TangibleFixedAssets, xebr:IntangibleFixedAssets, xebr:Amount Receivable, xebr:Liquid Assets, ifrs:CashAndCashEquivalents, ifrs:TradeAndOtherCurrentReceivables, ifrs:Inventories.]
Classical Approaches. String similarity: Levenshtein distance, Dice coefficient. Corpus-based: LSA, ESA, Google distance, Vector-Space Model. Ontology-based: path distance, information content. Syntax similarity: word order, part of speech.
Why STL? Semantic: semantic structure and relations. Terminology: complex terms expressing the same concept. Linguistic: phrase and dependency structure.
STL Definition: a linear combination of semantic, terminological and linguistic similarity, with weights obtained by linear regression. Formula used: STL = w1*S + w2*T + w3*L + Constant, where w1, w2, w3 represent the contribution of each component.
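
As a rough illustration of how such weights could be obtained, the sketch below fits w1, w2, w3 and the constant by ordinary least squares. The function name `fit_linear`, the feature rows and the target scores are all made up for this example; the slides do not say which regression tool or training data was used.

```python
def fit_linear(features, targets):
    """Ordinary least squares for y ~ w.x + c, solved via the normal equations."""
    # Append a constant 1.0 to every row so the last coefficient is the intercept.
    rows = [list(x) + [1.0] for x in features]
    n = len(rows[0])
    # Build the augmented system (X^T X | X^T y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Hypothetical (S, T, L) scores for five term pairs.
FEATURES = [
    (0.1, 0.9, 0.3),
    (0.8, 0.2, 0.5),
    (0.4, 0.4, 0.9),
    (0.7, 0.6, 0.1),
    (0.3, 0.1, 0.2),
]
# Hypothetical "gold" similarities, generated from known weights so that
# the fit can be checked: w1=0.2, w2=0.5, w3=0.1, constant=0.15.
TRUE_WEIGHTS = (0.2, 0.5, 0.1, 0.15)
TARGETS = [
    TRUE_WEIGHTS[0] * s + TRUE_WEIGHTS[1] * t + TRUE_WEIGHTS[2] * l + TRUE_WEIGHTS[3]
    for s, t, l in FEATURES
]

w1, w2, w3, constant = fit_linear(FEATURES, TARGETS)
print(w1, w2, w3, constant)
```

With real data the targets would be human similarity judgements for each term pair rather than values generated from known weights.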
Semantic. Wu-Palmer: 2*depth(MSCA) / (depth(c1) + depth(c2)), where MSCA is the most specific common abstraction of c1 and c2. Resnik's Information Content: IC(c) = -log p(c). Intrinsic Information Content (Pirro09): avoids the need to analyse large corpora.
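
The Wu-Palmer measure can be sketched on a toy taxonomy as follows. The `PARENT` map is a hypothetical fragment that borrows concept names from xEBR; the hierarchy itself, the helper names, and the convention that the root has depth 1 are assumptions made for illustration.

```python
# Hypothetical taxonomy fragment: child -> parent (None marks the root).
PARENT = {
    "Assets": None,
    "Fixed Assets": "Assets",
    "Current Assets": "Assets",
    "Tangible Fixed Assets": "Fixed Assets",
    "Amount Receivable": "Current Assets",
}


def path_to_root(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (an assumed convention)."""
    return len(path_to_root(concept))


def msca(c1, c2):
    """Most specific common abstraction: the deepest shared ancestor."""
    ancestors_of_c2 = set(path_to_root(c2))
    for candidate in path_to_root(c1):  # walks upward, so the first hit is deepest
        if candidate in ancestors_of_c2:
            return candidate
    raise ValueError("concepts do not share a root")


def wu_palmer(c1, c2):
    return 2 * depth(msca(c1, c2)) / (depth(c1) + depth(c2))


print(wu_palmer("Tangible Fixed Assets", "Amount Receivable"))
print(wu_palmer("Tangible Fixed Assets", "Fixed Assets"))
```

In this toy tree the first pair meets only at the root, so the score is low; the second pair shares a deeper ancestor and scores higher.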
Cont. Intrinsic information content (iIC): [formula not preserved in this transcript], where sub(c) is the number of sub-concepts of a given concept c. Pirro_Similarity: [formula not preserved in this transcript].
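
The iIC formula is missing from this transcript. As a hedged sketch, one widely cited intrinsic IC definition (Seco et al.) is iIC(c) = 1 - log(sub(c) + 1) / log(max_con), where max_con is the total number of concepts in the taxonomy. Whether the slide used exactly this form is an assumption, and the function and parameter names below are illustrative only.

```python
import math


def intrinsic_ic(num_subconcepts, max_concepts):
    """Intrinsic IC in the style of Seco et al. (assumed form, see lead-in).

    num_subconcepts: number of sub-concepts of the concept, sub(c)
    max_concepts:    total number of concepts in the taxonomy
    """
    if max_concepts < 2 or not 0 <= num_subconcepts < max_concepts:
        raise ValueError("need 0 <= sub(c) < max_concepts and max_concepts >= 2")
    return 1.0 - math.log(num_subconcepts + 1) / math.log(max_concepts)


# A leaf concept is maximally informative; a root that subsumes every
# other concept carries no information.
print(intrinsic_ic(0, 269))
print(intrinsic_ic(268, 269))
```

No corpus is needed here: the score depends only on how many sub-concepts a concept has, which is the point made about avoiding corpus analysis.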
Cont. [Figure: fragment of the xEBR taxonomy. Nodes shown: Assets, Subscribed Capital Unpaid, Fixed Assets, Current Assets, Stocks, Tangible Fixed Assets, Amount Receivable. MSCA: subconcepts = 48, IC (TFA) = 0.32. Amount Receivable: subconcepts = 6, IC (AR) = 0.69. Tangible Fixed Assets: subconcepts = 9, IC (TFA) = 0.60. Annotations: Pirro_Sim = 0.33; Pirro_Sim = ?. Lower-level nodes: Amount Receivable [total]; Amount Receivable within one year; Amount Receivable after more than one year; Trade Debtors; Other Debtors; Other Tangible Fixed Assets; Property, Plant and Equipment; Payments on account and asset in construction; Furniture Fixture and Equipment; Other Fixture; Land and Building; Plant and Machinery; Other Property, Plant and Equipment; Property, Plant and Equipment [Total].]
Limitation. Does semantic structure reflect a good similarity? Not necessarily. E.g. in xEBR, the parent-child relation is used for describing the layout of concepts: “Work in progress” is not a type of asset, although both are linked via the parent-child relationship.
Terminology. Definition: common naming convention. N-gram vs. subterms: in the financial domain, the bigram “Intangible Fixed” is a substring of “Other Intangible Fixed Assets” but not a subterm. Terminological similarity: maximal subterm overlap.
Cont. Example: “Trade Debts Payable After More Than One Year” decomposes into [[Trade][Debts]][Payable][After More Than One Year], with subterms found in [SAP:Payable], [Ifrs:After More Than One Year], [Investoword:Debt], [FinanceDict:Trade Debts], [Investopedia:Trade]. “Financial Debts Payable After More Than One Year” decomposes into Financial[Debts][Payable][After More Than One Year].
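
The subterm idea can be sketched as follows: a subterm is a contiguous word sequence that is itself a known term in some terminology, and similarity is measured by how many subterms two labels share. The lexicon below is a hypothetical stand-in for the external term sources, and scoring the overlap with a Dice coefficient is an assumption, since the slides only say "maximal subterm overlap".

```python
def subterms(term, lexicon):
    """All contiguous word sequences of `term` that are known terms."""
    words = term.lower().split()
    spans = {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }
    return spans & lexicon


def subterm_similarity(term_a, term_b, lexicon):
    """Dice coefficient over the two subterm sets (assumed overlap measure)."""
    a, b = subterms(term_a, lexicon), subterms(term_b, lexicon)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical lexicon standing in for the external terminologies.
LEXICON = {
    "trade debts", "trade", "debts", "payable", "after more than one year",
    "other intangible fixed assets", "intangible fixed assets", "fixed assets",
}

print(subterm_similarity(
    "Trade Debts Payable After More Than One Year",
    "Financial Debts Payable After More Than One Year",
    LEXICON,
))
# "intangible fixed" is a bigram of the label but not a known term,
# so it is not returned as a subterm.
print(subterms("Other Intangible Fixed Assets", LEXICON))
```

Enumerating every contiguous span keeps nested subterms such as "trade" and "debts" inside "trade debts", which matches the nested brackets in the example.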
Multilingual Subterms. Translated subterms: available in other languages. Advantage: reflect terminological similarities that may be available in one language but not in others. Example: “Property Plant and Equipment”@en, “Sachanlagen”@de, “Tangible Fixed Asset”@en.
Linguistic. Syntactic information beyond simple word order: phrase structure and dependency structure. Phrase structure: “Intangible fixed”: adj adj > ??; “Intangible fixed assets”: adj adj n > NP. Dependency structure: “Amounts receivable”: N Adv: receive:mod, amounts:head; “Received amounts”: V N: receive:mod, amounts:head.
Evaluation. Data set: xEBR finance vocabulary, 269 terms (concept labels), 72,361 (269*269) term pairs. Benchmarks: SimSem59, a sample of 59 term pairs; SimSem200, a sample of 200 term pairs (under construction).
Experiment: an overview of similarity measures.
Experiment Results (SimSem59). STL formula used: STL = 0.1531 * S + 0.5218 * T + 0.1041 * L + 0.1791. [Chart: correlation between similarity scores and SimSem59, showing the semantic, terminology and linguistic contributions.]
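
To make the reported formula concrete, the sketch below applies the SimSem59 coefficients to component scores and computes a correlation against benchmark judgements. The component scores and gold values are invented for illustration, and using Pearson correlation is an assumption, since the slides do not name the correlation measure.

```python
import math

# Coefficients as reported on the slide for SimSem59.
W_S, W_T, W_L, CONSTANT = 0.1531, 0.5218, 0.1041, 0.1791


def stl_score(s, t, l):
    """STL = w1*S + w2*T + w3*L + Constant with the reported weights."""
    return W_S * s + W_T * t + W_L * l + CONSTANT


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed measure, see lead-in)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical (S, T, L) scores and gold judgements for four term pairs.
pairs = [(0.3, 0.8, 0.5), (0.9, 0.1, 0.2), (0.5, 0.5, 0.5), (0.2, 0.9, 0.9)]
gold = [0.7, 0.3, 0.5, 0.8]

predicted = [stl_score(s, t, l) for s, t, l in pairs]
print(pearson(predicted, gold))
```

Because the terminological weight (0.5218) is the largest, changes in T move the combined score more than equal changes in S or L.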
Conclusion. STL outperforms more traditional similarity measures. Largest contribution by T (terminological analysis). Multilingual subterms perform better than monolingual ones.
Future work. Evaluation on larger data sets and vocabularies (IFRS): 3000+ terms, 9M term pairs. Richer set of linguistic operations: “recognise” => “recognition” by derivation rule verb_lemma+"ion". Similarity between subterms: “Staff Costs” and “Wages And Salaries”.
