A New Semantic Similarity Based Measure for Assessing Research Contribution
Petr Knoth & Drahomira Herrmannova
Knowledge Media Institute, The Open University

Current impact metrics
•  Pros: simplicity, availability for evaluation purposes
•  Cons: insufficient evidence of quality and research contribution

Problems of current impact metrics
•  Sentiment, semantics, context and motives [Nicolaisen, 2007]
•  Popularity and size of research communities [Brumback, 2009; Seglen, 1997]
•  Time delay [Priem and Hemminger, 2010]
•  Skewness of the distribution [Seglen, 1992]
•  Differences between types of research papers [Seglen, 1997]
•  Ability to game/manipulate citations [Arnold and Fowler, 2010; Editors, 2006]

Alternative metrics
•  Alt-/Webo-metrics etc.
–  Impact still dependent on the number of interactions in a scholarly communication network
•  Full-text (Semantometrics)
–  Contribution to the discipline dependent on the content of the manuscript.

Approach
Premise: Full text is needed to assess a publication's research contribution.
Hypothesis: The added value of publication p can be estimated based on the semantic distance from the publications cited by p to the publications citing p.

Contribution measure
[Figure: publication p with the set A of publications cited by p and the set B of publications citing p; edges labelled dist(a,b) and dist(b1,b2)]

$$\mathrm{Contribution}(p) = \frac{\overline{B}}{\overline{A}} \cdot \frac{1}{|B| \cdot |A|} \cdot \sum_{a \in A,\, b \in B,\, a \neq b} \mathrm{dist}(a, b)$$

Average distance of the set members, for X ∈ {A, B}:

$$\overline{X} = \begin{cases} 1 & |A| = 1 \lor |B| = 1 \\ \dfrac{1}{|X|\,(|X| - 1)} \cdot \sum_{x_1 \in X,\, x_2 \in X,\, x_1 \neq x_2} \mathrm{dist}(x_1, x_2) & |A| > 1 \land |B| > 1 \end{cases}$$

$$\mathrm{dist}(a, b) = 1 - \mathrm{sim}(a, b)$$

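The contribution measure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (contribution, average_internal_distance, cosine_distance) are hypothetical, documents are assumed to be represented as numeric term-weight vectors, and cosine similarity is only one possible choice for sim(a,b), which the slides leave unspecified. The special case follows the slide: both averages are set to 1 when either A or B has a single member.

```python
import math
from typing import Callable, Sequence, TypeVar

Doc = TypeVar("Doc")
Distance = Callable[[Doc, Doc], float]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """dist = 1 - sim, with sim ASSUMED to be cosine similarity (not stated in the slides)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def average_internal_distance(X: Sequence[Doc], dist: Distance) -> float:
    """Mean of dist(x1, x2) over ordered pairs of members of X with x1 != x2."""
    n = len(X)
    total = sum(dist(x1, x2) for x1 in X for x2 in X if x1 != x2)
    return total / (n * (n - 1))


def contribution(A: Sequence[Doc], B: Sequence[Doc], dist: Distance) -> float:
    """Contribution(p) for A = publications cited by p, B = publications citing p."""
    if not A or not B:
        raise ValueError("both A and B must be non-empty")
    if len(A) == 1 or len(B) == 1:
        # Special case from the slide: both averages default to 1.
        avg_a = avg_b = 1.0
    else:
        avg_a = average_internal_distance(A, dist)
        avg_b = average_internal_distance(B, dist)
    # Sum of distances across the two sets, skipping identical documents.
    cross = sum(dist(a, b) for a in A for b in B if a != b)
    # Raises ZeroDivisionError if all members of A are at distance 0 from each other.
    return (avg_b / avg_a) * cross / (len(B) * len(A))


if __name__ == "__main__":
    # Toy term-weight vectors, invented purely for illustration.
    cited = [(1.0, 0.0, 0.0), (0.8, 0.2, 0.0)]
    citing = [(0.0, 1.0, 0.0), (0.0, 0.6, 0.4), (0.1, 0.1, 0.8)]
    print(round(contribution(cited, citing, cosine_distance), 4))
```

Passing the distance function as a parameter keeps the sketch independent of how sim(a,b) is computed, so any similarity scaled to [0, 1] can be substituted.
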
Datasets
•  Requirements
–  Availability of full text
–  Density
–  Multidisciplinarity
–  (Availability of citations)

Datasets
Dataset               Full-text  Density  Multidisciplinarity
CORE                  ✓          ✗        ✓
Open Citation Corpus  ✓          -        ✗
ACM Dataset           ✗          -        ✓
DBLP+Citation         ✗          -        ✓
iSearch Collection    ✓          ✗        ✗

Our dataset
•  10 seed publications from CORE with varying levels of citations
•  Missing citing and cited publications downloaded manually
•  Only freely accessible English documents were downloaded
•  In total 716 documents (~50% of the complete network)
•  2 days to gather the data

Results
Publication no.  |B| (Citation score)  |A| (No. of references)  Contribution
1                5 (9)                 6 (8)                    0.4160
2                7 (11)                52 (93)                  0.3576
3                12 (20)               15 (31)                  0.4874
4                14 (27)               27 (72)                  0.4026
5                16 (30)               12 (21)                  0.5117
6                25 (41)               8 (13)                   0.4123
7                39 (71)               70 (128)                 0.4309
8                53 (131)              3 (10)                   0.5197
9                131 (258)             22 (32)                  0.5058
10               172 (360)             17 (20)                  0.5004
Total            474 (958)             232 (428)

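The totals in the results table can be cross-checked against the dataset size quoted earlier (716 documents, about 50% of the complete network). The sketch below copies the per-publication counts from the table; the variable names are illustrative. It rests on three assumptions that the slides do not state: that the number outside each pair of parentheses is the count of documents actually retrieved, that the 716 figure includes the 10 seed publications themselves, and that documents are not deduplicated across the ten networks.

```python
# (publication no., |B| retrieved, citation score, |A| retrieved, no. of references)
ROWS = [
    (1, 5, 9, 6, 8),
    (2, 7, 11, 52, 93),
    (3, 12, 20, 15, 31),
    (4, 14, 27, 27, 72),
    (5, 16, 30, 12, 21),
    (6, 25, 41, 8, 13),
    (7, 39, 71, 70, 128),
    (8, 53, 131, 3, 10),
    (9, 131, 258, 22, 32),
    (10, 172, 360, 17, 20),
]

# Column totals, matching the last row of the table.
citing_retrieved = sum(row[1] for row in ROWS)
citing_total = sum(row[2] for row in ROWS)
cited_retrieved = sum(row[3] for row in ROWS)
cited_total = sum(row[4] for row in ROWS)

# ASSUMPTION: the seed publications are counted once each in the dataset size.
seeds = len(ROWS)
documents_retrieved = citing_retrieved + cited_retrieved + seeds
documents_in_network = citing_total + cited_total + seeds
coverage = documents_retrieved / documents_in_network

print(citing_retrieved, citing_total, cited_retrieved, cited_total)
print(documents_retrieved, documents_in_network, f"{coverage:.1%}")
```

Under these assumptions the script reproduces the table totals (474, 958, 232 and 428) and gives 716 of 1396 documents, about 51%, in line with the "~50%" quoted for the dataset.
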
Current impact metrics vs Semantometrics
Unaffected by                                     Current impact metrics  Semantometrics
Citation sentiment, semantics, context, motives   ✗                       ✔
Popularity & size of research communities         ✗                       ✔
Time delay                                        ✗                       ✗/✔*
Skewness of the citation distribution             ✗                       ✔
Differences between types of research papers      ✗                       ✔
Ability to game/manipulate the metrics            ✗                       ✗/✔**
*  reduced to 1 citation
** assuming that self-citations are not taken into account

Conclusions
•  Full-text necessary
•  Semantometrics are a new class of methods
•  We showed one method to assess the research contribution

References
•  Jeppe Nicolaisen. 2007. Citation Analysis. Annual Review of Information Science and Technology, 41(1):609-641.
•  Douglas N. Arnold and Kristine K. Fowler. 2010. Nefarious numbers. Notices of the American Mathematical Society, 58(3):434-437.
•  Roger A. Brumback. 2009. Impact factor wars: Episode V -- The Empire Strikes Back. Journal of Child Neurology, 24(3):260-262, March.
•  The PLoS Medicine Editors. 2006. The impact factor game. PLoS Medicine, 3(6), June.
•  Jason Priem and Bradley M. Hemminger. 2010. Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web. First Monday, 15(7), July.
•  Per Omar Seglen. 1992. The Skewness of Science. Journal of the American Society for Information Science, 43(9):628-638, October.
•  Per Omar Seglen. 1997. Why the impact factor of journals should not be used for evaluating research. BMJ: British Medical Journal, 314(February):498-502.

Towards Semantometrics: A New Semantic Similarity Based Measure for Assessing a Research Publication's Contribution