Mining Research Publication Networks for Impact -- KMi Internal Seminar
1. Mining Research Publication Networks for Impact
PhD Topic Presentation
Drahomira Herrmannova
Knowledge Media Institute
The Open University
KMi Internal Seminar, November 2013
1 / 19
2. Table of Contents
1 Research Aim
Motivation
Problem statement
2 Literature review
State of the art
Limitations
3 Research objectives
Research questions
Selected approach
Tasks and plans
4 Pilot study
5 References
4. Who needs this anyway?
• Researchers
• How to select relevant literature for reading?
• Librarians
• How to select journal subscriptions?
• Universities, funding agencies and other institutions
• How to aid reviewers of funding and grant proposals, hiring
committees, etc.?
• Publishers and editors
• How can publishers evaluate and promote their journals?
• Society
• How to evaluate the returns of research to society?
5. The growth of scholarly literature
Figure: Monthly submission rate (since 1991) for arXiv.org. Source:
http://arxiv.org/
6. The growth of journal subscription costs
Figure: Expenditures in ARL libraries (1986–2009). Source: [1]
7. What’s being used
• Peer review
• Qualitative evaluation method
• Traditionally the main filter for controlling the quality of
published research
• Classical quantitative methods
• Typically based on citations and/or productivity
• Citation counts
• JIF
• h-index
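As an illustration of the classical citation-based indicators listed above, here is a minimal Python sketch of the h-index and of the two-year journal impact factor (JIF). The function names and the sample numbers are assumptions made for this example and do not come from the presentation.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def journal_impact_factor(citations_received, citable_items):
    """Two-year JIF for year Y: citations received in Y by items the journal
    published in Y-1 and Y-2, divided by the citable items from Y-1 and Y-2."""
    if citable_items == 0:
        raise ValueError("a journal with no citable items has no impact factor")
    return citations_received / citable_items


if __name__ == "__main__":
    # Hypothetical citation counts for one author's papers.
    print(h_index([10, 8, 5, 4, 3]))
    # Hypothetical journal: 200 citations to 80 citable items.
    print(journal_impact_factor(200, 80))
```

Both indicators reduce a whole citation record to a single number, which is the simplification the following slides call into question.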
8. So, what’s the problem?
• Peer review
• Speed and cost
• Biased opinion
• Doesn’t limit the amount of published research
• Classical quantitative methods
• Quality vs. impact
• Reasons for citation
• Citation half-life
• Manipulation and gaming
• Author variability
• Field effects
9. Bibliometrics today
Two changes which influenced the evolution of bibliometrics
• creation of the Web and web-related developments
• growth of Open Access publishing
10. Bibliometrics today
Two ideas driving the current research
1 Development of new metrics (improvements and replacements
of JIF)
• h-index
• Eigenfactor
• SJR
2 Concerns about the validity of using citations
• Methods using different data
• Patent analysis
• Webometrics
• Altmetrics
• Full-text analysis
• “Fixing” citations (field normalisation of indicators)
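Field normalisation can take several forms, and the slide does not say which one is meant. The sketch below shows one common variant, assumed here for illustration only: each paper's citation count is divided by the mean citation count of its field. The function name, input layout and sample data are hypothetical.

```python
from collections import defaultdict


def field_normalised_scores(papers):
    """papers: iterable of (paper_id, field, citations) tuples.

    Returns {paper_id: citations / mean citations of the paper's field}.
    A score of 1.0 means "cited as often as the field average".
    """
    papers = list(papers)
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _, field, cites in papers:
        totals[field] += cites
        counts[field] += 1

    scores = {}
    for paper_id, field, cites in papers:
        mean = totals[field] / counts[field]
        # A field in which nothing is cited gives every paper a score of 0.
        scores[paper_id] = cites / mean if mean > 0 else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical data: the two fields have very different citation rates.
    sample = [
        ("a", "mathematics", 2),
        ("b", "mathematics", 4),
        ("c", "biology", 20),
        ("d", "biology", 60),
    ]
    for paper_id, score in field_normalised_scores(sample).items():
        print(paper_id, round(score, 2))
```

In this sample, paper "b" has far fewer raw citations than paper "c" but scores higher once each is compared with its own field's average.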
11. Limitations
• Limitations of citation-based metrics
• Citation bias
• Incomplete journal coverage
• Author variability
• Field effects
• Uncited publications
• Manipulation of metrics
• Using JIF for research evaluation
• Limitations of web-based metrics
• Gaming web-based and social metrics
• Problems of data collection
• Adoption of social media by users
• Accumulated advantage
• Limitations of text-based metrics
• Full-text not always available
12. Research questions
Question 1: What factors influence the quality of a research
publication (with regard to the publication type)?
Question 2: What is the relationship, if any, between the
impact of a publication, measured by the classical
bibliometric methods, and the quality of a
publication?
Question 3: How can we detect the factors influencing quality in
order to evaluate the quality of a research
publication?
Question 4: How can this evaluation be used in other disciplines?
13. Selected approach
• Single number vs. collection of metrics and indicators
• Analysis of full-text
• Until quite recently not easily available
• Full-text – the best indicator of publication quality
• For example
• Co-word analysis
• Analysis of citation context
• Semantic similarity of publications
• Additional indicators
• Famous author or collaboration with famous authors
• Citing or being cited outside of the research area
• Paper published in a field-specific prestigious journal
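To make the idea of semantic similarity between publications concrete, the sketch below compares two texts by the cosine of their term-frequency vectors. This is a deliberately simple stand-in, assumed for illustration; the presentation does not specify a similarity measure, and the function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-frequency vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("citation analysis of journals",
                            "analysis of citation networks"))
    print(cosine_similarity("citation analysis", "protein folding"))
```

A realistic pipeline would likely add stop-word removal and term weighting (for example TF-IDF), which are left out to keep the sketch short.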
14. Requirements for science evaluation methods
Source: [2]
1 Reliable and accurate, comparable to or better than the peer
review system
2 Easy to understand
3 Economical in terms of development and maintenance, time
required to understand it, etc.
4 Faster than citations, at least comparable to the speed of peer
review
5 Resistant to manipulation and gaming
15. Tasks and plans
Data collection
Task 1: Identify information sources that may provide relevant
publication data
• Mostly done
Task 2a: Investigate factors that influence the quality of research
publications
Task 2b: Using the identified information sources, develop various
relevant data structures such as:
• collaboration networks
• citation, co-citation and bibliographic coupling
networks
• clusters of semantically related publications
• clusters of publications corresponding to different
topics
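Two of the network types named in Task 2b can be derived directly from a list of citation links. The sketch below is an illustration under assumed inputs, not the project's actual data model: it takes hypothetical (citing, cited) pairs and counts co-citations (two papers cited together by the same paper) and bibliographic coupling (two papers sharing references).

```python
from collections import defaultdict
from itertools import combinations


def cocitation_and_coupling(citations):
    """citations: iterable of (citing_paper, cited_paper) pairs.

    Returns two dicts keyed by sorted paper pairs:
      cocitation[(a, b)] = number of papers that cite both a and b
      coupling[(p, q)]   = number of references shared by p and q
    """
    references = defaultdict(set)  # paper -> set of papers it cites
    for citing, cited in citations:
        references[citing].add(cited)

    cocitation = defaultdict(int)
    for refs in references.values():
        for a, b in combinations(sorted(refs), 2):
            cocitation[(a, b)] += 1

    coupling = {}
    for p, q in combinations(sorted(references), 2):
        shared = len(references[p] & references[q])
        if shared:
            coupling[(p, q)] = shared

    return dict(cocitation), coupling


if __name__ == "__main__":
    # Hypothetical citation links.
    links = [("p1", "a"), ("p1", "b"),
             ("p2", "a"), ("p2", "b"), ("p2", "c"),
             ("p3", "c")]
    cocited, coupled = cocitation_and_coupling(links)
    print(cocited)
    print(coupled)
```

The resulting pair weights can serve as edge weights of an undirected graph for the network analysis planned in the later tasks.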
16. Tasks and plans
Data analysis
Task 3a: Study how NLP can be applied to the evaluation of
research publications
Task 3b: Investigate the developed data structures using graph
and network theory as well as bibliometric indicators
17. Tasks
Development of new methods
Task 4a: Analyse the possibilities of combining the studied
methods in order to design a set of new methods for
estimating quality
Task 4b: Evaluate the proposed methods against current
standards
Task 4c: Analyse the use of the new methods in other
disciplines
18. Task 1
Identification of data sources
Source: CSX, MAS, JSTOR, DBLP, CORE, ArXiv, KDD, iSearch, DBLP+C, ACM, OCC
MD: X X X X -
API: X X X X -
OAI-PMH: X X X -
dumps: X X X X X X X X X
cit.: X X X X X X X X X
FT: X * * * X X X X -
Table: Stars (*) mark sources that don’t store full text but provide
links to it where available. MD stands for multidisciplinary.
19. References
[1] Kyrillidou, Martha and Morris, Shaneka.
ARL Statistics 2008-2009.
Association of Research Libraries, Washington, DC, 2011.
[2] Taraborelli, Dario.
Soft peer review: Social software and distributed scientific
evaluation.
Proceedings of the 8th International Conference on the Design
of Cooperative Systems (COOP ’08), Carry-le-Rouet, France,
2008.
20. How many metrics?
Scientometrics: study of science and research
Bibliometrics: study of scientific literature
Informetrics: study of any type of information
Webometrics: informetric studies of the web
Cybermetrics: informetric studies of the whole Internet
Altmetrics: study of science and research using data from
social media