Bibliometric Research Synthesis
bibliometrix: An R-tool for comprehensive science mapping analysis
2. PhD Seminar Series: Bibliometric Research Synthesis
Massimo Aria and Corrado Cuccurullo
massimo.aria@unina.it; corrado.cuccurullo@unicampania.it
Bibliometrix package www.bibliometrix.org
3. Practice experience
• Seminar goals
The aim of this seminar cycle is twofold.
1. First, we want to bring together in a single
seminar cycle all the knowledge on research
synthesis through a recommended workflow,
from problem formulation to report writing.
2. Second, we present our open-source bibliometrix
R-package for performing comprehensive
bibliometric analyses, and discuss how
bibliometrix is a valid tool for performing
bibliometric studies. We illustrate the main
bibliometrix functions in the workflow, using
topics selected by the participants in the
seminars.
• Lab activities
In this seminar, doctoral students have to synthesize a
large volume of studies and organize articles into
different intellectual and conceptual maps that
represent distinct research streams and their
interrelationships. The assignment culminates with
doctoral students presenting (a) graphical
representations of their bibliometric analysis, (b) a
depiction of how each stream relates to other
streams, and (c) a list of the main contributing authors
and/or works in each stream and substream.
• Effective learning and usefulness
The exercise proves useful for all involved. Doctoral
students learn valuable skills in analysing and
conceptualizing vast amounts of literature. This is a
useful skill for any researcher. Moreover, doctoral
students will be able to use bibliometric maps to
identify gaps and trends in the literature.
Given the value of this experience, we ask doctoral
students to present and defend their own maps of
literature.
4. Key readings
• Research Synthesis
  • Thomé, A. M. T., Scavarda, L. F., & Scavarda, A. J. (2016). Conducting systematic literature review in operations management. Production Planning & Control, 27(5), 408-420.
  • Cooper, H. (2015). Research synthesis and meta-analysis: A step-by-step approach (Vol. 2). Sage Publications.
  • Briner, R. B., & Denyer, D. (2012). Systematic review and evidence synthesis as a practice and scholarship tool. In D. M. Rousseau (Ed.), The Oxford handbook of evidence-based management. Oxford University Press.
  • Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & PRISMA Group. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Medicine, 6(7), e1000097.
  • Massaro, M., Dumay, J., & Guthrie, J. (2016). On the shoulders of giants: Undertaking a structured literature review in accounting. Accounting, Auditing & Accountability Journal, 29(5), 767-801.
  • Webster, J., & Watson, R. T. (2002). Analyzing the past to prepare for the future: Writing a literature review. MIS Quarterly, xiii-xxiii.
  • Torraco, R. J. (2005). Writing integrative literature reviews: Guidelines and examples. Human Resource Development Review, 4(3), 356-367.
• General science mapping workflow
  • Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959-975.
  • Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). Science mapping software tools: Review, analysis, and cooperative study among tools. Journal of the American Society for Information Science and Technology.
• bibliometrix R-package (http://www.bibliometrix.org)
  • Bibliometrix Tutorial
  • Bibliometrix function map
• Citation Indicators
  • Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365-391.
• Intellectual Map
  • Yang, S., Han, R., Wolfram, D., & Zhao, Y. (2016). Visualizing the intellectual structure of information science (2006–2015): Introducing author keyword coupling analysis. Journal of Informetrics, 10(1), 132-150.
• Co-word analysis and Research Front
  • Cuccurullo, C., Aria, M., & Sarto, F. (2016). Foundations and trends in performance management: A twenty-five years bibliometric analysis in business and public administration domains. Scientometrics. DOI: 10.1007/s11192-016-1948-8.
  • Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146-166.
• Data science and big data
  • George, G., Osinga, E. C., Lavie, D., & Scott, B. A. (2016). Big data and data science methods for management research. Academy of Management Journal, 59(5), 1493-1507.
  • Sivarajah, U., Kamal, M. M., Irani, Z., & Weerakkody, V. (2017). Critical analysis of Big Data challenges and analytical methods. Journal of Business Research, 70, 263-286.
5. Key readings in Strategy
• Conceptual
  • Nag, R., Hambrick, D. C., & Chen, M. J. (2007). What is strategic management, really? Inductive derivation of a consensus definition of the field. Strategic Management Journal, 28(9), 935-955.
  • Hoskisson, R. E., Hitt, M. A., Wan, W. P., & Yiu, D. (1999). Theory and research in strategic management: Swings of a pendulum. Journal of Management, 25(3), 417-456.
  • Adcroft, A., & Willis, R. (2008). A snapshot of strategy research 2002-2006. Journal of Management History, 14(4), 313-333.
• Bibliometric articles (General)
  • Ramos-Rodríguez, A. R., & Ruíz-Navarro, J. (2004). Changes in the intellectual structure of strategic management research: A bibliometric study of the Strategic Management Journal, 1980–2000. Strategic Management Journal, 25(10), 981-1004.
  • Nerur, S. P., Rasheed, A. A., & Natarajan, V. (2008). The intellectual structure of the strategic management field: An author co-citation analysis. Strategic Management Journal, 29(3), 319-336.
  • Furrer, O., Thomas, H., & Goussevskaia, A. (2008). The structure and evolution of the strategic management field: A content analysis of 26 years of strategic management research. International Journal of Management Reviews, 10(1), 1-23.
  • Phelan, S. E., Ferreira, M., & Salvador, R. (2002). The first twenty years of the Strategic Management Journal. Strategic Management Journal, 23(12), 1161-1168.
  • Ronda-Pupo, G. A., & Guerras-Martin, L. Á. (2012). Dynamics of the evolution of the strategy concept 1962–2008: A co-word analysis. Strategic Management Journal, 33(2), 162-188.
  • Maia, J. L., Serio, L. C., & Alves Filho, A. G. (2015). Almost two decades after: A bibliometric effort to map research on strategy as practice using two data sources. European Journal of Economics, Finance and Administrative Sciences, 73, 7-31.
  • Acedo, F. J., Barroso, C., & Galan, J. L. (2006). The resource-based theory: Dissemination and main trends. Strategic Management Journal, 27(7), 621-636.
  • Vogel, R., & Güttel, W. H. (2013). The dynamic capability view in strategic management: A bibliometric review. International Journal of Management Reviews, 15(4), 426-446.
  • Di Stefano, G., Peteraf, M., & Verona, G. (2010). Dynamic capabilities deconstructed: A bibliographic investigation into the origins, development, and future directions of the research domain. Industrial and Corporate Change, 19(4), 1187-1204.
  • Dagnino, G. B., Levanti, G., Minà, A., & Picone, P. M. (2015). Interorganizational network and innovation: A bibliometric study and proposed research agenda. Journal of Business & Industrial Marketing, 30(3/4), 354-377.
7. Context
• Topic Relevance
The number of academic publications is increasing at a
rapid pace and it is becoming increasingly unfeasible to
remain current with everything that is being published.
Moreover, the emphasis on empirical contributions has
resulted in voluminous and fragmented research
streams. This hampers the ability to accumulate
knowledge and actively collect evidence through a set
of previous research papers. Therefore, literature
reviews are increasingly assuming a crucial role in
synthesizing past research findings to effectively use
the existing knowledge base, advance a line of
research, and provide evidence-based insight into the
practice of exercising and sustaining professional
judgment and expertise.
• Bibliometrics
Scholars use different qualitative and quantitative
literature reviewing approaches to understand and
organize earlier findings. Among these, bibliometrics
has the potential to introduce a systematic,
transparent, and reproducible review process based on
the statistical measurement of science, scientists, or
scientific activity. Because it rests on quantitative
indicators rather than on the reviewer's judgment
alone, bibliometrics can provide more objective and
reliable analyses than other techniques. The
overwhelming volume of new information, conceptual
developments, and data is the milieu in which
bibliometrics becomes useful: it provides a structured
analysis of a large body of information, infers trends
over time and themes researched, identifies shifts in
the boundaries of disciplines, detects the most prolific
scholars and institutions, and presents the "big
picture" of extant research.
Bibliometrics for:
• Research evaluation
• Science mapping
Altmetrics
8. Bibliometrix
• Complexity of bibliometric analysis
Although the use of bibliometrics has over time been
extended to all disciplines, bibliometric analysis is
complex because it entails several steps that employ
numerous and diverse analyses and mapping software
tools, which are frequently available only under
commercial licenses.
These difficulties are compounded by the reality that
few researchers and practitioners are trained in how to
review literature and to identify evidence-based
practices.
The cumbersome nature of the process reduces the
possibilities and the potential of bibliometrics,
especially for scholars who have no general
programming skills.
Recently, automated workflows to assemble specialized
software into a comprehensive and organized data flow
have begun to emerge for bibliometrics. They are
particularly well suited to multi-step analyses using
different types of software tools.
• Bibliometrix: one tool for the whole bibliometric
workflow
In the seminar we propose and use a unique tool,
developed in the R language, which follows a classic
logical bibliometric workflow that we reconstruct.
We have designed and produced an R-tool for
comprehensive bibliometric analyses. R is a language
and environment for statistical computing and
graphics. It provides a wide variety of statistical and
graphical techniques and is highly extensible. In
addition to enabling statistical operations, it is an
object-oriented and functional programming
language; hence, you can automate your analyses and
create new functions. It is open-source software,
which means it is well supported by the user
community and new functions are regularly
contributed by users, many of whom are prominent
statisticians.
As it is programmed in R, the proposed tool is flexible,
can be rapidly upgraded, and can be integrated with
other statistical R-packages. It is therefore useful in a
constantly changing field such as bibliometrics.
13. Recommended workflow for science mapping
Study design → Data collection → Data analysis → Data visualization → Interpretation
• Data retrieval (Database)
• Data loading and converting
• Data cleaning
• Network extraction
• Data normalization
• Data reduction
• Software tools for science mapping
• R-packages for bibliometric analysis
14. The scientific document is the basic unit of a complex relational system
• Co-citations
• Word co-occurrences
• Collaborations
17. Data collection: Main steps
• Data retrieval
• Data downloading
• Data importing and converting
Bibliographic data frame columns: Doc, Authors, Title, Abstract, Source, Keywords, Affiliation, …
19. Data collection
PRISMA diagram
• Keywords for query (Boolean operators)
• Timespan & timeslices
• Language (English)
• Types of documents (articles, …)
• Subject Categories (Mgmt, Fin, Ops, …)
• Sources (ABS, 2015; one-journal or …)
20. Data Analysis
• Coupling: two works (A & B) refer to a common work (a)
• Co-citation: two works (a & b) are cited together by a common work (A)
• Intellectual structure
• Conceptual structure (research front)
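The two relations on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not bibliometrix code: the toy corpus `citing` and the helper names `coupling_strength` and `cocitation_count` are hypothetical and were introduced here.

```python
from itertools import combinations

# Hypothetical toy corpus: each citing document (upper case) maps to
# the set of works (lower case) found in its reference list.
citing = {
    "A": {"a", "b"},
    "B": {"a", "c"},
    "C": {"a", "b", "c"},
}


def coupling_strength(doc1, doc2, corpus):
    """Bibliographic coupling: number of references shared by two citing documents."""
    return len(corpus[doc1] & corpus[doc2])


def cocitation_count(ref1, ref2, corpus):
    """Co-citation: number of citing documents whose reference lists contain both works."""
    return sum(1 for refs in corpus.values() if ref1 in refs and ref2 in refs)


# Coupling relates the citing documents to each other.
for d1, d2 in combinations(sorted(citing), 2):
    print(f"coupling({d1}, {d2}) = {coupling_strength(d1, d2, citing)}")

# Co-citation relates the cited works to each other.
cited = sorted(set().union(*citing.values()))
for r1, r2 in combinations(cited, 2):
    print(f"co-citation({r1}, {r2}) = {cocitation_count(r1, r2, citing)}")
```

Coupling is fixed once the two documents are published, because their reference lists do not change, whereas a co-citation count can keep growing as new documents cite both works.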
21. Main functions
Software-assisted workflow steps, bibliometrix functions, and descriptions

Data loading and converting
• readFiles(): Loads a sequence of Scopus and Clarivate Analytics WoS export files into R
• convert2df(): Creates a bibliographic data frame
• retrievalByAuthorID(): Uses the Scopus API search to obtain information about the documents of a set of authors identified by Scopus ID

Descriptive bibliometric analysis
• biblioAnalysis(): Returns an object of class bibliometrix
• summary() and plot(): Summarize the main results of the bibliometric analysis
• citations(): Identifies the most cited references or authors
• localCitations(): Identifies the most cited local authors
• dominance(): Calculates the authors' dominance ranking
• Hindex(): Measures productivity and citation impact of a scholar
• lotka(): Estimates Lotka's law coefficients for scientific productivity
• keywordGrowth(): Calculates yearly cumulative occurrences of top keywords/terms
• keywordAssociation(): Associates authors' keywords to keywords plus

Document x Attribute matrix creation
• metaTagExtraction(): Extracts additional field tags that are not part of the standard WoS/Scopus coding
• termExtraction(): Extracts and stems terms from textual fields (abstract, title, author's keywords, and others) of a bibliographic data frame
• cocMatrix(): Computes a Document x Attribute matrix

Normalization
• normalizeSimilarity(): Calculates association strength, inclusion index, Jaccard's coefficient, and Salton's similarity coefficient among objects of a bibliographic network

Data Reduction
• conceptualStructure(): Creates conceptual structure map of a scientific field using MCA and Clustering

Network matrix creation
• biblioNetwork(): Calculates the most frequently used bibliographic coupling, co-citation, collaboration, and co-occurrence networks
• histNetwork(): Creates a historical co-citation network from a bibliographic data frame

Mapping
• networkPlot(): Plots a bibliographic network using internal R library or VOSviewer software
• histPlot(): Plots a historical co-citation network
• conceptualStructure(): Plots conceptual structure map of a scientific field using MCA and Clustering
27. Matrix "Document x Attribute"
• A document's attributes are connected to each other through the document itself: author(s) to journal,
keywords to publication date, etc.
• An attribute is an item of information associated with the document and stored in a field tag within
the bibliographic data frame (e.g., authors, publication source, keywords, cited references,
affiliations).
• These connections between different attributes generate binary rectangular matrices (Document x
Attribute) that, in some cases, can be represented as bipartite networks.
• Furthermore, scientific publications regularly contain references to other scientific works. This
generates a further network, namely a co-citation or coupling network.
• These networks are analyzed in order to capture meaningful properties of the underlying research
system, and in particular to determine the influence of bibliometric units such as scholars and
journals.
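As a rough illustration of the points above, the following Python sketch builds a binary Document x Attribute matrix from a handful of records and lists the edges of the corresponding bipartite network. The records, the field key "AU", and the function `doc_attribute_matrix` are hypothetical; the sketch does not reproduce how cocMatrix() is implemented.

```python
# Hypothetical bibliographic records; "AU" stands in for an author field tag.
records = [
    {"id": "Doc1", "AU": ["Smith J", "Lee K"]},
    {"id": "Doc2", "AU": ["Lee K"]},
    {"id": "Doc3", "AU": ["Smith J", "Rossi M"]},
]


def doc_attribute_matrix(recs, field):
    """Return (attribute labels, binary matrix) with one row per document."""
    attributes = sorted({value for rec in recs for value in rec[field]})
    column = {attr: j for j, attr in enumerate(attributes)}
    matrix = [[0] * len(attributes) for _ in recs]
    for i, rec in enumerate(recs):
        for value in rec[field]:
            matrix[i][column[value]] = 1  # 1 = the document has this attribute
    return attributes, matrix


attributes, matrix = doc_attribute_matrix(records, "AU")
print(attributes)
for rec, row in zip(records, matrix):
    print(rec["id"], row)

# Each non-zero cell is one edge of the bipartite network (document, attribute).
edges = [
    (rec["id"], attributes[j])
    for rec, row in zip(records, matrix)
    for j, cell in enumerate(row)
    if cell
]
print(edges)
```

The same function works for any multi-valued field (keywords, cited references, affiliations), which is why a single Document x Attribute construction can feed all the networks discussed in the following slides.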
28. Matrix Document × Reference
(cocMatrix function)
         Ref X  Ref Y  Ref Z
Doc A      1      0      1
Doc B      0      1      0
Doc C      1      0      1
Doc D      0      1      0
Doc E      0      0      1
Doc F      1      1      0
Doc G      0      0      1
[Figure: bipartite graph of matrix $A$ (Document × Reference), linking the citing documents A–G to the cited documents X, Y, Z]
30. Co-citation coupling
"Co-citation coupling" is the mirror image of "bibliographic coupling"
• Co-citation coupling is a method used to establish a subject similarity
between two documents.
• If papers A and B are both cited by paper C, they may be said to be related
to one another, even though they don't directly cite each other.
• If papers A and B are both cited by many other papers, they have a stronger
relationship. The more papers they are cited by, the stronger their
relationship is.
31. Co-citation network
• A co-citation network can be obtained using the general formulation:
$B_{cocit} = A_{CR}' \times A_{CR}$
where $A_{CR}$ is the Document × Reference matrix.
• Like matrix $B_{coup}$, matrix $B_{cocit}$ is also symmetric.
• The main diagonal of $B_{cocit}$ contains the number of times each
reference is cited in our data frame.
• In other words, the diagonal element $b_{ii}$ is the number of local
citations of reference $i$.
32. Co-citation analysis
$A_{CR}' \times A_{CR}$

$A_{CR}$ (Document × Reference):
         Ref X  Ref Y  Ref Z
Doc A      1      0      1
Doc B      0      1      0
Doc C      1      0      1
Doc D      0      1      0
Doc E      0      0      1
Doc F      1      1      0
Doc G      0      0      1

$A_{CR}'$ (Reference × Document):
         Doc A  Doc B  Doc C  Doc D  Doc E  Doc F  Doc G
Ref X      1      0      1      0      0      1      0
Ref Y      0      1      0      1      0      1      0
Ref Z      1      0      1      0      1      0      1
33. Co-citation analysis (2)
Matrix $B_{cocit}$
     X  Y  Z
X    3  1  2
Y    1  3  0
Z    2  0  4
Degree (X, Y, Z): 3 1 2
[Figure: co-citation network with nodes X, Y, Z]
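The worked example of slides 28 to 33 can be checked with plain Python lists. The sketch below is a minimal illustration rather than the package's implementation: the helpers `transpose` and `matmul` are hypothetical, and the degree is computed as the sum of each reference's off-diagonal co-citation weights, which is an assumption about how the slide's "Degree" values were obtained.

```python
# Document x Reference matrix A_CR from the slides (rows: Doc A..G, columns: Ref X, Y, Z).
A = [
    [1, 0, 1],  # Doc A
    [0, 1, 0],  # Doc B
    [1, 0, 1],  # Doc C
    [0, 1, 0],  # Doc D
    [0, 0, 1],  # Doc E
    [1, 1, 0],  # Doc F
    [0, 0, 1],  # Doc G
]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]


# Co-citation: B_cocit = A' x A (Reference x Reference).
B_cocit = matmul(transpose(A), A)
print(B_cocit)  # -> [[3, 1, 2], [1, 3, 0], [2, 0, 4]]

# Bibliographic coupling, the mirror image: B_coup = A x A' (Document x Document).
B_coup = matmul(A, transpose(A))

# Weighted degree of each reference: row sum without the diagonal element.
degree = [sum(row) - row[i] for i, row in enumerate(B_cocit)]
print(degree)  # -> [3, 1, 2]
```

The diagonal of `B_cocit` (3, 3, 4) gives the local citation counts of X, Y, and Z, while the diagonal of `B_coup` gives the number of references each document cites.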
37. Historiograph
• Historiographic analysis generates
chronological tables as well as historiographs
which highlight the most-cited works in and
outside the collection.
• It will be used to help scholars quickly identify
the most significant work on a topic and trace its
year-by-year historical development.
41. What's next?
• Shiny
• Lab of bibliometrics and data-knowledge discovery
• Bibliometrix R community
• Bibliometrix social (Follow us!)
  • https://www.facebook.com/bibliometrix/
  • https://twitter.com/search?q=%23bibliometrix&src=typd
• We are already working on new developments. They concern:
  • the extension of compatibility to other bibliographic databases such as PubMed
  • the search for grey literature
  • the improvement of reference disambiguation by string metric-based algorithms
  • the introduction of direct citation and tri-citation analysis
  • the use of hybrid methods that combine bibliometric and semantic approaches. This last
development includes term-burst detection through expectile smoothing, thematic mapping and
evolution, and latent semantic analysis