1) The document discusses using full-text data rather than just metadata to create improved term maps for visualizing topics in scientific literature.
2) It compares different approaches for creating term maps using full-text data from publications in the Journal of Informetrics, including using titles/abstracts vs full text, binary vs full counting of term co-occurrences, and mapping at the publication level vs paragraph level.
3) The results show that full-text data yields richer maps than just titles and abstracts, and that full counting is preferable to binary counting when using full text. Paragraph-level maps provide more fine-grained structure but areas may not always represent literature topics.
1. Using full-text data to create improved term maps
Nees Jan van Eck¹, Ludo Waltman¹, Min Song², and Yoo Kyung Jeong²
¹ Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
² Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
16th International Conference on Scientometrics & Informetrics
Wuhan, China, October 19, 2017
2. Introduction
• Traditionally, bibliometric analyses are based on metadata of scientific publications
• Full text of scientific publications is increasingly becoming available in structured formats
• We study different approaches for creating VOSviewer term maps using full-text data
• We perform comparisons with a traditional approach based on titles and abstracts
4. Interpretation of a term map
• Size:
  – The larger a term, the higher the frequency of occurrence of the term
• Distance:
  – In general, the smaller the distance between two terms, the higher the relatedness of the terms, as measured by co-occurrences
  – Horizontal and vertical axes have no special meaning
• Colors:
  – Colors indicate clusters of closely related terms
5. Creating a term map
1. Input English-language text corpus
2. Identify terms
3. Count co-occurrences of terms
4. Create layout and clustering
6. Counting co-occurrences of terms
• Full counting:
  – All occurrences of a term in a document are counted
• Binary counting:
  – Only the presence or absence of a term matters
  – Number of occurrences of a term is not taken into account
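The difference between the two counting methods can be illustrated with a minimal Python sketch. This is not VOSviewer's implementation: the function name `count_terms`, the toy documents, and the single-word term set are hypothetical. In particular, weighting a co-occurring pair by the product of its per-document counts under full counting is an assumption made here for illustration.

```python
from collections import Counter
from itertools import combinations


def count_terms(documents, terms, binary):
    """Return (occurrences, co_occurrences) over a list of tokenized documents."""
    occurrences = Counter()
    co_occurrences = Counter()
    for tokens in documents:
        # Count every occurrence of each known term in this document.
        in_doc = Counter(t for t in tokens if t in terms)
        if binary:
            # Binary counting: only presence or absence matters.
            in_doc = Counter(dict.fromkeys(in_doc, 1))
        occurrences.update(in_doc)
        for a, b in combinations(sorted(in_doc), 2):
            # Assumption: pair weight is the product of per-document counts.
            co_occurrences[(a, b)] += in_doc[a] * in_doc[b]
    return occurrences, co_occurrences


# Toy corpus (hypothetical): two short "documents".
docs = [
    "citation impact citation indicator citation".split(),
    "indicator impact".split(),
]
terms = {"citation", "impact", "indicator"}

full_occ, full_co = count_terms(docs, terms, binary=False)
bin_occ, bin_co = count_terms(docs, terms, binary=True)

print(full_occ["citation"], bin_occ["citation"])  # 3 1
print(full_co[("citation", "impact")], bin_co[("citation", "impact")])  # 3 1
```

In this toy corpus, a term repeated within one document gains weight only under full counting. Repetition is much more common in full text than in titles and abstracts, which is why the choice matters more for full-text data.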
7. Data
• Full text of publications in Journal of Informetrics
• 688 publications in the period 2007-2016
• Downloaded in XML format using the Elsevier ScienceDirect Article Retrieval API
              Average per pub.
Sections      6.0
Paragraphs    42.1
Sentences     191.1
13. Conclusions
• Full text vs. titles and abstracts:
  – Full text yields richer maps than titles and abstracts
  – Richer maps may be useful for interactive visualization, perhaps not for static visualization
• Full counting vs. binary counting:
  – When using full-text data, full counting is preferable to binary counting
• Paragraph level vs. publication level:
  – Paragraph-level maps have more fine-grained structure than publication-level maps
  – However, areas in paragraph-level maps do not always represent topics in the literature
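A small Python sketch can show why the unit of analysis changes the map. It is an illustration under stated assumptions, not the procedure used in the study: the helper `co_occurrences`, the toy publication, and the single-word terms are hypothetical, and binary counting is used for simplicity.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(units, terms):
    """Binary co-occurrence counts over a list of text units (token lists)."""
    counts = Counter()
    for tokens in units:
        present = sorted(set(tokens) & terms)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Toy publication (hypothetical): a list of paragraphs, each a list of tokens.
publication = [
    "citation analysis of journals".split(),
    "clustering of journals".split(),
]
terms = {"citation", "clustering", "journals"}

# Publication level: all paragraphs are merged into a single unit.
merged = [token for paragraph in publication for token in paragraph]
pub_level = co_occurrences([merged], terms)

# Paragraph level: each paragraph is its own unit.
par_level = co_occurrences(publication, terms)

print(pub_level[("citation", "clustering")])  # 1
print(par_level[("citation", "clustering")])  # 0
```

At the publication level, every pair of terms in the same publication is linked. At the paragraph level, only terms that share a paragraph are linked, so the links are sparser and more local. This is consistent with the more fine-grained structure noted above.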
14. Future research
• Use full-text data for creating other types of maps, in particular co-citation maps