Variations in citation practices across the scientific landscape: Analysis based on a large full-text corpus
1. Variations in citation practices across the scientific landscape: Analysis based on a large full-text corpus
Wout S. Lamers*, Nees Jan van Eck & Ludo Waltman
<*w.s.lamers, ecknjpvan, waltmanlr>@cwts.leidenuniv.nl
Centre for Science and Technology Studies (CWTS), Leiden University
2. Goals
• Publication full text as a new data source
– Extent of engagement with cited work?
– What is the role of cited works in the new narrative?
– Do scientific disciplines differ in how they utilize cited works?
• This paper: combine full text features with
maps of science to show variation across fields
– Can we map differences in scientific activities / ways in which
knowledge is produced?
– Focus on verbs & activities
9. Data collection
• Scientific fields: 868
• Citing sentences: ~58,000,000
– …to test for differences between tooth types (e.g., Grine, 2005).
– ...as Grine (2005) also noted this in humans.
– …thicker enamel compared to the lingual cusps.22,23
– …
12. Features of interest
• …to test for differences between tooth types (e.g.,
Grine, 2005).
• ...as Grine (2005) also noted this in humans.
• …thicker enamel compared to the lingual
cusps.22,23
13. Features of interest
Citation styles – author-year and others
• …to test for differences between tooth types (e.g.,
Grine, 2005).
• ...as Grine (2005) also noted this in humans.
• …thicker enamel compared to the lingual
cusps.22,23
Known disciplinary variation
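A minimal sketch of how the citation style of a citing sentence might be detected, assuming simple regular expressions are enough for sentences like the three examples. The pattern names and the function citation_style are hypothetical and are not the extraction rules used for the corpus.

```python
import re

# Hypothetical pattern: a capitalised surname, optionally followed by "et al."
# or "and/& Surname", then a four-digit year, e.g. "Grine, 2005" or "Grine (2005)".
AUTHOR_YEAR = re.compile(
    r"[A-Z][A-Za-z'\-]+(?:\s+et al\.|\s+(?:and|&)\s+[A-Z][A-Za-z'\-]+)?"
    r",?\s*\(?(?:19|20)\d{2}[a-z]?\)?"
)

# Hypothetical pattern: bracketed numbers such as "[14]" or "[37–39]", or
# superscript-style digits fused to sentence-final punctuation ("cusps.22,23").
NUMERIC = re.compile(r"\[\d+(?:\s*[,–-]\s*\d+)*\]|[a-z][.,]\d+(?:,\d+)*\s*$")


def citation_style(sentence: str) -> str:
    """Label a citing sentence as 'author-year', 'numeric' or 'unknown'."""
    if AUTHOR_YEAR.search(sentence):
        return "author-year"
    if NUMERIC.search(sentence):
        return "numeric"
    return "unknown"


examples = [
    "…to test for differences between tooth types (e.g., Grine, 2005).",
    "...as Grine (2005) also noted this in humans.",
    "…thicker enamel compared to the lingual cusps.22,23",
]
for sentence in examples:
    print(citation_style(sentence), "|", sentence)
```

Real full text has many more label formats than these two patterns cover, so a heuristic like this would need checking against a labelled sample before use.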
14. Features of interest
Direct (‘integral’) inclusion of cited author in sentence
• …to test for differences between tooth types (e.g.,
Grine, 2005).
• ...as Grine (2005) also noted this in humans.
• …thicker enamel compared to the lingual
cusps.22,23
Indicative of higher engagement
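A minimal sketch of how integral citations might be flagged, assuming the author-year style and treating a citation as integral when the surname sits outside the brackets and only the year sits inside them. The pattern INTEGRAL and the function is_integral are hypothetical; numeric styles such as "Fleszar and Hindi [2]" are deliberately not handled here.

```python
import re

# Hypothetical heuristic: surname(s) in the running text, immediately followed by
# a bracketed year, optionally with extra detail such as a page number.
INTEGRAL = re.compile(
    r"[A-Z][A-Za-z'\-]+(?:\s+et al\.|\s+(?:and|&)\s+[A-Z][A-Za-z'\-]+)?"
    r"\s+\((?:19|20)\d{2}[a-z]?(?:,[^)]*)?\)"
)


def is_integral(sentence: str) -> bool:
    """True if the cited author appears as part of the sentence itself."""
    return INTEGRAL.search(sentence) is not None


print(is_integral("...as Grine (2005) also noted this in humans."))
print(is_integral("…to test for differences between tooth types (e.g., Grine, 2005)."))
```

The first sentence names the author in the running text, so it is flagged; the second keeps the whole reference inside the parentheses, so it is not.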
15. Features of interest
Use of verbs within the sentence
• …to test for differences between tooth types (e.g.,
Grine, 2005).
• ...as Grine (2005) also noted this in humans.
• …thicker enamel compared to the lingual
cusps.22,23
Activities associated with cited literature
19. Observations
• Author-year labels favored in SSH and Life and
Earth sciences
• Author names included in sentences in SSH and
Mathematics, CS, Physics
• Limited overlap!
20. A closer look at verbs
• Verbs represent actions, activities
– Either performed by the cited authors
– Or otherwise intrinsic to the research field
• Do certain types of verbs occur more often in
certain scientific fields?
• Surprisal score: $\log_2 \dfrac{p_{\mathrm{verb},\mathrm{field}}}{p_{\mathrm{verb},\mathrm{total}}}$
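A minimal sketch of the surprisal score, assuming that p(verb, field) is the share of a verb among all verb occurrences in a field and p(verb, total) is its share among all verb occurrences in the whole corpus. The slides do not spell out these definitions, and the field names and counts below are invented toy data.

```python
import math
from collections import Counter


def surprisal_scores(verb_counts_by_field):
    """Return {field: {verb: log2(p_verb_field / p_verb_total)}}."""
    # Pool the counts of all fields to get the corpus-wide verb distribution.
    total = Counter()
    for counts in verb_counts_by_field.values():
        total.update(counts)
    n_total = sum(total.values())

    scores = {}
    for field, counts in verb_counts_by_field.items():
        n_field = sum(counts.values())
        scores[field] = {
            verb: math.log2((count / n_field) / (total[verb] / n_total))
            for verb, count in counts.items()
            if count > 0  # log2 is undefined for verbs that never occur in the field
        }
    return scores


# Toy counts (hypothetical): "suggest" makes up 40% of verbs in field_a,
# 10% in field_b, and 20% overall.
toy = {
    "field_a": {"suggest": 40, "deduce": 60},
    "field_b": {"suggest": 20, "deduce": 180},
}
scores = surprisal_scores(toy)
print(scores["field_a"]["suggest"])  # 1.0: twice as common as in the corpus overall
print(scores["field_b"]["suggest"])  # -1.0: half as common as in the corpus overall
```

A score of 0 means the verb is used in the field exactly as often as in the corpus as a whole; each step of 1 corresponds to a factor of two.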
21. A closer look at individual verbs: suggest
[Figure legend: from “Half as common in scientific field” to “Twice as common in scientific field”]
24. Suggest - advocate - propose
• Often used synonymously
– Wiek and Walter (2009) ADVOCATE a transdisciplinary approach;
Walker et al. (2001) PROPOSE an adaptive approach.
• Suggest
– Inference from data: …located between 2.6 and 2.9µm, which
SUGGEST the presence of some hydroxylated phase…
• Propose
– Call to action: Saleh et al. (2003) PROPOSE to distinguish whether a
design can respond to known or unknown environmental changes…
– Presentation: Ratanamahatana and Gunopulos (2003) PROPOSE
another filter algorithm for attribute selection.
• Advocate
– As a research object: CEOs may seek greater salaries and ADVOCATE
high levels of stock option compensation … (Brandes et al., 2003).
27. Think - deduce
• Deduce
– Strong focus on logical inference
• Using Theorem 4 and Theorem 3.1 of [14], we immediately deduce
the following:
• From this we deduce that PQ=QP(cf. [2, Remark 2.2]).
• Think
– Informal language
• We think that the growth of CNPs may suffer from the processes of
carbohydrate polymerization…
• So, I think that the desirable inspecting results can be obtained by
integrating several different … testing technologies [71].
– In the context of the research object
• Brown (2004) reported that one interviewee stated that strong
candidates are those who “think like a scientist” (p. 251).
30. Argue - discuss
• Argue
– Attempts at persuasion
• Similarly, some scholars argue that policies are symbolic, simply
maintaining the status quo (see Kelly, 2005 for a review).
• Alvesson and Karreman (2004) argue that it is important to investigate
how subjectivity is formed “empirically”.
• Discuss
– Often simply an indicator of contents
• Fleszar and Hindi [2] discuss some straightforward modifications of SS.
• Neuberger and Counsell (2002) discuss the limitations of the Impact
Factor...
– As a research object in SSH
– In the context of disagreements
• Monton & van Fraassen (2003, p. 412) discuss Rosen’s objection,
somewhat tangentally, in a footnote.
33. Demonstrate - observe
• Observe
• The same trend has been observed by Hameed [6].
• GCC185 has been observed to bind and regulate syntaxin 16 [45].
• Demonstrate
– Used for observation
• Other studies demonstrate at least ~40% of the occult cancers being
located in the (distal) fallopian tube.9,11,25,30,31,33,36
– Inference from data
• Recent data demonstrate that the mouse collagen X promoter is…
• Shapiro (2005) demonstrates a similar pattern amongst Food Stamp
recipients, who reduce caloric intake over the course of…
• Apparent disciplinary language preference
36. Prove - conclude
• Prove
– Demonstration of effectiveness
• The method has been PROVED under controlled laboratory
conditions, up to a SNR=-30dB[3].
– As conclusion
• The current genome-wide association studies PROVED this
assumption to be correct [37–39].
• Conclude
• Another comparison also CONCLUDED that γSD values calculated by
the Dorris–Gray method were more accurate compared to…
• As CONCLUDED by Watson et al. (2005), degradation rates have to
be decreased during upscaling from lab to field dimensions.
37. Findings
• Large differences in verb usage across the scientific
landscape
• Polysemy is evident, and makes it difficult to
establish types or groups of similar verbs
• Verb may indicate a research object, instead of a
research act
• Challenge: no guarantee verb is actually strongly
associated with cited work
43. A closer look at verb types:
• Discourse verbs somewhat more frequent in social
sciences, humanities, health
• Research verbs occur slightly more in physical
sciences, social sciences, health
• Cognition verbs seem strongly favoured in social
sciences, humanities, maths, CS
Overall, effects are not very pronounced.
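A minimal sketch of how verb counts might be aggregated into the three verb types for one field. The mini-lexicon VERB_TYPE is a hypothetical illustration: the slides name the types but do not list which verbs belong to which, and the counts are invented.

```python
from collections import Counter

# Hypothetical assignment of a few verbs to types, for illustration only.
VERB_TYPE = {
    "argue": "discourse", "discuss": "discourse",
    "observe": "research", "demonstrate": "research",
    "think": "cognition", "deduce": "cognition",
}


def type_shares(verb_counts):
    """Share of classified verb occurrences per verb type within one field."""
    by_type = Counter()
    for verb, count in verb_counts.items():
        verb_type = VERB_TYPE.get(verb)
        if verb_type is not None:  # verbs outside the lexicon are ignored
            by_type[verb_type] += count
    classified = sum(by_type.values())
    if classified == 0:
        return {}
    return {verb_type: n / classified for verb_type, n in by_type.items()}


# Toy counts for one hypothetical field; "inhibit" is not in the lexicon.
print(type_shares({"argue": 30, "observe": 50, "think": 20, "inhibit": 99}))
```

Comparing these shares across fields gives the kind of contrast summarised in the bullets above; the outcome depends heavily on which verbs the lexicon covers.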
44. Experiment: LDA over all verbs and research fields
• Raw verb counts, scientific fields as ‘documents’
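A minimal sketch of the setup described above: each scientific field is treated as one 'document' made of raw verb counts, and LDA is fitted with a small collapsed Gibbs sampler. This is an illustration written in plain Python, not the implementation used for the experiment; the function lda_gibbs, its hyperparameters (alpha, beta, n_iter) and the toy counts are all assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of {verb: count} dicts, one per scientific field.
    Returns (topic_verb_counts, field_topic_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({verb for doc in docs for verb in doc})
    index = {verb: i for i, verb in enumerate(vocab)}
    # Expand the raw counts into one token per verb occurrence.
    tokens = [[index[v] for v, c in doc.items() for _ in range(c)] for doc in docs]

    n_dk = [[0] * n_topics for _ in docs]            # field x topic counts
    n_kw = [[0] * len(vocab) for _ in range(n_topics)]  # topic x verb counts
    n_k = [0] * n_topics                             # tokens per topic

    def bump(d, k, w, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(tokens):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            bump(d, k, w, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                bump(d, z[d][i], w, -1)  # remove the token's current assignment
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                    / (n_k[k] + len(vocab) * beta)
                    for k in range(n_topics)
                ]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                bump(d, z[d][i], w, +1)
    return n_kw, n_dk, vocab


# Toy input (hypothetical counts): two fields as 'documents'.
fields = [
    {"increase": 5, "reduce": 5, "inhibit": 5},
    {"find": 5, "examine": 5, "argue": 5},
]
topic_verbs, field_topics, vocab = lda_gibbs(fields, n_topics=2, n_iter=50)
for k, row in enumerate(topic_verbs):
    top = sorted(vocab, key=lambda v: -row[vocab.index(v)])[:3]
    print("topic", k, top)
```

With 868 fields and tens of millions of citing sentences, an established topic-modelling library would be the practical choice; the sampler here only shows the mechanics.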
50. nutrition, biochemistry and neurology
increase reduce inhibit activate alter
attenuate ameliorate exert elevate evoke
51. social sciences and humanities
find examine reflect focus face argue
perceive engage learn argue pay
52. Challenges
• Topics seem strongly associated with distinct
regions of the scientific landscape
• But, dominated by field-specific verbs – let, dope,
pay, inhibit, ameliorate, secrete, soil, …
• Solution? Restrict analysis to only reporting verbs
or only ‘generic activities’?
– What is and what is not a reporting verb?
• follow, generate, process, correspond, compute, ascertain, uncover,
invoke, judge, standardize, defend, appreciate, profile, …
– How do we restrict ourselves to ‘generic activities’ when that is
explicitly not what we aim to find?