Fletcher Series. 2016 Aug 26;1(1-10)
Abstracts Matter. But...
How much so?
Rascon CA (1)
(1) cynthia.alexander@gmail.com, San Francisco, CA 94105, USA.
Abstract
The number of times a scientific paper is cited (its citation count) has emerged as a proxy for a paper’s
success within its field. Here, I aim to address how relevant an abstract is to a scientific publication,
and furthermore which features of such abstracts have the largest impact on a paper’s success (as
estimated by citation count).
The data set comprised all abstracts of scientific papers from 22 top biotech journals published in
the period 1995-2016, a total of 310,175 papers. Journal names and the affiliations of laboratory
heads were not incorporated in this model, which aimed to be based solely on the abstract’s
title and content. Data cleaning and feature engineering, relying largely on NLP metrics (LSA, Tf-idf,
POS tagger), gave good insight into what best predicts citation count across the
Biotech papers have a steady trending curve
Figure 1. Number of citations per paper by year of publication. The corpus after
cleaning comprises 202,173 abstracts. Each cyan dot represents a single paper
(transparency 0.3).
A journal’s prestige depends on its impact factor
Figure 2. Journals used for the data set and the number of citations per paper
published between 1995 and 2010, shown as a violin plot. These differences reflect, to some
extent, each journal’s impact factor (the yearly average number of citations).
Figure 3. Final set of 134,374 papers (1995-2010). The
total number of citations per paper (target, y) was
binned into two classes: fewer than 10, or 10 or more, total citations
since the paper’s publication date (0 or 1, respectively).
Left side: example of an abstract and its citation count.
Abstracts binned in two classes:
0 for 1-9 (25%), or 1 for 10 or more (75%) total citations
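The binning described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `bin_citations`, the `threshold` parameter, and the sample citation counts are all hypothetical.

```python
from collections import Counter


def bin_citations(total_citations, threshold=10):
    """Return 1 for papers with at least `threshold` citations, else 0."""
    return 1 if total_citations >= threshold else 0


# Hypothetical citation counts, for illustration only (not from the data set).
citation_counts = [1, 4, 9, 10, 57, 230]
labels = [bin_citations(c) for c in citation_counts]

# Share of each class, useful for checking how imbalanced the target is.
class_share = {label: n / len(labels) for label, n in Counter(labels).items()}
print(labels)
print(class_share)
```

With a 25% / 75% split as reported on the slide, accuracy alone would be a weak yardstick, which is one reason to look at ROC and precision/recall curves instead.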
LSA, Tf-idf, and POS tagging
selected as star features, with Random
Forests as the model of choice
Figure 4. ROC and Precision/Recall curves for the top performing models.
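For readers unfamiliar with the two curves, the sketch below shows how the area under the ROC curve and a single precision/recall point can be computed from classifier scores. It is a plain-Python illustration under stated assumptions: the function names, the 0.5 threshold, and the toy labels and scores are hypothetical and unrelated to the models in the figure.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(y_true, scores, threshold=0.5):
    """Precision and recall for the positive class at one score threshold."""
    predicted = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))           # 0.75
print(precision_recall(y_true, scores))  # (1.0, 0.5)
```

Sweeping `threshold` over the observed scores would trace out the full precision/recall curve one point at a time.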
Model trained on the previous 5 years (2005-2009)
to predict the ‘success’ of 2010 papers:
Figure 5. ROC and Precision/Recall curves for the top performing models. This time
modeling on 2005-2009 papers to predict the ‘success’ of 2010 papers.
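A time-based split of this kind keeps every test paper strictly later than the training papers, so the evaluation mimics forecasting. The sketch below assumes each paper is a dict with a `year` key; that record layout, the function name, and the sample records are hypothetical.

```python
def temporal_split(papers, train_years, test_year):
    """Split papers by publication year instead of at random."""
    train = [p for p in papers if p["year"] in train_years]
    test = [p for p in papers if p["year"] == test_year]
    return train, test


# Hypothetical records, for illustration only.
papers = [
    {"year": 2004, "label": 1},
    {"year": 2005, "label": 0},
    {"year": 2009, "label": 1},
    {"year": 2010, "label": 1},
    {"year": 2011, "label": 0},
]

# range(2005, 2010) covers 2005 through 2009 inclusive.
train, test = temporal_split(papers, range(2005, 2010), 2010)
print([p["year"] for p in train])
print([p["year"] for p in test])
```

Any vocabulary, Tf-idf statistics, or LSA components would then need to be fitted on `train` only and applied unchanged to `test`, otherwise information from 2010 leaks into the model.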
Features identified as important by RF for
predicting the success of the coming years’ papers:
Figure 6. Feature importances as ranked by Random Forests, for a model trained on 2005-2009 and
tested on 2010 papers. *Abstract LSA (100 comp.), **Abstract LSA on Tf-idf (100 comp.), ***Title LSA
Feature labels as listed in the figure (asterisks as defined in the caption):
C2 **, C2 *, C4 *, C7 **, C4 **, POS tag ‘:’, C8 **, C5 **, Abstract length, C3 **,
C1 *, C31 ***, C15 **, C15 *, C14 *, C16 **, C3 *, C6 *, POS tag ‘.’, C29 **
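Three of the listed features are not LSA components: abstract length and the counts of the POS tags ‘:’ and ‘.’. The sketch below approximates them without a tagger. It assumes the tags follow the Penn Treebank convention, where ‘.’ marks sentence-final punctuation and ‘:’ marks mid-sentence punctuation such as colons and semicolons, and it assumes length is measured in words; the slides confirm neither, and the function and key names are hypothetical.

```python
import re


def surface_features(abstract):
    """Rough stand-ins for abstract length and punctuation-tag counts."""
    # Split into word tokens and single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", abstract)
    return {
        # Assumption: length is counted in word tokens, not characters.
        "abstract_length": sum(1 for t in tokens if re.match(r"\w", t)),
        # Approximates the ':' tag (colons and semicolons only).
        "colon_like_count": sum(1 for t in tokens if t in {":", ";"}),
        # Approximates the '.' tag (sentence-final punctuation).
        "sentence_final_count": sum(1 for t in tokens if t in {".", "!", "?"}),
    }


example = "Results: we sequenced 12 genomes. The data are available."
print(surface_features(example))
```

This simple tokenizer also splits decimals such as 0.3 at the period, so a real POS tagger would give cleaner counts on abstracts that contain numbers.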
1st – Next Generation Sequencing
sequenc: 0.20, method: 0.17, data: 0.16, genom: 0.16, avail: 0.14
2nd – Cellular regulation / gene expression
cell: 0.71, activ: 0.19, induc: 0.08, regul: 0.08, mice: 0.07
3rd – Cellular models (methods)
cell: 0.28, use: 0.23, data: 0.19, method: 0.17, model: 0.16
4th – Applied genomics (mutants)
genom: 0.25, sequenc: 0.25, protein: 0.19, mutant: 0.12, human: 0.11
5th – Basic research (DNA related)
gene: 0.28, dna: 0.27, rna: 0.20, transcript: 0.20, genom: 0.17
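The numbers above are term weights within LSA components. As background for the "LSA on Tf-idf" features, the sketch below shows one common Tf-idf weighting applied to stemmed tokens, plus a helper that lists the heaviest terms. It does not perform the SVD step of LSA, and the exact Tf-idf variant used in the slides is not stated, so the formula, function names, and toy documents here are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Weight terms by relative frequency times log inverse document frequency.

    `documents` is a list of token lists; returns one {term: weight} dict each.
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weighted = []
    for doc in documents:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


def top_terms(weights, k=5):
    """Return the k highest-weighted (term, weight) pairs."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy stemmed documents, for illustration only.
docs = [
    ["cell", "activ", "data"],
    ["genom", "sequenc", "data"],
    ["gene", "dna", "data"],
]
weighted = tf_idf(docs)
print(top_terms(weighted[0], k=2))
```

A stem that appears in every document, such as `data` here, gets weight zero under this variant, which is why corpus-wide filler words contribute little to the resulting components.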
Abstracts matter about: 81%
Need to consider:
Are better scientists simply better communicators?
Or… are great scientists also really good at communicating?
I did not incorporate a feature to account for novelty (quite the opposite).
It is circular to say that the more papers exist in a field,
the more likely a paper in that field is to be cited in the future.
However, this suggests that trends exist in academia. *duh*
Abstracts matter about: 81%
Future directions:
Multi-class case.
Extend the prediction forecast window. 2017??
Examine those abstracts on which the model did poorly.
Flask app to ‘score’ new abstracts.
Time series: model topic trends over time. Is it too
early or too late for a paper to come out?


Editor's notes

  1. The impact factor (IF) of an academic journal is a measure reflecting the yearly average number of citations to recent articles published in that journal.
  2. It took some time to get to this curve (data cleaning).