text mining, machine learning,
NLP and all that (in 10 minutes)
Byron C Wallace
Brown Center for Evidence Based Medicine
#CochraneTech
why do we need this stuff?
[Bastian et al, PLoS Medicine 2010]
PubMed growth
[http://altmetrics.org/wp-content/uploads/2010/10/medline-articles-by-year-lg.png]
1 formulate question, protocol & query
2 search database (PubMed)
3 screen retrieved citations
4 extract data (2x2 table of treatment by outcome, cells a to d)
5 synthesize extracted data
[Figure: forest plot of 42 studies, AIMS 1988 through White 1987; overall I^2 = 19%, P = 0.147; x-axis: odds ratio (log scale)]
what can we automate?
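[Figure: learning loop linking the learner, unlabeled data U, expert, labeled data L, and predictive model]
unlabeled data U: abstracts from PubMed search
expert: doctor conducting review
labeled data L: manually screened abstracts
predictive model: SVM
how does this work?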
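One way to read the loop between learner, expert, U, and L is sketched below. This is only an illustration under stated assumptions: a simple word-count scorer stands in for the SVM, the query rule (ask about the abstract whose score is closest to zero) is one common choice rather than necessarily the one used in the talk, and the abstracts and `simulated_expert` are made up.

```python
from collections import Counter

def words(text):
    """Lowercased set of whitespace-separated tokens."""
    return set(text.lower().split())

def train(labeled):
    """Stand-in learner (NOT an SVM): +1 per relevant doc containing a word, -1 per irrelevant doc."""
    weights = Counter()
    for text, relevant in labeled:
        for word in words(text):
            weights[word] += 1 if relevant else -1
    return weights

def score(weights, text):
    """Positive score leans relevant, negative leans irrelevant, near zero is uncertain."""
    return sum(weights[word] for word in words(text))

def active_learning(seed, pool, ask_expert, budget):
    """Repeatedly retrain on L, pick the most uncertain item in U, and ask the expert to label it."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(budget, len(pool))):
        weights = train(labeled)
        query = min(pool, key=lambda text: abs(score(weights, text)))
        pool.remove(query)
        labeled.append((query, ask_expert(query)))
    return train(labeled), labeled, pool

def simulated_expert(text):
    """Hypothetical reviewer: only randomized studies are relevant."""
    return "randomized" in text

# Labeled data L (manually screened) and unlabeled data U (search results), all invented.
seed = [("randomized trial of aspirin", True), ("case report of rare rash", False)]
pool = ["randomized trial of statins",
        "editorial on hospital funding",
        "cohort study of aspirin use"]

model, labeled, remaining = active_learning(seed, pool, simulated_expert, budget=2)
print([text for text, _ in labeled[2:]])   # abstracts the expert was asked about
print(score(model, remaining[0]))          # score of the abstract still unscreened
```

The expert is only consulted on the abstracts the current model is least sure about, which is the sense in which the loop tries to spend scarce reviewer time where it helps most.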
SVMs
[Figure: two classes of points (o and x) separated by a linear boundary; labeled parts: support vectors, margin]
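The two labels in the figure can be made concrete with a small calculation. The sketch below does not train an SVM; it takes hand-picked toy points and two candidate separating lines (all of them assumptions for illustration) and computes the quantities an SVM cares about: the margin, meaning the distance from the boundary to the closest point, and the support vectors, meaning the points that sit at that distance.

```python
import math

def signed_margin(point, label, w, b):
    """Geometric margin of one point: label * (w . x + b) / ||w||."""
    activation = sum(wi * xi for wi, xi in zip(w, point)) + b
    return label * activation / math.hypot(*w)

def margin_and_support_vectors(data, w, b):
    """Return the smallest margin over the data and the points that attain it."""
    margins = [(signed_margin(x, y, w, b), x) for x, y in data]
    smallest = min(m for m, _ in margins)
    support = [x for m, x in margins if math.isclose(m, smallest)]
    return smallest, support

# Toy data: label -1 plays the role of "o", label +1 the role of "x".
data = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
        ((3, 3), +1), ((4, 3), +1), ((3, 4), +1)]

# Two lines that both separate the classes perfectly.
diagonal = margin_and_support_vectors(data, w=(1, 1), b=-4.5)   # x1 + x2 = 4.5
vertical = margin_and_support_vectors(data, w=(1, 0), b=-2.5)   # x1 = 2.5

print(diagonal)
print(vertical)
```

Both lines classify every point correctly, but the diagonal one leaves roughly twice as much room on either side. An SVM prefers the wider-margin boundary, and only the support vectors determine where it ends up.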
bag of words
"I am a Nigerian prince writing to you about an inheritance..."

word          present?
...           ...
dinner        0
about         1
prince        1
call          0
...           ...
work          0
nigerian      1
yesterday     0
office        0
inheritance   1
...           ...

Figure 1.4: The (binary) Bag-of-Words (BoW) representation.
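The figure's encoding can be reproduced in a few lines. This is a minimal sketch: the vocabulary is just the nine words visible in the figure (the real vocabulary is elided with "..."), and the tokenizer, which lowercases and keeps runs of letters, is an assumption.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into runs of letters (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def binary_bow(text, vocabulary):
    """One entry per vocabulary word: 1 if the word occurs in the text, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]

# Only the words shown in the figure; the full vocabulary is not given.
vocabulary = ["dinner", "about", "prince", "call", "work",
              "nigerian", "yesterday", "office", "inheritance"]
email = "I am a Nigerian prince writing to you about an inheritance..."

vector = binary_bow(email, vocabulary)
print(vector)  # [0, 1, 1, 0, 0, 1, 0, 0, 1]
```

Word order and word counts are discarded; the classifier only sees which vocabulary words are present.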
special considerations for the case of systematic reviews
• class imbalance: far fewer relevant than irrelevant abstracts
  – asymmetric costs: sensitivity is more important than specificity
• reviewer time is scarce and expensive
  – better models, fewer labels: active learning and dual supervision
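The first bullet can be illustrated with a small worked example. The sketch below uses invented classifier scores for ten abstracts, two of them relevant, and compares two decision thresholds; the numbers and the threshold values are assumptions chosen only to show the trade-off.

```python
def predict(scores, threshold):
    """Flag an abstract as relevant when its score reaches the threshold."""
    return [s >= threshold for s in scores]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: 2 relevant abstracts among 10 (class imbalance).
y_true = [True, True, False, False, False, False, False, False, False, False]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

strict = sensitivity_specificity(y_true, predict(scores, 0.5))
lenient = sensitivity_specificity(y_true, predict(scores, 0.25))

print(strict)   # (0.5, 0.875)
print(lenient)  # (1.0, 0.75)
```

With the stricter threshold, half of the relevant abstracts are missed even though specificity looks good. Lowering the threshold costs some extra screening (specificity drops) but recovers every relevant abstract, which is the trade a systematic review usually wants.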
how do we do?
“Towards Modernizing the Systematic Review Pipeline: Efficient Updating via Data Mining”
Genetics in Medicine 2012
beyond citation screening
Questions?
byron_wallace@brown.edu
http://www.cebm.brown.edu/software
www.cebm.brown.edu/byron
