Open biomedical knowledge using crowdsourcing and citizen science
1. Open biomedical knowledge using crowdsourcing and citizen science
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
November 5, 2015
UCSD
Slides: slideshare.net/andrewsu
27. Wikidata for biology
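[Diagram: Wikidata statements about Reelin (Q414043), http://www.wikidata.org/wiki/Q414043]
• Reelin (Q414043) is a (Property:P31) Protein (Q8054)
• Reelin (Q414043) is a (Property:P31) Glycoprotein (Q187126)
• Reelin (Q414043) regulates (Property:P128) Neural development (Q1345738)
• Reelin (Q414043) interacts with (Property:P129) VLDL receptor (Q1979313)
• Reelin (Q414043) interacts with (Property:P129) Amyloid precursor protein (Q423510)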
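The statements on this slide can be held as (subject, property, object) triples. The sketch below is illustrative only: it assumes that the item IDs and labels pair up in the order they are listed on the slide, and that each property attaches to the objects it sits next to in the diagram. The names `LABELS`, `STATEMENTS` and `describe` are hypothetical helpers, not part of any Wikidata tooling.

```python
from collections import defaultdict

# Labels copied from the slide. ASSUMPTION: IDs and labels are paired
# in the order in which they are listed on the slide.
LABELS = {
    "Q414043": "Reelin",
    "Q8054": "Protein",
    "Q187126": "Glycoprotein",
    "Q1345738": "Neural development",
    "Q1979313": "VLDL receptor",
    "Q423510": "Amyloid precursor protein",
    "P31": "is a",
    "P128": "regulates",
    "P129": "interacts with",
}

# (subject, property, object) statements. ASSUMPTION: this is how the
# slide's diagram connects the items.
STATEMENTS = [
    ("Q414043", "P31", "Q8054"),
    ("Q414043", "P31", "Q187126"),
    ("Q414043", "P128", "Q1345738"),
    ("Q414043", "P129", "Q1979313"),
    ("Q414043", "P129", "Q423510"),
]


def describe(item_id):
    """Group the labels of an item's statement objects by property label."""
    grouped = defaultdict(list)
    for subject, prop, obj in STATEMENTS:
        if subject == item_id:
            grouped[LABELS[prop]].append(LABELS[obj])
    return dict(grouped)


if __name__ == "__main__":
    for prop, values in describe("Q414043").items():
        print(f"Reelin {prop}: {', '.join(values)}")
```

Because every statement uses stable identifiers rather than free text, the same lookup works for any item once its statements are loaded.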
32. Open biomedical knowledge
• Free text to structured data
• MyVariant.info, MyGene.info: integration of molecular biology databases via high-performance APIs
• Biomedical Linked Open Data
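As a rough illustration of what querying such an API can look like, the sketch below only builds a request URL and makes no network call. It assumes the public MyGene.info v3 query endpoint and its `q`, `fields` and `species` parameters; these should be checked against the current MyGene.info documentation. `build_gene_query_url` is a hypothetical helper name, and RELN is used as the example because it is the gene symbol for Reelin.

```python
from urllib.parse import urlencode

# ASSUMPTION: the public MyGene.info v3 query endpoint.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_gene_query_url(symbol, fields=("symbol", "name", "entrezgene"), species="human"):
    """Build a gene query URL for a gene symbol (no request is sent)."""
    params = {
        "q": f"symbol:{symbol}",      # fielded query on the gene symbol
        "fields": ",".join(fields),   # limit the fields returned
        "species": species,           # restrict results to one species
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gene_query_url("RELN"))
```

The resulting URL can be passed to any HTTP client; keeping URL construction separate from the request makes the query easy to inspect and test offline.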
33. The biomedical literature is massive…
[Chart: number of new PubMed-indexed articles per year, 1983 to 2013; vertical axis from 0 to 1,200,000]
35. … but it is very hard to query and compute
Drugs: Imatinib, Crizotinib, Erlotinib, Gefitinib, Sorafenib, Lapatinib, Dasatinib, …
AND
Diseases: Acute myeloid leukemia, Acute lymphoblastic leukemia, Chronic myelogenous leukemia, Chronic lymphocytic leukemia, Hodgkin lymphoma, Non-Hodgkin lymphoma, Myeloma, …
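To make the difficulty concrete, the sketch below combines a drug list and a disease list into one boolean keyword query. It assumes a generic search syntax with quoted phrases, `OR` within a group and `AND` between groups; `or_group` and `build_query` are hypothetical helper names. Even when such a query runs, it only returns articles that mention a drug and a disease together, not articles that state a relationship between them.

```python
def or_group(terms):
    """Join terms into a parenthesized OR group of quoted phrases."""
    quoted = [f'"{term}"' for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(drugs, diseases):
    """Require at least one drug term AND at least one disease term."""
    return f"{or_group(drugs)} AND {or_group(diseases)}"


if __name__ == "__main__":
    drugs = ["imatinib", "dasatinib", "sorafenib"]
    diseases = ["acute myeloid leukemia", "myeloma"]
    print(build_query(drugs, diseases))
    # Number of distinct drug/disease pairs the single query stands in for.
    print(len(drugs) * len(diseases))
```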
36. The Network of BioThings
1. Identify biomedical concepts in text
… We report a case of familial systemic mastocytosis with the rare KIT K509I germ line mutation. In vitro treatment with imatinib, dasatinib and PKC412 reduced cell viability of primary mast cells harboring KIT K509I mutation. Both patients with familial systemic mastocytosis had remarkable hematological and skin improvement after three months of imatinib treatment.
Leuk Res. 2014 Oct;38(10):1245-51. doi: 10.1016/j.leukres.
Concept types: GENES, DISEASES, DRUGS, VARIANTS
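A minimal sketch of concept recognition on part of this excerpt, using plain dictionary lookup. This is a stand-in technique chosen for brevity, not the method used in the talk. The `LEXICON` is a hypothetical hand-made list covering only the terms in the excerpt, and `find_concepts` is a hypothetical helper name.

```python
import re

# Hypothetical mini-lexicon covering only terms from the excerpt.
LEXICON = {
    "KIT": "GENE",
    "familial systemic mastocytosis": "DISEASE",
    "imatinib": "DRUG",
    "dasatinib": "DRUG",
    "PKC412": "DRUG",
    "K509I": "VARIANT",
}

TEXT = (
    "We report a case of familial systemic mastocytosis with the rare "
    "KIT K509I germ line mutation. In vitro treatment with imatinib, "
    "dasatinib and PKC412 reduced cell viability of primary mast cells "
    "harboring KIT K509I mutation."
)


def find_concepts(text):
    """Return (start, end, matched_text, concept_type) tuples in text order."""
    hits = []
    for term, concept_type in LEXICON.items():
        # Case-sensitive whole-word match, so "KIT" does not match "kit".
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), match.end(), match.group(), concept_type))
    return sorted(hits)


if __name__ == "__main__":
    for start, end, term, concept_type in find_concepts(TEXT):
        print(f"{start:4d}-{end:<4d} {concept_type:8s} {term}")
```

Dictionary lookup misses synonyms, abbreviations and new terms, which is one reason human annotation is attractive for this task.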
37. The Network of BioThings
1. Identify biomedical concepts in text
2. Identify relationships between concepts
[Diagram: network linking the concepts imatinib, dasatinib, PKC412, familial systemic mastocytosis, KIT and K509I, with edges labeled "mutation of", "mutation causes", "causes", "treats" and "inhibits"]
38. Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
39. Question: Can Citizen Scientists collectively perform concept recognition in biomedical texts?
45.
Paid crowdsourcing ($$$)
• F = 0.87
• 9 days
• 145 workers
• Total cost: $630.96
Citizen Science (“Help science, please”)
• F = 0.84
• 28 days
• 212 workers
• Total cost: $0
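The F values on this slide are presumably the balanced F-measure, the harmonic mean of precision and recall against a gold-standard annotation set; the slide does not define F, so that reading is an assumption. The sketch below shows the calculation on made-up annotation sets. `f_score` is a hypothetical helper, and the example spans are illustrative, not data from the study.

```python
def f_score(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)          # share of gold annotations recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative (start, end) character spans, not data from the study.
    gold = {(0, 8), (20, 31), (40, 46), (60, 66)}
    crowd = {(0, 8), (20, 31), (40, 46), (70, 75)}
    print(round(f_score(crowd, gold), 2))
```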
46. Does Citizen Science scale?
Volunteers needed = (number of annotation events per year) / (number of annotation events per year per volunteer)
= (1,000,000 articles * 10 AE / article) / ((10,275 AE * 365 days) / (212 annotators * 28 days))
= 15,828 volunteers needed
AE = Annotation events
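The estimate can be reproduced with a few lines of arithmetic. The sketch below uses only the figures on the slide; the constant and function names are hypothetical, and rounding up to a whole volunteer is an assumption about how the slide's figure was obtained.

```python
import math

# Figures taken from the slide.
ARTICLES_PER_YEAR = 1_000_000
AE_PER_ARTICLE = 10          # annotation events (AE) wanted per article
OBSERVED_AE = 10_275         # annotation events collected in the experiment
OBSERVED_ANNOTATORS = 212
OBSERVED_DAYS = 28


def volunteers_needed():
    """Yearly annotation demand divided by yearly output per volunteer."""
    demand_per_year = ARTICLES_PER_YEAR * AE_PER_ARTICLE
    # Scale the observed 28-day output of 212 annotators to one volunteer-year.
    per_volunteer_per_year = OBSERVED_AE * 365 / (OBSERVED_ANNOTATORS * OBSERVED_DAYS)
    # ASSUMPTION: round up, since a fraction of a volunteer is not possible.
    return math.ceil(demand_per_year / per_volunteer_per_year)


if __name__ == "__main__":
    print(volunteers_needed())  # prints 15828
```

The calculation assumes that each volunteer keeps contributing at the observed average rate for a full year.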
48. Annotating the relationships
This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
subject: GENE
predicate: therapeutic target
object: DISEASE
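One way to store such an annotation is as a subject-predicate-object record that keeps a pointer to its source text, with the predicate chosen by agreement among several annotators. The sketch below is a hypothetical illustration: the `Relation` class, the `majority_predicate` function, the agreement threshold and the example votes are all assumptions, not the procedure described in the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    subject: str    # e.g. a GENE
    predicate: str  # e.g. "therapeutic target"
    obj: str        # e.g. a DISEASE
    source: str     # where the statement came from, for traceability


def majority_predicate(votes, min_agreement=0.5):
    """Return the most common label if more than min_agreement of votes agree."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


if __name__ == "__main__":
    # Hypothetical votes from four annotators, for illustration only.
    votes = ["therapeutic target", "therapeutic target", "causes", "therapeutic target"]
    predicate = majority_predicate(votes)
    if predicate is not None:
        print(Relation("CDK9", predicate, "acute myeloid leukemia", "slide 48 excerpt"))
```

Returning `None` when no label clears the threshold leaves ambiguous passages unresolved rather than forcing a guess.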
49. Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
64. Why do I Mark2Cure?
• I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use. Sounds like a perfect situation for me.
• My 4 year old daughter Phoebe is living with and battling rare disease.
• I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.
• Take part in something that helps humanity.
• I Mark2Cure in memory of my son Mike who had type 1 diabetes.
• Studied biology in college and I really miss it!
• In memory of my daughter who had Cystic Fibrosis
• Give back
65. Open biomedical knowledge
• Free text to structured data
• MyVariant.info, MyGene.info: integration of molecular biology databases via high-performance APIs
• Biomedical Linked Open Data
66. Contact
http://sulab.org
asu@scripps.edu
@andrewsu
Gene Wiki / Wikidata
Ben Good
Sebastian Burgstaller
Tim Putman
Julia Turner
Ginger Tsueng
Andra Waagmeester
Elvira Mitraka, UMB
Lynn Schriml, UMB
Justin Leong, UBC
Paul Pavlidis, UBC
Join the team!
http://bit.ly/JoinSuLab
Slides: slideshare.net/andrewsu
Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
MyGene / MyVariant: HG008473
BD2K COE: GM114833
Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys
Other Group members
Jake Bruggemann
Ramya Gamini
Karthik Gangavarapu
Louis Gioia
Toby Li
Greg Stupp
MyGene / MyVariant
Chunlei Wu
Cyrus Afrasiabi
Kevin Xin
Adam Mark
Mark2Cure
Max Nanis
Ginger Tsueng
Jennifer Fouquier
Ben Good
Chunlei Wu
All Mark2Curators!