Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
1. Wikipedia as an engine for scientific communication and collaboration at massive scale
Andrew Su, Ph.D.
@andrewsu
asu@scripps.edu
http://sulab.org
ScienceWriters2012
October 27, 2012
2. The biomedical literature is growing rapidly
[Figure: number of PubMed-indexed articles per year, 1979 to 2009; y-axis from 0 to 1,000,000]
3. The biomedical literature is growing rapidly
[Figure: number of articles read by a typical scientist (average human capacity), 1979 to 2009; y-axis from 0 to 20]
7. 10k gene “stubs” within Wikipedia ≈ “Gene Wiki”
Protein structure
Gene summary
Symbols and identifiers
Gene Ontology annotations
Protein interactions
Tissue expression pattern
Linked references
Links to structured databases
Huss, PLoS Biol, 2008
8. Gene Wiki has a critical mass of readers
Rank 1-10 (laypeople): Insulin, Titin, Human chorionic gonadotropin, Vasopressin, ANKH, CLOCK, Catalase, Erythropoietin, Glucagon, Parathyroid hormone
Rank 101-110 (scientists): Tau protein, Interleukin 10, APC, C-Met, Factor V, Interleukin 8, CD44, Histamine H1 receptor, Kappa Opioid receptor, Dihydrofolate reductase
Rank 1001-1010 (specialists): CSDA, CNTNAP2, IGSF8, Adenosine A3 receptor, RYR1, ETV6, Small heterodimer partner, 5-HT1D receptor, TRPC6, Interleukin-6 receptor
Total: 4.0 million views / month
Huss, PLoS Biol, 2008; Huss, NAR, 2010; Good, NAR, 2011
10. Gene Wiki has a critical mass of editors
[Figure: editor count and edit count]
Increase of ~10,000 words / month from >1,000 edits
Currently 1.42 million words
Approximately equal to 230 full-length articles
Huss, NAR, 2010; Good, NAR, 2011
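As a rough check on the word-count comparison, the sketch below works out the article length implied by equating 1.42 million words with 230 full-length articles, and how many article-equivalents the stated growth of ~10,000 words / month adds per year. The figures come from the slide; the variable names are illustrative only.

```python
# Figures taken from the slide; variable names are illustrative only.
total_words = 1_420_000          # current Gene Wiki size in words
equivalent_articles = 230        # "full-length articles" per the slide
words_added_per_month = 10_000   # approximate monthly growth

# Implied length of one "full-length article".
words_per_article = total_words / equivalent_articles

# Article-equivalents added per year at the stated growth rate.
articles_per_year = words_added_per_month * 12 / words_per_article

print(f"Implied words per article: {words_per_article:.0f}")
print(f"Article-equivalents added per year: {articles_per_year:.1f}")
```

Under these numbers a "full-length article" is a little over 6,000 words, so the stated growth corresponds to roughly 19 to 20 such articles per year.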
11. A review article for every gene is powerful
Reelin: 98 editors, 703 edits since July 2002
Heparin: 358 editors, 654 edits since June 2003
AMPK: 109 editors, 203 edits since March 2004
RNAi: 394 editors, 994 edits since October 2002
Hyperlinks to related concepts
References to the literature
12. The Gene Wiki is timely and current
Manny Ramirez suspended for doping
Catalase linked to premature gray hair
Also, MGAT2 (obesity), ALDH2 (heart attack), SOX21 (hair loss), SATB1 (breast cancer), TSLP (asthma), CCR5 (HIV), …
Huss, NAR, 2010
13. The Gene Wiki is (reasonably) reliable
              Per-edit       Average     Probability
              probability    lifetime    by time
Good edits    98.9%          115.4 d     99.968%
Vandalism     1.1%           3.4 d       0.032% (0.63% for WP overall)
[Figure: cumulative edits by date]
Good, NAR, 2011
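One plausible reading of the "probability by time" column is that each edit type is weighted by how long it typically survives. That derivation is an assumption, not something the slide states. The sketch below applies it to the slide's per-edit probabilities and average lifetimes.

```python
# Per-edit probabilities and average lifetimes (in days) from the slide.
p_good, life_good = 0.989, 115.4
p_vandal, life_vandal = 0.011, 3.4

# Assumption: time-weighted share = per-edit probability x average lifetime,
# normalized so the two shares sum to 1.
weight_good = p_good * life_good
weight_vandal = p_vandal * life_vandal
total = weight_good + weight_vandal

good_by_time = weight_good / total
vandal_by_time = weight_vandal / total

print(f"Good edits by time: {good_by_time:.3%}")
print(f"Vandalism by time:  {vandal_by_time:.3%}")
```

Under this assumption the result lands within about 0.001 percentage points of the slide's 99.968% and 0.032%; the small gap is consistent with the inputs having been rounded.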
14. Making the Gene Wiki more reliable
Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …
The company name is derived from old Greek, and means "destroyer of birds".
Good, NAR, 2011 http://www.wikitrust.net/
15. Making the Gene Wiki more reliable
High-trust author (36211 total edits): Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …
Low-trust author (36 total edits): The company name is derived from old Greek, and means "destroyer of birds".
Good, NAR, 2011 http://www.wikitrust.net/
19. Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project
Group members
Ben Good
Max Nanis
Salvatore Loguercio
Chunlei Wu
Ian Macleod
http://slideshare.com/andrewsu
Contact
http://sulab.org
asu@scripps.edu
@andrewsu
+Andrew Su
Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820)
Editor's notes
Next-gen sequencing identifies candidate genes. Also microarray data, proteomics, GWAS, methylation, post-translational modifications, translocation detection, etc. What do these genes do?
Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization.