The document describes the Gene Wiki, a crowdsourced online portal for annotating human genes. It argues that the "long tail" of scientists can participate directly in gene annotation. The Gene Wiki has grown substantially, with about 1.42 million words contributed and about 4.3 million page views per month. Content from the Gene Wiki improves enrichment analysis and supports mining of novel Gene Ontology annotations. Future work aims to integrate the Gene Wiki with other databases to enable dynamic queries across genes, diseases, and SNPs. Crowdsourcing from scientists is positioned as a valuable source of information on gene function.
ISMB2012: The Gene Wiki: Crowdsourcing human gene annotation
1. The Gene Wiki: Crowdsourcing human gene annotation
Andrew Su, Ph.D.
The Scripps Research Institute
ISMB Special Session: Harnessing community intelligence for bioinformatics
#ISMB #SS7
July 17, 2012
2. The Long Tail is a prolific source of content
[Figure: content produced vs. contributors (sorted), showing a short head and a long tail]
Short head vs. long tail, by domain:
News: Newspapers vs. Blogs
Video: TV/Hollywood vs. YouTube
Product reviews: Consumer Reports vs. Amazon reviews
Food reviews: Food critics vs. Yelp
Talent judging: Olympics vs. American Idol
Gene annotation: Manual curation vs. Gene Wiki
3. We can harness the Long Tail of scientists to directly participate in the gene annotation process.
5. Wikipedia has breadth and depth
[Chart: articles and words (millions), Wikipedia vs. Britannica Online]
Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
7. Wiki success depends on a positive feedback loop
[Diagram: feedback loop linking Gene Wiki page utility, number of contributors, and number of users]
8. 10,000 gene “stubs” within Wikipedia
[Feedback-loop diagram: utility, users, contributors]
Stub contents: protein structure, gene summary, symbols and identifiers, Gene Ontology annotations, protein interactions, tissue expression pattern, linked references, links to structured databases
Huss, PLoS Biol, 2008
9. Gene Wiki has a critical mass of readers
[Feedback-loop diagram: utility, users, contributors]
Total: ~4.3 million views/month
Huss, PLoS Biol, 2008; Good, NAR, 2011
10. Gene Wiki has a critical mass of editors
[Feedback-loop diagram: utility, users, contributors]
~10,000 words added/month; 4.3 million views/month
Total: 1.42 million words ≈ 230 full-length articles
[Chart: cumulative edits, split into productive edits and vandalism; ~1,000 edits/month]
Good, NAR, 2011
11. A review article for every gene is powerful
Reelin: 98 editors, 703 edits since July 2002
Heparin: 358 editors, 654 edits since June 2003
AMPK: 109 editors, 203 edits since March 2004
RNAi: 394 editors, 994 edits since October 2002
[Callouts: hyperlinks to related concepts; references to the literature]
12. Making the Gene Wiki more computable
[Diagram: free text → structured annotations]
13. Filling the gaps in gene annotation
Good, BMC Genomics 2011, 12:603
[Diagram: NCBI Entrez Gene 3362 → Gene Wiki mapping → wikilink in the article → Annotator (GO exact synonym) → candidate assertion: GO:0004993]
14. Filling the gaps in gene annotation
Good, BMC Genomics 2011, 12:603
[Diagram: NCBI Entrez Gene 334 → Gene Wiki mapping → wikilink in the article → Annotator (GO exact match) → candidate assertion: GO:0006897]
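The mapping on slides 13 and 14 can be sketched in a few lines. This is a minimal sketch, not the Annotator used in the paper: it assumes the wikilinks have already been extracted from a gene's article, and it matches link text against ontology term names ("exact match") or exact synonyms ("exact synonym") with plain case-insensitive lookup. The `TOY:` vocabulary, the wikilink strings, the `CandidateAssertion` type, and the function names are all hypothetical; only the Entrez Gene ID 334 comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateAssertion:
    entrez_gene_id: int
    term_id: str
    matched_text: str
    match_type: str  # "exact match" (term name) or "exact synonym"


# Hypothetical toy vocabulary: these IDs, names, and synonyms are
# placeholders, not real Gene Ontology content.
TOY_ONTOLOGY = {
    "TOY:0001": {"name": "vesicle uptake", "exact_synonyms": ["vesicular uptake"]},
    "TOY:0002": {"name": "receptor signaling", "exact_synonyms": ["receptor signalling"]},
}


def build_index(ontology):
    """Map each lower-cased term name or exact synonym to (term ID, match type)."""
    index = {}
    # First pass: term names ("exact match").
    for term_id, term in ontology.items():
        index[term["name"].lower()] = (term_id, "exact match")
    # Second pass: exact synonyms, which never overwrite a term name.
    for term_id, term in ontology.items():
        for synonym in term["exact_synonyms"]:
            index.setdefault(synonym.lower(), (term_id, "exact synonym"))
    return index


def mine_candidates(entrez_gene_id, wikilinks, index):
    """Turn the wikilinks of one gene's article into candidate assertions."""
    candidates = []
    for link in wikilinks:
        hit = index.get(link.strip().lower())
        if hit is not None:
            term_id, match_type = hit
            candidates.append(
                CandidateAssertion(entrez_gene_id, term_id, link, match_type)
            )
    return candidates


index = build_index(TOY_ONTOLOGY)
found = mine_candidates(334, ["Vesicular uptake", "Cell membrane", "receptor signaling"], index)
for c in found:
    print(c.entrez_gene_id, c.term_id, c.match_type)
# prints:
# 334 TOY:0001 exact synonym
# 334 TOY:0002 exact match
```

Exact string lookup keeps the sketch simple; it misses inflected or paraphrased links, and a matched link is still only a candidate until it is checked against curated annotations.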
15. Novel GO annotations – so what?
Good, BMC Genomics 2011, 12:603
[Venn diagram]
11,022 annotations mined from the Gene Wiki @ 48–64% specificity
~100,000 annotations from the GO Consortium
4,703 (43%) of the mined annotations match known annotations
6,319 are “novel” annotations
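The counts on this slide are internally consistent: the mined set splits into the part that matches known annotations and the "novel" remainder. A quick check, using only the numbers on the slide:

```python
mined = 11_022        # annotations mined from the Gene Wiki
match_known = 4_703   # mined annotations that match known GO Consortium annotations
novel = 6_319         # mined annotations not found among the known ones

# The overlap and the novel remainder should add up to the mined total.
assert match_known + novel == mined

overlap_pct = 100 * match_known / mined
print(f"{overlap_pct:.0f}% of mined annotations match known ones")
# prints "43% of mined annotations match known ones"
```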
16. Gene Wiki content improves enrichment analysis
GO term: axon guidance (GO:0007411)
PubMed abstracts (811 articles) → concept recognition → gene list (264 genes) → enrichment analysis

                              Linked through PubMed
                              Yes        No
Annotated to GO:0007411  Yes   13          2
                         No   251     12,033

P = 1.55E-20
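The slide does not name its test, so the sketch below assumes a one-sided Fisher's exact (hypergeometric upper-tail) test for over-representation. It reads the table as 13 genes both linked through PubMed and annotated to GO:0007411, 251 linked but not annotated, 2 annotated but not linked, and 12,033 neither; the 13 + 251 = 264 linked genes match the gene list. With these counts the sketch gives a value close to the slide's P = 1.55E-20. The function name is hypothetical.

```python
from math import comb


def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    a = linked and annotated      b = linked, not annotated
    c = not linked, annotated     d = neither
    Returns P(X >= a), where X is hypergeometric with the table's margins.
    """
    n_total = a + b + c + d
    n_annotated = a + c
    n_linked = a + b
    upper = min(n_annotated, n_linked)
    numerator = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_linked - k)
        for k in range(a, upper + 1)
    )
    # Exact integer arithmetic avoids underflow; Python's int / int
    # true division rounds the huge ratio correctly to a float.
    return numerator / comb(n_total, n_linked)


# Axon guidance counts from the slide.
p = fisher_upper_tail(a=13, b=251, c=2, d=12_033)
print(f"P = {p:.2e}")
```

If SciPy is available, `scipy.stats.fisher_exact([[13, 251], [2, 12033]], alternative="greater")` computes the same one-sided p-value.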
17. Gene Wiki content improves enrichment analysis
GO term: muscle contraction (GO:0006936)
PubMed abstracts (251 articles) + Gene Wiki (87 articles) → concept recognition → gene list (87 genes) → enrichment analysis
Genes linked through PubMed only: P = 1.0
Genes linked through PubMed + Gene Wiki: P = 1.22E-09
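18. Gene Wiki content improves enrichment analysis
[Scatter plot: p-value (PubMed only) on the x-axis vs. p-value (PubMed + GW) on the y-axis, one point per GO term; regions labeled “More significant with PubMed + GW” and “More significant with PubMed only”; muscle contraction highlighted]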
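Each point on the scatter plot is one GO term, and the side of the diagonal it falls on shows which evidence source gave the smaller enrichment p-value. A minimal sketch of that bookkeeping follows; only the muscle contraction values come from slide 17, while the other two terms and their p-values are hypothetical, and the function name is illustrative.

```python
def tally_regions(pvalues):
    """Count GO terms on each side of the scatter plot's diagonal.

    pvalues maps a GO term to (p_pubmed_only, p_pubmed_plus_gw).
    A smaller p-value means a more significant enrichment.
    """
    counts = {"PubMed + GW": 0, "PubMed only": 0, "tie": 0}
    for p_pubmed, p_combined in pvalues.values():
        if p_combined < p_pubmed:
            counts["PubMed + GW"] += 1   # more significant with Gene Wiki added
        elif p_pubmed < p_combined:
            counts["PubMed only"] += 1   # more significant without Gene Wiki
        else:
            counts["tie"] += 1           # point lies on the diagonal
    return counts


pvalues = {
    "muscle contraction (GO:0006936)": (1.0, 1.22e-9),  # values from slide 17
    "hypothetical term A": (1e-4, 1e-6),                 # made-up illustration
    "hypothetical term B": (0.01, 0.2),                  # made-up illustration
}
counts = tally_regions(pvalues)
print(counts)
```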
19. Gene Wiki+ for integrative queries
[Diagram: mwsync]
http://genewikiplus.org
27.
Collaborators: Doug Howe, ZFIN; John Hogenesch, U Penn; Jon Huss, GNF; Luca de Alfaro, UCSC; Angel Pizarro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project
Group members: Erik Clarke, Ian Macleod, Ben Good, Max Nanis, Salvatore Loguercio, Chunlei Wu
Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su
Funding and Support: BioGPS: GM83924; Gene Wiki: GM089820; ISMB travel support
Speaker notes
Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, and summarization.
Tried on 773 GO categories, significant in 356 cases (46%)
We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
Also want to convince you that the Long Tail of bioinformatics developers is valuable too, but first have to convince you that there is a bottleneck in tool development.