2. Toxicogenomics: study whether a chemical causes
damage to genes
Text mining: teach a computer to “read”
articles and extract explicit information
Next-generation text mining: teach a
computer to find implicit information in
articles
3.
4. Drug safety is essential!
But… how to minimize animal testing?
Image source: The Independent, July 12, 2012
5. Toxicogenomics data interpretation using
knowledge from manually
curated databases
Image sources: Verhallen and Piersma, 2011, de Jong et al 2011, http://www.flickr.com/photos/jseita/3764113525/
6. Toxicogenomics data interpretation using
knowledge from manually
curated databases
Not sufficient in coverage
We hypothesize that next-generation text mining
can increase the information coverage
Image sources: Verhallen and Piersma, 2011, de Jong et al 2011, http://www.flickr.com/photos/jseita/3764113525/
7. Next-generation text mining = concept profile
matching
Figure labels: information cloud for a gene concept; shared concepts; information cloud for a chemical concept
Image source: Herman van Haagen
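The matching idea on this slide can be sketched in a few lines. This is a minimal illustration only, assuming that an information cloud is stored as a dictionary of concept weights and that the match is scored with cosine similarity over the shared concepts. The function name `match_score`, the concepts, and the weights are invented for this example and do not come from the slides.

```python
import math

def match_score(profile_a, profile_b):
    """Return (cosine similarity, shared concepts) for two concept profiles.

    Each profile is assumed to be a dict mapping concept name -> weight.
    """
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0, shared  # an empty profile matches nothing
    return dot / (norm_a * norm_b), shared

# Invented toy profiles (weights are illustrative, not real data)
gene_profile = {"neural tube": 3.0, "retinoic acid": 4.0}
chemical_profile = {"retinoic acid": 4.0, "sterol synthesis": 3.0}

score, shared = match_score(gene_profile, chemical_profile)
print(score, shared)
```

A higher score means the gene and the chemical are described by more of the same concepts, even if no single article mentions them together.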
8. Concepts come from a thesaurus and are identified
in text with concept identification software
A good
thesaurus =
the basis for
good concept
identification
Image source: Herman van Haagen
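To make the role of the thesaurus concrete, here is a minimal sketch of dictionary-based concept identification. It assumes a thesaurus is a mapping from concept identifier to synonyms and uses a simple longest-match lookup; it is not the concept identification software referred to on the slide. The identifiers (`CHEM:1` and so on), the sample sentence, and the function names are invented for illustration.

```python
import re

# Hypothetical mini-thesaurus: concept ID -> synonyms (IDs are invented)
THESAURUS = {
    "CHEM:1": ["flusilazole"],
    "CHEM:2": ["retinoic acid", "tretinoin"],
    "GENE:1": ["CYP26A1"],
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(thesaurus):
    """Map each tokenized synonym to its concept ID."""
    index = {}
    for concept_id, terms in thesaurus.items():
        for term in terms:
            index[tuple(tokenize(term))] = concept_id
    return index

def identify_concepts(text, index):
    """Scan the text left to right, preferring the longest matching term."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            concept_id = index.get(tuple(tokens[i:i + n]))
            if concept_id is not None:
                found.append((" ".join(tokens[i:i + n]), concept_id))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found

index = build_index(THESAURUS)
sentence = "The abstract mentions flusilazole, retinoic acid and CYP26A1."
found = identify_concepts(sentence, index)
print(found)
```

A synonym missing from the thesaurus is simply never found in text, which is why thesaurus quality sets the ceiling for concept identification.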
9. Research objectives:
• Investigate information coverage in public
biomedical and chemical thesauri and
databases
• Provide methods to improve the quality
and coverage
• Give recommendations for use
• Investigate added value of next-
generation text mining when interpreting
toxicogenomics data
11. A thesaurus of chemical concepts [1] and
methods [1,2,3] to prepare a thesaurus to be
used with concept identification software
http://www.biosemantics.org/casper http://www.biosemantics.org/jochem
1. Hettne et al. Bioinformatics, 2009
2. Hettne et al. Journal of Biomedical Semantics, 2010
3. Hettne et al. Journal of Cheminformatics, 2010
12. A next-generation text mining-based method
for interpreting biological data
Figure labels: biological data; statistical test; next-generation text mining
This method gives more, and more specific, results [1]
than other available tools
http://www.biosemantics.org/weightedglobaltest
1. Jelier R, Goeman JJ, Hettne KM, Schuemie MJ, den Dunnen JT, 't Hoen PA. Briefings in Bioinformatics, 2011
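One way to picture how text-mining output and a statistical test can be combined is sketched below. This is not the method cited on this slide. It swaps in a much simpler gene-label permutation test, assuming that text mining supplies a weight per gene for a concept and that the biological data is one expression change per gene. All gene names, weights, and values are invented, and `weighted_score` and `permutation_p_value` are hypothetical helper names.

```python
import random

def weighted_score(expression, weights):
    """Weighted mean of absolute expression changes for the weighted genes."""
    total = sum(weights.values())
    return sum(w * abs(expression[g]) for g, w in weights.items()) / total

def permutation_p_value(expression, weights, n_perm=2000, seed=0):
    """Compare the observed score with scores after shuffling gene labels."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = weighted_score(expression, weights)
    genes = list(expression)
    values = [expression[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(genes, values))
        if weighted_score(shuffled, weights) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)

# Invented expression changes (e.g. log ratios) for ten toy genes
expression = {
    "g1": 2.0, "g2": -1.8, "g3": 0.1, "g4": -0.2, "g5": 0.05,
    "g6": 0.3, "g7": -0.1, "g8": 0.2, "g9": 0.0, "g10": -0.15,
}
# Invented text-mining weights linking two genes to a chemical concept
weights = {"g1": 0.9, "g2": 0.7}

observed, p_value = permutation_p_value(expression, weights)
print(observed, p_value)
```

The design point is only that genes strongly associated with a concept in the literature count more in the test than weakly associated ones.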
13. Application to toxicogenomics
Hettne et al. (submitted)
http://www.biosemantics.org/index.php?page=chemicalresponse-specific-gene-sets
14. See developmental defects in stem cells instead of
in animal embryos
Figure labels: 1. Embryonic structure; 2. Posterior neuropore open: A) Control group rat embryo, B) Triazole-exposed rat embryo
Image sources: 1. Verhallen and Piersma, 2011; 2. De Jong et al., 2012
15. Toxicity class prediction (case study: Triazoles)
Chemical-gene matrix 25 times larger than that obtained from manual
curation (Comparative Toxicogenomics Database)
Image source 1: Verhallen and Piersma, 2011
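A size comparison between a text-mined and a manually curated chemical-gene matrix can be sketched as follows. This is a minimal illustration, assuming each source is available as a list of (chemical, gene) pairs. The pairs below are invented toy data and do not reproduce the figure reported on this slide, and `compare_coverage` is a hypothetical helper name.

```python
def pair_set(associations):
    """Normalize (chemical, gene) pairs so case differences do not matter."""
    return {(chem.lower(), gene.upper()) for chem, gene in associations}

def compare_coverage(text_mined, curated):
    """Summarize how many chemical-gene pairs each source provides."""
    mined = pair_set(text_mined)
    manual = pair_set(curated)
    return {
        "mined_pairs": len(mined),
        "curated_pairs": len(manual),
        "shared_pairs": len(mined & manual),
        "size_ratio": len(mined) / len(manual) if manual else float("inf"),
    }

# Invented toy associations (not real database content)
curated = [("chemA", "GENE1"), ("chemB", "GENE2")]
text_mined = [
    ("chemA", "GENE1"), ("chemA", "GENE3"), ("chemB", "GENE2"),
    ("chemB", "GENE4"), ("chemC", "GENE1"), ("ChemC", "gene5"),
]

summary = compare_coverage(text_mined, curated)
print(summary)
```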
16. Conclusions
Next-generation text mining combined with
statistical tests complements, and is
sometimes superior to, manually curated
databases in:
- Relating chemical information to gene
expression data
- Identifying toxic effects already at the
gene expression stage
- Discriminating between different classes
of chemicals
17. Future
1. Make the method easier to use
(currently being worked on)
2. Apply the method to new drugs
with unknown toxicity
Early prediction of toxicity ->
less animal testing and safer drugs