- The document discusses associating organisms with their environments through automated mining of literature, assisted curation, and exploratory visualization approaches. It focuses on identifying environment ontology (EnvO) terms in text and annotating species descriptions in the Encyclopedia of Life (EOL) with EnvO terms.
- Methods are presented for identifying EnvO terms that describe environments and habitats in unstructured text using a named entity recognition tool. Sample output showing identified EnvO terms in text is shown.
- The tool can annotate EOL species pages by linking habitat and other text from species descriptions to the appropriate EnvO terms through an ENV-EOL annotation file. This supports querying relationships between species and their environments.
2015 06 hcmr - emodnet-eubon - associating organisms with their environments - low res
1. EMODNET/EUBON - 10th June 2015 - HCMR, Crete, Greece
Evangelos Pafilis
Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC)
Hellenic Centre for Marine Research (HCMR), Heraklion Crete, Greece
pafilis@hcmr.gr, http://epafilis.info
Associating Organisms With Their Environment:
Automated Mining, Assisted Curation and
Exploratory Visualization Approaches
http://eol.org/data_objects/31415353
Information in Free Text
Scientific web pages:
Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry were used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific.
Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 ("Project Description", Dinsdale et al., 2008)
In-house documents:
Microbial mat samples were collected from the hydrothermal vent field located in the Kolumbo submarine volcanic crater, off the coast of the island of Santorini. The bacteria and archaea community composition was evaluated further via shotgun metagenomics analysis.
Source: In-house HCMR document (Polymenakou, Oulas, et al.)
Literature (abstracts, full-text articles, legends):
Figure 1. Sampling sites on a cross-slope transect. [...] Oceanographically, the stations represent the abyssal plain (GeoB12815), the continental rise (GeoB12808, GeoB12811), the continental slope (GeoB12803, GeoB12802), the shelf break (GeoB12807) and the shelf (GeoB12806). Surface sediments were recovered by gravity and multi-coring. [...]
Source: http://onlinelibrary.wiley.com/doi/10.1111/1758-2229.12264/full (Lagostina et al., 2015)
ENVIRONMENTS
http://environments.hcmr.gr
http://environments-eol.blogspot.gr/
- Dictionary based, Open Source
- Environment Ontology
- Fast (4000 PubMed abstracts / sec) *
- Based on SPECIES name recognition tagger (Pafilis et al., PLOS ONE)
- E600 gold standard: EnvO-based corpus of EOL Species pages
- Recognition Accuracy, Mention Level:
  - F1: 82.0%
  - 87.1% of the TPs: exact id among predicted ones
*: based on a single-thread run on an Intel 2.27 GHz processor with 24 GB RAM, processing a set of 536,052 abstracts
Pafilis,E. et al. (2015) ENVIRONMENTS and EOL: identification of Environment Ontology terms in text and
the annotation of the Encyclopedia of Life. Bioinformatics, 10.1093/bioinformatics/btv045
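For readers less familiar with the mention-level F1 figure quoted above, the following minimal sketch shows how F1 is derived from true positives (TP), false positives (FP) and false negatives (FN). The counts are invented for illustration only; they are not the E600 evaluation numbers, and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from mention-level counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (NOT taken from the E600 corpus)
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```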
EnvO: source of environment descriptor names and synonyms
http://environmentontology.org
~1600 terms, June 2013
Classes shown in the hierarchy diagram: biome, environmental feature, environmental material, environmental condition, habitat, ...
Based on slides by Dr. Pier Luigi Buttigieg, AWI, Bremerhaven, Germany
ENVIRONMENTS: Improving Accuracy
- Increasing matches in text
  - Orthographic variation supported, e.g. freshwater, fresh water, and fresh-water
  - Case-insensitive matching
  - Synonym generation to reflect the way environment descriptive terms are mentioned in text (both generic and EnvO specific)
- Preventing overmatching (i.e. avoiding increased FP)
  - "Stopword list" (e.g. spring, well, range)
Action                                   Example
Add a variant in which non-informative   epipelagic zone → epipelagic
words have been removed                  estuarine biome → estuarine
Plural form addition                     sediment → sediments
Adjective form addition                  lagoon → lagoonal
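The matching strategies listed on this slide can be sketched as a small dictionary-based tagger. This is a hedged illustration, not the ENVIRONMENTS implementation: the function names, the `NON_INFORMATIVE` word list and the `EX:` identifiers are assumptions made for the example, while the three `ENVO:` identifiers are the ones quoted elsewhere in these slides. Adjective-form generation (lagoon → lagoonal) is left out because it is not easily rule-based.

```python
import re

# Hypothetical mini-dictionary. The ENVO IDs for mudflat, lake and lagoon are
# quoted in the slides; the "EX:" identifiers are placeholders, not real IDs.
DICTIONARY = {
    "ENVO:00000192": ["mudflat"],
    "ENVO:00000020": ["lake"],
    "ENVO:00000038": ["lagoon"],
    "EX:0001": ["fresh water"],
    "EX:0002": ["estuarine biome"],
    "EX:0003": ["spring"],
}
STOPWORDS = {"spring", "well", "range"}  # examples given on the slide
NON_INFORMATIVE = {"zone", "biome"}      # assumed list, for illustration


def expand(name):
    """Return the name plus generated variants (shortened form, plural)."""
    variants = {name}
    words = name.split()
    if len(words) > 1 and words[-1] in NON_INFORMATIVE:
        variants.add(" ".join(words[:-1]))  # estuarine biome -> estuarine
    if not name.endswith("s"):
        variants.add(name + "s")            # lake -> lakes
    return variants


def normalise(text):
    """Lower-case and drop spaces/hyphens so orthographic variants share a key."""
    return re.sub(r"[\s-]+", "", text.lower())


def build_tagger(dictionary):
    lookup, patterns = {}, []
    for term_id, names in dictionary.items():
        for name in names:
            if name in STOPWORDS:
                continue  # block ambiguous names to limit false positives
            for variant in expand(name):
                lookup[normalise(variant)] = term_id
                # a space in a name may appear as a space, a hyphen or nothing
                patterns.append(r"[\s-]?".join(map(re.escape, variant.split())))
    patterns.sort(key=len, reverse=True)  # try longer names first
    regex = re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)

    def tag(text):
        """Return (start, end, matched text, term id) for every mention."""
        return [(m.start(), m.end(), m.group(0), lookup[normalise(m.group(0))])
                for m in regex.finditer(text)]

    return tag


tag = build_tagger(DICTIONARY)
for hit in tag("Flamingos feed on Mudflats and in fresh-water lakes each spring."):
    print(hit)
```

In this sketch "spring" is never tagged because its dictionary name is on the stopword list, whereas "Mudflats", "fresh-water" and "lakes" are matched through case-insensitive, plural and orthographic variants.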
Scope
EnvO parts not included: species, tissues, foods
Limitations - Known Issues:
- negation not supported
- conflicts with anatomy terms (e.g. mouth, blowhole)
Parr CS, et al. The Encyclopedia of Life v2: Providing Global Access to Knowledge
About Life on Earth (2014) Biodiversity Data Journal 2: e1079
http://eol.org/
http://eol.org/info/discover_what
• Encyclopedia of Life (EOL) http://www.eol.org
• one-stop-shop for biodiversity knowledge
• Over 3 million taxa
ID: ENVO:00000192
Name: mudflat
ID: ENVO:00000020
Name: lake
http://eol.org/data_objects/31415353
Information in Free Text
Data Tab
Phoenicopterus ruber (Greater Flamingo) EOL Taxon Page Data Tab: http://eol.org/pages/913221
Parts of the screenshot truncated for illustration purposes
Integrated in Traitbank
http://eol.org/info/516
Parr CS, et al. TraitBank: Practical semantics for organism attribute data, Semantic Web Journal, under review
BioCreative V Track 5: Interactive Curation (IAT)
Dr. L. Hirschman, Dr. C. Arighi et al.
September 2015, Sevilla, Spain
Beta:
• Entity highlighting
• Suggested Term:
• sorting
• Selection
• exporting
• Integration with Metagenomics Resources
http://jensenlab.org/
http://tissues.jensenlab.org/ - Santos A et al., PeerJ in press
preprint: http://biorxiv.org/content/early/2014/11/10/010975
http://diseases.jensenlab.org/ - Pletscher-Frankild, S., et al. (2014) DISEASES: Text mining and data integration of disease-gene associations. Methods, 74, 83-89.
large scale biological questions
When and where is a species
most likely to be found, and vice versa?
Which other species are most likely
to be found nearby?
Where?
• Location
• Environment type
When?
• Life Stage
• Periods of year
• ~15,000 EnvO term tags in the EOL pages
• For 234 bird species reported in Crete, Greece (according to GBIF, NCBI Taxonomy)
• Statistics (e.g. term count per taxon)
• Data provenance (e.g. EOL Text Type)
• Taxonomy
• Interactive Visualizations
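As an illustration of the "term count per taxon" statistic, the sketch below tallies EnvO tags per taxon from tabular annotation rows. The row layout (taxon, EnvO ID, term name) and the rows themselves are assumptions made up for the example; the actual ENV-EOL annotation file format is not shown in these slides.

```python
from collections import Counter, defaultdict

# Hypothetical annotation rows: (taxon, EnvO ID, EnvO term name).
# The data are invented for illustration; only the ID/name pairs come from the slides.
annotations = [
    ("Phoenicopterus ruber", "ENVO:00000192", "mudflat"),
    ("Phoenicopterus ruber", "ENVO:00000038", "lagoon"),
    ("Phoenicopterus ruber", "ENVO:00000038", "lagoon"),
    ("Ardea cinerea", "ENVO:00000020", "lake"),
]


def term_counts_per_taxon(rows):
    """Count how often each EnvO term was tagged for each taxon."""
    counts = defaultdict(Counter)
    for taxon, term_id, _name in rows:
        counts[taxon][term_id] += 1
    return counts


counts = term_counts_per_taxon(annotations)
for taxon, per_term in counts.items():
    print(taxon, per_term.most_common())
```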
Developed by Dr. Umer Ijaz, University of Glasgow (Umer.Ijaz@glasgow.ac.uk)
http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/summarize_v0.2/summarize.html
Try it live: http://environments.hcmr.gr/cretan-birds/summarize.html
Developed by Dr. Umer Ijaz, University of Glasgow (Umer.Ijaz@glasgow.ac.uk)
http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/HEAPcloud_v0.1/
Further Inspection
Specific text sources to address specific biological questions
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Migration
http://rs.tdwg.org/ontology/voc/SPMInfoItems#Reproduction
http://rs.tdwg.org/ontology/voc/SPMInfoItems#TrophicStrategy
Hierarchy of ontological data structures
e.g. habitat only, environmental material only, environmental feature only
Taxonomy
e.g. re-calculate term counts at the family or higher-level taxon
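Re-calculating term counts at the family level amounts to summing species-level counts over a species-to-family mapping. The following is a minimal sketch under that assumption; the counts are invented, the `roll_up` function is hypothetical, and in practice the mapping would come from a taxonomy resource rather than a hand-written dictionary.

```python
from collections import Counter

# Invented species-level term counts, for illustration only
species_counts = {
    "Ardea cinerea": Counter({"lake": 3, "lagoon": 1}),
    "Egretta garzetta": Counter({"lagoon": 2}),
    "Phoenicopterus ruber": Counter({"lagoon": 4, "mudflat": 2}),
}

# Species -> family mapping (hand-written here; normally taken from a taxonomy)
family_of = {
    "Ardea cinerea": "Ardeidae",
    "Egretta garzetta": "Ardeidae",
    "Phoenicopterus ruber": "Phoenicopteridae",
}


def roll_up(counts_by_species, parent_of):
    """Sum term counts of all species that share the same parent taxon."""
    rolled = {}
    for species, counts in counts_by_species.items():
        rolled.setdefault(parent_of[species], Counter()).update(counts)
    return rolled


family_counts = roll_up(species_counts, family_of)
for family, counts in sorted(family_counts.items()):
    print(family, dict(counts))
```

The same function can be applied repeatedly (family to order, and so on) by passing a mapping for the next rank.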
Summary
• Species - Environment association
• ENVIRONMENTS:
• Dictionary-based environment descriptive term identification
• Ontological community standards, e.g. EnvO: name source
• EOL: global aggregator of biodiversity knowledge
• Semantically typed text clauses
• ENVIRONMENTS and EOL
• processing the EOL Taxon pages to extract environment descriptors
• Raw annotations (tabular format)
• Browse via the EOL Web Interface (broad audience)
• Search and Retrieve via TraitBank@EOL
• Extract: standard-compliant environment term suggestion
• Lightweight interactive browsing escort
• Process user-selected text
• Simple and generic annotation retrieval
• Large scale biological questions
• Integrative data analysis / interactive visualization
• Citizen science project support
Next Steps
Based on processing PubMed, Apr 2013
Co-mention based analysis
Visualization - incorporate in community resources
LifeWatchGreece - RvLab
More Entity types (depending on resources)
• Functional Traits (feeding type, body features, etc.)
• Localities (coordinates, geographical locations)
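A co-mention based analysis can be sketched as counting how often a species and an environment term are tagged in the same document. The slides do not describe how co-mentions are scored, so this example only counts raw co-occurrences; the document records and the `co_mentions` function are invented for illustration.

```python
from collections import Counter
from itertools import product

# Invented tagging results: one record per document (e.g. one abstract)
documents = [
    {"species": {"Phoenicopterus ruber"}, "environments": {"lagoon", "mudflat"}},
    {"species": {"Phoenicopterus ruber", "Ardea cinerea"}, "environments": {"lagoon"}},
    {"species": {"Ardea cinerea"}, "environments": {"lake"}},
]


def co_mentions(docs):
    """Count documents in which a species and an environment term co-occur."""
    pairs = Counter()
    for doc in docs:
        # every species/environment combination within one document is a co-mention
        pairs.update(product(sorted(doc["species"]), sorted(doc["environments"])))
    return pairs


pairs = co_mentions(documents)
for (species, environment), n in pairs.most_common():
    print(species, environment, n)
```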
Digging-out Information
http://hartpurylrc.files.wordpress.com
Photo by Dr Chatzinikolaou E
Acknowledgements
Amvrakikos Lagoons, May 2011
ACTION ES1103
Thank You!
HCMR-IMBG: Christos Arvanitidis, Christina Pavloudi, Katerina
Vasileiadou, Lucia Fanini, Sarah Faulwetter, Anastasis Oulas,
Alexandros Gkougkousis et al. (RvLAB)
NNF CPR: Lars Juhl Jensen, Sune Frankild, U Mass: Rob Stevenson
Uni Glasgow: Christopher Quince, Umer Ijaz
EOL: Cynthia Parr, Jennifer Hammock, Patrick Leary, Katja Schulz
MM-MPI: J. Schnetzer, E Pereira, AWI: Dr P. Buttigieg
BioCreative: Dr. L. Hirschman (MITRE, DoE Award No DE-SC0010838)
SEQenv (https://bitbucket.org/seqenv), Reflect (http://reflect.ws)
Genomes OnLine Database, Virome / Metagenomes Online
Funding: EOL Rubenstein Fellowship, LifeWatchGreece, MARBIGEN,
NNF-CPR, EOL-BHL NESCent Research Sprint 2014, "SEQenv" Hackathons (COST ES1103)
id: ENVO:00000038
name: lagoon