 
4th International Conference on Microbial Diversity, 2017, Bari
4th International Microbial Diversity Conference,
Bari - Nov. 2017
Text-mining and ontologies
new approaches to knowledge discovery of
microbial diversity
Claire Nédellec, Bibliome MaIAGE
 
Microbial diversity, information sources

Where do micro-organisms live?
Critical information that is collected and stored in many public databases

Huge amount of isolation site information on micro-organisms
• Data sources: organism collections, sequence databases, ...
• Documents: scientific papers, reports
7 million PubMed references on micro-organism habitats [Deléger et al., 2016]

Often available for automatic pipelines: on-line access, programming interface
But underexploited because expressed in unstructured free text

24,150 "isolated from" entries in BacDive (DSMZ)
18,000 "isolation" entries in ATCC
25,000 "isolation site" entries for bacteria & archaea in the Genomes OnLine Database (GOLD)

[Figure: number of articles about "bacteria" in PubMed]
[Figure: number of complete genome sequences at JGI]
  
 
	
  
From free text to knowledge

Isolation site, always in free text

Unified representation of habitat descriptions, a major challenge for data access and curation
⇒ Facilitate information access by reference keywords
⇒ Enable interoperability among databases
⇒ Enhance databases with published scientific knowledge

GenBank example

Species                  TaxID   Isolation site
Acetobacter lovaniensis  104100  fermented dairy products
Acetobacter lovaniensis  104100  fermented rice flour
Acetobacter lovaniensis  104100  vinegar
Acetobacter lovaniensis  104100  water kefir
⇒ fermented food

Needs
1. A classification of habitats relevant to microorganism studies
2. An information extraction method for mapping free-text entities to the classes

OntoBiotope Ontology
Alvis text-mining Suite
 
Copyright Inra
Alvis pipeline - Florilège database
	
  
Mapping various terms to a habitat classification

PubMed DOCUMENT: PMID 21549046, 21247298, 16204502, 15992268, 2116711, 2116712, 15992260, 1348242, 11530195, 23042180, 23208291, 10458115, 11456331, 21669068, 17954748, 8867607, 23433372, 26325149, 8977904, 23880504, 8227616, 16156701, 15553633, 20494189, 24715203, 21441322, 19114514, 2125110, 19254151, 22980010
TAXON: Listeria monocytogenes
HABITAT: dairy farm
HABITAT TERM: Dairy farm, dairy farm environments, dairy farms, dairy farm environmental samples, environment of dairy farms, potential dairy farm, Dairy farm environmental samples, single dairy farm, Irish dairy farms, high-prevalence dairy farm, dairy farm environment, dairy farms of different size, local dairy farm, second Northwest dairy farm, dairy cattle farms, selected dairy farms, dairy farm, Dairy farms

Term variation
10,000 habitats of Listeria monocytogenes in PubMed
Reference class
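
The term variation above can be illustrated with a small normalizer. This is a minimal sketch and not the Alvis method: it assumes a hypothetical one-entry lexicon (`REFERENCE_CLASSES`) and a crude rule that lowercases a term, strips a plural "s" from each word, and checks whether the words of a reference class label occur contiguously in the term.

```python
import re
from typing import List, Optional

# Hypothetical mini-lexicon (assumption): class label -> its singular words.
REFERENCE_CLASSES = {"dairy farm": ("dairy", "farm")}


def singularize(word: str) -> str:
    """Very crude plural stripping, good enough for this illustration only."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def tokens(term: str) -> List[str]:
    """Lowercase the term and split it into singularized alphabetic words."""
    return [singularize(w) for w in re.findall(r"[a-z]+", term.lower())]


def map_to_class(term: str) -> Optional[str]:
    """Return the reference class whose words occur contiguously in the term."""
    toks = tokens(term)
    for label, words in REFERENCE_CLASSES.items():
        n = len(words)
        for i in range(len(toks) - n + 1):
            if tuple(toks[i:i + n]) == words:
                return label
    return None


for term in ["Irish dairy farms", "environment of dairy farms", "dairy cattle farms"]:
    print(term, "->", map_to_class(term))
```

Such a rule catches variants like "Irish dairy farms" but misses "dairy cattle farms", where a word is inserted inside the class label. That gap is one reason simple string matching is not enough for this task.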
 
A classification with a hierarchical structure

Higher habitat classes needed for ecology & evolution studies
10,000 habitats of Listeria monocytogenes in PubMed

Alvis IR semantic search engine
Scientific paper extracts / Habitat classes
• Listeria monocytogenes contamination in Chinese beef processing plants.
• Listeria monocytogenes isolated from artisanal Portuguese cheese-making dairy.
• the presence of L. monocytogenes in samples collected from crab processing plant
• L. monocytogenes persisting in a cold-smoked fish processing plant.
• two L. monocytogenes cheese dairy isolates
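
The benefit of higher habitat classes can be sketched as a roll-up over a parent map. The parent links below are hypothetical and chosen only for illustration; they are not the actual OntoBiotope hierarchy. Each mention is counted for its own class and for every ancestor, so a query on a general class also retrieves the specific habitats beneath it.

```python
from collections import Counter

# Hypothetical parent links (child -> parent); NOT the real OntoBiotope tree.
PARENT = {
    "beef processing plant": "food processing plant",
    "crab processing plant": "food processing plant",
    "fish processing plant": "food processing plant",
    "cheese dairy": "dairy",
    "dairy": "food processing plant",
    "food processing plant": "artificial environment",
}


def ancestors(cls):
    """Yield the class itself, then each ancestor up to the root."""
    while cls is not None:
        yield cls
        cls = PARENT.get(cls)


def count_by_level(mentions):
    """Count each mention once for its class and once for every ancestor."""
    counts = Counter()
    for cls in mentions:
        counts.update(ancestors(cls))
    return counts


# One class label per extract, assigned by hand for this example.
mentions = [
    "beef processing plant",
    "cheese dairy",
    "crab processing plant",
    "fish processing plant",
    "cheese dairy",
]
counts = count_by_level(mentions)
print(counts["food processing plant"], counts["dairy"])
```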
 
OntoBiotope ontology
A large ontology dedicated to microorganism biotopes
2329 habitat classes, 492 synonyms, 13 levels

What structure for the habitat classification
• Microbiology research domains
• Reuse of existing habitat classifications (ATCC, GOLD, FedEx2)
• Gather habitats with similar physico-chemical properties

Ontology scope
• Extensive study of habitat terminology in text (databases and papers)
  paper mill sludge / anaerobic sludge of paper mill waste water
• Collaborations with microbiologists in focused projects (phytobiome, food microbiome)

Evaluation
• Text-mining benchmarks: Bacteria Biotope in BioNLP Shared Tasks
• Through its use in applications (e.g. food positive flora)
  	
  
 
Habitats in OntoBiotope ontology
Distributed since 2012, http://agroportal.lirmm.fr/ontologies/ONTOBIOTOPE

[Chart: the largest classes. Values as extracted: 14, 19, 21, 43, 55, 120, 281, 352, 369, 480, 801. Labels as extracted: experimental medium, aquaculture habitat, bacteria associated habitat, medical environment, agricultural habitat, habitat wrt chemico-physical property, artificial environment, living organism, natural environment habitat, part of living organism, food]

49 classes in the gastrointestinal tract subtree
35 classes in the waste subtree
51 classes in the soil subtree
  
 
	
  
[Figure: the soil subtree, 51 classes]
Contributions welcome
 
Information extraction from text and mapping to the habitat classes

Text of articles
"Bifidobacterium longum subsp. longum is found in newborn infant as a normal component of gut flora."

Information extraction

Formal representation of the information
Bacteria:  Bifidobacterium longum subsp. longum [taxid: 1679]
hosted by: newborn infant [baby]
lives_in:  gut [intestine]

Ontology (simplified view): newborn, gut, linked by lives_in
	
  
 
Information extraction and classification - Process

Example: "... virulence of aquatic pathogen Vibrio anguillarum towards sea bass larvae ..."

Artificial Intelligence methods (machine learning and natural language processing)
Implemented in several components (more than one hundred) of the Alvis text-mining pipeline.

1. Entity recognition = identification (text boundaries) and broad type assignment
2. Entity classification = assignment of an OntoBiotope class
3. Relationship prediction = links microorganism mentions to their habitats in the text

[Diagram: annotated example. Entity types: Microbial species, Habitat, Habitat. Labels: aquatic environment; marine farm fish; Dicentrarchus labrax larvae; Lives in; TaxID5560]

Ratkovic et al., BMC Bioinformatics, 2012
Nédellec et al., Handbook on Ontology, 2009
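
The three steps can be sketched end to end on the example sentence. The slide states that the real components rely on machine learning and natural language processing; the sketch below swaps in plain dictionary lookup and a naive pairing rule so that it stays short. The `LEXICON` entries are assumptions assembled from the labels shown in the diagram.

```python
import re
from itertools import product

# Toy lexicon (assumption): surface form -> (broad type, assigned class).
LEXICON = {
    "Vibrio anguillarum": ("taxon", "Vibrio anguillarum"),
    "sea bass larvae": ("habitat", "marine farm fish"),
    "aquatic": ("habitat", "aquatic environment"),
}


def recognize(text):
    """Step 1: find entity boundaries and broad types by exact lexicon lookup."""
    found = []
    for surface, (kind, _) in LEXICON.items():
        for m in re.finditer(re.escape(surface), text):
            found.append({"start": m.start(), "end": m.end(),
                          "text": surface, "kind": kind})
    return sorted(found, key=lambda e: e["start"])


def classify(entities):
    """Step 2: attach the class listed in the toy lexicon to each entity."""
    return [dict(e, cls=LEXICON[e["text"]][1]) for e in entities]


def relate(entities):
    """Step 3: naive rule, link every taxon to every habitat in the sentence."""
    taxa = [e for e in entities if e["kind"] == "taxon"]
    habitats = [e for e in entities if e["kind"] == "habitat"]
    return [(t["cls"], "lives_in", h["cls"]) for t, h in product(taxa, habitats)]


sentence = "virulence of aquatic pathogen Vibrio anguillarum towards sea bass larvae"
relations = relate(classify(recognize(sentence)))
for triple in relations:
    print(triple)
```

The pairing rule in step 3 over-generates as soon as a sentence mentions several taxa and habitats, which is why relationship prediction is treated as a separate prediction task.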
  
 
	
  
[Diagram: Bibliographic sources; Semantic resources, ontologies; Information extraction; Full-text data and metadata; Services]
http://bibliome.jouy.inra.fr/demo/ontobiotope/alvisir2/webapi/search
Ba & Bossy, LREC 2016
  
 
	
  
[Figure: extract of OntoBiotope, milk product subtree]
  
 
OntoBiotope pipeline, applied to PubMed

Text source: BioNLP-ST (data of the international competition on bacteria information extraction)

           Entity detection  Detection and classification  Relation (lives in)
Recall     65%               50%                           70%
Precision  81%               62%                           51.4%

Text source: PubMed

Documents  2.3 million
Habitats   18.5 million
Taxa       8.4 million
Relations  7.2 million

Nédellec et al., BMC Bioinformatics, 2015
Ratkovic et al., BMC Bioinformatics, 2012
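
Precision and recall are often summarized by their harmonic mean, the F1 score. The slide does not report F1, so the values computed below are derived here from the reported figures, assuming the relation scores "70" and "51,4" are percentages like the others.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) as reported on the slide for the BioNLP-ST data.
reported = {
    "entity detection": (0.65, 0.81),
    "detection and classification": (0.50, 0.62),
    "relation (lives in)": (0.70, 0.514),
}

scores = {task: round(f1(p, r), 3) for task, (r, p) in reported.items()}
for task, score in scores.items():
    print(f"{task}: F1 = {score}")
```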
  
 
From research lab to infrastructure, a European Open Science perspective

Deployment on OpenMinTeD, the European text-mining infrastructure, offers the scientific communities:
Fully open access in a unified framework
Reproducibility and flexibility
Full-text paper collection, database aggregation and standardisation

Przybyła et al., Database, 2016
 
	
  
	
  
	
  
	
  
	
  
Treemap visualization for biodiversity analytics
Semantic relational search through all PubMed references
On-line services
Data integration
  	
  
http://genome.jouy.inra.fr/Florilege/
 
On-going projects, examples of application

Food positive flora (Florilege), MD Poster S2-23
Characterization of biodiversity, phenotypes, uses and molecules produced/degraded
Food innovation (nutrient production, biopreservation)
1 million phenotypes, 1.1 million taxon-phenotype relationships

Tracing the origin (FoodMicrobiome Transfert)
Cheese ingredients and cheese processing bring unexpected strains
Text mining helps to formulate plausible hypotheses about the source

Likelihood of organism identification (metagenomics), consistency with previous results (Visa TM project)
Has this microorganism already been identified in this place?
Of the same family? In a similar place? In a similar ecosystem?

[INRA - CNIEL]
[INRA Food WG]
[INRA, AgroPortal, Inist]
 
Conclusion

Millions of microorganism habitat descriptions, exponentially increasing.
Invaluable information for fundamental research and applications
Largely underused because mostly expressed in free text

The OntoBiotope ontology and information extraction from text provide a formal representation of microorganism biotopes

Open up new research opportunities
• Not only for data curation and indexing in information systems
• Analysis in combination with experimental data for integrative and predictive biology
A prime example is metagenomics & biodiversity in OpenMinTeD
	
  
	
  
	
  
 
Acknowledgements and funding

Mouhamadou Ba, Baptiste Bohuon, Robert Bossy, Philippe Bessières, Estelle Chaix, Louise Deléger, Sandra Dérozier, Arnaud Ferré, Wiktoria Golik, Julien Jourde, Valentin Loux, Frédéric Papazian, Jean- Zorana Ratkovic, Dialekti Valsamou

MEM (Méta-omiques des Ecosystèmes Microbiens)

 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
 

Text-mining and ontologies - new approaches to knowledge discovery of microbial diversity

  • 1. 4th International Conference on Microbial Diversity, Bari, Nov. 2017. Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity. Claire Nédellec, Bibliome MaIAGE
  • 2. Microbial diversity, information sources
     Where do micro-organisms live? This critical information is collected and stored in many public databases.
     A huge amount of isolation site information is available on micro-organisms:
     - Data sources: organism collections, sequence databases, ...
       24,150 "isolated from" entries in BacDive (DSMZ)
       18,000 "isolation" entries in ATCC
       25,000 "isolation site" entries for bacteria and archaea in the Genomes OnLine Database (GOLD)
     - Documents: scientific papers, reports
       7 million PubMed references on micro-organism habitats [Deléger et al., 2016]
     This information is often available to automatic pipelines (on-line access, programming interface), but it is under-exploited because it is expressed in unstructured free text.
     [Figures: number of articles about "bacteria" in PubMed; number of complete genome sequences at JGI]
  • 3. From free text to knowledge
     Isolation site, always in free text. GenBank example:
       Species                   TaxID    Isolation site
       Acetobacter lovaniensis   104100   fermented dairy products
       Acetobacter lovaniensis   104100   fermented rice flour
       Acetobacter lovaniensis   104100   vinegar
       Acetobacter lovaniensis   104100   water kefir
     (figure label: fermented food)
     Unified representation of habitat descriptions: a major challenge for data access and curation
     ⇒ Facilitate information access by reference keywords
     ⇒ Enable interoperability among databases
     ⇒ Enhance databases with published scientific knowledge
     Needs
     1. A classification of habitats relevant to microorganism studies
     2. An information extraction method for mapping free-text entities to the classes
     OntoBiotope ontology, Alvis text-mining suite
  • 4. Alvis pipeline - Florilège database (Copyright Inra)
     Mapping various terms to a habitat classification
     PubMed document (PMID): 21549046, 21247298, 16204502, 15992268, 2116711, 2116712, 15992260, 1348242, 11530195, 23042180, 23208291, 10458115, 11456331, 21669068, 17954748, 8867607, 23433372, 26325149, 8977904, 23880504, 8227616, 16156701, 15553633, 20494189, 24715203, 21441322, 19114514, 2125110, 19254151, 22980010
     Taxon: Listeria monocytogenes
     Habitat (reference class): dairy farm
     Habitat terms (term variation): Dairy farm, dairy farm environments, dairy farms, dairy farm environmental samples, environment of dairy farms, potential dairy farm, Dairy farm environmental samples, single dairy farm, Irish dairy farms, high-prevalence dairy farm, dairy farm environment, dairy farms of different size, local dairy farm, second Northwest dairy farm, dairy cattle farms, selected dairy farms, dairy farm, Dairy farms
     10,000 habitats of Listeria monocytogenes in PubMed
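To make the idea of mapping term variants to one reference class concrete, here is a minimal sketch. It is not the Alvis method, which relies on machine learning and natural language processing; it is a naive token-matching stand-in, and the names `REFERENCE_CLASSES`, `normalize` and `map_to_class` are hypothetical.

```python
import re

# Hypothetical list of reference classes; the slide only shows "dairy farm".
REFERENCE_CLASSES = ["dairy farm"]


def normalize(term):
    """Lowercase, split into alphabetic tokens, and strip a naive plural 's'."""
    words = re.findall(r"[a-z]+", term.lower())
    return [
        w[:-1] if w.endswith("s") and not w.endswith("ss") and len(w) > 3 else w
        for w in words
    ]


def map_to_class(term, classes=REFERENCE_CLASSES):
    """Return the first reference class whose tokens all occur in the term."""
    tokens = normalize(term)
    for cls in classes:
        if all(t in tokens for t in normalize(cls)):
            return cls
    return None


variants = [
    "dairy farm environments",
    "Irish dairy farms",
    "high-prevalence dairy farm",
    "dairy cattle farms",
]
for v in variants:
    print(v, "->", map_to_class(v))
```

Token matching of this kind would break on synonyms and paraphrases, which is one reason a trained classifier is a more realistic choice for the real task.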
  • 5. A classification with a hierarchical structure
     Higher habitat classes are needed for ecology and evolution studies.
     10,000 habitats of Listeria monocytogenes in PubMed
     Alvis IR semantic search engine: scientific paper extracts and their habitat classes
     - "Listeria monocytogenes contamination in Chinese beef processing plants."
     - "Listeria monocytogenes isolated from artisanal Portuguese cheese-making dairy."
     - "the presence of L. monocytogenes in samples collected from crab processing plant"
     - "L. monocytogenes persisting in a cold-smoked fish processing plant."
     - "two L. monocytogenes cheese dairy isolates"
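The value of a hierarchy is that mentions classified at a fine level can be counted at any higher level. The sketch below illustrates that roll-up with a small parent map. The class names and parent links are invented for illustration and are not taken from OntoBiotope.

```python
# Hypothetical child -> parent links; NOT the real OntoBiotope hierarchy.
PARENT = {
    "cheese dairy": "dairy",
    "dairy": "food processing plant",
    "fish processing plant": "food processing plant",
    "food processing plant": "artificial environment",
    "artificial environment": None,
}


def ancestors(cls):
    """Return the chain of parent classes from nearest to root."""
    chain = []
    parent = PARENT.get(cls)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain


def rollup(mentions, level_class):
    """Count mentions that are the given class or one of its descendants."""
    return sum(
        1 for m in mentions if m == level_class or level_class in ancestors(m)
    )


mentions = ["cheese dairy", "fish processing plant", "dairy"]
print(rollup(mentions, "food processing plant"))
print(rollup(mentions, "dairy"))
```

A query at the "food processing plant" level then retrieves the cheese dairy and fish plant mentions together, which is the kind of grouping the slide asks for.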
  • 6. OntoBiotope ontology
     A large ontology dedicated to microorganism biotopes
     What structure for the habitat classification?
     - Microbiology research domains
     - Reuse of existing habitat classifications (ATCC, GOLD, FedEx2)
     - Gather habitats with similar physico-chemical properties
     Ontology scope
     - Extensive study of habitat terminology in text (databases and papers), e.g. "paper mill sludge" / "anaerobic sludge of paper mill waste water"
     - Collaborations with microbiologists in focused projects (phytobiome, food microbiome)
     Evaluation
     - Text-mining benchmarks: Bacteria Biotope in BioNLP Shared Tasks
     - Through its use in applications (e.g. food positive flora)
     2329 habitat classes, 492 synonyms, 13 levels
  • 7. Habitats in OntoBiotope ontology
     Distributed since 2012, http://agroportal.lirmm.fr/ontologies/ONTOBIOTOPE
     The largest classes (chart values and labels, paired in the order extracted):
       experimental medium                       14
       aquaculture habitat                       19
       bacteria associated habitat               21
       medical environment                       43
       agricultural habitat                      55
       habitat wrt chemico-physical property    120
       artificial environment                   281
       living organism                          352
       natural environment habitat              369
       part of living organism                  480
       food                                     801
     49 classes in the gastrointestinal tract subtree
     35 classes in the waste subtree
     51 classes in the soil subtree
  • 8. 51 classes in the soil subtree. Contributions welcome.
  • 9. Information extraction from text and mapping to the habitat classes
     Text of articles: "Bifidobacterium longum subsp. longum is found in newborn infant as a normal component of gut flora."
     Information extraction, guided by the ontology (simplified view)
     Formal representation of the information:
       Bacteria: Bifidobacterium longum subsp. longum [taxid: 1679]
       hosted by: newborn infant [baby]
       lives_in: gut [intestine]
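The formal representation on this slide can be pictured as a small typed record. The sketch below is one possible encoding, not the format used by the Alvis suite; the class names `Mention` and `LivesInRecord` and the helper `to_triples` are hypothetical, while the values come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    text: str        # surface form found in the article
    normalized: str  # ontology class the mention is mapped to


@dataclass(frozen=True)
class LivesInRecord:
    taxon: str
    taxid: int
    host: Optional[Mention]
    habitat: Mention


record = LivesInRecord(
    taxon="Bifidobacterium longum subsp. longum",
    taxid=1679,
    host=Mention("newborn infant", "baby"),
    habitat=Mention("gut", "intestine"),
)


def to_triples(r):
    """Flatten a record into (subject, relation, object) triples."""
    triples = [(r.taxon, "lives_in", r.habitat.normalized)]
    if r.host is not None:
        triples.append((r.taxon, "hosted_by", r.host.normalized))
    return triples


for triple in to_triples(record):
    print(triple)
```

Keeping both the surface text and the normalized class makes it possible to trace each extracted fact back to the sentence it came from.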
  • 10. Information extraction and classification - Process
     Example: "... virulence of aquatic pathogen Vibrio anguillarum towards sea bass larvae ..."
     Annotation labels in the figure: Microbial species; Habitat; aquatic environment; marine farm fish; Dicentrarchus labrax; larvae; Lives in; TaxID5560
     Artificial Intelligence methods (machine learning and natural language processing), implemented in several components (more than one hundred) of the Alvis text-mining pipeline:
     1. Entity recognition = identification (text boundaries) and broad type assignment
     2. Entity classification = assignment of an OntoBiotope class
     3. Relationship prediction = links microorganism mentions to their habitats in the text
     Ratkovic et al., BMC Bioinformatics, 2012
     Nédellec et al., Handbook on Ontology, 2009
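The three steps above can be sketched as a toy pipeline. This is only an illustration of how the stages fit together: the real components use machine learning and natural language processing, whereas this sketch swaps in dictionary lookup and sentence co-occurrence. The lexicons and function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical toy lexicons standing in for trained components.
TAXA = {"Vibrio anguillarum"}
HABITAT_CLASSES = {"aquatic": "aquatic environment", "sea bass": "marine farm fish"}


@dataclass(frozen=True)
class Entity:
    text: str
    start: int
    end: int
    kind: str                      # broad type: "taxon" or "habitat"
    onto_class: Optional[str] = None


def recognize(text):
    """Step 1: find entity boundaries and assign a broad type."""
    found = []
    for kind, lexicon in (("taxon", TAXA), ("habitat", HABITAT_CLASSES)):
        for term in lexicon:
            i = text.find(term)
            if i >= 0:
                found.append(Entity(term, i, i + len(term), kind))
    return sorted(found, key=lambda e: e.start)


def classify(entities):
    """Step 2: attach an ontology class to each habitat entity."""
    return [
        replace(e, onto_class=HABITAT_CLASSES[e.text]) if e.kind == "habitat" else e
        for e in entities
    ]


def relate(entities):
    """Step 3: link every taxon to every habitat in the same sentence."""
    taxa = [e for e in entities if e.kind == "taxon"]
    habitats = [e for e in entities if e.kind == "habitat"]
    return [(t.text, "lives_in", h.onto_class) for t in taxa for h in habitats]


sentence = "virulence of aquatic pathogen Vibrio anguillarum towards sea bass larvae"
print(relate(classify(recognize(sentence))))
```

Linking by co-occurrence over-generates relations in longer sentences, which is why relationship prediction is treated as a separate learned step in the process described here.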
  • 11. Architecture: bibliographic sources; semantic resources (ontologies); information extraction; full-text data and metadata; services
     http://bibliome.jouy.inra.fr/demo/ontobiotope/alvisir2/webapi/search
     Ba & Bossy, LREC 2016
  • 12. [Figure: extract of OntoBiotope, milk product subtree]
  • 13. OntoBiotope pipeline, applied to PubMed
     Text source: BioNLP-ST (data of the international competition on bacteria information extraction)
                   Entity detection   Detection and classification   Relation (lives in)
       Recall      65%                50%                            70%
       Precision   81%                62%                            51.4%
     Text source: PubMed
       Documents   2.3 million
       Habitats    18.5 million
       Taxa        8.4 million
       Relations   7.2 million
     Nédellec et al., BMC Bioinformatics, 2015
     Ratkovic et al., BMC Bioinformatics, 2012
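The slide reports recall and precision separately. A common way to summarize the two is the F1 score, their harmonic mean. The slide does not give F1 values, so the figures computed below are derived here from the reported numbers, reading the unmarked values 70 and 51,4 as percentages (an assumption).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as read from the slide, expressed as fractions.
reported = {
    "entity detection": (0.81, 0.65),
    "detection and classification": (0.62, 0.50),
    "relation (lives in)": (0.514, 0.70),
}

for task, (p, r) in reported.items():
    print(f"{task}: F1 = {f1(p, r):.3f}")
```

The harmonic mean penalizes imbalance, so the relation task, with high recall but lower precision, ends up closer to its weaker figure.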
  • 14. From research lab to infrastructure, a European Open Science perspective
     Deployment on OpenMinTeD, the European text-mining infrastructure, offers the scientific communities:
     - fully open access in a unified framework
     - reproducibility and flexibility
     - full-text paper collection, and database aggregation and standardisation
     Przybyła et al., Database, 2016
  • 15. On-line services
     - Data integration
     - Treemap visualization for biodiversity analytics
     - Semantic relational search through all PubMed references
     http://genome.jouy.inra.fr/Florilege/
  • 16. On-going projects, examples of application
     Food positive flora (Florilege), MD Poster S2-23
     - Characterization of biodiversity, phenotypes, uses and molecules produced/degraded
     - Food innovation (nutrient production, biopreservation)
     - 1 million phenotypes, 1.1 million taxon-phenotype relationships
     Tracing the origin (FoodMicrobiome Transfert)
     - Cheese ingredients and cheese processing bring unexpected strains
     - Text-mining helps to formulate plausible hypotheses on the source
     Likelihood of organism identification (metagenomics), consistency with previous results (Visa TM project)
     - Has this microorganism already been identified in this place? Of the same family? In a similar place? In a similar ecosystem?
     [INRA - CNIEL] [INRA Food WG] [INRA, AgroPortal, Inist]
  • 17. Conclusion
     Millions of microorganism habitat descriptions, increasing exponentially
     - Invaluable information for fundamental research and applications
     - Largely underused because mostly expressed in free text
     The OntoBiotope ontology and information extraction from text provide a formal representation of microorganism biotopes.
     They open up new research opportunities:
     • Not only for data curation and indexing in information systems
     • Analysis in combination with experimental data for integrative and predictive biology
     A prime example is metagenomics & biodiversity in OpenMinTeD
  • 18. Acknowledgements and funding
     Mouhamadou Ba, Baptiste Bohuon, Robert Bossy, Philippe Bessières, Estelle Chaix, Louise Deléger, Sandra Dérozier, Arnaud Ferré, Wiktoria Golik, Julien Jourde, Valentin Loux, Frédéric Papazian, Jean- Zorana Ratkovic, Dialekti Valsamou
     MEM: Méta-omiques des Ecosystèmes Microbiens