Data integration
Primer for Predocs, 17-19 January 2011
Rafael Jimenez, rafael@ebi.ac.uk
EnCORE presentation
Table of contents
• Data integration
  – Why do we need it?
  – What is it?
  – Problems
  – Suggestions
  – Different approaches
  – Important variables
  – Tools
Molecular Biology Database resources (~1440 resources)
• Genomics databases (non-vertebrate): 19%
• Human genes and diseases: 13%
• Protein sequence databases: 13%
• Nucleotide sequence databases: 9%
• Structure databases: 9%
• Metabolic and signaling pathways: 9%
• Human and other vertebrate genomes: 8%
• Plant databases: 7%
• RNA sequence databases: 5%
• Other molecular biology databases: 3%
• Immunological databases: 2%
• Organelle databases: 2%
• Proteomics resources: 1%
Source: Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. MY Galperin, GR Cochrane, Nucleic Acids Research. http://www.oxfordjournals.org/nar/database/c
Biological pathway resources (~303 resources)
• Protein-protein interactions: 34%
• Metabolic pathways: 20%
• Transcription factors / gene regulatory networks: 15%
• Protein-compound interactions: 11%
• Pathway diagrams: 10%
• Protein sequence focused: 6%
• Other: 4%
Source: http://www.pathguide.org
Why so many data sources?
• Many data types
• Many communities
• Different ways to structure data
• Control
• Reputation
• Easy publication
23.08.18 6
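Data collection
[Figure: Ideally vs. Reality. Ideally, annotators (A) feed one database (DB) that the user reaches through a single GUI, API and set of web services (WS). In reality, each of many databases has its own annotators, GUI, API and WS.]
Legend: A = Annotator; DB = Database; GUI = Graphical User Interface; API = Application programming interface; WS = Web Services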
[Figure (Tim Hubbard): scientific impact plotted against utility of bioinformatics, annotated "Too little bioinformatics", "Too many databases" and "Too diverse interfaces".]
Data integration
[Figure: NO: the user queries each database (DB) through its own GUI, API and WS. YES: several databases are reached through a single query interface (QI).]
Combining data residing in different sources … providing users with a unified view of these data.
Problems
• Many data sources
  – Many sources to maintain
  – New sources appearing
  – Just 20% have a sustained future*
  – How to find them?
• Different query interfaces
• Variable results
  – Formats
  – Schemas
  – Controlled vocabularies
  – Minimum information guidelines
• Redundant results
[Figure: the scientific impact vs. utility of bioinformatics plot again ("Too little bioinformatics", "Too many databases", "Too diverse interfaces"), annotated "Integration of" … "data integration?"]
* Merali Z. et al. Databases in peril. Nature 2005.
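Redundant results are a typical side effect of querying several sources for the same thing. A minimal sketch, assuming each source returns undirected protein-protein interactions as pairs of identifiers (the sources and identifiers below are hypothetical placeholders), of collapsing records that differ only in order or case while remembering which sources reported them:

```python
from collections import defaultdict

def merge_redundant(results):
    """Collapse interaction records reported by several sources.

    Each record is (source, protein_a, protein_b). An undirected interaction
    is keyed by its sorted, upper-cased pair, so A-B and b-a count as one.
    Returns a dict mapping each unique pair to the set of sources reporting it.
    """
    merged = defaultdict(set)
    for source, a, b in results:
        key = tuple(sorted((a.upper(), b.upper())))
        merged[key].add(source)
    return dict(merged)

# Hypothetical results from three sources; identifiers are placeholders.
results = [
    ("source1", "P1", "P2"),
    ("source2", "p2", "p1"),
    ("source3", "P1", "P3"),
]
unique = merge_redundant(results)
for pair, sources in sorted(unique.items()):
    print(pair, sorted(sources))
```

Keeping the set of reporting sources, rather than discarding duplicates, preserves the evidence that several independent resources agree.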
Suggestions
– Scientific and political independence of the databases
– Cross-database queries spanning domain and
organizational boundaries
– Sharing and adoption rather than reinventing
– Adoption of standards
– Coordination to avoid redundant content
– Infrastructure to avoid volatile resources
– Registries to find resources and services
1. Data centralization
[Figure: where integration (i), standardization (S) and the query interface (QI) sit across curators/annotators, original data sources, third party implementations and users.]
Examples:
• UniProt
• GenBank
• IntAct
[Screenshot: UniProtKB]
2. Data warehousing
[Figure: where integration (i), standardization (S) and the query interface (QI) sit across curators/annotators, original data sources, third party implementations and users.]
Examples:
• Pathway Commons
• STRING
• Atlas
3. Dataset integration
[Figure: where integration (i), standardization (S) and the query interface (QI) sit across curators/annotators, original data sources, third party implementations and users.]
Examples:
• Your own script
• Workflows
[Figure (P. Missier, ESIP meeting, Santa Barbara, CA, July 2009): QTL → genomic regions → genes in QTL → metabolic pathways (KEGG).]
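Dataset integration chains lookups across resources, as in the QTL example (region → genes → KEGG gene identifiers → pathways). A minimal sketch of that chain as "your own script", assuming small hypothetical in-memory tables in place of the real BioMart and KEGG services; the gene names, identifiers and pathways are placeholders, and a real script would replace each table with a service call:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    start: int
    end: int
    uniprot_id: str

# Hypothetical stand-ins for remote services (e.g. BioMart, KEGG).
GENES = [
    Gene("geneA", "17", 29_000_000, 29_010_000, "UP_A"),
    Gene("geneB", "17", 40_000_000, 40_020_000, "UP_B"),
    Gene("geneC", "17", 31_500_000, 31_600_000, "UP_C"),
]
UNIPROT_TO_KEGG = {"UP_A": "kegg:a", "UP_C": "kegg:c"}
KEGG_PATHWAYS = {"kegg:a": ["pathway1"], "kegg:c": ["pathway1", "pathway2"]}

def genes_in_qtl(chromosome, start, end):
    """Step 1: genes overlapping the QTL region [start, end]."""
    return [g for g in GENES
            if g.chromosome == chromosome and g.start <= end and g.end >= start]

def to_kegg_ids(genes):
    """Step 2: map gene identifiers to KEGG gene identifiers; unmapped genes are dropped."""
    return [UNIPROT_TO_KEGG[g.uniprot_id] for g in genes if g.uniprot_id in UNIPROT_TO_KEGG]

def pathways_for(kegg_ids):
    """Step 3: collect the distinct pathways the KEGG genes participate in."""
    return sorted({p for k in kegg_ids for p in KEGG_PATHWAYS.get(k, [])})

def qtl_pathways(chromosome, start, end):
    return pathways_for(to_kegg_ids(genes_in_qtl(chromosome, start, end)))

print(qtl_pathways("17", 28_500_000, 32_500_000))
```

Each step only consumes the previous step's output, which is what makes the same chain easy to move into a workflow management system later.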
4. Hyperlinks
[Figure: where integration (i), standardization (S) and the query interfaces (QI) sit across curators/annotators, original data sources, third party implementations and users.]
Examples:
• SRS
• Entrez
[Screenshot: SRS]
5. Federated databases
[Figure: where integration (i), standardization (S) and the query interfaces (QI) sit across curators/annotators, original data sources, third party implementations and users; each source exposes a common standard (SP).]
Examples:
• DAS
• PSICQUIC
• EnCORE
• RDF
[Figure: PSICQUIC services deployed at several providers.]
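In a federation every provider implements the same query interface, so a client can send one query to all of them and merge the answers. A minimal sketch of that pattern, assuming a hypothetical `search(identifier)` contract and in-memory providers; it does not reproduce the real PSICQUIC or DAS protocols:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol

class InteractionService(Protocol):
    """The shared contract every federated provider implements (hypothetical)."""
    name: str
    def search(self, identifier: str) -> list[tuple[str, str]]: ...

class InMemoryService:
    """Stand-in for a remote provider; a real one would call a web service."""
    def __init__(self, name, interactions):
        self.name = name
        self._interactions = interactions

    def search(self, identifier):
        return [pair for pair in self._interactions if identifier in pair]

def federated_search(services, identifier):
    """Send the same query to every provider in parallel; keep results per source."""
    with ThreadPoolExecutor() as pool:
        futures = {s.name: pool.submit(s.search, identifier) for s in services}
        return {name: f.result() for name, f in futures.items()}

services = [
    InMemoryService("provider1", [("P1", "P2"), ("P3", "P4")]),
    InMemoryService("provider2", [("P1", "P5")]),
]
print(federated_search(services, "P1"))
```

The data stay at each provider; only the common interface is shared, which is why results are fresh but may overlap between providers.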
6. View integration
[Figure: where integration (i), standardization (S) and the query interfaces (QI) sit across curators/annotators, original data sources, third party implementations and users.]
Examples:
• BioZon
• TAMBIS
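View integration exposes one global schema and translates it onto each source's local schema. A minimal mediator-style sketch, assuming two hypothetical sources that describe proteins with different field names; it illustrates the general idea only and is not how BioZon or TAMBIS are implemented:

```python
# Hypothetical records from two sources that use different field names.
SOURCE_A = [{"acc": "P1", "organism": "Homo sapiens", "desc": "kinase"}]
SOURCE_B = [{"protein_id": "P2", "species": "Mus musculus", "function": "transporter"}]

# Per-source mappings from local field names to the global (mediated) schema.
MAPPINGS = {
    "A": {"acc": "id", "organism": "species", "desc": "function"},
    "B": {"protein_id": "id", "species": "species", "function": "function"},
}
SOURCES = {"A": SOURCE_A, "B": SOURCE_B}

def global_view():
    """Yield every source record translated into the global schema."""
    for name, records in SOURCES.items():
        mapping = MAPPINGS[name]
        for record in records:
            row = {mapping[k]: v for k, v in record.items() if k in mapping}
            row["source"] = name
            yield row

def query(**conditions):
    """Query the global view; the caller never sees the local schemas."""
    return [row for row in global_view()
            if all(row.get(k) == v for k, v in conditions.items())]

print(query(species="Mus musculus"))
```

Adding a source means adding one mapping, not changing the queries users already write against the global view.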
Popular approaches
[Figure: approaches 1 to 6.]
Scope
• Integration of datasets: biologists & data analysis
• Standardization and integration of databases: software engineers, bioinformaticians
[Figure: the two scopes on a "leverage" scale, 1 to 2.]
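Domain
• Integration per domain: each domain (Domain 1, Domain 2, Domain …) has its own query interface (QI), built on shared standard practices (SP)
• Integrating different domains: a QI spanning the per-domain QIs
SP = Common identifiers, Controlled vocabularies, Common formats, Common schemas, Minimum information guidelines
[Figure: the two levels on a "leverage" scale, 1 to 2.]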
Standards
• Standardization per domain
• Common identifiers
• Controlled vocabularies
• Common formats
• Common schemas
• Minimum information guidelines
• Common query interfaces
Sharing infrastructures
• Multiple repositories in a particular field
  – Collaboration and data exchange
  – More data coverage
  – Less redundancy
  – Adoption of standards
Examples:
• Sequence databases: INSDC (EMBL, DDBJ, NCBI)
• Interactions: IMEx (IntAct, BIND, DIP, MINT, …)
• Mass spectrometry: ProteomeXchange (PRIDE, PeptideAtlas, GPMDB, Tranche, …)
Architecture
[Figure: warehousing vs. federation, showing databases, the query interface (QI) and the user.]
• Data warehousing
  – Pull data from several resources into one resource.
  – Main features:
    • Data centralization
    • High maintenance
    • Data can be out of date
    • Must follow modifications in the sources (schema, format, content, …)
• Federation
  – Data reside in different sources that share a common standard protocol and query system.
  – Main features:
    • Fresh data (from the original sources)
    • Data redundancy
    • Data inconsistency
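The trade-off between the two architectures shows up as soon as a source changes. A minimal sketch with hypothetical in-memory classes: the warehouse copies data at load time and goes stale until it is reloaded, while the federation forwards each query to the sources and always sees the current content:

```python
class Source:
    """A hypothetical original data source whose content can change."""
    def __init__(self, records):
        self.records = dict(records)

    def get(self, key):
        return self.records.get(key)

class Warehouse:
    """Warehousing: pull a copy of all sources into one store."""
    def __init__(self, sources):
        self.sources = sources
        self.store = {}
        self.reload()

    def reload(self):
        # Must be re-run whenever a source changes (the maintenance cost).
        # On key clashes, later sources override earlier ones.
        self.store = {}
        for source in self.sources:
            self.store.update(source.records)

    def get(self, key):
        return self.store.get(key)

class Federation:
    """Federation: forward each query to the sources at query time."""
    def __init__(self, sources):
        self.sources = sources

    def get(self, key):
        for source in self.sources:
            value = source.get(key)
            if value is not None:
                return value
        return None

src = Source({"P1": "v1"})
warehouse, federation = Warehouse([src]), Federation([src])
src.records["P1"] = "v2"  # the original source is updated
print(warehouse.get("P1"), federation.get("P1"))  # the warehouse copy is now out of date
warehouse.reload()
print(warehouse.get("P1"))
```

The federation's freshness is paid for at query time: every request depends on every source being available and answering consistently.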
Query Interface
• Graphical User Interface (GUI): biologists
• Programmatic interface (API, WS): software engineers, bioinformaticians; machine-readable output (<xml> … </xml>) for custom workflows & analysis
[Figure: the two interfaces on a "leverage" scale, 1 to 2.]
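A programmatic interface returns machine-readable data that can flow straight into custom analysis. A minimal sketch, assuming a hypothetical XML payload of the kind an API or web service might return (the element and attribute names are invented for illustration), parsed with the standard library:

```python
import xml.etree.ElementTree as ET

# A hypothetical XML response standing in for a web service or API result.
RESPONSE = """
<entries>
  <entry id="P1"><name>protein one</name><organism>Homo sapiens</organism></entry>
  <entry id="P2"><name>protein two</name><organism>Mus musculus</organism></entry>
</entries>
"""

def parse_entries(xml_text):
    """Turn the XML payload into plain dicts ready for custom analysis."""
    root = ET.fromstring(xml_text.strip())
    return [
        {"id": e.get("id"), "name": e.findtext("name"), "organism": e.findtext("organism")}
        for e in root.findall("entry")
    ]

entries = parse_entries(RESPONSE)
human = [e["id"] for e in entries if e["organism"] == "Homo sapiens"]
print(human)
```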
Data integration variables
• Scope: databases / datasets
• Domain: same / different
• Architecture: federation / warehousing
• Interface: programmatic (<xml> … </xml>) / GUI
Tools
• Standard formats/schemas
  – e.g. DAS, PSI-MI, mzML, BioPAX, SBML, GFF3, CellML, …
• Controlled vocabularies
  – e.g. Gene Ontology, Sequence Ontology, Pathway Ontology, Molecular Interaction, …
  – Registries: BioPortal, OLS
    • ~200 ontologies
• Minimum information guidelines
  – e.g. MIAME, MIAPE, MIMIx, MIRIAM, …
  – Registry: MIBBI
    • ~35 guidelines
• ID mapping services
  – e.g. PICR, DAVID, CRONOS, BridgeDb, UniProt API, Ensembl API, DAS, BioMart, …
• API
  – e.g. Ensembl API, UniProt API, BioMart API, …
• Web services
  – e.g. ClustalW, ArrayExpress, BLAST, …
  – Registries: BioCatalogue, DAS registry, …
    • ~2000 services
  – Projects: BioMoby, EMBOSS, DAS, PSICQUIC, EMBRACE, Soaplab, EnCORE, …
• Workflow management systems
  – e.g. Taverna, Pegasys, Galaxy, …
Standard formats/schemas
[Figure (Anatoly Sorokin): coverage of BioPAX and PSI-MI 2 (database exchange formats) versus SBML and CellML (simulation model exchange formats, including rate formulas) across genetic interactions, molecular interactions (Pro:Pro, All:All), interaction networks (molecular: Pro:Pro, TF:Gene; non-molecular: genetic), regulatory pathways, metabolic pathways, biochemical reactions and small molecules, each ranging from low to high detail.]
Controlled vocabularies
• Ontology browser: Ontology Lookup Service (OLS), http://www.ebi.ac.uk/ontology-lookup
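Controlled vocabularies replace free-text labels with stable term identifiers, so results from different sources can be compared. A minimal sketch using the three Gene Ontology root terms; the synonym lists are hypothetical examples of what annotators might type, and a real application would look terms up in a service such as OLS rather than a hard-coded table:

```python
# Illustrative subset of a controlled vocabulary: the three Gene Ontology
# root terms, with hypothetical free-text synonyms.
GO_ROOTS = {
    "GO:0008150": {"biological_process", "biological process", "process"},
    "GO:0003674": {"molecular_function", "molecular function", "function"},
    "GO:0005575": {"cellular_component", "cellular component", "component"},
}

# Invert to a lookup table from normalized label to term ID.
LABEL_TO_ID = {label: term_id for term_id, labels in GO_ROOTS.items() for label in labels}

def to_term_id(free_text):
    """Map a free-text label to a CV term ID, or None if it is not in the vocabulary."""
    key = " ".join(free_text.strip().lower().split())  # lower-case, collapse whitespace
    return LABEL_TO_ID.get(key)

annotations = ["Biological  Process", "molecular_function", "cell part"]
print([to_term_id(a) for a in annotations])
```

Unmapped labels come back as `None` so they can be reviewed instead of silently passing through as free text.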
Minimum information guidelines
• PSI: Proteomics Standards Initiative
  – Work group of the Human Proteome Organization (HUPO)
  – Defines community standards for data representation in proteomics, facilitating data comparison, exchange and verification
Minimum information guidelines
• MIAPE: The Minimum Information About a Proteomics Experiment
  – Data and metadata from proteomics experiments
    • Data: results
    • Metadata: data about the data
      – Where the samples came from
      – How the analyses were performed
Minimum information guidelines
MIMIx
• MIAPE document guideline for molecular interactions
• Sections:
  1. Manuscript information
  2. Experiment
  3. Interaction
  4. Confidence
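A minimum information guideline can be turned into an automatic completeness check before data are submitted. A minimal sketch built on the four MIMIx sections listed above; the field names under each section are hypothetical simplifications, since the real guideline defines its own, more detailed items:

```python
# Hypothetical required fields for each of the four MIMIx sections;
# the real guideline defines its own, more detailed items.
REQUIRED = {
    "manuscript information": ["publication_id"],
    "experiment": ["host_organism", "detection_method"],
    "interaction": ["participants"],
    "confidence": ["confidence_score"],
}

def missing_items(record):
    """Return {section: [missing fields]} for every section that is incomplete."""
    report = {}
    for section, fields in REQUIRED.items():
        missing = [f for f in fields if not record.get(f)]
        if missing:
            report[section] = missing
    return report

record = {
    "publication_id": "pubmed:0000000",  # placeholder identifier
    "host_organism": "in vitro",
    "participants": ["P1", "P2"],
}
print(missing_items(record))
```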
ID Mapping services
PICR: Protein Identifier Cross-Reference Service
http://www.ebi.ac.uk/Tools/picr/
• Web services!
  – REST
  – SOAP
[Screenshot (Richard Cote): PICR results showing active xrefs (hyperlinked), inactive xrefs, logical xrefs (hyperlinked) and secondary identifiers.]
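An ID mapping service resolves an accession to its cross-references in other databases, taking care of retired (inactive) entries and secondary identifiers. A minimal sketch of that logic over a hypothetical hand-written mapping table; the database names and accessions are placeholders, and PICR's actual REST/SOAP parameters are not reproduced here:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CrossRef:
    database: str
    accession: str
    active: bool  # False if the accession has been retired in the target database

# Hypothetical mapping table; a real service computes this from the source databases.
SECONDARY_TO_PRIMARY = {"OLD1": "P1"}  # secondary accession -> current accession
XREFS = {
    "P1": [
        CrossRef("db_x", "X100", True),
        CrossRef("db_y", "Y200", False),
        CrossRef("db_z", "Z300", True),
    ],
}

def map_identifier(accession, databases=None, only_active=True):
    """Resolve secondary IDs, then return cross-references to the requested databases."""
    primary = SECONDARY_TO_PRIMARY.get(accession, accession)
    refs = XREFS.get(primary, [])
    return [
        r for r in refs
        if (databases is None or r.database in databases) and (r.active or not only_active)
    ]

print(map_identifier("OLD1"))
print(map_identifier("P1", databases={"db_y"}, only_active=False))
```

Filtering out inactive cross-references by default avoids linking to entries that no longer exist, while still letting a curator ask for them explicitly.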
Web services
Workflow management systems
[Screenshot: Taverna]
Workflow management systems: examples from myExperiment
• OLS
• PICR
• BioMart and microarray analysis
• ChEBI
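Workflow management systems such as Taverna describe an analysis as a graph of steps connected by the data they pass along. A minimal sketch of that idea in plain Python, assuming hypothetical local steps instead of remote services; it uses the standard library's `graphlib` (Python 3.9+) to run steps in dependency order and is not Taverna's engine:

```python
from graphlib import TopologicalSorter

# Each hypothetical step receives a dict of its upstream steps' outputs.
def fetch_ids(_):
    return ["P1", "P2"]

def map_ids(inputs):
    return {i: i.lower() for i in inputs["fetch_ids"]}

def annotate(inputs):
    return {k: f"annotation of {v}" for k, v in inputs["map_ids"].items()}

def report(inputs):
    return sorted(inputs["annotate"].values())

STEPS = {"fetch_ids": fetch_ids, "map_ids": map_ids, "annotate": annotate, "report": report}
# Graph: step -> set of steps whose outputs it needs.
DEPENDENCIES = {
    "fetch_ids": set(),
    "map_ids": {"fetch_ids"},
    "annotate": {"map_ids"},
    "report": {"annotate"},
}

def run_workflow(steps, dependencies):
    """Execute steps in dependency order, passing upstream outputs downstream."""
    outputs = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: outputs[dep] for dep in dependencies[name]}
        outputs[name] = steps[name](upstream)
    return outputs

print(run_workflow(STEPS, DEPENDENCIES)["report"])
```

Declaring dependencies instead of hard-coding call order is what lets a workflow system rearrange, share and rerun steps, as the myExperiment examples do.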
Thank you!
Questions?
Proteomics Services Team


Editor's Notes

  1. As a biologist I would prefer to see all the information in one single database. Centralized databases have this mission: their aim is to collect all the information for one specific domain. However, medium-size labs and organizations are capable of producing large amounts of data, and it then becomes harder to submit those data to centralized repositories. Moreover, data producers like to control and structure their own databases, developing their own GUIs and access protocols. For us, the users, it becomes harder to access the information. For one specific domain we might find different databases using different GUIs. We might end up downloading data in different formats, which complicates the integration of results. After integration we might face high redundancy in our results.
  2. This workflow searches for genes that reside in a QTL (quantitative trait locus) region in the mouse, Mus musculus. The workflow requires as input a chromosome name or number, a QTL start base pair position and a QTL end base pair position. Data is then extracted from BioMart to annotate each of the genes found in this region. The Entrez and UniProt identifiers are then sent to KEGG to obtain KEGG gene identifiers, which are used to search for pathways in the KEGG pathway database. The example shown is pathways_and_gene_annotations_for_qtl_phenotype_28303, executed with chromosome = 17, start_position = 28500000, end_position = 32500000.
  3. The HUPO Proteomics Standards Initiative (PSI) defines community standards for data representation in proteomics to facilitate data comparison, exchange and verification. The PSI was founded at the HUPO meeting in Washington, April 28-29, 2002. MIAPE (The Minimum Information About a Proteomics Experiment) is a guidance document specifying the data and metadata that should be collected from proteomics experiments: where samples came from and how analyses were performed. Data are accompanied by context: 'metadata' ('data about the data').
  4. Integration of biological data of various types and development of adapted bioinformatics tools are critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is developing an adapted infrastructure to connect databases and platforms, enabling both the generation of new bioinformatics tools and the experimental validation of computational predictions. Beyond the use of common standards to format individual datasets, sophisticated informatics platforms are needed to mine data across various domains, sources, formats and types. The aim of the EnCORE project is to integrate an extensive list of database resources and analysis tools across different disciplines in a computationally accessible and extensible manner, facilitating automated data retrieval and processing with a special focus on systems biology. The EnCORE platform is available as a collection of web services sharing a common standard format, which makes them easy to integrate into workflow management software such as Taverna. EnCORE services are also accessible through EnVISION, a web graphical user interface providing detailed information such as molecular interactions, biological pathways and computational models of pathways.