InterMine 
Integrated Data Warehouse 
Use Cases: Arabidopsis & Medicago Genome Projects 
Vivek Krishnakumar 
Plant Genomics Group (EUK) 
IFX Research WIPS Meeting, 03 October 2014
Overview 
• Introduction 
• InterMine 
 Integrated data warehouse, Extensible data model, 
Flexible query system 
 Web and Programmatic Interface 
 Other InterMine instances 
• Use cases 
 Arabidopsis Information Portal (AIP) 
 Medicago truncatula Genome Database (MTGD) 
• Summary 
 Advantages 
 Caveats
Introduction 
For genome projects that wish to expose their 
data via the web (query, visualize, warehouse) 
to foster scientific collaboration, there are 
several technologies available: 
• JCVI-developed software 
 Manatee (backed by an RDBMS) 
• Externally developed software 
 BioMart (federated from various databases) 
 Tripal (powered by Drupal, backed by a Chado database) 
 InterMine
InterMine 
• Functions as a data warehouse for the integration of complex 
biological data. Integration across data types occurs based on 
a common identifier (e.g. gene primary ID) 
• Uses a flexible and extensible data model, controlled by XML 
files, driven by ontologies (Sequence Ontology [SO], Gene Ontology [GO], etc.) 
 Genomics, Proteomics, Interactions, Homology, 
Expression, Pathways (and more data types) 
 Parsers for commonly used biological data formats 
 Provides framework for adding your own data 
• Offers a flexible query system, optimized via precomputed 
tables (no need for schema denormalization) 
Smith, RN. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data 
Bioinformatics (2012) 28 (23): 3163-3165
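
The idea of integrating data types on a common identifier can be illustrated with a minimal sketch. This is not InterMine's actual loader: the `integrate` function, the `primary_id` field name, and the sample records are all hypothetical, chosen only to show records from separate sources collapsing into one object per gene.

```python
from collections import defaultdict


def integrate(sources):
    """Merge records from several sources into one object per primary ID.

    Hypothetical illustration only: each record is a dict that carries a
    "primary_id" key, which plays the role of the common identifier.
    """
    merged = defaultdict(dict)
    for records in sources.values():
        for record in records:
            gene_id = record["primary_id"]
            for field, value in record.items():
                if field != "primary_id":
                    # First source to supply a field wins; later ones do not overwrite.
                    merged[gene_id].setdefault(field, value)
    return dict(merged)


# Made-up sample data standing in for three different data types.
sources = {
    "annotation": [{"primary_id": "GENE0001", "symbol": "abc1", "chromosome": "Chr1"}],
    "expression": [{"primary_id": "GENE0001", "tissue": "root", "level": 12.5}],
    "pathways": [{"primary_id": "GENE0002", "pathway": "example pathway"}],
}
warehouse = integrate(sources)
```

Here the annotation and expression records for `GENE0001` end up in a single merged object, while `GENE0002` keeps only its pathway data. A real warehouse would also need rules for conflicting values, which this sketch sidesteps by letting the first source win.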
InterMine (contd.) 
• Provides a user-friendly web interface exposing 
powerful features: 
 Analysis of lists (facilitate enrichment studies) 
 Full-featured report pages (one-stop shop) 
 Interactive result tables (sort, filter, summarize) 
 Visual query builder (no need to write SQL!) 
 Quick search and Region-based search 
• Fosters development of external applications 
using data hosted within InterMine via Application 
Programming Interfaces (API): 
 RESTful 
 Perl, Python, Ruby, Java, JavaScript 
Kalderimis, A. et al. InterMine: extensive web services for modern biology 
Nucl. Acids Res. (1 July 2014) 42 (W1): W468-W472
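
As a rough sketch of what a RESTful request to a mine might look like, the snippet below builds a query document and the corresponding URL using only the Python standard library. The `/service/query/results` path, the `query` and `format` parameters, and the XML attribute names follow the InterMine web-service conventions as best recalled here, so they should be checked against the documentation of the target mine. The helper names and the gene symbol `EXAMPLE1` are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from xml.etree import ElementTree as ET


def build_query_xml(root_class, views, constraints):
    """Build a query XML string (attribute names are assumptions, see lead-in)."""
    query = ET.Element(
        "query",
        {"model": "genomic", "view": " ".join(f"{root_class}.{v}" for v in views)},
    )
    for path, op, value in constraints:
        ET.SubElement(
            query,
            "constraint",
            {"path": f"{root_class}.{path}", "op": op, "value": value},
        )
    return ET.tostring(query, encoding="unicode")


def build_results_url(base_url, query_xml, fmt="json"):
    """Compose the results URL; the endpoint path is an assumption."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{base_url.rstrip('/')}/service/query/results?{params}"


# Ask for two columns of Gene, constrained to one (made-up) gene symbol.
xml = build_query_xml("Gene", ["primaryIdentifier", "symbol"], [("symbol", "=", "EXAMPLE1")])
url = build_results_url("https://apps.araport.org/thalemine", xml)
```

The resulting `url` could then be fetched with any HTTP client. The language-specific client libraries listed above wrap this kind of request so that callers do not have to assemble the XML by hand.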
Public “Mines” 
• InterMine supports querying across mines 
for cross-database integration 
• A large number of warehouses powered by 
InterMine already exist
Arabidopsis Information Portal (AIP) 
• AIP origins 
 Funded by NSF in response to community needs, following 
termination of funding to TAIR 
• AIP objectives 
 Develop a community web resource that… 
– is sustainable, fundable, and community-extensible 
– hosts analysis & visualization tools, user data spaces 
 Federation: integrate diverse data sets from distributed data 
sources; foster development of tools for and by the community 
 Maintenance of the Col-0 gold standard annotation 
• AIP methods 
 Assimilate TAIR data 
 Host an InterMine instance devoted to Arabidopsis (thale cress) 
 Offer and consume RESTful web services 
 Integrate and utilize iPlant resources
ThaleMine 
https://apps.araport.org/thalemine 
• An InterMine interface 
to Arabidopsis genomic 
data 
• Integrates a wide 
variety of data types 
(A-E, H), some of 
which are warehoused 
and others are 
federated via web 
services 
• Embedded elements 
visualizing gene 
structure (JBrowse, not 
shown), interaction 
networks (F), 
expression patterns (G)
Visual Query Builder 
Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
Interactive Result Tables and Region-based Search 
Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
MedicMine 
http://medicmine.jcvi.org 
• NSF-funded project to assist with the curation of the 
Medicago truncatula genome assembly and annotation 
(funding ended August 2014) 
• To warehouse the project data and extend its availability, 
an InterMine interface for Medicago was implemented 
(backed by a Chado database) 
• Provides functionality similar to that available via 
ThaleMine
Summary 
• Advantages 
 InterMine is a powerful biological data warehouse 
 Performs complex data integration 
 Allows fast and flexible querying 
 Well documented programmatic interface 
 Cookie-cutter, user-friendly web interface 
 Facilitates cross-talk between “mines” 
• Caveats 
 Adding more data requires a full database rebuild (incremental loading 
is not possible) because of the integration step 
• About InterMine: 
 Developed by the Micklem Lab at the University of Cambridge, UK 
 Written in Java, backed by a PostgreSQL database, deployed under Tomcat. 
Documentation and downloads available at http://www.intermine.org
Chris Town, PI 
Chris Nelson, PM 
Lisa McDonald, Education and Outreach Coordinator 
Jason Miller, Co-PI, Technical Lead 
Erik Ferlanti, SE 
Vivek Krishnakumar, BE 
Svetlana Karamycheva, BE 
Maria Kim, BE 
Gos Micklem, co-PI 
Sergio Contrino, Software Engineer 
Eva Huala, Project lead, TAIR 
Bob Muller, Technical lead, TAIR 
Matt Vaughn, co-PI 
Steve Mock, Advanced Computing Interfaces 
Rion Dooley, Web and Cloud Services 
Matt Hanlon, Web and Mobile Applications 
Ben Rosen, BA

 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 

Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting

  • 1. InterMine Integrated Data Warehouse Use Cases: Arabidopsis & Medicago Genome Projects Vivek Krishnakumar Plant Genomics Group (EUK) IFX Research WIPS Meeting, 03 October 2014
  • 2. Overview
      • Introduction
      • InterMine
          ◦ Integrated data warehouse, Extensible data model, Flexible query system
          ◦ Web and Programmatic Interface
          ◦ Other InterMine instances
      • Use cases
          ◦ Arabidopsis Information Portal (AIP)
          ◦ Medicago truncatula Genome Database (MTGD)
      • Summary
          ◦ Advantages
          ◦ Caveats
  • 3. Introduction
      For genome projects that wish to expose their data via the web (query, visualize, warehouse) to foster scientific collaboration, there are several technologies available:
      • JCVI developed software
          ◦ Manatee (backed by an RDBMS)
      • Externally developed software
          ◦ BioMart (federated from various databases)
          ◦ Tripal (powered by Drupal, backed by CHADOdb)
          ◦ InterMine
  • 4. InterMine
      • Functions as a data warehouse for the integration of complex biological data. Integration across data types occurs based on a common identifier (e.g. gene primary ID)
      • Uses a flexible and extensible data model, controlled by XML files, driven by ontologies (Sequence [SO], Gene [GO], etc.)
          ◦ Genomics, Proteomics, Interactions, Homology, Expression, Pathways (and more data types)
          ◦ Parsers for commonly used biological data formats
          ◦ Provides a framework for adding your own data
      • Offers a flexible query system, optimized via precomputed tables (no need for schema denormalization)
      Smith, RN. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics (2012) 28 (23): 3163-3165
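The idea of integrating data types on a common identifier can be illustrated with a minimal sketch. It is not InterMine's actual integration code: the `integrate` function, the `primary_id` key, the source names, and the sample records are all hypothetical and exist only to show records from different sources being merged under one gene identifier.

```python
from collections import defaultdict


def integrate(sources):
    """Merge records from several sources, keyed on a shared gene ID.

    `sources` maps a source name to a list of dict records; every record
    is assumed (for this sketch) to carry a "primary_id" field.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            gene_id = record["primary_id"]
            for key, value in record.items():
                if key != "primary_id":
                    # Prefix each attribute with its source to avoid clashes.
                    merged[gene_id][f"{source_name}.{key}"] = value
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real mine.
    sources = {
        "genomics": [{"primary_id": "GENE0001", "chromosome": "Chr1"}],
        "expression": [{"primary_id": "GENE0001", "tissue": "root"}],
    }
    print(integrate(sources))
```

Records that share an identifier end up in one merged entry, which is the property that lets a single query span several data types.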
  • 5. InterMine (contd.)
      • Provides a user-friendly web interface exposing powerful features:
          ◦ Analysis of lists (facilitate enrichment studies)
          ◦ Full-featured report pages (one-stop shop)
          ◦ Interactive result tables (sort, filter, summarize)
          ◦ Visual query builder (no need to write SQL!)
          ◦ Quick search and Region-based search
      • Fosters development of external applications using data hosted within InterMine via Application Programming Interfaces (API):
          ◦ RESTful
          ◦ Perl, Python, Ruby, Java, JavaScript
      Kalderimis, A. et al. InterMine: extensive web services for modern biology. Nucl. Acids Res. (1 July 2014) 42 (W1): W468-W472
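As a rough illustration of the RESTful interface, the sketch below builds (but does not send) a query URL using only the Python standard library. The `/service/query/results` path, the `query` and `format` parameter names, the `genomic` model name, and the XML query layout are assumptions about how an InterMine instance is typically exposed; the helper names are hypothetical and the gene identifier is only an example value. The instance's own web service documentation is the authority here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL from the ThaleMine slide; everything appended to it below
# is an assumption about the REST layout, not confirmed by the slides.
THALEMINE_BASE = "https://apps.araport.org/thalemine"


def build_path_query(root_class, fields, constraints):
    """Return an XML query string selecting `fields` of `root_class`.

    `constraints` is a list of (field, operator, value) tuples.
    """
    query = ET.Element("query", {
        "model": "genomic",  # assumed model name
        "view": " ".join(f"{root_class}.{field}" for field in fields),
    })
    for field, op, value in constraints:
        ET.SubElement(query, "constraint", {
            "path": f"{root_class}.{field}",
            "op": op,
            "value": value,
        })
    return ET.tostring(query, encoding="unicode")


def build_results_url(base_url, query_xml, fmt="json"):
    """Compose the URL for a results request (assumed endpoint path)."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{base_url}/service/query/results?{params}"


if __name__ == "__main__":
    xml_query = build_path_query(
        "Gene",
        ["primaryIdentifier", "symbol"],
        [("primaryIdentifier", "=", "AT1G01010")],  # example identifier
    )
    print(build_results_url(THALEMINE_BASE, xml_query))
```

The resulting URL could be fetched with any HTTP client; the language-specific client libraries listed on the slide wrap this kind of request so that callers do not have to assemble the XML by hand.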
  • 6. Public “Mines”
      • InterMine supports querying across mines for cross-database integration
      • Vast number of warehouses powered by InterMine already exist
  • 7. Arabidopsis Information Portal (AIP)
      • AIP origins
          ◦ Funded by NSF in response to community needs, following termination of funding to TAIR
      • AIP objectives
          ◦ Develop a community web resource that…
              – is sustainable and fundable and community-extensible
              – hosts analysis & visualization tools, user data spaces
          ◦ Federation: integrate diverse data sets from distributed data sources; foster development of tools for and by the community
          ◦ Maintenance of the Col-0 gold standard annotation
      • AIP methods
          ◦ Assimilate TAIR data
          ◦ Host an InterMine instance devoted to Arabidopsis (thale cress)
          ◦ Offer and consume RESTful web services
          ◦ Integrate and utilize iPlant resources
  • 8. ThaleMine https://apps.araport.org/thalemine
      • An InterMine interface to Arabidopsis genomic data
      • Integrates a wide variety of data types (A-E, H), some of which are warehoused and others are federated via web services
      • Embedded elements visualizing gene structure (JBrowse, not shown), interaction networks (F), expression patterns (G)
  • 9. Visual Query Builder Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
  • 10. Interactive Result Tables Region-based search Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
  • 11. MedicMine http://medicmine.jcvi.org
      • NSF-funded project to assist with the curation of the Medicago truncatula Genome Assembly and Annotation (funding ended August 2014)
      • In order to warehouse the project data and prolong its availability, an InterMine interface for Medicago was implemented (backed by a CHADO database)
      • Provides functionality similar to that available via ThaleMine
  • 12. Summary
      • Advantages
          ◦ InterMine is a powerful biological data warehouse
          ◦ Performs complex data integration
          ◦ Allows fast and flexible querying
          ◦ Well documented programmatic interface
          ◦ Cookie-cutter, user-friendly web interface
          ◦ Facilitates cross-talk between “mines”
      • Caveats
          ◦ Adding more data requires a full database rebuild (incremental loading is not possible) because of the integration step
      • About InterMine:
          ◦ Developed by the Micklem Lab at the University of Cambridge, UK
          ◦ Written in Java, backed by PostgreSQL, deployed under Tomcat. Documentation and downloads available at http://www.intermine.org
  • 13. Chris Town, PI Chris Nelson PM Lisa McDonald Education and Outreach Coordinator Jason Miller, Co-PI Technical Lead Erik Ferlanti SE Vivek Krishnakumar BE Svetlana Karamycheva BE Maria Kim BE Gos Micklem, co-PI Sergio Contrino Eva Huala Project lead, TAIR Software Engineer Bob Muller Technical lead, TAIR Matt Vaughn co-PI Steve Mock Advanced Computing Interfaces Rion Dooley, Web and Cloud Services Matt Hanlon, Web and Mobile Applications Ben Rosen BA