MongoDB and research Jan Aerts, PhD Wellcome Trust Sanger Institute Hinxton, UK [email_address] @jandot
Disclaimer 1
Disclaimer 2
Acknowledgments MongoDB community Karen Ambrose 10gen
 
transcriptomics genomics proteomics *omics instantiationomics metabolomics spliceomics interactomics metallomics lipidomics orfeomics phenomics histomics
Academia != industry
heterogeneous systems
transitory
little optimization
slow adoption of new technology (don't break anything that works)
data management = afterthought money
Who are the players?
genome hackers (lone bioinformaticians) bench-based scientists Drawings by Morag Ann Lewis
genome hackers (lone bioinformaticians) bench-based scientists heavy investment in infrastructure/pipelines data exchange => standards!
genome hackers (lone bioinformaticians) bench-based scientists little investment in infrastructure little time/effort for optimization one-off getting it done creating legacy need IT support for heavier work often self-taught
genome hackers (lone bioinformaticians) bench-based scientists use whatever everyone else is using "normalization?"
The data landscape
1. Flat text files
2. Binary compressed flat files
3. MySQL and Oracle Curated data Meta-data Raw data: BLOBs Denormalized copy
4. AceDB - A Caenorhabditis elegans database object-oriented
    Author "Patel B"
    Full_name "Bala Patel"
    Laboratory CB
    Paper [cgc1011]
    Paper [cgc533]
    Mail "Laboratory of Molecular Biology"
    Mail "Hills Road, Cambridge"
    Fax "050 3456789"

    Paper [cgc533]
    Title "Yet more of those Genes"
    Journal "Cell Reports"
    Volume 3
    Year 1993
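The AceDB record above maps fairly naturally onto a nested document of the kind MongoDB stores. The following is only a sketch of that idea in Python: the values come from the slide, but the key names, the nesting, and the helper `papers_from_year` are illustrative assumptions and not an AceDB or MongoDB convention.

```python
import json

# Values copied from the AceDB example on the slide; the key names and the
# choice to embed the papers are assumptions made for illustration only.
author = {
    "_id": "Patel B",
    "full_name": "Bala Patel",
    "laboratory": "CB",
    "mail": ["Laboratory of Molecular Biology", "Hills Road, Cambridge"],
    "fax": "050 3456789",
    "papers": [
        {"id": "cgc1011"},
        {
            "id": "cgc533",
            "title": "Yet more of those Genes",
            "journal": "Cell Reports",
            "volume": 3,
            "year": 1993,
        },
    ],
}


def papers_from_year(doc, year):
    """Return the ids of embedded papers published in the given year."""
    return [p["id"] for p in doc["papers"] if p.get("year") == year]


# The whole record serializes to JSON without any normalization step.
print(json.dumps(author, indent=2))
print(papers_from_year(author, 1993))
```

With a driver such as pymongo, a dict like this could be handed to `collection.insert_one(author)` as is; whether to embed the papers or keep them in their own collection is a design choice the slides do not settle.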
 
Challenges in *omics - Where can MongoDB play a role?
explosion of data every  researcher must be able to handle data
low stepping stone for bench-based scientists big data
 
Takeoff within research community?
Thank you! Questions? [email_address] @jandot http://saaientist.blogspot.com


Editor's notes

  1. Not an expert. Reason will be explained further in the presentation.
  2. Personal ideas/opinions; not necessarily Sanger’s.
  3. My background
  4. There are hurdles for adoption in academia.
  5. Many people in the institute; many ways of doing things + many tools.
  6. Data is often transitory. Apart from the raw sequencing data (served by e.g. EBI), data can often be archived once the paper is written.
  7. Because transitory. A one-off script that takes 5 minutes to write and a day to run is often preferable to one that takes a day to write and 5 minutes to run.
  8. Because we want to focus on the research, not the tools. If the available tools get the work done they will suffice.
  9. In many smaller labs: data management is not part of the initial grants. Is starting to change with the next-generation sequencing data.
  10. “Genome hacker”: very broad. From guy-who-knows-how-to-record-macros-in-Word to hardcore mathematicians.
  11. “need IT support for heavier work”: set up a MongoDB server => what if you need a sharded cluster? => investment from IT. “creating legacy”: if it's something that will be used after you're gone (typical contract: postdoc = 3-5 yrs), you don't want to use a technology that is not supported or actively used within the organization. “often self-taught”!
  12. “normalization?”: overkill to try and persuade them to use databases if you have to teach them normal forms.
  13. What does the data look like?
  14. Very difficult to parse without custom libraries (bio*)
  15. “//” => start of new record
  16. State of the art. Is tab-delimited, but not really.
  17. “##”: header lines; “#”: column headers. INFO field: ‘;’-separated tag-value pairs (tag and value themselves separated by ‘=’). FORMAT field: colon-separated; necessary to know what is in the NA00001 column.
  18. Not really tab-delimited anymore because too structured. Self-taught => simple scripting languages!
  19. New technologies + existing technologies improved + decreasing cost of data generation
  20. Would benefit most. “bench-based scientists”: more and more of them are learning Perl and working with tab-delimited files; to go from Excel to a database, JSON looks more like how they think than having to cope with normalization steps in a relational database. “big data”: auto-sharding, MapReduce, …
  21. Inroad into research: via the department bioinformatician, who is constantly looking for new things. Least effort to implement and least costly in case of failure.
  22. Focus is often on data-exchange => a lot of effort on exchange file formats
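
Note 17 describes a layout that is tab-delimited on the outside but nested on the inside: the INFO field holds ‘;’-separated tag=value pairs, and the FORMAT field names the colon-separated values in each sample column. A minimal Python sketch of turning one such line into a JSON-like document might look as follows. The function names `parse_info` and `parse_vcf_line` and the example line are illustrative assumptions, and real VCF files have further cases (multiple ALT alleles, missing values) that this sketch ignores.

```python
def parse_info(info):
    """Split the INFO field: ';'-separated entries, 'tag=value' or bare flags."""
    parsed = {}
    for entry in info.split(";"):
        tag, sep, value = entry.partition("=")
        parsed[tag] = value if sep else True  # a bare tag is treated as a flag
    return parsed


def parse_vcf_line(header_line, data_line):
    """Turn one VCF data line into a nested dict, using the '#' column header."""
    columns = header_line.lstrip("#").rstrip("\n").split("\t")
    values = data_line.rstrip("\n").split("\t")
    # The first eight columns are fixed: CHROM through INFO.
    record = dict(zip(columns[:8], values[:8]))
    record["POS"] = int(record["POS"])
    record["INFO"] = parse_info(record["INFO"])
    # FORMAT (ninth column) names the colon-separated keys of every sample column.
    format_keys = values[8].split(":")
    record["samples"] = {
        sample: dict(zip(format_keys, sample_value.split(":")))
        for sample, sample_value in zip(columns[9:], values[9:])
    }
    return record


# Illustrative header and data line with a single sample column, NA00001.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001"
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ\t0|0:48:1:51,51"

record = parse_vcf_line(header, line)
print(record["INFO"]["DP"])
print(record["samples"]["NA00001"]["GT"])
```

The resulting dict keeps the nesting that a flat table would lose, which is the point note 20 makes about JSON being closer to how bench-based scientists think about their data.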