Biogrid – Bioinformatics for the grid

    Joel Hedlund <yohell@ifm.liu.se>
       Biogrid User and Developer
      Linköping University, Sweden

      Birds-of-a-feather session tonight: see me after this talk!
Outline
•   What is it?
•   What is it good for?
•   Does it really work?
•   Gory details.
•   Why did we do this?
•   Profit!
What is it?



NDGF BIO Community Grid
   Bioinformatics for the Grid
What is it?
• Unified interface
  ...to popular bioinformatic applications
  ...on shared, distributed computational resources
  ...using versioned and cached databases
What is it good for?
• Burst computing
  – High demand for short periods of time
     • high during development / production
     • low during analysis / writing papers
  – Share resources to enable more efficient use
• Database accessibility
• Availability
• Unified interface
What is NDGF?
• Nordic Data Grid Facility
• A WLCG Tier1 facility
  – Worldwide LHC Computing Grid
  – Stores and processes data from the LHC at CERN
     • peak rate ≈ 1.6 Gb/s, when the accelerator is running
       (and that’s after most of the data have been filtered away)
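To put the quoted peak rate in perspective, the arithmetic below converts it to a daily volume. It is a rough sketch that assumes "Gb" means decimal gigabits and that the peak rate is sustained for a full day; the slide states neither.

```python
# Rough conversion of the quoted peak rate to a daily data volume.
# Assumptions (not from the slide): Gb = 10**9 bits, rate sustained for 24 h.
PEAK_RATE_GBIT_PER_S = 1.6
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_terabytes(rate_gbit_per_s: float) -> float:
    """Return terabytes (10**12 bytes) moved in one day at the given rate."""
    bits_per_day = rate_gbit_per_s * 1e9 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e12


if __name__ == "__main__":
    print(f"{daily_volume_terabytes(PEAK_RATE_GBIT_PER_S):.2f} TB/day")
```

Under those assumptions the peak rate works out to roughly 17 TB per day.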
”Does it really work, this
  distributed thingie?”
 Why yes, very well thank you!
NDGF
• 96% availability
  (highest of all Tier1 facilities)

• Third largest Tier1 facility in the world
• Lowest ratio of failed ATLAS jobs
• Production goals met, and beyond
   – Goal: 8% of all ATLAS resources (10.5% provided)
   – Goal: 9% of all ALICE resources (12% provided)




* Data graciously stolen from Leif Nixon's NorduNet 2008 talk. Thank you, Leif :-)
DISTRIBUTION IS A STRENGTH
It enforces unification

It ensures availability
Does it really work?


 It’s good enough for LHC.
It’s good enough for Bioinformatics.
Gory details
Biogrid provides
Optimised applications:
  – BLAST
  – ClustalW
  – HMMER
  – Muscle
  – Mafft




                          Planned: molecular dynamics, phylogeny...
Biogrid provides
Versioned, indexed and cached databases
  – UniProtKB (subreleases)
  – Uniref (subreleases)




                       Planned: genomes (EnsEMBL), nucleotides (EMBL)...
Cached database access




Database files are transferred to the cluster at most once per project.
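The "at most once per project" rule can be pictured as a small download cache keyed by project and file name. The sketch below is only an illustration: ProjectCache, its directory layout, and the fetch callback are hypothetical names, not part of Biogrid or ARC.

```python
import tempfile
from pathlib import Path
from typing import Callable


class ProjectCache:
    """Hypothetical per-project cache: each database file is fetched once."""

    def __init__(self, root: Path, fetch: Callable[[str, Path], None]):
        self.root = root      # cluster-local cache directory (assumed layout)
        self.fetch = fetch    # callback that copies a remote file to a path

    def get(self, project: str, url: str) -> Path:
        """Return the local copy of url, transferring it only on a cache miss."""
        target = self.root / project / url.rsplit("/", 1)[-1]
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            self.fetch(url, target)
        return target


if __name__ == "__main__":
    transfers = []

    def fake_fetch(url: str, dest: Path) -> None:
        # Stand-in for a real transfer: records the call and writes a stub.
        transfers.append(url)
        dest.write_text("stub database contents\n")

    with tempfile.TemporaryDirectory() as tmp:
        cache = ProjectCache(Path(tmp), fake_fetch)
        url = "srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz"
        cache.get("mdr", url)
        cache.get("mdr", url)  # second request is served from the cache
        print("transfers:", len(transfers))
```

Keying on the project name means a second project triggers its own transfer, which matches the wording on the slide but is a guess about how the real cache is scoped.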
Unified Interface
[Figure: DATA and RESULTS]
Unified Interface
• XRSL Job Description
  Standard in ARC Grid Middleware

• Well defined runtime environments
   $HMMERDIR: node local (fast) scratch dir containing db files
   prepare_db: download and unpack db files on the fly from front node to $HMMERDIR
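As a rough picture of what a helper like prepare_db has to do, the sketch below unpacks a gzipped database file into a scratch directory. It is a hypothetical stand-in written for illustration: the function name unpack_db and its behaviour are assumptions, and the real Biogrid script may work differently.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def unpack_db(archive: Path, scratch_dir: Path) -> Path:
    """Decompress a .gz archive into scratch_dir and return the new path.

    Illustrative only: this is not the actual Biogrid prepare_db.
    """
    scratch_dir.mkdir(parents=True, exist_ok=True)
    target = scratch_dir / archive.stem  # "sp.gz" -> "sp"
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are not held in memory
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "sp.gz"
        with gzip.open(archive, "wb") as handle:
            handle.write(b">example\nMKV\n")
        # On a grid node the target would be the directory named by $HMMERDIR;
        # a temporary directory stands in here so the sketch runs anywhere.
        scratch = Path(tmp) / "scratch"
        print(unpack_db(archive, scratch).read_text())
```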
XRSL Job Description
&(jobName=refinehmm-family023)
(runTimeEnvironment=APPS/BIO/HMMER2.3.2)
(cpuTime=3000)
(executable=refinehmm.jobscript.sh)
(inputFiles=
  (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz)
  (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz)
  (family023.hmm "")
)
(outputFiles=
  (family023.refined.hmm "")
)
XRSL Job Description
&(jobName=refinehmm-$HMM_NAME)
(runTimeEnvironment=APPS/BIO/HMMER2.3.2)
(cpuTime=3000)
(executable=refinehmm.jobscript.sh)
(inputFiles=
  (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz)
  (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz)
  ($HMM_NAME.hmm "")
)
(outputFiles=
  ($HMM_NAME.refined.hmm "")
)
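One way to turn the templated job description above into one file per HMM is plain string substitution. The sketch below uses Python's string.Template, whose $NAME placeholders happen to match the slide's notation; render_job and the example family name are illustrative assumptions, not part of Biogrid.

```python
from string import Template

# Template mirroring the slide; $HMM_NAME is the only placeholder.
# The leading & is xRSL's conjunction operator.
XRSL_TEMPLATE = Template('''\
&(jobName=refinehmm-$HMM_NAME)
(runTimeEnvironment=APPS/BIO/HMMER2.3.2)
(cpuTime=3000)
(executable=refinehmm.jobscript.sh)
(inputFiles=
  (sp.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_sprot.fasta.gz)
  (tr.gz srm://srm.ndgf.org/biogrid/db/uniprot/UniProt14.8/uniprot_trembl.fasta.gz)
  ($HMM_NAME.hmm "")
)
(outputFiles=
  ($HMM_NAME.refined.hmm "")
)
''')


def render_job(hmm_name: str) -> str:
    """Fill in the template for one HMM; raises KeyError if a key is missing."""
    return XRSL_TEMPLATE.substitute(HMM_NAME=hmm_name)


if __name__ == "__main__":
    print(render_job("family023"))
```

Using substitute rather than safe_substitute makes a forgotten placeholder fail loudly instead of producing a job file that still contains a literal $NAME.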
Unified Interface
• Run on any resource I can access:
  $ ngsub myjob.xrsl

• ...or run on my buddy’s cluster:
  $ ngsub -c kiniini.csc.fi myjob.xrsl

• Check jobs:
  $ ngstat refinehmm-family023
  (or use the Grid Monitor web interface at www.nordugrid.org)

• Fetch results:
  $ ngget refinehmm-family*



[Figure: DATA, GRID, RESULTS]
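When many jobs are submitted from a script, the two ngsub invocations above can be built programmatically. The sketch below only assembles the argument lists and prints them as a dry run; ngsub and its -c option come from the slide, while the helper name ngsub_command is hypothetical.

```python
import shlex
from typing import List, Optional


def ngsub_command(xrsl_file: str, cluster: Optional[str] = None) -> List[str]:
    """Build the argv for ngsub, optionally pinned to one cluster via -c."""
    argv = ["ngsub"]
    if cluster is not None:
        argv += ["-c", cluster]
    argv.append(xrsl_file)
    return argv


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them, so this works
    # without an ARC client installed.
    print(shlex.join(ngsub_command("myjob.xrsl")))
    print(shlex.join(ngsub_command("myjob.xrsl", cluster="kiniini.csc.fi")))
```

To actually submit, the returned list could be handed to subprocess.run(argv, check=True) on a machine with the ARC client installed.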
What do I need?
    1. A resource with ARC and Biogrid REs (runtime environments)
    2. An ARC client
    3. A Grid Certificate
       (available from a number of global certificate authorities)
    4. Time allowance on the resource
    5. (Optional) Biogrid VO Membership
       Not really necessary, but it will get you 1 & 4
...or you can just grab the RE scripts off the Biogrid website,
and your db of choice from the Biogrid dCache.
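The client-side items on the checklist (an ARC client and a grid certificate) can be checked before submitting anything. The sketch below is a minimal illustration under stated assumptions: the function name missing_prerequisites is made up, and ~/.globus/usercert.pem is a commonly used certificate location that the slides do not mention. Access to a resource and a time allowance cannot be verified this way.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def missing_prerequisites(
    cert_path: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable names of client-side prerequisites not found."""
    missing = []
    if which("ngsub") is None:
        missing.append("ARC client (ngsub not on PATH)")
    if not cert_path.is_file():
        missing.append(f"grid certificate ({cert_path} not found)")
    return missing


if __name__ == "__main__":
    # Assumed default location; adjust to wherever your certificate lives.
    default_cert = Path.home() / ".globus" / "usercert.pem"
    for item in missing_prerequisites(default_cert):
        print("missing:", item)
```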
Why did we do this?
Bioinformatic applications...
  – CPU intensive
  – Small input and output files
  – ”Large” databases can be cached

...are very well suited for distributed computing.
Profit!
Subclassification of the MDR superfamily

• 15000 members
    from all kingdoms of life

• 500 families
    25% sequence identity

•   40 human members
•   Different substrate specificities
•   Different subunit & cofactor count
•   2 HMMs available for superfamily detection
•   None for any of the individual families
Subclassification of the MDR superfamily

• We made HMMs for all MDR (sub)families
  with 20+ members.
• 86 families
• 34 subfamilies detected in 14 of these
• 11579 / 15000 sequences classified
• ≈5000 hmmsearch runs vs UniProtKB



                                Manuscript in preparation
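The classification figures above translate into a coverage fraction as follows. This is plain arithmetic on the two numbers from the slide; treating 15000 as an exact total is an assumption, since the earlier slide gives it as a round superfamily size.

```python
CLASSIFIED = 11579  # sequences assigned to a family (from the slide)
TOTAL = 15000       # superfamily size as quoted on the slide (assumed exact)


def fraction_classified(classified: int, total: int) -> float:
    """Return the share of sequences that received a family assignment."""
    return classified / total


if __name__ == "__main__":
    share = fraction_classified(CLASSIFIED, TOTAL)
    print(f"{share:.1%} classified, {TOTAL - CLASSIFIED} unclassified")
```

That is about 77% of the superfamily, leaving 3421 sequences without a family assignment.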
refinehmm
• Algorithm for automated HMM refinement
• Produces stable and reliable HMMs
• Developed using Biogrid REs and resources




                Will also be open source software once the paper is out.
Acknowledgements
  • Olli Tourunen
    Biogrid developer
  • Bengt Persson
    Biogrid PI
  • NDGF
    Michael Grønager
    Josva Kleist
  • Biogrid co-applicants
    Ann-Charlotte Berglund Sonnhammer
    Erik Sonnhammer
    Inge Jonassen

Supercomputing centers
  • NSC
    Jens Larsson, Leif Nixon
  • HPC2N
    Åke Sandgren
  • Others
    C3SE, CSC, Uppmax, Lunarc, PDC,
    Aalborg University, Oslo University
