Bioconductor with Python, What else?
                ISMB / BOSC


     Laurent Gautier [laurent@cbs.dtu.dk]

                  DMAC / CBS


               July 10th, 2010




Disclaimer
  • This is not about the comparative merits of scripting
    languages
  • This is about being able to natively access libraries
    implemented in a different language




About Bioconductor



    • Set of open-source packages for R
    • Started circa 2002 with a focus on microarrays
    • Rooted in statistics, data analysis, and visualization
    • Several hundred packages, addressing NGS, HTS, flow
      cytometry, protein-protein interactions, ...
    • Biannual releases
    • Presence on the publication circuit (> 2,300 citations for
      the BioC publication, > 600 for limma, > 500 for affy)




About Python


    • Simple and clear all-purpose scripting language
    • Sometimes used in introductions to programming
    • Popular for agile development
    • Bioinformatics libraries:
        • biopython (libraries for bioinformatics)
        • galaxy (web front-end to pipelines)
        • PyCogent, pygr, bx-python (biological sequences-oriented)
    • Large selection of libraries:
        • Web development: Zope, Django, Google App Engine
        • Scientific computing: Scipy / Numpy
        • Cloud computing: Disco, execnet
        • Interface with C: ctypes, Cython




A view on R/bioconductor and Python in bioinformatics
  [Figure: diagram placing R/Bioconductor and Python among bioinformatics
   tasks and communities. Labels in the figure: flow cytometry, proteomics,
   other assays; bioinformatics data; annotation; NGS; microarray; samples;
   visualization; statistical analysis; interactive programming; automation;
   storage / retrieval; data storage / retrieval; web; non-interactive
   abilities; algorithm development; scientific computing.
   Communities: statisticians, biologists, physicists, computer scientists.
   Caption: Python is an all-purpose scripting language.]

Running R code from Python (an example)
  Aim
  Running edgeR from Python

  Method
    Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR:
    a Bioconductor package for differential expression analysis
    of digital gene expression data. Bioinformatics 26, 139-140


  Data
                                          Control                          Treated
                            lane1    lane2     lane3     lane4    lane5     lane6    lane8
         ENSG00000230758         0        0          1        0        0         0        0
         ENSG00000182463         0        2          4        1        5         5        0
         ENSG00000124208       82      124        102      136       90       120       40
         ENSG00000230753         0        0          0        3        0         0        0
         ENSG00000224628         7        8          8      18         8         7        1
         ENSG00000125835      138      209        227      295      281       220       54
         ENSG00000125834       25       31         48       56       67        61       15
         ENSG00000197818       17       27         16       26       41        39         9
         ENSG00000243473         0        0          0        2        0         0        0
         ENSG00000226325         0        0          2        0        3         1        0
                      ...      ...      ...        ...      ...      ...       ...      ...
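
A table like the one above has to be turned into a count matrix and one group label per lane before it can be handed to edgeR. The following is a minimal sketch of that step using only the standard library. The function names, the whitespace-separated layout, and the lane-to-group mapping are assumptions made for illustration: the slides do not show how `counts` and `grp` were built, nor exactly which lanes are Control and which are Treated.

```python
import io


def read_count_table(handle, lane_groups):
    """Parse a whitespace-separated count table.

    Returns (gene_ids, lanes, counts, groups), where counts holds one row
    per gene and groups gives the group label of each lane.
    """
    lanes = handle.readline().split()
    gene_ids, counts = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        gene_ids.append(fields[0])
        counts.append([int(value) for value in fields[1:]])
    groups = [lane_groups[lane] for lane in lanes]
    return gene_ids, lanes, counts, groups


def column_sums(counts):
    """Library size per lane: a pure-Python analogue of R's colSums()."""
    return [sum(column) for column in zip(*counts)]


# Three rows copied from the table above.
EXAMPLE = """\
lane1 lane2 lane3 lane4 lane5 lane6 lane8
ENSG00000124208 82 124 102 136 90 120 40
ENSG00000125835 138 209 227 295 281 220 54
ENSG00000125834 25 31 48 56 67 61 15
"""

# Hypothetical lane-to-group assignment (a guess at the slide's layout).
LANE_GROUPS = {"lane1": "Control", "lane2": "Control", "lane3": "Control",
               "lane4": "Control", "lane5": "Treated", "lane6": "Treated",
               "lane8": "Treated"}

gene_ids, lanes, counts, groups = read_count_table(io.StringIO(EXAMPLE),
                                                   LANE_GROUPS)
lib_sizes = column_sums(counts)
print(dict(zip(lanes, lib_sizes)))
```

In the rpy2 setting the nested lists would still need to be converted to an R matrix and the labels to an R factor; that conversion is left out here.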



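# importr exposes an R package as a Python module
from rpy2.robjects.packages import importr
# Python interface to the edgeR package
from bioc import edger

base = importr('base')

# counts and grp are assumed to be defined beforehand
# (count table and group label of each lane)
summarized = edger.DGEList.new(counts = counts,
                               lib_size = base.colSums(counts),
                               group = grp)
disp = edger.estimateCommonDisp(summarized)

tested = edger.exactTest(disp)

results = edger.topTags(tested)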


                                logConc    logFC    PValue   FDR
              ENSG00000127954     -31.03    37.97     0.00   0.00
              ENSG00000151503     -12.96     5.40     0.00   0.00
              ENSG00000096060     -11.78     4.90     0.00   0.00
              ENSG00000091879     -15.36     5.77     0.00   0.00
              ENSG00000132437     -14.15    -5.90     0.00   0.00
              ENSG00000166451     -12.62     4.57     0.00   0.00
              ENSG00000131016     -14.80     5.27     0.00   0.00
              ENSG00000163492     -17.28     7.30     0.00   0.00
              ENSG00000113594     -12.25     4.05     0.00   0.00
              ENSG00000116285     -13.02     4.11     0.00   0.00



R code / Python code
  library(edgeR)
  summarized <- DGEList(counts = counts,
                        lib.size = colSums(counts),
                        group = grp)
  disp <- estimateCommonDisp(summarized)

  from rpy2.robjects.packages import importr
  base = importr('base')
  from bioc import edger

  summarized = edger.DGEList.new(counts = counts,
                                 lib_size = base.colSums(counts),
                                 group = grp)

  disp = edger.estimateCommonDisp(summarized)



  Note:
    • explicit in searching through namespaces
    • call R functions as native Python functions
    • use R objects as Python objects
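
The R and Python listings differ in one mechanical detail: `lib.size` in R becomes `lib_size` in Python, since a dot cannot appear in a Python identifier. The sketch below illustrates that kind of name translation with the standard library only. The function names are made up for this example, and it is an illustration of the idea rather than rpy2's actual implementation.

```python
def r_to_py_name(r_name):
    """Translate an R identifier into a valid Python one ('.' becomes '_')."""
    return r_name.replace(".", "_")


def build_name_map(r_names):
    """Map translated Python names back to R names, refusing ambiguous cases."""
    mapping = {}
    for r_name in r_names:
        py_name = r_to_py_name(r_name)
        if py_name in mapping and mapping[py_name] != r_name:
            raise ValueError("%r and %r both translate to %r"
                             % (mapping[py_name], r_name, py_name))
        mapping[py_name] = r_name
    return mapping


def translate_kwargs(kwargs, name_map):
    """Rename Python keyword arguments to the R names the callee expects."""
    return {name_map.get(key, key): value for key, value in kwargs.items()}


# Argument names taken from the DGEList() call above.
name_map = build_name_map(["counts", "lib.size", "group"])
r_kwargs = translate_kwargs(
    {"counts": [[1, 2]], "lib_size": [1, 2], "group": ["a", "b"]},
    name_map)
print(sorted(r_kwargs))
```

Raising an error on a clash such as `a.b` versus `a_b` keeps the mapping explicit, in the spirit of the first note above.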
Bioconductor library IRanges




Bioconductor library Biostrings




Separate communities




Bilingual community




Interpreters/Translators




Cost of translation

    R package         Lines of code    Python module
    AnnotationDbi          168         annotationdbi.py
    Biobase                341         biobase.py
    Biostrings             591         biostrings.py
    BSgenome               112         bsgenome.py
    edgeR                  107         edger.py
    GEOquery               102         geoquery.py
    GGbase                 104         ggbase.py
    GGtools                 77         ggtools.py
    goseq                   43         goseq.py
    GSEABase               149         gseabase.py
    IRanges                295         iranges.py
    ShortRead              301         shortread.py
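
To put the table in perspective, the figures can be summed up in a few lines. The numbers below are copied from the table; the variable names are only for illustration.

```python
# Lines of Python code per wrapped R package, as listed in the table above.
LINES_OF_CODE = {
    "AnnotationDbi": 168, "Biobase": 341, "Biostrings": 591,
    "BSgenome": 112, "edgeR": 107, "GEOquery": 102,
    "GGbase": 104, "GGtools": 77, "goseq": 43,
    "GSEABase": 149, "IRanges": 295, "ShortRead": 301,
}

total = sum(LINES_OF_CODE.values())
mean = total / len(LINES_OF_CODE)
largest = max(LINES_OF_CODE, key=LINES_OF_CODE.get)
print(total, round(mean, 1), largest)
```

The twelve modules add up to 2,390 lines, roughly 200 lines per package, with Biostrings the largest.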


R within Python
  • R is running as embedded into Python
  • R objects remain in the R workspace, but can be accessed
    from Python
  • Python-level shells to access the R objects
  • The rpy2 package is used to achieve this


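from rpy2.robjects import conversion
from rpy2.robjects.packages import importr

biostrings = importr('Biostrings')

# XString and _setExtractDelegators are assumed to be defined
# elsewhere in the same module
class AAString(XString):
    _aastring_constructor = biostrings.AAString

    @classmethod
    def new(cls, x):
        """ :param x: a string of amino-acids """
        # convert the Python string to an R object, build the R-level
        # AAString, then wrap it in the Python class
        res = cls(cls._aastring_constructor(conversion.py2ri(x)))
        _setExtractDelegators(res)
        return res

aas = AAString.new("PROTEIN")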
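
The points above describe a proxy arrangement: the data stays on the R side and Python holds a thin object that forwards operations to it. The sketch below mimics that arrangement with the standard library only. `FakeWorkspace` stands in for the embedded R process and every name in it is hypothetical, so it shows the pattern rather than how rpy2 implements it.

```python
class FakeWorkspace:
    """Stand-in for the R workspace: owns the objects, hands out handles."""

    def __init__(self):
        self._objects = {}
        self._next_handle = 0

    def create(self, value):
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = value
        return handle

    def call(self, function, handle, *args):
        # The operation runs where the data lives; only the result crosses over.
        return function(self._objects[handle], *args)


class StringProxy:
    """Python-level shell around an object kept in the workspace."""

    def __init__(self, workspace, handle):
        self._workspace = workspace
        self._handle = handle

    @classmethod
    def new(cls, workspace, text):
        """Create the object on the workspace side and wrap its handle."""
        return cls(workspace, workspace.create(text))

    def __len__(self):
        return self._workspace.call(len, self._handle)

    def __getitem__(self, index):
        return self._workspace.call(lambda obj, i: obj[i], self._handle, index)


workspace = FakeWorkspace()
aas = StringProxy.new(workspace, "PROTEIN")
print(len(aas), aas[0:3])
```

The proxy itself stores only a handle, which is why large R objects need not be copied into Python to be used from it.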


What is needed to continue



  More interpreters/translators
    • Many bioconductor packages.
    • Keep up-to-date existing translations.


  Keeping up-to-date
    • Frequent API-breaking changes in bioconductor
    • Tailored interfaces increase maintenance
    • Meta-programming and reflection can alleviate this




Example with meta-programming:


  import rpy2.robjects.methods

  class AssayData(rpy2.robjects.methods.RS4):
      """ Abstract class. That class is a ClassUnionRepresentation
       in R, which is a way to create a parent class for existing
       classes. This is currently not modelled in Python. """
      __rname__ = 'AssayData'
      __metaclass__ = rpy2.robjects.methods.RS4_Type

      __accessors__ = (('featureNames', 'Biobase', 'featurenames',
                        True, 'maps Biobase::featureNames'),
                       ('sampleNames', 'Biobase', 'samplenames',
                        True, 'maps Biobase::sampleNames'),
                       ('storageMode', 'Biobase', 'storagemode',
                        True, 'maps Biobase::storageMode')
                       )
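
To make the mechanism more concrete, here is a small standard-library sketch of a metaclass that reads an `__accessors__` declaration and generates the corresponding Python attributes. It assumes each tuple means (R name, R package, Python name, expose as property, docstring), which is how the example above reads. `FAKE_R_FUNCTIONS`, `AccessorType` and `AssayDataSketch` are hypothetical names, the R side is replaced by plain Python callables, and Python 3 metaclass syntax is used instead of `__metaclass__`.

```python
# Stand-in for functions exported by R packages, keyed by (package, name).
# In the real setting these would be R functions reached through rpy2.
FAKE_R_FUNCTIONS = {
    ("Biobase", "featureNames"): lambda obj: obj["features"],
    ("Biobase", "sampleNames"): lambda obj: obj["samples"],
}


class AccessorType(type):
    """Metaclass turning __accessors__ entries into properties or methods."""

    def __new__(mcs, name, bases, namespace):
        for entry in namespace.get("__accessors__", ()):
            r_name, package, py_name, as_property, doc = entry
            namespace[py_name] = mcs._make_accessor(
                r_name, package, as_property, doc)
        return super().__new__(mcs, name, bases, namespace)

    @staticmethod
    def _make_accessor(r_name, package, as_property, doc):
        def accessor(self):
            # Look the function up at call time and apply it to the data.
            return FAKE_R_FUNCTIONS[(package, r_name)](self._data)
        accessor.__doc__ = doc
        return property(accessor, doc=doc) if as_property else accessor


class AssayDataSketch(metaclass=AccessorType):
    __accessors__ = (
        ("featureNames", "Biobase", "featurenames", True,
         "maps Biobase::featureNames"),
        ("sampleNames", "Biobase", "samplenames", False,
         "maps Biobase::sampleNames"),
    )

    def __init__(self, data):
        self._data = data


assay = AssayDataSketch({"features": ["ENSG00000124208"],
                         "samples": ["lane1", "lane2"]})
print(assay.featurenames, assay.samplenames())
```

Because the class body only declares what to map, following an upstream rename means editing one tuple rather than rewriting a method.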




Example of a complete application
  A web server to run edgeR.
  from bottle import abort, request, route, run
  # read_count_data is assumed to live in my_edger with the other helpers
  from my_edger import get_toptags, make_results_page, read_count_data

  @route('/')
  def index():
      return '''
  <html> <body>
  <form action="/edger" method="post" enctype="multipart/form-data">
  <input type="file" name="data" /> <input type="submit" /> </form>
  </body> </html>'''

  @route('/edger', method='POST')
  def run_edger():
      data = request.files.get('data')
      if data:
          counts, grp = read_count_data(data.file.name)
          top_tags = get_toptags(counts, grp)
          return make_results_page(top_tags)
      else:
          abort(404, "Invalid count file.")


  run(host='localhost', port=8080)
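
The slide imports `make_results_page` from `my_edger` without showing it. The following is a guess at what such a helper might look like, written with the standard library only. The input format (a list of gene identifier and value-dictionary pairs) and the `columns` parameter are assumptions; in the real application the top tags would come back as an R data frame and would need converting first.

```python
from html import escape


def make_results_page(top_tags, columns=("logConc", "logFC", "PValue", "FDR")):
    """Render rows of (gene_id, {column: value}) as a minimal HTML page."""
    header = "".join("<th>%s</th>" % escape(name)
                     for name in ("gene",) + tuple(columns))
    rows = []
    for gene_id, values in top_tags:
        # Escape anything derived from the uploaded file before embedding it.
        cells = ["<td>%s</td>" % escape(str(gene_id))]
        cells.extend("<td>%.2f</td>" % values[name] for name in columns)
        rows.append("<tr>%s</tr>" % "".join(cells))
    return ("<html><body><table><tr>%s</tr>%s</table></body></html>"
            % (header, "".join(rows)))


# Two rows copied from the topTags output shown earlier.
page = make_results_page([
    ("ENSG00000151503",
     {"logConc": -12.96, "logFC": 5.40, "PValue": 0.00, "FDR": 0.00}),
    ("ENSG00000132437",
     {"logConc": -14.15, "logFC": -5.90, "PValue": 0.00, "FDR": 0.00}),
])
print(page)
```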

Acknowledgements
   • Users, and communities from R, Bioconductor, Python,
      Biopython
   • (Vincent Davis, Nicolas Rapin, Brad Chapman)

URLs
http://pypi.python.org/pypi/rpy2-bioconductor-extensions/

http://bitbucket.org/lgautier/rpy2-bioc-extensions

http://packages.python.org/rpy2-bioconductor-extensions/

http://rpy2.sourceforge.net/




Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phylo
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptx
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadi
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkit
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 

Kürzlich hochgeladen

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Kürzlich hochgeladen (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Gautier bosc2010 pythonbioconductor

  • 1. Bioconductor with Python, What else? ISMB / BOSC. Laurent Gautier [laurent@cbs.dtu.dk], DMAC / CBS. July 10th, 2010
  • 2. Disclaimer
      • This is not about the comparative merits of scripting languages
      • This is about being able to natively access libraries implemented in a different language
  • 3. About Bioconductor
      • Set of open-source packages for R
      • Started circa 2002 with a focus on microarrays
      • Rooted in statistics, data analysis, and visualization
      • Several hundred packages, addressing NGS, HTS, flow cytometry, protein-protein interactions, ...
      • Biannual releases
      • Presence on the publication circuit (> 2,300 citations for the BioC publication, > 600 for limma, > 500 for affy)
  • 4. About Python
      • Simple and clear all-purpose scripting language
      • Sometimes used in introductions to programming
      • Popular for agile development
      • Bioinformatics libraries:
          • biopython (libraries for bioinformatics)
          • galaxy (web front-end to pipelines)
          • PyCogent, pygr, bx-python (biological sequences-oriented)
      • Large selection of libraries:
          • Web development: Zope, Django, Google App Engine
          • Scientific computing: Scipy / Numpy
          • Cloud computing: Disco, execnet
          • Interface with C: ctypes, Cython
  • 5. A view on R/bioconductor and Python in bioinformatics
      [Figure: diagram relating R/Bioconductor and Python to bioinformatics data, abilities, and communities. Labels recoverable from the figure: Bioinformatics data (Samples, Microarray, NGS, Flow-cytometry, proteomics, other assays, ...); abilities (Automation, Annotation, Storage / Retrieval, Data storage / retrieval, Visualization, Statistical analysis, Algorithm development, Web, Interactive programming, Non-interactive, Scientific computing); Communities (Biologists, Statisticians, Physicists, Computer Scientists).]
      Python is an all-purpose scripting language.
  • 8. Running R code from Python (an example)
      Aim: running edgeR from Python
      Method: Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
      Data (lanes are grouped as Control and Treated):
                         lane1  lane2  lane3  lane4  lane5  lane6  lane8
      ENSG00000230758        0      0      1      0      0      0      0
      ENSG00000182463        0      2      4      1      5      5      0
      ENSG00000124208       82    124    102    136     90    120     40
      ENSG00000230753        0      0      0      3      0      0      0
      ENSG00000224628        7      8      8     18      8      7      1
      ENSG00000125835      138    209    227    295    281    220     54
      ENSG00000125834       25     31     48     56     67     61     15
      ENSG00000197818       17     27     16     26     41     39      9
      ENSG00000243473        0      0      0      2      0      0      0
      ENSG00000226325        0      0      2      0      3      1      0
      ...                  ...    ...    ...    ...    ...    ...    ...
  • 9. Python code and its output
      from rpy2.robjects.packages import importr
      from bioc import edger
      base = importr('base')
      summarized = edger.DGEList.new(counts = counts,
                                     lib_size = base.colSums(counts),
                                     group = grp)
      disp = edger.estimateCommonDisp(summarized)
      tested = edger.exactTest(disp)
      results = edger.topTags(tested)

                        logConc   logFC  PValue    FDR
      ENSG00000127954    -31.03   37.97    0.00   0.00
      ENSG00000151503    -12.96    5.40    0.00   0.00
      ENSG00000096060    -11.78    4.90    0.00   0.00
      ENSG00000091879    -15.36    5.77    0.00   0.00
      ENSG00000132437    -14.15   -5.90    0.00   0.00
      ENSG00000166451    -12.62    4.57    0.00   0.00
      ENSG00000131016    -14.80    5.27    0.00   0.00
      ENSG00000163492    -17.28    7.30    0.00   0.00
      ENSG00000113594    -12.25    4.05    0.00   0.00
      ENSG00000116285    -13.02    4.11    0.00   0.00
  • 10. R code / Python code
      R code:
      library(edgeR)
      summarized <- DGEList(counts = counts,
                            lib.size = colSums(counts),
                            group = grp)
      disp <- estimateCommonDisp(summarized)

      Python code:
      from rpy2.robjects.packages import importr
      base = importr('base')
      from bioc import edger
      summarized = edger.DGEList.new(counts = counts,
                                     lib_size = base.colSums(counts),
                                     group = grp)
      disp = edger.estimateCommonDisp(summarized)

      Note:
      • namespaces are searched explicitly
      • R functions are called as native Python functions
      • R objects are used as Python objects
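The R argument `lib.size` appears as `lib_size` on the Python side because a dot is not valid inside a Python identifier. The following is a minimal pure-Python sketch of that kind of name translation; the helper names `r_to_python_name` and `build_name_map` are hypothetical and this is not rpy2's own code, only an illustration of the idea.

```python
def r_to_python_name(r_name):
    """Turn an R symbol into a Python-friendly name (dots become underscores)."""
    return r_name.replace(".", "_")


def build_name_map(r_names):
    """Map translated Python names back to R names, refusing ambiguous cases.

    Hypothetical helper: two R names such as 'lib.size' and 'lib_size'
    would collide after translation, so the sketch raises instead of guessing.
    """
    mapping = {}
    for r_name in r_names:
        py_name = r_to_python_name(r_name)
        if py_name in mapping and mapping[py_name] != r_name:
            raise ValueError(
                f"{r_name!r} and {mapping[py_name]!r} both translate to {py_name!r}"
            )
        mapping[py_name] = r_name
    return mapping


# Argument names taken from the DGEList call on the slide.
name_map = build_name_map(["counts", "lib.size", "group"])
print(name_map["lib_size"])
```

Keeping the reverse mapping is what would let a wrapper accept `lib_size=` from Python and still pass `lib.size` to R.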
  • 16. Cost of translation
      R package        Python module       Lines of code
      AnnotationDbi    annotationdbi.py              168
      Biobase          biobase.py                    341
      Biostrings       biostrings.py                 591
      BSgenome         bsgenome.py                   112
      edgeR            edger.py                      107
      GEOquery         geoquery.py                   102
      GGbase           ggbase.py                     104
      GGtools          ggtools.py                     77
      goseq            goseq.py                       43
      GSEABase         gseabase.py                   149
      IRanges          iranges.py                    295
      ShortRead        shortread.py                  301
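To put the table in perspective, the per-module figures can be added up. This short sketch assumes the "lines of code" column counts lines in each Python module, which is how the flattened table reads; the numbers are copied from the slide.

```python
# Lines of code per Python module, as listed on the slide.
lines_of_code = {
    "AnnotationDbi": 168,
    "Biobase": 341,
    "Biostrings": 591,
    "BSgenome": 112,
    "edgeR": 107,
    "GEOquery": 102,
    "GGbase": 104,
    "GGtools": 77,
    "goseq": 43,
    "GSEABase": 149,
    "IRanges": 295,
    "ShortRead": 301,
}

total = sum(lines_of_code.values())
mean = total / len(lines_of_code)
print(f"{len(lines_of_code)} modules, {total} lines in total, {mean:.0f} on average")
```

Under that reading, the twelve translations together come to 2,390 lines, roughly 200 lines per wrapped package.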
  • 17. R within Python
      • R runs embedded in Python
      • R objects remain in the R workspace, but can be accessed from Python
      • Python-level shells give access to the R objects
      • The rpy2 package is used to achieve this

      biostrings = importr('Biostrings')

      class AAString(XString):
          _aastring_constructor = biostrings.AAString

          @classmethod
          def new(cls, x):
              """ :param x: a string of amino-acids """
              res = cls(cls._aastring_constructor(conversion.py2ri(x)))
              _setExtractDelegators(res)
              return res

      aas = AAString("PROTEIN")
  • 18. What is needed to continue
      More interpreters/translators
      • There are many Bioconductor packages
      • Existing translations must be kept up to date
      Keeping up to date
      • Frequent API-breaking changes in Bioconductor
      • Tailored interfaces increase maintenance
      • Meta-programming and reflection can alleviate this
  • 19. Example with meta-programming:
      class AssayData(rpy2.robjects.methods.RS4):
          """ Abstract class. That class is a ClassUnionRepresentation in R,
          which is a way to create a parent class for existing classes.
          This is currently not modelled in Python. """
          __rname__ = 'AssayData'
          __metaclass__ = rpy2.robjects.methods.RS4_Type
          __accessors__ = (('featureNames', 'Biobase', 'featurenames', True,
                            'maps Biobase::featureNames'),
                           ('sampleNames', 'Biobase', 'samplenames', True,
                            'maps Biobase::sampleNames'),
                           ('storageMode', 'Biobase', 'storagemode', True,
                            'maps Biobase::storageMode'))
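The declarative `__accessors__` table above can be read as "generate one Python attribute per R accessor". Below is a self-contained sketch of that idea in Python 3 syntax, with no R involved: `FAKE_R_PACKAGES` and `AccessorType` are hypothetical stand-ins, and the tuple layout (R name, package, Python name, expose as property, docstring) is assumed from the slide rather than taken from rpy2's implementation of `RS4_Type`.

```python
# Stand-in for functions that would live in R packages (hypothetical, pure Python).
FAKE_R_PACKAGES = {
    "Biobase": {
        "featureNames": lambda obj: list(obj.data["features"]),
        "sampleNames": lambda obj: list(obj.data["samples"]),
    }
}


class AccessorType(type):
    """Metaclass that turns an __accessors__ table into methods or properties."""

    def __new__(mcs, name, bases, namespace):
        for r_name, package, py_name, as_property, doc in namespace.get("__accessors__", ()):
            r_function = FAKE_R_PACKAGES[package][r_name]

            # Bind the looked-up function as a default argument so each
            # generated accessor keeps its own target.
            def accessor(self, _f=r_function):
                return _f(self)

            accessor.__name__ = py_name
            accessor.__doc__ = doc
            namespace[py_name] = property(accessor, doc=doc) if as_property else accessor
        return super().__new__(mcs, name, bases, namespace)


class AssayData(metaclass=AccessorType):
    __accessors__ = (
        ("featureNames", "Biobase", "featurenames", True, "maps Biobase::featureNames"),
        ("sampleNames", "Biobase", "samplenames", False, "maps Biobase::sampleNames"),
    )

    def __init__(self, data):
        self.data = data


assay = AssayData({"features": ["ENSG00000230758"], "samples": ["lane1", "lane2"]})
print(assay.featurenames)    # generated as a property
print(assay.samplenames())   # generated as a regular method
```

The point of the design is that when the wrapped package renames or adds an accessor, only the table changes; no hand-written method has to be edited.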
  • 20. Example of a complete application
      A web server to run edgeR.

      # request and abort are needed by the handler below
      from bottle import route, run, request, abort
      # read_count_data is assumed to live in my_edger with the other helpers
      from my_edger import get_toptags, make_results_page, read_count_data

      @route('/')
      def index():
          return '''
          <html> <body>
          <form action="/edger" method="post" enctype="multipart/form-data">
          <input type="file" name="data" />
          <input type="submit" value="Run edgeR" />
          </form>
          </body> </html>'''

      @route('/edger', method='POST')
      def run_edger():
          # the uploaded count file arrives under the form field "data"
          data = request.files.get('data')
          if data:
              counts, grp = read_count_data(data.file.name)
              top_tags = get_toptags(counts, grp)
              return make_results_page(top_tags)
          else:
              abort(404, "Invalid count file.")

      run(host='localhost', port=8080)
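The slide does not show how the uploaded file is turned into `counts` and `grp`. As a rough illustration only, the sketch below parses a tab-delimited table whose first row names the lanes and whose second row gives each lane's group. That file layout, the function name `parse_count_table`, and the group labels in the sample are assumptions made for this example, not the author's format; the count values are taken from the data slide.

```python
import csv
import io


def parse_count_table(handle):
    """Parse a tab-delimited count table into (genes, lanes, groups, counts).

    Assumed layout (hypothetical): row 1 is 'gene' followed by lane names,
    row 2 is 'group' followed by one label per lane, remaining rows are
    a gene identifier followed by integer counts.
    """
    reader = csv.reader(handle, delimiter="\t")
    lanes = next(reader)[1:]
    groups = next(reader)[1:]
    if len(groups) != len(lanes):
        raise ValueError("group row and lane header differ in length")
    genes, counts = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [int(value) for value in row[1:]]
        if len(values) != len(lanes):
            raise ValueError(f"wrong number of counts for {row[0]!r}")
        genes.append(row[0])
        counts.append(values)
    return genes, lanes, groups, counts


# Small in-memory sample; the group labels are illustrative.
sample = io.StringIO(
    "gene\tlane1\tlane2\tlane5\n"
    "group\tControl\tControl\tTreated\n"
    "ENSG00000124208\t82\t124\t90\n"
    "ENSG00000125835\t138\t209\t281\n"
)
genes, lanes, groups, counts = parse_count_table(sample)
print(genes, lanes, groups, counts)
```

A real helper would still need to convert these Python lists into R objects before handing them to edgeR; that step is left out here.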
  • 21. Acknowledgements
      • Users and communities from R, Bioconductor, Python, Biopython
      • (Vincent Davis, Nicolas Rapin, Brad Chapman)
      URLs
      http://pypi.python.org/pypi/rpy2-bioconductor-extensions/
      http://bitbucket.org/lgautier/rpy2-bioc-extensions
      http://packages.python.org/rpy2-bioconductor-extensions/
      http://rpy2.sourceforge.net/