Gautier bosc2010 pythonbioconductor
1. Bioconductor with Python, what else?
ISMB / BOSC
Laurent Gautier [laurent@cbs.dtu.dk]
DMAC / CBS
July 10th, 2010
2. Disclaimer
• This is not about the comparative merits of scripting
languages
• This is about being able to natively access libraries
implemented in a different language
3. About Bioconductor
• Set of open-source packages for R
• Started circa 2002 with a focus on microarrays
• Rooted in statistics, data analysis, and visualization
• Several hundred packages, addressing NGS, HTS, flow
cytometry, protein-protein interactions, ...
• Biannual releases
• Presence on the publication circuit (> 2,300 citations for
the BioC publication, > 600 for limma, > 500 for affy)
4. About Python
• Simple and clear all-purpose scripting language
• Sometimes used in introductions to programming
• Popular for agile development
• Bioinformatics libraries:
• biopython (libraries for bioinformatics)
• galaxy (web front-end to pipelines)
• PyCogent, pygr, bx-python (biological sequences-oriented)
• Large selection of libraries:
• Web development: Zope, Django, Google App Engine
• Scientific computing: Scipy / Numpy
• Cloud computing: Disco, execnet
• Interface with C: ctypes, Cython
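The ctypes entry above is the simplest illustration of the talk's theme, calling a library written in another language directly from Python. The following is a minimal sketch, assuming a POSIX system where the C standard library can be located; the choice of `strlen` is only for illustration.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes a POSIX system).
# If find_library returns None, CDLL(None) falls back to the symbols
# already loaded in the current process, which include libc on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Call the C function on a Python bytes object.
length = libc.strlen(b"Bioconductor")
print(length)
```

The call runs compiled C code in the same process, with no intermediate file or subprocess, which is the same kind of native access the rest of the talk applies to R.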
5. A view on R/bioconductor and Python in bioinformatics
[Figure: diagram positioning R/Bioconductor and Python relative to bioinformatics tasks (samples; microarray, NGS, flow cytometry, proteomics, and other assays; annotation; statistical analysis; visualization; algorithm development; interactive programming; non-interactive abilities; automation; data storage/retrieval; web; scientific computing) and to communities (biologists, statisticians, physicists, computer scientists). Note in figure: "Python is an all-purpose scripting language."]
17. R within Python
• R runs embedded in the Python process
• R objects remain in the R workspace, but can be accessed
from Python
• Python-level shells to access the R objects
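• The rpy2 package is used to achieve this

from rpy2.robjects import conversion
from rpy2.robjects.packages import importr

# XString and _setExtractDelegators come from the rpy2 Bioconductor
# extensions (not shown on the slide)
biostrings = importr('Biostrings')

class AAString(XString):
    _aastring_constructor = biostrings.AAString

    @classmethod
    def new(cls, x):
        """ :param x: a string of amino-acids """
        res = cls(cls._aastring_constructor(conversion.py2ri(x)))
        _setExtractDelegators(res)
        return res

aas = AAString("PROTEIN")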
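The bullets on this slide (R embedded in Python, objects staying in the R workspace but reachable from Python) can be sketched with rpy2's general-purpose `robjects` layer. This is a minimal illustration, not code from the talk; the function name `r_mean_demo` and the variable name `x` are made up for the example, and calling the function assumes that both R and rpy2 are installed.

```python
def r_mean_demo():
    """Compute a mean in the embedded R and return it as a Python float."""
    # Imported here so that this module loads even without rpy2;
    # calling the function requires both R and rpy2 to be installed.
    import rpy2.robjects as robjects

    # Create the vector on the R side and bind it to `x` in R's
    # global environment: the data lives in the R workspace.
    robjects.globalenv["x"] = robjects.IntVector([1, 2, 3, 4])

    # Look up R's own `mean` function and call it on the R object,
    # accessed from Python through its name in the workspace.
    r_mean = robjects.r["mean"]
    result = r_mean(robjects.globalenv["x"])

    # The result is an R numeric vector of length 1; indexing it
    # gives a plain Python float.
    return result[0]
```

Calling `r_mean_demo()` starts the embedded R on first import of `rpy2.robjects`; no separate R process or script file is involved.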
18. What is needed to continue
More interpreters/translators
• Many Bioconductor packages are still to be covered.
• Keep existing translations up to date.
Keeping up to date
• Frequent API-breaking changes in Bioconductor
• Tailored interfaces increase maintenance
• Meta-programming and reflection can alleviate this
19. Example with meta-programming:
import rpy2.robjects.methods

class AssayData(rpy2.robjects.methods.RS4):
    """ Abstract class. That class is a ClassUnionRepresentation
    in R, which is a way to create a parent class for existing
    classes. This is currently not modelled in Python. """
    __rname__ = 'AssayData'
    __metaclass__ = rpy2.robjects.methods.RS4_Type
    __accessors__ = (('featureNames', 'Biobase', 'featurenames',
                      True, 'maps Biobase::featureNames'),
                     ('sampleNames', 'Biobase', 'samplenames',
                      True, 'maps Biobase::sampleNames'),
                     ('storageMode', 'Biobase', 'storagemode',
                      True, 'maps Biobase::storageMode'))
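To show how a metaclass can turn an `__accessors__` table like the one above into attributes, here is a pure-Python sketch. It is not the rpy2 implementation: `FAKE_R_PACKAGES`, `AccessorType`, and `AssayDataSketch` are hypothetical names, the R functions are replaced by plain Python stand-ins, and the field order (R name, R package, Python name, property flag, docstring) is inferred from the slide. It uses Python 3 `metaclass=` syntax rather than the `__metaclass__` attribute shown on the slide.

```python
# Stand-in for R functions. In the real setting these would be looked up
# in an R package through rpy2 (hypothetical registry, illustration only).
FAKE_R_PACKAGES = {
    "Biobase": {
        "featureNames": lambda obj: obj["features"],
        "sampleNames": lambda obj: obj["samples"],
    }
}


class AccessorType(type):
    """Metaclass that builds one attribute per __accessors__ entry."""

    def __new__(mcs, name, bases, namespace):
        for r_name, r_package, py_name, as_property, doc in namespace.get(
            "__accessors__", ()
        ):
            func = FAKE_R_PACKAGES[r_package][r_name]

            # Bind func as a default argument so that each accessor keeps
            # its own function instead of the last one in the loop.
            def accessor(self, _func=func):
                return _func(self.robj)

            accessor.__doc__ = doc
            namespace[py_name] = (
                property(accessor, doc=doc) if as_property else accessor
            )
        return super().__new__(mcs, name, bases, namespace)


class AssayDataSketch(metaclass=AccessorType):
    __accessors__ = (
        ("featureNames", "Biobase", "featurenames", True,
         "maps Biobase::featureNames"),
        ("sampleNames", "Biobase", "samplenames", True,
         "maps Biobase::sampleNames"),
    )

    def __init__(self, robj):
        # robj plays the role of the wrapped R object.
        self.robj = robj


data = AssayDataSketch({"features": ["gene1", "gene2"],
                        "samples": ["a", "b", "c"]})
print(data.featurenames)
print(data.samplenames)
```

With this pattern, following an API change on the R side means editing one row of the table rather than rewriting a hand-written method, which is the maintenance argument made on the previous slide.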
20. Example of a complete application
A web server to run edgeR.

from bottle import abort, request, route, run
# read_count_data is assumed to live in my_edger with the other helpers
from my_edger import get_toptags, make_results_page, read_count_data

@route('/')
def index():
    return '''
<html> <body>
<form action="/edger" method="post" enctype="multipart/form-data">
<input type="file" name="data" /> <input type="submit" /> </form>
</body> </html>'''

@route('/edger', method='POST')
def run_edger():
    data = request.files.get('data')
    if data:
        counts, grp = read_count_data(data.file.name)
        top_tags = get_toptags(counts, grp)
        return make_results_page(top_tags)
    else:
        abort(404, "Invalid count file.")

run(host='localhost', port=8080)
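The `my_edger` helpers are not shown on the slide. As a rough idea of what a helper such as `make_results_page` might do, here is a hypothetical sketch using only the standard library. It assumes, purely for illustration, that the top tags arrive as (tag, log fold change, p-value) tuples; the actual data structure returned by the talk's code is not known.

```python
from html import escape


def make_results_page(top_tags):
    """Render (tag, log_fc, p_value) tuples as a minimal HTML table.

    Hypothetical helper: the tuple layout is an assumption, not the
    format used in the talk.
    """
    rows = "\n".join(
        "<tr><td>{}</td><td>{:.2f}</td><td>{:.3g}</td></tr>".format(
            escape(str(tag)), log_fc, p_value
        )
        for tag, log_fc, p_value in top_tags
    )
    return (
        "<html><body><table>\n"
        "<tr><th>Tag</th><th>logFC</th><th>P-value</th></tr>\n"
        + rows
        + "\n</table></body></html>"
    )


page = make_results_page([("tag<1>", 2.5, 0.0004), ("tag2", -1.25, 0.03)])
print(page)
```

Tag names are escaped before being placed in the page because they come from an uploaded file and should not be trusted as HTML.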
21. Acknowledgements
• Users, and communities from R, Bioconductor, Python,
Biopython
• (Vincent Davis, Nicolas Rapin, Brad Chapman)
URLs
http://pypi.python.org/pypi/rpy2-bioconductor-extensions/
http://bitbucket.org/lgautier/rpy2-bioc-extensions
http://packages.python.org/rpy2-bioconductor-extensions/
http://rpy2.sourceforge.net/