SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Biopython Project
Update 2013
Peter Cock & the Biopython Developers,
BOSC 2013, Berlin, Germany
Twitter: @pjacock & @biopython
Introduction
2
My Employer
 After PhD joined Scottish Crop Research Institute
 In 2011, SCRI (Dundee) & MLURI (Aberdeen) merged
as The James Hutton Institute
Government funded research institute
I work mainly on the genomics of Plant Pathogens
I use Biopython in my day to day work
More about this in tomorrow’s panel discussion,
“Strategies for Funding and Maintaining Open Source
Software”
3
Biopython
 Open source bioinformatics library for Python
 Sister project to:
 BioPerl
 BioRuby
 BioJava
 EMBOSS
 etc (see OBF Project BOF meeting tonight)
 Long running!
4
Brief History of Biopython
 1999 - Started by Andrew Dalke & Jeff Chang
 2000 - First release, announcement publication
 Chapman & Chang (2000). ACM SIGBIO Newsletter 20, 15-19
 2001 - Biopython 1.00
 2009 - Application note publication
 Cock et al. (2009) DOI:10.1093/bioinformatics/btp163
 2011 - Biopython 1.57 and 1.58
 2012 - Biopython 1.59 and 1.60
 2013 - Biopython 1.61 and 1.62 beta
5
Recap from last BOSC 2012
 Eric Talevich presented in Boston
 Biopython 1.58, 1.59 and 1.60
 Visualization enhancements for chromosome and
genome diagrams, and phylogenetic trees
 More file format parsers
 BGZF compression
 Google Summer of Code students ...
 Bio.Phylo paper submitted and in review ...
 Biopython working nicely under PyPy 1.9 ...
6
Publications
7
Bio.Phylo paper published
 Talevich et al (2012) DOI:10.1186/1471-2105-13-209
8
Talevich et al. BMC Bioinformatics 2012, 13:209
http://www.biomedcentral.com/1471-2105/13/209
SOFTWARE Open Access
Bio.Phylo: A unified toolkit for processing,
analyzing and visualizing phylogenetic
trees in Biopython
Eric Talevich1*, Brandon M Invergo2, Peter JA Cock3 and Brad A Chapman4
Abstract
Background: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a
proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of
integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need
for reusable software that can read, manipulate and transform this information into the various forms required to
build computational pipelines.
Results: We built a Python software library for working with phylogenetic data that is tightly integrated with
Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with
existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees,
performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the
modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API,
providing a common set of functionality independent of the data source.
Conclusions: Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of
phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython
features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits
of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available
through the Biopython website, http://biopython.org.
Google Summer of Code (GSoC)
9
Google Summer of Code 2012
 Two students under OBF
 Lenna Peterson
 Genomic Variant Toolkit for Biopython
 Mentors: Brad Chapman & James Casbon
 http://arklenna.tumblr.com/tagged/gsoc2012
 Wibowo Arindrarto
 Bio.SearchIO - pairwise sequence search files input/
output, e.g. BLAST, HMMER
 Mentor: Peter Cock
 http://bow.web.id/blog/tag/gsoc/
 Both completed their projects & still contributing
10
Google Summer of Code 2013
 Two students under NESCent
 Zheng Ruan
 Codon alignment and analysis
 Mentors: Eric Talevich & Peter Cock
 http://zr1991.blogspot.de/
 Yanbo Ye
 Bio.Phylo: filling in the gaps
 Mentor: Eric Talevich
 http://blog.yeyanbo.com/tag/gsoc.html
11
Releases since BOSC 2012
12
Biopython 1.61
 Major refresh of sequence motif handling code
 Bio.SearchIO GSoC work as an experimental module
 Contributors:
 Brandon Invergo
 Bryan Lunt (*)
 Christian Brueffer (*)
 David Cain
 Eric Talevich
 Grace Yeo (*)
 Jeffrey Chang
 Jingping Li (*)
 Kai Blin (*) 13
 Leighton Pritchard
 Lenna Peterson
 Lucas Sinclair (*)
 Michiel de Hoon
 Nick Semenkovich (*)
 Peter Cock
 Robert Ernst (*)
 Tiago Antao
 Wibowo 'Bow' Arindrarto
Biopython 1.62
 Beta released
 Final release after BOSC/ISMB/ECCB
 Warning on translating partial codons
 Explicit is better than implicit!
 Parsers for GAF, GPA and GPI from UniProt-GOA
 Reworked feature location object model
 Cleaner handling of multi-region locations
 Linked to GTF/GFF3 parsing and other plans
 Official Python 3 support ...
14
Please test this beta!
Biopython 1.62
 Contributors (as of the beta release):
15
Alexander Campbell (*)
Andrea Rizzi (*)
Anthony Mathelier (*)
Ben Morris (*)
Brad Chapman
Christian Brueffer
David Arenillas (*)
David Martin (*)
Eric Talevich
Iddo Friedberg
Jian-Long Huang (*)
Joao Rodrigues
Kai Blin
Michiel de Hoon
Nate Sutton (*)
Peter Cock
Petra Kubincová (*)
Phillip Garland
Saket Choudhary (*)
Tiago Antao
Wibowo 'Bow' Arindrarto
Xabier Bello (*)
Python 3
16
Python 2 and 3
 Python 2.7 is the final release of Python 2
 Python 3 is similar but different to Python 2
 Most Python 2 code needs updating to run
 Big difference is Python 3 uses unicode for strings
 We need to be explicit about bytes vs unicode in many
of our parsers
 Text file IO defaults to unicode, which is slower
 Most Python libraries are gradually being updated
17
Python 3 strategy
 We’ve been testing under Python 3 for over a year
 Biopython 1.62 will officially support Python 3.3
 Current strategy:
 Develop under Python 2
 Installation under Python 3 uses 2to3 converter
 Test under Python 3
18
Cross Platform Testing
 Nightly tests with BuildBot
 Tests run on volunteer machines
 Covers multiple OS and Python combinations
 More volunteer machines welcome,
especially 64 bit Windows
 Server runs on OBF funded Amazon server
 Continuous integration with TravisCI
 Covers a range of languages using VMs
 Free service for Open Source projects
 Runs tests when code on GitHub updated
 Runs tests for GitHub pull requests (Nice!)
19
Cross Platform Testing
20
Python/OS
Linux
32 bit
Linux
64 bit
Mac OS X
64 bit
Windows
32 bit
C Python 2.5
C Python 2.6
C Python 2.7
C Python 3.1
C Python 3.2
C Python 3.3
PyPy 1.9
PyPy 2.0
Jython 2.5
Jython 2.7b
BB + Travis BB BB BB
BB + Travis BB BB BB
BB + Travis BB BB
BB BB BB BB
BB BB BB BB
BB + Travis BB BB
Travis BB BB
BB
BB BB
BB BB
This test
matrix is
quite big!
Cross Platform Testing
21
Python/OS
Linux
32 bit
Linux
64 bit
Mac OS X
64 bit
Windows
32 bit
C Python 2.5
C Python 2.6
C Python 2.7
C Python 3.1
C Python 3.2
C Python 3.3
PyPy 1.9
PyPy 2.0
Jython 2.5
Jython 2.7b
BB + Travis BB BB BB
BB + Travis BB BB BB
BB + Travis BB BB
BB BB BB BB
BB BB BB BB
BB + Travis BB BB
Travis BB BB
BB
BB BB
BB BB
Dropping
Python 2.5
support
Also
means
Jython 2.5
Cross Platform Testing
22
Python/OS
Linux
32 bit
Linux
64 bit
Mac OS X
64 bit
Windows
32 bit
C Python 2.5
C Python 2.6
C Python 2.7
C Python 3.1
C Python 3.2
C Python 3.3
PyPy 1.9
PyPy 2.0
Jython 2.5
Jython 2.7b
BB + Travis BB BB BB
BB + Travis BB BB BB
BB + Travis BB BB
BB BB BB BB
BB BB BB BB
BB + Travis BB BB
Travis BB BB
BB
BB BB
BB BB
Have been
useful in
Python 3
testing,
but won’t
support
Cross Platform Testing Plan
 Target Python 2.6, 2.7 and 3.3 (or later)
 Volunteer machines needed, especially 64 bit Windows
23
Python/OS
Linux
32 bit
Linux
64 bit
Mac OS X
64 bit
Windows
32 bit
Windows
64 bit
C Python 2.6
C Python 2.7
C Python 3.3
PyPy 1.9
PyPy 2.0
PyPy 2.1b
Jython 2.7b
BB + Travis BB BB BB
BB + Travis BB BB BB
BB + Travis BB BB
Travis BB BB
BB
BB BB
Python
2.7
variants
Python 3 strategy
 Current strategy:
 Develop under Python 2
 Installation under Python 3 uses 2to3 converter
 Test under Python 3
 Future strategy:
 Target Python 2.6, 2.7 and 3.3 (or later)
 Start writing code which works on both at same time
 Continue to use 2to3 on a case-by-case basis
during transition period, or for problem cases
24
Closing Remarks
25
Stability versus Flexibility
 We aim for rigorous cross-platform testing
 We value backwards compatibility in core code
 Flexibility through modularity?
 BioRuby’s Gems http://gems.bioruby.org
 BioPerl sub-packages on CPAN
 Can/should we move towards this with PyPI?
 Would this encourage more contributors?
 We’re trying ‘beta level’ experimental modules within
the monolithic Biopython distribution, e.g. SearchIO
26
Biopython Support: Resources
 OBF hosted website, mailing lists, bug tracker, etc
 GitHub hosted repository
 TravisCI hosted continuous integration testing
 Personal and institute BuildSlave machines for testing
27
Thank you all!
Biopython Support: People
 Google supports summer students via GSoC
 Some of the developers do contribute on work time
 However, Biopython is mainly volunteer funded
 Please help out, e.g.
 Feedback
 Bug reports
 Documentation improvements
 Review code
 More unit tests
 Enhancements or new code
 ...
28
Thank you all!

Weitere ähnliche Inhalte

Was ist angesagt?

Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...
Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...
Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...
PyData
 
Prins Bio Lib Bosc 2009
Prins Bio Lib Bosc 2009Prins Bio Lib Bosc 2009
Prins Bio Lib Bosc 2009
bosc
 

Was ist angesagt? (14)

Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...
Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...
Mixed-language Python/C++ debugging with Python Tools for Visual Studio- Pave...
 
Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming Python programming | Fundamentals of Python programming
Python programming | Fundamentals of Python programming
 
Prins Bio Lib Bosc 2009
Prins Bio Lib Bosc 2009Prins Bio Lib Bosc 2009
Prins Bio Lib Bosc 2009
 
What is Python?
What is Python?What is Python?
What is Python?
 
Python Introduction
Python IntroductionPython Introduction
Python Introduction
 
LingPy : A Python Library for Historical Linguistics
LingPy : A Python Library for Historical LinguisticsLingPy : A Python Library for Historical Linguistics
LingPy : A Python Library for Historical Linguistics
 
SWIG Hello World
SWIG Hello WorldSWIG Hello World
SWIG Hello World
 
Python
PythonPython
Python
 
Python 3.5: An agile, general-purpose development language.
Python 3.5: An agile, general-purpose development language.Python 3.5: An agile, general-purpose development language.
Python 3.5: An agile, general-purpose development language.
 
A commercial open source project in Python
A commercial open source project in PythonA commercial open source project in Python
A commercial open source project in Python
 
Python 101 for the .NET Developer
Python 101 for the .NET DeveloperPython 101 for the .NET Developer
Python 101 for the .NET Developer
 
Python Introduction
Python IntroductionPython Introduction
Python Introduction
 
Introduction To Python | Edureka
Introduction To Python | EdurekaIntroduction To Python | Edureka
Introduction To Python | Edureka
 
Python
Python Python
Python
 

Ähnlich wie Biopython Project Update 2013

BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
Mark Jensen
 
Biopython at BOSC 2010
Biopython at BOSC 2010Biopython at BOSC 2010
Biopython at BOSC 2010
Brad Chapman
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopython
BOSC 2010
 

Ähnlich wie Biopython Project Update 2013 (20)

BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
 
Biopython Project Update (BOSC 2012)
Biopython Project Update (BOSC 2012)Biopython Project Update (BOSC 2012)
Biopython Project Update (BOSC 2012)
 
Python 101 For The Net Developer
Python 101 For The Net DeveloperPython 101 For The Net Developer
Python 101 For The Net Developer
 
Migrating python.org to buildbot 9 and python 3
Migrating python.org to buildbot 9 and python 3Migrating python.org to buildbot 9 and python 3
Migrating python.org to buildbot 9 and python 3
 
openBIO
openBIOopenBIO
openBIO
 
Python Training in Pune - Ethans Tech Pune
Python Training in Pune - Ethans Tech PunePython Training in Pune - Ethans Tech Pune
Python Training in Pune - Ethans Tech Pune
 
Bio-UnaGrid: Easing bioinformatics workflow execution
Bio-UnaGrid: Easing bioinformatics workflow executionBio-UnaGrid: Easing bioinformatics workflow execution
Bio-UnaGrid: Easing bioinformatics workflow execution
 
BOSC 2008 Biopython
BOSC 2008 BiopythonBOSC 2008 Biopython
BOSC 2008 Biopython
 
The Onward Journey: Porting Twisted to Python 3
The Onward Journey: Porting Twisted to Python 3The Onward Journey: Porting Twisted to Python 3
The Onward Journey: Porting Twisted to Python 3
 
Biopython at BOSC 2010
Biopython at BOSC 2010Biopython at BOSC 2010
Biopython at BOSC 2010
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopython
 
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
Australian Bioinformatics Conference (ABiC) 2014 Talk - Doing bioinformatics ...
 
Pybind11 - SciPy 2021
Pybind11 - SciPy 2021Pybind11 - SciPy 2021
Pybind11 - SciPy 2021
 
Python for beginners
Python for beginnersPython for beginners
Python for beginners
 
20101026 ASAP Seminar
20101026 ASAP Seminar20101026 ASAP Seminar
20101026 ASAP Seminar
 
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
IPython: A Modern Vision of Interactive Computing (PyData SV 2013)
 
Biopython
BiopythonBiopython
Biopython
 
Python lecture 01
Python lecture 01Python lecture 01
Python lecture 01
 
python unit2.pptx
python unit2.pptxpython unit2.pptx
python unit2.pptx
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Biopython Project Update 2013

  • 1. Biopython Project Update 2013 Peter Cock & the Biopython Developers, BOSC 2013, Berlin, Germany Twitter: @pjacock & @biopython
  • 3. My Employer  After PhD joined Scottish Crop Research Institute  In 2011, SCRI (Dundee) & MLURI (Aberdeen) merged as The James Hutton Institute Government funded research institute I work mainly on the genomics of Plant Pathogens I use Biopython in my day to day work More about this in tomorrow’s panel discussion, “Strategies for Funding and Maintaining Open Source Software” 3
  • 4. Biopython  Open source bioinformatics library for Python  Sister project to:  BioPerl  BioRuby  BioJava  EMBOSS  etc (see OBF Project BOF meeting tonight)  Long running! 4
  • 5. Brief History of Biopython  1999 - Started by Andrew Dalke & Jeff Chang  2000 - First release, announcement publication  Chapman & Chang (2000). ACM SIGBIO Newsletter 20, 15-19  2001 - Biopython 1.00  2009 - Application note publication  Cock et al. (2009) DOI:10.1093/bioinformatics/btp163  2011 - Biopython 1.57 and 1.58  2012 - Biopython 1.59 and 1.60  2013 - Biopython 1.61 and 1.62 beta 5
  • 6. Recap from last BOSC 2012  Eric Talevich presented in Boston  Biopython 1.58, 1.59 and 1.60  Visualization enhancements for chromosome and genome diagrams, and phylogenetic trees  More file format parsers  BGZF compression  Google Summer of Code students ...  Bio.Phylo paper submitted and in review ...  Biopython working nicely under PyPy 1.9 ... 6
  • 8. Bio.Phylo paper published  Talevich et al (2012) DOI:10.1186/1471-2105-13-209 8 Talevich et al. BMC Bioinformatics 2012, 13:209 http://www.biomedcentral.com/1471-2105/13/209 SOFTWARE Open Access Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython Eric Talevich1*, Brandon M Invergo2, Peter JA Cock3 and Brad A Chapman4 Abstract Background: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines. Results: We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source. Conclusions: Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.
  • 9. Google Summer of Code (GSoC) 9
  • 10. Google Summer of Code 2012  Two students under OBF  Lenna Peterson  Genomic Variant Toolkit for Biopython  Mentors: Brad Chapman & James Casbon  http://arklenna.tumblr.com/tagged/gsoc2012  Wibowo Arindrarto  Bio.SearchIO - pairwise sequence search files input/ output, e.g. BLAST, HMMER  Mentor: Peter Cock  http://bow.web.id/blog/tag/gsoc/  Both completed their projects & still contributing 10
  • 11. Google Summer of Code 2013  Two students under NESCent  Zheng Ruan  Codon alignment and analysis  Mentors: Eric Talevich & Peter Cock  http://zr1991.blogspot.de/  Yanbo Ye  Bio.Phylo: filling in the gaps  Mentor: Eric Talevich  http://blog.yeyanbo.com/tag/gsoc.html 11
  • 13. Biopython 1.61  Major refresh of sequence motif handling code  Bio.SearchIO GSoC work as an experimental module  Contributors:  Brandon Invergo  Bryan Lunt (*)  Christian Brueffer (*)  David Cain  Eric Talevich  Grace Yeo (*)  Jeffrey Chang  Jingping Li (*)  Kai Blin (*) 13  Leighton Pritchard  Lenna Peterson  Lucas Sinclair (*)  Michiel de Hoon  Nick Semenkovich (*)  Peter Cock  Robert Ernst (*)  Tiago Antao  Wibowo 'Bow' Arindrarto
  • 14. Biopython 1.62  Beta released  Final release after BOSC/ISMB/ECCB  Warning on translating partial codons  Explicit is better than implicit!  Parsers for GAF, GPA and GPI from UniProt-GOA  Reworked feature location object model  Cleaner handling of multi-region locations  Linked to GTF/GFF3 parsing and other plans  Official Python 3 support ... 14 Please test this beta!
  • 15. Biopython 1.62  Contributors (as of the beta release): 15 Alexander Campbell (*) Andrea Rizzi (*) Anthony Mathelier (*) Ben Morris (*) Brad Chapman Christian Brueffer David Arenillas (*) David Martin (*) Eric Talevich Iddo Friedberg Jian-Long Huang (*) Joao Rodrigues Kai Blin Michiel de Hoon Nate Sutton (*) Peter Cock Petra Kubincová (*) Phillip Garland Saket Choudhary (*) Tiago Antao Wibowo 'Bow' Arindrarto Xabier Bello (*)
  • 17. Python 2 and 3  Python 2.7 is the final release of Python 2  Python 3 is similar but different to Python 2  Most Python 2 code needs updating to run  Big difference is Python 3 uses unicode for strings  We need to be explicit about bytes vs unicode in many of our parsers  Text file IO defaults to unicode, which is slower  Most Python libraries are gradually being updated 17
  • 18. Python 3 strategy  We’ve been testing under Python 3 for over a year  Biopython 1.62 will officially support Python 3.3  Current strategy:  Develop under Python 2  Installation under Python 3 uses 2to3 converter  Test under Python 3 18
  • 19. Cross Platform Testing  Nightly tests with BuildBot  Tests run on volunteer machines  Covers multiple OS and Python combinations  More volunteer machines welcome, especially 64 bit Windows  Server runs on OBF funded Amazon server  Continuous integration with TravisCI  Covers a range of languages using VMs  Free service for Open Source projects  Runs tests when code on GitHub updated  Runs tests for GitHub pull requests (Nice!) 19
  • 20. Cross Platform Testing 20 Python/OS Linux 32 bit Linux 64 bit Mac OS X 64 bit Windows 32 bit C Python 2.5 C Python 2.6 C Python 2.7 C Python 3.1 C Python 3.2 C Python 3.3 PyPy 1.9 PyPy 2.0 Jython 2.5 Jython 2.7b BB + Travis BB BB BB BB + Travis BB BB BB BB + Travis BB BB BB BB BB BB BB BB BB BB BB + Travis BB BB Travis BB BB BB BB BB BB BB This test matrix is quite big!
  • 21. Cross Platform Testing 21 Python/OS Linux 32 bit Linux 64 bit Mac OS X 64 bit Windows 32 bit C Python 2.5 C Python 2.6 C Python 2.7 C Python 3.1 C Python 3.2 C Python 3.3 PyPy 1.9 PyPy 2.0 Jython 2.5 Jython 2.7b BB + Travis BB BB BB BB + Travis BB BB BB BB + Travis BB BB BB BB BB BB BB BB BB BB BB + Travis BB BB Travis BB BB BB BB BB BB BB Dropping Python 2.5 support Also means Jython 2.5
  • 22. Cross Platform Testing 22 Python/OS Linux 32 bit Linux 64 bit Mac OS X 64 bit Windows 32 bit C Python 2.5 C Python 2.6 C Python 2.7 C Python 3.1 C Python 3.2 C Python 3.3 PyPy 1.9 PyPy 2.0 Jython 2.5 Jython 2.7b BB + Travis BB BB BB BB + Travis BB BB BB BB + Travis BB BB BB BB BB BB BB BB BB BB BB + Travis BB BB Travis BB BB BB BB BB BB BB Have been useful in Python 3 testing, but won’t support
  • 23. Cross Platform Testing Plan  Target Python 2.6, 2.7 and 3.3 (or later)  Volunteer machines needed, especially 64 bit Windows 23 Python/OS Linux 32 bit Linux 64 bit Mac OS X 64 bit Windows 32 bit Windows 64 bit C Python 2.6 C Python 2.7 C Python 3.3 PyPy 1.9 PyPy 2.0 PyPy 2.1b Jython 2.7b BB + Travis BB BB BB BB + Travis BB BB BB BB + Travis BB BB Travis BB BB BB BB BB Python 2.7 variants
  • 24. Python 3 strategy  Current strategy:  Develop under Python 2  Installation under Python 3 uses 2to3 converter  Test under Python 3  Future strategy:  Target Python 2.6, 2.7 and 3.3 (or later)  Start writing code which works on both at same time  Continue to use 2to3 on a case-by-case basis during transition period, or for problem cases 24
  • 26. Stability versus Flexibility  We aim for rigorous cross-platform testing  We value backwards compatibility in core code  Flexibility through modularity?  BioRuby’s Gems http://gems.bioruby.org  BioPerl sub-packages on CPAN  Can/should we move towards this with PyPI?  Would this encourage more contributors?  We’re trying ‘beta level’ experimental modules within the monolithic Biopython distribution, e.g. SearchIO 26
  • 27. Biopython Support: Resources  OBF hosted website, mailing lists, bug tracker, etc  GitHub hosted repository  TravisCI hosted continuous integration testing  Personal and institute BuildSlave machines for testing 27 Thank you all!
  • 28. Biopython Support: People  Google supports summer students via GSoC  Some of the developers do contribute on work time  However, Biopython is mainly volunteer funded  Please help out, e.g.  Feedback  Bug reports  Documentation improvements  Review code  More unit tests  Enhancements or new code  ... 28 Thank you all!