BioRuby is a bioinformatics library for the Ruby programming language. It provides object-oriented tools for sequence analysis, format conversion, running external bioinformatics tools, and working with biological data. Recent releases added Bio::SQL for storing sequences in a database, reading and writing of phylogenetic XML (PhyloXML), reading and writing of next-generation sequencing FASTQ files, and REST wrappers for NCBI and TogoWS web services. Development follows agile principles, with code shared on GitHub (32 forks and 100 watchers at the time of the talk). The project aims to improve integration with R and data visualization while maintaining a stable core.
1. BioRuby
Project Update
Raoul J.P. Bonnal
r@bioruby.org
Life Science Informatics
Integrative Biology Program
Fondazione INGM
Italy
co-authors:
Toshiaki Katayama
Pjotr Prins
Mitsuteru Nakao
Christian M Zmasek
Naohisa Goto
11th Annual Bioinformatics Open Source Conference (BOSC) 2010
Boston, Massachusetts, USA
2. Introduction
BioRuby - a bioinformatics library for the Ruby language
• Ruby is an object-oriented scripting language, also functional and reflective
• made popular by "Ruby on Rails"
• created by Matz in 1993 in Japan
3. BioRuby & Platforms
Ruby Interpreter
• Performance: Ruby, RubyEE
• Portability: JRuby, Java libraries
gem install bio
Operating Systems
4. BioRuby & Platforms
BioLib
5. BioRuby & Platforms
Cytoscape
6. History
Timeline (2008, 2009, 2010):
• Topics: WebServices (2008), Workflows (2009), SemanticWeb (2010)
• Releases: 1.3.0, 1.3.1, 1.4.0
• Events: Code fest, BOSC
• Move to git
• GSoC (two rounds): phyloXML, followed by Ruby 1.9.2; NeXML I/O, RDF triples; Infer gene duplications
GitHub:
http://github.com/bioruby/bioruby
GSoC references:
• Ruby 1.9.2 support of BioRuby (OBF)
• Develop an API for NeXML I/O and RDF triples for BioRuby (NESCent)
• Implementation of an algorithm to infer gene duplications in BioRuby (OBF)
• Implementing phyloXML support in BioRuby (NESCent)
8. Relevant New Features 1
Bio::SQL Interoperable storage of sequences -Raoul Bonnal-
require 'bio'
# also requires active_record (ORM)
# and your database adapter (MySQL, PostgreSQL, JDBC)
connection =
  Bio::SQL.establish_connection({ 'development' => { 'hostname' => your_host_name,
                                                     'database' => 'CoolBioSeqDB',
                                                     'adapter'  => 'jdbcmysql',
                                                     'username' => 'Raoul',
                                                     'password' => 'SmartPassword' } },
                                'development')
# read a GenBank file and store its entries:
my_storage = Bio::SQL::Biodatabase.find(:first)
genbank = Bio::GenBank.open('dbvrl1.gb')
genbank.each_entry do |gb|
  Bio::SQL::Sequence.new(:biosequence => gb.to_biosequence,
                         :biodatabase => my_storage)
end
# fetching an accession is easy
Bio::SQL.fetch_accession(your_accession).to_biosequence.output(:embl)
9. Relevant New Features 2
Bio::PhyloXML r/w by -Diana Jaunzeikare, Christian M Zmasek-
require 'bio' # needs libxml-ruby
# Create a parser
phyloxml = Bio::PhyloXML::Parser.new('example.xml')
# Consume the trees
phyloxml.each do |tree|
  puts tree.name
end
# Writing (tree2 is a tree object obtained earlier)
writer = Bio::PhyloXML::Writer.new('my_tree.xml')
writer.write(tree2)
# Extract information
phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
phyloxml.each do |tree|
  tree.each_node do |node|
    print "Scientific name: ", node.taxonomies[0].scientific_name, "\n"
  end
end
Han, M. V. and Zmasek, C. M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
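To make the "extract information" step more concrete without requiring BioRuby or libxml-ruby, here is a minimal sketch that walks a small phyloXML document with REXML from a standard Ruby installation. It is an illustration under stated assumptions, not how Bio::PhyloXML is implemented: the sample document is hand-written for this example, and `scientific_names` is a hypothetical helper name.

```ruby
require 'rexml/document'

# A tiny hand-written phyloXML document, used only for this illustration.
PHYLOXML_SAMPLE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <name>example tree</name>
      <clade>
        <taxonomy><scientific_name>Mollusca</scientific_name></taxonomy>
        <clade>
          <taxonomy><scientific_name>Gastropoda</scientific_name></taxonomy>
        </clade>
        <clade>
          <taxonomy><scientific_name>Bivalvia</scientific_name></taxonomy>
        </clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

# Hypothetical helper (not BioRuby API): collect every scientific name,
# visiting elements depth-first in document order.
def scientific_names(xml)
  names = []
  doc = REXML::Document.new(xml)
  doc.root.each_recursive do |element|
    names << element.text.strip if element.name == 'scientific_name'
  end
  names
end

scientific_names(PHYLOXML_SAMPLE).each do |name|
  puts "Scientific name: #{name}"
end
```

Bio::PhyloXML returns full tree and node objects rather than bare strings, so this sketch only shows where the taxonomy names live in the XML.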
10. Relevant New Features 3
Bio::FASTQ r/w Next Generation Sequencing FASTQ -Naohisa Goto-
require 'bio'
# Merge a FASTA file and its matching QUAL file into FASTQ records
ff_fasta = Bio::FlatFile.open('filename.fasta')
ff_qual = Bio::FlatFile.open('filename.qual')
while entry_fasta = ff_fasta.next_entry
  seq = entry_fasta.to_biosequence
  seq.quality_score_type = :phred
  seq.quality_scores = ff_qual.next_entry.data
  puts seq.output(:fastq,
                  :title => entry_fasta.definition)
end
● Formats supported: Solexa, Illumina
Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
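The FASTA-plus-QUAL merge on this slide depends on how FASTQ stores quality scores as characters. The library-free sketch below assumes the Sanger variant described by Cock et al., in which each Phred score is written as the ASCII character with code score + 33. It is not BioRuby's implementation, and `phred_to_sanger`, `sanger_to_phred` and `fastq_record` are hypothetical helper names introduced for illustration.

```ruby
# Plain-Ruby sketch of Sanger-style FASTQ quality encoding.
# All helper names here are hypothetical; they are not BioRuby API.

SANGER_OFFSET = 33 # Sanger FASTQ stores Phred score q as the character chr(q + 33)

# Convert an array of integer Phred scores into a FASTQ quality string.
def phred_to_sanger(scores)
  scores.map { |q| (q + SANGER_OFFSET).chr }.join
end

# Convert a FASTQ quality string back into integer Phred scores.
def sanger_to_phred(quality)
  quality.each_char.map { |c| c.ord - SANGER_OFFSET }
end

# Build a four-line FASTQ record from a title, a sequence and its scores.
def fastq_record(title, sequence, scores)
  unless sequence.length == scores.length
    raise ArgumentError, 'sequence and quality lengths differ'
  end
  "@#{title}\n#{sequence}\n+\n#{phred_to_sanger(scores)}\n"
end

puts fastq_record('read1', 'ACGT', [40, 30, 20, 10])
```

The Solexa and Illumina variants use different offsets or score definitions, which is why the quality score type has to be stated explicitly when converting between formats.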
11. Relevant New Features 4
Bio::NCBI::REST example
require 'bio'
ncbi = Bio::NCBI::REST::ESearch.new
ncbi.search("nucleotide", "tardigrada")
ncbi.count("nucleotide", "tardigrada")
ncbi.nucleotide("tardigrada")
ncbi.taxonomy("tardigrada")
ncbi.pubmed("tardigrada", "reldate" => 365)
ncbi.pubmed("mammoth mitochondrial genome")
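An ESearch query is ultimately an HTTP request to NCBI's E-utilities service. The sketch below only builds the request URL with Ruby's standard library and never contacts NCBI. `esearch_url` is a hypothetical helper, not part of Bio::NCBI::REST, and the way BioRuby assembles its own requests may differ.

```ruby
require 'uri'

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Hypothetical helper (not BioRuby API): build an ESearch URL for a database
# and a search term. Extra E-utilities parameters, such as "reldate" => 365,
# are passed through opts.
def esearch_url(db, term, opts = {})
  params = { 'db' => db, 'term' => term }.merge(opts)
  uri = URI(ESEARCH_BASE)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts esearch_url('nucleotide', 'tardigrada')
puts esearch_url('pubmed', 'tardigrada', 'reldate' => 365)
```

Keeping URL construction separate from the network call makes the query parameters easy to inspect before any request is sent.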
Bio::TogoWS entry point for PDBj, NCBI, DDBJ, EBI, KEGG
require 'bio'
t = Bio::TogoWS::REST.new
puts t.entry('genbank', 'AF237819')
puts t.search('uniprot', 'lung cancer')
12. BioRuby is Agile
● OpenBio* developers are the stakeholders
● Faster iteration process
● Frequent meetings (mail, Skype/voice chat, IRC)
● Test everything (required for new features)
– Improve quality and maintainability, and guarantee portability
– Ruby Unit Testing Framework, RSpec
● GitHub
– Low barriers for new developers
– 32 forks and 100 people watching us
Agile Manifesto
15. Ongoing Work
● Semantic Web (started @ BioHackathon 2010)
– Expose data in RDF
– Consume SPARQL endpoints efficiently
● Ruby 1.9.2 support of BioRuby (GSoC & OBF)
– Improved performance
● Develop an API for NeXML I/O and RDF triples for BioRuby (GSoC & NESCent)
● Implementation of an algorithm to infer gene duplications in BioRuby (GSoC & OBF)
16. PlugIn system
● We want a stable BioRuby core on every OS
● But… we want to use experimental code ASAP
● BioRuby + BioRuby Plugin + Rails: we can have multiple applications with a single core and specific features
– User or Application
● Suggest guidelines for plugin namespaces
● On GitHub you can find our plugins by searching for bioruby-plugin-NAME
17. PlugIn system
The plugin system will be delivered with the next BioRuby release
BioGraphics -Jan Aerts-
For biologists:
bioruby --plugin install graphics
For geeks:
bioruby --plugin install git://github.com/user/repo.git
It’s very experimental
18. What We Need
● Better integration with R
● Better support for data visualization (interpretation)
● Detailed Roadmap
19. Publications
Goto, N., Prins, P., Nakao, M., Bonnal, R., Aerts, J., and Katayama, T. BioRuby: Bioinformatics software for the Ruby programming language (submitted).
Katayama, T. et al. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows (accepted).
Katayama, T., Nakao, M., and Takagi, T. (2010). TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Res, 38(suppl_2), W706-W711, doi:10.1093/nar/gkq386 (Web Server Issue 2010).
Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
Over 24 articles use BioRuby in their analyses; check the up-to-date list:
http://bioruby.open-bio.org/wiki/Research_using_BioRuby
20. Acknowledgments
● BioRuby Team
– Toshiaki Katayama*
– Naohisa Goto*
– Pjotr Prins*
– Mitsuteru Nakao*
– Jan Aerts*
– Christian M Zmasek*
● Open Bioinformatics Foundation
● Database Center for Life Science
● Google Summer of Code
– All GSoC students
● NESCent (National Evolutionary Synthesis Center)
* co-author