BioLib Development Report (BOSC 2009)
C and C++ libraries for BioPerl, BioJava, BioPython, BioRuby...
                      Pjotr Prins (pjotr.prins at wur.nl)


Wageningen University, Dept. of Nematology; Groningen Bioinformatics Center




                                                              BioLib Development Report (BOSC 2009) – p.
The stated problem

Many high-level languages are used in biology
(Perl, R, Java...)
Duplication of effort across all Bio* projects:
BioPerl, BioConductor, BioJava...
in particular for data IO/parsing/interpretation
(Alan’s keynote)




What if?

Suppose you need some functionality (e.g. linear
regression) in Perl. You can:
   Roll your own in Perl (performance?)
   Bind against an existing C library using Perl XS (ugh)
   Bind using SWIG (better, but a one-off, like
   Perl::GSL)
   Bind using SWIG with BioLib (all languages)
   In fact, it may already be there (GSL or Rlib)
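
To make the "roll your own" option concrete, here is a minimal sketch of an ordinary least-squares fit. It is written in Python rather than Perl, and the function name `linear_regression` is an illustrative assumption, not part of BioLib, GSL or Rlib. It shows how little code the naive version needs, and also why a tested, optimized C implementation is attractive once performance and numerical robustness matter.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper for illustration only; not a BioLib function.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1
slope, intercept = linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # prints: 2.0 1.0
```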

DRY-DRO

Do not repeat yourself (DRY)
Do not repeat ourselves (DRO)
Bio*: BioPerl, BioPython, BioRuby, BioJava,
BioConductor, BioHaskell, BioCPP...
Limited pool of programmers in bioinformatics
Usually 2 or 3 competing implementations
Use existing implementations


Why bother?

Open Source Software is about eyes




Eyes!

Eyes like these!




Eyes (3)

Eyes like these!...




Eyes (5)

Well, realistically...




BioLib project

Objectives:
   Utilize existing C/C++ libraries
   Create mappings to all Bio* languages
   Focus on correctness and performance
   A central place (plumbing)
   An OBF affiliated project



Power Trio

Plumbing power trio:
   Git - modular version control
   CMake - makefile generator
   SWIG - Simplified Wrapper and Interface Generator




Power trio (1)

Git
  Version control on steroids
  What source control should be
   Easy branching of development
   Submodules




Power trio (2)

CMake
  Generator for make files
  Very modular approach
  Resolves complex dependencies
  Looks like a simple programming language
  Easy on the eyes and mind



Power trio (3)

SWIG
  Code generator for mappings done right:
    Rules for generating code
    Macros (DRY)
    Pattern matching
    Flexible
    Supports many languages




Achievements (year one)

  Affyio: Affymetrix arrays (357 methods; 10K lines)
  Staden: Sequencer trace files (95; 16K)
  GSL: GNU Scientific Library (2702; 200K)
  Rlib: R routines (> 176; 43K)
  R/qtl: Quantitative genetics (> 100; 10K)*
  Libsequence: Sequence analysis (> 1000; 21K)*
  Bio++: Sequence analysis (> 1000; 52K)*

Code base: 350K lines, USD 10 million R&D
Source tree

|-- clibs
|   |-- affyio-1.8
|   |-- biolib_R
|   |-- biolib_microarray
|   |-- libsequence-1.6.6
|-- mappings
|   `-- swig
|       |-- perl
|       |   |-- affyio
|       |   |-- staden_io_lib
|       |   `-- test
|       |-- python
|       |-- ruby
104 directories, 668 files




Adding a C lib

Unpack the C/C++ library in ./src/clibs/modulename
Add a CMake file - compiles into a .so shared library
Create the Perl mapping in ./src/mappings/swig/perl/module
Add a SWIG .i file
Add a CMake file - compiles into a .pm and a .so shared library
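
The directory convention in these steps can be sketched as a small helper that computes where a new module's files would live. This is only an illustration: `module_paths` is a hypothetical function, not part of BioLib, and the `mappings` directory name is taken from the source tree slide (an assumption where the step list spells it differently).

```python
from pathlib import PurePosixPath


def module_paths(name, language="perl", root="src"):
    """Return the directories a new module would occupy.

    Hypothetical helper; the layout follows the slides, not a BioLib API.
    """
    base = PurePosixPath(root)
    return {
        # C/C++ sources plus the CMake file that builds the shared library
        "clib": base / "clibs" / name,
        # SWIG .i file plus the CMake file that builds the language binding
        "mapping": base / "mappings" / "swig" / language / name,
    }


paths = module_paths("staden_io_lib")
for role, path in paths.items():
    print(f"{role}: {path}")
# prints:
# clib: src/clibs/staden_io_lib
# mapping: src/mappings/swig/perl/staden_io_lib
```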

CMake goodies

# Defining a C library build in Biolib:
SET (M_NAME staden_io_lib)
SET (M_VERSION 1.11.6)
FIND_PACKAGE(ZLIB REQUIRED)
BUILD_CLIB()

ADD_LIBRARY(${LIBNAME} SHARED
array.c
compress.c
compression.c
ctfCompress.c
(...)

INSTALL_CLIB()




CMake for Perl

# Defining a C library mapping for Perl
SET (USE_ZLIB TRUE)
SET (USE_INCLUDEPATH io_lib)

FIND_PACKAGE(MapPerl)

POST_BUILD_PERL_BINDINGS()
TEST_PERL_BINDINGS()
INSTALL_PERL_BINDINGS()




SWIG Map

%include <Read.h>

#define TT_ANY 0
#define TT_ZTR 7

typedef struct
{
    int         format;
    char       *trace_name;
    int         NPoints;
    int         NBases;
    (...)
} Read;

Read *read_reading(char *fn, int format);
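
A mapping like this exposes each struct member as a field in the target language. The same idea can be illustrated without SWIG or the Staden library, using Python's standard `ctypes` module. This is a sketch under stated assumptions, not the code SWIG generates: it declares only the four members shown on the slide, so its memory layout does not match the real `Read` struct, and the file name and `NPoints` value are placeholders.

```python
import ctypes

# Constants copied from the slide
TT_ANY = 0
TT_ZTR = 7


class Read(ctypes.Structure):
    # Only the members visible on the slide. The real struct has more
    # (elided as "(...)"), so this must not be passed to the actual library.
    _fields_ = [
        ("format", ctypes.c_int),
        ("trace_name", ctypes.c_char_p),
        ("NPoints", ctypes.c_int),
        ("NBases", ctypes.c_int),
    ]


# Placeholder values; "example.ztr" is a made-up file name
read = Read(format=TT_ZTR, trace_name=b"example.ztr", NPoints=0, NBases=766)

# Struct members become plain attributes, as in the Python binding
print(read.format, read.NBases)  # prints: 7 766
```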



Perl

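use biolib::staden_io_lib;

# $fn holds the path to a trace file; TT_ANY accepts any supported format
$result = staden_io_lib::read_reading($fn,
                                      $staden_io_lib::TT_ANY);
# Print each field on its own line
print("format=", staden_io_libc::Read_format_get($result), "\n");
print("NBases=", $result->{NBases}, "\n");
print("base=", staden_io_libc::Read_base_get($result), "\n");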

Outputs:

format=7
NBases=766
base=NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT
     CGGTCCCAACTTAATTGTACA...




Python

import biolib.staden_io_lib as io_lib

# procsrffn holds the path to a trace file
result = io_lib.read_reading(procsrffn,
                             io_lib.TT_ANY)
print(result.format)
print(result.NBases)
print(result.base)

7
766
NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT
CGGTCCCAACTTAATTGTACA...




For the Perl coder

Adding functionality in language of choice
Easier deployment - 'install biolib-perl'
Shared correctness testing
Generated API documentation




For the authors

Independent source trees
Increased exposure (Ruby, Perl...)
Added unit/integration testing environment
Deployment, multi-platform support (Linux, OSX, Windows)
No autoconf pain (./configure and friends)
Implicit access to other libraries (GSL, Rlib)
Online generated API documentation

Future work

Automated API documentation (with doctests)
More libraries (Emboss, NCBI...)
New code (HPC)
More languages (Java, R, OCaml...)
Bio* integration (CPAN, Ruby gems, Python eggs)
Debian/Fedora/OSX/Windows packages
More platforms (Windows without Cygwin)

Credits

Ben Bolstad (Affyio), James Bonfield (Staden), Karl Broman (R/qtl)

Jonathan Leto (GSL SWIG)

Xin Shuai (Google SoC libsequence)

Adam Smith (Google SoC Bio++)

Oswaldo Trelles, José Manuel Mateos-Duran and Andrés Rodríguez (UMA)

Chris Fields (BioPerl), Mark Jensen (BioPerl), Hilmar Lapp (NESCent, OBF)

Jaap Bakker (WU), Geert Smant (WU), Ritsert Jansen (GBIC)




BoF

BioLib: Birds of a Feather Session (BoF) at 16:50 hours




