Open Access Publications of
      Noel O’Boyle



       November 2, 2011
Contents

I Cheminformatics toolkits
1 Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit
2 Cinfony - combining Open Source cheminformatics toolkits behind a common interface
3 Open Babel: An open chemical toolbox

II Enzyme reaction mechanisms
4 MACiE: a database of enzyme reaction mechanisms
5 MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms

III QSAR
6 PYCHEM: a multivariate analysis package for python
7 Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction

IV The Rest
8 Userscripts for the life sciences
9 Confab - Systematic generation of diverse low-energy conformers
10 Review of “Data Analysis with Open Source Tools”
11 Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on

Part I

Cheminformatics toolkits




Chemistry Central Journal
 Software                                                                                                                               Open Access
 Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit
 Noel M O'Boyle* (1,2), Chris Morley (3) and Geoffrey R Hutchison (4)

 Address: (1) Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK, (2) Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK, (3) OpenBabel Development Team and (4) Department of Chemistry, University of Pittsburgh, Chevron Science Center, 219 Parkman Avenue, Pittsburgh, PA 15260, USA
 Email: Noel M O'Boyle* - baoilleach@gmail.com; Chris Morley - c.morley@gaseq.co.uk; Geoffrey R Hutchison - geoffh@pitt.edu
 * Corresponding author




 Received: 23 January 2008
 Accepted: 9 March 2008
 Published: 9 March 2008
 Chemistry Central Journal 2008, 2:5   doi:10.1186/1752-153X-2-5
 This article is available from: http://journal.chemistrycentral.com/content/2/1/5
 © 2008 O'Boyle et al
 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
 which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.




                   Abstract
                   Background: Scripting languages such as Python are ideally suited to common programming tasks
                   in cheminformatics such as data analysis and parsing information from files. However, for reasons
                   of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in
                   compiled languages such as C++. We describe Pybel, a Python module that provides access to the
                   OpenBabel toolkit.
                   Results: Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and
                   writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to
                   simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily
                   interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by
                   Pybel.
                   Conclusion: Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate
                   chemical information. It is open source, available cross-platform, and offers the power of the
                   OpenBabel toolkit to Python programmers.




 Background
 Cheminformaticians often need to write once-off scripts to extract data from text files, prepare data for analysis or carry out simple statistics. Scripting languages such as Perl, Python and Ruby are ideally suited to these day-to-day tasks [1]. Such languages are, however, an order of magnitude or more slower than compiled languages such as C++. Since cheminformaticians regularly deal with molecular files containing thousands of molecules and many cheminformatics algorithms are computationally expensive, cheminformatics toolkits are typically written in compiled languages for performance.

 OpenBabel is a C++ toolkit with extensive capabilities for reading and writing molecular file formats (over 80 are supported) as well as for manipulating molecular data [2]. Many standard chemistry algorithms are included, for example, determination of the smallest set of smallest rings, bond order perception, addition of hydrogens, and assignment of Gasteiger charges. In relation to cheminformatics, OpenBabel supports SMARTS searching [3], molecular fingerprints [4] (both Daylight-type, and structural-key based), and includes group contribution descriptors for LogP [5], polar surface area (PSA) [6] and molar refractivity (MR) [5].



Of the current popular scripting languages, Python [7] is the de-facto standard language for scripting in cheminformatics. Several commercial cheminformatics toolkits have interfaces in Python: OpenEye's closed-source successor to OpenBabel, OEChem [8], is a C++ toolkit with interfaces in Python and Java; Rational Discovery's RDKit [9], which is now open source, is a C++ cheminformatics toolkit with a Python interface; the Daylight toolkit [10] from Daylight Chemical Information Systems, written in C, only has Java and C++ wrappers but PyDaylight [11], available separately from Dalke Scientific, provides a Python interface to the toolkit; the Cambios Molecular Toolkit [12] from Cambios Consulting is a commercial C++ toolkit with a Python interface. There are also toolkits entirely implemented in Python: Frowns [13], an open source cheminformatics toolkit by Brian Kelley, and PyBabel [14], an open source toolkit included in the MGLTools package from the Molecular Graphics Labs at the Scripps Research Institute. Note that the latter is not related to the OpenBabel project; rather its name derives from the fact that its aim was to implement in Python some of the functionality of Babel v1.6 [15], a command-line application for converting file formats which is a predecessor of OpenBabel.

Here we describe the implementation and application of Pybel, a Python module that provides access to the OpenBabel C++ library from the Python programming language. Pybel builds on the basic Python bindings to make it easier to carry out frequent tasks in cheminformatics. It also aims to be as 'Pythonic' as possible; that is, to adhere to Python language conventions and idioms, and where possible to make use of Python language features such as iterators. The result is a module that takes advantage of Python's expressive syntax to allow cheminformaticians to carry out tasks such as SMARTS matching, data field manipulation and calculation of molecular fingerprints in just a few lines of code.

Implementation
SWIG bindings
Python bindings to the OpenBabel toolkit were created using SWIG [16]. SWIG (Simplified Wrapper and Interface Generator) is a tool that automates the generation of bindings to libraries written in C or C++. One of the advantages of SWIG compared to other automated wrapping methods such as Boost.Python [17] or SIP [18] is that SWIG also supports the generation of bindings to several other languages. For example, OpenBabel also uses SWIG to generate bindings for Perl, Ruby and Java. An additional advantage is that SWIG will directly parse C or C++ header files while Boost.Python and SIP require each C++ class to be exposed manually. The input to SWIG is an interface file containing a list of OpenBabel header files for which to generate bindings. Using the signatures in the header files, SWIG generates a C file which, when compiled and linked with the Python development libraries and OpenBabel, creates a Python extension module, openbabel. This can then be imported into a Python script like any other Python module using the "import openbabel" statement.

For a small number of C++ objects and functions, it was necessary to add some convenience functions to facilitate access from Python. Certain types of molecule files contain data in addition to the connection table. OpenBabel stores these data in subclasses of OBGenericData such as OBPairData (for the data fields in molecule files such as MOL files and SDF files) and OBUnitCell (for the data fields in CIF files). To access the data it is necessary to 'downcast' an instance of OBGenericData to the specific subclass. For this reason, two convenience functions were added to the interface file, one to cast OBGenericData to OBPairData, and one to cast to OBUnitCell. Another convenience function was added to convert a Python list to a C array of doubles, as this type of input is required for a small number of OpenBabel functions.

Iterators are an important feature of the OpenBabel C++ library. For example, OBAtomAtomIter allows the user to easily iterate over the atoms attached to a particular atom, and OBResidueIter is an iterator over the residues in a molecule. The OpenBabel iterators use the dereference operator to access the data, the increment operator to iterate to the next element, and the boolean operator to test whether any elements remain. Iterators are also a core feature of the Python language. However, the iterators used by OpenBabel are not automatically converted into Python iterators. To deal with this, Python iterator classes that wrap the dereference, increment and boolean operators behind the scenes were added to the SWIG interface file, so that Python statements such as "for attached_obatom in OBAtomAtomIter(obatom)" work without problem.

Pybel module
The SWIG bindings provide direct access from Python to the C++ objects and functions in the OpenBabel API (application programming interface). The purpose of the Pybel module is to wrap these bindings to present a more Pythonic interface to OpenBabel (Figure 1). This extra level of abstraction is useful as Python programmers expect Python libraries to behave in certain ways that a C++ library does not. For example, in Python, attributes of an object are often directly accessed whereas in C++ it is typical to call Get/Set functions to access them. A C++ function returning a particular object might require a pointer to an empty object as a parameter, whereas the Python equivalent would not. Even something as simple
as differences in the conventions for the case of letters used in variable and method names is a problem, as it makes it more likely for Python programmers to introduce bugs in their code.

Figure 1: The relationship between Python modules described in the text and the OpenBabel C++ library. Python modules are shown in green; the C++ library is shown in blue.

One of the key aims of Pybel was to reduce the amount of code necessary to carry out common tasks. This is especially important for a scripting language where programming is often done interactively at a command prompt. In addition, as for any programming language, repeated entry of code for routine and common tasks (so-called 'boilerplate code') is a common cause of errors in code. Reading and writing molecule files is one of the most common tasks for users of OpenBabel but requires several lines of code if using the SWIG bindings. The following code shows how to store each molecule in a multimolecule SDF file in a list called allmols:

    import openbabel

    allmols = []
    obconversion = openbabel.OBConversion()
    obconversion.SetInFormat("sdf")
    obmol = openbabel.OBMol()
    notatend = obconversion.ReadFile(obmol, "inputfile.sdf")
    while notatend:
        allmols.append(obmol)
        obmol = openbabel.OBMol()
        notatend = obconversion.Read(obmol)

To replace this somewhat verbose code, Pybel provides a readfile method that takes a file format and filename and returns molecules using the 'yield' keyword. This changes the method into a 'generator', a Python language feature where a method behaves like an iterator. Iterators are a major feature of the Python language which are used for looping over collections of objects. In Pybel, we have used iterators where possible to simplify access to the toolkit. As a result, the equivalent to the preceding code is:

    import pybel

    allmols = [mol for mol in pybel.readfile("sdf", "inputfile.sdf")]

The benefits of iterator syntax are clear when dealing with multimolecule files. For single molecule files, however, the user needs to remember to explicitly request the iterator to return the first and only molecule using the next method:

    mol = pybel.readfile("mol", "inputfile.mol").next()

Pybel provides replacements for two of the main classes in the OpenBabel library, OBMol and OBAtom. The following discussion describes the Pybel Molecule class which wraps an instance of OBMol, but the same design principles apply to the Pybel Atom class. Table 1 summarises the attributes and methods of the Molecule object. By wrapping the base class, Pybel can enhance the Molecule



Table 1: Attributes and methods supported by the Pybel Molecule object

 Attribute              Description*

 OBMol                  The underlying OBMol object
 atoms                  A list of Pybel Atoms
 charge                 The total charge (GetTotalCharge)
 data                   A MoleculeData object for access to data fields
 dim                    The dimensionality of the coordinates (GetDimension)
 energy                 The heat of formation (GetEnergy)
 exactmass              The mass calculated using isotopic abundance (GetExactMass)
 flags                  The set of flags used internally by OpenBabel (GetFlags)
 formula                The stoichiometric formula (GetFormula)
 mod                    The number of nested BeginModify() calls (Internal use) (GetMod)
 molwt                  The standard molar mass (GetMolWt)
 spin                   The total spin multiplicity (GetTotalSpinMultiplicity)
 sssr                   The smallest set of smallest rings (GetSSSR)
 title                  The title of the molecule (often the filename) (GetTitle)
 unitcell               Unit cell data (if present)

 Method
 write                  Write the molecule to a file or return it as a string
 calcfp                 Return a molecular fingerprint as a Fingerprint object
 calcdesc               Return the values of the group contribution descriptors
 __iter__               Enable iteration over the Atoms in the Molecule

 *Where a Molecule attribute is a direct replacement for a 'Get' method of the underlying OBMol, the name of the method is given in parentheses.
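
The correspondence in Table 1 between attributes and 'Get' methods can be illustrated with Python properties. The following is a minimal sketch under stated assumptions, not the Pybel source: StubOBMol is a hypothetical stand-in for openbabel.OBMol (so that the example runs without OpenBabel installed), and MoleculeSketch is an illustrative name for the wrapper. Only two of the attributes from Table 1 are shown.

```python
class StubOBMol:
    """Hypothetical stand-in for openbabel.OBMol with two Get methods."""

    def __init__(self, formula, molwt):
        self._formula = formula
        self._molwt = molwt

    # Accessors named in the C++ Get-method style
    def GetFormula(self):
        return self._formula

    def GetMolWt(self):
        return self._molwt


class MoleculeSketch:
    """Illustrative wrapper only; the real Pybel Molecule may differ."""

    def __init__(self, obmol):
        # Keep the wrapped object reachable, as the OBMol row of Table 1 describes
        self.OBMol = obmol

    @property
    def formula(self):
        # Forward to the Get method each time the attribute is read
        return self.OBMol.GetFormula()

    @property
    def molwt(self):
        return self.OBMol.GetMolWt()


mol = MoleculeSketch(StubOBMol("C2H6O", 46.07))
print(mol.formula, mol.molwt)
```

Because each property calls the Get method when it is read, the attribute reflects the current state of the wrapped object rather than a cached copy.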


object by providing (1) direct access to attributes rather than through the use of Get methods, (2) additional attributes of the object, and (3) additional methods that act on the object.

(1) As mentioned earlier, it is typical in Python to access attribute values directly rather than using Get/Set methods. With this in mind, the Molecule class adds attributes such as energy, formula and molwt (among others) which give the values returned by calling GetEnergy(), GetFormula() and GetMolWt(), respectively, on the underlying OBMol (see Table 1 for the full list).

(2) One of the aims of Pybel is to simplify access to some of the most common attributes. With this in mind, an atoms attribute has been added which returns a list of the atoms of the molecule as Pybel Atoms. Access to the data fields associated with a molecule has been simplified by creation of a MoleculeData object which is returned when the data attribute of a Molecule is accessed. MoleculeData presents a dictionary interface to the data fields of the molecule. Accessing and updating these fields is more convoluted if using the SWIG bindings. Compare the following statements for accessing the "comment" field of the variable mol, an OBMol:

    # Using the SWIG bindings
    value = openbabel.toPairData(mol.GetData("comment")).GetValue()

    # Using Pybel
    value = pybel.Molecule(mol).data["comment"]

It should be noted that all of these attributes are calculated on-the-fly rather than stored for future access as the underlying OBMol may have been modified.

(3) Four additional methods have been added to the Pybel Molecule (Table 1). The first is a write method which writes a representation of the Molecule to a file and takes care of error handling. As with reading molecules from files (see above), this method simplifies the procedure significantly compared to using the SWIG bindings directly. In addition, a calcfp method and a calcdesc method have been added which calculate a binary fingerprint for the molecule, and some descriptor values, respectively. In the OpenBabel library these are not methods of the OBMol, but rather are loaded as plugins (by OBFingerprint.FindFingerprint and OBDescriptor.FindType, respectively) to which an OBMol is passed as input. The __iter__ method is a special Python method that enables iteration over an object; in the case of a Molecule, the defined iterator loops over the Atoms of the Molecule. This feature enables constructions such as "for atom in mol" where mol is a Pybel Molecule.

SMARTS is a query language developed by Daylight Chemical Information Systems for molecular substructure
searching [3]. As implemented in the OpenBabel toolkit, finding matches of a particular substructure in a particular molecule is a four step process that involves creating an instance of OBSmartsPattern, initialising it with a SMARTS pattern, searching for a match, and finally retrieving the result:

    obsmarts = openbabel.OBSmartsPattern()
    obsmarts.Init("[#6][#6]")
    obsmarts.Match(obmol)
    results = obsmarts.GetUMapList()

Since a SMARTS query can be thought of as a regular expression for molecules, in Pybel we decided to wrap the SMARTS functionality in an analogous way to Python's regular expression module, re. With these changes, the same process takes only two steps, an initialisation step and a search step:

    smarts = pybel.Smarts("[#6][#6]")
    results = smarts.findall(pybelmol)

Pybel was not written to replace the SWIG bindings but rather to make it simpler to perform common tasks. As a result, Pybel does not attempt to wrap every single method and class in the OpenBabel library. Because of this, a user may often want to interconvert between an OBMol and a Molecule, or an OBAtom and an Atom. This is quite a straightforward process. A Pybel Molecule can be created by passing an OBMol to the Molecule constructor. In the following example an OBMol is created using the SWIG bindings and then written to a file using Pybel:

    obmol = openbabel.OBMol()
    a = obmol.NewAtom()
    a.SetAtomicNum(6)
    a.SetVector(0.0, 1.0, 2.0) # Set coordinates
    b = obmol.NewAtom()
    obmol.AddBond(1, 2, 1) # Single bond from Atom 1 to Atom 2
    pybel.Molecule(obmol).write("mol", "outputfile.mol")

The OBMol wrapped by a Pybel Molecule can be accessed through the OBMol attribute. This makes it easy to call a method not wrapped by Pybel, such as OBMol.NumRotors, which returns the number of rotatable bonds in a molecule:

    mol = pybel.readfile("mol", "inputfile.mol").next()
    numrotors = mol.OBMol.NumRotors()

Documentation and Testing
To minimise programming errors, programs written in dynamically-typed languages such as Python should be tested comprehensively. Pybel has 100% code coverage in terms of unit tests, as measured by Ned Batchelder's coverage.py [19]. It also has several doctests, short snippets of Python code included in documentation strings which serve as both examples of usage and as unit tests.

The Pybel API is fully documented with docstrings. These can be accessed in the usual way with the help() command at the interactive Python prompt after importing Pybel: for example, "help(pybel.Molecule)". In addition, the OpenBabel Python web page [20] contains a complete description of how to use the SWIG bindings and the Pybel API. The webpage also contains links to HTML versions of the OpenBabel API documentation and Pybel API documentation. The latter is included in Additional File 1.

Results and Discussion
The principal aim of Pybel is to make it simpler to use the OpenBabel toolkit to carry out common tasks in cheminformatics. These common tasks include reading and writing molecule files, accessing data fields of a molecule, computing and comparing molecular fingerprints and SMARTS matching. Here we present some examples that illustrate how Pybel may be used to carry out common cheminformatics tasks.

Removal of duplicate molecules
When merging different datasets or as a final step in preprocessing, it may be necessary to identify and remove duplicate molecules. In the following example, only the unique molecules in the multimolecule SDF file "inputfile.sdf" will be written to "uniquemols.sdf". Here we will assume that a unique InChI string (IUPAC International Chemical Identifier) indicates a unique molecule. A similar procedure could be performed using the OpenBabel canonical SMILES format, by replacing "inchi" with "can" in the following:

    import pybel

    inchis = []
    output = pybel.Outputfile("sdf", "uniquemols.sdf")
    for mol in pybel.readfile("sdf", "inputfile.sdf"):
        inchi = mol.write("inchi")
        if inchi not in inchis:
            output.write(mol)
            inchis.append(inchi)
    output.close()

Selection of similar molecules
Another common task in cheminformatics is the selection of a set of molecules of similar structure to a target molecule. Here we will assume that structural similarity is indicated by a Tanimoto coefficient [21] of at least 0.7 with respect to Daylight-type (that is, based on hashed paths through the molecular graph) fingerprints. Note that Pybel redefines the | operator (bitwise OR) for Fingerprint objects as the Tanimoto coefficient:

    import pybel

    targetmol = pybel.readfile("sdf", "targetmol.sdf").next()
    targetfp = targetmol.calcfp()
    output = pybel.Outputfile("sdf", "similarmols.sdf")
    for mol in pybel.readfile("sdf", "inputfile.sdf"):
        fp = mol.calcfp()
        if fp | targetfp >= 0.7:
            output.write(mol)
    output.close()

Applying a Rule of Fives filter
In an influential paper, Lipinski et al. [22] performed an analysis of drug compounds that reached Phase II clinical trials and found that they tended to occupy a certain range of values for molecular weight, LogP, and number of hydrogen bond donors and acceptors. Based on this, they proposed a rule with four criteria to identify molecules that might have poor absorption or permeation properties. This is the Lipinski Rule of Fives, so-called as the numbers involved are all multiples of five. The following example shows how to filter a database to identify only those molecules that pass all four of the Lipinski criteria. The values of the Lipinski descriptors are also added to the output file as data fields. Note that whereas molecular weight is directly available as an attribute of a Molecule, and LogP is available as one of the three group contribution descriptors calculated by OpenBabel, we need to use SMARTS pattern matching to identify the number of hydrogen bond donors and acceptors. The SMARTS patterns used here correspond to the definitions of hydrogen bond donor and acceptor used by Lipinski:

    import pybel

    HBD = pybel.Smarts("[#7,#8;!H0]")
    HBA = pybel.Smarts("[#7,#8]")

    def lipinski(mol):
        """Return the values of the Lipinski descriptors."""
        desc = {'molwt': mol.molwt,
                'HBD': len(HBD.findall(mol)),
                'HBA': len(HBA.findall(mol)),
                'LogP': mol.calcdesc(['LogP'])['LogP']}
        return desc

    passes_all_rules = lambda desc: (desc['molwt'] <= 500 and
                                     desc['HBD'] <= 5 and desc['HBA'] <= 10 and
                                     desc['LogP'] <= 5)

    if __name__=="__main__":
        output = pybel.Outputfile("sdf", "passLipinski.sdf")
        for mol in pybel.readfile("sdf", "inputfile.sdf"):
            descriptors = lipinski(mol)
            if passes_all_rules(descriptors):



            mol.data.update(descriptors)
            output.write(mol)
    output.close()

Future work
The future development of Pybel is closely linked to any changes and improvements to OpenBabel. With each new release of the OpenBabel API, the SWIG bindings will be updated to include any additional functionality. However, additions to the Pybel API will only occur if they simplify access to new features of the OpenBabel toolkit of general use to cheminformaticians. In general, the Pybel API can be considered stable, and an effort will be made to ensure that future changes will be backwards compatible.

Conclusion
Pybel provides a high-level Python interface to the widely-used OpenBabel C++ toolkit. This combination of a high performance cheminformatics toolkit and an expressive scripting language makes it easy for cheminformaticians to rapidly and efficiently write scripts to manipulate molecular data.

Pybel is freely available from the OpenBabel web site [2] both as part of the OpenBabel source distribution and for Windows as an executable installer. Compiled versions are also available as packages in some Linux distributions (openbabel-python in Fedora, for example).

Availability and Requirements
Project name: Pybel
Project home page: http://openbabel.sf.net/wiki/Python
Operating system(s): Platform independent
Programming language: Python
Other requirements: OpenBabel
License: GNU GPL
Any restrictions to use by non-academics: None

Authors' contributions
GRH is the lead developer of OpenBabel and created the SWIG bindings. NMOB developed Pybel, and extended the SWIG interface file. CM compiled the SWIG bindings on Windows and added convenience functions to the OpenBabel API to facilitate access from scripting languages. All authors read and approved the final manuscript.

Additional material
Additional file 1
Pybel API. The HTML documentation of the Pybel API (application programming interface).
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-5-S1.zip]

Acknowledgements
The idea for the Pybel module was inspired by Andrew Dalke's work on PyDaylight [11]. We thank the anonymous reviewers for their helpful comments.

References
1.  Ousterhout JK: Scripting: Higher Level Programming for the 21st Century. [http://home.pacbell.net/ouster/scripting.html]
2.  OpenBabel v.2.1.1 [http://openbabel.sf.net]
3.  SMARTS – A Language for Describing Molecular Patterns [http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html]
4.  Flower DR: On the properties of bit string-based measures of chemical similarity. J Chem Inf Comput Sci 1998, 38:379-386.
5.  Wildman SA, Crippen GM: Prediction of physicochemical parameters by atomic contributions. J Chem Inf Comput Sci 1999, 39:868-873.
6.  Ertl P, Rohde B, Selzer P: Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J Med Chem 2000, 43:3714-3717.
7.  Python [http://www.python.org]
8.  OEChem: OpenEye Scientific Software: Santa Fe, NM.
9.  RDKit [http://www.rdkit.org]
10. Daylight Toolkit: Daylight Chemical Information Systems, Inc.: Aliso Viejo, CA.
11. PyDaylight: Dalke Scientific Software, LLC: Santa Fe, NM.
12. Cambios Molecular Toolkit: Cambios Computing, LLC: Palo Alto, CA.
13. Frowns [http://frowns.sf.net]
14. PyBabel in MGLTools [http://mgltools.scripps.edu]
15. Babel v.1.6 [http://smog.com/chem/babel/]
16. SWIG v.1.3.31 [http://www.swig.org]
17. Boost.Python [http://www.boost.org/libs/python/doc/]
18. SIP – A Tool for Generating Python Bindings for C and C++ Libraries [http://www.riverbankcomputing.co.uk/sip/]
19. coverage.py [http://nedbatchelder.com/code/modules/coverage.html]
20. OpenBabel Python [http://openbabel.sourceforge.net/wiki/Python]
21. Jaccard P: La distribution de la flore dans la zone alpine. Rev Gen Sci Pures Appl 1907, 18:961-967.
22. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Del Rev 1997, 23:3-25.

Chemistry Central Journal
 Software                                                                                                                                 Open Access
 Cinfony – combining Open Source cheminformatics toolkits behind
 a common interface
 Noel M O'Boyle*1 and Geoffrey R Hutchison2

 Address: 1Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK and 2Department of Chemistry, University of
 Pittsburgh, Chevron Science Center, 219 Parkman Avenue, Pittsburgh, PA 15260, USA
 Email: Noel M O'Boyle* - oboyle@ccdc.cam.ac.uk; Geoffrey R Hutchison - geoffh@pitt.edu
 * Corresponding author




 Published: 3 December 2008                                                      Received: 9 October 2008
                                                                                 Accepted: 3 December 2008
 Chemistry Central Journal 2008, 2:24   doi:10.1186/1752-153X-2-24
 This article is available from: http://journal.chemistrycentral.com/content/2/1/24
 © 2008 O'Boyle et al




                   Abstract
                   Background: Open Source cheminformatics toolkits such as OpenBabel, the CDK and the RDKit
                   share the same core functionality but support different sets of file formats and forcefields, and
                   calculate different fingerprints and descriptors. Despite their complementary features, using these
                   toolkits in the same program is difficult as they are implemented in different languages (C++ versus
                   Java), have different underlying chemical models and have different application programming
                   interfaces (APIs).
                   Results: We describe Cinfony, a Python module that presents a common interface to all three of
                   these toolkits, allowing the user to easily combine methods and results from any of the toolkits. In
                   general, the run time of the Cinfony modules is almost as fast as accessing the underlying toolkits
                   directly from C++ or Java, but Cinfony makes it much easier to carry out common tasks in
                   cheminformatics such as reading file formats and calculating descriptors.
                   Conclusion: By providing a simplified interface and improving interoperability, Cinfony makes it
                   easy to combine complementary features of OpenBabel, the CDK and the RDKit.




Background
Cheminformatics toolkits are essential to the day-to-day work of the practising cheminformatician. They enable the user to deal with such tasks as handling different chemistry file formats, substructure searching, calculation of molecular fingerprints, and structure diagram generation. The main Open Source cheminformatics libraries under active development are OpenBabel [1], the Chemistry Development Kit (CDK) [2], and the RDKit [3]. OpenBabel is a C++ toolkit with bindings in Perl, Python, Ruby and Java, the CDK is a Java toolkit, while the RDKit is another C++ toolkit with Python bindings. While the CDK has its origins in academia, both OpenBabel and the RDKit originated in companies (OpenEye and Rational Discovery, respectively) and have subsequently been developed by the community under Open Source licenses.

In general, all of these toolkits share the same core functionality although the implementation details and underlying chemical model may differ. However, as a result of their independent development and history, each has functionality specific to itself and each toolkit supports different sets of file formats and forcefields, and can calculate different molecular fingerprints and molecular descriptors (Table 1). Despite the diversity of these toolkits and the potential benefits in being able to access all of them at the same time, there has been little work on interoperability between them. This has resulted in a balkanization of this field such that users of one toolkit rarely use another toolkit.

One way to achieve interoperability of chemical toolkits is through the use of standard file formats for exchange of




Table 1: Some features of toolkits which are not shared by all three toolkits.

 CDK
 A large number of descriptors (some overlap with RDKit)
 Pharmacophore searching (like RDKit*)
 Calculation of maximum common substructure
 2D structure layout (like RDKit) and depiction
 MACCS keys (also RDKit) and E-State fingerprints
 Integration with the R statistical programming environment
 Support for mass-spectrometry analysis (representations for cleavage reactions, structure generation from formulae)
 Fragmentation schemes (ring fragments, Murcko)
 3D structure generation using a template and heuristics (like OpenBabel)
 3D similarity using ultrafast shape descriptors
 Gasteiger π charge calculation

 OpenBabel
 Not just focused on cheminformatics
 Supports a very large number of chemical file formats including quantum mechanics file formats, molecular mechanics trajectories, 2D sketchers
 3D structure generation using a template method (like CDK)
 Included in all major Linux distributions
 Bindings available from several scripting languages apart from Python, as well as the Java and .NET platforms
 Conformation generation and searching
 InChI (also CDK) and InChIKey generation
 Support for crystallographic space groups
 Several forcefield implementations: UFF (also RDKit), MMFF94, MMFF94s, Ghemical
 Ability to add custom data types to atoms, bonds, residues, molecules

 RDKit
 A large number of descriptors (some overlap with CDK)
 Fragmentation using RECAP rules
 2D coordinate generation (like CDK) and depiction
 3D coordinate generation using geometry embedding
 Calculation of Cahn-Ingold-Prelog stereochemistry codes (R/S)
 Pharmacophore searching (like CDK)
 Calculation of shape similarity (based on volume overlap)
 Chemical reaction handling and transforms
 Atom pairs and topological torsions fingerprints
 Feature maps and feature-map vectors
 Machine-learning algorithms

 * Where the term "like" is used, it indicates that the implementation details differ.



data. For example, the CML project has defined a standardised XML format for chemical data [4], with successive releases refining and extending the original standard. The OpenSMILES effort [5] has attempted to resolve ambiguities in the published SMILES definition [6] to create a standard. While these efforts deserve support, they face inevitable problems achieving consensus and they require changes to existing software to support the standard. The large number of chemical file formats supported by OpenBabel (currently over 80) illustrates both the potential of achieving a standard as well as the difficulties.

An alternative is interoperability at the API (application programming interface) level. This has the advantage that it does not require any changes to existing software. However, there are at least three barriers to overcome: the need for a programming language that can access all the toolkits simultaneously, the difficulty of exchanging chemical models between different toolkits, and differences in the API for core cheminformatics tasks shared by the toolkits.

Here we describe Cinfony, a Python module that overcomes these barriers to provide interoperability at the API level. Cinfony allows access to OpenBabel, the CDK, and the RDKit through a common interface, and uses a simple yet robust method to pass chemical models between toolkits. Pybel, one of the components of Cinfony, has been described previously [7]. It provides access to OpenBabel from standard Python. In this work, we show that the API developed for Pybel may be considered a generic API for accessing any cheminformatics toolkit. We describe the design and implementation of the Cinfony API for OpenBabel, the RDKit and the CDK. Next, we show how Cinfony simplifies the process of accessing the toolkits and how it can be used in practice to combine the power of the three Open Source toolkits. Finally, we discuss performance and some results from comparisons of the toolkits.

Implementation
Common Application Programming Interface
Cinfony presents the same interface to three cheminformatics toolkits, OpenBabel, the CDK and the RDKit. These are available through three separate modules: obabel, cdk and rdkit. The API is designed to make it easy to carry out many of the common tasks in cheminformatics, and covers the core functionality shared by all of the toolkits. Table 2 gives an overview of the API. The complete API is available here (see Additional file 1).

The main class containing chemical information is the Molecule class. Rather than create a new chemical model, the Molecule class is a light wrapper around the molecule object in the underlying library, for example, around OBMol in the case of OpenBabel. Attribute values such as the molecular weight are calculated dynamically by querying the underlying molecule. This ensures that if the underlying OBMol, for example, is altered, the attribute values returned will still be correct. The actual underlying object (an OpenBabel OBMol, a CDK Molecule, or an RDKit Mol) can be accessed directly at any point.

The Molecule class also contains several methods that act on molecules such as methods for calculating fingerprints, adding hydrogens, and calculating descriptor values. This makes it easy to access these methods, and also brings them to the attention of the user. In the underlying toolkit these methods may not be present as part of the molecule class, and in fact, they can be difficult to find in the toolkit's API. For example, the Cinfony method Molecule.addh() adds explicit hydrogens to the molecule. Although the OBMol of OpenBabel has a corresponding method, OBMol.AddHydrogens(), the RDKit uses a global method, AddHs(Mol), while the CDK requires the user to instantiate a HydrogenAdder object, which can then be used to add hydrogens.

The Molecule methods described in the original Pybel API [7] have been extended to handle hydrogen addition and removal, structure diagram generation, assignment of 3D geometry to 0D structures and geometry optimisation using forcefields. Both the CDK and the RDKit are capable of 2D coordinate generation and 2D depiction. However, since OpenBabel currently has neither of these capabilities, a fourth toolkit, OASA, is used by Pybel for this purpose. OASA is a lightweight cheminformatics toolkit implemented in Python [8].

A new development in the latest version of OpenBabel is 3D coordinate generation and geometry optimisation using one of a number of forcefields. Since these methods are also available in the RDKit, and are under development in the CDK, two additional methods have been added to the Cinfony Molecule: make3D(), for 3D coordinate generation, and localopt(), for geometry optimisation. Particularly in the case of OpenBabel, these new methods simplify the process of generating 3D coordinates. Compare a single call to make3D() in Cinfony with the following OpenBabel code:

structuregenerator = openbabel.OBOp.FindType('Gen3D')
structuregenerator.Do(mol)
mol.AddHydrogens()
Table 2: An overview of the Cinfony API.

 Class name       Purpose

 Molecule         Wraps a molecule instance of the underlying toolkit and provides access to methods that act on molecules
 Atom             Wraps an atom instance of the underlying toolkit
 MoleculeData     Provides dictionary-like access to the information contained in the tag fields in SDF and MOL2 files
 Outputfile       Handles multimolecule output file formats
 Smarts           Wraps the SMARTS functionality of the toolkit in an analogous way to the Python 're' module for regular expression matching
 Fingerprint      Simplifies Tanimoto calculation of binary fingerprints

 Function name
 readfile         Return an iterator over Molecules in a file
 readstring       Return a Molecule

 Variable name
 descs            A list of descriptor IDs
 forcefields      A list of forcefield IDs
 fps              A list of fingerprint IDs
 informats        A list of input format IDs
 outformats       A list of output format IDs
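
Table 2 lists a Fingerprint class that simplifies Tanimoto calculation; in Pybel this is done by redefining the | operator for Fingerprint objects. The following is a minimal, toolkit-independent sketch of that idea. The class name SketchFingerprint and its bits attribute are illustrative assumptions and are not part of the Cinfony API, and returning 0.0 for two empty fingerprints is likewise an assumed convention.

```python
class SketchFingerprint(object):
    """Toy stand-in for a toolkit fingerprint (hypothetical, not Cinfony code)."""

    def __init__(self, bits):
        # Store the indices of the bits that are set.
        self.bits = frozenset(bits)

    def __or__(self, other):
        # Tanimoto = (bits set in both) / (bits set in either).
        # Assumed convention: two empty fingerprints give 0.0.
        union = len(self.bits | other.bits)
        if union == 0:
            return 0.0
        return len(self.bits & other.bits) / float(union)


fp_a = SketchFingerprint([1, 5, 9, 12])
fp_b = SketchFingerprint([1, 5, 9, 40])
similarity = fp_a | fp_b  # 3 shared bits out of 5 distinct bits
```

With an operator defined this way, a similarity filter can be written as a single comparison such as (fp_a | fp_b) >= 0.7.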







ff = openbabel.OBForceField.FindType("MMFF94")
ff.Setup(mol)
ff.SteepestDescent(50)
ff.GetCoordinates(mol)

The Cinfony API is identical for all of the toolkits. However, the values returned by particular API calls are not necessarily standardised across toolkits. This Cinfony design decision is in agreement with the Principle of Least Surprise [9]; when the user accesses the underlying toolkit directly, they will get the same result as found when using Cinfony. This design decision places the responsibility on the user to become familiar with differences in how the toolkits behave. For example, all of the toolkits allow the calculation of path-based fingerprints. These encode all paths in the molecular graph up to a path length of P into a binary vector of length V, but the default values for V and P are different for each toolkit: 1024 and 7 for OpenBabel, 1024 and 8 for the CDK, and 2048 and 7 for RDKit. Although it is possible to alter these parameters for the CDK and the RDKit and so standardise V and P to 1024 and 7 for all of the toolkits, it is reasonable to assume that the developers of each package have chosen sensible defaults. In addition, the implementation details of each of the fingerprinters would still be different; for example, the RDKit sets four bits when hashing each molecular path, the others set one; OpenBabel does not set any bits for the one-atom fragments, N, C and O.

Interoperability
The ability to transfer chemical models between toolkits is essential to the goal of interoperability. However, the internal representation of a molecule is specific to a particular toolkit. For example, as well as the connection table and coordinates (if present), it may include derived data relating to aromaticity, the number of implicit hydrogens on an atom, or stereochemical configuration. Fortunately, the problem of transfer and storage of chemical information has already been solved by the development of molecular file formats, of which over 80 are now supported by OpenBabel. Specifically, the MDL MOL file format [10] and the SMILES format [5,6] are shared by all three toolkits, and are used by Cinfony to exchange information on molecules with 2D or 3D coordinates (MOL file format), and no coordinates (SMILES format), respectively.

By using existing file formats rather than trying to interconvert the internal models themselves, Cinfony takes advantage of the existing input/output code of each toolkit which is well-tested and mature. In addition, the translation process is transparent to the user. However, the user should be aware of known limitations of particular readers or writers. For example, the SMILES parser in CDK 1.0.3 ignores atom-based stereochemistry and thus that information is lost if a 0D rdkit or obabel Molecule with atom-based stereochemistry is converted to a cdk Molecule.

Cinfony Molecules are interconverted using the Molecule() constructor. For example, if obabelmol is an obabel Molecule, then the corresponding rdkit Molecule can be constructed using rdkit.Molecule(obabelmol). This mechanism can also be used to interface Cinfony to other cheminformatics toolkits. The only requirements are that the object passed to the Molecule() constructor needs to have a _cinfony attribute set to True, and an _exchange attribute containing a tuple (0, SMILES string) or (1, MOL file) depending on whether the molecule is 0D or not.

Implementation
The Python scripting language has two main implementations. The most widely used implementation is the original reference implementation of Python in C, referred to as CPython when necessary to distinguish it from other implementations. The next most widely used implementation is Jython, an implementation of Python in Java. Although most Python users work in CPython, Jython scripts have the advantage of being able to access Java libraries natively. They can also be compiled into Java classes to be used from Java programs. Jython scripts are also useful in contexts where Java is required but it is more convenient to work in Python; for example, to implement a Java web servlet or a node in a Java workflow environment such as KNIME [11].

As discussed earlier, one of the barriers to interoperability is the requirement for a programming language that can simultaneously access more than one of the toolkits. From CPython it is possible to use Cinfony modules to connect to OpenBabel (pybel), the CDK (cdkjpype) and the RDKit (rdkit). From Jython, there are modules for OpenBabel (jybel) and the CDK (cdkjython). Convenience modules obabel and cdk are provided that automatically import the appropriate OpenBabel or CDK module depending on the Python implementation. The relationship between these Cinfony modules and the underlying cheminformatics libraries is summarised in Figure 1.

pybel and jybel
OpenBabel provides SWIG [12] bindings for both CPython and Java (among other languages). pybel is a wrapper around the CPython bindings, and has previously been described in detail [7]. jybel is an implementation of the Cinfony API that allows the user to access OpenBabel from Jython using the Java bindings. Despite the fact that





Figure 1
Relationship of Cinfony modules to Open Source toolkits. Python modules are accessible from CPython (green), Jython (pale blue), or both (striped green and pale blue). Java libraries are indicated by dark blue, while C++ libraries are yellow.

rdkit
Support for Python scripting has been part of the design of the RDKit from the start. The Python bindings in RDKit were created using Boost.Python [14], a framework for interfacing Python and C++. The Cinfony module rdkit uses these bindings to implement its API. It is currently not possible to access RDKit from Jython. RDKit has only preliminary support for Java bindings; when these are complete, a corresponding module will be added to Cinfony.

Dependency handling
A fully-featured installation of Cinfony relies on a large number of open source libraries. In particular, the 2D depiction capabilities introduce dependencies on several graphics libraries which may be problematic to install on a particular platform (Cairo and its Python bindings, Python Imaging Library, AGG and the Python wrapper AggDraw). With this in mind, Cinfony treats all dependencies as optional and only raises an Exception if the user calls a method or imports a module that requires a missing dependency.

jybel is used from a Java implementation of Python, and          For example, the Python Imaging Library (PIL) is required
accesses a C++ library through the Java Native Interface         for displaying a 2D depiction on the screen. If all of the
(JNI), the jybel code differs from pybel in very few respects.   components of cinfony are installed except for PIL, Cin-
In Jython, it is not possible to iterate directly over the       fony works perfectly except that an Exception is raised if
wrapped STL vectors used by OpenBabel as their Java              the Molecule.draw() method is called with show = True
SWIG bindings do not implement the Iterable interface.           (the default). The image can however be written to a file
Also, the current Jython implementation is 2.2 and does          without problems (show = False, filename =
not support generator expressions, which were introduced         "image.png"). Similarly, if a user is only interested in
in Python 2.4. Although both C++ and Python have the             using the CDK and the RDKit, it is not necessary to install
concept of a global function or variable, this is not the        OpenBabel.
case in Java. SWIG places such functions, and get/set
methods for accessing the variables, in a special class          Full installation instructions for Windows, MacOSX and
named openbabel. Global constants are placed in another          Linux are available from the Cinfony website. It should be
class called openbabelConstants. A convenience module,           noted that for Windows users, there is no need to compile
obabel, is provided which automatically imports the              or search for missing libraries as the dependencies are
appropriate module depending on the Python implemen-             included as binaries in the Cinfony distribution.
tation.
                                                                 Results
cdkjpype and cdkjython                                           Cinfony API
Since Jython runs on top of the Java Virtual Machine             The original Pybel API was designed to make it easy to use
(JVM), it can access Java libraries such as the CDK              OpenBabel to perform the most common tasks in chem-
natively. To access Java libraries from CPython, the             informatics and to do so using idiomatic Python. Subse-
Python library JPype [13] is needed. This starts an instance     quently, we realised that the resulting API could be
of the JVM and uses the JNI to communicate back and              considered a generic API for wrapping the core function-
forth. Overall, the differences between the two wrappers         ality of any cheminformatics toolkit. Cinfony implements
are minor. Jython and JPype differ in the syntax used to         an extended version of the original Pybel API for the CDK
handle Java exceptions. Also, JPype returns unicode              and the RDKit, as well as OpenBabel. While the original
strings from the CDK and these need to be converted to           Pybel was restricted to CPython, Cinfony can also be used
regular strings (otherwise problems arise if they are passed     from Jython to access the CDK and OpenBabel.
to an OpenBabel method expecting a std::string). The
appropriate CDK wrapper, cdkjpype or cdkjython, will be          Cinfony helps cheminformaticians avoid the steep learn-
imported if the user imports the convenience module cdk.         ing curve associated with starting to use a new toolkit.
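
The convenience modules described above (obabel and cdk) select a wrapper according to the Python implementation in use. As a rough illustration only, and not Cinfony's actual source code, such a dispatch might be sketched as follows. The wrapper names come from the text; the lookup table WRAPPERS and the function backend_for are assumptions made for this sketch.

```python
import platform

# Wrapper module names are those given in the text. The lookup table and
# the function below are hypothetical and only illustrate the idea.
WRAPPERS = {
    "obabel": {"CPython": "pybel", "Jython": "jybel"},
    "cdk": {"CPython": "cdkjpype", "Jython": "cdkjython"},
    "rdkit": {"CPython": "rdkit"},  # the text states RDKit is not accessible from Jython
}


def backend_for(toolkit, implementation=None):
    """Return the name of the wrapper module to import for a toolkit.

    `implementation` defaults to the running interpreter as reported by
    platform.python_implementation() ("CPython", "Jython", ...).
    """
    if implementation is None:
        implementation = platform.python_implementation()
    try:
        return WRAPPERS[toolkit][implementation]
    except KeyError:
        raise ImportError(
            "no %s wrapper available for %s" % (toolkit, implementation)
        )
```

Under this sketch, backend_for("cdk", "Jython") gives "cdkjython", while asking for rdkit under Jython raises ImportError, mirroring the statement that RDKit cannot currently be accessed from Jython.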





With Cinfony, all of the core functionality of the toolkits can be accessed with the same interface. For example, in Cinfony, a molecule can be created from a SMILES string with:

    mol = toolkit.readstring("smi", SMILESstring)

RDKit

    mol = Chem.MolFromSmiles(SMILESstring)

OpenBabel

    mol = openbabel.OBMol()
    obconversion = openbabel.OBConversion()
    obconversion.SetInFormat("smi")
    obconversion.ReadString(mol, SMILESstring)

CDK

    builder = cdk.DefaultChemObjectBuilder.getInstance()
    sp = cdk.smiles.SmilesParser(builder)
    mol = sp.parseSmiles(SMILESstring)

The RDKit was designed with Python scripting in mind, and of the three toolkits is the most concise. On the other hand, OpenBabel uses a characteristically C++ approach. An empty molecule is created, and is passed to an OBConversion instance as a container for the molecule read from the SMILES string. The SmilesParser in the CDK requires an instance of an object implementing the IChemObjectBuilder interface.

Another advantage of a common API is that a script written for one toolkit can easily be modified to use another. As an example, here is a script that selects molecules that are similar to a particular target molecule. This script is taken from the original Pybel paper [7], but uses the CDK instead of OpenBabel and will run equally well from Jython and CPython. The only differences compared to the original script are that "pybel" has been replaced with "cdk", and the import statement has been changed from "import pybel":

    from cinfony import cdk

    targetmol = cdk.readfile("sdf", "targetmol.sdf").next()
    targetfp = targetmol.calcfp()

    output = cdk.Outputfile("sdf", "similarmols.sdf")

    for mol in cdk.readfile("sdf", "inputfile.sdf"):
        fp = mol.calcfp()
        if fp | targetfp >= 0.7:
            output.write(mol)

    output.close()

Alternatively, we could just have made a single change to the original script, by replacing the import statement from "import pybel" with "from cinfony import cdk as pybel".

Using Cinfony to combine toolkits
Another goal of Cinfony is to make it easy to combine toolkits in the same script. This allows the user to exploit the complementary capabilities of different toolkits (Table 1). For example, let's suppose the user wants to (1) convert a SMILES string to 3D coordinates with OpenBabel, then (2) create a 2D depiction of that molecule with the RDKit, next (3) calculate descriptors with the CDK, and finally (4) write out an SDF file containing the descriptor values and the 3D coordinates. The full Python script is only seven lines long:

    from cinfony import rdkit, cdk, obabel

    mol = obabel.readstring("smi", "CCC=O")
    mol.make3D()
    rdkit.Molecule(mol).draw(show = False, filename = "aldehyde.png")
    descs = cdk.Molecule(mol).calcdesc()
    mol.data.update(descs)
    mol.write("sdf", filename = "aldehyde.sdf")

For cheminformaticians interested in developing QSAR or QSPR models, Cinfony can be used to simultaneously calculate descriptors from the RDKit, the CDK and OpenBabel. For example, the following script reads a multiline input file, with each line consisting of a SMILES string followed by a property value. For each molecule, it calculates all of the OpenBabel, RDKit and CDK descriptors (except for CDK's CPSA) and writes out the results as a tab-separated file suitable for reading with the statistical package R [15]. Note that in this example script, if descriptors share the same name only one is retained. This is the case for the TPSA descriptor in OpenBabel, which is replaced by the RDKit's TPSA descriptor.

    import string
    from cinfony import obabel, cdk, rdkit

    # Read in SMILES strings and observed property values
    smiles, propvals = [], []
    for line in open("data.txt"):
        broken = line.rstrip().split()
        smiles.append(broken[0])
        propvals.append(float(broken[1]))

    mols = [obabel.readstring("smi", smile) for smile in smiles]

    # Calculate descriptor values using OpenBabel,
    # the CDK (apart from 'CPSA') and the RDKit
    cdkdescs = [x for x in cdk.descs if x != 'CPSA']
    descs = []
    for mol in mols:
        d = mol.calcdesc()
        d.update(cdk.Molecule(mol).calcdesc(cdkdescs))
        d.update(rdkit.Molecule(mol).calcdesc())
        descs.append(d)

    # Write a file suitable for 'read.table' in R
    outputfile = open("inputforR.txt", "w")
    descnames = sorted(descs[0].keys(), key = string.lower)
    print >> outputfile, "\t".join(["Property"] + descnames)
    for smile, propval, desc in zip(smiles, propvals, descs):
        descvals = [str(desc[descname]) for descname in descnames]
        print >> outputfile, "\t".join([smile, str(propval)] + descvals)
    outputfile.close()

Performance
Accessing cheminformatics libraries using Cinfony allows the user to rapidly develop scripts that manipulate chemical information. However, there is a small price to be paid. Firstly, there is the cost of moving objects across the interface between Python and the cheminformatics libraries. Secondly, the additional code required by Cinfony to implement a standard API may slow performance further.

To assess the performance penalty for accessing cheminformatics toolkits using Cinfony rather than directly in the native language, we looked at two simple test cases: (1) iterating over an SDF file containing 25419 molecules, (2) iterating and printing out the molecular weight of each of the molecules. The SDF file used was 3_p0.0.sdf, the first portion of the drug-like subset of the ZINC 7.00 dataset [16]. The Cinfony scripts, Java and C++ source code are available as Additional file 2. The results are shown in Table 3.

While accessing the CDK using Jython is almost as fast as a pure Java implementation, there is a considerable overhead associated with using JPype to access the CDK from CPython (89% slower for the second test case). This overhead is due to passing objects between the JVM and CPython. For OpenBabel, there is little performance cost associated with accessing OpenBabel from either implementation of Python, although the jybel scripts are somewhat slower than pybel scripts. A small portion of this speed difference can be attributed to a slower startup (about 1.6 seconds for jybel, compared to 0.8 seconds for pybel). Finally, from the RDKit results in Table 3, it is clear that using Boost.Python to wrap a C++ library is more efficient than using SWIG. The difference in run times between the C++ and Python implementations is negligible.

In practice, the performance of a particular Cinfony script will depend on the extent to which information is passed





Table 3: Performance of Cinfony modules compared to a native Java or C++ implementation.

                   Iterate over SDF           Iterate and calculate molecular weight
                   Time (s)    Normalised     Time (s)    Normalised
 CDK
   Native Java       21.2         1.00          36.8         1.00
   cdkjython         23.1         1.09          41.6         1.13
   cdkjpype          33.0         1.57          69.5         1.89
 OpenBabel
   Native C++        31.9         1.00          43.0         1.00
   pybel             34.1         1.07          45.1         1.05
   jybel             38.0         1.19          49.6         1.15
 RDKit
   Native C++        99.7         1.00         100.7         1.00
   rdkit             99.9         1.00         101.0         1.00

 The times reported are wallclock times from the best of three runs on a dual-core Intel Pentium 4 3.2 GHz machine with 1 GB RAM.
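
The Normalised columns of Table 3 appear to be each time divided by the native time for the same toolkit and test case. The short sketch below recomputes them from the tabulated times. The names TIMES, normalise and NORMALISED are introduced here for illustration. Because the inputs are the rounded values printed in the table, the last digit can differ from the published ratio: cdkjpype for "Iterate over SDF" comes out as 1.56 rather than 1.57, presumably (an assumption) because the published ratios were computed from unrounded timings.

```python
# Wall-clock times in seconds, copied from Table 3:
# (iterate over SDF, iterate and calculate molecular weight).
TIMES = {
    "CDK": {
        "Native Java": (21.2, 36.8),
        "cdkjython": (23.1, 41.6),
        "cdkjpype": (33.0, 69.5),
    },
    "OpenBabel": {
        "Native C++": (31.9, 43.0),
        "pybel": (34.1, 45.1),
        "jybel": (38.0, 49.6),
    },
    "RDKit": {
        "Native C++": (99.7, 100.7),
        "rdkit": (99.9, 101.0),
    },
}


def normalise(times):
    """Divide each time by the native time for the same toolkit and test."""
    result = {}
    for toolkit, rows in times.items():
        # The native row is the one whose label starts with "Native".
        native = next(v for k, v in rows.items() if k.startswith("Native"))
        result[toolkit] = {
            name: tuple(round(t / n, 2) for t, n in zip(vals, native))
            for name, vals in rows.items()
        }
    return result


NORMALISED = normalise(TIMES)

if __name__ == "__main__":
    for toolkit, rows in NORMALISED.items():
        for name, (iterate, molwt) in rows.items():
            print("%-10s %-12s %.2f  %.2f" % (toolkit, name, iterate, molwt))
```

The value 1.89 for cdkjpype in the second test case corresponds to the "89% slower" figure quoted in the text.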



back and forth between Python and the underlying Java or C++ library. Where most of the time is spent on computation in the underlying library, the speed difference between a native implementation and one using Cinfony is expected to be small.

Comparison of toolkits
Cinfony makes it easy to compare the results obtained by different toolkits for the same operations. This can be useful in identifying bugs, applying a test suite, or finding the strengths and weaknesses of particular implementations. For example, where different toolkits calculate the same descriptors, if the calculated values are not highly correlated it may indicate a bug in one or the other. Earlier, we mentioned that a difference in the treatment of implicit hydrogens causes different toolkits to give different values for molecular weight unless hydrogens are explicitly added. Ensuring that a particular result is in agreement with that obtained by another toolkit can act as a sanity check in such instances to avoid errors.

When carrying out the same operation with several toolkits, it is often convenient to iterate over the toolkits in an outer loop:

    from cinfony import obabel, rdkit, cdk

    for toolkit in [obabel, rdkit, cdk]:
        print toolkit.readstring("smi", "CCC").molwt

As an example of how such comparisons can be used to identify bugs in toolkits, let us consider depiction. As a dataset, we randomly chose 100 molecules from PubChem [17], with subsequent filtering to remove multicomponent molecules. For each molecule, PubChem provides an SDF file containing coordinates for a 2D depiction, as well as the depiction itself as a PNG file. PubChem uses the CACTVS toolkit [18] to generate the 2D coordinates as well as the corresponding depiction. Using a script similar to the following, we used Cinfony to generate 2D depictions using OASA (the depiction library used by pybel), the CDK and a development version of RDKit that all use the same 2D coordinates taken from the SDF file:

    from cinfony import pybel, rdkit

    for toolkit in [rdkit, pybel]:
        name = toolkit.__name__
        for mol in toolkit.readfile("sdf", "dataset.sdf"):
            mol.draw(filename = "%s_%s.png" % (mol.title, name),
                     show = False,
                     usecoords = True)

When the resulting images were compared for the PubChem entry CID7250053, an error was found in the depiction of the stereochemistry of an isopropyl group (Figure 2). Since the error only occurred in certain cases, it had not been previously noticed and would have been difficult to identify without such a comparative study. Once reported, the problem was quickly solved and the subsequent RDKit release depicted the stereochemistry correctly. A comparison of depictions by commercial toolkits





and depictions generated by Cinfony is available here (see Additional file 3).

Figure 2
Comparison of depictions of PubChem CID7250053 using different toolkits. The depiction using the development version of RDKit showed incorrect stereochemistry for the isopropyl substituent of the thiazole ring.

Conclusion
Cinfony makes it easy to combine complementary features of the three main Open Source cheminformatics toolkits. By presenting a standard simplified API, the learning curve associated with starting to use a new toolkit is greatly reduced, thus encouraging users of one toolkit to investigate the potential of others.

Cinfony is freely available from the Cinfony website [19], both as Python source code and as a Windows distribution containing dependencies. Installation instructions are provided for MacOSX, Linux and Windows.

Availability and requirements
Project name: Cinfony

Project home page: http://cinfony.googlecode.com

Operating system(s): Platform independent

Programming language: Python, Jython

Other requirements: OpenBabel, CDK, RDKit, Java, OASA, JPype, Python Imaging Library

License: BSD

Any restrictions to use by non-academics: None

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
NMOB conceived and developed Cinfony. GRH is the lead developer of OpenBabel and created the Python and Java SWIG bindings. All authors read and approved the final manuscript.

Additional material

Additional file 1
Miniwebsite API. A mini-website of the Cinfony API documentation.
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S1.zip]

Additional file 2
Timing Code. A zip file containing Python, Java and C++ code used for run time comparisons for two test cases.
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S2.zip]

Additional file 3
Miniwebsite Depictions. A mini-website showing a comparison of the depictions generated by several cheminformatics toolkits.
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S3.zip]

Acknowledgements
Cinfony would not be possible without the work of many Open Source projects. In particular, we thank several developers who responded quickly to bug reports or queries: Beda Kosata (OASA), Greg Landrum (RDKit), Tim Vandermeersch (OpenBabel), Steve Ménard (JPype). Thanks also to Gilbert Mueller and Chris Morley for feedback on installing Cinfony. NMOB thanks Google Code for providing free web hosting and development tools for Cinfony. We thank the anonymous reviewers for several useful suggestions.

References
1. OpenBabel v.2.2.0 [http://openbabel.org]
2. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E: Recent Developments of the Chemistry Development Kit (CDK) – An Open-Source Java Library for Chemo- and Bioinformatics. Curr Pharm Des 2006, 12:2110-2120.
3. Landrum G: RDKit. [http://www.rdkit.org].
4. Murray-Rust P, Rzepa HS: Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. J Chem Inf Comput Sci 1999, 39:928-942.






5. Apodaca R, O'Boyle N, Dalke A, Van Drie J, Ertl P, Hutchison G, James CA, Landrum G, Morley C, Willighagen E, De Winter H: OpenSMILES. [http://www.opensmiles.org].
6. Daylight Chemical Information Systems Manual [http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html]
7. O'Boyle NM, Morley C, Hutchison GR: Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem Cent J 2008, 2:5.
8. Kosata B: OASA. [http://bkchem.zirael.org/oasa_en.html].
9. Raymond ES: The Art of UNIX Programming. Reading, MA: Addison-Wesley; 2003. [http://www.catb.org/~esr/writings/taoup/index.html]
10. Symyx CTfile formats [http://www.mdli.com/downloads/public/ctfile/ctfile.jsp]
11. KNIME – Konstanz Information Miner [http://knime.org]
12. SWIG v.1.3.36 [http://www.swig.org]
13. Ménard S: JPype. [http://jpype.sf.net].
14. Boost.Python [http://www.boost.org/libs/python/doc/]
15. R development core team: R: A language and environment for statistical computing. [http://www.R-project.org].
16. Irwin JJ, Shoichet BK: ZINC – A Free Database of Commercially Available Compounds for Virtual Screening. J Chem Inf Model 2005, 45:177-182.
17. PubChem [http://pubchem.ncbi.nlm.nih.gov/]
18. CACTVS Chemoinformatics Toolkit. Xemistry GmbH: Lahntal, Germany.
19. O'Boyle NM: Cinfony. [http://cinfony.googlecode.com].








O’Boyle et al. Journal of Cheminformatics 2011, 3:33
   http://www.jcheminf.com/content/3/1/33




    SOFTWARE                                                                                                                                     Open Access

   Open Babel: An open chemical toolbox
   Noel M O’Boyle1, Michael Banck2, Craig A James3, Chris Morley4, Tim Vandermeersch4 and Geoffrey R Hutchison5*


     Abstract
     Background: A frequent problem in computational modeling is the interconversion of chemical structures
     between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and
     de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing
     problem due to the multitude of different application areas for chemistry data, differences in the data stored by
     different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-
     neutral formats.
     Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many
     languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a
     wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics
     algorithms, from partial charge assignment and aromaticity detection, to bond order perception and
     canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and
     outline a variety of uses both in terms of software products and scientific research, including applications far
     beyond simple format interconversion.
     Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it
     provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and
     substructure and similarity searching. For developers, it can be used as a programming library to handle chemical
     data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely
     available under an open-source license from http://openbabel.org.


   Introduction
   The history of chemical informatics has included a huge variety of textual and computer representations of
   molecular data. Such representations focus on specific atomic or molecular information and may not attempt to
   store all possible chemical data. For example, line notations like Daylight SMILES [1] do not offer coordinate
   information, while crystallographic or quantum mechanical formats frequently do not store chemical bonding data.
   Hydrogen atoms are frequently omitted from X-ray crystallography due to the difficulty in establishing
   coordinates, and are often left out by file formats because the “implicit valence” of heavy atoms indicates
   their presence. Other types of representations require specification of atom types on the basis of a specific
   valence bond model, inclusion of computed partial charges, indication of biomolecular residues, or multiple
   conformations.

   While attempts have been made to provide a standard format for storing chemical data, including most notably
   the development of Chemical Markup Language (CML) [2-6], an XML dialect, such formats have not yet achieved
   widespread use. Consequently, a frequent problem in computational modeling is the interconversion of molecular
   structures between different formats, a process that involves extraction and interpretation of their chemical
   data and semantics.

   We outline, for the first time, the development and use of the Open Babel project, a full-featured open chemical
   toolbox designed to “speak” the many different representations of chemical data. It allows anyone to search,
   convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or
   related areas. It provides both ready-to-use programs and a complete, extensible programmer’s toolkit for
   developing cheminformatics software. It can handle reading,

   * Correspondence: geoffh@pitt.edu
   5 University of Pittsburgh, Department of Chemistry, 219 Parkman Avenue, Pittsburgh, PA 15217, USA
   Full list of author information is available at the end of the article
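
   The “implicit valence” idea mentioned in the Introduction can be illustrated with a minimal sketch. It assumes a
   small table of typical valences for neutral atoms and counts only explicit bonds to heavy atoms; the table and
   the function name implicit_hydrogens are hypothetical choices for this illustration and are not Open Babel's
   actual valence model or API.

```python
# Illustrative only: typical valences for a few neutral organic elements.
# This table is an assumption for the sketch, not Open Babel's valence model.
TYPICAL_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def implicit_hydrogens(element, bond_orders):
    """Hydrogens implied for a neutral atom, given its explicit bond orders to heavy atoms."""
    valence = TYPICAL_VALENCE[element]
    used = sum(bond_orders)
    # Never report a negative count if the atom is already saturated.
    return max(valence - used, 0)


# Ethanol (CH3-CH2-OH) described only by its heavy atoms and the bonds between them.
ethanol = [
    ("C", [1]),     # methyl carbon: one single bond to the next carbon
    ("C", [1, 1]),  # methylene carbon: single bonds to C and to O
    ("O", [1]),     # hydroxyl oxygen: one single bond to carbon
]

hydrogen_counts = [implicit_hydrogens(element, bonds) for element, bonds in ethanol]
print(hydrogen_counts)  # [3, 2, 1]
```

   A format that stores only the three heavy atoms of ethanol therefore still implies its six hydrogens, which is
   why a converter has to interpret chemical semantics rather than copy fields from one file to another.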

                                         © 2011 O’Boyle et al; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative
                                         Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
                                         reproduction in any medium, provided the original work is properly cited.




J. Cheminf. 2011, 3, 33.
My Open Access papers
  • 1. Open Access Publications of Noel O’Boyle November 2, 2011
  • 2.
  • 3. Contents

I Cheminformatics toolkits 5
1 Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit 7
2 Cinfony - combining Open Source cheminformatics toolkits behind a common interface 15
3 Open Babel: An open chemical toolbox 25

II Enzyme reaction mechanisms 39
4 MACiE: a database of enzyme reaction mechanisms 41
5 MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms 43

III QSAR 49
6 PYCHEM: a multivariate analysis package for python 51
7 Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction 53

IV The Rest 69
8 Userscripts for the life sciences 71
9 Confab - Systematic generation of diverse low-energy conformers 83
10 Review of “Data Analysis with Open Source Tools” 93
11 Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on 95
  • 4.
  • 6.
  • 7. Chemistry Central Journal. Software. Open Access.

Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit

Noel M O'Boyle*1,2, Chris Morley3 and Geoffrey R Hutchison4

Address: 1Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK, 2Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK, 3OpenBabel Development Team and 4Department of Chemistry, University of Pittsburgh, Chevron Science Center, 219 Parkman Avenue, Pittsburgh, PA 15260, USA

Email: Noel M O'Boyle* - baoilleach@gmail.com; Chris Morley - c.morley@gaseq.co.uk; Geoffrey R Hutchison - geoffh@pitt.edu
* Corresponding author

Published: 9 March 2008. Received: 23 January 2008. Accepted: 9 March 2008.
Chemistry Central Journal 2008, 2:5 doi:10.1186/1752-153X-2-5
This article is available from: http://journal.chemistrycentral.com/content/2/1/5
© 2008 O'Boyle et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit.

Results: Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel.

Conclusion: Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.

Background

Cheminformaticians often need to write once-off scripts to extract data from text files, prepare data for analysis or carry out simple statistics. Scripting languages such as Perl, Python and Ruby are ideally suited to these day-to-day tasks [1]. Such languages are, however, an order of magnitude or more slower than compiled languages such as C++. Since cheminformaticians regularly deal with molecular files containing thousands of molecules and many cheminformatics algorithms are computationally expensive, cheminformatics toolkits are typically written in compiled languages for performance.

OpenBabel is a C++ toolkit with extensive capabilities for reading and writing molecular file formats (over 80 are supported) as well as for manipulating molecular data [2]. Many standard chemistry algorithms are included, for example, determination of the smallest set of smallest rings, bond order perception, addition of hydrogens, and assignment of Gasteiger charges. In relation to cheminformatics, OpenBabel supports SMARTS searching [3], molecular fingerprints [4] (both Daylight-type, and structural-key based), and includes group contribution descriptors for LogP [5], polar surface area (PSA) [6] and molar refractivity (MR) [5].

Page 1 of 7. Chem. Cent. J. 2008, 2, 5. (page number not for citation purposes)
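The abstract's remark that iterators simplify loops over all the molecules in a file can be illustrated without OpenBabel. The following is a minimal sketch, not Pybel's implementation: `read_records` is a hypothetical helper that uses Python's `yield` keyword to hand back one SDF record at a time, relying only on the fact that SDF records end with a `$$$$` line. The sample records are made up and heavily abbreviated.

```python
import io


def read_records(handle):
    """Yield SDF records one at a time, each as a list of lines.

    Hypothetical helper for illustration only; it does no chemistry and
    simply splits the text on the '$$$$' record terminator.
    """
    record = []
    for line in handle:
        line = line.rstrip("\n")
        if line == "$$$$":
            # End of one record: hand it to the caller and start a new one.
            yield record
            record = []
        else:
            record.append(line)
    if record:
        # Tolerate a final record that lacks the terminator line.
        yield record


# Made-up, abbreviated records; real SDF entries carry a full connection table.
sample_text = (
    "ethanol\n(connection table omitted)\n$$$$\n"
    "benzene\n(connection table omitted)\n$$$$\n"
)

# Looping over every record needs a single expression.
titles = [record[0] for record in read_records(io.StringIO(sample_text))]

# For a single record, the first item has to be requested explicitly.
first_record = next(read_records(io.StringIO(sample_text)))
```

Because the function is a generator, records are produced lazily, so a large file never has to be held in memory at once. In current Python the built-in `next()` takes the first item from a generator.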
  • 8. Of the current popular scripting languages, Python [7] is the de-facto standard language for scripting in cheminformatics. Several commercial cheminformatics toolkits have interfaces in Python: OpenEye's closed-source successor to OpenBabel, OEChem [8], is a C++ toolkit with interfaces in Python and Java; Rational Discovery's RDKit [9], which is now open source, is a C++ cheminformatics toolkit with a Python interface; the Daylight toolkit [10] from Daylight Chemical Information Systems, written in C, only has Java and C++ wrappers but PyDaylight [11], available separately from Dalke Scientific, provides a Python interface to the toolkit; the Cambios Molecular Toolkit [12] from Cambios Consulting is a commercial C++ toolkit with a Python interface. There are also toolkits entirely implemented in Python: Frowns [13], an open source cheminformatics toolkit by Brian Kelley, and PyBabel [14], an open source toolkit included in the MGLTools package from the Molecular Graphics Labs at the Scripps Research Institute. Note that the latter is not related to the OpenBabel project; rather its name derives from the fact that its aim was to implement in Python some of the functionality of Babel v1.6 [15], a command-line application for converting file formats which is a predecessor of OpenBabel.

Here we describe the implementation and application of Pybel, a Python module that provides access to the OpenBabel C++ library from the Python programming language. Pybel builds on the basic Python bindings to make it easier to carry out frequent tasks in cheminformatics. It also aims to be as 'Pythonic' as possible; that is, to adhere to Python language conventions and idioms, and where possible to make use of Python language features such as iterators. The result is a module that takes advantage of Python's expressive syntax to allow cheminformaticians to carry out tasks such as SMARTS matching, data field manipulation and calculation of molecular fingerprints in just a few lines of code.

Implementation

SWIG bindings

Python bindings to the OpenBabel toolkit were created using SWIG [16]. SWIG (Simplified Wrapper and Interface Generator) is a tool that automates the generation of bindings to libraries written in C or C++. One of the advantages of SWIG compared to other automated wrapping methods such as Boost.Python [17] or SIP [18] is that SWIG also supports the generation of bindings to several other languages. For example, OpenBabel also uses SWIG to generate bindings for Perl, Ruby and Java. An additional advantage is that SWIG will directly parse C or C++ header files while Boost.Python and SIP require each C++ class to be exposed manually. The input to SWIG is an interface file containing a list of OpenBabel header files for which to generate bindings. Using the signatures in the header files, SWIG generates a C file which, when compiled and linked with the Python development libraries and OpenBabel, creates a Python extension module, openbabel. This can then be imported into a Python script like any other Python module using the "import openbabel" statement.

For a small number of C++ objects and functions, it was necessary to add some convenience functions to facilitate access from Python. Certain types of molecule files have additional data present in addition to the connection table. OpenBabel stores these data in subclasses of OBGenericData such as OBPairData (for the data fields in molecule files such as MOL files and SDF files) and OBUnitCell (for the data fields in CIF files). To access the data it is necessary to 'downcast' an instance of OBGenericData to the specific subclass. For this reason, two convenience functions were added to the interface file, one to cast OBGenericData to OBPairData, and one to cast to OBUnitCell. Another convenience function was added to convert a Python list to a C array of doubles, as this type of input is required for a small number of OpenBabel functions.

Iterators are an important feature of the OpenBabel C++ library. For example, OBAtomAtomIter allows the user to easily iterate over the atoms attached to a particular atom, and OBResidueIter is an iterator over the residues in a molecule. The OpenBabel iterators use the dereference operator to access the data, the increment operator to iterate to the next element, and the boolean operator to test whether any elements remain. Iterators are also a core feature of the Python language. However, the iterators used by OpenBabel are not automatically converted into Python iterators. To deal with this, Python iterator classes that wrap the dereference, increment and boolean operators behind the scenes were added to the SWIG interface file, so that Python statements such as "for attached_obatom in OBAtomAtomIter(obatom)" work without problem.

Pybel module

The SWIG bindings provide direct access from Python to the C++ objects and functions in the OpenBabel API (application programming interface). The purpose of the Pybel module is to wrap these bindings to present a more Pythonic interface to OpenBabel (Figure 1). This extra level of abstraction is useful as Python programmers expect Python libraries to behave in certain ways that a C++ library does not. For example, in Python, attributes of an object are often directly accessed whereas in C++ it is typical to call Get/Set functions to access them. A C++ function returning a particular object might require a pointer to an empty object as a parameter, whereas the Python equivalent would not. Even something as simple
  • 9. as differences in the conventions for the case of letters used in variable and method names is a problem, as it makes it more likely for Python programmers to introduce bugs in their code.

Figure 1: The relationship between Python modules described in the text and the OpenBabel C++ library. Python modules are shown in green; the C++ library is shown in blue.

One of the key aims of Pybel was to reduce the amount of code necessary to carry out common tasks. This is especially important for a scripting language where programming is often done interactively at a command prompt. In addition, as for any programming language, repeated entry of code for routine and common tasks (so-called 'boilerplate code') is a common cause of errors in code. Reading and writing molecule files is one of the most common tasks for users of OpenBabel but requires several lines of code if using the SWIG bindings. The following code shows how to store each molecule in a multimolecule SDF file in a list called allmols:

    import openbabel

    allmols = []
    obconversion = openbabel.OBConversion()
    obconversion.SetInFormat("sdf")
    obmol = openbabel.OBMol()
    notatend = obconversion.ReadFile(obmol, "inputfile.sdf")
    while notatend:
        allmols.append(obmol)
        obmol = openbabel.OBMol()
        notatend = obconversion.Read(obmol)

To replace this somewhat verbose code, Pybel provides a readfile method that takes a file format and filename and returns molecules using the 'yield' keyword. This changes the method into a 'generator', a Python language feature where a method behaves like an iterator. Iterators are a major feature of the Python language which are used for looping over collections of objects. In Pybel, we have used iterators where possible to simplify access to the toolkit. As a result, the equivalent to the preceding code is:

    import pybel

    allmols = [mol for mol in pybel.readfile("sdf", "inputfile.sdf")]

The benefits of iterator syntax are clear when dealing with multimolecule files. For single molecule files, however, the user needs to remember to explicitly request the iterator to return the first and only molecule using the next method:

    mol = pybel.readfile("mol", "inputfile.mol").next()

Pybel provides replacements for two of the main classes in the OpenBabel library, OBMol and OBAtom. The following discussion describes the Pybel Molecule class which wraps an instance of OBMol, but the same design principles apply to the Pybel Atom class. Table 1 summarises the attributes and methods of the Molecule object. By wrapping the base class, Pybel can enhance the Molecule
  • 10. object by providing (1) direct access to attributes rather than through the use of Get methods, (2) additional attributes of the object, and (3) additional methods that act on the object.

Table 1: Attributes and methods supported by the Pybel Molecule object

    Attribute    Description*
    OBMol        The underlying OBMol object
    atoms        A list of Pybel Atoms
    charge       The total charge (GetTotalCharge)
    data         A MoleculeData object for access to data fields
    dim          The dimensionality of the coordinates (GetDimension)
    energy       The heat of formation (GetEnergy)
    exactmass    The mass calculated using isotopic abundance (GetExactMass)
    flags        The set of flags used internally by OpenBabel (GetFlags)
    formula      The stoichiometric formula (GetFormula)
    mod          The number of nested BeginModify() calls (Internal use) (GetMod)
    molwt        The standard molar mass (GetMolWt)
    spin         The total spin multiplicity (GetTotalSpinMultiplicity)
    sssr         The smallest set of smallest rings (GetSSSR)
    title        The title of the molecule (often the filename) (GetTitle)
    unitcell     Unit cell data (if present)

    Method       Description
    write        Write the molecule to a file or return it as a string
    calcfp       Return a molecular fingerprint as a Fingerprint object
    calcdesc     Return the values of the group contribution descriptors
    __iter__     Enable iteration over the Atoms in the Molecule

*Where a Molecule attribute is a direct replacement for a 'Get' method of the underlying OBMol, the name of the method is given in parentheses.

(1) As mentioned earlier, it is typical in Python to access attribute values directly rather than using Get/Set methods. With this in mind, the Molecule class adds attributes such as energy, formula and molwt (among others) which give the values returned by calling GetEnergy(), GetFormula() and GetMolWt(), respectively on the underlying OBMol (see Table 1 for the full list).

(2) One of the aims of Pybel is to simplify access to some of the most common attributes. With this in mind, an atoms attribute has been added which returns a list of the atoms of the molecule as Pybel Atoms. Access to the data fields associated with a molecule has been simplified by creation of a MoleculeData object which is returned when the data attribute of a Molecule is accessed. MoleculeData presents a dictionary interface to the data fields of the molecule. Accessing and updating these fields is more convoluted if using the SWIG bindings. Compare the following statements for accessing the "comment" field of the variable mol, an OBMol:

    # Using the SWIG bindings
    value = openbabel.toPairData(mol.GetData("comment")).GetValue()

    # Using Pybel
    value = pybel.Molecule(mol).data["comment"]

It should be noted that all of these attributes are calculated on-the-fly rather than stored for future access as the underlying OBMol may have been modified.

(3) Four additional methods have been added to the Pybel Molecule (Table 1). The first is a write method which writes a representation of the Molecule to a file and takes care of error handling. As with reading molecules from files (see above), this method simplifies the procedure significantly compared to using the SWIG bindings directly. In addition, a calcfp method and a calcdesc method have been added which calculate a binary fingerprint for the molecule, and some descriptor values, respectively. In the OpenBabel library these are not methods of the OBMol, but rather are loaded as plugins (by OBFingerprint.FindFingerprint and OBDescriptor.FindType, respectively) to which an OBMol is passed as input. The __iter__ method is a special Python method that enables iteration over an object; in the case of a Molecule, the defined iterator loops over the Atoms of the Molecule. This feature enables constructions such as "for atom in mol" where mol is a Pybel Molecule.

SMARTS is a query language developed by Daylight Chemical Information Systems for molecular substructure
  • 11. searching [3]. As implemented in the OpenBabel toolkit, finding matches of a particular substructure in a particular molecule is a four step process that involves creating an instance of OBSmartsPattern, initialising it with a SMARTS pattern, searching for a match, and finally retrieving the result:

    obsmarts = openbabel.OBSmartsPattern()
    obsmarts.Init("[#6][#6]")
    obsmarts.Match(obmol)
    results = obsmarts.GetUMapList()

Since a SMARTS query can be thought of as a regular expression for molecules, in Pybel we decided to wrap the SMARTS functionality in an analogous way to Python's regular expression module, re. With these changes, the same process takes only two steps, an initialisation step and a search step:

    smarts = pybel.Smarts("[#6][#6]")
    results = smarts.findall(pybelmol)

Pybel was not written to replace the SWIG bindings but rather to make it simpler to perform common tasks. As a result, Pybel does not attempt to wrap every single method and class in the OpenBabel library. Because of this, a user may often want to interconvert between an OBMol and a Molecule, or an OBAtom and an Atom. This is quite a straightforward process. A Pybel Molecule can be created by passing an OBMol to the Molecule constructor. In the following example an OBMol is created using the SWIG bindings and then written to a file using Pybel:

    obmol = openbabel.OBMol()
    a = obmol.NewAtom()
    a.SetAtomicNum(6)
    a.SetVector(0.0, 1.0, 2.0) # Set coordinates
    b = obmol.NewAtom()
    obmol.AddBond(1, 2, 1) # Single bond from Atom 1 to Atom 2
    pybel.Molecule(obmol).write("mol", "outputfile.mol")

The OBMol wrapped by a Pybel Molecule can be accessed through the OBMol attribute. This makes it easy to call a method not wrapped by Pybel, such as OBMol.NumRotors, which returns the number of rotatable bonds in a molecule:

    mol = pybel.readfile("mol", "inputfile.mol").next()
    numrotors = mol.OBMol.NumRotors()

Documentation and Testing

To minimise programming errors, programs written in dynamically-typed languages such as Python should be tested comprehensively. Pybel has 100% code coverage in terms of unit tests, as measured by Ned Batchelder's coverage.py [19]. It also has several doctests, short snippets of Python code included in documentation strings which serve as both examples of usage and as unit tests.

The Pybel API is fully documented with docstrings. These can be accessed in the usual way with the help() command at the interactive Python prompt after importing Pybel: for example, "help(pybel.Molecule)". In addition, the OpenBabel Python web page [20] contains a complete description of how to use the SWIG bindings and the Pybel API. The webpage also contains links to HTML versions of the OpenBabel API documentation and Pybel API documentation. The latter is included in Additional File 1.

Results and Discussion

The principal aim of Pybel is to make it simpler to use the OpenBabel toolkit to carry out common tasks in cheminformatics. These common tasks include reading and writing molecule files, accessing data fields of a molecule, computing and comparing molecular fingerprints and SMARTS matching. Here we present some examples that illustrate how Pybel may be used to carry out common cheminformatics tasks.

Removal of duplicate molecules

When merging different datasets or as a final step in preprocessing, it may be necessary to identify and remove duplicate molecules. In the following example, only the unique molecules in the multimolecule SDF file "inputfile.sdf" will be written to "uniquemols.sdf". Here we will assume that a unique InChI string (IUPAC International Chemical Identifier) indicates a unique molecule. A similar procedure could be performed using the OpenBabel canonical SMILES format, by replacing "inchi" with "can" in the following:

    import pybel

    inchis = []
  • 12.
    output = pybel.Outputfile("sdf", "uniquemols.sdf")
    for mol in pybel.readfile("sdf", "inputfile.sdf"):
        inchi = mol.write("inchi")
        if inchi not in inchis:
            output.write(mol)
            inchis.append(inchi)
    output.close()

Selection of similar molecules

Another common task in cheminformatics is the selection of a set of molecules of similar structure to a target molecule. Here we will assume that structural similarity is indicated by a Tanimoto coefficient [21] of at least 0.7 with respect to Daylight-type (that is, based on hashed paths through the molecular graph) fingerprints. Note that Pybel redefines the | operator (bitwise OR) for Fingerprint objects as the Tanimoto coefficient:

    import pybel

    targetmol = pybel.readfile("sdf", "targetmol.sdf").next()
    targetfp = targetmol.calcfp()
    output = pybel.Outputfile("sdf", "similarmols.sdf")
    for mol in pybel.readfile("sdf", "inputfile.sdf"):
        fp = mol.calcfp()
        if fp | targetfp >= 0.7:
            output.write(mol)
    output.close()

Applying a Rule of Fives filter

In an influential paper, Lipinski et al. [22] performed an analysis of drug compounds that reached Phase II clinical trials and found that they tended to occupy a certain range of values for molecular weight, LogP, and number of hydrogen bond donors and acceptors. Based on this, they proposed a rule with four criteria to identify molecules that might have poor absorption or permeation properties. This is the Lipinski Rule of Fives, so-called as the numbers involved are all multiples of five. The following example shows how to filter a database to identify only those molecules that pass all four of the Lipinski criteria. The values of the Lipinski descriptors are also added to the output file as data fields. Note that whereas molecular weight is directly available as an attribute of a Molecule, and LogP is available as one of the three group contribution descriptors calculated by OpenBabel, we need to use SMARTS pattern matching to identify the number of hydrogen bond donors and acceptors. The SMARTS patterns used here correspond to the definitions of hydrogen bond donor and acceptor used by Lipinski:

    import pybel

    HBD = pybel.Smarts("[#7,#8;!H0]")
    HBA = pybel.Smarts("[#7,#8]")

    def lipinski(mol):
        """Return the values of the Lipinski descriptors."""
        desc = {'molwt': mol.molwt,
                'HBD': len(HBD.findall(mol)),
                'HBA': len(HBA.findall(mol)),
                'LogP': mol.calcdesc(['LogP'])['LogP']}
        return desc

    passes_all_rules = lambda desc: (desc['molwt'] <= 500 and
                                     desc['HBD'] <= 5 and desc['HBA'] <= 10 and
                                     desc['LogP'] <= 5)

    if __name__=="__main__":
        output = pybel.Outputfile("sdf", "passLipinski.sdf")
        for mol in pybel.readfile("sdf", "inputfile.sdf"):
            descriptors = lipinski(mol)
            if passes_all_rules(descriptors):
  • 13.
                mol.data.update(descriptors)
                output.write(mol)
        output.close()

Future work

The future development of Pybel is closely linked to any changes and improvements to OpenBabel. With each new release of the OpenBabel API, the SWIG bindings will be updated to include any additional functionality. However, additions to the Pybel API will only occur if they simplify access to new features of the OpenBabel toolkit of general use to cheminformaticians. In general, the Pybel API can be considered stable, and an effort will be made to ensure that future changes will be backwards compatible.

Conclusion

Pybel provides a high-level Python interface to the widely-used OpenBabel C++ toolkit. This combination of a high performance cheminformatics toolkit and an expressive scripting language makes it easy for cheminformaticians to rapidly and efficiently write scripts to manipulate molecular data.

Pybel is freely available from the OpenBabel web site [2] both as part of the OpenBabel source distribution and for Windows as an executable installer. Compiled versions are also available as packages in some Linux distributions (openbabel-python in Fedora, for example).

Availability and Requirements

Project name: Pybel
Project home page: http://openbabel.sf.net/wiki/Python
Operating system(s): Platform independent
Programming language: Python
Other requirements: OpenBabel
License: GNU GPL
Any restrictions to use by non-academics: None

Authors' contributions

GRH is the lead developer of OpenBabel and created the SWIG bindings. NMOB developed Pybel, and extended the SWIG interface file. CM compiled the SWIG bindings on Windows and added convenience functions to the OpenBabel API to facilitate access from scripting languages. All authors read and approved the final manuscript.

Additional material

Additional file 1: Pybel API. The HTML documentation of the Pybel API (application programming interface). [http://www.biomedcentral.com/content/supplementary/1752-153X-2-5-S1.zip]

Acknowledgements

The idea for the Pybel module was inspired by Andrew Dalke's work on PyDaylight [11]. We thank the anonymous reviewers for their helpful comments.

References

1. Ousterhout JK: Scripting: Higher Level Programming for the 21st Century. [http://home.pacbell.net/ouster/scripting.html]
2. OpenBabel v.2.1.1 [http://openbabel.sf.net]
3. SMARTS – A Language for Describing Molecular Patterns [http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html]
4. Flower DR: On the properties of bit string-based measures of chemical similarity. J Chem Inf Comput Sci 1998, 38:379-386.
5. Wildman SA, Crippen GM: Prediction of physicochemical parameters by atomic contributions. J Chem Inf Comput Sci 1999, 39:868-873.
6. Ertl P, Rohde B, Selzer P: Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J Med Chem 2000, 43:3714-3717.
7. Python [http://www.python.org]
8. OEChem: OpenEye Scientific Software: Santa Fe, NM.
9. RDKit [http://www.rdkit.org]
10. Daylight Toolkit: Daylight Chemical Information Systems, Inc.: Aliso Viejo, CA.
11. PyDaylight: Dalke Scientific Software, LLC: Santa Fe, NM.
12. Cambios Molecular Toolkit: Cambios Computing, LLC: Palo Alto, CA.
13. Frowns [http://frowns.sf.net]
14. PyBabel in MGLTools [http://mgltools.scripps.edu]
15. Babel v.1.6 [http://smog.com/chem/babel/]
16. SWIG v.1.3.31 [http://www.swig.org]
17. Boost.Python [http://www.boost.org/libs/python/doc/]
18. SIP – A Tool for Generating Python Bindings for C and C++ Libraries [http://www.riverbankcomputing.co.uk/sip/]
19. coverage.py [http://nedbatchelder.com/code/modules/coverage.html]
20. OpenBabel Python [http://openbabel.sourceforge.net/wiki/Python]
21. Jaccard P: La distribution de la flore dans la zone alpine. Rev Gen Sci Pures Appl 1907, 18:961-967.
22. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Del Rev 1997, 23:3-25.
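As a footnote to the "Selection of similar molecules" example in the paper above, the Tanimoto coefficient and the idea of overloading the | operator to return it can be sketched in plain Python. This is an illustration under stated assumptions, not Pybel's code: `BitFingerprint` and `select_similar` are hypothetical names, a fingerprint is modelled simply as the set of bit positions that are switched on, and the bit values below are made up.

```python
class BitFingerprint:
    """Hypothetical stand-in for a binary fingerprint (set of 'on' bits)."""

    def __init__(self, bits):
        self.bits = frozenset(bits)

    def __or__(self, other):
        """Return the Tanimoto coefficient: shared bits / bits set in either."""
        union = len(self.bits | other.bits)
        if union == 0:
            # Convention chosen for this sketch: two empty fingerprints score 0.
            return 0.0
        return len(self.bits & other.bits) / union


def select_similar(target, candidates, cutoff=0.7):
    """Return the names of candidates whose similarity to target meets the cutoff."""
    # '|' binds more tightly than '>=', so no parentheses are strictly
    # needed; they are kept here for readability.
    return [name for name, fp in candidates if (fp | target) >= cutoff]


# Made-up fingerprints for illustration.
target_fp = BitFingerprint([1, 2, 3, 4])
library = [
    ("close", BitFingerprint([1, 2, 3, 4, 5])),   # 4 shared / 5 total = 0.8
    ("distant", BitFingerprint([2, 3, 4, 5])),    # 3 shared / 5 total = 0.6
]

hits = select_similar(target_fp, library)
```

With a cutoff of 0.7 only the first candidate is kept. Returning a float from `__or__` is unusual, since the operator normally yields another object of the same type, so the overload trades convention for brevity at the call site.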
  • 14.
  • 15. Chemistry Central Journal. Software. Open Access.

Cinfony – combining Open Source cheminformatics toolkits behind a common interface

Noel M O'Boyle*1 and Geoffrey R Hutchison2

Address: 1Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK and 2Department of Chemistry, University of Pittsburgh, Chevron Science Center, 219 Parkman Avenue, Pittsburgh, PA 15260, USA

Email: Noel M O'Boyle* - oboyle@ccdc.cam.ac.uk; Geoffrey R Hutchison - geoffh@pitt.edu
* Corresponding author

Published: 3 December 2008. Received: 9 October 2008. Accepted: 3 December 2008.
Chemistry Central Journal 2008, 2:24 doi:10.1186/1752-153X-2-24
This article is available from: http://journal.chemistrycentral.com/content/2/1/24
© 2008 O'Boyle et al

Abstract

Background: Open Source cheminformatics toolkits such as OpenBabel, the CDK and the RDKit share the same core functionality but support different sets of file formats and forcefields, and calculate different fingerprints and descriptors. Despite their complementary features, using these toolkits in the same program is difficult as they are implemented in different languages (C++ versus Java), have different underlying chemical models and have different application programming interfaces (APIs).

Results: We describe Cinfony, a Python module that presents a common interface to all three of these toolkits, allowing the user to easily combine methods and results from any of the toolkits. In general, the run time of the Cinfony modules is almost as fast as accessing the underlying toolkits directly from C++ or Java, but Cinfony makes it much easier to carry out common tasks in cheminformatics such as reading file formats and calculating descriptors.

Conclusion: By providing a simplified interface and improving interoperability, Cinfony makes it easy to combine complementary features of OpenBabel, the CDK and the RDKit.

Background

Cheminformatics toolkits are essential to the day-to-day work of the practising cheminformatician. They enable the user to deal with such tasks as handling different chemistry file formats, substructure searching, calculation of molecular fingerprints, and structure diagram generation. The main Open Source cheminformatics libraries under active development are OpenBabel [1], the Chemistry Development Kit (CDK) [2], and the RDKit [3]. OpenBabel is a C++ toolkit with bindings in Perl, Python, Ruby and Java, the CDK is a Java toolkit, while the RDKit is another C++ toolkit with Python bindings. While the CDK has its origins in academia, both OpenBabel and the RDKit originated in companies (OpenEye and Rational Discovery, respectively) and have subsequently been developed by the community under Open Source licenses.

In general, all of these toolkits share the same core functionality although the implementation details and underlying chemical model may differ. However, as a result of their independent development and history, each has functionality specific to itself and each toolkit supports different sets of file formats and forcefields, and can calculate different molecular fingerprints and molecular descriptors (Table 1). Despite the diversity of these toolkits and the potential benefits in being able to access all of them at the same time, there has been little work on interoperability between them. This has resulted in a balkanization of this field such that users of one toolkit rarely use another toolkit.

One way to achieve interoperability of chemical toolkits is through the use of standard file formats for exchange of

Page 1 of 10. Chem. Cent. J. 2008, 2, 24. (page number not for citation purposes)
  • 16. Chemistry Central Journal 2008, 2:24 http://journal.chemistrycentral.com/content/2/1/24 Table 1: Some features of toolkits which are not shared by all three toolkits. CDK A large number of descriptors (some overlap with RDKit) Pharmacophore searching (like RDKit*) Calculation of maximum common substructure 2D structure layout (like RDKit) and depiction MACCS keys (also RDKit) and E-State fingerprints Integration with the R statistical programming environment Support for mass-spectrometry analysis (representations for cleavage reactions, structure generation from formulae) Fragmentation schemes (ring fragments, Murcko) 3D structure generation using a template and heuristics (like OpenBabel) 3D similarity using ultrafast shape descriptors Gasteiger π charge calculation OpenBabel Not just focused on cheminformatics Supports a very large number of chemical file formats including quantum mechanics file formats, molecular mechanics trajectories, 2D sketchers 3D structure generation using a template method (like CDK) Included in all major Linux distributions Bindings available from several scripting languages apart from Python, as well as the Java and .NET platforms Conformation generation and searching InChI (also CDK) and InChIKey generation Support for crystallographic space groups Several forcefield implementations: UFF (also RDKit), MMFF94, MMFF94s, Ghemical Ability to add custom data types to atoms, bonds, residues, molecules RDKit A large number of descriptors (some overlap with CDK) Fragmentation using RECAP rules 2D coordinate generation (like CDK) and depiction 3D coordinate generation using geometry embedding Calculation of Cahn-Ingold-Prelog stereochemistry codes (R/S) Pharmacophore searching (like CDK) Calculation of shape similarity (based on volume overlap) Chemical reaction handling and transforms Atom pairs and topological torsions fingerprints Feature maps and feature-map vectors Machine-learning algorithms * Where the term "like" is used, it 
indicates that the implementation details differ. data. For example, the CML project has defined a stand- models between different toolkits, and differences in the ardised XML format for chemical data [4], with successive API for core cheminformatics tasks shared by the toolkits. releases refining and extending the original standard. The OpenSMILES effort [5] has attempted to resolve ambigui- Here we describe Cinfony, a Python module that over- ties in the published SMILES definition [6] to create a comes these barriers to provide interoperability at the API standard. While these efforts deserve support, they face level. Cinfony allows access to OpenBabel, the CDK, and inevitable problems achieving consensus and they require the RDKit through a common interface, and uses a simple changes to existing software to support the standard. The yet robust method to pass chemical models between large number of chemical file formats supported by toolkits. Pybel, one of the components of Cinfony, has OpenBabel (currently over 80) illustrates both the poten- been described previously [7]. It provides access to tial of achieving a standard as well as the difficulties. OpenBabel from standard Python. In this work, we show that the API developed for Pybel may be considered a An alternative is interoperability at the API (application generic API for accessing any cheminformatics toolkit. We programming interface) level. This has the advantage that describe the design and implementation of the Cinfony it does require any changes to existing software. However, API for OpenBabel, the RDKit and the CDK. Next, we there are at least three barriers to overcome: the need for a show how Cinfony simplifies the process of accessing the programming language that can access all the toolkits toolkits and how it can be used in practice to combine the simultaneously, the difficulty of exchanging chemical power of the three Open Source toolkits. Finally, we dis- Page 2 of 10 Chem. Cent. J. 
2008, 2, 24. (page number not for citation purposes)
  • 17. Chemistry Central Journal 2008, 2:24 http://journal.chemistrycentral.com/content/2/1/24 cuss performance and some results from comparisons of Although the OBMol of OpenBabel has a corresponding the toolkits. method, OBMol.AddHydrogens(), the RDKit uses a glo- bal method, AddHs(Mol), while the CDK requires the Implementation user to instantiate a HydrogenAdder object, which can Common Application Programming Interface then be used to add hydrogens. Cinfony presents the same interface to three cheminfor- matics toolkits, OpenBabel, the CDK and the RDKit. The Molecule methods described in the original Pybel API These are available through three separate modules: oba- [7] have been extended to handle hydrogen addition and bel, cdk and rdkit. The API is designed to make it easy to removal, structure diagram generation, assignment of 3D carry out many of the common tasks in cheminformatics, geometry to 0D structures and geometry optimisation and covers the core functionality shared by all of the using forcefields. Both the CDK and the RDKit are capable toolkits. Table 2 gives an overview of the API. The com- of 2D coordinate generation and 2D depiction. However, plete API is available here (see Additional file 1). since OpenBabel currently has neither of these capabili- ties, a fourth toolkit, OASA, is used by Pybel for this pur- The main class containing chemical information is the pose. OASA is a lightweight cheminformatics toolkit Molecule class. Rather than create a new chemical model, implemented in Python [8]. the Molecule class is a light wrapper around the molecule object in the underlying library, for example, around A new development in the latest version of OpenBabel is OBMol in the case of OpenBabel. Attribute values such as 3D coordinate generation and geometry optimisation the molecular weight are calculated dynamically by query- using one of a number of forcefields. Since these methods ing the underlying molecule. 
This ensures that if the are also available in the RDKit, and are under develop- underlying OBMol, for example, is altered, the attribute ment in the CDK, two additional methods have been values returned will still be correct. The actual underlying added to the Cinfony Molecule: make3D(), for 3D coor- object (an OpenBabel OBMol, a CDK Molecule, or an dinate generation, and localopt(), for geometry optimisa- RDKit Mol) can be accessed directly at any point. tion. Particularly in the case of OpenBabel, these new methods simplify the process of generating 3D coordi- The Molecule class also contains several methods that act nates. Compare a single call to make3D() in Cinfony with on molecules such as methods for calculating fingerprints, the following OpenBabel code: adding hydrogens, and calculating descriptor values. This makes it easy to access these methods, and also brings structuregenerator = openbabel.OBOp.Find them to the attention of the user. In the underlying toolkit Type('Gen3D') these methods may not be present as part of the molecule class, and in fact, they can be difficult to find in the structuregenerator.Do(mol) toolkit's API. For example, the Cinfony method Mole- cule.addh() adds explicit hydrogens to the molecule. mol.AddHydrogens() Table 2: An overview of the Cinfony API. 
Class name Purpose Molecule Wraps a molecule instance of the underlying toolkit and provides access to methods that act on molecules Atom Wraps an atom instance of the underlying toolkit MoleculeData Provides dictionary-like access to the information contained in the tag fields in SDF and MOL2 files Outputfile Handles multimolecule output file formats Smarts Wraps the SMARTS functionality of the toolkit in an analogous way to the Python 're' module for regular expression matching Fingerprint Simplifies Tanimoto calculation of binary fingerprints Function name readfile Return an iterator over Molecules in a file readstring Return a Molecule Variable name descs A list of descriptor IDs forcefields A list of forcefield IDs fps A list of fingerprint IDs informatsaa A list of input format IDs outformats A list of output format IDs Page 3 of 10 Chem. Cent. J. 2008, 2, 24. (page number not for citation purposes)
  • 18. Chemistry Central Journal 2008, 2:24 http://journal.chemistrycentral.com/content/2/1/24 ff = openbabel.OBForceField.Find translation process is transparent to the user. However, Type("MMFF94") the user should be aware of known limitations of particu- lar readers or writers. For example, the SMILES parser in ff.Setup(mol) CDK 1.0.3 ignores atom-based stereochemistry and thus that information is lost if a 0D rdkit or obabel Molecule ff.SteepestDescent(50) with atom-based stereochemistry is converted to a cdk Molecule. ff.GetCoordinates(mol) Cinfony Molecules are interconverted using the Mole- The Cinfony API is identical for all of the toolkits. How- cule() constructor. For example, if obabelmol is an obabel ever, the values returned by particular API calls are not Molecule, then the corresponding rdkit Molecule can be necessarily standardised across toolkits. This Cinfony constructed using rdkit.Molecule(pybelmol). This mecha- design decision is in agreement with the Principle of Least nism can also be used to interface Cinfony to other chem- Surprise [9]; when the user accesses the underlying toolkit informatics toolkits. The only requirements are that the directly, they will get the same result as found when using object passed to the Molecule() constructor needs to have Cinfony. This design decision places the responsibility on a _cinfony attribute set to True, and an _exchange the user to become familiar with differences in how the attribute containing a tuple (0, SMILES string) or (1, MOL toolkits behave. For example, all of the toolkits allow the file) depending on whether the molecule is 0D or not. calculation of path-based fingerprints. These encode all paths in the molecular graph up to a path length of P into Implementation a binary vector of length V, but the default values for V The Python scripting language has two main implementa- and P are different for each toolkit: 1024 and 7 for tions. 
The most widely used implementation is the origi- OpenBabel, 1024 and 8 for the CDK, and 2048 and 7 for nal reference implementation of Python in C, referred to RDKit. Although it is possible to alter these parameters for as CPython when necessary to distinguish it from other the CDK and the RDKit and so standardise V and P to implementations. The next most widely used implemen- 1024 and 7 for all of the toolkits, it is reasonable to tation is Jython, an implementation of Python in Java. assume that the developers of each package have chosen Although most users of Python do so through CPython, sensible defaults. In addition, the implementation details Jython scripts have the advantage of being able to access of each of the fingerprinters would still be different; for Java libraries natively. They can also be compiled into Java example, the RDKit sets four bits when hashing each classes to be used from Java programs. Jython scripts are molecular path, the others set one; OpenBabel does not also useful in contexts where Java is required but it is more set any bits for the one-atom fragments, N, C and O. convenient to work in Python; for example, to implement a Java web servlet or a node in a Java workflow environ- Interoperability ment such as KNIME [11]. The ability to transfer chemical models between toolkits is essential to the goal of interoperability. However, the As discussed earlier, one of the barriers to interoperability internal representation of a molecule is specific to a par- is the requirement for a programming language that can ticular toolkit. For example, as well as the connection simultaneously access more than one of the toolkits. From table and coordinates (if present), it may include derived CPython it is possible to use Cinfony modules to connect data relating to aromaticity, the number of implicit hydro- to OpenBabel (pybel), the CDK (cdkjpype) and the RDKit gens on an atom, or stereochemical configuration. Fortu- (rdkit). 
From Jython, there are modules for OpenBabel nately, the problem of transfer and storage of chemical (jybel) and the CDK (cdkjython). Convenience modules information has already been solved by the development obabel and cdk are provided that automatically import the of molecular file formats, of which over 80 are now sup- appropriate OpenBabel or CDK module depending on ported by OpenBabel. Specifically, the MDL MOL file for- the Python implementation. The relationship between mat [10] and the SMILES format [5,6] are shared by all these Cinfony modules and the underlying cheminfor- three toolkits, and are used by Cinfony to exchange infor- matics libraries is summarised in Figure 1. mation on molecules with 2D or 3D coordinates (MOL file format), and no coordinates (SMILES format), respec- pybel and jybel tively. OpenBabel provides SWIG [12] bindings for both CPy- thon and Java (among other languages). pybel is a wrapper By using existing file formats rather than trying to inter- around the CPython bindings, and has previously been convert the internal models themselves, Cinfony takes described in detail [7]. jybel is an implementation of the advantage of the existing input/output code of each Cinfony API that allows the user to access OpenBabel toolkit which is well-tested and mature. In addition, the from Jython using the Java bindings. Despite the fact that Page 4 of 10 Chem. Cent. J. 2008, 2, 24. (page number not for citation purposes)
  • 19. Chemistry Central Journal 2008, 2:24 http://journal.chemistrycentral.com/content/2/1/24 rdkit Support for Python scripting has been part of the design of the RDKit from the start. The Python bindings in RDKit were created using Boost.Python [14], a framework for interfacing Python and C++. The Cinfony module rdkit uses these bindings to implement its API. It is currently not possible to access RDKit from Jython. RDKit has only preliminary support for Java bindings; when these are complete, a corresponding module will be added to Cin- fony. Dependency handling A fully-featured installation of Cinfony relies on a large Figure 1 Relationship of Cinfony modules to Open Source toolkits number of open source libraries. In particular, the 2D Relationship of Cinfony modules to Open Source depiction capabilities introduce dependencies on several toolkits. Python modules are accessible from CPython graphics libraries which may be problematic to install on (green), Jython (pale blue), or both (striped green and pale a particular platform (Cairo and its Python bindings, blue). Java libraries are indicated by dark blue, while C++ Python Imaging Library, AGG and the Python wrapper libraries are yellow. AggDraw). With this in mind, Cinfony treats all depend- encies as optional and only raises an Exception if the user calls a method or imports a module that requires a miss- ing dependency. jybel is used from a Java implementation of Python, and For example, the Python Imaging Library (PIL) is required accesses a C++ library through the Java Native Interface for displaying a 2D depiction on the screen. If all of the (JNI), the jybel code differs from pybel in very few respects. 
components of cinfony are installed except for PIL, Cin- In Jython, it is not possible to iterate directly over the fony works perfectly except that an Exception is raised if wrapped STL vectors used by OpenBabel as their Java the Molecule.draw() method is called with show = True SWIG bindings do not implement the Iterable interface. (the default). The image can however be written to a file Also, the current Jython implementation is 2.2 and does without problems (show = False, filename = not support generator expressions, which were introduced "image.png"). Similarly, if a user is only interested in in Python 2.4. Although both C++ and Python have the using the CDK and the RDKit, it is not necessary to install concept of a global function or variable, this is not the OpenBabel. case in Java. SWIG places such functions, and get/set methods for accessing the variables, in a special class Full installation instructions for Windows, MacOSX and named openbabel. Global constants are placed in another Linux are available from the Cinfony website. It should be class called openbabelConstants. A convenience module, noted that for Windows users, there is no need to compile obabel, is provided which automatically imports the or search for missing libraries as the dependencies are appropriate module depending on the Python implemen- included as binaries in the Cinfony distribution. tation. Results cdkjpype and cdkjython Cinfony API Since Jython runs on top of the Java Virtual Machine The original Pybel API was designed to make it easy to use (JVM), it can access Java libraries such as the CDK OpenBabel to perform the most common tasks in chem- natively. To access Java libraries from CPython, the informatics and to do so using idiomatic Python. Subse- Python library JPype [13] is needed. This starts an instance quently, we realised that the resulting API could be of the JVM and uses the JNI to communicate back and considered a generic API for wrapping the core function- forth. 
Overall, the differences between the two wrappers ality of any cheminformatics toolkit. Cinfony implements are minor. Jython and JPype differ in the syntax used to an extended version of the original Pybel API for the CDK handle Java exceptions. Also, JPype returns unicode and the RDKit, as well as OpenBabel. While the original strings from the CDK and these need to be converted to Pybel was restricted to CPython, Cinfony can also be used regular strings (otherwise problems arise if they are passed from Jython to access the CDK and OpenBabel. to an OpenBabel method expecting a std::string). The appropriate CDK wrapper, cdkjpype or cdkjython, will be Cinfony helps cheminformaticians avoid the steep learn- imported if the user imports the convenience module cdk. ing curve associated with starting to use a new toolkit. Page 5 of 10 Chem. Cent. J. 2008, 2, 24. (page number not for citation purposes)
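The thin-wrapper idea behind this generic API (hold the native molecule, compute attributes on demand, and hide where each toolkit puts its hydrogen-adding routine) can be illustrated with a small self-contained sketch. Everything in it is a hypothetical stand-in: NativeMolA, NativeMolB, add_hs, MoleculeA, MoleculeB and the toy "alkane" hydrogen rule are assumptions made for illustration and are not Cinfony or toolkit code.

```python
# Toy stand-ins for two toolkits with different native APIs.
# All names here are hypothetical; this is not Cinfony or toolkit code.

C_MASS, H_MASS = 12.011, 1.008


class NativeMolA:
    """Toolkit A style: hydrogens are added by a method on the molecule."""

    def __init__(self, n_carbons):
        self.n_carbons = n_carbons
        self.n_hydrogens = 0

    def AddHydrogens(self):
        # Toy rule: treat the molecule as a linear alkane, CnH(2n+2).
        self.n_hydrogens = 2 * self.n_carbons + 2


class NativeMolB:
    """Toolkit B style: hydrogens are added by a module-level function."""

    def __init__(self, n_carbons, n_hydrogens=0):
        self.n_carbons = n_carbons
        self.n_hydrogens = n_hydrogens


def add_hs(mol):
    # Returns a new molecule instead of modifying the argument in place.
    return NativeMolB(mol.n_carbons, 2 * mol.n_carbons + 2)


class _BaseMolecule:
    """Shared wrapper logic: keep the native object, compute attributes on demand."""

    def __init__(self, native):
        self.native = native  # the underlying object stays directly accessible

    @property
    def molwt(self):
        # Recomputed on every access, so changes to self.native are reflected.
        return C_MASS * self.native.n_carbons + H_MASS * self.native.n_hydrogens


class MoleculeA(_BaseMolecule):
    def addh(self):
        self.native.AddHydrogens()


class MoleculeB(_BaseMolecule):
    def addh(self):
        self.native = add_hs(self.native)


def weights_before_and_after(wrapper_cls, native):
    """The calling code is identical whichever backend is wrapped."""
    mol = wrapper_cls(native)
    before = mol.molwt
    mol.addh()
    return before, mol.molwt
```

Calling weights_before_and_after(MoleculeA, NativeMolA(3)) and weights_before_and_after(MoleculeB, NativeMolB(3)) goes through the same addh() and molwt names, even though one backend mutates the molecule in place and the other returns a new object. Because molwt is a property rather than a stored value, it also stays correct if the native object is altered directly.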
With Cinfony, all of the core functionality of the toolkits can be accessed with the same interface. For example, in Cinfony, a molecule can be created from a SMILES string with:

    mol = toolkit.readstring("smi", SMILESstring)

RDKit

    mol = Chem.MolFromSmiles(SMILESstring)

OpenBabel

    mol = openbabel.OBMol()
    obconversion = openbabel.OBConversion()
    obconversion.SetInFormat("smi")
    obconversion.ReadString(mol, SMILESstring)

CDK

    builder = cdk.DefaultChemObjectBuilder.getInstance()
    sp = cdk.smiles.SmilesParser(builder)
    mol = sp.parseSmiles(SMILESstring)

The RDKit was designed with Python scripting in mind, and of the three toolkits is the most concise. On the other hand, OpenBabel uses a characteristically C++ approach. An empty molecule is created, and is passed to an OBConversion instance as a container for the molecule read from the SMILES string. The SmilesParser in the CDK requires an instance of an object implementing the IChemObjectBuilder interface.

Another advantage of a common API is that a script written for one toolkit can easily be modified to use another. As an example, here is a script that selects molecules that are similar to a particular target molecule. This script is taken from the original Pybel paper [7], but uses the CDK instead of OpenBabel and will run equally well from Jython and CPython. The only differences compared to the original script are that "pybel" has been replaced with "cdk", and the import statement has been changed from "import pybel":

    from cinfony import cdk

    targetmol = cdk.readfile("sdf", "targetmol.sdf").next()
    targetfp = targetmol.calcfp()
    output = cdk.Outputfile("sdf", "similarmols.sdf")
    for mol in cdk.readfile("sdf", "inputfile.sdf"):
        fp = mol.calcfp()
        if fp | targetfp >= 0.7:
            output.write(mol)
    output.close()

Alternatively, we could just have made a single change to the original script, by replacing the import statement "import pybel" with "from cinfony import cdk as pybel".

Using Cinfony to combine toolkits

Another goal of Cinfony is to make it easy to combine toolkits in the same script. This allows the user to exploit the complementary capabilities of different toolkits (Table 1). For example, let's suppose the user wants to (1) convert a SMILES string to 3D coordinates with OpenBabel, then (2) create a 2D depiction of that molecule with the RDKit, next (3) calculate descriptors with the CDK, and finally (4) write out an SDF file containing the descriptor values and the 3D coordinates. The full Python script is only seven lines long:

    from cinfony import rdkit, cdk, obabel
    mol = obabel.readstring("smi", "CCC=O")
    mol.make3D()
    rdkit.Molecule(mol).draw(show = False, filename = "aldehyde.png")
    descs = cdk.Molecule(mol).calcdesc()
    mol.data.update(descs)
    mol.write("sdf", filename = "aldehyde.sdf")

For cheminformaticians interested in developing QSAR or QSPR models, Cinfony can be used to simultaneously calculate descriptors from the RDKit, the CDK and OpenBabel. For example, the following script reads a multiline input file, with each line consisting of a SMILES string followed by a property value. For each molecule, it calculates all of the OpenBabel, RDKit and CDK descriptors (except for CDK's CPSA) and writes out the results as a tab-separated file suitable for reading with the statistical package R [15]. Note that in this example script, if descriptors share the same name only one is retained. This is the case for the TPSA descriptor in OpenBabel, which is replaced by the RDKit's TPSA descriptor.

    import string
    from cinfony import obabel, cdk, rdkit

    # Read in SMILES strings and observed property values
    smiles, propvals = [], []
    for line in open("data.txt"):
        broken = line.rstrip().split()
        smiles.append(broken[0])
        propvals.append(float(broken[1]))
    mols = [obabel.readstring("smi", smile) for smile in smiles]

    # Calculate descriptor values using OpenBabel,
    # the CDK (apart from 'CPSA') and the RDKit
    cdkdescs = [x for x in cdk.descs if x != 'CPSA']
    descs = []
    for mol in mols:
        d = mol.calcdesc()
        d.update(cdk.Molecule(mol).calcdesc(cdkdescs))
        d.update(rdkit.Molecule(mol).calcdesc())
        descs.append(d)

    # Write a file suitable for 'read.table' in R
    outputfile = open("inputforR.txt", "w")
    descnames = sorted(descs[0].keys(), key = string.lower)
    print >> outputfile, "\t".join(["Property"] + descnames)
    for smile, propval, desc in zip(smiles, propvals, descs):
        descvals = [str(desc[descname]) for descname in descnames]
        print >> outputfile, "\t".join([smile, str(propval)] + descvals)
    outputfile.close()

Performance

Accessing cheminformatics libraries using Cinfony allows the user to rapidly develop scripts that manipulate chemical information. However, there is a small price to be paid. Firstly, there is the cost of moving objects across the interface between Python and the cheminformatics libraries. Secondly, the additional code required by Cinfony to implement a standard API may slow performance further.

To assess the performance penalty for accessing cheminformatics toolkits using Cinfony rather than directly in the native language, we looked at two simple test cases: (1) iterating over an SDF file containing 25419 molecules, (2) iterating and printing out the molecular weight of each of the molecules. The SDF file used was 3_p0.0.sdf, the first portion of the drug-like subset of the ZINC 7.00 dataset [16]. The Cinfony scripts, Java and C++ source code are available as Additional file 2. The results are shown in Table 3.

Table 3: Performance of Cinfony modules compared to a native Java or C++ implementation.

                  Iterate over SDF          Iterate and calculate molecular weight
                  Time (s)   Normalised     Time (s)   Normalised
CDK
  Native Java     21.2       1.00           36.8       1.00
  cdkjython       23.1       1.09           41.6       1.13
  cdkjpype        33.0       1.57           69.5       1.89
OpenBabel
  Native C++      31.9       1.00           43.0       1.00
  pybel           34.1       1.07           45.1       1.05
  jybel           38.0       1.19           49.6       1.15
RDKit
  Native C++      99.7       1.00           100.7      1.00
  rdkit           99.9       1.00           101.0      1.00

The times reported are wallclock times from the best of three runs on a dual-core Intel Pentium 4 3.2 GHz machine with 1GB RAM.

While accessing the CDK using Jython is almost as fast as a pure Java implementation, there is a considerable overhead associated with using JPype to access the CDK from CPython (89% slower for the second test case). This overhead is due to passing objects between the JVM and CPython. For OpenBabel, there is little performance cost associated with accessing OpenBabel from either implementation of Python, although the jybel scripts are somewhat slower than pybel scripts. A small portion of this speed difference can be attributed to a slower startup (about 1.6 seconds for jybel, compared to 0.8 seconds for pybel). Finally, from the RDKit results in Table 3, it is clear that using Boost.Python to wrap a C++ library is more efficient than using SWIG. The difference in run times between the C++ and Python implementations is negligible.

In practice, the performance of a particular Cinfony script will depend on the extent to which information is passed back and forth between Python and the underlying Java or C++ library. Where most of the time is spent on computation in the underlying library, the speed difference between a native implementation and one using Cinfony is expected to be small.

Comparison of toolkits

Cinfony makes it easy to compare the results obtained by different toolkits for the same operations. This can be useful in identifying bugs, applying a test suite, or finding the strengths and weaknesses of particular implementations. For example, where different toolkits calculate the same descriptors, if the calculated values are not highly correlated it may indicate a bug in one or the other. Earlier, we mentioned that a difference in the treatment of implicit hydrogens causes different toolkits to give different values for molecular weight unless hydrogens are explicitly added. Ensuring that a particular result is in agreement with that obtained by another toolkit can act as a sanity check in such instances to avoid errors.

When carrying out the same operation with several toolkits, it is often convenient to iterate over the toolkits in an outer loop:

    from cinfony import obabel, rdkit, cdk
    for toolkit in [obabel, rdkit, cdk]:
        print toolkit.readstring("smi", "CCC").molwt

As an example of how such comparisons can be used to identify bugs in toolkits, let us consider depiction. As a dataset, we randomly chose 100 molecules from PubChem [17], with subsequent filtering to remove multicomponent molecules. For each molecule, PubChem provides an SDF file containing coordinates for a 2D depiction, as well as the depiction itself as a PNG file. PubChem uses the CACTVS toolkit [18] to generate the 2D coordinates as well as the corresponding depiction. Using a script similar to the following, we used Cinfony to generate 2D depictions using OASA (the depiction library used by pybel), the CDK and a development version of RDKit that all use the same 2D coordinates taken from the SDF file:

    from cinfony import pybel, rdkit
    for toolkit in [rdkit, pybel]:
        name = toolkit.__name__
        for mol in toolkit.readfile("sdf", "dataset.sdf"):
            mol.draw(filename = "%s_%s.png" % (mol.title, name),
                     show = False, usecoords = True)

When the resulting images were compared for the PubChem entry CID7250053, an error was found in the depiction of the stereochemistry of an isopropyl group (Figure 2). Since the error only occurred in certain cases, it had not been previously noticed and would have been difficult to identify without such a comparative study. Once reported, the problem was quickly solved and the subsequent RDKit release depicted the stereochemistry correctly. A comparison of depictions by commercial toolkits and depictions generated by Cinfony is available here (see Additional file 3).

Figure 2: Comparison of depictions of PubChem CID7250053 using different toolkits. The depiction using the development version of RDKit showed incorrect stereochemistry for the isopropyl substituent of the thiazole ring.

Conclusion

Cinfony makes it easy to combine complementary features of the three main Open Source cheminformatics toolkits. By presenting a standard simplified API, the learning curve associated with starting to use a new toolkit is greatly reduced, thus encouraging users of one toolkit to investigate the potential of others.

Cinfony is freely available from the Cinfony website [19], both as Python source code and as a Windows distribution containing dependencies. Installation instructions are provided for MacOSX, Linux and Windows.

Availability and requirements

Project name: Cinfony
Project home page: http://cinfony.googlecode.com
Operating system(s): Platform independent
Programming language: Python, Jython
Other requirements: OpenBabel, CDK, RDKit, Java, OASA, JPype, Python Imaging Library
License: BSD
Any restrictions to use by non-academics: None

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

NMOB conceived and developed Cinfony. GRH is the lead developer of OpenBabel and created the Python and Java SWIG bindings. All authors read and approved the final manuscript.

Additional material

Additional file 1: Miniwebsite API. A mini-website of the Cinfony API documentation. [http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S1.zip]

Additional file 2: Timing Code. A zip file containing Python, Java and C++ code used for run time comparisons for two test cases. [http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S2.zip]

Additional file 3: Miniwebsite Depictions. A mini-website showing a comparison of the depictions generated by several cheminformatics toolkits. [http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S3.zip]

Acknowledgements

Cinfony would not be possible without the work of many Open Source projects. In particular, we thank several developers who responded quickly to bug reports or queries: Beda Kosata (OASA), Greg Landrum (RDKit), Tim Vandermeersch (OpenBabel), Steve Ménard (JPype). Thanks also to Gilbert Mueller and Chris Morley for feedback on installing Cinfony. NMOB thanks Google Code for providing free web hosting and development tools for Cinfony. We thank the anonymous reviewers for several useful suggestions.

References

1. OpenBabel v.2.2.0 [http://openbabel.org]
2. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E: Recent Developments of the Chemistry Development Kit (CDK) – An Open-Source Java Library for Chemo- and Bioinformatics. Curr Pharm Des 2006, 12:2110-2120.
3. Landrum G: RDKit. [http://www.rdkit.org].
4. Murray-Rust P, Rzepa HS: Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. J Chem Inf Comput Sci 1999, 39:928-942.
  • 24. Chemistry Central Journal 2008, 2:24 http://journal.chemistrycentral.com/content/2/1/24 5. Apodaca R, O'Boyle N, Dalke A, Van Drie J, Ertl P, Hutchison G, James CA, Landrum G, Morley C, Willighagen E, De Winter H: OpenSMILES. [http://www.opensmiles.org]. 6. Daylight Chemical Information Systems Manual [http:// www.daylight.com/dayhtml/doc/theory/theory.smiles.html] 7. O'Boyle NM, Morley C, Hutchison GR: Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem Cent J 2008, 2:5. 8. Kosata B: OASA. [http://bkchem.zirael.org/oasa_en.html]. 9. Raymond ES: The Art of UNIX Programming 2003 [http://www.catb.org/ ~esr/writings/taoup/index.html]. Reading, MA: Addison-Wesley 10. Symyx CTfile formats [http://www.mdli.com/downloads/public/ ctfile/ctfile.jsp] 11. KNIME – Konstanz Information Miner [http://knime.org] 12. SWIG v.1.3.36 [http://www.swig.org] 13. Ménard S: JPype. [http://jpype.sf.net]. 14. Boost.Python [http://www.boost.org/libs/python/doc/] 15. R development core team: R: A language and environment for statistical computing. [http://www.R-project.org]. 16. Irwin JJ, Shoichet BK: ZINC – A Free Database of Commercially Available Compounds for Virtual Screening. J Chem Inf Model 2005, 45:177-182. 17. PubChem [http://pubchem.ncbi.nlm.nih.gov/] 18. CACTVS Chemoinformatics Toolkit: Xemistry GmbH: Lah- ntal, Germany. . 19. O'Boyle NM: Cinfony. [http://cinfony.googlecode.com]. Publish with ChemistryCentral and every scientist can read your work free of charge Open access provides opportunities to our colleagues in other parts of the globe, by allowing anyone to view the content free of charge. W. Jeffery Hurst, The Hershey Company. available free of charge to the entire scientific community peer reviewed and published immediately upon acceptance cited in PubMed and archived on PubMed Central yours you keep the copyright Submit your manuscript here: http://www.chemistrycentral.com/manuscript/ Page 10 of 10 Chem. Cent. J. 2008, 2, 24. 
O’Boyle et al. Journal of Cheminformatics 2011, 3:33
http://www.jcheminf.com/content/3/1/33

SOFTWARE (Open Access)

Open Babel: An open chemical toolbox

Noel M O’Boyle1, Michael Banck2, Craig A James3, Chris Morley4, Tim Vandermeersch4 and Geoffrey R Hutchison5*

Abstract

Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats.

Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion.

Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry.
It is freely available under an open-source license from http://openbabel.org.

* Correspondence: geoffh@pitt.edu
5 University of Pittsburgh, Department of Chemistry, 219 Parkman Avenue, Pittsburgh, PA 15217, USA
Full list of author information is available at the end of the article

© 2011 O’Boyle et al; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

The history of chemical informatics has included a huge variety of textual and computer representations of molecular data. Such representations focus on specific atomic or molecular information and may not attempt to store all possible chemical data. For example, line notations like Daylight SMILES [1] do not offer coordinate information, while crystallographic or quantum mechanical formats frequently do not store chemical bonding data. Hydrogen atoms are frequently omitted from X-ray crystal structures because their coordinates are difficult to establish, and some file formats leave them out because the “implicit valence” of the heavy atoms already indicates their presence. Other types of representations require specification of atom types on the basis of a specific valence bond model, inclusion of computed partial charges, indication of biomolecular residues, or multiple conformations.

While attempts have been made to provide a standard format for storing chemical data, including most notably the development of Chemical Markup Language (CML) [2-6], an XML dialect, such formats have not yet achieved widespread use. Consequently, a frequent problem in computational modeling is the interconversion of molecular structures between different formats, a process that involves extraction and interpretation of their chemical data and semantics.

We outline, for the first time, the development and use of the Open Babel project, a full-featured open chemical toolbox, designed to “speak” the many different representations of chemical data. It allows anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas. It provides both ready-to-use programs as well as a complete, extensible programmer’s toolkit for developing cheminformatics software. It can handle reading,