Contents
I Cheminformatics toolkits
1 Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit
2 Cinfony - combining Open Source cheminformatics toolkits behind a common interface
3 Open Babel: An open chemical toolbox
II Enzyme reaction mechanisms
4 MACiE: a database of enzyme reaction mechanisms
5 MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms
III QSAR
6 PYCHEM: a multivariate analysis package for python
7 Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction
IV The Rest
8 Userscripts for the life sciences
9 Confab - Systematic generation of diverse low-energy conformers
10 Review of “Data Analysis with Open Source Tools”
11 Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on
Chemistry Central Journal 2008, 2:5 http://journal.chemistrycentral.com/content/2/1/5
Of the current popular scripting languages, Python [7] is the de facto standard language for scripting in cheminformatics. Several commercial cheminformatics toolkits have interfaces in Python: OpenEye's closed-source successor to OpenBabel, OEChem [8], is a C++ toolkit with interfaces in Python and Java; Rational Discovery's RDKit [9], which is now open source, is a C++ cheminformatics toolkit with a Python interface; the Daylight toolkit [10] from Daylight Chemical Information Systems, written in C, only has Java and C++ wrappers, but PyDaylight [11], available separately from Dalke Scientific, provides a Python interface to the toolkit; the Cambios Molecular Toolkit [12] from Cambios Consulting is a commercial C++ toolkit with a Python interface. There are also toolkits implemented entirely in Python: Frowns [13], an open source cheminformatics toolkit by Brian Kelley, and PyBabel [14], an open source toolkit included in the MGLTools package from the Molecular Graphics Labs at the Scripps Research Institute. Note that the latter is not related to the OpenBabel project; rather, its name derives from the fact that its aim was to implement in Python some of the functionality of Babel v1.6 [15], a command-line application for converting file formats which is a predecessor of OpenBabel.

Here we describe the implementation and application of Pybel, a Python module that provides access to the OpenBabel C++ library from the Python programming language. Pybel builds on the basic Python bindings to make it easier to carry out frequent tasks in cheminformatics. It also aims to be as 'Pythonic' as possible; that is, to adhere to Python language conventions and idioms, and where possible to make use of Python language features such as iterators. The result is a module that takes advantage of Python's expressive syntax to allow cheminformaticians to carry out tasks such as SMARTS matching, data field manipulation and calculation of molecular fingerprints in just a few lines of code.

Implementation
SWIG bindings
Python bindings to the OpenBabel toolkit were created using SWIG [16]. SWIG (Simplified Wrapper and Interface Generator) is a tool that automates the generation of bindings to libraries written in C or C++. One of the advantages of SWIG compared to other automated wrapping methods such as Boost.Python [17] or SIP [18] is that SWIG also supports the generation of bindings to several other languages. For example, OpenBabel also uses SWIG to generate bindings for Perl, Ruby and Java. An additional advantage is that SWIG directly parses C or C++ header files, whereas Boost.Python and SIP require each C++ class to be exposed manually. The input to SWIG is an interface file containing a list of OpenBabel header files for which to generate bindings. Using the signatures in the header files, SWIG generates a C file which, when compiled and linked with the Python development libraries and OpenBabel, creates a Python extension module, openbabel. This can then be imported into a Python script like any other Python module using the "import openbabel" statement.

For a small number of C++ objects and functions, it was necessary to add some convenience functions to facilitate access from Python. Certain types of molecule file contain data in addition to the connection table. OpenBabel stores these data in subclasses of OBGenericData such as OBPairData (for the data fields in molecule files such as MOL files and SDF files) and OBUnitCell (for the data fields in CIF files). To access the data it is necessary to 'downcast' an instance of OBGenericData to the specific subclass. For this reason, two convenience functions were added to the interface file, one to cast OBGenericData to OBPairData, and one to cast to OBUnitCell. Another convenience function was added to convert a Python list to a C array of doubles, as this type of input is required for a small number of OpenBabel functions.

Iterators are an important feature of the OpenBabel C++ library. For example, OBAtomAtomIter allows the user to easily iterate over the atoms attached to a particular atom, and OBResidueIter is an iterator over the residues in a molecule. The OpenBabel iterators use the dereference operator to access the data, the increment operator to iterate to the next element, and the boolean operator to test whether any elements remain. Iterators are also a core feature of the Python language. However, the iterators used by OpenBabel are not automatically converted into Python iterators. To deal with this, Python iterator classes that wrap the dereference, increment and boolean operators behind the scenes were added to the SWIG interface file, so that Python statements such as "for attached_obatom in OBAtomAtomIter(obatom)" work without problem.

Pybel module
The SWIG bindings provide direct access from Python to the C++ objects and functions in the OpenBabel API (application programming interface). The purpose of the Pybel module is to wrap these bindings to present a more Pythonic interface to OpenBabel (Figure 1). This extra level of abstraction is useful as Python programmers expect Python libraries to behave in certain ways that a C++ library does not. For example, in Python, attributes of an object are often accessed directly, whereas in C++ it is typical to call Get/Set functions to access them. A C++ function returning a particular object might require a pointer to an empty object as a parameter, whereas the Python equivalent would not. Even something as simple as differences in the conventions for the case of letters used in variable and method names is a problem, as it makes it more likely for Python programmers to introduce bugs in their code.

Chem. Cent. J. 2008, 2, 5. (page number not for citation purposes)

Figure 1
The relationship between Python modules described in the text and the OpenBabel C++ library. Python modules are shown in green; the C++ library is shown in blue.

One of the key aims of Pybel was to reduce the amount of code necessary to carry out common tasks. This is especially important for a scripting language where programming is often done interactively at a command prompt. In addition, as for any programming language, repeated entry of code for routine and common tasks (so-called 'boilerplate code') is a common cause of errors. Reading and writing molecule files is one of the most common tasks for users of OpenBabel but requires several lines of code if using the SWIG bindings. The following code shows how to store each molecule in a multimolecule SDF file in a list called allmols:

import openbabel
allmols = []
obconversion = openbabel.OBConversion()
obconversion.SetInFormat("sdf")
obmol = openbabel.OBMol()
notatend = obconversion.ReadFile(obmol, "inputfile.sdf")
while notatend:
    allmols.append(obmol)
    obmol = openbabel.OBMol()
    notatend = obconversion.Read(obmol)

To replace this somewhat verbose code, Pybel provides a readfile function that takes a file format and filename and returns molecules using the 'yield' keyword. This makes readfile a 'generator', a Python language feature whereby a function behaves like an iterator. Iterators are a major feature of the Python language and are used for looping over collections of objects. In Pybel, we have used iterators where possible to simplify access to the toolkit. As a result, the equivalent to the preceding code is:

import pybel
allmols = [mol for mol in pybel.readfile("sdf", "inputfile.sdf")]

The benefits of iterator syntax are clear when dealing with multimolecule files. For single molecule files, however, the user needs to remember to explicitly request the iterator to return the first and only molecule using the next method:

mol = pybel.readfile("mol", "inputfile.mol").next()

Pybel provides replacements for two of the main classes in the OpenBabel library, OBMol and OBAtom. The following discussion describes the Pybel Molecule class, which wraps an instance of OBMol, but the same design principles apply to the Pybel Atom class. Table 1 summarises the attributes and methods of the Molecule object. By wrapping the base class, Pybel can enhance the Molecule
object by providing (1) direct access to attributes rather than through the use of Get methods, (2) additional attributes of the object, and (3) additional methods that act on the object.

Table 1: Attributes and methods supported by the Pybel Molecule object
Attribute Description*
OBMol The underlying OBMol object
atoms A list of Pybel Atoms
charge The total charge (GetTotalCharge)
data A MoleculeData object for access to data fields
dim The dimensionality of the coordinates (GetDimension)
energy The heat of formation (GetEnergy)
exactmass The mass calculated using isotopic abundance (GetExactMass)
flags The set of flags used internally by OpenBabel (GetFlags)
formula The stoichiometric formula (GetFormula)
mod The number of nested BeginModify() calls (Internal use) (GetMod)
molwt The standard molar mass (GetMolWt)
spin The total spin multiplicity (GetTotalSpinMultiplicity)
sssr The smallest set of smallest rings (GetSSSR)
title The title of the molecule (often the filename) (GetTitle)
unitcell Unit cell data (if present)
Method Description
write Write the molecule to a file or return it as a string
calcfp Return a molecular fingerprint as a Fingerprint object
calcdesc Return the values of the group contribution descriptors
__iter__ Enable iteration over the Atoms in the Molecule
*Where a Molecule attribute is a direct replacement for a 'Get' method of the underlying OBMol, the name of the method is given in parentheses.

(1) As mentioned earlier, it is typical in Python to access attribute values directly rather than using Get/Set methods. With this in mind, the Molecule class adds attributes such as energy, formula and molwt (among others) which give the values returned by calling GetEnergy(), GetFormula() and GetMolWt(), respectively, on the underlying OBMol (see Table 1 for the full list).

(2) One of the aims of Pybel is to simplify access to some of the most common attributes. With this in mind, an atoms attribute has been added which returns a list of the atoms of the molecule as Pybel Atoms. Access to the data fields associated with a molecule has been simplified by the creation of a MoleculeData object, which is returned when the data attribute of a Molecule is accessed. MoleculeData presents a dictionary interface to the data fields of the molecule. Accessing and updating these fields is more convoluted if using the SWIG bindings. Compare the following statements for accessing the "comment" field of the variable mol, an OBMol:

# Using the SWIG bindings
value = openbabel.toPairData(mol.GetData("comment")).GetValue()

# Using Pybel
value = pybel.Molecule(mol).data["comment"]

It should be noted that all of these attributes are calculated on-the-fly rather than stored for future access, as the underlying OBMol may have been modified.

(3) Four additional methods have been added to the Pybel Molecule (Table 1). The first is a write method, which writes a representation of the Molecule to a file and takes care of error handling. As with reading molecules from files (see above), this method simplifies the procedure significantly compared to using the SWIG bindings directly. In addition, a calcfp method and a calcdesc method have been added, which calculate a binary fingerprint for the molecule and some descriptor values, respectively. In the OpenBabel library these are not methods of the OBMol, but rather are loaded as plugins (by OBFingerprint.FindFingerprint and OBDescriptor.FindType, respectively) to which an OBMol is passed as input. The __iter__ method is a special Python method that enables iteration over an object; in the case of a Molecule, the defined iterator loops over the Atoms of the Molecule. This feature enables constructions such as "for atom in mol" where mol is a Pybel Molecule.

SMARTS is a query language developed by Daylight Chemical Information Systems for molecular substructure
searching [3]. As implemented in the OpenBabel toolkit, finding matches of a particular substructure in a particular molecule is a four-step process that involves creating an instance of OBSmartsPattern, initialising it with a SMARTS pattern, searching for a match, and finally retrieving the result:

obsmarts = openbabel.OBSmartsPattern()
obsmarts.Init("[#6][#6]")
obsmarts.Match(obmol)
results = obsmarts.GetUMapList()

Since a SMARTS query can be thought of as a regular expression for molecules, in Pybel we decided to wrap the SMARTS functionality in an analogous way to Python's regular expression module, re. With these changes, the same process takes only two steps, an initialisation step and a search step:

smarts = pybel.Smarts("[#6][#6]")
results = smarts.findall(pybelmol)

Pybel was not written to replace the SWIG bindings but rather to make it simpler to perform common tasks. As a result, Pybel does not attempt to wrap every single method and class in the OpenBabel library. Because of this, a user may often want to interconvert between an OBMol and a Molecule, or an OBAtom and an Atom. This is quite a straightforward process. A Pybel Molecule can be created by passing an OBMol to the Molecule constructor. In the following example an OBMol is created using the SWIG bindings and then written to a file using Pybel:

obmol = openbabel.OBMol()
a = obmol.NewAtom()
a.SetAtomicNum(6)
a.SetVector(0.0, 1.0, 2.0)  # Set coordinates
b = obmol.NewAtom()
obmol.AddBond(1, 2, 1)  # Single bond from Atom 1 to Atom 2
pybel.Molecule(obmol).write("mol", "outputfile.mol")

The OBMol wrapped by a Pybel Molecule can be accessed through the OBMol attribute. This makes it easy to call a method not wrapped by Pybel, such as OBMol.NumRotors, which returns the number of rotatable bonds in a molecule:

mol = pybel.readfile("mol", "inputfile.mol").next()
numrotors = mol.OBMol.NumRotors()

Documentation and Testing
To minimise programming errors, programs written in dynamically-typed languages such as Python should be tested comprehensively. Pybel has 100% code coverage in terms of unit tests, as measured by Ned Batchelder's coverage.py [19]. It also has several doctests, short snippets of Python code included in documentation strings which serve both as examples of usage and as unit tests.

The Pybel API is fully documented with docstrings. These can be accessed in the usual way with the help() command at the interactive Python prompt after importing Pybel: for example, "help(pybel.Molecule)". In addition, the OpenBabel Python web page [20] contains a complete description of how to use the SWIG bindings and the Pybel API. The web page also contains links to HTML versions of the OpenBabel API documentation and the Pybel API documentation. The latter is included in Additional File 1.

Results and Discussion
The principal aim of Pybel is to make it simpler to use the OpenBabel toolkit to carry out common tasks in cheminformatics. These common tasks include reading and writing molecule files, accessing data fields of a molecule, computing and comparing molecular fingerprints and SMARTS matching. Here we present some examples that illustrate how Pybel may be used to carry out common cheminformatics tasks.

Removal of duplicate molecules
When merging different datasets, or as a final step in preprocessing, it may be necessary to identify and remove duplicate molecules. In the following example, only the unique molecules in the multimolecule SDF file "inputfile.sdf" will be written to "uniquemols.sdf". Here we will assume that a unique InChI string (IUPAC International Chemical Identifier) indicates a unique molecule. A similar procedure could be performed using the OpenBabel canonical SMILES format, by replacing "inchi" with "can" in the following:

import pybel
inchis = []
output = pybel.Outputfile("sdf", "uniquemols.sdf")
for mol in pybel.readfile("sdf", "inputfile.sdf"):
    inchi = mol.write("inchi")
    if inchi not in inchis:
        output.write(mol)
        inchis.append(inchi)
output.close()

Selection of similar molecules
Another common task in cheminformatics is the selection of a set of molecules of similar structure to a target molecule. Here we will assume that structural similarity is indicated by a Tanimoto coefficient [21] of at least 0.7 with respect to Daylight-type (that is, based on hashed paths through the molecular graph) fingerprints. Note that Pybel redefines the | operator (bitwise OR) for Fingerprint objects as the Tanimoto coefficient:

import pybel
targetmol = pybel.readfile("sdf", "targetmol.sdf").next()
targetfp = targetmol.calcfp()
output = pybel.Outputfile("sdf", "similarmols.sdf")
for mol in pybel.readfile("sdf", "inputfile.sdf"):
    fp = mol.calcfp()
    if fp | targetfp >= 0.7:
        output.write(mol)
output.close()

Applying a Rule of Fives filter
In an influential paper, Lipinski et al. [22] performed an analysis of drug compounds that reached Phase II clinical trials and found that they tended to occupy a certain range of values for molecular weight, LogP, and number of hydrogen bond donors and acceptors. Based on this, they proposed a rule with four criteria to identify molecules that might have poor absorption or permeation properties. This is the Lipinski Rule of Fives, so called because the numbers involved are all multiples of five. The following example shows how to filter a database to identify only those molecules that pass all four of the Lipinski criteria. The values of the Lipinski descriptors are also added to the output file as data fields. Note that whereas molecular weight is directly available as an attribute of a Molecule, and LogP is available as one of the three group contribution descriptors calculated by OpenBabel, we need to use SMARTS pattern matching to identify the number of hydrogen bond donors and acceptors. The SMARTS patterns used here correspond to the definitions of hydrogen bond donor and acceptor used by Lipinski:

import pybel

HBD = pybel.Smarts("[#7,#8;!H0]")
HBA = pybel.Smarts("[#7,#8]")

def lipinski(mol):
    """Return the values of the Lipinski descriptors."""
    desc = {'molwt': mol.molwt,
            'HBD': len(HBD.findall(mol)),
            'HBA': len(HBA.findall(mol)),
            'LogP': mol.calcdesc(['LogP'])['LogP']}
    return desc

passes_all_rules = lambda desc: (desc['molwt'] <= 500 and
                                 desc['HBD'] <= 5 and desc['HBA'] <= 10 and
                                 desc['LogP'] <= 5)

if __name__ == "__main__":
    output = pybel.Outputfile("sdf", "passLipinski.sdf")
    for mol in pybel.readfile("sdf", "inputfile.sdf"):
        descriptors = lipinski(mol)
        if passes_all_rules(descriptors):
            mol.data.update(descriptors)
            output.write(mol)
    output.close()

Future work
The future development of Pybel is closely linked to any changes and improvements to OpenBabel. With each new release of the OpenBabel API, the SWIG bindings will be updated to include any additional functionality. However, additions to the Pybel API will only occur if they simplify access to new features of the OpenBabel toolkit of general use to cheminformaticians. In general, the Pybel API can be considered stable, and an effort will be made to ensure that future changes will be backwards compatible.

Conclusion
Pybel provides a high-level Python interface to the widely-used OpenBabel C++ toolkit. This combination of a high performance cheminformatics toolkit and an expressive scripting language makes it easy for cheminformaticians to rapidly and efficiently write scripts to manipulate molecular data.

Pybel is freely available from the OpenBabel web site [2] both as part of the OpenBabel source distribution and for Windows as an executable installer. Compiled versions are also available as packages in some Linux distributions (openbabel-python in Fedora, for example).

Availability and Requirements
Project name: Pybel
Project home page: http://openbabel.sf.net/wiki/Python
Operating system(s): Platform independent
Programming language: Python
Other requirements: OpenBabel
License: GNU GPL
Any restrictions to use by non-academics: None

Authors' contributions
GRH is the lead developer of OpenBabel and created the SWIG bindings. NMOB developed Pybel, and extended the SWIG interface file. CM compiled the SWIG bindings on Windows and added convenience functions to the OpenBabel API to facilitate access from scripting languages. All authors read and approved the final manuscript.

Additional material
Additional file 1
Pybel API. The HTML documentation of the Pybel API (application programming interface).
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-5-S1.zip]

Acknowledgements
The idea for the Pybel module was inspired by Andrew Dalke's work on PyDaylight [11]. We thank the anonymous reviewers for their helpful comments.

References
1. Ousterhout JK: Scripting: Higher Level Programming for the 21st Century. [http://home.pacbell.net/ouster/scripting.html]
2. OpenBabel v.2.1.1 [http://openbabel.sf.net]
3. SMARTS – A Language for Describing Molecular Patterns [http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html]
4. Flower DR: On the properties of bit string-based measures of chemical similarity. J Chem Inf Comput Sci 1998, 38:379-386.
5. Wildman SA, Crippen GM: Prediction of physicochemical parameters by atomic contributions. J Chem Inf Comput Sci 1999, 39:868-873.
6. Ertl P, Rohde B, Selzer P: Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J Med Chem 2000, 43:3714-3717.
7. Python [http://www.python.org]
8. OEChem: OpenEye Scientific Software: Santa Fe, NM.
9. RDKit [http://www.rdkit.org]
10. Daylight Toolkit: Daylight Chemical Information Systems, Inc.: Aliso Viejo, CA.
11. PyDaylight: Dalke Scientific Software, LLC: Santa Fe, NM.
12. Cambios Molecular Toolkit: Cambios Computing, LLC: Palo Alto, CA.
13. Frowns [http://frowns.sf.net]
14. PyBabel in MGLTools [http://mgltools.scripps.edu]
15. Babel v.1.6 [http://smog.com/chem/babel/]
16. SWIG v.1.3.31 [http://www.swig.org]
17. Boost.Python [http://www.boost.org/libs/python/doc/]
18. SIP – A Tool for Generating Python Bindings for C and C++ Libraries [http://www.riverbankcomputing.co.uk/sip/]
19. coverage.py [http://nedbatchelder.com/code/modules/coverage.html]
20. OpenBabel Python [http://openbabel.sourceforge.net/wiki/Python]
21. Jaccard P: La distribution de la flore dans la zone alpine. Rev Gen Sci Pures Appl 1907, 18:961-967.
22. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Del Rev 1997, 23:3-25.
Chemistry Central Journal 2008, 2:24 http://journal.chemistrycentral.com/content/2/1/24
Table 1: Some features of toolkits which are not shared by all three toolkits.
CDK
A large number of descriptors (some overlap with RDKit)
Pharmacophore searching (like RDKit*)
Calculation of maximum common substructure
2D structure layout (like RDKit) and depiction
MACCS keys (also RDKit) and E-State fingerprints
Integration with the R statistical programming environment
Support for mass-spectrometry analysis (representations for cleavage reactions, structure generation from formulae)
Fragmentation schemes (ring fragments, Murcko)
3D structure generation using a template and heuristics (like OpenBabel)
3D similarity using ultrafast shape descriptors
Gasteiger π charge calculation
OpenBabel
Not just focused on cheminformatics
Supports a very large number of chemical file formats including quantum mechanics file formats, molecular mechanics trajectories, 2D sketchers
3D structure generation using a template method (like CDK)
Included in all major Linux distributions
Bindings available from several scripting languages apart from Python, as well as the Java and .NET platforms
Conformation generation and searching
InChI (also CDK) and InChIKey generation
Support for crystallographic space groups
Several forcefield implementations: UFF (also RDKit), MMFF94, MMFF94s, Ghemical
Ability to add custom data types to atoms, bonds, residues, molecules
RDKit
A large number of descriptors (some overlap with CDK)
Fragmentation using RECAP rules
2D coordinate generation (like CDK) and depiction
3D coordinate generation using geometry embedding
Calculation of Cahn-Ingold-Prelog stereochemistry codes (R/S)
Pharmacophore searching (like CDK)
Calculation of shape similarity (based on volume overlap)
Chemical reaction handling and transforms
Atom pairs and topological torsions fingerprints
Feature maps and feature-map vectors
Machine-learning algorithms
* Where the term "like" is used, it indicates that the implementation details differ.
data. For example, the CML project has defined a stand- models between different toolkits, and differences in the
ardised XML format for chemical data [4], with successive API for core cheminformatics tasks shared by the toolkits.
releases refining and extending the original standard. The
OpenSMILES effort [5] has attempted to resolve ambigui- Here we describe Cinfony, a Python module that over-
ties in the published SMILES definition [6] to create a comes these barriers to provide interoperability at the API
standard. While these efforts deserve support, they face level. Cinfony allows access to OpenBabel, the CDK, and
inevitable problems achieving consensus and they require the RDKit through a common interface, and uses a simple
changes to existing software to support the standard. The yet robust method to pass chemical models between
large number of chemical file formats supported by toolkits. Pybel, one of the components of Cinfony, has
OpenBabel (currently over 80) illustrates both the poten- been described previously [7]. It provides access to
tial of achieving a standard as well as the difficulties. OpenBabel from standard Python. In this work, we show
that the API developed for Pybel may be considered a
An alternative is interoperability at the API (application generic API for accessing any cheminformatics toolkit. We
programming interface) level. This has the advantage that describe the design and implementation of the Cinfony
it does require any changes to existing software. However, API for OpenBabel, the RDKit and the CDK. Next, we
there are at least three barriers to overcome: the need for a show how Cinfony simplifies the process of accessing the
programming language that can access all the toolkits toolkits and how it can be used in practice to combine the
simultaneously, the difficulty of exchanging chemical power of the three Open Source toolkits. Finally, we dis-
Page 2 of 10
Chem. Cent. J. 2008, 2, 24. (page number not for citation purposes)
17. Chemistry Central Journal 2008, 2:24 http://journal.chemistrycentral.com/content/2/1/24
cuss performance and some results from comparisons of Although the OBMol of OpenBabel has a corresponding
the toolkits. method, OBMol.AddHydrogens(), the RDKit uses a glo-
bal method, AddHs(Mol), while the CDK requires the
Implementation user to instantiate a HydrogenAdder object, which can
Common Application Programming Interface then be used to add hydrogens.
Cinfony presents the same interface to three cheminfor-
matics toolkits, OpenBabel, the CDK and the RDKit. The Molecule methods described in the original Pybel API
These are available through three separate modules: oba- [7] have been extended to handle hydrogen addition and
bel, cdk and rdkit. The API is designed to make it easy to removal, structure diagram generation, assignment of 3D
carry out many of the common tasks in cheminformatics, geometry to 0D structures and geometry optimisation
and covers the core functionality shared by all of the using forcefields. Both the CDK and the RDKit are capable
toolkits. Table 2 gives an overview of the API. The com- of 2D coordinate generation and 2D depiction. However,
plete API is available here (see Additional file 1). since OpenBabel currently has neither of these capabili-
ties, a fourth toolkit, OASA, is used by Pybel for this purpose. OASA is a lightweight cheminformatics toolkit implemented in Python [8].

The main class containing chemical information is the Molecule class. Rather than creating a new chemical model, the Molecule class is a light wrapper around the molecule object in the underlying library, for example, around OBMol in the case of OpenBabel. Attribute values such as the molecular weight are calculated dynamically by querying the underlying molecule. This ensures that if the underlying OBMol, for example, is altered, the attribute values returned will still be correct. The actual underlying object (an OpenBabel OBMol, a CDK Molecule, or an RDKit Mol) can be accessed directly at any point.

The Molecule class also contains several methods that act on molecules, such as methods for calculating fingerprints, adding hydrogens, and calculating descriptor values. This makes these methods easy to access, and also brings them to the attention of the user. In the underlying toolkit these methods may not be part of the molecule class, and in fact, they can be difficult to find in the toolkit's API. For example, the Cinfony method Molecule.addh() adds explicit hydrogens to the molecule.

Table 2: An overview of the Cinfony API.
Class name      Purpose
Molecule        Wraps a molecule instance of the underlying toolkit and provides access to methods that act on molecules
Atom            Wraps an atom instance of the underlying toolkit
MoleculeData    Provides dictionary-like access to the information contained in the tag fields in SDF and MOL2 files
Outputfile      Handles multimolecule output file formats
Smarts          Wraps the SMARTS functionality of the toolkit in an analogous way to the Python 're' module for regular expression matching
Fingerprint     Simplifies Tanimoto calculation of binary fingerprints
Function name   Purpose
readfile        Return an iterator over Molecules in a file
readstring      Return a Molecule
Variable name   Purpose
descs           A list of descriptor IDs
forcefields     A list of forcefield IDs
fps             A list of fingerprint IDs
informats       A list of input format IDs
outformats      A list of output format IDs

A new development in the latest version of OpenBabel is 3D coordinate generation and geometry optimisation using one of a number of forcefields. Since these methods are also available in the RDKit, and are under development in the CDK, two additional methods have been added to the Cinfony Molecule: make3D(), for 3D coordinate generation, and localopt(), for geometry optimisation. Particularly in the case of OpenBabel, these new methods simplify the process of generating 3D coordinates. Compare a single call to make3D() in Cinfony with the following OpenBabel code:

# Generate 3D coordinates with the Gen3D operation, then add explicit hydrogens
structuregenerator = openbabel.OBOp.FindType('Gen3D')
structuregenerator.Do(mol)
mol.AddHydrogens()
# Optimise the geometry with MMFF94 (50 steepest-descent steps) and copy the coordinates back
ff = openbabel.OBForceField.FindType("MMFF94")
ff.Setup(mol)
ff.SteepestDescent(50)
ff.GetCoordinates(mol)

The Cinfony API is identical for all of the toolkits. However, the values returned by particular API calls are not necessarily standardised across toolkits. This design decision is in agreement with the Principle of Least Surprise [9]: when the user accesses the underlying toolkit directly, they will get the same result as when using Cinfony. It does, however, place the responsibility on the user to become familiar with differences in how the toolkits behave. For example, all of the toolkits allow the calculation of path-based fingerprints. These encode all paths in the molecular graph up to a path length of P into a binary vector of length V, but the default values for V and P are different for each toolkit: 1024 and 7 for OpenBabel, 1024 and 8 for the CDK, and 2048 and 7 for the RDKit. Although it is possible to alter these parameters for the CDK and the RDKit and so standardise V and P to 1024 and 7 for all of the toolkits, it is reasonable to assume that the developers of each package have chosen sensible defaults. In addition, the implementation details of each of the fingerprinters would still be different; for example, the RDKit sets four bits when hashing each molecular path, while the others set one, and OpenBabel does not set any bits for the one-atom fragments N, C and O.

Interoperability
The ability to transfer chemical models between toolkits is essential to the goal of interoperability. However, the internal representation of a molecule is specific to a particular toolkit. For example, as well as the connection table and coordinates (if present), it may include derived data relating to aromaticity, the number of implicit hydrogens on an atom, or stereochemical configuration. Fortunately, the problem of transfer and storage of chemical information has already been solved by the development of molecular file formats, of which over 80 are now supported by OpenBabel. Specifically, the MDL MOL file format [10] and the SMILES format [5,6] are shared by all three toolkits, and are used by Cinfony to exchange information on molecules with 2D or 3D coordinates (MOL file format) and with no coordinates (SMILES format), respectively.

By using existing file formats rather than trying to interconvert the internal models themselves, Cinfony takes advantage of the existing input/output code of each toolkit, which is well-tested and mature. In addition, the translation process is transparent to the user. However, the user should be aware of known limitations of particular readers or writers. For example, the SMILES parser in CDK 1.0.3 ignores atom-based stereochemistry, and thus that information is lost if a 0D rdkit or obabel Molecule with atom-based stereochemistry is converted to a cdk Molecule.

Cinfony Molecules are interconverted using the Molecule() constructor. For example, if obabelmol is an obabel Molecule, then the corresponding rdkit Molecule can be constructed using rdkit.Molecule(obabelmol). This mechanism can also be used to interface Cinfony to other cheminformatics toolkits. The only requirements are that the object passed to the Molecule() constructor has a _cinfony attribute set to True, and an _exchange attribute containing a tuple (0, SMILES string) or (1, MOL file) depending on whether the molecule is 0D or not.

Implementation
The Python scripting language has two main implementations. The most widely used is the original reference implementation of Python in C, referred to as CPython when necessary to distinguish it from other implementations. The next most widely used is Jython, an implementation of Python in Java. Although most people use Python through CPython, Jython scripts have the advantage of being able to access Java libraries natively. They can also be compiled into Java classes to be used from Java programs. Jython scripts are also useful in contexts where Java is required but it is more convenient to work in Python; for example, to implement a Java web servlet or a node in a Java workflow environment such as KNIME [11].

As discussed earlier, one of the barriers to interoperability is the requirement for a programming language that can simultaneously access more than one of the toolkits. From CPython it is possible to use Cinfony modules to connect to OpenBabel (pybel), the CDK (cdkjpype) and the RDKit (rdkit). From Jython, there are modules for OpenBabel (jybel) and the CDK (cdkjython). Convenience modules obabel and cdk are provided that automatically import the appropriate OpenBabel or CDK module depending on the Python implementation. The relationship between these Cinfony modules and the underlying cheminformatics libraries is summarised in Figure 1.

pybel and jybel
OpenBabel provides SWIG [12] bindings for both CPython and Java (among other languages). pybel is a wrapper around the CPython bindings, and has previously been described in detail [7]. jybel is an implementation of the Cinfony API that allows the user to access OpenBabel from Jython using the Java bindings. Despite the fact that
jybel is used from a Java implementation of Python, and accesses a C++ library through the Java Native Interface (JNI), the jybel code differs from pybel in very few respects. In Jython, it is not possible to iterate directly over the wrapped STL vectors used by OpenBabel, as their Java SWIG bindings do not implement the Iterable interface. Also, the current Jython implementation is 2.2 and does not support generator expressions, which were introduced in Python 2.4. Although both C++ and Python have the concept of a global function or variable, this is not the case in Java. SWIG places such functions, and get/set methods for accessing the variables, in a special class named openbabel. Global constants are placed in another class called openbabelConstants. A convenience module, obabel, is provided which automatically imports the appropriate module depending on the Python implementation.

cdkjpype and cdkjython
Since Jython runs on top of the Java Virtual Machine (JVM), it can access Java libraries such as the CDK natively. To access Java libraries from CPython, the Python library JPype [13] is needed. This starts an instance of the JVM and uses the JNI to communicate back and forth. Overall, the differences between the two wrappers are minor. Jython and JPype differ in the syntax used to handle Java exceptions. Also, JPype returns unicode strings from the CDK, and these need to be converted to regular strings (otherwise problems arise if they are passed to an OpenBabel method expecting a std::string). The appropriate CDK wrapper, cdkjpype or cdkjython, will be imported if the user imports the convenience module cdk.

rdkit
Support for Python scripting has been part of the design of the RDKit from the start. The Python bindings in the RDKit were created using Boost.Python [14], a framework for interfacing Python and C++. The Cinfony module rdkit uses these bindings to implement its API. It is currently not possible to access the RDKit from Jython. The RDKit has only preliminary support for Java bindings; when these are complete, a corresponding module will be added to Cinfony.

Figure 1: Relationship of Cinfony modules to Open Source toolkits. Python modules are accessible from CPython (green), Jython (pale blue), or both (striped green and pale blue). Java libraries are indicated by dark blue, while C++ libraries are yellow.

Dependency handling
A fully-featured installation of Cinfony relies on a large number of open source libraries. In particular, the 2D depiction capabilities introduce dependencies on several graphics libraries which may be problematic to install on a particular platform (Cairo and its Python bindings, the Python Imaging Library, AGG and the Python wrapper AggDraw). With this in mind, Cinfony treats all dependencies as optional and only raises an Exception if the user calls a method or imports a module that requires a missing dependency.

For example, the Python Imaging Library (PIL) is required for displaying a 2D depiction on the screen. If all of the components of Cinfony are installed except for PIL, Cinfony works perfectly except that an Exception is raised if the Molecule.draw() method is called with show = True (the default). The image can, however, be written to a file without problems (show = False, filename = "image.png"). Similarly, if a user is only interested in using the CDK and the RDKit, it is not necessary to install OpenBabel.

Full installation instructions for Windows, MacOSX and Linux are available from the Cinfony website. Windows users do not need to compile or search for missing libraries, as the dependencies are included as binaries in the Cinfony distribution.

Results
Cinfony API
The original Pybel API was designed to make it easy to use OpenBabel to perform the most common tasks in cheminformatics, and to do so using idiomatic Python. Subsequently, we realised that the resulting API could be considered a generic API for wrapping the core functionality of any cheminformatics toolkit. Cinfony implements an extended version of the original Pybel API for the CDK and the RDKit, as well as OpenBabel. While the original Pybel was restricted to CPython, Cinfony can also be used from Jython to access the CDK and OpenBabel.

Cinfony helps cheminformaticians avoid the steep learning curve associated with starting to use a new toolkit.
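The idea of a generic API can be illustrated without installing any of the toolkits. The sketch below is hypothetical and is not Cinfony code: it assumes two toy backends, toolkit_a and toolkit_b (our own names), each a module object exposing a Cinfony-style readstring(format, string) that returns a molecule with a molwt attribute. Their toy SMILES parser only handles unbranched, single-bonded chains of C, N and O, and the two backends deliberately differ in whether implicit hydrogens are counted, mirroring the earlier point that the same API call need not return the same value in every toolkit. The function molwt_by_toolkit relies on nothing but the shared interface.

```python
import types

# Approximate atomic masses and default valences used by the toy parser.
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
VALENCE = {"C": 4, "N": 3, "O": 2}


class ToyMolecule:
    """Hypothetical stand-in for a Cinfony Molecule (only title and molwt)."""

    def __init__(self, smiles, implicit_h, title=""):
        # Only unbranched chains of C, N and O joined by single bonds
        # (e.g. "CCO") are understood; anything else is rejected.
        if not smiles or any(ch not in VALENCE for ch in smiles):
            raise ValueError(f"toy parser cannot handle {smiles!r}")
        self.symbols = list(smiles)
        self.implicit_h = implicit_h
        self.title = title

    @property
    def molwt(self):
        # Recomputed on every access from the stored atoms.
        n = len(self.symbols)
        total = 0.0
        for i, symbol in enumerate(self.symbols):
            total += MASS[symbol]
            if self.implicit_h:
                degree = (i > 0) + (i < n - 1)  # neighbours in a linear chain
                total += (VALENCE[symbol] - degree) * MASS["H"]
        return total


def make_toolkit(name, implicit_h):
    """Return a module object offering readstring(format, string)."""
    toolkit = types.ModuleType(name)

    def readstring(format, string):
        if format != "smi":
            raise ValueError(f"unsupported format {format!r}")
        return ToyMolecule(string, implicit_h)

    toolkit.readstring = readstring
    return toolkit


# Two hypothetical backends that differ only in implicit-hydrogen handling.
toolkit_a = make_toolkit("toolkit_a", implicit_h=True)
toolkit_b = make_toolkit("toolkit_b", implicit_h=False)


def molwt_by_toolkit(toolkits, smiles):
    """Toolkit-agnostic code: relies only on readstring() and .molwt."""
    return {tk.__name__: round(tk.readstring("smi", smiles).molwt, 3)
            for tk in toolkits}


if __name__ == "__main__":
    for name, weight in molwt_by_toolkit([toolkit_a, toolkit_b], "CCC").items():
        print(name, weight)
```

Running it prints one molecular weight per toy toolkit for propane. The two values differ because toolkit_b ignores implicit hydrogens, which is the kind of discrepancy that a loop over several toolkits can expose.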
With Cinfony, all of the core functionality of the toolkits can be accessed with the same interface. For example, in Cinfony, a molecule can be created from a SMILES string with:

mol = toolkit.readstring("smi", SMILESstring)

whereas the equivalent code using each toolkit directly is:

RDKit
mol = Chem.MolFromSmiles(SMILESstring)

OpenBabel
mol = openbabel.OBMol()
obconversion = openbabel.OBConversion()
obconversion.SetInFormat("smi")
obconversion.ReadString(mol, SMILESstring)

CDK
builder = cdk.DefaultChemObjectBuilder.getInstance()
sp = cdk.smiles.SmilesParser(builder)
mol = sp.parseSmiles(SMILESstring)

The RDKit was designed with Python scripting in mind, and its code is the most concise of the three. On the other hand, OpenBabel uses a characteristically C++ approach: an empty molecule is created, and is passed to an OBConversion instance as a container for the molecule read from the SMILES string. The SmilesParser in the CDK requires an instance of an object implementing the IChemObjectBuilder interface.

Another advantage of a common API is that a script written for one toolkit can easily be modified to use another. As an example, here is a script that selects molecules that are similar to a particular target molecule. This script is taken from the original Pybel paper [7], but uses the CDK instead of OpenBabel and will run equally well from Jython and CPython. The only differences compared to the original script are that "pybel" has been replaced with "cdk", and that the import statement "import pybel" has been replaced with "from cinfony import cdk":

from cinfony import cdk

# Calculate the fingerprint of the target molecule
targetmol = cdk.readfile("sdf", "targetmol.sdf").next()
targetfp = targetmol.calcfp()

# Write out every molecule whose Tanimoto similarity to the target is at least 0.7
output = cdk.Outputfile("sdf", "similarmols.sdf")
for mol in cdk.readfile("sdf", "inputfile.sdf"):
    fp = mol.calcfp()
    if fp | targetfp >= 0.7:  # '|' between two fingerprints gives the Tanimoto coefficient
        output.write(mol)
output.close()

Alternatively, we could just have made a single change to the original script, by replacing the import statement "import pybel" with "from cinfony import cdk as pybel".

Using Cinfony to combine toolkits
Another goal of Cinfony is to make it easy to combine toolkits in the same script. This allows the user to exploit the complementary capabilities of different toolkits (Table 1). For example, suppose the user wants to (1) convert a SMILES string to 3D coordinates with OpenBabel, then (2) create a 2D depiction of that molecule with the RDKit, next (3) calculate descriptors with the CDK, and finally (4) write out an SDF file containing the descriptor values and the 3D coordinates. The full Python script is only seven lines long:

from cinfony import rdkit, cdk, obabel
mol = obabel.readstring("smi", "CCC=O")
mol.make3D()
rdkit.Molecule(mol).draw(show = False, filename = "aldehyde.png")
descs = cdk.Molecule(mol).calcdesc()
mol.data.update(descs)
mol.write("sdf", filename = "aldehyde.sdf")

For cheminformaticians interested in developing QSAR or QSPR models, Cinfony can be used to calculate descriptors from the RDKit, the CDK and OpenBabel simultaneously. For example, the following script reads a multiline input file, with each line consisting of a SMILES string followed by a property value. For each molecule, it calculates all of the OpenBabel, RDKit and CDK descriptors (except for the CDK's CPSA) and writes out the results as a tab-separated file suitable for reading with the statistical package R [15]. Note that in this example script, if descriptors share the same name only one is retained. This is the case for the TPSA descriptor in OpenBabel, which is replaced by the RDKit's TPSA descriptor.

import string

from cinfony import obabel, cdk, rdkit

# Read in SMILES strings and observed property values
smiles, propvals = [], []
for line in open("data.txt"):
    broken = line.rstrip().split()
    smiles.append(broken[0])
    propvals.append(float(broken[1]))
mols = [obabel.readstring("smi", smile) for smile in smiles]

# Calculate descriptor values using OpenBabel,
# the CDK (apart from 'CPSA') and the RDKit
cdkdescs = [x for x in cdk.descs if x != 'CPSA']
descs = []
for mol in mols:
    d = mol.calcdesc()
    d.update(cdk.Molecule(mol).calcdesc(cdkdescs))
    d.update(rdkit.Molecule(mol).calcdesc())
    descs.append(d)

# Write a tab-separated file suitable for 'read.table' in R
outputfile = open("inputforR.txt", "w")
descnames = sorted(descs[0].keys(), key = string.lower)
print >> outputfile, "\t".join(["Property"] + descnames)
for smile, propval, desc in zip(smiles, propvals, descs):
    descvals = [str(desc[descname]) for descname in descnames]
    print >> outputfile, "\t".join([smile, str(propval)] + descvals)
outputfile.close()

Performance
Accessing cheminformatics libraries using Cinfony allows the user to rapidly develop scripts that manipulate chemical information. However, there is a small price to be paid. Firstly, there is the cost of moving objects across the interface between Python and the cheminformatics libraries. Secondly, the additional code required by Cinfony to implement a standard API may slow performance further.

To assess the performance penalty for accessing cheminformatics toolkits using Cinfony rather than directly in the native language, we looked at two simple test cases: (1) iterating over an SDF file containing 25419 molecules, and (2) iterating and printing out the molecular weight of each of the molecules. The SDF file used was 3_p0.0.sdf, the first portion of the drug-like subset of the ZINC 7.00 dataset [16]. The Cinfony scripts, Java and C++ source code are available as Additional file 2. The results are shown in Table 3.

While accessing the CDK using Jython is almost as fast as a pure Java implementation, there is a considerable overhead associated with using JPype to access the CDK from CPython (89% slower for the second test case). This overhead is due to passing objects between the JVM and CPython. There is little performance cost associated with accessing OpenBabel from either implementation of Python, although the jybel scripts are somewhat slower than the pybel scripts. A small portion of this speed difference can be attributed to a slower startup (about 1.6 seconds for jybel, compared to 0.8 seconds for pybel). Finally, the RDKit results in Table 3 indicate that using Boost.Python to wrap a C++ library is more efficient than using SWIG: the difference in run times between the C++ and Python implementations is negligible.

In practice, the performance of a particular Cinfony script will depend on the extent to which information is passed
back and forth between Python and the underlying Java or C++ library. Where most of the time is spent on computation in the underlying library, the speed difference between a native implementation and one using Cinfony is expected to be small.

Table 3: Performance of Cinfony modules compared to a native Java or C++ implementation.
                Iterate over SDF         Iterate and calculate molecular weight
                Time (s)  Normalised     Time (s)  Normalised
CDK
  Native Java   21.2      1.00           36.8      1.00
  cdkjython     23.1      1.09           41.6      1.13
  cdkjpype      33.0      1.57           69.5      1.89
OpenBabel
  Native C++    31.9      1.00           43.0      1.00
  pybel         34.1      1.07           45.1      1.05
  jybel         38.0      1.19           49.6      1.15
RDKit
  Native C++    99.7      1.00           100.7     1.00
  rdkit         99.9      1.00           101.0     1.00
The times reported are wallclock times from the best of three runs on a dual-core Intel Pentium 4 3.2 GHz machine with 1 GB RAM.

Comparison of toolkits
Cinfony makes it easy to compare the results obtained by different toolkits for the same operations. This can be useful in identifying bugs, applying a test suite, or finding the strengths and weaknesses of particular implementations. For example, where different toolkits calculate the same descriptors, a low correlation between the calculated values may indicate a bug in one or the other. Earlier, we mentioned that a difference in the treatment of implicit hydrogens causes different toolkits to give different values for molecular weight unless hydrogens are explicitly added. In such instances, checking that a result agrees with that obtained by another toolkit can act as a sanity check to avoid errors.

When carrying out the same operation with several toolkits, it is often convenient to iterate over the toolkits in an outer loop:

from cinfony import obabel, rdkit, cdk

for toolkit in [obabel, rdkit, cdk]:
    print toolkit.readstring("smi", "CCC").molwt

As an example of how such comparisons can be used to identify bugs in toolkits, let us consider depiction. As a dataset, we randomly chose 100 molecules from PubChem [17], with subsequent filtering to remove multicomponent molecules. For each molecule, PubChem provides an SDF file containing coordinates for a 2D depiction, as well as the depiction itself as a PNG file. PubChem uses the CACTVS toolkit [18] to generate the 2D coordinates as well as the corresponding depiction. Using a script similar to the following, we used Cinfony to generate 2D depictions using OASA (the depiction library used by pybel), the CDK and a development version of the RDKit, all using the same 2D coordinates taken from the SDF file:

from cinfony import pybel, rdkit

for toolkit in [rdkit, pybel]:
    name = toolkit.__name__
    for mol in toolkit.readfile("sdf", "dataset.sdf"):
        # Reuse the 2D coordinates in the file and write the image to disk
        mol.draw(filename = "%s_%s.png" % (mol.title, name),
                 show = False,
                 usecoords = True)

When the resulting images were compared for the PubChem entry CID7250053, an error was found in the depiction of the stereochemistry of an isopropyl group (Figure 2). Since the error only occurred in certain cases, it had not been previously noticed and would have been difficult to identify without such a comparative study. Once reported, the problem was quickly solved and the subsequent RDKit release depicted the stereochemistry correctly. A comparison of depictions by commercial toolkits
and depictions generated by Cinfony is available as Additional file 3.

Figure 2: Comparison of depictions of PubChem CID7250053 using different toolkits. The depiction using the development version of the RDKit showed incorrect stereochemistry for the isopropyl substituent of the thiazole ring.

Conclusion
Cinfony makes it easy to combine complementary features of the three main Open Source cheminformatics toolkits. By presenting a standard simplified API, it greatly reduces the learning curve associated with starting to use a new toolkit, thus encouraging users of one toolkit to investigate the potential of others.

Cinfony is freely available from the Cinfony website [19], both as Python source code and as a Windows distribution containing dependencies. Installation instructions are provided for MacOSX, Linux and Windows.

Availability and requirements
Project name: Cinfony
Project home page: http://cinfony.googlecode.com
Operating system(s): Platform independent
Programming language: Python, Jython
Other requirements: OpenBabel, CDK, RDKit, Java, OASA, JPype, Python Imaging Library
License: BSD
Any restrictions to use by non-academics: None

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
NMOB conceived and developed Cinfony. GRH is the lead developer of OpenBabel and created the Python and Java SWIG bindings. All authors read and approved the final manuscript.

Additional material
Additional file 1
Miniwebsite API. A mini-website of the Cinfony API documentation.
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S1.zip]
Additional file 2
Timing Code. A zip file containing Python, Java and C++ code used for run time comparisons for two test cases.
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S2.zip]
Additional file 3
Miniwebsite Depictions. A mini-website showing a comparison of the depictions generated by several cheminformatics toolkits.
[http://www.biomedcentral.com/content/supplementary/1752-153X-2-24-S3.zip]

Acknowledgements
Cinfony would not be possible without the work of many Open Source projects. In particular, we thank several developers who responded quickly to bug reports or queries: Beda Kosata (OASA), Greg Landrum (RDKit), Tim Vandermeersch (OpenBabel), Steve Ménard (JPype). Thanks also to Gilbert Mueller and Chris Morley for feedback on installing Cinfony. NMOB thanks Google Code for providing free web hosting and development tools for Cinfony. We thank the anonymous reviewers for several useful suggestions.

References
1. OpenBabel v.2.2.0 [http://openbabel.org]
2. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E: Recent Developments of the Chemistry Development Kit (CDK) – An Open-Source Java Library for Chemo- and Bio-informatics. Curr Pharm Des 2006, 12:2110-2120.
3. Landrum G: RDKit. [http://www.rdkit.org]
4. Murray-Rust P, Rzepa HS: Chemical Markup, XML, and the Worldwide Web. 1. Basic Principles. J Chem Inf Comput Sci 1999, 39:928-942.
5. Apodaca R, O'Boyle N, Dalke A, Van Drie J, Ertl P, Hutchison G, James CA, Landrum G, Morley C, Willighagen E, De Winter H: OpenSMILES. [http://www.opensmiles.org]
6. Daylight Chemical Information Systems Manual [http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html]
7. O'Boyle NM, Morley C, Hutchison GR: Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem Cent J 2008, 2:5.
8. Kosata B: OASA. [http://bkchem.zirael.org/oasa_en.html]
9. Raymond ES: The Art of UNIX Programming. Reading, MA: Addison-Wesley; 2003. [http://www.catb.org/~esr/writings/taoup/index.html]
10. Symyx CTfile formats [http://www.mdli.com/downloads/public/ctfile/ctfile.jsp]
11. KNIME – Konstanz Information Miner [http://knime.org]
12. SWIG v.1.3.36 [http://www.swig.org]
13. Ménard S: JPype. [http://jpype.sf.net]
14. Boost.Python [http://www.boost.org/libs/python/doc/]
15. R Development Core Team: R: A language and environment for statistical computing. [http://www.R-project.org]
16. Irwin JJ, Shoichet BK: ZINC – A Free Database of Commercially Available Compounds for Virtual Screening. J Chem Inf Model 2005, 45:177-182.
17. PubChem [http://pubchem.ncbi.nlm.nih.gov/]
18. CACTVS Chemoinformatics Toolkit. Xemistry GmbH, Lahntal, Germany.
19. O'Boyle NM: Cinfony. [http://cinfony.googlecode.com]