This is a copy of an invited talk I gave at the ACS meeting in Dallas in March 2014. The talk was about the impact of information technology on chemistry and related sciences. I interpreted 'information technology' broadly and divided the talk into three sections: Data, Simulation and Sociology.
'Data' talks about how chemical information has grown exponentially and how chemists are coming up with new techniques to store, organize and understand this information.
'Simulation' talks about how chemists are using the last two decades' spectacular progress in hardware and software to understand the behavior of molecules in a variety of applications ranging from drug design to new materials.
'Sociology' talks about the impact of blogs and social media on the practice of chemistry. More specifically I talk about how social media is serving as a 'second tier' of peer review and how this new medium is having an increasingly influential impact on many issues close to chemists' hearts including lab safety, 'chemophobia' and the public appreciation of chemistry.
2. About me
• Medicinal and computational chemist working in the biopharma
industry in Cambridge, MA.
• Blogger “The Curious Wavefunction”
• Contact:
- Blog: http://wavefunction.fieldofscience.com
- Twitter: @curiouswavefn
- Email: curiouswavefunction@gmail.com
3. Two kinds of scientific revolutions
• Idea-driven (Kuhn): physics (quantum theory), astronomy
(expanding universe), biology (evolution)
• Tool-driven (Galison): engineering (transistor), biology
(sequencing), astronomy (telescope).
Thomas Kuhn, The Structure of Scientific Revolutions (1962)
Peter Galison, Image and Logic (1997)
Chemistry as an experimental science has benefited
much more from tool-driven revolutions.
5. Our latest (and greatest) tool
The Computer
“I think it's fair to say that personal computers have become the most empowering tool we've ever created. They're tools of communication, they're tools of creativity, and they can be shaped by their user.” – Bill Gates
6. A brief history of computers in chemistry
• 1950s: Driven by quantum chemistry and crystallography.
• Early efforts needed access to centralized machines, travel. Computations
enormously expensive: 1.5 years (1959) vs one day (2014).
Punched card (2014)
Punched card (1960)
UNIVAC 1: 1.5 yrs to calculate 12 molecules
Apple MacBook Air: 4 hours for same calculation
• 1965: Moore’s Law; doubling of transistors every two years.
• 1970s: Use of computers started becoming routine. Still slow.
• 1990s: Exponential developments in desktop computing, software, internet.
• 2000s: Applications to biology, materials science become routine.
7. How have computers affected chemistry?
• Publications: ~25 major journals, also described in others.
• Companies: Schrodinger, OpenEye, CCG, Perkin-Elmer etc.
• Conferences: Gordon Conference, IAQMS.
• ACS Division of Computers in Chemistry.
• Awards: ACS Award for Computers in Chemistry.
9. Data
“It is a capital mistake to theorize before one has data.” – Arthur Conan Doyle (“Sherlock Holmes: A Scandal in Bohemia”)
10. Chemical data has grown exponentially
Growth of the Cambridge Structural Database (Image: CSD)
Why? Better tools to determine and record structures, properties.
Data repositories have enabled easy and
instant global access to data.
• Chemical Abstracts Service (CAS): 75 million registered substances.
• Protein Data Bank (PDB): 97,000 protein structures.
• Cambridge Structural Database (CSD): 40,000 added every year.
• SciFinder, Google Scholar.
11. Standardization
• Chemical structure representation: drawing, manipulation.
Standard, multiple compressed file formats (eg. SMILES strings),
error-free sharing of data.
• E-Notebook: Standardized and safe record keeping, organization,
analysis and visualization.
ChemDraw
SMILES
Data is easier to compare, verify and reproduce.
13. Visualization
• Instant visualization of data in various forms, user-friendly presentation;
eg. Spotfire, Instant JChem etc.
• Tools ranging from basic plots to advanced, on-the-fly statistical analysis
(eg. principal component analysis, regression) now available.
• Instant comprehension of complex biomolecular and inorganic structures
(eg. PyMOL).
Much easier to make sense of data and property relationships.
14. Software for chemical analysis
• What do you use software for? Analytical, spectroscopic,
purification?
• Advanced techniques now more easily accessible.
• Enormous savings in time and labor.
NMR
Crystallography
GC-MS
Software has pervasively affected everyday chemical research and the
work of bench chemists.
15. Using data intelligently: Cheminformatics
• Applying tools from informatics and computer science to extract
meaning from data.
• Most common problems: Searching, finding trends, correlating
chemical structures to various properties (descriptors).
If only all correlations were this good…
16. Case Study I: Similarity searching
- Simplified representations (eg. bit strings) make searches of
millions of molecules very fast
- Tanimoto similarity: Efficient, can be calculated for any property.
- Drug side effects similarity prediction especially promising.
Tanimoto similarity between molecules: J. Med. Chem., 2010, 53, 4830
Drug side effects: Nature Biotechnology 2007, 25, 197
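A minimal sketch of the bit-string idea on this slide, assuming each molecule's fingerprint is stored as the set of its "on" bit positions. The fingerprints and molecule names below are invented toy data, not real structures; real tools use packed bit vectors produced by a cheminformatics toolkit.

```python
from typing import AbstractSet


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical toy fingerprints (sets of on-bit indices), for illustration only.
query = {1, 4, 7, 9}
library = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 8},
    "mol_C": {2, 3},
}

# Rank library molecules from most to least similar to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
print(ranked)
```

Because the score only needs set intersections and sizes, the same function works for any property that can be encoded as presence/absence bits, which is the point made in the second bullet.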
17. Case Study II: Diversity analysis
• Humans are pattern-seeking; often ignore diversity to focus on
similarity.
• Maximizing diversity = maximizing the probability of finding new
molecules with novel properties.
• Create molecular libraries of millions of compounds; screening
collections for drug discovery, materials science etc.
Shape diversity: Nat. Chem. Biol. 2012, 8, 358
Voltage vs safety of Li-ion batteries: Nat. Mat. 2013, 12, 191
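One common way to build a diverse subset is a greedy "MaxMin" pick: repeatedly choose the molecule farthest from everything already chosen. The sketch below is an illustration of that general idea, not the method used in the cited papers; it assumes fingerprints stored as sets of on-bit indices and uses made-up toy data.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, n_pick, seed_index=0):
    """Greedy MaxMin selection: returns indices of a diverse subset."""
    if not fingerprints or n_pick <= 0:
        return []
    picked = [seed_index]
    # Distance from each molecule to its nearest already-picked molecule.
    min_dist = [tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(picked) < min(n_pick, len(fingerprints)):
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        # Choose the candidate whose nearest picked neighbour is farthest away.
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fingerprints[nxt]))
    return picked


# Hypothetical toy library: three near-duplicates and one outlier.
toy_library = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(maxmin_pick(toy_library, 3))
```

The outlier is picked second, ahead of the near-duplicates of the seed, which is the behaviour a screening collection wants.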
18. Simulation and Analysis
“Nobody believes a theoretical result, except the person who calculated it. Everybody believes an experimental result, except the person who measured it.” – Paul Labute (Chemical Computing Group)
21. Major applications: QM and MM
• Quantum chemistry made computers; computers made quantum
chemistry.
• Molecular mechanics: Classical mechanics applied to molecules.
• QM equations cannot be solved exactly for many-electron systems. Need
approximations, iterative processing, and computing power.
• Useful for calculating many properties (energies, dipole moments,
reactivity).
Potential energy surface for chemical reactions
Fullerene from graphene: Nat. Chem. 2010, 2, 450
22. The 2013 Nobel Prize
• Tradeoff: Quantum mechanics (QM) - accurate but expensive.
Molecular mechanics (MM) – inaccurate but cheap.
• QM/MM: Best of both worlds, multiscale.
• Applicable to large biological systems (proteins, DNA), extended
materials (zeolites, polymers).
24. Molecular Dynamics
• Molecular Dynamics (MD): Newton’s laws of motion applied to
molecules, millions of steps; large amounts of data.
• Parallel processing, special-purpose machines allow MD to surpass
Moore’s Law.
• Simulations approaching biological timescales becoming routine.
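To make "Newton's laws applied to molecules, millions of steps" concrete, here is a minimal sketch of a single particle on a harmonic spring integrated with the velocity Verlet scheme, a standard MD integrator. The spring constant, mass, time step and function names are illustrative assumptions; a real MD code does the same update for every atom using a full force field.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation m*a = F(x) and return the (x, v) trajectory."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


# Assumed toy system: unit mass on a unit spring, started at x = 1 at rest.
k, mass = 1.0, 1.0


def spring_force(x):
    return -k * x


def total_energy(x, v):
    return 0.5 * mass * v * v + 0.5 * k * x * x


traj = velocity_verlet(1.0, 0.0, spring_force, mass, dt=0.01, n_steps=1000)
x_end, v_end = traj[-1]
print(x_end, math.cos(10.0), total_energy(x_end, v_end))
```

Every step stores a position and velocity, which is why long simulations of large systems generate the "large amounts of data" noted above.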
25. Knowledge-Based Protein Folding
• Knowledge-based protein structure prediction: Taking advantage of
existing information in PDB to predict folded structures.
• Use advanced statistical methods based on PDB data for assigning
probabilities to various solutions: Rosetta.
• Outstanding success in CASP (Critical Assessment of protein
Structure Prediction).
“The amazing thing is that Rosetta had 31 points and the next best group had 8 points. It is like baseball in 1927, when Babe Ruth hit 60 home runs and the runner up hit 14, and entire teams didn't hit as many as he did.” – Peter Kollman (UCSF), CASP 2000.
Overlap between predicted (red) and experimental (green) protein structures
26. Protein design
• Protein design: Given a structure, find alternative sequences.
• Uses of alternative sequences: Enzymes catalyzing new reactions, new
small molecule-binding proteins (eg. for environmental cleanups).
• 2003: First protein designed entirely de novo.
• 2008: First enzyme catalyzing reaction with no natural precedent.
• As PDB grows, protein design becomes better.
Top7: Protein designed from scratch. (Science, 2003, 302, 1364)
Kemp eliminase enzyme from scratch (Nature, 2008, 453, 190)
27. Structure-Based Drug Design
• Predict structure of drug bound to protein, suggest modifications to
improve properties.
• Combination of crystallography data and simulation.
• Outstanding success in some areas: eg. HIV protease inhibitors against
AIDS.
Impact of addition of HIV protease inhibitors to antiretroviral therapy among AIDS patients in San Francisco (Am J Epidemiol. 152, 2, 2000)
HIV protease bound to indinavir
Katharine Holloway
28. The wisdom of crowds (and clouds)
• FoldIt: Computer game to solve
protein folding and design problems.
• Led to HIV protein structure and
algorithm discovery.
PNAS, 2011, 108, 18949
Comparison of Folding@Home with leading supercomputers
• Distributed computing,
Folding@Home: 100 million hours
logged on Sony PlayStation 3 (PS3), also enabled
on cloud.
• Used to study folding of proteins
involved in cancer, Alzheimer’s
disease; drug design.
30. New materials for the new millennium
• Based on Density Functional Theory (Nobel Prize 1998).
• Application of materials simulations and computational screening:
- Hydrogen storage (metal-organic frameworks)
- Photovoltaics and solar cells
- Alloys and new materials for batteries
- Semiconductor design
31. The age of biology
• Human Genome Project: Computers made it possible.
• Sequencing has greatly surpassed Moore’s Law. New techniques;
IonTorrent, Nanopore etc.
• Computational Biology and Bioinformatics: Comparing genomes,
predicting diseases, mapping ancestral differences.
• Aided by massive amounts of data: GenBank, Cancer Genome,
Ensembl, UniProt etc.
• Ripe territory for Big Data and new informatics techniques.
32. Sociology
“The democratization of information and expertise that springs from the world wide web, and the power of groups of motivated amateurs to strike out on their own in technical subjects, is weakening the authority of ‘experts’ in society.” – George Whitesides.
33. The chemical blogosphere
• Chemistry blogs took off in 2002, initially focused on research, grad
school hijinks.
• Quickly diversified; peer review, job market, academic culture, publishing,
issues in industry, safety culture.
Derek Lowe: drug discovery, industry
Chemjobber: The Job Market, safety culture, industry
Paul Bracher: academic culture, peer review
Ash Jogalekar: Nature and evolution of chemistry, peer review
SeeArrOh: Chemophobia, food, peer review
C&EN official blog
James Ashenhurst: Org Chem tutoring
34. What are blogs good for?
Peer Review 2.0
• Timely, democratic review of
latest research.
• Interesting research highlighted
immediately.
• Critiqued by large audience.
• Self-selecting.
• Instrumental in spotting
fallacious research, self-plagiarism,
dubious methodologies and fabrication.
Non-research contributions
• Lab safety (C&EN).
• Academic culture (Chembark).
• The (sad) state of the job market
(Chemjobber).
• Representation of women, minorities
(Dr Rubidium).
• Chemophobia, Industry (SeeArrOh,
Derek Lowe).
35. Peer Review 2.0: case study I
• First reported instance of comprehensive informal peer review of chemical
literature.
• 2006: 37-step synthesis of hexacyclinol described in single-author paper in
Angew. Chem. by James La Clair (Xenobe Research Institute).
• Commenter on blog of Stanford grad student Dylan Stiles points out
inconsistencies in structure, others weigh in and point out many more.
• Other official papers refute data, suggest alternative structure.
• Extensive discussion of problems with paper on multiple blogs, hundreds of
comments. Paper retracted in 2012, long after problems were clear.
“The proof is in the product”.
36. Peer Review 2.0: case study II
• April 2012: Paper in JACS on amino acid chirality and origin of life.
• Two issues: Bad scientific communication and charges of self-plagiarism.
• Extensive similarities with two previous articles highlighted by Nature
Chemistry editor Stuart Cantrill exclusively on Twitter.
• Paper retracted in May 2012.
• Case illustrates peer review operating entirely outside formal channels.
ACS Press Release: “New scientific research raises the possibility that advanced versions of T. rex and other dinosaurs – monstrous creatures with the intelligence and cunning of humans – may be the life forms that evolved on other planets in the universe.”
Photo uploaded by Stuart Cantrill on Twitter
37. Chemists: Embrace open access
• Open-access, arXiv@Cornell
– ASAP publishing
– Open access
– Instant and free peer review by large community
• Chemical community less open to sharing and arXiv-style
publishing?
• Cultural differences between various scientific communities
(eg. particle physicists vs total synthesis chemists).
39. Challenges and promises
• Data:
- Bigger, better annotated databases with quality control.
- Statistics becoming more useful and appreciated.
- Greater awareness of data mining tools among
experimental chemists.
• Simulation:
- Long molecular dynamics simulations approaching
realistic timescales.
- Insights from network theory used in synthetic planning.
- First quantum chemistry calculation on quantum
computer (2010).
- Better statistical validation of results, quality control.
40. Challenges and promises
• Sociology
- More open access journals, more open access options.
- Widespread publicity of research results.
- Discussion, criticism on blogs being taken seriously.
Retraction Watch.
- Better use of multimedia (Twitter, Skype, podcasts).
- Cultural changes:
- More cross-talk between chemists, statisticians and
computer scientists.
- More cross-talk between academia and industry.
- Willingness to share data, code, experimental results.
- Willingness to present and discuss negative data.
41. But…be afraid of the hype
Fortune Magazine, October 1981
• Jetpacks
• Artificial intelligence
• Nuclear fusion
• Robot maids
In 30 years… (1950-2014)
42. Translating hype into reality
• Fearlessness; ability to jump across boundaries, question
received wisdom.
• Resilience; ability to bounce back from failure.
• Adaptability; ability to welcome change.
• Teamwork; ability to collaborate and share.
• Imagination; ability to think outside the box.
43. The future of information technology in
chemistry…
…is us
“The best way to predict the future is to invent it.” – Alan Kay.
“Be the change that you wish to see in the world.” – Gandhi.