Defense -- thesis: “Mapping Genotype to Phenotype using Attribute Grammar.”
PhD degree in Genetics, Bioinformatics and Computational Biology (GBCB) in the tracks of Computer Science, Mathematics and Life Sciences.
Mapping Genotype to Phenotype using Attribute Grammar, Laura Adam
1. MAPPING GENOTYPE TO
PHENOTYPE USING ATTRIBUTE
GRAMMAR
LAURA ADAM
Genetics, Bioinformatics and Computational Biology Program
ADVISORY COMMITTEE:
Jean Peccoud (Chair)
David Bevan, Harold Garner, François Képès, Naren Ramakrishnan, John Tyson
Doctoral Dissertation Defense
Thursday, July 25th, 2013
2. Outline
Introduction:
– Synthetic Biology
– Computer Assisted Design
Design:
– GenoCAD: formal language to design biology
– Domain Specific Language: Grammar editor
Simulation:
– Attribute Grammars to compile a mathematical model
– Design Genotype to Phenotype languages
Build:
– Biosecurity questions
Conclusion
4. Synthetic Biology is:
• The design and construction of new
biological parts, devices, and systems
• Engineering standard biological parts, circuits
• Chemical synthesis of DNA
• The re-design of existing, natural
biological systems for useful purposes or
understanding their properties
• Minimal genome/minimal life
• Synthetic cells/protocells from scratch
• Orthogonal biological systems
5. Applications of Synthetic Biology
Khalil, A. S., & Collins, J. J. (2010). Synthetic biology: applications come of age. Nature reviews. Genetics, 11(5), 367–79.
doi:10.1038/nrg2775
http://syntheticbiology.org/
“[…] change our lives over the coming years, leading to cheaper drugs, 'green' means to fuel our
cars and targeted therapies for attacking 'superbugs' and diseases, such as cancer.”
6. Designing Biology
• No established design methodology
• Attempts to mimic other engineering disciplines (electrical engineering – logic gates; assembly language, etc.)
Andrianantoandro, E., Basu, S., Karig, D. K., & Weiss, R.
(2006). Synthetic biology : new engineering rules for an
emerging discipline. Molecular Systems Biology, 1–14.
doi:10.1038/msb4100073
Creating and analyzing new biological systems
Gardner, T. S., Cantor, C. R., & Collins, J. J. (2000).
Construction of a genetic toggle switch in Escherichia
coli. Nature, 403(6767), 339–42. doi:10.1038/35002131
Elowitz, M. B., & Leibler, S. (2000). A synthetic
oscillatory network of transcriptional regulators.
Nature, 403(6767), 335–338. doi:10.1038/35002125
Gibson, D. G., Glass, J. I., Lartigue, C., Noskov, V. N., Chuang, R.-Y., Algire, M. A., … Venter, J. C. (2010). Creation of a Bacterial Cell
Controlled by a Chemically Synthesized Genome.
Science, 329(5987), 52–6. doi:10.1126/science.1190719
Ro, D. K. et al. Production of the antimalarial drug
precursor artemisinic acid in engineered yeast. Nature
440, 940–943 (2006).
How to
Scale up the complexity of design in Synthetic Biology?
7. Designing Biology?
• Ad hoc, getting too complex to be done by hand, error-prone, lengthy (hence costly)
H. Koeppl et al. (eds.), Design and Analysis of Biomolecular Circuits: Engineering
Approaches to Systems and Synthetic Biology, DOI 10.1007/978-1-4419-6766-4 10,
Purnick, P. E. M., & Weiss, R. (2009). The second wave of synthetic
biology: from modules to systems. Nature Reviews Molecular Cell
Biology, 10(6), 410–422. doi:10.1038/nrm2698
Need for a Design Methodology
Assist in the process of:
• designing a system with a DESIRED BEHAVIOR
• constructing its physical realization
8. Design process in
Synthetic Biology
Design
• Databases
• CAD tools
Analyze
• Computational
tools
Build
• Assembly of parts
• DNA synthesis
11. CAD for Synthetic Biology
• Top-down: specify a behavior (high level) → constructs
– Lack of quantitative parameter data for part behavior (see BioFAB or Matt Lux’s thesis)
• Bottom-up: design a construct from parts → predict its behavior
• Assembly of biological parts, modules, circuits
– Databases, Biobricks™ repository
– Aggregated parts sequences → physical DNA of construct
12. CAD tools – design a sequence
• Graphic interface (drag and drop / point and click parts)
– Clotho
– GenoCAD
– SynbioSS
• Programming language-like / text-based
– GEC
• Network diagram
– TinkerCell
– ProMoT
• Chemical equations
– Antimony
How to
Represent a genetic construct?
13. SYNTHETIC BIOLOGY OPEN
LANGUAGE (SBOL)
Laura Adam (Virginia Bioinformatics Institute), Aaron Adler (BBN Technologies), J.
Christopher Anderson (Dept of Bioengineering, University of California Berkeley), Jacob
Beal (BBN Technologies), Matthieu Bultelle (Bioengineering, Imperial College London),
Kevin Clancy (Life Technologies), Kendall G. Clark (Clark & Parsia, LLC.), Douglas
Densmore (Electrical and Computer Engineering, Boston University), Omri Drory
(Genome Compiler), Drew Endy (BIOFAB and Dept of Bioengineering, Stanford
University), John H. Gennari (Biomedical and Health Informatics, University of
Washington), Raik Gruenberg (EMBL-CRG Systems Biology program, CRG), Jennifer
Hallinan (School of Computing Science, Newcastle University), Timothy Ham (Joint
BioEnergy Institute), Allan Kuchinsky (Agilent Technologies), Matthew W. Lux (Virginia
Bioinformatics Institute), Curtis Madsen (Electrical and Computer Engineering,
University of Utah), Akshay Maheshwari (UCSD), Barry Moore (Human Genetics,
University of Utah), Chris J. Myers (Electrical and Computer Engineering, University of
Utah), Carlos Olguin (Autodesk Research), Jean Peccoud (Virginia Bioinformatics
Institute), Hector Plahar (Joint BioEnergy Institute), Matthew Pocock (School of
Computing Science, Newcastle University), Cesar A. Rodriguez (BIOFAB), Nicholas
Roehner (Electrical and Computer Engineering, University of Utah), Vincent Rouilly
(Biozentrum, University of Basel), Trevor F. Smith (Agilent Technologies), Guy-Bart
Stan (Bioengineering, Imperial College London), Vinod Tek (Bioengineering, Imperial
College London), Alan Villalobos (DNA 2.0, Inc.), Mandy Wilson (Virginia Bioinformatics
Institute), Chris Winstead (Electrical and Computer Engineering Utah State University),
Anil Wipat (School of Computing Science, Newcastle University), and Fusun Yaman
Sirin (BBN Technologies).
14. Need for standards in Synthetic
Biology
• Core Data Model
• SBOL visual
• libSBOL
– (Java, C, Python)
Synthetic Biology Open Language (SBOL):
a data exchange standard for descriptions of genetic parts,
devices, modules, and systems
15. Participation in SBOL
• Workshops I attended:
– Blacksburg, Virginia on January 7-10, 2011
– San Diego, California on June 8, 2011
Manuscript in submission to Nature Biotechnology
• Won SBOL logo competition
17. DNA as a Language
What insights can we get from computational linguistics?
18. Example of formal grammar
A grammar is a:
Set of rules describing how to form sentences from a language’s vocabulary
R1: Sentence → Subject + Verb + Object
R2: Subject → NounPhrase
R3: Object → NounPhrase
R4: NounPhrase → NounPhrase + Modifier
R5: Modifier → PrepositionalPhrase
21. Rule-based design of DNA
sequences
Construct → Promoter Cistron Terminator
Cistron → Cistron Cistron
Cistron → RBS Gene
Terminator → Terminator Terminator
Grammar rules indicate how categories can be arranged to form a ‘valid’ design; parts from the library implement it.
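The four rules on this slide can be read as rewriting steps. The following is a minimal Python sketch, assuming a design is simply a list of category names; the `RULES` table and the `rewrite` helper are illustrative names and not part of GenoCAD.

```python
# Rewriting rules from the slide: category -> list of allowed replacements.
RULES = {
    "Construct": [["Promoter", "Cistron", "Terminator"]],
    "Cistron": [["Cistron", "Cistron"], ["RBS", "Gene"]],
    "Terminator": [["Terminator", "Terminator"]],
}


def rewrite(design, position, replacement):
    """Replace the category at `position`, but only if a rule allows it."""
    category = design[position]
    if replacement not in RULES.get(category, []):
        raise ValueError(f"no rule {category} -> {' '.join(replacement)}")
    return design[:position] + replacement + design[position + 1:]


# Derive a construct with two cistrons, one rule application at a time.
design = ["Construct"]
design = rewrite(design, 0, ["Promoter", "Cistron", "Terminator"])
design = rewrite(design, 1, ["Cistron", "Cistron"])
design = rewrite(design, 1, ["RBS", "Gene"])
design = rewrite(design, 3, ["RBS", "Gene"])
print(" ".join(design))
```

Because every step must match a rule, any design reached this way is valid by construction; choosing library parts for each remaining category would then yield the sequence.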
28. A language for the chloroplast of
Chlamydomonas reinhardtii
• Project to design nitrogen-fixing algae
• Gene expression in the chloroplast of
microalgae
– genomic sequences for targeting the
insertion of construct in the chloroplast
Chlamydomonas reinhardtii
29. Identify and Define Categories
Category | Definition
5FLR / 3FLR | 5’ / 3’ flanking region for homologous recombination
SIS | Short Interval Sequences used to make polycistronic cassettes
STP | Stop codon
ATG | Start codon
GEN | Gene or protein domain. By convention does not include start and stop codons.
CDS | Open reading frame composed of several protein domains. Does not include start and stop codons.
TAG | Epitope tags. By convention does not include start or stop codons.
PBS | Sequence associated with the initiation of transcription and translation.
TCS | Targeted expression cassette. Expression cassette flanked with two adjacent genomic sequences for homologous recombination.
CAS | Expression cassette delimited by a promoter in 5’ and a transcription terminator in 3’.
[ and ] | Negative orientation delimiters
( and ) | Plasmid delimiters
{ and } | Chromosome delimiters
32. Define rewriting rules
Code | Rule | Comment
CAS | S -> TCS | This rule is used to design only one expression cassette.
1PLAS | S -> ( VEC TCS ) | This rule is used to specify the expression cassette along with the vector where it is inserted. The output is the entire plasmid sequence.
2PLAS | S -> ( VEC TCS ) ( VEC TCS ) | This rule is for designs that involve two plasmids.
TGS | TCS -> 5FLR CAS 3FLR | Specifies the flanking regions for homologous recombination.
PRCT | CAS -> PBS CDS TER | A gene expression cassette is composed of a promoter, open reading frame, and a transcription terminator.
2CAS | CAS -> CAS CAS | This rule makes it possible to have more than one expression cassette on a construct.
rCAS | CAS -> [ CAS ] | This rule is used to specify that the cassette is coded on the negative strand.
2CDS | CDS -> CDS SIS CDS | This rule makes it possible to design polycistronic constructs.
SGEN | CDS -> ATG GEN STP | The open reading frame is composed of a single gene flanked by a start and stop codon.
TGEN | GEN -> GEN TAG | This rule is used to add a tag to a coding sequence. It can be used iteratively to add more than one tag.
2GEN | GEN -> GEN GEN | This rule can be used to fuse two coding sequences that are not tags.
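To make the rewriting rules concrete, here is a hedged Python sketch of a recognizer that checks whether a sequence of categories can be derived from S. It is not GenoCAD's implementation: `is_valid`, `RULES`, and `TERMINALS` are hypothetical names, and TER and VEC (used in the rules but not defined in the category table) are assumed to be ordinary part categories for the terminator and the vector.

```python
from functools import lru_cache

# The rewriting rules of the slide, keyed by the category they rewrite.
RULES = {
    "S": (("TCS",), ("(", "VEC", "TCS", ")"),
          ("(", "VEC", "TCS", ")", "(", "VEC", "TCS", ")")),
    "TCS": (("5FLR", "CAS", "3FLR"),),
    "CAS": (("PBS", "CDS", "TER"), ("CAS", "CAS"), ("[", "CAS", "]")),
    "CDS": (("CDS", "SIS", "CDS"), ("ATG", "GEN", "STP")),
    "GEN": (("GEN", "TAG"), ("GEN", "GEN")),
}
# Assumption: these categories can be filled directly by a library part.
TERMINALS = {"5FLR", "3FLR", "SIS", "STP", "ATG", "GEN", "TAG", "PBS",
             "TER", "VEC", "[", "]", "(", ")"}


def is_valid(design, start="S"):
    """Return True if the list of categories can be derived from `start`."""
    tokens = tuple(design)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # A terminal category matches exactly one token with the same name.
        if j - i == 1 and symbol in TERMINALS and tokens[i] == symbol:
            return True
        return any(matches(rhs, i, j) for rhs in RULES.get(symbol, ()))

    @lru_cache(maxsize=None)
    def matches(rhs, i, j):
        # Split tokens[i:j] so that each symbol of rhs covers a non-empty span.
        if not rhs:
            return i == j
        rest = rhs[1:]
        return any(derives(rhs[0], i, k) and matches(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return derives(start, 0, len(tokens))


tagged = "5FLR PBS ATG GEN TAG STP TER 3FLR".split()
print(is_valid(tagged))
```

Every symbol must cover at least one token, so the left-recursive rules (2CAS, 2CDS, TGEN, 2GEN) always recurse on strictly shorter spans and the search terminates.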
37. A dynamic language
• You can change the model …
– Add or Change a Rule in a Grammar
– Delete a Rule from a Grammar
– Remove a Part from a library
– Change a Part’s sequence
– Change a Part’s category
… but how does this affect the dependent
designs?
ATGGTGAGCAAGGGCGAGGAGAATAACA
TGGCCATCATCAAGGAGTTCATGCGCTTC
AAGGTGCGCATGGAGGGCTCCGTGAAC
GGCCACGAGTTCGAGATCGAGGGCGAG
GGCGAGGGCCGCCCCTACGAGGGCTTT
CAGACCGCTAAGCTGAAGGTGACC
ATGGTGAGCAAGGGCGAGGAGAATAACA
TGGCCATCATCAAGGAGTTCATGCGCTTC
AAGGTGCGCATGGAGGGCTCCGTGAAC
GGCCACGAGTTCGAGATCGAGGGCGAG
GGCGAGGGCCGCCCCTACGAGGGCTTT
CAGACCGCTAAGCTGAAGGTGACC
CAAGGGCGAGGAGAATAACATGGCCATC
ATCAAGGAGTTCATGCGCTTCAAGGTGC
GCATGGAGGGCTCCGTGAACGGCCACG
AGTTCGAGATCGAGGGCGAGGGCGAGG
GCCGCCCCTACGAGGGCTTTCAGACCGC
TAAGCTGAAGGTGACC
39. Different design statuses
Valid – the sequence could be decomposed into its parts, and the parts’ categories make up a grammar-sanctioned framework.
Needs validation – either grammar, part, or library has changed, and the sequence has not been validated since.
Under construction – the design is unfinished, so it cannot be compiled.
Out of Date – although the design is still valid with respect to grammars and libraries, the parts have changed.
Invalid – the sequence cannot be resolved.
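One way to picture how the five statuses relate is a small decision function. This Python sketch is only an illustration: the four boolean inputs and the order in which they are checked are assumptions, not a description of how GenoCAD assigns statuses.

```python
from enum import Enum


class DesignStatus(Enum):
    VALID = "Valid"
    NEEDS_VALIDATION = "Needs validation"
    UNDER_CONSTRUCTION = "Under construction"
    OUT_OF_DATE = "Out of Date"
    INVALID = "Invalid"


def design_status(is_complete, validated_since_change, resolves, parts_unchanged):
    """Map four hypothetical flags to a status (precedence is an assumption)."""
    if not is_complete:
        return DesignStatus.UNDER_CONSTRUCTION   # unfinished, cannot be compiled
    if not validated_since_change:
        return DesignStatus.NEEDS_VALIDATION     # grammar, part, or library changed
    if not resolves:
        return DesignStatus.INVALID              # sequence cannot be resolved
    if not parts_unchanged:
        return DesignStatus.OUT_OF_DATE          # still valid, but parts changed
    return DesignStatus.VALID


print(design_status(True, True, True, False).value)
```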
40. Left recursion (CIS → CIS CIS)
Remove orphan
rules
We need to generate
CFG compilers
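Both checks named on this slide can be sketched in a few lines of Python. Direct left recursion matters because a naive top-down parser would expand such a rule forever. "Orphan rules" are interpreted here, as an assumption, as rules whose left-hand category can never be reached from the start symbol; the function names and the example grammar (including the OLD rule) are hypothetical.

```python
def left_recursive(rules):
    """Categories with a rule whose right-hand side starts with the category itself."""
    return sorted(lhs for lhs, alternatives in rules.items()
                  if any(rhs and rhs[0] == lhs for rhs in alternatives))


def orphan_rules(rules, start):
    """Categories that have rules but cannot be reached from `start`."""
    reachable, stack = {start}, [start]
    while stack:
        for rhs in rules.get(stack.pop(), []):
            for symbol in rhs:
                if symbol in rules and symbol not in reachable:
                    reachable.add(symbol)
                    stack.append(symbol)
    return sorted(set(rules) - reachable)


rules = {
    "CAS": [["PRO", "CIS", "TER"]],
    "CIS": [["CIS", "CIS"], ["RBS", "CDS"]],
    "OLD": [["RBS", "RBS"]],  # hypothetical leftover rule, unreachable from CAS
}
print(left_recursive(rules), orphan_rules(rules, "CAS"))
```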
41. Design - Recap
Design
• Databases
• CAD tools
Analyze
• Computational
tools
Build
• Assembly of parts
• DNA synthesis
Participate in SBOL, community
effort for standard in Synthetic
Biology
Context-Free Grammar to design
DNA molecules
GenoCAD
Domain specific languages
Edit libraries of parts and grammar rules
CFG compiler generation
49. Attribute Grammars (AG)
• AG = a CFG plus:
– Categories and Parts have attributes
– Rules have semantic actions to compute attribute values
While going through the parse tree, we now also evaluate the semantics (meaning)
50. Example - Target output
– Transcription:
• dna → dna + mrna
– Translation:
• mrna → mrna + protein
– Degradation mrna:
• mrna → []
– Degradation protein:
• protein → []
– Interaction promoter protein:
• dna + repressor <-> dna_repressor_x
51. Example – Parts Attributes
• Promoter: transcription rate, repressor
– Promoter(transcription_rate, repressor) ptetr (50, tetr)
– Promoter(transcription_rate, repressor) placi (10, laci)
• RBS: translation rate
– RBS(translation_rate) rbsA (25)
– RBS(translation_rate) rbsB (50)
• CDS: degradation rates for the protein and the mRNA
– CDS(protein_deg,mrna_deg) laci(1,1)
– CDS(protein_deg,mrna_deg) tetr(1,1)
• Terminator
– Terminator t1
52. Example: Rules Semantic Actions
• CAS → PROMOTER(transcription_rate, repressor), CIS, TERMINATOR
– Transcription: dna → dna + mrna, [transcription_rate]
– Interaction: if repressor in construct then dna + repressor <-> dna_repressor_x
• CIS → RBS(translation_rate), CDS(protein_deg, mrna_deg)
– Translation: mrna → mrna + protein, [translation_rate]
– Degradation_mrna: mrna → ϕ, [mrna_deg]
– Degradation_protein: protein → ϕ, [protein_deg]
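The two rules and their semantic actions can be sketched as plain Python functions that turn part attributes into reaction strings. This is an illustration rather than the thesis implementation: the function names, the species naming scheme (`dna_<promoter>`, `mrna_<cds>`, protein named after its CDS), and the restriction to one cistron per cassette are all assumptions. The part attributes are the ones listed on the previous slide.

```python
# Parts library: name -> (category, attributes), values from the slides.
PARTS = {
    "ptetr": ("PROMOTER", {"transcription_rate": 50, "repressor": "tetr"}),
    "placi": ("PROMOTER", {"transcription_rate": 10, "repressor": "laci"}),
    "rbsA": ("RBS", {"translation_rate": 25}),
    "rbsB": ("RBS", {"translation_rate": 50}),
    "laci": ("CDS", {"protein_deg": 1, "mrna_deg": 1}),
    "tetr": ("CDS", {"protein_deg": 1, "mrna_deg": 1}),
    "t1": ("TERMINATOR", {}),
}


def attrs(name, category):
    """Return a part's attributes after checking it has the expected category."""
    part_category, attributes = PARTS[name]
    if part_category != category:
        raise ValueError(f"{name} is a {part_category}, expected {category}")
    return attributes


def compile_cistron(rbs, cds):
    """Semantic actions of the cistron rule (RBS followed by CDS)."""
    r, c = attrs(rbs, "RBS"), attrs(cds, "CDS")
    mrna = f"mrna_{cds}"
    return [
        f"{mrna} -> {mrna} + {cds} [{r['translation_rate']}]",
        f"{mrna} -> [] [{c['mrna_deg']}]",
        f"{cds} -> [] [{c['protein_deg']}]",
    ]


def compile_cassette(promoter, rbs, cds, terminator, proteins):
    """Semantic actions of the cassette rule (promoter, cistron, terminator)."""
    p = attrs(promoter, "PROMOTER")
    attrs(terminator, "TERMINATOR")
    dna = f"dna_{promoter}"
    reactions = [f"{dna} -> {dna} + mrna_{cds} [{p['transcription_rate']}]"]
    if p["repressor"] in proteins:  # interaction only if the repressor is encoded
        reactions.append(f"{dna} + {p['repressor']} <-> {dna}_{p['repressor']}_x")
    return reactions + compile_cistron(rbs, cds)


def compile_construct(cassettes):
    """Compile a list of (promoter, rbs, cds, terminator) cassettes."""
    proteins = {cds for _, _, cds, _ in cassettes}
    reactions = []
    for cassette in cassettes:
        reactions += compile_cassette(*cassette, proteins)
    return reactions


toggle = [("ptetr", "rbsA", "laci", "t1"), ("placi", "rbsB", "tetr", "t1")]
for line in compile_construct(toggle):
    print(line)
```

The interaction reaction depends on the whole construct (is the repressor encoded anywhere?), which is the kind of context-dependent information that attributes let a grammar carry across the parse tree.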
72. Use an API for the Grammar’s
semantic actions
SBML API (use java libSBML)
• species(Name, InitConc)
• parameter(Name, Value)
• reaction(Name, Modifiers, Reactants, Products, Math)
• event(Name, Event_assignments, Trigger_math)
• Etc.
TRANS API
Use keywords to match declared Species, use naming
convention
– TYPE = “DNA”, “PROT”….
– KEY = “LAC”, “TET”,…
73. Parts Attributes
Category | PartsID | Parameters
PTE | pTetR | parameter('k_transcription_pTetR',10)
PLA | placI | parameter('k_transcription_placI',25)
PCI | pcI | parameter('k_transcription_pcI',50)
TER | B0010 | (none)
RBS | Strong_RBS | parameter('k_translation_Strong_RBS',25), parameter('k_degradation_Strong_RBS',1)
CDS | lacI | parameter('k_degradation_LAC_lacI',0.1)
CDS | tetR | parameter('k_degradation_TET_tetR',0.1)
CDS | cIts | parameter('k_degradation_CI_cIts',0.1)
74. Rules Semantic Actions
Reaction
Rules | Name | Modifiers | Reactants | Products | Math
CAS --> PRO CIS TER | Transcription_<CAS.Construct> | DNA_<PRO.Construct> | (none) | mRNA_<CIS.Construct> | k_transcription_<PRO.Construct>
CIS --> RBS CDS | Translation_<CIS.Construct> | mRNA_<CIS.Construct> | (none) | PROT_<CDS.Construct> | k_translation_<RBS.Construct>
CIS --> RBS CDS | Degradation_mRNA_<CIS.Construct> | (none) | mRNA_<CIS.Construct> | (none) | -k_degradation_<RBS.Construct>* mRNA_<CIS.Construct>
CIS --> RBS CDS | Degradation_PROT_<CDS.Construct> | (none) | PROT_<CDS.Construct> | (none) | -k_degradation_<CDS.Construct>* PROT_<CDS.Construct>
Rules | Species | Reactions
CAS --> PRO CIS TER | species_amount(DNA_<PRO.Construct>,1) | reaction(Transcription_<CAS.Construct>, [DNA_<PRO.Construct>], [], [mRNA_<CIS.Construct>], "k_transcription_<PRO.Construct>")
CIS --> RBS CDS | species(mRNA_<CIS.Construct>,0), species(PROT_<CDS.Construct>, init_<CDS.Construct>.getValue()) | reaction(Translation_<CIS.Construct>, [mRNA_<CIS.Construct>], [], [PROT_<CDS.Construct>], "k_translation_<RBS.Construct>"), reaction(Degradation_mRNA_<CIS.Construct>, [], [mRNA_<CIS.Construct>], [], "-k_degradation_<RBS.Construct>* mRNA_<CIS.Construct>"), reaction(Degradation_PROT_<CDS.Construct>, [], [PROT_<CDS.Construct>], [], "-k_degradation_<CDS.Construct>* PROT_<CDS.Construct>")
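The `<X.Construct>` placeholders in these semantic actions are filled in once the parse tree is known. The Python sketch below shows one plausible way to do that substitution. It does not call libSBML: `reaction` is a stand-in that only records its arguments in the argument order listed for the SBML API, and the names bound to CAS and CIS are made up for the example.

```python
import re

PLACEHOLDER = re.compile(r"<(\w+)\.Construct>")


def instantiate(template, names):
    """Replace each <X.Construct> placeholder with the name bound to symbol X."""
    return PLACEHOLDER.sub(lambda match: names[match.group(1)], template)


def reaction(name, modifiers, reactants, products, math):
    """Stand-in for the API call of the same name; it only records the values."""
    return {"name": name, "modifiers": modifiers, "reactants": reactants,
            "products": products, "math": math}


def transcription(names):
    """Semantic action of the rule CAS --> PRO CIS TER, as listed in the table."""
    def fill(template):
        return instantiate(template, names)

    return reaction(fill("Transcription_<CAS.Construct>"),
                    [fill("DNA_<PRO.Construct>")], [],
                    [fill("mRNA_<CIS.Construct>")],
                    fill("k_transcription_<PRO.Construct>"))


# "cas1" and "cis1" are hypothetical construct names; pTetR is a library part.
result = transcription({"CAS": "cas1", "PRO": "pTetR", "CIS": "cis1"})
print(result["name"], result["math"])
```

With this binding the rate constant resolves to `k_transcription_pTetR`, the parameter declared for the pTetR part in the parts table.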
75. Rules Semantic Actions - TRANS
PRO --> PLA: semantic action for the POSSIBLE interaction (trans) reaction
Name interaction_LAC_<PLA.Construct>
Modifiers
Reactants TRANSspecies{PROT-LAC}
Products TRANSspecies_and_declare{[PROT-LAC],PROT-LAC_DNA_<PLA.Construct>_x,0}
Math TRANS{[PROT-LAC],+ k_binding_PROT-LAC_<PLA.Construct> - k_release_PROT-LAC_<PLA.Construct> * PROT-LAC_DNA_<PLA.Construct>_x,0}
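The word POSSIBLE is the key point: the interaction is only created if a matching protein species was declared elsewhere in the design. The Python sketch below illustrates that lookup under the assumption that species names follow a `TYPE_KEY_part` convention such as `PROT_LAC_lacI`; the function names and the returned dictionary are hypothetical, and the rate expression simply mirrors the Math entry above.

```python
def trans_species(declared, type_, key):
    """Declared species whose name starts with TYPE_KEY_ (assumed convention)."""
    prefix = f"{type_}_{key}_"
    return [name for name in declared if name.startswith(prefix)]


def possible_interaction(declared, promoter, key):
    """Build the trans reaction for a promoter, or None if no repressor exists."""
    repressors = trans_species(declared, "PROT", key)
    if not repressors:
        return None  # the repressor is not encoded by this design
    protein = repressors[0]
    complex_name = f"{protein}_DNA_{promoter}_x"
    return {
        "name": f"interaction_{key}_{promoter}",
        "reactants": [protein],
        "products": [complex_name],
        "math": (f"k_binding_{protein}_{promoter} - "
                 f"k_release_{protein}_{promoter} * {complex_name}"),
    }


declared = ["DNA_placI", "PROT_LAC_lacI", "PROT_TET_tetR"]
print(possible_interaction(declared, "placI", "LAC"))
print(possible_interaction(["PROT_TET_tetR"], "placI", "LAC"))
```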
80. Synthetic Biology designs
Toggle switch (Gardner, 2000); Oscillator (Elowitz, 2000)
Same libraries of parts but different layouts.
Used SBML API to design an attribute grammar using Wilson-Cowan rate laws
84. A G2P language for Natural Genomes
• Scale up to handle natural genomes
– Illustrate Attribute Grammar as a formalism to map Genotype to Phenotype
• Systems Biology
– Regulation of the cell cycle of the budding yeast
89. Discrete Boolean Network
• Design a variety of languages!
• Qualitative
• API GinML: edges and nodes
Thieffry, D., & Thomas, R. (1995). Dynamical behaviour of biological regulatory networks—II. Immunity control in
bacteriophage lambda. Bulletin of Mathematical Biology. Retrieved from
http://link.springer.com/article/10.1007/BF02460619
90. Analyze - Recap
Design
• Databases
• CAD tools
Analyze
• Computational
tools
Build
• Assembly of parts
• DNA synthesis
Attribute Grammar to map genotype and phenotype
Designing G2P languages, use SBML or GinML API
Model natural genome, the cell cycle example
Implementation in GenoCAD: design to simulation
Generation of AG compilers from database
95. Sections:
Customer screening
Sequence screening
Record retention
Government contact
“[…] to minimize the risk that unauthorized individuals or individuals with malicious intent will obtain ‘toxins and agents of concern’ through the use of nucleic acid synthesis technologies, and to simultaneously minimize any negative impacts on the conduct of research and business operations.”
100. iGEM--International Genetically
Engineered Machine
• Summer project for teams of undergrads in Synthetic Biology
– Projects range from a rainbow of pigmented bacteria, to banana
smelling bacteria, an arsenic biosensor, etc.
– 165 teams in 2011
• Judge
– iGEM 2012 Americas East Jamboree (Information Processing & Fundamental Advances & Software track) in Pittsburgh, PA
– aGEM 2012 in Edmonton, Canada
– iGEM 2011 World Championship (Software track) at MIT, Cambridge, MA
– iGEM 2011 Americas Jamboree (Information Processing & Software track) in Indianapolis, IN
– iGEM 2010 World Championship (Poster) at MIT, Cambridge, MA
• Advisor of teams
– Virginia Tech iGEM 2011 team
– VT-ENSIMAG Biosecurity software team for iGEM 2010
101. Biological weapons nonproliferation
• 1 year postdoc fellowship
• Center for Nonproliferation Studies at the Monterey Institute of International Studies
102. Build - Recap
Design
• Databases
• CAD tools
Analyze
• Computational
tools
Build
• Assembly of parts
• DNA synthesis
Screening a DNA sequence for Select Agents and Toxins
Relations between our science and policy
Engaging the students (iGEM)
104. Conclusions
Design: GenoCAD – user-friendly rule-based genetic design tools and Domain Specific Language
Galdzicki, M., Wilson, M. L., Rodriguez, C. A., Pocock, M. R., Oberortner,
E., Adam, L., … Sauro, H. M. (2012). Synthetic Biology Open Language
(SBOL) Version 1.1.0, 1–26. (and NBT paper in submission)
Wilson, M. L., Hertzberg, R., Adam, L., & Peccoud, J. (2011). A step-by-
step introduction to rule-based design of synthetic genetic constructs
using GenoCAD. (C. Voigt, Ed.) Methods in Enzymology, 498, 173–88.
doi:10.1016/B978-0-12-385120-8.00008-5
Mandy L Wilson, Sakiko Okumoto, Laura Adam and Jean Peccoud.
Development of a domain-specific genetic language to design
Chlamydomonas reinhardtii expression vectors. (Manuscript in preparation)
– Talk: Adam, L. & Peccoud, J. Formal grammars to protect intellectual
properties in synthetic biology. International Conference on Synthetic
Biology at Evry, France, December 15-16, 2010.
– GenoCAD tutorials
105. Conclusions
Design: GenoCAD – user-friendly rule-based genetic design tools and Domain Specific Language
Analyze: Semantic models of DNA sequences
Cai, Y., Lux, M. W., Adam, L., & Peccoud, J. (2009). Modeling structure-
function relationships in synthetic DNA sequences using attribute
grammars. PLoS computational biology, 5(10)
Laura Adam, Matthew W. Lux, Mandy L. Wilson, Tian Hong, Jean Peccoud.
Design of Languages for Systems and Synthetic Biology to translate
genetic designs into mathematical models. (Manuscript in preparation)
– Poster: Adam, L. & Peccoud, J. (2011). Using user defined semantic
languages in synthetic biology: generating DNA compilers. Third
International Workshop on BioDesign Automation (IWBDA) at 48th
ACM/EDAC/IEEE Design Automation Conference (DAC) in San Diego, CA.
– Talk: Formal languages to map Genotype to Phenotype in Natural Genomes. GBCB seminar, 2012.
106. Conclusions
Design: GenoCAD – user-friendly rule-based genetic design tools and Domain Specific Language
Analyze: Semantic models of DNA sequences
Build: Biosecurity issues and DNA synthesis
Adam, L., et al. (2011). Strengths and limitations of the federal guidance
on synthetic DNA. Nature Biotechnology, 29(3), 208–210.
doi:10.1038/nbt.1802
– Talk: GenoTHREAT: A biosecurity software to screen DNA synthesis
orders against Pathogens. GBCB seminar, 2011.
– Adam, L. (2011). Scientists need to be proactive to foment international biosecurity. Runner-up essay for the “Young Scientists” essay contest organized by the Implementation Support Unit of the Biological Weapons Convention at the United Nations
107. Conclusions
Design: GenoCAD – user-friendly rule-based genetic design tools and Domain Specific Language
Analyze: Semantic models of DNA sequences
Build: Biosecurity issues and DNA synthesis
Define your Domain Specific G2P languages, Design
mutants and Analyze them in GenoCAD in minutes!
108. Acknowledgements
VBI SynBio Group
• J. Peccoud (P.I.)
• N. Adames
• D. Ball
• M. Lux
• C. Overend
• M. Wilson
• and Patrick (Yizhi) Cai
• and R. Hertzberg
PhD committee:
Dr. Bevan
Dr. Garner
Dr. Képès
Dr. Peccoud
Dr. Ramakrishnan
Dr. Tyson
• And Dennie Munson!
Collaborators
• SBOL: H. Sauro, C. Myers, D. Densmore, C. Rodriguez, M. Galdzicki and many more
• Language: Eric Van Wyck
• GenoGUARD: Ed You (FBI)
112. What if we could…
• Compile circular DNA
• Read the different messages on both
strands
• Lexical analysis of natural sequences
• Customize (GenoCAD) and standardize?
(SBOL)
• Handle ambiguity
Natural
language
Natural
genome
Formal
language
Synthetic
biology
114. Parsing
Left to Right
Top-Down
Parse
The Parse Tree of the Sentence
"The boy went home"
Right to Left
Top-Down
Parse
Left to Right
Bottom-Up
Parse
Right to Left
Bottom-Up
Parse
115. Use of attribute grammar in
synthetic biology
Formal definition | Semantic | In the synthetic biology context
V, a finite set of non-terminals | Attributes | Parts categories
Σ, a finite set of terminals | Attribute values | Genetic parts
R, a finite relation from V to (V∪Σ)* | Semantic actions | Design rules
S∈V, the start symbol | Hard-coded declarations | Start
118. Insights from
Genotype-to-Phenotype (G2P)
mapping
G2P map
The
Phenotype
is X
Genotype
• genetic makeup of a cell, an
organism, or an individual
• specific alleles
• inherited
Phenotype
• observable characteristics or
traits
119. Traditional G2P mapping is linear
• Sui Huang, Rational drug discovery: what can we learn from regulatory networks?, Drug Discovery Today, Volume 7, Issue 20, 15
October 2002
• Peccoud, J., Velden, K. V., Podlich, D., Winkler, C., Arthur, L., & Cooper, M. (2004). The selective values of alleles in a molecular
network model are context dependent. Genetics, 166(4), 1715–25.
Phenotypes
Central dogma
120. Current Formalisms
Databases:
genetic mapping, genome annotation,
genotype, mutant, transcriptome,
proteome and metabolome data.
Ontologies:
Controlled vocabulary for annotation of
genes and their products (cellular
component, molecular function,
biological process)
Actually, G2P maps are nonlinear:
Gene Networks
• Priest, N. K., Rudkin, J. K., Feil, E. J., van den Elsen, J. M. H., Cheung, A., Peacock, S. J., Laabei, M., et al. (2012). From
genotype to phenotype: can systems biology be used to predict Staphylococcus aureus virulence? Nature reviews. Microbiology,
10(11), 791–7. doi:10.1038/nrmicro2880
• Benfey, P. N., & Mitchell-Olds, T. (2008). From genotype to phenotype: systems biology meets natural variation. Science.
“replacing the linear pathways with interconnected networks.”