Approaches to cDNA Cloning and Analysis

Dr. Matthias Harbers

Chief Scientist DNAFORM Inc.

Co-assigned Scientist at the RIKEN Omics Center

© Matthias Harbers 2008

Classical View on the Utilization of Genomic Information

Nucleus:
Genomic DNA (storage of information): a promoter with bound transcription factors precedes the transcript start site of the “gene”.
Transcription by RNA polymerase II yields coding mRNA (transport of information) carrying a 5’ cap (7-methylguanosine cap or m7G cap) and a 3’ poly(A) tail (AAAAA).

Cytoplasm:
Translation at the ribosome yields protein (tools to operate “functions”).

Developed in the 1950s and 1960s.

The Classical View Has Been Challenged by New Developments

Discovery/Project | Importance | Year
Discovery of reverse transcriptases | DNA can be synthesized from RNA templates | 1969
Discovery of ligase and restriction endonucleases | Establishing DNA recombination, DNA cloning, and preparation of DNA libraries | 1960s and 70s
DNA sequencing | Chain-termination method (“Sanger Sequencing”) | 1975
Human Genome Project | Move to sequencing entire genomes | 1990 to 2003
Expressed sequence tags (ESTs) | First attempt at gene discovery and expression profiling | 1991
IMAGE Project | Program to create cDNA collections from key organisms | 1993 to 2007
ENCODE Project | Functional elements in human genome | Since 2003

Topics of the Presentation

Approaches to cDNA cloning

Special topics related to cDNA cloning

Large-scale cDNA cloning projects

Small RNA (sRNA) cloning

Tag-based approaches

Next-Generation Sequencing

Where do we go from here?


Approaches to cDNA cloning
Starting material: capped and polyadenylated mRNA (5’ cap, 3’ poly(A) tail).

1st strand cDNA synthesis: commonly by oligo(dT) priming on the poly(A) tail.

Priming of 2nd strand cDNA synthesis: 5’-linker (adaptor) ligation or tailing reaction.

2nd strand synthesis from the adaptor (option to amplify by PCR).

Digestion with cloning enzyme(s): methylation can protect against internal cleavage within the cDNA.

Ligation into phage or plasmid vector (a plasmid with cDNA insert may be excised from a phage vector).
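The first-strand step can be illustrated with a toy model. This is only a sketch: it treats sequences as plain strings, and the function names and the example transcript are hypothetical, not part of any cloning protocol or library.

```python
# Toy model of oligo(dT)-primed first-strand cDNA synthesis.
# All names and the example sequence are illustrative assumptions.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def has_poly_a_tail(mrna: str, min_length: int = 5) -> bool:
    """Return True if the mRNA (written 5'->3') ends in at least min_length A residues."""
    return mrna.endswith("A" * min_length)


def first_strand_cdna(mrna: str, min_tail: int = 5) -> str:
    """Return the first-strand cDNA (written 5'->3') of a polyadenylated mRNA.

    The oligo(dT) primer anneals to the poly(A) tail and is extended along the
    template, so the cDNA is the reverse complement of the mRNA.
    """
    if not has_poly_a_tail(mrna, min_tail):
        raise ValueError("oligo(dT) priming needs a poly(A) tail")
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


if __name__ == "__main__":
    example_mrna = "GGCAUGCUUAAAAAA"  # hypothetical transcript, 5'->3'
    print(first_strand_cdna(example_mrna))
```

The check for a poly(A) tail mirrors the limitation of oligo(dT) priming: transcripts without a poly(A) tail are not primed in this model.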

Special Topics Related to cDNA Cloning

Synthesis of very long cDNAs (>10,000 bp, not further discussed)

Full-length cDNA cloning (important to obtain functional cDNAs)

Normalization (key to gene discovery in large-scale projects)

Cloning vectors and applications (not further discussed)

Subtractive cloning (not further discussed)

Expression cloning (not further discussed)

Addressing splicing (left out of large-scale projects)

Ref.: Harbers M: The current status of cDNA cloning, Genomics. 2008 Mar;91(3):232-42.


Use of cDNA Libraries

Isolation of individual target genes in Research Laboratories

Transcriptome Analysis and Genome Projects

Large-scale random clone picking

End-sequencing to build transcript catalogs

Full-length sequencing of selected clones

Creation of sequence data bases

Creation of cDNA collections

Ref.: Carninci P et al.: Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia.
Genome Res. 2003 Jun;13(6B):1273-89.

Benefits of Large-Scale cDNA Cloning Projects

Improved cDNA Cloning Technology provides Sequence Data and Clone Collections for:

SNP Analysis: Location in Promoter or Exon
Proteomics: Functional Studies on Proteins
Functional Studies
Gene Regulation: Promoter Identification, Expression Profiling
Genomics: Gene Discovery, Mapping
RNAi: Knock down
Noncoding RNA: Sense-antisense Pairs

Public sequence databases and clone collections are essential tools for research!

The mRNA Pool of a Cell

Abundance class | Number of different transcripts | Share of mRNA
High | 5 to 10 | up to 20%
Intermediate | 500 to 2,000 | 40 to 60%
Low | 10,000 to 20,000 | <20%

(Old numbers estimated from reassociation and hybridization studies)

Discovery of rarely expressed genes is a difficult task!
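
The difficulty can be put into numbers with a short calculation. The sketch below assumes that clones are picked independently from a non-normalized library and that all transcripts within one abundance class are equally frequent. Both are simplifications, and the function name is hypothetical.

```python
import math


def clones_needed(transcript_fraction: float, detection_probability: float = 0.99) -> int:
    """Number of random clones to pick so that a transcript making up
    transcript_fraction of the mRNA pool is seen at least once with the
    requested probability (independent picks assumed)."""
    if not 0.0 < transcript_fraction < 1.0:
        raise ValueError("transcript_fraction must be between 0 and 1")
    if not 0.0 < detection_probability < 1.0:
        raise ValueError("detection_probability must be between 0 and 1")
    # P(miss in n picks) = (1 - f)^n  =>  n = ln(1 - P) / ln(1 - f)
    return math.ceil(
        math.log(1.0 - detection_probability) / math.log(1.0 - transcript_fraction)
    )


if __name__ == "__main__":
    # Assumed class sizes taken from the ranges on the slide.
    abundant = 0.20 / 10      # 10 transcripts sharing 20% of the mRNA
    rare = 0.20 / 15_000      # 15,000 transcripts sharing 20% of the mRNA
    print("abundant transcript:", clones_needed(abundant), "clones")
    print("rare transcript:", clones_needed(rare), "clones")
```

Under these assumptions a highly expressed transcript is found within a few hundred clones, whereas a rare one requires several hundred thousand picks, which is the motivation for normalization.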

Normalization of cDNA Libraries
During a normalization step, a cDNA pool is hybridized against an aliquot of the
original mRNA sample or against the same cDNA pool. Because hybridization kinetics
depend on concentration, cDNAs from highly expressed genes hybridize fastest and
can be removed, so the number of clones representing highly expressed genes is
reduced, yielding a more equal distribution of different cDNAs in the library.
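
The concentration dependence can be sketched with ideal second-order reassociation kinetics, where the amount left single-stranded after time t is C0 / (1 + k * C0 * t). The model below is an assumption for illustration only: it uses one rate constant for all cDNAs, arbitrary units, and hypothetical function names.

```python
from typing import Dict


def single_stranded_remaining(initial_conc: float, rate_constant: float, time: float) -> float:
    """Amount still single-stranded after ideal second-order reassociation:
    C(t) = C0 / (1 + k * C0 * t)."""
    return initial_conc / (1.0 + rate_constant * initial_conc * time)


def normalize_pool(pool: Dict[str, float], rate_constant: float, time: float) -> Dict[str, float]:
    """Apply the kinetics to every cDNA species and keep the single-stranded rest,
    as if the hybridized (double-stranded) material had been removed."""
    return {
        name: single_stranded_remaining(conc, rate_constant, time)
        for name, conc in pool.items()
    }


if __name__ == "__main__":
    pool = {"abundant": 1000.0, "intermediate": 10.0, "rare": 1.0}  # arbitrary units
    normalized = normalize_pool(pool, rate_constant=1.0, time=1.0)
    for name, amount in normalized.items():
        print(f"{name}: {pool[name]:g} -> {amount:.3f}")
```

In this toy pool the ratio of the abundant to the rare species drops from 1000:1 to about 2:1, because all species approach the same limit of 1 / (k * t).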

[Figure: gel comparison of a pancreas cDNA library without and with normalization/subtraction (size marker: λ/Hind III, 0.5 to 9.4 kbp); bands of highly expressed genes are marked. Plot: number of non-redundant clones versus number of libraries (Lib. 1 to Lib. 4), prepared without driver or with Driver 1 and Driver 2. Combine normalization and subtraction for higher gene discovery.]

Full-Length cDNA Cloning

“Cap Trapper” Method:
1st strand cDNA synthesis on capped mRNA by oligo(dT) priming
Chemical reaction: biotinylation of the Cap structure
RNase I digestion
Recovery of the biotinylated Cap/mRNA/cDNA hybrids on beads
Full-length cDNA
Key Steps: Biotinylation of Cap structure and RNase I Treatment

“Oligo Capping” Method:
Phosphatase treatment of the RNA (capped mRNA and uncapped RNA carrying a 5’ phosphate)
Pyrophosphatase treatment of capped mRNA, leaving a 5’ phosphate
RNA Ligase: ligation of an adaptor (RNA oligonucleotide) to the 5’ end of the mRNA
1st strand cDNA synthesis by oligo(dT) priming
2nd strand synthesis with an adaptor primer
Key Steps: Replacement of Cap structure by RNA oligonucleotide

Examples for Large-Scale cDNA Cloning Projects
These projects target the cloning and full-length sequencing of “one representative” cDNA clone for
each gene. This reduces cost, but it entirely ignores splicing events.

Project | Organisms | URL
IMAGE Consortium | Human, mouse, rat, zebrafish, fugu, Xenopus (X. laevis and X. tropicalis), cow, and primate | http://image.llnl.gov/
Mammalian Gene Collection (MGC) | Human, mouse, rat, cow, others | http://mgc.nci.nih.gov/
Tokyo University | Human | http://cdna.hgc.jp/
RIKEN FANTOM | Mouse | http://fantom3.gsc.riken.go.jp/
Rice full-length cDNA Consortium | Rice | http://cdna01.dna.affrc.go.jp/cDNA/
RIKEN Arabidopsis | Arabidopsis | http://www.brc.riken.jp/lab/epd/Eng/news/071015.shtml
ORF Consortium | Human (some mouse clones) | http://www.orfeomecollaboration.org

Pre-mRNA is Spliced into mRNA

Large-scale cloning projects do not cover splice variants.
But maybe 75% of all signal transducers are regulated by splicing!

Capturing Alternatively Spliced Exons in mRNA

Sense strand Antisense strand
Sample 1 Sample 2

Cut double-stranded regions

Capture single-stranded regions

Ref.: Watahiki A et al.: Libraries enriched for alternatively spliced exons reveal splicing patterns in melanocytes and melanomas.
Nature Methods 2004 Dec 1(3): 233-9.

The Discovery of small RNAs

Classical cloning protocols removed all cDNA fragments of less than
500 bp (to avoid linker contamination; size cutoff of cloning vectors).

Proteins of less than 100 amino acids were commonly not annotated.

However, small RNAs have important functions!

Small RNAs are non-coding RNAs (ncRNAs) often derived from maturation
processes in the cell that include digestion steps by RNases.

Most prominent example: microRNAs (miRNAs) have sequences that are reverse
complementary to other mRNA transcripts. They are around 21-23 nucleotides long
after maturation and can alter the expression/translation of one or several
target genes through RNA interference.
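
The reverse-complement relationship can be shown with a minimal search. The sketch assumes perfect complementarity over the full length of the miRNA, which is a simplification since real target recognition often relies on partial pairing. The sequences and function names are hypothetical.

```python
from typing import List

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(mirna: str, mrna: str) -> List[int]:
    """Return 0-based start positions in the mRNA that are perfectly
    complementary to the whole miRNA."""
    site = reverse_complement(mirna)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    mirna = "ACGUACGUAAGGCCUUAGCUA"  # hypothetical 21-nt small RNA
    # Hypothetical target: flanking sequence around one complementary site.
    mrna = "GGGAUC" + reverse_complement(mirna) + "AAAAAA"
    print(find_target_sites(mirna, mrna))
```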

And we are still finding many more new RNA species!

Ref.: Kawaji H, Hayashizaki Y. Exploration of small RNAs. PLoS Genet. 2008 Jan;4(1):e22.


Small RNA (sRNA) Cloning
Starting material: short RNA with 5’ phosphate (P) and 3’ hydroxyl (OH) ends

Modify 3’ end: C-tailing or adaptor ligation

Modify 5’ end: here by adaptor ligation

1st strand cDNA synthesis (here primed by oligo(dG) on the C-tail)

2nd strand synthesis and PCR

Sequence analysis: direct sequencing of DNA fragments (option to ligate into plasmid vector)

Key Steps:
Modification of 5’ and 3’ end of RNA for PCR amplification. Selection by size range. Commonly only sequenced.
No cloning needed as short cDNAs can be chemically synthesized.

Tag-Based Approaches

Gene discovery cannot be done by standard methods used in
expression profiling such as microarray or PCR.

Unsupervised approaches are needed for gene discovery that do
not require sequence information for probe design.

The first approach to gene discovery was sequencing of 3’ ends of cDNA
clones (EST sequencing). This requires one read per clone.

Gene identification does not require sequences of 500 to 800 bp;
much shorter sequences of some 20 bp or less are sufficient.

Long sequencing reads can therefore cover many short fragments in one run.

New protocols to isolate short fragments from RNA.

Tag-based approaches in expression profiling and gene discovery.
Ref.: Harbers M and Carninci P: Tag-based approaches for transcriptome research and genome annotation.
Nature Methods 2005 Jul 2(7): 495-502.

Tag-Based Approaches
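Tags are taken from different regions of a capped, polyadenylated mRNA:

5’ end (cap selection): CAGE, 5’ SAGE
Anchoring enzyme sites within the transcript: SAGE (5’ related), SAGE (3’ related), MPSS, DGE
3’ end (remove poly(A)): 3’ SAGE
Both ends: Paired-end Tags or PETs
Whole transcript: RNA-Seq or other shotgun approaches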

Serial Analysis of Gene Expression (SAGE)
(Digital Gene Expression (DGE))

1st Strand cDNA Synthesis with biotinylated oligo(dT) primer
(Commonly starting from mRNA.)

Preparation of double-stranded cDNA on beads and digestion with anchoring enzyme

Adaptor Ligation and digestion with Mme I (20 bp) or EcoP15I (27 bp)

Formation of “Di-Tags”
(Di-Tags can be used for direct sequencing (DGE).)

Concatenation and cloning into plasmid vector
(Classic sequencing of concatemers.)

Very well established and rich reference/annotation information.
Digital expression profiling by “tag counting”.
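
“Tag counting” can be sketched in a few lines. The example assumes that the tag is the stretch of bases directly 3’ of the anchoring-enzyme site closest to the poly(A) end, and it assumes the site CATG (the NlaIII site commonly used in SAGE), since the slide does not name the anchoring enzyme. Function names and sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def extract_tag(cdna: str, anchor_site: str = "CATG", tag_length: int = 20) -> Optional[str]:
    """Return the tag following the 3'-most anchoring site, or None if the
    cDNA has no site or too little sequence after it."""
    position = cdna.rfind(anchor_site)
    if position == -1:
        return None
    start = position + len(anchor_site)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(cdnas: Iterable[str], anchor_site: str = "CATG", tag_length: int = 20) -> Counter:
    """Digital expression profile: number of occurrences of each tag."""
    tags = (extract_tag(cdna, anchor_site, tag_length) for cdna in cdnas)
    return Counter(tag for tag in tags if tag is not None)


if __name__ == "__main__":
    # Hypothetical cDNAs; a short tag length keeps the example readable.
    library = [
        "AACATGTTTTCATGACGTAAAA",
        "AACATGTTTTCATGACGTAAAA",
        "GGCATGTTTTAA",
    ]
    for tag, count in count_tags(library, tag_length=4).most_common():
        print(tag, count)
```

Transcripts without an anchoring site yield no tag in this model, which is one reason tag-based profiles can miss genes.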

Ref.: Velculescu VE et al. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):484-7.

Cap Analysis Gene Expression (CAGE)
Starting material: capped mRNA (5’ CAP ... AAAAA 3’). Commonly starting from 50 µg total RNA.

1st Strand cDNA Synthesis with random primers (NNNNNN)
(Covering poly(A-) mRNA and long mRNA.)

5’-End Selection on Beads by Cap Trapper
(Less bias due to chemical modification of Cap.)

Adaptor Ligation (Adaptor I) and 2nd Strand Synthesis

Digestion with Mme I (20 bp) or EcoP15I (27 bp)

Isolation of CAGE Tags (Adaptor I + tag)

3’-End Adaptor Ligation (Adaptor I + tag + Adaptor II)

Preferably used for direct sequencing (>4,000,000 tags per run).

Ref.: Kodzius R et al.: Cap analysis of gene expression: transcription start site mapping and expression profiling.
Nature Methods 2006 Mar 3(3): 211-222.

Cap Analysis Gene Expression (CAGE)

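[Figure: Signals 1 to 3 act through transcription factors (TF1 to TF3) bound upstream of the transcription start site (TSS); the genome map shows exons 1 to 5 of the capped, polyadenylated mRNA. Methods placed along this map: TF ChIP at the promoter, CAGE tags and RACE at the TSS, tiling array/RNA-Seq and microarray across the exons, SAGE toward the 3’ end.]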

CAGE tags experimentally link transcripts to their promoters.
CAGE tags integrate information based on genome annotations.
CAGE tags can be linked to whole genome tiling arrays and RNA-Seq data.
CAGE tags can be linked to Chromatin IP/ChIP-Seq data.
CAGE tags correlate with open chromatin.
CAGE tags provide primer information for cloning new transcripts.
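
How mapped CAGE tags point to promoters can be sketched by grouping their 5’ positions into clusters. This is a minimal illustration and not the published CAGE analysis pipeline: the positions are invented, and the 20 bp maximum gap and the function name are assumptions.

```python
from typing import List, Tuple


def cluster_tss(tag_starts: List[int], max_gap: int = 20) -> List[Tuple[int, int, int]]:
    """Group mapped 5' tag positions on one strand of one chromosome into
    clusters, returned as (first position, last position, tag count)."""
    clusters: List[Tuple[int, int, int]] = []
    for position in sorted(tag_starts):
        if clusters and position - clusters[-1][1] <= max_gap:
            first, _, count = clusters[-1]
            clusters[-1] = (first, position, count + 1)  # extend the current cluster
        else:
            clusters.append((position, position, 1))     # open a new cluster
    return clusters


if __name__ == "__main__":
    mapped_tag_starts = [100, 105, 103, 500, 510, 900]  # hypothetical genome coordinates
    for first, last, count in cluster_tss(mapped_tag_starts):
        print(f"TSS cluster {first}-{last}: {count} tags")
```

The tag count per cluster serves as the expression measure, and the cluster coordinates mark the candidate promoter region directly upstream.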

Classical DNA Sequencing by Chain-Termination Method
[Figure: a primer annealed to the DNA template is extended by DNA polymerase in the presence of a dNTP/ddNTP mix, one reaction per nucleotide. The primer extension reactions yield DNA fragments that end at each position of the respective nucleotide; the fragments are analyzed by gel electrophoresis or on a capillary sequencer.]

For over 30 years the most important method in molecular biology.
Challenged by emerging new sequencing technologies: Next-Generation Sequencing.
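
The principle of reading a sequence from chain-terminated fragments can be sketched as follows. The model is idealized (every position yields a fragment, sizes are resolved perfectly), the function names are hypothetical, and the example sequence is one of the fragments shown on the slide.

```python
from typing import Dict, List


def termination_fragments(synthesized: str) -> Dict[str, List[int]]:
    """For each of the four ddNTP reactions, list the lengths of the
    fragments that end with that nucleotide."""
    lengths: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        lengths[base].append(position)
    return lengths


def read_gel(fragments: Dict[str, List[int]]) -> str:
    """Read the sequence by ordering all fragments by size, shortest first."""
    ladder = sorted(
        (length, base) for base, sizes in fragments.items() for length in sizes
    )
    return "".join(base for _, base in ladder)


if __name__ == "__main__":
    newly_synthesized = "TGGTTGCTGC"
    fragments = termination_fragments(newly_synthesized)
    for base, sizes in fragments.items():
        print(f"dd{base}TP reaction: fragment lengths {sizes}")
    print("read from gel:", read_gel(fragments))
```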

Next-Generation Sequencing

Driven by the goal of the “$1000 genome”, different companies are working to provide new sequencing
technologies based on “sequencing by synthesis” or “ligation-based sequencing”. Other approaches
may use hybridization methods or physical means in the future.

Platform | Mb per run | Read length | Run time | Method
Roche 454 Sequencing | 100 Mb | 250 bp | 7 h per run | Emulsion PCR and pyrosequencing
Illumina (Solexa) | 1300 Mb | 32-40 bp | 4 days per run | Bridge PCR and sequencing-by-synthesis
ABI SOLiD | 3000 Mb | 35 bp | 5 days per run | Emulsion PCR and ligation-based sequencing
Helicos | 25 to 90 Mb per h | up to 55 bp | - | Single-molecule detection
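
The table can be converted into approximate read numbers per run, which is the figure that matters for tag counting. The sketch simply divides the yield by the read length, using the 2008 values listed above; the function name is hypothetical and Helicos is left out because no yield per run is given.

```python
def reads_per_run(megabases_per_run: float, read_length_bp: float) -> float:
    """Approximate number of reads per run: total bases divided by read length."""
    return megabases_per_run * 1_000_000 / read_length_bp


if __name__ == "__main__":
    # (Mb per run, read length in bp) as listed in the table above
    platforms = {
        "Roche 454": (100, 250),
        "Illumina (Solexa), 40 bp reads": (1300, 40),
        "Illumina (Solexa), 32 bp reads": (1300, 32),
        "ABI SOLiD": (3000, 35),
    }
    for name, (megabases, read_length) in platforms.items():
        print(f"{name}: about {reads_per_run(megabases, read_length):,.0f} reads per run")
```

Short-read platforms therefore deliver tens of millions of reads per run, roughly two orders of magnitude more than the long-read platform at a comparable or higher total yield.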

Ref.: Mardis ER. The impact of next-generation sequencing technology on genetics.
Trends Genet. 2008 Mar;24(3):133-41. Epub 2008 Feb 11.
von Bubnoff A. Next-generation sequencing: the race is on. Cell. 2008 Mar 7;132(5):721-3.


Example for Ligation-Based Sequencing: ABI SOLiD System

DNA fragments having adaptor sequences: Genomic DNA, Tag Sequencing
Project specific data analysis: Mapping to genome, Reference information

Images are the courtesy of ABI and were kindly provided by ABI Japan.

Example for Sequencing-by-Synthesis: Illumina 1G System

Workflow: DNA per run (0.1 to 1 µg); addition of 2 adaptors; add to flow cell; preparation of clusters.

Images are the courtesy of Illumina and were kindly provided by Illumina Japan.

Cycle 1:
Addition of the sequence reagent
One base extension reaction
Removal of non-incorporated bases
Detect fluorescence signal
Removal of the fluorescence label

Cycle 2:
Repetition of the above reactions

Cycle 3, 4, 5, ...:
Repetition of the above reactions


40,000,000 clusters on a flow cell (image scale bars: 20 µm and 100 µm)

Where do we go from here?

Next-Generation Sequencing will push the genome sequencing field toward
re-sequencing and de novo sequencing (“1000 Genomes Project”).

Metagenomics (Environmental Genomics, Ecogenomics, or
Community Genomics): Direct analysis of genetic materials obtained
from environmental samples.

Expression profiling: SAGE (DGE), CAGE, PET, RNA-Seq.

Analytical applications to identify functional regions/elements in
genomes: ChIP-Seq, open chromatin, SNPs, splicing, others to come.

Analytical applications in mutation screens.

Analytical applications for detection of infectious agents.


Transcriptome Analysis: The Dominance of noncoding RNA
Genome sequencing and annotation did not tell us about the real
extent of gene expression!

Tiling array experiments and deep sequencing by next-generation
sequencing methods indicate that >90% of the genome is expressed.

Maybe 40 to 50% of the mRNA is not polyadenylated, and we have not
analyzed it yet.

Most of the transcripts are potentially noncoding RNAs having
unknown (regulatory?) functions.

The definition of a “gene” may no longer hold with many different
transcripts derived from the same loci.

We do not understand the “hidden layers” regulating the utilization of
genomic information.
Ref.: Mattick, J.S. "Challenging the dogma: The hidden layer of non-protein-coding RNAs in complex organisms"
Bioessays. (2003) 25, 930-939.

Example for RNA-Seq in Yeast Schizosaccharomyces pombe (fission yeast)

Illumina 1G sequencer; average read length 39.1 bases; fragments from poly(A) mRNA.

> 23 million reads (~60 genome lengths) from proliferating cells.

> 99 million reads (~190 genome lengths) from five different stages.

Covering ~94% of the nuclear and > 99% of the mitochondrial genome.

Confirmed expression from intergenic regions by RT-PCR.

Control experiments using whole genome tiling arrays (25 mer/20 nt intervals)
confirmed the identification of novel transcripts (26 out of 453 may encode short
proteins).

Recent publications on the use of RNA-Seq include S. pombe, S. cerevisiae, Arabidopsis,
mouse tissues, mouse stem cells, and HeLa S3.

Ref.: Wilhelm BT et al.: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution.
Nature. 2008 Jun 26;453(7199):1239-43. Epub 2008 May 18.
Graveley BR. Molecular biology: power sequencing. Nature. 2008 Jun 26;453(7199):1197-8.

Examples for Genome Size (haploid)

Genome | Length in bp | Estimated gene number
Phi-X 174 | 5,386 | 10
Human mitochondrion | 16,569 | 37
E. coli | 4,639,221 | 4,377
Saccharomyces cerevisiae | 12,495,682 | 5,770
Caenorhabditis elegans | 100,258,171 | 19,427
Arabidopsis thaliana | 115,409,949 | ~28,000
Drosophila melanogaster | 122,653,977 | 13,379
Humans | 3.3 x 10^9 | ~20,500
Amphibians | 10^9 to 10^11 | ?

Values taken from: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/G/GenomeSizes.html as of July 2007

Where are our limitations?

Mammalian genome size and transcriptome complexity:
enrichment of fragments (e.g. using microarrays),
normalization, and longer reads are required.

Thus far uneven representation requires use of more than one method.

Requirements for starting materials (target is to analyze single cells).

No unified cDNA library method: using different methods depending on RNA length.

Very large data files and lack of computational analysis tools.

What is transcriptional noise?

Research dominated by “detection” rather than “functional analysis”.

Ref.: Struhl K. Transcriptional noise and the fidelity of initiation by RNA polymerase II.
Nat Struct Mol Biol. 2007 Feb;14(2):103-5.


Present Strategies for Transcriptome Analysis
Interest has shifted to next-generation sequencing to profile transcriptional
activities.

We cannot predict ends of transcripts, and therefore tag-based approaches
to identify start sites and termination sites are needed.

Identification of transcription start sites in combination with other
information is driving “gene network studies” and “systems biology”.

RNA-Seq provides new means for the identification of splice sites and
expressed mutations.

We do not clone all those new transcripts, but there will be a need to get
resources for functional analysis of new transcripts.

We are more than ever falling short on the functional analysis of new transcripts.
Thus far we have not even analyzed all coding transcripts!

It is an exciting time to work on transcriptome analysis offering many challenges and rewards!

Contact:

Dr. Matthias Harbers
DNAFORM Inc.
Leading Venture Plaza-2, 75-1, Ono-cho
Tsurumi-ku, Yokohama City, Kanagawa, 230-0046
Japan
E-mail: matthias.harbers@dnaform.jp
Phone: +81-(0)45-510-0607
FAX: +81-(0)45-510-0608
URL: http://www.dnaform.jp
