Yeast Genome Project
Introduction
• Saccharomyces cerevisiae (budding yeast)
• It is perhaps the most useful diploid yeast, having been instrumental to winemaking, baking and brewing since ancient times
• It is one of the most intensively studied eukaryotic model organisms in molecular and cell biology
• Size: 5-10 µm in diameter
• Sequenced in year: 1996
• Strain sequenced: S288C
• Databases:
  • Munich Information Centre for Protein Sequences (MIPS): http://www.mips.biochem.mpg.de/mips/yeast/
  • Yeast Protein Database (YPD): http://quest7.proteome.com/YPDhome.html
  • Saccharomyces Genome Database (SGD): http://genome-www.stanford.edu/Saccharomyces/

• Schizosaccharomyces pombe (fission yeast)
• It is used as a model organism in molecular and cell biology
• Size: 3 to 4 µm in diameter and 7 to 14 µm in length
• Sequenced in year: 2002
• Strain sequenced: 972h, by a European sequencing consortium (EUPOM) of 13 laboratories, including the Wellcome Trust Sanger Institute, and by Cold Spring Harbor Laboratory
• Databases:
  • PomBase: http://www.pombase.org
  • Broad Institute Schizosaccharomyces group database: http://www.broadinstitute.org/annotation/genome/schizosaccharomyces_group/MultiHome.html

• Candida albicans
• Most common human fungal pathogen
• It is a diploid fungus that grows both as yeast and as filamentous cells, and a causal agent of opportunistic oral and genital infections in humans and of candidal onychomycosis, an infection of the nail plate
• Size: 2.0-7.0 µm in diameter and 3.0-8.5 µm in length
• Sequenced in year: 2004, by a consortium led by the Stanford Genome Technology Center
• Strain sequenced: SC5314
• Databases:
  • Candida Genome Database: http://www.candidagenome.org
  • Broad Institute Candida group database: http://www.broadinstitute.org/annotation/genome/candida_group/MultiHome.html
• The baker's yeast Saccharomyces cerevisiae was the first eukaryote whose genome was entirely sequenced
• Mitochondrial DNA was sequenced in segments in the 1980s
• In 1989, it was decided to initiate a yeast sequencing project within the framework of the EU biotechnology programmes; some 35 European laboratories were initially involved in this enterprise [Vassarotti & Goffeau, 1992]
• Chromosome III was the first chromosome to be completed, in 1992, followed by XI and II, both in 1994
• The publication of the 315 kb sequence of yeast chromosome III was a remarkable scientific landmark, not only because it was the first eukaryotic chromosome ever to be sequenced, but primarily because it revealed how much remained to be understood in the genome of an otherwise extensively studied organism such as Saccharomyces cerevisiae
• Soon after the project began, several other laboratories joined it and agreed upon an international collaboration that enabled the whole yeast genome sequence to be finalized in 1995
• More than 600 scientists in Europe, North America and Japan became involved in this effort, and the entire sequence was released in April 1996
Figure: Consortia involved in the yeast genome sequencing project (EU = 55.9%, UK = 17.6%, USA = 20.0%, Canada = 4.3%, Japan = 2.2%)
Cloning and Mapping Procedures:
• The sequencing of chromosome III started from a collection of overlapping plasmid or phage lambda clones that were distributed by the DNA coordinator to the contracting laboratories. However, it soon became evident that ordered cosmid libraries were much more advantageous for large-scale sequencing.
• To construct a library with coverage as complete as possible from as few clones as possible, the cloned DNA fragments should be randomly distributed over the genome.
• Under these conditions, the number of clones (N) needed in a library for each genomic segment to be represented with a given probability (P) is
N = ln(1 - P) / ln(1 - f)
where f is the insert length expressed as a fraction of the genome size [Clarke & Carbon, 1976].
• For example, with a size of 12,800 kb for the yeast genome and assuming an average insert length of 35 kb, a cosmid library containing 4600 random clones would represent the yeast genome at P = 99.99%, i.e. about twelve times the genome equivalent
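The Clarke and Carbon relation above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, and the inputs are simply the 12,800 kb genome size, 35 kb insert length and 4600-clone library quoted above. Under these assumptions the formula gives a minimum clone count for P = 99.99% that is below 4600, so a 4600-clone library (roughly twelve to thirteen genome equivalents) meets that target with some margin.

```python
import math


def clones_needed(p: float, insert_kb: float, genome_kb: float) -> int:
    """Smallest N with N >= ln(1 - P) / ln(1 - f), where f = insert / genome."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - f))


def coverage_probability(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Probability that a given segment is present in a library of n random clones."""
    f = insert_kb / genome_kb
    return 1.0 - (1.0 - f) ** n_clones


def genome_equivalents(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_kb / genome_kb


if __name__ == "__main__":
    genome_kb, insert_kb = 12_800, 35  # values quoted in the text above
    print("clones for P = 99.99%:", clones_needed(0.9999, insert_kb, genome_kb))
    print("P for 4600 clones:", coverage_probability(4600, insert_kb, genome_kb))
    print("genome equivalents:", genome_equivalents(4600, insert_kb, genome_kb))
```

Inverting the formula, as `coverage_probability` does, is a convenient way to judge an existing library rather than to plan a new one.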
• A low number of clones was of interest when setting up ordered yeast cosmid libraries, or specific sublibraries sorted out from an unordered cosmid library by colony hybridization, using specific chromosomal DNA purified by pulsed-field gel electrophoresis as a probe
• The 'nested chromosomal fragmentation' method [Thierry & Dujon, 1992] was then applied to rapid sorting of these clones
• Finally, a set of overlapping cosmids was sufficient to build a contig of a specific chromosome
• This approach has also been successfully applied to many of the other chromosomes sequenced in the yeast genome project
• To facilitate sequencing and assembly of the sequences, contigs of overlapping cosmids and fine-resolution physical maps of the respective chromosomes were constructed first, by application of classical mapping methods (fingerprints, cross-hybridization) or by novel methods developed for this programme, such as site-specific chromosome fragmentation [Thierry & Dujon, 1992] or the high-resolution cross-hybridization matrix [Scholler et al., 1995]
Sequencing strategies and Sequence Assembly
• In the European network, clones were distributed to the collaborating laboratories according to a scheme worked out by the DNA coordinators
• Each contracting laboratory was free to apply sequencing strategies and techniques of its own, provided that the sequences were entirely determined on both strands and unambiguous readings were obtained
• Two principal approaches were used to prepare subclones for sequencing:
1) generation of sub-libraries by the use of a series of appropriate restriction enzymes, or from nested deletions of appropriate sub-fragments made by exonuclease III
2) generation of shotgun libraries from whole cosmids or subcloned fragments by random shearing of the DNA
• Sequencing by the Sanger technique was done
1) manually, labelling with [35S]dATP being the preferred method of monitoring
2) by automated devices
• Two types of devices for on-line detection with fluorescence labelling were employed
1) Applied Biosystems ABI373A
2) Pharmacia A.L.F.
• One laboratory used the direct blotting electrophoresis system from the GATC company (Konstanz). Similar procedures were applied to the sequencing of chromosomes outside the European network. The American laboratories largely relied on machine-based large-scale sequencing.
Sequencing Telomeres
• The yeast chromosome telomeres presented a particular problem
• Due to their repetitive sub-structures and the lack of appropriate restriction sites, they could be cloned by conventional procedures in only a few exceptional cases
• For the most part, telomeres were physically mapped relative to the terminal-most cosmid inserts using the I-SceI chromosome fragmentation procedure [Thierry & Dujon, 1992]
• The sequences were then determined from specific plasmid clones obtained by 'telomere trap cloning', an elegant strategy developed by E. Louis at Oxford [Louis, 1994; Louis & Borts, 1995]
Sequence Assembly
• Within the European network, all original sequences were submitted by the collaborating laboratories to the Martinsried Institute for Protein Sequences (MIPS), which acted as an informatics centre
• The sequences were kept in a data library, assembled into progressively growing contigs, and updated during the course of the project by the application of appropriate criteria in a number of quality controls, starting with chromosome XI
• In collaboration with the DNA coordinators, the final chromosome sequences were derived. For the other yeast chromosomes too, automated procedures were employed for sequence assembly, based for example on
1) the program package developed at Cambridge [e.g. Dear & Staden, 1991]
2) the ACeDB program developed for the C. elegans genome project [Thierry-Mieg & Durbin, 1992]
• In every case, correct assembly of the sequences was ensured by establishing that the order of restriction sites predicted from the sequence was consistent with the physical maps of these sites, which had been determined independently, and care was taken to perform quality controls that would result in high accuracy
• From theoretical considerations taking all types of errors together, it follows that with an average sequence accuracy of 99.9%
• In practice, care was taken to minimize frameshift errors, which represented about two thirds of all sequencing errors and thus would have the most deleterious effects on gene interpretation. Meanwhile, all sequences have been systematically checked for errors again and were corrected in the data libraries.
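The restriction-site consistency check described above can be illustrated with a small Python sketch. It is only a sketch under stated assumptions, not the software used in the project: the function names, the 10% size tolerance and the toy sequence are hypothetical, the EcoRI recognition site GAATTC is used as an example enzyme, and each cut is placed at the start of the site for simplicity.

```python
from typing import List


def find_sites(seq: str, site: str) -> List[int]:
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    pos = seq.find(site)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(site, pos + 1)
    return positions


def fragment_lengths(seq_len: int, cut_positions: List[int]) -> List[int]:
    """Lengths of the fragments, in order, produced by cutting at the given positions."""
    edges = [0] + sorted(cut_positions) + [seq_len]
    return [right - left for left, right in zip(edges, edges[1:])]


def consistent_with_map(predicted: List[int], mapped: List[int],
                        tolerance: float = 0.1) -> bool:
    """True if both fragment lists have the same order and sizes within `tolerance`."""
    if len(predicted) != len(mapped):
        return False
    return all(abs(p - m) <= tolerance * m for p, m in zip(predicted, mapped))


if __name__ == "__main__":
    # Toy assembled sequence with two EcoRI sites (hypothetical data).
    assembled = "A" * 500 + "GAATTC" + "C" * 994 + "GAATTC" + "T" * 300
    predicted = fragment_lengths(len(assembled), find_sites(assembled, "GAATTC"))
    physical_map = [500, 1000, 300]  # fragment sizes from an independent map, in bp
    print(predicted, consistent_with_map(predicted, physical_map))
```

Comparing ordered fragment lists rather than sorted ones is deliberate: a misassembly that swaps two contigs keeps the same set of fragment sizes but changes their order.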
The sequences have been interpreted using the following principles:
i. All intron splice site/branch-point pairs detected by using specially defined patterns were listed
ii. All ORFs containing at least 100 contiguous sense codons and not contained entirely in a longer ORF were listed
iii. Centromere and telomere regions, as well as tRNA genes and Ty elements or remnants thereof, were sought by comparison with previously characterized datasets
• Similarity searches were carried out with FASTA, BLASTX and FLASH1 in combination with the Protein Sequence Database of PIR-International and other public databases
• Protein signatures were detected by using the PROSITE dictionary, as well as BLOCKS and PRODOM domains
• Analyses of base composition, nucleotide pattern frequencies, GC profiles and ORF distribution profiles were performed by using GCG programs or the X11 program package
• For calculations of the GC content of ORFs, the algorithm CODONS was used
• This information was compiled at the end of the sequencing project to annotate all genetic elements in the yeast genome
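Principle ii above (ORFs of at least 100 contiguous sense codons that are not contained entirely in a longer ORF) can be sketched in Python as follows. This is an illustrative sketch, not the project's annotation software. Several choices are assumptions made here: an ORF is taken to run from the first ATG after a stop codon to the next in-frame stop of the standard genetic code, "contained" is interpreted as coordinate containment on either strand, and all names (`Orf`, `find_orfs`, and so on) are hypothetical.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    start: int   # 0-based start on the forward strand
    end: int     # exclusive end on the forward strand (stop codon included)
    strand: str  # "+" or "-"
    codons: int  # number of sense codons (stop codon not counted)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def _scan_strand(seq: str, strand: str, min_codons: int) -> List[Orf]:
    """Scan the three reading frames of one strand for ATG...stop ORFs."""
    n = len(seq)
    found = []
    for frame in range(3):
        orf_start = None
        for pos in range(frame, n - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in STOP_CODONS:
                if orf_start is not None:
                    codons = (pos - orf_start) // 3
                    if codons >= min_codons:
                        begin, end = orf_start, pos + 3
                        if strand == "-":
                            # Map back to forward-strand coordinates.
                            begin, end = n - end, n - begin
                        found.append(Orf(begin, end, strand, codons))
                orf_start = None
            elif codon == "ATG" and orf_start is None:
                orf_start = pos  # keep the first ATG, i.e. the longest ORF
    return found


def find_orfs(seq: str, min_codons: int = 100) -> List[Orf]:
    """ORFs with >= min_codons sense codons, dropping any nested in a longer ORF."""
    seq = seq.upper()
    candidates = (_scan_strand(seq, "+", min_codons)
                  + _scan_strand(reverse_complement(seq), "-", min_codons))
    kept = []
    for orf in candidates:
        nested = any(
            other.start <= orf.start and orf.end <= other.end
            and (other.end - other.start) > (orf.end - orf.start)
            for other in candidates
        )
        if not nested:
            kept.append(orf)
    return sorted(kept)


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 120 + "TAA"  # hypothetical 121-codon ORF
    print(find_orfs(toy))
```

Keeping only the first ATG after each stop already removes shorter ORFs nested in the same frame; the final filter handles ORFs nested in a different frame or on the opposite strand.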
Classification of S. cerevisiae genes
ORF sizes in the S. cerevisiae genome
• At the time the yeast genome sequencing project was finalized, comparison of the total sequence with public databases revealed:
  • some 28.4% of the yeast ORFs corresponded either to previously known protein-encoding genes or to genes whose functions had been determined previously or during the course of the project
  • an estimated 5.6% of the total remained questionable ORFs
  • 66% of the total ORFs represented novel putative yeast genes
  • 14.8% of the total had homologues among gene products from yeast or other organisms whose functions are known
  • 14.4% of the total had recognizable motifs or weak homologies to genes of experimentally characterized functions
  • Remaining 37.7% of the total ORFs had either homologues to ORFs of unknown function on other
  • Thus, approximately 2200 of the yeast genes had to be categorized as 'genes of unknown function', sometimes called 'orphans'
• A most useful inventory of the yeast proteins has been compiled in the Yeast Proteome Database (YPD) [Garrels et al., 1996] and is updated regularly.
The mystery of orphans
• 'Orphans' are defined by the absence of known function and of structural homologs of known function, so it seems only natural that, with time, they will vanish.
• Functions of a few genes previously classified as orphans were reported during the sequencing project itself
• The most striking result from the chromosome III sequence was that approximately half of all protein-coding ORFs revealed by the sequence had no clear-cut sequence homologs in any organism, including yeast itself
• Thus, with the sequence of the first eukaryotic chromosome, it was the discovery of the extent of our ignorance, rather than the discovery of many new genes, that was the most conspicuous finding
• Exact figures depend on the stringency criteria applied to determine the significance of sequence similarities
• On average, 30-35% of all ORFs of the yeast genome are orphans.
• Even in the absence of homologs, computers can provide some clues about the nature of some orphans.
• For example, prediction of transmembrane segments resulted in the striking conclusion that up to 35-40% of the predicted proteins from chromosome III have transmembrane helices.
• Ultimately, the function of each sequence-predicted ORF can only be demonstrated by experiments
• The total number of orphans in the yeast genome is about 2000
• It is clear that orphans, by and large, are not fundamentally different from other yeast genes in terms of expression.
• If orphans are real genes, why were they not discovered before?
  • Genome redundancy is a possible explanation. As sequencing progressed, structural homologs to earlier orphans were regularly discovered in the yeast genome. Statistically, however, there is no indication that orphans tend to be more frequently duplicated than the genes previously characterized by classical genetics or their structural homologs. If anything, the converse seems to be true.
Gene Density and Gene Arrangement of Protein-encoding Genes in S. cerevisiae
• From the number of genes and the total size of the yeast genome one arrives at a gene density
• Gene density in all yeast chromosomes is rather similar
• Excluding the ORFs contributed by the Ty elements, ORFs occupy on average 70% of the sequences. This leaves only limited space for the intergenic regions, which can be thought to harbour the major regulatory elements involved in chromosome maintenance, DNA replication and transcription.
• The compact nature of the S. cerevisiae genome is apparent when compared to more complex eukaryotic systems.
• C. elegans contains a potential protein-encoding gene only every 5-6 kb [Hodgkin et al., 1994]
• In the human genome, gene density had been estimated to be as low as one gene in 30 kb [Olson, 1993]; now that the draft sequence is available, this figure is one gene in about 100 kb
• Schizosaccharomyces pombe possesses a lower gene density (one gene per 2.3 kb) than S. cerevisiae. The difference between the two yeast genomes appears to be due to the fact that in the fission yeast 40% of the genes contain introns, whereas only a minor fraction (< 5%) of the protein-encoding genes in S. cerevisiae is found to be interrupted by introns
• Generally, ORFs appear to be rather evenly distributed among the two strands of the single chromosomes. In some chromosomes (e.g. I, II, VIII), there is a slight excess of coding capacity on one of the strands, the significance of which is not known
• Average base composition of yeast DNA is 38.4% (G+C)
• GC content of:
1. protein-coding regions (40.2%)
2. non-coding regions (35.1%)
• Coding regions are evenly distributed between the two strands
• Average ORF size is 1450 bp
• The average sizes of inter-ORF regions vary between 630 and 945 bp for different chromosomes
1. 618 bp on average for 'divergent promoters' (36.2% GC)
2. 326 bp for 'convergent terminators' (29.3% GC)
3. 517 bp for 'promoter-terminator combinations' (34.2% GC)
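The three inter-ORF classes listed above follow from the transcriptional orientation of the two flanking genes. A minimal Python sketch of that classification is given below; the `Gene` record, the function names and the toy coordinates are hypothetical, and gene ends are treated as exclusive coordinates on the forward strand.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 0-based start on the forward strand
    end: int     # exclusive end on the forward strand
    strand: str  # "+" or "-"


def classify_intergenic(left: Gene, right: Gene) -> str:
    """Classify the region between two neighbouring genes by their orientation."""
    if left.strand == "-" and right.strand == "+":
        return "divergent promoters"       # both 5' ends face the region
    if left.strand == "+" and right.strand == "-":
        return "convergent terminators"    # both 3' ends face the region
    return "promoter-terminator combination"


def intergenic_regions(genes: List[Gene]) -> List[Tuple[str, str, int, str]]:
    """Return (left gene, right gene, length in bp, class) for each gap between genes."""
    ordered = sorted(genes, key=lambda g: g.start)
    regions = []
    for left, right in zip(ordered, ordered[1:]):
        length = right.start - left.end
        if length > 0:  # skip overlapping or abutting genes
            regions.append((left.name, right.name, length,
                            classify_intergenic(left, right)))
    return regions


if __name__ == "__main__":
    toy_genes = [  # hypothetical coordinates
        Gene("geneA", 0, 1000, "+"),
        Gene("geneB", 1300, 2500, "-"),
        Gene("geneC", 3100, 4000, "+"),
        Gene("geneD", 4500, 5500, "+"),
    ]
    for region in intergenic_regions(toy_genes):
        print(region)
```

Averaging the lengths per class over a full annotation would give figures comparable in kind to the ones quoted above.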
• Average base composition has been found to be symmetrical over the entire chromosomes
• The base composition of the ORFs themselves shows a significant excess of homopurine pairs on the coding strand
• Regional variations of base composition with similar amplitudes were first noted along chromosome III
• A most interesting observation was that the compositional periodicity correlates with local gene density, which reaches more than 85% in GC-rich regions, followed by segments of comparably lower gene density (50-55%) in AT-rich regions [Dujon et al., 1994].
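Regional variation in base composition of the kind described above is usually visualized as a sliding-window GC profile. The Python sketch below shows the idea; it is not the GCG or CODONS software mentioned earlier, and the function names, window size and step are hypothetical choices.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(window start, GC percentage) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "GGGGCCCC" * 5 + "AATTAATT" * 5  # GC-rich half followed by AT-rich half
    for start, gc in gc_profile(toy, window=20, step=20):
        print(start, gc)
```

Plotting such a profile next to the local coding fraction is one way to look for the correlation between GC-rich regions and gene density reported for chromosome III.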
• Functional elements of yeast chromosomes:
1. Centromere
2. Telomere
3. Origins of replication
• Complex and simple repeats
  • The yeast genome is remarkably poor in repeated sequences
  • A unique constellation of repetitious sequences is found at the two ends of chromosome I. Approximately 30 kb in each subtelomeric region carries similar (but non-essential) genes and a 15 kb repeat
  • These terminal regions represent the yeast equivalent of heterochromatin, and the occurrence of this type of DNA suggests that its presence gives this chromosome the critical length required for proper stability and function
  • The 30 kb region can be removed from each end without affecting vegetative growth, although chromosome stability is considerably reduced
  • Besides the Ty elements, it is the rDNA on chromosome XII that most significantly contributes to repetitiveness. A cluster of some 15 tandem repeats (2 kb each) containing the CUP1 gene and contributing to polymorphic variation is found on chromosome VIII
  • Repeated stretches of short oligonucleotides exist. These include poly(A) or poly(T) tracts, alternating poly(AT) or poly(TG) tracts, and direct or inverted long repeats
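The simple repeats named above (poly(A), poly(T), alternating poly(AT) and poly(TG) tracts) can be located with regular expressions. The following Python sketch is only an illustration: the function name and the minimum tract length of 10 bp are hypothetical choices, and it looks at one strand only.

```python
import re
from typing import List, Tuple


def find_simple_repeats(seq: str, min_len: int = 10) -> List[Tuple[str, int, int]]:
    """Return (tract type, start, end) for simple tracts of at least `min_len` bp."""
    seq = seq.upper()
    hits = []
    # Homopolymer tracts: runs of a single base.
    for base in "AT":
        for match in re.finditer(f"{base}{{{min_len},}}", seq):
            hits.append((f"poly({base})", match.start(), match.end()))
    # Alternating dinucleotide tracts: repeats of a two-base unit.
    units_needed = (min_len + 1) // 2
    for unit in ("AT", "TG"):
        for match in re.finditer(f"(?:{unit}){{{units_needed},}}", seq):
            hits.append((f"poly({unit})", match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    toy = "GCGC" + "A" * 12 + "GCGC" + "AT" * 6 + "GCGC" + "TG" * 5 + "CC"
    for hit in find_simple_repeats(toy):
        print(hit)
```

Running the same function on the reverse complement would pick up tracts such as poly(CA) that correspond to poly(TG) on the opposite strand.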

Genome Inventory of S. cerevisiae
Figure: Graphical view of protein-coding genes of S. cerevisiae (as of Nov 20, 2013)
Figure: Distribution of gene products among Biological Process categories
Figure: S. cerevisiae gene products that are annotated to one or more terms in each GO aspect
Figure: Distribution of gene products among Molecular Function categories
Figure: Distribution of gene products among Cellular Component categories
Genome Inventory of S. pombe
Figure: Panels labelled 2004 and 2013
Genome Inventory of C. albicans
Figure: Graphical view of protein-coding genes of C. albicans (as of Nov 20, 2013)
Figure: Distribution of gene products among Cellular Component categories
Figure: C. albicans gene products that are annotated to one or more terms in each GO aspect
Figure: Distribution of gene products among Biological Process categories
Figure: Distribution of gene products among Molecular Function categories
Feature type (total)          Saccharomyces cerevisiae   Schizosaccharomyces pombe   Candida albicans
No. of genes                  6,607                      5,123                       6,214
Chromosome length (bp)        12,157,105                 12,362,167                  14,324,315
Nuclear genome (bp)           12,071,326                 12,342,737                  14,283,895
Mitochondrial genome (bp)     85,779                     19,430                      40,420
No. of chromosomes            16                         3                           8
Mean coding length (bp)       1,485                      1,426                       1,439
No. of introns                272                        4,730                       224
Coding percentage             69.9%                      57.5%                       61.5%
Non-coding RNA                92                         450                         -
GC content                    39%                        36%                         33.46%
Gene density (bp per gene)    2,124                      2,528                       2,342
Unique proteins               1,104                      681                         1,218
Pseudogenes                   19                         29                          7
Centromere                    16                         3                           8
tRNA                          299                        171                         156
rRNA                          27                         47                          6
snRNA                         6                          7                           5
Table 1: Frequency and Characteristics of Short Tandem
Repeats in the Coding Sequences of Fungal Genomes

Table 2: Number, Abundance Ranking, and Proportion of Gene Products Containing the Indicated InterPro Protein Domain in yeast species and human
Genetic and Physical maps
• The genetic map of S. cerevisiae [Mortimer et al., 1992] has been of considerable value to yeast molecular biologists
• DNA probes from some known genes were mapped to particular chromosomes for chromosomal walking. Ultimately, however, physical maps of all chromosomes were constructed without reference to the genetic maps.
• Apart from local expansion or contraction of the genetic map, and the fact that the overall frequency of meiotic recombination increases as chromosome size decreases, the order of the genes positioned on the chromosomes by genetic and by physical mapping broadly agrees
• Thus, comparison of the physical and genetic maps shows that most of the linkages have been established to give the correct gene order, but that in many cases the relative distances derived from genetic mapping are imprecise. The obvious imprecision of the genetic maps may be due to the fact that different yeast strains have been used in establishing the linkages
Figure: Genetic and physical map of yeast chromosome II
Genetic redundancy in yeast
• There is a considerable degree of internal genetic redundancy in the yeast genome
• It is difficult to correlate physical redundancy completely with functional redundancy because, even in yeast, gene functions have been precisely defined only to a limited extent
• Duplicated sequences are confined to nearly the entire coding region of these genes and do not extend into the intergenic regions
• The corresponding gene products share high similarity in amino acid sequence, or sometimes are even identical, and may therefore be functionally redundant
• Due to sequence differences within the promoter regions, gene expression should vary according to the nature of the regulatory elements or other (regulatory) constraints; it may well be that one gene copy is highly expressed while another is weakly expressed; turning expression of a particular copy within a gene family on or off may depend on the differentiation status of the cell (such as mating type, sporulation, etc.)
• Classical examples of redundant genes in subtelomeric regions are the yeast MEL, SUC, MGL and MAL genes. Subtelomeric regions of several yeast chromosomes share highly conserved segments, in some instances up to 30 kb, which carry duplicated genes, the functions of which are largely unknown.
• Duplicated genes have also been found in clusters, e.g. on chromosome II, and a cluster of three hexose transporter genes on chromosome VIII
• Cluster Homology Regions (CHRs): comparison of the sequences of complete chromosomes with each other revealed large chromosome segments in which homologous genes are arranged in the same order, with the same relative transcriptional orientations, on two or more chromosomes. These regions account for 30-40% of total redundancy
• Chromosomes II and IV share the longest CHR, comprising a pair of pericentric regions of 170 and 120 kb, respectively, that share 18 pairs of homologous genes
• Significance: Whatever the relative timescale and mechanisms of duplications, these events, followed by mutations affecting functional properties, offer a chance of improved environmental fitness. On the other hand, the high gene density in yeast indicates a strong tendency to maintain a compact genome; therefore compensatory mechanisms must exist to remove non-functional or superfluous gene copies.
Figure: View of 53 clustered gene duplications between the 16 chromosomes of yeast
Table: Gene duplication in S. pombe and S. cerevisiae using NCBI BlastClust
Sequence Variation among Yeast Strains
• Polymorphisms among different yeast strains are due to the following factors:
1) variable number of gene copies from repeated gene families
2) individual patterns caused by the presence or absence of particular Ty elements
3) plasticity of the chromosome ends
4) excisions or inversions of particular gene regions
5) chromosome breakage, which has been found to occur in yeast, resulting in karyotypes deviating from the 'normal' picture
Yeast Mitochondrial genome
• The mitochondrial genes and their mosaic intronic structure were first identified in S. cerevisiae in 1998. The first mitochondrial gene ever sequenced was from S. cerevisiae
• The multi-copy mitochondrial genome of S. cerevisiae is characterized by:
  • low gene density and high A+T content
  • highly heterogeneous base composition
  • a G+C content of the genes of approximately 30%
  • intergenic spacers composed of quasi-pure A+T stretches of several hundred base pairs, interrupted by more than 150 (G+C)-rich clusters ranging from 10 to 80 bp in length (this shows why scientists have sequenced the genes and neglected the intergenic regions)
• The genome contains the genes for
  • cytochrome c oxidase subunits I, II and III (cox1, cox2 and cox3)
  • ATP synthase subunits 6, 8 and 9 (atp6, atp8 and atp9)
  • apocytochrome b (cytb) and a ribosomal protein (var1)
  • several intron-related open reading frames (ORFs)
• It also contains 7-8 replication origin-like (ori) elements and encodes the 21S and 15S ribosomal RNAs, 24 tRNAs that can recognize all codons, and the 9S RNA component of RNase P
• The cox1 gene and, to a lesser extent, the cytb, 21S RNA and 15S RNA genes constitute the largest blocks of higher G+C density
• The atp6, atp9, cox2, cox3 and tRNA genes appear as small G+C-enriched islands in the middle of A+T and G+C cluster-rich regions
Figure legend: Red - Exons; Grey - Introns; Yellow - rRNA; Green - tRNA; Dark blue - Ori elements
Human-Yeast connection
• By comparing the catalogue of human sequences available in the databases with the ORFs on the completed yeast chromosomes at the amino acid level, it is estimated that:
  • >30% of the yeast genes have homologues among the human genes
  • As expected, most of the genes of known function categorized in this way represent basic functions in both organisms
  • More similarities become apparent when ESTs are included in the analysis
  • The most compelling protagonists among these homologues are yeast genes that bear substantial similarity to human 'disease genes'
  • The yeast genome is 200 times smaller than the human one
  • The yeast genome is only 9-10 times less complex in its capacity to code for proteins
• Applications:
  • Yeast may be a simple system in which to assay novel drugs or ligands, in view of the conservation of some basic mechanisms between yeast and human cells
  • This conservation makes some yeast genes important for the study of human genetics
S. cerevisiae genes related to human disease genes

S. cerevisiae genes related to nucleotide excision repair (NER) genes
S. pombe genes related to human disease genes

S. pombe genes related to human cancer genes
Figure: Comparison of homologous genes from different species

Figure: Orthologs in different species
Figure: Comparison of proteins in S. pombe (S.p.), S. cerevisiae (S.c.) and C. elegans (C.e.)
(a) Pie chart comparing the homology of proteins of S. pombe with those of S. cerevisiae and
C. elegans; (b) Pie chart comparing the homology of proteins of S. cerevisiae with those of
S. pombe and C. elegans
S. cerevisiae had a sequence approximately 60 times larger than any sequence previously attempted, which indicates why Goffeau felt compelled to invite the cooperation of a group of laboratories.

At the time, the sequencing of model organisms such as S. cerevisiae appeared to be the logical step towards the eventual characterization of the human genome, a task that seemed beyond the scope of technology due to its tremendous size of 3,000 Mb.
Thank-you…
By:
Nazish Nehal,
M. Tech (Biotechnology),
University School of Biotechnology (USBT),
Guru Gobind Singh Indraprastha University,
New Delhi (INDIA)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Yeast genome project

  • 2. Introduction
    • Saccharomyces cerevisiae
      - It is perhaps the most useful diploid yeast, having been instrumental in winemaking, baking and brewing since ancient times
      - It is one of the most intensively studied eukaryotic model organisms in molecular and cell biology
      - Size: 5-10 µm in diameter
      - Sequenced in: 1996
      - Strain sequenced: S288C
      - Databases:
        Munich Information Centre for Protein Sequences (MIPS): http://www.mips.biochem.mpg.de/mips/yeast/
        Yeast Protein Database (YPD): http://quest7.proteome.com/YPDhome.html
        Saccharomyces Genome Database (SGD): http://genome-www.stanford.edu/Saccharomyces/
    • Schizosaccharomyces pombe (fission yeast)
      - It is used as a model organism in molecular and cell biology
      - Size: 3 to 4 µm in diameter and 7 to 14 µm in length
      - Sequenced in: 2002
      - Strain sequenced: 972h, by a European sequencing consortium (EUPOM) of 13 laboratories together with the Wellcome Trust Sanger Institute and Cold Spring Harbor Laboratory
      - Databases:
        PomBase: http://www.pombase.org
        Broad Institute (Schizosaccharomyces group): http://www.broadinstitute.org/annotation/genome/schizosaccharomyces_group/MultiHome.html
  • 3. Candida albicans
    • Most common human fungal pathogen
    • It is a diploid fungus that grows both as yeast and as filamentous cells, and a causal agent of opportunistic oral and genital infections in humans and of candidal onychomycosis, an infection of the nail plate
    • Size: 2.0-7.0 µm in diameter, 3.0-8.5 µm in length
    • Sequenced in: 2004, by a consortium formed around the Stanford Genome Technology Center
    • Strain sequenced: SC5314
    • Databases:
      Candida Genome Database: http://www.candidagenome.org
      Broad Institute (Candida group): http://www.broadinstitute.org/annotation/genome/candida_group/MultiHome.html
  • 4.
    • The baker's yeast Saccharomyces cerevisiae was the first eukaryote whose genome was entirely sequenced
    • Mitochondrial DNA was sequenced in segments in the 1980s
    • In 1989, it was decided to initiate a yeast sequencing project within the framework of the EU biotechnology programmes; some 35 European laboratories were initially involved in this enterprise [Vassarotti & Goffeau, 1992]
    • Chromosome III was the first chromosome to be completed, in 1992, followed by XI and II, both in 1994
    • Publication of the 315 kb sequence of yeast chromosome III was a remarkable scientific landmark, not only because it was the first eukaryotic chromosome ever to be sequenced, but primarily because it revealed the extent of what remained to be understood in the genome of an otherwise extensively studied organism such as Saccharomyces cerevisiae
    • Soon after the project began, several other laboratories joined it and agreed upon an international collaboration that enabled the whole yeast genome sequence to be finalized in 1995
    • More than 600 scientists in Europe, North America and Japan became involved in this effort, and the entire sequence was released in April 1996
  • 5.
  • 6. EU=55.9%, UK=17.6%, USA= 20.0%, Canada= 4.3%, Japan= 2.2% Figure: Consortia involved in the yeast genome sequencing project
  • 7. Cloning and Mapping Procedures
    • The sequencing of chromosome III started from a collection of overlapping plasmid or phage lambda clones that were distributed by the DNA coordinator to the contracting laboratories. However, it soon became evident that ordered cosmid libraries were much more advantageous for large-scale sequencing.
    • To construct a library with as complete a coverage as possible from as few clones as possible, the cloned DNA fragments should be randomly distributed on the DNA.
    • Under these conditions, the number of clones (N) in a library representing each genomic segment with a given probability (P) is
        N = ln(1 - P) / ln(1 - f)
      where f is the insert length expressed as a fraction of the genome size [Clarke & Carbon, 1976].
    • For example, with a size of 12,800 kb for the yeast genome and assuming an average insert length of 35 kb, a cosmid library containing 4600 random clones would represent the yeast genome at P = 99.99%, i.e. about twelve times the genome equivalent.
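The library-size formula on this slide can be tried out numerically. The sketch below is only an illustration: the function names are hypothetical, and the inputs (12,800 kb genome, 35 kb inserts, 4600 clones, P = 99.99%) are the figures quoted on the slide. Evaluated this way, the formula gives a minimum somewhat below 4600 clones, so the quoted library comfortably exceeds the 99.99% target while amounting to roughly twelve genome equivalents.

```python
import math

def clones_needed(p, insert_kb, genome_kb):
    """Smallest N with N >= ln(1 - P) / ln(1 - f), where f = insert / genome."""
    f = insert_kb / genome_kb            # insert length as a fraction of the genome
    return math.ceil(math.log(1 - p) / math.log(1 - f))

def coverage_probability(n_clones, insert_kb, genome_kb):
    """Probability that a given segment is present in a library of n random clones."""
    f = insert_kb / genome_kb
    return 1 - (1 - f) ** n_clones

def genome_equivalents(n_clones, insert_kb, genome_kb):
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_kb / genome_kb

GENOME_KB = 12_800   # yeast genome size used on the slide
INSERT_KB = 35       # average cosmid insert length used on the slide

n_min = clones_needed(0.9999, INSERT_KB, GENOME_KB)
p_4600 = coverage_probability(4600, INSERT_KB, GENOME_KB)
cov_4600 = genome_equivalents(4600, INSERT_KB, GENOME_KB)
print(n_min, p_4600, cov_4600)
```

The formula assumes randomly distributed inserts, as the slide states; real libraries with cloning bias would need more clones for the same probability.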
  • 8.
    • Keeping the number of clones low was of interest when setting up ordered yeast cosmid libraries, or when sorting chromosome-specific sublibraries out of an unordered cosmid library by colony hybridization, using chromosomal DNA purified by pulsed-field gel electrophoresis as a probe
    • The 'nested chromosomal fragmentation' method [Thierry & Dujon, 1992] was then applied for rapid sorting of these clones
    • Finally, a set of overlapping cosmids was sufficient to build a contig of a specific chromosome
    • This approach has also been successfully applied to many of the other chromosomes sequenced in the yeast genome project
    • To facilitate sequencing and assembly of the sequences, contigs of overlapping cosmids and fine-resolution physical maps of the respective chromosomes were constructed first, either by classical mapping methods (fingerprints, cross-hybridization) or by novel methods developed for this programme, such as site-specific chromosome fragmentation [Thierry & Dujon, 1992] or the high-resolution cross-hybridization matrix [Scholler et al., 1995]
  • 9. Sequencing strategies and Sequence Assembly
    • In the European network, clones were distributed to the collaborating laboratories according to a scheme worked out by the DNA coordinators
    • Each contracting laboratory was free to apply sequencing strategies and techniques of its own, provided that the sequences were entirely determined on both strands and unambiguous readings were obtained
    • Two principal approaches were used to prepare subclones for sequencing:
      1) generation of sub-libraries by the use of a series of appropriate restriction enzymes, or from nested deletions of appropriate sub-fragments made by exonuclease III
      2) generation of shotgun libraries from whole cosmids or subcloned fragments by random shearing of the DNA
    • Sequencing by the Sanger technique was done
      1) manually, labelling with [35S]dATP being the preferred method of monitoring
      2) by automated devices
    • Two types of devices for on-line detection with fluorescence labelling were employed:
      1) Applied Biosystems ABI373A
      2) Pharmacia A.L.F.
    • One laboratory used the direct blotting electrophoresis system from the GATC company (Konstanz)
    • Similar procedures were applied to the sequencing of chromosomes outside the European network. The American laboratories largely relied on machine-based large-scale sequencing.
  • 10. Sequencing Telomeres
    • The yeast chromosome telomeres presented a particular problem
    • Due to their repetitive sub-structures and the lack of appropriate restriction sites, they could not be cloned by conventional procedures, with only a few exceptions
    • For the most part, telomeres were physically mapped relative to the terminal-most cosmid inserts using the I-SceI chromosome fragmentation procedure [Thierry & Dujon, 1992]
    • The sequences were then determined from specific plasmid clones obtained by 'telomere trap cloning', an elegant strategy developed by E. Louis at Oxford [Louis, 1994; Louis & Borts, 1995]
  • 11. Sequence Assembly
    • Within the European network, all original sequences were submitted by the collaborating laboratories to the Martinsried Institute for Protein Sequences (MIPS), which acted as an informatics centre
    • Starting with chromosome XI, the sequences were kept in a data library, assembled into progressively growing contigs, and updated during the course of the project by applying appropriate criteria in a number of quality controls
    • The final chromosome sequences were derived in collaboration with the DNA coordinators
    • For the other yeast chromosomes, automated procedures were also employed for sequence assembly, based for example on
      1) the program package developed at Cambridge [e.g. Dear & Staden, 1991]
      2) the ACeDB program developed for the C. elegans genome project [Thierry-Mieg & Durbin, 1992]
    • In every case, correct assembly of the sequences was verified by establishing that the order of restriction sites predicted from the sequence was consistent with the physical maps of these sites, which had been determined independently, and care was taken to perform quality controls that would result in a high accuracy
    • From theoretical considerations taking all types of errors together, it follows that with an average sequence accuracy of 99.9%
    • In practice, care was taken to minimize frameshift errors, which represented about two thirds of all sequencing errors and thus would have the most deleterious effects on gene interpretation. Meanwhile, all sequences have been systematically checked for errors again and corrected in the data libraries.
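The restriction-site consistency check described on this slide can be sketched as follows. This is a toy illustration and not the project's actual software: the function names are hypothetical, cuts are placed at the start of each recognition site for simplicity, and the example uses the EcoRI recognition sequence GAATTC on a made-up sequence with an arbitrary size tolerance.

```python
def restriction_sites(seq, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def fragment_lengths(seq_len, positions):
    """Ordered fragment lengths of a linear molecule cut at the given positions."""
    cuts = [0] + sorted(positions) + [seq_len]
    return [b - a for a, b in zip(cuts, cuts[1:])]

def consistent_with_map(predicted, mapped, tolerance):
    """True if both fragment lists agree in number, order and size (within tolerance)."""
    return len(predicted) == len(mapped) and all(
        abs(p - m) <= tolerance for p, m in zip(predicted, mapped)
    )

# Toy assembled sequence with two EcoRI sites (made up for illustration)
assembled = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
sites = restriction_sites(assembled, "GAATTC")
predicted = fragment_lengths(len(assembled), sites)

# Hypothetical independently measured map, with small measurement error
print(consistent_with_map(predicted, [10, 25, 12], tolerance=2))
# The same fragments in a different order indicate a misassembly
print(consistent_with_map(predicted, [25, 10, 12], tolerance=2))
```

A mismatch in fragment order or number flags a region where the assembled contig and the independent physical map disagree and the assembly should be re-examined.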
  • 12. The sequences have been interpreted using the following principles:
      i. All intron splice site/branch-point pairs detected by using specially defined patterns were listed
      ii. All ORFs containing at least 100 contiguous sense codons and not contained entirely in a longer ORF were listed
      iii. Centromere and telomere regions, as well as tRNA genes and Ty elements or remnants thereof, were sought by comparison with previously characterized datasets
    • Similarity searches were run with FASTA, BLASTX and FLASH in combination with the Protein Sequence Database of PIR-International and other public databases
    • Protein signatures were detected by using the PROSITE dictionary, as well as BLOCKS and PRODOM domains
    • Analyses of base composition, nucleotide pattern frequencies, GC profiles and ORF distribution profiles were performed using GCG programs or the X11 program package
    • For calculations of the GC content of ORFs, the algorithm CODONS was used
    • This information was compiled at the end of the sequencing project to annotate all genetic elements in the yeast genome
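The ORF criterion in principle (ii) can be illustrated with a small scanner. This is a minimal sketch under stated assumptions, not the annotation pipeline used in the project: an ORF is taken to run from the first ATG after a stop codon to the next in-frame stop (which keeps the longest ORF in each stop-delimited stretch), ORFs that run off the end of the sequence are ignored, introns are not handled, and minus-strand coordinates refer to the reverse-complemented sequence. All function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons=100):
    """Yield (start, end, n_codons) for ORFs on one strand; end excludes the stop codon."""
    for frame in range(3):
        start = None                      # first ATG seen since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in STOPS:
                if start is not None and (pos - start) // 3 >= min_codons:
                    yield start, pos, (pos - start) // 3
                start = None
            elif codon == "ATG" and start is None:
                start = pos

def find_orfs(seq, min_codons=100):
    """Scan both strands in all three frames; returns (strand, start, end, n_codons) tuples."""
    seq = seq.upper()
    found = [("+", s, e, n) for s, e, n in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    found += [("-", s, e, n) for s, e, n in orfs_in_strand(rc, min_codons)]
    return found

# Toy sequence: ATG followed by 120 alanine codons and a stop (121 sense codons)
toy = "CC" + "ATG" + "GCT" * 120 + "TAA" + "GG"
print(find_orfs(toy))
```

Lowering `min_codons` recovers shorter candidates at the cost of many more spurious ORFs, which is why a threshold such as 100 codons is a trade-off rather than a biological boundary.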
  • 13. Classification of S. cerevisiae genes
  • 14.
  • 15. ORF sizes in the S. cerevisiae genome
  • 16. At the time the yeast genome sequencing project was finalized, comparison of the total sequence with public databases revealed:
    • Some 28.4% of the yeast ORFs corresponded either to previously known protein-encoding genes or to genes whose functions had been determined previously or during the course of the project
    • An estimated 5.6% of the total remained questionable ORFs
    • 66% of the total ORFs represented novel putative yeast genes:
      - 14.8% of the total had homologues among gene products from yeast or other organisms whose functions are known
      - 14.4% of the total had recognizable motifs or weak homologies to genes of experimentally characterized functions
      - The remaining 37.7% of the total ORFs had either homologues to ORFs of unknown function on other
    • Thus, approximately 2200 of the yeast genes had to be categorized as 'genes of unknown function', sometimes called 'orphans'
    • A most useful inventory of the yeast proteins has been compiled in the Yeast Proteome Database (YPD) [Garrels et al., 1996] and is updated regularly
  • 17. The mystery of orphans
    • 'Orphans' are defined by the absence of known function and of structural homologs of known function, so it seems only natural that, with time, they will vanish
    • Functions of a few genes previously classified as orphans were reported during the sequencing project itself
    • The most striking result from the chromosome III sequence was that approximately half of all protein-coding ORFs revealed by the sequence had no clear-cut sequence homologs in any organism, including yeast itself
    • Thus, with the sequence of the first eukaryotic chromosome, it was the discovery of the extent of our ignorance, rather than the discovery of many new genes, that was the most conspicuous finding
    • Exact figures depend on the stringency criteria applied to determine the significance of sequence similarities; on average, 30-35% of all ORFs of the yeast genome are orphans
    • Even in the absence of homologs, computers can provide some clues about the nature of some orphans. For example, prediction of transmembrane segments resulted in the striking conclusion that up to 35-40% of the predicted proteins from chromosome III have transmembrane helices
    • Ultimately, the function of each sequence-predicted ORF can only be demonstrated by experiments
    • The total number of orphans in the yeast genome is about 2000
    • It is clear that orphans, by and large, are not fundamentally different from other yeast genes in terms of expression
    • If orphans are real genes, why were they not discovered before? Genome redundancy is a possible explanation. As sequencing progressed, structural homologs to earlier orphans were regularly discovered in the yeast genome
    • Statistically, however, there is no indication that orphans tend to be more frequently duplicated than the genes previously characterized by classical genetics or their structural homologs. If anything, the converse seems to be true
  • 18. Gene Density and Gene Arrangement of Protein-encoding Genes in S. cerevisiae
    • The gene density follows from the number of genes and the total size of the yeast genome
    • Gene density is rather similar in all yeast chromosomes
    • Excluding the ORFs contributed by the Ty elements, ORFs occupy on average 70% of the sequences. This leaves only limited space for the intergenic regions, which can be thought to harbour the major regulatory elements involved in chromosome maintenance, DNA replication and transcription.
    • The compact nature of the S. cerevisiae genome is apparent when compared to more complex eukaryotic systems:
      - C. elegans contains a potential protein-encoding gene only every 5-6 kb [Hodgkin et al., 1994]
      - In the human genome, gene density had been estimated to be as low as one gene in 30 kb [Olson, 1993]; since the draft sequence became available, this figure is one gene in about 100 kb
    • Schizosaccharomyces pombe possesses a lower gene density (one gene per 2.3 kb) than S. cerevisiae. The difference between the two yeast genomes appears to be due to the fact that in the fission yeast 40% of the genes contain introns, whereas only a minor fraction (< 5%) of the protein-encoding genes in S. cerevisiae are found to be interrupted by introns
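As a rough back-of-the-envelope check, the gene density can be estimated from figures quoted in this deck: the 70% coding fraction above, the average ORF size of 1450 bp, and the 12,800 kb genome size used in the cosmid-library example. The sketch below is only an approximation under those assumptions (it treats every gene as one average-sized ORF), and the function names are hypothetical.

```python
AVG_ORF_BP = 1450        # average ORF size quoted in the deck
CODING_FRACTION = 0.70   # share of sequence occupied by ORFs (Ty elements excluded)
GENOME_BP = 12_800_000   # genome size used in the cosmid-library example

def bp_per_gene(avg_orf_bp, coding_fraction):
    """Average stretch of genome per gene if ORFs fill the given fraction of sequence."""
    return avg_orf_bp / coding_fraction

def estimated_gene_count(genome_bp, avg_orf_bp, coding_fraction):
    """Number of average-sized genes that fit into the genome at that density."""
    return genome_bp / bp_per_gene(avg_orf_bp, coding_fraction)

spacing = bp_per_gene(AVG_ORF_BP, CODING_FRACTION)
genes = estimated_gene_count(GENOME_BP, AVG_ORF_BP, CODING_FRACTION)
print(f"about one gene every {spacing / 1000:.1f} kb, roughly {genes:.0f} genes")
```

Under these assumptions the estimate comes out at about one gene every 2 kb, which is denser than the 2.3 kb per gene quoted for S. pombe and far denser than the C. elegans and human figures on the slide.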
  • 19.
  • 20.
    • Generally, ORFs appear to be rather evenly distributed among the two strands of the single chromosomes. In some chromosomes (e.g. I, II, VIII), there is a slight excess of coding capacity on one of the strands, the significance of which is not known
    • Average base composition of yeast DNA is 38.4% (G+C)
    • GC content of:
      1. protein-coding regions (40.2%)
      2. non-coding regions (35.1%)
    • Coding regions are evenly distributed between the two strands
    • Average ORF size is 1450 bp
    • The average sizes of inter-ORF regions vary between 630 and 945 bp for different chromosomes:
      1. 618 bp on average for 'divergent promoters' (36.2% GC)
      2. 326 bp for 'convergent terminators' (29.3% GC)
      3. 517 bp for 'promoter-terminator combinations' (34.2% GC)
    • Average base composition has been found to be symmetrical over the entire chromosomes
    • The base composition of the ORFs themselves shows a significant excess of homopurine pairs on the coding strand
    • Regional variations of base composition with similar amplitudes were first noted along chromosome III
    • A most interesting observation was that the compositional periodicity correlates with local gene density, reaching more than 85% in GC-rich regions, followed by segments of comparably lower gene density (50-55%) in AT-rich regions [Dujon et al., 1994]
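The base-composition figures on this slide are GC fractions, either over a whole region or along a chromosome. A minimal sketch of both calculations is given below; the function names, window size and toy sequence are illustrative assumptions, not the programs used in the project.

```python
def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases of seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0

def gc_profile(seq, window, step):
    """GC fraction in sliding windows; returns a list of (window_start, gc_fraction)."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]

# Toy sequence: a GC-rich stretch followed by an AT-rich stretch
toy = "GGCCGGCC" + "AATTAATT"
print(gc_content(toy))
print(gc_profile(toy, window=8, step=8))
```

Plotting such a profile along a chromosome with windows of a few kilobases is one way to visualise the alternation of GC-rich and AT-rich regions described for chromosome III.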
  • 21. Functional elements of the yeast chromosome:
      1. Centromere
      2. Telomere
      3. Origins of replication
    Complex and simple repeats
    • The yeast genome is remarkably poor in repeated sequences
    • A unique constellation of repetitious sequences is found at the two ends of chromosome I. Approximately 30 kb in each subtelomeric region carry similar (but nonessential) genes and a 15 kb repeat
    • These terminal regions represent the yeast equivalent of heterochromatin, and the occurrence of this type of DNA suggests that its presence gives this chromosome the critical length required for proper stability and function
    • The 30 kb region can be removed from each end without affecting vegetative growth, although chromosome stability is considerably reduced
    • Besides the Ty elements, it is the rDNA on chromosome XII that most significantly contributes to repetitiveness
    • A cluster of some 15 tandem repeats (2 kb each) containing the CUP1 gene and contributing to polymorphic variation is found on chromosome VIII
    • Repeated stretches of short oligonucleotides exist. These include poly(A) or poly(T) tracts, alternating poly(AT) or poly(TG) tracts, and direct or inverted long repeats
  • 23. Genome Inventory of S. cerevisiae
  • 24. Graphical View of Protein Coding Genes of S. cerevisiae (as of Nov 20, 2013): S. cerevisiae gene products that are annotated to one or more terms in each GO aspect
    Panels: Distribution of Gene Products among Biological Process Categories; among Molecular Function Categories; among Cellular Component Categories
  • 25. Genome Inventory of S. pombe 2004 2013
  • 26. Genome Inventory of C. albicans
  • 27. Graphical View of Protein Coding Genes of C. albicans (as of Nov 20, 2013): C. albicans gene products that are annotated to one or more terms in each GO aspect
    Panels: Distribution of Gene Products among Cellular Component Categories; among Biological Process Categories; among Molecular Function Categories
  • 28.
    Feature type (total)          Saccharomyces cerevisiae   Schizosaccharomyces pombe   Candida albicans
    No. of genes                  6,607                      5,123                       6,214
    Chromosome length (bp)        12,157,105                 12,362,167                  14,324,315
    Nuclear genome (bp)           12,071,326                 12,342,737                  14,283,895
    Mitochondrial genome (bp)     85,779                     19,430                      40,420
    No. of chromosomes            16                         3                           8
    Mean coding length (bp)       1485                       1426                        1439
    No. of introns                272                        4730                        224
    Coding percentage             69.9%                      57.5%                       61.5%
    Non-coding RNA                92                         450                         -
    GC content                    39%                        36%                         33.46%
    Gene density (bp per gene)    2124                       2528                        2342
    Unique proteins               1104                       681                         1218
    Pseudogenes                   19                         29                          7
    Centromere                    16                         3                           8
    tRNA                          299                        171                         156
    rRNA                          27                         47                          6
    snRNA                         6                          7                           5
  • 29. Table 1: Frequency and Characteristics of Short Tandem Repeats in the Coding Sequences of Fungal Genomes
    Table 2: Number, Abundance Ranking, and Proportion of Gene Products Containing the Indicated InterPro Protein Domain in yeast species and human
  • 30. Genetic and Physical maps
    • The genetic map of S. cerevisiae [Mortimer et al., 1992] has been of considerable value to yeast molecular biologists
    • DNA probes from some known genes mapped to particular chromosomes were used for chromosomal walking. Ultimately, however, physical maps of all chromosomes were constructed without reference to the genetic maps.
    • Apart from local expansion or contraction of the genetic map, and the fact that the overall frequency of meiotic recombination increases as chromosome size decreases, the orders of the genes positioned on the chromosomes by genetic and physical mapping broadly agree
    • Thus, the comparison of the physical and genetic maps shows that most of the linkages have been established to give the correct gene order, but that in many cases the relative distances derived from genetic mapping are imprecise. The obvious imprecision of the genetic maps may be due to the fact that different yeast strains have been used in establishing the linkages
  • 31. Genetic and Physical map of yeast chromosome II
  • 32. Genetic redundancy in yeast
    • There is a considerable degree of internal genetic redundancy in the yeast genome
    • It is difficult to correlate physical redundancy completely with functional redundancy because, even in yeast, gene functions have been precisely defined only to a limited extent
    • Duplicated sequences are confined to nearly the entire coding region of these genes and do not extend into the intergenic regions
    • The corresponding gene products share high similarity in terms of amino acid sequence, or sometimes are even identical, and may therefore be functionally redundant
    • Due to sequence differences within the promoter regions, gene expression should vary according to the nature of the regulatory elements or other (regulatory) constraints; it may well be that one gene copy is highly expressed while another is weakly expressed; turning expression of a particular copy within a gene family on or off may depend on the differentiated status of the cell (such as mating type, sporulation, etc.)
    • Classical examples of redundant genes in subtelomeric regions are the yeast MEL, SUC, MGL and MAL genes
    • Subtelomeric regions of several yeast chromosomes share highly conserved segments, in some instances up to 30 kb, which carry duplicated genes the functions of which are largely unknown
    • Duplicated genes have also been found in clusters, e.g. on chromosome II, and a cluster of three hexose transporter genes on chromosome VIII
    • Cluster Homology Regions (CHRs): comparing the sequences of complete chromosomes with each other revealed large chromosome segments in which homologous genes are arranged in the same order, with the same relative transcriptional orientations, on two or more chromosomes. These account for 30-40% of the total redundancy
    • Chromosomes II and IV share the longest CHR, comprising a pair of pericentric regions of 170 and 120 kb, respectively, that share 18 pairs of homologous genes
    • Significance: whatever the relative timescale and mechanisms of duplications, these events, followed by mutations affecting functional properties, offer a chance of improved environmental fitness. On the other hand, the high gene density in yeast indicates a strong tendency to maintain a compact genome; compensatory mechanisms must therefore exist to remove non-functional or superfluous gene copies.
  • 33. Figure: View of 53 clustered gene duplications between the 16 chromosomes of yeast
  • 34. Table: Gene duplication in S. pombe and S. cerevisiae using NCBI BlastClust
  • 35. Sequence Variation among Yeast Strains
    • Polymorphisms among different yeast strains are due to the following factors:
      1) variable numbers of gene copies from repeated gene families
      2) individual patterns caused by the presence or absence of particular Ty elements
      3) plasticity of the chromosome ends
      4) excisions or inversions of particular gene regions
      5) chromosome breakage, which has been found to occur in yeast, resulting in karyotypes deviating from the 'normal' picture
  • 36. Yeast Mitochondrial genome
    • The mitochondrial genes and their mosaic intronic structure were first identified in S. cerevisiae in 1998. The first mitochondrial gene ever sequenced was from S. cerevisiae
    • The multi-copy mitochondrial genome of S. cerevisiae is characterized by:
      - low gene density and high A+T content
      - highly heterogeneous base composition
      - a G+C content of the genes of approximately 30%
      - intergenic spacers composed of quasi-pure A+T stretches of several hundred base pairs, interrupted by more than 150 (G+C)-rich clusters ranging from 10 to 80 bp in length (this shows why scientists have sequenced the genes and neglected the intergenic regions)
    • The genome contains the genes for:
      - cytochrome c oxidase subunits I, II and III (cox1, cox2 and cox3)
      - ATP synthase subunits 6, 8 and 9 (atp6, atp8 and atp9)
      - apocytochrome b (cytb)
      - a ribosomal protein (var1)
      - several intron-related open reading frames (ORFs)
      It also contains 7-8 replication origin-like (ori) elements and encodes the 21S and 15S ribosomal RNAs, 24 tRNAs that can recognize all codons, and the 9S RNA component of RNase P
    • The cox1 gene and, to a lesser extent, the cytb, 21S RNA and 15S RNA genes constitute the largest blocks of higher G+C density
    • The atp6, atp9, cox2, cox3 and tRNA genes appear as small G+C-enriched islands in the middle of A+T and G+C cluster-rich regions
  • 37. Red- Exons; Grey- Introns; Yellow- rRNA; Green- tRNA; Dark blue- Ori elements
  • 38. Human-Yeast connection
    • By comparing the catalogue of human sequences available in the databases with the ORFs on the completed yeast chromosomes at the amino acid level, it is estimated that:
      - More than 30% of the yeast genes have homologues among the human genes
      - As expected, most of the genes of known function categorized in this way represent basic functions in both organisms
      - More similarities become apparent when ESTs are included in the analysis
      - The most compelling protagonists among these homologues are yeast genes that bear substantial similarity to human 'disease genes'
      - The yeast genome is 200 times smaller than the human one
      - The yeast genome is only 9-10 times less complex in its capacity to code for proteins
    • Applications:
      - Yeast may be a simple system in which to assay novel drugs or ligands, in view of the conservation of some basic mechanisms between yeast and human cells
      - This conservation makes some yeast genes important for the study of human genetics
  • 39. S. cerevisiae genes related to human disease genes; S. cerevisiae genes related to nucleotide excision repair (NER) genes
  • 40. S. pombe genes related to human disease genes; S. pombe genes related to human cancer genes
  • 41. Figure: Comparison of homologous genes from different species Figure: Orthologs in different species
  • 42. Figure: Comparison of proteins in S. pombe (S.p.), S. cerevisiae (S.c.) and C. elegans (C.e.) (a) Pie chart comparing the homology of proteins of S. pombe with those of S. cerevisiae and C. elegans; (b) Pie chart comparing the homology of proteins of S. cerevisiae with those of S. pombe and C. elegans
  • 43.
    • The S. cerevisiae genome was approximately 60 times larger than any sequence previously attempted, which explains why Goffeau felt compelled to invite the cooperation of a group of laboratories
    • At the time, the sequencing of model organisms such as S. cerevisiae appeared to be the logical step towards the eventual characterization of the human genome, a task that seemed beyond the scope of the technology due to its tremendous size of 3,000 Mb
  • 44. Thank-you… By: Nazish Nehal, M. Tech (Biotechnology), University School of Biotechnology (USBT), Guru Gobind Singh Indraprastha University, New Delhi (INDIA)