How to Assemble a Human Genome? Mix generous
amounts of Junk DNA and Indifferent DNA, add a Dollop
of Garbage DNA and a Sp...
Dan Graur 	

(until 5 September 2012)
Dan Graur 	

(from 6 September 2012 to the present time)
In September 2012, 30 papers based on thousands
of data sets were simultaneously published in high
profile journals to desc...
The main finding of the main paper was
picked up by news outlets all over the world.
And what was the main finding of the main
paper?
80% of the human genome is functional
442 authors + 594 collaborators
On the thirtieth day
of the month of
September, in the
Year of our Lord
2012, it was
announced that “junk
DNA” is “dead.”	...
An epic media spin

Compiled by T. Ryan Gregory, Genomicron
An epic media spin

(Una manipulación mediática épica)
Creationists had a ball
Three problems: (1) If the human genome is indeed devoid
of junk DNA as implied by the ENCODE project, then a
long, undire...
Three problems: (2) If ENCODE is right, then humans are
the Goldilocks of the living world.	


Organism	


C-value	


Junk...
Three problems: (3) If ENCODE-2012 is right, then
ENCODE-2011 is wrong.	


80% of the human genome is functional
...
Solution: Kill ENCODE
We wrote a critical piece on
ENCODE, and got a very negative
review from Trends in Genetics.	

angry?	

	

insane?	

“Grau...
How did ENCODE reach the conclusion that 80%
of the human genome is functional, when the
evidence for selection constraint...
Wrong experimental systems.	

A huge chunk of ENCODE data is derived from
HeLa cells and other cancer cells. 	

Does the H...
Wrong experimental systems.
Wrong experimental systems.	


Landry et al. 2013
•  Equating hype with science.
6 September 2012


The birth of 80%	


“These data enabled us
to assign biochemical
functions for 80% of
the g...
6 September 2012


Implication that 80% may be 99%	


“The vast majority (80.4%) of the human genome
participa...
“junk DNA” is dead!	

6 September 2012
“junk DNA” is dead!
99% is not enough, 100% is better	


ENCODE researcher Ewan Birney tells Ed
Yong that the 80 percent figure will
incre...
The PR machine at work:	

“Virtually all of the DNA passed down from
generation to generation has been kept for a
reason.”...
99% disappears and 80% becomes 40%	


I [went] back to ENCODE biologist John
Stamatoyannopoulos, who was quoted in the firs...
40% is actually 9%
(The origin of 9%: 5% + 4% = 9%)
(Oops: 9% reverts back to 5%)
9% becomes 20%
20% kills “junk DNA”
•  20% of the genome is functional.
•  Ergo, 80% must be junk.
•  Yet, “junk DNA” should...
At the end of 2012, 20%
becomes the favorite
number in Nature
Science remains loyal to
80%.
[Diagram: Genome; Transcribed; Translated; Nontranscribed; Nontranslated]
Information flow within the genome
[Diagram: Genome; Transcribed; Nontranscribed; DNA; Translated; protein; Nontranslated; RNA]
Information flow within t...
[Diagram: Genome; Nontranscribed; Transcribed; Functional; Nontranslated; Translated; Functional; Junk; Junk; Fun...]
[Diagram: Genome; Functional; Junk; nonfunctional]
Junk has nothing to do with non-protein-coding.
Junk is about function...
[Diagram: Genome; Functional; ad hoc; Junk; ad hoc]
[Diagram: Genome; Functional; Junk; = Pseudogene]
[Diagram: Genome; Functional; = Lazarus DNA; Junk]
[Diagram: Lazarus DNA; Emmaus DNA; Zombie DNA]
If the acquired function (Lazarus DNA) lowers the fitness of the carriers, it is called zombie DNA.
If the acquired function (Lazarus DNA) is advantageous, it is called Emmaus DNA.
[Diagram: Genome; Functional; Nontranscribed; Transcribed; Junk; Transcribed; Nontranscribed]
[Diagram: Genome; Functional; Nontranscribed; Transcribed; Junk; Transcribed; Nontranscribed; Transcriptome]
Not al...
[Diagram: Genome; Functional; Transcribed; Untranslated; Junk; Untranscribed; Untranscribed; Transcribed; Translat...]
THE ORIGIN OF A SPECIES (smart & elegant)	

	

Seiko Astron: Like a smartphone, the Astron is
GPS-enabled, allowing it to ...
Actually, Darwin would not be proud. 	

Evolution does not produce “smart and elegant”	

This is an intelligently designed...
The story of the human genome: 	

1998 to the present
15 February 2001
	


1st draft
21 October 2004
	


From 30,000 protein-coding
genes to less than 25,000.
Genebuild last updated/patched: May 2012
Total length: 3,287,209,763 bp
Protein-coding genes: 21,065
Pseudogenes: ...
The “end” of the Human Genome Project in 2004
($3.8 Billion) was a big disappointment for
scientists unversed in evolution...
Small

	


<<	

3.5 billion letters in a four-letter alphabet = 7 billion
bits = 0.81 GB (gigabytes)


1 DVD = 8.5 GB	


I...
Small

	


≈	

1 DVD = 8.5 GB	

3.5 billion letters in a four-letter alphabet = 7 billion bits = 0.81 GB (gigabytes) ...
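The information-content arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch, assuming 2 bits per nucleotide (four equally likely letters); the function name `genome_information` is made up for illustration. It also shows that the slide's 0.81 figure corresponds to binary gigabytes (2^30 bytes); in decimal gigabytes (10^9 bytes) the same quantity is 0.875.

```python
import math

def genome_information(n_bases, alphabet_size=4):
    """Return (bits, decimal GB, binary GiB) for a sequence of n_bases letters.

    Assumes every letter carries log2(alphabet_size) bits, i.e. no compression
    and equal letter frequencies (an idealisation, not a biological claim).
    """
    bits_per_base = math.log2(alphabet_size)  # 2 bits for A, C, G, T
    bits = n_bases * bits_per_base
    n_bytes = bits / 8
    return bits, n_bytes / 10**9, n_bytes / 2**30

bits, gb_decimal, gib_binary = genome_information(3.5e9)
print(f"{bits:.1e} bits")
print(f"{gb_decimal:.3f} GB (10^9 bytes)")
print(f"{gib_binary:.2f} GiB (2^30 bytes)")

# Share of the 8.5 GB DVD quoted on the slide (decimal units).
DVD_GB = 8.5
print(f"{gb_decimal / DVD_GB:.1%} of one DVD")
```

Either way, the whole genome fits into roughly a tenth of the DVD quoted on the slide.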
Sparsely populated with genes.

	


Organism

Gene Density
(# genes per 1 Mb)

Escherichia coli (bacterium)

911

Saccharo...
Densely populated with dead transposable elements

	


45-67% 	


plus at most 0.1% for RNA-specifying genes (non-coding ...
Densely populated with dead transposable elements
Unoriginal

	


0.15% nonsynonymous differences	

1.22% synonymous differences	


Cost of sequencing your human genome 	

...
Comparing the human genome to other genomes
has given rise to three complexity paradoxes.*
Genomic paradox = A lack of cor...
Defining complexity is difficult	

The complexity of a system may
be defined by the minimum
number of independent
characters ...
Defining complexity is difficult	


Thus the wall on the right is more complex (it has a crack) than the wall on the left. ...
However, even if we cannot quantify organismal
complexity very well, in many cases, it is possible
to state unequivocally ...
K-value paradox: Complexity
does not correlate with
chromosome number.	

Homo sapiens	


46	


Lysandra atlantica	


Ophio...
C-value paradox: Complexity
does not correlate with
genome size.	


3.5 × 10^9 bp
Homo sapiens

1.5 × 10^10 bp
Allium cepa

...
G-value paradox:
Complexity does not
correlate with protein-coding gene number.


	

~21,000	


~21,000	


~57,000	


80
	...
Total Number of Protein-Coding Genes	

	

Drosophila melanogaster (fruitfly) 	

	

13,917	

Pan troglodytes (chimpanzee) 	
...
Mommy, mommy, a fern has 	

27 times as many chromosomes
as I do; an amoeba has 200
times more DNA than I do; 	

and wheat...
What is and what isn’t “junk DNA”	

There are known knowns; there are
things we know that we know. 	

	

There are known u...
“Junk DNA” misrepresented as a “known unknown”
What is and what isn’t “junk DNA”	

Junk DNA is a known known; it is a thing
that we know what it does—it takes space.
Jun...
Junk DNA is a consequence of population genetics considerations!	

In organisms with
LARGE effective
population sizes, the...
Genomic anthropocentrism?
Human exceptionalism?
W. Ford Doolittle (2013)
“What would we expect for the number...
•  A peculiar definition of function.	

•  A peculiar definition of junk.	

•  A lack of evolutionary perspective.
In biology, there are two main concepts
of function: 	

	

•  A historical concept of function, also
referred to as the “s...
What is the function of the heart? 	


The proper function
is to pump blood.
What is the function of the heart? 	


The proper function
is to pump blood. 	


The causal functions of the heart are to
...
•  Evolutionary biologists use the proper
or selected effect function.	

•  ENCODE used the causal function.	


“Operation...
•  An example of a function that fits the
ENCODE definition: shoes binding
chewing gum.	


“Operationally, we define a
functi...
“By the logic employed by ENCODE, following a
collision between a car and a pedestrian, a car’s
bonnet would be ascribed t...
ENCODE uses a known logical fallacy called
affirming the consequent.

If a functional sequence is transcribed, 	

	

then,...
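Why affirming the consequent is invalid can be checked by brute force. The sketch below is an illustration only, not anything taken from ENCODE or from the slide: it enumerates every truth assignment for two hypothetical propositions, "functional" and "transcribed", keeps the ones consistent with the premise "if functional, then transcribed", and looks for cases where a sequence is transcribed yet not functional.

```python
from itertools import product

def implies(p, q):
    """Material implication: 'if p then q'."""
    return (not p) or q

# Every (functional, transcribed) combination that satisfies the premise
# "if a sequence is functional, then it is transcribed".
consistent_worlds = [
    (functional, transcribed)
    for functional, transcribed in product([False, True], repeat=2)
    if implies(functional, transcribed)
]

# Affirming the consequent would need: transcribed => functional.
# Any world with transcribed=True and functional=False refutes that step.
counterexamples = [
    (functional, transcribed)
    for functional, transcribed in consistent_worlds
    if transcribed and not functional
]

print("worlds consistent with the premise:", consistent_worlds)
print("transcribed but not functional:", counterexamples)
```

A single surviving counterexample is enough: the premise leaves room for transcribed sequences that have no function, so observing transcription cannot establish function on its own.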
The ENCODE Project:	

74.7% of the genome is transcribed,	

56.1% is associated with modified histones,
15.2% is found in o...
Our additions to ENCODE 	

74.7% of the genome is transcribed,	

56.1% is associated with modified histones,
15.2% is found...
Interesting Question: 	

	

Why do people have problems with DNA that
has no function?
Inability to deal with randomness	

“… nothing is so alien to the human mind
as the idea of randomness.”	

John Cohen. 196...
People like mysteries:
such as hidden messages
in the Bible.
If you search long enough and hard enough for patterns in random texts,
you will find patterns. Especially if you do not em...
The Bible Code employs no negative controls. Someone else did and
they found similar “prophecies” in Moby Dick by Herman M...
ENCODE has no negative controls.	

	

Mike White provided them and showed in a paper
published in PNAS that random DNA seq...
What is “junk”?
“Some years ago I noticed that there are two kinds of
rubbish in the world and that most languages have
different words to...
“Were the extra DNA to become disadvantageous, it
would become subject to selection, just as junk that
takes up too much s...
Graur’s garage: Functional but full of junk	


A garage according to ENCODE 	


A garage in which junk became garbage
Junk can Sometimes be Repurposed
Junk DNA can Sometimes be Repurposed	


Norihiro Okada & Jürgen
Brosius, specialists in the
repurposing of junk DNA.
Functional DNA ✔	

Junk DNA ✔	

Garbage DNA ✔	

Lazarus DNA ✔	

Indifferent DNA	

Dark DNA
Sequence-indifferent DNA or
indifferent DNA refers to DNA sites that
are functional, but show no evidence of
selection aga...
Examples of indifferent DNA
are spacers and flanking
elements whose presence is
required but the sequence is not
important....
Dark DNA refers to the fraction of the genome for
which no good evidence exists as to its evolutionary
impact on fitness. 	...
Interesting Question: 	

	

How can one tell if a certain genomic sequence is
functional or not?	


Can we make the car on...
[Figure labels: Mutation (repeated)]
Functional DNA
(almost all mutations...
[Figure labels: Mutation (repeated)]
Nonfunctional
DNA
(all mutations are...
How do we know if a particular genomic
sequence is functional? 	

Since most mutations in functional regions are
deleterio...
Another indicator for the existence of a
genomic function is that losing it has some
consequence for the organism. 	

	

E...
Is it even possible that ENCODE is right?	

	

No! The main reason being that in humans, there is a huge
difference betwee...
Is it even possible that ENCODE is right?	

	

Under such conditions selection is inefficient and most
genetic variation is...
Fact 1: It has been known for more than a century that the vast
majority of non-neutral mutations are deleterious (Thomas ...
Motoo Kimura: Mutation rate cannot reach zero,
because of the COST OF FIDELITY. 	

	

In other words, the mutation rate in...
How did ENCODE reach such ridiculous
numbers?	

	

1.  It used methodologies encouraging biased
errors in favor of inflatin...
Example:	

	

Transcription factor binding sites (TFBSs):

	

So far, almost all known TFBSs range in length from 6

to 1...
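How often a short motif turns up by chance alone can be estimated with a simple null model. This is a rough sketch under stated assumptions, not ENCODE's procedure: bases are treated as independent and equally frequent, only one strand is scanned, the motif lengths tried (6, 8, 10) are chosen for illustration, the genome length is the 3.5 × 10^9 bp quoted elsewhere in the deck, and `HYPOTHETICAL_MOTIF` is an arbitrary 6-mer rather than a real binding site.

```python
import random

GENOME_LENGTH = 3.5e9  # bp, figure quoted elsewhere in this deck

def expected_chance_matches(seq_length, motif_length):
    """Expected occurrences of one fixed motif in a uniform random sequence."""
    positions = seq_length - motif_length + 1
    return positions / 4 ** motif_length

def count_matches(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    k = len(motif)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif)

# Expected chance hits in a genome-sized random sequence.
for k in (6, 8, 10):
    print(f"{k}-mer: about {expected_chance_matches(GENOME_LENGTH, k):,.0f} chance matches")

# Small simulated negative control: random DNA, arbitrary 6-mer.
rng = random.Random(0)
random_dna = "".join(rng.choices("ACGT", k=400_000))
HYPOTHETICAL_MOTIF = "GATAAG"  # illustrative only
observed = count_matches(random_dna, HYPOTHETICAL_MOTIF)
expected = expected_chance_matches(len(random_dna), len(HYPOTHETICAL_MOTIF))
print(f"random 400 kb sequence: observed {observed}, expected about {expected:.0f}")
```

Under this null model a 6-bp motif is expected to appear several hundred thousand times in a genome-sized random sequence, which is why a match by itself says little about function.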
How did ENCODE reach such ridiculous
numbers?	

	

1.  It used methodologies encouraging biased
errors in favor of inflatin...
ENCODE prefers false positives over false
negatives, thus inflating the proportion of
positives.
Example:	

	

ENCODE used a probability based alignment tool, and mapped RNA
transcripts onto DNA when the statistical con...
How did ENCODE reach such ridiculous
numbers?	

	

1.  It used methodologies encouraging biased
errors in favor of inflatin...
“Derived allele frequency spectrum for primate-specific elements, with
variations outside ENCODE elements in black and vari...
p = 10^−37

Magnitude of effect = 0.042% 	


“Derived allele frequency spectrum for primate-specific elements, with
variati...
Let’s examine the rationale and the
methodology for dealing with the
derived allele frequency spectrum
in primate-specific ...
The Why	

•  If all alleles are neutral, a certain
frequency distribution is expected.	

•  If some alleles are under nega...
The Why	

•  To deal with very short periods of
evolutionary time, ENCODE decided to use
primate specific sequences.
[Figure labels: human; human; human; chimpanzee; gorilla; macaque; rat; mouse]
Primate Specific Sequences
What is missing from the derived allele
frequency spectrum of primate-specific
elements in ENCODE?	

	

Genes!	

	

3,296,4...
Missing populations and their effect on
estimates of derived alleles and
ancestral alleles.	

	

Three human populations w...
[Figure: Caucasians. Derived allele frequency (%): OUT, 40, 60, 60. Primate Specific Sequences. Ancestral alleles. As...]
[Figure: Yoruba. Derived allele frequency (%): OUT, 100, 20, 0]
The ENCODE data includes 2,136
alleles with frequencies of exac...
ENCODE uses multifurcated trees	


Frequency of derived allele = 40%
ENCODE uses multifurcated trees	


Frequency of derived allele < 40%
ENCODE uses only single species from
primates. 	


There are no derived alleles
p = 10^−37

Magnitude of effect = 0.042% 	


“Derived allele frequency spectrum for primate-specific elements, with
variati...
Unwarranted extrapolations:	

Badly trained technicians
tend to “kill” junk DNA
whenever they find a new
function in non-co...
Even supposing that all the 55,000 putative lincRNAs in
this paper are functional and important, then 	

55,000 × 2000 bp ...
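The multiplication started on this slide is easy to carry through. The slide's own total is cut off in this transcript, so the sketch below only performs the arithmetic; dividing by a genome size is an added step, and the two genome sizes used are the ones quoted elsewhere in the deck (3.5 × 10^9 bp and the April 2013 Genebuild total length).

```python
PUTATIVE_LINCRNAS = 55_000      # from the slide
LENGTH_PER_LINCRNA_BP = 2_000   # from the slide

total_lincrna_bp = PUTATIVE_LINCRNAS * LENGTH_PER_LINCRNA_BP

# Genome sizes quoted elsewhere in this deck (which one the slide used is unknown).
GENOME_SIZES_BP = {
    "3.5 x 10^9 bp (C-value slide)": 3.5e9,
    "Genebuild April 2013 total length": 3_320_602_130,
}

fractions = {label: total_lincrna_bp / size for label, size in GENOME_SIZES_BP.items()}

print(f"total lincRNA sequence: {total_lincrna_bp:,} bp")
for label, fraction in fractions.items():
    print(f"{label}: {fraction:.1%} of the genome")
```

Even on the generous assumption that every putative lincRNA is functional, the total comes to a few percent of the genome under either size.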
Conclusion: Badly trained
technicians who do not understand
(1) population genetics, (2) the
concept of effective populati...
6 September 2012
442 researchers + 288 million dollars.
What have we learned from ENCODE?
“Data is not information, information is
not knowledge, knowledge is not
wisdom, wisdom is not truth,” 	


	

—Robert Roya...
The onion test is a simple reality check for
anyone who thinks they have come up with a
universal function for 80% of the...
“All science is either physics or
stamp collecting.” 	

Ernest Rutherford	

	

“ENCODE is stamp collecting.”	

Roderic Gui...
Acknowledgments: The Good Guys	

Coauthors: Ricardo Azevedo, Becky Zufall, Nicholas Price, and
Yichen Zheng (UH), and Eran...
Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, December 2013)

How to Assemble a Human Genome? Mix generous amounts of Junk DNA and Indifferent DNA, add a dollop of Garbage DNA and a sprinkling of Functional DNA (Lazarus DNA optional)


  1. 1. How to Assemble a Human Genome? Mix generous amounts of Junk DNA and Indifferent DNA, add a Dollop of Garbage DNA and a Sprinkling of Functional DNA (Lazarus DNA optional) Dan Graur University of Houston
  2. 2. Dan Graur (until 5 September 2012)
  3. 3. Dan Graur (from 6 September 2012 to the present time)
  4. 4. In September 2012, 30 papers based on thousands of data sets were simultaneously published in high profile journals to describe the major findings from the ENCODE project.
  5. 5. The main finding of the main paper was picked up by news outlets all over the world.
  6. 6. And what was the main finding of the main paper?
  7. 7. And what was the main finding of the main paper? 80% of the human genome is functional
  8. 8. 442 authors + 594 collaborators
  9. 9. On the thirtieth day of the month of September, in the Year of our Lord 2012, it was announced that “junk DNA” is “dead.”
  10. 10. An epic media spin
  11. 11. An epic media spin
  12. 12. An epic media spin An example of epic media spin
  13. 13. An epic media spin Compiled by T. Ryan Gregory, Genomicron
  14. 14. An epic media spin (Una manipulación mediática épica)
  15. 15. Creationists had a ball
  16. 16. Three problems: (1) If the human genome is indeed devoid of junk DNA as implied by the ENCODE project, then a long, undirected evolutionary process cannot explain the human genome. If, on the other hand, organisms are designed, then all DNA, or as much as possible, is expected to exhibit function. If ENCODE is right, then Evolution is wrong.
  17. 17. Three problems: (2) If ENCODE is right, then humans are the Goldilocks of the living world.
      Organism | C-value | Junk | Complexity
      Tetraodon fluvialis (pufferfish) | 0.35 | No | Primitive
      Hyla nana (frog) | 1.89 | No | Primitive
      Homo sapiens (human) | 3.5 | No | Pinnacle of Creation
      Extatosoma tiaratum (insect) | 8.0 | Yes | Primitive
      Allium cepa (onion) | 16.75 | Yes | Primitive
      Protopterus aethiopicus (lungfish) | 132.83 | Yes | Primitive
      Paris japonica (canopy plant) | 152.20 | Yes | Primitive
  18. 18. Three problems: (3) If ENCODE-2012 is right, then ENCODE-2011 is wrong. 80% of the human genome is functional. Evolutionary constraint indicates that the fraction of the human genome that is functional is ~5%. Nature. 2011. 478:476-482
  19. 19. Solution: Kill ENCODE
  20. 20. We wrote a critical piece on ENCODE, and got a very negative review from Trends in Genetics. angry? insane? “Graur is mad, and not entirely without cause.” “It would be good for Trends in Genetics to publish a reasoned and dispassionate critical essay on this topic, preferably by someone of Graur’s stature, but not him.” 192 cm , 6’2”, 115 kg, 254 lb
  21. 21. How did ENCODE reach the conclusion that 80% of the human genome is functional, when the evidence for selection constraint is ~5%? •  Equating hype with science. •  Wrong experimental systems. •  Inappropriate statistical analyses. •  A peculiar definition of function. •  A peculiar definition of junk. •  A lack of evolutionary perspective. •  A lack of objectivity about the study organism. •  Ignorance of everything that came before ENCODE.
  22. 22. Wrong experimental systems. A huge chunk of ENCODE data is derived from HeLa cells and other cancer cells. Does the HeLa karyotype look human to you?
  23. 23. Wrong experimental systems.
  24. 24. Wrong experimental systems. Landry et al. 2013
  25. 25. •  Equating hype with science.
  26. 26. 6 September 2012. The birth of 80%: “These data enabled us to assign biochemical functions for 80% of the genome…”
  27. 27. 6 September 2012. Implication that 80% may be 99%: “The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. Much of the genome lies close to a regulatory event: 95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction..., and 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.”
  28. 28. “junk DNA” is dead! 6 September 2012
  29. 29. “junk DNA” is dead!
  30. 30. 99% is not enough, 100% is better ENCODE researcher Ewan Birney tells Ed Yong that the 80 percent figure will increase, possibly reaching 100 percent. “We don’t really have any large chunks of redundant DNA,” Birney says. “This metaphor of junk isn't that useful.”
  31. 31. The PR machine at work: “Virtually all of the DNA passed down from generation to generation has been kept for a reason.” An intelligent God, perhaps?
  32. 32. 99% disappears and 80% becomes 40% I [went] back to ENCODE biologist John Stamatoyannopoulos, who was quoted in the first wave of news. He said he thought the skeptics hadn’t fully understood the papers… He did admit that the press conference misled people by claiming that 80% of our genome was essential and useful. He puts that number at 40%. Otherwise he stands by all the ENCODE claims. (Faye Flam)
  33. 33. 40% is actually 9%
  34. 34. (The origin of 9%: 5% + 4% = 9%)
  35. 35. (Oops: 9% reverts back to 5%)
  36. 36. 9% becomes 20%
  37. 37. 20% kills “junk DNA”
  38. 38. 20% kills “junk DNA” •  20% of the genome is functional. •  Ergo, 80% must be junk. •  Yet, “junk DNA” should be “totally expunged” from the lexicon. •  In which universe does Ewan Birney’s logic work?
  39. 39. 20% kills “junk DNA” •  20% of the genome is functional. •  Ergo, 80% must be junk. •  Yet, “junk DNA” should be “totally expunged” from the lexicon. •  In which universe does Ewan Birney’s logic work? •  In a universe in which 20% >> 80%!
  40. 40. At the end of 2012, 20% becomes the favorite number in Nature
  41. 41. Science remains loyal to 80%.
  42. 42. Information flow within the genome. [Diagram: Genome; Transcribed; Translated; Nontranscribed; Nontranslated]
  43. 43. Information flow within the genome. [Diagram: Genome; Transcribed; Nontranscribed; DNA; Translated; protein; Nontranslated; RNA]
  44. 44. [Diagram: Genome; Nontranscribed; Transcribed; Functional; Nontranslated; Translated; Functional; Junk; Junk; Functional; Junk]
  45. 45. [Diagram: Genome; Functional; Junk; nonfunctional] Junk has nothing to do with non-protein-coding. Junk is about function… actually lack of function.
  46. 46. [Diagram: Genome; Functional; ad hoc; Junk; ad hoc]
  47. 47. [Diagram: Genome; Functional; Junk; = Pseudogene]
  48. 48. [Diagram: Genome; Functional; = Lazarus DNA; Junk]
  49. 49. [Diagram: Lazarus DNA; Emmaus DNA; Zombie DNA]
  50. 50. If the acquired function (Lazarus DNA) lowers the fitness of the carriers, it is called zombie DNA.
  51. 51. If the acquired function (Lazarus DNA) is advantageous, it is called Emmaus DNA.
  52. 52. [Diagram: Genome; Functional; Nontranscribed; Transcribed; Junk; Transcribed; Nontranscribed]
  53. 53. [Diagram: Genome; Functional; Nontranscribed; Transcribed; Junk; Transcribed; Nontranscribed; Transcriptome] Not all the transcriptome is functional.
  54. 54. [Diagram: Genome; Functional; Transcribed; Untranslated; Junk; Untranscribed; Untranscribed; Transcribed; Translated; Untranslated; Translated]
  55. 55. [Diagram: Genome; Functional; Transcribed; Untranslated; Junk; Untranscribed; Untranscribed; Transcribed; Translated; Untranslated; Translated; Proteome] Not all the proteome is functional.
  56. 56. THE ORIGIN OF A SPECIES (smart & elegant) Seiko Astron: Like a smartphone, the Astron is GPS-enabled, allowing it to determine accurate time from atomic clocks and automatically update to any time zone in the world. Unlike a smartphone, however, it looks nice with a suit, won’t break if you drop it and uses solar power, so it never needs to be charged. Darwin would be proud. $2,300 Hemispheres Magazine. April 2013.
  57. 57. Actually, Darwin would not be proud. Evolution does not produce “smart and elegant” This is an intelligently designed Dining Table This is an evolutionary functional Dining Table
  58. 58. The story of the human genome: 1998 to the present
  59. 59. The gene number game: Genesweep© (started in Cold Spring Harbor, 1998). Run by an Eton high-school boy called Ewan Birney. His guess was in the very high range! Bets: 281. Median: 61,302. Lowest: 25,947. Highest: 212,278. Pot: 1,200 US Dollars. [Background: DNA sequence]
  60. 60. 15 February 2001 1st draft
  61. 61. 21 October 2004 From 30,000 protein-coding genes to less than 25,000.
  62. 62. The gene number game: Genesweep©. Bets: 281. Median: 61,302. Lowest: 25,947. Highest: 212,278. Pot: 1,200 US Dollars. Lee Rowen (Institute for Systems Biology) won half of the pot with a guess of 25,947 genes. She was at the bottom of the pool. Olivier Jaillon (26,500) & Paul Dear (27,462) shared the rest of the 600 dollars. [Background: DNA sequence]
  63. 63. Genebuild last updated/patched: May 2012. Total length: 3,287,209,763 bp. Protein-coding genes: 21,065. Pseudogenes: 15,930. RNA-specifying genes: 12,955. Genebuild last updated/patched: April 2013. Total length: 3,320,602,130 bp. Protein-coding genes: 20,774. Pseudogenes: 14,445. RNA-specifying genes: 22,493.
  64. 64. The “end” of the Human Genome Project in 2004 ($3.8 Billion) was a big disappointment for scientists unversed in evolutionary biology The human genome turned out to be: •  small in size •  sparsely populated with genes •  densely populated with dead genomic parasites •  unoriginal
  65. 65. Small << 3.5 billion letters in a four-letter alphabet = 7 billion bits = 0.81 GB (gigabytes) 1 DVD = 8.5 GB Information content
  66. 66. Small ≈ 1 DVD = 8.5 GB 3.5 billion letters in a four-letter alphabet = 7 billion bits = 0.81 GB (gigabytes) Information content
  67. 67. Sparsely populated with genes.
      Organism | Gene Density (# genes per 1 Mb)
      Escherichia coli (bacterium) | 911
      Saccharomyces cerevisiae (yeast) | 483
      Arabidopsis thaliana (mustard weed) | 221
      Drosophila melanogaster (fly) | 197
      Homo sapiens | 12
  68. 68. Densely populated with dead transposable elements 45-67% plus at most 0.1% for RNA-specifying genes (non-coding RNA) plus at most 0.1% for DNA switches.
  69. 69. Densely populated with dead transposable elements
  70. 70. Unoriginal 0.15% nonsynonymous differences 1.22% synonymous differences Cost of sequencing your human genome = ~$25,000. Percent genome recovery = 90%. Error rate = 1-3%. I will provide you with your genome sequence with less error for half the price (and you can haggle). Data: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030087
  71. 71. Comparing the human genome to other genomes has given rise to three complexity paradoxes.* Genomic paradox = A lack of correspondence between a measure of genome size and the presumed amount of genetic information “needed” by the organism (its complexity). *The paradoxes only exist under the assumption that humans are the most complex organisms and the pinnacle of creation.
  72. 72. Defining complexity is difficult The complexity of a system may be defined by the minimum number of independent characters required to describe it, where independence is defined as the ability of the character to assume any possible character state independently of any other character in the system.
  73. 73. Defining complexity is difficult Thus the wall on the right is more complex (it has a crack) than the wall on the left.
  74. 74. However, even if we cannot quantify organismal complexity very well, in many cases, it is possible to state unequivocally that A is more complex than B. Without doubt, is more complex than I
  75. 75. However, even if we cannot quantify organismal complexity very well, in many cases, it is possible to state unequivocally that A is more complex than B. Without doubt, is more complex than I
  76. 76. K-value paradox: Complexity does not correlate with chromosome number. Homo sapiens 46 Lysandra atlantica Ophioglossum reticulatum 250 ~1260 78
  77. 77. C-value paradox: Complexity does not correlate with genome size. 3.5 × 109 bp Homo sapiens 1.5 × 1010 bp Allium cepa 6.7 × 1011 bp Amoeba dubia 79
  78. 78. G-value paradox: Complexity does not correlate with protein-coding gene number. ~21,000; ~21,000; ~57,000; >94,000.
  79. 79. Total Number of Protein-Coding Genes
      Drosophila melanogaster (fruitfly): 13,917
      Pan troglodytes (chimpanzee): 18,746
      Canis familiaris (dog): 19,856
      Bos taurus (cow): 19,994
      Caenorhabditis elegans (nematode): 20,517
      Homo sapiens (human): 20,774
      Arabidopsis thaliana (mustard weed): 27,416
      Physcomitrella patens (moss): 35,938
      Oryza sativa (rice): 40,577
      Populus trichocarpa (poplar): 41,377
      Manihot esculenta (cassava): 47,164
      Malus domestica (apple): 57,386
      Triticum aestivum (bread wheat): >94,000
  80. 80. Mommy, mommy, a fern has 27 times as many chromosomes as I do; an amoeba has 200 times more DNA than I do; and wheat has 5 times more genes than me.
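The three fold-differences quoted in the lament can be checked against the numbers on the preceding slides. The sketch below is only a back-of-the-envelope check; the dictionary names are illustrative, and the wheat gene count is the slide's lower bound (">94,000"), so its ratio is a minimum.

```python
# Values as quoted on the K-value, C-value and G-value slides.
HUMAN = {"chromosomes": 46, "genome_bp": 3.5e9, "genes": 20_774}

# One comparison organism per measure (names and values taken from the slides).
COMPARISONS = {
    "chromosomes": ("Ophioglossum reticulatum (fern)", 1260),
    "genome_bp": ("Amoeba dubia", 6.7e11),
    "genes": ("Triticum aestivum (bread wheat)", 94_000),  # ">94,000": a lower bound
}


def fold_differences():
    """Return, for each measure, the comparison value divided by the human value."""
    return {
        measure: value / HUMAN[measure]
        for measure, (_, value) in COMPARISONS.items()
    }


RATIOS = fold_differences()
for measure, ratio in RATIOS.items():
    organism = COMPARISONS[measure][0]
    print(f"{organism}: about {ratio:.1f} times the human value for {measure}")
```

The rounded figures on the slide (27, 200 and 5 times) are consistent with these ratios to within rounding.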
  81. 81. [Background: a block of raw DNA sequence] Ohno S. 1972. So much ‘junk’ DNA in our genome. Brookhaven Symp. Biol. 23:366-370. Conclusion: The human genome is mostly “junk.” ?
  82. 82. What is and what isn’t “junk DNA”: “There are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don’t know. But there are also unknown unknowns, there are things we do not know we don’t know.” Donald Rumsfeld, February 12, 2002
  83. 83. “Junk DNA” misrepresented as a “known unknown”
  84. 84. What is and what isn’t “junk DNA” Junk DNA is a known known; it is a thing that we know what it does—it takes space. Junk DNA is any piece of DNA that has no function and does not affect fitness. NOT everything that is not translated or not transcribed is Junk DNA. Junk DNA is NOT a known unknown. Dark DNA is a known unknown. Dan Graur June 22, 2013
  85. 85. Junk DNA is a consequence of population genetics considerations! In organisms with LARGE effective population sizes, the strength of natural selection is relatively strong. In organisms with SMALL effective population sizes, the strength of natural selection is relatively weak.
  86. 86. Junk DNA is a consequence of population genetics considerations! In organisms with LARGE effective population sizes, the strength of natural selection is relatively strong. In organisms with SMALL effective population sizes, the strength of natural selection is relatively weak. The majority of new mutations are mildly deleterious. In humans and elephants, selection is not sufficiently strong to eliminate many such deleterious mutations.
  87. 87. Junk DNA is a consequence of population genetics considerations! In organisms with LARGE effective population sizes, the strength of natural selection is relatively strong. In organisms with SMALL effective population sizes, the strength of natural selection is relatively weak. Humans and elephants are expected to accumulate numerous deleterious mutations in their genome.
  88. 88. Junk DNA is a consequence of population genetics considerations!
  89. 89. Genomic anthropocentrism? Human exceptionalism? “What would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own? If the number [of functional elements] were to stay more-or-less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk. If on the other hand the number of functional elements were to rise significantly with genome size, then organisms with genomes larger than ours should be more complex phenotypically than we are.” W. Ford Doolittle (2013)
  90. 90. •  A peculiar definition of function. •  A peculiar definition of junk. •  A lack of evolutionary perspective.
  91. 91. In biology, there are two main concepts of function: •  A historical concept of function, also referred to as the “selected effect function” or “proper function.” •  A non-historical concept of function, also referred to as the “causal function.”
  92. 92. What is the function of the heart? The proper function is to pump blood.
  93. 93. What is the function of the heart? The proper function is to pump blood. The causal functions of the heart are to add 300 grams to the body weight, to produce sounds, to be encased in the pericardium, to partially fill the mediastinum, to provide an inaccurate logo for Valentine’s Day cards, etc.
  94. 94. •  Evolutionary biologists use the proper or selected effect function. •  ENCODE used the causal function. “Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).”
  95. 95. •  An example of a function that fits the ENCODE definition: shoes binding chewing gum. “Operationally, we define a functional element as an entity that displays a reproducible signature (for example, chewing gum binding).”
  96. 96. “By the logic employed by ENCODE, following a collision between a car and a pedestrian, a car’s bonnet would be ascribed the 'function' of projecting a pedestrian many meters and the pedestrian would have the 'function' of deforming the car’s bonnet.” Laurence Hurst 2013. BMC Biol. 11:58
  97. 97. ENCODE uses a known logical fallacy called affirming the consequent: if a functional sequence is transcribed, then all transcribed sequences are functional. Moreover, ENCODE uses the logical fallacy inconsistently.
  98. 98. The ENCODE Project: 74.7% of the genome is transcribed, 56.1% is associated with modified histones, 15.2% is found in open-chromatin areas, 8.5% binds transcription factors, 4.6% consists of methylated CpGs. The fraction of the genome that is functional (the Boolean union) is 80.4%.
  99. 99. Our additions to ENCODE: 74.7% of the genome is transcribed, 56.1% is associated with modified histones, 15.2% is found in open-chromatin areas, 8.5% binds transcription factors, 4.6% consists of methylated CpGs, 84.8% binds histones, 100% of the genome is replicated. The fraction of the genome that is functional is 100%.
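Why adding "replication" pushes the total to 100% follows from how a Boolean union behaves: the union can never be smaller than its largest member, and never larger than the sum of its members or the whole genome. The sketch below only computes those bounds from the percentages on the slides; the actual overlaps between categories are not given here, so the exact union cannot be recomputed, and the function name is illustrative.

```python
# Percent of the genome covered by each biochemical signature (from the slides).
ENCODE_FRACTIONS = {
    "transcribed": 74.7,
    "modified histones": 56.1,
    "open chromatin": 15.2,
    "transcription factor binding": 8.5,
    "methylated CpGs": 4.6,
}

# The two tongue-in-cheek additions from the "Our additions" slide.
EXTENDED_FRACTIONS = dict(ENCODE_FRACTIONS, **{"histone binding": 84.8, "replicated": 100.0})


def union_bounds(fractions):
    """Return (lower, upper) bounds on the union of categories, in percent.

    Lower bound: the largest single category (all others nested inside it).
    Upper bound: the sum of all categories (no overlap), capped at 100%.
    """
    values = list(fractions.values())
    return max(values), min(100.0, sum(values))


ENCODE_BOUNDS = union_bounds(ENCODE_FRACTIONS)      # reported union: 80.4%
EXTENDED_BOUNDS = union_bounds(EXTENDED_FRACTIONS)  # collapses to exactly 100%
print("ENCODE categories:", ENCODE_BOUNDS)
print("With additions:   ", EXTENDED_BOUNDS)
```

The reported 80.4% sits inside the first pair of bounds, while any category that covers the whole genome forces the union to 100% regardless of the others.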
  100. 100. Interesting Question: Why do people have problems with DNA that has no function?
  101. 101. Inability to deal with randomness “… nothing is so alien to the human mind as the idea of randomness.” John Cohen. 1960. Chance, Skill, and Luck: The Psychology of Guessing and Gambling. Baltimore, MD: Penguin Books. Apophenia /æpɵˈfiːniə/: The experience of seeing meaningful patterns or connections in random or meaningless data. A type of mild or incipient schizophrenia. In statistics, apophenia is known as Type I error (false positives). Klaus Conrad. 1958. Die beginnende Schizophrenie. Versuch einer Gestaltanalyse des Wahns [Incipient Schizophrenia: An Attempt to Analyze delusion]. Stuttgart: Georg Thieme Verlag.
  102. 102. People like mysteries: such as hidden messages in the Bible.
  103. 103. If you search long enough and hard enough for patterns in random texts, you will find patterns. Especially if you do not employ negative controls. This pattern, for instance, predicts on the vertical from the bottom up (in Hebrew) that MITROMNI(TAURA)NSIA, where NSIA is “president.” The 5 letters in between MITROMNI and NSIA are random. It also helps that Hebrew has no vowels.
  104. 104. The Bible Code employs no negative controls. Someone else did and they found similar “prophecies” in Moby Dick by Herman Melville.
  105. 105. ENCODE has no negative controls. Mike White provided them and showed in a paper published in PNAS that random DNA sequences cause reproducible regulatory effects on the reporter gene. Random genetic sequences have as much or as little function as the human genome sequences analyzed by ENCODE.
  106. 106. What is “junk”?
  107. 107. “Some years ago I noticed that there are two kinds of rubbish in the world and that most languages have different words to distinguish them. There is the rubbish we keep, which is junk, and the rubbish we throw away, which is garbage. The excess DNA in our genomes is junk, and it is there because it is harmless, as well as being useless, and because the molecular processes generating extra DNA outpace those getting rid of it.” Sydney Brenner. 1998. Refuge of spandrels. Current Biology 8:R669.
  108. 108. “Were the extra DNA to become disadvantageous, it would become subject to selection, just as junk that takes up too much space, or is beginning to smell, is instantly converted to garbage by one’s wife, that excellent Darwinian instrument.” Sydney Brenner. 1998. Refuge of spandrels. Current Biology 8:R669.
  109. 109. Graur’s garage: functional but full of junk. A garage according to ENCODE. A garage in which junk became garbage.
  110. 110. Junk can Sometimes be Repurposed
  111. 111. Junk DNA can Sometimes be Repurposed Norihiro Okada & Jürgen Brosius, specialists in the repurposing of junk DNA.
  112. 112. Functional DNA ✔ Junk DNA ✔ Garbage DNA ✔ Lazarus DNA ✔ Indifferent DNA Dark DNA
  113. 113. Sequence-indifferent DNA or indifferent DNA refers to DNA sites that are functional, but show no evidence of selection against point mutations. Deletions of these sites, however, are deleterious and are subject to purifying selection.
  114. 114. Examples of indifferent DNA are spacers and flanking elements whose presence is required but whose sequence is not important. One such case is the third position of four-fold redundant codons, which needs to be present to avoid a downstream frameshift.
  115. 115. Dark DNA refers to the fraction of the genome for which no good evidence exists as to its evolutionary impact on fitness. Dark DNA is a known unknown. The term “dark” is borrowed from the field of astrophysics. An astrophysicist (Dr. Or Graur) whose research deals with dark energy. Unfortunately, he has no interest in dark DNA.
  116. 116. Interesting Question: How can one tell if a certain genomic sequence is functional or not? Can we make the car on the left less fit for driving? Can we make the car on the right less fit for driving?
  117. 117. [Figure: a series of mutations and the resulting evolutionary change] Functional DNA (almost all mutations are deleterious)
  118. 118. [Figure: a series of mutations and the resulting evolutionary change] Nonfunctional DNA (all mutations are neutral)
  119. 119. How do we know if a particular genomic sequence is functional? Since most mutations in functional regions are deleterious and likely to impair the function, these mutations will tend to be eliminated by natural selection. Thus, functional regions of the genome should evolve more slowly, and therefore be more conserved among species, than nonfunctional regions.
  120. 120. Another indicator for the existence of a genomic function is that losing it has some consequence for the organism. Evolution has tested the functionality of every region of the human genome through mutation over millions of years of evolution.
  123. 123. Is it even possible that ENCODE is right? No! The main reason being that in humans, there is a huge difference between population size and effective population size. Long-term Ne = 10,000
  124. 124. Is it even possible that ENCODE is right? Under such conditions selection is inefficient and most genetic variation is deleterious. Genomic “perfection” is unachievable. Long-term Ne = 10,000
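One way to see why selection is "inefficient" at Ne = 10,000 is Kimura's diffusion approximation for the fixation probability of a new mutation. The formula itself is not on the slides; the sketch below assumes a diploid population whose census size equals its effective size, and the selection coefficient of -0.00001 is a purely hypothetical "mildly deleterious" value chosen for illustration.

```python
import math


def fixation_probability(s, ne, n=None):
    """Kimura's diffusion approximation for a new mutation in a diploid population.

    s  : selection coefficient (negative = deleterious)
    ne : effective population size
    n  : census population size (assumed equal to ne when omitted)
    """
    n = ne if n is None else n
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: initial frequency of the new allele
    # (1 - exp(-2*ne*s/n)) / (1 - exp(-4*ne*s)), written with expm1 for accuracy
    return math.expm1(-2.0 * ne * s / n) / math.expm1(-4.0 * ne * s)


def relative_to_neutral(s, ne):
    """Fixation probability divided by that of a neutral mutation (with n = ne)."""
    return fixation_probability(s, ne) / fixation_probability(0, ne)


S_MILD = -1e-5  # hypothetical mildly deleterious mutation
for ne in (10_000, 1_000_000):
    print(f"Ne = {ne:>9,}: {relative_to_neutral(S_MILD, ne):.3g} of the neutral rate")
```

Under these assumptions the same mutation fixes almost as often as a neutral one when Ne = 10,000, but is essentially never fixed when Ne is a hundred times larger, which is the sense in which small populations accumulate mildly deleterious changes.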
  125. 125. Fact 1: It has been known for more than a century that the vast majority of non-neutral mutations are deleterious (Thomas Morgan 1903). Fact 2: Mutation rate is evolvable. These facts led Alfred Sturtevant to raise the question “Why does the mutation rate not become reduced to zero?” (Sturtevant 1937).
  126. 126. Motoo Kimura: Mutation rate cannot reach zero because of the COST OF FIDELITY. In other words, the mutation rate in a lineage is a compromise between the benefits of complete fidelity in the replication of the genetic material and the cost of achieving complete fidelity. The mutation rate modulation hypothesis.
  127. 127. How did ENCODE reach such ridiculous numbers? 1.  It used methodologies encouraging biased errors in favor of inflating estimates of functionality. 2.  It consistently and excessively favored sensitivity over specificity. 3.  It paid attention to statistical significance, rather than magnitude of the effect.
  128. 128. How did ENCODE reach such ridiculous numbers? 1.  It used methodologies encouraging biased errors in favor of inflating estimates of functionality. 2.  It consistently and excessively favored sensitivity over specificity. 3.  It paid attention to statistical significance, rather than magnitude of the effect.
  129. 129. Example: Transcription factor binding sites (TFBSs): So far, almost all known TFBSs range in length from 6 to 14 nucleotides. The TFBS entries in ENCODE range in size from 457 to 824 nucleotides. Thus, the estimates of the fraction of the human genome devoted to transcription factor binding are extraordinarily inflated (sometimes by about two orders of magnitude).
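The "two orders of magnitude" figure can be reproduced from the four lengths on the slide. This is just the ratio of entry length to true site length at the two extremes; the variable names are illustrative.

```python
import math

KNOWN_TFBS_NT = (6, 14)       # length range of known binding sites (from the slide)
ENCODE_ENTRY_NT = (457, 824)  # length range of ENCODE TFBS entries (from the slide)

# Smallest inflation: shortest ENCODE entry over the longest real site.
MIN_INFLATION = ENCODE_ENTRY_NT[0] / KNOWN_TFBS_NT[1]
# Largest inflation: longest ENCODE entry over the shortest real site.
MAX_INFLATION = ENCODE_ENTRY_NT[1] / KNOWN_TFBS_NT[0]

print(f"Inflation factor: {MIN_INFLATION:.0f}x to {MAX_INFLATION:.0f}x")
print(f"Orders of magnitude at the extreme: {math.log10(MAX_INFLATION):.1f}")
```

The inflation ranges from a few tens of times to well over a hundred, so "about two orders of magnitude" describes the upper end.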
  130. 130. How did ENCODE reach such ridiculous numbers? 1.  It used methodologies encouraging biased errors in favor of inflating estimates of functionality. 2.  It consistently and excessively favored sensitivity over specificity. 3.  It paid attention to statistical significance, rather than magnitude of the effect.
  131. 131. ENCODE prefers false positives over false negatives, thus inflating the proportion of positives.
  132. 132. Example: ENCODE used a probability-based alignment tool, and mapped RNA transcripts onto DNA when the statistical confidence exceeded 90%. This means that 10% of the correspondences between RNA and genome are erroneous. The total number of RNA transcripts in ENCODE is approximately 109 million. The mean transcript length is 564 nucleotides. Thus, a total of 6 billion nucleotides, or two times the human genome size, are potentially misplaced (false positives).
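The intermediate step in this estimate is left implicit: 109 million transcripts of 564 nucleotides each come to roughly 61 billion mapped nucleotides, and 10% of that is about 6 billion. A short sketch of the arithmetic, taking the slide's reading that a 90% confidence threshold means 10% of mappings are wrong, and using the 3.5 × 10^9 bp genome size quoted earlier in the deck:

```python
TRANSCRIPTS = 109_000_000   # approximate number of RNA transcripts (from the slide)
MEAN_LENGTH_NT = 564        # mean transcript length (from the slide)
ERROR_FRACTION = 0.10       # the slide's reading of a 90% confidence threshold
GENOME_BP = 3.5e9           # human genome size as quoted on the C-value slide

TOTAL_MAPPED_NT = TRANSCRIPTS * MEAN_LENGTH_NT
MISPLACED_NT = TOTAL_MAPPED_NT * ERROR_FRACTION
GENOME_EQUIVALENTS = MISPLACED_NT / GENOME_BP

print(f"Total mapped nucleotides: {TOTAL_MAPPED_NT:.3g}")
print(f"Potentially misplaced:    {MISPLACED_NT:.3g}")
print(f"Genome equivalents:       {GENOME_EQUIVALENTS:.2f}")
```

With these inputs the result is a little under two genome equivalents, which the slide rounds to "two times the human genome size".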
  133. 133. How did ENCODE reach such ridiculous numbers? 1.  It used methodologies encouraging biased errors in favor of inflating estimates of functionality. 2.  It consistently and excessively favored sensitivity over specificity. 3.  It paid attention to statistical significance, rather than magnitude of the effect.
  134. 134. “Derived allele frequency spectrum for primate-specific elements, with variations outside ENCODE elements in black and variations covered by ENCODE elements in red. The increase in low-frequency alleles compared to background is indicative of negative selection occurring in the set of variants annotated by the ENCODE data.”
  135. 135. p = 10^−37. Magnitude of effect = 0.042%. “Derived allele frequency spectrum for primate-specific elements, with variations outside ENCODE elements in black and variations covered by ENCODE elements in red. The increase in low-frequency alleles compared to background is indicative of negative selection occurring in the set of variants annotated by the ENCODE data.”
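The contrast between a minuscule p-value and a minuscule effect is a general property of significance tests on very large samples. The sketch below does not reproduce ENCODE's analysis; it uses a textbook two-proportion z-test with hypothetical sample sizes and a hypothetical base rate, and reads the 0.042% effect as a difference in proportions (an assumption), simply to show how a fixed, tiny effect becomes arbitrarily "significant" as the sample grows.

```python
import math


def two_sided_p(diff, base_rate, n_per_group):
    """Two-sided p-value of a two-proportion z-test (normal approximation).

    diff        : observed difference between the two proportions
    base_rate   : pooled proportion under the null hypothesis
    n_per_group : number of observations in each of the two groups
    """
    standard_error = math.sqrt(2.0 * base_rate * (1.0 - base_rate) / n_per_group)
    z = abs(diff) / standard_error
    return math.erfc(z / math.sqrt(2.0))


EFFECT = 0.00042   # 0.042%, treated here as a difference in proportions (assumption)
BASE_RATE = 0.5    # hypothetical pooled proportion
SAMPLE_SIZES = (10**4, 10**6, 10**8, 10**9)  # hypothetical group sizes

P_VALUES = {n: two_sided_p(EFFECT, BASE_RATE, n) for n in SAMPLE_SIZES}
for n, p in P_VALUES.items():
    print(f"n = {n:>13,} per group: p = {p:.3g}")
```

The effect size never changes across the rows; only the sample size does, which is why statistical significance alone says nothing about biological importance.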
  136. 136. Let’s examine the rationale and the methodology for dealing with the derived allele frequency spectrum in primate-specific elements
  137. 137. The Why •  If all alleles are neutral, a certain frequency distribution is expected. •  If some alleles are under negative selection, an excess of rare derived alleles is expected. •  This excess is expected to be detectable for only very short periods of evolutionary time.
  138. 138. The Why •  To deal with very short periods of evolutionary time, ENCODE decided to use primate-specific sequences.
  139. 139. [Figure labels: human, human, human, chimpanzee, gorilla, macaque, rat, mouse]
  140. 140. [Figure labels: human, human, human, chimpanzee, gorilla, macaque, rat, mouse] Primate-Specific Sequences
  141. 141. What is missing from the derived allele frequency spectrum of primate-specific elements in ENCODE? Genes! 3,296,458 SNPs that are in annotated coding regions are not found in the ENCODE sample.
  142. 142. Missing populations and their effect on estimates of derived alleles and ancestral alleles. Three human populations were available at the time ENCODE was submitted; ENCODE used only one.
  143. 143. [Figure: derived allele frequency (%) in primate-specific sequences for Caucasians, Asians, and Yoruba, distinguishing ancestral from derived alleles; labels shown: OUT, 40, 60, 60]
  144. 144. [Figure: Yoruba, derived allele frequency (%); labels shown: OUT, 100, 20, 0]
  145. 145. The ENCODE data includes 2,136 alleles with frequencies of exactly 0. In a miraculous feat of science, ENCODE was able to determine the frequencies of nonexistent alleles. [Figure: Yoruba, derived allele frequency (%); labels shown: OUT, 100, 20, 0]
  146. 146. ENCODE uses multifurcated trees Frequency of derived allele = 40%
  147. 147. ENCODE uses multifurcated trees Frequency of derived allele < 40%
  148. 148. ENCODE uses only a single primate species. There are no derived alleles.
  149. 149. p = 10^−37. Magnitude of effect = 0.042%. “Derived allele frequency spectrum for primate-specific elements, with variations outside ENCODE elements in black and variations covered by ENCODE elements in red. The increase in low-frequency alleles compared to background is indicative of negative selection occurring in the set of variants annotated by the ENCODE data.”
  150. 150. Unwarranted extrapolations: Badly trained technicians tend to “kill” junk DNA whenever they find a new function in non-coding DNA.
  151. 151. Even supposing that all the 55,000 putative lincRNAs in this paper are functional and important, then 55,000 × 2000 bp = 110 Mb (less than 4% of the human genome). Showing that 4% of the genome is functional is “cool,” but doesn’t bear on the question of “junk DNA,” which has to do with the majority of the genome.
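The "less than 4%" bound is easy to verify. The sketch below uses the slide's lincRNA count and length, together with the 3.5 × 10^9 bp genome size quoted earlier in the deck; the variable names are illustrative.

```python
LINCRNA_COUNT = 55_000     # putative lincRNAs (from the slide)
LINCRNA_LENGTH_BP = 2_000  # length assumed per lincRNA (from the slide)
GENOME_BP = 3.5e9          # human genome size as quoted on the C-value slide

LINCRNA_TOTAL_BP = LINCRNA_COUNT * LINCRNA_LENGTH_BP
LINCRNA_PERCENT = 100.0 * LINCRNA_TOTAL_BP / GENOME_BP

print(f"Total lincRNA sequence: {LINCRNA_TOTAL_BP / 1e6:.0f} Mb")
print(f"Share of the genome:    {LINCRNA_PERCENT:.1f}%")
```

Even under the generous assumption that every one of these transcripts is functional, the total stays a small fraction of the genome.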
  152. 152. Conclusion: Badly trained technicians who do not understand (1) population genetics, (2) the concept of effective population size, (3) random genetic drift, and (4) the limitations of selection should be forbidden to even mention “junk DNA” let alone write papers on the subject.
  153. 153. 6 SEPTEMBER 2012. 442 researchers + 288 million dollars. What have we learned from ENCODE?
  154. 154. “Data is not information, information is not knowledge, knowledge is not wisdom, wisdom is not truth,” —Robert Royar (1994) paraphrasing Frank Zappa’s (1979) anadiplosis
  155. 155. “The onion test is a simple reality check for anyone who thinks they have come up with a universal function for 80% of the genome, or 100% of the genome. Whatever the proposed function, ask yourself this question: Can you explain why onions need about five times more DNA than humans?” T. Ryan Gregory. Allium cepa: 1.5 × 10^10 bp; Homo sapiens: 3.5 × 10^9 bp.
  156. 156. “All science is either physics or stamp collecting.” Ernest Rutherford “ENCODE is stamp collecting.” Roderic Guigó “I can think of better uses for 288 million dollars.” Dan Graur
  157. 157. Acknowledgments: The Good Guys. Coauthors: Ricardo Azevedo, Becky Zufall, Nicholas Price, and Yichen Zheng (UH), and Eran Elhaik (Johns Hopkins). Reviewers: Giddy Landan (Heinrich Heine Universität, Germany), Michael Lynch (Indiana University, USA), Naruya Saitou (National Institute of Genetics, Japan), David Penny (Massey University, New Zealand), W. Ford Doolittle (Dalhousie University, Canada) + 2 reviewers who think I don’t know who they are. Editor: Bill Martin (Genome Biology and Evolution)
