2009 09 08 Wiltshire Ipit Seminar Slides
1. Genetic variation in mice: modeling disease, pharmacogenetics, and basic biology Tim Wiltshire School of Pharmacy University of North Carolina Chapel Hill
2. What do we know about gene function? How do we efficiently annotate the function of all the genes in the mammalian genome? Goal: “Genome-wide functional genomics” 40234 entries in Entrez Gene 19709 genes ( 49% ) have zero linked references 31672 genes ( 78% ) have five or fewer linked references Fraction of all Citations Accounted for by Highly-Cited Genes TP53 TNF APOE MTHFR HLA-DRB1 IL6 ACE TGFB1 EGFR VEGFA
3. How should we use the genetic variation in mice as a model for Annotating gene function and discovery in disease status, pharmacogenetics, and basic biology? Traditional genetics – F2 crosses, recombinant inbred strains (RI), knockouts, transgenics. Inbred strains – genetic variation of the inbred strains, haplotype mapping. New RI initiatives - A new set of comprehensive RI strains Outbred strains – most closely model human populations
10. CC Population ~ Human Population CAST/EiJ WSB/EiJ C57BL/6J PWK/PhJ A/J 129S1/SvIm NZO/HlLt NOD/Lt Captures 90% of the variation present in the mouse! The variation is randomly distributed across the genome (there are no blind spots) Yang et al. 2007 Nature Genetics 39, 1100 Roberts et al. 2007 Mammalian Genome 18, 473 [Table: SNPs and insertion/deletions for Human and CC; values as extracted: 20 x 10^6, 1 x 10^6, 50 x 10^6, 4 x 10^6]
15. Haplotype Association Mapping Taking a 3 SNP window consecutively down the genome and asking “do these haplotypes associate with a specific phenotype”? [Figure: matrix of SNP alleles (A, C, G, T) across strains, scanned with a sliding 3-SNP window]
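The sliding-window step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the talk: the strain names, the allele strings, and the helper names `window_haplotypes` and `scan_windows` are all hypothetical, and real data would also need handling for missing genotype calls.

```python
from collections import defaultdict

def window_haplotypes(genotypes, start, width=3):
    """Group strains by the haplotype they carry over SNPs [start, start + width)."""
    groups = defaultdict(list)
    for strain, alleles in genotypes.items():
        groups[alleles[start:start + width]].append(strain)
    return dict(groups)

def scan_windows(genotypes, width=3):
    """Yield (start index, haplotype groups) for each consecutive SNP window."""
    n_snps = len(next(iter(genotypes.values())))
    for start in range(n_snps - width + 1):
        yield start, window_haplotypes(genotypes, start, width)

# Toy genotypes (hypothetical): one allele string per strain, SNPs in genome order.
genotypes = {
    "strainA": "CAGTC",
    "strainB": "CAGTC",
    "strainC": "GAGTC",
    "strainD": "CTATC",
}

windows = list(scan_windows(genotypes))
for start, groups in windows:
    print(start, groups)
```

Each window yields a set of haplotype groups; strains sharing a haplotype are then treated as one group when testing against a phenotype.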
16. Inferred haplotype patterns can then be related back to the observed phenotype values across the same set of strains. ANOVA analysis: identify associations between shared haplotypes and phenotypes. [Figure: example haplotypes CTG and TCG; logP plotted against genome location]
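As a rough sketch of the ANOVA step, the F statistic for one window can be computed from phenotype values grouped by haplotype. The phenotype values below are invented, `one_way_anova_f` is a hypothetical helper, and converting F into the logP score plotted on the slide requires the tail probability of the F distribution, which is left out here (a library routine such as `scipy.stats.f_oneway` would return it).

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of phenotype values."""
    groups = [g for g in groups if g]          # ignore empty haplotype groups
    k = len(groups)                            # number of haplotype groups
    n = sum(len(g) for g in groups)            # number of strains
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more strains than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")                    # groups perfectly separate the phenotype
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy phenotype values (hypothetical) for strains carrying haplotype CTG or TCG.
phenotype_by_haplotype = {
    "CTG": [1.0, 2.0, 3.0],
    "TCG": [4.0, 5.0, 6.0],
}

f_stat = one_way_anova_f(list(phenotype_by_haplotype.values()))
print(f_stat)
```

Repeating this for every window and plotting the resulting significance against window position gives a genome scan of the kind shown on the slide.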
19. The use of haplotype association mapping to identify clinical QTL (cQTL) Identification of clinical QTL and expression difference for open field behavior [Figure: haplotype groups; logP plotted against genome location, with Grm7 labeled]
20. Whole-genome association analysis of urethane-induced lung adenoma incidence in laboratory inbred mice. The scatter plots were drawn for -log(P) against SNP positions in the chromosomes. The two horizontal gray lines indicate the significance levels of -log(P) = 4.8 and -log(P) = 6.2. The arrows indicate the genomic regions with -log(P) > 4.8. These refined genomic regions with significant associations are within 10 Mb of one or more QTLs (such as Sluc18, Pas1, Sluc23 and Pas10, and Sluc26) for chemically induced lung cancer detected by previous linkage studies. Candidate lung tumor susceptibility genes identified through whole-genome association analyses in inbred mice. Liu et al. Nature Genetics 38, 888-895 (2006)
21. Whole organism phenotypes gene expression biomarkers identification of biological networks What phenotypes can be used? Anxiety and Depression Gene expression analysis Biomarker analysis Haplotype association mapping Clinical phenotypes
23. Using gene expression differences between strains to identify gene networks [Figure: -logP against chromosome (Chr1 to ChrX) with a significance threshold for a single probe (Probe X); combined plot for Probe X, Probe Y and Probe Z]
27. Schema of trans-band analysis
>transband at chr=3, pos=46,624,006
GeneID    -logP
15502     5.30
74559     4.66
107652    4.42
19357     4.40
212862    4.30
73074     4.30
107652    4.23
14828     4.12
108946    4.09
Annotation sources: functional enrichment (Gene Ontology, KEGG pathway); other knowledge (expression, literature, known interactions, etc.)
Diagram labels: putative regulator, putative targets, trans-regulator candidates, biological hypothesis
31. In Silico Pharmacogenetics: Warfarin Metabolism Guo et al. Nat Biotechnol. 2006 May; 24(5): 531–536. Haplotype-based genetic analysis of warfarin metabolites. A representative set of haplotype blocks having the highest correlation with this data set. For each predicted block, the chromosomal location, number of SNPs within a block, its gene symbol and an indicator of gene expression in liver are shown. The haplotype for each strain is represented by a colored block, and is presented in the same order as the phenotypic data in the top panel. The calculated p-value measures the probability that strain groupings within an individual block would have the same degree of association with the phenotypic data by random chance. In the gene expression column, a green square indicates the gene is expressed in liver tissue, while a gray square indicates that it is unknown. The log-transformation of the measured combined amount of 7-hydroxywarfarin ( 7-OH ) and its glucuronidated metabolite ( M8 ) as a % of the total amount of drug and metabolites for each of 13 inbred strains.
32. Haplotype Associated Mapping case study Fig. 1. Serum ALT measured in human volunteers taking daily oral doses of APAP (4 g/day). (A) Lines represent per subject daily serum ALT (U/L) values 14 days prior to clinic admission and throughout the 14-day duration of the study. Subjects were considered responders if peak serum ALT reached greater than 1.5-fold higher than the average of their baseline values (average of values obtained for days -14 and 1-3; N = 22). ALT elevations were observed following the start of treatment on day 4 and continued to fall beyond treatment cessation on day 11. (B) Daily ALT (U/L) values of non-responder volunteers receiving APAP treatment were not significantly different from those receiving placebo (N = 9). (C) The peak ALT fold change (over baseline) reached over the course of treatment per subject number is plotted for both non-responder (white bars) and responder (black bars) individuals. Horizontal line represents a 1.5-fold increase over the subject’s pre-treatment baseline. Mouse population-guided resequencing reveals that variants in CD44 contribute to acetaminophen-induced liver injury in humans. Alison H. Harrill, Paul B. Watkins, Stephen Su, Pamela K. Ross, David E. Harbourt, Ioannis M. Stylianou, Gary A. Boorman, Mark W. Russo, Richard S. Sackler, Steven C. Harris, Philip C. Smith, Raymond Tennant, Molly Bogue, Kenneth Paigen, Christopher Harris, Tanupriya Contractor, Timothy Wiltshire, Ivan Rusyn and David W. Threadgill. Genome Research 2009
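The responder rule in the figure legend (peak ALT more than 1.5-fold above the subject's average baseline) can be written out as a short check. The ALT values below are made up for illustration, and `peak_fold_change` and `is_responder` are hypothetical helper names, not code from the study.

```python
def peak_fold_change(baseline_alt, treatment_alt):
    """Peak ALT during treatment divided by the subject's mean baseline ALT."""
    baseline = sum(baseline_alt) / len(baseline_alt)
    return max(treatment_alt) / baseline

def is_responder(baseline_alt, treatment_alt, fold_threshold=1.5):
    """Apply the legend's rule: responder if the peak exceeds 1.5-fold of baseline."""
    return peak_fold_change(baseline_alt, treatment_alt) > fold_threshold

# Toy ALT series in U/L (hypothetical): baseline days, then treatment days.
subject_1 = {"baseline": [20.0, 22.0, 18.0, 20.0], "treatment": [25.0, 40.0, 31.0]}
subject_2 = {"baseline": [20.0, 22.0, 18.0, 20.0], "treatment": [21.0, 24.0, 29.0]}

for name, subject in (("subject_1", subject_1), ("subject_2", subject_2)):
    fold = peak_fold_change(subject["baseline"], subject["treatment"])
    print(name, round(fold, 2), is_responder(subject["baseline"], subject["treatment"]))
```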
34. Cellular Genetics Develop cell-based assay system for MEFs from 30 strains. What cell types? What phenotypes to measure? Infectability with lentiviral vectors High content imaging Gene expression profiling
35. a. c. Strain distribution pattern of mitochondrial membrane potential across 30 different strains Purify MEFs from 30 different strains Seed in 96 wells and grow in or 1% serum for 72 hrs At end of each timepoint, stain cells with JC-1 and measure fluorescence with FACS Technical replicates for 1% FBS 24hr Interday replicates for 1% FBS 24hr Heritability: 64.7% Interday replicates for 1% FBS 72hr
36. d. Chromosome 15: Gene name: Fbxl7 Genome scan for mitochondrial membrane potential siRNA knockdown of Fbxl7 P = 1.02E-08 nmol O2/min/1x10^6 cells P-ampk (Thr 175) P-p53 (Ser15) total p53 Ctrl siRNA 3 tubulin tubulin total ampka Ctrl siRNA 3 p21 Ctrl siRNA 3
37. Effect of huFbxl7 knockdown in cancer cell lines GM1600 (glioblastoma) LNCaP (prostate) Colo741 (colorectal) Hs587t (mammary) mRNA knockdown cell proliferation mito. membrane potential
39. DNA Content, Nuclear Count & Size Mitochondrial Membrane Potential Changes (Intensity) Hoechst G21-0.41 uM Vinblastine-1 Hoechst G19-33.3 uM Vinblastine-1 Hoechst G20-3.7 uM Vinblastine-1 Mito Red G21-0.41 uM Vinblastine-1 Mito Red G20-3.70 uM Vinblastine-1 Mito Red G19-33.3 uM Vinblastine-1
40. Cell Morphology & Permeability Cytochrome C Localization and Release CY5 G19-33.3 uM Vinblastine-1 CY5 G20-3.7 uM Vinblastine-1 CY5 G21-0.41 uM Vinblastine-1 FITC G20-3.7 uM Vinblastine-1 FITC G21-0.41 uM Vinblastine-1 FITC G19-33.3 uM Vinblastine-1
41. MEF cell viability studies Alamar Blue analysis Whole well measurement Strain specific phenotypic differences
43. Acknowledgements GNF Serge Batalov Andrew Su Chunlei Wu Jeff Janes Dave Delano Stephen Su Joe Bass (Northwestern U.) Bev Paigen (JAX) Mat Pletcher (Pfizer) Lisa Tarantino (UNC) Russell Thomas (Hamner Inst) collaborators
Realizing this several years ago, we started to think about how we could design such an experimental platform and how it could dramatically increase our understanding of humans (and mice) at the whole organism level, with all the complexity of its interconnected parts. In 2001 we proposed a resource that could fill this void. This effort led to the formation of the Complex Trait Consortium and the refinement of our original proposal into what is now called the Collaborative Cross.
The Collaborative Cross is a unique panel of recombinant inbred mice developed by combining the genomes of 8 diverse founders. In a recombinant inbred panel, the genomes of the founder lines are genetically scrambled, and new inbred lines carrying random mixes of the parental genomes are derived. The input genetic variation is combined just like starting at the Elite 8 stage of the NCAA basketball tournament. This randomizes the input variation so that causation can be defined. Each line will have a different random mix of the founder variation.
However, the power of the Collaborative Cross lies in having 1,000 iterations of the randomization.
The eight inbred strains used as founders of the cross are A/J, C57BL/6, 129, NOD, NZO, PWK, CAST and WSB. These strains were selected to maximize diversity before we had complete genome data. Based on our analysis of 8.3 million SNPs discovered by the recent resequencing of a selected group of strains by the NIH, we estimate that the CC has captured approximately 90% of the genetic variation present in the mouse. Furthermore, this variation is uniformly distributed across the genome. In other words, every genome region and every gene has similar levels of diversity. Finally, the level and types of variation present in the founder strains of the CC compare favorably to those found in humans.
An example of HAM to identify a clinical QTL. This is a strain distribution for open field - percent time in the center - a measure of anxious behavior in rodents. As you can see, there are considerable strain differences in this trait as well as sex differences. When we performed haplotype association mapping, we identified a peak logP score on Chr 6. The only gene under this peak was Grm7 - the metabotropic glutamate receptor 7. KO mice for mGluR7 show decreased anxiety-like behaviors, i.e. more time in the center of the open field. We can look at the haplotype groups at this peak, and you can see here that there are two: hap group 1 strains show decreased anxiety (more time in the center) while hap group 2 strains show increased anxiety (less time in the center of the open field). We also collected brain tissue and performed genome-wide expression analysis using the Affymetrix MOE430 chip. When we look at a probeset for mGluR7, we see that there are significant expression differences in amygdala and prefrontal cortex between the two haplotype groups as well, in the direction that would be predicted by the KO phenotype, i.e. hap group 1, which shows decreased anxiety, also shows lower expression of mGluR7. This, of course, needs to be confirmed with more sensitive methods of gene expression analysis, and preliminary results using RT-PCR indicate that the differential expression is confirmed.
If we perform an association analysis on all probesets present on the Affy MOE430 chip (the figure to the left is a single probeset) and then flip them all on their sides, we see only dots representing peaks with logP scores over a specified significance threshold. Each probeset is placed in order by Chr and genomic position along the y-axis, with chromosome location along the x-axis. When you visualize the results this way, you get a plot that looks like this.
Analyzing these plots, we can see that the horizontal lines represent individual probesets. The diagonal line represents the cis-QTL bands - or loci for which expression differences map on or near the gene itself as shown in my example of the COMT cis-QTL. You can also clearly see vertical bands that are enriched and these represent trans-QTL bands - or multiple probesets whose expression is controlled by some trans-acting factor.
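To make the cis/trans distinction concrete, here is a minimal sketch that labels each significant peak by comparing the probeset's own genomic position with the position of its peak, then counts trans peaks per genomic bin so that vertical bands show up as bins with many hits. The 5 Mb cis window, the 10 Mb bin size, the positions, and the function names are all assumptions made for illustration; the talk does not state the thresholds that were used.

```python
from collections import Counter

CIS_WINDOW_BP = 5_000_000    # assumption: "on or near the gene" means within 5 Mb
BIN_SIZE_BP = 10_000_000     # assumption: bin width used to look for trans bands

def classify_eqtl(probe_chr, probe_pos, peak_chr, peak_pos, cis_window=CIS_WINDOW_BP):
    """Label a peak 'cis' if it maps near the probeset's own gene, else 'trans'."""
    if probe_chr == peak_chr and abs(probe_pos - peak_pos) <= cis_window:
        return "cis"
    return "trans"

def trans_band_counts(eqtls, bin_size=BIN_SIZE_BP):
    """Count trans peaks per (chromosome, bin); crowded bins suggest a trans band."""
    counts = Counter()
    for probe_chr, probe_pos, peak_chr, peak_pos in eqtls:
        if classify_eqtl(probe_chr, probe_pos, peak_chr, peak_pos) == "trans":
            counts[(peak_chr, peak_pos // bin_size)] += 1
    return counts

# Toy peaks (hypothetical): (probe chr, probe position, peak chr, peak position).
eqtls = [
    ("16", 18_400_000, "16", 18_500_000),   # maps onto its own gene: cis
    ("1", 50_000_000, "3", 46_600_000),     # maps to a different chromosome: trans
    ("7", 10_000_000, "3", 46_100_000),
    ("X", 5_000_000, "3", 44_900_000),
]

bands = trans_band_counts(eqtls)
print(bands.most_common(1))
```

In the plot, cis peaks fall on the diagonal because the probeset position and peak position coincide, while a crowded bin corresponds to a vertical band.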
We can conduct a similar type of analysis for expression data. This is an example of that - using expression data as a phenotype and conducting haplotype association mapping. You can see here that this resulted in a highly significant peak on Chr 16 which, not surprisingly, sits right on top of the COMT locus itself, indicating that this is a cis-QTL - meaning that some locus in or near the COMT gene is regulating its expression. The two haplotype groups appear to form a bimodal distribution of COMT expression in the inbred strains. But what if we do this genome-wide?
Another example is a transband in liver on Chr 19. The transband enrichment for “apoptosis” - 5/25 probesets or 20% was significantly higher than the background occurrence of apoptosis in the same tissue - about 1.6%. Therefore, this transband was enriched 12.5X for genes involved in apoptosis. Once the transbands are annotated for functional enrichment, the key becomes identification of the transregulator that underlies the transband.
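The enrichment arithmetic quoted here can be checked directly. The sketch below reproduces the fold enrichment from the numbers in the notes (5 of 25 probesets against a background rate of about 1.6%) and adds a binomial tail probability as one simple way to ask how surprising that count is. The binomial test is an assumption for illustration, not necessarily the statistic used in the original analysis, and the function names are hypothetical.

```python
from math import comb

def fold_enrichment(hits, band_size, background_rate):
    """Fraction of annotated probesets in the band divided by the background rate."""
    return (hits / band_size) / background_rate

def binomial_tail(hits, band_size, background_rate):
    """P(X >= hits) when each of band_size probesets is annotated at background_rate."""
    return sum(
        comb(band_size, k) * background_rate ** k * (1 - background_rate) ** (band_size - k)
        for k in range(hits, band_size + 1)
    )

# Numbers from the notes: 5 of 25 probesets annotated "apoptosis", background about 1.6%.
fold = fold_enrichment(5, 25, 0.016)
p_value = binomial_tail(5, 25, 0.016)
print(f"fold enrichment: {fold:.1f}x")
print(f"binomial tail probability: {p_value:.2e}")
```

A hypergeometric test against the actual background counts would be the more usual choice when those counts are available; only the background rate is given here.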
In this example from a transband in fat that was enriched for the GO category “integrin signalling”, 5 candidate regulators were identified under the transband peak. Pathway analysis programs such as Ingenuity identified interactions between Gsk3b - here in blue - and multiple transband targets - in grey - making Gsk3b the prime candidate as the transregulator. The wealth of data now available from sequencing efforts for non-synonymous SNPs also aids in identification of strain differences in candidate transregulators - such as this known non-synonymous SNP in Gsk3b that causes a frame shift and that correlates with the haplotype groups of the strains within the transband. So, how do we start to annotate transbands in the brain that may underlie gene networks involved in behaviors such as anxiety or depression or addiction?
More importantly, the distribution of this variation is random across the genome. Virtually all intervals have an equivalent amount of variation in the Collaborative Cross. This is very different from other resources, which have complicated arrangements of variation.