www.nottingham.ac.uk/adac
Obtaining, QC, mapping and analysis of
NGS data.
Richard Emes
Associate Professor & Reader in Bioinformatics.
School of Veterinary Medicine and Science
Director Advanced Data Analysis Centre
richard.emes@nottingham.ac.uk
www.nottingham.ac.uk/adac
@rdemes
@ADAC_UoN
What is ADAC?
The University of Nottingham Advanced Data Analysis Centre (ADAC).
•  Bioinformatics and data analysis support.
Why is this important?
•  Complex data underpins much current research.
•  Innovative analysis can prompt new discoveries.
•  Excellent research can often stall because of a lack of data-analysis expertise, or because diverse specialists are unavailable or too costly to include.
Why ADAC?
•  ADAC supports high-class research by providing analysts with expertise in a range of
bioinformatics and computer science disciplines.
•  Flexible support
•  Consultancy, collaboration, bespoke analysis.
•  Leadership from recognized experts in the fields of bioinformatics and computer
science.
•  Track record of funding
•  Pivotal role in collaborations funded by, amongst others, Zoetis, BBSRC, NERC
and the Technology Strategy Board.
•  ADAC is conducting transcriptome analysis for a multinational FP7 funded
project (EU Prohealth).
http://www.nottingham.ac.uk/adac/
Enquiries: adac@nottingham.ac.uk or richard.emes@nottingham.ac.uk
@ADAC_UoN or @rdemes
Current areas of expertise relevant here:
•  Transcriptomics (Microarray, NGS)
•  Comparative genomics (eukaryotic, prokaryotic)
•  Identification of biomarkers from genetic and epigenetic
datasets
•  Artificial intelligence for decision support
•  Machine Learning
•  Integration of complex datasets
•  Data Management
@HWI-_FC_20BTNAAXX:2:1:215:593
ACAGTGCATGACATGCATAGCAGCATAGACTAC
+HWI-_FC_20BTNAAXX:2:1:215:593
GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh
Some common terms
•  Library: collection of molecules. This is the “complexity” of what you sequence.
•  Flowcell: glass slide to which the DNA fragments to be sequenced are attached.
•  Lane: independent sequencing unit of the flowcell.
•  Reads: raw sequence of bases with associated quality scores.
•  Fragment: original molecule being sequenced (fragment of genome/gene). Paired-end (PE) reads are read from the two ends of the same fragment.
•  Cluster: DNA bound to the slide; local amplification of each molecule amplifies the signal so that fluorescence can be measured.
•  Mapping: finds where your sequence matches a reference. Importantly, it also gives a probability that this is the correct location.
Illumina sequencing
What coverage do I need?
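The slide's own answer is an image. As a hedged, back-of-the-envelope aid, the Lander-Waterman estimate gives the expected mean coverage as C = L × N / G (read length × number of reads / genome size), and under its Poisson assumption roughly e^-C of the bases receive no reads at all. The sketch assumes uniform, random read placement, which real libraries only approximate; the function names and the 30x target in the usage lines are illustrative, not recommendations from the slides.

```python
import math


def expected_coverage(read_length: int, n_reads: int, genome_size: int) -> float:
    """Lander-Waterman expected mean coverage: C = L * N / G."""
    return read_length * n_reads / genome_size


def reads_needed(read_length: int, target_coverage: float, genome_size: int) -> int:
    """Reads N needed for a target mean coverage, rounded up: N = C * G / L."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with no reads, assuming Poisson sampling."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical target: 30x over chr17 (81,195,210 bp) with 141 bp reads.
    n = reads_needed(141, 30, 81_195_210)
    print(n, expected_coverage(141, n, 81_195_210), fraction_uncovered(30))
```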
Obtaining NGS Data
•  Sequence Read Archive (SRA)
–  Will need to convert to FastQ using the SRA Toolkit
•  European Nucleotide Archive (ENA)
Deciphering a fastq file
@HWI-_FC_20BTNAAXX:2:1:215:593#0/1
ACAGTGCATGACATGCATAGCAGCATAGACTAC
+HWI-_FC_20BTNAAXX:2:1:215:593#0/1
GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh
Header: @HWI-_FC_20BTNAAXX:2:1:215:593#0/1
HWI-_FC_20BTNAAXX instrument identifier
2 flowcell lane
1 tile number in flowcell lane
215 x-coordinate of cluster in the tile
593 y-coordinate of cluster in the tile
#0 index of multiplexed samples
/1 member of pair (/1 or /2 if paired-end)
Sequence: ACAGTGCATGACATGCATAGCAGCATAGACTAC
Quality Header: +HWI-_FC_20BTNAAXX:2:1:215:593 (repeating the identifier after the + is optional)
Quality: GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh
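To make the field layout concrete, here is a minimal Python sketch that reads four-line FASTQ records and splits the old-style header into the fields annotated above. It assumes the pre-CASAVA 1.8 header layout shown on the slide (`instrument:lane:tile:x:y#index/pair`); newer Illumina headers, such as the read name in the SAM example later in this deck, use a different layout and would need a different pattern. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Older (pre-CASAVA 1.8) Illumina header layout, as annotated on the slide:
# @<instrument>:<lane>:<tile>:<x>:<y>#<index>/<pair member>
HEADER_RE = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/]+)/(?P<pair>[12])$"
)


def parse_header(header: str) -> Dict[str, str]:
    """Split an old-style Illumina FASTQ header into its named fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised header layout: {header!r}")
    return match.groupdict()


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each four-line FASTQ record."""
    stripped = (line.rstrip("\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        try:
            sequence, plus, quality = next(stripped), next(stripped), next(stripped)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        yield header, sequence, quality
```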
Deciphering a fastq file
Quality: GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh
Sanger encoding = ASCII table lookup - 33
Solexa encoding = ASCII table lookup - 64
G = 71 - 64 = 7
h = 104 - 64 = 40
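The lookup above takes only a few lines of Python. This is a sketch that assumes the two offsets quoted on the slide. Strictly, the original Solexa scores use a log-odds scale rather than Phred, so the Phred conversion is exact only for Sanger (Phred+33) files and for Phred+64 files (Illumina 1.3 to 1.7). The Phred relation Q = -10 log10(P) links a score to the probability P that the base call is wrong. The names here are hypothetical.

```python
from typing import List

# ASCII offsets quoted on the slide.
OFFSETS = {"sanger": 33, "solexa": 64}


def decode_quality(quality: str, encoding: str = "sanger") -> List[int]:
    """Convert an ASCII quality string to integer scores: ord(char) - offset."""
    offset = OFFSETS[encoding]
    return [ord(char) - offset for char in quality]


def phred_error_probability(q: float) -> float:
    """Phred scale Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    scores = decode_quality("GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh", "solexa")
    print(scores[:3])  # G = 71 - 64 = 7, h = 104 - 64 = 40
```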
SNP Calling
•  Genotyping: identifying variants in a single genome
(i.e. the alleles inherited from each parent)
•  SNP Calling: identifying variants between individual genomes
Sample genome(s):
ACGTGCAGCATAGCA?CGACATCGACATACGC
TGCACGTCGTATCGT?GCTGTAGCTGTATGCG
Reads:
****A*******
***A*****
**T******
*****T**********
******T********
Reference genome:
ACGTGCAGCATAGCATCGACATCGACATACGC
TGCACGTCGTATCGTAGCTGTAGCTGTATGCG
INDEL Calling
•  INDEL: Insertion/deletion
Sample genome(s):
ACGTGCAGCATAGCA???CGACATCGACATACGC
TGCACGTCGTATCGT???GCTGTAGCTGTATGCG
Reads:
****ACG*******
***ACG*****
**---********
*****---**********
******---********
Reference genome:
ACGTGCAGCATAGCACGTCGACATCGACATACGC
TGCACGTCGTATCGTGCAGCTGTAGCTGTATGCG
A pipeline for SNP identification
•  Quality Control
–  FastQC, Fastx toolkit
•  Trimming:
–  Sickle, Trim Galore, Trimmomatic, Cutadapt
•  Mapping
–  BWA, Bowtie, Stampy
•  Remove Duplicates
–  Picard tools, Samtools rmdup
•  Call SNPs / INDELs
–  Samtools mpileup, VarScan, GATK, many others!
Galaxy: https://usegalaxy.org/
Toolbox Workflows
Visualize the data
•  FASTQC
•  Runs as a stand-alone interactive application or non-interactively (e.g. within a pipeline)
–  Basic Statistics module, includes:
•  Filename: The original filename of the file which was analyzed
•  Encoding: Says which ASCII encoding of quality values was found in this file.
•  Total Sequences: A count of the total number of sequences processed.
•  Sequence Length: Provides the length of the shortest and longest sequence in
the set. If all sequences are the same length only one value is reported.
Visualize the data
•  FASTQC: Per Base Sequence Quality:
Red line = Median quality
Yellow box = Inter-quartile range (IQR, 25-75%)
Whiskers = 10th-90th percentiles
Blue line = Mean quality
Warning: if the lower quartile for any base is less than 10, or if the median for any base is less than 25.
Failure: if the lower quartile for any base is less than 5, or if the median for any base is less than 20.
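A rough sketch of how the per-base warning and failure thresholds could be applied to integer quality scores (already decoded from ASCII). It uses Python's `statistics.quantiles` for the lower quartile, so the values may differ slightly from FastQC's own binning and interpolation. Positions covered by fewer than two reads are skipped, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def per_base_quality_status(reads: Sequence[Sequence[int]]) -> str:
    """Apply the slide's per-base thresholds to integer quality scores.

    reads: one list of scores per read; reads may differ in length.
    Returns "pass", "warn" or "fail".
    """
    status = "pass"
    max_len = max((len(r) for r in reads), default=0)
    for pos in range(max_len):
        column = [r[pos] for r in reads if len(r) > pos]
        if len(column) < 2:
            continue  # too few reads at this position for quartiles in this sketch
        median = statistics.median(column)
        lower_quartile = statistics.quantiles(column, n=4, method="inclusive")[0]
        if lower_quartile < 5 or median < 20:
            return "fail"
        if lower_quartile < 10 or median < 25:
            status = "warn"
    return status
```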
Visualize the data
•  FASTQC: Per Sequence Quality Scores:
Warning: if the most frequently observed mean quality is below 27 (this equates to a 0.2% error rate).
Failure: if the most frequently observed mean quality is below 20 (this equates to a 1% error rate).
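A minimal sketch of the per-sequence check. The 27 and 20 cut-offs correspond to error probabilities of 10^(-2.7) ≈ 0.002 and 10^(-2) = 0.01 via Q = -10 log10(P). Rounding each read's mean to an integer before taking the most frequent value is an assumption of this sketch rather than FastQC's documented method, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def per_sequence_quality_status(reads: Sequence[Sequence[int]]) -> str:
    """Mode of per-read mean quality vs. the slide's thresholds (27 warn, 20 fail)."""
    means = Counter(round(sum(r) / len(r)) for r in reads if r)
    if not means:
        return "pass"
    most_common_mean, _count = means.most_common(1)[0]
    if most_common_mean < 20:
        return "fail"
    if most_common_mean < 27:
        return "warn"
    return "pass"
```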
Visualize the data
•  FASTQC: Per Base Sequence Content
Proportion of each of the four DNA bases (A, T, C, G) called at each position in the file.
Warning: if the difference between A and T, or G and C, is greater than 10% in any position.
Failure: if the difference between A and T, or G and C, is greater than 20% in any position.
Possible causes include adapters or an effect of trimming.
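A minimal sketch of the per-base content test, assuming the reads are plain strings. Bases other than A, C, G and T are ignored when computing the percentages. FastQC may group positions and differ in detail, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable


def per_base_content_status(sequences: Iterable[str]) -> str:
    """Largest A/T or G/C difference (as % of called bases) at any read position."""
    columns = defaultdict(Counter)
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            columns[pos][base] += 1
    worst = 0.0
    for counts in columns.values():
        called = sum(counts[b] for b in "ACGT")  # ignore N and other symbols
        if called == 0:
            continue
        at_diff = abs(counts["A"] - counts["T"]) * 100 / called
        gc_diff = abs(counts["G"] - counts["C"]) * 100 / called
        worst = max(worst, at_diff, gc_diff)
    if worst > 20:
        return "fail"
    if worst > 10:
        return "warn"
    return "pass"
```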
Visualize the data
•  FASTQC: Sequence Length Distribution
Distribution of sequence (read) lengths in the file.
Warning: if all sequences are not the same length.
Failure: if any of the sequences have zero length.
Visualize the data
•  FASTQC: Duplicate Sequences
Degree of duplication within the first 200,000 reads of the file: distribution of duplication levels in the dataset.
Warning: if non-unique sequences make up more than 20% of the total.
Failure: if non-unique sequences make up more than 50% of the total.
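A sketch of the duplication check as stated on the slide. Here, "non-unique" counts every read that would be removed if identical sequences were collapsed to a single copy, within the first 200,000 reads. That interpretation is an assumption, FastQC's own duplicate-level estimate is more involved, and the function name is hypothetical.

```python
from itertools import islice
from typing import Iterable


def duplication_status(sequences: Iterable[str], sample_size: int = 200_000) -> str:
    """Percentage of reads removable by deduplication, within the first reads."""
    sample = list(islice(sequences, sample_size))
    if not sample:
        return "pass"
    non_unique_pct = (len(sample) - len(set(sample))) * 100 / len(sample)
    if non_unique_pct > 50:
        return "fail"
    if non_unique_pct > 20:
        return "warn"
    return "pass"
```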
Visualize the data
•  FASTQC: Overrepresented Sequences
Lists all of the sequences which make up more than 0.1% of the total.
Warning: if any sequence is found to represent more than 0.1% of the total.
Failure: if any sequence is found to represent more than 1% of the total.
•  FASTQC: Adapter Content
Shows whether your library contains a significant amount of adapter, so that you can assess whether you need to adapter trim or not.
Warning: if any sequence is present in more than 5% of all reads.
Failure: if any sequence is present in more than 10% of all reads.
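The overrepresented-sequences rule can be sketched by counting identical reads. This assumes exact whole-read matches over every read supplied. FastQC tracks sequences differently, so its percentages may not match exactly. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overrepresented(sequences: Iterable[str], min_pct: float = 0.1) -> List[Tuple[str, float]]:
    """Sequences making up more than min_pct % of all reads, most frequent first."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return [
        (seq, n * 100 / total)
        for seq, n in counts.most_common()
        if n * 100 / total > min_pct
    ]


def overrepresented_status(sequences: Iterable[str]) -> str:
    """Slide thresholds: warn above 0.1 %, fail above 1 % for any one sequence."""
    worst = max((pct for _seq, pct in overrepresented(sequences)), default=0.0)
    if worst > 1:
        return "fail"
    if worst > 0.1:
        return "warn"
    return "pass"
```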
Cut adapters
-f = the type of file (in this case fastq; needed only by older cutadapt versions, newer versions detect the format automatically and may not accept -f)
-q CUTOFF, Trim low-quality ends from reads before adapter removal. Qualities are read as Phred+33 unless --quality-base 64 is given.
-a ADAPTER, Sequence of an adapter that was ligated to the 3' end. The adapter itself and anything that follows it is trimmed.
-m 100 minimum length of reads after adapter removal. Reads shorter than 100 bases are discarded.
--discard-untrimmed any reads without an adapter are discarded.
-o output file, also in fastq format.
cutadapt -f fastq -q 20 -a AGATCGGAAGAG -m 100 --discard-untrimmed \
  -o SNP.test.trimmed.fastq SNP.test.fastq
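To show what these options do to a single read, here is a simplified Python model of the cutadapt command above. It applies 3' quality trimming with the BWA-style algorithm that the cutadapt documentation describes for `-q`, then removes a 3' adapter and applies `--discard-untrimmed` and `-m`. It is a sketch, not cutadapt's implementation. It accepts only exact adapter matches, whereas cutadapt tolerates a proportion of errors by default, and it assumes a minimum partial-adapter overlap of 3 bases at the read end. The function names are hypothetical.

```python
from typing import Optional, Tuple


def quality_trim_index(qualities: str, cutoff: int, base: int = 33) -> int:
    """BWA-style 3' quality trimming; keep qualities[:index].

    Walks from the 3' end summing (cutoff - quality) and cuts where that
    running sum peaks, stopping once it drops below zero.
    """
    running = best = 0
    index = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best, index = running, i
    return index


def find_adapter(seq: str, adapter: str, min_overlap: int = 3) -> Optional[int]:
    """Start of a full exact adapter match, else of a partial adapter at the 3' end."""
    full = seq.find(adapter)
    if full != -1:
        return full
    for length in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:length]):
            return len(seq) - length
    return None


def trim_read(seq: str, qual: str, adapter: str, cutoff: int = 20,
              min_length: int = 100, discard_untrimmed: bool = True,
              base: int = 33) -> Optional[Tuple[str, str]]:
    """Mimic '-q cutoff -a adapter -m min_length --discard-untrimmed' on one read.

    Returns the trimmed (sequence, quality), or None if the read is discarded.
    """
    cut = quality_trim_index(qual, cutoff, base)      # -q: quality trim first
    seq, qual = seq[:cut], qual[:cut]
    start = find_adapter(seq, adapter)                 # -a: 3' adapter
    if start is None:
        if discard_untrimmed:                          # --discard-untrimmed
            return None
    else:
        seq, qual = seq[:start], qual[:start]
    if len(seq) < min_length:                          # -m: minimum length
        return None
    return seq, qual
```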
Cut adapters
Galaxy: Fastx_clipper from fastx_toolkit
Quality filters
Fastx_toolkit
-q = Minimum quality score to keep.
-p = Minimum percent of bases in a read that must have [-q] quality.
-i input file (output of adapter trimming step)
-v verbose
-Q quality encoding (ASCII offset, e.g. 33 or 64)
-o output file, also in fastq format.
fastq_quality_filter -q 20 -p 70 -i SNP.test.trimmed.fastq -v -Q 64 \
  -o SNP.test.trimmed.QC.fastq
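The filter rule reduces to one comparison per read. This sketch assumes that `-q` and `-p` are inclusive thresholds (at least `-q`, in at least `-p` percent of bases) and that qualities are decoded with the offset given to `-Q`. The names are hypothetical.

```python
from typing import List, Sequence


def decode(quality: str, offset: int = 64) -> List[int]:
    """ASCII quality string to integer scores; offset 64 mirrors '-Q 64'."""
    return [ord(char) - offset for char in quality]


def passes_filter(scores: Sequence[int], min_quality: int = 20,
                  min_percent: float = 70) -> bool:
    """Keep the read if at least min_percent % of bases have quality >= min_quality."""
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_quality)
    return good * 100 >= min_percent * len(scores)
```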
Quality filters
•  Galaxy: filter by quality
Align to genome
The problem
•  Generally a large genome (Human > 3 Gb)
•  Large number of short reads
The solution
•  Index the genome, e.g. into a hash of k-mers (short sequences) or a Burrows-Wheeler/FM-index (as used by Bowtie and BWA)
•  Use efficient aligners
–  Large number of aligners available.
•  Common aligners: Bowtie1/2, BWA, Stampy
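To illustrate the "hash of k-mers" idea, here is a toy seed-and-extend aligner. Every k-mer position in the genome is stored in a dictionary, the read's first k-mer is looked up, and each candidate location is then checked base by base. This shows only the hash-table approach. Bowtie and BWA use an FM-index instead, and real aligners also search the reverse complement, handle gaps, and weigh mismatches by base quality. All names are hypothetical, and the returned positions are 0-based (SAM positions are 1-based).

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the genome to the 0-based positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def align_read(read: str, genome: str, index: Dict[str, List[int]], k: int,
               max_mismatches: int = 0) -> List[int]:
    """Seed with the read's first k-mer, then verify the whole read at each hit."""
    hits = []
    for start in index.get(read[:k], []):
        window = genome[start:start + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the genome
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits
```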
Align to genome
Example Bowtie 2 alignment
Build index
-f the input is a fasta-formatted genome file
./bowtie.index.files/chr17 output location (basename) for the index files
Galaxy: select pre-built index when using bowtie or BWA
bowtie2-build -f ./genome/chr17.fa ./bowtie.index.files/chr17
Align to genome
Align to reference
-p number of processors (threads) to use
--end-to-end end-to-end (not local) alignment: the whole read must align
-k 1 report at most one alignment per read. This does not discard reads that map to several locations; one of them is reported, so multi-mapping reads must be filtered separately if only uniquely mapping reads are wanted.
-x path (basename) of the indexed genome to align reads to.
-U file of unpaired (single-end) reads
-S output file in SAM format
bowtie2 -p 4 --end-to-end -k 1 -x ./bowtie.index.files/chr17 \
  -U SNP.test.trimmed.QC.fastq -S SNP.test.trimmed.QC.fastq.sam
Alignment file formats
SAM - Sequence Alignment/Map format. (BAM is a binary, compressed equivalent)
–  TAB-delimited text format
–  header section (optional)
@HD = Header, VN[format version], SO[sorting order]
@SQ = Reference sequence line, SN[sequence name], LN[sequence length]
@PG = Program, ID[program ID], CL[command line]
@HD VN:1.0 SO:unsorted
@SQ SN:chr17 LN:81195210
@PG ID:bowtie2 PN:bowtie2 VN:2.2.1 CL:"/home/rde/tools/bowtie2-2.2.1/bowtie2-align-s --wrapper basic-0 -p 4 --end-to-end -k 1 -x ./bowtie.index.files/chr17 -S SNP.test.trimmed.QC.fastq.sam -U SNP.test.trimmed.QC.fastq"
Alignment file formats
•  SAM - Sequence Alignment/Map format.
–  TAB-delimited text format
–  alignment section: 11 mandatory columns
–  Optional fields (TAG:TYPE:VALUE)
–  For details http://samtools.github.io/hts-specs/SAMv1.pdf
–  Samflags http://picard.sourceforge.net/explain-flags.html
instrument_name:100:flowcellID:1:1:32575:625 0 chr17 72387574 255 141M * 0 0 GGAAGAGCTGGGACCAGGCCCAGCAATTACCTCACCATGGTTGGGTGCAACCAAGTGGGGCAACTCTTTGGCCAGAAAGCAAAAGTCTTTTTAGCTTCAATGTAGGCCATTCTGGGTCCCAGACCCACAGCTTTGGACATT Dgga^h`hedac`hcc`c^_f`ChggeDfC`e`_h_ddgd^_`be^bcg^acchC`^^bCdagfbgf_`^dc^^cD`dbha^Dbc`fb^d`CdaD^``dDb_edffdDDh_chceabh`heecd_h`gb_b^Ch`Dd_^c` AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:141 YT:Z:UU
1  QNAME  Query template name
2  FLAG  Bitwise flag
3  RNAME  Reference name
4  POS  1-based leftmost mapping position
5  MAPQ  Mapping quality
6  CIGAR  CIGAR string
7  RNEXT  Reference name of the mate
8  PNEXT  Position of the mate
9  TLEN  Observed template length
10  SEQ  Sequence
11  QUAL  Base qualities (ASCII-encoded Phred scores)
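A minimal parser for one alignment line. It maps the 11 mandatory columns to the names listed above, decodes the optional `TAG:TYPE:VALUE` fields, and expands the bitwise FLAG. It assumes real tab-delimited input (the example record above is shown space-separated) and converts only integer-typed (`i`) tags. The flag meanings follow the SAM specification, and the function names are hypothetical.

```python
from typing import Dict, List, Union

SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_COLUMNS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

# Bit meanings of the FLAG column, from the SAM specification.
FLAG_BITS = {
    0x1: "paired (multiple segments)",
    0x2: "properly aligned pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse complemented",
    0x20: "mate reverse complemented",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Parse one tab-delimited SAM alignment line (header lines start with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory columns")
    record: Dict[str, object] = {
        name: int(value) if name in INT_COLUMNS else value
        for name, value in zip(SAM_COLUMNS, fields)
    }
    tags: Dict[str, Union[int, str]] = {}
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    record["TAGS"] = tags
    return record


def explain_flag(flag: int) -> List[str]:
    """List the properties encoded in a bitwise FLAG value."""
    return [meaning for bit, meaning in FLAG_BITS.items() if flag & bit]
```

For the example record above, FLAG 0 expands to an empty list: a mapped, forward-strand, unpaired read.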
Remove duplicates
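•  Duplicate reads, generated as artifacts in the library preparation step (e.g. by PCR amplification), give false confidence in variants.
•  samtools [rmdup], Picard tools [MarkDuplicates]
samtools view -bS SNP.test.trimmed.QC.fastq.sam -o SNP.test.trimmed.QC.fastq.bam
samtools sort -o SNP.test.trimmed.QC.fastq.sorted.bam SNP.test.trimmed.QC.fastq.bam
samtools rmdup -s SNP.test.trimmed.QC.fastq.sorted.bam SNP.test.trimmed.QC.fastq.rmdup.bam
(samtools rmdup expects coordinate-sorted input; -s treats the reads as single-end.)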
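The idea behind single-end duplicate removal can be sketched like this. Reads that share chromosome, 5' position and strand are treated as copies of one original fragment, and only the one with the highest mapping quality is kept. This illustrates the general approach and assumes that reverse-strand reads start at their rightmost aligned base. samtools and Picard may differ in details such as tie-breaking. The `Alignment` type and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    name: str
    chrom: str
    pos: int        # 1-based leftmost position (SAM column 4)
    cigar: str      # SAM column 6
    reverse: bool   # FLAG bit 0x10
    mapq: int       # SAM column 5


def reference_length(cigar: str) -> int:
    """Reference bases spanned: sum of M, D, N, = and X operations."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in "MDN=X")


def five_prime_key(aln: Alignment) -> Tuple[str, int, bool]:
    """Chromosome, 5' position and strand; reverse reads start at their right end."""
    if aln.reverse:
        return aln.chrom, aln.pos + reference_length(aln.cigar) - 1, True
    return aln.chrom, aln.pos, False


def remove_duplicates(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep one read (highest MAPQ) per 5' position and strand."""
    best: Dict[Tuple[str, int, bool], Alignment] = {}
    for aln in alignments:
        key = five_prime_key(aln)
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return sorted(best.values(), key=lambda a: (a.chrom, a.pos))
```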
Remove duplicates
Call Variants
•  Identify regions in alignment where sequence differs
Reads:
****A*******
***A*****
**T******
*****T**********
******T********
Reference genome:
ACGTGCAGCATAGCATCGACATCGACATACGC
Call Variants
samtools sort -o SNP.test.trimmed.QC.fastq.rmdup.sorted.bam SNP.test.trimmed.QC.fastq.rmdup.bam
samtools index SNP.test.trimmed.QC.fastq.rmdup.sorted.bam
samtools faidx ./genome/chr17.fa
samtools mpileup -f ./genome/chr17.fa SNP.test.trimmed.QC.fastq.rmdup.sorted.bam \
  | java -jar ./tool/VarScan.v2.3.6.jar mpileup2snp --output-vcf 1 --strand-filter 0
samtools mpileup
-f faidx-indexed reference sequence file
VarScan mpileup2snp or mpileup2indel (default values in square brackets)
--min-coverage Minimum read depth at a position to make a call [8]
--min-reads2 Minimum supporting reads at a position to call variants [2]
--min-avg-qual Minimum base quality at a position to count a read [15]
--min-var-freq Minimum variant allele frequency threshold [0.01]
--strand-filter Ignore variants with >90% support on one strand [1]
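To show how the VarScan thresholds interact at a single pileup position, here is a simplified sketch that uses the default values listed above. It is not VarScan's algorithm. VarScan also applies a statistical test and, unless `--strand-filter 0` is given as in the command, a strand filter; neither is modelled here. The inputs are assumed to be the read bases and integer base qualities at one position, and the function name is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def call_snp(ref_base: str, bases: Sequence[str], quals: Sequence[int],
             min_coverage: int = 8, min_reads2: int = 2,
             min_avg_qual: int = 15, min_var_freq: float = 0.01
             ) -> Optional[Tuple[str, int, float]]:
    """Return (alt_base, depth, variant allele frequency), or None if no call."""
    # --min-avg-qual: only bases at or above the quality threshold are counted
    counts = Counter(b.upper() for b, q in zip(bases, quals) if q >= min_avg_qual)
    depth = sum(counts.values())
    if depth < min_coverage:                                # --min-coverage
        return None
    alternatives = [(b, n) for b, n in counts.most_common() if b != ref_base.upper()]
    if not alternatives:
        return None
    alt, alt_reads = alternatives[0]
    freq = alt_reads / depth
    if alt_reads < min_reads2 or freq < min_var_freq:      # --min-reads2, --min-var-freq
        return None
    return alt, depth, freq
```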
Call Variants
VCF Variant Call Format file
•  Header (meta-information) lines marked with ##
•  Column headings marked with #
•  Mandatory columns
–  CHROM Chromosome
–  POS Position of variant start (1-based)
–  ID Variant identifier (e.g. dbSNP rs ID; "." if none)
–  REF Reference allele
–  ALT Alternate non-reference alleles (comma-separated)
–  QUAL Phred-scaled quality score
–  FILTER Filtering information (PASS if all filters were passed)
–  INFO Additional annotation (semicolon-separated key=value entries)
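A sketch that reads the eight mandatory columns described above from a VCF file, splitting comma-separated ALT alleles and semicolon-separated INFO entries. It assumes tab-delimited input and treats INFO entries without `=value` as flags. Sample/genotype columns are ignored, and the names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]
    qual: Optional[float]
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_info(field: str) -> Dict[str, Union[str, bool]]:
    """'DP=10;SOMATIC' -> {'DP': '10', 'SOMATIC': True}; '.' means empty."""
    if field == ".":
        return {}
    info: Dict[str, Union[str, bool]] = {}
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags have no '=value'
    return info


def parse_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records built from the first eight (mandatory) VCF columns."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # '##' meta-information and the '#CHROM' heading line
        chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
        yield VcfRecord(chrom, int(pos), vid, ref, alt.split(","),
                        None if qual == "." else float(qual), flt, parse_info(info))
```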
Visualize: IGV
So Many SNPs – So What?
•  Get gene
•  Functional Analysis to identify key candidates
Identify Homologues
Locate variants
Identify Ontologies
Pathway and interaction analysis
Locate SNPs on structure
Compare to current data
•  Functional Analysis to identify key candidates
A step by step example (don’t do this with lots of variants!)
•  Get gene
•  Modify bases as shown in VCF file.
•  BLASTx to identify reading frame.
•  Produce mRNA and encoded peptide sequence FASTA files (provided)
•  Determine variant positions in mRNA & peptide sequences
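Placing a SNP on the mRNA and peptide comes down to codon arithmetic. A variant at 1-based coding position p falls in codon (p - 1) // 3 + 1, at position (p - 1) % 3 within that codon. This sketch assumes the position is already given relative to a coding sequence that starts at the first base of the start codon (the reading frame found with BLASTx), lies on the forward strand, and is read with the standard genetic code. The example coding sequence in the test is made up, and the function name is hypothetical.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, cds_pos: int, alt: str) -> Tuple[int, str, str, str]:
    """Locate a substitution in a coding sequence and in its encoded peptide.

    cds: coding sequence starting at the first base of the start codon.
    cds_pos: 1-based position of the variant within cds.
    Returns (1-based codon/residue number, ref amino acid, alt amino acid, effect).
    """
    cds = cds.upper()
    codon_number = (cds_pos - 1) // 3 + 1
    offset = (cds_pos - 1) % 3
    start = 3 * (codon_number - 1)
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("variant lies in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return codon_number, ref_aa, alt_aa, effect
```

For example, a C-to-A change at the first base of a CGT codon turns Arg into Ser, the kind of substitution one would then look for on the protein structure.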
Compare to current data
•  dbSNP
–  SNP already known?
•  Repositories such as Ensembl, UCSC
–  In a splice variant?
–  In known regions, domains etc.?
•  Variant Effect Predictor (VEP; more later)
•  Visualize the position in the genome – Ensembl/UCSC
–  Add a custom track using a GFF/BED file
•  Coding/non-coding
–  Synonymous / non-synonymous
–  Codon usage
•  Locate in relation to known domains
–  Pfam
–  SMART
•  Repeat regions
Locate variants
•  For single genes
–  Search for available information (PubMed, InterPro, etc.)
•  For multiple genes
–  BLAST2GO
–  DAVID
Identify Ontologies
•  Pathway analysis
–  Understand the process your gene is involved in. Does it make biological sense?
–  IPA
–  DAVID
–  Webgestalt
•  Interaction analysis
–  BioGRID
–  STRING
–  PSICQUIC
Pathway and interaction analysis
•  Structure prediction
–  BLAST search of PDB
–  Predict structure
•  PSIPRED – 2° structure
•  I-TASSER – 3° structure
•  Phyre – 3° structure
–  Locate in 3D
•  Swiss PDB viewer
Locate SNPs on structure
•  Predict effect of SNPs
•  SuSPect
•  VEP
Locate SNPs on structure
Arg → Ser
Identify Homologues
Emes R.D. Inferring function from homology. In: Methods in Molecular Biology 453. Humana Press, 2008.
Links and websites 1
•  Many at http://emeslab.wordpress.com/useful-links/
•  SRA: http://www.ncbi.nlm.nih.gov/sra
•  SRA toolkit http://eutils.ncbi.nih.gov/Traces/sra/?view=software
•  ENA: http://www.ebi.ac.uk/ena/
•  Galaxy: https://usegalaxy.org/
•  FASTQC: www.bioinformatics.babraham.ac.uk/projects/fastqc/
•  Fastx_toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
•  Bowtie 1: http://bowtie-bio.sourceforge.net/index.shtml
•  Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
•  BWA: http://bio-bwa.sourceforge.net/
•  Stampy: http://www.well.ox.ac.uk/project-stampy
•  SAMtools: http://samtools.sourceforge.net
•  PicardTools: http://picard.sourceforge.net
•  VarScan: http://varscan.sourceforge.net
•  IGV: https://www.broadinstitute.org/software/igv/
Links and websites 2
•  dbSNP: http://www.ncbi.nlm.nih.gov/SNP/
•  Ensembl: http://www.ensembl.org/index.html
•  UCSC Genome Browser: https://genome.ucsc.edu/
•  Pfam: http://pfam.xfam.org/
•  SMART: http://smart.embl-heidelberg.de/
•  GO: http://www.geneontology.org/
•  BLAST2GO: http://www.blast2go.com/b2ghome
•  IPA: http://www.ingenuity.com/products/login
•  DAVID: http://david.abcc.ncifcrf.gov/
•  Webgestalt: http://bioinfo.vanderbilt.edu/webgestalt/
•  BioGRID: http://thebiogrid.org/
•  PSICQUIC: http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml
•  ITASSER: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
•  Phyre: http://www.sbg.bio.ic.ac.uk/phyre2/
•  SuSPect: http://www.sbg.bio.ic.ac.uk/~suspect/
•  PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
•  SwissPDB: http://spdbv.vital-it.ch/
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 

Kürzlich hochgeladen (20)

Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 

Gwas.emes.comp

  • 2. Obtaining, QC, mapping and analysis of NGS data. Richard Emes, Associate Professor & Reader in Bioinformatics, School of Veterinary Medicine and Science. Director, Advanced Data Analysis Centre. richard.emes@nottingham.ac.uk www.nottingham.ac.uk/adac @rdemes @ADAC_UoN
  • 3. What is ADAC? The University of Nottingham Advanced Data Analysis Centre (ADAC). •  Bioinformatics and data analysis support. Why is this important? •  Complex data underpins much current research. •  Innovative analysis can prompt new discoveries. •  Excellent research can stall for lack of data-analysis expertise, or because the diverse specialists it needs are unavailable or too costly to include. Why ADAC? •  ADAC supports high-quality research by providing analysts with expertise in a range of bioinformatics and computer science disciplines. •  Flexible support •  Consultancy, collaboration, bespoke analysis. •  Leadership from recognized experts in the fields of bioinformatics and computer science. •  Track record of funding •  Pivotal role in collaborations funded by, amongst others, Zoetis, BBSRC, NERC and the Technology Strategy Board. •  ADAC is conducting transcriptome analysis for a multinational FP7-funded project (EU Prohealth).
  • 4. http://www.nottingham.ac.uk/adac/ Enquiries: adac@nottingham.ac.uk or richard.emes@nottingham.ac.uk @ADAC_UoN or @rdemes Current areas of expertise relevant here: •  Transcriptomics (microarray, NGS) •  Comparative genomics (eukaryotic, prokaryotic) •  Identification of biomarkers from genetic and epigenetic datasets •  Artificial intelligence for decision support •  Machine learning •  Integration of complex datasets •  Data management
  • 6. Some common terms •  Library: the collection of molecules that is sequenced. Its “complexity” (the number of distinct molecules) determines how much unique sequence you can obtain. •  Flowcell: the glass slide to which library fragments are attached for sequencing. •  Lane: an independent sequencing unit (channel) of the flowcell. •  Reads: the raw sequence of bases together with their estimated quality scores. •  Fragment: the original molecule being sequenced (a fragment of the genome or a gene). For example, the two reads of a paired-end (PE) pair come from the same fragment. •  Cluster: a group of identical DNA copies bound to the flowcell. They are produced by local amplification of a single fragment so that the fluorescent signal is strong enough to measure. •  Mapping: finding where each read matches a reference. Importantly, the aligner also gives a probability that this is the correct location (the mapping quality).
  • 11. What coverage do I need?
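A common back-of-the-envelope answer is the Lander-Waterman estimate. Mean coverage is C = N × L / G for N reads of length L on a genome of G bases. If you assume per-base depth is Poisson distributed, the fraction of bases covered by at least k reads follows from the Poisson tail. The sketch below is illustrative only. The function names are ours, and real coverage is usually more uneven than Poisson because of GC and mappability biases. The chr17 length comes from the SAM header later in the deck; the 30x and 100 bp figures are hypothetical.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth C = N * L / G (Lander-Waterman)."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose expected mean depth reaches target_depth."""
    return math.ceil(target_depth * genome_size / read_length)


def fraction_at_least(depth: float, k: int) -> float:
    """Fraction of bases covered by >= k reads if per-base depth is Poisson(depth)."""
    below_k = sum(math.exp(-depth) * depth ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    chr17_length = 81_195_210                  # LN of chr17 in the SAM header shown later
    n = reads_needed(30, 100, chr17_length)    # hypothetical target: 30x with 100 bp reads
    c = expected_coverage(n, 100, chr17_length)
    print(f"{n} reads -> {c:.1f}x; fraction of bases with >= 8 reads: {fraction_at_least(c, 8):.4f}")
```

The threshold of 8 reads matches VarScan's default --min-coverage, which appears later in the pipeline.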
  • 12. Obtaining NGS Data •  Sequence Read Archive (SRA, formerly the Short Read Archive) •  European Nucleotide Archive (ENA)
  • 13. Obtaining NGS Data •  Sequence Read Archive (SRA) •  Files must be converted to FASTQ using the SRA Toolkit (e.g. its fastq-dump program)
  • 14. Deciphering a fastq file @HWI-_FC_20BTNAAXX:2:1:215:593#0/1 ACAGTGCATGACATGCATAGCAGCATAGACTAC +HWI-_FC_20BTNAAXX:2:1:215:593#0/1 GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh Header: @HWI-_FC_20BTNAAXX:2:1:215:593#0/1 •  HWI-_FC_20BTNAAXX: instrument identifier •  2: flowcell lane •  1: tile number within the flowcell lane •  215: x-coordinate of the cluster within the tile •  593: y-coordinate of the cluster within the tile •  #0: index of multiplexed samples •  /1: member of pair (/1 or /2 if paired-end)
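The sketch below makes the field breakdown concrete. It splits a read name in this older Illumina style using a regular expression. The pattern and the `ReadName`/`parse_read_name` names are our own assumptions. It only covers the `@instrument:lane:tile:x:y#index/pair` layout shown on the slide, not the longer read names written by CASAVA 1.8 and later.

```python
import re
from typing import NamedTuple, Optional

# Older (pre-CASAVA 1.8) Illumina read name: @instrument:lane:tile:x:y#index/pair
OLD_ILLUMINA_HEADER = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/]+))?(?:/(?P<pair>[12]))?$"
)


class ReadName(NamedTuple):
    instrument: str
    lane: int
    tile: int
    x: int
    y: int
    index: Optional[str]   # multiplex index, e.g. "0"
    pair: Optional[int]    # 1 or 2 for paired-end reads, None otherwise


def parse_read_name(header: str) -> ReadName:
    match = OLD_ILLUMINA_HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"not an old-style Illumina read name: {header!r}")
    pair = match.group("pair")
    return ReadName(
        instrument=match.group("instrument"),
        lane=int(match.group("lane")),
        tile=int(match.group("tile")),
        x=int(match.group("x")),
        y=int(match.group("y")),
        index=match.group("index"),
        pair=int(pair) if pair else None,
    )


print(parse_read_name("@HWI-_FC_20BTNAAXX:2:1:215:593#0/1"))
# ReadName(instrument='HWI-_FC_20BTNAAXX', lane=2, tile=1, x=215, y=593, index='0', pair=1)
```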
  • 15. Deciphering a fastq file @HWI-_FC_20BTNAAXX:2:1:215:593#0/1 ACAGTGCATGACATGCATAGCAGCATAGACTAC +HWI-_FC_20BTNAAXX:2:1:215:593#0/1 GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh Sequence: ACAGTGCATGACATGCATAGCAGCATAGACTAC Quality header: +HWI-_FC_20BTNAAXX:2:1:215:593#0/1 (repeating the read name after the + is optional) Quality: GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh
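  • 16. Deciphering a fastq file Quality: GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh •  Sanger (Phred+33) encoding: quality = ASCII code - 33 •  Solexa/Illumina 1.3-1.7 (Phred+64) encoding: quality = ASCII code - 64 (the original Solexa scale also uses offset 64 but a slightly different, odds-based score) •  G = 71 - 64 = 7 •  h = 104 - 64 = 40 •  A Phred quality Q corresponds to an error probability of 10^(-Q/10), so Q40 means a 1 in 10,000 chance that the base call is wrong.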
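A short sketch can check the arithmetic on this slide. `decode_qualities` subtracts the chosen offset, and `error_probability` applies the Phred definition Q = -10 log10(P). `guess_offset` is a rough heuristic of our own, not a standard function. It can misclassify a Phred+33 file in which every base has quality 31 or higher.

```python
from typing import List


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """ASCII quality string -> Phred scores (offset 33 = Sanger, 64 = Illumina 1.3-1.7)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def guess_offset(qual: str) -> int:
    """Heuristic only: characters below '@' (ASCII 64) cannot occur in Phred+64 data."""
    return 33 if min(ord(ch) for ch in qual) < 64 else 64


qual = "GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh"
offset = guess_offset(qual)               # lowest character is 'E' (69), so 64
scores = decode_qualities(qual, offset)
print(scores[:3], error_probability(scores[0]))   # G -> 7, h -> 40
```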
  • 17. Deciphering a fastq file Quality: GhhhhhhhhhhhUhhEGhhhGhhhhhhhhhhhhh
  • 18. SNP Calling •  Genotyping: identifying variants within a single genome (i.e. the alleles inherited from each parent) •  SNP calling: identifying variants between individual genomes [Figure. Sample genome(s): ACGTGCAGCATAGCA?CGACATCGACATACGC / TGCACGTCGTATCGT?GCTGTAGCTGTATGCG. Reads: ****A******* ***A***** **T****** *****T********** ******T********. Reference genome: ACGTGCAGCATAGCATCGACATCGACATACGC / TGCACGTCGTATCGTAGCTGTAGCTGTATGCG]
  • 19. INDEL Calling •  INDEL: insertion/deletion [Figure. Sample genome(s): ACGTGCAGCATAGCA???CGACATCGACATACGC / TGCACGTCGTATCGT???GCTGTAGCTGTATGCG. Reads: ****ACG******* ***ACG***** **---******** *****---********** ******---********. Reference genome: ACGTGCAGCATAGCACGTCGACATCGACATACGC / TGCACGTCGTATCGTGCAGCTGTAGCTGTATGCG]
  • 20. A pipeline for SNP identification •  Quality control –  FastQC, FASTX-Toolkit •  Trimming –  Sickle, Trim Galore, Trimmomatic, Cutadapt •  Mapping –  BWA, Bowtie, Stampy •  Remove duplicates –  Picard tools, samtools rmdup •  Call SNPs / INDELs –  samtools mpileup, VarScan, GATK, many others!
  • 22. Visualize the data •  FastQC •  Runs as a stand-alone interactive application or non-interactively from the command line –  The Basic Statistics module includes: •  Filename: the original name of the file that was analyzed •  Encoding: which ASCII encoding of quality values was found in the file •  Total Sequences: the total number of sequences processed •  Sequence Length: the lengths of the shortest and longest sequences in the set. If all sequences are the same length, only one value is reported.
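Below is a minimal sketch of what the Basic Statistics module reports. It assumes an unwrapped FASTQ with four lines per record, held in a string. The function names are hypothetical, and FastQC itself computes much more than this.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(text: str) -> Iterator[Record]:
    """Yield records from unwrapped 4-line FASTQ text."""
    lines = text.splitlines()
    if len(lines) % 4:
        raise ValueError("FASTQ text must contain exactly 4 lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


def basic_statistics(records: Iterable[Record]) -> Dict[str, Union[int, str]]:
    """Total number of sequences and the shortest/longest length, FastQC-style."""
    total, lengths = 0, set()
    for _, seq, _ in records:
        total += 1
        lengths.add(len(seq))
    if not lengths:
        return {"Total Sequences": 0, "Sequence Length": ""}
    shortest, longest = min(lengths), max(lengths)
    length = str(shortest) if shortest == longest else f"{shortest}-{longest}"
    return {"Total Sequences": total, "Sequence Length": length}
```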
  • 23. Visualize the data •  FastQC: Per Base Sequence Quality •  Red line = median quality •  Yellow box = inter-quartile range •  Whiskers = 10% and 90% points •  Blue line = mean quality •  Warning: the lower quartile for any base is less than 10, or the median for any base is less than 25. •  Failure: the lower quartile for any base is less than 5, or the median for any base is less than 20.
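The warning and failure rules above can be applied to per-read lists of Phred scores, as in the small sketch below. The linear-interpolation percentile is our choice, and FastQC's own quantile calculation may give slightly different values. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile (0 <= p <= 100) of an already sorted sequence."""
    if not sorted_values:
        raise ValueError("no values")
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (rank - lower)


def base_status(lower_quartile: float, median: float) -> str:
    """Per Base Sequence Quality thresholds from the slide."""
    if lower_quartile < 5 or median < 20:
        return "fail"
    if lower_quartile < 10 or median < 25:
        return "warn"
    return "pass"


def per_base_quality(reads: List[List[int]]) -> List[Dict[str, object]]:
    """Summarise Phred scores position by position (reads may differ in length)."""
    summary = []
    for pos in range(max(len(q) for q in reads)):
        column = sorted(q[pos] for q in reads if len(q) > pos)
        median, lq = percentile(column, 50), percentile(column, 25)
        summary.append({"position": pos + 1, "median": median,
                        "lower_quartile": lq, "status": base_status(lq, median)})
    return summary
```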
  • 24. Visualize the data •  FastQC: Per Sequence Quality Scores (the distribution of the mean quality of each read) •  Warning: the most frequently observed mean quality is below 27, which equates to a 0.2% error rate. •  Failure: the most frequently observed mean quality is below 20, which equates to a 1% error rate.
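The per-sequence check can be sketched in the same way. The sketch bins each read's mean quality, rounding down (our assumption), then applies the slide's thresholds to the most frequent bin. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def per_sequence_quality(reads: List[List[int]]) -> Tuple[Counter, int, str]:
    """Histogram of per-read mean quality (rounded down) and a FastQC-style status."""
    histogram = Counter(sum(q) // len(q) for q in reads if q)
    if not histogram:
        raise ValueError("no non-empty reads")
    mode, _ = histogram.most_common(1)[0]   # most frequently observed mean quality
    if mode < 20:        # about a 1% error rate
        status = "fail"
    elif mode < 27:      # about a 0.2% error rate
        status = "warn"
    else:
        status = "pass"
    return histogram, mode, status
```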
  • 25. Visualize the data •  FastQC: Per Base Sequence Content: the proportion of each of the four DNA bases (A, C, G, T) called at each position in the file. •  Warning: the difference between A and T, or between G and C, is greater than 10% at any position. •  Failure: the difference between A and T, or between G and C, is greater than 20% at any position. •  Possible causes include adapter contamination or the effect of trimming.
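The A/T and G/C comparison can be sketched as follows. Bases other than A, C, G and T are left out of the percentages. This is an assumption of the example, not a documented FastQC rule.

```python
from collections import Counter
from typing import Dict, List


def per_base_content(seqs: List[str]) -> List[Dict[str, object]]:
    """Percentage of A/C/G/T at each position, plus the A-vs-T and G-vs-C check."""
    results = []
    for pos in range(max(len(s) for s in seqs)):
        counts = Counter(s[pos].upper() for s in seqs if len(s) > pos)
        called = sum(counts[b] for b in "ACGT")    # N and other symbols are ignored
        if called == 0:
            continue
        pct = {b: 100 * counts[b] / called for b in "ACGT"}
        worst = max(abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"]))
        status = "fail" if worst > 20 else "warn" if worst > 10 else "pass"
        results.append({"position": pos + 1, **pct, "status": status})
    return results
```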
  • 26. Visualize the data •  FastQC: Sequence Length Distribution: the distribution of read lengths in the file. •  Warning: not all sequences are the same length. •  Failure: any sequence has zero length.
  • 27. Visualize the data •  FastQC: Duplicate Sequences: the degree of duplication within the first 200,000 reads of the file, shown as the distribution of duplication levels in the dataset. •  Warning: non-unique sequences make up more than 20% of the total. •  Failure: non-unique sequences make up more than 50% of the total.
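Below is a rough sketch of the duplication check over the first 200,000 reads. Here "non-unique" means the share of reads that would be removed if only one copy of each sequence were kept. FastQC's exact calculation differs in detail, so treat the numbers as indicative. The function name is hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable


def duplication_summary(seqs: Iterable[str], sample_size: int = 200_000) -> Dict[str, object]:
    """Duplication in the first `sample_size` reads."""
    counts = Counter(islice(seqs, sample_size))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads")
    # Share of reads that would be removed if only one copy of each sequence were kept.
    duplicate_percent = 100 * (1 - len(counts) / total)
    # Duplication levels: how many distinct sequences occur once, twice, ...
    levels = dict(Counter(counts.values()))
    if duplicate_percent > 50:
        status = "fail"
    elif duplicate_percent > 20:
        status = "warn"
    else:
        status = "pass"
    return {"duplicate_percent": duplicate_percent, "levels": levels, "status": status}
```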
  • 28. Visualize the data •  FastQC: Overrepresented Sequences: lists all sequences that make up more than 0.1% of the total. •  Warning: any sequence represents more than 0.1% of the total. •  Failure: any sequence represents more than 1% of the total. •  FastQC: Adapter Content: shows whether your library contains a significant amount of adapter, so you can decide whether adapter trimming is needed. •  Warning: any adapter sequence is present in more than 5% of all reads. •  Failure: any adapter sequence is present in more than 10% of all reads.
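Both checks can be approximated with simple counting. This sketch uses exact whole-read matches for overrepresentation. For the adapter, it uses an exact substring search for the 12-base sequence given to cutadapt on the next slide, whereas FastQC reports adapter content position by position. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def overrepresented(seqs: List[str], min_percent: float = 0.1) -> Tuple[List[Tuple[str, int, float]], str]:
    """Sequences making up more than min_percent of all reads, plus a status from the top hit."""
    counts = Counter(seqs)
    total = len(seqs)
    hits = [(s, n, 100 * n / total) for s, n in counts.most_common()
            if 100 * n / total > min_percent]
    top = hits[0][2] if hits else 0.0
    status = "fail" if top > 1 else "warn" if top > 0.1 else "pass"
    return hits, status


def adapter_percent(seqs: List[str], adapter: str = "AGATCGGAAGAG") -> float:
    """Percentage of reads that contain the adapter anywhere (exact match only)."""
    return 100 * sum(adapter in s for s in seqs) / len(seqs)
```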
  • 29. Cut adapters (cutadapt) •  -f: the type of input file (here fastq; recent cutadapt versions detect the format automatically and may not accept this option) •  -q CUTOFF: trim low-quality ends from reads before adapter removal. If the reads are Phred+64 encoded, as the -Q64 setting used with fastq_quality_filter suggests, also give --quality-base=64 •  -a ADAPTER: sequence of an adapter ligated to the 3' end; the adapter and anything that follows it are trimmed •  -m 100: minimum read length after adapter removal; reads shorter than 100 bases are discarded •  --discard-untrimmed: reads in which no adapter was found are discarded •  -o: output file, also in fastq format
cutadapt -f fastq -q 20 -a AGATCGGAAGAG -m 100 --discard-untrimmed -o SNP.test.trimmed.fastq SNP.test.fastq
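To show what these options do to a single read, here is a simplified Python sketch. Quality trimming uses the BWA-style running-sum rule that, as we understand it, cutadapt applies for -q. Adapter finding here is exact-match only, whereas cutadapt tolerates a small error rate, so this is not a substitute for the tool. All function names are hypothetical. The default offset of 33 should be changed to 64 for Phred+64 reads.

```python
from typing import List, Optional, Tuple


def quality_trim_index(quals: List[int], cutoff: int) -> int:
    """BWA-style 3' trimming: scanning back from the 3' end, cut where the running
    sum of (cutoff - quality) peaks; returns the number of bases to keep."""
    running, best, keep = 0, 0, len(quals)
    for i in reversed(range(len(quals))):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best, keep = running, i
    return keep


def find_adapter(seq: str, adapter: str, min_overlap: int = 3) -> Optional[int]:
    """Start of a full adapter match, or of a partial adapter at the 3' end (exact only)."""
    full = seq.find(adapter)
    if full != -1:
        return full
    for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return None


def trim_read(seq: str, qual: str, adapter: str = "AGATCGGAAGAG", cutoff: int = 20,
              min_length: int = 100, offset: int = 33,
              discard_untrimmed: bool = True) -> Optional[Tuple[str, str]]:
    """Mimic -q, -a, -m and --discard-untrimmed for one read; None means it is dropped."""
    keep = quality_trim_index([ord(c) - offset for c in qual], cutoff)   # -q runs first
    seq, qual = seq[:keep], qual[:keep]
    start = find_adapter(seq, adapter)
    if start is None:
        if discard_untrimmed:
            return None
    else:
        seq, qual = seq[:start], qual[:start]
    if len(seq) < min_length:
        return None
    return seq, qual
```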
  • 33. Quality filters (FASTX-Toolkit) •  -q: minimum quality score to keep •  -p: minimum percentage of bases in a read that must have at least [-q] quality •  -i: input file (output of the adapter-trimming step) •  -v: verbose •  -Q: quality encoding offset (64 for Phred+64; use -Q33 for Phred+33 data) •  -o: output file, also in fastq format
fastq_quality_filter -q 20 -p 70 -i SNP.test.trimmed.fastq -v -Q64 -o SNP.test.trimmed.QC.fastq
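The -q/-p rule amounts to a percentage test on each read. The minimal sketch below assumes, as we understand the tool, that a base counts when its quality is greater than or equal to -q. The function names are ours.

```python
from typing import Iterable, Iterator, Tuple


def passes_quality_filter(qual: str, min_quality: int = 20, min_percent: float = 70,
                          offset: int = 64) -> bool:
    """Keep a read if at least min_percent of its bases have quality >= min_quality."""
    if not qual:
        return False
    good = sum(ord(ch) - offset >= min_quality for ch in qual)
    return 100 * good / len(qual) >= min_percent


def quality_filter(records: Iterable[Tuple[str, str, str]], **kwargs) -> Iterator[Tuple[str, str, str]]:
    """Yield only the (header, sequence, quality) records that pass the filter."""
    for header, seq, qual in records:
        if passes_quality_filter(qual, **kwargs):
            yield header, seq, qual
```

For the read on the earlier slides, only the G and E positions fall below Q20, so it passes the -q 20 -p 70 filter.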
  • 34. Quality filters •  Galaxy: filter by quality
  • 35. Align to genome The problem •  Generally a large genome (human > 3 Gb) •  Large number of short reads The solution •  Index the genome, for example as a hash of k-mers (short sequences) or, as in BWA and Bowtie, as a Burrows-Wheeler (FM) index •  Use efficient aligners –  A large number of aligners are available. •  Common aligners: Bowtie1/2, BWA, Stampy
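To show the idea behind indexing, here is a toy seed-and-extend aligner built on a hash of k-mers. It is an illustration only: BWA and Bowtie use FM indexes rather than hashes, and this sketch allows mismatches but no gaps. The genome and read in the test are invented, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the genome to the 0-based positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def seed_and_extend(read: str, genome: str, index: Dict[str, List[int]], k: int,
                    max_mismatches: int = 2) -> List[int]:
    """0-based start positions where the read aligns without gaps and with at most
    max_mismatches mismatches, found via exact k-mer seeds."""
    hits = set()
    for offset in range(0, len(read) - k + 1, k):          # non-overlapping seeds
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset                            # implied start of the read
            if start < 0 or start + len(read) > len(genome):
                continue
            window = genome[start:start + len(read)]
            if sum(a != b for a, b in zip(read, window)) <= max_mismatches:
                hits.add(start)
    return sorted(hits)
```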
  • 36. Align to genome Example: Bowtie 2 alignment. Build the index: •  -f: the reference is a FASTA-formatted genome file •  ./bowtie.index.files/chr17: output location and basename for the index files •  Galaxy: select a pre-built index when using Bowtie or BWA
bowtie2-build -f ./genome/chr17.fa ./bowtie.index.files/chr17
  • 37. Align to genome Align to the reference: •  -p: number of processors (threads) •  --end-to-end: the whole read must align, with no soft-clipping of read ends (unlike --local); this is Bowtie 2's default mode •  -k 1: report at most one alignment per read. This does not discard reads that align to several locations; Bowtie 2 reports one of the candidate alignments instead •  -x: basename of the indexed genome to align reads to •  -U: file of unpaired reads (in this case, the quality-filtered reads) •  -S: output in SAM format
bowtie2 -p 4 --end-to-end -k 1 -x ./bowtie.index.files/chr17 -U SNP.test.trimmed.QC.fastq -S SNP.test.trimmed.QC.fastq.sam
  • 39. Alignment file formats SAM - Sequence Alignment/Map format (BAM is the binary, compressed equivalent) –  TAB-delimited text format –  Header section (optional): @HD = header line, VN [format version], SO [sorting order]; @SQ = reference sequence line, SN [sequence name], LN [sequence length]; @PG = program, ID [program ID], PN [program name], VN [version], CL [command line] Example header: @HD VN:1.0 SO:unsorted @SQ SN:chr17 LN:81195210 @PG ID:bowtie2 PN:bowtie2 VN:2.2.1 CL:"/home/rde/tools/bowtie2-2.2.1/bowtie2-align-s --wrapper basic-0 -p 4 --end-to-end -k 1 -x ./bowtie.index.files/chr17 -S SNP.test.trimmed.QC.fastq.sam -U SNP.test.trimmed.QC.fastq"
  • 40. Alignment file formats •  SAM - Sequence Alignment/Map format –  TAB-delimited text format –  Alignment section: 11 mandatory columns –  Optional fields (TAG:TYPE:VALUE) –  For details: http://samtools.github.io/hts-specs/SAMv1.pdf –  SAM flags explained: http://picard.sourceforge.net/explain-flags.html Example record: instrument_name:100:flowcellID:1:1:32575:625 0 chr17 72387574 255 141M * 0 0 GGAAGAGCTGGGACCAGGCCCAGCAATTACCTCACCATGGTTGGGTGCAACCAAGTGGGGCAACTCTTTGGCCAGAAAGCAAAAGTCTTTTTAGCTTCAATGTAGGCCATTCTGGGTCCCAGACCCACAGCTTTGGACATT Dgga^h`hedac`hcc`c^_f`ChggeDfC`e`_h_ddgd^_`be^bcg^acchC`^^bCdagfbgf_`^dc^^cD`dbha^Dbc`fb^d`CdaD^``dDb_edffdDDh_chceabh`heecd_h`gb_b^Ch`Dd_^c` AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:141 YT:Z:UU Columns: •  1 QNAME: query template name •  2 FLAG: bitwise flag •  3 RNAME: reference name •  4 POS: 1-based leftmost mapping position •  5 MAPQ: mapping quality •  6 CIGAR: CIGAR string •  7 RNEXT: reference name of the mate •  8 PNEXT: position of the mate •  9 TLEN: observed template length •  10 SEQ: sequence •  11 QUAL: Phred-scaled base qualities (ASCII-encoded)
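The minimal sketch below does three things. It reads one alignment line into the 11 mandatory fields, decodes the bitwise FLAG (bit meanings follow the SAM specification), and works out how many reference bases a CIGAR string covers. The function names are hypothetical; for real work, a library such as pysam is the usual choice.

```python
import re
from typing import Dict, List

SAM_COLUMNS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
FLAG_BITS = {
    0x1: "paired", 0x2: "proper pair", 0x4: "unmapped", 0x8: "mate unmapped",
    0x10: "reverse strand", 0x20: "mate reverse strand", 0x40: "first in pair",
    0x80: "second in pair", 0x100: "secondary", 0x200: "QC fail",
    0x400: "duplicate", 0x800: "supplementary",
}
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> Dict[str, object]:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory columns")
    rec: Dict[str, object] = dict(zip(SAM_COLUMNS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    tags = {}
    for field in cols[11:]:                      # optional TAG:TYPE:VALUE fields
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    rec["TAGS"] = tags
    return rec


def decode_flag(flag: int) -> List[str]:
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def reference_span(cigar: str) -> int:
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
```

For the record on the slide, FLAG 0 decodes to an empty list (a mapped, forward-strand read). The CIGAR 141M spans 141 reference bases, so the read covers chr17:72387574-72387714.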
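  • 41. Remove duplicates •  Duplicate reads are artifacts of library preparation (e.g. PCR amplification). They inflate the apparent support for a variant and so give false confidence in variant calls. •  Tools: samtools [rmdup], Picard tools [MarkDuplicates] •  samtools rmdup expects a coordinate-sorted BAM file, so sort before removing duplicates; -s selects single-end mode. Commands (samtools 1.x syntax):
samtools view -bS SNP.test.trimmed.QC.fastq.sam -o SNP.test.trimmed.QC.fastq.bam
samtools sort -o SNP.test.trimmed.QC.fastq.sorted.bam SNP.test.trimmed.QC.fastq.bam
samtools rmdup -s SNP.test.trimmed.QC.fastq.sorted.bam SNP.test.trimmed.QC.fastq.rmdup.bam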
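Conceptually, single-end duplicate removal groups mapped reads by chromosome, strand and 5' position, then keeps the best-scoring read in each group. The sketch below follows that idea, using the highest mapping quality to choose among duplicates. It works on hypothetical dict records rather than BAM files and ignores soft-clipping, so it is not a reimplementation of samtools rmdup or Picard.

```python
import re
from typing import Dict, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar: str) -> int:
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def five_prime_key(rec: Dict) -> Tuple[str, str, int]:
    """(chromosome, strand, 5' reference position) of a single-end read."""
    if rec["flag"] & 0x10:   # reverse strand: the 5' end is the rightmost aligned base
        return rec["rname"], "-", rec["pos"] + reference_span(rec["cigar"]) - 1
    return rec["rname"], "+", rec["pos"]


def remove_duplicates_se(records: List[Dict]) -> List[Dict]:
    """Keep the highest-MAPQ read per 5' key; unmapped reads are passed through."""
    best: Dict[Tuple[str, str, int], Dict] = {}
    unmapped = []
    for rec in records:
        if rec["flag"] & 0x4:
            unmapped.append(rec)
            continue
        key = five_prime_key(rec)
        if key not in best or rec["mapq"] > best[key]["mapq"]:
            best[key] = rec
    kept = sorted(best.values(), key=lambda r: (r["rname"], r["pos"]))
    return kept + unmapped
```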
  • 43. Call Variants •  Identify regions in the alignment where the sequence differs from the reference [Figure. Reads: ****A******* ***A***** **T****** *****T********** ******T********. Reference genome: ACGTGCAGCATAGCATCGACATCGACATACGC]
  • 44. Call Variants (samtools 1.x syntax)
samtools sort -o SNP.test.trimmed.QC.fastq.rmdup.sorted.bam SNP.test.trimmed.QC.fastq.rmdup.bam
samtools index SNP.test.trimmed.QC.fastq.rmdup.sorted.bam
samtools faidx ./genome/chr17.fa
samtools mpileup -f ./genome/chr17.fa SNP.test.trimmed.QC.fastq.rmdup.sorted.bam | java -jar ./tool/VarScan.v2.3.6.jar mpileup2snp --output-vcf 1 --strand-filter 0
Options: •  samtools mpileup -f: faidx-indexed reference sequence file •  VarScan mpileup2snp (or mpileup2indel for INDELs), defaults in brackets: •  --min-coverage: minimum read depth at a position to make a call [8] •  --min-reads2: minimum supporting reads at a position to call variants [2] •  --min-avg-qual: minimum base quality at a position to count a read [15] •  --min-var-freq: minimum variant allele frequency threshold [0.01] •  --strand-filter: ignore variants with >90% support on one strand [1]
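The VarScan thresholds listed above can be read as a chain of simple tests on allele counts. This simplified sketch takes a hypothetical `SiteCounts` record of reference and variant reads per strand. It assumes those counts already exclude bases below --min-avg-qual. VarScan also applies a statistical p-value test, which is omitted here. The slide's command sets --strand-filter 0, which corresponds to `strand_filter=False`.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical per-site read counts (e.g. tallied from one mpileup line),
    counting only bases at or above the --min-avg-qual threshold."""
    ref_fwd: int
    ref_rev: int
    alt_fwd: int
    alt_rev: int


def passes_varscan_like_filters(site: SiteCounts, min_coverage: int = 8, min_reads2: int = 2,
                                min_var_freq: float = 0.01, strand_filter: bool = True) -> bool:
    alt = site.alt_fwd + site.alt_rev
    depth = site.ref_fwd + site.ref_rev + alt
    if depth == 0 or depth < min_coverage:        # --min-coverage
        return False
    if alt < min_reads2:                          # --min-reads2
        return False
    if alt / depth < min_var_freq:                # --min-var-freq
        return False
    # --strand-filter: reject if more than 90% of variant reads are on one strand
    if strand_filter and alt and max(site.alt_fwd, site.alt_rev) / alt > 0.9:
        return False
    return True
```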
  • 47. VCF (Variant Call Format) file •  Meta-information header lines start with ## •  The column-heading line starts with # •  Mandatory columns –  CHROM: chromosome –  POS: 1-based position of the variant start –  ID: variant identifier (e.g. a dbSNP rs ID), or . if none –  REF: reference allele –  ALT: alternate non-reference alleles (comma separated) –  QUAL: Phred-scaled quality score –  FILTER: filter status (PASS if all filters passed) –  INFO: additional information as semicolon-separated key=value annotations
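The minimal sketch below reads the eight mandatory columns, splitting ALT on commas and INFO on semicolons. The example record in the test is invented, per-sample genotype columns are not handled, and the function names are hypothetical.

```python
from typing import Dict, List

VCF_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_vcf_line(line: str) -> Dict[str, object]:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs 8 mandatory columns")
    rec: Dict[str, object] = dict(zip(VCF_COLUMNS, cols[:8]))
    rec["POS"] = int(cols[1])
    rec["ALT"] = cols[4].split(",")
    rec["QUAL"] = None if cols[5] == "." else float(cols[5])
    info: Dict[str, object] = {}
    if cols[7] != ".":
        for item in cols[7].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True     # flags such as DB carry no value
    rec["INFO"] = info
    return rec


def read_vcf(text: str) -> List[Dict[str, object]]:
    """Skip ## meta-information lines and the # column-heading line; parse the rest."""
    return [parse_vcf_line(line) for line in text.splitlines()
            if line and not line.startswith("#")]
```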
  • 50. So Many SNPs: So What? •  Get gene •  Functional analysis to identify key candidates: Identify homologues, Locate variants, Identify ontologies, Pathway and interaction analysis, Locate SNPs on structure, Compare to current data
  • 51. Functional analysis to identify key candidates. A step-by-step example (don't do this with lots of variants!) •  Get gene •  Modify bases as shown in the VCF file •  BLASTx to identify the reading frame •  Produce mRNA and encoded peptide sequence FASTA files (provided) •  Determine variant positions in the mRNA and peptide sequences
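The last step, locating a variant in the mRNA and peptide, comes down to codon arithmetic. A substitution at nucleotide position p of a coding sequence falls in codon (p - 1) // 3 + 1. This sketch assumes the sequence starts at the first base of the start codon and uses the standard genetic code. The toy sequence and function names are hypothetical; the sequence was chosen so that one change gives an Arg to Ser substitution.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    i, j, k = (BASES.index(b) for b in codon.upper().replace("U", "T"))
    return AMINO_ACIDS[16 * i + 4 * j + k]


def snp_effect(cds: str, pos: int, alt: str) -> Tuple[int, str, str, str]:
    """Effect of a substitution at 1-based position `pos` of a coding sequence.
    Returns (codon/peptide position, reference aa, alternate aa, class)."""
    if not 1 <= pos <= len(cds):
        raise ValueError("position outside the coding sequence")
    codon_number = (pos - 1) // 3 + 1
    start = 3 * (codon_number - 1)
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete final codon")
    offset = (pos - 1) % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "non-synonymous"
    return codon_number, ref_aa, alt_aa, kind


print(snp_effect("ATGAGAGGC", 6, "T"))   # (2, 'R', 'S', 'non-synonymous')
```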
  • 52. Compare to current data •  dbSNP –  SNP already known? •  Repositories such as Ensembl, UCSC –  In splice variant? –  In known regions, domain etc? •  Variant effect predictor (more later)
  • 53. Locate variants •  Visualize the position in the genome: Ensembl/UCSC –  Add a custom track using a GFF/BED file •  Coding/non-coding –  Synonymous/non-synonymous –  Codon usage •  Locate in relation to known domains –  Pfam –  SMART •  Repeat regions
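When building a custom track, note that VCF positions are 1-based while BED intervals are 0-based and half-open. The minimal sketch below turns variant tuples into a UCSC-style BED track; the function and track names are hypothetical. UCSC expects chromosome names such as chr17, whereas Ensembl uses 17.

```python
from typing import Iterable, Tuple


def variants_to_bed(variants: Iterable[Tuple[str, int, str, str]],
                    track_name: str = "my_variants") -> str:
    """Build a BED custom track from (chrom, 1-based VCF POS, REF, ALT) tuples.
    BED intervals are 0-based and half-open, so POS becomes start = POS - 1."""
    lines = [f'track name={track_name} description="Variants from VCF"']
    for chrom, pos, ref, alt in variants:
        start = pos - 1
        end = start + len(ref)                  # span all reference bases of the variant
        lines.append(f"{chrom}\t{start}\t{end}\t{ref}>{alt}")
    return "\n".join(lines) + "\n"
```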
  • 54. Identify Ontologies •  For single genes –  Search available information: PubMed, InterPro, etc. •  For multiple genes –  BLAST2GO –  DAVID
  • 55. Pathway and interaction analysis •  Pathway analysis –  Understand the processes your gene is involved in. Does it make biological sense? –  IPA –  DAVID –  WebGestalt •  Interaction analysis –  BioGRID –  STRING –  PSICQUIC
  • 56. Locate SNPs on structure •  Structure prediction –  BLAST against PDB –  Predict structure •  PSIPRED: 2° structure •  I-TASSER: 3° structure •  Phyre: 3° structure –  Locate in 3D •  Swiss-PdbViewer
  • 57. Locate SNPs on structure •  Predict the effect of SNPs •  SuSPect •  VEP (Variant Effect Predictor) [Figure: example Arg to Ser substitution]
  • 58. Identify Homologues Emes R.D. Inferring function from homology. in Methods in Molecular Biology 453: Humana Press 2008.
  • 59. Links and websites 1 •  Many at http://emeslab.wordpress.com/useful-links/ •  SRA: http://www.ncbi.nlm.nih.gov/sra •  SRA toolkit http://eutils.ncbi.nih.gov/Traces/sra/?view=software •  ENA: http://www.ebi.ac.uk/ena/ •  Galaxy: https://usegalaxy.org/ •  FASTQC: www.bioinformatics.babraham.ac.uk/projects/fastqc/ •  Fastx_toolkit: http://hannonlab.cshl.edu/fastx_toolkit/ •  Bowtie 1: http://bowtie-bio.sourceforge.net/index.shtml •  Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml •  BWA: http://bio-bwa.sourceforge.net/ •  Stampy: http://www.well.ox.ac.uk/project-stampy •  SAMtools: http://samtools.sourceforge.net •  PicardTools: http://picard.sourceforge.net •  VarScan: http://varscan.sourceforge.net •  IGV: https://www.broadinstitute.org/software/igv/
  • 60. Links and websites 2 •  dbSNP: http://www.ncbi.nlm.nih.gov/SNP/ •  Ensembl: http://www.ensembl.org/index.html •  UCSC Genome Browser: https://genome.ucsc.edu/ •  Pfam: http://pfam.xfam.org/ •  SMART: http://smart.embl-heidelberg.de/ •  GO: http://www.geneontology.org/ •  BLAST2GO: http://www.blast2go.com/b2ghome •  IPA: http://www.ingenuity.com/products/login •  DAVID: http://david.abcc.ncifcrf.gov/ •  Webgestalt: http://bioinfo.vanderbilt.edu/webgestalt/ •  BioGRID: http://thebiogrid.org/ •  PSICQUIC: http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml •  ITASSER: http://zhanglab.ccmb.med.umich.edu/I-TASSER/ •  Phyre: http://www.sbg.bio.ic.ac.uk/phyre2/ •  SuSPect: http://www.sbg.bio.ic.ac.uk/~suspect/ •  PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/ •  SwissPDB: http://spdbv.vital-it.ch/
  • 61. Richard Emes Associate Professor & Reader in Bioinformatics. School of Veterinary Medicine and Science Director Advanced Data Analysis Centre richard.emes@nottingham.ac.uk www.nottingham.ac.uk/adac @rdemes @ADAC_UoN 61