Quality control of sequencing data
obtained with the Illumina platform
using FastQC
Hafiz.M.Zeeshan.Raza
Research Associate
COMSATS University Islamabad, Pakistan
Sahiwal Campus
hafizraza26@gmail.com
Cell# 0092-36-6155501
Basic data (BASIC STATISTICS)
The Basic Statistics module reports:
• File name: Sec_Ilumina.fastq.txt
• File type: conventional base calls
• System used: Illumina
• Total sequences analyzed: 25,000. This is the number of
reads analyzed.
• Sequences flagged as poor quality: 0.
• Length of each read: 38 bases
• % GC: 45%
• Note: FastQC only reports on quality. It does not correct or
remove poor-quality sequences; you have to filter or trim
them yourself with a separate program.
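The figures above can be recomputed directly from a FASTQ file. The following is a minimal sketch, not FastQC's own implementation; the helper names `read_fastq` and `basic_stats` are hypothetical, and it assumes a plain four-lines-per-record FASTQ file with no wrapped sequence lines.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def basic_stats(records):
    """Return total reads, shortest/longest read length and overall %GC."""
    total = 0
    bases = 0
    gc = 0
    min_len = None
    max_len = None
    for _, seq, _ in records:
        total += 1
        bases += len(seq)
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        min_len = len(seq) if min_len is None else min(min_len, len(seq))
        max_len = len(seq) if max_len is None else max(max_len, len(seq))
    gc_percent = 100 * gc / bases if bases else 0.0
    return {"total": total, "min_len": min_len, "max_len": max_len,
            "gc_percent": gc_percent}


# Tiny in-memory example; for a real file use open("Sec_Ilumina.fastq.txt").
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
stats = basic_stats(read_fastq(example))
print(stats)
```

For the file described here, one would expect 25,000 reads, a minimum and maximum length of 38, and a GC percentage near 45.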
Quality (Q) of the sequences per base
(PER BASE SEQUENCE QUALITY)
• For each base position, the yellow box spans the interquartile range (25th to
75th percentile), the red line inside the box is the median, and the blue line is
the mean quality. The X axis shows the position in the read (each read has 38
bases). The Y axis shows the quality score (0-34 here), divided into three zones:
• Green zone: 28-34, very good quality.
• Orange zone: 20-28, intermediate quality.
• Red zone: 0-20, poor quality.
• Each box summarizes the quality of all 25,000 reads at that base position.
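To give the zone boundaries a concrete meaning: the quality score is a Phred score, Q = -10 · log10(P), where P is the probability that the base call is wrong. The sketch below converts scores to error probabilities and decodes a FASTQ quality string. The function names are hypothetical, and the ASCII offset of 33 is an assumption (older Illumina pipelines used an offset of 64).

```python
def phred_to_error(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_quality(qual, offset=33):
    """Turn a FASTQ quality string into a list of integer Phred scores.

    offset=33 is an assumption; check the encoding that FastQC reports.
    """
    return [ord(char) - offset for char in qual]


# Error probability at the boundaries of the red, orange and green zones.
for boundary in (20, 28, 34):
    print(f"Q{boundary}: error probability {phred_to_error(boundary):.4%}")
```

A base at the edge of the red zone (Q = 20) therefore has a 1 in 100 chance of being wrong.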
Continued…
• From the plot, the qualities assigned to the first base are all very good, since
they lie in the green zone. At base 38 the qualities are widely dispersed (which is
why the box is so tall): some reads have good quality at that position and others
very poor quality.
• In conclusion, quality is good up to base 22 and deteriorates from base 23
onward, where some of the 25,000 reads have a poor-quality base. I will therefore
need a trimming program that removes the bases with, for example, Q < 25 (the
threshold is chosen by the user), or the bases that pull the average quality of the
read below 25. After trimming, the 25,000 reads will no longer all have the same
length: some will still have 38 bases and others will be shorter.
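The trimming step described above can be illustrated with a deliberately simple rule: cut bases from the 3' end for as long as the last base is below the threshold. This is only one possible strategy, not the one any particular trimming program uses; the function name `trim_3prime` is hypothetical and the Phred+33 encoding is an assumption.

```python
def trim_3prime(seq, qual, min_q=25, offset=33):
    """Remove trailing bases whose quality is below min_q.

    seq and qual are the sequence and quality lines of one FASTQ record.
    offset=33 (Phred+33 encoding) is an assumption.
    """
    scores = [ord(char) - offset for char in qual]
    end = len(scores)
    # Walk back from the 3' end until a base with acceptable quality is found.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII###")
print(trimmed_seq, trimmed_qual)
```

Because each read is cut at a different position, the output reads have different lengths, which is why the length distribution is no longer uniform after trimming.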
Quality of the sequence by "tile"
(PER TILE SEQUENCE QUALITY)
• This plot (it only appears for Illumina libraries, whose read
names record the tile) shows the tiles of the flow cell on which
the sequences were read. It lets you follow the quality scores
of each tile across all base positions, to see whether a loss of
quality is confined to one part of the flow cell.
• Marks (warmer colors) on the plot indicate poor quality in
those tiles. A possible cause is that the reagents were not
filtered or degassed under vacuum, so that bubbles remained in
the flow cell and interfered with the imaging, giving poor
quality.
• Our plot shows no such fault: the background is uniformly
blue, which is consistent with reagents that were properly
degassed and filtered.
Levels of quality per sequence
(PER SEQUENCE QUALITY SCORES)
• This module gives an early idea of how many reads will have to be
removed, since it shows whether a subset of the sequences has
low overall quality.
• In the graph, the X axis shows the mean quality of a read and the
Y axis shows the number of reads with that mean quality.
• In our case, more than 3,500 reads have a mean quality of around
29-31, and far fewer reads have low quality.
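The distribution in this plot can be approximated by averaging the quality scores of each read and counting the reads per mean value. This is a rough sketch under the assumption of Phred+33 encoding; the function name `mean_quality_histogram` is hypothetical, and the mean is truncated to a whole number for binning.

```python
from collections import Counter


def mean_quality_histogram(quals, offset=33):
    """Count reads per (truncated) mean quality.

    quals is an iterable of FASTQ quality strings; offset=33 is an assumption.
    """
    histogram = Counter()
    for qual in quals:
        if not qual:  # skip empty reads
            continue
        mean_q = sum(ord(char) - offset for char in qual) / len(qual)
        histogram[int(mean_q)] += 1
    return dict(sorted(histogram.items()))


# Three toy reads: all Q40, half Q40 and half Q2, all Q2.
print(mean_quality_histogram(["IIII", "II##", "####"]))
```

Summing the counts below a chosen cutoff gives an estimate of how many reads a mean-quality filter would discard.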
Content of the sequence per base
(PER BASE SEQUENCE CONTENT)
• This module reports the proportion of each of the four bases
(A, C, G, T) at each position in the reads.
• In a random library, the base composition should differ little
from one position of the run to the next, so the lines in this
plot should run roughly parallel to each other.
• In our case, there are differences between the bases at some
positions, whereas the amount of A should be roughly equal
to that of T, and the amount of G to that of C.
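The four lines of this plot are simply per-position base percentages. A minimal sketch of that calculation follows; the function name `per_base_content` is hypothetical, and reads shorter than the longest one are ignored at the positions they do not cover.

```python
def per_base_content(seqs):
    """Return, for each position, the percentage of A, C, G and T.

    seqs is a list of read sequences. N and other symbols count toward the
    total at a position but are not reported.
    """
    if not seqs:
        return []
    length = max(len(seq) for seq in seqs)
    result = []
    for position in range(length):
        column = [seq[position].upper() for seq in seqs if position < len(seq)]
        result.append({base: 100 * column.count(base) / len(column)
                       for base in "ACGT"})
    return result


for position, content in enumerate(per_base_content(["ACGT", "AGGT"]), start=1):
    print(position, content)
```

In an unbiased library the A and T percentages track each other across positions, as do G and C; large gaps between them point to the kind of bias noted above.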
Content of guanine and cytosine (GC) per sequence
(PER SEQUENCE GC CONTENT)
• This module measures the GC content of every read (red line) and compares the
resulting distribution with a theoretical normal distribution of GC content (blue line).
• The X axis shows the mean GC percentage of a read, and the Y axis shows the
number of reads.
• In our case (red line) there are several peaks where there should be a single
Gaussian curve. This can indicate:
  • Adapter dimers
  • Contamination with other DNA
• Poor-quality reads can also distort the curve: where the base is uncertain, a G or
C may be called where it does not belong.
• This suggests that the sequencing was not carried out entirely correctly, but after
analyzing the quality values I will be able to decide what to do:
  • Contaminating DNA of good quality will not be removed by quality filtering.
  • Misread sequences, in which bases were wrongly called as G or C, should be
removed when the reads are corrected.
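The observed (red) curve is a histogram of per-read GC percentages. The sketch below builds that histogram; the function name `gc_histogram` is hypothetical, the percentage is truncated to a whole number, and no theoretical curve is fitted.

```python
from collections import Counter


def gc_histogram(seqs):
    """Count reads per (truncated) GC percentage."""
    histogram = Counter()
    for seq in seqs:
        if not seq:  # skip empty reads
            continue
        upper = seq.upper()
        gc_percent = 100 * (upper.count("G") + upper.count("C")) / len(upper)
        histogram[int(gc_percent)] += 1
    return dict(sorted(histogram.items()))


print(gc_histogram(["GGCC", "ATAT", "ACGT", "ACGA"]))
```

A single organism usually gives one peak; additional peaks in this histogram are what suggest adapter dimers or contaminating DNA with a different GC content.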
Content of N per base
(PER BASE N CONTENT)
• This module reports, for each position, the percentage of
base calls that are N (a base the sequencer could not call).
• In our case, the N content is practically zero: wherever
the base was uncertain, the sequencer still called an A, T,
G or C rather than an N. This indicates that the quality of
our sequence is never so poor that an N had to be
called.
Distribution of the length of the sequences
(SEQUENCE LENGTH DISTRIBUTION)
• Some high-throughput sequencers generate reads of
uniform length, while others produce reads of
varying length.
• This module plots the distribution of read lengths
in the file that was analyzed.
• In our case, the reads are homogeneous: all
25,000 sequences have 38 bases.
Levels of duplicate sequences
(SEQUENCE DUPLICATION LEVELS)
• This module counts how often each sequence occurs in the library and
plots the relative number of sequences at each level of duplication.
• In a diverse, randomly sampled library, most sequences should occur
only once.
• The graph shows the proportion of the library that falls into each
duplication-level bin. There are two lines:
• The blue line shows the distribution for the full set of sequences
• The red line shows the distribution after the sequences have been de-duplicated
Continued…
• For the full set of sequences we can observe 3 peaks:
• > 10: more than 10% of the sequences are identical, from beginning to end, to
a sequence that occurs more than 10 times
• > 100: of my 25,000 reads, 25% are identical, from beginning to end, to a
sequence that occurs more than 100 times
• > 1K: 20% of the reads are identical, from beginning to end, to a sequence that
occurs more than 1,000 times
• For genomic DNA, little duplication should be observed (red line). Duplicates
can arise nevertheless. In general there are two possible types of duplicates in a
library: technical duplicates derived from PCR artifacts, and biological duplicates,
which are natural collisions where different copies of exactly the same sequence
are randomly selected. From the raw sequences alone there is no way to distinguish
between these two types, and both are reported as duplicates here.
• In RNA-Seq libraries, some transcripts are expected to occur very frequently
and others to be very rare (low-copy-number transcripts), so a high level of
duplication in part of the library is inevitable.
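The blue line can be approximated by counting how often each exact sequence occurs and then asking what share of all reads belongs to sequences seen once, twice, and so on. This is a simplified sketch: the function name `duplication_levels` is hypothetical, and unlike FastQC it neither samples the file nor groups high levels into bins such as >10, >100 and >1K.

```python
from collections import Counter


def duplication_levels(seqs):
    """Return {duplication level: percentage of all reads at that level}.

    A level of 3 means "this read's sequence occurs 3 times in the file".
    """
    total = len(seqs)
    if total == 0:
        return {}
    occurrences = Counter(seqs)  # how many times each distinct sequence occurs
    reads_per_level = Counter()
    for count in occurrences.values():
        reads_per_level[count] += count  # all copies belong to this level
    return {level: 100 * reads / total
            for level, reads in sorted(reads_per_level.items())}


reads = ["AAAA", "AAAA", "AAAA", "CCCC", "GGGG", "GGGG"]
print(duplication_levels(reads))
# Share of reads that would remain after de-duplication.
print(100 * len(set(reads)) / len(reads))
```

Only exact, full-length matches count as duplicates here, which matches the "same fragment from the beginning to the end" wording above.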
Overrepresented sequences
• This module lists the sequences that make up an unusually large
share of the library (FastQC reports any sequence that accounts for
more than 0.1% of the total). Such sequences can cause problems
later, when mapping (for example, aligning by homology against a
reference transcriptome) or when attempting an assembly.
• It also reveals adapter dimers: since you know which adapter was
used, finding its sequence in this list tells you that adapters
have ligated to each other and been sequenced.
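A list of overrepresented sequences can be sketched by counting exact sequences and keeping those above a chosen fraction of the library. The function name `overrepresented` is hypothetical; the default cutoff of 0.001 mirrors the 0.1% threshold that FastQC documents, and everything else is a simplification.

```python
from collections import Counter


def overrepresented(seqs, min_fraction=0.001):
    """Return (sequence, count, percentage) for sequences above min_fraction.

    Results are ordered from the most to the least frequent sequence.
    """
    total = len(seqs)
    if total == 0:
        return []
    counts = Counter(seqs)
    return [(seq, count, 100 * count / total)
            for seq, count in counts.most_common()
            if count / total > min_fraction]


# Toy library of 10 reads with a high cutoff so the filter is visible.
reads = ["AAAA"] * 5 + ["CCCC"] * 3 + ["GGGG"] * 2
print(overrepresented(reads, min_fraction=0.3))
```

Comparing the reported sequences with the adapter used for the library is then a simple string comparison.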
Content of the adapters
(ADAPTER CONTENT)
• Adapter sequences are an obvious class of sequences to
look for. Knowing whether the library contains a significant
amount of adapter tells you whether adapter trimming is
needed or not.
• This module therefore searches specifically for a separately
defined set of k-mers and shows the total proportion of your
library that contains them.
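The idea behind this module can be sketched as a plain substring search: for each read, find where the adapter first appears, then report the cumulative percentage of reads that contain it at or before each position. The function name `adapter_content` is hypothetical, the adapter string is supplied by the caller (use the adapter of your own library kit), and only exact matches are found.

```python
def adapter_content(seqs, adapter):
    """Cumulative % of reads in which the adapter starts at or before each position."""
    if not seqs:
        return []
    length = max(len(seq) for seq in seqs)
    first_seen = [0] * length
    for seq in seqs:
        position = seq.find(adapter)  # -1 when the adapter is absent
        if position != -1:
            first_seen[position] += 1
    cumulative = []
    running = 0
    for count in first_seen:
        running += count
        cumulative.append(100 * running / len(seqs))
    return cumulative


# Toy example with a 2-base stand-in for an adapter sequence.
print(adapter_content(["GGAA", "AGGA", "AAAA", "ACGT"], "GG"))
```

A curve that rises toward the 3' end of the reads suggests inserts shorter than the read length, in which case adapter trimming is worthwhile.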