09/05/13 K-INBRE Bioinformatics Core KSU
Introduction to the field of bioinformatics
Sept, 2013
Jennifer Shelton
K-INBRE Bioinformatics Core KSU
Outline
I. Basic concepts
i. Definition of bioinformatics
ii. Databases (flat-file and relational)
iii. Assembly (Overlap-layout-consensus)
II. Steps you can take on your own
Definition of bioinformatics
“Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”
-NIH Biomedical Information Science and Technology Initiative Consortium 2000
Biological, medical, behavioral, or health data:
Acquire data
Store/archive data
Organize data
Analyze data
Visualize data
Problem with volume
“We believe the field of bioinformatics for genetic analysis will be one of the biggest areas of disruptive innovation in life science tools over the next few years,”
-Isaac Ro, Goldman Sachs
Per year worldwide we can generate ~13,000,000,000,000,000 bp of data
Image: Mark Smiciklas, Flickr.com/photos/intersectionconsulting
Problem with volume
“This unprecedented amount of sequencing information poses bottlenecks that vary, depending on application, at the level of data extraction, analysis, and interpretation”
“These challenges have become part and parcel of the biomedical research community where investigators have increasingly needed to incorporate bioinformatics and biostatistics into their armamentarium.”
-Opportunities and Challenges Associated with Clinical Diagnostic Genome Sequencing: A Report of the Association for Molecular Pathology. The Journal of Molecular Diagnostics, November 2012
Problem with volume
“It sounds like an analog solution in a digital age,”
-Sifei He, head of cloud computing for BGI (referring to FedExing disks of data because internet connections are often too slow)
NY Times 2011 article: DNA Sequencing Caught in a Deluge of Data http://www.nytimes.com/2011/12/01/business/dna-sequencing-caught-in-deluge-of-data.html?pagewanted=all&_r=0
Examples of bioinformatics tools
Flat-file databases
GenBank: http://www.ncbi.nlm.nih.gov/genbank/
‘records’: data about one unique object
‘fields’: the same kind of data about different objects
Flat-file databases
Any flat-file database, like GenBank, can be thought of as a single spreadsheet called a ‘table’ of ‘fields’ and ‘records’.
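The ‘table’ of ‘records’ and ‘fields’ can be sketched in a few lines of Python. This is only an illustration: the tab-delimited layout and the field names (`gene_id`, `symbol`, `description`) are assumptions made for the example, and real GenBank records use their own multi-line format rather than this one.

```python
import csv
import io

# A tiny flat file: one header line naming the fields, then one record per line.
# The layout and field names are hypothetical, chosen only for illustration.
FLAT_FILE = """gene_id\tsymbol\tdescription
1543\tCYP1A1\tcytochrome P450, family 1, subfamily A, polypeptide 1
10\tNAT2\tN-acetyltransferase 2
9\tNAT1\tN-acetyltransferase 1
"""


def read_records(text):
    """Return one dict per record, keyed by field name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


records = read_records(FLAT_FILE)

# A record is everything known about one object (one row).
print(records[0])

# A field is the same kind of data across all objects (one column).
symbols = [record["symbol"] for record in records]
print(symbols)
```

Reading a row gives a record and reading a column gives a field, which is the spreadsheet picture described above.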
Relational databases
KEGG: http://www.genome.jp/kegg/pathway.html
Have multiple tables, with some fields shared between tables and some different
(‘fields’: the same kind of data about different objects)
Relational databases
Relational databases are like multiple tables that are linked by a shared field. This acts like a “key” between them.
KEGG PATHWAY: hsa05204 (www.genome.jp/dbget-bin/www_bget?pathway+hsa05204)
Organism: Homo sapiens (human) [GN:hsa]
Gene:
1543 CYP1A1; cytochrome P450, family 1, subfamily A, polypeptide 1 (EC:1.14.14.1) [KO:K07408] [EC:1.14.14.1]
1576 CYP3A4; cytochrome P450, family 3, subfamily A, polypeptide 4 (EC:1.14.13.67 1.14.13.97 1.14.13.32) [KO:K07424] [EC:1.14.14.1]
1577 CYP3A5; cytochrome P450, family 3, subfamily A, polypeptide 5 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
1551 CYP3A7; cytochrome P450, family 3, subfamily A, polypeptide 7 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
64816 CYP3A43; cytochrome P450, family 3, subfamily A, polypeptide 43 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
5743 PTGS2; prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) (EC:1.14.99.1) [KO:K11987] [EC:1.14.99.1]
10 NAT2; N-acetyltransferase 2 (arylamine N-acetyltransferase) (EC:2.3.1.5) [KO:K00622] [EC:2.3.1.5]
9 NAT1; N-acetyltransferase 1 (arylamine N-acetyltransferase) (EC:2.3.1.5) [KO:K00622] [EC:2.3.1.5]
1544 CYP1A2; cytochrome P450, family 1, subfamily A, polypeptide 2 (EC:1.14.14.1) [KO:K07409] [EC:1.14.14.1]
6799 SULT1A2; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 2 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
6817 SULT1A1; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 1 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
6818 SULT1A3; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 3 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
445329 SULT1A4; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 4 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
1545 CYP1B1; cytochrome P450, family 1, subfamily B, polypeptide 1 (EC:1.14.14.1) [KO:K07410] [EC:1.14.14.1]
1558 CYP2C8; cytochrome P450, family 2, subfamily C, polypeptide 8 (EC:1.14.14.1) [KO:K07413] [EC:1.14.14.1]
1562 CYP2C18; cytochrome P450, family 2, subfamily C, polypeptide 18 (EC:1.14.14.1) [KO:K07413] [EC:1.14.14.1]
1557 CYP2C19; cytochrome P450, family 2, subfamily C, polypeptide 19 (EC:1.14.13.48 1.14.13.49 1.14.13.80) [KO:K07413] [EC:1.14.14.1]
1559 CYP2C9; cytochrome P450, family 2, subfamily C, polypeptide 9 (EC:1.14.13.48 1.14.13.49 1.14.13.80) [KO:K07413] [EC:1.14.14.1]
2052 EPHX1; epoxide hydrolase 1, microsomal (xenobiotic)
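The idea of a shared field acting as a “key” between tables can be tried out with Python's built-in `sqlite3` module. The sketch below uses a few gene IDs, symbols and KO identifiers from the listing above, but the two-table layout (`gene` and `gene_ko`) and the helper `symbols_for_ko` are assumptions made for illustration, not KEGG's actual schema.

```python
import sqlite3

# In-memory database with two hypothetical tables that share the field gene_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene    (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE gene_ko (gene_id INTEGER REFERENCES gene(gene_id), ko_id TEXT);
""")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [(1543, "CYP1A1"), (1576, "CYP3A4"), (1577, "CYP3A5"), (10, "NAT2")],
)
conn.executemany(
    "INSERT INTO gene_ko VALUES (?, ?)",
    [(1543, "K07408"), (1576, "K07424"), (1577, "K07424"), (10, "K00622")],
)


def symbols_for_ko(connection, ko_id):
    """Follow the shared gene_id key from gene_ko back to gene."""
    rows = connection.execute(
        "SELECT gene.symbol FROM gene "
        "JOIN gene_ko ON gene.gene_id = gene_ko.gene_id "
        "WHERE gene_ko.ko_id = ? ORDER BY gene.symbol",
        (ko_id,),
    )
    return [symbol for (symbol,) in rows]


print(symbols_for_ko(conn, "K07424"))
```

Neither table alone can answer "which gene symbols belong to KO K07424"; the join on `gene_id` is what links them.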
Assembly
Of the ~13,000,000,000,000,000 bp of sequence data we can generate each year, most does not span the full length of a DNA or RNA molecule.
Instead, scientists get back multiple copies of their genome (or transcriptome), but all in short segments (between 50 bp and several kb).
Steps of Overlap-Layout-Consensus (OLC):
1) Let's think of a genome like the text of a book. We get back multiple copies of the book.
OLC Assembly
1) Instead of being nicely bound, we get randomly shredded text all
mixed together from our multiple copies
ice was beginning to get very tired of
sitting by her tister on the bank, and of
having nothing to do
Alice was
beginning to get vory tired of sitting by her sister on
the bank, and of having nothing to do: once
lice was beginning to get
very tired of sitting by her sister on the bank, and
of having nothing
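The shredding in step 1 can be imitated with a short simulation. This is a rough sketch under stated assumptions: the function name `shred`, the fragment length range, and the fixed random seed are all choices made for the example, and no read errors (such as "tister" or "vory" above) are introduced.

```python
import random

BOOK = ("Alice was beginning to get very tired of sitting by her sister "
        "on the bank, and of having nothing to do")


def shred(text, copies=3, min_len=9, max_len=55, seed=0):
    """Cut each copy of text into consecutive random-length pieces, then mix them.

    The last piece of each copy may be shorter than min_len.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(copies):
        start = 0
        while start < len(text):
            size = rng.randint(min_len, max_len)
            fragments.append(text[start:start + size])
            start += size
    rng.shuffle(fragments)  # all copies end up mixed together
    return fragments


reads = shred(BOOK)
for read in reads:
    print(repr(read))
```

Because each copy is cut at different places, fragments from different copies overlap one another, which is what the later steps rely on.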
OLC Assembly
2) We look for lines that overlap by more than some minimum number of letters (in these programs all overlaps are found, then a single “path” is found through this “graph” of overlaps)
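A minimal sketch of the overlap step, using the shredded lines from the slide. It assumes exact matching only (the end of one line must equal the start of another) and a hypothetical minimum overlap of 5 letters; real assemblers tolerate mismatches, and finding the single path through the graph is not shown here.

```python
from itertools import permutations

READS = [
    "ice was beginning to get very tired of",
    "sitting by her tister on the bank, and of",
    "having nothing to do",
    "Alice was",
    "beginning to get vory tired of sitting by her sister on",
    "the bank, and of having nothing to do: once",
    "lice was beginning to get",
    "very tired of sitting by her sister on the bank, and",
    "of having nothing",
]


def overlap(a, b, min_len=5):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def overlap_graph(reads, min_len=5):
    """Map each ordered pair (a, b) with a sufficient overlap to its length."""
    graph = {}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_len)
        if n:
            graph[(a, b)] = n
    return graph


graph = overlap_graph(READS)
for (a, b), n in sorted(graph.items(), key=lambda item: -item[1]):
    print(n, repr(a), "->", repr(b))
```

With exact matching, the line ending in "very tired of" does not connect to the line starting "beginning to get vory", because the misread letter breaks the match. That is one reason the consensus step is needed afterwards.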
OLC Assembly
3) We move column by column, counting the letters in each column, and make a note of the most common letter (take the consensus)
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do
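The column-by-column vote can be sketched as follows. The layout step is not computed here: the start offsets in `LAYOUT` are assumed positions for the slide's lines, written in by hand for the example, and the function name `consensus` is likewise only illustrative.

```python
from collections import Counter

# (start column, line) pairs. The offsets are assumptions standing in for
# the layout that an assembler would derive from the overlap graph.
LAYOUT = [
    (0, "Alice was"),
    (1, "lice was beginning to get"),
    (2, "ice was beginning to get very tired of"),
    (10, "beginning to get vory tired of sitting by her sister on"),
    (27, "very tired of sitting by her sister on the bank, and"),
    (41, "sitting by her tister on the bank, and of"),
    (66, "the bank, and of having nothing to do: once"),
    (80, "of having nothing"),
    (83, "having nothing to do"),
]


def consensus(layout):
    """Return the most common letter in every column of the layout."""
    width = max(offset + len(read) for offset, read in layout)
    columns = [Counter() for _ in range(width)]
    for offset, read in layout:
        for i, letter in enumerate(read):
            columns[offset + i][letter] += 1
    # Columns that no line covers are marked with "N".
    return "".join(
        column.most_common(1)[0][0] if column else "N" for column in columns
    )


print(consensus(LAYOUT))
```

The misread letters in "vory" and "tister" are each outvoted two to one by the other lines covering the same column. The trailing ": once" survives in this sketch only because a single line covers those columns, so nothing contradicts it.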
Assemblies
Species shown: big bluestem, sand bluestem, bittersweet, balsam, red flour beetle. Credits on the slide: Bischof B., Day E., homenursery.com, gardeninginsomnia.com.
[Figure: relative reflectance of EWC versus wavelength (400 to 800 nm) for sand bluestem and big bluestem, each with EWC removed and intact]
[Figure: Sand bluestem assembly length and number of contigs. Cumulative length of sequences (Mb) and number of sequences by assembly k-mer value or name (k-mers 23 to 61, MIRA (454), MIRA cluster)]
[Figure: Sand bluestem N values. N75, N50 and N25 contig length (kb) by assembly k-mer value or name]
[Figure: Balsam N values. N75, N50 and N25 contig length (kb) by assembly k-mer value or name (27, 37, 47, 57, merge, CDH cluster, MIRA cluster)]
[Figure: Balsam assembly length and number of contigs. Cumulative length of sequences (Mb) and number of sequences x 10^5 by assembly k-mer value or name]
[Figure: Bittersweet assembly length and number of contigs. Cumulative length of sequences (Mb) and number of sequences x 10^5 by assembly k-mer value or name]
[Figure: Bittersweet N values. N75, N50 and N25 contig length (kb) by assembly k-mer value or name]
Assembly statistics, first table (species not labeled):
k-mer         N75 (kb)  N50 (kb)  N25 (kb)  Cumulative length of sequences (Mb)  Number of sequences x 10^5
27            1.219     2.028     3.126     142.633358                           1.28113
37            1.206     2.008     3.087     128.100083                           1.1091
47            1.195     1.977     3.051     113.176134                           0.93839
57            1.271     2.035     3.096     102.507455                           0.82755
merge         1.41      2.211     3.331     345.752982                           2.31102
CDH cluster   1.44      2.27      3.422     84.202533                            0.59174
MIRA cluster  1.804     2.69      3.941     105.920843                           0.50279
Assembly statistics, second table (species not labeled):
k-mer         N75 (kb)  N50 (kb)  N25 (kb)  Cumulative length of sequences (Mb)  Number of sequences x 10^5
27            1.213     2.11      3.221     175.505163                           1.61952
37            1.176     2.026     3.068     154.222168                           1.36947
47            1.168     1.948     2.932     129.331497                           1.07545
57            1.218     1.974     2.95      111.672465                           0.90385
merge         1.404     2.23      3.299     418.762352                           2.77833
CDH cluster   1.399     2.274     3.339     96.411479                            0.70852
MIRA cluster  1.825     2.676     3.856     123.666263                           0.59598
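The N25, N50 and N75 values reported for these assemblies can be read with the usual definition in mind: N50 is the contig length such that contigs of that length or longer contain at least half of all assembled bases, and N25 and N75 use 25% and 75% instead. The sketch below assumes that definition; the function name `n_stat` and the contig lengths are made up for illustration and are not taken from the assemblies above.

```python
def n_stat(lengths, fraction=0.5):
    """Length L such that contigs of length >= L hold at least `fraction` of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return 0


# Hypothetical contig lengths (kb), 30 kb in total.
contigs_kb = [10, 6, 4, 3, 2, 2, 1, 1, 1]

n25 = n_stat(contigs_kb, 0.25)
n50 = n_stat(contigs_kb, 0.50)
n75 = n_stat(contigs_kb, 0.75)
print(n25, n50, n75)
```

Under this definition N25 is at least as large as N50, which is at least as large as N75, matching the ordering of the columns in the tables.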
What can you do to get prepared?
If you are interested in studying bioinformatics, here is an outline of increasingly complex levels of skills you might work towards:
•Layer 1 – Using web to analyze biological data
•Layer 2 – Ability to install and run new programs
•Layer 3 – Writing own scripts for analysis in Perl, Python or R
•Layer 4 – High level coding in C/C++/Java for implementing existing algorithms or modifying existing code for new functionality
•Layer 5 – Thinking mathematically, developing own algorithms and implementing in C/C++/Java
-Manoj Samanta http://www.homolog.us/blogs/2011/07/22/a-beginners-guide-to-bioinformatics-part-i/
K-INBRE resources
Over the fall semester, the Bioinformatics Core and Virginia Rider from Pittsburg State University will be hosting an undergraduate bioinformatics club.
Our first topic will be command-line BLAST. Students will get an account on Beocat (Kansas’ largest compute cluster).
http://bioinformaticsk-state-undergrad.blogspot.com
K-INBRE resources
K-INBRE hosts a journal club, Wednesdays at noon, via PolyCom to discuss current bioinformatics tools.
http://bioinformaticsk-state.blogspot.com/
K-INBRE resources
Bradley Olson and K-INBRE – Perl
Justin Blumenstiel et al. – Python
http://bioinformaticskstateperl.blogspot.com/
K-INBRE resources
K-INBRE and i5K have begun a GitHub script-sharing organization to archive and share scripts.
https://github.com/i5K-KINBRE-script-share
[Diagram of the organization's layout]
GitHub organization: i5K-KINBRE-script-share
Category of ‘omics’ tool: RNA-Seq annotation and comparison; genome annotation and comparison; genome and transcriptome assembly; read cleaning and format conversion
Lab or research group: KSU bioinfo lab; Olson lab
List and description of scripts: read me
K-INBRE resources
-Git has very well developed version control built-in http://git-scm.com/video/what-is-version-control
-Easy to search
-More advantages are reviewed in this quick introduction http://git-scm.com/video/quick-wins
-Provides continuity within labs (as students and postdocs rotate out)
-Increases collaboration and sharing of workflows within our community
-It is also a good way to distribute the code you describe in a publication.
-Git is also widely used by beginners as well as developers of technology and software in the omics community, including:
https://github.com/broadinstitute (The Broad Institute)
https://github.com/lh3 (Li H., developer of BWA etc.)
https://github.com/dzerbino (Daniel Zerbino, developer of Oases and Velvet)
https://github.com/PacificBiosciences
Questions?
Contact information:
sheltonj@ksu.edu
K-INBRE Bioinformatics Core:
http://www.kumc.edu/kinbre/bioinformatics.html
http://bioinformatics.k-state.edu/

Weitere ähnliche Inhalte

Andere mochten auch

2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekinge2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_phylogenetics_wim_vancriekinge
2016 bioinformatics i_phylogenetics_wim_vancriekinge2016 bioinformatics i_phylogenetics_wim_vancriekinge
2016 bioinformatics i_phylogenetics_wim_vancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_bio_python_ii_wimvancriekinge
2016 bioinformatics i_bio_python_ii_wimvancriekinge2016 bioinformatics i_bio_python_ii_wimvancriekinge
2016 bioinformatics i_bio_python_ii_wimvancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekinge2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekingeProf. Wim Van Criekinge
 
databases in bioinformatics
databases in bioinformaticsdatabases in bioinformatics
databases in bioinformaticsnadeem akhter
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsJTADrexel
 

Andere mochten auch (14)

2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekinge2016 bioinformatics i_bio_python_wimvancriekinge
2016 bioinformatics i_bio_python_wimvancriekinge
 
2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge2016 bioinformatics i_databases_wim_vancriekinge
2016 bioinformatics i_databases_wim_vancriekinge
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
2016 bioinformatics i_phylogenetics_wim_vancriekinge
2016 bioinformatics i_phylogenetics_wim_vancriekinge2016 bioinformatics i_phylogenetics_wim_vancriekinge
2016 bioinformatics i_phylogenetics_wim_vancriekinge
 
2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge
 
2016 bioinformatics i_bio_python_ii_wimvancriekinge
2016 bioinformatics i_bio_python_ii_wimvancriekinge2016 bioinformatics i_bio_python_ii_wimvancriekinge
2016 bioinformatics i_bio_python_ii_wimvancriekinge
 
2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekinge2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekinge
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge
 
2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge
 
Major databases in bioinformatics
Major databases in bioinformaticsMajor databases in bioinformatics
Major databases in bioinformatics
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
 
databases in bioinformatics
databases in bioinformaticsdatabases in bioinformatics
databases in bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 

Mehr von Jennifer Shelton

Bioinformatic core facilities discussion
Bioinformatic core facilities discussionBioinformatic core facilities discussion
Bioinformatic core facilities discussionJennifer Shelton
 
Using BioNano Maps to Improve an Insect Genome Assembly​
Using BioNano Maps to Improve an Insect Genome Assembly​Using BioNano Maps to Improve an Insect Genome Assembly​
Using BioNano Maps to Improve an Insect Genome Assembly​Jennifer Shelton
 
Structural Variation Detection
Structural Variation DetectionStructural Variation Detection
Structural Variation DetectionJennifer Shelton
 
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Jennifer Shelton
 
Journal club slides to discuss "Differential analysis of gene regulation at t...
Journal club slides to discuss "Differential analysis of gene regulation at t...Journal club slides to discuss "Differential analysis of gene regulation at t...
Journal club slides to discuss "Differential analysis of gene regulation at t...Jennifer Shelton
 
Applied Bioinformatics Journal Club Pacbio RNA-Seq
Applied Bioinformatics Journal Club Pacbio RNA-SeqApplied Bioinformatics Journal Club Pacbio RNA-Seq
Applied Bioinformatics Journal Club Pacbio RNA-SeqJennifer Shelton
 
RNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubRNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubJennifer Shelton
 
Bionano genome maps_feb2014
Bionano genome maps_feb2014Bionano genome maps_feb2014
Bionano genome maps_feb2014Jennifer Shelton
 
Translocation detection in lung cancer using mate-pair sequencing and iVIGS
Translocation detection in lung cancer using mate-pair sequencing and iVIGSTranslocation detection in lung cancer using mate-pair sequencing and iVIGS
Translocation detection in lung cancer using mate-pair sequencing and iVIGSJennifer Shelton
 
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...Jennifer Shelton
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.Jennifer Shelton
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycleRNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycleJennifer Shelton
 
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Jennifer Shelton
 
Param selection phase1summary_v2
Param selection phase1summary_v2Param selection phase1summary_v2
Param selection phase1summary_v2Jennifer Shelton
 
Bioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalBioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalJennifer Shelton
 

Mehr von Jennifer Shelton (17)

Bioinformatic core facilities discussion
Bioinformatic core facilities discussionBioinformatic core facilities discussion
Bioinformatic core facilities discussion
 
Using BioNano Maps to Improve an Insect Genome Assembly​
Using BioNano Maps to Improve an Insect Genome Assembly​Using BioNano Maps to Improve an Insect Genome Assembly​
Using BioNano Maps to Improve an Insect Genome Assembly​
 
Structural Variation Detection
Structural Variation DetectionStructural Variation Detection
Structural Variation Detection
 
Bng presentation draft
Bng presentation draftBng presentation draft
Bng presentation draft
 
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
 
Journal club slides to discuss "Differential analysis of gene regulation at t...
Journal club slides to discuss "Differential analysis of gene regulation at t...Journal club slides to discuss "Differential analysis of gene regulation at t...
Journal club slides to discuss "Differential analysis of gene regulation at t...
 
Hub gene selection_ds
Hub gene selection_dsHub gene selection_ds
Hub gene selection_ds
 
Applied Bioinformatics Journal Club Pacbio RNA-Seq
Applied Bioinformatics Journal Club Pacbio RNA-SeqApplied Bioinformatics Journal Club Pacbio RNA-Seq
Applied Bioinformatics Journal Club Pacbio RNA-Seq
 
RNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal ClubRNASeq DE methods review Applied Bioinformatics Journal Club
RNASeq DE methods review Applied Bioinformatics Journal Club
 
Bionano genome maps_feb2014
Bionano genome maps_feb2014Bionano genome maps_feb2014
Bionano genome maps_feb2014
 
Translocation detection in lung cancer using mate-pair sequencing and iVIGS
Translocation detection in lung cancer using mate-pair sequencing and iVIGSTranslocation detection in lung cancer using mate-pair sequencing and iVIGS
Translocation detection in lung cancer using mate-pair sequencing and iVIGS
 
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...
Summary slides by Prabhakar Chalise of the Oberg et al. 2012 article "Technic...
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycleRNA-Seq transcriptome analysis of Gonium pectorale cell cycle
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle
 
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
Multi-k-mer de novo transcriptome assembly and assembly of assemblies using 4...
 
Param selection phase1summary_v2
Param selection phase1summary_v2Param selection phase1summary_v2
Param selection phase1summary_v2
 
Bioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formalBioinformatic jc 08_14_2013_formal
Bioinformatic jc 08_14_2013_formal
 

Introduction to the field of bioinformatics

  • 1. 09/05/13, K-INBRE Bioinformatics Core, KSU. Introduction to the field of bioinformatics. Sept. 2013. Jennifer Shelton, K-INBRE Bioinformatics Core, KSU.
  • 2. Outline. I. Basic concepts: (i) Definition of bioinformatics; (ii) Databases (flat-file and relational); (iii) Assembly (Overlap-Layout-Consensus). II. Steps you can take on your own.
  • 3. Definition of bioinformatics. “Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.” (NIH Biomedical Information Science and Technology Initiative Consortium, 2000). Diagram labels: acquire data; store/archive data; organize data; analyze data; visualize data; applied to biological, medical, behavioral, or health data.
  • 5. Problem with volume. “We believe the field of bioinformatics for genetic analysis will be one of the biggest areas of disruptive innovation in life science tools over the next few years.” (Isaac Ro, Goldman Sachs). Per year, worldwide, we can generate ~13,000,000,000,000,000 bp of data. Image credit: Mark Smiciklas, Flickr.com/photos/intersectionconsulting
  • 6. Problem with volume. “This unprecedented amount of sequencing information poses bottlenecks that vary, depending on application, at the level of data extraction, analysis, and interpretation.” “These challenges have become part and parcel of the biomedical research community where investigators have increasingly needed to incorporate bioinformatics and biostatistics into their armamentarium.” Source: Opportunities and Challenges Associated with Clinical Diagnostic Genome Sequencing: A Report of the Association for Molecular Pathology. The Journal of Molecular Diagnostics, November 2012. Image credit: Mark Smiciklas, Flickr.com/photos/intersectionconsulting
  • 7. Problem with volume. “It sounds like an analog solution in a digital age.” (Sifei He, head of cloud computing for BGI, referring to FedExing disks of data because internet connections are often too slow). Source: NY Times 2011 article, “DNA Sequencing Caught in a Deluge of Data”, http://www.nytimes.com/2011/12/01/business/dna-sequencing-caught-in-deluge-of-data.html?pagewanted=all&_r=0
  • 8. Examples of bioinformatics tools.
  • 10. Flat-file databases. Example: GenBank (http://www.ncbi.nlm.nih.gov/genbank/). A ‘record’ holds the data about one unique object. A ‘field’ holds the same kind of data about different objects.
  • 11. Flat-file databases. Any flat-file database, like GenBank, can be thought of as a single spreadsheet, called a ‘table’, of ‘fields’ and ‘records’.
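The ‘table’ of ‘fields’ and ‘records’ can be sketched in a few lines of Python. This is only an illustration of the idea: the comma-separated layout, the accession values, and the sequence lengths below are made up for the example, and the real GenBank flat-file format is considerably richer than this.

```python
import csv
import io

# Hypothetical flat file: the header row names the fields,
# and every following line is one record. All values are invented.
FLAT_FILE = """accession,organism,length_bp
EX000001,Tribolium castaneum,1200
EX000002,Andropogon gerardii,850
EX000003,Homo sapiens,2310
"""


def load_records(text):
    """Parse the single table into a list of records (dicts keyed by field name)."""
    return list(csv.DictReader(io.StringIO(text)))


records = load_records(FLAT_FILE)
fields = list(records[0].keys())

# A record is everything stored about one unique object...
first_record = records[0]
# ...while a field is the same kind of data about different objects.
organisms = [record["organism"] for record in records]

print(fields)
print(first_record)
print(organisms)
```

Reading across a row gives one record; reading down a column gives one field.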
  • 12. Relational databases have multiple tables, with some fields shared between tables and some different. (‘Fields’ hold the same kind of data about different objects.) Example: KEGG PATHWAY (http://www.genome.jp/kegg/pathway.html).
  • 13. Relational databases are like multiple tables that are linked with a shared field. This acts like a “key” between them. Example: KEGG PATHWAY: hsa05204 (www.genome.jp/dbget-bin/www_bget?pathway+hsa05204). Organism: Homo sapiens (human) [GN:hsa]. Gene:
      1543 CYP1A1; cytochrome P450, family 1, subfamily A, polypeptide 1 (EC:1.14.14.1) [KO:K07408] [EC:1.14.14.1]
      1576 CYP3A4; cytochrome P450, family 3, subfamily A, polypeptide 4 (EC:1.14.13.67 1.14.13.97 1.14.13.32) [KO:K07424] [EC:1.14.14.1]
      1577 CYP3A5; cytochrome P450, family 3, subfamily A, polypeptide 5 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
      1551 CYP3A7; cytochrome P450, family 3, subfamily A, polypeptide 7 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
      64816 CYP3A43; cytochrome P450, family 3, subfamily A, polypeptide 43 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
      5743 PTGS2; prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) (EC:1.14.99.1) [KO:K11987] [EC:1.14.99.1]
      10 NAT2; N-acetyltransferase 2 (arylamine N-acetyltransferase) (EC:2.3.1.5) [KO:K00622] [EC:2.3.1.5]
      9 NAT1; N-acetyltransferase 1 (arylamine N-acetyltransferase) (EC:2.3.1.5) [KO:K00622] [EC:2.3.1.5]
      1544 CYP1A2; cytochrome P450, family 1, subfamily A, polypeptide 2 (EC:1.14.14.1) [KO:K07409] [EC:1.14.14.1]
      6799 SULT1A2; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 2 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      6817 SULT1A1; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 1 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      6818 SULT1A3; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 3 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      445329 SULT1A4; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 4 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      1545 CYP1B1; cytochrome P450, family 1, subfamily B, polypeptide 1 (EC:1.14.14.1) [KO:K07410] [EC:1.14.14.1]
      1558 CYP2C8; cytochrome P450, family 2, subfamily C, polypeptide 8 (EC:1.14.14.1) [KO:K07413] [EC:1.14.14.1]
      1562 CYP2C18; cytochrome P450, family 2, subfamily C, polypeptide 18 (EC:1.14.14.1) [KO:K07413] [EC:1.14.14.1]
      1557 CYP2C19; cytochrome P450, family 2, subfamily C, polypeptide 19 (EC:1.14.13.48 1.14.13.49 1.14.13.80) [KO:K07413] [EC:1.14.14.1]
      1559 CYP2C9; cytochrome P450, family 2, subfamily C, polypeptide 9 (EC:1.14.13.48 1.14.13.49 1.14.13.80) [KO:K07413] [EC:1.14.14.1]
      2052 EPHX1; epoxide hydrolase 1, microsomal (xenobiotic)
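To make the “key” idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema (`gene` and `orthology`) is an assumption made for illustration and is not KEGG's actual database layout; the gene IDs, symbols, KO identifiers, and bracketed EC numbers are copied from the hsa05204 listing on the slide.

```python
import sqlite3

# In-memory database with two hypothetical tables that share the ko_id field.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, ko_id TEXT);
    CREATE TABLE orthology (ko_id TEXT PRIMARY KEY, ec_number TEXT);
    """
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        (1543, "CYP1A1", "K07408"),
        (1576, "CYP3A4", "K07424"),
        (1577, "CYP3A5", "K07424"),
        (10, "NAT2", "K00622"),
        (9, "NAT1", "K00622"),
    ],
)
conn.executemany(
    "INSERT INTO orthology VALUES (?, ?)",
    [
        ("K07408", "1.14.14.1"),
        ("K07424", "1.14.14.1"),
        ("K00622", "2.3.1.5"),
    ],
)


def symbols_for_ec(connection, ec_number):
    """Return gene symbols whose orthology entry carries the given EC number."""
    rows = connection.execute(
        """
        SELECT gene.symbol
        FROM gene
        JOIN orthology ON gene.ko_id = orthology.ko_id  -- the shared "key" field
        WHERE orthology.ec_number = ?
        ORDER BY gene.symbol
        """,
        (ec_number,),
    ).fetchall()
    return [symbol for (symbol,) in rows]


print(symbols_for_ec(conn, "2.3.1.5"))
```

Neither table alone can answer “which genes have EC number 2.3.1.5?” in this layout; the join on the shared `ko_id` field is what links them.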
  • 15. Assembly. Of the ~13,000,000,000,000,000 bp of sequence data we can generate each year, most is not the full length of the molecule of DNA or RNA. Instead, scientists get back multiple copies of their genome (or transcriptome), but all in short segments (between 50 bp and several kb). Steps of Overlap-Layout-Consensus (OLC): 1) Let's think of a genome like the text of a book. We get back multiple copies of the book.
  • 16. OLC Assembly. 1) Instead of being nicely bound, we get randomly shredded text, all mixed together, from our multiple copies:
      ice was beginning to get very tired of sitting by her tister on the bank, and of having nothing to do
      Alice was beginning to get vory tired of sitting by her sister on the bank, and of having nothing to do: once
      lice was beginning to get very tired of sitting by her sister on the bank, and of having nothing
  • 17. OLC Assembly. 2) We look for lines that overlap for more than some minimum number of letters (in these programs all overlaps are found, then a single “path” is found through this “graph” of overlaps).
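The overlap and layout steps can be sketched as follows. This is a toy illustration, not how any particular assembler is implemented: it only accepts exact suffix-to-prefix matches, compares every pair of fragments, and assumes the overlap graph contains one unambiguous path. The three text fragments and the function names are made up for the example.

```python
from itertools import permutations


def overlap_length(left, right, min_len):
    """Longest suffix of `left` that exactly matches a prefix of `right`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def overlap_graph(reads, min_len):
    """Find all overlaps: map each ordered pair (left, right) to its overlap length."""
    graph = {}
    for left, right in permutations(reads, 2):
        n = overlap_length(left, right, min_len)
        if n:
            graph[(left, right)] = n
    return graph


def layout(reads, graph):
    """Follow the single path through the graph, merging reads as we go.

    Assumes each read has at most one successor and the path has no cycles.
    """
    successors = {left: (right, n) for (left, right), n in graph.items()}
    has_predecessor = {right for (_, right) in graph}
    current = next(read for read in reads if read not in has_predecessor)
    assembled = current
    while current in successors:
        current, n = successors[current]
        assembled += current[n:]  # append only the part that does not overlap
    return assembled


# Hypothetical shredded fragments, deliberately given out of order.
fragments = ["beginning to get very", "Alice was beginning", "very tired of sitting"]
graph = overlap_graph(fragments, min_len=4)
print(layout(fragments, graph))
```

Raising `min_len` makes chance overlaps less likely but can also break the path where two fragments share only a few letters.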
  • 19. OLC Assembly. 3) We move column by column, counting the letters in each column and making a note of the most common letter (take the consensus):
        ice was beginning to get very tired of sitting by her tister on the bank, and of having nothing to do
      Alice was beginning to get vory tired of sitting by her sister on the bank, and of having nothing to do: once
       lice was beginning to get very tired of sitting by her sister on the bank, and of having nothing
      Consensus:
      Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do
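The column-by-column vote can be sketched in Python. The offsets that line the three reads up against each other are an assumption supplied by hand here (in OLC they would come out of the overlap and layout steps), and the misspellings “tister” and “vory” appear to stand in for sequencing errors. Note that this simple version also keeps columns covered by a single read, so its output retains the trailing “: once” that the slide's consensus leaves off.

```python
from collections import Counter

# (offset, read) pairs; the offsets are assumed to be known from the layout step.
ALIGNED_READS = [
    (2, "ice was beginning to get very tired of sitting by her tister "
        "on the bank, and of having nothing to do"),
    (0, "Alice was beginning to get vory tired of sitting by her sister "
        "on the bank, and of having nothing to do: once"),
    (1, "lice was beginning to get very tired of sitting by her sister "
        "on the bank, and of having nothing"),
]


def consensus(aligned_reads):
    """Take the most common letter in every column of the aligned reads."""
    length = max(offset + len(read) for offset, read in aligned_reads)
    columns = [Counter() for _ in range(length)]
    for offset, read in aligned_reads:
        for i, letter in enumerate(read):
            columns[offset + i][letter] += 1
    # Columns no read covers are skipped; ties go to the letter counted first.
    return "".join(col.most_common(1)[0][0] for col in columns if col)


result = consensus(ALIGNED_READS)
print(result)
```

With three copies, a single wrong letter in a column is outvoted two to one, which is why deeper coverage makes the consensus more reliable.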
  • 24. Assemblies. [Figures: relative reflectance of EWC versus wavelength (nm) for big bluestem and sand bluestem (removed and intact). For the sand bluestem, balsam, and bittersweet assemblies: cumulative length of sequences (Mb) and number of sequences, and N75, N50, and N25 contig lengths (kb), each plotted against assembly k-mer value or name (individual k-mer values, merge, CDH cluster, MIRA cluster). Also pictured: red flour beetle. Credits: Bischof B.; Day E.; homenursery.com; gardeninginsomnia.com.]
  • 26. What can you do to get prepared? If you are interested in studying bioinformatics, here is an outline of increasingly complex levels of skills you might work towards:
      Layer 1: Using the web to analyze biological data
      Layer 2: Ability to install and run new programs
      Layer 3: Writing your own scripts for analysis in Perl, Python, or R
      Layer 4: High-level coding in C/C++/Java for implementing existing algorithms or modifying existing code for new functionality
      Layer 5: Thinking mathematically, developing your own algorithms, and implementing them in C/C++/Java
      Source: Manoj Samanta, http://www.homolog.us/blogs/2011/07/22/a-beginners-guide-to-bioinformatics-part-i/
  • 27. K-INBRE resources. Over the fall semester, the Bioinformatics Core and Virginia Rider from Pittsburg State University will be hosting an undergraduate bioinformatics club. Our first topic will be command-line BLAST. Students will get an account on Beocat (Kansas’ largest compute cluster). http://bioinformaticsk-state-undergrad.blogspot.com
  • 28. K-INBRE resources. K-INBRE hosts a journal club, Wednesdays at noon via PolyCom, to discuss current bioinformatics tools. http://bioinformaticsk-state.blogspot.com/
  • 29. K-INBRE resources. Perl: Bradley Olson and K-INBRE. Python: Justin Blumenstiel et al. http://bioinformaticskstateperl.blogspot.com/
  • 30. K-INBRE resources. K-INBRE and i5K have begun a GitHub script-sharing organization to archive and share scripts: https://github.com/i5K-KINBRE-script-share. Diagram: the GitHub organization (i5K-KINBRE-script-share) is divided by category of ‘omics’ tool (RNA-Seq annotation and comparison; genome annotation and comparison; genome and transcriptome assembly; read cleaning and format conversion), then by lab or research group (KSU bioinfo lab, Olson lab), each with a read me giving a list and description of scripts.
  • 31. K-INBRE resources.
      - Git has very well developed version control built in: http://git-scm.com/video/what-is-version-control
      - It is easy to search.
      - More advantages are reviewed in this quick introduction: http://git-scm.com/video/quick-wins
      - It provides continuity within labs (as students and postdocs rotate out).
      - It increases collaboration and sharing of workflows within our community.
      - It is also a good way to distribute the code you describe in a publication.
      - Git is widely used by beginners as well as by developers of technology and software in the omics community, including:
        https://github.com/broadinstitute (The Broad Institute)
        https://github.com/lh3 (Li H., developer of BWA etc.)
        https://github.com/dzerbino (Daniel Zerbino, developer of Oases and Velvet)
        https://github.com/PacificBiosciences
  • 32. Questions? Contact information: sheltonj@ksu.edu. K-INBRE Bioinformatics Core: http://www.kumc.edu/kinbre/bioinformatics.html and http://bioinformatics.k-state.edu/