A self-guided, tutorial-based overview of the UCSC genome browser for accessing public neuroscience data, in particular data from the ENCODE project, including additional transcriptomic resources for the neurosciences.
The UCSC genome browser: A Neuroscience focused overview
1. The UCSC genome browser: A Neuroscience focused overview
Vicky Perreau, The Florey Bioinformatics Core
Tuesday 17th March 2015
vperreau@unimelb.edu.au
2. Overview
• Browser
– Training
– Configuration
– Manipulation
– Navigation
• Locating and loading ENCODE data
• Data types
3. UCSC genome browser
• Purpose
– Lots of data
– Customisable
– Detailed info pages
– Access images (VisiGene)
– Access sequence information (FASTA)
– Do sequence alignments:
• BLAT
• Virtual PCR (In-Silico PCR)
4. UCSC genome browser
• Structure
– Built upon tables of data
– Each table must have genomic coordinates
• E.g. a list of known genes
– The browser visualizes the data
– Endlessly customizable searches
• Correlating one type of data with another
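Because every table carries genomic coordinates, correlating one type of data with another comes down to asking which records overlap. The minimal sketch below assumes two small, hypothetical BED-style tables (chromosome, 0-based start, end-exclusive, name); the record values are invented for illustration and are not real UCSC table rows.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One row of a BED-style table: 0-based start, end exclusive."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Record, b: Record) -> bool:
    """Half-open intervals overlap if they share a chromosome and each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(table_a: list[Record], table_b: list[Record]) -> list[tuple[str, str]]:
    """Return (name_a, name_b) for every overlapping pair, using a naive O(n*m) scan."""
    return [(a.name, b.name) for a in table_a for b in table_b if overlaps(a, b)]


# Hypothetical example tables (coordinates invented for illustration).
genes = [Record("chr1", 1000, 5000, "geneA"), Record("chr2", 200, 900, "geneB")]
snps = [
    Record("chr1", 4999, 5000, "snp1"),  # last base of geneA: overlaps
    Record("chr1", 5000, 5001, "snp2"),  # first base after geneA: no overlap
    Record("chr2", 100, 101, "snp3"),    # upstream of geneB: no overlap
]

print(intersect(genes, snps))  # [('geneA', 'snp1')]
```

The naive scan keeps the logic visible; for whole-genome tables, the Table Browser's intersection option or a dedicated tool such as bedtools does the same job efficiently.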
8. Default view for tracks in human hg19
MBP entered in the search box (accepts a string search or a location)
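The search box accepts either a free-text string (such as a gene symbol like MBP) or a location. The sketch below shows one way to tell the two apart; it is a hypothetical helper written for this tutorial, not the browser's own parser, and it assumes locations are typed as "chrom:start-end" with optional thousands separators.

```python
import re

# Assumed location format, e.g. "chr1:1,000-2,000" (commas optional).
_POSITION = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_position(text: str) -> tuple[str, int, int]:
    """Parse a 'chrom:start-end' string into (chrom, start, end), stripping commas.

    Raises ValueError if the text is not a location (e.g. a gene symbol).
    """
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a location string: {text!r}")
    chrom, start, end = match.groups()
    # int("") also raises ValueError, so a bare "," is rejected too.
    start_i, end_i = int(start.replace(",", "")), int(end.replace(",", ""))
    if end_i < start_i:
        raise ValueError("end must not be before start")
    return chrom, start_i, end_i


def classify_query(text: str) -> str:
    """Label a search-box entry as a 'location' or a 'string search'."""
    try:
        parse_position(text)
        return "location"
    except ValueError:
        return "string search"


print(parse_position("chr1:1,000-2,000"))  # ('chr1', 1000, 2000)
print(classify_query("MBP"))               # string search
```

Positions typed into the browser are 1-based and inclusive, whereas BED-style tables store 0-based, end-exclusive starts, so chr1:1-100 in the search box corresponds to a table row with start 0 and end 100.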
9. Organisation of genomic data (customizable)
• Chromosome band
• Gap locations
• Known genes
• Predicted genes
• Phenotype and disease
• Enhancer/promoter data
• Microarray expression data
• Evolutionary conservation
• SNPs and structural variation
• Repeated regions
10. Types of data
• Reference sequence
• Annotation tracks
• Gene/protein information
• Comparison with other species
• SNPs
11. NGS data: raw data to bigWig files
filename.fastq = raw sequence data; sequence and quality scores only.
filename.bam = aligned sequence data; sequence data preserved.
filename.bedgraph = position data only for reads; no sequence data preserved.
filename.bigwig = histogram of coverage by genomic position only; reads and sequence data are not preserved. The small file size makes bigWig files easy to use in genome browsers and allows multiple bigWig files to be overlaid.
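Going from aligned reads to a coverage histogram discards the sequence and keeps only how many reads cover each position. The sketch below shows that step for one chromosome, collapsing read intervals into bedGraph-style rows (chrom, start, end, depth). It assumes reads are already reduced to hypothetical (start, end) pairs in 0-based, end-exclusive coordinates; in practice a tool such as bedtools genomecov (bedGraph output) followed by UCSC's bedGraphToBigWig utility would handle real BAM files.

```python
def coverage_bedgraph(chrom: str, reads: list[tuple[int, int]]) -> list[tuple[str, int, int, int]]:
    """Collapse read intervals (0-based, end-exclusive) into bedGraph-style runs.

    Each output row is (chrom, start, end, depth) for a maximal stretch of
    constant, non-zero coverage. Read sequence never enters the calculation.
    """
    # Sweep line: +1 where a read starts, -1 where it ends.
    events: dict[int, int] = {}
    for start, end in reads:
        events[start] = events.get(start, 0) + 1
        events[end] = events.get(end, 0) - 1

    rows: list[tuple[str, int, int, int]] = []
    depth = 0
    prev_pos = None
    for pos in sorted(events):
        # Close the run between the previous breakpoint and this one.
        if prev_pos is not None and depth > 0 and pos > prev_pos:
            if rows and rows[-1][2] == prev_pos and rows[-1][3] == depth:
                # Same depth continues (e.g. one read ends where another starts): extend.
                c, s, _, d = rows[-1]
                rows[-1] = (c, s, pos, d)
            else:
                rows.append((chrom, prev_pos, pos, depth))
        depth += events[pos]
        prev_pos = pos
    return rows


for row in coverage_bedgraph("chr1", [(0, 10), (5, 15), (20, 25)]):
    print(*row, sep="\t")
```

Zero-coverage gaps (here 15 to 20) produce no row, which is one reason coverage files are so much smaller than the alignments they summarise.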
12. NGS data: coverage plots for RNA-seq data
Figure: General features of an mRNA transcript as visualized by RNA-seq. Sebastian Schubert et al. Blood 2014;124:493-502
13. Types of data
• The NGS data coverage plot (histogram) is continuous.
• SNP positions are discrete.
• Gene models: line height denotes exon, intron or UTR. Arrows show the direction of transcription.
14. Whole page overview
Groups of data (Tracks):
• Mapping and Sequencing Tracks
• Genes and Gene Prediction Tracks (including sno/miRNA data)
• Phenotype and Disease Tracks
• mRNA and EST Tracks
• Expression (such as microarray)
• Regulation (including TFBS)
• Comparative Genomics
– As a group
– Individual species
• Variation and Repeats (including SNPs, copy number variation)
18. Data from the gene detail page and links out to other resources
• Informative description
• Other resource links
• Microarray data
• mRNA secondary structure
• Links to sequences
• Protein domains/structure
• Orthologs in other species
• Gene Ontology™ descriptions
• mRNA descriptions
• Pathways
• Genetic association studies
• Comparative toxicology
• Gene model
23. ENCODE project
• In 2003 the National Human Genome Research Institute embarked upon:
– The ENCyclopedia Of DNA Elements (ENCODE)
• Aim: to delineate all of the functional elements in the human genome. More recent data includes a lot of mouse data.
• Goal:
– To provide the scientific community with high-quality, comprehensive annotations of candidate functional elements in the human genome.
• Functional elements?
– “discrete region of the genome that encodes a defined product (eg protein) or a reproducible biochemical signature, such as transcription or specific chromatin structure”
• Developed detailed experiment guidelines.
– A great resource if you are considering designing your own NGS experiment (https://www.encodeproject.org/about/experiment-guidelines/)
24. ENCODE: data use policy
• Early phase:
– Moratorium on public presentation or publication of data until 9 months after release.
• Now:
– All data produced will be available for unrestricted use immediately upon release to public databases, eliminating the nine-month moratorium previously used by ENCODE.
– External data users may freely download, analyze and publish results based on any ENCODE data without restrictions as soon as they are released.
– Must include appropriate citation.
https://www.encodeproject.org/about/data-use-policy
25. ENCODE: accessing data
• 2003-2007: Pilot phase examining 1% of the genome
• 2007: Expanded to study the entire genome
• 2012: 30 high-profile articles published
• 2014: >150 experiments using brain or spinal cord released
• UCSC was the original Data Coordination Center for ENCODE, and data prior to 2013 is fully integrated.
• ENCODE results from 2013 and later are available from the ENCODE Project Portal.
32. Click the “Visualise data” button
Enter gene name
Note: Not all experiments have a “Visualise data” button. For some experiments you can download the bigWig file and upload it into UCSC as a custom track. Data from some experiments may require additional formatting for viewing in a genome browser.
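For binary formats such as bigWig, UCSC custom tracks are typically attached by URL: you paste a track line whose bigDataUrl points to a publicly reachable copy of the file, and the browser fetches it. The sketch below builds such a line. The URL, track name and description are hypothetical placeholders; check the UCSC custom track documentation for the full set of track-line settings.

```python
from urllib.parse import urlparse


def _quote(text: str) -> str:
    """Wrap a value in double quotes, dropping any embedded double quotes."""
    return '"' + text.replace('"', "") + '"'


def bigwig_track_line(url: str, name: str, description: str = "") -> str:
    """Build a custom-track line that points the browser at a remote bigWig file.

    The browser reads the file itself, so the URL must be reachable over
    http, https or ftp; a path on your own computer will not work.
    """
    if urlparse(url).scheme not in {"http", "https", "ftp"}:
        raise ValueError("bigDataUrl must be an http, https or ftp URL")
    fields = ["track", "type=bigWig", f"name={_quote(name)}"]
    if description:
        fields.append(f"description={_quote(description)}")
    fields.append(f"bigDataUrl={url}")
    return " ".join(fields)


# Hypothetical URL for illustration only.
print(bigwig_track_line("https://example.org/data/sample.bw", "My sample", "RNA-seq coverage"))
# track type=bigWig name="My sample" description="RNA-seq coverage" bigDataUrl=https://example.org/data/sample.bw
```

Quoting the name and description matters because unquoted values containing spaces would be split into separate settings.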
39. MBP expression in 7 cell lines
Select region and add vertical highlight
40. Transcriptome data
• Other tracks in the “expression” block of tracks supply data on:
– Poly A status
– Subcellular localisation
– Proteogenomics: mapping peptide locations
– Start and end points of RNA molecules in cells
– Exon array and RNA-seq data (both available)
• Choose them all, but one at a time to start with. It’s a lot of data!
41. Drill down to multiple layers
• Tracks with similar data collected together:
– Super tracks
• View metadata
• Many customizable options:
– Custom filtering thresholds (level of detection), dependent on project and technology
– Cell lines on or off
– Replicates on or off
– Viewing options
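Filtering thresholds and the cell-line and replicate switches all amount to choosing which rows get drawn. The sketch below mimics that with a hypothetical Feature record; the field names, cell-line labels and scores are invented for illustration, and real ENCODE tracks expose their own track-specific filter settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A scored feature from one sample (hypothetical fields for illustration)."""
    cell_line: str
    replicate: int
    start: int
    end: int
    score: float


def visible(features: list[Feature], min_score: float,
            cell_lines: set[str], replicates: set[int]) -> list[Feature]:
    """Keep features above the detection threshold whose cell line and replicate are switched on."""
    return [
        f for f in features
        if f.score >= min_score and f.cell_line in cell_lines and f.replicate in replicates
    ]


# Hypothetical data: two replicates of cellA and one of cellB.
features = [
    Feature("cellA", 1, 100, 200, 12.5),
    Feature("cellA", 2, 100, 200, 3.0),
    Feature("cellB", 1, 150, 250, 40.0),
]

# Threshold 10, only cellA shown: the weak replicate-2 signal is hidden.
print(visible(features, 10.0, {"cellA"}, {1, 2}))
```

Raising the threshold trades sensitivity for fewer false positives, which is why the appropriate level depends on the project and technology.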
48. Monoallelic expression in mouse CNS cell lines
Li SM, Valo Z, Wang J, Gao H, Bowers CW, et al. (2012) Transcriptome-Wide Survey of Mouse CNS-Derived Cells Reveals Monoallelic Expression within Novel Gene Families. PLoS ONE 7(2): e31751. doi:10.1371/journal.pone.0031751
49. Glutamate Receptor, Ionotropic, AMPA 3
Use Configure to increase the width of the track name column to view complete cell line names.
50. Monoallelic expression preserved after differentiation into neurons and astrocytes
Li SM, Valo Z, Wang J, Gao H, Bowers CW, et al. (2012) Transcriptome-Wide Survey of Mouse CNS-Derived Cells Reveals Monoallelic Expression within Novel Gene Families. PLoS ONE 7(2): e31751. doi:10.1371/journal.pone.0031751
56. Type gene of interest into search bar.
First select gene.
Click here to get RNA-seq expression data.
Find genes with similar expression profiles across region and/or developmental age.
57. RNA-seq data view: sorted by tissue region
Exon location (grey box)
White arrow denotes sample
Change sort order from region to age
Download
58. RNA-seq data view: sorted by age
Change sort order from region to age
Increasing age: 8 pcw (post-conception weeks) to 40 years
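Sorting by age is less trivial than it looks, because prenatal samples are labelled in post-conception weeks and postnatal samples in years. The sketch below converts both to weeks since conception before sorting. The label formats ("8 pcw", "40 years") and the choice to treat birth as 40 pcw are assumptions made for illustration, not the resource's own conversion.

```python
import re

# Assumed label formats: "<n> pcw" (post-conception weeks) or "<n> year(s)".
_AGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(pcw|years?)\s*$")
BIRTH_PCW = 40  # assumption: treat birth as 40 post-conception weeks


def age_in_weeks(label: str) -> float:
    """Convert an age label to weeks since conception so mixed units sort correctly."""
    match = _AGE.match(label)
    if match is None:
        raise ValueError(f"unrecognised age label: {label!r}")
    value, unit = float(match.group(1)), match.group(2)
    if unit == "pcw":
        return value
    # Postnatal: weeks of gestation plus years converted to weeks.
    return BIRTH_PCW + value * 365.25 / 7


samples = ["40 years", "8 pcw", "1 year", "37 pcw"]
print(sorted(samples, key=age_in_weeks))  # ['8 pcw', '37 pcw', '1 year', '40 years']
```

A plain string sort would put "8 pcw" last, after "40 years", which is why a numeric key in a common unit is needed.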
62. Viewing BDNF in human brain RNA-seq data in UCSC
Peak expression does not correspond to the genomic location of a coding exon for BDNF, but rather to a region of the processed non-coding antisense transcript, transcribed off the opposite strand.
63. Inhibition of BDNF antisense transcript increased BDNF protein
BDNF antisense transcript level reduced; BDNF protein levels increased
65. Acknowledgements
If you use a database in your research, please acknowledge it.
• Most websites have a page specifying how to acknowledge them, usually by citing their most recent publication.
• Citation or acknowledgement is their main means of applying for continued funding. If they can’t get funding, one of three things will happen:
– They are no longer free.
– They are no longer maintained.
– They no longer exist!
Caution:
• Check the update/news page of an unfamiliar website. Some are still accessible but not maintained. Informatics resources go out of date quickly in this field. Look for a recent NAR (Nucleic Acids Research) publication.
• Be sure of your gene/protein ID. Synonyms can cause havoc when searching the literature and databases (esp. PPI databases). If necessary, check the DNA/AA sequence.