3. About Me
Graduate student at the University of Tokyo.
About two years of experience in Julia programming.
Contributing to Julia and its ecosystem:
https://github.com/docopt/DocOpt.jl
https://github.com/bicycle1885/IntArrays.jl
https://github.com/BioJulia/IndexableBitVectors.jl
https://github.com/BioJulia/WaveletMatrices.jl
https://github.com/BioJulia/FMIndexes.jl
https://github.com/isagalaev/highlight.js (Julia support)
etc.
Core developer of BioJulia - https://github.com/BioJulia/Bio.jl
Julia Summer of Code 2015 Student -
http://julialang.org/blog/2015/10/biojulia-sequence-analysis/
9. Simple
Syntax with least astonishment
no semicolons
no variable declarations
no argument types
Unicode support
1-based index
blocks end with end
No implicit type conversion
Quick sort with 24 lines
quicksort(xs) = quicksort!(copy(xs))
quicksort!(xs) = quicksort!(xs, 1, endof(xs))
function quicksort!(xs, lo, hi)
    if lo < hi
        p = partition(xs, lo, hi)
        quicksort!(xs, lo, p - 1)
        quicksort!(xs, p + 1, hi)
    end
    return xs
end
function partition(xs, lo, hi)
    pivot = div(lo + hi, 2)
    pvalue = xs[pivot]
    xs[pivot], xs[hi] = xs[hi], xs[pivot]
    j = lo
    @inbounds for i in lo:hi-1
        if xs[i] ≤ pvalue
            xs[i], xs[j] = xs[j], xs[i]
            j += 1
        end
    end
    xs[j], xs[hi] = xs[hi], xs[j]
    return j
end
11. Fast
The LLVM-backed JIT compiler emits machine code at runtime.
julia> 4 >> 1  # bitwise right-shift function
2
julia> @code_native 4 >> 1
        .section        __TEXT,__text,regular,pure_instructions
Filename: int.jl
Source line: 115
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $63, %ecx
        cmpq    $63, %rsi
Source line: 115
        cmovbeq %rsi, %rcx
        sarq    %cl, %rdi
        movq    %rdi, %rax
        popq    %rbp
        ret
12. Dynamic
No need to precompile your program.
hello.jl:
println("hello, world")
Output:
$ julia hello.jl
hello, world
In the REPL:
julia> include("hello.jl")
hello, world
14. Who Created?
Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman
Soon the team was building their dream language.
MIT, where Bezanson is a graduate student, became
an anchor for the project, with much of the work
being done within computer scientist and
mathematician Alan Edelman’s research group. But
development of the language remained completely
distributed. “Jeff and I didn’t actually meet until we’d
been working on it for over a year, and Viral was in
India the entire time,” Karpinski says. “So the whole
language was designed over email.”
— "Out in the Open: Man Creates One Programming Language to Rule Them All"
http://www.wired.com/2014/02/julia/
16. Why Created?
The creators wanted a language with:
the speed of C
the dynamism of Ruby
macros like Lisp
mathematical notation like Matlab
as usable for general programming as Python
as easy for statistics as R
as natural for string processing as Perl
as powerful for linear algebra as Matlab
as good at gluing programs together as the shell
17. Batteries Included
You can start technical computing without installing lots of libraries.
Numeric types
{8, 16, 32, 64, 128}-bit {signed, unsigned} integers,
16, 32, 64-bit floating point numbers,
and arbitrary-precision numbers.
Numerical linear algebra
matrix multiplication, matrix decomposition/factorization, solvers for
systems of linear equations, and more!
sparse matrices
Random number generator
Mersenne-Twister method accelerated by SIMD
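As a rough illustration of the numeric types and linear algebra listed above, here is a minimal sketch. It assumes a current Julia 1.x release, where `LinearAlgebra` is a standard library that is loaded explicitly (the slides target Julia 0.4, where these functions lived in Base). The matrix and vector values are made up for the example.

```julia
using LinearAlgebra   # standard library in Julia 1.x (assumption: not Julia 0.4)

# Fixed-width integers and floating-point numbers
small = typemax(Int8)      # largest 8-bit signed integer
wide  = sizeof(Int128)     # size in bytes of a 128-bit integer
half  = sizeof(Float16)    # size in bytes of a 16-bit float

# Arbitrary-precision integer arithmetic
huge = big(2)^100

# Solve the linear system A * x = b
A = [2.0 1.0; 1.0 3.0]
b = [3.0, 5.0]
x = A \ b

# LU factorization of the same matrix
F = lu(A)
```

No third-party package is involved here, which is the point the slide makes.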
18. Batteries Included
Unicode support
Perl-compatible regular expressions (PCRE)
Parallel computing
Dates and times
Unit tests
Profiler
Package manager
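A few of the bundled features listed above can be tried together in one short sketch. It assumes Julia 1.x, where `Dates` and `Test` are standard libraries that are loaded explicitly; the input string is just an example value.

```julia
using Dates   # standard library in Julia 1.x (assumption)
using Test    # standard library in Julia 1.x (assumption)

# Perl-compatible regular expression: pull the coordinates out of a range string
m = match(r"^(\w+):(\d+)-(\d+)$", "chr2:5692667-5701385")
chrom = m.captures[1]
first_pos = parse(Int, m.captures[2])
last_pos = parse(Int, m.captures[3])

# Date arithmetic
next_month = Date(2015, 1, 1) + Day(31)

# Unit tests
@test chrom == "chr2"
@test last_pos - first_pos == 8718
@test next_month == Date(2015, 2, 1)
```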
21. Functions
All function definitions below are equivalent:
function func(x, y)
    return x + y
end
function func(x, y)
    x + y
end
func(x, y) = return x + y
func(x, y) = x + y
Force inlining:
@inline func(x, y) = x + y
This simple function will be automatically inlined by the compiler.
23. Functions Return Values
You can return multiple values from a function as a tuple:
function divrem64(n)
    return n >> 6, n & 0b111111
end
And you can receive returned values with multiple assignments:
julia> divrem64(1025)
(16,1)
julia> d, r = divrem64(1025)
(16,1)
julia> d
16
julia> r
1
24. Functions Document
A document string can be attached to a function definition:
"""
This function computes quotient and remainder
divided by 64 for a non-negative integer.
"""
function divrem64(n)
    return n >> 6, n & 0b111111
end
In the REPL, you can read the attached document with the ? command:
help?> divrem64
search: divrem64 divrem
This function computes quotient and remainder
divided by 64 for a non-negative integer.
25. Types
Two kinds of types:
concrete types: instantiatable
abstract types: not instantiatable
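The distinction between the two kinds of types can be sketched as follows. This uses Julia 1.x syntax as an assumption (Julia 0.4, which the slides target, wrote `abstract Shape` and `immutable Circle`); the `Shape`, `Circle`, `Square`, and `area` names are hypothetical.

```julia
# Julia 1.x syntax (assumption); all names below are hypothetical examples.
abstract type Shape end          # abstract: cannot be instantiated

struct Circle <: Shape           # concrete: can be instantiated
    radius::Float64
end

struct Square <: Shape           # another concrete subtype
    side::Float64
end

area(s::Circle) = pi * s.radius^2
area(s::Square) = s.side^2

# An abstract type is still useful as an element type or in method signatures
shapes = Shape[Circle(1.0), Square(2.0)]
total = sum(area(s) for s in shapes)
```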
27. Parametric Types
Types can take type parameters:
type Point{T}
    x::T
    y::T
end
Point: abstract type
Point{Int64}: concrete type
subtype of Point (Point{Int64} <: Point)
all of the members (i.e. x and y) are Int64s
type NucleotideSequence{T<:Nucleotide} <: Sequence
    data::Vector{UInt64}
    ...
end
28. Constructors
Julia automatically generates default constructors.
Point(1, 2) creates an object of Point{Int} type.
Point(1.0, 2.0) creates an object of Point{Float64} type.
Point{Float64}(1, 2) creates an object of Point{Float64} type.
Users can create custom constructors.
type Point{T}
    x::T
    y::T
end
# outer constructor
function Point(x)
    return Point(x, x)
end
p = Point(1)  #> Point{Int64}(1,1)
29. Memory Layout
Compact memory layout like C's structs
C compatible memory layout
You can pass Julia objects to C functions without copying.
This is especially important in bioinformatics
when defining data structures for efficient algorithms
when handling lots of small objects
julia> @enum Strand forward reverse both unknown
julia> immutable Exon
           chrom::Int
           start::Int
           stop::Int
           strand::Strand
       end
julia> sizeof(Exon(1, 12345, 12446, forward))
32
30. Multiple Dispatch
The combination of all argument types determines which method is called.
Single dispatch (e.g. Python)
The first argument is special and determines the method.
class Serializer:
    def write(self, val):
        if isinstance(val, int):
            # ...
        elif isinstance(val, float):
            # ...
        # ...
Multiple dispatch (e.g. Julia)
All arguments are equally responsible for determining the method.
function write(dst::Serializer, val::Int64)
    # ...
end
function write(dst::Serializer, val::Float64)
    # ...
end
# ...
33. Metaprogramming
Julia can represent its own program code as a data structure (Expr).
Three metaprogramming components in Julia:
Macros
generate an expression from expressions.
Expr ↦ Expr
Generated functions
generate an expression from types.
Types ↦ Expr
Non-standard string literals
generate an expression from a string.
String ↦ Expr
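To make "code as a data structure" concrete, here is a minimal sketch that inspects and builds `Expr` objects by hand. The expressions are arbitrary examples.

```julia
# Quote an expression instead of evaluating it
ex = :(x + 1)
head = ex.head        # the kind of expression (a function call here)
args = ex.args        # the function name followed by its arguments

# Build an expression programmatically and evaluate it
product = Expr(:call, :*, 6, 7)
result = eval(product)
```

Macros, generated functions, and non-standard string literals all end by returning an `Expr` like `product`, which the compiler then treats as ordinary code.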
34. Metaprogramming Macros
Generate an expression from expressions.
Expr ↦ Expr
Denoted as @<macroname>.
Distinguishable from function calls
We've already seen some macros.
macro assert(ex)
    msg = string(ex)
    :($ex ? nothing : throw(AssertionError($msg)))
end
julia> x = -1
-1
julia> @assert x > 1
ERROR: AssertionError: x > 1
35. Metaprogramming Useful Macros (1)
@show: print variables, useful for debugging:
julia> x = -1
-1
julia> @show x
x = -1
@inbounds: skip bounds checking:
@inbounds h[i,j] = h[i-1,j-1] + submat[a[i],b[j]]
@which: show which method will be called:
julia> @which max(1, 2)
max{T<:Real}(x::T<:Real, y::T<:Real) at promotion.jl:239
36. Metaprogramming Useful Macros (2)
@time: measure elapsed time to evaluate the expression:
julia> xs = rand(1_000_000);
julia> @time sum(xs)
  0.022633 seconds (27.24 k allocations: 1.155 MB)
499795.2805424741
julia> @time sum(xs)
  0.000574 seconds (5 allocations: 176 bytes)
499795.2805424741
@profile: profile the expression:
julia> sort(xs); @profile sort(xs);
julia> Profile.print()
69 REPL.jl; anonymous; line: 92
 68 REPL.jl; eval_user_input; line: 62
...
37. Generated Functions
Generate specialized code for the argument types.
Type(s) ↦ Expr
Called like an ordinary function:
the syntax at the call site is indistinguishable.
@generated function _sub2ind{N,M}(dims::NTuple{N,Integer},
                                  subs::NTuple{M,Integer})
    meta = Expr(:meta, :inline)
    ex = :(subs[$M] - 1)
    for i = M-1:-1:1
        if i > N
            ex = :(subs[$i] - 1 + $ex)
        else
            ex = :(subs[$i] - 1 + dims[$i]*$ex)
        end
    end
    Expr(:block, meta, :($ex + 1))
end
38. Nonstandard String Literals
Generate an expression from a string.
String ↦ Expr
Denoted as <literalname>"..."
The regular expression literal (e.g. r"^>[^\n]+\n[ACGTN]+") is an example.
In Bio.jl, dna"ACGT" is converted to a DNASequence object.
macro r_str(s)
    Regex(s)
end
# Regex object
r"^>[^\n]+\n[ACGTN]+"
# DNASequence object
dna"ACGT"
39. Modules
Modules are namespaces.
Names defined at the top level of a module are global names.
The import/export system lets modules exchange names.
module Foo
export foo, gvar
# function
foo() = println("hello, foo")
bar() = println("hello, bar")
# global variable
const gvar = 42
end
Foo.foo()
Foo.bar()
Foo.gvar
import Foo: foo
foo()
import Foo: bar
bar()
using Foo
foo()
gvar
40. Packages
A package manager is bundled with Julia.
No other package manager; this is the standard.
The package manager can build, install, and create packages.
Almost all packages are hosted on GitHub.
Registered packages
Registered packages are public packages that can be installed by
name.
List: http://pkg.julialang.org/
Repository: https://github.com/JuliaLang/METADATA.jl
41. Packages Management
The package manager is accessible from the REPL.
Pkg.update(): update registered package data and upgrade packages
The way to install a package depends on whether the package is
registered or not.
Pkg.add(<package>): install a registered package
Pkg.clone(<url>): install a package from its git URL
julia> Pkg.update()
julia> Pkg.add("DocOpt")
julia> Pkg.clone("git@github.com:docopt/DocOpt.jl.git")
42. Packages Create a Package
A package template can be generated with Pkg.generate(<package>).
This generates a standard scaffold for developing a new package.
Generated packages are located in ~/.julia/v0.4/.
Pkg.tag(<package>, <version>) tags the current commit of the package
with the version.
This tag is considered a release of the package.
Developers should follow Semantic Versioning.
major: incompatible API changes
minor: backwards-compatible functionality additions
patch: backwards-compatible bug fixes
julia> Pkg.generate("DocOpt")
julia> Pkg.tag("DocOpt", :patch)  # patch update
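The versioning rules above map onto Julia's built-in `VersionNumber` type. The sketch below shows what a patch bump means in those terms; `bump_patch` is a hypothetical helper written for this example and is not part of `Pkg`.

```julia
# `bump_patch` is a hypothetical helper, not a Pkg function.
bump_patch(v::VersionNumber) = VersionNumber(v.major, v.minor, v.patch + 1)

current = v"0.4.1"
released = bump_patch(current)

# A patch release keeps the major and minor components unchanged
same_series = current.major == released.major && current.minor == released.minor

# Versions compare component by component
ordered = v"0.4.2" < v"0.5.0" < v"1.0.0"
```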
44. BioJulia
Collaborative project to build bioinformatics infrastructure for Julia.
Packages:
Bio.jl - https://github.com/BioJulia/Bio.jl
Other packages - https://github.com/BioJulia
45. BioJulia Basic Principles
BioJulia will be fast.
All contributions undergo code review.
We'll design it to suit modern bioinformatics and Julia, not just copy
other Bio-projects.
https://github.com/BioJulia/Bio.jl/wiki/roadmap
48. Sequences
Sequence types are defined in the Bio.Seq module:
DNASequence, RNASequence, AminoAcidSequence, Kmer
julia> using Bio.Seq
julia> dna"ACGTN"  # non-standard string literal
5nt DNA Sequence
ACGTN
julia> rna"ACGUN"
5nt RNA Sequence
ACGUN
julia> aa"ARNDCWYV"
8aa Sequence:
ARNDCWYV
julia> kmer(dna"ACGT")
DNA 4-mer:
ACGT
49. Sequences Packed Nucleotides
A/C/G/T are packed into an array with 2-bit encoding (+1 bit for N).
type NucleotideSequence{T<:Nucleotide} <: Sequence
    data::Vector{UInt64}  # 2-bit encoded sequence
    ns::BitVector         # 'N' mask
    ...
end
In Kmer, nucleotides are packed into a 64-bit type.
bitstype 64 Kmer{T<:Nucleotide,K}
typealias DNAKmer{K} Kmer{DNANucleotide,K}
typealias RNAKmer{K} Kmer{RNANucleotide,K}
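The 2-bit packing idea can be sketched in a few lines. This is only an illustration under stated assumptions: the code assignment A=0, C=1, G=2, T=3 and the helper names `encode2bit`, `pack_kmer`, and `unpack_kmer` are hypothetical, and Bio.jl's actual encoding may differ.

```julia
# Hypothetical helpers; the bit assignment A=0, C=1, G=2, T=3 is an assumption.
function encode2bit(nt::Char)
    if nt == 'A'
        return UInt64(0)
    elseif nt == 'C'
        return UInt64(1)
    elseif nt == 'G'
        return UInt64(2)
    elseif nt == 'T'
        return UInt64(3)
    end
    error("cannot encode nucleotide: $nt")
end

# Pack up to 32 nucleotides into one 64-bit integer, first nucleotide in the high bits
function pack_kmer(seq::AbstractString)
    length(seq) <= 32 || error("at most 32 nucleotides fit in 64 bits")
    x = UInt64(0)
    for nt in seq
        x = (x << 2) | encode2bit(nt)
    end
    return x
end

# Recover the k nucleotides stored in x
function unpack_kmer(x::UInt64, k::Integer)
    letters = ('A', 'C', 'G', 'T')
    chars = Char[]
    for shift in 2 * (k - 1):-2:0
        push!(chars, letters[Int((x >> shift) & 0x03) + 1])
    end
    return join(chars)
end

packed = pack_kmer("ACGT")
roundtrip = unpack_kmer(pack_kmer("GATTACA"), 7)
```

Four symbols need exactly two bits, which is why 'N' has to be tracked separately, as the `ns` mask on the slide does.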
50. Sequences Immutable by Convention
Sequences are immutable by convention.
No copy when creating a subsequence from an existing sequence.
julia> seq = dna"ACGTATG"
7nt DNA Sequence
ACGTATG
julia> seq[2:4]
3nt DNA Sequence
CGT
# internal data is shared between
# the original and its subsequences
julia> seq.data === seq[2:4].data
true
51. Intervals
Genomic interval types are defined in the Bio.Intervals module:
Interval{T}: T is the type of metadata attached to the interval.
type Interval{T} <: AbstractInterval{Int64}
    seqname::StringField
    first::Int64
    last::Int64
    strand::Strand
    metadata::T
end
This is useful when annotating a genomic range:
julia> using Bio.Intervals
julia> Interval("chr2", 5692667, 5701385, '+', "SOX11")
chr2:5692667-5701385 + SOX11
52. Intervals Indexed Collections
A set of intervals can be indexed with IntervalCollection:
immutable CDS; gene::ASCIIString; index::Int; end
ivals = IntervalCollection{CDS}()
push!(ivals, Interval("chr6", 156777930, 156779471, '+',
                      CDS("ARID1B", 1)))
push!(ivals, Interval("chr6", 156829227, 156829421, '+',
                      CDS("ARID1B", 2)))
push!(ivals, Interval("chr6", 156901376, 156901525, '+',
                      CDS("ARID1B", 3)))
intersect iterates over intersecting intervals:
julia> query = Interval("chr6", 156829200, 156829300);
julia> for i in intersect(ivals, query)
           println(i)
       end
chr6:156829227-156829421 + CDS("ARID1B",2)
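The condition behind the result above can be sketched with plain tuples. This is a linear scan written only to show the overlap test; it says nothing about how IntervalCollection indexes its intervals. Treating intervals as closed ranges and the helper name `overlaps` are assumptions of the sketch.

```julia
# Intervals as (first, last) tuples; closed intervals are an assumption.
overlaps(a, b) = a[1] <= b[2] && b[1] <= a[2]

# The three CDS ranges and the query from the slide
exons = [(156777930, 156779471), (156829227, 156829421), (156901376, 156901525)]
query = (156829200, 156829300)

hits = filter(iv -> overlaps(iv, query), exons)
```

Only the second CDS satisfies both inequalities, which matches the single interval printed on the slide.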
53. Parsers
Parsers are generated by the Ragel state machine compiler.
Finite state machines are described with regular languages.
The Ragel compiler generates pure Julia programs.
Actions can be injected into the state transition.
The next Ragel release (v7) will be shipped with the Julia generator.
http://www.colm.net/open-source/ragel/
59. Alignments Speed (1)
Global alignment of titin sequences (human and mouse):
affinegap = AffineGapScoreModel(BLOSUM62, -10, -1)
a = first(open("Q8WZ42.fasta", FASTA)).seq
b = first(open("A2ASS6.fasta", FASTA)).seq
@time aln = pairalign(
    GlobalAlignment(),
    Vector{AminoAcid}(a),
    Vector{AminoAcid}(b),
    affinegap,
)
println(score(aln))
  8.012499 seconds (601.99 k allocations: 1.155 GB, 0.09% gc time)
165611
vs. R (Biostrings):
   user  system elapsed
 14.042   1.233  15.475
60. Alignments Speed (2)
R script (Biostrings) used for the timing above:
library(Biostrings, quietly=T)
a = readAAStringSet("Q8WZ42.fasta")[[1]]
b = readAAStringSet("A2ASS6.fasta")[[1]]
t0 = proc.time()
aln = pairwiseAlignment(a, b, type="global",
                        substitutionMatrix="BLOSUM62",
                        gapOpening=10, gapExtension=1)
t1 = proc.time()
print(t1 - t0)
print(score(aln))
61. Indexable Bit Vectors
Bit vectors that support bit counting in constant time.
rank1(bv, i): count the number of 1 bits within bv[1:i].
rank0(bv, i): count the number of 0 bits within bv[1:i].
A fundamental data structure when defining other data structures.
WaveletMatrix, a generalization of the indexable bit vector,
depends on this data structure.
'N' nucleotides in a reference sequence can be compressed
using this data structure.
julia> bv = SucVector(bitrand(10_000_000));
julia> rank1(bv, 9_000_000);  # precompile
julia> @time rank1(bv, 9_000_000)
  0.000006 seconds (149 allocations: 10.167 KB)
4502258
62. Indexable Bit Vectors Internals
A bit vector is divided into 256-bit large blocks and each large block is
divided into 64-bit small blocks:
immutable Block
    # large block
    large::UInt32
    # small blocks
    smalls::NTuple{4,UInt8}
    # bit chunks (64 bits × 4 = 256 bits)
    chunks::NTuple{4,UInt64}
end
Each block has a cache that counts the number of 1s.
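How a cached count turns rank into a constant-time lookup can be sketched with a simplified one-level index. This is not the two-level layout shown above and not IndexableBitVectors.jl's implementation; the `RankIndex` type and its fields are hypothetical, and the syntax assumes Julia 1.x (`struct` instead of the slides' `immutable`).

```julia
# Simplified one-level rank index (hypothetical; Julia 1.x syntax).
struct RankIndex
    chunks::Vector{UInt64}   # the bits, 64 per chunk
    cumul::Vector{Int}       # cumul[k] = number of 1 bits in chunks[1:k-1]
end

function RankIndex(bits::AbstractVector{Bool})
    nchunks = cld(length(bits), 64)
    chunks = zeros(UInt64, nchunks)
    for (i, b) in enumerate(bits)
        if b
            chunks[(i - 1) ÷ 64 + 1] |= UInt64(1) << ((i - 1) % 64)
        end
    end
    cumul = zeros(Int, nchunks)
    for k in 2:nchunks
        cumul[k] = cumul[k - 1] + count_ones(chunks[k - 1])
    end
    return RankIndex(chunks, cumul)
end

# Number of 1 bits in positions 1..i: one cache lookup plus one popcount
function rank1(idx::RankIndex, i::Integer)
    i == 0 && return 0
    k = (i - 1) ÷ 64 + 1     # chunk that contains bit i
    r = (i - 1) % 64 + 1     # how many bits of that chunk to include (1..64)
    mask = r == 64 ? typemax(UInt64) : (UInt64(1) << r) - UInt64(1)
    return idx.cumul[k] + count_ones(idx.chunks[k] & mask)
end

rank0(idx::RankIndex, i::Integer) = i - rank1(idx, i)

bits = Bool[1, 0, 1, 1, 0]
idx = RankIndex(bits)
```

With this index, `rank1(idx, 4)` counts the 1 bits in `bits[1:4]` without scanning the vector. The large/small split on the slide serves the same purpose while keeping the cached counters small.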
63. FMIndexes
Index for full-text search.
Fast, compact, and often used in short-read sequence mappers
(Bowtie2, BWA, etc.).
Product of Julia Summer of Code 2015
https://github.com/BioJulia/FMIndexes.jl
This package is not specialized for biological sequences.
FMIndexes.jl does not depend on Bio.jl.
The JIT compiler can optimize code for a specific type at runtime.
julia> fmindex = FMIndex(dna"ACGTATTGACTGTA");
julia> count(dna"TA", fmindex)
2
julia> count(dna"TATT", fmindex)
1
64. FMIndexed Queries
Create an FM-Index for chromosome 22:
julia> fmindex = FMIndex(first(open("chr22.fa", FASTA)).seq);
count(pattern, index): count the number of occurrences of pattern:
julia> count(dna"ACGT", fmindex)
37672
julia> count(dna"ACGTACGT", fmindex)
42
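For readers curious how an FM-index answers such a count query, here is a deliberately naive sketch of backward search over the Burrows-Wheeler transform. It is not FMIndexes.jl's implementation: the suffix array is built by sorting all suffixes and occurrence counts are recomputed by scanning, so it only suits tiny inputs. The function name `naive_fm_count`, the `$` sentinel, and the restriction to ASCII text are assumptions of the sketch.

```julia
# Naive backward search (hypothetical helper; ASCII text without '$' assumed).
function naive_fm_count(text::AbstractString, pattern::AbstractString)
    s = text * "\$"                           # sentinel that sorts before A/C/G/T
    n = length(s)
    sa = sortperm([s[i:end] for i in 1:n])    # suffix array, built the slow way
    bwt = [s[i == 1 ? n : i - 1] for i in sa] # character preceding each sorted suffix
    lo, hi = 1, n                             # current range of matching suffixes
    for c in reverse(pattern)
        smaller = count(x -> x < c, s)                # characters of s smaller than c
        occ_lo = count(x -> x == c, bwt[1:lo-1])      # occurrences of c before the range
        occ_hi = count(x -> x == c, bwt[1:hi])        # occurrences of c up to the range end
        lo = smaller + occ_lo + 1
        hi = smaller + occ_hi
        lo > hi && return 0
    end
    return hi - lo + 1
end

n_ta = naive_fm_count("ACGTATTGACTGTA", "TA")
n_tatt = naive_fm_count("ACGTATTGACTGTA", "TATT")
```

A real FM-index replaces the two scans with rank queries on a compressed structure, which is where indexable bit vectors and wavelet matrices come in.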
69. Julia Updates '15
Julia Computing Inc. was founded.
"Why the creators of the Julia programming language just launched a startup"
http://venturebeat.com/2015/05/18/why-the-creators-of-the-julia-programming-language-just-launched-a-startup/
The Moore Foundation granted Julia Computing $600,000.
"Bringing Julia from beta to 1.0 to support data-intensive, scientific computing"
https://www.moore.org/newsroom/in-the-news/2015/11/10/bringing-julia-from-beta-to-1.0-to-support-data-intensive-scientific-computing
Multi-threading Support
https://github.com/JuliaLang/julia/pull/13410
Intel released ParallelAccelerator.jl
https://github.com/IntelLabs/ParallelAccelerator.jl