Kudu austin oct 2015.pptx

1. 1
© Cloudera, Inc. All rights reserved.
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
David Alves, on behalf of the Kudu team
2. 2 Kudu: Storage for Fast Analytics on Fast Data
• New updating column store for Hadoop
• Apache-licensed open source
• Beta now available
(Diagram: Kudu shown as a columnar store)
3. 3 Motivation and Goals
Why build Kudu?
4. 4 Motivating Questions
• Are there user problems that we can't address because of gaps in Hadoop ecosystem storage technologies?
• Are we positioned to take advantage of advancements in the hardware landscape?
5. 5 Current Storage Landscape in Hadoop
HDFS excels at:
• Efficiently scanning large amounts of data
• Accumulating data with high throughput
HBase excels at:
• Efficiently finding and writing individual rows
• Making data mutable
Gaps exist when these properties are needed simultaneously
6. 6 Changing Hardware Landscape
• Spinning disk -> solid-state storage
• NAND flash: up to 450k read / 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping
• 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant: 64 -> 128 -> 256 GB over the last few years
• Takeaway 1: the next bottleneck is CPU, and current storage systems weren't designed with CPU efficiency in mind.
• Takeaway 2: column stores are feasible for random access.
7. 7 Kudu Design Goals
• High throughput for big scans (columnar storage and replication). Goal: within 2x of Parquet
• Low latency for short accesses (primary key indexes and quorum replication). Goal: 1 ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model
• SQL query
• "NoSQL"-style scan/insert/update (Java client)
8. 8 Kudu Design Goals (figure)
What do I use Kudu for? Kudu is made for SQL, allows fast scans, and allows fast mutability at scale. With that in context, the figure surveys the variety of use cases done in Hadoop today to show where Kudu fits in; placement depends in part on how effectively primary key filters can be pushed down to Kudu.
9. 9 Kudu Usage
• Table has a SQL-like schema
• Finite number of columns (unlike HBase/Cassandra)
• Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
• Some subset of columns makes up a possibly-composite primary key
• Fast ALTER TABLE
• Java and C++ "NoSQL"-style APIs
• Insert(), Update(), Delete(), Scan()
• Integrations with MapReduce, Spark, and Impala
• more to come!
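The "NoSQL"-style API and the possibly-composite primary key above can be illustrated with a toy in-memory model. This is not the real Java/C++ client API; every name here is hypothetical and the model only demonstrates the single-row semantics.

```python
# Toy model of Kudu's table semantics: a SQL-like schema with a fixed set of
# columns, a composite primary key, and NoSQL-style insert/update/delete/scan.
# Illustrative only; the real Kudu client APIs differ.

class ToyTable:
    def __init__(self, columns, key_columns):
        self.columns = columns          # finite, fixed set of columns
        self.key_columns = key_columns  # subset forming the primary key
        self.rows = {}                  # key tuple -> row dict

    def _key(self, row):
        return tuple(row[c] for c in self.key_columns)

    def insert(self, row):
        key = self._key(row)
        if key in self.rows:
            raise KeyError("duplicate primary key: %r" % (key,))
        self.rows[key] = dict(row)

    def update(self, row):
        # single-row update addressed by primary key
        self.rows[self._key(row)].update(row)

    def delete(self, key):
        del self.rows[key]

    def scan(self, predicate=lambda r: True):
        return [dict(r) for r in self.rows.values() if predicate(r)]

metrics = ToyTable(["host", "metric", "timestamp", "value"],
                   key_columns=["host", "metric", "timestamp"])
metrics.insert({"host": "a", "metric": "cpu", "timestamp": 1, "value": 0.9})
metrics.update({"host": "a", "metric": "cpu", "timestamp": 1, "value": 0.5})
print(metrics.scan(lambda r: r["host"] == "a"))
```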
10. 10 Use cases and architectures
11. 11 Kudu Use Cases
Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes
● Time Series
  ○ Examples: stream market data; fraud detection & prevention; risk monitoring
  ○ Workload: inserts, updates, scans, lookups
● Machine Data Analytics
  ○ Examples: network threat detection
  ○ Workload: inserts, scans, lookups
● Online Reporting
  ○ Examples: ODS
  ○ Workload: inserts, updates, scans, lookups
12. 12 Real-Time Analytics in Hadoop Today
Fraud detection in the real world = storage complexity
Considerations:
● How do I handle failure during this process?
● How often do I reorganize data streaming in into a format appropriate for reporting?
● When reporting, how do I see data that has not yet been reorganized?
● How do I ensure that important jobs aren't interrupted by maintenance?
(Diagram: incoming data from a messaging system lands in HBase. Once enough data has accumulated, the HBase file is reorganized into Parquet: wait for running operations to complete, then define a new Impala partition referencing the newly written Parquet file. Reporting requests go to Impala on HDFS across the new partition, the most recent partition, and historic data.)
13. 13 Real-Time Analytics in Hadoop with Kudu
Improvements:
● One system to operate
● No cron jobs or background processes
● Handle late arrivals or data corrections with ease
● New data available immediately for analytics or operations
(Diagram: incoming data from a messaging system goes straight into Kudu, which stores historical and real-time data and serves reporting requests.)
14. 14 How it works
Replication and distribution
15. 15 Tables and Tablets
• Table is horizontally partitioned into tablets
• Range or hash partitioning
• PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
• Each tablet has N replicas (3 or 5), with Raft consensus
• Allow read from any replica, plus leader-driven writes with low MTTR
• Tablet servers host tablets
• Store data on local disks (no HDFS)
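A minimal sketch of how hash partitioning like DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS spreads rows across tablets, assuming a stable hash over the distribution column. Kudu's actual hash function and bucket assignment differ; this only shows the idea.

```python
# Sketch of hash partitioning: each row's bucket is derived from a stable
# hash of the distribution column, so writes spread across tablets.
# Illustrative only; not Kudu's real hashing scheme.
import hashlib

NUM_BUCKETS = 100

def bucket_for(timestamp):
    # Use a stable digest (Python's builtin hash() is salted per process)
    digest = hashlib.sha1(str(timestamp).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

rows = [("host1", "cpu", ts) for ts in range(1000)]
buckets = [bucket_for(ts) for _, _, ts in rows]
print("distinct buckets used:", len(set(buckets)))
```

With 1000 rows hashed into 100 buckets, nearly every bucket receives writes, which is exactly the load-spreading property hash partitioning is chosen for.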
16. 16 Metadata
• Replicated master*
• Acts as a tablet directory ("META" table)
• Acts as a catalog (table schemas, etc.)
• Acts as a load balancer (tracks TS liveness, re-replicates under-replicated tablets)
• Caches all metadata in RAM for high performance
• 80-node load test, GetTableLocations RPC perf:
• 99th percentile: 68us, 99.99th percentile: 657us
• <2% peak CPU usage
• Client configured with master addresses
• Asks master for tablet locations as needed and caches them
18. 18 Raft consensus
(Diagram: a Client; TS A hosts Tablet 1 (LEADER); TS B and TS C host Tablet 1 (FOLLOWER); each tablet server has a WAL.)
1a. Client -> Leader: Write() RPC
2a. Leader -> Followers: UpdateConsensus() RPC
2b. Leader writes local WAL
3. Follower: write WAL
4. Follower -> Leader: success
5. Leader has achieved majority
6. Leader -> Client: Success!
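The numbered write flow above comes down to a majority count in which the leader's own WAL append is one vote. A minimal sketch of that rule (illustrative only, not Kudu's consensus implementation):

```python
# Sketch of the leader-driven write flow: the leader appends to its own WAL,
# sends UpdateConsensus() to followers, and acks the client once a majority
# of replicas (leader included) have durably logged the write.
# Illustrative only.

def replicate_write(num_replicas, follower_acks):
    acks = 1 + follower_acks          # leader's own WAL write counts
    majority = num_replicas // 2 + 1
    return acks >= majority

# 3 replicas: leader + 1 follower ack is already a majority
print(replicate_write(3, follower_acks=1))   # True
# 5 replicas: leader + 2 follower acks suffice
print(replicate_write(5, follower_acks=2))   # True
print(replicate_write(5, follower_acks=1))   # False: not yet a majority
```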
19. 19 Fault tolerance
• Transient FOLLOWER failure:
• Leader can still achieve majority
• Restart follower TS within 5 min and it will rejoin transparently
• Transient LEADER failure:
• Followers expect to hear a heartbeat from their leader every 1.5 seconds
• 3 missed heartbeats: leader election!
• New LEADER is elected from remaining nodes within a few seconds
• Restart within 5 min and it rejoins as a FOLLOWER
• N replicas handle (N-1)/2 failures
20. 20 Fault tolerance (2)
• Permanent failure:
• Leader notices that a follower has been dead for 5 minutes
• Evicts that follower
• Master selects a new replica
• Leader copies the data over to the new one, which joins as a new FOLLOWER
21. 21 How it works
Storage engine internals
22. 22 Tablet design
• Inserts buffered in an in-memory store (like HBase's memstore)
• Flushed to disk
• Columnar layout, similar to Apache Parquet
• Updates use MVCC (updates tagged with timestamp, not in-place)
• Allows "SELECT AS OF <timestamp>" queries and consistent cross-tablet scans
• Near-optimal read path for "current time" scans
• No per-row branches, fast vectorized decoding and predicate evaluation
• Performance worsens based on number of recent updates
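The MVCC scheme described above, where updates are tagged with a timestamp rather than applied in place, can be sketched with a toy versioned cell. This is a conceptual model only, not Kudu's implementation.

```python
# Toy MVCC cell: each update appends a (timestamp, value) version, and a
# reader can ask for the value "AS OF" any past timestamp. Illustrative only.
import bisect

class MvccCell:
    def __init__(self):
        self.versions = []  # (timestamp, value), kept sorted by timestamp

    def update(self, timestamp, value):
        bisect.insort(self.versions, (timestamp, value))

    def read_as_of(self, timestamp):
        # latest version whose timestamp is <= the requested timestamp
        value = None
        for ts, v in self.versions:
            if ts <= timestamp:
                value = v
            else:
                break
        return value

pay = MvccCell()
pay.update(10, "$1000")
pay.update(20, "$1M")
print(pay.read_as_of(15))  # older version: $1000
print(pay.read_as_of(25))  # latest version: $1M
```

The same mechanism is what makes consistent cross-tablet scans possible: every tablet answers "as of" the same timestamp.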
23. 23 LSM vs Kudu
• LSM: Log-Structured Merge (Cassandra, HBase, etc.)
• Inserts and updates all go to an in-memory map (MemStore) and later flush to on-disk files (HFile/SSTable)
• Reads perform an on-the-fly merge of all on-disk HFiles
• Kudu
• Shares some traits (memstores, compactions)
• More complex.
• Slower writes in exchange for faster reads (especially scans)
24. 24 LSM Insert Path
(Diagram: an INSERT goes to the MemStore, which later flushes to HFile 1 containing Row=r1 col=c1 val="blah" and Row=r1 col=c2 val="1".)
25. 25 LSM Insert Path
(Diagram: a later INSERT goes to the MemStore and flushes to HFile 2 containing Row=r2 col=c1 val="blah2" and Row=r2 col=c2 val="2"; HFile 1 remains on disk unchanged.)
26. 26 LSM Update Path
(Diagram: an UPDATE of Row=r2 col=c1 to val="newval" lands in the MemStore; HFile 1 and HFile 2 are untouched.)
Note: all updates are "fully decoupled" from reads. A random-write workload is transformed to fully sequential!
27. 27 LSM Read Path
(Diagram: a read merges the MemStore, HFile 1, and HFile 2 based on string row keys, producing R1: c1=blah c2=2; R2: c1=newval c2=5; ….)
CPU intensive! Must always read row keys.
Any given row may exist across multiple HFiles: must always merge!
The more HFiles to merge, the slower it reads
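The merge-on-read cost above can be sketched in a few lines: every store must be consulted, row keys must be compared, and newer stores shadow older ones. A toy model, not HBase's actual read path:

```python
# Toy LSM read: merge the memstore and all HFiles by string row key.
# Stores map (row_key, column) -> value, listed oldest to newest.
hfile1   = {("r1", "c1"): "blah",   ("r1", "c2"): "2"}
hfile2   = {("r2", "c1"): "v2",     ("r2", "c2"): "5"}
memstore = {("r2", "c1"): "newval"}

def lsm_read(stores):
    merged = {}
    for store in stores:                 # every store is consulted...
        for key in sorted(store):        # ...with string row-key comparisons
            merged[key] = store[key]     # newer stores shadow older values
    return merged

rows = lsm_read([hfile1, hfile2, memstore])
print(rows[("r2", "c1")])  # prints "newval": the memstore shadows HFile 2
```

The more HFiles in the stores list, the more work this loop does per read, which is the slowdown the slide calls out.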
28. 28 Kudu storage: Inserts and Flushes
(Diagram: INSERT ("todd", "$1000", "engineer") goes to the MemRowSet (columns name, pay, role), which flushes to DiskRowSet 1.)
29. 29 Kudu storage: Inserts and Flushes
(Diagram: a later INSERT ("doug", "$1B", "Hadoop man") goes to the MemRowSet and flushes to DiskRowSet 2.)
30. 30 Kudu storage: Updates
(Diagram: DiskRowSet 1 and DiskRowSet 2 each hold columnar base data plus a Delta MS.)
Each DiskRowSet has its own DeltaMemStore to accumulate updates
31. 31 Kudu storage: Updates
UPDATE set pay="$1M" WHERE name="todd"
• Is the row in DiskRowSet 2? (check bloom filters) Bloom says: no!
• Is the row in DiskRowSet 1? (check bloom filters) Bloom says: maybe!
• Search the key column to find the offset: rowid = 150
• Record the delta against that offset: 150: col 1=$1M
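A sketch of this update path: each DiskRowSet keeps a bloom filter over its primary keys, so an update only searches rowsets whose filter answers "maybe". The bloom filter and helper names here are simplified stand-ins; Kudu's filters and key search differ in detail.

```python
# Toy update routing: bloom filters prune DiskRowSets that definitely do not
# contain the key; a "maybe" triggers a key-column search for the rowid.
# Illustrative only.
import hashlib

class Bloom:
    def __init__(self, nbits=1024, nhashes=3):
        self.bits, self.nbits, self.nhashes = 0, nbits, nhashes

    def _positions(self, key):
        for i in range(self.nhashes):
            h = hashlib.sha1(("%d:%s" % (i, key)).encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def maybe_contains(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

class DiskRowSet:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.bloom = Bloom()
        for k in keys:
            self.bloom.add(k)

    def find_rowid(self, key):
        return self.keys.index(key) if key in self.keys else None

drs1 = DiskRowSet(["doug", "todd"])
drs2 = DiskRowSet(["mike"])

def route_update(key, rowsets):
    for drs in rowsets:
        if drs.bloom.maybe_contains(key):   # skip rowsets that say "no!"
            rowid = drs.find_rowid(key)
            if rowid is not None:
                return drs, rowid           # delta recorded against this rowid
    return None

print(route_update("todd", [drs2, drs1])[1])  # rowid of "todd" in drs1: 1
```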
32. 32 Kudu storage: Read path
(Diagram: DiskRowSet 1's Delta MS holds 150: pay=$1M.)
• Read rows in DiskRowSet 2, then read rows in DiskRowSet 1
• Any row is only in exactly one DiskRowSet: no need to merge across DRSs!
• Updates are merged based on ordinal offset within the DRS: array indexing, no string compares
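A sketch of the ordinal-offset merge above: within a DiskRowSet, base data is columnar and deltas are keyed by integer rowid, so applying them is array indexing rather than string key comparison. A toy model, with a rowid-1 delta standing in for the slide's rowid 150:

```python
# Toy DiskRowSet scan: columnar base data plus deltas keyed by ordinal
# row offset. No cross-rowset merge, no string compares. Illustrative only.

base_name = ["doug", "todd"]          # columnar base data, rowids 0..n-1
base_pay  = ["$1B", "$1000"]
deltas    = {1: {"pay": "$1M"}}       # rowid -> column updates

def scan(columns):
    rows = []
    for rowid in range(len(base_name)):
        row = {"name": base_name[rowid], "pay": base_pay[rowid]}
        row.update(deltas.get(rowid, {}))   # O(1) ordinal lookup
        rows.append({c: row[c] for c in columns})
    return rows

print(scan(["name", "pay"]))
```

Contrast with the LSM read path: here each row lives in exactly one rowset, so the only per-row work is the delta lookup by integer offset.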
33. 33 Kudu storage: Delta flushes
(Diagram: a DiskRowSet's Delta MS flushes to a REDO DeltaFile, e.g. 0: pay=foo.)
A REDO delta indicates how to transform between the 'base data' (columnar) and a later version
34. 34 Kudu storage: Major delta compaction
Many deltas accumulate: lots of delta application work on reads
(Diagram: pre-compaction, a DiskRowSet has base data, a Delta MS, and several REDO DeltaFiles; post-compaction, it has base data, a Delta MS, unmerged REDO deltas, and UNDO deltas.)
• Merge updates for columns with a high update percentage
• If a column has few updates, it doesn't need to be rewritten: those deltas are maintained in a new DeltaFile
35. 35 Kudu storage: RowSet Compactions
(Diagram: DRS 1 (32MB) [PK=alice], [PK=joe], [PK=linda], [PK=zach]; DRS 2 (32MB) [PK=bob], [PK=jon], [PK=mary], [PK=zeke]; and DRS 3 (32MB) [PK=carl], [PK=julie], [PK=omar], [PK=zoe] are compacted into DRS 4 (32MB) [alice, bob, carl, joe]; DRS 5 (32MB) [jon, julie, linda, mary]; DRS 6 (32MB) [omar, zach, zeke, zoe].)
Reorganize rows to avoid rowsets with overlapping key ranges
36. 36 Kudu storage: Compaction policy
• Solves an optimization problem (knapsack problem)
• Minimize "height" of rowsets for the average key lookup
• Bound on number of seeks for write or random read
• Restrict total IO of any compaction to a budget (128MB)
• No long compactions, ever
• No "minor" vs "major" distinction
• Always be compacting or flushing
• Low IO priority maintenance threads
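A greedy stand-in for the budgeted, knapsack-style selection described above: among candidate rowsets, prefer those whose key ranges overlap the most other rowsets, while keeping total IO under a fixed budget. Illustrative only; Kudu's actual policy and scoring differ.

```python
# Toy compaction policy: pick overlapping rowsets to merge, capped by an
# IO budget so no compaction runs long. Greedy sketch, not Kudu's algorithm.

BUDGET_MB = 128

# (size_mb, (first_key, last_key)) for each candidate rowset
rowsets = [(32, ("alice", "zach")), (32, ("bob", "zeke")),
           (32, ("carl", "zoe")), (64, ("aaa", "aab"))]

def overlap_count(key_range, others):
    lo, hi = key_range
    return sum(1 for o_lo, o_hi in others if lo <= o_hi and o_lo <= hi)

def pick_compaction(rowsets, budget_mb=BUDGET_MB):
    ranges = [r for _, r in rowsets]
    # score each rowset by how many other rowsets its key range overlaps
    scored = sorted(
        rowsets,
        key=lambda rs: -overlap_count(rs[1], [r for r in ranges if r != rs[1]]))
    chosen, spent = [], 0
    for size, key_range in scored:
        if spent + size <= budget_mb:   # never exceed the IO budget
            chosen.append((size, key_range))
            spent += size
    return chosen

print(pick_compaction(rowsets))
```

With the sample rowsets, the three mutually overlapping 32MB rowsets are chosen (96MB of IO) while the non-overlapping 64MB rowset is left alone, mirroring the "reorganize overlapping key ranges, bounded IO" goal.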
37. 37 Kudu trade-offs
• Random updates will be slower
• HBase's model allows random updates without incurring a disk seek
• Kudu requires a key lookup before update, bloom lookup before insert
• Single-row reads may be slower
• Columnar design is optimized for scans
• Future: may introduce "column groups" for applications where single-row access is more important
• Especially slow at reading a row that has had many recent updates (e.g. YCSB "zipfian")
39. 39 TPC-H (Analytics benchmark)
• 75 TS + 1 master cluster
• 12 (spinning) disks each, enough RAM to fit the dataset
• Using Kudu 0.5.0, Impala 2.2 with Kudu support, CDH 5.4
• TPC-H Scale Factor 100 (100GB)
• Example query:
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey
  AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
  AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01'
  AND o_orderdate < '1995-01-01'
GROUP BY n_name ORDER BY revenue DESC;
40. 40
• Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data
• Parquet is likely to outperform Kudu for HDD-resident data (larger IO requests)
41. 41 What about Apache Phoenix?
• 10-node cluster (9 workers, 1 master)
• HBase 1.0, Phoenix 4.3
• TPC-H LINEITEM table only (6B rows)
Time (sec), log scale:
          Load   TPCH Q1  COUNT(*)  COUNT(*) WHERE…  single-row lookup
Phoenix   2152   219      76        131              0.04
Kudu      1918   13.2     1.7       0.7              0.15
Parquet   155    9.3      1.4       1.5              1.37
42. 42 What about NoSQL-style random access? (YCSB)
• YCSB 0.5.0-snapshot
• 10-node cluster (9 workers, 1 master)
• HBase 1.0
• 100M rows, 10M ops
43. 43 But don't trust me (a vendor)…
44. 44 About Xiaomi
Mobile Internet company, founded in 2010
Smartphones · Software (MIUI, Cloud Services, App Store/Game, Payment/Finance, …) · E-commerce · Smart Home · Smart Devices
45. 45 Big Data Analytics Pipeline Before Kudu
• Long pipeline: high latency (1 hour ~ 1 day), data conversion pains
• No ordering: log arrival (storage) order is not exactly logical order, e.g. read 2-3 days of logs for the data of 1 day
46. 46 Big Data Analysis Pipeline Simplified With Kudu
• ETL pipeline (0~10s latency): apps that need to prevent backpressure or require ETL
• Direct pipeline (no latency): apps that don't require ETL and have no backpressure issues
(Serving: OLAP scan, side-table lookup, result store)
47. 47 Use Case 1: Mobile service monitoring and tracing tool
Gathers important RPC tracing events from the mobile app and backend services; a service monitoring & troubleshooting tool.
Requirements:
◆ High write throughput: >5 billion records/day and growing
◆ Query latest data with quick response: identify and resolve issues quickly
◆ Can search for individual records: easy for troubleshooting
48. 48 Use Case 1: Benchmark
Environment:
◆ 71-node cluster
◆ Hardware: CPU E5-2620 2.1GHz × 24 cores; memory 64GB; network 1Gb; disk 12 HDD
◆ Software: Hadoop 2.6 / Impala 2.1 / Kudu
Data:
◆ 1 day of server-side tracing data: ~2.6 billion rows, ~270 bytes/row, 17 columns, 5 key columns
49. 49 Use Case 1: Benchmark Results
Query latency (seconds; each query run 5 times, average taken):
         Q1   Q2   Q3   Q4   Q5   Q6
Kudu     1.4  2.0  2.3  3.1  1.3  0.9
Parquet  1.3  2.8  4.0  5.7  7.5  16.7
Bulk load using Impala (INSERT INTO):
         Total Time (s)  Throughput (total)  Throughput (per node)
Kudu     961.1           2.8M records/s      39.5k records/s
Parquet  114.6           23.5M records/s     331k records/s
* HDFS Parquet file replication = 3, Kudu table replication = 3
50. 50 Use Case 1: Result Analysis
◆ Lazy materialization: ideal for search-style queries; Q6 returns only a few records (of a single user) with all columns
◆ Scan range pruning using the primary index: predicates on the primary key; Q5 only scans 1 hour of data
◆ Future work:
  Primary index: speed up ORDER BY and DISTINCT
  Hash partitioning: speed up COUNT(DISTINCT), no need for global shuffle/merge
51. 51 Use Case 2: OLAP PaaS for ecosystem cloud
◆ Provide big data service for smart hardware startups (Xiaomi's ecosystem members)
◆ OLAP database with some OLTP features
◆ Manage/ingest/query your data and serve results in one place
(Clients: backend / mobile app / smart device / IoT …)
53. 53 Kudu is…
• NOT a SQL database: "BYO SQL"
• NOT a filesystem: data must have tabular structure
• NOT a replacement for HBase or HDFS
• Cloudera continues to invest in those systems
• Many use cases where they're still more appropriate
• NOT an in-memory database
• Very fast for memory-sized workloads, but can operate on larger data too!
55. 55 Getting started as a user
• http://getkudu.io
• kudu-user@googlegroups.com
• Quickstart VM
• Easiest way to get started
• Impala and Kudu in an easy-to-install VM
• CSD and Parcels
• For installation on a Cloudera Manager-managed cluster
56. 56 Getting started as a developer
• http://github.com/cloudera/kudu
• All commits go here first
• Public gerrit: http://gerrit.cloudera.org
• All code reviews happening here
• Public JIRA: http://issues.cloudera.org
• Includes bugs going back to 2013. Come see our dirty laundry!
• kudu-dev@googlegroups.com
• Apache 2.0 license open source
• Contributions are welcome and encouraged!
57. 57 Demo?
(if we have time and the internet gods are willing)
58. 58
http://getkudu.io/
@getkudu