Cloudera Impala

Some Observations

Strengths

•  Excellent for analytical queries
    – Queries that scan large amounts of data
•  Parquet with the Snappy codec provides a good trade-off
    – Fast data load
    – Good query performance

Strengths (continued)

•  SQL compliance
    – Most queries from other systems work as-is
•  Impala-shell: a handy interface
    – Easy to use
•  Hadoop integration
    – Uses Hive as the Metastore and HDFS as storage
    – Uses its own daemons to execute queries
Weaknesses

•  Random access is slow
•  No fault tolerance
    – If a node fails, all queries running on that node will fail. The only option is to retry the query.
•  Updating or cleaning data is tedious
    – No direct updates are supported
    – Multi-step process
        •  For example, to remove rows from an existing table: create a temp table by selecting the rows to keep from the source, drop the source table, then rename the temp table to the source name.
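
The row-removal steps above can be sketched as a small Python helper that only builds the three SQL statements and does not run them. The function name, the `_tmp` suffix, the `keep_predicate` parameter, and the choice of `STORED AS PARQUET` are assumptions made for illustration; they are not from the slides.

```python
def build_row_removal_statements(table, keep_predicate, temp_suffix="_tmp"):
    """Return the SQL statements that rewrite `table` with only the kept rows.

    `keep_predicate` is a SQL boolean expression selecting the rows to KEEP;
    every other row is effectively removed. Names here are hypothetical.
    """
    temp_table = f"{table}{temp_suffix}"
    return [
        # Step 1: copy the rows to keep into a temp table
        # (Parquet storage is an assumption, not required by the technique).
        f"CREATE TABLE {temp_table} STORED AS PARQUET AS "
        f"SELECT * FROM {table} WHERE {keep_predicate}",
        # Step 2: drop the original table.
        f"DROP TABLE {table}",
        # Step 3: rename the temp table to the original name.
        f"ALTER TABLE {temp_table} RENAME TO {table}",
    ]


statements = build_row_removal_statements("events", "event_date >= '2014-01-01'")
for statement in statements:
    print(statement + ";")
```

Between steps 2 and 3 the table does not exist, so queries against it will fail during that window; this is part of why the slides call the process tedious.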

Weaknesses (continued)

•  Updating stats is a manual process
    – After loading a significant amount of data, stats must be updated manually. Some queries will perform poorly or fail if this is not done.
•  Memory consumption
    – Impala queries need a significant amount of RAM to perform fast.
    – It is possible to spill to disk, but that slows down performance.
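
As a rough illustration of the manual stats step above, the sketch below builds an `impala-shell` command line that runs `COMPUTE STATS` on a table after a load. The helper name, the example table `sales.events`, and the `localhost:21000` address are assumptions; the helper only constructs the argument list and does not contact a cluster.

```python
import shlex


def compute_stats_command(table, impalad="localhost:21000"):
    """Build an impala-shell invocation that refreshes stats for `table`.

    The impalad address is a placeholder; point it at a real daemon.
    """
    statement = f"COMPUTE STATS {table}"
    # -i selects the impalad to connect to, -q passes a single query to run.
    return ["impala-shell", "-i", impalad, "-q", statement]


command = compute_stats_command("sales.events")
# shlex.join quotes the query so the line can be pasted into a shell.
print(shlex.join(command))
```

The argument list can be handed to `subprocess.run` at the end of a load job so that the stats refresh is not forgotten.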

Conclusion

•  Impala makes SQL a first-class citizen in the Hadoop ecosystem
•  Great for workloads where data is immutable
•  Excellent query performance for analytical queries
•  Not suitable for workloads that involve frequent data updates
•  Queries are not fault-tolerant