2. Strengths
• Excellent for analytical queries
  – Queries that scan large amounts of data
• Parquet with Snappy codec provides a good trade-off
  – Fast data load
  – Good query performance
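As a rough illustration of the Parquet-with-Snappy choice, the sketch below builds the Impala statements one might issue to copy an existing table into a Snappy-compressed Parquet table. The helper name and the table names are hypothetical; `COMPRESSION_CODEC` and `STORED AS PARQUET` are standard Impala syntax, but the exact workflow used by the author is not stated in the slides.

```python
from typing import List


def parquet_snappy_statements(source_table: str, target_table: str) -> List[str]:
    """Build Impala SQL that copies a table into Parquet with Snappy compression.

    The function name and arguments are illustrative assumptions, not part of
    any Impala client library. It only returns SQL text; it does not run it.
    """
    return [
        # Query option that selects the codec for Parquet files written
        # by later statements in the same session.
        "SET COMPRESSION_CODEC=snappy;",
        # CREATE TABLE ... AS SELECT writes the data in the requested format.
        f"CREATE TABLE {target_table} STORED AS PARQUET "
        f"AS SELECT * FROM {source_table};",
    ]


if __name__ == "__main__":
    # Hypothetical table names, for demonstration only.
    for statement in parquet_snappy_statements("events_text", "events_parquet"):
        print(statement)
```

The statements can be pasted into impala-shell or sent through any client in order.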
3. Strengths - continued
• SQL compliance
  – Most queries from other systems work as-is
• Impala-shell - Handy Interface
  – Easy-to-use interface
• Hadoop Integration
  – Uses Hive as the metastore and HDFS as storage
  – Uses its own daemons to execute queries
4. Weaknesses
• Random Access is Slow
• No Fault Tolerance
  – If a node fails, all queries running on that node will fail. The only option is to retry the query.
• Updating/Cleaning Data is Tedious
  – No direct updates are supported
  – Multi-step process
    • For example, to remove rows from an existing table: create a temp table by selecting the rows to keep from the source, drop the source table, then rename the temp table to the source name.
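The three-step row removal described above can be sketched as follows. This is a minimal illustration, not the author's procedure: it uses Python's built-in `sqlite3` as a stand-in engine so the example runs anywhere, and the helper name, table name, and predicate are hypothetical. The same three statement shapes (`CREATE TABLE ... AS SELECT`, `DROP TABLE`, `ALTER TABLE ... RENAME TO`) also exist in Impala.

```python
import sqlite3


def remove_rows_by_rewrite(conn: sqlite3.Connection, table: str, keep_predicate: str) -> None:
    """Remove rows by rebuilding the table from the rows that should stay.

    Identifiers and the predicate are interpolated directly into the SQL,
    so they must come from trusted input only.
    """
    tmp = f"{table}_tmp"  # hypothetical naming convention for the temp table
    cur = conn.cursor()
    # Step 1: create a temp table holding only the rows to keep.
    cur.execute(f"CREATE TABLE {tmp} AS SELECT * FROM {table} WHERE {keep_predicate}")
    # Step 2: drop the original table.
    cur.execute(f"DROP TABLE {table}")
    # Step 3: rename the temp table to the original name.
    cur.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    conn.commit()


def demo() -> list:
    """Run the rewrite on a small in-memory table and return the remaining rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "ok"), (2, "bad"), (3, "ok")],
    )
    remove_rows_by_rewrite(conn, "events", "status <> 'bad'")
    rows = conn.execute("SELECT id, status FROM events ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Note that between the drop and the rename the table does not exist under its original name, so readers of the table should be paused while the rewrite runs.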
5. Weaknesses - continued
• Updating Stats is a Manual Process
  – After loading a significant amount of data, stats must be updated manually. Some queries will perform poorly or fail if this is not done.
• Memory Consumption
  – For Impala queries to perform fast, a significant amount of RAM is needed.
  – It is possible to spill to disk, but that slows down performance.
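One way to make the manual stats step harder to forget is to always emit it together with the load. The sketch below only builds the SQL text. The helper and table names are hypothetical, while `COMPUTE STATS` is the Impala statement that refreshes table and column statistics.

```python
from typing import List


def load_then_compute_stats(target_table: str, staging_table: str) -> List[str]:
    """Pair a bulk load with the stats refresh that should follow it.

    Illustrative helper (an assumption, not an Impala API): it returns the
    statements in the order they should be run and does not execute them.
    """
    return [
        # Bulk load from a staging table into the queried table.
        f"INSERT INTO {target_table} SELECT * FROM {staging_table};",
        # Refresh statistics so the planner sees the new data volume.
        f"COMPUTE STATS {target_table};",
    ]


if __name__ == "__main__":
    # Hypothetical table names, for demonstration only.
    for statement in load_then_compute_stats("sales", "sales_staging"):
        print(statement)
```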
6. Conclusion
• Impala makes SQL a first-class citizen in the Hadoop ecosystem
• Great for workloads where data is immutable
• Excellent query performance for analytical queries
• Not suitable for workloads that involve frequent data updates
• Queries are not fault-tolerant
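Because a failed node fails the queries running on it and the only remedy is to resubmit, client code often wraps query execution in a retry loop. The sketch below is one possible shape under stated assumptions: `run_query` is a hypothetical zero-argument callable supplied by the caller (for example, a closure around whatever Impala client is in use), and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(
    run_query: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Call run_query until it succeeds or max_attempts is exhausted.

    This catches every Exception for simplicity; real code should catch only
    the errors its client raises for failed or cancelled queries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            last_error = exc
            # Wait before resubmitting, except after the final attempt.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    assert last_error is not None
    raise last_error
```

A retry reruns the whole query from the start, so it suits read-only analytical queries better than multi-step rewrites that may have partially completed.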