Diese Präsentation wurde erfolgreich gemeldet.
Die SlideShare-Präsentation wird heruntergeladen. ×

Hive vs. Impala

Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Anzeige
Wird geladen in …3
×

Hier ansehen

1 von 4 Anzeige

Weitere Verwandte Inhalte

Diashows für Sie (20)

Andere mochten auch (13)

Anzeige

Ähnlich wie Hive vs. Impala (20)

Weitere von Omid Vahdaty (20)

Anzeige

Aktuellste (20)

Hive vs. Impala

  1. 1. Hive Vs Impala Omid Vahdaty, Big Data ninja
  2. 2. Differences of Hive VS. Impala Hive Impala Author Apache Cloudera/Apache design Map reduce jobs MPP database Use cases Hive which transforms SQL queries into MapReduce or Apache Spark jobs under the covers, is great for long- running ETL jobs (for which fault tolerance is highly desirable; for such jobs, you don't want to have to re-do a long- running query that failed after several hours) Impala is a MPP analytic database on top of Hadoop and is largely written in C++ for speed, pushes data processing down to local DataNodes, avoiding network bottlenecks. enables low- latency/interactive queries, especially under multi-user load. This makes Impala very popular with data analysts who need and expect an interactive "BI" experience
  3. 3. Differences of Hive VS. Impala Hive Impala Read/write parallel Read in parallel, write on 1 virtual disk - may change. Resource management Yarn 128GB per node.Yarn supported. SQL syntax HiveSQL ? Performance Disk In memory, All heavy calculations like group by, conversions would be memory based. Querying May start in a delay (batch jobs) No delay Query fault tolerance Will restart on failure. Start over in failure.
  4. 4. Differences of Hive VS. Impala Hive Impala Complex data types yes no Anti pattern Interactive / ad hoc. ?

×