Suche senden
Hochladen
Cloudera Impala technical deep dive
•
28 gefällt mir
•
12,296 views
huguk
Folgen
Why Cloudera build Imapla and how it achieves it's blistering speed.
Weniger lesen
Mehr lesen
Technologie
Sport
Melden
Teilen
Melden
Teilen
1 von 28
Jetzt herunterladen
Downloaden Sie, um offline zu lesen
Empfohlen
How Impala Works
How Impala Works
Yue Chen
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
Scott Leberknight
Cloudera Impala Internals
Cloudera Impala Internals
David Groozman
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
Cloudera, Inc.
Query Compilation in Impala
Query Compilation in Impala
Cloudera, Inc.
(Aaron myers) hdfs impala
(Aaron myers) hdfs impala
NAVER D2
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera, Inc.
Incredible Impala
Incredible Impala
Gwen (Chen) Shapira
Empfohlen
How Impala Works
How Impala Works
Yue Chen
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
Scott Leberknight
Cloudera Impala Internals
Cloudera Impala Internals
David Groozman
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
Cloudera, Inc.
Query Compilation in Impala
Query Compilation in Impala
Cloudera, Inc.
(Aaron myers) hdfs impala
(Aaron myers) hdfs impala
NAVER D2
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera, Inc.
Incredible Impala
Incredible Impala
Gwen (Chen) Shapira
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
Jason Shih
Impala presentation
Impala presentation
trihug
Introduction to Impala
Introduction to Impala
markgrover
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Cloudera, Inc.
Impala Architecture presentation
Impala Architecture presentation
hadooparchbook
Nested Types in Impala
Nested Types in Impala
Myung Ho Yun
The Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
Cloudera impala
Cloudera impala
Swiss Big Data User Group
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera, Inc.
Impala 2.0 Update #impalajp
Impala 2.0 Update #impalajp
Cloudera Japan
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)
Yukinori Suda
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance Update
Cloudera, Inc.
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Wangda Tan
Hive vs. Impala
Hive vs. Impala
Omid Vahdaty
Apache Drill - Why, What, How
Apache Drill - Why, What, How
mcsrivas
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
Chicago Hadoop Users Group
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera, Inc.
Cloudera Impala
Cloudera Impala
Scott Leberknight
Node labels in YARN
Node labels in YARN
Wangda Tan
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera, Inc.
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Cloudera, Inc.
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
Cloudera, Inc.
Weitere ähnliche Inhalte
Was ist angesagt?
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
Jason Shih
Impala presentation
Impala presentation
trihug
Introduction to Impala
Introduction to Impala
markgrover
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Cloudera, Inc.
Impala Architecture presentation
Impala Architecture presentation
hadooparchbook
Nested Types in Impala
Nested Types in Impala
Myung Ho Yun
The Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
Cloudera impala
Cloudera impala
Swiss Big Data User Group
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera, Inc.
Impala 2.0 Update #impalajp
Impala 2.0 Update #impalajp
Cloudera Japan
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)
Yukinori Suda
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance Update
Cloudera, Inc.
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Wangda Tan
Hive vs. Impala
Hive vs. Impala
Omid Vahdaty
Apache Drill - Why, What, How
Apache Drill - Why, What, How
mcsrivas
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
Chicago Hadoop Users Group
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera, Inc.
Cloudera Impala
Cloudera Impala
Scott Leberknight
Node labels in YARN
Node labels in YARN
Wangda Tan
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera, Inc.
Was ist angesagt?
(20)
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
Impala presentation
Impala presentation
Introduction to Impala
Introduction to Impala
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Impala Architecture presentation
Impala Architecture presentation
Nested Types in Impala
Nested Types in Impala
The Impala Cookbook
The Impala Cookbook
Cloudera impala
Cloudera impala
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Impala 2.0 Update #impalajp
Impala 2.0 Update #impalajp
Performance evaluation of cloudera impala (with Comparison to Hive)
Performance evaluation of cloudera impala (with Comparison to Hive)
Apache Impala (incubating) 2.5 Performance Update
Apache Impala (incubating) 2.5 Performance Update
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hive vs. Impala
Hive vs. Impala
Apache Drill - Why, What, How
Apache Drill - Why, What, How
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera Impala
Cloudera Impala
Node labels in YARN
Node labels in YARN
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
Andere mochten auch
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Cloudera, Inc.
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
Cloudera, Inc.
Admission Control in Impala
Admission Control in Impala
Cloudera, Inc.
Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)
Cloudera, Inc.
Impala Performance Update
Impala Performance Update
Cloudera, Inc.
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
Cloudera, Inc.
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
mattlieber
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
Sadayuki Furuhashi
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Hortonworks
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log Data
Turi, Inc.
Swimming
Swimming
guestcdd2928
A2 Sociology: Defining Relgion
A2 Sociology: Defining Relgion
April Lennox-Hill's Sociology Lessons
Steps involved in Sugar Production
Steps involved in Sugar Production
mustafeex
Types of brand
Types of brand
Chao Onlamai
Product life cycle & marketing strategy
Product life cycle & marketing strategy
Hitesh Sunny
10 Ways to Improve Internal Communication
10 Ways to Improve Internal Communication
Weekdone.com
Congratulations Graduate! Eleven Reasons Why I Will Never Hire You.
Congratulations Graduate! Eleven Reasons Why I Will Never Hire You.
Mark O'Toole
Product life cycle & marketing strategies
Product life cycle & marketing strategies
Amar Ingale
XII Marketing Project Work
XII Marketing Project Work
Rahil Jain
Andere mochten auch
(19)
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
Admission Control in Impala
Admission Control in Impala
Cloudera Impala Overview (via Scott Leberknight)
Cloudera Impala Overview (via Scott Leberknight)
Impala Performance Update
Impala Performance Update
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log Data
Swimming
Swimming
A2 Sociology: Defining Relgion
A2 Sociology: Defining Relgion
Steps involved in Sugar Production
Steps involved in Sugar Production
Types of brand
Types of brand
Product life cycle & marketing strategy
Product life cycle & marketing strategy
10 Ways to Improve Internal Communication
10 Ways to Improve Internal Communication
Congratulations Graduate! Eleven Reasons Why I Will Never Hire You.
Congratulations Graduate! Eleven Reasons Why I Will Never Hire You.
Product life cycle & marketing strategies
Product life cycle & marketing strategies
XII Marketing Project Work
XII Marketing Project Work
Ähnlich wie Cloudera Impala technical deep dive
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
Felicia Haggarty
Performance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
Kai Sasaki
Using spider for sharding in production
Using spider for sharding in production
Kentoku
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Yahoo Developer Network
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
michaelguia
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Alluxio, Inc.
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Maaz Anjum
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Mike Percy
Oracle business analytics best practices
Oracle business analytics best practices
Nitai Partners Inc
ORC Deep Dive 2020
ORC Deep Dive 2020
Owen O'Malley
Real World Storage in Treasure Data
Real World Storage in Treasure Data
Kai Sasaki
Introduction to MySQL Cluster
Introduction to MySQL Cluster
Abel Flórez
45 ways to speed up firebird database
45 ways to speed up firebird database
Fabio Codebue
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Cloudera, Inc.
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Databricks
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Dataconomy Media
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
Felicia Haggarty
Ähnlich wie Cloudera Impala technical deep dive
(20)
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
Performance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
Using spider for sharding in production
Using spider for sharding in production
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Oracle business analytics best practices
Oracle business analytics best practices
ORC Deep Dive 2020
ORC Deep Dive 2020
Real World Storage in Treasure Data
Real World Storage in Treasure Data
Introduction to MySQL Cluster
Introduction to MySQL Cluster
45 ways to speed up firebird database
45 ways to speed up firebird database
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Simplifying Hadoop: A Secure and Unified Data Access Path for Computer Framew...
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
Mehr von huguk
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
huguk
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
huguk
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
huguk
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
huguk
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
huguk
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
huguk
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
huguk
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
huguk
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
huguk
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
huguk
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
huguk
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
huguk
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
huguk
Cytora: Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
huguk
Cubitic: Predictive Analytics
Cubitic: Predictive Analytics
huguk
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
huguk
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
huguk
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
huguk
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
huguk
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
huguk
Mehr von huguk
(20)
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
Cytora: Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
Cubitic: Predictive Analytics
Cubitic: Predictive Analytics
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
Kürzlich hochgeladen
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
hariprasad279825
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Mark Simos
How to write a Business Continuity Plan
How to write a Business Continuity Plan
Databarracks
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
LoriGlavin3
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
Sergiu Bodiu
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
Curtis Poe
Training state-of-the-art general text embedding
Training state-of-the-art general text embedding
Zilliz
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
LoriGlavin3
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
Nicole Novielli
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
ScyllaDB
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
Commit University
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
DianaGray10
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
BookNet Canada
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
Alex Barbosa Coqueiro
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
LoriGlavin3
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
BookNet Canada
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
LoriGlavin3
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
Lars Bell
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
Fwdays
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Fwdays
Kürzlich hochgeladen
(20)
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
How to write a Business Continuity Plan
How to write a Business Continuity Plan
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
Training state-of-the-art general text embedding
Training state-of-the-art general text embedding
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Cloudera Impala technical deep dive
1.
Practical Performance Analysis and
Tuning for Cloudera Impala Headline Goes Here Marcel Kornacker | marcel@cloudera.com Speaker Name or Subhead Goes Here 2013-11-11 Copyright © 2013 Cloudera Inc. All rights reserved. Copyright © 2013 Cloudera Inc. All rights reserved.
2.
Just how fast
is Impala? You might just be surprised !2
3.
How fast is
Impala? “DeWitt Clause” prohibits using DBMS vendor name !3 Copyright © 2013 Cloudera Inc. All rights reserved.
4.
Practical Performance Running with
Impala !4
5.
Pre-execution Checklist • Data types •
Partitioning • File Format !5 Copyright © 2013 Cloudera Inc. All rights reserved.
6.
Data Type Choices •
Define integer columns as INT/BIGINT • Operations • Convert on INT/BIGINT more efficient than STRING “external” data to good “internal” types on load • e.g. CAST date strings to TIMESTAMPS • This avoids expensive CASTs in queries later !6 Copyright © 2013 Cloudera Inc. All rights reserved.
7.
Partitioning • “The fastest I/O
is the one that never takes place.” • Understand your query filter predicates • For time-series data, this is usually the date/timestamp column • Use this/these column(s) for a partition key(s) • Validate queries leverage partition pruning using EXPLAIN • You can have too much of a good thing •A few thousand partitions per table is probably OK • Tens of thousands partitions is probably too much • Partitions/Files should be no less than a few hundred MBs !7 Copyright © 2013 Cloudera Inc. All rights reserved.
8.
Partition Pruning in
EXPLAIN explain select count(*) from sales_fact where sold_date in('2000-01-01','2000-01-02'); Note the partition count and missing “predicates” filter due to parse time ! — Partition Pruning (partitioned table) 0:SCAN HDFS table=grahn.sales_fact #partitions=2 size=803.15MB tuple ids: 0 ! — Filter Predicate (non-partitioned table) 0:SCAN HDFS table=grahn.sales_fact #partitions=1 size=799.42MB predicates: sold_date IN ('2000-01-01', '2000-01-02') tuple ids: 0 !8 Copyright © 2013 Cloudera Inc. All rights reserved.
9.
Why use Parquet
Columnar Format for HDFS? • Well defined open format - http://parquet.io/ • Works in Impala, Pig, Hive & Map/Reduce • I/O reduction by only reading necessary columns • Columnar layout compresses/encodes better • Supports nested data by shredding columns • Uses techniques used by Google’s ColumnIO • Impala • Gzip • Quick !9 loads use Snappy compression by default available: set PARQUET_COMPRESSION_CODEC=gzip; word on Snappy vs. Gzip Copyright © 2013 Cloudera Inc. All rights reserved.
10.
Parquet: A Motivating
Example !10 Copyright © 2013 Cloudera Inc. All rights reserved.
11.
Quick Note on
Compression • Snappy • Faster compression/decompression speeds • Less CPU cycles • Lower compression ratio • Gzip/Zlib • Slower compression/decompression speeds • More CPU cycles • Higher compression ratio • It’s all about trade-offs !11 Copyright © 2013 Cloudera Inc. All rights reserved.
12.
It’s all about
the compression codec Hortonworks graphic fails to call out that different codecs are used. Gzip compresses better than Snappy, but we all knew that. Source: http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/ Compressed with Snappy !12 Copyright © 2013 Cloudera Inc. All rights reserved. Compressed with Gzip
13.
TPC-DS 500GB Scale
Factor % of Original Size - Lower is Better 32.3% 34.8% Snappy 26.9% 27.4% Gzip 0 !13 Using the same compression codec produces comparable results 0.088 0.175 0.263 Copyright © 2013 Cloudera Inc. All rights reserved. 0.35 Parquet ORCFile
14.
Query Execution What happens
behind the scenes !14
15.
Left-Deep Join Tree largest*
table should be listed first in the FROM clause • Joins are done in the order tables are listed in FROM clause • Filter early - most selective joins/tables first • v1.2.1 will do JOIN ordering ⨝ • The !15 ⨝ ⨝ R3 R1 R2 Copyright © 2013 Cloudera Inc. All rights reserved. R4
16.
Two Types of
Hash Joins • Default hash join type is BROADCAST (aka replicated) • Each node ends up with a copy of the right table(s)* • Left side, read locally and streamed through local hash join(s) • Best choice for “star join”, single large fact table, multiple small dims • Alternate hash join type is SHUFFLE (aka partitioned) • Right side hashed and shuffled; each node gets ~1/Nth the data • Left side hashed and shuffled, then streamed through join • Best choice for “large_table JOIN large_table” • Only available if ANALYZE was used to gather table/column stats* !16 Copyright © 2013 Cloudera Inc. All rights reserved.
17.
How to use
ANALYZE • Table Stats (from Hive shell) • analyze • Column table T1 [partition(partition_key)] compute statistics; Stats (from Hive shell) • analyze table T1 [partition(partition_key)] compute statistics for columns c1,c2,… • Impala 1.2.1 will have a built-in ANALYZE command ! !17 Copyright © 2013 Cloudera Inc. All rights reserved.
18.
Hinting Joins select ... from
large_fact join [broadcast] small_dim ! ! select ... from large_fact join [shuffle] large_dim ! *square brackets required !18 Copyright © 2013 Cloudera Inc. All rights reserved.
19.
Determining Join Type
From EXPLAIN explain select s_state, count(*) from store_sales join store on (ss_store_sk = s_store_sk) group by s_state; ! ! 2:HASH JOIN | join op: INNER JOIN (BROADCAST) | hash predicates: | ss_store_sk = s_store_sk | tuple ids: 0 1 | |----4:EXCHANGE | tuple ids: 1 | 0:SCAN HDFS table=tpcds.store_sales tuple ids: 0 !19 explain select c_preferred_cust_flag, count(*) from store_sales join customer on (ss_customer_sk = c_customer_sk) group by c_preferred_cust_flag; 2:HASH JOIN | join op: INNER JOIN (PARTITIONED) | hash predicates: | ss_customer_sk = c_customer_sk | tuple ids: 0 1 | |----5:EXCHANGE | tuple ids: 1 | 4:EXCHANGE tuple ids: 0 Copyright © 2013 Cloudera Inc. All rights reserved.
20.
Memory Requirements for
Joins & Aggregates • Impala does not “spill” to disk -- pipelines are in-memory • Operators’ mem usage need to fit within the memory limit • This is not the same as “all data needs to fit in memory” • Buffered data generally significantly smaller than total accessed data • Aggregations’ mem usage proportional to number of groups • Applies for each in-flight query (sum of total) • Minimum of 128GB of RAM is recommended for impala nodes !20 Copyright © 2013 Cloudera Inc. All rights reserved.
21.
Impala Debug Web
Pages A great resource for debug information !21
22.
23.
!23
24.
25.
Quick Review Impala performance
checklist !25
26.
Impala Performance Checklist •
Choose appropriate data types • Leverage partition pruning • Adopt Parquet format • Gather table & column stats • Order JOINS optimally Automatic in v1.2.1 • Validate JOIN type • Monitor hardware resources !26 Copyright © 2013 Cloudera Inc. All rights reserved.
27.
Test drive Impala •
Impala Community: • Download: • Github: • User !27 http://cloudera.com/impala https://github.com/cloudera/impala group: impala-user on groups.cloudera.org Copyright © 2013 Cloudera Inc. All rights reserved.
28.
SELECT questions FROM
audience; Thank you. !28
Jetzt herunterladen