SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
User-Defined
Table-Generating Functions (UDTF)
By Paul Yang (pyang@facebook.com)
Outline
 UDTF Description and Usage
 Execution Phase
 Compile Phase
UDF vs UDAF vs UDTF
 User Defined Functions
 – One-to-one mapping
 – concat(“foo”, “bar”)
 User Defined Aggregate Functions
 – Many-to-one mapping
 – sum(num_ads)
 User Defined Table-generating Functions
 – One-to-many mapping
 – explode([1,2,3])
UDTF Example (Transform)
 explode(Array<?> arg)
 – Converts an array into multiple rows, with one
   element per row
 Transform-like syntax
 – SELECT udtf(col0, col1, …) AS
   colAlias FROM srcTable
UDTF Example (Transform)
 SELECT explode(group_ids) AS group_id
 FROM src


    Table src      Output
     group_ids      group_id
     [1]            1
     [2, 3]         2
                    3
UDTF (Lateral View)
 Transform syntax limited to single
 expression
 – SELECT pageid, explode(adid_list)…
 Use lateral view
 – Creates a virtual table using UDTF
 Lateral view syntax
 – …FROM baseTable
   LATERAL VIEW udtf(col0, col1…)
   tableAlias AS colAlias0, colAlias1…
UDTF (Lateral View)
 Example Query
 SELECT src.*, myTable.*
 FROM src LATERAL VIEW
 explode(group_ids) myTable AS
 group_id

            src

          user_id group_ids
          100     [1]
          101     [2,3]
UDTF (Lateral View)
  src                   explode(group_ids) myTable AS group_id

user_id group_ids                group_id
100     [1]                      1
101     [2,3]                    2
                                 3

                                  Join input rows to
                                  output rows
        Result
         user_id    group_ids group_id
         100        [1]       1
         101        [2,3]        2
         101        [2,3]        3
Outline
 UDTF Description and Usage
 Execution Phase
 Compile Phase
Execution Phase
 2 New Operators
 – UDTFOperator
 – LateralViewJoinOperator
 Different Operator DAG’s
 – For SELECT udtf(…)
 – For … FROM udtf(…) … LATERAL VIEW
Execution Phase - UDTFOperator
                          [1,2]
   UDTFOperator

                                  GenericUDTF
         processOp()                      process()
                          [1,2]
                                                   1
                                       Collector
                                   1
    forwardUDTFOutput()                   collect()




                             1
GenericUDTF Interface
Execution Phase - Transform
    SELECT explode(group_ids) AS group_id
    FROM src

                         TS      Gets the rows of src



    Selects group_ids   SELECT




                        UDTF        Calls explode


                   More Operators
Execution Phase - LateralViewJoinOperator

            foo              bar
                             baz


                   LVJ



                  foo, bar
                  foo, baz
Execution Phase (Lateral View)
FROM src LATERAL VIEW
explode(group_ids) myTable AS
group_id                           TS     Gets the rows of src


      Selects other                                  Selects arg.
                      SELECT                SELECT
      needed                                         cols for UDTF
      columns                                        (group_ids)
      (user_id,
                                             UDTF
      group_ids)                                     Calls explode()

                                    LVJ
                                          Joins input and
                                          ouput rows

                               More Operators
Outline
 UDTF Description and Usage
 Execution Phase
 Compile Phase
Compile Phase - Transform
 SELECT explode(group_id_array) AS
 group_id FROM src
Compile Time - Transform
 UDTF detected in genSelectPlan()

SemanticAnalyzer::genSelectPlan()
 ...

 if (udtfExpr.getType() == HiveParser.TOK_FUNCTION) {
   String funcName =
     TypeCheckProcFactory.DefaultExprProcessor.getFunctionText(
         udtfExpr, true);
   FunctionInfo fi = FunctionRegistry.getFunctionInfo(funcName);
   if (fi != null) {
     genericUDTF = fi.getGenericUDTF();
   }
   isUDTF = (genericUDTF != null);
 }

 ...
Compile Time - Transform
 If a UDTF is present
 – Generate a select operator to get the columns
   needed for UDTF
 – Attach UDTFOperator to SelectOperator
     SemanticAnalyzer::genSelectPlan()
      ...
      if (isInTransform) {
        output = genScriptPlan(trfm, qb, output);
      }

      if (isUDTF) {
        output = genUDTFPlan(genericUDTF,
      udtfTableAlias, udtfColAliases, qb,
                             output);
      }
Compile Phase - Lateral View
  SELECT * FROM src LATERAL VIEW
  explode(group_ids) myTable AS
  group_id


Partial AST for above query:
Compile Phase – Lateral View
 In SemanticAnalyzer::doPhase1()
 – make a mapping from the source table alias to
   TOK_LATERAL_VIEW


    “src”
Compile Phase– Lateral View
 In SemanticAnalyzer::               TS
 genPlan()
 – Iterate through
   Map<String, Operator>
   aliasToOpInfo
 – Attach SELECT/UDTF
   Operator DAG with
   genLateralViewPlans()
                           Attached by
                           genLateralViewPlans()

Weitere ähnliche Inhalte

Andere mochten auch

Hive introduction 介绍
Hive  introduction 介绍Hive  introduction 介绍
Hive introduction 介绍ablozhou
 
Datacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheConDatacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheConamarsri
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveWill Du
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive HookMinwoo Kim
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveJulian Hyde
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 
Replacing Telco DB/DW to Hadoop and Hive
Replacing Telco DB/DW to Hadoop and HiveReplacing Telco DB/DW to Hadoop and Hive
Replacing Telco DB/DW to Hadoop and HiveJunHo Cho
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and RJunHo Cho
 

Andere mochten auch (9)

Hive introduction 介绍
Hive  introduction 介绍Hive  introduction 介绍
Hive introduction 介绍
 
Datacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheConDatacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheCon
 
Advanced topics in hive
Advanced topics in hiveAdvanced topics in hive
Advanced topics in hive
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache Hive
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Replacing Telco DB/DW to Hadoop and Hive
Replacing Telco DB/DW to Hadoop and HiveReplacing Telco DB/DW to Hadoop and Hive
Replacing Telco DB/DW to Hadoop and Hive
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and R
 

Ähnlich wie User-Defined Table Generating Functions

Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)
Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)
Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)Kohei KaiGai
 
python-cheat-sheet-v1
python-cheat-sheet-v1python-cheat-sheet-v1
python-cheat-sheet-v1Hiroshi Ono
 
Building Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual MachinesBuilding Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual MachinesGuido Chari
 
Python 2.5 reference card (2009)
Python 2.5 reference card (2009)Python 2.5 reference card (2009)
Python 2.5 reference card (2009)gekiaruj
 
Deep Learning and TensorFlow
Deep Learning and TensorFlowDeep Learning and TensorFlow
Deep Learning and TensorFlowOswald Campesato
 
ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015Michiel Borkent
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java APIAdam Kawa
 
Testing for share
Testing for share Testing for share
Testing for share Rajeev Mehta
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Flink Forward
 
(map Clojure everyday-tasks)
(map Clojure everyday-tasks)(map Clojure everyday-tasks)
(map Clojure everyday-tasks)Jacek Laskowski
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkAlexey Smirnov
 
User defined functions
User defined functionsUser defined functions
User defined functionsshubham_jangid
 
エンタープライズ・クラウドと 並列・分散・非同期処理
エンタープライズ・クラウドと 並列・分散・非同期処理エンタープライズ・クラウドと 並列・分散・非同期処理
エンタープライズ・クラウドと 並列・分散・非同期処理maruyama097
 
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...Databricks
 

Ähnlich wie User-Defined Table Generating Functions (20)

Unix system calls
Unix system callsUnix system calls
Unix system calls
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)
Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)
Writable Foreign Data Wrapper (JPUG Unconference 16-Feb-2013)
 
Hw09 Hadoop + Clojure
Hw09   Hadoop + ClojureHw09   Hadoop + Clojure
Hw09 Hadoop + Clojure
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
python-cheat-sheet-v1
python-cheat-sheet-v1python-cheat-sheet-v1
python-cheat-sheet-v1
 
Building Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual MachinesBuilding Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual Machines
 
Python 2.5 reference card (2009)
Python 2.5 reference card (2009)Python 2.5 reference card (2009)
Python 2.5 reference card (2009)
 
Practical cats
Practical catsPractical cats
Practical cats
 
Deep Learning and TensorFlow
Deep Learning and TensorFlowDeep Learning and TensorFlow
Deep Learning and TensorFlow
 
ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015ClojureScript loves React, DomCode May 26 2015
ClojureScript loves React, DomCode May 26 2015
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
 
Testing for share
Testing for share Testing for share
Testing for share
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
(map Clojure everyday-tasks)
(map Clojure everyday-tasks)(map Clojure everyday-tasks)
(map Clojure everyday-tasks)
 
GEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions FrameworkGEM - GNU C Compiler Extensions Framework
GEM - GNU C Compiler Extensions Framework
 
User defined functions
User defined functionsUser defined functions
User defined functions
 
エンタープライズ・クラウドと 並列・分散・非同期処理
エンタープライズ・クラウドと 並列・分散・非同期処理エンタープライズ・クラウドと 並列・分散・非同期処理
エンタープライズ・クラウドと 並列・分散・非同期処理
 
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
 
Summary of C++17 features
Summary of C++17 featuresSummary of C++17 features
Summary of C++17 features
 

Kürzlich hochgeladen

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Kürzlich hochgeladen (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

User-Defined Table Generating Functions

  • 1. User-Defined Table-Generating Functions (UDTF) By Paul Yang (pyang@facebook.com)
  • 2. Outline UDTF Description and Usage Execution Phase Compile Phase
  • 3. UDF vs UDAF vs UDTF User Defined Functions – One-to-one mapping – concat(“foo”, “bar”) User Defined Aggregate Functions – Many-to-one mapping – sum(num_ads) User Defined Table-generating Functions – One-to-many mapping – explode([1,2,3])
  • 4. UDTF Example (Transform) explode(Array<?> arg) – Converts an array into multiple rows, with one element per row Transform-like syntax – SELECT udtf(col0, col1, …) AS colAlias FROM srcTable
  • 5. UDTF Example (Transform) SELECT explode(group_ids) AS group_id FROM src Table src Output group_ids group_id [1] 1 [2, 3] 2 3
  • 6. UDTF (Lateral View) Transform syntax limited to single expression – SELECT pageid, explode(adid_list)… Use lateral view – Creates a virtual table using UDTF Lateral view syntax – …FROM baseTable LATERAL VIEW udtf(col0, col1…) tableAlias AS colAlias0, colAlias1…
  • 7. UDTF (Lateral View) Example Query SELECT src.*, myTable.* FROM src LATERAL VIEW explode(group_ids) myTable AS group_id src user_id group_ids 100 [1] 101 [2,3]
  • 8. UDTF (Lateral View) src explode(group_ids) myTable AS group_id user_id group_ids group_id 100 [1] 1 101 [2,3] 2 3 Join input rows to output rows Result user_id group_ids group_id 100 [1] 1 101 [2,3] 2 101 [2,3] 3
  • 9. Outline UDTF Description and Usage Execution Phase Compile Phase
  • 10. Execution Phase 2 New Operators – UDTFOperator – LateralViewJoinOperator Different Operator DAG’s – For SELECT udtf(…) – For … FROM udtf(…) … LATERAL VIEW
  • 11. Execution Phase - UDTFOperator [1,2] UDTFOperator GenericUDTF processOp() process() [1,2] 1 Collector 1 forwardUDTFOutput() collect() 1
  • 13. Execution Phase - Transform SELECT explode(group_ids) AS group_id FROM src TS Gets the rows of src Selects group_ids SELECT UDTF Calls explode More Operators
  • 14. Execution Phase - LateralViewJoinOperator foo bar baz LVJ foo, bar foo, baz
  • 15. Execution Phase (Lateral View) FROM src LATERAL VIEW explode(group_ids) myTable AS group_id TS Gets the rows of src Selects other Selects arg. SELECT SELECT needed cols for UDTF columns (group_ids) (user_id, UDTF group_ids) Calls explode() LVJ Joins input and ouput rows More Operators
  • 16. Outline UDTF Description and Usage Execution Phase Compile Phase
  • 17. Compile Phase - Transform SELECT explode(group_id_array) AS group_id FROM src
  • 18. Compile Time - Transform UDTF detected in genSelectPlan() SemanticAnalyzer::genSelectPlan() ... if (udtfExpr.getType() == HiveParser.TOK_FUNCTION) { String funcName = TypeCheckProcFactory.DefaultExprProcessor.getFunctionText( udtfExpr, true); FunctionInfo fi = FunctionRegistry.getFunctionInfo(funcName); if (fi != null) { genericUDTF = fi.getGenericUDTF(); } isUDTF = (genericUDTF != null); } ...
  • 19. Compile Time - Transform If a UDTF is present – Generate a select operator to get the columns needed for UDTF – Attach UDTFOperator to SelectOperator SemanticAnalyzer::genSelectPlan() ... if (isInTransform) { output = genScriptPlan(trfm, qb, output); } if (isUDTF) { output = genUDTFPlan(genericUDTF, udtfTableAlias, udtfColAliases, qb, output); }
  • 20. Compile Phase - Lateral View SELECT * FROM src LATERAL VIEW explode(group_ids) myTable AS group_id Partial AST for above query:
  • 21. Compile Phase – Lateral View In SemanticAnalyzer::doPhase1() – make a mapping from the source table alias to TOK_LATERAL_VIEW “src”
  • 22. Compile Phase– Lateral View In SemanticAnalyzer:: TS genPlan() – Iterate through Map<String, Operator> aliasToOpInfo – Attach SELECT/UDTF Operator DAG with genLateralViewPlans() Attached by genLateralViewPlans()