SlideShare a Scribd company logo
1 of 11
Hive – SerDe and LazySerDe ,[object Object],[object Object]
Where is SerDe? File on HDFS Hierarchical Object Writable Stream Stream Hierarchical Object Map Output File Writable Writable Writable Writable Writable Hierarchical Object File on HDFS User Script Hierarchical Object Hierarchical Object Hive Operator Hive Operator SerDe FileFormat  / Hadoop Serialization Mapper Reducer ObjectInspector imp 1.0 3 54 Imp 0.2 1 33 clk 2.2 8 212 Imp 0.7 2 22  thrift_record<…> thrift_record<…> thrift_record<…> thrift_record<…> BytesWritable(3F647200) Text(‘imp 1.0 3 54’) // UTF8 encoded Java Object Object of a Java Class Standard Object Use ArrayList for struct and array Use HashMap for map LazyObject Lazily-deserialized
SerDe, ObjectInspector and TypeInfo Hierarchical Object Writable Writable Struct int string list struct map string string Hierarchical Object String Object TypeInfo BytesWritable(3F647200) Text(‘ a=av:b=bv 23 1:2=4:5 abcd ’) class HO { HashMap<String, String> a, Integer b, List<ClassC> c, String d; } Class ClassC { Integer a, Integer b; } List ( HashMap(“a”    “av”, “b”    “bv”), 23, List(List(1,null),List(2,4),List(5,null)), “ abcd” ) int int HashMap(“a”    “av”, “b”    “bv”), HashMap<String, String> a, “ av” getType ObjectInspector1 getFieldOI getStructField getType ObjectInspector2 getMapValueOI getMapValue deserialize SerDe serialize getOI getType ObjectInspector3
LazySimpleSerDe components LazyStruct LazyInteger LazyString LazyMap LazyString LazyString LazyString LazyString LazyStructOI(“ “) LazyArrayOI(“:”) LazyMapOI(“:”,”=“) StandardIntegerOI StandardStringOI StandardStringOI byte[] data Hierarchical Object / LazyObject One Per SerDe instance LazyObjectInspector Singleton byte[](‘ a=av:b=bv 23 1:2=4:5 abcd ’) LazyStruct LazyStructOI(“=“) StandardIntegerOI LazyStruct LazyArray LazyInteger LazyInteger LazyInteger LazyInteger
LazyPrimitive ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
LazyNonPrimitive ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why another SerDe? ,[object Object],[object Object],[object Object],[object Object],[object Object]
Features of LazySimpleSerDe ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Profiling result of a mapper ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
TypeInfo String specification ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Future of SerDe ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in SearchAmund Tveit
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) BigDataEverywhere
 
HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011Hortonworks
 
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Scalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of codeScalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of codeKonrad Malawski
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Hortonworks
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Declarative Internal DSLs in Lua: A Game Changing Experience
Declarative Internal DSLs in Lua: A Game Changing ExperienceDeclarative Internal DSLs in Lua: A Game Changing Experience
Declarative Internal DSLs in Lua: A Game Changing ExperienceAlexander Gladysh
 
Intro To Cascading
Intro To CascadingIntro To Cascading
Intro To CascadingNate Murray
 
Writing Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingWriting Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingToni Cebrián
 
EuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopEuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopMax Tepkeev
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? DataWorks Summit
 
Implementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickImplementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickHermann Hueck
 
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj TalkSpark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj TalkZalando Technology
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Kai Chan
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraRustam Aliyev
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceLivePerson
 
2014 holden - databricks umd scala crash course
2014   holden - databricks umd scala crash course2014   holden - databricks umd scala crash course
2014 holden - databricks umd scala crash courseHolden Karau
 
Object Relational Mapping in PHP
Object Relational Mapping in PHPObject Relational Mapping in PHP
Object Relational Mapping in PHPRob Knight
 
Avro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSONAvro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSONAlexandre Victoor
 

What's hot (20)

Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant) Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
Big Data Everywhere Chicago: Unleash the Power of HBase Shell (Conversant)
 
HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011HCatalog Hadoop Summit 2011
HCatalog Hadoop Summit 2011
 
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
 
Scalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of codeScalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of code
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Declarative Internal DSLs in Lua: A Game Changing Experience
Declarative Internal DSLs in Lua: A Game Changing ExperienceDeclarative Internal DSLs in Lua: A Game Changing Experience
Declarative Internal DSLs in Lua: A Game Changing Experience
 
Intro To Cascading
Intro To CascadingIntro To Cascading
Intro To Cascading
 
Writing Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingWriting Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using Scalding
 
EuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopEuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and Hadoop
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch?
 
Implementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickImplementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with Slick
 
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj TalkSpark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduce
 
2014 holden - databricks umd scala crash course
2014   holden - databricks umd scala crash course2014   holden - databricks umd scala crash course
2014 holden - databricks umd scala crash course
 
Object Relational Mapping in PHP
Object Relational Mapping in PHPObject Relational Mapping in PHP
Object Relational Mapping in PHP
 
Avro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSONAvro, la puissance du binaire, la souplesse du JSON
Avro, la puissance du binaire, la souplesse du JSON
 

Similar to Hive - SerDe and LazySerde

VelocityGraph Introduction
VelocityGraph IntroductionVelocityGraph Introduction
VelocityGraph IntroductionMats Persson
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...Holden Karau
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
 
devLink - What's New in C# 4?
devLink - What's New in C# 4?devLink - What's New in C# 4?
devLink - What's New in C# 4?Kevin Pilch
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeRick Copeland
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...Holden Karau
 
Introduction to jQuery
Introduction to jQueryIntroduction to jQuery
Introduction to jQueryGunjan Kumar
 
Introduction to Client-Side Javascript
Introduction to Client-Side JavascriptIntroduction to Client-Side Javascript
Introduction to Client-Side JavascriptJulie Iskander
 
Scalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInScalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInVitaly Gordon
 
Ida python intro
Ida python introIda python intro
Ida python intro小静 安
 
PofEAA and SQLAlchemy
PofEAA and SQLAlchemyPofEAA and SQLAlchemy
PofEAA and SQLAlchemyInada Naoki
 
.NET Database Toolkit
.NET Database Toolkit.NET Database Toolkit
.NET Database Toolkitwlscaudill
 
Serializing EMF models with Xtext
Serializing EMF models with XtextSerializing EMF models with Xtext
Serializing EMF models with Xtextmeysholdt
 
Introduction To Groovy 2005
Introduction To Groovy 2005Introduction To Groovy 2005
Introduction To Groovy 2005Tugdual Grall
 
Js info vis_toolkit
Js info vis_toolkitJs info vis_toolkit
Js info vis_toolkitnikhilyagnic
 

Similar to Hive - SerDe and LazySerde (20)

VelocityGraph Introduction
VelocityGraph IntroductionVelocityGraph Introduction
VelocityGraph Introduction
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...
 
Big Data Analytics Part2
Big Data Analytics Part2Big Data Analytics Part2
Big Data Analytics Part2
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Dynamic Python
Dynamic PythonDynamic Python
Dynamic Python
 
devLink - What's New in C# 4?
devLink - What's New in C# 4?devLink - What's New in C# 4?
devLink - What's New in C# 4?
 
Allura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForgeAllura - an Open Source MongoDB Based Document Oriented SourceForge
Allura - an Open Source MongoDB Based Document Oriented SourceForge
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
Introduction to jQuery
Introduction to jQueryIntroduction to jQuery
Introduction to jQuery
 
JavaEE Spring Seam
JavaEE Spring SeamJavaEE Spring Seam
JavaEE Spring Seam
 
Introduction to Client-Side Javascript
Introduction to Client-Side JavascriptIntroduction to Client-Side Javascript
Introduction to Client-Side Javascript
 
Scalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInScalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedIn
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Ida python intro
Ida python introIda python intro
Ida python intro
 
PofEAA and SQLAlchemy
PofEAA and SQLAlchemyPofEAA and SQLAlchemy
PofEAA and SQLAlchemy
 
.NET Database Toolkit
.NET Database Toolkit.NET Database Toolkit
.NET Database Toolkit
 
Java
JavaJava
Java
 
Serializing EMF models with Xtext
Serializing EMF models with XtextSerializing EMF models with Xtext
Serializing EMF models with Xtext
 
Introduction To Groovy 2005
Introduction To Groovy 2005Introduction To Groovy 2005
Introduction To Groovy 2005
 
Js info vis_toolkit
Js info vis_toolkitJs info vis_toolkit
Js info vis_toolkit
 

Recently uploaded

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Recently uploaded (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Hive - SerDe and LazySerde

  • 1.
  • 2. Where is SerDe? File on HDFS Hierarchical Object Writable Stream Stream Hierarchical Object Map Output File Writable Writable Writable Writable Writable Hierarchical Object File on HDFS User Script Hierarchical Object Hierarchical Object Hive Operator Hive Operator SerDe FileFormat / Hadoop Serialization Mapper Reducer ObjectInspector imp 1.0 3 54 Imp 0.2 1 33 clk 2.2 8 212 Imp 0.7 2 22 thrift_record<…> thrift_record<…> thrift_record<…> thrift_record<…> BytesWritable(3F647200) Text(‘imp 1.0 3 54’) // UTF8 encoded Java Object Object of a Java Class Standard Object Use ArrayList for struct and array Use HashMap for map LazyObject Lazily-deserialized
  • 3. SerDe, ObjectInspector and TypeInfo Hierarchical Object Writable Writable Struct int string list struct map string string Hierarchical Object String Object TypeInfo BytesWritable(3F647200) Text(‘ a=av:b=bv 23 1:2=4:5 abcd ’) class HO { HashMap<String, String> a, Integer b, List<ClassC> c, String d; } Class ClassC { Integer a, Integer b; } List ( HashMap(“a”  “av”, “b”  “bv”), 23, List(List(1,null),List(2,4),List(5,null)), “ abcd” ) int int HashMap(“a”  “av”, “b”  “bv”), HashMap<String, String> a, “ av” getType ObjectInspector1 getFieldOI getStructField getType ObjectInspector2 getMapValueOI getMapValue deserialize SerDe serialize getOI getType ObjectInspector3
  • 4. LazySimpleSerDe components LazyStruct LazyInteger LazyString LazyMap LazyString LazyString LazyString LazyString LazyStructOI(“ “) LazyArrayOI(“:”) LazyMapOI(“:”,”=“) StandardIntegerOI StandardStringOI StandardStringOI byte[] data Hierarchical Object / LazyObject One Per SerDe instance LazyObjectInspector Singleton byte[](‘ a=av:b=bv 23 1:2=4:5 abcd ’) LazyStruct LazyStructOI(“=“) StandardIntegerOI LazyStruct LazyArray LazyInteger LazyInteger LazyInteger LazyInteger
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.