Toronto Hadoop User Group
February 20, 2013

Apache HCatalog Overview
& Next-gen Hive
Adam Muise




© Hortonworks Inc. 2012
HCatalog




Apache HCatalog
• Incubator Project at Apache.org
• Good adoption
• Will likely merge with Hive project as it adds
  important functionality to metastore
• Allows for a “schema-on-read” approach to Big Data
  in HDFS
• Seat of a lot of innovation in Data Management
• Used by platform partners to enhance Hadoop
  integration
• Will likely be used to enhance existing Data
  Management products in the Enterprise & create new
  products

Great Tooling Options
                             MapReduce
                             •  Early adopters
                             •  Non-relational algorithms
                             •  Performance sensitive applications

                             Pig
                             •  ETL
                             •  Data modeling
                             •  Iterative algorithms

                             Hive
                             •  Analysis
                             •  Connectors to BI tools


              Strength: Right tool for right application
               Weakness: Hard to share their data
HCatalog Changes the Game
Apache HCatalog provides flexible metadata
services across tools and external access
 •  Consistency of metadata and data models across the
    Enterprise (MapReduce, Pig, HBase, Hive, External Systems)
 •  Accessibility: share data as tables in and out of HDFS
 •  Availability: enables flexible, thin-client access via REST API




[Diagram: HCatalog sits between raw Hadoop data and shared table access]
 From:
   •  Raw Hadoop data
   •  Inconsistent, unknown
   •  Tool specific access
 To (via HCatalog):
   •  Table access
   •  Aligned metadata
   •  REST API
 Shared table and schema management opens the platform



Options == Complexity
 Feature        | MapReduce       | Pig                                           | Hive
 Record format  | Key value pairs | Tuple                                         | Record
 Data model     | User defined    | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
 Schema         | Encoded in app  | Declared in script or read by loader          | Read from metadata
 Data location  | Encoded in app  | Declared in script                            | Read from metadata
 Data format    | Encoded in app  | Declared in script                            | Read from metadata


•    Pig and MR users need to know a lot to write their apps
•    When the data schema, location, or format changes, Pig and MR apps must be
     rewritten, retested, and redeployed
•    Hive users have to load data from Pig/MR users to have access to it
HCatalog == Simple, Consistent
 Feature        | MapReduce + HCatalog                     | Pig + HCatalog                                | Hive
 Record format  | Record                                   | Tuple                                         | Record
 Data model     | int, float, string, maps, structs, lists | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
 Schema         | Read from metadata                       | Read from metadata                            | Read from metadata
 Data location  | Read from metadata                       | Read from metadata                            | Read from metadata
 Data format    | Read from metadata                       | Read from metadata                            | Read from metadata

•    Pig/MR users can read schema from metadata
•    Pig/MR users are insulated from schema, location, and format changes
•    All users have access to other users’ data as soon as it is committed
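
The insulation described in these bullets can be illustrated with a toy sketch in Python. It is not HCatalog's API: `TableInfo`, `CATALOG`, `FAKE_FS`, and `read_table` are hypothetical names, and an in-memory dictionary stands in for HDFS. The only point is that the application names a table and the catalog supplies location, format, and schema.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class TableInfo:
    """Hypothetical catalog entry: everything an app would otherwise hard-code."""
    location: str   # where the data lives (a key into FAKE_FS in this sketch)
    fmt: str        # storage format: "csv" or "jsonl"
    columns: list   # column names, in order


# Hypothetical stand-in for files stored in HDFS.
FAKE_FS = {
    "/data/rawevents/v1.csv": "a.com,alice\nb.com,bob\n",
    "/data/rawevents/v2.jsonl": (
        '{"url": "a.com", "user": "alice"}\n'
        '{"url": "b.com", "user": "bob"}\n'
    ),
}

# Hypothetical shared metadata: table name -> location, format, schema.
CATALOG = {
    "rawevents": TableInfo("/data/rawevents/v1.csv", "csv", ["url", "user"]),
}


def read_table(name):
    """Return rows as dicts, using only metadata looked up by table name."""
    info = CATALOG[name]
    raw = FAKE_FS[info.location]
    if info.fmt == "csv":
        return [dict(zip(info.columns, row)) for row in csv.reader(io.StringIO(raw))]
    if info.fmt == "jsonl":
        records = (json.loads(line) for line in raw.splitlines() if line)
        return [{col: rec[col] for col in info.columns} for rec in records]
    raise ValueError(f"unsupported format: {info.fmt}")


def users_of(name):
    """Application code: it names the table and nothing else."""
    return sorted(row["user"] for row in read_table(name))


before = users_of("rawevents")

# The data owner moves the table and changes its format; only metadata changes.
CATALOG["rawevents"] = TableInfo("/data/rawevents/v2.jsonl", "jsonl", ["url", "user"])

after = users_of("rawevents")
```

`users_of` is untouched by the move from CSV to JSON lines, which is the property the slide attributes to Pig and MapReduce jobs that read through HCatalog.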
Hadoop Ecosystem

[Diagram: three tools over a shared HDFS cluster]
   Tools:       Hive (SQL), Pig (scripting), MapReduce (Java)
   Interfaces:  SQL and SerDe (Hive), Load/Store (Pig)
   Data path:   DML through an Input/OutputFormat per tool to HDFS data nodes
                dn1, dn2, dn3, ... dnN
   Metadata:    DDL to the metastore (tables, partitions, files, types)
Opening up Metadata to MR & Pig

[Diagram: the same stack with an HCat metadata layer added]
   Tools:       Hive (SQL), Pig (scripting), MapReduce (Java)
   HCat metadata layer:
     Interfaces:  SQL (Hive), HCatLoad/Store (Pig), SerDe
     Data path:   HCatInput/OutputFormat
   Metastore:   tables, partitions, files, types
   Storage:     HDFS data nodes dn1, dn2, dn3, ... dnN
Pig Example
Assume you want to count how many times each of your users went to each of your URLs:

raw     = load '/data/rawevents/20120530' as (url, user);
botless = filter raw by myudfs.NotABot(user);
grpd    = group botless by (url, user);
cntd    = foreach grpd generate flatten(group), COUNT(botless);
store cntd into '/data/counted/20120530';

Using HCatalog:

-- No need to know file location, no need to declare schema
raw     = load 'rawevents' using HCatLoader();
-- Partition filter: ds == '20120530'
botless = filter raw by myudfs.NotABot(user) and ds == '20120530';
grpd    = group botless by (url, user);
cntd    = foreach grpd generate flatten(group), COUNT(botless);
store cntd into 'counted' using HCatStorer();
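
For readers who do not know Pig, the following Python sketch shows roughly what the HCatalog version of the script computes. The sample rows and the `not_a_bot` rule are invented for illustration; the slides do not show the logic of `myudfs.NotABot`.

```python
from collections import Counter

# Hypothetical rows of the partitioned table 'rawevents'; 'ds' is the partition column.
rows = [
    {"url": "a.com", "user": "alice", "ds": "20120530"},
    {"url": "a.com", "user": "alice", "ds": "20120530"},
    {"url": "b.com", "user": "bob", "ds": "20120530"},
    {"url": "a.com", "user": "crawler-bot", "ds": "20120530"},
    {"url": "a.com", "user": "alice", "ds": "20120531"},
]


def not_a_bot(user):
    """Stand-in for myudfs.NotABot; the real rule is not given in the slides."""
    return "bot" not in user


def count_visits(rows, ds):
    """Filter by partition and bot check, then count rows per (url, user) pair."""
    botless = (r for r in rows if r["ds"] == ds and not_a_bot(r["user"]))
    return Counter((r["url"], r["user"]) for r in botless)


counts = count_visits(rows, "20120530")
```

The `ds == ds` test plays the role of the partition filter: in the Pig script it selects a partition by value where the first version hard-coded a dated directory path.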




Working with HCatalog in MapReduce
Setting input:
// dbname: database to read from; tableName: table to read from;
// filter: specifies which partitions to read
HCatInputFormat.setInput(job,
  InputJobInfo.create(dbname, tableName, filter));

Setting output:
// partitionSpec: specifies which partition to write
HCatOutputFormat.setOutput(job,
  OutputJobInfo.create(dbname, tableName, partitionSpec));

Obtaining schema:
schema = HCatInputFormat.getOutputSchema();

Key is unused, Value is HCatRecord (access fields by name):
String url = (String) value.get("url", schema);
output.set("cnt", schema, cnt);



Managing Metadata
•  If you are a Hive user, you can use your Hive metastore with no modifications

•  If not, you can use the HCatalog command line tool to issue Hive DDL (Data
   Definition Language) commands:
> /usr/bin/hcat -e "create table rawevents (url string,
user string) partitioned by (ds string);"

•  Starting in Pig 0.11, you will be able to issue DDL commands from Pig




Templeton - REST API
•  REST endpoints: databases, tables, partitions, columns, table properties
•  PUT to create/update, GET to list or describe, DELETE to drop




[Diagram: three example REST calls from a client to Hadoop/HCatalog]
 •  Get a list of all tables in the default database
 •  Create new table “rawevents”
 •  Describe table “rawevents”
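
A hedged sketch of how these three calls might be built in Python. The host, port, and user name are assumptions, and the path layout (`/templeton/v1/ddl/database/<db>/table[/<table>]`) and the `columns` / `partitionedBy` body fields follow the WebHCat (Templeton) reference documentation; check them against the installed version. The helper `ddl_request` is a hypothetical name, and the code only constructs requests without sending them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:50111/templeton/v1"  # assumed host and port
USER = "alice"                                # hypothetical user name


def ddl_request(method, db, table=None, body=None):
    """Build (but do not send) a table-level DDL request."""
    path = f"{BASE}/ddl/database/{quote(db)}/table"
    if table is not None:
        path += f"/{quote(table)}"
    url = f"{path}?{urlencode({'user.name': USER})}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


# GET to list: all tables in the default database.
list_tables = ddl_request("GET", "default")

# PUT to create: the table from the Managing Metadata slide.
create_table = ddl_request(
    "PUT",
    "default",
    "rawevents",
    body={
        "columns": [
            {"name": "url", "type": "string"},
            {"name": "user", "type": "string"},
        ],
        "partitionedBy": [{"name": "ds", "type": "string"}],
    },
)

# GET to describe: a single table.
describe_table = ddl_request("GET", "default", "rawevents")

# Against a live server, urllib.request.urlopen(list_tables) would send the call.
```

Building `Request` objects separately from sending them keeps the URL and body logic easy to check without a cluster.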




HCatalog is in Hortonworks Data Platform…

[Diagram: Hortonworks Data Platform (HDP) stack]
   Layers:      Operational Services, Data Services, Hadoop Core, Platform Services
   Components:  Ambari, Flume, Pig, Hive, HBase, Sqoop, HCatalog, Oozie, WebHDFS,
                MapReduce, HDFS, YARN (in 2.0)
   Platform Services (Enterprise Readiness): High Availability, Disaster Recovery,
                Snapshots, Security, etc…
   Deployment options: OS, Cloud, VM, Appliance
Key 2013 “Enterprise Hadoop” Initiatives

[Diagram: initiatives arranged around the Hortonworks Data Platform (HDP) stack of
 Operational Services, Data Services, Hadoop Core, and Platform Services]
   •  Hive / “Stinger”: Interactive Query
   •  Ambari: Manage & Operate
   •  HBase: Online Data
   •  “Knox”: Secure Access
   •  “Herd”: Data Integration
   •  “Continuum”: Biz Continuity

Invest In:
   – Platform Services
       – DR, Snapshot, …
   – Data Services
       – In support of Refine, Explore, Enrich
   – Operational Services
       – Manageability, Security, …
Hive/Stinger: Interactive Query
Near-realtime queries in good old Hive…




Top BI Vendors Support Hive Today




Goal: Enhance Hive for BI Use Cases
[Diagram: BI use cases placed on a spectrum from Batch to Interactive]
   •  Enterprise Reports
   •  Parameterized Reports
   •  Dashboard / Scorecard
   •  Data Mining
   •  Visualization
More SQL & Better Performance

Differing Needs For Scale / Interaction
 Category         | Time scale | Use cases
 Interactive      | 5s – 1m    | Parameterized Reports, Drilldown, Visualization, Exploration
 Non-Interactive  | 1m – 1h    | Data preparation, Incremental batch processing, Dashboards / Scorecards
 Batch            | 1h+        | Operational batch processing, Enterprise Reports, Data Mining

 •  Interactivity is key at the Interactive end
 •  Scalability and Reliability are key toward the Batch end
 •  Axis: Data Size


Stinger: Make Hive Best for All Needs

 Category         | Time scale | Use cases
 Interactive      | 5s – 1m    | Parameterized Reports, Drilldown, Visualization, Exploration
 Non-Interactive  | 1m – 1h    | Data preparation, Incremental batch processing, Dashboards / Scorecards
 Batch            | 1h+        | Operational batch processing, Enterprise Reports, Data Mining

 Axis: Data Size

Improve Latency & Throughput
•  Query engine improvements
•  New “Optimized RCFile” column store
•  Next-gen runtime (eliminates M/R latency)

Extend Deep Analytical Ability
•  Analytics functions
•  Improved SQL coverage
•  Continued focus on core Hive use cases

Analytic Function Use Cases
• OVER
  – Rankings, top 10, bottom 10
  – Running balances
  – Statistics within time windows (e.g. last 3 months, last 6 months)
• LEAD / LAG
  – Trend identification
  – Sessionization
  – Forecasting / prediction
• Distributions
  – Histograms and bucketing
• Good for Enterprise Reports, Dashboards, Data
  Mining and Business Processing.
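
To make the use cases above concrete, here is a small pure-Python sketch of what windowed functions compute. It is an illustration only: the monthly sales figures are invented, and the SQL fragments in the comments are standard SQL shown for orientation, not syntax taken from the slides.

```python
from itertools import accumulate

# Hypothetical monthly sales figures, already in time order.
months = ["2013-01", "2013-02", "2013-03", "2013-04"]
sales = [100, 120, 90, 150]

# Running balance, roughly SUM(sales) OVER (ORDER BY month).
running_total = list(accumulate(sales))

# LAG(sales, 1): the previous row's value, with None for the first row.
lagged = [None] + sales[:-1]

# Trend identification: change against the previous month.
change = [None if prev is None else cur - prev for cur, prev in zip(sales, lagged)]

# Statistics within a time window: average over the current and two preceding rows,
# roughly AVG(sales) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW).
windows = [sales[max(0, i - 2): i + 1] for i in range(len(sales))]
moving_avg = [sum(w) / len(w) for w in windows]

# Rankings: the top 2 months by sales.
top2 = sorted(zip(months, sales), key=lambda pair: pair[1], reverse=True)[:2]
```

Each result has one value per input row (except the top-2 list), which is what distinguishes windowed functions from a plain GROUP BY aggregate.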


Stinger 2013 Roadmap Summary
• HDP 1.x (aka Hadoop 1.x …)
  – Additional SQL Types
  – SQL Analytic Functions (OVER, Subqueries in WHERE, etc.)
  – Modern Optimized Column Store (ORC file)
  – Hive Query Enhancements
       – Startup time, star joins, optimize M/R DAGs, vectorization, etc.


• HDP 2.x (aka Hadoop 2.x …)
  – Features in HDP 1.3 & 1.4
  – Next-gen runtime that eliminates startup time
  – Persistent function registry
  – Other features



Questions?




Š Hortonworks Inc. 2012   Page 23

Weitere ähnliche Inhalte

Was ist angesagt?

2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuningAnil Reddy
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopRan Ziv
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentContinuent
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystemStanley Wang
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with YarnDavid Kaiser
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopOvidiu Dimulescu
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011Milind Bhandarkar
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 

Was ist angesagt? (20)

2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuning
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
 
Hadoop
HadoopHadoop
Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Hadoop
Hadoop Hadoop
Hadoop
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 

Andere mochten auch

Mining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetMining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetJ On The Beach
 
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadamAdam Muise
 
Next Generation Hadoop Introduction
Next Generation Hadoop IntroductionNext Generation Hadoop Introduction
Next Generation Hadoop IntroductionAdam Muise
 
Good touch bad touch ppt
Good touch bad touch pptGood touch bad touch ppt
Good touch bad touch pptMonika Manchanda
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsAlbert Bifet
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Adam Muise
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
 
Good touch bad touch ppt
Good touch bad touch pptGood touch bad touch ppt
Good touch bad touch pptMonika Manchanda
 

Andere mochten auch (9)

Mining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert BifetMining big data streams with APACHE SAMOA by Albert Bifet
Mining big data streams with APACHE SAMOA by Albert Bifet
 
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
 
Next Generation Hadoop Introduction
Next Generation Hadoop IntroductionNext Generation Hadoop Introduction
Next Generation Hadoop Introduction
 
Good touch bad touch ppt
Good touch bad touch pptGood touch bad touch ppt
Good touch bad touch ppt
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data Streams
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Good touch bad touch ppt
Good touch bad touch pptGood touch bad touch ppt
Good touch bad touch ppt
 

Ähnlich wie 2013 feb 20_thug_h_catalog

HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
H cat berlinbuzzwords2012
H cat berlinbuzzwords2012H cat berlinbuzzwords2012
H cat berlinbuzzwords2012Hortonworks
 
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandRichard McDougall
 
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Big Data Spain
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwielerlucenerevolution
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Cloudera, Inc.
 
SQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsSQL-H a new way to enable SQL analytics
SQL-H a new way to enable SQL analyticsDataWorks Summit
 
Dancing with the Elephant
Dancing with the ElephantDancing with the Elephant
Dancing with the ElephantDataWorks Summit
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batchboorad
 
Strata feb2013
Strata feb2013Strata feb2013
Strata feb2013alanfgates
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Hortonworks
 
Enterprise linked data clouds
Enterprise linked data cloudsEnterprise linked data clouds
Enterprise linked data cloudsdamienjoyce
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 

Ähnlich wie 2013 feb 20_thug_h_catalog (20)

HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
H cat berlinbuzzwords2012
H cat berlinbuzzwords2012H cat berlinbuzzwords2012
H cat berlinbuzzwords2012
 
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
Apachecon Euro 2012: Elastic, Multi-tenant Hadoop on DemandApachecon Euro 2012: Elastic, Multi-tenant Hadoop on Demand
2013 feb 20_thug_h_catalog

  • 1. Toronto Hadoop User Group, February 20, 2013: Apache HCatalog Overview & Next-gen Hive. Adam Muise. © Hortonworks Inc. 2012
  • 3. Apache HCatalog
      • Incubator project at Apache.org
      • Good adoption
      • Will likely merge with the Hive project, as it adds important functionality to the metastore
      • Allows for a “schema-on-read” approach to Big Data in HDFS
      • Seat of a lot of innovation in Data Management
      • Used by platform partners to enhance Hadoop integration
      • Will likely be used to enhance existing Data Management products in the Enterprise and to create new products
  • 4. Great Tooling Options
      • MapReduce: early adopters; non-relational algorithms; performance-sensitive applications
      • Pig: ETL; data modeling; iterative algorithms
      • Hive: analysis; connectors to BI tools
      • Strength: right tool for the right application
      • Weakness: hard to share their data
  • 5. HCatalog Changes the Game
      Apache HCatalog provides flexible metadata services across tools and external access
      • Consistency of metadata and data models across the Enterprise (MapReduce, Pig, HBase, Hive, External Systems)
      • Accessibility: share data as tables in and out of HDFS
      • Availability: enables flexible, thin-client access via REST API
      [Diagram: without HCatalog: raw Hadoop data; inconsistent, unknown; tool-specific access. With HCatalog: table access, aligned metadata, REST API. Caption: shared table and schema management opens the platform.]
  • 6. Options == Complexity
      Feature        | MapReduce       | Pig                                           | Hive
      Record format  | Key-value pairs | Tuple                                         | Record
      Data model     | User defined    | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
      Schema         | Encoded in app  | Declared in script or read by loader          | Read from metadata
      Data location  | Encoded in app  | Declared in script                            | Read from metadata
      Data format    | Encoded in app  | Declared in script                            | Read from metadata
      • Pig and MR users need to know a lot to write their apps
      • When data schema, location, or format change, Pig and MR apps must be rewritten, retested, and redeployed
      • Hive users have to load data from Pig/MR users to have access to it
  • 7. HCatalog == Simple, Consistent
      Feature        | MapReduce + HCatalog                     | Pig + HCatalog                                | Hive
      Record format  | Record                                   | Tuple                                         | Record
      Data model     | int, float, string, maps, structs, lists | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
      Schema         | Read from metadata                       | Read from metadata                            | Read from metadata
      Data location  | Read from metadata                       | Read from metadata                            | Read from metadata
      Data format    | Read from metadata                       | Read from metadata                            | Read from metadata
      • Pig/MR users can read schema from metadata
      • Pig/MR users are insulated from schema, location, and format changes
      • All users have access to other users’ data as soon as it is committed
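The "read from metadata" cells on slide 7 can be illustrated with a toy model. This is a minimal sketch, not the HCatalog API: `ToyCatalog`, `TableInfo` and `plan_read` are hypothetical names, and the paths and formats are made up. It only shows why a reader that looks a table up by name keeps working when the data owner changes the location or the storage format.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    """Metadata a catalog might hold for one table (hypothetical shape)."""
    columns: list      # ordered (name, type) pairs
    location: str      # directory holding the data
    data_format: str   # e.g. "text" or "rcfile"


class ToyCatalog:
    """In-memory stand-in for a shared metastore; not the HCatalog API."""

    def __init__(self):
        self._tables = {}

    def register(self, name, info):
        # Registering an existing name overwrites its metadata.
        self._tables[name] = info

    def describe(self, name):
        return self._tables[name]


def plan_read(catalog, table):
    """A reader asks the catalog where and how to read, by table name only."""
    info = catalog.describe(table)
    return {
        "path": info.location,
        "format": info.data_format,
        "fields": [name for name, _ in info.columns],
    }


catalog = ToyCatalog()
catalog.register("rawevents", TableInfo(
    columns=[("url", "string"), ("user", "string")],
    location="/data/rawevents",
    data_format="text",
))
before = plan_read(catalog, "rawevents")

# The data owner moves and re-encodes the table; the reader code is unchanged.
catalog.register("rawevents", TableInfo(
    columns=[("url", "string"), ("user", "string")],
    location="/warehouse/rawevents",
    data_format="rcfile",
))
after = plan_read(catalog, "rawevents")

print(before["path"], "->", after["path"])
```

The reader never hard-codes a path or a format, which is the property the slide attributes to Pig and MapReduce jobs once they go through HCatalog.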
  • 8. Hadoop Ecosystem
      [Diagram: three tools over HDFS]
      • Hive (SQL): interface SQL; SerDe; Input/OutputFormat; DML and DDL
      • Pig (scripting): interface Load/Store; Input/OutputFormat
      • MapReduce (Java): interface Input/OutputFormat
      • Metastore: tables, partitions, files, types
      • HDFS: data nodes dn1, dn2, dn3, ... dnN
  • 9. Opening up Metadata to MR & Pig
      [Diagram: an HCat metadata layer sits between the tools and the metastore]
      • Hive (SQL): interface SQL; SerDe
      • Pig (scripting): interface HCatLoad/Store
      • MapReduce (Java): interface HCatInput/OutputFormat
      • Metastore: tables, partitions, files, types
      • HDFS: data nodes dn1, dn2, dn3, ... dnN
  • 10. Pig Example
      Assume you want to count how many times each of your users went to each of your URLs:
          raw = load '/data/rawevents/20120530' as (url, user);
          botless = filter raw by myudfs.NotABot(user);
          grpd = group botless by (url, user);
          cntd = foreach grpd generate flatten(group), COUNT(botless);
          store cntd into '/data/counted/20120530';
      Using HCatalog:
          raw = load 'rawevents' using HCatLoader();
          botless = filter raw by myudfs.NotABot(user) and ds == '20120530';
          grpd = group botless by (url, user);
          cntd = foreach grpd generate flatten(group), COUNT(botless);
          store cntd into 'counted' using HCatStorer();
      Callouts: with HCatLoader there is no need to know the file location or to declare the schema; the condition ds == '20120530' is a partition filter.
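The partition filter in the Pig example can be pictured as a lookup over partition metadata rather than a scan of every file. The sketch below is an illustration only: `prune_partitions` is a hypothetical helper, and the one-directory-per-day layout is an assumption borrowed from the path in the first script, not something the slides state about HCatalog's storage layout.

```python
def prune_partitions(partitions, wanted):
    """Return the locations of partitions whose key values match `wanted`.

    `partitions` is a list of (partition_spec, location) pairs, where
    partition_spec maps partition column names to values.
    """
    return [
        location
        for spec, location in partitions
        if all(spec.get(key) == value for key, value in wanted.items())
    ]


# Assumed layout: one directory per value of the partition column ds.
partitions = [
    ({"ds": "20120529"}, "/data/rawevents/20120529"),
    ({"ds": "20120530"}, "/data/rawevents/20120530"),
    ({"ds": "20120531"}, "/data/rawevents/20120531"),
]

# Equivalent in spirit to the filter ds == '20120530' in the Pig script.
selected = prune_partitions(partitions, {"ds": "20120530"})
print(selected)
```

Only the matching directory is handed to the job, so the script names the table and the day it wants and never spells out a path.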
  • 11. Working with HCatalog in MapReduce
      Setting input (database to read from, table to read from, which partitions to read):
          HCatInputFormat.setInput(job, InputJobInfo.create(dbname, tableName, filter));
      Setting output (which partition to write):
          HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbname, tableName, partitionSpec));
      Obtaining schema:
          schema = HCatInputFormat.getOutputSchema();
      Key is unused, value is HCatRecord; access fields by name:
          String url = value.get("url", schema);
          output.set("cnt", schema, cnt);
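The "access fields by name" calls on slide 11 pass the schema alongside the field name. One plausible reading is that the record stores values by position and the schema maps names to positions. The sketch below models that idea in Python; `ToySchema` and `ToyRecord` are hypothetical stand-ins and say nothing about how HCatRecord is actually implemented.

```python
class ToySchema:
    """Maps field names to positions (hypothetical, for illustration)."""

    def __init__(self, field_names):
        self._positions = {name: i for i, name in enumerate(field_names)}

    def position(self, name):
        return self._positions[name]

    def __len__(self):
        return len(self._positions)


class ToyRecord:
    """Positional record whose fields are read and written by name."""

    def __init__(self, schema):
        self._values = [None] * len(schema)

    def get(self, name, schema):
        return self._values[schema.position(name)]

    def set(self, name, schema, value):
        self._values[schema.position(name)] = value


schema = ToySchema(["url", "user", "cnt"])
output = ToyRecord(schema)
output.set("url", schema, "http://example.com/")
output.set("cnt", schema, 3)

print(output.get("url", schema), output.get("cnt", schema))
```

Because callers only use names, columns could be reordered in the schema without touching the code that reads or writes them.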
  • 12. Managing Metadata
      • If you are a Hive user, you can use your Hive metastore with no modifications
      • If not, you can use the HCatalog command line tool to issue Hive DDL (Data Definition Language) commands:
          > /usr/bin/hcat -e "create table rawevents (url string, user string) partitioned by (ds string);"
      • Starting in Pig 0.11, you will be able to issue DDL commands from Pig
  • 13. Templeton - REST API
      • REST endpoints: databases, tables, partitions, columns, table properties
      • PUT to create/update, GET to list or describe, DELETE to drop
      • Examples shown on the slide (the request text is not captured in this transcript): get a list of all tables in the default database; create new table “rawevents”; describe table “rawevents”
      [Diagram: requests are sent to Hadoop/HCatalog]
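The three requests named on slide 13 might be built as follows. This is a hedged sketch: the base URL (host, port 50111, and the `/templeton/v1/ddl/...` path), the `user.name` query parameter, the user `alice`, and the JSON field names `columns` and `partitionedBy` are assumptions about a typical Templeton/WebHCat setup and should be checked against the documentation for your version. The code only constructs the requests and does not send them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed server address and API prefix; adjust for a real cluster.
BASE = "http://localhost:50111/templeton/v1"


def ddl_request(method, path, user, body=None):
    """Build (but do not send) a DDL request for the given resource path."""
    url = f"{BASE}/ddl/{path}?{urlencode({'user.name': user})}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)


# GET to list: all tables in the default database.
list_tables = ddl_request("GET", "database/default/table", "alice")

# PUT to create: table "rawevents" with the columns from the hcat example.
create_table = ddl_request(
    "PUT",
    "database/default/table/rawevents",
    "alice",
    body={
        "columns": [
            {"name": "url", "type": "string"},
            {"name": "user", "type": "string"},
        ],
        "partitionedBy": [{"name": "ds", "type": "string"}],
    },
)

# GET to describe: the table just created.
describe_table = ddl_request("GET", "database/default/table/rawevents", "alice")

for req in (list_tables, create_table, describe_table):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it. A DELETE on the same table path would correspond to the "DELETE to drop" bullet.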
  • 14. HCatalog is in Hortonworks Data Platform…
      [Diagram: Hortonworks Data Platform (HDP) stack]
      • Layers: Operational Services, Data Services, Hadoop Core, Platform Services
      • Components shown: Ambari, Flume, Pig, Hive, HBase, Sqoop, HCatalog, Oozie, WebHDFS, MapReduce, HDFS, YARN (in 2.0)
      • Platform Services (Enterprise Readiness): High Availability, Disaster Recovery, Snapshots, Security, etc.
      • Deployment options: OS, Cloud, VM, Appliance
  • 15. Key 2013 “Enterprise Hadoop” Initiatives
      Invest in:
      – Platform Services: DR, Snapshot, …
      – Data Services: in support of Refine, Explore, Enrich
      – Operational Services: Manageability, Security, …
      [Diagram: initiatives placed around the HDP stack (Operational Services, Data Services, Hadoop Core, Platform Services)]
      • Hive / “Stinger”: Interactive Query
      • Ambari: Manage & Operate
      • HBase: Online Data
      • “Knox”: Secure Access
      • “Herd”: Data Integration
      • “Continuum”: Biz Continuity
  • 16. Hive/Stinger: Interactive Query. Near-realtime queries in good old Hive…
  • 17. Top BI Vendors Support Hive Today
  • 18. Goal: Enhance Hive for BI Use Cases
      • BI use cases: Enterprise Reports, Parameterized Reports, Dashboard / Scorecard, Data Mining, Visualization
      • More SQL & Better Performance
      [Diagram axis: Batch to Interactive]
  • 19. Differing Needs For Scale / Interaction
      Interactivity is key at one end; Scalability and Reliability are key at the other.
      Interactive (5s – 1m)     | Non-Interactive (1m – 1h)    | Batch (1h+)
      Parameterized Reports     | Data preparation             | Operational batch processing
      Drilldown                 | Incremental batch processing | Enterprise Reports
      Visualization             | Dashboards / Scorecards      | Data Mining
      Exploration               |                              |
      [Diagram axis: Data Size]
  • 20. Stinger: Make Hive Best for All Needs
      Interactive (5s – 1m)     | Non-Interactive (1m – 1h)    | Batch (1h+)
      Parameterized Reports     | Data preparation             | Operational batch processing
      Drilldown                 | Incremental batch processing | Enterprise Reports
      Visualization             | Dashboards / Scorecards      | Data Mining
      Exploration               |                              |
      [Diagram axis: Data Size]
      Improve Latency & Throughput
      • Query engine improvements
      • New “Optimized RCFile” column store
      • Next-gen runtime (eliminates M/R latency)
      Extend Deep Analytical Ability
      • Analytics functions
      • Improved SQL coverage
      • Continued focus on core Hive use cases
  • 21. Analytic Function Use Cases
      • OVER
          – Rankings, top 10, bottom 10
          – Running balances
          – Statistics within time windows (e.g. last 3 months, last 6 months)
      • LEAD / LAG
          – Trend identification
          – Sessionization
          – Forecasting / prediction
      • Distributions
          – Histograms and bucketing
      • Good for Enterprise Reports, Dashboards, Data Mining and Business Processing.
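To make the OVER and LAG use cases on slide 21 concrete, the sketch below emulates two window computations in plain Python: a running balance per user and the previous row's value per user. The data and the `windowed` function are invented for illustration. In SQL these would roughly correspond to `SUM(amount) OVER (PARTITION BY user ORDER BY month)` and `LAG(amount) OVER (PARTITION BY user ORDER BY month)`, features the deck lists as planned for Hive.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Made-up sample data: one row per user per month.
rows = [
    {"user": "ann", "month": 1, "amount": 50},
    {"user": "bob", "month": 2, "amount": 40},
    {"user": "ann", "month": 3, "amount": 70},
    {"user": "ann", "month": 2, "amount": 20},
    {"user": "bob", "month": 1, "amount": 10},
]


def windowed(rows):
    """Add a running total and the previous amount within each user partition."""
    out = []
    # Sort by partition key, then by the ordering column inside the window.
    ordered = sorted(rows, key=itemgetter("user", "month"))
    for _, part in groupby(ordered, key=itemgetter("user")):
        part = list(part)
        running = list(accumulate(r["amount"] for r in part))
        for i, row in enumerate(part):
            out.append({
                **row,
                "running_total": running[i],                               # SUM(...) OVER
                "prev_amount": part[i - 1]["amount"] if i > 0 else None,   # LAG(...)
            })
    return out


result = windowed(rows)
for r in result:
    print(r["user"], r["month"], r["running_total"], r["prev_amount"])
```

Comparing `amount` with `prev_amount` row by row is the basis of the trend-identification and sessionization use cases the slide mentions.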
  • 22. Stinger 2013 Roadmap Summary
      • HDP 1.x (aka Hadoop 1.x …)
          – Additional SQL Types
          – SQL Analytic Functions (OVER, Subqueries in WHERE, etc.)
          – Modern Optimized Column Store (ORC file)
          – Hive Query Enhancements: startup time, star joins, optimize M/R DAGs, vectorization, etc.
      • HDP 2.x (aka Hadoop 2.x …)
          – Features in HDP 1.3 & 1.4
          – Next-gen runtime that eliminates startup time
          – Persistent function registry
          – Other features