SlideShare ist ein Scribd-Unternehmen logo
1 von 12
Data Integration in 2013:
  A working session
  Adam Muise
  March 26 2013




Note: This deck is purposely sparse. Want value?
Join the conversation in the Toronto Hadoop User
Group:
http://www.meetup.com/TorontoHUG/

  © Hortonworks Inc. 2012
Proposed Agenda
•   Introductions
•   Discuss common Data Integration Patterns
•   Round-table of User Group Member CDC/ETL Use Cases
•   New Data Integration Solutions: A change from the Old Guard:
    –   Hadoop and the Data Lake
    –   Streaming (+ Hadoop)
    –   Data Lake Governance / Management (InfoTrellis)
    –   Databus (LinkedIn)




                                                              Page 2
         © Hortonworks Inc. 2012
Introductions
Who let you in?




                               Page 3
     © Hortonworks Inc. 2012
General Data Integration Patterns
• Enterprise Application Integration*
       – Metadata lookup
       – Validation
       – Extra-app communication

• Enterprise Service Bus (SOA, Message Bus/Hub)*
• Federation*
       – Bridging multiple databases with a query layer
       – Eg: Composite

• Extract Transform Load (ETL)*
       – Collection
       – Aggregation
       – Format/Schema transformation

• Data Lake
       – Landing Zone for multiple datasets in one store
       – Mixed schema, often raw structured/unstructured data
       – Eg: Hadoop

* Source: Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture, Anthony David Giordano, 2010, IBM Press.


                                                                                                                                                      Page 4
                © Hortonworks Inc. 2012
Use Case Roundtable
Data that’s keeping you up at night…




                                       Page 5
     © Hortonworks Inc. 2012
Scotia iTrade: Geoffrey Li




                              Page 6
    © Hortonworks Inc. 2012
New Data Integration Solutions

Fresh Ideas to new and old problems…




                                       Page 7
     © Hortonworks Inc. 2012
Hadoop: The Data Lake


                                               Publish Event
                                                Signal Data
                                              Transformation


                                 Model/                  Transform &
                             Apply Metadata               Aggregate
                                                                        Publish
                                                                       Exchange




                                                                       Explore
                                                                       Visualize
      Extract &                                                         Report
      Load




                                                                          Analyze




                                                                                    Page 8
   © Hortonworks Inc. 2012
Streaming & Hadoop




http://developer.yahoo.com/blogs/ydn/posts/2013/02/storm-and-hadoop-convergence-of-big-data-and-low-latency-processing/

                                                                                                                      Page 9
            © Hortonworks Inc. 2012
Streaming & Hadoop




http://developer.yahoo.com/blogs/ydn/posts/2013/02/storm-and-hadoop-convergence-of-big-data-and-low-latency-processing/

                                                                                                                     Page 10
            © Hortonworks Inc. 2012
DataBus (LinkedIn)
Databus is a low latency change capture system which has become an
integral part of LinkedIn’s data processing pipeline. Databus addresses a
fundamental requirement to reliably capture, flow and processes primary
data changes. Databus provides the following features:
   1.    Isolation between sources and consumers
   2.    Guaranteed in order and at least once delivery with high availability
   3.    Consumption from an arbitrary time point in the change stream including full bootstrap
         capability of the entire data.
   4.    Partitioned consumption
   5.    Source consistency preservation




  https://github.com/linkedin/databus/wiki

                                                                                                  Page 11
           © Hortonworks Inc. 2012
DataBus (LinkedIn)




 https://github.com/linkedin/databus/wiki


                                            Page 12
          © Hortonworks Inc. 2012

Weitere ähnliche Inhalte

Was ist angesagt?

Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Cloudera, Inc.
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Kolja Manuel Rödel
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveGeekNightHyderabad
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches DataWorks Summit
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXBMC Software
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoopMaulik Thaker
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 

Was ist angesagt? (20)

Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
 
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
Hadoop World 2011: Data Ingestion, Egression, and Preparation for Hadoop - Sa...
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Semantic Web For Dummies
Semantic Web For DummiesSemantic Web For Dummies
Semantic Web For Dummies
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's Perspective
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Data-In-Motion Unleashed
Data-In-Motion UnleashedData-In-Motion Unleashed
Data-In-Motion Unleashed
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches Implementing and running a secure datalake from the trenches
Implementing and running a secure datalake from the trenches
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
 
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The EDW Ecosystem
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoop
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 

Ähnlich wie 2013 march 26_thug_etl_cdc_talking_points

Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?Hortonworks
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview EMC
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoopHortonworks
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applicationsrussell_jurney
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...Big Data Spain
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopMark Ginnebaugh
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 

Ähnlich wie 2013 march 26_thug_etl_cdc_talking_points (20)

Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applications
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
Hortonworks Big Data & Hadoop
Hortonworks Big Data & HadoopHortonworks Big Data & Hadoop
Hortonworks Big Data & Hadoop
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 

Mehr von Adam Muise

Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadamAdam Muise
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopAdam Muise
 
2014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part12014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part1Adam Muise
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoopAdam Muise
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLAdam Muise
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101Adam Muise
 
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitectureAdam Muise
 
2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mda2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mdaAdam Muise
 
2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - Hadoop2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - HadoopAdam Muise
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
What is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMACWhat is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMACAdam Muise
 
What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013Adam Muise
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionAdam Muise
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalogAdam Muise
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012Adam Muise
 
2012 sept 18_thug_biotech
2012 sept 18_thug_biotech2012 sept 18_thug_biotech
2012 sept 18_thug_biotechAdam Muise
 
hadoop 101 aug 21 2012 tohug
 hadoop 101 aug 21 2012 tohug hadoop 101 aug 21 2012 tohug
hadoop 101 aug 21 2012 tohugAdam Muise
 

Mehr von Adam Muise (20)

Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam2015 feb 24_paytm_labs_intro_ashwin_armandoadam
2015 feb 24_paytm_labs_intro_ashwin_armandoadam
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of Hadoop
 
2014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part12014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part1
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoop
 
May 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETLMay 29, 2014 Toronto Hadoop User Group - Micro ETL
May 29, 2014 Toronto Hadoop User Group - Micro ETL
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
 
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
2014 feb 24_big_datacongress_hadoopsession2_moderndataarchitecture
 
2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mda2014 feb 5_what_ishadoop_mda
2014 feb 5_what_ishadoop_mda
 
2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - Hadoop2013 Dec 9 Data Marketing 2013 - Hadoop
2013 Dec 9 Data Marketing 2013 - Hadoop
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
What is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMACWhat is Hadoop? Nov 20 2013 - IRMAC
What is Hadoop? Nov 20 2013 - IRMAC
 
What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog2013 feb 20_thug_h_catalog
2013 feb 20_thug_h_catalog
 
KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012KnittingBoar Toronto Hadoop User Group Nov 27 2012
KnittingBoar Toronto Hadoop User Group Nov 27 2012
 
2012 sept 18_thug_biotech
2012 sept 18_thug_biotech2012 sept 18_thug_biotech
2012 sept 18_thug_biotech
 
hadoop 101 aug 21 2012 tohug
 hadoop 101 aug 21 2012 tohug hadoop 101 aug 21 2012 tohug
hadoop 101 aug 21 2012 tohug
 

2013 march 26_thug_etl_cdc_talking_points

  • 1. Data Integration in 2013: A working session Adam Muise March 26 2013 Note: This deck is purposely sparse. Want value? Join the conversation in the Toronto Hadoop User Group: http://www.meetup.com/TorontoHUG/ © Hortonworks Inc. 2012
  • 2. Proposed Agenda • Introductions • Discuss common Data Integration Patterns • Round-table of User Group Member CDC/ETL Use Cases • New Data Integration Solutions: A change from the Old Guard: – Hadoop and the Data Lake – Streaming (+ Hadoop) – Data Lake Governance / Management (InfoTrellis) – Databus (LinkedIn) Page 2 © Hortonworks Inc. 2012
  • 3. Introductions Who let you in? Page 3 © Hortonworks Inc. 2012
  • 4. General Data Integration Patterns • Enterprise Application Integration* – Metadata lookup – Validation – Extra-app communication • Enterprise Service Bus (SOA, Message Bus/Hub)* • Federation* – Bridging multiple databases with a query layer – Eg: Composite • Extract Transform Load (ETL)* – Collection – Aggregation – Format/Schema transformation • Data Lake – Landing Zone for multiple datasets in one store – Mixed schema, often raw structured/unstructured data – Eg: Hadoop * Source: Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture, Anthony David Giordano, 2010, IBM Press. Page 4 © Hortonworks Inc. 2012
  • 5. Use Case Roundtable Data that’s keeping you up at night… Page 5 © Hortonworks Inc. 2012
  • 6. Scotia iTrade: Geoffrey Li Page 6 © Hortonworks Inc. 2012
  • 7. New Data Integration Solutions Fresh Ideas to new and old problems… Page 7 © Hortonworks Inc. 2012
  • 8. Hadoop: The Data Lake Publish Event Signal Data Transformation Model/ Transform & Apply Metadata Aggregate Publish Exchange Explore Visualize Extract & Report Load Analyze Page 8 © Hortonworks Inc. 2012
  • 11. DataBus (LinkedIn) Databus is a low latency change capture system which has become an integral part of LinkedIn’s data processing pipeline. Databus addresses a fundamental requirement to reliably capture, flow and processes primary data changes. Databus provides the following features: 1. Isolation between sources and consumers 2. Guaranteed in order and at least once delivery with high availability 3. Consumption from an arbitrary time point in the change stream including full bootstrap capability of the entire data. 4. Partitioned consumption 5. Source consistency preservation https://github.com/linkedin/databus/wiki Page 11 © Hortonworks Inc. 2012