Data Management Platform on Hadoop
Srikanth Sundarrajan
Venkatesh Seetharam
Apache Falcon (Incubating)
whoami
Srikanth Sundarrajan
• Principal Architect, InMobi
• Apache Hadoop Contributor
• Hadoop Team @ Yahoo!
Venkatesh Seetharam
• Architect/Developer, Hortonworks
• Apache Hadoop Contributor
• Data Management @ Yahoo!
Agenda
1 Motivation
2 Falcon Overview
3 Case Studies
4 Questions & Answers
MOTIVATION
Data Processing Landscape
External data source
Acquire (Import)
Data Processing (Transform/Pipeline)
Eviction
Archive
Replicate (Copy)
Export
Core Services
Process
• Late data management
• Relays
Data management
• Acquisition
• Replication
• Retention
Operability
• SLA
• Lineage
Process Management – Relays
Picture courtesy: http://istockphoto.com/
Late Data Management
Picture courtesy: http://iwebask.com
Data Retention As Service
Picture courtesy: http://vimeo.com/
Data Replication As Service
Picture courtesy: http://boylesmedia.com
Data Acquisition As Service
Picture courtesy: http://wmpu.org
Operability – Dashboard
Picture courtesy: http://www.opentrack.ch/
FALCON OVERVIEW
Holistic Declaration of Intent
Picture courtesy: http://bigboxdetox.com
Entity Dependency Graph
[Diagram: the entities Cluster (Hadoop / HBase …), External data source, Feed, and Process, linked by "depends" edges]
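The dependency graph above can be sketched in a few lines of Python. This is only an illustration, not Falcon's implementation: the entity names are hypothetical, and the edge directions (a feed depends on a cluster; a process depends on a cluster and on feeds) are an assumption read from the diagram labels. It shows how such a graph could yield a safe order for submitting entities and a way to check which entities still reference a given one.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical entities; the names are illustrative and not from the slides.
# Each key depends on every entity in its value set (assumed edge direction).
dependencies = {
    "cluster:primary": set(),
    "feed:clicks": {"cluster:primary"},
    "feed:hourly-summary": {"cluster:primary"},
    "process:summarizer": {"cluster:primary", "feed:clicks", "feed:hourly-summary"},
}


def submission_order(deps):
    """Return entities ordered so each one comes after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


def dependents_of(deps, entity):
    """Return the entities that directly depend on `entity`."""
    return sorted(name for name, needs in deps.items() if entity in needs)


order = submission_order(dependencies)
blockers = dependents_of(dependencies, "feed:clicks")
```

Here `order` lists the cluster before the feeds and the feeds before the process, while `blockers` names the entities that would have to be removed before `feed:clicks` could be deleted safely.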
High Level Architecture
[Diagram: Apache Falcon alongside Oozie, Messaging, HCatalog, and Hadoop. Labels: Entity, Entity status, Process status / notification, CLI/REST, JMS, Config store]
Feed Schedule
[Diagram labels: Cluster xml, Feed xml, Falcon, Falcon config store / Graph, Retention / Replication workflow, Oozie Scheduler, HDFS, JMS Notification per action, Catalog service, Instance Management]
Process Schedule
[Diagram labels: Cluster/feed xml, Process xml, Falcon, Falcon config store / Graph, Process workflow, Oozie Scheduler, HDFS, JMS Notification per available feed, Catalog service, Instance Management]
Physical Architecture
[Diagram labels: Falcon Colo 1, Falcon Colo 2, Falcon Colo 3, each with a Scheduler; Falcon – Prism; Global view]
CASE STUDY
Multi Cluster Failover
CASE STUDY
Distributed Processing
Example: Digital Advertising @ InMobi
Hadoop @ InMobi
• About InMobi
  • World's leading independent mobile advertising company
• Hadoop usage at InMobi
  • ~ 6 clusters
  • > 1PB of storage
  • > 5TB new data ingested each day
  • > 20TB data crunched each day
  • > 200 nodes in HDFS/MR clusters & > 40 nodes in HBase
  • > 175K Hadoop jobs / day
  • > 60K Oozie workflows / day
  • 300+ Falcon feed definitions
  • 100+ Falcon process definitions
Processing – Single Data Center
[Diagram labels]
Inputs: Ad Request data, Impression render event, Click event, Conversion event
Continuous Streaming (minutely)
Hourly summary
Enrichment (minutely/5 minutely)
Summarizer
Global Aggregation
[Diagram labels]
Per data center (DataCenter1 … DataCenterN):
Inputs: Ad Request data, Impression render event, Click event, Conversion event
Continuous Streaming (minutely)
Hourly summary
Enrichment (minutely/5 minutely)
Summarizer
Across data centers:
Consumable global aggregate
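As a rough illustration of the global aggregation step, the sketch below sums per-data-center hourly summaries into one global record. The data center keys, metric names, and numbers are all made up for the example, and the slides do not say how InMobi actually computes the aggregate.

```python
from collections import Counter

# Hypothetical hourly summaries keyed by data center; names and numbers are made up.
hourly_summaries = {
    "DataCenter1": {"requests": 1200, "impressions": 900, "clicks": 45, "conversions": 3},
    "DataCenter2": {"requests": 800, "impressions": 650, "clicks": 30, "conversions": 2},
}


def global_aggregate(per_datacenter):
    """Sum each metric across all data centers into one consumable record."""
    total = Counter()
    for summary in per_datacenter.values():
        total.update(summary)  # Counter.update adds counts instead of replacing them
    return dict(total)


aggregate = global_aggregate(hourly_summaries)
```

In this toy form the global aggregate can only be produced once the hourly summary from every data center is available for the same hour.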
HIGHLIGHTS
Future
Security
Embed Pig/Hive scripts
Data Acquisition – file-based
Monitoring/Management Dashboard
Summary
Questions?
• Apache Falcon
  • http://falcon.incubator.apache.org
  • mailto: dev@falcon.incubator.apache.org
• Srikanth Sundarrajan
  • sriksun@apache.org
  • #sriksun
• Venkatesh Seetharam
  • venkatesh@apache.org
  • #innerzeal

 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Falcon - Data Management Platform on Hadoop (Beyond ETL)

Editor's notes

  1. In a typical big data environment built on Hadoop, the use cases revolve around processing very large volumes of data for either machine or human consumption. Some of the data that reaches the Hadoop platform can contain critical business and financial information. The data processing team in such an environment is often distracted by a multitude of data management and process orchestration challenges. To name a few:
     • Ingesting large volumes of events/streams
     • Ingesting slowly changing data, typically available in a traditional database
     • Creating a pipeline / sequence of processing logic to extract the desired piece of insight / information
     • Handling processing complexities relating to changes in data or failures
     • Managing eviction of older data elements
     • Backing up the data in an alternate location, or archiving it in cheaper storage, for DR/BCP and compliance requirements
     • Shipping data out of the Hadoop environment periodically for machine or human consumption
     These tend to be standard challenges that are better handled in a platform, which allows the data processing team to focus on its core business application. A platform approach also lets us adopt best practices in solving each of these, for subsequent users of the platform to leverage.
     What do we mean by a data management platform?
     • The platform should provide these as services, so that users only worry about business processing
     • Captures common themes and follows best practices
     • Frees users from such
  2. As we just noted, there are numerous data and process management services that, when made available to the data processing team, can reduce their day-to-day complexity significantly and allow them to focus on their business application. This is an enumeration of such services, which we intend to cover in adequate detail as we go along.
  3. More often than not, pipelines are sequences of data processing or data movement tasks that need to happen before raw data can be transformed into a meaningfully consumable form. Normally the end stage of the pipeline, where the final data sets are produced, is in the critical path and may be subject to tight SLA bounds. If any step in the pipeline is delayed or fails, the pipeline can stall. It is important that each step hands off to the next without any time buffering, so that the pipeline progresses seamlessly. People who are familiar with Apache Oozie may appreciate this feature, which is provided through the Coordinator. As pipelines become more time critical and time sensitive, this matters more and more, and it ought to be available off the shelf for application developers. It is also important for this feature to be scalable enough to support the needs of concurrent pipelines.
  4. From our experience, there are typically two reasons why large volumes of data are processed:
     • SLA-critical, machine-consumable data (with some tolerance for error)
     • Factual reporting with a “close of books” notion for human consumption (not always, but frequently enough)
     The first class of applications is not affected much if a small percentage of data arrives late. Examples of this class include forecasting, predictions and risk management. The second class, however, is used for factual reporting, the results of which may be subject to audit. For these use cases, it is not acceptable to ignore data that arrived out of order or late. The platform in such cases needs to give the application author the ability to detect the arrival of late data and trigger re-processing. This might also require a cascading reprocess of all downstream applications. Having this service available off the shelf would relieve application developers of the pain of having to manage it themselves.
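The late-data handling described in note 4 can be illustrated with a minimal Python sketch. It is not Falcon's implementation: the function names, the timestamp representation and the `consumers` mapping are all hypothetical, chosen only to show the two ideas in the note (detecting data that arrived after an instance was processed, and cascading the reprocess to downstream applications).

```python
from collections import deque


def late_instances(processed_at, arrivals):
    """Return ids of instances that received data after they were processed.

    processed_at: dict mapping instance id -> time the instance was processed
    arrivals:     iterable of (instance id, arrival time) pairs
    Both structures are assumptions made for this sketch.
    """
    late = set()
    for instance, arrived_at in arrivals:
        if instance in processed_at and arrived_at > processed_at[instance]:
            late.add(instance)
    return sorted(late)


def cascade(start, consumers):
    """Breadth-first walk of the applications downstream of `start`.

    consumers maps an application to the applications that read its output.
    The breadth-first order is a valid run order for tree-shaped pipelines
    only; shared downstream nodes would need a topological sort.
    """
    order, seen, queue = [], {start}, deque([start])
    while queue:
        app = queue.popleft()
        order.append(app)
        for nxt in consumers.get(app, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

A caller would first ask `late_instances` which instances need another run, then use `cascade` to list every application that has to be re-run for each of them.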
  5. Data volumes that are large and growing by the day are the reason one adopts a big data platform like Hadoop, which automatically means that we would run out of space pretty soon if we did not take care of evicting and purging older instances of data. A few problems to consider for retention:
     • Avoid using a general-purpose superuser with world-writable privileges to delete old data (for obvious reasons)
     • Different types of data may require different criteria for aging, and hence purging
     • Other life cycle functions, such as archival of old data, if defined, ought to be scheduled before eviction kicks in
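Two of the retention concerns in note 5 (per-data-type aging criteria, and archival running before eviction) can be sketched in a few lines of Python. This is an illustration under assumptions, not Falcon's retention logic: `plan_retention`, its parameters and the action names are hypothetical.

```python
from datetime import datetime, timedelta


def plan_retention(instances, limit, now, archive=False):
    """Return ordered (action, instance) steps for instances older than `limit`.

    instances: timestamps identifying the data instances of one data set
    limit:     how long this data set is retained (differs per data type)
    archive:   when True, each expired instance is archived before eviction
    """
    cutoff = now - limit
    steps = []
    for ts in sorted(ts for ts in instances if ts < cutoff):
        if archive:
            # Archival is always planned ahead of eviction of the same instance.
            steps.append(("archive", ts))
        steps.append(("evict", ts))
    return steps
```

Each data set gets its own `limit` and `archive` setting, so a minutely event feed and an hourly summary can age out on different schedules.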
  6. Hadoop is becoming increasingly critical for many businesses. For some users, the raw data volumes are too large to be shipped to one place for processing; for others, data needs to be redundantly available for business continuity reasons. In either scenario, replication of data from one cluster to another plays a vital role. Having this available as a service would again free the application developer from these responsibilities. The key challenges to consider while offering this as a service are:
     • Bandwidth consumption and management
     • Chunking/bulking strategy
     • Correctness guarantees
     • HDFS version compatibility issues
     Two dimensions:
     • BCP/DR
     • Local/global aggregation: ship local aggregates as part of a pipeline
  7. An integrated view of what is currently happening in the system, based on holistic information about all the elements in the system (data, associated management functions, processing logic and location), provides a compelling view of the “state of the system” at any time. This is a much-needed platform feature for the larger goal of “allowing the data application developer to focus on the business or processing logic”. Adding alerting and notifications completes the operability story.
     • Dashboard
     • Alerts
     • Notifications
  8. Some of the things we have spoken about so far can be done with a siloed approach. For instance, it is possible to process a few data sets and produce a few more through a scheduler. However, if there are two other consumers of the data produced by the first workflow, the same data will be defined again by each of those consumers, and so on. There is serious duplication of metadata about what data is ingested, processed or produced, where it is processed and how it is produced. A single system that builds a complete view of this can provide a far more complete picture of what is happening than a collection of independently scheduled applications. Both the production support and application development teams on the Hadoop platform have to scramble to write custom scripts and monitoring systems to get a broader, holistic view of what is happening. An approach where this information is systemically collected and used for seamless management can alleviate much of the pain of those operating or developing data processing applications on Hadoop.
  9. The entity graph at its core is what makes Falcon what it is, and in a way it enables all the unique features that Falcon offers or can potentially make available in the future. At the core:
     • Dependency between data processing logic and cluster end points
     • Rules governing data management, processing management and metadata management
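To make the entity graph in note 9 concrete, here is a minimal Python sketch of a dependency graph over clusters, feeds and processes. It is only an illustration: the `EntityGraph` class, its methods and the `kind:name` naming scheme are hypothetical and do not reflect Falcon's actual API or storage.

```python
class EntityGraph:
    """Tracks which entities (cluster, feed, process) depend on which others."""

    def __init__(self):
        self._deps = {}  # entity name -> set of entity names it depends on

    def add(self, name, depends_on=()):
        """Register an entity; everything it depends on must already exist."""
        missing = [d for d in depends_on if d not in self._deps]
        if missing:
            raise ValueError(f"{name} depends on unknown entities: {missing}")
        self._deps[name] = set(depends_on)

    def dependents(self, name):
        """Return the entities that directly depend on `name`, sorted by name."""
        return sorted(e for e, deps in self._deps.items() if name in deps)

    def remove(self, name):
        """Delete an entity, refusing while other entities still depend on it."""
        users = self.dependents(name)
        if users:
            raise ValueError(f"{name} is still used by: {users}")
        del self._deps[name]
```

With the dependencies held in one place, questions such as "which feeds and processes are affected if this cluster goes away" become a lookup instead of a search across independently scheduled jobs.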
  10. • System accepts entities using a DSL: infrastructure, datasets, pipeline/processing logic
      • Transforms the input into automated and scheduled workflows
      • System orchestrates workflows
      • Instruments execution of configured policies
      • Handles retry logic and late data processing
      • Records audit, lineage
      • Seamless integration with metastore/catalog (WIP)
      • Provides notifications based on availability
      • Integrated, seamless experience for users
      • Automates processing and tracks the end-to-end progress
      • Data set management (replication, retention, etc.) offered as a service
      • Users can cherry-pick; no coupling between primitives
      • Provides hooks for monitoring, metrics collection
  11. • Ad Request, Click, Impression, Conversion feed
        • Minutely (with identical location and retention configuration, but with many data centers)
      • Summary data
        • Hourly (with multiple partitions, one per data center, each configured as a source, and one target, which is the global data center)
      • Click, Impression, Conversion enrichment & Summarizer processes
        • Single definition with multiple data centers
        • Identical periodicity and scheduling configuration