Min Tu, Pradhan Cadabam
Gobblin Configuration Management
Gobblin Meetup June 2016
Agenda
1. Current Solutions and Motivation – Why we built Gobblin config?
2. Architecture – Gobblin config internals
3. Retention Example – How retention is configured using Gobblin config?
Job Configs Vs. Dataset Configs
Copy Job
- Permission for loginEvent 700
- Permission for logoutEvent 777
Source: /events/loginEvent, /events/logoutEvent
Destination: /events/loginEvent - 700, /events/logoutEvent - 777
Option 1 : One job per dataset
Copy Job 1: dest.permission = 700, whitelist = loginEvent
Copy Job 2: dest.permission = 777, whitelist = logoutEvent
- Too many jobs
- Long whitelist
- Difficult to maintain
Option 2 : Prefix
Copy Job with prefix: loginEvent.dest.permission = 700, logoutEvent.dest.permission = 777
- Too many configs
- Cannot have a single config for all datasets with the same permissions
Data Life Cycle Management Configs
[Diagram: /events/loginEvent_Avro → Conversion Job → /events/loginEvent_Orc → Copy Job → /events/loginEvent_Orc, with Retention Jobs attached to the dataset paths]
• Shared configs across jobs
• Destination path of conversion job is source path of copy job
• Retention job works on destination path of copy job
• Dataset needs to be enabled in all jobs
Other Motivations
• New version of configs should be deployable without deploying new binaries
• Should be easy to roll back to the previous stable version of configs
• Config changes should have an audit trail
• Complex value types and substitution resolution support
2. Architecture – Gobblin config internals
Dataset Configuration Management
At a very high level, we extend Typesafe Config with:
• Abstraction of a Config Store
• Config versioning
• Support for logical “import” URIs
• Ability to traverse the “import” relationships
Architecture
Client Application
ConfigClient API
ConfigStore API
HadoopFS Store | Hive MetaStore Adapter | MySQL Adapter | Zookeeper Adapter | …
Data Model
Config Store
Dataset config key (URI): /events
Dataset config key (URI): /events/loginEvent (implicit import of /events)
  Key1: value1
  Key2: value2
  …
  KeyM: valueM
Tag config key (URI): /tags
Tag config key (URI): /tags/highPriority (implicit import of /tags)
  keyA: valueX
  keyB: valueY
/events/loginEvent imports /tags/highPriority; /tags/highPriority is imported by /events/loginEvent
HOCON format
• Supports Java properties files
• Supports JSON files
• Value substitution
• “+=” syntax to append elements to arrays, e.g. path += "/bin"
• …
gobblin.retention : {
  selection {
    timeBased.lookbackTime=3y
  }
}
Using Configs in code
ConfigClient client =
ConfigClient.createConfigClient(VersionStabilityPolicy policy);
Config config = client.getConfig(URI uri);
Collection<URI> imports = client.getImports(URI dataset, boolean recursive);
Collection<URI> importedBy = client.getImportedBy(URI tag, boolean recursive);
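To make the `recursive` flag concrete, here is a minimal, self-contained sketch of import traversal over a toy in-memory graph. It is not the Gobblin implementation: the class `ImportGraphSketch`, its maps, and the choice to treat a key's parent as an implicit import are assumptions made for illustration, using the example keys from this talk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of import traversal; hypothetical, not the real ConfigClient.
final class ImportGraphSketch {
    // Explicit imports per config key (assumed example data).
    static final Map<String, List<String>> EXPLICIT = Map.of(
        "/events/shareEvent", List.of("/tags/highPriority"),
        "/events/clickEvent", List.of("/tags/blacklist"));

    // Every config key known to this toy store.
    static final List<String> ALL_KEYS = List.of(
        "/events", "/events/loginEvent", "/events/shareEvent", "/events/clickEvent",
        "/tags", "/tags/highPriority", "/tags/blacklist");

    // Direct imports: explicit ones first, then the implicit parent (assumption).
    static List<String> directImports(String key) {
        List<String> out = new ArrayList<>(EXPLICIT.getOrDefault(key, List.of()));
        int slash = key.lastIndexOf('/');
        if (slash > 0) {
            out.add(key.substring(0, slash));
        }
        return out;
    }

    // recursive=false returns only direct imports; true returns the transitive closure.
    static Set<String> imports(String key, boolean recursive) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(directImports(key));
        while (!todo.isEmpty()) {
            String next = todo.poll();
            if (seen.add(next) && recursive) {
                todo.addAll(directImports(next));
            }
        }
        return seen;
    }

    // Reverse lookup: every key whose imports contain the given key.
    static Set<String> importedBy(String target, boolean recursive) {
        Set<String> out = new LinkedHashSet<>();
        for (String key : ALL_KEYS) {
            if (!key.equals(target) && imports(key, recursive).contains(target)) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(imports("/events/shareEvent", false));
        System.out.println(imports("/events/shareEvent", true));
        System.out.println(importedBy("/tags/highPriority", true));
    }
}
```

In this toy graph, the non-recursive call for /events/shareEvent stops at its direct imports, while the recursive call also follows /tags/highPriority up to /tags.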
Config lifecycle at LinkedIn
Example of a config store on HDFS
ROOT
├── _CONFIG_STORE              // contents = latest non-rolled-back version
├── 1.0.53                     // version directory
├── events
│   ├── main.conf
│   ├── loginEvent
│   │   ├── main.conf          // configuration file for /events/loginEvent
│   │   └── includes.conf      // specify import links for /events/loginEvent
│   ├── shareEvent
│   │   └── includes.conf
│   └── clickEvent
│       └── includes.conf
│
└── tags
    ├── highPriority
    │   ├── main.conf          // configuration file for /tags/highPriority
    │   └── includes.conf      // specify import links for /tags/highPriority
    ├── blacklist
    └── 10Days
3. Retention Example – How retention is configured using Gobblin config?
Retention
• Deleting data that is not required
• The most common retention policy is to delete data older than a given number of days
Example
• Retention policy of 10 days for loginEvent
• Retention policy of 30 days for logoutEvent
Before Retention
events
├── loginEvent
│   ├── 2016-06-20.avro
│   └── 2016-06-25.avro
└── logoutEvent
    ├── 2016-05-10.avro
    └── 2016-06-10.avro
After Retention
events
├── loginEvent
│   └── 2016-06-25.avro
└── logoutEvent
    └── 2016-06-10.avro
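The before/after listing can be reproduced with a small time-based selection sketch. This is only an illustration, not Gobblin's policy code: the class `RetentionSketch`, the method `selectForDeletion`, and the run date of 2016-07-01 are assumptions (the slides do not state when retention ran; this date is one that is consistent with the listing).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Toy time-based retention selection; hypothetical, not Gobblin's policy classes.
final class RetentionSketch {
    // Returns the versions dated strictly before (today - lookbackDays).
    static List<LocalDate> selectForDeletion(List<LocalDate> versions, int lookbackDays, LocalDate today) {
        LocalDate cutoff = today.minusDays(lookbackDays);
        List<LocalDate> toDelete = new ArrayList<>();
        for (LocalDate version : versions) {
            if (version.isBefore(cutoff)) {
                toDelete.add(version);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 7, 1); // assumed run date
        List<LocalDate> login = List.of(LocalDate.of(2016, 6, 20), LocalDate.of(2016, 6, 25));
        List<LocalDate> logout = List.of(LocalDate.of(2016, 5, 10), LocalDate.of(2016, 6, 10));
        // 10-day policy for loginEvent, 30-day policy for logoutEvent.
        System.out.println("loginEvent delete: " + selectForDeletion(login, 10, today));
        System.out.println("logoutEvent delete: " + selectForDeletion(logout, 30, today));
    }
}
```

With the assumed run date, the 10-day policy selects only 2016-06-20 for loginEvent and the 30-day policy selects only 2016-05-10 for logoutEvent, which matches the files missing from the "After Retention" listing.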
More complex use cases in Production
• Default retention policy of 30 days for all events
• Retention policy of 10 days for loginEvent
• Blacklist retention for clickEvent
• 3 years retention for high priority events like shareEvent
Retention Config
● “events” is the common parent block for “shareEvent”, “loginEvent”, “logoutEvent” and “clickEvent”
● Each block implicitly imports configs from its parent block; for example, “logoutEvent” implicitly imports “events” (dashed lines)
● Any block can explicitly import any other block (solid lines)
● A child block overrides any key-value pairs specified in the parent block
logoutEvent 30 Days
● “logoutEvent” inherits the default retention of 30 days from its implicit import, “events”
loginEvent 10 Days
● “loginEvent” inherits the default retention of 30 days from its implicit import, “events”
● “loginEvent” defines a 10-day policy, which overrides the 30 days inherited from “events”
Retention Config for share/clickEvent
● “shareEvent” explicitly imports the high priority tag, which has a retention of 3 years
● “clickEvent” explicitly imports the blacklist tag, which disables retention for “clickEvent”
HDFS Config store
├── events
│   ├── main.conf              // Default 30 Days
│   ├── loginEvent
│   │   └── main.conf          // 10 Days
│   ├── shareEvent
│   │   └── includes.conf      // Import /tags/highPriority
│   └── clickEvent
│       └── includes.conf      // Import /tags/blacklist
│
└── tags
    ├── highPriority
    │   └── main.conf          // Define 3 Years retention
    └── blacklist
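How this store resolves to an effective config per dataset can be sketched in a few lines. The following is a simplified model, not Gobblin's resolver. It assumes a precedence of own values over explicit imports over the implicit parent, which is inferred from shareEvent ending up with 3 years rather than the 30-day default. The key `gobblin.retention.blacklisted` is a hypothetical stand-in, since the slides do not show the contents of the blacklist tag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy config resolution over the example store; hypothetical, not Gobblin's resolver.
final class ImportResolutionSketch {
    static final String LOOKBACK = "gobblin.retention.selection.timeBased.lookbackTime";
    static final String BLACKLISTED = "gobblin.retention.blacklisted"; // hypothetical key

    // Own key/value pairs per config key (stands in for main.conf).
    static final Map<String, Map<String, String>> OWN = Map.of(
        "/events", Map.of(LOOKBACK, "30d"),
        "/events/loginEvent", Map.of(LOOKBACK, "10d"),
        "/tags/highPriority", Map.of(LOOKBACK, "3y"),
        "/tags/blacklist", Map.of(BLACKLISTED, "true"));

    // Explicit imports per config key (stands in for includes.conf).
    static final Map<String, List<String>> IMPORTS = Map.of(
        "/events/shareEvent", List.of("/tags/highPriority"),
        "/events/clickEvent", List.of("/tags/blacklist"));

    static String parentOf(String key) {
        int slash = key.lastIndexOf('/');
        return slash <= 0 ? null : key.substring(0, slash);
    }

    // Later putAll calls win, so values are applied from lowest to highest precedence.
    // Assumes the import graph has no cycles.
    static Map<String, String> resolve(String key) {
        Map<String, String> resolved = new LinkedHashMap<>();
        String parent = parentOf(key);
        if (parent != null) {
            resolved.putAll(resolve(parent));              // implicit import of the parent
        }
        for (String imported : IMPORTS.getOrDefault(key, List.of())) {
            resolved.putAll(resolve(imported));            // explicit imports
        }
        resolved.putAll(OWN.getOrDefault(key, Map.of()));  // own values
        return resolved;
    }

    public static void main(String[] args) {
        for (String dataset : List.of("loginEvent", "logoutEvent", "shareEvent", "clickEvent")) {
            System.out.println(dataset + " -> " + resolve("/events/" + dataset));
        }
    }
}
```

Under these assumptions logoutEvent resolves to 30d, loginEvent to 10d and shareEvent to 3y, while clickEvent keeps the inherited 30d and additionally carries the blacklist marker.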
Retention Config Examples
/events/main.conf
gobblin.retention : {
  dataset : {
    finder.class=gobblin.data.management.retention.CleanableDatasetFinder
    pattern="/events/*"
  }
  selection {
    policy.class = gobblin.data.management.SelectBeforeTimeBasedSelectionPolicy
    timeBased.lookbackTime=30d
  }
  version : {
    finder.class=gobblin.data.management.DateTimeDatasetVersionFinder
  }
}
/tags/highPriority/main.conf
gobblin.retention : {
  selection {
    timeBased.lookbackTime=3y
  }
}
Supported Policies
• SelectBeforeTimeBasedSelectionPolicy
• NewestKSelectionPolicy
• DailyDependentHourlyPolicy
• CombineSelectionPolicy
More policies - http://gobblin.readthedocs.io/en/latest/data-management/Gobblin-Retention/
Future work
• Config stores other than the HDFS-based config store
• Improve tooling, validation and UI for config store deployment
Questions

More Related Content

What's hot

Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Databricks
 
In Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPHBaseCon
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
Big Data Platform at Pinterest
Big Data Platform at PinterestBig Data Platform at Pinterest
Big Data Platform at PinterestQubole
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayDatabricks
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaDataWorks Summit/Hadoop Summit
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkDatabricks
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Databricks
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineDataWorks Summit
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Sangjin Lee
 
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache SparkDesigning the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache SparkDatabricks
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Shirshanka Das
 
My Favorite PostgreSQL Books
My Favorite PostgreSQL BooksMy Favorite PostgreSQL Books
My Favorite PostgreSQL BooksEDB
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInDatabricks
 

What's hot (20)

Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL...
 
In Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAP
 
Cloud dwh
Cloud dwhCloud dwh
Cloud dwh
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Big Data Platform at Pinterest
Big Data Platform at PinterestBig Data Platform at Pinterest
Big Data Platform at Pinterest
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache Spark
 
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQL
 
Graph ql and enterprise
Graph ql and enterpriseGraph ql and enterprise
Graph ql and enterprise
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)Timeline Service v.2 (Hadoop Summit 2016)
Timeline Service v.2 (Hadoop Summit 2016)
 
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache SparkDesigning the Next Generation of Data Pipelines at Zillow with Apache Spark
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
 
My Favorite PostgreSQL Books
My Favorite PostgreSQL BooksMy Favorite PostgreSQL Books
My Favorite PostgreSQL Books
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 

Similar to Gobbin config-meetup-june-2016

Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...Ville Mattila
 
Building Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And TricksBuilding Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And TricksMike Hugo
 
Operations Support Workflow - Rundeck
Operations Support Workflow - RundeckOperations Support Workflow - Rundeck
Operations Support Workflow - RundeckNeil McCaughley
 
國民雲端架構 Django + GAE
國民雲端架構 Django + GAE國民雲端架構 Django + GAE
國民雲端架構 Django + GAEWinston Chen
 
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...GITS Indonesia
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with LuigiTeemu Kurppa
 
7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script Task7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script TaskPramod Singla
 
Spring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who CodeSpring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who CodePurnima Kamath
 
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rulesSrijan Technologies
 
Web Standards Support in WebKit
Web Standards Support in WebKitWeb Standards Support in WebKit
Web Standards Support in WebKitJoone Hur
 
Odoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMSOdoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMSElínAnna Jónasdóttir
 
GitPro Whitepaper
GitPro WhitepaperGitPro Whitepaper
GitPro WhitepaperERP Buddies
 
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...Edureka!
 
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...Rudy Jahchan
 
Open Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java DevelopersOpen Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java Developerscboecking
 
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWebinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWeaveworks
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulkTeguh Nugraha
 

Similar to Gobbin config-meetup-june-2016 (20)

Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
Running a Scalable And Reliable Symfony2 Application in Cloud (Symfony Sweden...
 
Building Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And TricksBuilding Grails Plugins - Tips And Tricks
Building Grails Plugins - Tips And Tricks
 
Operations Support Workflow - Rundeck
Operations Support Workflow - RundeckOperations Support Workflow - Rundeck
Operations Support Workflow - Rundeck
 
國民雲端架構 Django + GAE
國民雲端架構 Django + GAE國民雲端架構 Django + GAE
國民雲端架構 Django + GAE
 
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
Gits class #22: [ONLINE] Analyze Your User's Activities Using BigQuery and Da...
 
Managing data workflows with Luigi
Managing data workflows with LuigiManaging data workflows with Luigi
Managing data workflows with Luigi
 
7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script Task7\9 SSIS 2008R2_Training - Script Task
7\9 SSIS 2008R2_Training - Script Task
 
Spring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who CodeSpring Boot Workshop - January w/ Women Who Code
Spring Boot Workshop - January w/ Women Who Code
 
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
[Srijan Wednesday Webinars] Ruling Drupal 8 with #d8rules
 
Web Standards Support in WebKit
Web Standards Support in WebKitWeb Standards Support in WebKit
Web Standards Support in WebKit
 
Odoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMSOdoo Experience 2018 - From a Web Controller to a Full CMS
Odoo Experience 2018 - From a Web Controller to a Full CMS
 
GitPro Whitepaper
GitPro WhitepaperGitPro Whitepaper
GitPro Whitepaper
 
R sharing 101
R sharing 101R sharing 101
R sharing 101
 
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
What is Git | What is GitHub | Git Tutorial | GitHub Tutorial | Devops Tutori...
 
Tips & Tricks for Maven Tycho
Tips & Tricks for Maven TychoTips & Tricks for Maven Tycho
Tips & Tricks for Maven Tycho
 
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
iOSDevCamp 2011 - Getting "Test"-y: Test Driven Development & Automated Deplo...
 
Open Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java DevelopersOpen Source ERP Technologies for Java Developers
Open Source ERP Technologies for Java Developers
 
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOpsWebinar: End to End Security & Operations with Chainguard and Weave GitOps
Webinar: End to End Security & Operations with Chainguard and Weave GitOps
 
Introduction to Django
Introduction to DjangoIntroduction to Django
Introduction to Django
 
Data integration with embulk
Data integration with embulkData integration with embulk
Data integration with embulk
 

Recently uploaded

WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 

Recently uploaded (20)

WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
WSO2CON 2024 - Does Open Source Still Matter?
Gobbin config-meetup-june-2016

  • 1. Gobblin Configuration Management
      Min Tu, Pradhan Cadabam
      Gobblin Meetup, June 2016
  • 2. Agenda
      1. Current Solutions and Motivation – Why we built Gobblin config?
      2. Architecture – Gobblin config internals
      3. Retention Example – How retention is configured using Gobblin config?
  • 4. Job Configs Vs. Dataset Configs
      Copy Job
        - Permission for loginEvent 700
        - Permission for logoutEvent 777
      Source: /events/loginEvent, /events/logoutEvent
      Destination: /events/loginEvent - 700, /events/logoutEvent - 777
      Option 1 : One job per dataset
        - Too many jobs
        - Long whitelist
        - Difficult to maintain
        Copy Job 1: dest.permission = 700, whitelist = loginEvent
        Copy Job 2: dest.permission = 777, whitelist = logoutEvent
      Option 2 : Prefix
        - Too many configs
        - Cannot have a single config for all datasets with the same permissions
        Copy Job with prefix: loginEvent.dest.permission = 700, logoutEvent.dest.permission = 777
  • 5. Data Life Cycle Management Configs
      [Diagram: a Conversion Job, a Copy Job and Retention Jobs chained over /events/loginEvent_Avro and /events/loginEvent_Orc]
      • Shared configs across jobs
      • Destination path of conversion job is source path of copy job
      • Retention job works on destination path of copy job
      • Dataset needs to be enabled in all jobs
  • 6. Other Motivations
      • New versions of configs should be deployable without deploying new binaries
      • It should be easy to roll back to the previous stable version of configs
      • Config changes should have an audit trail
      • Support for complex value types and substitution resolution
  • 8. Dataset Configuration Management
      At a very high level, we extend Typesafe Config with:
      • Abstraction of a Config Store
      • Config versioning
      • Support for logical “import” URIs
      • Ability to traverse the “import” relationships
  • 9. Architecture (layers, top to bottom)
      Client Application
      ConfigClient API
      ConfigStore API
      HadoopFS Store | Hive MetaStore Adapter | MySQL Adapter | Zookeeper Adapter | …
  • 10. Data Model
      Config Store
        - Dataset config key (URI): /events
        - Dataset config key (URI): /events/loginEvent
            Key1: value1, Key2: value2, …, KeyM: valueM
        - Tag config key (URI): /tags
        - Tag config key (URI): /tags/highPriority
            keyA: valueX, keyB: valueY
      Relationships in the diagram:
        - “imports” / “Imported by” between a dataset node and a tag node
        - “Implicit import” from each node to its parent
  • 11. HOCON format
      • Supports Java properties files
      • Supports JSON files
      • Value substitution
      • "+=" syntax to append elements to arrays, e.g. path += "/bin"
      • …
      Example:
        gobblin.retention : {
          selection {
            timeBased.lookbackTime=3y
          }
        }
  • 12. Using Configs in code
      ConfigClient client = ConfigClient.createConfigClient(VersionStabilityPolicy policy);
      Config config = client.getConfig(URI uri);
      Collection<URI> imports = client.getImports(URI dataset, boolean recursive);
      Collection<URI> importedBy = client.getImportedBy(URI tag, boolean recursive);
  • 14. Example of a config store on HDFS
      ROOT
      ├── _CONFIG_STORE              // contents = latest non-rolled-back version
      ├── 1.0.53                     // version directory
          ├── events
          │   ├── main.conf
          │   ├── loginEvent
          │   │   ├── main.conf      // configuration file for /events/loginEvent
          │   │   └── includes.conf  // specify import links for /events/loginEvent
          │   ├── shareEvent
          │   │   └── includes.conf
          │   └── clickEvent
          │       └── includes.conf
          └── tags
              ├── highPriority
              │   ├── main.conf      // configuration file for /tags/highPriority
              │   └── includes.conf  // specify import links for /tags/highPriority
              ├── blacklist
              └── 10Days
  • 16. Retention
      • Deleting data that is not required
      • The most common retention policy is to delete data older than a given number of days
      Example
      • Retention policy of 10 days for loginEvent
      • Retention policy of 30 days for logoutEvent
      Before Retention
        events
        ├── loginEvent
        │   ├── 2016-06-20.avro
        │   └── 2016-06-25.avro
        └── logoutEvent
            ├── 2016-05-10.avro
            └── 2016-06-10.avro
      After Retention
        events
        ├── loginEvent
        │   └── 2016-06-25.avro
        └── logoutEvent
            └── 2016-06-10.avro
  • 17. More complex use cases in Production
      • Default retention policy of 30 days for all events
      • Retention policy of 10 days for loginEvent
      • Blacklist retention for clickEvent
      • 3 years retention for high priority events like shareEvent
  • 18. Retention Config
      ● “events” is the common parent block for “shareEvent”, “loginEvent”, “logoutEvent”, “clickEvent”
      ● Each block implicitly imports configs from its parent block; for example, “logoutEvent” implicitly imports “events” (dashed lines)
      ● Any block can explicitly import any other block (solid lines)
      ● A child block overrides any key-value pairs specified in the parent block
  • 19. logoutEvent: 30 Days
      ● “logoutEvent” inherits the default retention of 30 days from its implicit import, “events”
  • 20. loginEvent: 10 Days
      ● “loginEvent” inherits the default retention of 30 days from its implicit import, “events”
      ● “loginEvent” defines a 10-day policy, which overrides the 30 days inherited from “events”
  • 21. Retention Config for share/clickEvent
      ● “shareEvent” explicitly imports a high priority tag, which has a retention of 3 years
      ● “clickEvent” explicitly imports the blacklist tag, which disables retention for “clickEvent”
  • 22. HDFS Config store
      ├── events
      │   ├── main.conf              // Default 30 Days
      │   ├── loginEvent
      │   │   └── main.conf          // 10 Days
      │   ├── shareEvent
      │   │   └── includes.conf      // Import /tags/highPriority
      │   └── clickEvent
      │       └── includes.conf      // Import /tags/blacklist
      └── tags
          ├── highPriority
          │   └── main.conf          // Define 3 Years retention
          └── blacklist
  • 23. Retention Config Examples
      /events/main.conf
        gobblin.retention : {
          dataset : {
            finder.class=gobblin.data.management.retention.CleanableDatasetFinder
            pattern="/events/*"
          }
          selection {
            policy.class = gobblin.data.management.SelectBeforeTimeBasedSelectionPolicy
            timeBased.lookbackTime=30d
          }
          version : {
            finder.class=gobblin.data.management.DateTimeDatasetVersionFinder
          }
        }
      /tags/highPriority/main.conf
        gobblin.retention : {
          selection {
            timeBased.lookbackTime=3y
          }
        }
  • 24. Supported Policies
      • SelectBeforeTimeBasedSelectionPolicy
      • NewestKSelectionPolicy
      • DailyDependentHourlyPolicy
      • CombineSelectionPolicy
      More policies - http://gobblin.readthedocs.io/en/latest/data-management/Gobblin-Retention/
  • 25. Future work
      • Config stores other than the HDFS-based config store
      • Improve tooling, validation and UI for config store deployment
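
The time-based policy from slide 16 ("delete data older than N days") can be sketched in a few lines of plain Java. This is only an illustration, not Gobblin's SelectBeforeTimeBasedSelectionPolicy: the class `RetentionSketch`, the method `selectForDeletion`, and the run date of 2016-07-01 are all assumptions, and a file exactly N days old is kept.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of Gobblin. */
class RetentionSketch {

    /**
     * Returns the files whose date is strictly older than lookbackDays,
     * counted back from the given run date. File names are assumed to
     * look like "2016-06-20.avro", as on slide 16.
     */
    static List<String> selectForDeletion(List<String> fileNames, int lookbackDays, LocalDate today) {
        LocalDate cutoff = today.minusDays(lookbackDays);
        List<String> toDelete = new ArrayList<>();
        for (String name : fileNames) {
            // Everything before the first '.' is the ISO date of the version.
            LocalDate date = LocalDate.parse(name.substring(0, name.indexOf('.')));
            if (date.isBefore(cutoff)) {
                toDelete.add(name);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2016, 7, 1); // assumed run date
        System.out.println(selectForDeletion(
                Arrays.asList("2016-06-20.avro", "2016-06-25.avro"), 10, today));
        System.out.println(selectForDeletion(
                Arrays.asList("2016-05-10.avro", "2016-06-10.avro"), 30, today));
    }
}
```

With the assumed run date, the 10-day policy selects only 2016-06-20.avro for loginEvent and the 30-day policy selects only 2016-05-10.avro for logoutEvent, which matches the before/after trees on slide 16.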

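The import rules from slides 18 to 22 can be worked through with a toy in-memory store. This is a sketch under stated assumptions, not the Gobblin ConfigStore API: `ToyConfigStore` and the key `example.retention.disabled` are hypothetical, import cycles are not handled, and the precedence (parent first, then explicit imports, then the node's own keys) is inferred from the slides, where shareEvent ends up with the 3-year value from its tag rather than the 30-day default.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory config store; hypothetical, not Gobblin's implementation. */
class ToyConfigStore {
    // Key path taken from the HOCON on slide 23.
    static final String LOOKBACK_KEY = "gobblin.retention.selection.timeBased.lookbackTime";
    // Hypothetical key standing in for whatever the blacklist tag defines.
    static final String DISABLED_KEY = "example.retention.disabled";

    private final Map<String, Map<String, String>> ownConfig = new HashMap<>();
    private final Map<String, List<String>> explicitImports = new HashMap<>();

    void put(String uri, String key, String value) {
        ownConfig.computeIfAbsent(uri, u -> new LinkedHashMap<>()).put(key, value);
    }

    void addImport(String uri, String importedUri) {
        explicitImports.computeIfAbsent(uri, u -> new ArrayList<>()).add(importedUri);
    }

    /** Parent URI, or null for a top-level node such as "/events". */
    static String parentOf(String uri) {
        int idx = uri.lastIndexOf('/');
        return idx <= 0 ? null : uri.substring(0, idx);
    }

    /** Applies the lowest-precedence source first so later sources override it. */
    Map<String, String> resolve(String uri) {
        Map<String, String> resolved = new LinkedHashMap<>();
        String parent = parentOf(uri);
        if (parent != null) {
            resolved.putAll(resolve(parent)); // implicit import of the parent
        }
        List<String> imports = explicitImports.getOrDefault(uri, Collections.emptyList());
        for (String imported : imports) {
            resolved.putAll(resolve(imported)); // explicit imports (includes.conf)
        }
        Map<String, String> own = ownConfig.getOrDefault(uri, Collections.emptyMap());
        resolved.putAll(own); // the node's own main.conf wins
        return resolved;
    }

    /** Builds the store laid out on slide 22. */
    static ToyConfigStore slideExample() {
        ToyConfigStore store = new ToyConfigStore();
        store.put("/events", LOOKBACK_KEY, "30d");
        store.put("/events/loginEvent", LOOKBACK_KEY, "10d");
        store.put("/tags/highPriority", LOOKBACK_KEY, "3y");
        store.put("/tags/blacklist", DISABLED_KEY, "true");
        store.addImport("/events/shareEvent", "/tags/highPriority");
        store.addImport("/events/clickEvent", "/tags/blacklist");
        return store;
    }
}
```

Resolving the four datasets against `slideExample()` gives 30d for logoutEvent, 10d for loginEvent and 3y for shareEvent, while clickEvent picks up the disabling flag from the blacklist tag.
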
Editor's Notes

  1. Config versioning – For a stable config store, once a version has been deployed it should not be changed
  2. Two logical types: datasets and tags. Both are in a tree hierarchy. Nodes inherit or override values from parent and imported tags.
  3. Fix font. Add HOCON example.
  4. CROSS_JVM_STABILITY, STRONG_LOCAL_STABILITY, WEAK_LOCAL_STABILITY, READ_FRESHEST. Handle case: config updated while version released (configStore is NOT stable). Calling getConfig multiple times: get same value?
  5. Each directory is one config node (dataset, tags)
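
Since each directory is one config node, a config key URI maps directly onto files under a version directory. The sketch below illustrates that mapping with plain string handling; `ConfigNodePaths`, its methods, and the version root `/store/1.0.53` are hypothetical, and the real store may lay out its root differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical path helper; not part of Gobblin. */
class ConfigNodePaths {
    static final String MAIN_CONF = "main.conf";         // key-value pairs of the node
    static final String INCLUDES_CONF = "includes.conf"; // explicit import links

    /** File for one config node, e.g. /events/loginEvent plus main.conf. */
    static String fileFor(String versionRoot, String configKey, String fileName) {
        return versionRoot + configKey + "/" + fileName;
    }

    /**
     * main.conf candidates along the implicit import chain, most specific
     * first. The walk stops at the top-level node (for example "/events").
     */
    static List<String> mainConfChain(String versionRoot, String configKey) {
        List<String> chain = new ArrayList<>();
        String node = configKey;
        while (node != null && !node.isEmpty()) {
            chain.add(fileFor(versionRoot, node, MAIN_CONF));
            int idx = node.lastIndexOf('/');
            node = idx <= 0 ? null : node.substring(0, idx);
        }
        return chain;
    }

    public static void main(String[] args) {
        // "/store/1.0.53" is an assumed version directory.
        for (String path : mainConfChain("/store/1.0.53", "/events/loginEvent")) {
            System.out.println(path);
        }
    }
}
```

For /events/loginEvent the chain lists the node's own main.conf followed by the main.conf of /events, which is the order a reader would check when tracing where an inherited value comes from.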