Treasure Data Hands-On: Managing Slowly
Changing Dimensions Using TD Workflow
Agenda
● Introduction
● Treasure Data Workflow
● Overview of Slowly Changing Dimensions
● Window Functions
● Handling Type 2 SCDs using Treasure Data
Introduction
• Scott Mitchell
• Senior Solution Engineer
• Work with Enterprise clients to
maximize the activation of the
client data
• smitchell@treasure-data.com
Introduction
Treasure Data is a Customer Data Platform
“Customer Data Platform (CDP) is a marketer-based management system
that creates a persistent, unified customer database that is accessible to
other systems. Data is pulled from multiple sources, cleaned, and combined
to create a single customer view. This structured data is then made available
to other marketing systems. CDP provides real-time segmentation for
sophisticated personalized marketing.”
https://en.wikipedia.org/wiki/Customer_Data_Platform
Our Customer Data Platform: Foundation
Data Management
1st party data
(Your data)
● Web
● Mobile
● Apps
● CRMs
● Offline
2nd & 3rd party DMPs
(enrichment)
Tool Integration
● Campaigns
● Advertising
● Social media
● Reporting
● BI & data science
ID Unification
Persistent Storage
Workflow Orchestration
Activation
All Your Data
Segmentation
Profiles Segments
Measurement
Treasure Data Workflow
DATA ORCHESTRATION AND WORKFLOW MANAGEMENT
• Workflow management across data input, processing and output
• Supports both scheduled & trigger-based execution
• Cloud-based and client-hosted. The client-hosted version can run custom code.
• The cloud-based version has both a web UI & a REST API
The core engine is built on our open source project, Digdag.
Treasure Workflow allows users to build repeatable data processing pipelines that consist of
Treasure Data jobs.
Overview
Why use Treasure Workflow?
1. Enhanced Organization
• Organize your processing workflows into groups of similarly-purposed tasks
2. Reduce Errors
• Dependencies no longer have to be managed by scheduled time alone
3. Ease Error Handling
• Split large scripts & queries into smaller, more manageable, jobs
4. Improve Collaboration
• Organize your job flows into projects
Benefits
WORKFLOW DEFINITION: CLOSER LOOK
timezone: Asia/Tokyo

schedule:
  daily>: 07:00:00

_export:
  td:
    database: nishi

+load:
  td_load>: import/s3_load.yml
  database: nishi
  table: monthly_goods_sales

+daily:
  td>: queries/daily_open.sql
  create_table: daily_open

+monthly:
  td>: queries/monthly_open.sql
  result_connection: nishi_s3
  result_settings:
    bucket: nishitetsu-test
    path: /monthly_open.csv
• The file extension should be “.dig” for the file to be recognized as a workflow
• Standard YAML
• Task names are prefixed by “+”
• Operators are postfixed by “>”
• Schedules can be set with schedule
• Variables are supported via ${variable_name}
REPRESENTATIVE OPERATORS
Category               Name               Description
Control Flow           call>:             Call another workflow
                       loop>:             Repeat tasks a specified # of times
                       for_each>:         Loop through a specified list
                       if>:               if/else control flow
Treasure Data          td>:               Run a specified TD query
                       td_run>:           Run a saved query
                       td_ddl>:           Create, delete, rename, truncate tables
                       td_load>:          Invoke an input data transfer
                       td_for_each>:      Loop through a query result row by row
AWS                    s3_wait>:          Wait for new files in S3 & download
                       redshift>:         Run Redshift query
                       redshift_load>:    Load data into Redshift
                       redshift_unload>:  Unload data from Redshift
Google Cloud Platform  bq>:               Run BigQuery query
                       bq_extract>:       Unload data from BigQuery to GCS
Slowly Changing Dimensions
Slowly Changing Dimensions
• Particular dimensions within a dataset that are prone to change
unpredictably
• Example: the phone number or email field of a CRM dataset
• Data available from a CRM usually represents the current, up-to-date value
of each field for each customer
• Storing a history of this customer data requires managing these slowly
changing dimensions (SCDs)
Different Ways to Handle SCDs
• Type 1
• Type 2
• Type 3
• Type 4
Type 1: Overwrite the field
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 1:
company_id  company_name     company_state
123         Sterling Cooper  California
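A Type 1 update can be sketched in a few lines of Python. This is only an illustration: the in-memory `company` dictionary and the `apply_type1` helper are hypothetical names, not part of Treasure Data.

```python
# Hypothetical in-memory "company" table keyed by company_id (illustration only).
company = {
    123: {"company_name": "Sterling Cooper", "company_state": "New York"},
}


def apply_type1(table, company_id, **changes):
    """Type 1: overwrite the changed fields in place; the old values are lost."""
    table[company_id].update(changes)
    return table


apply_type1(company, 123, company_state="California")
print(company[123])
```

After the call there is still exactly one row for company 123, and nothing records that the state was once New York.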
Type 2: Keep both records, flag the “current” row
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 2:
company_id  company_name     company_state  is_current
123         Sterling Cooper  New York       0
123         Sterling Cooper  California     1
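As a rough sketch of the Type 2 idea, the snippet below keeps every version of a row and moves the `is_current` flag to the newest one. The list-of-dicts table and the `apply_type2` helper are assumptions made for illustration.

```python
def apply_type2(rows, new_row):
    """Type 2: keep older rows, flag them 0, and append new_row flagged 1."""
    updated = []
    for row in rows:
        if row["company_id"] == new_row["company_id"]:
            row = {**row, "is_current": 0}  # older version is no longer current
        updated.append(row)
    updated.append({**new_row, "is_current": 1})
    return updated


history = [
    {"company_id": 123, "company_name": "Sterling Cooper",
     "company_state": "New York", "is_current": 1},
]
history = apply_type2(
    history,
    {"company_id": 123, "company_name": "Sterling Cooper",
     "company_state": "California"},
)
for row in history:
    print(row)
```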
Type 3: Store the latest two values in one row
Old Record:
company_id  company_name     company_state
123         Sterling Cooper  New York
New Record:
company_id  company_name     company_state
123         Sterling Cooper  California
SCD Type 3:
company_id  company_name     company_state_current  company_state_previous
123         Sterling Cooper  California             New York
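A minimal sketch of Type 3, assuming a single row stored as a dictionary: the current value shifts into the "previous" column and anything older is discarded. `apply_type3` is a hypothetical helper name.

```python
def apply_type3(row, new_state):
    """Type 3: shift the current value to 'previous' and store the new value."""
    return {
        "company_id": row["company_id"],
        "company_name": row["company_name"],
        "company_state_current": new_state,
        "company_state_previous": row["company_state_current"],
    }


row = {
    "company_id": 123,
    "company_name": "Sterling Cooper",
    "company_state_current": "New York",
    "company_state_previous": None,
}
row = apply_type3(row, "California")
print(row)
```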
Type 4: Use a separate history table
SCD Type 4:
company
company_id  company_name     company_state
123         Sterling Cooper  California
company_history
company_id  company_name     company_state  last_modified_date
123         Sterling Cooper  New York       2007-06-19
123         Sterling Cooper  California     2008-10-12
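Type 4 can be sketched with two in-memory structures, one holding only the current values and one accumulating every change. The `apply_type4` helper is a hypothetical name used only for this illustration.

```python
company = {}          # current values only, keyed by company_id
company_history = []  # one row per change, with the modification date


def apply_type4(company_id, company_name, company_state, last_modified_date):
    """Type 4: overwrite the current table and append to the history table."""
    company[company_id] = {
        "company_name": company_name,
        "company_state": company_state,
    }
    company_history.append({
        "company_id": company_id,
        "company_name": company_name,
        "company_state": company_state,
        "last_modified_date": last_modified_date,
    })


apply_type4(123, "Sterling Cooper", "New York", "2007-06-19")
apply_type4(123, "Sterling Cooper", "California", "2008-10-12")
print(company[123])
print(len(company_history))
```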
Window Functions
Type 2: Keep both records, flag the “current” row
Old Record:
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
New Record:
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
SCD Type 2:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
123         Sterling Cooper  California     2008-10-12        1
Window Functions
• Window functions perform calculations across rows of the query result
• They run after the ‘HAVING’ clause but before the ‘ORDER BY’ clause
• They are written in the ‘SELECT’ clause and display results in their own
column
• They have three parts:
Window Functions
rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC)
• function: rank()
• partition specification: PARTITION BY company_id
• ordering specification: ORDER BY lastmodifieddate DESC
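To make the three parts concrete, here is a small pure-Python imitation of `rank() OVER (PARTITION BY ... ORDER BY ...)`. It is a sketch for intuition only, not how a query engine implements window functions, and `rank_over` is a hypothetical helper name.

```python
from collections import defaultdict


def rank_over(rows, partition_by, order_by, descending=False):
    """Mimic rank() OVER (PARTITION BY ... ORDER BY ...) on a list of dicts."""
    # Partition specification: group the rows by the partition column.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    ranked = []
    for part in partitions.values():
        # Ordering specification: sort the rows inside each partition.
        part = sorted(part, key=lambda r: r[order_by], reverse=descending)
        prev_value, prev_rank = object(), 0
        for position, row in enumerate(part, start=1):
            # Function: rank() gives ties the same rank and then skips ahead.
            rank = prev_rank if row[order_by] == prev_value else position
            ranked.append({**row, "rank": rank})
            prev_value, prev_rank = row[order_by], rank
    return ranked


company_rows = [
    {"company_id": 123, "company_state": "New York", "lastmodifieddate": "2007-06-19"},
    {"company_id": 123, "company_state": "California", "lastmodifieddate": "2008-10-12"},
]
ranked_rows = rank_over(company_rows, "company_id", "lastmodifieddate", descending=True)
for row in ranked_rows:
    print(row)
```

With descending order the most recently modified row in each partition receives rank 1.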
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
FROM company
company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
123         Sterling Cooper  California     2008-10-12
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        2
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
FROM company
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        2
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        2
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
FROM company
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        2
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              Connecticut    2018-05-22        2
124         CGC              New York       2010-08-22        1
Window Functions
Goal: flag the most recent row for each company with 1 and every older row with 0
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        0
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        0
Window Functions
SELECT
  company_id,
  company_name,
  company_state,
  lastmodifieddate,
  CASE WHEN rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) = 1 THEN 1 ELSE 0 END AS is_current
FROM company
Result:
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        0
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        0
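The flagging query can be tried locally with Python's built-in `sqlite3` module standing in for the Treasure Data query engine. This is a sketch under two assumptions: the bundled SQLite is version 3.25 or newer (needed for window functions), and the ordering is `DESC` so that the most recent row in each partition receives rank 1 and therefore `is_current = 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company ("
    "company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)

# rank() = 1 marks the newest row per company; CASE turns that into a 1/0 flag.
query = """
SELECT company_id, company_name, company_state, lastmodifieddate,
       CASE WHEN rank() OVER (PARTITION BY company_id
                              ORDER BY lastmodifieddate DESC) = 1
            THEN 1 ELSE 0 END AS is_current
FROM company
ORDER BY company_id, lastmodifieddate DESC
"""
flagged = conn.execute(query).fetchall()
for row in flagged:
    print(row)
```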
Implementation in Treasure Data
1. Load incremental data from a data source to a staging table
2. Drop the target table that contains outdated SCD information
3. Window over the staging table, rebuilding the target table with the latest
SCD information
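The three steps above can be sketched end to end with `sqlite3` as a local stand-in for Treasure Data tables. The helper names `load_staging` and `rebuild_target` are hypothetical, and the sketch assumes SQLite 3.25 or newer for window function support; in a real workflow each step would be a separate task.

```python
import sqlite3

# Step 3 query: window over staging and write the flagged rows to the target.
REBUILD_SQL = """
    CREATE TABLE target_company AS
    SELECT company_id, company_name, company_state, lastmodifieddate,
           CASE WHEN rank() OVER (PARTITION BY company_id
                                  ORDER BY lastmodifieddate DESC) = 1
                THEN 1 ELSE 0 END AS is_current
    FROM staging_company
"""


def load_staging(conn, rows):
    """Step 1: append an incremental batch to the staging table."""
    conn.executemany("INSERT INTO staging_company VALUES (?, ?, ?, ?)", rows)


def rebuild_target(conn):
    """Steps 2 and 3: drop the outdated target, then rebuild it from staging."""
    conn.execute("DROP TABLE IF EXISTS target_company")
    conn.execute(REBUILD_SQL)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_company ("
    "company_id INTEGER, company_name TEXT, "
    "company_state TEXT, lastmodifieddate TEXT)"
)

# First run: the initial load.
load_staging(conn, [
    (123, "Sterling Cooper", "New York", "2007-06-19"),
    (124, "CGC", "New York", "2010-08-22"),
])
rebuild_target(conn)

# Second run: an incremental batch with changed states.
load_staging(conn, [
    (123, "Sterling Cooper", "California", "2008-10-12"),
    (124, "CGC", "Connecticut", "2018-05-22"),
])
rebuild_target(conn)

target = conn.execute(
    "SELECT company_id, company_state, is_current "
    "FROM target_company ORDER BY company_id, lastmodifieddate"
).fetchall()
for row in target:
    print(row)
```

Because the target is rebuilt from the full staging table on every run, the staging table has to retain all history for this approach to work.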
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company (dropped)
Implementation in Treasure Data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  New York       2007-06-19
124         CGC              New York       2010-08-22
123         Sterling Cooper  California     2008-10-12
124         CGC              Connecticut    2018-05-22
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  California     2008-10-12        1
123         Sterling Cooper  New York       2007-06-19        0
124         CGC              Connecticut    2018-05-22        1
124         CGC              New York       2010-08-22        0
Thank You and Questions
SCD Type 2 Workflow with Persistent Architecture
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
SCD Type 2 Workflow with Persistent Architecture
1. Store a temp table of the current rows that will not be current after the new data is
ingested
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        1
124         CGC              New York       2010-08-22        1
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
2. Delete from the data lake any current rows that have a matching id in the new data
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
3. Insert the temp rows into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
tmp_no_longer_current
company_id  company_name     company_state  lastmodifieddate  is_current
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
3. Insert the temp rows into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
4. Insert the new data into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
123         Sterling Cooper  California     2008-10-12
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
SCD Type 2 Workflow with Persistent Architecture
4. Insert the new data into the target table
staging_company
company_id  company_name     company_state  lastmodifieddate
(no rows)
target_company
company_id  company_name     company_state  lastmodifieddate  is_current
124         CGC              New York       2010-08-22        1
123         Sterling Cooper  New York       2007-06-19        0
123         Sterling Cooper  California     2008-10-12        1
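The four steps of the persistent approach can be sketched with `sqlite3` as a local stand-in for Treasure Data tables. The `apply_increment` helper is a hypothetical name, and the sketch assumes each staging batch holds at most one row per `company_id`; clearing the staging table at the end mirrors the empty staging table shown on the final slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_company (
    company_id INTEGER, company_name TEXT,
    company_state TEXT, lastmodifieddate TEXT);
CREATE TABLE target_company (
    company_id INTEGER, company_name TEXT,
    company_state TEXT, lastmodifieddate TEXT, is_current INTEGER);
INSERT INTO target_company VALUES
    (123, 'Sterling Cooper', 'New York', '2007-06-19', 1),
    (124, 'CGC', 'New York', '2010-08-22', 1);
INSERT INTO staging_company VALUES
    (123, 'Sterling Cooper', 'California', '2008-10-12');
""")


def apply_increment(conn):
    """Apply one staging batch to the target table without rebuilding it."""
    conn.executescript("""
    -- 1. Store the current rows that will no longer be current, flagged 0.
    CREATE TEMP TABLE tmp_no_longer_current AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               0 AS is_current
        FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company);
    -- 2. Delete current rows that have a matching id in the new data.
    DELETE FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company);
    -- 3. Insert the temp rows into the target table.
    INSERT INTO target_company SELECT * FROM tmp_no_longer_current;
    -- 4. Insert the new data as the current rows, then clear staging.
    INSERT INTO target_company
        SELECT company_id, company_name, company_state, lastmodifieddate, 1
        FROM staging_company;
    DELETE FROM staging_company;
    DROP TABLE tmp_no_longer_current;
    """)


apply_increment(conn)
target_rows = conn.execute(
    "SELECT company_id, company_state, is_current "
    "FROM target_company ORDER BY company_id, lastmodifieddate"
).fetchall()
staging_count = conn.execute("SELECT COUNT(*) FROM staging_company").fetchone()[0]
for row in target_rows:
    print(row)
```

Unlike the rebuild approach, only rows for companies present in the batch are touched, so the staging table does not need to retain history.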
Contact Information
• Scott Mitchell
• Senior Solution Engineer
• smitchell@treasure-data.com

Weitere ähnliche Inhalte

Was ist angesagt?

9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptxTarekHamdi8
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementationSimon Su
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaVaibhav Khanna
 
(Lecture 4)Slowly Changing Dimensions.pdf
(Lecture 4)Slowly Changing Dimensions.pdf(Lecture 4)Slowly Changing Dimensions.pdf
(Lecture 4)Slowly Changing Dimensions.pdfMobeenMasoudi
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
DITA Quick Start for Authors Part II
DITA Quick Start for Authors Part IIDITA Quick Start for Authors Part II
DITA Quick Start for Authors Part IISuite Solutions
 
Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)James Serra
 
Agile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationAgile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationVishal Kumar
 
Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMDATAVERSITY
 
Advanced Dimensional Modelling
Advanced Dimensional ModellingAdvanced Dimensional Modelling
Advanced Dimensional ModellingVincent Rainardi
 
Table partitioning in PostgreSQL + Rails
Table partitioning in PostgreSQL + RailsTable partitioning in PostgreSQL + Rails
Table partitioning in PostgreSQL + RailsAgnieszka Figiel
 
Data Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileData Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileDaniel Upton
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
Slowly changing dimension
Slowly changing dimension Slowly changing dimension
Slowly changing dimension Sunita Sahu
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Successful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata DesignSuccessful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata Designsarakirsten
 

Was ist angesagt? (20)

9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
data-mesh-101.pptx
data-mesh-101.pptxdata-mesh-101.pptx
data-mesh-101.pptx
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementation
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schema
 
(Lecture 4)Slowly Changing Dimensions.pdf
(Lecture 4)Slowly Changing Dimensions.pdf(Lecture 4)Slowly Changing Dimensions.pdf
(Lecture 4)Slowly Changing Dimensions.pdf
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
DITA Quick Start for Authors Part II
DITA Quick Start for Authors Part IIDITA Quick Start for Authors Part II
DITA Quick Start for Authors Part II
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Mdm
MdmMdm
Mdm
 
Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)Introduction to Microsoft’s Master Data Services (MDS)
Introduction to Microsoft’s Master Data Services (MDS)
 
Agile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationAgile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data Presentation
 
Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDM
 
Advanced Dimensional Modelling
Advanced Dimensional ModellingAdvanced Dimensional Modelling
Advanced Dimensional Modelling
 
Table partitioning in PostgreSQL + Rails
Table partitioning in PostgreSQL + RailsTable partitioning in PostgreSQL + Rails
Table partitioning in PostgreSQL + Rails
 
Data Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes AgileData Vault: Data Warehouse Design Goes Agile
Data Vault: Data Warehouse Design Goes Agile
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Slowly changing dimension
Slowly changing dimension Slowly changing dimension
Slowly changing dimension
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Successful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata DesignSuccessful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata Design
 

Ähnlich wie Hands-On: Managing Slowly Changing Dimensions Using TD Workflow

SetFocus SQL Portfolio
SetFocus SQL PortfolioSetFocus SQL Portfolio
SetFocus SQL Portfoliogeometro17
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeDatabricks
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson PortfolioKbengt521
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
Porfolio of Setfocus work
Porfolio of Setfocus workPorfolio of Setfocus work
Porfolio of Setfocus workKevinPSF
 
Datawarehousing with MySQL
Datawarehousing with MySQLDatawarehousing with MySQL
Datawarehousing with MySQLHarshit Parekh
 
Pierre Xavier Portfolio
Pierre Xavier PortfolioPierre Xavier Portfolio
Pierre Xavier Portfoliopbxavier
 
AWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWSAWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWSDmitry Anoshin
 
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdfScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdfalokindustries1
 
SQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cSQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cRachelBarker26
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development Open Party
 
Elshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order ManagementElshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order ManagementAhmed Elshayeb
 
Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniqueslucenerevolution
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudIke Ellis
 
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...Sergii Khomenko
 
1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptx1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptxMullaMainuddin
 
Why Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API IntegrationWhy Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API IntegrationJerod Johnson
 

Ähnlich wie Hands-On: Managing Slowly Changing Dimensions Using TD Workflow (20)

SetFocus SQL Portfolio
SetFocus SQL PortfolioSetFocus SQL Portfolio
SetFocus SQL Portfolio
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson Portfolio
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Porfolio of Setfocus work
Porfolio of Setfocus workPorfolio of Setfocus work
Porfolio of Setfocus work
 
Datawarehousing with MySQL
Datawarehousing with MySQLDatawarehousing with MySQL
Datawarehousing with MySQL
 
Pierre Xavier Portfolio
Pierre Xavier PortfolioPierre Xavier Portfolio
Pierre Xavier Portfolio
 
AWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWSAWS User Group: Building Cloud Analytics Solution with AWS
AWS User Group: Building Cloud Analytics Solution with AWS
 
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdfScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
ScenarioXYZ Corp. is a parent corporation with 2 handbag stores l.pdf
 
SQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19cSQL Performance Tuning and New Features in Oracle 19c
SQL Performance Tuning and New Features in Oracle 19c
 
Evolutionary db development
Evolutionary db development Evolutionary db development
Evolutionary db development
 
Sql Portfolio
Sql PortfolioSql Portfolio
Sql Portfolio
 
Df12 Performance Tuning
Df12 Performance TuningDf12 Performance Tuning
Df12 Performance Tuning
 
Elshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order ManagementElshayeb Oracle R12 Order Management
Elshayeb Oracle R12 Order Management
 
Advanced Relevancy Ranking
Advanced Relevancy RankingAdvanced Relevancy Ranking
Advanced Relevancy Ranking
 
Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniques
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
 
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
 
1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptx1585625790_SQL-SESSION1.pptx
1585625790_SQL-SESSION1.pptx
 
Why Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API IntegrationWhy Standards-Based Drivers Offer Better API Integration
Why Standards-Based Drivers Offer Better API Integration
 

Mehr von Treasure Data, Inc.

GDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersGDPR: A Practical Guide for Marketers
GDPR: A Practical Guide for MarketersTreasure Data, Inc.
 
AR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketAR and VR by the Numbers: A Data First Approach to the Technology and Market
AR and VR by the Numbers: A Data First Approach to the Technology and MarketTreasure Data, Inc.
 
Introduction to Customer Data Platforms
Introduction to Customer Data PlatformsIntroduction to Customer Data Platforms
Introduction to Customer Data PlatformsTreasure Data, Inc.
 
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsBrand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
Brand Analytics Management: Measuring CLV Across Platforms, Devices and AppsTreasure Data, Inc.
 
How to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataHow to Power Your Customer Experience with Data
How to Power Your Customer Experience with DataTreasure Data, Inc.
 
Why Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataWhy Your VR Game is Virtually Useless Without Data
Why Your VR Game is Virtually Useless Without DataTreasure Data, Inc.
 
Connecting the Customer Data Dots
Connecting the Customer Data DotsConnecting the Customer Data Dots
Connecting the Customer Data DotsTreasure Data, Inc.
 
Harnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessHarnessing Data for Better Customer Experience and Company Success
Harnessing Data for Better Customer Experience and Company SuccessTreasure Data, Inc.
 
Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Packaging Ecosystems -Monki Gras 2017
Packaging Ecosystems -Monki Gras 2017Treasure Data, Inc.
 
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)Treasure Data, Inc.
 
Introduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallIntroduction to New features and Use cases of Hivemall
Introduction to New features and Use cases of HivemallTreasure Data, Inc.
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataTreasure Data, Inc.
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...Treasure Data, Inc.
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to RedshiftTreasure Data, Inc.
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudTreasure Data, Inc.
 
Fluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker containerTreasure Data, Inc.
 

Hands-On: Managing Slowly Changing Dimensions Using TD Workflow

  • 1. Treasure Data Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
  • 2. Agenda ● Introduction ● Treasure Data Workflow ● Overview of Slowly Changing Dimensions ● Window Functions ● Handling Type 2 SCDs using Treasure Data
  • 3. Introduction • Scott Mitchell • Senior Solution Engineer • Work with Enterprise clients to maximize the activation of the client data • smitchell@treasure-data.com
  • 4. Introduction Treasure Data is a Customer Data Platform “Customer Data Platform (CDP) is a marketer-based management system that creates a persistent, unified customer database that is accessible to other systems. Data is pulled from multiple sources, cleaned, and combined to create a single customer view. This structured data is then made available to other marketing systems. CDP provides real-time segmentation for sophisticated personalized marketing.” https://en.wikipedia.org/wiki/Customer_Data_Platform
  • 5. Our Customer Data Platform: Foundation Data Management 1st party data (Your data) ● Web ● Mobile ● Apps ● CRMs ● Offline 2nd & 3rd party DMPs (enrichment) Tool Integration ● Campaigns ● Advertising ● Social media ● Reporting ● BI & data science ID Unification Persistent Storage Workflow Orchestration Activation All Your Data Segmentation Profiles Segments Measurement
  • 7. DATA ORCHESTRATION AND WORKFLOW MANAGEMENT •Workflow management across data input, processing and output •Supports both scheduled & trigger-based execution •Cloud-based and Client-hosted. Client-hosted version can run custom code. •Cloud-based version has both web UI & REST API The core engine is built on our open source project Digdag
  • 8. Overview: Treasure Workflow allows users to build repeatable data processing pipelines that consist of Treasure Data jobs.
  • 9. Why use Treasure Workflow? Benefits: 1. Enhanced Organization • Organize your processing workflows into groups of similarly-purposed tasks 2. Reduce Errors • Dependencies no longer have to be managed by scheduled time alone 3. Ease Error Handling • Split large scripts & queries into smaller, more manageable jobs 4. Improve Collaboration • Organize your job flows into projects
  • 10. WORKFLOW DEFINITION: CLOSER LOOK
      timezone: Asia/Tokyo
      schedule:
        daily>: 07:00:00
      _export:
        td:
          database: nishi
      +load:
        td_load>: import/s3_load.yml
        database: nishi
        table: monthly_goods_sales
      +daily:
        td>: queries/daily_open.sql
        create_table: daily_open
      +monthly:
        td>: queries/monthly_open.sql
        result_connection: nishi_s3
        result_settings:
          bucket: nishitetsu-test
          path: /monthly_open.csv
      • File extension should be “.dig” to be recognized as a workflow
      • Standard YAML
      • Task names are prefixed by “+”
      • Operators are suffixed by “>”
      • Schedules can be set with schedule
      • Variables are supported via ${variable_name}
  • 11. REPRESENTATIVE OPERATORS
      Category               Name              Description
      Control Flow           call>:            Call another workflow
                             loop>:            Repeat tasks a specified # of times
                             for_each>:        Loop through a specified list
                             if>:              if/else control flow
      Treasure Data          td>:              Run a specified TD query
                             td_run>:          Run a saved query
                             td_ddl>:          Create, delete, rename, truncate tables
                             td_load>:         Invoke an input data transfer
                             td_for_each>:     Loop through a query result row by row
      AWS                    s3_wait>:         Wait for new files in S3 & download
                             redshift>:        Run Redshift query
                             redshift_load>:   Load data into Redshift
                             redshift_unload>: Unload data from Redshift
      Google Cloud Platform  bq>:              Run BigQuery query
                             bq_extract>:      Unload data from BigQuery to GCS
  • 13. Slowly Changing Dimensions • Particular dimensions within a dataset that are prone to change unpredictably • Example: the phone number or email field of a CRM dataset • Data available from a CRM usually represents the current, up-to-date value of each field for each customer • Storing a history of this customer data requires managing these slowly changing dimensions (SCDs)
  • 14. Different Ways to Handle SCDs • Type 1 • Type 2 • Type 3 • Type 4
  • 17. Type 1: Overwrite the field
      Old Record:
        company_id  company_name     company_state
        123         Sterling Cooper  New York
      New Record:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      SCD Type 1:
        company_id  company_name     company_state
        123         Sterling Cooper  California
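The Type 1 behaviour can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation: an in-memory SQLite database stands in for a Treasure Data table, and the helper name `apply_type1` is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company ("
    "company_id INTEGER PRIMARY KEY, company_name TEXT, company_state TEXT)"
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York')")


def apply_type1(conn, company_id, new_state):
    """Overwrite the state in place; the previous value is not kept anywhere."""
    conn.execute(
        "UPDATE company SET company_state = ? WHERE company_id = ?",
        (new_state, company_id),
    )


apply_type1(conn, 123, "California")
type1_rows = conn.execute(
    "SELECT company_id, company_name, company_state FROM company"
).fetchall()
print(type1_rows)
```

After the update only the California row remains, which is why Type 1 cannot answer questions about past values.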
  • 18. Type 2: Keep both records, flag the “current” row
      Old Record:
        company_id  company_name     company_state
        123         Sterling Cooper  New York
      New Record:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      SCD Type 2:
        company_id  company_name     company_state  is_current
        123         Sterling Cooper  New York       0
        123         Sterling Cooper  California     1
  • 19. Type 3: Store the latest two values in one row
      Old Record:
        company_id  company_name     company_state
        123         Sterling Cooper  New York
      New Record:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      SCD Type 3:
        company_id  company_name     company_state_current  company_state_previous
        123         Sterling Cooper  California             New York
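A Type 3 update shifts the current value into the "previous" column before writing the new one. The sketch below assumes an in-memory SQLite table with the column names from the slide; the helper `apply_type3` is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company ("
    "company_id INTEGER PRIMARY KEY, company_name TEXT, "
    "company_state_current TEXT, company_state_previous TEXT)"
)
conn.execute("INSERT INTO company VALUES (123, 'Sterling Cooper', 'New York', NULL)")


def apply_type3(conn, company_id, new_state):
    """Move the current state to the 'previous' column, then store the new state.

    The right-hand sides of SET are evaluated against the row as it was
    before the update, so company_state_previous receives the old value.
    """
    conn.execute(
        "UPDATE company "
        "SET company_state_previous = company_state_current, "
        "    company_state_current = ? "
        "WHERE company_id = ?",
        (new_state, company_id),
    )


apply_type3(conn, 123, "California")
type3_rows = conn.execute("SELECT * FROM company").fetchall()
print(type3_rows)
```

Only two values per field survive: a third change would push New York out of the row entirely.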
  • 20. Type 4: Use a separate history table
      SCD Type 4:
      company:
        company_id  company_name     company_state
        123         Sterling Cooper  California
      company_history:
        company_id  company_name     company_state  last_modified_date
        123         Sterling Cooper  New York       2007-06-19
        123         Sterling Cooper  California     2008-10-12
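A Type 4 write touches two tables: every version is appended to the history table, while the main table keeps only the latest one. The following is a rough sketch under the assumption that SQLite stands in for the warehouse; `apply_type4` is a hypothetical helper.

```python
import sqlite3

# Assumption: SQLite stands in for the real warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY, company_name TEXT, company_state TEXT);
    CREATE TABLE company_history (
        company_id INTEGER, company_name TEXT, company_state TEXT,
        last_modified_date TEXT);
    """
)


def apply_type4(conn, company_id, name, state, modified):
    """Append the version to the history table and refresh the current table."""
    conn.execute(
        "INSERT INTO company_history VALUES (?, ?, ?, ?)",
        (company_id, name, state, modified),
    )
    # The primary key on company_id makes this replace any earlier version.
    conn.execute(
        "INSERT OR REPLACE INTO company VALUES (?, ?, ?)",
        (company_id, name, state),
    )


apply_type4(conn, 123, "Sterling Cooper", "New York", "2007-06-19")
apply_type4(conn, 123, "Sterling Cooper", "California", "2008-10-12")

current_rows = conn.execute("SELECT * FROM company").fetchall()
history_rows = conn.execute(
    "SELECT * FROM company_history ORDER BY last_modified_date"
).fetchall()
print(current_rows)
print(history_rows)
```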
  • 23. Type 2: Keep both records, flag the “current” row
      Old Record:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
      New Record:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      SCD Type 2:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
        123         Sterling Cooper  California     2008-10-12        1
  • 25. Window Functions • Window functions perform calculations across rows of the query result • They run after the ‘HAVING’ clause but before the ‘ORDER BY’ clause • They are written in the ‘SELECT’ clause and display results in their own column • They have three parts:
  • 26. Window Functions
      rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC)
      • function: rank()
      • partition specification: PARTITION BY company_id
      • ordering specification: ORDER BY lastmodifieddate DESC
  • 27. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
        rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
      FROM company
      company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        123         Sterling Cooper  California     2008-10-12
      Result:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        2
  • 28. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
        rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) AS is_current
      FROM company
      Result (the rank restarts for each company_id partition):
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        2
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        2
  • 29. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
        rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate ASC) AS is_current
      FROM company
      Result (with ASC the oldest row of each partition gets rank 1):
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        2
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              Connecticut    2018-05-22        2
        124         CGC              New York       2010-08-22        1
  • 30. Window Functions
      Desired result: is_current is 1 for the latest row of each company_id and 0 for every older row, which the raw rank() value does not give directly:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        0
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        0
  • 31. Window Functions
      SELECT company_id, company_name, company_state, lastmodifieddate,
        CASE WHEN rank() OVER (PARTITION BY company_id ORDER BY lastmodifieddate DESC) = 1
          THEN 1 ELSE 0 END AS is_current
      FROM company
      Result:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        0
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        0
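The flagging query can be tried locally. The sketch below runs it against an in-memory SQLite database, which is only a stand-in for Treasure Data's query engines and needs SQLite 3.25 or later for window functions. The sample rows are the ones shown on the slides; `FLAG_QUERY` is a name chosen for this example.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for the real query engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company ("
    "company_id INTEGER, company_name TEXT, company_state TEXT, "
    "lastmodifieddate TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "New York", "2010-08-22"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)

# rank() restarts at 1 inside each company_id partition; ordering by date
# descending gives the newest row rank 1, which CASE turns into a 1/0 flag.
# ISO-8601 date strings sort correctly as plain text.
FLAG_QUERY = """
SELECT company_id, company_name, company_state, lastmodifieddate,
       CASE WHEN rank() OVER (PARTITION BY company_id
                              ORDER BY lastmodifieddate DESC) = 1
            THEN 1 ELSE 0 END AS is_current
FROM company
ORDER BY company_id, lastmodifieddate DESC
"""

flagged_rows = conn.execute(FLAG_QUERY).fetchall()
for row in flagged_rows:
    print(row)
```

Ordering descending is the important detail: with `ASC`, rank 1 would land on the oldest row and the flag would mark the wrong record as current.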
  • 32. Implementation in Treasure Data 1. Load incremental data from a data source to a staging table 2. Drop the target table that contains outdated SCD information 3. Window over the staging table, rebuilding the target table with the latest SCD information
  • 35. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
  • 36. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
        123         Sterling Cooper  California     2008-10-12
        124         CGC              Connecticut    2018-05-22
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
  • 37. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
        123         Sterling Cooper  California     2008-10-12
        124         CGC              Connecticut    2018-05-22
      target_company: (empty)
  • 38. Implementation in Treasure Data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  New York       2007-06-19
        124         CGC              New York       2010-08-22
        123         Sterling Cooper  California     2008-10-12
        124         CGC              Connecticut    2018-05-22
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  California     2008-10-12        1
        123         Sterling Cooper  New York       2007-06-19        0
        124         CGC              Connecticut    2018-05-22        1
        124         CGC              New York       2010-08-22        0
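The drop-and-rebuild approach can be sketched as a single function that is re-run after every incremental load. This is an illustration under assumptions, not the deck's workflow: SQLite (3.25+) stands in for Treasure Data, and `rebuild_target` and `read_target` are hypothetical helper names.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for the real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_company ("
    "company_id INTEGER, company_name TEXT, company_state TEXT, "
    "lastmodifieddate TEXT)"
)


def rebuild_target(conn):
    """Drop target_company and rebuild it by windowing over all staged rows."""
    conn.execute("DROP TABLE IF EXISTS target_company")
    conn.execute(
        """
        CREATE TABLE target_company AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               CASE WHEN rank() OVER (PARTITION BY company_id
                                      ORDER BY lastmodifieddate DESC) = 1
                    THEN 1 ELSE 0 END AS is_current
        FROM staging_company
        """
    )


def read_target(conn):
    return conn.execute(
        "SELECT * FROM target_company ORDER BY company_id, lastmodifieddate DESC"
    ).fetchall()


# First load: one version per company, so every row is current.
conn.executemany(
    "INSERT INTO staging_company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "New York", "2007-06-19"),
        (124, "CGC", "New York", "2010-08-22"),
    ],
)
rebuild_target(conn)
first_build = read_target(conn)

# Incremental load: newer versions arrive in staging, then the target is rebuilt.
conn.executemany(
    "INSERT INTO staging_company VALUES (?, ?, ?, ?)",
    [
        (123, "Sterling Cooper", "California", "2008-10-12"),
        (124, "CGC", "Connecticut", "2018-05-22"),
    ],
)
rebuild_target(conn)
second_build = read_target(conn)
print(second_build)
```

The trade-off is that the staging table has to retain every historical row, and each run rewrites the whole target table.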
  • 40. SCD Type 2 Workflow with Persistent Architecture
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
  • 41. SCD Type 2 Workflow with Persistent Architecture
      1. Store a temp table of the current rows that will not be current after the new data is ingested
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        1
        124         CGC              New York       2010-08-22        1
      tmp_no_longer_current:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
  • 43. SCD Type 2 Workflow with Persistent Architecture
      2. Delete from the data lake any current rows that have a matching id in the new data
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
      tmp_no_longer_current:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
  • 44. SCD Type 2 Workflow with Persistent Architecture
      3. Insert the temp rows into the target table
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
      tmp_no_longer_current:
        company_id  company_name     company_state  lastmodifieddate  is_current
        123         Sterling Cooper  New York       2007-06-19        0
  • 45. SCD Type 2 Workflow with Persistent Architecture
      3. Insert the temp rows into the target table
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
        123         Sterling Cooper  New York       2007-06-19        0
  • 47. SCD Type 2 Workflow with Persistent Architecture
      4. Insert the new data into the target table
      staging_company:
        company_id  company_name     company_state  lastmodifieddate
        123         Sterling Cooper  California     2008-10-12
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
        123         Sterling Cooper  New York       2007-06-19        0
  • 49. SCD Type 2 Workflow with Persistent Architecture
      4. Insert the new data into the target table
      staging_company: (empty)
      target_company:
        company_id  company_name     company_state  lastmodifieddate  is_current
        124         CGC              New York       2010-08-22        1
        123         Sterling Cooper  New York       2007-06-19        0
        123         Sterling Cooper  California     2008-10-12        1
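The four steps of the persistent approach can be sketched end to end. This is a minimal illustration rather than the deck's actual workflow: SQLite stands in for Treasure Data, the helper `apply_increment` is hypothetical, and the sketch assumes each batch in staging holds at most one row per company_id. Clearing the staging table at the end mirrors the empty staging_company shown on the last slide.

```python
import sqlite3

# Assumption: SQLite stands in for the real warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_company (
        company_id INTEGER, company_name TEXT, company_state TEXT,
        lastmodifieddate TEXT);
    CREATE TABLE target_company (
        company_id INTEGER, company_name TEXT, company_state TEXT,
        lastmodifieddate TEXT, is_current INTEGER);
    INSERT INTO target_company VALUES
        (123, 'Sterling Cooper', 'New York', '2007-06-19', 1),
        (124, 'CGC', 'New York', '2010-08-22', 1);
    INSERT INTO staging_company VALUES
        (123, 'Sterling Cooper', 'California', '2008-10-12');
    """
)


def apply_increment(conn):
    """Merge the staged batch into target_company without rebuilding it."""
    conn.executescript(
        """
        -- 1. Copy the current rows that are about to be superseded, flagged 0.
        DROP TABLE IF EXISTS tmp_no_longer_current;
        CREATE TABLE tmp_no_longer_current AS
        SELECT company_id, company_name, company_state, lastmodifieddate,
               0 AS is_current
        FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company);

        -- 2. Delete those current rows from the target table.
        DELETE FROM target_company
        WHERE is_current = 1
          AND company_id IN (SELECT company_id FROM staging_company);

        -- 3. Re-insert them as historical (is_current = 0) rows.
        INSERT INTO target_company SELECT * FROM tmp_no_longer_current;

        -- 4. Insert the new data as the current rows, then clear staging.
        INSERT INTO target_company
        SELECT company_id, company_name, company_state, lastmodifieddate, 1
        FROM staging_company;
        DELETE FROM staging_company;
        """
    )


apply_increment(conn)
final_rows = conn.execute(
    "SELECT * FROM target_company ORDER BY company_id, lastmodifieddate"
).fetchall()
staging_left = conn.execute("SELECT COUNT(*) FROM staging_company").fetchone()[0]
print(final_rows)
```

Unlike the rebuild approach, only the rows for companies present in the batch are touched, so the staging table never has to hold the full history.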
  • 50. Contact Information • Scott Mitchell • Senior Solution Engineer • smitchell@treasure-data.com