GM/DM 1:1 1
IN FOUR SIMPLE STEPS, ETL CLICKSTREAM TO DATA PRODUCTS (NO ENGINEER NEEDED!)
SENIOR DATA SCIENTIST | JOSH JANZEN
JOSH JANZEN
SENIOR DATA SCIENTIST
Degrees from:
Data Science Tools:
About:
Life Time champions a healthy and happy life for its members across 138 destinations in 38 major markets in the U.S. and Canada.
1. DATA FEED
- FTP to S3 with bucket credentials
2. EXPLORE
- Sample data and explore
- Find columns of interest
3. ETL/ML
- ETL columns of interest
- Apply ML algorithms
4. DEPLOY
- Create web APIs with Azure ML
- Interactive web apps
STEP 1. DATA FEED
- FTP to S3 with bucket credentials
- Start off as a nightly batch
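The deck does not show how nightly files are named in S3. The sketch below is a minimal example that assumes two things. First, nightly files follow the raw_clicks_12_01_18 pattern seen in the ETL step, read as month_day_year; that reading is an assumption. Second, each run loads the previous day's clicks. The `raw_clicks/` prefix is hypothetical.

```python
from datetime import date, timedelta

# Hypothetical S3 key prefix; the deck does not name one.
PREFIX = "raw_clicks/"

def nightly_key(run_date: date) -> str:
    """Return the S3 object key for the batch covering the day before run_date.

    Assumes the raw_clicks_MM_DD_YY naming pattern (an interpretation of
    names such as raw_clicks_12_01_18) and one file per day.
    """
    batch_day = run_date - timedelta(days=1)
    return PREFIX + batch_day.strftime("raw_clicks_%m_%d_%y")

print(nightly_key(date(2018, 12, 2)))  # raw_clicks/raw_clicks_12_01_18
```

A key built this way could then be passed as the `Key` argument of boto3's `upload_file(Filename, Bucket, Key)` when copying the file from the FTP drop into the bucket.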
STEP 2. EXPLORE
- Sample data and explore
- Find columns of interest
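The deck does not say how the sample is drawn. A clickstream feed can be too large to load whole, so one option is reservoir sampling (Algorithm R). It keeps a uniform random sample of k rows in a single pass over a stream of unknown length. This is a swapped-in technique shown for illustration, and the fixed seed is only there to make runs repeatable.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k rows from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows
            sample.append(row)
        else:
            # Keep row i with probability k / (i + 1), replacing a random slot
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

Because it only needs an iterator, this works on an open file handle as well as on a list, so a sample can be pulled without reading a whole night's file into memory.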
STEP 2. EXPLORE
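import pandas as pd

def remove_null_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Drop columns in which every value is null
    return df.dropna(axis=1, how="all")

threshold = 2

def remove_low_variation_columns(df: pd.DataFrame, threshold: int = threshold) -> pd.DataFrame:
    # Keep only columns with at least `threshold` distinct non-null values
    return df.loc[:, df.nunique() >= threshold]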
STEP 2. EXPLORE
STEP 3. ETL/ML
- ETL columns of interest
- Apply ML algorithms
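One way to "ETL columns of interest" into model input is to aggregate raw clicks into per-user page-view counts. The sketch below assumes that `user_id` and `page_name` (from the sample clickstream header) are the columns of interest. It also assumes the feed writes a missing value as the literal string "NULL". Both are assumptions, not the author's stated pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Optional

def user_page_counts(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, Counter]:
    """Aggregate raw click rows into per-user page-view counts."""
    features: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        user, page = row.get("user_id"), row.get("page_name")
        # Skip rows missing either column of interest ("NULL" or None)
        if user and page and page != "NULL":
            features[user][page] += 1
    return dict(features)
```

The resulting user-by-page counts form the kind of sparse interaction matrix that collaborative-filtering algorithms take as input.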
STEP 3. ETL/ML
Auto-scaling of cluster size
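Managed Spark services typically let you set a minimum and maximum worker count and then resize the cluster within that range. The deck does not say which rule its platform uses. The sketch below only illustrates the idea: size the cluster in proportion to pending input, clamped to the configured bounds. The per-worker capacity and the bounds are hypothetical values.

```python
import math

def target_workers(pending_gb: float, gb_per_worker: float = 8.0,
                   min_workers: int = 2, max_workers: int = 20) -> int:
    """Pick a worker count proportional to pending input, clamped to [min, max].

    gb_per_worker, min_workers and max_workers are illustrative defaults,
    not values from the deck.
    """
    if gb_per_worker <= 0:
        raise ValueError("gb_per_worker must be positive")
    needed = math.ceil(pending_gb / gb_per_worker)
    return max(min_workers, min(max_workers, needed))
```

The lower bound keeps a small cluster available for quick exploration. The upper bound caps cost when a large backlog of nightly files arrives at once.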
STEP 3. ETL/ML
event_date_time   user_id  action      page_name      os
11/15/18 7:25AM   u_345    Menu_click  Home           Android
11/15/18 7:26AM   u_345    NULL        ScheduleClass  Android
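The sample rows above carry timestamps like "11/15/18 7:25AM" and use NULL for missing values. The sketch below parses those two rows. It assumes a comma-separated export, since the slide does not show the delimiter, and it reads the date as month/day/two-digit-year.

```python
import csv
import io
from datetime import datetime

# The two sample rows from the slide, written as CSV (the delimiter is an assumption).
SAMPLE = """event_date_time,user_id,action,page_name,os
11/15/18 7:25AM,u_345,Menu_click,Home,Android
11/15/18 7:26AM,u_345,NULL,ScheduleClass,Android
"""

def parse_clicks(text: str):
    """Parse clickstream CSV text into dicts with typed timestamps and None for NULL."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # The feed writes missing values as the literal string "NULL"
        clean = {k: (None if v == "NULL" else v) for k, v in row.items()}
        # e.g. "11/15/18 7:25AM" -> 2018-11-15 07:25
        clean["event_date_time"] = datetime.strptime(clean["event_date_time"], "%m/%d/%y %I:%M%p")
        rows.append(clean)
    return rows

rows = parse_clicks(SAMPLE)
```

Converting NULL to a real missing value at this point lets the column-removal step treat it as null rather than as one more distinct string value.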
files_etl_complete = ["raw_clicks_12_01_18", "raw_clicks_12_02_18"]  # …

def detect_new_data_tms(raw_clicks_bucket, perform_etl):
    # Run the ETL only on files that have not been processed yet
    for file in raw_clicks_bucket:
        if file not in files_etl_complete:
            perform_etl(file)
            files_etl_complete.append(file)
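In the loop above, `files_etl_complete` lives in memory, so a nightly job would forget which files it had processed between runs. The sketch below is one way to persist that list: it stores it in a small JSON manifest file. The manifest path and the `perform_etl` callable are hypothetical placeholders.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List

def detect_new_data(raw_files: Iterable[str], manifest: Path,
                    perform_etl: Callable[[str], None]) -> List[str]:
    """Run perform_etl on files not yet listed in a JSON manifest; return the files processed."""
    done = set(json.loads(manifest.read_text())) if manifest.exists() else set()
    processed: List[str] = []
    for name in raw_files:
        if name not in done:
            perform_etl(name)
            done.add(name)
            processed.append(name)
            # Rewrite the manifest after each file so a crash does not repeat finished work
            manifest.write_text(json.dumps(sorted(done)))
    return processed
```

A file is recorded only after its ETL finishes. If a run fails partway through, the next run therefore retries the failed file instead of skipping it.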
STEP 3. ETL/ML
Images may be subject to copyright
source: https://johnolamendy.wordpress.com/2015/10/14/collaborative-filtering-in-apache-spark/
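The image credited above covers collaborative filtering in Apache Spark, where MLlib offers alternating least squares (`pyspark.ml.recommendation.ALS`). The sketch below does not implement ALS. It swaps in a simpler item-based collaborative filter in pure Python, using cosine similarity between pages over users' click counts. The page names in the usage example other than Home and ScheduleClass are hypothetical.

```python
import math
from typing import Dict, List, Mapping

UserItems = Mapping[str, Mapping[str, float]]

def item_similarity(user_items: UserItems, a: str, b: str) -> float:
    """Cosine similarity between items a and b over users' interaction counts."""
    dot = sum(items.get(a, 0) * items.get(b, 0) for items in user_items.values())
    norm_a = math.sqrt(sum(items.get(a, 0) ** 2 for items in user_items.values()))
    norm_b = math.sqrt(sum(items.get(b, 0) ** 2 for items in user_items.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_items: UserItems, user: str, k: int = 3) -> List[str]:
    """Score unseen items by similarity to the user's clicked items, weighted by click count."""
    seen = user_items.get(user, {})
    all_items = {i for items in user_items.values() for i in items}
    scores: Dict[str, float] = {}
    for cand in all_items - set(seen):
        scores[cand] = sum(item_similarity(user_items, cand, s) * w for s, w in seen.items())
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

clicks = {
    "u1": {"Home": 3, "ScheduleClass": 2},
    "u2": {"Home": 1, "ScheduleClass": 1, "Spa": 4},
    "u3": {"Spa": 2, "Cafe": 1},
}
print(recommend(clicks, "u1"))  # ['Spa', 'Cafe']
```

Pairwise similarity like this scales poorly as the number of pages and users grows. That growth is the usual reason to move to a distributed factorization such as Spark's ALS.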
STEP 4. DEPLOY
- Create web APIs with Azure ML
- Interactive web apps
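Once a model is published as a web API, an app calls it with an authenticated HTTP POST. The sketch below uses only the Python standard library to build such a request without sending it. Several details are hypothetical: the endpoint URL, the key, and the `{"data": [...]}` payload shape. A real Azure ML deployment documents its own URL, key, and input schema, so check those before relying on this shape.

```python
import json
import urllib.request

# Hypothetical endpoint and key; replace with the values issued for your deployed service.
SCORING_URL = "https://example.invalid/score"
API_KEY = "replace-with-your-key"

def build_scoring_request(records, url: str = SCORING_URL,
                          api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a REST scoring endpoint."""
    body = json.dumps({"data": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

req = build_scoring_request([{"user_id": "u_345"}])
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before pointing an interactive app at a live endpoint.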
STEP 4. DEPLOY
Images may be subject to copyright
source: https://wikiazure.com/artificial-intelligence/predict-temperature-using-azure-machine-learning/
STEP 4. DEPLOY
TIPS/TRICKS
Images may be subject to copyright
source: https://gifer.com/en/7kRO
