SlideShare ist ein Scribd-Unternehmen logo
1 von 38
Downloaden Sie, um offline zu lesen
The tutorial to build shared AI
services
--Session 2
Suqiang Song (Jack)
Director & Chapter Leader of Data/AI Engineering @ Mastercard
jackssqcyy@gmail.com
https://www.linkedin.com/in/suqiang-song-72041716/
Agenda
Module 3: AI Engineering platform and AI
Engineers ( 40 mins)
• Key factors to consider an AI Engineering platform
• architect a data pipeline framework
• Apache NiFi introduction
• Traditional AI Tribe and its challenges
• knowledges and skills are required for AI Engineer
• Growing path for an AI Engineer
Session 2: Feb. 1rd Friday 10am-12pm PT
Module 4: Benchmark between
Spark Machine learning and Deep
learning + Code Lab 2 (30 mins)
• Traditional Collaborative Filtering
approach with Spark Mllib ALS (Scala)
• Build an NCF deep learning approach
with Intel Analytic Zoo on Spark (Scala)
Q & A (10 mins)Live Demo (40 mins)
• Build an end to end AI Pipeline with
Kafka, NiFi, Spark Streaming and Keras
on Spark
Course Prerequisites
• Install Docker at your local laptop
• Download two Docker images from shared drive URL
kafka.tar and demo-whole.tar and also
demo_pipeline.xml
passcode : jack
• Load images to your Docker environment
https://1drv.ms/f/s!AsXKHMXBWUIBiBpaYk9FFjdoUifg
$ docker load -i demo-whole.tar
$ docker load -i kafka.tar
Module 3
AI Engineering Organization/ProcessAPI/ Pipeline Enablement
Talent Data
Technology
infrastructure
ConsolidateLeverage Automate
Key factors to consider an AI Engineering platform
Continue
AI Engineering platform
ML /DL learning pipelines
Historical + Incremental
Data Sources
Data Pipeline Bus
Real Time
Event Integration
Batch
Data Integration
Business Rule
Integration
Online Systems CRM
Files
Transfer
Data LakeMessage
Bus
Real Time
Serving
Streaming
Serving
Batch
Serving
Monitoring
Metrics
Serving Engine
Data Pipeline Engine
Machine Learning & Deep
Learning Libs/Frameworks
Performance
Analyzer
Predefined Integrated Pipelines
Predefined Serving APIs & Templates
Predefined AI Service
Templates
Workbench
Admin
Cloud Native
Data Flow Pipeline
• Flow-based ”programming”
• Source –Channel-Sink “structure”
• Ingest Data from various sources
and Transform data to various
destinations
• Extract – Transform – Load
• High-Throughput, straight-through
data flows
• Data Governance
• Combine Batch and Stream-
Processing
• Visual coding with flow editor
• Event Processing (ESP & CEP)
The X(quality attributes) for data pipeline
framework includes
• Clustering
• High Availability & Recovery
• Delivery Guarantee
• Data Buffering ,Flow Control and Back Pressure
• Data Governance
• Usability
• Extensibility
• Multi-Tenancy
• Version Control & Deployment
• Security
• Monitoring & Diagnostic Capabilities
• Integration Capabilities
• Cloud Native
• Performance , Latency and Throughputs
Architect a data
pipeline framework
What is the DFX ?
Along with functional
requirements, there are various
quality attributes.
The difference in these attributes
can make the product very
different.
Such as Tesla and Leaf
DFX is Design For Quality
Attributes
Example : High
Availability and
Recovery
 High availability
– Pipeline level : Each step or processor at flow that is likely to
encounter failures will have a "failure" routing relationship
– Pipeline Failure is handled by looping that failure
relationship back on the same step or to new steps
– Node level failover will depends on a "cluster coordinator"
and a "primary node" elected
– Pipeline failover between nodes ?
 Recovery
– Replay : Content repository should be designed to act as a
rolling buffer of history which supports replay every well
– Data Recovery after failover , the eventual consistency
– Breakpoint resume : Last-saved offset , how you resume the
pipeline from the broken pieces after fixed
Example :
Data Buffering ,Flow
Control and Back
Pressure
 Buffering with Prioritization
– Configure a prioritizer per connection, such as FirstInFristOut
, NewestFirst,OldestFirst etc..
– Determine what is important for your data – time based,
arrival order,
importance of a data set
– Funnel many connections down to a single connection to
prioritize across
data sets
– Develop your own prioritizer if needed
 Flow Control & Back-Pressure
– Configure back-pressure such as expiration, threshold for
per connection
– Based on number of flows or total size of flows
– Upstream processor no longer scheduled to run until below
threshold
Example : Security  Control Plane
– Pluggable authentication :2-Way SSL, LDAP, Kerberos
– File-based authority provider out of the box
– Multiple roles to defines access controls
 Data Plane
– Optional 2-Way SSL between cluster nodes
– Optional 2-Way SSL on Site-To-Site ( or Edge-to-Edge)
connections
– Encryption/Decryption of data through processors
 Data privacy and compliance
– PCI/PII compliance
– GDPR (General Data Protection Regulation)
Yes , you don’t want
your CEO to be testified
before Congress ☺
Example :
Multi-Tenancy
Ability for multiple groups of
entities (people or systems) to
command, control, and observe
state of different parts of the
dataflow
 Multi-tenant Authorization
– Enable a self-service model for dataflow management,
allowing each team or organization to manage flows with a
full awareness of the rest of the flow, to which they do not
have access.
 Multi-tenant isolation and Separated SLA/QoS
– Data is absolutely critical and it is loss intolerant
– Enables the fine-grained flow specific configuration to each
tenant
– Data Buffering ,Flow Control and Back Pressure should be
considered at tenant level
 Multi-tenant isolated resources management
– Integrate with 3rd popular resources management
framework such as Yarn
– Split up the functionalities of resource management and job
scheduling/monitoring into separate daemons
Technology Assessment Score Definition
Clustering
Assessment Score Ratings
High Availability and Recovery
2 431 5
N
NiFiN
2 431 5
N
Delivery Guarantee
2 431 5
N
Data Buffering ,Flow
Control and Back Pressure
2 431 5
N
Data Governance
2 431 5
N
Usability
2 431 5
N
Extensibility
2 431 5
N
Multi-Tenancy
2 431 5
N
Version Control & Deployment
2 431 5
N
Authentication & Authorization
2 431 5
N
Encryption and decryption
2 431 5
N
Monitoring & Diagnostic
2 431 5
N
Integration capabilities
2 431 5
N
Cloud Native
2 431 5
N
Performance, Latency and Throughputs
(Real Time & Streaming)
2 431 5
N
Performance, Latency and Throughputs
(Batch files / DB actions )
2 431 5
N
Ingest
TransformAnalyze
Output
Understand
Problem
Ingest
Data
Explore and
Understand
Data
Clean and
Shape Data
Evaluate
Data
Create and
build Models
Communicate
Results
Deliver &
Deploy Model
Data Engineer
Architect how data is organized
& ensure operability
Data Scientist
Deep analytics and modeling
for hidden insights
Business Analyst
Work with data to apply
insights to business strategy
App Developer
Integrates data & insights with
existing or new applications
Traditional AI Tribe
Trends
https://vmware.wd1.myworkdayjobs.com/VMware/job/USA-California-Palo-Alto/Staff-Machine-Learning-
Engineer_R1813174?from=timeline
4
4
4
4
4
3
3
0 1 2 3 4
ALGORITHM
ML MODELING
RESEARCH
MATH
STATISTICS
HYPERTUNING
FEATURE ENGINEERING
2
1
1
1
2
2
2
0 1 2 3 4
QA
TROUBLE SHOOTING
ABSTRACT THINKING
DEBUGING
DATA STRUCTURES
UNIT TEST
SOFTWARE ALGORITHM
1
1
1
3
4
0 1 2 3 4
SHELL/PERL
C++
JAVA/SCALA
PYTHON
R
2
1
2
1
2
1
2
2
0 1 2 3 4
DATA MODELING
DATA WAREHOUSING…
SQL
RDB & NOSQL
ETL
DATA GOVERNANCE
DATA PIPELINE
JOB/WORKFLOW
3
1
1
2
2
4
2
0 1 2 3 4
APPLIANCE(SCALE UP)
DISTRIBUTE…
KAFKA/STREAMING
SPARK
HADOOP
SAS
MPP
Data Mining Programing / Coding Languages
Data Engineering Big Data Stacks
3
3
2
3
4
2
2
2
0 1 2 3 4
PANDAS
H2O
SPARK MLLIB
SCIKIT LEARN
R LIB
DL-CAFFE
DL-KERAS
DL-TENSORFLOW
APIs /Services/App
( Model Serving)
1
1
1
1
1
1
1
0 1 2 3 4
AUTOMATION
REAL TIME MESSAGING
CACHE
API/APP FRAMEWORK
RPC/RESTFUL…
CI/CD
CLOUD NATIVE
3
4
3
2
2
3
0 1 2 3 4
DATA VISUALIZATION
MODEL INTERPRETABILITY
BUSINESS ACUMEN
COMMUNICATION SKILLS
BUSINESS ANALYSIS
PRESENTATION
Ratings for traditional data scientist
ML/DL Frameworks
Visualization
& Communications
Ratings for traditional data engineer
1
1
1
2
1
1
2
0 1 2 3 4
ALGORITHM
ML MODELING
RESEARCH
MATH
STATISTICS
HYPERTUNING
FEATURE ENGINEERING
3
3
3
3
4
3
3
0 1 2 3 4
QA
TROUBLE SHOOTING
ABSTRACT THINKING
DEBUGING
DATA STRUCTURES
UNIT TEST
SOFTWARE ALGORITHM
4
2
3
3
1
0 1 2 3 4
SHELL/PERL
C++
JAVA/SCALA
PYTHON
R
4
4
4
4
4
4
4
4
0 1 2 3 4
DATA MODELING
DATA WAREHOUSING…
SQL
RDB & NOSQL
ETL
DATA GOVERNANCE
DATA PIPELINE
JOB/WORKFLOW
3
4
3
4
4
2
3
0 1 2 3 4
APPLIANCE(SCALE UP)
DISTRIBUTE…
KAFKA/STREAMING
SPARK
HADOOP
SAS
MPP
Data Mining Programing / Coding Languages
Data Engineering Big Data Stacks
2
1
2
2
1
2
2
2
0 1 2 3 4
PANDAS
H2O
SPARK MLLIB
SCIKIT LEARN
R LIB
DL-CAFFE
DL-KERAS
DL-TENSORFLOW
APIs /Services/App
( Model Serving)
3
2
2
3
3
3
3
0 1 2 3 4
AUTOMATION
REAL TIME MESSAGING
CACHE
API/APP FRAMEWORK
RPC/RESTFUL…
CI/CD
CLOUD NATIVE
2
1
2
1
1
2
0 1 2 3 4
DATA VISUALIZATION
MODEL INTERPRETABILITY
BUSINESS ACUMEN
COMMUNICATION SKILLS
BUSINESS ANALYSIS
PRESENTATION
ML/DL Frameworks
Visualization
& Communications
Ratings for traditional application developer
1
1
1
2
1
1
1
0 1 2 3 4
ALGORITHM
ML MODELING
RESEARCH
MATH
STATISTICS
HYPERTUNING
FEATURE ENGINEERING
3
4
4
4
3
4
4
0 1 2 3 4
QA
TROUBLE SHOOTING
ABSTRACT THINKING
DEBUGING
DATA STRUCTURES
UNIT TEST
SOFTWARE ALGORITHM
2
3
4
2
1
0 1 2 3 4
SHELL/PERL
C++
JAVA/SCALA
PYTHON
R
2
1
3
3
2
2
2
3
0 1 2 3 4
DATA MODELING
DATA WAREHOUSING…
SQL
RDB & NOSQL
ETL
DATA GOVERNANCE
DATA PIPELINE
JOB/WORKFLOW
1
1
2
1
1
1
1
0 1 2 3 4
APPLIANCE(SCALE UP)
DISTRIBUTE…
KAFKA/STREAMING
SPARK
HADOOP
SAS
MPP
Data Mining Programing / Coding Languages
Data Engineering Big Data Stacks
1
1
1
1
1
1
1
1
0 1 2 3 4
PANDAS
H2O
SPARK MLLIB
SCIKIT LEARN
R LIB
DL-CAFFE
DL-KERAS
DL-TENSORFLOW
APIs /Services/App
( Model Serving)
3
4
4
4
4
4
3
0 1 2 3 4
AUTOMATION
REAL TIME MESSAGING
CACHE
API/APP FRAMEWORK
RPC/RESTFUL…
CI/CD
CLOUD NATIVE
2
1
3
2
1
2
0 1 2 3 4
DATA VISUALIZATION
MODEL INTERPRETABILITY
BUSINESS ACUMEN
COMMUNICATION SKILLS
BUSINESS ANALYSIS
PRESENTATION
ML/DL Frameworks
Visualization
& Communications
Ratings for modern AI engineer
3
2
2
3
2
3
3
0 1 2 3 4
ALGORITHM
ML MODELING
RESEARCH
MATH
STATISTICS
HYPERTUNING
FEATURE ENGINEERING
3
4
4
3
4
3
4
0 1 2 3 4
QA
TROUBLE SHOOTING
ABSTRACT THINKING
DEBUGING
DATA STRUCTURES
UNIT TEST
SOFTWARE ALGORITHM
3
2
4
4
2
0 1 2 3 4
SHELL/PERL
C++
JAVA/SCALA
PYTHON
R
3
3
3
3
3
3
4
3
0 1 2 3 4
DATA MODELING
DATA WAREHOUSING…
SQL
RDB & NOSQL
ETL
DATA GOVERNANCE
DATA PIPELINE
JOB/WORKFLOW
3
4
4
4
4
2
3
0 1 2 3 4
APPLIANCE(SCALE UP)
DISTRIBUTE…
KAFKA/STREAMING
SPARK
HADOOP
SAS
MPP
Data Mining Programing / Coding Languages
Data Engineering Big Data Stacks
3
2
4
3
2
3
4
4
0 1 2 3 4
PANDAS
H2O
SPARK MLLIB
SCIKIT LEARN
R LIB
DL-CAFFE
DL-KERAS
DL-TENSORFLOW
APIs /Services/App
( Model Serving)
3
4
4
4
4
3
3
0 1 2 3 4
AUTOMATION
REAL TIME MESSAGING
CACHE
API/APP FRAMEWORK
RPC/RESTFUL…
CI/CD
CLOUD NATIVE
4
3
3
3
2
3
0 1 2 3 4
DATA VISUALIZATION
MODEL INTERPRETABILITY
BUSINESS ACUMEN
COMMUNICATION SKILLS
BUSINESS ANALYSIS
PRESENTATION
Visualization
& Communications
ML/DL Frameworks
AI EngineerAreas need enhancements Training / Improving approach
Languages
Deep Learning To be expert , from bottom to the top
Use Deep Learning to avoid the gap
Spark ***** Hadoop ***
6 months certificate Plus
Data Engineering
Big Data Stacks
API /Application Design and
implement Essentials
Model Serving
Programing Course or Hands on
project , no need a CS MasterProgramming / Coding
Java/ Scala ++ ,Python +
Growing path 1 : traditional data scientist - > AI Engineer
AI EngineerAreas need enhancements Training / Improving approach
Languages
ML Framework
Deep Learning
Visualization , Communication Skills
Presentation
Visualization
& Communication
API /Application Design and
implement Advanced
Model Serving
Secondary DS/BA Master
At least 6 months certificate
Use Deep Learning to simply
Data Mining
Java/ Scala + ,Python +,R ++
Fast.ai ,GitHub , Kaggle, Coursera
/Udemy
At least 6 months DL certificate
Growing path 2 : traditional data engineer - > AI Engineer
AI EngineerAreas need enhancements Training / Improving approach
Languages
Fast.ai ,GitHub , Kaggle, Coursera
/Udemy
At least 6 months DL certificate
ETL Essentials
Data Pipeline ++
Use Deep Learning to avoid the gap
Hadoop + Spark Essentials
At least 6 months certificate
Data Engineering
Big Data Stacks
Python +++, R ++
Secondary DS/BA Master
At least 6 months certificate
Use Deep Learning to simply
Data Mining
ML Framework
Deep Learning
Visualization , Communication Skills
Presentation
Visualization
& Communication
Growing path 3 : traditional application developer - > AI Engineer
Module 4+ Code Lab2
https://github.com/jack1981/AaaSDemo
Collaborative Filtering -- concept
Spark Mllib
ALS
NCF on Spark
Collaborative Filtering -- Model selection
Spark Mllib
KMeans
• Offers a set of parallelized machine learning algorithms for ML
• Supports Model Selection (hyper parameter tuning) using Cross
Validation and Train-Validation Split.
• Supports Java, Scala or Python apps using Data Frame-based API
Enables Parallel, Distributed ML for large datasets on
Spark Clusters
Spark Mllib
Spark Mllib Algorithms
Spark Mllib ML Pipeline
DataFrame: Spark ML uses DataFrame rather than regular RDD as they hold a
variety of data types (e.g. feature vectors, true labels, and predictions).
Transformer: a transformer converts a DataFrame into another DataFrame
usually by appending columns. (since Spark DataFrame is immutable, it actually
creates a new DataFrame). The implement method for a transformer is
“transform()”.
Estimator: An Estimator is an algorithm which can be fit on a DataFrame to
produce a Transformer. Implements method fit() taking a DataFrame and a
model (also a transformer) as input.
Pipeline: Chains multiple Transformers and Estimators each as a stage to
specify an ML workflow. These stages are run in order, and the input
DataFrame is transformed as it passes through each stage.
Parameter: All Transformers and Estimators now share a common API for
specifying parameters.
Evaluator: Evaluate model performance. The Evaluator can be
• RegressionEvaluator for regression problems,
• BinaryClassificationEvaluator for binary data, or
• MulticlassClassificationEvaluator for multiclass problems.
Alternating Least Squares (ALS) Spark ML
Collaborative Filtering -- Trade Offs
Cluster Based
MF Based
Cluster +MF
Based
Deep Learning
(CPU)
Code Time !
Live Demo (Build an end to end AI Pipeline with Kafka,
NiFi, Spark Streaming and Keras on Spark)
Kafka
Livy
Spark Streaming
Q & A

Weitere ähnliche Inhalte

Was ist angesagt?

Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platformhadooparchbook
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Amy W. Tang
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Databricks
 
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012Shirshanka Das
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Databricks
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 Chester Chen
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarDenny Lee
 
Teradata - Hadoop profile
Teradata - Hadoop profileTeradata - Hadoop profile
Teradata - Hadoop profileSantosh Dandge
 
Yugandhar uppala oracle dba_2016
Yugandhar uppala oracle dba_2016Yugandhar uppala oracle dba_2016
Yugandhar uppala oracle dba_2016Yugandhar Uppala
 
Resume_Mohammed_Ali_Updated
Resume_Mohammed_Ali_UpdatedResume_Mohammed_Ali_Updated
Resume_Mohammed_Ali_UpdatedMohammed Ali
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Tammy Bednar
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseDataWorks Summit
 
(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014
(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014
(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014Amazon Web Services
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Yahoo Developer Network
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xshradha ambekar
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache SparkMatt Ingenthron
 

Was ist angesagt? (20)

Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1
 
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
 
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform with Srivatsan...
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery Webinar
 
Teradata - Hadoop profile
Teradata - Hadoop profileTeradata - Hadoop profile
Teradata - Hadoop profile
 
Yugandhar uppala oracle dba_2016
Yugandhar uppala oracle dba_2016Yugandhar uppala oracle dba_2016
Yugandhar uppala oracle dba_2016
 
Resume_Mohammed_Ali_Updated
Resume_Mohammed_Ali_UpdatedResume_Mohammed_Ali_Updated
Resume_Mohammed_Ali_Updated
 
Michal Marušan: Scalable R
Michal Marušan: Scalable RMichal Marušan: Scalable R
Michal Marušan: Scalable R
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
Database@Home : Data Driven Apps - Data-driven Microservices Architecture wit...
 
Seamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive WarehouseSeamless replication and disaster recovery for Apache Hive Warehouse
Seamless replication and disaster recovery for Apache Hive Warehouse
 
(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014
(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014
(SPOT305) Event-Driven Computing on Change Logs in AWS | AWS re:Invent 2014
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache Spark
 

Ähnlich wie C19013010 the tutorial to build shared ai services session 2

Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetHostedbyConfluent
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empowerDurga Gadiraju
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakeDatabricks
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformChester Chen
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for StreamSplunk
 
Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901WeCloudData
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training ClassDeepak Shankar
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Proactive ops for container orchestration environments
Proactive ops for container orchestration environmentsProactive ops for container orchestration environments
Proactive ops for container orchestration environmentsDocker, Inc.
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationInside Analysis
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3Databricks
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)Marco Gralike
 

Ähnlich wie C19013010 the tutorial to build shared ai services session 2 (20)

Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Powering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta LakePowering Interactive BI Analytics with Presto and Delta Lake
Powering Interactive BI Analytics with Presto and Delta Lake
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for Stream
 
Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901Tordatasci meetup-precima-retail-analytics-201901
Tordatasci meetup-precima-retail-analytics-201901
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training Class
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Proactive ops for container orchestration environments
Proactive ops for container orchestration environmentsProactive ops for container orchestration environments
Proactive ops for container orchestration environments
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
An AMIS overview of database 12c
An AMIS overview of database 12cAn AMIS overview of database 12c
An AMIS overview of database 12c
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)An AMIS Overview of Oracle database 12c (12.1)
An AMIS Overview of Oracle database 12c (12.1)
 

Mehr von Bill Liu

Walk Through a Real World ML Production Project
Walk Through a Real World ML Production ProjectWalk Through a Real World ML Production Project
Walk Through a Real World ML Production ProjectBill Liu
 
Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...Bill Liu
 
Productizing Machine Learning at the Edge
Productizing Machine Learning at the EdgeProductizing Machine Learning at the Edge
Productizing Machine Learning at the EdgeBill Liu
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroBill Liu
 
Deep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsDeep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsBill Liu
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixBill Liu
 
Practical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at ScalePractical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at ScaleBill Liu
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsBill Liu
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Bill Liu
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsBill Liu
 
Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...Bill Liu
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningBill Liu
 
Weekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on MobileWeekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on MobileBill Liu
 
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine LearningWeekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine LearningBill Liu
 
AISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with MicroeconomicsAISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with MicroeconomicsBill Liu
 
AISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First WorldAISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First WorldBill Liu
 
AISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the EdgeAISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the EdgeBill Liu
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...Bill Liu
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917Bill Liu
 

Mehr von Bill Liu (20)

Walk Through a Real World ML Production Project
Walk Through a Real World ML Production ProjectWalk Through a Real World ML Production Project
Walk Through a Real World ML Production Project
 
Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...
 
Productizing Machine Learning at the Edge
Productizing Machine Learning at the EdgeProductizing Machine Learning at the Edge
Productizing Machine Learning at the Edge
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to Hero
 
Deep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsDeep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps Workflows
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
 
Practical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at ScalePractical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at Scale
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
 
Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
 
Weekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on MobileWeekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on Mobile
 
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine LearningWeekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
 
AISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with MicroeconomicsAISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with Microeconomics
 
AISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First WorldAISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First World
 
AISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the EdgeAISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the Edge
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
 

Kürzlich hochgeladen

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Kürzlich hochgeladen (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

C19013010 the tutorial to build shared ai services session 2

  • 1. The tutorial to build shared AI services --Session 2 Suqiang Song (Jack) Director & Chapter Leader of Data/AI Engineering @ Mastercard jackssqcyy@gmail.com https://www.linkedin.com/in/suqiang-song-72041716/
  • 2. Agenda Module 3: AI Engineering platform and AI Engineers ( 40 mins) • Key factors to consider an AI Engineering platform • architect a data pipeline framework • Apache NiFi introduction • Traditional AI Tribe and its challenges • knowledges and skills are required for AI Engineer • Growing path for an AI Engineer Session 2: Feb. 1rd Friday 10am-12pm PT Module 4: Benchmark between Spark Machine learning and Deep learning + Code Lab 2 (30 mins) • Traditional Collaborative Filtering approach with Spark Mllib ALS (Scala) • Build an NCF deep learning approach with Intel Analytic Zoo on Spark (Scala) Q & A (10 mins)Live Demo (40 mins) • Build an end to end AI Pipeline with Kafka, NiFi, Spark Streaming and Keras on Spark
  • 3. Course Prerequisites • Install Docker at your local laptop • Download two Docker images from shared drive URL kafka.tar and demo-whole.tar and also demo_pipeline.xml passcode : jack • Load images to your Docker environment https://1drv.ms/f/s!AsXKHMXBWUIBiBpaYk9FFjdoUifg $ docker load -i demo-whole.tar $ docker load -i kafka.tar
  • 5. AI Engineering Organization/ProcessAPI/ Pipeline Enablement Talent Data Technology infrastructure ConsolidateLeverage Automate Key factors to consider an AI Engineering platform
  • 6. Continue AI Engineering platform ML /DL learning pipelines Historical + Incremental Data Sources Data Pipeline Bus Real Time Event Integration Batch Data Integration Business Rule Integration Online Systems CRM Files Transfer Data LakeMessage Bus Real Time Serving Streaming Serving Batch Serving Monitoring Metrics Serving Engine Data Pipeline Engine Machine Learning & Deep Learning Libs/Frameworks Performance Analyzer Predefined Integrated Pipelines Predefined Serving APIs & Templates Predefined AI Service Templates Workbench Admin Cloud Native
  • 7. Data Flow Pipeline • Flow-based ”programming” • Source –Channel-Sink “structure” • Ingest Data from various sources and Transform data to various destinations • Extract – Transform – Load • High-Throughput, straight-through data flows • Data Governance • Combine Batch and Stream- Processing • Visual coding with flow editor • Event Processing (ESP & CEP)
  • 8. The X(quality attributes) for data pipeline framework includes • Clustering • High Availability & Recovery • Delivery Guarantee • Data Buffering ,Flow Control and Back Pressure • Data Governance • Usability • Extensibility • Multi-Tenancy • Version Control & Deployment • Security • Monitoring & Diagnostic Capabilities • Integration Capabilities • Cloud Native • Performance , Latency and Throughputs Architect a data pipeline framework What is the DFX ? Along with functional requirements, there are various quality attributes. The difference in these attributes can make the product very different. Such as Tesla and Leaf DFX is Design For Quality Attributes
  • 9. Example : High Availability and Recovery  High availability – Pipeline level : Each step or processor at flow that is likely to encounter failures will have a "failure" routing relationship – Pipeline Failure is handled by looping that failure relationship back on the same step or to new steps – Node level failover will depends on a "cluster coordinator" and a "primary node" elected – Pipeline failover between nodes ?  Recovery – Replay : Content repository should be designed to act as a rolling buffer of history which supports replay every well – Data Recovery after failover , the eventual consistency – Breakpoint resume : Last-saved offset , how you resume the pipeline from the broken pieces after fixed
  • 10. Example : Data Buffering ,Flow Control and Back Pressure  Buffering with Prioritization – Configure a prioritizer per connection, such as FirstInFristOut , NewestFirst,OldestFirst etc.. – Determine what is important for your data – time based, arrival order, importance of a data set – Funnel many connections down to a single connection to prioritize across data sets – Develop your own prioritizer if needed  Flow Control & Back-Pressure – Configure back-pressure such as expiration, threshold for per connection – Based on number of flows or total size of flows – Upstream processor no longer scheduled to run until below threshold
  • 11. Example : Security  Control Plane – Pluggable authentication :2-Way SSL, LDAP, Kerberos – File-based authority provider out of the box – Multiple roles to defines access controls  Data Plane – Optional 2-Way SSL between cluster nodes – Optional 2-Way SSL on Site-To-Site ( or Edge-to-Edge) connections – Encryption/Decryption of data through processors  Data privacy and compliance – PCI/PII compliance – GDPR (General Data Protection Regulation) Yes , you don’t want your CEO to be testified before Congress ☺
  • 12. Example : Multi-Tenancy Ability for multiple groups of entities (people or systems) to command, control, and observe state of different parts of the dataflow  Multi-tenant Authorization – Enable a self-service model for dataflow management, allowing each team or organization to manage flows with a full awareness of the rest of the flow, to which they do not have access.  Multi-tenant isolation and Separated SLA/QoS – Data is absolutely critical and it is loss intolerant – Enables the fine-grained flow specific configuration to each tenant – Data Buffering ,Flow Control and Back Pressure should be considered at tenant level  Multi-tenant isolated resources management – Integrate with 3rd popular resources management framework such as Yarn – Split up the functionalities of resource management and job scheduling/monitoring into separate daemons
  • 13.
  • 15. Clustering Assessment Score Ratings High Availability and Recovery 2 431 5 N NiFiN 2 431 5 N Delivery Guarantee 2 431 5 N Data Buffering ,Flow Control and Back Pressure 2 431 5 N Data Governance 2 431 5 N Usability 2 431 5 N Extensibility 2 431 5 N Multi-Tenancy 2 431 5 N Version Control & Deployment 2 431 5 N Authentication & Authorization 2 431 5 N Encryption and decryption 2 431 5 N Monitoring & Diagnostic 2 431 5 N Integration capabilities 2 431 5 N Cloud Native 2 431 5 N Performance, Latency and Throughputs (Real Time & Streaming) 2 431 5 N Performance, Latency and Throughputs (Batch files / DB actions ) 2 431 5 N
  • 16. Ingest TransformAnalyze Output Understand Problem Ingest Data Explore and Understand Data Clean and Shape Data Evaluate Data Create and build Models Communicate Results Deliver & Deploy Model Data Engineer Architect how data is organized & ensure operability Data Scientist Deep analytics and modeling for hidden insights Business Analyst Work with data to apply insights to business strategy App Developer Integrates data & insights with existing or new applications Traditional AI Tribe
  • 18. 4 4 4 4 4 3 3 0 1 2 3 4 ALGORITHM ML MODELING RESEARCH MATH STATISTICS HYPERTUNING FEATURE ENGINEERING 2 1 1 1 2 2 2 0 1 2 3 4 QA TROUBLE SHOOTING ABSTRACT THINKING DEBUGING DATA STRUCTURES UNIT TEST SOFTWARE ALGORITHM 1 1 1 3 4 0 1 2 3 4 SHELL/PERL C++ JAVA/SCALA PYTHON R 2 1 2 1 2 1 2 2 0 1 2 3 4 DATA MODELING DATA WAREHOUSING… SQL RDB & NOSQL ETL DATA GOVERNANCE DATA PIPELINE JOB/WORKFLOW 3 1 1 2 2 4 2 0 1 2 3 4 APPLIANCE(SCALE UP) DISTRIBUTE… KAFKA/STREAMING SPARK HADOOP SAS MPP Data Mining Programing / Coding Languages Data Engineering Big Data Stacks 3 3 2 3 4 2 2 2 0 1 2 3 4 PANDAS H2O SPARK MLLIB SCIKIT LEARN R LIB DL-CAFFE DL-KERAS DL-TENSORFLOW APIs /Services/App ( Model Serving) 1 1 1 1 1 1 1 0 1 2 3 4 AUTOMATION REAL TIME MESSAGING CACHE API/APP FRAMEWORK RPC/RESTFUL… CI/CD CLOUD NATIVE 3 4 3 2 2 3 0 1 2 3 4 DATA VISUALIZATION MODEL INTERPRETABILITY BUSINESS ACUMEN COMMUNICATION SKILLS BUSINESS ANALYSIS PRESENTATION Ratings for traditional data scientist ML/DL Frameworks Visualization & Communications
  • 19. Ratings for traditional data engineer 1 1 1 2 1 1 2 0 1 2 3 4 ALGORITHM ML MODELING RESEARCH MATH STATISTICS HYPERTUNING FEATURE ENGINEERING 3 3 3 3 4 3 3 0 1 2 3 4 QA TROUBLE SHOOTING ABSTRACT THINKING DEBUGING DATA STRUCTURES UNIT TEST SOFTWARE ALGORITHM 4 2 3 3 1 0 1 2 3 4 SHELL/PERL C++ JAVA/SCALA PYTHON R 4 4 4 4 4 4 4 4 0 1 2 3 4 DATA MODELING DATA WAREHOUSING… SQL RDB & NOSQL ETL DATA GOVERNANCE DATA PIPELINE JOB/WORKFLOW 3 4 3 4 4 2 3 0 1 2 3 4 APPLIANCE(SCALE UP) DISTRIBUTE… KAFKA/STREAMING SPARK HADOOP SAS MPP Data Mining Programing / Coding Languages Data Engineering Big Data Stacks 2 1 2 2 1 2 2 2 0 1 2 3 4 PANDAS H2O SPARK MLLIB SCIKIT LEARN R LIB DL-CAFFE DL-KERAS DL-TENSORFLOW APIs /Services/App ( Model Serving) 3 2 2 3 3 3 3 0 1 2 3 4 AUTOMATION REAL TIME MESSAGING CACHE API/APP FRAMEWORK RPC/RESTFUL… CI/CD CLOUD NATIVE 2 1 2 1 1 2 0 1 2 3 4 DATA VISUALIZATION MODEL INTERPRETABILITY BUSINESS ACUMEN COMMUNICATION SKILLS BUSINESS ANALYSIS PRESENTATION ML/DL Frameworks Visualization & Communications
  • 20. Ratings for traditional application developer 1 1 1 2 1 1 1 0 1 2 3 4 ALGORITHM ML MODELING RESEARCH MATH STATISTICS HYPERTUNING FEATURE ENGINEERING 3 4 4 4 3 4 4 0 1 2 3 4 QA TROUBLE SHOOTING ABSTRACT THINKING DEBUGING DATA STRUCTURES UNIT TEST SOFTWARE ALGORITHM 2 3 4 2 1 0 1 2 3 4 SHELL/PERL C++ JAVA/SCALA PYTHON R 2 1 3 3 2 2 2 3 0 1 2 3 4 DATA MODELING DATA WAREHOUSING… SQL RDB & NOSQL ETL DATA GOVERNANCE DATA PIPELINE JOB/WORKFLOW 1 1 2 1 1 1 1 0 1 2 3 4 APPLIANCE(SCALE UP) DISTRIBUTE… KAFKA/STREAMING SPARK HADOOP SAS MPP Data Mining Programing / Coding Languages Data Engineering Big Data Stacks 1 1 1 1 1 1 1 1 0 1 2 3 4 PANDAS H2O SPARK MLLIB SCIKIT LEARN R LIB DL-CAFFE DL-KERAS DL-TENSORFLOW APIs /Services/App ( Model Serving) 3 4 4 4 4 4 3 0 1 2 3 4 AUTOMATION REAL TIME MESSAGING CACHE API/APP FRAMEWORK RPC/RESTFUL… CI/CD CLOUD NATIVE 2 1 3 2 1 2 0 1 2 3 4 DATA VISUALIZATION MODEL INTERPRETABILITY BUSINESS ACUMEN COMMUNICATION SKILLS BUSINESS ANALYSIS PRESENTATION ML/DL Frameworks Visualization & Communications
  • 21. Ratings for modern AI engineer 3 2 2 3 2 3 3 0 1 2 3 4 ALGORITHM ML MODELING RESEARCH MATH STATISTICS HYPERTUNING FEATURE ENGINEERING 3 4 4 3 4 3 4 0 1 2 3 4 QA TROUBLE SHOOTING ABSTRACT THINKING DEBUGING DATA STRUCTURES UNIT TEST SOFTWARE ALGORITHM 3 2 4 4 2 0 1 2 3 4 SHELL/PERL C++ JAVA/SCALA PYTHON R 3 3 3 3 3 3 4 3 0 1 2 3 4 DATA MODELING DATA WAREHOUSING… SQL RDB & NOSQL ETL DATA GOVERNANCE DATA PIPELINE JOB/WORKFLOW 3 4 4 4 4 2 3 0 1 2 3 4 APPLIANCE(SCALE UP) DISTRIBUTE… KAFKA/STREAMING SPARK HADOOP SAS MPP Data Mining Programing / Coding Languages Data Engineering Big Data Stacks 3 2 4 3 2 3 4 4 0 1 2 3 4 PANDAS H2O SPARK MLLIB SCIKIT LEARN R LIB DL-CAFFE DL-KERAS DL-TENSORFLOW APIs /Services/App ( Model Serving) 3 4 4 4 4 3 3 0 1 2 3 4 AUTOMATION REAL TIME MESSAGING CACHE API/APP FRAMEWORK RPC/RESTFUL… CI/CD CLOUD NATIVE 4 3 3 3 2 3 0 1 2 3 4 DATA VISUALIZATION MODEL INTERPRETABILITY BUSINESS ACUMEN COMMUNICATION SKILLS BUSINESS ANALYSIS PRESENTATION Visualization & Communications ML/DL Frameworks
  • 22. AI EngineerAreas need enhancements Training / Improving approach Languages Deep Learning To be expert , from bottom to the top Use Deep Learning to avoid the gap Spark ***** Hadoop *** 6 months certificate Plus Data Engineering Big Data Stacks API /Application Design and implement Essentials Model Serving Programing Course or Hands on project , no need a CS MasterProgramming / Coding Java/ Scala ++ ,Python + Growing path 1 : traditional data scientist - > AI Engineer
  • 23. AI EngineerAreas need enhancements Training / Improving approach Languages ML Framework Deep Learning Visualization , Communication Skills Presentation Visualization & Communication API /Application Design and implement Advanced Model Serving Secondary DS/BA Master At least 6 months certificate Use Deep Learning to simply Data Mining Java/ Scala + ,Python +,R ++ Fast.ai ,GitHub , Kaggle, Coursera /Udemy At least 6 months DL certificate Growing path 2 : traditional data engineer - > AI Engineer
  • 24. AI EngineerAreas need enhancements Training / Improving approach Languages Fast.ai ,GitHub , Kaggle, Coursera /Udemy At least 6 months DL certificate ETL Essentials Data Pipeline ++ Use Deep Learning to avoid the gap Hadoop + Spark Essentials At least 6 months certificate Data Engineering Big Data Stacks Python +++, R ++ Secondary DS/BA Master At least 6 months certificate Use Deep Learning to simply Data Mining ML Framework Deep Learning Visualization , Communication Skills Presentation Visualization & Communication Growing path 3 : traditional application developer - > AI Engineer
  • 25. Module 4+ Code Lab2 https://github.com/jack1981/AaaSDemo
  • 27. Spark Mllib ALS NCF on Spark Collaborative Filtering -- Model selection Spark Mllib KMeans
  • 28. • Offers a set of parallelized machine learning algorithms for ML • Supports Model Selection (hyper parameter tuning) using Cross Validation and Train-Validation Split. • Supports Java, Scala or Python apps using Data Frame-based API Enables Parallel, Distributed ML for large datasets on Spark Clusters Spark Mllib
  • 30. Spark Mllib ML Pipeline DataFrame: Spark ML uses DataFrame rather than regular RDD as they hold a variety of data types (e.g. feature vectors, true labels, and predictions). Transformer: a transformer converts a DataFrame into another DataFrame usually by appending columns. (since Spark DataFrame is immutable, it actually creates a new DataFrame). The implement method for a transformer is “transform()”. Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. Implements method fit() taking a DataFrame and a model (also a transformer) as input. Pipeline: Chains multiple Transformers and Estimators each as a stage to specify an ML workflow. These stages are run in order, and the input DataFrame is transformed as it passes through each stage. Parameter: All Transformers and Estimators now share a common API for specifying parameters. Evaluator: Evaluate model performance. The Evaluator can be • RegressionEvaluator for regression problems, • BinaryClassificationEvaluator for binary data, or • MulticlassClassificationEvaluator for multiclass problems.
  • 31. Alternating Least Squares (ALS) Spark ML
  • 32. Collaborative Filtering -- Trade Offs Cluster Based MF Based Cluster +MF Based Deep Learning (CPU)
  • 34. Live Demo (Build an end to end AI Pipeline with Kafka, NiFi, Spark Streaming and Keras on Spark)
  • 35. Kafka
  • 36. Livy
  • 38. Q & A