FUTURE OF DATA PLATFORM IN CLOUD NATIVE ERA
- Srivatsan Srinivasan
WHO AM I?
Chief Data Scientist at Cognizant
https://www.linkedin.com/in/srivatsan-srinivasan-b8131b/
https://www.youtube.com/channel/UCwBs8TLOogwyGd0GxHCp-Dw
AIEngineering
Cloud Native Data Application
Edge AI/Analytics
Hybrid Cloud
Prescriptive Analytics (From what to why)
Augmented Analytics
BACKGROUND FOR THIS TALK
Is it really the End of the Hadoop Era?
• It did not live up to the performance needs of organizations
• It was not able to replace existing EDW infrastructure
• It is too hard to maintain, and even harder to make cloud ready
• Cloud killed Hadoop
Is it really the End of the Hadoop Era?
• People failed Hadoop: they did not know which use cases fit Hadoop best
• People were trying to solve a technology problem rather than a business problem
• The Hadoop architecture needs a refresh in today's world
• The underlying assumptions on which Hadoop was created a decade ago have not been relevant for years
• There is a better way of doing Hadoop on premise
CHALLENGES WITH BIG DATA PLATFORM
CHALLENGE 1 – Separate Data and Application Infrastructure
[Diagram: Data Infrastructure | Application Infrastructure]
• Separate infrastructure management
• Separate DevOps/DataOps
• Inefficient use of infrastructure and specialized hardware accelerators
• Applications have to be rewritten when moving from one environment to another
CHALLENGE 2 – Difficult Dependency and Version Management
• Data scientists need access to the latest and greatest versions
• Interdependencies between multiple versions
• YARN does not provide an easy way to isolate dependencies
• Package dependencies during spark-submit
• Create a different conda environment per project
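One common workaround for the last two points is to pack a per-project conda environment into an archive and ship it with the job. The Python sketch below only assembles the command line; it assumes the environment was packed beforehand (for example with conda-pack), and the archive name, the alias, and the script path are hypothetical.

```python
import shlex

def yarn_submit_command(env_archive, app_path, alias="environment"):
    """Build a spark-submit command that ships a packed conda env to YARN.

    env_archive, app_path and alias are illustrative placeholders.
    """
    python_in_env = f"./{alias}/bin/python"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # The archive is unpacked on each node under the name after '#'.
        "--archives", f"{env_archive}#{alias}",
        # Point the driver (application master) and the executors at that interpreter.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_in_env}",
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={python_in_env}",
        app_path,
    ]

cmd = yarn_submit_command("project_a_env.tar.gz", "train.py")
print(shlex.join(cmd))
```

Every project still carries and ships its own archive, which is the overhead this slide points at.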
CHALLENGE 3 – Portability to Hybrid Infrastructure
Pattern 1 – Build on premise and deploy on cloud
• On Premise: Application
• Public Cloud: Application
Pattern 2 – Primary on premise and DR on cloud
• On Premise (Primary): Application
• Public Cloud (DR): Application
• Failover from primary to DR
Pattern 3 – Cloud bursting
• On Premise (Primary Infra): Application
• Public Cloud (Extended Infra): Application
• Burst on demand
Pattern 4 – Placement based on data sensitivity and data gravity
• On Premise (Sensitive Data): Application
• Public Cloud (Non-sensitive Data): Application
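The patterns above amount to a placement decision per workload. As a rough illustration only, the Python sketch below combines pattern 4 (sensitive data stays on premise) with pattern 3 (burst to the cloud when on-premise capacity runs out). The Workload fields, the site names, and the capacity figures are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    cores: int
    sensitive_data: bool  # hypothetical flag: data must not leave the premises

def place(workload, free_on_prem_cores):
    """Return 'on-prem', 'cloud', or 'queue-on-prem' for one workload."""
    if workload.cores <= free_on_prem_cores:
        return "on-prem"        # primary infrastructure has room
    if workload.sensitive_data:
        return "queue-on-prem"  # pattern 4: sensitive data never bursts
    return "cloud"              # pattern 3: burst on demand

print(place(Workload("nightly-etl", 64, False), free_on_prem_cores=32))
print(place(Workload("pii-scoring", 64, True), free_on_prem_cores=32))
```

A rule like this is only practical when the same application artifact runs unchanged in both locations, which is the portability problem this challenge describes.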
CHALLENGE 4 – Reproducibility from development to production
CHALLENGES – Others
• Spark version upgrades impact all tenants
• Deployment strategies such as Champion/Challenger are difficult to define
• Data locality: storage and compute have to scale linearly together
• All data has to be kept together
FUTURE OF DATA ARCHITECTURE
What Happened?
Moore's law for bandwidth happened, making data locality less important
Containers and Kubernetes happened, leaving YARN exclusive to a few data applications
Cloud storage happened, making Hadoop storage no longer so cheap (with caveats, though)
Apache Hadoop and its supporting distributed systems were built in a world where the underlying assumptions were different from what they are today
What happened today?
What do we really need?
• A common runtime layer across private and public cloud
• Abstract away dependency and version conflicts
• Efficient use of existing infrastructure
• Consistent tooling and CI/CD processes across environments to increase efficiency
• Avoid vendor lock-in and stay portable across vendors
• Handle bursty workloads
• Faster provisioning of new environments, and the agility to test the latest offerings
Converged Infrastructure and Consistent Tooling
[Stack diagram, top to bottom: Data Applications and Other Applications | Kubernetes | Infrastructure]
Operator Support for Data Application
Spark Operator
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
Kafka Operator
https://www.confluent.io/confluent-operator/
https://github.com/strimzi/strimzi-kafka-operator
Flink Operator
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator
Airflow Operator
https://github.com/GoogleCloudPlatform/airflow-operator
Step 1: Decouple compute and storage
[Diagram: Spark (compute) on top of S3, HDFS, GPFS, MapR-FS (storage)]
• Compute is not bound to storage, and existing enterprise data storage can be reused where it exists
• Assumes high network throughput
• Adds 2 to 6% latency depending on the use case
• Compute nodes can be adjusted to compute needs, and storage can scale independently
• Cloud ready: Spark on top of S3, GCS, Azure Blob
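To make the scaling argument concrete, here is a small back-of-the-envelope sketch in Python. All node sizes and workload figures are hypothetical; only the 2 to 6% range comes from the slide, and treating it as added job runtime is an assumption.

```python
import math

def coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Nodes needed when each node provides both storage and compute."""
    return max(math.ceil(storage_tb / tb_per_node),
               math.ceil(cores / cores_per_node))

def decoupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """(storage nodes, compute nodes) when the two tiers scale separately."""
    return (math.ceil(storage_tb / tb_per_node),
            math.ceil(cores / cores_per_node))

# Hypothetical workload: 960 TB of data, but only 320 cores of compute demand.
total = coupled_nodes(960, 320, tb_per_node=48, cores_per_node=32)
storage, compute = decoupled_nodes(960, 320, tb_per_node=48, cores_per_node=32)
idle_cores = total * 32 - 320
print(total, storage, compute, idle_cores)

# The 2 to 6% penalty applied to a hypothetical 100-minute job.
low, high = 100 * 1.02, 100 * 1.06
print(f"{low:.0f} to {high:.0f} minutes")
```

In the coupled case the storage requirement forces 20 nodes and leaves half of their cores idle; decoupled, compute is sized at 10 nodes and can be resized without touching storage.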
Spark on Kubernetes – Native Support
# Submit a PySpark job to Kubernetes in cluster mode.
# <kubeserver> and <port> are placeholders for the Kubernetes API server address.
spark-submit \
  --master k8s://<kubeserver>:<port> \
  --deploy-mode cluster \
  --name spark-tensorflow \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.container.image=pyspark-tf:v2.4.3 \
  --conf spark.kubernetes.namespace=user1 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.pyspark.pythonVersion=3 \
  local:///app/model/train/spark_tf.py
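Because the container image and the namespace are ordinary settings, two teams can run different dependency stacks on the same cluster. The Python sketch below assembles the same command from parameters; the helper name, the API server address, and the second image and namespace are hypothetical, while the configuration keys are the ones used in the command above.

```python
import shlex

def k8s_submit_command(master, image, namespace, app, executors=4,
                       name="spark-tensorflow", service_account="spark"):
    """Assemble a spark-submit command for Spark's native Kubernetes mode."""
    conf = {
        "spark.executor.instances": executors,
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        "spark.kubernetes.pyspark.pythonVersion": "3",
    }
    cmd = ["spark-submit", "--master", f"k8s://{master}",
           "--deploy-mode", "cluster", "--name", name]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    return cmd + [app]

# Hypothetical API server address; each team brings its own image and namespace.
master = "https://kube.example.com:6443"
team_a = k8s_submit_command(master, "pyspark-tf:v2.4.3", "user1",
                            "local:///app/model/train/spark_tf.py")
team_b = k8s_submit_command(master, "pyspark-sklearn:v1", "user2",
                            "local:///app/score.py", name="spark-sklearn")
print(shlex.join(team_a))
print(shlex.join(team_b))
```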
Spark on Kubernetes – Native Support
Source: Google Cloud
Kubernetes Operator
• Automates deployment of applications
• An operator is a method of packaging, deploying and managing instances of complex stateful applications
• It builds upon the basic Kubernetes resource and controller concepts, but includes domain- or application-specific knowledge to automate common tasks
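The controller concept an operator builds on can be reduced to a reconcile step: compare the declared state with the observed state and emit the actions that close the gap. The following is a toy, in-memory Python sketch of that idea; it makes no Kubernetes API calls, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Desired:
    name: str
    replicas: int

def reconcile(desired, running):
    """Return the actions needed to move `running` toward `desired`.

    `running` maps an application name to the number of instances observed.
    """
    actions = []
    have = running.get(desired.name, 0)
    if have < desired.replicas:
        actions.append(("create", desired.replicas - have))
    elif have > desired.replicas:
        actions.append(("delete", have - desired.replicas))
    return actions

print(reconcile(Desired("spark-etl", 4), {"spark-etl": 1}))
print(reconcile(Desired("spark-etl", 4), {"spark-etl": 4}))
```

A real operator runs this kind of comparison whenever a watched resource changes, and replaces the generic create and delete actions with application-specific steps.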
Spark Operator Stack
[Diagram of the operator stack, with Infrastructure as the bottom layer. Source: cern.ch]
Spark Application Definition
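The definition shown on this slide did not survive extraction. As a stand-in, the Python sketch below builds a SparkApplication object that mirrors the spark-submit example from earlier in the deck and prints it as JSON, which kubectl accepts alongside YAML. The field names follow the operator's v1beta2 API as best recalled; the apiVersion depends on the operator release, and sparkVersion and the memory and core sizes are assumptions.

```python
import json

spark_application = {
    # Assumption: the API version depends on the installed operator release.
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {"name": "spark-tensorflow", "namespace": "user1"},
    "spec": {
        "type": "Python",
        "pythonVersion": "3",
        "mode": "cluster",
        "image": "pyspark-tf:v2.4.3",
        "mainApplicationFile": "local:///app/model/train/spark_tf.py",
        # Assumption: Spark version inferred from the image tag.
        "sparkVersion": "2.4.3",
        # Core and memory sizes are hypothetical.
        "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
        "executor": {"instances": 4, "cores": 1, "memory": "1g"},
    },
}

manifest = json.dumps(spark_application, indent=2)
print(manifest)
```

Compared with the command-line form, the job becomes a declarative object that can be version-controlled and applied through the same CI/CD tooling as other Kubernetes resources.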
Spark Operator
• The Spark Operator controller watches for create/delete/update events of SparkApplication objects
• The submission runner runs spark-submit for submissions received from the controller
• The Spark pod monitor reports pod updates to the controller
• The mutating admission webhook handles customization of Spark driver and executor pods
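A Kubernetes mutating admission webhook customizes a pod by returning a JSON Patch (RFC 6902) against the submitted object. The Python sketch below shows how such a patch could be built for adding labels; it is an illustration of the mechanism, not the Spark operator's own code, and the function name and label values are hypothetical.

```python
def label_patch(pod, labels):
    """Build JSON Patch (RFC 6902) operations that add `labels` to a pod."""
    existing = pod.get("metadata", {}).get("labels")
    if existing is None:
        # "add" under a missing parent would fail, so create the whole map at once.
        return [{"op": "add", "path": "/metadata/labels", "value": dict(labels)}]
    ops = []
    for key, value in labels.items():
        # JSON Pointer escaping: "~" becomes "~0", then "/" becomes "~1".
        pointer_key = key.replace("~", "~0").replace("/", "~1")
        ops.append({"op": "add",
                    "path": f"/metadata/labels/{pointer_key}",
                    "value": value})
    return ops

driver_pod = {"metadata": {"name": "spark-tensorflow-driver", "labels": {"app": "spark"}}}
print(label_patch(driver_pod, {"example.com/team": "data-science"}))
```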
IS IT PRIMETIME READY?
QUESTIONS?
