Building Intelligent Applications & Experimental ML
with Uber’s Data Science Workbench
Felix Cheung & Atul Gupte
Uber Technologies, Inc.
/ Data at Uber
/ Analytics Stack
/ Spark at Uber
/ Machine Learning at Uber
/ Data Science Workbench
/ Common User Flows & Impact
Contents
Engineer turned Product Manager
Previously: building FarmVille & the mobile advertising platform @ Zynga
Currently: Product Manager for Data Science Workbench & Data Warehouse
/ About Atul
Apache Spark PMC & Committer
Engineer, Tech Lead & Area Owner of Spark @ Uber
/ About Felix
/ Data at Uber
Uber's mission is to bring reliable
transportation - to everyone, everywhere
Data informs every decision at the company
Uber’s massive data holds deep, hidden insights.
We surface them
6,000+ data scientists, engineers, and operations
managers rely on us to support the business
Data is what differentiates Uber
but, data at Uber is unlike anywhere else.
Delicate marketplace with
network effects
Bits to atoms
Business
New LOBs spun up in a snap
Pluggable mobility platform
Spatio-temporal
Analytics
Sheer scale
Real-time. Real-world.
ML is Uber’s brain
Apps/Machine generated queries
Varied skills: BI to DNN
Consumers
Internal and external
6,000 and growing
What makes Uber unique
MISSION
Move the world with
global data, local
insights, and intelligent
decisions.
Data Platform Team
/ Data Analytics Stack
The Data Team
Ingest
Workflow
Management
Store
Produce Model
Ad-Hoc &
Streaming
Analytics
Business
Intelligence
Machine
Learning
Metadata/
Knowledge
Experimentation/
Segmentation
Visualization
Data
Infrastructure
Data Platforms
Data Services
& Analytics
Disperse
Kafka
Schemaless
SOA
BI Apps, Ad-hoc, Experimentation, ML, Notebooks
Cluster
Management
All-Active
Observability
Security
Raw
Data
Raw
Tables
Hadoop
Hive Presto Spark
Modeled
Tables
Vertica
Vertica
Warehouse
AthenaX
Apollo
Streaming
Real-time
Metadata/Workflow Management
Data Infrastructure
/ Spark At Uber
at Uber Scale
100,000+
Spark jobs per day
~96%
ETL pipelines
~98%
YARN job resource use (in
vcore-seconds) on Spark
● 11,000+ machines across multiple data-centers
● Many tens of petabytes of data
● Runs on one of the largest production HDFS clusters
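For readers unfamiliar with the unit: YARN accounts for an application's CPU use in vcore-seconds, that is, allocated virtual cores multiplied by the seconds they were held. The sketch below shows how a share like the ~98% figure could be computed from per-job records. The job records, engine labels, and the `JobUsage` and `engine_share` names are made up for illustration and are not Uber data; it also assumes each job holds a constant number of vcores for its whole run.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    engine: str      # illustrative label, e.g. "spark" or "mapreduce"
    vcores: int      # virtual cores allocated to the job (assumed constant)
    seconds: float   # how long those cores were held

    @property
    def vcore_seconds(self) -> float:
        # vcore-seconds = allocated vcores x seconds held
        return self.vcores * self.seconds


def engine_share(jobs, engine):
    """Fraction of total vcore-seconds consumed by jobs on one engine."""
    total = sum(job.vcore_seconds for job in jobs)
    if total == 0:
        return 0.0
    used = sum(job.vcore_seconds for job in jobs if job.engine == engine)
    return used / total


# Hypothetical records, chosen only to make the arithmetic easy to follow.
jobs = [
    JobUsage("spark", vcores=100, seconds=3600),     # 360,000 vcore-seconds
    JobUsage("spark", vcores=50, seconds=1800),      #  90,000 vcore-seconds
    JobUsage("mapreduce", vcores=10, seconds=900),   #   9,000 vcore-seconds
]
spark_share = engine_share(jobs, "spark")
```

Measuring by vcore-seconds rather than by job count weights long, wide jobs more heavily, which is why the resource share and the job-count share on this slide can differ.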
Introducing Uber’s Spark Compute Service
Simplifies lives of developers & cluster operators
Consolidate Infrastructure Investments
YARN, Mesos
Available across multiple data-centers
Improve Developer Experience
Standardized Spark builds across Uber
Bring-your-own-stack (optional)
Advanced monitoring & debugging
Serve Multiple Use Cases
Exploratory, bursty & scheduled batch
Manage full Spark application lifecycle
Proliferate
Better language support (R/Python/Java)
Consumption Interfaces (CLI/REST/GUI)
Session Recap (June 5th)
Karthikeyan Natarajan
Senior Software Engineer
Bo Yang
Senior Software Engineer
/ Machine Learning At Uber
The hype
● Ability of a machine to learn without being explicitly programmed
● Identify hidden patterns in the world based on current and historical data
and use it to predict the future
● Ability of a machine to get better at a task with data and experience
● Learn from mistakes and improve when given newer/more information
Demand prediction
Object detection/tracking
Motion prediction
Route planning
Pick-up clustering
Voice recognition
Supply modeling
Occupancy
modeling
Route planning, ETA, road modeling,
low-latency image classifier
Elasticity estimation, ETA, route
optimization, demand prediction
Speech generation, natural language generation,
image classifiers, drop-off clustering
1. define
2. prototype
3. productionize
4. measure
Launch and Iterate
Typical ML Workflow
UNDERSTAND
BUSINESS NEED(S)
DEFINE MINIMUM
VIABLE PRODUCT (MVP)
○ Customers + cross-functional team
○ Define objectives and key results
○ Data-driven
○ Research
○ Ruthless prioritization
2. prototype
3. productionize
4. measure
1. define
Problem Definition
UNDERSTAND
BUSINESS NEED(S)
DEFINE MINIMUM
VIABLE PRODUCT (MVP)
2. prototype
1. define
GET DATA
DATA PREPARATION
TRAIN MODELS
EVALUATE MODELS
3. productionize
4. measure
validation
computational cost
interpretability
SQL, Spark
data cleansing and
pre-processing,
R / Python
CPU or GPU
Exploration
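The "validation" step on this slide is often done with k-fold cross-validation, although the slides do not say which scheme is used. As a minimal, library-free sketch of the index bookkeeping involved (the helper name `k_fold_indices` is hypothetical):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds.

    Fold sizes differ by at most one; earlier folds get the extra samples.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        # Everything outside the current fold is used for training.
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# Ten samples split into three folds.
folds = list(k_fold_indices(10, 3))
```

Each model is trained on `train_idx` and scored on `test_idx`; averaging the k scores gives an estimate that is less sensitive to one lucky or unlucky split. In practice the rows would usually be shuffled first.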
UNDERSTAND
BUSINESS NEED(S)
2. prototype
1. define
DATA PREPARATION
TRAIN MODELS
EVALUATE MODELS
4. measure
GET DATA
PRODUCTIONIZE
MODELS
3. productionize
DEPLOY MODELS
Engineers + Data Scientists,
Java or Go,
unit tests
MAKE PREDICTIONS
Real-time or batch
Experimentation and
rollout monitoring;
Retraining strategy
DEFINE MINIMUM
VIABLE PRODUCT (MVP)
Production
UNDERSTAND
BUSINESS NEED(S)
DEFINE MINIMUM
VIABLE PRODUCT (MVP)
2. prototype
1. define
DATA PREPARATION
TRAIN MODELS
EVALUATE MODELS
GET DATA
DEPLOY MODELS
PRODUCTIONIZE
MODELS
MONITOR
PREDICTIONS
4. measure
MAKE PREDICTIONS
3. productionize
Automatically detect
degradations
GATHER AND ANALYZE
INSIGHTS
Deep-dive analyses
inform future product
roadmap
Measure
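One simple way to "automatically detect degradations" is to compare a rolling error over recent predictions against a baseline measured at launch. The sketch below is illustrative only: the `DegradationMonitor` class, the window size, and the tolerance factor are assumptions, not a description of Uber's monitoring.

```python
from collections import deque


class DegradationMonitor:
    """Flags a model when its recent mean absolute error drifts above baseline."""

    def __init__(self, baseline_mae, window=100, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        # Only the most recent `window` errors are kept.
        self.errors = deque(maxlen=window)

    def observe(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def rolling_mae(self):
        if not self.errors:
            return None
        return sum(self.errors) / len(self.errors)

    def is_degraded(self):
        mae = self.rolling_mae()
        return mae is not None and mae > self.tolerance * self.baseline_mae


# Hypothetical ETA-style predictions (minutes) and observed values.
monitor = DegradationMonitor(baseline_mae=2.0, window=3, tolerance=1.5)
for predicted, actual in [(10, 11), (10, 12), (10, 9)]:
    monitor.observe(predicted, actual)
healthy = monitor.is_degraded()

for predicted, actual in [(10, 15), (10, 16), (10, 14)]:
    monitor.observe(predicted, actual)
degraded = monitor.is_degraded()
```

A check like this can feed an alert or trigger the retraining strategy mentioned on the Production slide.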
3x growth in Data Science community
Python and R machine learning was mostly DIY, and done on laptops
Moving Python models to production was hard
Proliferation of tools, libraries, infra
None of which could scale to 1000s
Collaboration and Sharing non-existent
Security / Compliance / DC redundancy
Our world in 2016
Data Science Workbench
eng.uber.com/dsw
Unleash the productivity of the Data Science
community at Uber by providing scalable
infrastructure, tools, customization, and support.
Mission
Fully hosted environment - nothing to install
One-click to Jupyter Notebook or RStudio IDE
Pre-baked environment
Session Customization (BYOP)
Wired to all internal sources and compute engines
Our world today
Share/publish/comment on data/notebooks
One-click publish to Shiny dashboards
Multi-DC
Secure and GDPR Compliant
Support & documentation
Product Walkthrough
RStudio and Shiny are trademarks of RStudio, Inc
"Jupyter" is a trademark of the NumFOCUS foundation, of which Project Jupyter is a part.
"Python" is a registered trademark of the PSF. The Python logos (in several variants) are use trademarks of the PSF as well.
DSW + Spark
DSW + Spark Architecture
Storage Service
Data Scientists
FrontEnd
Application
Management
DSW
DSW cluster
Container
Container
Container
RStudio Server
Container
Jupyter Server
Compute Service
Hadoop Cluster
Hive
Presto
Spark
HDFS
SparkMagic
Livy
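In the diagram above, notebooks reach Spark through SparkMagic and Livy. Livy exposes Spark over REST: a client creates a session with `POST /sessions` and runs code in it with `POST /sessions/{id}/statements`. The sketch below only builds those two requests with the standard library and does not send them. The host name is a placeholder, and how DSW actually configures Livy is not described in the slides.

```python
import json
import urllib.request

# Hypothetical endpoint; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.internal:8998"


def build_request(path, payload):
    """Build (but do not send) a JSON POST request to the Livy server."""
    return urllib.request.Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="pyspark"):
    # POST /sessions starts an interactive Spark session of the given kind.
    return build_request("/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    # POST /sessions/{id}/statements submits a code snippet to that session.
    return build_request(f"/sessions/{session_id}/statements", {"code": code})


session_req = create_session_request()
statement_req = run_statement_request(0, "spark.sql('SHOW TABLES').show()")

# To send a request for real: urllib.request.urlopen(session_req)
```

SparkMagic performs this kind of exchange on the user's behalf, so a notebook cell can run on the cluster while the notebook server itself stays lightweight.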
DSW + Spark Use-cases
● Explore large-scale datasets
● Parallelise Python native packages for feature
generation & model training
● Collaborate and review on a common interface for
ad-hoc analysis & prototyping
Common DS Patterns (#1)
PySpark
Python
Native
packages
PySpark
Hive Tables Hive Tables
scikit-learn
Features
DSW
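Pattern #1 can be sketched as a plain Python function that turns rows into features and is then handed to Spark to run on every partition. In the sketch below the standard-library `math` module stands in for a native package such as scikit-learn, and the column names (`trip_id`, `pickup_lat`, and so on) and the `trips` table are hypothetical. The function is exercised locally on a small list so that it runs without a cluster.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_features(rows):
    """Takes an iterator of dict-like rows and yields one feature dict per row.

    This iterator-in, iterator-out shape is what RDD.mapPartitions expects.
    """
    for row in rows:
        yield {
            "trip_id": row["trip_id"],
            "distance_km": haversine_km(
                row["pickup_lat"], row["pickup_lon"],
                row["dropoff_lat"], row["dropoff_lon"],
            ),
        }


# Local dry run on made-up rows.
sample = [
    {"trip_id": "t1", "pickup_lat": 0.0, "pickup_lon": 0.0,
     "dropoff_lat": 0.0, "dropoff_lon": 1.0},
    {"trip_id": "t2", "pickup_lat": 12.9, "pickup_lon": 77.6,
     "dropoff_lat": 12.9, "dropoff_lon": 77.6},
]
features = list(trip_features(iter(sample)))

# On a cluster (assuming a SparkSession named `spark` and a Hive table `trips`):
#   spark.table("trips").rdd.map(lambda r: r.asDict()).mapPartitions(trip_features)
```

Keeping the feature logic in an ordinary function means it can be unit-tested in a notebook first and parallelised afterwards without changes.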
Common DS Patterns (#2)
Spark
Scala
mllib
Hive Tables HDFS
Trained
Model
Production
DSW
Evaluate
DSW + Spark Impact
Safety
Trip classification
Risk
Driver account check
Driver referral risk scoring
Uber Eats
Restaurant recommendations
Support
NLP model for support tickets
Operations
Lifetime value (LTV) model
and more!
/ … one last thing
We’re hiring!
Excited to build the data platform that moves the world?
Come join us!
http://t.uber.com/datahire
San Francisco, Palo Alto, Seattle, Bangalore
Proprietary and confidential © 2018 Uber Technologies, Inc. All rights reserved. No part of this
document may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording, or by any information storage or retrieval systems, without
permission in writing from Uber. This document is intended only for the use of the individual or entity
to whom it is addressed and contains information that is privileged, confidential or otherwise exempt
from disclosure under applicable law. All recipients of this document are notified that the information
contained herein includes proprietary and confidential information of Uber, and recipient may not
make use of, disseminate, or in any way disclose this document or any of the enclosed information to
any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.
Questions?
Thank you!
and remember, t.uber.com/datahire

Weitere ähnliche Inhalte

Was ist angesagt?

Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...Databricks
 
SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015Lance Co Ting Keh
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Databricks
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Data Con LA
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Databricks
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at UberDataWorks Summit
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLYulia Tell
 
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Spark Summit
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Databricks
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilDatabricks
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastDatabricks
 
Chris Nicholson, CEO Skymind at The AI Conference
Chris Nicholson, CEO Skymind at The AI Conference Chris Nicholson, CEO Skymind at The AI Conference
Chris Nicholson, CEO Skymind at The AI Conference MLconf
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Databricks
 

Was ist angesagt? (20)

Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 
SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015SparkApplicationDevMadeEasy_Spark_Summit_2015
SparkApplicationDevMadeEasy_Spark_Summit_2015
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at Uber
 
Distributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDLDistributed Deep Learning At Scale On Apache Spark With BigDL
Distributed Deep Learning At Scale On Apache Spark With BigDL
 
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
Accelerating Inference in the Data Center with Malini Bhandaru and Karol Zale...
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Chris Nicholson, CEO Skymind at The AI Conference
Chris Nicholson, CEO Skymind at The AI Conference Chris Nicholson, CEO Skymind at The AI Conference
Chris Nicholson, CEO Skymind at The AI Conference
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
 

Ähnlich wie Building Intelligent Applications, Experimental ML with Uber’s Data Science Workbench with Atul Gupte and Felix Cheung

Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...DataWorks Summit
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleDatabricks
 
SplunkLive! Amsterdam 2015 Breakout - Getting Started with Splunk
SplunkLive! Amsterdam 2015 Breakout - Getting Started with SplunkSplunkLive! Amsterdam 2015 Breakout - Getting Started with Splunk
SplunkLive! Amsterdam 2015 Breakout - Getting Started with SplunkSplunk
 
Democratizing AI with Apache Spark
Democratizing AI with Apache SparkDemocratizing AI with Apache Spark
Democratizing AI with Apache SparkSpark Summit
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceAlex Danvy
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionSplunk
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionSplunk
 
Venkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-ResumeVenkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-Resumevenkata sateeshs
 
Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru. Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru. clive boulton
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On) Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On) Splunk
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next DecadePaula Koziol
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseSplunk
 

Ähnlich wie Building Intelligent Applications, Experimental ML with Uber’s Data Science Workbench with Atul Gupte and Felix Cheung (20)

Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...Building intelligent applications, experimental ML with Uber’s Data Science W...
Building intelligent applications, experimental ML with Uber’s Data Science W...
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
SplunkLive! Amsterdam 2015 Breakout - Getting Started with Splunk
SplunkLive! Amsterdam 2015 Breakout - Getting Started with SplunkSplunkLive! Amsterdam 2015 Breakout - Getting Started with Splunk
SplunkLive! Amsterdam 2015 Breakout - Getting Started with Splunk
 
Democratizing AI with Apache Spark
Democratizing AI with Apache SparkDemocratizing AI with Apache Spark
Democratizing AI with Apache Spark
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
Surbhi Bhatnagar Resume
Surbhi Bhatnagar ResumeSurbhi Bhatnagar Resume
Surbhi Bhatnagar Resume
 
BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
Ravi Sundriyal
Ravi SundriyalRavi Sundriyal
Ravi Sundriyal
 
Resume201601
Resume201601Resume201601
Resume201601
 
Big Data
Big DataBig Data
Big Data
 
Venkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-ResumeVenkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-Resume
 
Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru. Whole Chain Traceability, pulling a Kobayashi Maru.
Whole Chain Traceability, pulling a Kobayashi Maru.
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On) Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On)
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next Decade
 
Getting Started with Splunk Enterprise
Getting Started with Splunk EnterpriseGetting Started with Splunk Enterprise
Getting Started with Splunk Enterprise
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop


Building Intelligent Applications, Experimental ML with Uber’s Data Science Workbench with Atul Gupte and Felix Cheung

  • 1. Building Intelligent Applications & Experimental ML with Uber’s Data Science Workbench Felix Cheung & Atul Gupte Uber Technologies, Inc.
  • 2. / Data at Uber / Analytics Stack / Spark at Uber / Machine Learning at Uber / Data Science Workbench / Common User Flows & Impact Contents
  • 3. Engineer turned Product Manager Previously: building FarmVille & the mobile advertising platform @ Zynga Currently: Product Manager for Data Science Workbench & Data Warehouse / About Atul
  • 4. Apache Spark PMC & Committer Engineer, Tech Lead & Area Owner of Spark @ Uber / About Felix
  • 5. / Data at Uber
  • 6. Uber's mission is to bring reliable transportation - to everyone, everywhere
  • 7. Data informs every decision at the company
  • 8. Uber’s massive data holds deep, hidden insights. We surface them
  • 9. 6,000+ data scientists, engineers, and operations managers rely on us to support the business
  • 10. Data is what differentiates Uber but, data at Uber is unlike anywhere else.
  • 11. Delicate marketplace with network effects Bits to atoms Business New LOBs spun up in a snap Pluggable mobility platform Spatio-temporal Analytics Sheer scale Real-time. Real-world. ML is Uber’s brain Apps/Machine generated queries Varied skills: BI to DNN Consumers Internal and external 6,000 and growing What makes Uber unique
  • 12. MISSION Move the world with global data, local insights, and intelligent decisions. Data Platform Team
  • 14. The Data Team Ingest Workflow Management Store Produce Model Ad-Hoc & Streaming Analytics Business Intelligence Machine Learning Metadata/ Knowledge Experimentation/ Segmentation Visualization Data Infrastructure Data Platforms Data Services & Analytics Disperse
  • 15. Kafka Schemaless SOA BI Apps Ad-hoc Experimentation ML Notebooks Cluster Management All-Active Observability Security Raw Data Raw Tables Hadoop Hive Presto Spark Modeled Tables Vertica Vertica Warehouse AthenaX Apollo Streaming Real-time Metadata/Workflow Management Data Infrastructure
  • 16. / Spark At Uber
  • 17. at Uber Scale 100,000+ Spark jobs per day ~96% ETL pipelines ~98% YARN job resource use (in vcore-seconds) on Spark ● 11,000+ machines across multiple data centers ● Tens of petabytes of data ● Runs on one of the largest production HDFS clusters
  • 18. Introducing Uber’s Spark Compute Service Simplifies lives of developers & cluster operators Consolidate Infrastructure Investments YARN, Mesos Available across multiple data-centers Improve Developer Experience Standardized Spark builds across Uber Bring-your-own-stack (optional) Advanced monitoring & debugging Serve Multiple Use Cases Exploratory, bursty & scheduled batch Manage full Spark application lifecycle Proliferate Better language support (R/Python/Java) Consumption Interfaces (CLI/REST/GUI)
  • 19. Session Recap (June 5th) Karthikeyan Natarajan Senior Software Engineer Bo Yang Senior Software Engineer
  • 21. The hype ● Ability of a machine to learn without being explicitly programmed ● Identify hidden patterns in the world based on current and historical data and use them to predict the future ● Ability of a machine to get better at a task with data and experience ● Learn from mistakes and improve when given newer/more information
  • 22. Demand prediction Object detection/tracking Motion prediction Route planning Pick-up clustering Voice recognition Supply modeling Occupancy modeling Route planning, ETA, road modeling, low-latency image classifier Elasticity estimation, ETA, route optimization, demand prediction Speech generation, natural language generation, image classifiers, drop-off clustering
  • 23. 2. prototype 3. productionize 1. define 4. measure Launch and Iterate Typical ML Workflow
  • 24. UNDERSTAND BUSINESS NEED(S) DEFINE MINIMUM VIABLE PRODUCT (MVP) ○ Customers + cross-functional team ○ Define objectives and key results ○ Data-driven ○ Research ○ Ruthless prioritization 2. prototype 3. productionize 4. measure 1. define Problem Definition
  • 25. UNDERSTAND BUSINESS NEED(S) DEFINE MINIMUM VIABLE PRODUCT (MVP) 2. prototype 1. define GET DATA DATA PREPARATION TRAIN MODELS EVALUATE MODELS 3. productionize 4. measure validation computational cost interpretability SQL, Spark data cleansing and pre-processing, R / Python CPU or GPU Exploration
  • 26. UNDERSTAND BUSINESS NEED(S) 2. prototype 1. define DATA PREPARATION TRAIN MODELS EVALUATE MODELS 4. measure GET DATA PRODUCTIONIZE MODELS 3. productionize DEPLOY MODELS Engineers + Data Scientists, Java or Go, unit tests MAKE PREDICTIONS Real-time or batch Experimentation and rollout monitoring; Retraining strategy DEFINE MINIMUM VIABLE PRODUCT (MVP) Production
  • 27. UNDERSTAND BUSINESS NEED(S) DEFINE MINIMUM VIABLE PRODUCT (MVP) 2. prototype 1. define DATA PREPARATION TRAIN MODELS EVALUATE MODELS GET DATA DEPLOY MODELS PRODUCTIONIZE MODELS MONITOR PREDICTIONS 4. measure MAKE PREDICTIONS 3. productionize Automatically detect degradations GATHER AND ANALYZE INSIGHTS Deep-dive analyses inform future product roadmap Measure
  • 28. 3x growth in Data Science community Py and R Machine Learning was mostly DIY - and on laptops Moving Py models to production was hard Proliferation of tools, libraries, infra None of which could scale to 1000s Collaboration and Sharing non-existent Security / Compliance / DC redundancy Our world in 2016
  • 30. Unleash the productivity of the Data Science community at Uber by providing scalable infrastructure, tools, customization, and support. Mission
  • 32. Fully hosted environment - nothing to install One-click to Jupyter Notebook or RStudio IDE Pre-baked environment Session Customization (BYOP) Wired to all internal sources and compute engines Our world today Share/publish/comment on data/notebooks One-click publish to Shiny dashboards Multi-DC Secure and GDPR Compliant Support & documentation
  • 34. RStudio and Shiny are trademarks of RStudio, Inc "Jupyter" is a trademark of the NumFOCUS foundation, of which Project Jupyter is a part. "Python" is a registered trademark of the PSF. The Python logos (in several variants) are use trademarks of the PSF as well.
  • 40. DSW + Spark Architecture Storage Service DataScientists FrontEnd Application Management DSW DSW cluster ContainerContainer Container RStudio Server Container Jupyter Server Compute Service Hadoop Cluster Hive Presto Spark HDFS SparkMagic Livy
  • 41. DSW + Spark Use-cases ● Explore large-scale dataset ● Parallelise Python native packages for feature generation & model training ● Collaborate and review on a common interface for ad-hoc analysis & prototyping
  • 42. Common DS Patterns (#1) PySpark Python Native packages PySpark Hive Tables Hive Tables scikit-learn Features DSW
  • 43. Common DS Patterns (#2) Spark Scala mllib Hive Tables HDFS Trained Model Production DSW Evaluate
  • 44. DSW + Spark Impact Safety Trip classification Risk Driver account check Driver referral risk scoring Uber Eats Restaurant recommendations Support NLP model for support tickets Operations Lifetime value (LTV) model more!
  • 45. / … one last thing
  • 46. We’re hiring! Excited to build the data platform that moves the world? Come join us! http://t.uber.com/datahire San Francisco, Palo Alto, Seattle, Bangalore
  • 47. Proprietary and confidential © 2018 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber. Questions? Thank you! and remember, t.uber.com/datahire
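
The DSW + Spark architecture on slide 40 routes notebook code from the Jupyter server through SparkMagic and Livy to Spark on the Hadoop cluster. As a rough illustration only, the sketch below builds (without sending) the two Livy REST calls that such a flow typically relies on: creating an interactive session and submitting a statement to it. The host name `livy.example.internal` and the helper function names are assumptions made for this example; the slides do not describe how DSW configures or calls Livy.

```python
import json
from urllib.request import Request

# Hypothetical Livy endpoint; the real DSW host is not given in the slides.
LIVY_URL = "http://livy.example.internal:8998"


def livy_request(path, payload):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return Request(
        url=f"{LIVY_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="pyspark"):
    """POST /sessions asks Livy to start an interactive session of this kind."""
    return livy_request("/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    """POST /sessions/{id}/statements submits a code snippet to that session."""
    return livy_request(f"/sessions/{session_id}/statements", {"code": code})


if __name__ == "__main__":
    # Inspect what a notebook client would send for a simple Hive query.
    req = run_statement_request(0, "spark.sql('SHOW TABLES').show()")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

In a notebook, SparkMagic issues calls of this shape on the user's behalf, so the data scientist only writes the Spark code inside a cell. Passing a request to `urllib.request.urlopen` would actually send it, which is deliberately left out here.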