Building intelligent applications, experimental ML with Uber’s Data Science Workbench

Building Intelligent Applications & Experimental
ML with Uber’s Data Science Workbench
Adam Hudson & Atul Gupte
Uber Inc.

/ Data at Uber
/ Analytics Stack
/ Machine Learning at Uber
/ Data Science Workbench
/ Real-world Impact
Contents

Engineer turned Product Manager
Previously: building FarmVille & the mobile advertising platform @ Zynga
Currently: Product Manager for Data Science Workbench & Data
Warehouse
/ About Atul

Uber's mission is to bring reliable
transportation - to everyone, everywhere

Data informs every decision at the company

Uber’s massive data holds deep, hidden insights.
We surface them

6,000+ data scientists, engineers, and operations
managers rely on us to support the business

Data is what differentiates Uber,
but data at Uber is unlike data anywhere else.

● Business
○ Delicate marketplace with network effects
○ Bits to atoms
○ New LOBs spun up in a snap
○ Pluggable mobility platform
● Analytics
○ Spatio-temporal
○ Sheer scale
○ Real-time. Real-world.
○ ML is Uber’s brain
● Consumers
○ Apps/Machine generated queries
○ Varied skills: BI to DNN
○ Internal and external
○ 6,000 and growing
What makes Uber unique

MISSION
Move the world with
global data, local
insights, and intelligent
decisions.
Data Platform Team

The Data Team
Ingest
Workflow Management
Store
Produce Model
Ad-Hoc & Streaming Analytics
Business Intelligence
Machine Learning
Metadata/Knowledge
Experimentation/Segmentation
Visualization
Data Infrastructure
Data Platforms
Data Services & Analytics
Disperse

Kafka
Schemaless
SOA
BI Apps, Ad-hoc, Experimentation, ML, Notebooks
Cluster Management
All-Active
Observability
Security
Raw Data
Raw Tables
Hadoop
Hive, Presto, Spark
Modeled Tables
Vertica
Vertica
Warehouse
AthenaX
Apollo
Streaming
Real-time
Metadata/Workflow Management
Data Infrastructure

The hype
● Ability of a machine to learn without being explicitly programmed
● Identify hidden patterns in the world based on current and historical
data and use them to predict the future
● Ability of a machine to get better at a task with data and experience
● Learn from mistakes and improve when given newer/more information

Demand prediction
Object detection/tracking
Motion prediction
Route planning
Pick-up clustering
Voice recognition
Supply modeling
Occupancy modeling
Route planning, ETA, road modeling, low-latency image classifier
Elasticity estimation, ETA, route optimization, demand prediction
Speech generation, natural language generation, image classifiers, drop-off clustering

1. define
2. prototype
3. productionize
4. measure
Launch and Iterate
Typical ML Workflow

UNDERSTAND BUSINESS NEED(S)
DEFINE MINIMUM VIABLE PRODUCT (MVP)
○ Customers + cross-functional team
○ Define objectives and key results
○ Data-driven
○ Research
○ Ruthless prioritization
1. define
2. prototype
3. productionize
4. measure
Problem Definition

UNDERSTAND BUSINESS NEED(S)
DEFINE MINIMUM
1. define
2. prototype
3. productionize
4. measure
GET DATA
DATA PREPARATION
TRAIN MODELS
EVALUATE MODELS
SQL, Spark
data cleansing and pre-processing
R / Python
CPU or GPU
validation
computational cost
interpretability
Exploration
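
The four prototype steps on this slide (get data, prepare data, train models, evaluate models) can be sketched as a small loop. This is an illustration only, not Uber's code: the data is synthetic, the model is a one-feature least-squares fit, and every function name here is hypothetical. In practice the slide points to SQL or Spark for data access and R or Python for modeling.

```python
import random
import statistics


def get_data(n=200, seed=7):
    """GET DATA stand-in: synthetic (x, y) pairs with y close to 3x + 2 (assumed)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0, 0.5)) for x in xs]


def prepare(rows, train_fraction=0.8):
    """DATA PREPARATION: drop incomplete rows, then split into train/validation."""
    clean = [(x, y) for x, y in rows if x is not None and y is not None]
    cut = int(len(clean) * train_fraction)
    return clean[:cut], clean[cut:]


def train(rows):
    """TRAIN MODELS: ordinary least squares for a single feature."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def evaluate(model, rows):
    """EVALUATE MODELS: mean absolute error on held-out rows (validation)."""
    slope, intercept = model
    return statistics.fmean([abs(y - (slope * x + intercept)) for x, y in rows])


train_rows, val_rows = prepare(get_data())
model = train(train_rows)
mae = evaluate(model, val_rows)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} validation MAE={mae:.2f}")
```

Validation error is only one of the three evaluation criteria listed on the slide; computational cost and interpretability would need their own checks.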

UNDERSTAND BUSINESS NEED(S)
DEFINE MINIMUM
1. define
2. prototype
3. productionize
4. measure
GET DATA
DATA PREPARATION
TRAIN MODELS
EVALUATE MODELS
PRODUCTIONIZE MODELS
DEPLOY MODELS
MAKE PREDICTIONS
Engineers + Data Scientists, Java or Go, unit tests
Real-time or batch
Experimentation and rollout monitoring; Retraining strategy
Production
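
The "real-time or batch" choice for making predictions can be illustrated with one scoring function shared by both paths. This is a minimal sketch under assumptions: the linear model, its weights, and the function names are hypothetical, and it is written in Python for brevity even though the slide names Java or Go for production services.

```python
from typing import Iterable, Iterator, List, Sequence


def predict_one(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Real-time path: score a single request with a linear model."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def predict_batch(
    weights: Sequence[float],
    bias: float,
    rows: Iterable[Sequence[float]],
    chunk_size: int = 1000,
) -> Iterator[List[float]]:
    """Batch path: stream over many rows and yield scores one chunk at a time."""
    chunk: List[float] = []
    for row in rows:
        chunk.append(predict_one(weights, bias, row))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final partial chunk
        yield chunk


WEIGHTS, BIAS = [2.0, 1.0], 0.5  # hypothetical trained parameters

single_score = predict_one(WEIGHTS, BIAS, [1.0, 3.0])
batch_scores = list(
    predict_batch(WEIGHTS, BIAS, [[1.0, 3.0], [0.0, 0.0], [1.0, 1.0]], chunk_size=2)
)
```

Sharing the scoring function keeps the two paths consistent, which also makes the unit tests mentioned on the slide easier to write.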

UNDERSTAND BUSINESS NEED(S)
DEFINE MINIMUM
1. define
2. prototype
3. productionize
4. measure
GET DATA
DATA PREPARATION
TRAIN MODELS
EVALUATE MODELS
PRODUCTIONIZE MODELS
DEPLOY MODELS
MAKE PREDICTIONS
MONITOR PREDICTIONS
Automatically detect degradations
GATHER AND ANALYZE INSIGHTS
Deep-dive analyses inform future product roadmap
Measure
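
One simple way to "automatically detect degradations" is to compare a rolling prediction error against a baseline measured at launch. The sketch below assumes that approach; the class name, window size, and tolerance are hypothetical and are not taken from the talk.

```python
from collections import deque
from statistics import fmean


class DegradationMonitor:
    """Flags when the rolling mean absolute error drifts above a baseline."""

    def __init__(self, baseline_error: float, window: int = 50, tolerance: float = 1.5):
        self.baseline_error = baseline_error  # error observed at launch (assumed known)
        self.tolerance = tolerance            # allowed multiple of the baseline
        self.errors = deque(maxlen=window)    # most recent absolute errors

    def observe(self, predicted: float, actual: float) -> bool:
        """Record one prediction/outcome pair; return True if degraded."""
        self.errors.append(abs(predicted - actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough observations yet
        return fmean(self.errors) > self.baseline_error * self.tolerance


monitor = DegradationMonitor(baseline_error=1.0, window=50)
healthy = [monitor.observe(10.0, 11.0) for _ in range(50)]   # errors of 1.0
degraded = [monitor.observe(10.0, 13.0) for _ in range(50)]  # errors of 3.0
print("alert raised:", any(degraded))
```

A rolling window keeps the check cheap and responsive to recent behavior; an alert like this could feed the retraining strategy mentioned on the Production slide.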

Senior Software Engineer
Previously: Big data and big network R&D in gaming, social media &
finance
Currently: Developer on Data Science Workbench
/ About Adam

A growing Data Science community was facing
problems with many aspects of their workflows
Our world in 2016 NEW
Getting Started
Collaboration
Shared Standards
Moving Models to Production
Scalability
Available Features
Data Access

To unleash the productivity of
Uber’s Data Science
community
Mission

We Wanted More!
● Diverse customers working from same data
○ Data scientists
○ Developers
○ Interns
○ Operations
○ External parties
● Scalability with access to internal data, computation and accounts
● Acceptable licensing cost for large number of casual users

Introducing Data Science Workbench
eng.uber.com/dsw

Our World Today
Getting Started
Collaboration
Shared Standards
Scalability
Available Features
Data Access
Fully hosted 1-click Jupyter Notebook & RStudio IDE
Pre-baked Environments
Sharing options on notebooks; 1-click Shiny dashboard publication
All internal data sources / Multi-DC / Secure / GDPR Compliant
Various Session Sizes, Types (CPU, GPU)/Access to Compute Engines
Documentation Support

Common Use-Cases
● Large-scale data exploration
● Feature generation and model training
● Ad-hoc analysis and prototypes
● Review and collaboration

RStudio and Shiny are trademarks of RStudio, Inc
"Jupyter" is a trademark of the NumFOCUS foundation, of which Project Jupyter is a part.
"Python" is a registered trademark of the PSF. The Python logos (in several variants) are use trademarks of the PSF as well.

The World of Tomorrow!
Getting Started
Collaboration
Customized team environments
Social media-like interface; more flexible dashboards
Distributed deep learning
Low friction workflow
Available Features
Moving Models to Production

DSW Impact
Safety
Trip classification
Risk
Driver account check
Driver referral risk scoring
Uber Eats
Restaurant recommendations
Support
NLP model for support tickets
Operations
Lifetime value (LTV) model
more!
And with that, I will pass you back to Atul to
discuss the impact that DSW is having.

We’re hiring!
Excited to build the data platform that moves the world?
Come join us!
http://t.uber.com/datahire
San Francisco, Palo Alto, Seattle, Bangalore

Proprietary and confidential © 2018 Uber Technologies, Inc. All rights reserved. No part of this
document may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, recording, or by any information storage or retrieval systems, without
permission in writing from Uber. This document is intended only for the use of the individual or entity
to whom it is addressed and contains information that is privileged, confidential or otherwise exempt
from disclosure under applicable law. All recipients of this document are notified that the information
contained herein includes proprietary and confidential information of Uber, and recipient may not
make use of, disseminate, or in any way disclose this document or any of the enclosed information
to any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.
Thank you!
and remember, t.uber.com/datahire
Questions?
