I had the great pleasure and honor of giving a lecture on the current and future challenges in data science at the Nextech 2019 conference, alongside an impressive list of other speakers.
2. First, a bit about myself...
• Married to Yonit & father of Yahav & Arbel
• Lecturer & researcher @ Ben Gurion University
• Worked as a data science researcher @
• Co-founded the non-profit "Deep-Learning-Boot-Camp"
• Co-founder of Dudes (still in stealth)
• Ranked (at peak) #175 of over 80,000 active Kaggle competitors
• Self-taught and very(!) curious
3. Data is everywhere
Nature monitoring
Automatic x-ray analysis
Seizure prediction
Fashion simulations
Video analysis
Speech 2 Text 2 Speech
4. But what is data science?
Well... the answer largely depends on whom we ask...
We will use a loose, partial definition that most can agree upon:
Data science is:
An academic and practical research field that aims to
programmatically extract insights and knowledge
from data in a reproducible & generalizable manner
Training data
New unseen data
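A minimal sketch of what "reproducible & generalizable" can look like in practice, assuming a toy linear-regression task on synthetic data. The task and the names `make_data`, `fit_line` and `run_experiment` are illustrative and not from the talk. A fixed seed makes the run repeatable, and the error is measured on new unseen data that was never used for fitting.

```python
import random


def make_data(n, rng):
    """Synthetic points: y = 2x + 1 plus Gaussian noise (made-up example data)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_experiment(seed=0):
    rng = random.Random(seed)               # fixed seed -> reproducible run
    train_x, train_y = make_data(200, rng)  # training data
    new_x, new_y = make_data(100, rng)      # new unseen data, never used for fitting
    model = fit_line(train_x, train_y)
    return {
        "model": model,
        "train_mse": mean_squared_error(model, train_x, train_y),
        "unseen_mse": mean_squared_error(model, new_x, new_y),
    }


if __name__ == "__main__":
    print(run_experiment(seed=0))
```

If the error on unseen data is close to the training error, the extracted "knowledge" (here, the fitted line) generalizes; rerunning with the same seed should give identical numbers.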
6. Current state - enablers
Huge increase in compute capability, mainly GPU compute and RAM
Up to 48GB RAM on a single card & 11GB for most consumer cards
Highly available and scalable through cloud services
7. Current state - enablers
Large open and labeled datasets
"BookCorpus" dataset (11,038 books)
MSCOCO
CelebA
ImageNet
Yelp Open Datasets
8. Current state - enablers
Open-source code and a large model zoo
UNet
EfficientNet
Faster R-CNN
9. Current state - enablers
Good ability to perform transfer learning and fine-tuning between datasets
11. Current tasks and applications - visual domain (partial list)
Visual domain:
• Classification
• Detection
• Segmentation
• Pose estimation
• Style transfer
• Generation (GANs)
• Document authentication
• Deep fake & face-swap
• Adversarial attacks
• Depth estimation
• Tracking
• Search within images
• Object & crowd counting
• Regression
• Similarity & metric learning
• Image-based search
• Severity estimation
• Object price estimation
• Super resolution
• Image colorization
• Image captioning
• Visual question answering
• OCR
• Image-text cross learning
• Disease classification
• Activity classification
• Medical imagery segmentation
• Risky objects segmentation
• Route segmentation (autonomous vehicles)
• Object relation inference
12. Current tasks and applications - language domain (partial list)
Language domain:
• Classification
• Machine translation
• Free text to structured
• Sentiment analysis
• Question answering
• Summarization
• Inappropriate content
• Semantic text similarity (e.g. for search relevance)
• Research papers
• Legal
• Medical
• Call centers
• Automatic rating
• Stress detection
• Named entity recognition
• Part-of-speech tagging
• Semantic role labeling
• Conversational bots
• Free text querying
• Online translation
• Auto language classification
13. Current tasks and applications - other domains (extremely partial list)
Other domains:
• Classification
• Time series
• Sound
• Tabular data
• Graph-based learning
• Speech 2 Text
• Text 2 Speech
• Speaker separation
• Background noise removal
• Classification
• Forecasting
• Signal disaggregation
• Segmentation
• Anomaly detection
• Regression
• Clustering
• Content recommendation
• Node / graph embeddings
• Graph relation inference
• Partial graph completion
18. Current challenges
• Limited or missing monitoring in production systems
In many production systems, monitoring is either lacking or limited
Training data
Production data on deployment
Production data 2 months later
Production data 4 months later
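One way to monitor how far production data has moved away from the training data is a drift score computed per feature. The sketch below uses the Population Stability Index (PSI), a technique chosen here for illustration and not named in the talk. The Gaussian samples, the shift sizes, and the names `psi` and `simulate` are all assumptions.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample must contain more than one distinct value")
    width = (hi - lo) / bins

    def histogram(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def simulate(seed=0, n=5000):
    """Toy scenario: the production feature slowly shifts away from training."""
    rng = random.Random(seed)

    def draw(mean):
        return [rng.gauss(mean, 1.0) for _ in range(n)]

    training = draw(0.0)
    snapshots = {
        "on deployment": draw(0.0),   # same distribution as training
        "2 months later": draw(0.5),  # assumed moderate shift
        "4 months later": draw(1.0),  # assumed larger shift
    }
    return {name: psi(training, values) for name, values in snapshots.items()}


if __name__ == "__main__":
    for name, score in simulate().items():
        print(f"{name}: PSI = {score:.3f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. These thresholds are conventions, so an alerting level should be tuned per feature.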
20. Future challenges
• Bias detection and bias correction
professional hairstyle for work:
unprofessional hairstyle for work:
Gender bias in word embeddings:
Man -> doctor, woman -> nurse
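The "man -> doctor, woman -> nurse" pattern is usually exposed with vector arithmetic on the embeddings. The sketch below shows the mechanics only: the 3-dimensional vectors are invented for illustration and deliberately constructed to contain the bias, so they say nothing about any real embedding model. The names `analogy` and `gender_score` are likewise hypothetical.

```python
import math

# Toy vectors, invented for illustration only (axes: gender, medical, teaching).
# Real embeddings have hundreds of dimensions and are learned from text.
TOY_EMBEDDINGS = {
    "man":     [1.0, 0.0, 0.0],
    "woman":   [-1.0, 0.0, 0.0],
    "doctor":  [0.4, 1.0, 0.0],
    "nurse":   [-0.4, 1.0, 0.0],
    "teacher": [0.0, 0.2, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c, embeddings):
    """Return the word d that best completes 'a is to b as c is to d'."""
    query = [vb - va + vc
             for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The input words themselves are excluded from the candidates.
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(query, embeddings[w]))


def gender_score(word, embeddings):
    """Cosine between a word and the man-minus-woman direction.

    Positive means closer to 'man', negative means closer to 'woman'.
    """
    direction = [m - w for m, w in zip(embeddings["man"], embeddings["woman"])]
    return cosine(embeddings[word], direction)


if __name__ == "__main__":
    print(analogy("man", "doctor", "woman", TOY_EMBEDDINGS))
    for word in ("doctor", "nurse", "teacher"):
        print(word, round(gender_score(word, TOY_EMBEDDINGS), 3))
```

Detection amounts to measuring scores like `gender_score` for words that should be gender-neutral. Correction methods then try to reduce those scores without damaging the rest of the embedding.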
23. Future challenges
• Cross-modality learning
The dog is about to chase a cat
Voice
Image / video
Text
Complementary inputs to the same model
24. Future challenges
• Function learning
A chair is an object you can sit on
A parking spot is a place to leave the car when not driving
25. Future challenges
• Wide adoption of content generation
Great potential in:
• Film making
• Transcribing
• Content candidate generation
• Art
• Simulation
29. With great power comes great responsibility
https://github.com/daviddao/awful-ai
Well... these are powerful tools, and we must use them carefully
30. Future challenges
Data science involves many other positions
Communication with business is crucial for success
Management must become data-literate
• Metrics
• Tasks
• Validation
• Findings to actions
• Engineering
• Analysts
• Business
• IT
• Results & goals
• Assumptions
• Domain knowledge
• Capabilities
• Needs (both sides)