The document discusses weak supervision: training machine learning models from noisy, unreliable labels. Instead of assigning definitive labels, many labeling functions each cast a (possibly abstaining) vote per example; these functions can be rules, user reviews, model predictions, or other heuristics. A generative model then learns the accuracy of each labeling function and combines their outputs into probabilistic labels. The technique can achieve good results with only a small number of ground-truth labels, though without those labels it is hard to evaluate individual labeling functions or tune their influence. The document advocates creating many diverse labeling functions to take full advantage of weak supervision.
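As a minimal sketch of the idea (plain Python, not Snorkel's actual API; the labeling functions and the majority-vote combiner here are invented for illustration):

```python
# Hypothetical weak-supervision sketch: several noisy labeling
# functions (LFs) vote on each example; a simple majority-vote
# combiner turns the votes into one probabilistic label.
# ABSTAIN means the LF has no opinion on that example.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_refund(text):   # rule-based heuristic
    return POS if "refund" in text.lower() else ABSTAIN

def lf_short_message(text):     # weak structural heuristic
    return NEG if len(text.split()) < 3 else ABSTAIN

def lf_exclamation(text):       # another noisy signal
    return POS if "!" in text else ABSTAIN

LFS = [lf_contains_refund, lf_short_message, lf_exclamation]

def probabilistic_label(text):
    """Majority vote over non-abstaining LFs -> P(label == POS)."""
    votes = [lf(text) for lf in LFS if lf(text) != ABSTAIN]
    if not votes:
        return 0.5              # no evidence: uninformative prior
    return sum(v == POS for v in votes) / len(votes)

print(probabilistic_label("I want a refund!"))  # two POS votes -> 1.0
```

In Snorkel proper, the majority vote is replaced by a generative label model that estimates each LF's accuracy from the pattern of agreements and disagreements, without ground-truth labels.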
20. ... or maybe there is?
https://hazyresearch.github.io/snorkel/blog/superglue.html
“In many datasets, especially in real-world applications, there are
subsets of the data that our model underperforms on, or that we care
more about performing well on than others”
“... expert-heads are used to learn slice-specific representations. Then, an
attention mechanism is learned over expert heads to determine when and
how to combine these representations.”
yay!
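The quoted mechanism can be sketched in a few lines (hypothetical shapes and names, not the actual slice-based-learning code): each expert head emits a slice-specific representation, and a softmax over per-slice confidence scores decides how to mix them.

```python
import numpy as np

# Hypothetical sketch of "attention over expert heads": combine
# slice-specific representations into one, weighted by a softmax
# over per-slice confidence logits.
def combine_expert_heads(expert_reprs, slice_logits):
    # expert_reprs: (n_slices, d) slice-specific representations
    # slice_logits: (n_slices,) confidence that the example is in each slice
    weights = np.exp(slice_logits) / np.exp(slice_logits).sum()  # softmax
    return weights @ expert_reprs  # convex combination, shape (d,)
```

With equal logits this reduces to averaging the heads; a confident slice indicator pulls the representation toward that slice's expert.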
22. In the low-density setting, sparsity of labels will mean that
there is limited room for even an optimal weighting of the
labeling functions to diverge much from the majority vote.
Conversely, as the label density grows, the majority vote will
eventually be optimal. It is the middle-density regime where
we expect to most benefit from applying the generative model.
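The quote's point can be made concrete with a toy example (invented numbers, not from the paper): once LF accuracies are known or learned, the natural combiner weights each vote by its log-odds, log(acc / (1 - acc)), so one highly accurate LF can outvote several mediocre ones, which unweighted majority vote cannot do.

```python
import math

# Toy illustration: weighted vote (log-odds weights from known
# accuracies) vs. plain majority vote over +1/-1 LF outputs.
def majority_vote(votes):
    return 1 if sum(votes) > 0 else -1

def weighted_vote(votes, accuracies):
    score = sum(v * math.log(a / (1 - a)) for v, a in zip(votes, accuracies))
    return 1 if score > 0 else -1

# Two mediocre LFs (55% accurate) say -1; one strong LF (95%) says +1.
votes = [+1, -1, -1]
accs  = [0.95, 0.55, 0.55]
print(majority_vote(votes))         # -1: every vote counts equally
print(weighted_vote(votes, accs))   # +1: the accurate LF dominates
```

With very few overlapping labels (low density) the two combiners rarely disagree, and with very many votes the majority wins out anyway; the gap above is widest in the middle regime the quote describes.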
31. https://hazyresearch.github.io/snorkel/
🏊 Dive deep into snorkel
🏊‍♀️
Current research
Stream-line your labelling
Sea what it is all about
Betta e-fish-in-sea
Scale-able oppo-tuna-ty
The results shore are unbebreathable
32. You shoald at least breefly
mullet over despite its
fishues and crab-e-ats