
Probabilistic Programming: Why, What, How, When?

Probabilistic programming is a new approach to machine learning and data science that is currently the focus of intense academic research, including an ongoing DARPA program. If successful, probabilistic programming systems will allow sophisticated predictive models to be written by a wide range of domain experts. Before we get to the promised land, though, some basic challenges need to be addressed, including performance on real-world datasets, programming tools support, and education.

Published in: Engineering, Technology, Education

  1. Probabilistic Programming: Why, What, How, When
      Beau Cronin, @beaucronin
  2. 40 Action-Packed Minutes
      ‣ Why you should care: what’s wrong with what we’ve got?
      ‣ What probabilistic programming is, and what programs look like
      ‣ How you can get started today
      ‣ When will all of this be ready for production use?
  3. Why?
  4. We use data to learn about the world
      Dimension              | Traditional Machine Learning | Hierarchical Bayesian Modeling
      Scale                  | Large                        | Small
      Tools & frameworks     | Mature & Robust              | Immature & Spotty
      Structure & Knowledge  | Discard                      | Keep & Leverage
      Data Types             | Homogeneous                  | Heterogeneous
      Philosophical Approach | Toolkit, Theory-light        | Modeling, Theory-heavy
  5. G = {V, E}
      ‣ What order were these links added in?
      ‣ What messages flow over this link?
      ‣ What do we know about this user?
  6. x1 x2 lat1 long1 t1 t2 t3 t4 address1
      1 1.2 2 34.0 118.2 2.3 3.4 1.9 10.4 516 61st St,
      2 0.1 1 40.7 73.9 -1.5 4.5 8.9 2305 Tustin
      3 10.5 0 37.9 122.3 4.7 -2.5 -3.4 1 Market St.
      4 8.3 -1 -22.9 43.2 4.2 5.6 1.6 9.5
      5 4.9 5 -37.8 -145.0 1600 Pennsyl
      6 1.5 1 3.4 4.0 4.6 5.2 650 7th St., S
      Callouts: positive numbers, categorical values, locations, time series, addresses, missing values
  7. Diverse Data
      Most real datasets contain compositions of these and more, but we routinely homogenize them in preprocessing:
      Trees & Graphs, Time Series, Relations, Locations & Addresses, Images & Movies, Audio, Sets & Partitions, Text
  8. Business Data Is Heterogeneous and Structured
      Profile: id: “abcdef”, gender: “Male”, dob: 1978-12-09, twitter_id: 9458201
      Page Views:
        2014-01-21 18:41:04, “https://devcenter.heroku.com/articles/quickstart”, …
        2014-01-20 12:35:56, “https://devcenter.heroku.com/categories/java”, …
        2014-01-20 09:12:52, “https://devcenter.heroku.com/articles/ssl-endpoint”, …
      Transactions:
        Order Date | Order ID | Title | Category | ASIN/ISBN | Release Date | Condition | Seller | Per Unit Price
        1/5/13 | 002-1139353-0278652 | Under Armour Men's Resistor No Show Socks,pack of 6 Socks | Apparel | B003RYQJJW | | new | The Sock Company, Inc. | $21.99
        1/5/13 | 002-1139353-0278652 | Under Armour Men's Resistor No Show Socks,pack of 6 Socks | Apparel | B004UONNXI | | new | The Sock Company, Inc. | $21.99
        1/8/13 | 002-2593752-8837806 | CivilWarLand in Bad Decline | Paperback | 1573225797 | 1/31/97 | new | Amazon.com LLC | $8.4
        1/8/13 | 109-0985451-2187421 | Nothing to Envy: Ordinary Lives in North Korea | Paperback | 385523912 | 9/20/10 | new | Amazon.com LLC | $10.88
        1/12/13 | 109-8581642-2322617 | Excession | Mass Market Paperback | 553575376 | 2/1/98 | new | Amazon.com LLC | $7.99
      Social Posts:
        [ { text: “key to compelling VR is…”, retweet_count: 3, favorites_count: 5, urls: [ ], hashtags: [ ], in_reply_to: 39823792801012 … },
          { text: “@John4man really liked your piece”, retweets: 0, favorites: 0, … } ]
      Followers: [ 657693, 7588892, 9019482, …]
      Relationship: blocked: False, want_retweets: True, marked_spam: False, since: 2013-09-13
  9. Every Domain Is Heterogeneous
      ‣ Health data: doctor notes, lab results, imaging, family history, prescriptions
      ‣ Quantified self: motion sensors, heart rate, GPS tracks, self-reporting, sleep patterns
      ‣ Autonomous vehicles: LIDAR, cameras, maps, audio, gyros, telemetry, GPS
  10. Mostly, no one even tries to jointly model these different kinds of data.
What?
  11. A probabilistic programming system is…
      a language + {compiler, interpreter}, or a {library, framework} for an existing language, that
      ‣ includes random choices as native elements,
      ‣ provides a clean separation between probabilistic modeling and inference,
      ‣ and may provide automated generation of inference solutions for a given program.
  12. Probabilistic Programming Systems Model the World
      ‣ Programs directly represent the data generation process
      ‣ Measurement processes can be modeled directly, including their imperfections and the uncertainty that comes with them
      ‣ Philosophy
          ‣ DO: capture the essential aspects of real-world processes in a model
          ‣ DON’T: torture the data into the right form for an algorithm
  13. A Probability Model
      Diagram, with a repeated part marked ✕ N:
      ‣ Fixed: constant values and structural assumptions
      ‣ Unknown: variables that discriminate between hypotheses
      ‣ Observable: data and potential data
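To make the three roles concrete, here is a minimal forward-sampling sketch in plain Python, not a PPS. The scenario (a thermometer read N times) and every name and constant in it are hypothetical illustrations: the module-level constants are the fixed assumptions, the true temperature is the unknown, and the readings, with lost ones recorded as `None`, are the observables. The loop stands in for the ✕ N part of the diagram, and the dropout branch is an example of modeling a measurement imperfection directly instead of cleaning it away.

```python
import random
from typing import List, Optional, Tuple

# Fixed: constants and structural assumptions (hypothetical values).
PRIOR_MEAN = 20.0       # prior mean of the true temperature (degrees C)
PRIOR_SD = 5.0          # prior spread of the true temperature
SENSOR_NOISE_SD = 0.5   # measurement noise of the (hypothetical) sensor
DROPOUT_PROB = 0.1      # chance that a reading is lost entirely


def simulate_readings(n: int, rng: random.Random) -> Tuple[float, List[Optional[float]]]:
    """Run the data-generating process once and return (unknown, observables)."""
    # Unknown: the quantity that inference would later try to recover.
    true_temp = rng.gauss(PRIOR_MEAN, PRIOR_SD)
    readings: List[Optional[float]] = []
    for _ in range(n):  # the "x N" part: N conditionally independent readings
        if rng.random() < DROPOUT_PROB:
            readings.append(None)  # an imperfect measurement: a missing value
        else:
            readings.append(rng.gauss(true_temp, SENSOR_NOISE_SD))
    return true_temp, readings


true_temp, readings = simulate_readings(8, random.Random(42))
print(round(true_temp, 2), readings)
```

Running the program forward only produces fake data; the interesting direction, recovering `true_temp` from `readings`, is what a PPS's inference engine is for.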
  14. Obligatory Bayes’ Rule
      Pr(H | D, A) ∝ Pr(D | H, A) Pr(H | A)
      where H = hypotheses, D = data, A = assumptions. Leaving the assumptions implicit gives the familiar form:
      Pr(H | D) ∝ Pr(D | H) Pr(H)
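As a worked instance of the rule, take the fair-coin question from the next slide: Pr(fair) = 0.999, heads probability 0.5 for a fair coin and 0.9 for the biased one, and ten heads observed. With only two hypotheses, the proportionality constant is simply the sum of the two unnormalized terms. A short Python sketch (the function name is ours):

```python
def posterior_fair(prior_fair: float, p_heads_fair: float,
                   p_heads_biased: float, n_heads: int) -> float:
    """Pr(fair | n heads in n flips) by Bayes' rule over two hypotheses."""
    # Pr(D | H) Pr(H) for each hypothesis H
    unnorm_fair = (p_heads_fair ** n_heads) * prior_fair
    unnorm_biased = (p_heads_biased ** n_heads) * (1.0 - prior_fair)
    # Normalize by Pr(D), the sum over both hypotheses
    return unnorm_fair / (unnorm_fair + unnorm_biased)


p = posterior_fair(0.999, 0.5, 0.9, 10)
print(f"{p:.3f}")  # 0.737
```

Even after ten heads in a row, the strong prior leaves the fair hypothesis at about 74%.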
  15. First example: Deciding if a coin is fair based on flips
      fair-prior = .999
      fair-coin? = flip(fair-prior)
      if fair-coin?:
          weight = 0.5
      else:
          weight = 0.9
      observe(repeat(flip(weight), 10), [H, H, H, H, H, H, H, H, H, H])
      query(fair-coin?)
      Callouts: Assumptions, Unknowns, Observables
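The point of a PPS is that `query(fair-coin?)` is answered by a generic inference engine, not by hand-derived math. Below is a minimal sketch of that separation in plain Python; it is not any real PPS's API, and the names `Runtime`, `flip`, `observe_flip`, and `likelihood_weighting` are hypothetical. The model only calls `flip` and `observe_flip` on a runtime object, and a generic likelihood-weighting routine (one simple inference strategy among many) reruns it many times, weighting each run by the probability of the observed flips.

```python
import random
from collections import defaultdict


class Runtime:
    """What a model sees: a way to make random choices and to condition on data."""

    def __init__(self, rng: random.Random):
        self.rng = rng
        self.weight = 1.0  # likelihood of the observations made so far

    def flip(self, p: float) -> bool:
        # Random choice: sample from the prior.
        return self.rng.random() < p

    def observe_flip(self, p: float, value: bool) -> None:
        # Observation: do not sample; multiply in the likelihood of the observed value.
        self.weight *= p if value else 1.0 - p


def likelihood_weighting(model, num_samples: int, seed: int = 0) -> dict:
    """Generic inference: estimates the posterior over whatever `model` returns."""
    rng = random.Random(seed)
    mass = defaultdict(float)
    for _ in range(num_samples):
        rt = Runtime(rng)
        mass[model(rt)] += rt.weight
    total = sum(mass.values())
    return {value: w / total for value, w in mass.items()}


def fair_coin_model(rt: Runtime) -> bool:
    fair = rt.flip(0.999)              # fair-coin? = flip(fair-prior)
    weight = 0.5 if fair else 0.9
    for _ in range(10):                # observe ten heads
        rt.observe_flip(weight, True)
    return fair                        # query(fair-coin?)


posterior = likelihood_weighting(fair_coin_model, num_samples=200_000)
print(posterior[True])
```

With 200,000 runs the estimate should land near the exact posterior of about 0.737. Because only about one run in a thousand takes the biased branch, the estimate gets noticeably noisier with fewer samples, which hints at why inference performance is still an open problem.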
  16. Probabilistic Programming Systems Are Diverse
      ‣ Library vs. stand-alone language
      ‣ Base language: Scala, Lisp, Python
      ‣ Manual, semi-, or fully automated inference
      ‣ Modeling domain: directed/undirected graphical models, relational data, all programs
      ‣ Home field: cognitive science, programming languages, databases, Bayesian statistics, artificial intelligence
  17. PPSs Compared
      System      | Type        | Language         | Inference
      BLOG        | Stand-alone | Custom           | Fully auto
      BUGS / JAGS | Stand-alone | Custom           | Fully auto
      STAN        | Hybrid      | R, Python        | Fully auto
      PyMC        | Library     | Python           | Manual
      Infer.net   | Library     | C#               | Semi-auto
      Church      | Stand-alone | Lisp             | Fully auto
      Venture     | Stand-alone | JavaScript, Lisp | Semi-auto
      Figaro      | Library     | Scala            | Semi-auto
      factorie    | Library     | Scala            | Semi-auto
How?
  18. infer.net
      ‣ A C# framework (also F#)
      ‣ Developed at MSR (Microsoft Research)
      ‣ Under active development, with good tutorials and many well-documented examples
  19. Infer.net example: Is a new treatment effective?
      // Observations: outcomes for the control and treated groups
      VariableArray<bool> controlGroup =
          Variable.Observed(new bool[] { false, false, true, false, false });
      VariableArray<bool> treatedGroup =
          Variable.Observed(new bool[] { true, false, true, true, true });
      Range i = controlGroup.Range;
      Range j = treatedGroup.Range;
      // Unknown: is the treatment effective? (prior probability 0.5)
      Variable<bool> isEffective = Variable.Bernoulli(0.5);
      // Assumptions & unknowns: one model per hypothesis
      Variable<double> probIfTreated, probIfControl;
      using (Variable.If(isEffective))
      {
          // Model if treatment is effective: a separate rate for each group
          probIfControl = Variable.Beta(1, 1);
          controlGroup[i] = Variable.Bernoulli(probIfControl).ForEach(i);
          probIfTreated = Variable.Beta(1, 1);
          treatedGroup[j] = Variable.Bernoulli(probIfTreated).ForEach(j);
      }
      using (Variable.IfNot(isEffective))
      {
          // Model if treatment is not effective: one rate shared by both groups
          Variable<double> probAll = Variable.Beta(1, 1);
          controlGroup[i] = Variable.Bernoulli(probAll).ForEach(i);
          treatedGroup[j] = Variable.Bernoulli(probAll).ForEach(j);
      }
      // Query: posterior probability that the treatment is effective
      InferenceEngine ie = new InferenceEngine();
      Console.WriteLine("Probability treatment has an effect = " + ie.Infer(isEffective));
      Source: http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Clinical%20trial%20tutorial.aspx
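Because every rate in this model has a conjugate Beta(1, 1) prior, the same query can also be answered in closed form, which makes a handy sanity check on an inference engine's output. A sketch in Python using only the standard library (the function names are ours, not Infer.NET's): under Beta(1, 1), the probability of a particular outcome sequence with k successes in n trials is the Beta function B(1 + k, 1 + n - k), and the two branches are then weighted by the 0.5 prior on `isEffective`.

```python
from fractions import Fraction
from math import factorial


def beta_fn(a: int, b: int) -> Fraction:
    """Beta function B(a, b) for positive integers, computed exactly."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def marginal(k: int, n: int) -> Fraction:
    """Pr(one particular sequence with k successes in n trials), rate ~ Beta(1, 1)."""
    return beta_fn(1 + k, 1 + n - k)  # B(1, 1) = 1, so no extra normalizer


def prob_effective(control, treated, prior=Fraction(1, 2)) -> Fraction:
    kc, nc = sum(control), len(control)
    kt, nt = sum(treated), len(treated)
    # Effective: each group has its own rate.
    like_eff = marginal(kc, nc) * marginal(kt, nt)
    # Not effective: one rate shared by all patients.
    like_not = marginal(kc + kt, nc + nt)
    return prior * like_eff / (prior * like_eff + (1 - prior) * like_not)


control = [False, False, True, False, False]
treated = [True, False, True, True, True]
p = prob_effective(control, treated)
print(p, f"{float(p):.4f}")  # 77/102 0.7549
```

Exact arithmetic gives 77/102, about 0.755, so an approximate engine run on the model above should report a probability close to that.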
  20. PyMC
      ‣ Python (duh)
      ‣ Go watch Thomas Wiecki’s talk from PyData NY
      ‣ http://twiecki.github.io/blog/2013/12/12/bayesian-data-analysis-pymc3/
      ‣ And read Bayesian Methods for Hackers by Cam Davidson-Pilon et al.
  21. Church
      ‣ A Lisp
      ‣ Originally created to model cognitive development and human reasoning
      ‣ Active inference research, several implementations
      ‣ Connection between functional purity / independence vs. stochastic memoization / exchangeability
      ‣ The hypothesis space is the set of possible program executions
      ‣ “Probabilistic Models of Cognition”
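The Church examples that follow rely on stochastic memoization: wrapping a random function in `mem` makes its first draw for each argument permanent, so repeated calls agree, while an unmemoized function draws afresh on every call. A rough Python analogue, assuming a simple dictionary cache; `mem` and the eye-color example are our own illustrations, not Church code:

```python
import random
from typing import Callable, Dict, Hashable, TypeVar

T = TypeVar("T")


def mem(f: Callable[[Hashable], T]) -> Callable[[Hashable], T]:
    """Stochastic memoization: the first (random) result per argument is kept."""
    cache: Dict[Hashable, T] = {}

    def wrapped(arg: Hashable) -> T:
        if arg not in cache:
            cache[arg] = f(arg)
        return cache[arg]

    return wrapped


rng = random.Random(0)


def eye_color(person: str) -> str:
    # Unmemoized: a fresh random draw on every call.
    return rng.choice(["blue", "green", "brown"])


eye_color_mem = mem(eye_color)

# Memoized: the same person always gets the same color within one run,
# while different people remain independent draws.
assert eye_color_mem("bob") == eye_color_mem("bob")
print(eye_color_mem("alice"), eye_color_mem("bob"))
```

`functools.cache` would behave the same way here; the explicit version just makes the caching visible.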
  22. Church example: Infinite Gaussian Mixture Model
      ; stochastic memoization generator for class assignments
      ; sometimes return a previous symbol, sometimes create a new one
      (define class-distribution (DP-stochastic-mem 1.0 gensym))
      ; associate a class with an object via memoization
      (define object->class
        (mem (lambda (object) (class-distribution))))
      ; associate gaussian parameters with a class via memoization
      (define class->gaussian-parameters
        (mem (lambda (class) (list (gaussian 65 10) (gaussian 0 8)))))
      ; generate observed values for an object
      (define (observe object)
        (apply gaussian (class->gaussian-parameters (object->class object))))
      ; generate observations for some objects
      (map observe '(tom dick harry bill fred))
      Modified from https://probmods.org/non-parametric-models.html
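`DP-stochastic-mem` (spelled `DPmem` on the next slide) is a Dirichlet-process memoizer: after n previous calls, it returns a brand-new value from the wrapped thunk with probability α / (n + α), and otherwise reuses a previous value in proportion to how often that value has been returned. Wrapped around `gensym`, this is the Chinese restaurant process. Below is a Python sketch of the same generative model, assuming α = 1.0 as on the slide and reading the second Gaussian parameter as a standard deviation; because the slide draws it from `(gaussian 0 8)`, which can be negative, the sketch takes its absolute value. `dp_mem`, `mem`, `gensym`, and the other names are our own stand-ins, not Church's API.

```python
import itertools
import random

rng = random.Random(0)
_counter = itertools.count()


def gensym():
    """Return a brand-new symbol on every call (stand-in for Church's gensym)."""
    return f"class-{next(_counter)}"


def dp_mem(alpha, base):
    """Dirichlet-process memoization of a thunk, i.e. a Chinese restaurant process."""
    previous = []  # every value returned so far, repeats included

    def draw():
        # New value with probability alpha / (n + alpha); otherwise reuse a past
        # value, picked in proportion to how many times it has been returned.
        if rng.random() < alpha / (len(previous) + alpha):
            value = base()
        else:
            value = rng.choice(previous)
        previous.append(value)
        return value

    return draw


def mem(f):
    """Ordinary stochastic memoization: cache the first result per argument tuple."""
    cache = {}

    def wrapped(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]

    return wrapped


class_distribution = dp_mem(1.0, gensym)
object_to_class = mem(lambda obj: class_distribution())
# (mean, sd) per class; abs() because a Gaussian draw for the sd can be negative.
class_to_gaussian_parameters = mem(lambda cls: (rng.gauss(65, 10), abs(rng.gauss(0, 8))))


def observe(obj):
    mean, sd = class_to_gaussian_parameters(object_to_class(obj))
    return rng.gauss(mean, sd)


people = ["tom", "dick", "harry", "bill", "fred"]
print([round(observe(p), 1) for p in people])
print({p: object_to_class(p) for p in people})
```

The number of classes is not fixed in advance: each new object may open a new class, which is what makes the mixture "infinite".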
  23. Church example: Cross-categorization (BayesDB)
      (define kind-distribution (DPmem 1.0 gensym))
      (define feature->kind
        (mem (lambda (feature) (kind-distribution))))
      (define kind->class-distribution
        (mem (lambda (kind) (DPmem 1.0 gensym))))
      (define feature-kind/object->class
        (mem (lambda (kind object) (sample (kind->class-distribution kind)))))
      (define class->parameters
        (mem (lambda (object-class) (beta 1 1))))
      (define (observe object feature)
        (flip (class->parameters (feature-kind/object->class (feature->kind feature) object))))
      (observe 'eggs 'breakfast)
      From https://probmods.org/non-parametric-models.html
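Cross-categorization nests the same construction: one Chinese restaurant process partitions features into kinds, and each kind gets its own process partitioning objects into classes, so different groups of features can cluster the objects differently. A self-contained Python sketch, assuming α = 1.0 and a Beta(1, 1) prior on each class's flip probability as on the slide; the helper names are hypothetical, and `random.betavariate` stands in for Church's `beta`:

```python
import itertools
import random

rng = random.Random(1)
_ids = itertools.count()


def gensym(prefix):
    """Return a fresh symbol such as 'kind-0' or 'class-3'."""
    return f"{prefix}-{next(_ids)}"


def dp_mem(alpha, base):
    """Chinese-restaurant-process memoizer for a thunk `base`."""
    previous = []

    def draw():
        if rng.random() < alpha / (len(previous) + alpha):
            value = base()               # open a new table
        else:
            value = rng.choice(previous)  # reuse, proportional to counts
        previous.append(value)
        return value

    return draw


def mem(f):
    """Cache the first result for each argument tuple."""
    cache = {}

    def wrapped(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]

    return wrapped


kind_distribution = dp_mem(1.0, lambda: gensym("kind"))
feature_to_kind = mem(lambda feature: kind_distribution())
# Each kind gets its own CRP over object classes.
kind_to_class_distribution = mem(lambda kind: dp_mem(1.0, lambda: gensym("class")))
feature_kind_object_to_class = mem(lambda kind, obj: kind_to_class_distribution(kind)())
class_to_parameter = mem(lambda cls: rng.betavariate(1, 1))


def observe(obj, feature):
    kind = feature_to_kind(feature)
    cls = feature_kind_object_to_class(kind, obj)
    return rng.random() < class_to_parameter(cls)  # flip with the class's weight


print(observe("eggs", "breakfast"))
```

Because every lookup goes through `mem`, asking about the same (feature, object) pair again reuses the same kind, class, and weight; only the final coin flip is redrawn.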
  24. Churj? Jurch?
  25. So Far
      ‣ Why
      ‣ What
      ‣ How
      ‣ When
  26. What We Still Need
      1. Basic CS: improved compilers and run-times for more efficient automatic inference
      2. Tooling: debuggers, optimizers, IDEs, visualization
      3. Tribal knowledge: idioms, patterns, best practices
  27. When?
  28. The Probabilistic Programming Revolution
      Traditional Programming stack: Application, Code Libraries, Programming Language, Compiler, Hardware
      Probabilistic Programming stack: Model, Model Libraries, Probabilistic Programming Language, Inference Engine, Hardware
      • Code models capture how the data was generated, using random variables to represent uncertainty
      • Libraries contain common model components: Markov chains, deep belief networks, etc.
      • The PPL provides probabilistic primitives & traditional PL constructs so users can express models, queries, and data
      • The inference engine analyzes the probabilistic program and chooses appropriate solver(s) for the available hardware
      • Hardware can include multi-core, GPU, cloud-based resources, GraphLab, UPSIDE/Analog Logic results, etc.
      High-level programming languages facilitate building complex systems; probabilistic programming languages facilitate building rich ML applications.
      Approved for Public Release; Distribution Unlimited
  29. The Promise of Probabilistic Programming Languages
      • Shorter: Reduce LOC by 100x for machine learning applications
          • Seismic Monitoring: 28K LOC in C vs. 25 LOC in BLOG
          • Microsoft MatchBox: 15K LOC in C# vs. 300 LOC in Fun
      • Faster: Reduce development time by 100x
          • Seismic Monitoring: several years vs. 1 hour
          • Microsoft TrueSkill: six months for a competent developer vs. 2 hours with Infer.Net
          • Enable quick exploration of many models
      • More Informative: Develop models that are 10x more sophisticated
          • Enable surprising, new applications
          • Incorporate rich domain knowledge
          • Produce more accurate answers
          • Require less data
          • Increase robustness with respect to noise
          • Increase ability to cope with contradiction
      • With less expertise: Enable 100x more programmers
          • Separate the model (the program) from the solvers (the compiler), enabling domain experts without machine learning PhDs to write applications
      Probabilistic Programming could empower domain experts and ML experts.
      Sources: Bayesian Data Analysis, Gelman, 2003; Pattern Recognition and Machine Learning, Bishop, 2007; Science, Tenenbaum et al., 2011
      DISTRIBUTION STATEMENT F. Further dissemination only as directed by DARPA, (February 20, 2013) or higher DoD authority.
  30. Optimizer
      “What is happening when I run this?”
  31. Profiler
      “Where is the time and memory being used?”
  32. Debugger
      “What is the exact state of my program at each point in time?”
  33. Visualization
      “What is the hidden structure of my data, and how certain should I be?”
      http://www.icg.tugraz.at/project/caleydo/
  34. Probabilistic Programming Workflows?
      Definition: Data Workflows
      Pipeline: data sources → ETL → data prep → predictive model → end uses
      For example, Cascading and related projects implement the following components, based on 100% open source (cascading.org):
      ‣ source taps for Cassandra, JDBC, Splunk, etc.
      ‣ Lingual: DW → ANSI SQL
      ‣ Pattern: SAS, R, etc. → PMML
      ‣ business logic in Java, Clojure, Scala, etc.
      ‣ sink taps for Memcached, HBase, MongoDB, etc.
      Adapted from Paco Nathan: Data Workflows for Machine Learning
  35. Evolution of PPSs
  36. Bottom Line
      ‣ Go experiment and learn! There are several good options
      ‣ But be realistic about the current state of the art
      ‣ And keep your ear to the ground: this area is moving fast
  37. Parting Questions
      ‣ Which projects are good fits for probabilistic programming today?
      ‣ Exploration and prototyping vs. scaled production deployment?
      ‣ How long before we have the Python, Ruby, and even PHP of PPSs?
      ‣ Is there a unification with the log-centric view of big data processing?
      ‣ Can natively stochastic hardware provide compelling performance gains?
  38. Resources
      ‣ probabilistic-programming.org
      ‣ Probabilistic Programming and Bayesian Methods for Hackers
      ‣ Probabilistic Models of Cognition
      ‣ Mathematica Journal article
      ‣ Thomas Wiecki’s PyData talk on PyMC
  39. People To Watch
      Vikash Mansinghka (MIT)
      Noah Goodman (Stanford)
      David Wingate (Lyric Labs)
      Avi Pfeffer (CRA)
      Rob Zinkov (USC)
      Andrew Gordon (MSR)
      John Winn (MSR)
      Dan Roy (Cambridge)
  40. Languages and Systems
      ‣ PyMC
      ‣ infer.net
      ‣ STAN
      ‣ Figaro
      ‣ BLOG
      ‣ Church
      ‣ factor.ie
      ‣ BUGS / JAGS
  41. @beaucronin
