classification
chris.wiggins@columbia.edu
2017-03-10
wat?
example: spam/ham
(cf. jake’s great deck on this)
Learning by example
• How did you solve this problem?
• Can you make this process explicit (e.g. write code to do so)?
classification?
build a theory of 3’s?
1-slide summary of classification
• banana or orange?
• what would Gauss do?
[figure: fruit plotted by length vs. height]
• more features: price, smell, time of purchase
• game theory: “assume the worst”
• large deviation theory: “maximum margin”
• boosting (1997), SVMs (1990s)
1-slide summary of classification
• up- or down-regulated?
• learn predictive features from data:
“acgt” & gene 45 down? “cat” & gene 11 up? “tag” & gene 34 up?
“gataca” & gene 37 down? “gaga” & gene 1066 up? “gaga” & gene 137 up?
example: bad bananas
example @ NYT in CAR (computer-assisted reporting)
Figure 1: Tabuchi article
example in CAR (computer-assisted reporting)
◮ cf. Freedman’s “Statistical Models and Shoe Leather”1
◮ Takata airbag fatalities
◮ 2219 labeled2 examples from 33,204 comments
◮ cf. Box’s “Science and Statistics”3
computer-assisted reporting
◮ Impact
Figure 3: impact
conjecture: cost function?
fallback: probability
review: regression as probability
classification as probability
binary/dichotomous/boolean features + NB
digression: bayes rule
generalize, maintain linearity
Learning by example
• How did you solve this problem?
• Can you make this process explicit (e.g. write code to do so)?
Diagnoses a la Bayes1
• You’re testing for a rare disease:
  • 1% of the population is infected
• You have a highly sensitive and specific test:
  • 99% of sick patients test positive
  • 99% of healthy patients test negative
• Given that a patient tests positive, what is the probability that the patient is sick?

1 Wiggins, SciAm 2006
Diagnoses a la Bayes

Population: 10,000 ppl
  1% Sick: 100 ppl
    99% Test +: 99 ppl
    1% Test −: 1 ppl
  99% Healthy: 9,900 ppl
    1% Test +: 99 ppl
    99% Test −: 9,801 ppl

So given that a patient tests positive (198 ppl), there is a 50% chance the patient is sick (99 ppl)!

The small error rate on the large healthy population produces many false positives.
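The tree’s arithmetic can be checked in a few lines; a minimal sketch in Python (the 10,000-person cohort and the 1%/99% rates are taken from the slide above):

# natural-frequency check of the diagnosis example:
# 1% prevalence, 99% sensitivity, 99% specificity
population = 10_000
sick = population * 0.01              # 100 ppl
healthy = population * 0.99           # 9,900 ppl
true_positives = sick * 0.99          # 99 ppl
false_positives = healthy * 0.01      # 99 ppl
positives = true_positives + false_positives   # 198 ppl
print(true_positives / positives)     # 0.5 = P(sick | +)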
Natural frequencies a la Gigerenzer2

2 http://bit.ly/ggbbc
Inverting conditional probabilities

Bayes’ Theorem
Equate the far right- and left-hand sides of the product rule

  p(y|x) p(x) = p(x, y) = p(x|y) p(y)

and divide to get the probability of y given x from the probability of x given y:

  p(y|x) = p(x|y) p(y) / p(x)

where p(x) = Σ_{y∈ΩY} p(x|y) p(y) is the normalization constant.
Diagnoses a la Bayes

Given that a patient tests positive, what is the probability that the patient is sick?

  p(sick|+) = p(+|sick) p(sick) / p(+)
            = (99/100)(1/100) / (198/100²)
            = 99/198
            = 1/2

where p(+) = p(+|sick) p(sick) + p(+|healthy) p(healthy) = 99/100² + 99/100² = 198/100².
(Super) Naive Bayes

We can use Bayes’ rule to build a one-word spam classifier:

  p(spam|word) = p(word|spam) p(spam) / p(word)

where we estimate these probabilities with ratios of counts:

  p̂(word|spam) = (# spam docs containing word) / (# spam docs)
  p̂(word|ham) = (# ham docs containing word) / (# ham docs)
  p̂(spam) = (# spam docs) / (# docs)
  p̂(ham) = (# ham docs) / (# docs)
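These ratios are easy to compute directly; a minimal sketch in Python (the function and its counts-as-arguments interface are illustrative, not the enron_naive_bayes.sh script itself):

def p_spam_given_word(n_spam, n_ham, n_spam_with_word, n_ham_with_word):
    """P(spam|word) via Bayes' rule, from document counts."""
    n_docs = n_spam + n_ham
    p_spam, p_ham = n_spam / n_docs, n_ham / n_docs
    p_word_spam = n_spam_with_word / n_spam   # p̂(word|spam)
    p_word_ham = n_ham_with_word / n_ham      # p̂(word|ham)
    p_word = p_word_spam * p_spam + p_word_ham * p_ham
    return p_word_spam * p_spam / p_word

# counts reported for "meeting" in the run below; exact arithmetic
# gives ~0.095, while the script, which rounds its intermediate
# estimates to four digits, prints .0923
print(p_spam_given_word(1500, 3672, 16, 153))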
(Super) Naive Bayes
$ ./enron_naive_bayes.sh meeting
1500 spam examples
3672 ham examples
16 spam examples containing meeting
153 ham examples containing meeting
estimated P(spam) = .2900
estimated P(ham) = .7100
estimated P(meeting|spam) = .0106
estimated P(meeting|ham) = .0416
P(spam|meeting) = .0923
(Super) Naive Bayes
$ ./enron_naive_bayes.sh money
1500 spam examples
3672 ham examples
194 spam examples containing money
50 ham examples containing money
estimated P(spam) = .2900
estimated P(ham) = .7100
estimated P(money|spam) = .1293
estimated P(money|ham) = .0136
P(spam|money) = .7957
(Super) Naive Bayes
$ ./enron_naive_bayes.sh enron
1500 spam examples
3672 ham examples
0 spam examples containing enron
1478 ham examples containing enron
estimated P(spam) = .2900
estimated P(ham) = .7100
estimated P(enron|spam) = 0
estimated P(enron|ham) = .4025
P(spam|enron) = 0
Naive Bayes

Represent each document by a binary vector ~x where xj = 1 if the j-th word appears in the document (xj = 0 otherwise).

Modeling each word as an independent Bernoulli random variable, the probability of observing a document ~x of class c is:

  p(~x|c) = ∏_j θ_jc^xj (1 − θ_jc)^(1−xj)

where θ_jc denotes the probability that the j-th word occurs in a document of class c.
Naive Bayes

Using this likelihood in Bayes’ rule and taking a logarithm, we have:

  log p(c|~x) = log [ p(~x|c) p(c) / p(~x) ]
              = Σ_j xj log [ θ_jc / (1 − θ_jc) ] + Σ_j log(1 − θ_jc) + log [ θ_c / p(~x) ]

where θ_c is the probability of observing a document of class c.
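Note that this log-posterior is linear in the xj, so scoring a document reduces to a dot product plus a bias; a minimal sketch with NumPy (the θ values below are toy placeholders, not estimates from the Enron data):

import numpy as np

def log_posterior_unnorm(x, theta, theta_c):
    """log p(c|x) up to the shared -log p(x) term; linear in x."""
    w = np.log(theta / (1 - theta))                 # per-word weights
    b = np.log(1 - theta).sum() + np.log(theta_c)   # bias term
    return x @ w + b

# toy example: 3-word vocabulary, document containing words 0 and 2
x = np.array([1, 0, 1])
theta_spam = np.array([0.8, 0.2, 0.5])   # θ_j,spam (placeholder values)
theta_ham = np.array([0.1, 0.4, 0.5])    # θ_j,ham (placeholder values)
scores = {
    "spam": log_posterior_unnorm(x, theta_spam, 0.29),
    "ham": log_posterior_unnorm(x, theta_ham, 0.71),
}
print(max(scores, key=scores.get))   # predicted class: "spam"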
(a) big picture: surrogate convex loss functions
general
Figure 4: Reminder: Surrogate Loss Functions
boosting
Figure 5: ‘Cited by 12599’
tangent: logistic function as surrogate loss function

◮ define f(x) ≡ log [ p(y = 1|x) / p(y = −1|x) ] ∈ R
◮ p(y = 1|x) + p(y = −1|x) = 1 → p(y|x) = 1/(1 + exp(−yf))
◮ − log2 p({y}_1^N) = Σ_i log2(1 + e^(−y_i f(x_i))) ≡ Σ_i ℓ(y_i f(x_i))
◮ ℓ′′ > 0, ℓ(µ) > 1[µ < 0] ∀µ ∈ R
◮ ∴ maximizing log-likelihood is minimizing a surrogate convex loss function for classification
◮ but Σ_i log2(1 + e^(−y_i wᵀh(x_i))) not as easy as Σ_i e^(−y_i wᵀh(x_i))
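Both claims about ℓ are easy to check numerically; a minimal sketch in Python (the grid range and resolution are arbitrary choices):

import numpy as np

# check that l(mu) = log2(1 + e^(-mu)) is convex and upper-bounds
# the 0-1 loss 1[mu < 0]
mu = np.linspace(-5, 5, 1001)
l = np.log2(1 + np.exp(-mu))
zero_one = (mu < 0).astype(float)

print(np.all(l > zero_one))       # True: surrogate upper bound
print(np.all(np.diff(l, 2) > 0))  # True: positive second differences (convexity)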
boosting 1

L, the exponential surrogate loss function, summed over examples:

◮ L[F] = Σ_i exp(−y_i F(x_i))
◮ = Σ_i exp(−y_i Σ_{t′=1}^{t} w_t′ h_t′(x_i)) ≡ L_t(w_t)
◮ Draw h_t ∈ H, a large space of rules, s.t. h(x) ∈ {−1, +1}
◮ label y ∈ {−1, +1}
◮ L_{t+1}(w_t; w) ≡ Σ_i d_i^t exp(−y_i w h_{t+1}(x_i))
◮ = Σ_{y_i = h_{t+1}(x_i)} d_i^t e^(−w) + Σ_{y_i ≠ h_{t+1}(x_i)} d_i^t e^(+w) ≡ e^(−w) D_+ + e^(+w) D_−
◮ ∴ w_{t+1} = argmin_w L_{t+1}(w) = (1/2) log(D_+/D_−)
◮ L_{t+1}(w_{t+1}) = 2√(D_+ D_−) = 2√(ν_+(1 − ν_+)) D, where 0 ≤ ν_+ ≡ D_+/D = D_+/L_t ≤ 1
◮ update example weights: d_i^(t+1) = d_i^t e^(∓w)

Punchlines: sparse, predictive, interpretable, fast (to execute), and easy to extend, e.g., trees, flexible hypothesis spaces, L1, L∞ 4, . . .
4 Duchi + Singer, “Boosting with structural sparsity,” ICML ’09
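A minimal sketch of one such round in Python (the toy labels and rule predictions below are placeholders, not the lecture’s data):

import numpy as np

def boost_round(d, y, h_pred):
    """One boosting round: weight a rule by D+/D-, then reweight examples.
    d: example weights, y: labels in {-1,+1}, h_pred: rule's predictions."""
    correct = (y == h_pred)
    D_plus = d[correct].sum()    # weight on correctly classified examples
    D_minus = d[~correct].sum()  # weight on misclassified examples
    w = 0.5 * np.log(D_plus / D_minus)
    # d_i <- d_i e^{-w} if correct, d_i e^{+w} if not
    return w, d * np.where(correct, np.exp(-w), np.exp(w))

# toy data: 6 examples, a rule that gets 4 of 6 right
y = np.array([+1, +1, +1, -1, -1, -1])
h = np.array([+1, +1, -1, -1, -1, +1])
d = np.ones(6)
w, d = boost_round(d, y, h)
print(w)        # (1/2) log(4/2) ≈ 0.347
print(d.sum())  # 2 sqrt(D+ D-) = 2 sqrt(8) ≈ 5.657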
svm