1st edition | July 8-11, 2019
BigML, Inc #DutchMLSchool
Clusters
Finding Similarities
Poul Petersen
CIO, BigML, Inc
What is Clustering?
• An unsupervised learning technique
• No labels necessary
• Useful for finding similar instances
• Smart sampling/labelling
• Finds “self-similar” groups of instances
• Customer: groups with similar behavior
• Medical: patients with similar diagnostic measurements
• Defines each group by a “centroid”
• Geometric center of the group
• Represents the “average” member
• Number of centroids (k) can be specified or determined
Cluster Centroids
date  customer  account  auth  class    zip    amount
Mon   Bob       3421     pin   clothes  46140  135
Tue   Bob       3421     sign  food     46140  401
Tue   Alice     2456     pin   food     12222  234
Wed   Sally     6788     pin   gas      26339  94
Wed   Bob       3421     pin   tech     21350  2459
Wed   Bob       3421     pin   gas      46140  83
Thu   Sally     6788     sign  food     26339  51
Cluster Centroids
The three similar transactions:
date  customer  account  auth  class    zip    amount
Mon   Bob       3421     pin   clothes  46140  135
Wed   Sally     6788     pin   gas      26339  94
Wed   Bob       3421     pin   gas      46140  83
Same:
• auth = pin
• amount ~ $100
Different:
• date: Mon != Wed
• customer: Sally != Bob
• account: 6788 != 3421
• class: clothes != gas
• zip: 26339 != 46140
Centroid:
• date = Wed (2 out of 3)
• customer = Bob
• account = 3421
• auth = pin
• class = gas
• zip = 46140
• amount = $104
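The centroid above can be reproduced with a few lines of Python. This is only a sketch of the idea, assuming the mean is used for the numeric field and the most common value for every categorical field (the deck states the latter on the categorical distance slide). The function name `centroid` and the choice to treat account and zip as categorical strings are illustrative assumptions, not BigML's implementation.

```python
from collections import Counter
from statistics import mean

FIELDS = ["date", "customer", "account", "auth", "class", "zip", "amount"]

# The three similar transactions from the slide.
# account and zip are identifiers, so they are kept as strings (assumption).
similar = [
    ["Mon", "Bob", "3421", "pin", "clothes", "46140", 135],
    ["Wed", "Sally", "6788", "pin", "gas", "26339", 94],
    ["Wed", "Bob", "3421", "pin", "gas", "46140", 83],
]


def centroid(rows):
    """Mean for numeric columns, most common value for categorical ones."""
    result = []
    for column in zip(*rows):
        if all(isinstance(value, (int, float)) for value in column):
            result.append(mean(column))
        else:
            result.append(Counter(column).most_common(1)[0][0])
    return result


center = dict(zip(FIELDS, centroid(similar)))
print(center)
```

The result is the "average" member of the group: it does not have to match any single transaction exactly.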
Use Cases
• Customer segmentation
• Which customers are similar?
• How many natural groups are there?
• Item discovery
• What other items are similar to this one?
• Similarity
• What other instances share a specific property?
• Recommender (almost)
• If you like this item, what other items might you like?
• Active learning
• Labelling unlabelled data efficiently
Customer Segmentation
GOAL: Cluster the users by usage statistics. Identify clusters with a higher percentage of high LTV (lifetime value) users. Since they have similar usage patterns, the remaining users in these clusters may be good candidates for up-sell.
• Dataset of mobile game users.
• Data for each user consists of usage statistics and an LTV based on in-game purchases
• Assumption: Usage correlates to LTV
[Figure: clusters labelled 0%, 3%, and 1%]
Similarity
GOAL: Cluster the loans by application profile to rank loan quality by percentage of trouble loans in population
• Dataset of Lending Club Loans
• Mark any loan that is currently or has ever been late as “trouble”
[Figure: clusters labelled 0%, 3%, 7%, and 1%]
Active Learning
GOAL: Rather than sample randomly, use clustering to group patients by similarity and then test a sample from each cluster to label the data.
• Dataset of diagnostic measurements of 768 patients.
• Want to test each patient for diabetes and label the dataset to build a model but the test is expensive*.
*For a more realistic example of high cost, imagine a dataset with a billion transactions, each one needing to be labelled as fraud/not-fraud. Or a million images which need to be labelled as cat/not-cat.
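The sampling step of this goal might look like the sketch below. It assumes the clustering has already been done by some other tool and that each instance carries a cluster id. The function name `sample_per_cluster`, the `per_cluster` parameter, and the example ids are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_cluster(cluster_ids, per_cluster=2, seed=0):
    """Pick up to `per_cluster` instance indices from every cluster."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)
    return {
        cluster: sorted(rng.sample(indices, min(per_cluster, len(indices))))
        for cluster, indices in members.items()
    }


# Hypothetical cluster assignment for ten patients.
cluster_ids = [0, 0, 1, 2, 1, 0, 2, 2, 1, 0]
to_test = sample_per_cluster(cluster_ids, per_cluster=2)
print(to_test)  # indices of the patients to send for the expensive test
```

Only the selected patients are tested, so every group of similar patients contributes labels even when the groups differ a lot in size.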
Item Discovery
GOAL: Cluster the whiskies by flavor profile to discover whiskies that have similar taste.
• Dataset of 86 whiskies
• Each whisky scored on a scale from 0 to 4 for each of 12 possible flavor characteristics.
[Figure: flavor characteristics such as Smoky and Fruity]
Clusters Demo #1
Human Expert
Cluster into 3 groups…
• Jesa used prior knowledge to select possible features that
separated the objects.
• “round”, “skinny”, “edges”, “hard”, etc
• Items were then clustered based on the chosen features
• Separation quality was then tested to ensure:
• met criteria of K=3
• groups were sufficiently “distant”
• no crossover
Human Expert
Create features that capture these object differences
• Length/Width
• greater than 1 => “skinny”
• equal to 1 => “round”
• less than 1 => invert
• Number of Surfaces
• distinct surfaces require “edges” which have corners
• easier to count
Clustering Features
Object   Length / Width  Num Surfaces
penny    1               3
dime     1               3
knob     1               4
eraser   2.75            6
box      1               6
block    1.6             6
screw    8               3
battery  5               3
key      4.25            3
bead     1               2
Plot by Features
[Figure: objects plotted by Num Surfaces and Length / Width, K=3. Groups: box, block, eraser; knob, penny, dime, bead; key, battery, screw]
K-Means Key Insight: We can find clusters using distances in n-dimensional feature space
K-Means: Find “best” (minimum distance) circles that include all points
K-Means Algorithm
K=3
Repeat until centroids stop moving
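The loop animated on these slides can be sketched as the standard K-means iteration: assign every point to its nearest centroid, move each centroid to the mean of its members, and stop once the centroids no longer move. The sketch below runs it on the object features from the Clustering Features table. The starting centroids (penny, box, screw) are picked by hand for the example, and the function name `kmeans` is illustrative.

```python
import math

# (length / width, number of surfaces) for each object in the table.
objects = {
    "penny": (1, 3), "dime": (1, 3), "knob": (1, 4), "eraser": (2.75, 6),
    "box": (1, 6), "block": (1.6, 6), "screw": (8, 3), "battery": (5, 3),
    "key": (4.25, 3), "bead": (1, 2),
}


def kmeans(points, centroids, max_iterations=100):
    """Assign points to the nearest centroid, then recompute the centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:  # centroids stopped moving
            break
        centroids = updated
    return assignment, centroids


names = list(objects)
points = [objects[name] for name in names]
start = [objects["penny"], objects["box"], objects["screw"]]  # hand-picked start (assumption)
assignment, centroids = kmeans(points, start)

groups = {}
for name, cluster in zip(names, assignment):
    groups.setdefault(cluster, set()).add(name)
print(groups)
```

With this start the three groups match the ones drawn on the plot. A different start can converge to a different grouping.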
Features Matter
[Figure: the objects grouped by material instead: Metal, Wood, Other]
Convergence
Convergence guaranteed but not necessarily unique
Starting points important (K++)
Sub-Optimal Convergence
[Figure: groups that are arbitrarily far apart, shown once with a sub-optimal clustering and once with the optimal clustering]
Starting Points
• Random points or instances in n-dimensional space
• Might start “too close”
• Risk of sub-optimal convergence
• Choose points “farthest” away from each other
• but this is sensitive to outliers
• K++ (k-means++)
• the first point is chosen randomly from instances
• each subsequent point is chosen from the remaining instances with a probability proportional to the squared distance from the point's closest existing cluster center
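The K++ selection rule described above can be sketched as follows. The function name `kpp_init` and the `seed` parameter are illustrative, and the sketch assumes the data contains at least `k` distinct points (otherwise every remaining weight is zero and `random.choices` raises an error).

```python
import math
import random


def kpp_init(points, k, seed=0):
    """Choose k starting centers using squared-distance weighting."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniformly at random
    while len(centers) < k:
        # Squared distance from each point to its closest existing center.
        # Points already chosen get weight 0, so they cannot be picked again.
        weights = [min(math.dist(p, c) for c in centers) ** 2 for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# (length / width, number of surfaces) pairs, used as example data.
points = [(1, 3), (1, 3), (1, 4), (2.75, 6), (1, 6), (1.6, 6),
          (8, 3), (5, 3), (4.25, 3), (1, 2)]
centers = kpp_init(points, k=3)
print(centers)
```

Far-away instances are likely but not certain to be chosen, which is what makes this less sensitive to a single outlier than always taking the farthest point.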
K++ Initial Centers
[Figure: K=3. After each center is chosen, instances close to an existing center have a low probability of being picked next, and instances far from every center have a high probability]
Scaling Matters
[Figure: instances plotted by price and number of bedrooms; d = 160,000 along price, d = 1 along number of bedrooms]
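To see why the scale matters, the sketch below compares raw and standardized distances. The three houses are made-up values chosen to reproduce the two distances on the slide, and z-score standardization is one common choice, assumed here for illustration rather than taken from the deck.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    stdevs = [pstdev(column) or 1.0 for column in columns]  # avoid dividing by 0
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical houses: (price, number of bedrooms).
houses = [(200_000, 3), (360_000, 3), (200_000, 4)]

raw_price_gap = math.dist(houses[0], houses[1])    # differs only in price
raw_bedroom_gap = math.dist(houses[0], houses[2])  # differs only in bedrooms

scaled = standardize(houses)
scaled_price_gap = math.dist(scaled[0], scaled[1])
scaled_bedroom_gap = math.dist(scaled[0], scaled[2])
print(raw_price_gap, raw_bedroom_gap, scaled_price_gap, scaled_bedroom_gap)
```

Before scaling, price dominates the distance completely. After scaling, both fields contribute on a comparable footing.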
Other Tricks
• What is the distance to a “missing value”?
• What is the distance between categorical values?
• How far is “red” from “green”?
• What is the distance between text features?
• Does it have to be Euclidean distance?
• Unknown ideal number of clusters, “K”?
Distance to Missing?
• Nonsense! Try replacing missing values with:
• Maximum
• Mean
• Median
• Minimum
• Zero
• Ignore instances with missing values
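The options in this list could be sketched as below for a single numeric column, with `None` standing in for a missing value. The names `fill_missing`, `drop_incomplete`, and `STRATEGIES` are illustrative and not part of any BigML API.

```python
from statistics import mean, median

# One replacement rule per option on the slide.
STRATEGIES = {
    "maximum": max,
    "mean": mean,
    "median": median,
    "minimum": min,
    "zero": lambda values: 0,
}


def fill_missing(column, strategy="mean"):
    """Replace None entries using a statistic of the values that are present."""
    present = [value for value in column if value is not None]
    replacement = STRATEGIES[strategy](present)
    return [replacement if value is None else value for value in column]


def drop_incomplete(rows):
    """Alternative: ignore instances that have any missing value."""
    return [row for row in rows if None not in row]


print(fill_missing([1, None, 3], "mean"))
print(drop_incomplete([(1, 2), (None, 3)]))
```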
Distance to Categorical?
• Define special distance function: For two instances $x$ and $y$ and the categorical field $a$:
• if $x_a = y_a$ then $\mathrm{distance}(x, y) = 0$, else $\mathrm{distance}(x, y) = 1$ (or field scaling value)
Approach: similar to “k-prototypes”
Distance to Categorical?
animal  favorite toy  toy color
cat     ball          red
cat     ball          green
d=0     d=0           d=1        D = 1
cat     laser         red
dog     squeaky       red
d=1     d=1           d=0        D = √2
Then compute Euclidean distance between vectors
Note: the centroid is assigned the most common category of the member instances
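The two worked examples can be checked with a short sketch: a 0/1 distance per categorical field, then the Euclidean distance over those per-field values. The function name `categorical_distance` is illustrative, and field scaling is left out.

```python
import math


def categorical_distance(x, y):
    """0 per matching field, 1 per differing field, combined Euclidean-style."""
    per_field = [0 if a == b else 1 for a, b in zip(x, y)]
    return math.sqrt(sum(d * d for d in per_field))


first = categorical_distance(("cat", "ball", "red"), ("cat", "ball", "green"))
second = categorical_distance(("cat", "laser", "red"), ("dog", "squeaky", "red"))
print(first, second)
```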
Text Vectors
               "hippo"  "safari"  "zebra"  …
Text Field #1  1        0         1        …
Text Field #2  1        1         0        …
               0        1         1        …
Features (thousands)
[Figure: Cosine Similarity scale from 1 through 0 to -1]
• Cosine Similarity
• cos() between two vectors
• 1 if collinear, 0 if orthogonal
• only positive vectors: 0 ≤ CS ≤ 1
• Cosine Distance = 1 - Cosine Similarity
• CD(TF1, TF2) = 0.5
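The value CD(TF1, TF2) = 0.5 follows directly from the first two term vectors shown, truncated to the three visible terms. A minimal sketch, with illustrative function names and assuming neither vector is all zeros:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (neither may be all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    return 1 - cosine_similarity(u, v)


text_field_1 = (1, 0, 1)  # "hippo", "safari", "zebra"
text_field_2 = (1, 1, 0)
distance = cosine_distance(text_field_1, text_field_2)
print(distance)
```

The two fields share one of their two terms, so the similarity is 1 / (√2 · √2) = 0.5 and the distance is also 0.5.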
Finding K: G-Means
• Let K=2: keep 1, split 1, new K=3
• Let K=3: keep 1, split 2, new K=5
• Let K=5: K=5
Clusters Demo #2
Summary
• Cluster Purpose
• Unsupervised technique for finding self-similar groups of instances
• Number of centroids (k) can be specified or computed
• Outputs list of centroids
• Configuration:
• Algorithm: K-means / G-means
• Cluster Parameter: k or critical value
• Default missing / Summary fields / Scales / Weights
• Model Clusters
• Centroid / Batch centroids
Anomaly Detection
Finding the Unusual
Poul Petersen
CIO, BigML, Inc
What is Anomaly Detection?
• An unsupervised learning technique
• No labels necessary
• Useful for finding unusual instances
• Filtering, finding mistakes, 1-class classifiers
• Finds instances that do not match
• Customer: big or small spender for profile
• Medical: healthy patient despite indicative diagnostics
• Defines each unusual instance by an “anomaly score”
• in BigML: 0=normal, 1=unusual, and 0.7 ≫ 0.6 > 0.5
• Standard deviation, distributions, etc
Clusters
date  customer  account  auth  class    zip    amount
Mon   Bob       3421     pin   clothes  46140  135
Tue   Bob       3421     sign  food     46140  401
Tue   Alice     2456     pin   food     12222  234
Wed   Sally     6788     pin   gas      26339  94
Wed   Bob       3421     pin   tech     21350  2459
Wed   Bob       3421     pin   gas      46140  83
Thu   Sally     6788     sign  food     26339  51
[Figure: the same table with the similar transactions highlighted]
Anomaly Detection
Anomaly: Wed Bob 3421 pin tech 21350 2459
• Amount $2,459 is higher than all other transactions
• It is the only transaction
• in zip 21350
• for the purchase class "tech"
Use Cases
• Unusual instance discovery - "exploration"
• Intrusion Detection - "looking for unusual usage patterns"
• Fraud - "looking for unusual behavior"
• Identify Incorrect Data - "looking for mistakes"
• Remove Outliers - "improve model quality"
• Model Competence / Input Data Drift
Removing Outliers
• Models need to generalize
• Outliers negatively impact generalization
GOAL: Use anomaly detector to identify most anomalous points and then remove them before modeling.
[Diagram: DATASET → ANOMALY DETECTOR → FILTERED DATASET → CLEAN MODEL]
Diabetes Anomalies
[Diagram: workflow with the nodes DIABETES SOURCE, DIABETES DATASET, TRAIN SET, TEST SET, ANOMALY DETECTOR, FILTER, CLEAN DATASET, ALL MODEL (twice), ALL EVALUATION, CLEAN EVALUATION, COMPARE EVALUATIONS]
Intrusion Detection
GOAL: Identify unusual command line behavior per user and across all users that might indicate an intrusion.
• Dataset of command line history for users
• Data for each user consists of commands, flags, working directories, etc.
• Assumption: Users typically issue the same flag patterns and work in certain directories
[Figure labels: Per User, Per Dir, All User, All Dir]
Fraud
• Dataset of credit card transactions
• Additional user profile information
GOAL: Cluster users by profile and use multiple anomaly scores to detect transactions that are anomalous on multiple levels.
[Figure labels: Card Level, User Level, Similar User Level]
Model Competence
• After putting a model into production, data that is being predicted can become statistically different from the training data.
• Train an anomaly detector at the same time as the model.
GOAL: For every prediction, compute an anomaly score. If the anomaly score is high, then the model may not be competent and should not be trusted.
At Training Time
[Diagram: DATASET → MODEL and ANOMALY DETECTOR]
At Prediction Time
Prediction     T       T
Confidence     0.86    0.84
Anomaly Score  0.5367  0.7124
Competent?     Y       N
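The competence check in the table could be sketched as a simple threshold on the anomaly score. The cut-off of 0.6 is a hypothetical value chosen only so that the two example rows come out as Y and N; the deck does not state which threshold to use.

```python
ANOMALY_THRESHOLD = 0.6  # hypothetical cut-off, not given in the deck


def is_competent(anomaly_score, threshold=ANOMALY_THRESHOLD):
    """Trust the model only when the input looks like the training data."""
    return anomaly_score < threshold


# The two example predictions from the table.
predictions = [
    {"prediction": "T", "confidence": 0.86, "anomaly_score": 0.5367},
    {"prediction": "T", "confidence": 0.84, "anomaly_score": 0.7124},
]
flags = ["Y" if is_competent(p["anomaly_score"]) else "N" for p in predictions]
print(flags)
```

Both predictions carry almost the same confidence, so confidence alone would not reveal that the second input is unlike the training data.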
Benford’s Law
• In real-life numeric sets the small digits occur disproportionately often as leading significant digits.
• Applications include:
• accounting records
• electricity bills
• street addresses
• stock prices
• population numbers
• death rates
• lengths of rivers
• Available in BigML API
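As a rough illustration, Benford's law predicts that the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1. The sketch below compares that expectation with observed leading digits. It does not use the BigML API; the function names are illustrative, and the powers of two are just a convenient data set known to follow the law closely.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected share of values whose leading significant digit is `digit`."""
    return math.log10(1 + 1 / digit)


def leading_digit(value):
    """Leading significant digit of a non-zero number."""
    magnitude = abs(value)
    while magnitude >= 10:
        magnitude /= 10
    while magnitude < 1:
        magnitude *= 10
    return int(magnitude)


def leading_digit_frequencies(values):
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


observed = leading_digit_frequencies([2 ** n for n in range(1, 101)])
for d in range(1, 10):
    print(d, round(benford_expected(d), 3), round(observed[d], 3))
```

A data set whose observed frequencies deviate strongly from the expected ones is a candidate for closer inspection.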
Univariate Approach
• Single variable: heights, test scores, etc
• Assume the value is distributed “normally”
• Compute standard deviation
• a measure of how “spread out” the numbers are
• the square root of the variance (The average of the squared
differences from the Mean.)
• Depending on the number of instances, choose a “multiple”
of standard deviations to indicate an anomaly. A multiple of 3
for 1000 instances removes ~ 3 outliers.
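The rule above can be sketched in a few lines. The function name `sigma_outliers` and the example measurements are made up for illustration, and the approach only makes sense when the values are roughly normally distributed.

```python
from statistics import mean, pstdev


def sigma_outliers(values, multiple=3.0):
    """Values farther than `multiple` standard deviations from the mean."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:  # all values identical: nothing can be an outlier
        return []
    return [v for v in values if abs(v - center) > multiple * spread]


# Hypothetical measurements: fifty ordinary values and one extreme value.
measurements = [50 + (i % 5) for i in range(50)] + [500]
print(sigma_outliers(measurements))
```

For normally distributed data about 0.27% of values lie more than 3 standard deviations from the mean, which is where the figure of roughly 3 outliers per 1000 instances comes from.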
Univariate Approach
[Figure: frequency distribution of a measurement, with outliers in both tails]
• Available in BigML API
Multivariate Matters
Human Expert
Most Unusual?
[Figure: objects partitioned into “Round”, “Skinny”, and “Corners”. The most unusual object is “Skinny” but not “smooth”, has no “Corners”, and is not “Round”]
Key Insight: The “most unusual” object is different in some way from every partition of the features.
Human Expert
• Human used prior knowledge to select possible features that separated the objects.
• “round”, “skinny”, “smooth”, “corners”
• Items were then separated based on the chosen features
• Each cluster was then examined to see which object fit the least well in its cluster and did not fit any other cluster
Human Expert
Create features that capture these object differences
• Length/Width
• greater than 1 => “skinny”
• equal to 1 => “round”
• less than 1 => invert
• Number of Surfaces
• distinct surfaces require “edges” which have corners
• easier to count
• Smooth - true or false
Anomaly Features
Object   Length / Width  Num Surfaces  Smooth
penny    1               3             TRUE
dime     1               3             TRUE
knob     1               4             TRUE
eraser   2.75            6             TRUE
box      1               6             TRUE
block    1.6             6             TRUE
screw    8               3             FALSE
battery  5               3             TRUE
key      4.25            3             FALSE
bead     1               2             TRUE
Random Splits
[Figure: a decision tree over the objects with True/False splits on length/width > 5, smooth?, num surfaces = 6, length/width = 1, and length/width < 2; leaves: box, block, eraser, knob, penny/dime, bead, key, battery, screw]
Know that “splits” matter - don’t know the order
Isolation Forest
Grow a random decision tree until each instance from a sample is in its own leaf
[Figure: tree Depth; instances in shallow leaves are “easy” to isolate, instances in deep leaves are “hard” to isolate]
Now repeat the process several times and use average Depth to compute anomaly score: 0 (similar) -> 1 (dissimilar)
Isolation Forest Scoring
    f1    f2   f3
i1  red   cat  ball
i2  red   cat  ball
i3  red   cat  box
i4  blue  dog  pen
For the instance i2:
• Find the depth in each tree: D = 3, D = 6, D = 2
• Map avg depth to final score: S=0.45
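The procedure can be sketched for numeric data as follows. The deck does not say how the average depth is mapped to a score, so this sketch assumes the normalization from the original Isolation Forest paper, score = 2^(-average depth / c(n)); BigML's mapping may differ, and the example will not reproduce S=0.45. All function names and the example data are illustrative.

```python
import math
import random


def c_factor(n):
    """Average depth expected for n instances, used to normalize depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, rng, depth, max_depth):
    """Split on a random feature at a random value until rows are isolated."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    low, high = min(values), max(values)
    if low == high:  # simplification: stop if the chosen feature cannot split
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}


def path_length(tree, row, depth=0):
    """Depth at which `row` lands, plus a correction for unsplit leaves."""
    if "feature" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Score every row between 0 (similar) and 1 (dissimilar); needs 2+ rows."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    trees = [build_tree(rng.sample(rows, sample_size), rng, 0, sample_size)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    scores = []
    for row in rows:
        average_depth = sum(path_length(tree, row) for tree in trees) / n_trees
        scores.append(2 ** (-average_depth / norm))
    return scores


# Made-up data: sixty points around the origin and one far-away point.
data_rng = random.Random(1)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
rows.append((10.0, 10.0))
scores = anomaly_scores(rows)
print(round(scores[-1], 2), round(max(scores[:-1]), 2))
```

The far-away point is usually separated by the first or second split, so its average depth is small and its score is the highest in the data set.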
Model Competence
• A low anomaly score means the loan is similar to the modeled loans.
• A high anomaly score means you should not trust the model.
Prediction     T       T
Confidence     0.86    0.84
Anomaly Score  0.5367  0.7124
Competent?     Y       N
[Diagram: OPEN LOANS are passed to the CLOSED LOAN MODEL and the CLOSED LOAN ANOMALY DETECTOR, giving a PREDICTION and an ANOMALY SCORE]
1-Class Classifier?
• You place an advertisement in a local newspaper
• You collect demographic information about all responders
• Now you want to market in a new locality with direct letters
• To optimize mailing costs, need to predict who will respond
• But, cannot distinguish “not interested” from “didn’t see the ad”
• Train an anomaly detector on the 1-class data
• Pick the households with the lowest scores for mailing:
• If a household has a low anomaly score, then they are
“similar” to enough of your positive responders and
therefore may respond as well
• If an individual has a high anomaly score, then they are
dissimilar from all previous responders and therefore are
less likely to respond.
Summary
• Anomaly detection is the process of finding unusual instances
• Some techniques and how they work:
• Univariate: standard deviation
• Benford’s law
• Isolation Forest
• Applications
• Filtering to improve models
• Finding mistakes, fraud, and intruders
• Knowing when to retrain a model (competence)
• 1-class classifiers
• In general… unsupervised learning techniques:
• Require more finesse and interpretation
• Are more commonly part of a multistep workflow
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
 
Intelligent Mobility: Machine Learning in the Mobility Industry
Intelligent Mobility: Machine Learning in the Mobility IndustryIntelligent Mobility: Machine Learning in the Mobility Industry
Intelligent Mobility: Machine Learning in the Mobility Industry
 

KĂźrzlich hochgeladen

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectBoston Institute of Analytics
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 

KĂźrzlich hochgeladen (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

DutchMLSchool. Clusters and Anomalies

  • 1. 1st edition | July 8-11, 2019
  • 2. BigML, Inc #DutchMLSchool Clusters Finding Similarities Poul Petersen CIO, BigML, Inc 2
  • 3. BigML, Inc #DutchMLSchool What is Clustering? 3 • An unsupervised learning technique • No labels necessary • Useful for finding similar instances • Smart sampling/labelling • Finds “self-similar” groups of instances • Customer: groups with similar behavior • Medical: patients with similar diagnostic measurements • Defines each group by a “centroid” • Geometric center of the group • Represents the “average” member • Number of centroids (k) can be specified or determined
  • 4. BigML, Inc #DutchMLSchool Cluster Centroids 4 date customer account auth class zip amount Mon Bob 3421 pin clothes 46140 135 Tue Bob 3421 sign food 46140 401 Tue Alice 2456 pin food 12222 234 Wed Sally 6788 pin gas 26339 94 Wed Bob 3421 pin tech 21350 2459 Wed Bob 3421 pin gas 46140 83 Thr Sally 6788 sign food 26339 51
  • 5. BigML, Inc #DutchMLSchool Cluster Centroids 5 date customer account auth class zip amount Mon Bob 3421 pin clothes 46140 135 Tue Bob 3421 sign food 46140 401 Tue Alice 2456 pin food 12222 234 Wed Sally 6788 pin gas 26339 94 Wed Bob 3421 pin tech 21350 2459 Wed Bob 3421 pin gas 46140 83 Thr Sally 6788 sign food 26339 51 Similar instances: Same: auth = pin; amount ~ $100. Different: date: Mon != Wed; customer: Sally != Bob; account: 6788 != 3421; class: clothes != gas; zip: 26339 != 46140. Centroid: date = Wed (2 out of 3); customer = Bob; account = 3421; auth = pin; class = gas; zip = 46140; amount = $104
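The centroid on this slide can be reproduced with a short sketch. It assumes the three rows marked "similar" are the pin transactions of roughly $100 (amounts 135, 94 and 83), and that a centroid takes the mean of numeric fields and the most common value of categorical ones. The names `FIELDS`, `similar` and `centroid` are illustrative, not part of any BigML API.

```python
from collections import Counter
from statistics import mean

FIELDS = ["date", "customer", "account", "auth", "class", "zip", "amount"]

# Assumption: these are the three transactions the slide groups as similar.
# Account and zip are kept as strings because they are identifiers, not quantities.
similar = [
    ("Mon", "Bob", "3421", "pin", "clothes", "46140", 135),
    ("Wed", "Sally", "6788", "pin", "gas", "26339", 94),
    ("Wed", "Bob", "3421", "pin", "gas", "46140", 83),
]

def centroid(rows):
    """Mean for numeric columns, most common value for categorical columns."""
    result = {}
    for name, column in zip(FIELDS, zip(*rows)):
        if all(isinstance(v, (int, float)) for v in column):
            result[name] = mean(column)
        else:
            result[name] = Counter(column).most_common(1)[0][0]
    return result

print(centroid(similar))
```

Under these assumptions the result matches the slide: Wed, Bob, 3421, pin, gas, 46140 and an amount of 104.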
  • 6. BigML, Inc #DutchMLSchool Use Cases 6 • Customer segmentation • Which customers are similar? • How many natural groups are there? • Item discovery • What other items are similar to this one? • Similarity • What other instances share a specific property? • Recommender (almost) • If you like this item, what other items might you like? • Active learning • Labelling unlabelled data efficiently
  • 7. BigML, Inc #DutchMLSchool Customer Segmentation 7 GOAL: Cluster the users by usage statistics. Identify clusters with a higher percentage of high LTV users. Since they have similar usage patterns, the remaining users in these clusters may be good candidates for up-sell. • Dataset of mobile game users. • Data for each user consists of usage statistics and an LTV based on in-game purchases • Assumption: Usage correlates to LTV [Figure labels: 0%, 3%, 1%]
  • 8. BigML, Inc #DutchMLSchool Similarity 8 GOAL: Cluster the loans by application profile to rank loan quality by percentage of trouble loans in population • Dataset of Lending Club Loans • Mark any loan that is currently or has ever been late as “trouble” [Figure labels: 0%, 3%, 7%, 1%]
  • 9. BigML, Inc #DutchMLSchool Active Learning 9 GOAL: Rather than sample randomly, use clustering to group patients by similarity and then test a sample from each cluster to label the data. • Dataset of diagnostic measurements of 768 patients. • Want to test each patient for diabetes and label the dataset to build a model but the test is expensive*.
  • 10. BigML, Inc #DutchMLSchool Active Learning 10 *For a more realistic example of high cost, imagine a dataset with a billion transactions, each one needing to be labelled as fraud/not-fraud. Or a million images which need to be labelled as cat/not-cat.
  • 11. BigML, Inc #DutchMLSchool Item Discovery 11 GOAL: Cluster the whiskies by flavor profile to discover whiskies that have similar taste. • Dataset of 86 whiskies • Each whisky is scored on a scale from 0 to 4 for each of 12 possible flavor characteristics. [Figure labels: Smoky, Fruity]
  • 13. BigML, Inc #DutchMLSchool Human Expert 13 Cluster into 3 groups…
  • 15. BigML, Inc #DutchMLSchool Human Expert 15 • Jesa used prior knowledge to select possible features that separated the objects. • “round”, “skinny”, “edges”, “hard”, etc • Items were then clustered based on the chosen features • Separation quality was then tested to ensure: • met criteria of K=3 • groups were sufficiently “distant” • no crossover
  • 16. BigML, Inc #DutchMLSchool Human Expert 16 • Length/Width • greater than 1 => “skinny” • equal to 1 => “round” • less than 1 => invert • Number of Surfaces • distinct surfaces require “edges” which have corners • easier to count Create features that capture these object differences
  • 17. BigML, Inc #DutchMLSchool Clustering Features 17 Columns: Object, Length / Width, Num Surfaces. Rows: penny 1 3; dime 1 3; knob 1 4; eraser 2.75 6; box 1 6; block 1.6 6; screw 8 3; battery 5 3; key 4.25 3; bead 1 2
  • 18. BigML, Inc #DutchMLSchool Plot by Features 18 [Plot: Num Surfaces vs. Length / Width; points: box, block, eraser, knob, penny, dime, bead, key, battery, screw; K=3] K-Means Key Insight: We can find clusters using distances in n-dimensional feature space
  • 19. BigML, Inc #DutchMLSchool Plot by Features 19 [Plot: Num Surfaces vs. Length / Width; points: box, block, eraser, knob, penny, dime, bead, key, battery, screw] K-Means: Find “best” (minimum distance) circles that include all points
  • 21. BigML, Inc #DutchMLSchool K-Means Algorithm 21 K=3 Repeat until centroids stop moving
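The loop on this slide ("repeat until centroids stop moving") can be sketched in plain Python. This is a minimal version of the standard assign-then-update iteration, run on the Length / Width and Num Surfaces values from the Clustering Features slide. The function name `kmeans` and the choice of three starting instances (penny, box, screw) are assumptions made for illustration.

```python
import math

# Feature values from the Clustering Features slide: (length / width, num surfaces).
objects = {
    "penny": (1, 3), "dime": (1, 3), "knob": (1, 4), "eraser": (2.75, 6),
    "box": (1, 6), "block": (1.6, 6), "screw": (8, 3), "battery": (5, 3),
    "key": (4.25, 3), "bead": (1, 2),
}

def kmeans(points, initial_centroids, max_iter=100):
    """Assign each point to its nearest centroid, move centroids, repeat until stable."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids have stopped moving
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Hypothetical starting points: the penny, box and screw instances.
centroids, assignment = kmeans(list(objects.values()), [(1, 3), (1, 6), (8, 3)])
for name, cluster in zip(objects, assignment):
    print(name, cluster)
```

The features are used unscaled here to keep the sketch short; with other starting points the result may differ.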
  • 22. BigML, Inc #DutchMLSchool Features Matter 22 Metal Other Wood
  • 23. BigML, Inc #DutchMLSchool Convergence 23 Convergence is guaranteed, but the result is not necessarily unique. Starting points are important (K++)
  • 24. BigML, Inc #DutchMLSchool Starting Points 24 • Random points or instances in n-dimensional space • Might start "too close" • Risk of sub-optimal convergence
  • 25. BigML, Inc #DutchMLSchool Sub-Optimal Converge 25 [Figure labels: Arbitrarily Far Apart; Sub-Optimal; Arbitrarily Far Apart; Optimal]
  • 26. BigML, Inc #DutchMLSchool Starting Points 26 • Random points or instances in n-dimensional space • Might start "too close" • Risk of sub-optimal convergence • Choose points “farthest” away from each other • but this is sensitive to outliers • k++ • the first point is chosen randomly from instances • each subsequent point is chosen from the remaining instances with a probability proportional to the squared distance from the point's closest existing cluster center
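The k++ seeding rule described on this slide can be sketched as follows. The function name `kmeans_pp_init` and the sample points are illustrative assumptions. An already chosen instance has squared distance zero to itself, so it gets zero weight and cannot be picked again.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """Pick k starting centers: the first uniformly, the rest weighted by squared distance."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each instance to its closest existing center.
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

# Hypothetical 2-D instances; a fixed seed keeps the run repeatable.
points = [(1, 3), (1, 4), (2.75, 6), (1, 6), (1.6, 6), (8, 3), (5, 3), (4.25, 3), (1, 2)]
centers = kmeans_pp_init(points, k=3, rng=random.Random(0))
print(centers)
```

Points far from every existing center are the most likely picks, which spreads the starting points without always choosing the single farthest outlier.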
  • 27. BigML, Inc #DutchMLSchool K++ Initial Centers 27 [Figure, K=3: candidate points labelled Low Probability, High Probability, Highest Probability]
  • 28. BigML, Inc #DutchMLSchool K++ Initial Centers 28 [Figure, K=3: candidate points labelled Low Probability, Low Probability]
  • 29. BigML, Inc #DutchMLSchool K++ Initial Centers 29 K=3
  • 30. BigML, Inc #DutchMLSchool Scaling Matters 30 [Plot: price vs. number of bedrooms, with distances labelled d = 160,000 and d = 1]
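A small sketch of why scaling matters, assuming the slide contrasts a price difference of 160,000 with a bedroom difference of 1. The house values below are hypothetical, and standardizing each column to zero mean and unit standard deviation is one common choice of scaling, not necessarily the one the deck has in mind.

```python
import math
from statistics import mean, pstdev

# Hypothetical houses: (price, number of bedrooms).
houses = [(200_000, 2), (360_000, 3), (250_000, 4), (310_000, 3)]

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]

raw_distance = math.dist(houses[0], houses[1])   # dominated by the price difference
scaled = standardize(houses)
scaled_distance = math.dist(scaled[0], scaled[1])
print(raw_distance, scaled_distance)
```

Before scaling, the bedroom difference is invisible next to the price difference; after scaling, both fields contribute on a comparable footing.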
  • 31. BigML, Inc #DutchMLSchool Other Tricks 31 • What is the distance to a “missing value”? • What is the distance between categorical values? • How far is “red” from “green”? • What is the distance between text features? • Does it have to be Euclidean distance? • Unknown ideal number of clusters, “K”?
  • 32. BigML, Inc #DutchMLSchool Distance to Missing? 32 • Nonsense! Try replacing missing values with: • Maximum • Mean • Median • Minimum • Zero • Ignore instances with missing values
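The replacement options listed on this slide can be sketched with one small helper. The name `fill_missing`, the strategy keys and the use of `None` as the missing marker are assumptions for illustration.

```python
from statistics import mean, median

# One function per replacement strategy named on the slide.
STRATEGIES = {
    "maximum": max,
    "mean": mean,
    "median": median,
    "minimum": min,
    "zero": lambda values: 0,
}

def fill_missing(values, strategy="mean"):
    """Replace None entries with a value computed from the entries that are present."""
    present = [v for v in values if v is not None]
    fill = STRATEGIES[strategy](present)
    return [fill if v is None else v for v in values]

def drop_missing(rows):
    """Alternative from the slide: ignore instances that have any missing value."""
    return [row for row in rows if None not in row]

print(fill_missing([1, None, 3], "mean"))
```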
  • 33. BigML, Inc #DutchMLSchool Distance to Categorical? 33 • Define special distance function: For two instances $x$ and $y$ and the categorical field $a$: • if $x_a = y_a$ then $\text{distance}(x, y) = 0$ (or field scaling value), else $\text{distance}(x, y) = 1$. Approach: similar to “k-prototypes”
  • 34. BigML, Inc #DutchMLSchool Distance to Categorical? 34 Fields: animal, favorite toy, toy color. Example 1: (cat, ball, red) vs. (cat, ball, green): d=0, d=0, d=1, so D = 1. Example 2: (cat, laser, red) vs. (dog, squeaky, red): d=1, d=1, d=0, so D = √2. Then compute Euclidean distance between vectors. Note: the centroid is assigned the most common category of the member instances
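The two worked examples on this slide can be checked with a short sketch: each categorical field contributes 0 if the values match and 1 otherwise, and the Euclidean norm of those per-field distances gives D. The function name `categorical_distance` is illustrative, and field scaling is left out.

```python
import math

def categorical_distance(x, y):
    """Per-field 0/1 mismatch, combined with the Euclidean norm."""
    per_field = [0 if a == b else 1 for a, b in zip(x, y)]
    return math.sqrt(sum(d * d for d in per_field))

print(categorical_distance(("cat", "ball", "red"), ("cat", "ball", "green")))
print(categorical_distance(("cat", "laser", "red"), ("dog", "squeaky", "red")))
```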
  • 35. BigML, Inc #DutchMLSchool Text Vectors 35 [Diagram: cosine similarity scale from 1 through 0 to -1] Features (thousands): "hippo", "safari", "zebra", …. Example vectors: 1 0 1 …; 1 1 0 …; 0 1 1 … (Text Field #1, Text Field #2) • Cosine Similarity • cos() between two vectors • 1 if collinear, 0 if orthogonal • only positive vectors: 0 ≤ CS ≤ 1 • Cosine Distance = 1 - Cosine Similarity • CD(TF1, TF2) = 0.5
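The cosine distance of 0.5 quoted on this slide can be verified with a short sketch. It assumes the two text fields are represented by the term vectors 1 0 1 and 1 1 0 over "hippo", "safari", "zebra"; the transcript does not say which rows belong to which field, but any two of the three example rows give the same result.

```python
import math

def cosine_similarity(u, v):
    """cos() of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def cosine_distance(u, v):
    return 1 - cosine_similarity(u, v)

# Assumed term vectors over ("hippo", "safari", "zebra").
text_field_1 = [1, 0, 1]
text_field_2 = [1, 1, 0]
print(cosine_distance(text_field_1, text_field_2))
```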
  • 38. BigML, Inc #DutchMLSchool Finding K: G-Means 38 Let K=2 Keep 1, Split 1 New K=3
  • 39. BigML, Inc #DutchMLSchool Finding K: G-Means 39 Let K=3 Keep 1, Split 2 New K=5
  • 40. BigML, Inc #DutchMLSchool Finding K: G-Means 40 Let K=5 K=5
  • 42. BigML, Inc #DutchMLSchool Summary 42 • Cluster Purpose • Unsupervised technique for finding self-similar groups of instances • Number of centroids (k) can be specified or computed • Outputs list of centroids • Configuration: • Algorithm: K-means / G-means • Cluster Parameter: k or critical value • Default missing / Summary fields / Scales / Weights • Model Clusters • Centroid / Batch centroids
  • 43. BigML, Inc #DutchMLSchool Anomaly Detection Finding the Unusual Poul Petersen CIO, BigML, Inc 43
  • 44. BigML, Inc #DutchMLSchool What is Anomaly Detection? 44 • An unsupervised learning technique • No labels necessary • Useful for finding unusual instances • Filtering, finding mistakes, 1-class classifiers • Finds instances that do not match • Customer: big or small spender for profile • Medical: healthy patient despite indicative diagnostics • Defines each unusual instance by an “anomaly score” • in BigML: 0=normal, 1=unusual, and 0.7 ≫ 0.6 0.5 • Standard deviation, distributions, etc
  • 45. BigML, Inc #DutchMLSchool Clusters 45 date customer account auth class zip amount Mon Bob 3421 pin clothes 46140 135 Tue Bob 3421 sign food 46140 401 Tue Alice 2456 pin food 12222 234 Wed Sally 6788 pin gas 26339 94 Wed Bob 3421 pin tech 21350 2459 Wed Bob 3421 pin gas 46140 83 Thr Sally 6788 sign food 26339 51
  • 46. BigML, Inc #DutchMLSchool Clusters 46 date customer account auth class zip amount Mon Bob 3421 pin clothes 46140 135 Tue Bob 3421 sign food 46140 401 Tue Alice 2456 pin food 12222 234 Wed Sally 6788 pin gas 26339 94 Wed Bob 3421 pin tech 21350 2459 Wed Bob 3421 pin gas 46140 83 Thr Sally 6788 sign food 26339 51 similar
  • 47. BigML, Inc #DutchMLSchool Anomaly Detection 47 date customer account auth class zip amount Mon Bob 3421 pin clothes 46140 135 Tue Bob 3421 sign food 46140 401 Tue Alice 2456 pin food 12222 234 Wed Sally 6788 pin gas 26339 94 Wed Bob 3421 pin tech 21350 2459 Wed Bob 3421 pin gas 46140 83 Thr Sally 6788 sign food 26339 51
  • 48. BigML, Inc #DutchMLSchool Anomaly Detection 48 date customer account auth class zip amount Mon Bob 3421 pin clothes 46140 135 Tue Bob 3421 sign food 46140 401 Tue Alice 2456 pin food 12222 234 Wed Sally 6788 pin gas 26339 94 Wed Bob 3421 pin tech 21350 2459 Wed Bob 3421 pin gas 46140 83 Thr Sally 6788 sign food 26339 51 anomaly • Amount $2,459 is higher than all other transactions • It is the only transaction • In zip 21350 • for the purchase class "tech"
  • 49. BigML, Inc #DutchMLSchool Use Cases 49 • Unusual instance discovery - "exploration" • Intrusion Detection - "looking for unusual usage patterns" • Fraud - "looking for unusual behavior" • Identify Incorrect Data - "looking for mistakes" • Remove Outliers - "improve model quality" • Model Competence / Input Data Drift
  • 50. BigML, Inc #DutchMLSchool Removing Outliers 50 • Models need to generalize • Outliers negatively impact generalization GOAL: Use anomaly detector to identify most anomalous points and then remove them before modeling. DATASET FILTERED DATASET ANOMALY DETECTOR CLEAN MODEL
  • 51. BigML, Inc #DutchMLSchool Diabetes Anomalies 51 DIABETES SOURCE DIABETES DATASET TRAIN SET TEST SET ALL MODEL CLEAN DATASET FILTER ALL MODEL ALL EVALUATION CLEAN EVALUATION COMPARE EVALUATIONS ANOMALY DETECTOR
  • 53. BigML, Inc #DutchMLSchool Intrusion Detection 53 GOAL: Identify unusual command line behavior per user and across all users that might indicate an intrusion. • Dataset of command line history for users • Data for each user consists of commands, flags, working directories, etc. • Assumption: Users typically issue the same flag patterns and work in certain directories Per User Per Dir All User All Dir
  • 54. BigML, Inc #DutchMLSchool Fraud 54 • Dataset of credit card transactions • Additional user profile information GOAL: Cluster users by profile and use multiple anomaly scores to detect transactions that are anomalous on multiple levels. Card Level User Level Similar User Level
  • 55. BigML, Inc #DutchMLSchool Model Competence 55 • After putting a model into production, the data being predicted can become statistically different from the training data. • Train an anomaly detector at the same time as the model. GOAL: For every prediction, compute an anomaly score. If the anomaly score is high, then the model may not be competent and should not be trusted. Prediction: T, T; Confidence: 0.86, 0.84; Anomaly Score: 0.5367, 0.7124; Competent?: Y, N. [Diagram labels: At Prediction Time; At Training Time; DATASET; MODEL; ANOMALY DETECTOR]
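The competence check on this slide amounts to comparing each prediction's anomaly score with a cut-off. The slide does not state the cut-off; the value 0.6 below is a hypothetical choice that happens to reproduce the two example rows, and `is_competent` is an illustrative name.

```python
# Hypothetical cut-off: the slide does not give one; 0.6 separates its two examples.
ANOMALY_THRESHOLD = 0.6

def is_competent(anomaly_score, threshold=ANOMALY_THRESHOLD):
    """Trust the model's prediction only when the input looks like the training data."""
    return anomaly_score < threshold

# (prediction, confidence, anomaly score) rows from the slide.
rows = [("T", 0.86, 0.5367), ("T", 0.84, 0.7124)]
for prediction, confidence, score in rows:
    print(prediction, confidence, score, "Y" if is_competent(score) else "N")
```

Note that the two predictions have nearly the same confidence; only the anomaly score separates them.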
  • 56. BigML, Inc #DutchMLSchool Benford’s Law 56 • In real-life numeric sets the small digits occur disproportionately often as leading significant digits. • Applications include: • accounting records • electricity bills • street addresses • stock prices • population numbers • death rates • lengths of rivers • Available in BigML API
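Benford's law gives the expected share of each leading digit d as log10(1 + 1/d), so 1 leads about 30% of the time and 9 under 5%. The sketch below compares that expectation with the observed leading digits of a number set; the function names are illustrative and this is not the BigML API. Powers of 2 are used as sample data because they are known to follow the law closely.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected frequency of `digit` (1-9) as the leading significant digit."""
    return math.log10(1 + 1 / digit)

def leading_digit(x):
    """Leading significant digit of a non-zero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def leading_digit_frequencies(values):
    """Observed share of each leading digit, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

observed = leading_digit_frequencies([2 ** n for n in range(100)])
for d in range(1, 10):
    print(d, round(benford_expected(d), 3), observed[d])
```

A large gap between the observed and expected columns is a hint that the numbers deserve a closer look, not proof of a problem.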
  • 57. BigML, Inc #DutchMLSchool Univariate Approach 57 • Single variable: heights, test scores, etc • Assume the value is distributed “normally” • Compute standard deviation • a measure of how “spread out” the numbers are • the square root of the variance (The average of the squared differences from the Mean.) • Depending on the number of instances, choose a “multiple” of standard deviations to indicate an anomaly. A multiple of 3 for 1000 instances removes ~ 3 outliers.
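The univariate approach on this slide can be sketched in a few lines: compute the mean and standard deviation, then flag values more than a chosen multiple of standard deviations from the mean. The population standard deviation is used to match the slide's definition of variance; the function name and sample data are illustrative.

```python
from statistics import mean, pstdev

def find_outliers(values, multiple=3.0):
    """Values farther than `multiple` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # square root of the average squared difference from the mean
    if sigma == 0:
        return []  # all values identical, nothing can be unusual
    return [v for v in values if abs(v - mu) > multiple * sigma]

# Hypothetical measurements: twenty ordinary values and one extreme one.
measurements = [50] * 20 + [500]
print(find_outliers(measurements))
```

For normally distributed data about 0.27% of values fall beyond 3 standard deviations, which is where the slide's figure of roughly 3 outliers per 1000 instances comes from.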
  • 58. BigML, Inc #DutchMLSchool Univariate Approach 58 [Plot: frequency vs. measurement; regions labelled “outliers” on both sides] • Available in BigML API
  • 61. BigML, Inc #DutchMLSchool Human Expert 61 Most Unusual?
  • 62. BigML, Inc #DutchMLSchool Human Expert 62 [Figure labels: “Round”; “Skinny”; “Corners”; “Skinny” but not “smooth”; No “Corners”; Not “Round”; Most unusual] Key Insight: The “most unusual” object is different in some way from every partition of the features.
  • 63. BigML, Inc #DutchMLSchool Human Expert 63 • Human used prior knowledge to select possible features that separated the objects. • “round”, “skinny”, “smooth”, “corners” • Items were then separated based on the chosen features • Each cluster was then examined to see which object fit the least well in its cluster and did not fit any other cluster
  • 64. BigML, Inc #DutchMLSchool Human Expert 64 • Length/Width • greater than 1 => “skinny” • equal to 1 => “round” • less than 1 => invert • Number of Surfaces • distinct surfaces require “edges” which have corners • easier to count • Smooth - true or false Create features that capture these object differences
  • 65. BigML, Inc #DutchMLSchool Anomaly Features 65 Columns: Object, Length / Width, Num Surfaces, Smooth. Rows: penny 1 3 TRUE; dime 1 3 TRUE; knob 1 4 TRUE; eraser 2.75 6 TRUE; box 1 6 TRUE; block 1.6 6 TRUE; screw 8 3 FALSE; battery 5 3 TRUE; key 4.25 3 FALSE; bead 1 2 TRUE
  • 66. BigML, Inc #DutchMLSchool Random Splits 66 [Diagram: decision splits on length/width > 5, smooth?, num surfaces = 6, length/width = 1, length/width < 2, each with True/False branches, separating box, block, eraser, knob, penny/dime, bead, key, battery, screw] Know that “splits” matter - don’t know the order
  • 67. BigML, Inc #DutchMLSchool Isolation Forest 67 Grow a random decision tree until each instance from a sample is in its own leaf “easy” to isolate “hard” to isolate Depth Now repeat the process several times and use average Depth to compute anomaly score: 0 (similar) -> 1 (dissimilar)
  • 68. BigML, Inc #DutchMLSchool Isolation Forest Scoring 68 Instances (f1, f2, f3): i1 red cat ball; i2 red cat ball; i3 red cat box; i4 blue dog pen. For the instance i2, find the depth in each tree: D = 3, D = 6, D = 2. Map avg depth to final score: S=0.45
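The isolation idea on the last two slides can be sketched for numeric features. This is a simplified version, not BigML's implementation: instead of building whole trees it follows one instance down a sequence of random splits, it uses the full dataset for every tree rather than a subsample, and it maps average depth to a score with the formula from the original Isolation Forest paper, 2 ** (-average depth / c(n)). All names and data are illustrative.

```python
import math
import random

def isolation_depth(point, sample, rng, max_depth=50):
    """Number of random splits needed before `point` is alone in its partition."""
    depth = 0
    while len(sample) > 1 and depth < max_depth:
        # Only features that still vary within the partition can be split.
        splittable = [f for f in range(len(point))
                      if min(r[f] for r in sample) < max(r[f] for r in sample)]
        if not splittable:
            break
        f = rng.choice(splittable)
        split = rng.uniform(min(r[f] for r in sample), max(r[f] for r in sample))
        # Keep only the side of the split that contains the point.
        if point[f] < split:
            sample = [r for r in sample if r[f] < split]
        else:
            sample = [r for r in sample if r[f] >= split]
        depth += 1
    return depth

def average_path_length(n):
    """Normalizing constant c(n): average depth of a failed search in a binary tree."""
    if n <= 2:
        return float(max(n - 1, 0))
    return 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n

def anomaly_score(point, data, n_trees=100, seed=0):
    """Average isolation depth mapped to a score: near 0 similar, near 1 dissimilar."""
    rng = random.Random(seed)
    avg_depth = sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees
    return 2 ** (-avg_depth / average_path_length(len(data)))

# Hypothetical data: a tight 5x5 grid of points plus one far-away instance.
data = [(x / 10, y / 10) for x in range(5) for y in range(5)] + [(10.0, 10.0)]
print(anomaly_score((10.0, 10.0), data), anomaly_score((0.2, 0.2), data))
```

The far-away instance is usually isolated by the first split, so its average depth is small and its score is high; an instance inside the grid needs many more splits.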
  • 69. BigML, Inc #DutchMLSchool Model Competence 69 • A low anomaly score means the loan is similar to the modeled loans. • A high anomaly score means you should not trust the model. Prediction: T, T; Confidence: 0.86, 0.84; Anomaly Score: 0.5367, 0.7124; Competent?: Y, N. [Diagram labels: OPEN LOANS; PREDICTION; ANOMALY SCORE; CLOSED LOAN MODEL; CLOSED LOAN ANOMALY DETECTOR]
  • 71. BigML, Inc #DutchMLSchool 1-Class Classifier? 71 • You place an advertisement in a local newspaper • You collect demographic information about all responders • Now you want to market in a new locality with direct letters • To optimize mailing costs, need to predict who will respond • But, cannot distinguish not interested from didn’t see the ad • Train an anomaly detector on the 1-class data • Pick the households with the lowest scores for mailing: • If a household has a low anomaly score, then they are “similar” to enough of your positive responders and therefore may respond as well • If an individual has a high anomaly score, then they are dissimilar from all previous responders and therefore are less likely to respond.
  • 72. BigML, Inc #DutchMLSchool Summary 72 • Anomaly detection is the process of finding unusual instances • Some techniques and how they work: • Univariate: standard deviation • Benford’s law • Isolation Forest • Applications • Filtering to improve models • Finding mistakes, fraud, and intruders • Knowing when to retrain a model (competence) • 1-class classifiers • In general… unsupervised learning techniques: • Require more finesse and interpretation • Are more commonly part of a multistep workflow