Kamal Gupta Roy
kNN Algorithm
What would be the color of new circles?
Circle  Color
A       Red
B       Brown
C       Blue
D       ???
It's all about how far from, or how close to, the others you are.
Finding the nearest neighbors: who are they?
Different Names for the same algorithm
• Memory-based reasoning
• Example-based reasoning
• Instance-based learning
• Lazy learning
• K-nearest neighbor (KNN)
What is k in kNN?
k is the number of nearest neighbors used to decide the class of the new point A.
Dotted circle  Decision  k
Purple         Red       1
Green          Red       3
Orange         Blue      13
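The effect of k on the decision can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a simple majority vote; the points below are hypothetical and are not the data behind the slide's figure.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and pick the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points (assumption: chosen only for illustration).
train = [
    ((1.0, 1.0), "Red"), ((1.5, 0.5), "Red"),
    ((3.0, 3.0), "Blue"), ((3.5, 2.5), "Blue"), ((2.5, 3.5), "Blue"),
    ((4.0, 4.0), "Blue"), ((3.0, 4.0), "Blue"),
]
query = (1.2, 1.2)

for k in (1, 3, 7):
    print(k, knn_predict(train, query, k))  # Red, Red, Blue
```

With a small k the two nearby Red points decide; with k = 7 every point votes and the larger Blue class wins.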
Choosing the value of k
• k is too small: sensitive to noise points
• k is too large: neighborhood may include points from other classes
Optimal k
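The figure for this slide did not survive extraction. One common way to pick k, shown here as a sketch and not necessarily the author's method, is to try several candidates and keep the one with the best leave-one-out accuracy. The data set and the helper names (loo_accuracy, best_k) are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points nearest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each point using all the other points."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)

def best_k(data, candidates):
    """Candidate k with the highest leave-one-out accuracy (smallest k wins ties)."""
    return max(candidates, key=lambda k: (loo_accuracy(data, k), -k))

# Hypothetical data: two clean clusters plus one noisy "B" point inside cluster A.
data = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
    ((0.5, 0.5), "B"),
]

for k in (1, 3, 5):
    print(k, round(loo_accuracy(data, k), 2))
print("best k:", best_k(data, [1, 3, 5]))
```

Here k = 1 is misled by the single noisy point, while k = 3 recovers the clusters.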
Distance
[Figure: yellow, red, blue, and green paths from A to B]
Manhattan Distance
• The distance between two points measured along axes at right angles.
• In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|.
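The Manhattan formula translates directly into code. A minimal sketch; the function name and the sample points are illustrative assumptions.

```python
def manhattan(p1, p2):
    """Sum of absolute coordinate differences: |x1 - x2| + |y1 - y2| in the plane."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

print(manhattan((1, 2), (4, 6)))  # 3 + 4 = 7
```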
Euclidean Distance
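The formula on this slide did not survive extraction. The standard definition is the straight-line distance: for p1 at (x1, y1) and p2 at (x2, y2), it is sqrt((x1 - x2)^2 + (y1 - y2)^2). A minimal sketch, with an illustrative function name and sample points:

```python
import math

def euclidean(p1, p2):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

print(euclidean((1, 2), (4, 6)))  # sqrt(9 + 16) = 5.0
```

For the same pair of points the Manhattan distance is 7, so the Euclidean distance is never the larger of the two.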
Manhattan vs Euclidean
New value: Age = 46
Age  Default  Age - 46  (Age - 46)^2  d
25   Y        -21       441           21
35   Y        -11       121           11
45   Y        -1        1             1
20   Y        -26       676           26
35   Y        -11       121           11
52   Y        6         36            6
23   Y        -23       529           23
40   N        -6        36            6
60   N        14        196           14
48   N        2         4             2
33   N        -13       169           13
27   N        -19       361           19
37   N        -9        81            9
Default = Yes (nearest neighbor: Age = 45, d = 1)
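The one-feature example can be reproduced as follows. A minimal sketch, assuming k = 1 (only the single closest record decides); the variable names are illustrative.

```python
# Ages and default flags from the table above.
ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33, 27, 37]
defaults = ["Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N"]
new_age = 46

# In one dimension the distance is just the absolute difference.
distances = [abs(age - new_age) for age in ages]

# Index of the closest training record (k = 1).
nearest = min(range(len(ages)), key=lambda i: distances[i])
print(ages[nearest], distances[nearest], defaults[nearest])  # 45 1 Y
```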
Exercise
Age  Loan     Default
25   40,000   Y
35   60,000   Y
45   80,000   Y
20   20,000   Y
35   120,000  Y
52   38,000   Y
23   85,000   Y
40   62,000   N
60   98,000   N
48   100,000  N
33   110,000  N
27   130,000  N
37   90,000   N
Predict default for a customer with age = 46 who applied for a loan of 128,000.
• Age = 46
• Loan = 128,000
• Default = No (nearest neighbor: age 27, loan 130,000, d = 2,000)
Age  Loan     Default  age dist sq  loan dist sq    d
25   40,000   Y        441          7,744,000,000   88,000
35   60,000   Y        121          4,624,000,000   68,000
45   80,000   Y        1            2,304,000,000   48,000
20   20,000   Y        676          11,664,000,000  108,000
35   120,000  Y        121          64,000,000      8,000
52   38,000   Y        36           8,100,000,000   90,000
23   85,000   Y        529          1,849,000,000   43,000
40   62,000   N        36           4,356,000,000   66,000
60   98,000   N        196          900,000,000     30,000
48   100,000  N        4            784,000,000     28,000
33   110,000  N        169          324,000,000     18,000
27   130,000  N        361          4,000,000       2,000
37   90,000   N        81           1,444,000,000   38,000
• Age = 46
• Loan = 128 K (loan expressed in thousands)
• Default = Yes (nearest neighbor: age 35, loan 120 K, d = 14)
Age  Loan  Default  age dist sq  loan dist sq  d
25   40    Y        441          7,744         90
35   60    Y        121          4,624         69
45   80    Y        1            2,304         48
20   20    Y        676          11,664        111
35   120   Y        121          64            14
52   38    Y        36           8,100         90
23   85    Y        529          1,849         49
40   62    N        36           4,356         66
60   98    N        196          900           33
48   100   N        4            784           28
33   110   N        169          324           22
27   130   N        361          4             19
37   90    N        81           1,444         39
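The sketch below shows how the unit of the loan amount alone changes the prediction for the same customer. It assumes k = 1 and Euclidean distance; the loan_unit parameter is a hypothetical helper introduced only for this illustration.

```python
import math

# (age, loan, default) rows from the exercise table.
rows = [
    (25, 40_000, "Y"), (35, 60_000, "Y"), (45, 80_000, "Y"), (20, 20_000, "Y"),
    (35, 120_000, "Y"), (52, 38_000, "Y"), (23, 85_000, "Y"), (40, 62_000, "N"),
    (60, 98_000, "N"), (48, 100_000, "N"), (33, 110_000, "N"),
    (27, 130_000, "N"), (37, 90_000, "N"),
]

def nearest_default(rows, age, loan, loan_unit=1):
    """1-NN prediction with the loan difference divided by loan_unit."""
    def dist(row):
        return math.hypot(row[0] - age, (row[1] - loan) / loan_unit)
    return min(rows, key=dist)[2]

print(nearest_default(rows, 46, 128_000))                   # raw units -> N
print(nearest_default(rows, 46, 128_000, loan_unit=1_000))  # thousands -> Y
```

In raw units the loan difference swamps the age difference, so the record with the closest loan (130,000) wins. In thousands, age carries enough weight for the record at age 35 to win.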
Feature Scaling
Why scaling?
Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
Example:
• The height of a person may vary from 1.5 m to 1.8 m
• The weight of a person may vary from 45 kg to 100 kg
• The income of a person may vary from Rs 10K to Rs 5 lakh
Standardization
• Also called Z-score normalization
• After scaling, the mean is 0
• After scaling, the standard deviation is 1
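Standardization replaces each value x with z = (x - mean) / standard deviation. A minimal sketch applied to the ages from the exercise; using the population standard deviation (pstdev) is an assumption, since the slides do not say which variant is meant.

```python
import statistics

def standardize(values):
    """Z-score each value: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [(v - mean) / sd for v in values]

ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33, 27, 37]
z = standardize(ages)

# Up to rounding error, the scaled values have mean 0 and standard deviation 1.
print(round(statistics.mean(z), 6), round(statistics.pstdev(z), 6))
```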
Max-Min Normalization
• Also called Min-Max scaling
• After scaling, the minimum is 0
• After scaling, the maximum is 1
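Min-max scaling maps each value x to (x - min) / (max - min). A minimal sketch on the ages from the exercise; it assumes the column is not constant, so that max - min is nonzero.

```python
def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33, 27, 37]
scaled = min_max(ages)
print(min(scaled), max(scaled))  # 0.0 1.0
```

A new value is scaled with the training minimum and maximum: age 46 becomes (46 - 20) / (60 - 20) = 0.65.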
[Figure: the same data shown as Raw Data, Z Normalized, and Max-Min]
• Age = 46; norm age: 0.65
• Loan = 128,000; norm loan: 0.98
• Default = No (nearest neighbor: age 48, loan 100,000, d = 0.26)
Age  Loan     Default  norm age  norm loan  age dist sq  loan dist sq  d
25   40,000   Y        0.13      0.18       0.28         0.64          0.96
35   60,000   Y        0.38      0.36       0.08         0.38          0.68
45   80,000   Y        0.63      0.55       0.00         0.19          0.44
20   20,000   Y        0.00      0.00       0.42         0.96          1.18
35   120,000  Y        0.38      0.91       0.08         0.01          0.28
52   38,000   Y        0.80      0.16       0.02         0.67          0.83
23   85,000   Y        0.08      0.59       0.33         0.15          0.70
40   62,000   N        0.50      0.38       0.02         0.36          0.62
60   98,000   N        1.00      0.71       0.12         0.07          0.44
48   100,000  N        0.70      0.73       0.00         0.06          0.26
33   110,000  N        0.33      0.82       0.11         0.03          0.36
27   130,000  N        0.18      1.00       0.23         0.00          0.48
37   90,000   N        0.43      0.64       0.05         0.12          0.41
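The normalized calculation can be checked with a short script. A minimal sketch, assuming min-max scaling with the training minimum and maximum of each column, Euclidean distance, and k = 1; the helper names are illustrative.

```python
import math

# (age, loan, default) rows from the exercise table.
rows = [
    (25, 40_000, "Y"), (35, 60_000, "Y"), (45, 80_000, "Y"), (20, 20_000, "Y"),
    (35, 120_000, "Y"), (52, 38_000, "Y"), (23, 85_000, "Y"), (40, 62_000, "N"),
    (60, 98_000, "N"), (48, 100_000, "N"), (33, 110_000, "N"),
    (27, 130_000, "N"), (37, 90_000, "N"),
]
ages = [r[0] for r in rows]
loans = [r[1] for r in rows]

def scale(value, column):
    """Min-max scale value using the minimum and maximum of the training column."""
    lo, hi = min(column), max(column)
    return (value - lo) / (hi - lo)

def normalized_distance(row, age, loan):
    """Euclidean distance after min-max scaling both features."""
    return math.hypot(scale(row[0], ages) - scale(age, ages),
                      scale(row[1], loans) - scale(loan, loans))

nearest = min(rows, key=lambda r: normalized_distance(r, 46, 128_000))
print(nearest, round(normalized_distance(nearest, 46, 128_000), 2))
# (48, 100000, 'N') 0.26
```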
Comparison of the three calculations for age = 46, loan = 128,000:
• Loan in raw units: Default = No
• Loan in thousands: Default = Yes
• Max-Min normalized age and loan: Default = No
CONFUSION MATRIX
Hiring Process Example
Confusion matrix:
             Predicted Good             Predicted Bad
Actual Good  Hired good candidate (TP)  Rejected good candidate (FN)
Actual Bad   Hired bad candidate (FP)   Rejected bad candidate (TN)
Confusion Matrix
            Predicted Yes      Predicted No
Actual Yes  TP                 FN (Type 2 error)
Actual No   FP (Type 1 error)  TN

Accuracy = (TP + TN) / (TP + FN + FP + TN)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
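The three formulas can be computed from the four cell counts. A minimal sketch; the counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall, and precision from the four confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }

# Hypothetical counts (assumption: not taken from the slides).
print(classification_metrics(tp=30, fn=10, fp=20, tn=40))
# {'accuracy': 0.7, 'recall': 0.75, 'precision': 0.6}
```

Recall looks along the "Actual Yes" row, while precision looks down the "Predicted Yes" column, which is why the two can differ on the same matrix.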
Precision vs Recall
Pregnancy Test
                     Predicted Pregnant  Predicted Not Pregnant
Actual Pregnant      TP                  FN
Actual Not Pregnant  FP                  TN
Sensitivity & Specificity
            Predicted Yes  Predicted No
Actual Yes  TP             FN
Actual No   FP             TN

True Negative Rate, Specificity = TN / (TN + FP)
False Positive Rate = FP / (TN + FP)
True Positive Rate, Sensitivity = TP / (TP + FN)
False Negative Rate = FN / (TP + FN)
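The four rates come in two complementary pairs, each pair sharing one row of the matrix as its denominator. A minimal sketch using the standard definitions; the counts are hypothetical.

```python
def rates(tp, fn, fp, tn):
    """Row-wise rates of a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "fnr": fn / (tp + fn),          # false negative rate = 1 - sensitivity
        "specificity": tn / (tn + fp),  # true negative rate
        "fpr": fp / (tn + fp),          # false positive rate = 1 - specificity
    }

# Hypothetical counts (assumption: not taken from the slides).
r = rates(tp=30, fn=10, fp=20, tn=40)
print(r["sensitivity"], r["fnr"])  # 0.75 0.25
print(round(r["specificity"] + r["fpr"], 6))  # 1.0
```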
