SlideShare ist ein Scribd-Unternehmen logo
1 von 45
Downloaden Sie, um offline zu lesen
15/8/12 NCKU TienYang
kaggle-預測未來腳踏⾞車租借數量
supervised learning
unsupervised learning
supervised learning unsupervised learning
classification v.s clustering
• Data is not labeled
• Group pointed that are “close” to each other
• Unsupervised learning
• Labeled data points
• Want a “rule” that assigns labels to new points
• Supervised learning
supervised learning v.s unsupervised learning
classification v.s clustering
supervised learning v.s unsupervised learning
classification v.s clustering
此演算法在2007年IEEE統計排名前⼗十名資料採
礦演算法之⼀一,以⽬目前來說是廣泛使⽤用、︑、⾮非
常有效⽽而且是易於掌握的演算法。︒。
此演算法在2007年IEEE統計排名前⼗十名資料採
礦演算法之⼀一,以⽬目前來說是廣泛使⽤用、︑、⾮非
常有效⽽而且是易於掌握的演算法。︒。


⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
A
⾝身⾼高
體重
B
C
D
⼤大隻
⼩小⽀支
資料
現在加⼊入了⼀一個
⾝身⾼高190 體重100 他會
是哪⼀一類?
A
⾝身⾼高
體重
B
C
D
⼤大隻
⼩小⽀支
E
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
A
⾝身⾼高
體重
B
C
D
⼤大隻
⼩小⽀支
⽤用膝蓋想都知道 是屬於⼤大隻 !
E
E
1. 先找k個最近鄰居
2. 鄰居決定分類
Euclidean distance
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
E ⾝身⾼高190 體重100
AE 和 的距離
( 190 - 180 )2 + ( 100 -90 )2
200
Euclidean distance
誰最近呢?
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
E 和⼤大夥的距離
200
450
3925
3125
E A和 最近,所以A是⼤大隻
那E也是⼤大隻
Brute Force
N samples in D dimensions
N = 4 D = 2
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
找出所有距離 N*D
找出最近的點 N
時間複雜度
O [ DN2 ]
K-D Tree
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
165
180
175160
190
K-D Tree
⾝身⾼高 體重 label
A 180 90 ⼤大隻
B 175 85 ⼤大隻
C 160 45 ⼩小⽀支
D 165 50 ⼩小⽀支
165,50
180,90
175,85
160,45
190,100
⽐比⾝身⾼高
⽐比體重
時間複雜度
O [ DN logN ]
Where KD trees partition data along
Cartesian axes, ball trees partition data
in a series of nesting hyper-spheres.
A房⼦子:200坪,⽤用10年,10元的畫
B房⼦子:50坪,⽤用10年,10萬元的畫
C房⼦子:199坪,⽤用10年,9萬元的畫
A房⼦子:200坪,⽤用10年,10元的畫
B房⼦子:50坪,⽤用10年,100000元的畫
C房⼦子:199坪,⽤用10年,90000元的畫
C房⼦子離B房⼦子⽐比較近,所以B,C的房價⼀一樣
A房⼦子:200坪,⽤用10年,10元的畫
B房⼦子:50坪,⽤用10年,10萬元的畫
C房⼦子:199坪,⽤用10年,9萬元的畫
C房⼦子離B房⼦子⽐比較近,所以B,C的房價⼀一樣
聽你在放屁,A,C的房價要差不多才對,因為坪數差不多
Euclidean distance 的缺點,不同維度會等同看待
可以解決不同維度會等同看待的問題
Mahalanobis distance
旋轉座標軸
?
我覺得雖然?
有四票,但是他離藍⾊色真的太近了
這樣很不公平
weights:距離的反⽐比
?1
10
10
10
10
score: 1/1
score: (1/10)*4 =2/5
?
You must predict the total count of bikes
rented during each hour
datetime hourly date + timestamp
season
1 = spring, 2 = summer, 3 = fall, 4 =
winter
weather
1: Clear, Few clouds, Partly cloudy,
Partly cloudy
2: Mist + Cloudy, Mist + Broken
temp temperature in Celsius
humidity relative humidity
windspeed wind speed
holiday
whether the day is considered a
holiday
workingday
whether the day is neither a
weekend nor holiday
資料分析
資料探勘&機器學習地表最強
https://github.com/wy36101299/knn-Bike-Sharing-Demand
scikit-learn
http://scikit-learn.org/stable/modules/neighbors.html
[Machine Learning] kNN分類演算法« Big O(1)
http://enginebai.logdown.com/posts/241676/knn
pandas: powerful Python data analysis toolkit
http://pandas.pydata.org/pandas-docs/stable/

Weitere ähnliche Inhalte

Was ist angesagt?

Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...
Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...
Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...
Hideki Tsunashima
 
はじパタ2章
はじパタ2章はじパタ2章
はじパタ2章
tetsuro ito
 
Active Learning 入門
Active Learning 入門Active Learning 入門
Active Learning 入門
Shuyo Nakatani
 
『データ解析におけるプライバシー保護』勉強会 #2
『データ解析におけるプライバシー保護』勉強会 #2『データ解析におけるプライバシー保護』勉強会 #2
『データ解析におけるプライバシー保護』勉強会 #2
MITSUNARI Shigeo
 

Was ist angesagt? (20)

ポーカーAIの最新動向 20171031
ポーカーAIの最新動向 20171031ポーカーAIの最新動向 20171031
ポーカーAIの最新動向 20171031
 
Generative Adversarial Networks (GAN) の学習方法進展・画像生成・教師なし画像変換
Generative Adversarial Networks (GAN) の学習方法進展・画像生成・教師なし画像変換Generative Adversarial Networks (GAN) の学習方法進展・画像生成・教師なし画像変換
Generative Adversarial Networks (GAN) の学習方法進展・画像生成・教師なし画像変換
 
Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...
Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...
Disentanglement Survey:Can You Explain How Much Are Generative models Disenta...
 
はじパタ2章
はじパタ2章はじパタ2章
はじパタ2章
 
Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015
Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015
Continuous Control with Deep Reinforcement Learning, lillicrap et al, 2015
 
Noisy Labels と戦う深層学習
Noisy Labels と戦う深層学習Noisy Labels と戦う深層学習
Noisy Labels と戦う深層学習
 
Tutorial on sequence aware recommender systems - UMAP 2018
Tutorial on sequence aware recommender systems - UMAP 2018Tutorial on sequence aware recommender systems - UMAP 2018
Tutorial on sequence aware recommender systems - UMAP 2018
 
Active Learning 入門
Active Learning 入門Active Learning 入門
Active Learning 入門
 
Off policy learning
Off policy learningOff policy learning
Off policy learning
 
Group normalization
Group normalizationGroup normalization
Group normalization
 
Network centrality measures and their effectiveness
Network centrality measures and their effectivenessNetwork centrality measures and their effectiveness
Network centrality measures and their effectiveness
 
에어비앤비 리뷰데이터 분석을 통한 지역별 호스트 전략 제언
에어비앤비 리뷰데이터 분석을 통한 지역별 호스트 전략 제언 에어비앤비 리뷰데이터 분석을 통한 지역별 호스트 전략 제언
에어비앤비 리뷰데이터 분석을 통한 지역별 호스트 전략 제언
 
[DL輪読会]Decision Transformer: Reinforcement Learning via Sequence Modeling
[DL輪読会]Decision Transformer: Reinforcement Learning via Sequence Modeling[DL輪読会]Decision Transformer: Reinforcement Learning via Sequence Modeling
[DL輪読会]Decision Transformer: Reinforcement Learning via Sequence Modeling
 
全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習
 
ResNetの仕組み
ResNetの仕組みResNetの仕組み
ResNetの仕組み
 
マハラノビス距離とユークリッド距離の違い
マハラノビス距離とユークリッド距離の違いマハラノビス距離とユークリッド距離の違い
マハラノビス距離とユークリッド距離の違い
 
『データ解析におけるプライバシー保護』勉強会 #2
『データ解析におけるプライバシー保護』勉強会 #2『データ解析におけるプライバシー保護』勉強会 #2
『データ解析におけるプライバシー保護』勉強会 #2
 
lsh
lshlsh
lsh
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
 

Mehr von Tien-Yang (Aiden) Wu

Mehr von Tien-Yang (Aiden) Wu (14)

Hidden markov model
Hidden markov modelHidden markov model
Hidden markov model
 
Scalable machine learning
Scalable machine learningScalable machine learning
Scalable machine learning
 
Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...Scalable sentiment classification for big data analysis using naive bayes cla...
Scalable sentiment classification for big data analysis using naive bayes cla...
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Collaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on HadoopCollaborative Filtering Recommendation Algorithm based on Hadoop
Collaborative Filtering Recommendation Algorithm based on Hadoop
 
Parallel-kmeans
Parallel-kmeansParallel-kmeans
Parallel-kmeans
 
K means
K meansK means
K means
 
RDD
RDDRDD
RDD
 
Semantic ui教學
Semantic ui教學Semantic ui教學
Semantic ui教學
 
響應式網頁教學
響應式網頁教學響應式網頁教學
響應式網頁教學
 
NoSQL & JSON
NoSQL & JSONNoSQL & JSON
NoSQL & JSON
 
Weebly上手教學
Weebly上手教學Weebly上手教學
Weebly上手教學
 
簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler簡易爬蟲製作和Pttcrawler
簡易爬蟲製作和Pttcrawler
 
Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設Python簡介和多版本虛擬環境架設
Python簡介和多版本虛擬環境架設
 

沒有想像中簡單的簡單分類器 Knn