Traffic Classification based on Machine Learning
using Flow-level Information
Jong Gun Lee (jglee@an.kaist.ac.kr)
Advanced Networking Lab.
`
Table of Contents
• Motivation of this work
• Background about machine learning
• Our approach using machine learning
• Experiment (dataset and result)
• Conclusion
`
Motivation
• We cannot effectively classify the traffic of some newly emerging
applications,
– such as online games and streaming applications
– because they expose no application information, such as a port
number or a common byte sequence in the payload
• We propose a methodology to classify Internet traffic
with supervised and unsupervised learning
`
Basic Terminologies of Machine Learning
• Classifier
is a mapping from unlabeled instances to classes
• Instance
is a single object of the world
• Attribute
is a single property (a measured quantity) describing an instance
• Feature
is the specification of an attribute and its value
• Feature vector
is a list of features describing an instance
`
Unsupervised and Supervised Learning
• Supervised learning (with answer/teacher)
– With a labeled training set, a classifier learns the characteristics of each
class. When a new instance arrives, the classifier predicts
the class of that instance.
• Unsupervised learning (without answer/teacher)
– With only a set of data (feature vectors), the algorithm groups the
instances into a set of clusters.
`
K-Means
• One of the unsupervised learning methods
• K is the number of clusters; it is given as
an initial parameter
• Procedure
– First, the algorithm randomly chooses K points as the centers of
K subspaces
– Second, it divides the overall vector space into K subspaces
by assigning each point to its nearest center
– Third, it picks a new center for each of the K subspaces (the mean of the points in it)
– Then it repeats the second and third steps until no center changes,
or until every center moves by less than a threshold value
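The procedure above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the experiments: the function name `kmeans`, the optional `init` parameter (fixed starting centers, for reproducibility), and the `max_iter` cap are assumptions added here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, threshold=1e-9, max_iter=100, init=None, seed=0):
    """Return (centers, labels) after running the three K-Means steps."""
    # Step 1: choose K starting centers (randomly unless `init` is given).
    centers = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        moved = max(sq_dist(a, b) for a, b in zip(centers, new_centers)) ** 0.5
        centers = new_centers
        # Stop when no center moved by more than the threshold.
        if moved <= threshold:
            break
    return centers, labels

# Hypothetical data: 8 instances, K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]
centers, labels = kmeans(points, 2, init=[(1.0, 1.0), (9.0, 9.0)])
# centers -> [(1.5, 1.5), (8.5, 8.5)]
```

With random starting centers the result depends on the draw, and K-Means can settle in a poor local optimum, so several restarts are common in practice.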
`
Example of K-Means
• # of instance: 8, K=2
`
Overall Process of Our Method
[Figure: processing pipeline]
• Feature Extraction: N packets → N feature vectors
• Unsupervised Learning: N feature vectors → K clusters
• Supervised Learning: K clusters → Classifier (classification method)
`
Flow-level Feature Information
• Protocol number: 6 (TCP) or 17 (UDP)
• Duration (seconds)
• Number of packets per second (PPS)
• Mean size of all packets
• Mean size of non-ACK packets
• Rate of ACK packets
• Interaction Information
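A minimal sketch of how the scalar features listed above could be computed for one flow. The `Packet` record, the `flow_features` function, and the exact definitions are assumptions made for illustration: duration is taken as last minus first timestamp, and "rate of ACK packets" as the fraction of packets flagged as ACK.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Packet:
    """Hypothetical per-packet record; the field names are assumptions."""
    timestamp: float  # seconds
    size: int         # bytes
    is_ack: bool      # True for an ACK packet

def flow_features(protocol: int, packets: List[Packet]) -> Dict[str, float]:
    """Compute the scalar flow-level features for one flow."""
    if not packets:
        raise ValueError("a flow needs at least one packet")
    n = len(packets)
    times = [p.timestamp for p in packets]
    duration = max(times) - min(times)
    non_ack = [p.size for p in packets if not p.is_ack]
    return {
        "protocol": protocol,                      # 6 (TCP) or 17 (UDP)
        "duration": duration,                      # seconds
        # Assumption: report 0.0 when all packets share one timestamp.
        "pps": n / duration if duration > 0 else 0.0,
        "mean_size": sum(p.size for p in packets) / n,
        "mean_size_non_ack": sum(non_ack) / len(non_ack) if non_ack else 0.0,
        "ack_rate": (n - len(non_ack)) / n,        # fraction of ACK packets
    }

# Made-up flow of four packets.
flow = [Packet(0.0, 40, True), Packet(0.5, 1500, False),
        Packet(1.0, 40, True), Packet(2.0, 1500, False)]
features = flow_features(6, flow)
```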
`
Feature Extraction (Interaction Information)
• Interaction Information
– H: 2-dimensional histogram, 16x16
– p_1, p_2, p_3, …, p_n
• the sequence of packet sizes of a flow and its partner flow,
ordered by timestamp
For i = 1 : n-1
H[p_i/100][p_(i+1)/100]++ (integer division)
Sequence of packet sizes: 40, 80, 1500, …, 40, 1500
Pair-wise representation: [40, 80], [80, 1500], …, [40, 1500]
Histogram indices: [40/100, 80/100], [80/100, 1500/100], …, [40/100, 1500/100]
= [0, 0], [0, 15], …, [0, 15]
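The histogram loop can be written out as a short Python sketch. The function name `interaction_histogram` and the clamping of sizes of 1600 bytes or more into the last bin are assumptions; the slide does not say how such packets are handled.

```python
def interaction_histogram(sizes, bins=16, bin_width=100):
    """Count consecutive packet-size pairs in a bins x bins histogram."""
    h = [[0] * bins for _ in range(bins)]
    for a, b in zip(sizes, sizes[1:]):
        # Integer division maps a size to its bin, e.g. 1500 // 100 = 15.
        i = min(a // bin_width, bins - 1)  # assumption: clamp oversized packets
        j = min(b // bin_width, bins - 1)
        h[i][j] += 1
    return h

# Sizes of a flow and its partner flow, merged in timestamp order.
sizes = [40, 80, 1500, 40, 1500]
H = interaction_histogram(sizes)
# Pairs [40, 80], [80, 1500], [1500, 40], [40, 1500]
# fall into cells [0][0], [0][15], [15][0], [0][15].
```

Flattening `H` row by row gives 256 values that can be appended to the other flow-level features to form the feature vector.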
`
Guideline
[Figure: the pipeline (Packets, Feature Extraction, N feature vectors, Unsupervised Learning, K clusters, Supervised Learning, Classifier) annotated with open design questions]
• Direction: Rx and Tx, Rx only, or Tx only
• #bins and bin size: dynamic or static
• Initial ?? packets
• Effective K estimation
• Efficient threshold
• What kind of learning method
• Feature extraction
• Unknown traffic
• Decision branch: yes / no
`
Dataset
• Instances per application file: full dataset vs. selected subset
  (the subset takes 1500 instances from every application that has at least 1500, and none from the rest)
File               Full dataset   Subset
bittorrent.arff            6412     1500
clubbox.arff               4913     1500
edonkey.arff             101355     1500
fileguri.arff             21060     1500
ftp.arff                    635        0
http.arff                200274     1500
https.arff                 3611     1500
melon.arff                   22        0
msnp.arff                  4986     1500
nateon.arff                1565     1500
nntp.arff                   169        0
pop3.arff                    63        0
sayclub.arff                224        0
smtp.arff                 40556     1500
ssh.arff                     67        0
total                    385912    13500
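A sketch of how a balanced subset with these sizes could be drawn. The slide does not say how the 1500 instances per application were picked, so the uniform random sampling below is an assumption, as is the function name `balanced_subset`.

```python
import random

def balanced_subset(instances_by_class, per_class=1500, seed=0):
    """Sample per_class instances from each class that has enough of them."""
    rng = random.Random(seed)
    subset = {}
    for name, instances in instances_by_class.items():
        if len(instances) >= per_class:
            subset[name] = rng.sample(instances, per_class)
        else:
            subset[name] = []  # classes that are too small are left out
    return subset

# Instance counts from the dataset slide; instances are stood in for by indices.
counts = {
    "bittorrent": 6412, "clubbox": 4913, "edonkey": 101355, "fileguri": 21060,
    "ftp": 635, "http": 200274, "https": 3611, "melon": 22, "msnp": 4986,
    "nateon": 1565, "nntp": 169, "pop3": 63, "sayclub": 224, "smtp": 40556,
    "ssh": 67,
}
data = {name: range(n) for name, n in counts.items()}
subset = balanced_subset(data)
```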
`
Sum of Squared Error (SSE)
• How to get SSE
• #bins: 8*8
• #clusters: 1~20
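The formula itself did not survive on this slide. As a sketch under that caveat, the usual definition of SSE for a clustering is the sum, over all instances, of the squared Euclidean distance between the instance and the center of its cluster. The function name `sse` and the sample data are illustrative.

```python
def sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    total = 0.0
    for p, j in zip(points, labels):
        total += sum((x - c) ** 2 for x, c in zip(p, centers[j]))
    return total

# Hypothetical data: 8 instances forming two tight groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]

# K = 1: a single center at the overall mean.
sse_k1 = sse(points, [0] * 8, [(5.0, 5.0)])
# K = 2: one center per group.
sse_k2 = sse(points, [0, 0, 0, 0, 1, 1, 1, 1], [(1.5, 1.5), (8.5, 8.5)])
# sse_k1 -> 200.0, sse_k2 -> 4.0
```

SSE can only shrink or stay equal as K grows when centers are optimal, which is why it is examined as a curve over the number of clusters rather than minimized directly.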
`
Fitting of SSE
Y=1.446e004 * X^(-1.194) + 755.8
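The fitted curve can be evaluated directly. Reading X as the number of clusters and Y as the SSE is an assumption drawn from the preceding slide; the names `A`, `B`, `C` and `fitted_sse` are illustrative.

```python
# Coefficients of the fitted curve Y = A * X^B + C, taken from the slide.
A, B, C = 1.446e4, -1.194, 755.8

def fitted_sse(k):
    """Evaluate the fitted power-law curve at X = k (assumed: cluster count)."""
    return A * k ** B + C

# Fitted values for the range of cluster counts used in the experiment.
curve = {k: fitted_sse(k) for k in range(1, 21)}
```

Because the exponent is negative, the curve decreases toward the constant term 755.8 as X grows.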
`
Estimation of SSE
`
Decrease Rate of SSE
0.1% decrease
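The slide gives only the figure "0.1% decrease". One possible reading, sketched here as an assumption, is a stopping rule: pick the first K at which the SSE falls by less than 0.1% relative to the SSE at K-1. The function name `first_k_below` and the sample values are hypothetical.

```python
def first_k_below(sse_by_k, threshold=0.001):
    """Return the first K whose relative SSE decrease is below threshold.

    sse_by_k[i] holds the SSE for K = i + 1. Returns None if the
    decrease never drops below the threshold.
    """
    for i in range(1, len(sse_by_k)):
        prev, cur = sse_by_k[i - 1], sse_by_k[i]
        if prev == 0:
            return i + 1  # SSE is already zero, so nothing is left to gain
        rate = (prev - cur) / prev  # relative decrease from K-1 to K
        if rate < threshold:
            return i + 1
    return None

# Made-up SSE values for K = 1, 2, 3, 4.
example = [1000.0, 500.0, 499.9, 499.8]
chosen_k = first_k_below(example)
# chosen_k -> 3
```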
`
To do list
• Direction
– Rx and Tx, Rx only, and Tx only
• Dynamic bin size
• Initial N packets or all the packets
• Different (un)supervised learning methods
• Different feature extraction methods
