Outlier Detection & Handling

zekeLabs Technologies

Agenda
● Introduction
● Novelty detection
● Statistical Methods
● OneClassSVM
● Outlier Detection
● GMM
● Elliptic Envelope
● Isolation Forest
● Local Outlier Factor
● DBSCAN
● Handling Outlier Data

Introduction
● Many applications require being able to decide whether a new observation
belongs to the same distribution as existing observations (it is an inlier), or
should be considered as different (it is an outlier).
● Often, this ability is used to clean real data sets.
● Inliers are labeled 1, while outliers are labeled -1.

Novelty Detection
● Consider a data set of n observations from the same distribution
described by p features.
● Consider now that we add one more observation to that data set.
● Is the new observation so different from the others that we can doubt it is
regular?
● The goal is to learn a rough, close frontier delimiting the contour of the
initial observations' distribution, plotted in the embedding p-dimensional
space. A new observation that falls outside this frontier is treated as a novelty.

Statistical Methods
● Z-score
● Plotting
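
As a minimal sketch of the Z-score approach (assuming NumPy; the function name `zscore_outliers`, the sample values, and the conventional threshold of 3 are illustrative choices, not taken from the slides), a value can be flagged when it lies more than a chosen number of standard deviations from the mean:

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask that is True where |z-score| exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation
    if std == 0:
        # All values are identical, so nothing can be flagged.
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

# Hypothetical measurements: twelve values near 10 and one extreme value.
data = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8,
                 10.1, 9.9, 10.0, 10.3, 9.7, 50.0])
mask = zscore_outliers(data, threshold=3.0)
print(data[mask])
```

Because an extreme value also inflates the mean and standard deviation it is compared against, the Z-score tends to be less reliable on very small samples; plotting the data (for example as a box plot or histogram) is a useful cross-check.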

Novelty Detection using OneClassSVM
● The training data is assumed to be unpolluted (free of outliers).
● One-class SVM is an unsupervised
algorithm that learns a decision
function for novelty detection:
classifying new data as similar to or
different from the training set.
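
The following sketch shows how novelty detection might look in practice, assuming scikit-learn's `OneClassSVM` (the slides name the class but not the library). The synthetic data and the `gamma` and `nu` values are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Clean training data: a tight Gaussian blob around the origin (synthetic).
X_train = 0.3 * rng.randn(200, 2)

# New observations: one close to the training data, one far away.
X_new = np.array([[0.0, 0.0], [4.0, 4.0]])

# nu is an upper bound on the fraction of training errors;
# gamma controls the width of the RBF kernel. Both are illustrative values.
model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)

pred = model.predict(X_new)  # 1 = inlier, -1 = novelty
print(pred)
```

The model is fitted only on data believed to be regular and is then asked about unseen points, which is what distinguishes novelty detection from outlier detection on a polluted data set.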

Outlier Detection
● Separate the regular observations from the polluting ones.
● Three ways of doing outlier detection: Elliptic Envelope, Isolation Forest,
and Local Outlier Factor.

Elliptic Envelope
● One common way of performing outlier
detection is to assume that the regular
data come from a known distribution
(e.g. data are Gaussian distributed).
● It tries to define the “shape” of the data,
and can define outlying observations as
observations which stand far enough
from the fit shape.
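
A short sketch of this idea, assuming scikit-learn's `EllipticEnvelope`; the synthetic data and the `contamination` value (the expected share of outliers) are assumptions made for illustration:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(42)

# Regular data drawn from a Gaussian, plus three planted far-away points.
X_inliers = rng.randn(200, 2)
X_outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack([X_inliers, X_outliers])

# Fit a robust covariance estimate and flag the points that lie
# furthest from the fitted ellipse.
model = EllipticEnvelope(contamination=0.02, random_state=0)
labels = model.fit_predict(X)  # 1 = inlier, -1 = outlier

print(labels[-3:])
```

The approach depends on the Gaussian assumption: on strongly non-Gaussian or multi-modal data the fitted ellipse may describe the regular observations poorly.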

Isolation Forest
● An efficient way of performing
outlier detection in high-dimensional
datasets is to use random forests.
● Built on the basis of decision trees.
● Outliers lie further away from the regular
observations.
● Random partitioning produces
noticeably shorter paths for
anomalies.
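
As a rough sketch, assuming scikit-learn's `IsolationForest` (the data and parameter values are illustrative), points that are isolated after few random splits receive lower scores and are labeled -1:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# A dense synthetic cluster plus two planted points far from it.
X = np.vstack([0.5 * rng.randn(300, 2),
               [[6.0, 6.0], [-6.0, 5.0]]])

forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = forest.fit_predict(X)     # 1 = inlier, -1 = outlier
scores = forest.score_samples(X)   # lower score = shorter paths = more anomalous

print(labels[-2:], scores[-2:])
```

The `contamination` parameter sets how many points end up labeled -1, so it should reflect a reasonable guess about the share of outliers in the data.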

Local Outlier Factor
● It measures the local density
deviation of a given data point with
respect to its neighbors.
● The idea is to detect the samples
that have a substantially lower
density than their neighbors.
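
A minimal sketch, assuming scikit-learn's `LocalOutlierFactor`; the data, `n_neighbors`, and `contamination` values are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(1)

# A dense synthetic cluster plus one isolated point.
X = np.vstack([0.3 * rng.randn(100, 2), [[5.0, 5.0]]])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier

# scikit-learn stores the negated factor; values well above 1 mean the
# point is in a much sparser region than its neighbors.
factor = -lof.negative_outlier_factor_

print(labels[-1], factor[-1])
```

Because the score is relative to each point's own neighborhood, this method can also handle data sets whose clusters have different densities.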

Handling Outliers
● Manual analysis
● Dropping them
● Generating alerts
● Creating a new feature that marks outliers
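
The options above could be sketched as follows, assuming NumPy and a label array that follows the 1 / -1 convention. The feature matrix and the labels here are hypothetical stand-ins for the output of any detector:

```python
import numpy as np

# Hypothetical feature matrix and detector output (1 = inlier, -1 = outlier).
X = np.array([[1.0, 2.0],
              [1.1, 1.9],
              [0.9, 2.1],
              [9.0, -7.0]])
labels = np.array([1, 1, 1, -1])

is_outlier = labels == -1

# Dropping: keep only the rows labeled as inliers.
X_clean = X[~is_outlier]

# New feature: append a 0/1 column that marks the outliers,
# so a downstream model can use the information.
X_flagged = np.column_stack([X, is_outlier.astype(float)])

# Alerts: produce one message per flagged row, e.g. for manual analysis.
alerts = [f"row {i}: possible outlier {row.tolist()}"
          for i, row in enumerate(X) if is_outlier[i]]

print(X_clean.shape, X_flagged.shape, alerts)
```

Which option fits depends on whether the outliers are errors to discard or rare events worth investigating.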

Clustering Method - DBSCAN
● A density-based clustering method.
● N is an outlier (noise) point: it lies in no
cluster and is neither ‘density-reachable’
from nor ‘density-connected’ to any other
point. DBSCAN therefore leaves it unassigned
instead of forcing it into a cluster.
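
A small sketch, assuming scikit-learn's `DBSCAN`, which marks points that belong to no cluster with the label -1. The points and the `eps` and `min_samples` values are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of four points each, plus one isolated point.
X = np.array([[1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [10.0, 0.0]])

# eps is the neighborhood radius; min_samples is the number of points
# (including the point itself) needed to form a dense region.
db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_  # cluster index per point, -1 = noise

print(labels)
```

Unlike the detectors above, the clusters here are numbered from 0, so only the value -1 indicates an outlier.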

Visit www.zekeLabs.com for more details
THANK YOU
Let us know how we can help your organization upskill its
employees to stay up to date in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
