Why does Naïve Bayesian Classification work so well amidst known conditional dependencies in the data structure?
Naïve Bayesian Classification
A form of machine learning that avoids complicated conditional dependency models and the requirement to define most of the conditional dependencies in your data. Why does it work so well amidst conditional dependency?
Tim Hare
Naïve Bayes (naïvely, hence the name) assumes no conditional dependence among attributes within a class, but this simplification comes at a potential cost of misclassification
NB performance is at odds with past theory: there is evidence in the primary literature that Naïve Bayes works better than would be anticipated given known conditional dependence in the data
Zhang 2004: Factoring a general form of Bayes into two parts: [NB] * [“something else”]
Take-home message: the factorization indicates that FB = NB under certain data structures, and not under others.
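One way to see a factorization of this kind, offered as a sketch in generic notation rather than Zhang's own symbols (the attribute names x_i, the class c, and the label R_c are assumptions made here, and every P(x_i | c) is assumed to be nonzero): by the chain rule, the full class-conditional likelihood equals the NB product times a ratio term that collects all within-class dependence.

```latex
\begin{aligned}
P(x_1,\dots,x_n \mid c)
  &= \prod_{i=1}^{n} P(x_i \mid x_1,\dots,x_{i-1}, c) \\
  &= \underbrace{\prod_{i=1}^{n} P(x_i \mid c)}_{\text{NB}}
     \;\times\;
     \underbrace{\prod_{i=1}^{n}
       \frac{P(x_i \mid x_1,\dots,x_{i-1}, c)}{P(x_i \mid c)}}_{R_c(x)\ \text{(``something else'')}}
\end{aligned}
```

When R_c(x) takes the same value for every class, it cancels in the comparison between classes and NB makes the same assignment as FB; when it differs between classes, the two can disagree.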
Full Bayes (FB) and Naïve Bayes (NB) classification carried out by hand on synthetic data for one data vector, <1,0>
When the conditional dependence is of different types in the two classes (C1: if A then A, C2: if A then B; upper left data grid, which you may recognize as “XOR”), NB fails to classify correctly: the information is “lost” through “cancellation”, because equal probabilities enter each class estimate.
If the conditional dependence is of the same type in both classes (C1 = C2: if A then B; lower left data grid), NB may still classify the data correctly.
FB classifies correctly in both instances.
The NB posterior probability may be biased, but in many cases the bias still nets out to the correct classification, for a variety of reasons (the analysis is too complex to present here).
Mixed case: loss (the ratio is just 1) but no bias.
Even case: bias but no loss.
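The hand calculation can be reproduced with a few lines of Python. This is a minimal sketch, not the deck's spreadsheet: the two data grids `MIXED` and `EVEN`, the class names, and the function names are hypothetical stand-ins chosen to show the same two behaviours for the vector <1,0>.

```python
from fractions import Fraction

# Hypothetical synthetic data: rows of two binary attributes (A, B), keyed by class.
# "Mixed": B == A in C1 and B != A in C2 (an XOR layout).
MIXED = {
    "C1": [(0, 0), (1, 1), (0, 0), (1, 1)],
    "C2": [(0, 1), (1, 0), (0, 1), (1, 0)],
}
# "Even": A and B are positively dependent in both classes, with different base rates.
EVEN = {
    "C1": [(1, 1)] * 4 + [(0, 0)] * 2 + [(1, 0)] * 2,
    "C2": [(1, 1)] * 2 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)],
}


def full_bayes_likelihood(rows, x):
    """P(x | class) estimated from the joint frequency of the whole vector."""
    return Fraction(sum(1 for r in rows if r == x), len(rows))


def naive_bayes_likelihood(rows, x):
    """P(x | class) estimated as the product of per-attribute frequencies."""
    result = Fraction(1)
    for i, value in enumerate(x):
        result *= Fraction(sum(1 for r in rows if r[i] == value), len(rows))
    return result


def posterior(data, x, likelihood):
    """Normalized class posteriors, with priors taken from the class sizes."""
    total_rows = sum(len(rows) for rows in data.values())
    scores = {
        c: Fraction(len(rows), total_rows) * likelihood(rows, x)
        for c, rows in data.items()
    }
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("vector has zero probability under every class")
    return {c: s / norm for c, s in scores.items()}
```

With these assumed grids, `posterior(MIXED, (1, 0), naive_bayes_likelihood)` is an even split between the classes (the likelihood ratio is 1), while the full-Bayes version puts all the mass on C2. For `EVEN`, both methods pick C1, with NB giving 8/13 against FB's 2/3: a biased probability but the same assignment.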
Naïve Bayes in R on the synthetic conditionally dependent data we analyzed in Excel for vector <1,0> gives the same results: misclassification under the “mixed” conditional dependence, and the correct Democratic classification under the “even” conditional dependence.
Real data: House of Representatives 1984 voting record on 17 congressional bills (columns)
Use R for NB classification on HV84, with and without augmentation by conditionally dependent synthetic data
Control analysis for synthetic augmentation experiments #1 and #2 (to follow): NB analysis of the real HV84 data, unmodified by synthetic data
Augmentation with synthetic data, experiment 1: NB analysis of HV84 augmented with conditionally dependent synthetic data, with conditional dependence of different types (“mixed”) in the two classes
Augmentation with synthetic data, experiment 2: NB analysis of HV84 augmented with conditionally dependent synthetic data, with conditional dependence of the same type (“even”) in the two classes
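The augmentation step in the two experiments can be sketched as follows. The slides do not give the generator for the synthetic columns, so the rule used here is an assumption, as are the function name `augment` and its parameters: two binary columns (A, B) are appended to each row, with B copying A in every class for “even”, and B copying A in one class but negating it in the other for “mixed”.

```python
import random


def augment(rows, labels, mode, seed=0):
    """Append two synthetic binary columns (A, B) to each row.

    mode="mixed": B == A in the first class seen, B != A in every other class.
    mode="even":  B == A in every class.
    The rule is an assumption for illustration; it is not taken from the slides.
    """
    if mode not in ("mixed", "even"):
        raise ValueError("mode must be 'mixed' or 'even'")
    rng = random.Random(seed)  # seeded so the augmentation is repeatable
    first_class = labels[0] if labels else None
    augmented = []
    for row, label in zip(rows, labels):
        a = rng.randint(0, 1)
        if mode == "even" or label == first_class:
            b = a
        else:
            b = 1 - a
        augmented.append(tuple(row) + (a, b))
    return augmented
```

The original columns are left untouched, so the same classifier can be run on the control rows and on each augmented version for a like-for-like comparison.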
Matrices of classification outcomes for control (top matrix), “mixed” (middle matrix) and “even” (bottom matrix): no adverse impact on classification
The same assignment is made in each experiment, indicating that augmenting the real data with the two types of conditional dependence does not influence classification, at least with this HV84 data set.
Raw probabilities, however, show that even though the class assignments did not change across CONTROL, EXPT #1, and EXPT #2, differences (in this case slight) are imparted to the probability estimates, as expected. It is important to note that we added only 2 attributes (columns) to 17, so the percentage of “contamination” by synthetic data is small. Additional exploration could be done with increasing percentages of conditional dependence added to the original HV84 data set.
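A comparison of this kind can be summarized with two numbers per experiment: how many rows changed class, and how far any probability moved. The helper below is a minimal sketch; the name `compare_runs` and the input layout (one dict of class probabilities per row) are assumptions, not the deck's R output format.

```python
def compare_runs(control, experiment):
    """Compare two runs of per-row class posteriors.

    Each run is a list of dicts mapping class name -> probability, one per row.
    Returns (rows whose assigned class changed, largest absolute probability shift).
    """
    if len(control) != len(experiment):
        raise ValueError("runs must cover the same rows")
    changed = 0
    max_shift = 0.0
    for p, q in zip(control, experiment):
        # The assigned class is the one with the highest posterior.
        if max(p, key=p.get) != max(q, key=q.get):
            changed += 1
        for cls in p:
            max_shift = max(max_shift, abs(p[cls] - q.get(cls, 0.0)))
    return changed, max_shift
```

A result of zero changed rows with a small but nonzero shift corresponds to the pattern described above: unchanged assignments, slightly different probability estimates.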
Knowledge check: FB or NB?
References
Q & A

Naive Bayes with Conditionally Dependent Data
