This document discusses and compares three common machine learning classification tools: Naive Bayesian, logistic regression, and artificial neural networks. It analyzes their performance on a diabetes dataset containing 768 instances with 8 attributes. The artificial neural network achieved the highest accuracy at 79.7%, followed by logistic regression at 78.3%, and then Naive Bayesian at 76.3%. In conclusion, for complex datasets with many variables, artificial neural networks may perform better than traditional models like Naive Bayesian and logistic regression due to their ability to learn complex patterns from large amounts of data.
Classification ANN
1. Classification Tools &
Artificial Neural Network
Vinaytosh Mishra
B.Tech (ECE), IIT (BHU)
MBA, IMNU, Ahmedabad
PG Diploma in Statistics & Computing,
Institute of Science, BHU
Specialization in Digital Marketing,
University of Illinois, Urbana-Champaign, USA
2. Agenda
Introduction of Machine Learning
Type of Machine Learning
Type of Classification Tools
Naïve Bayesian
Logistic Regression
Artificial Neural Networks
Comparison of the three methods
Results
Conclusions
3. A Few Quotes
“A breakthrough in machine learning would be worth
ten Microsofts” (Bill Gates, Chairman, Microsoft)
“Machine learning is the next Internet”
(Tony Tether, Director, DARPA)
“Machine learning is the hot new thing”
(John Hennessy, President, Stanford)
“Web rankings today are mostly a matter of
machine learning” (Prabhakar Raghavan, Dir.
Research, Yahoo)
“Machine learning is going to result in a real
revolution” (Greg Papadopoulos, CTO, Sun)
4. So What Is Machine Learning?
Automating automation
Getting computers to program
themselves
Writing software is the bottleneck
Let the data do the work instead!
6. Types of Learning
Supervised (inductive) learning
Training data includes desired outputs
Unsupervised learning
Training data does not include desired outputs
Semi-supervised learning
Training data includes a few desired outputs
Reinforcement learning
Rewards from sequence of actions
7. Inductive Learning
Given examples of a function (X, F(X))
Predict function F(X) for new examples X
Discrete F(X): Classification
Continuous F(X): Regression
F(X) = Probability(X): Probability estimation
9. Naïve Bayesian
The Naive Bayesian classifier is based on
Bayes theorem with independence
assumptions between predictors.
A Naive Bayesian model is easy to build,
with no complicated iterative parameter
estimation which makes it particularly
useful for very large datasets.
Despite its simplicity, the Naive Bayesian
classifier often does surprisingly well and is
widely used because it often outperforms
more sophisticated classification methods.
10. How it works?
Using Bayes' theorem, the classifier computes the
posterior probability of each class C given the
predictors X = (x1, ..., xn):
P(C | X) = P(C) P(x1 | C) ... P(xn | C) / P(X),
where the likelihood factorizes into per-predictor
terms because of the independence assumption.
An instance is assigned to the class whose posterior
probability is highest (or exceeds a chosen threshold).
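The posterior computation described above can be sketched in a few lines of plain Python. This is an illustrative toy (made-up glucose/BMI categories, not the Pima diabetes data), with Laplace smoothing added so unseen predictor values do not zero out a posterior:

```python
# Minimal sketch of a Naive Bayes classifier for categorical predictors.
# Toy data and variable names are assumptions for illustration only.
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate class priors P(C) and per-predictor likelihoods P(x_i | C)."""
    priors = Counter(labels)
    n = len(labels)
    # counts[cls][i][value] = how often predictor i takes `value` in class cls
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            counts[cls][i][value] += 1

    def posterior(row):
        scores = {}
        for cls, prior_count in priors.items():
            p = prior_count / n  # prior P(C)
            for i, value in enumerate(row):
                # Laplace (add-one) smoothing, assuming 2 values per predictor
                p *= (counts[cls][i][value] + 1) / (prior_count + 2)
            scores[cls] = p  # unnormalized posterior: P(C) * prod P(x_i | C)
        return scores

    return posterior

# Toy data: (glucose level, BMI category) -> diabetic?
rows = [("high", "obese"), ("high", "normal"), ("low", "normal"), ("low", "obese")]
labels = ["yes", "yes", "no", "no"]
posterior = train_naive_bayes(rows, labels)
scores = posterior(("high", "obese"))
pred = max(scores, key=scores.get)  # pick the class with the highest posterior
print(pred)  # → yes
```

Dividing by P(X) is skipped because it is the same for every class, so the argmax is unchanged.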
11. Logistic Regression
In supervised learning, logistic regression is a
regression model in which the dependent variable (DV)
is categorical.
Logistic regression is widely used in many fields,
including the medical and social sciences.
Many risk prediction models based on logistic
regression have been developed to predict whether a
patient has a given disease, such as diabetes or coronary
heart disease, from observed characteristics of the
patient such as age, sex, body mass index, and the results
of various blood tests and anthropometric tests.
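A logistic regression model passes a linear combination of the predictors through the logistic (sigmoid) function to get a probability. A minimal sketch, where the coefficients are invented for illustration rather than fitted to the Pima data:

```python
# Minimal sketch of logistic-regression prediction. The weights and bias
# below are hypothetical, not estimated from any real dataset.
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """P(class = 1 | x) under a logistic regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

# Hypothetical model: two standardized predictors (plasma glucose, BMI).
weights = [1.2, 0.8]
bias = -0.5
p = predict_proba([1.0, 0.5], weights, bias)  # z = -0.5 + 1.2 + 0.4 = 1.1
label = 1 if p >= 0.5 else 0                  # threshold at 0.5
print(round(p, 3), label)
```

Fitting the weights is typically done by maximum likelihood, which requires an iterative solver; prediction itself is just this one formula.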
17. Neural Networks: Training
Presenting the network with sample data and
modifying the weights to better approximate
the desired function.
Supervised Learning
Supply network with inputs and desired outputs
Initially, the weights are randomly set
Weights modified to reduce difference between
actual and desired outputs
Back propagation
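The training loop above (random initial weights, then repeated adjustments that shrink the gap between actual and desired outputs) can be sketched for a single sigmoid unit; a full backpropagation network applies the same delta-rule idea layer by layer. The AND-function data and hyperparameters are assumptions chosen for a compact demo:

```python
# Minimal sketch of supervised neural-network training: start with random
# weights, then use gradient descent (the delta rule) to reduce the squared
# error between actual and desired outputs. One sigmoid unit, logical AND.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training data: inputs and desired outputs for logical AND.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(2)]  # initially, weights are random
b = random.uniform(-1, 1)
lr = 0.5  # learning rate

def mse():
    """Mean squared error between actual and desired outputs."""
    return sum((sigmoid(w[0] * x0 + w[1] * x1 + b) - y) ** 2
               for (x0, x1), y in data) / len(data)

before = mse()
for _ in range(5000):
    for (x0, x1), y in data:
        out = sigmoid(w[0] * x0 + w[1] * x1 + b)
        # gradient of the squared error through the sigmoid (delta rule)
        delta = (out - y) * out * (1 - out)
        w[0] -= lr * delta * x0
        w[1] -= lr * delta * x1
        b -= lr * delta
after = mse()
print(before, "->", after)  # error shrinks as training proceeds
```

With hidden layers, backpropagation computes the same kind of delta for each hidden unit by propagating the output error backwards through the weights.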
19. Comparison of three methods
Dataset: Pima Indians Diabetes Database, National Institute
of Diabetes and Digestive and Kidney Diseases
(8 attributes, 768 instances)

S/N  Attribute                          Code
1    Number of times pregnant           NPG
2    Plasma glucose concentration       PGL
3    Diastolic blood pressure (mm Hg)   DIA
4    Triceps skin fold thickness (mm)   TSF
5    2-Hour serum insulin               INS
6    Body mass index (kg/m2)            BMI
7    Diabetes pedigree function         DPF
8    Age (years)                        AGE
9    Class                              CLASS

Method                  Naïve Bayesian   Logistic Regression   ANN (8-6-2)
Accuracy of Prediction  76.3%            78.3%                 79.7%
20. Result & Conclusion
As the results suggest, the artificial neural network based
prediction model is better than traditional models such as
Naïve Bayesian and logistic regression. The difference in
reported accuracy among the models discussed is small,
but with an increasing number of variables the ANN may
emerge as the clear winner.
Advances in database management technologies have
enabled us to practice evidence-based medicine.
Technologies such as cloud computing and Hadoop have
made it easy to manage and share data. Advanced
classification tools are more accurate and can be applied
to larger databases to classify disease more accurately.
21. References
Ramachandran A. Socio-economic burden of diabetes in India. July 2007, Vol. 55.
Bhansali A. Cost of diabetes care: prevent diabetes or face catastrophe. JAPI, February 2013, Vol. 61.
International Diabetes Federation. IDF Diabetes Atlas, 5th edn. Brussels: International Diabetes Federation, 2011.
Li G, Zhang P, Wang J, et al. The long-term effect of lifestyle interventions to prevent diabetes in the China Da Qing Diabetes Prevention Study: a 20-year follow-up study. Lancet 2008; 371: 1783–89.
Noble D, Mathur R, Dent T, Meads C, Greenhalgh T. Risk models and scores for type 2 diabetes: systematic review. BMJ 2011; 343: d7163.
Buijsse B, Simmons RK, Griffin SJ, Schulze MB. Risk assessment tools for identifying individuals at risk of developing type 2 diabetes. Epidemiol Rev 2011; 33: 46–62.
Lindstrom J, Tuomilehto J. The diabetes risk score: a practical tool to predict type 2 diabetes risk. Diabetes Care 2003; 26: 725–31.
Collins GS, Mallett S, Omar O, Yu LM. Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting. BMC Med 2011; 9: 103.
Diabetes Care, January 2002, Vol. 25, Suppl. 1, s21–s24.
Freedman DA (2009). Statistical Models: Theory and Practice. Cambridge University Press, p. 128.
Boyd CR, Tolson MA, Copes WS (1987). Evaluating trauma care: the TRISS method. Trauma Score and the Injury Severity Score. The Journal of Trauma 27(4): 370–378.
Truett J, Cornfield J, Kannel W (1967). A multivariate analysis of the risk of coronary heart disease in Framingham. Journal of Chronic Diseases 20(7): 511–24.
Rish I. An empirical study of the naive Bayes classifier. http://www.cc.gatech.edu
Model Extremely Complex Functions, Neural Networks (2015).
Patterson DW. Artificial Neural Networks: Theory and Applications. Prentice Hall, pp. 141–243, 1996.