Introduction to Machine Learning
April 30, 2018
Andres Mendez-Vazquez,
Email: amendez@gdl.cinvestav.mx
1 Introduction
The first tool for attacking problems involving patterns in data is Machine Learning, a Computer Science discipline
concerned with the design of algorithms that allow computers to evolve behaviors based on empirical data. These
algorithms can be organized into the following categories: Supervised Learning, Unsupervised Learning and
Semi-supervised Learning. Machine Learning is, more than anything, an interdisciplinary subfield dealing with the
discovery of patterns in large data sets, involving methods from Artificial Intelligence, Statistics, and
Database Systems. Nowadays, this intersection of subfields has flourished into what many people call Data Science,
making Machine Learning one of its cornerstones. However, as a cautionary note, a person who wants to be a
Data Scientist must be proficient in:
1. Analysis of Algorithms,
2. Data Structures,
3. Probability and Statistics,
4. Linear Algebra,
5. Linear and Non-Linear Optimization,
6. Software Engineering,
7. Machine Learning,
8. Data Mining,
9. Languages for prototyping, Python or R,
10. Parallel Programming.
This makes the journey into Data Science an arduous one that cannot be taken lightly.
Syllabus Machine Learning for Data Mining
2 Course Objectives
This is a 65-hour theoretical and practical course that introduces students to the concepts of machine learning
for processing and analyzing data from different sources. The emphasis is on various Machine Learning problems,
their solutions, and their use in Data Analysis. Students will develop an understanding of the Machine Learning
process and its issues, learn Machine Learning techniques, and apply them to solve Machine Learning problems.
II Prerequisites
Linear Algebra, Probability, Convex Optimization, Artificial Intelligence and Analysis of Algorithms.
Note: A piece of advice: this is barely the beginning of the amount of math that you should be able to handle in
order to be successful in this area.
III Class Grading
The class will be graded as follows:
Task                      Percentage
Midterm I                 10%
Midterm II                10%
Final                     10%
Homework (8 assignments)  40%
Project                   30%
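As an illustration of how these percentages combine, the following is a minimal sketch of the final grade computation. It assumes that every score is on a 0 to 100 scale and that the eight homework assignments count equally; neither assumption is stated in the syllabus. The names `WEIGHTS` and `final_grade` are hypothetical.

```python
# Percentage weights taken from the grading table.
WEIGHTS = {
    "midterm_1": 10,
    "midterm_2": 10,
    "final": 10,
    "homework": 40,
    "project": 30,
}


def final_grade(midterm_1, midterm_2, final, homework, project):
    """Weighted course grade on a 0-100 scale.

    Assumption (not from the syllabus): `homework` is a list of scores
    that are averaged with equal weight before the 40% is applied.
    """
    homework_avg = sum(homework) / len(homework)
    scores = {
        "midterm_1": midterm_1,
        "midterm_2": midterm_2,
        "final": final,
        "homework": homework_avg,
        "project": project,
    }
    # Each weight is a percentage, so divide the weighted sum by 100.
    return sum(WEIGHTS[task] * scores[task] for task in WEIGHTS) / 100


example = final_grade(
    midterm_1=80,
    midterm_2=70,
    final=90,
    homework=[100, 90, 80, 70, 100, 90, 80, 70],
    project=95,
)
print(example)
```

With these example scores the homework average is 85, so homework and project together contribute 62.5 of the final points, which shows how heavily the practical work weighs in the course.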
IV Class Structure
We have decided to build the following structure for the class:
• Theoretical class.
– Here, the basic theory of Machine Learning will be discussed, so that the student can understand the
depth of the problems and the possible solutions in Machine Learning.
• Lab
– Here, the student will use NumPy and SciPy to implement the basic algorithms. This will allow the
student to understand the critical points of such algorithms.
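To give a flavor of the lab work, the following is a minimal NumPy sketch of one of the basic algorithms listed in the course topics: linear regression under the mean-square error, solved both in closed form and by gradient descent. It is only an illustration under stated assumptions, not the official lab material: the synthetic data, the step size `lr`, the iteration count, and the function names are all hypothetical choices.

```python
import numpy as np


def fit_closed_form(X, y):
    """Least-squares solution of min_w ||X w - y||^2."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def fit_gradient_descent(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean-square error (1/n) ||X w - y||^2."""
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the MSE at w
        w -= lr * grad
    return w


# Synthetic data (an assumption for illustration): y = 1 - 2x + small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
w_true = np.array([1.0, -2.0])
y = X @ w_true + 0.01 * rng.standard_normal(200)

w_cf = fit_closed_form(X, y)
w_gd = fit_gradient_descent(X, y)
print("closed form:     ", w_cf)
print("gradient descent:", w_gd)
```

Both routes minimize the same cost, so the two weight vectors should agree closely. Comparing them is a simple check that a hand-written gradient is correct before moving on to models without a closed-form solution.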
V Projects
The project will address a specific problem that each student wants to work on. Possible topics are:
• Oil exploration,
• Association Rule Systems,
Cinvestav GDL 2
• Social Network Analysis,
• Page Ranking,
• Web Word Relevance Measures,
• Recommendation Systems,
• Bio-Signal Processing,
• Large-Scale Object Recognition,
• Natural Language Processing,
• Something you are really interested in. Please come and talk to me.
There are more possibilities at:
• https://www.kaggle.com/competitions
• http://aws.amazon.com/datasets/
VI About the Reading Material
Because of the scope of machine learning, different subjects will be drawn from different textbooks and papers.
Therefore:
1. The recommended books are listed in the bibliography at the end.
2. In addition, several articles will be used as we progress through the class.
VII Course Topics
I.1 Introduction [3, 17, 9, 13]
1. What is a Classifier?
2. Two approaches to Prediction
3. A little bit of Statistical Decision Theory
4. Classes of Estimators
I.2 Supervised Learning
I.2.1 Linear Classifiers [3, 17, 9]
1. Introduction to Linear Regression
2. Mean-Square Error Linear Estimation
(a) Canonical form
(b) Gradient Descent
3. Regularization
4. Logistic Regression
5. Fisher Linear Discriminant
6. Subset Selection
7. Shrinkage Methods
(a) Ridge Regression
(b) The Lasso
(c) Subset Selection
I.2.2 Stochastic Gradient Descent [6, 17, 4]
1. Introduction
2. The Steepest Descent Method
3. Application to the Mean-Square Error Cost Function
4. Stochastic Approximation
(a) Iterative Method
5. The Least-Mean-Squares Adaptive Algorithm
6. The Momentum Method
(a) ADAM - Adaptive Moment Estimation
I.2.3 Probability Classifiers [3, 12, 2, 17, 8, 13]
1. Discriminant Functions
2. Naive Bayes
3. Maximum Likelihood
4. Expectation Maximization and Mixture of Gaussians
5. Maximum a Posteriori Probability Estimation
6. Generative Models vs Discriminative Models
I.2.4 Kernel Based Classifiers [10, 17]
1. Introduction
2. Support Vector Machines
3. Learning in Reproducing Kernel Hilbert Spaces
I.2.5 Random Trees [17, 9]
1. Decision Trees
2. Random Forest
(a) Tree Bagging
3. Variable Importance
4. Variants
I.2.6 Hidden Markov Models [14]
1. Introduction
2. Forward Algorithm
3. The Viterbi Algorithm
4. Baum-Welch Algorithm
I.2.7 Neural Networks [10]
1. Perceptron
2. Multilayer Perceptron
3. Universal Approximation Theorem
I.2.8 Important Issues [3, 9, 13, 17]
1. Bias-Variance Dilemma
2. The Confusion Matrix
3. K-Fold Cross-Validation
I.2.9 Data Preparation [3, 7, 17]
1. Feature selection
(a) Preprocessing
(b) Statistical Methods
(c) Class Separability
(d) Feature Subset Selection.
2. Feature Generation
(a) Introduction
(b) Fisher Linear Discriminant
(c) Dimensionality Reduction
i. Principal Component Analysis
ii. The Singular Value Decomposition
I.2.10 Combining Classifiers [3, 16, 18]
1. Average Rules
2. Majority Voting Rule
3. A Bayesian viewpoint
4. AdaBoosting
5. Multi-Class AdaBoost
I.3 Unsupervised Learning
I.3.1 Clustering, the Classic Path [3, 17, 9, 13]
1. Introduction
2. Proximity Measures.
3. K-Means
(a) Vector Quantization
4. K-Centers
5. Mixture of Gaussians
6. K-Medoids
7. Hierarchical Clustering
8. Principal Components, Curves and Surfaces
(a) Spectral Clustering
9. Self-Organizing Maps
10. Independent Component Analysis
(a) Latent Variables and Factor Analysis
(b) Independent Component Analysis
I.3.2 Association rules [9, 15]
1. The Market Problem
2. The Apriori Algorithm
3. Unsupervised as Supervised Learning
4. Improvements
5. Near Neighbor Search in High Dimensional Data
(a) Locality Sensitive Hashing
(b) Near Neighbor Search
I.3.3 Structure of the Web [9, 15, 11]
1. Page Rank Algorithm
2. Hyperlink-Induced Topic Search (Hubs and Authorities)
3. Search of Communities
I.4 Semi-Supervised Learning [19, 5, 1]
1. Introduction to Semi-Supervised Learning
2. Using Expectation Maximization
(a) Cluster then Label
3. Graph-Based Semi-Supervised Learning
4. Semi-supervised Support Vector Machines
References
[1] Amparo Albalate and Wolfgang Minker. Semi-Supervised and Unsupervised Machine Learning: Novel Strategies.
John Wiley & Sons, 2013.
[2] Alessio Benavoli, Giorgio Corani, Janez Demsar, and Marco Zaffalon. Time for a change: a tutorial for
comparing multiple classifiers through Bayesian analysis. arXiv preprint arXiv:1606.04316, 2016.
[3] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics).
Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[4] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of
COMPSTAT’2010, pages 177–186. Springer, 2010.
[5] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. The MIT Press, 1st
edition, 2010.
[6] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic
optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
[7] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience,
2000.
[8] Mário A. T. Figueiredo. Adaptive sparseness for supervised learning. IEEE Trans. Pattern Anal. Mach. Intell.,
25(9):1150–1159, September 2003.
[9] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Second Edition. Springer Series in Statistics. Springer New York, 2009.
[10] Simon Haykin. Neural Networks and Learning Machines. Prentice-Hall, Inc., Upper Saddle River, NJ, USA,
2008.
[11] Amy N. Langville and Carl D. Meyer. Google’s PageRank and Beyond: The Science of Search Engine Rankings.
Princeton University Press, Princeton, NJ, USA, 2006.
[12] Geoffrey McLachlan. Discriminant analysis and statistical pattern recognition, volume 544. John Wiley &
Sons, 2004.
[13] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[14] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition.
In Readings in Speech Recognition, pages 267–296. Morgan Kaufmann Publishers Inc., San Francisco,
CA, USA, 1990.
[15] Anand Rajaraman and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, New
York, NY, USA, 2011.
[16] Raúl Rojas. AdaBoost and the super bowl of classifiers: a tutorial introduction to adaptive boosting. 2009.
[17] Sergios Theodoridis. Machine Learning: A Bayesian and Optimization Perspective. Academic Press, 1st
edition, 2015.
[18] Ji Zhu, Saharon Rosset, Hui Zou, and Trevor Hastie. Multi-class adaboost. Ann Arbor, 1001(48109):1612,
2006.
[19] Xiaojin Zhu and Andrew B Goldberg. Introduction to semi-supervised learning. Synthesis lectures on artificial
intelligence and machine learning, 3(1):1–130, 2009.

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (19)

Topological data analysis
Topological data analysisTopological data analysis
Topological data analysis
 
Linear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data ScienceLinear Algebra – A Powerful Tool for Data Science
Linear Algebra – A Powerful Tool for Data Science
 
D034017022
D034017022D034017022
D034017022
 
GRID SEARCHING Novel way of Searching 2D Array
GRID SEARCHING Novel way of Searching 2D ArrayGRID SEARCHING Novel way of Searching 2D Array
GRID SEARCHING Novel way of Searching 2D Array
 
Teaching Matrices within Statistics
Teaching Matrices within StatisticsTeaching Matrices within Statistics
Teaching Matrices within Statistics
 
Edge Detection Using Fuzzy Logic with Varied Inputs
Edge Detection Using Fuzzy Logic with Varied InputsEdge Detection Using Fuzzy Logic with Varied Inputs
Edge Detection Using Fuzzy Logic with Varied Inputs
 
IRJET-Handwritten Digit Classification using Machine Learning Models
IRJET-Handwritten Digit Classification using Machine Learning ModelsIRJET-Handwritten Digit Classification using Machine Learning Models
IRJET-Handwritten Digit Classification using Machine Learning Models
 
Comparison on PCA ICA and LDA in Face Recognition
Comparison on PCA ICA and LDA in Face RecognitionComparison on PCA ICA and LDA in Face Recognition
Comparison on PCA ICA and LDA in Face Recognition
 
Fault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms andFault diagnosis using genetic algorithms and
Fault diagnosis using genetic algorithms and
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
 
A Novel Bayes Factor for Inverse Model Selection Problem based on Inverse Ref...
A Novel Bayes Factor for Inverse Model Selection Problem based on Inverse Ref...A Novel Bayes Factor for Inverse Model Selection Problem based on Inverse Ref...
A Novel Bayes Factor for Inverse Model Selection Problem based on Inverse Ref...
 
ALTERNATIVE METHOD TO LINEAR CONGRUENCE
ALTERNATIVE METHOD TO LINEAR CONGRUENCEALTERNATIVE METHOD TO LINEAR CONGRUENCE
ALTERNATIVE METHOD TO LINEAR CONGRUENCE
 
Image Similarity Test Using Eigenface Calculation
Image Similarity Test Using Eigenface CalculationImage Similarity Test Using Eigenface Calculation
Image Similarity Test Using Eigenface Calculation
 
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence MatrixSteganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
 
2.mathematics for machine learning
2.mathematics for machine learning2.mathematics for machine learning
2.mathematics for machine learning
 
Recommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learningRecommendation algorithm using reinforcement learning
Recommendation algorithm using reinforcement learning
 
Data structures
Data structuresData structures
Data structures
 
Strings in Python
Strings in PythonStrings in Python
Strings in Python
 
Topological Data Analysis and Persistent Homology
Topological Data Analysis and Persistent HomologyTopological Data Analysis and Persistent Homology
Topological Data Analysis and Persistent Homology
 

Ähnlich wie Introduction Machine Learning Syllabus

Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
IJERA Editor
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and Techniques
Rui Pedro Paiva
 
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
Phuc Nguyen
 

Ähnlich wie Introduction Machine Learning Syllabus (20)

Week 2 lecture
Week 2 lectureWeek 2 lecture
Week 2 lecture
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
A Survey And Taxonomy Of Distributed Data Mining Research Studies A Systemat...
A Survey And Taxonomy Of Distributed Data Mining Research Studies  A Systemat...A Survey And Taxonomy Of Distributed Data Mining Research Studies  A Systemat...
A Survey And Taxonomy Of Distributed Data Mining Research Studies A Systemat...
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Predicting the future with social media
Predicting the future with social mediaPredicting the future with social media
Predicting the future with social media
 
Data science syllabus
Data science syllabusData science syllabus
Data science syllabus
 
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
IRJET-	 Fault Detection and Prediction of Failure using Vibration AnalysisIRJET-	 Fault Detection and Prediction of Failure using Vibration Analysis
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
 
Mastering in Data Science 3RITPL-1 (1).pdf
Mastering in Data Science 3RITPL-1 (1).pdfMastering in Data Science 3RITPL-1 (1).pdf
Mastering in Data Science 3RITPL-1 (1).pdf
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
 
313 IDS _Course_Introduction_PPT.pptx
313 IDS _Course_Introduction_PPT.pptx313 IDS _Course_Introduction_PPT.pptx
313 IDS _Course_Introduction_PPT.pptx
 
Applications of Pattern Recognition Algorithms in Agriculture: A Review
Applications of Pattern Recognition Algorithms in Agriculture: A ReviewApplications of Pattern Recognition Algorithms in Agriculture: A Review
Applications of Pattern Recognition Algorithms in Agriculture: A Review
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and Techniques
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
 
2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon
 
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
EmbNum: Semantic Labeling for Numerical Values with Deep Metric Learning
 
8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry System
 
Analysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry SystemAnalysis on Student Admission Enquiry System
Analysis on Student Admission Enquiry System
 

Mehr von Andres Mendez-Vazquez

Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
Andres Mendez-Vazquez
 

Mehr von Andres Mendez-Vazquez (20)

2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
05 linear transformations
05 linear transformations05 linear transformations
05 linear transformations
 
01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
01.02 linear equations
01.02 linear equations01.02 linear equations
01.02 linear equations
 
01.01 vector spaces
01.01 vector spaces01.01 vector spaces
01.01 vector spaces
 
06 recurrent neural_networks
06 recurrent neural_networks06 recurrent neural_networks
06 recurrent neural_networks
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation
 
Zetta global
Zetta globalZetta global
Zetta global
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 
Schedule for Data Lab Community Path in Machine Learning
Schedule for Data Lab Community Path in Machine LearningSchedule for Data Lab Community Path in Machine Learning
Schedule for Data Lab Community Path in Machine Learning
 

Kürzlich hochgeladen

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 

Kürzlich hochgeladen (20)

PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 

Introduction Machine Learning Syllabus

  • 1. Introduction to Machine Learning April 30, 2018 Andres Mendez-Vazquez, Email: amendez@gdl.cinvestav.mx 1 Introduction The first tool to attack data pattern related problems is Machine Learning, a Computer Science discipline concerned with the design of algorithms that allows computers to evolve behaviors based on empirical data. These algorithms can be organized in the following hierarchy: Supervised Learning, Unsupervised Learning and Semi-supervised Learning. Thus, Machine Learning is more than anything an interdisciplinary sub-field dealing with the discover of patterns in large data sets involving methods from Artificial Intelligence, Machine Learning, Statistics, and Database Systems. Nowadays, this intersection of subfields has flourished in what many people call Data Sciences, making Machine Learning one of its corner stones. However, as a precautionary tale, a person who wants to be a Data Scientists must be proficient in: 1. Analysis of Algorithms, 2. Data Structures, 3. Probability and Statistics, 4. Linear Algebra, 5. Linear and Non-Linear Optimization, 6. Software Engineering, 7. Machine Learning, 8. Data Mining, 9. Languages for prototyping, Python or R, 10. Parallel Programming. Making the journey into Data Science an arduous one that cannot be taken lightly. 1
  • 2. Syllabus Machine Learning for Data Mining 2 Course Objectives This is a theoretical and practical 65 hours course that introduces the students to concepts of machine learning for processing and analyzing data from different sources. The emphasis is on various Machine Learning problems, solutions and their use for Data Analysis. Students will develop an understanding of the Machine Learning process and issues, learn techniques for Machine Learning, and apply them in solving Machine Learning problems. II Prerequisites Linear Algebra, Probability, Convex Optimization, Artificial Intelligence and Analysis of Algorithms. Note: A piece of advise, this is barely the beginning of the amount of math that you should be able to handle in order to be successful in this area. III Class Grading The grades in the class will be graded in the following way Task Percentage Midterm I 10% Midterm II 10% Final 10% 8 Homework 40% Project 30% IV Class Structure We have decided to build the following structure for the class: • Theoretical class. – Here, a discussion of the basic theory on the Machine Learning will be done. Thus, the student can understand the deepness of the problems and possible solutions in Machine Learning. • Lab – Here, the student will use NumPy and SciPy to implement the basic algorithms. This will allow the student to understand the critical junction of such algorithms. V Projects The project will be in an specific problem that each student wants to work on. Possible topic are: • Oil exploration, • Association Rule Systems, Cinvestav GDL 2
  • 3. Syllabus Machine Learning for Data Mining • Social Network Analysis, • Page Ranking, • Web Word Relevance Measures, • Recommendation Systems, • Bio-Signal Processing, • Large-Scale Object Recognition, • Natural Language Processing, • Something you are really interested on. Please come and talk to me. There are more possibilities at: • https://www.kaggle.com/competitions • http://aws.amazon.com/datasets/ VI About the Reading Material Because of the scope of machine learning, different subjects will be obtained from different text books and papers. Therefore: 1. The recommended books are at the end in the bibliography. 2. In addition, several articles will be used as we progress through the class. Cinvestav GDL 3
  • 4. Syllabus Machine Learning for Data Mining VII Course Topics I.1 Introduction [3, 17, 9, 13] 1. What is a Classifier? 2. Two approaches to Prediction 3. A little bit of Statistical Decision Theory 4. Classes of Estimators I.2 Supervised Learning I.2.1 Linear Classifiers[3, 17, 9] 1. Introduction to Linear Regression 2. Mean-Square Error Linear Estimation (a) Canonical form (b) Gradient Descent 3. Regularization 4. Logistic Regression 5. Fischer Linear Discriminant 6. Subset Selection 7. Shrinkage Methods (a) Ridge Regression (b) The Lasso (c) Subset Selection I.2.2 Stochastic Gradient Descent [6, 17, 4] 1. Introduction 2. The Steepest Descent Method 3. Application to the Mean-Square Error Cost Function 4. Stochastic Approximation (a) Iterative Method 5. The Least-Mean-Squares Adaptive Algorithm 6. The Momentum Method (a) ADAM - Adaptive Moment Estimation Cinvestav GDL 4
  • 5. Syllabus Machine Learning for Data Mining I.2.3 Probability Classifiers [3, 12, 2, 17, 8, 13] 1. Discriminant Functions 2. Naive Bayes 3. Maximum Likelihood 4. Expectation Maximization and Mixture of Gaussian’s 5. Maximum a Posterior Probability Estimation 6. Generative Models vs Discriminative Models I.2.4 Kernel Based Classifiers [10, 17] 1. Introduction 2. Support Vector Machines 3. Learning in Reproducing Kernel Hilbert Spaces I.2.5 Random Trees [17, 9] 1. Decision Trees 2. Random Forest (a) Tree Bagging 3. Variable Importance 4. Variants I.2.6 Hidden Markov Models [14] 1. Introduction 2. Ford-ward Algorithm 3. The Viterbi Algorithm 4. Baum-Welch Algorithm I.2.7 Neural Networks [10] 1. Perceptron 2. Multilayer Perceptron 3. Universal Approximation Theorem Cinvestav GDL 5
I.2.8 Important Issues [3, 9, 13, 17]
1. Bias-Variance Dilemma
2. The Confusion Matrix
3. K-Fold Cross-Validation
I.2.9 Data Preparation [3, 7, 17]
1. Feature Selection
   (a) Preprocessing
   (b) Statistical Methods
   (c) Class Separability
   (d) Feature Subset Selection
2. Feature Generation
   (a) Introduction
   (b) Fisher Linear Discriminant
   (c) Dimensionality Reduction
       i. Principal Component Analysis
       ii. The Singular Value Decomposition
I.2.10 Combining Classifiers [3, 16, 18]
1. Average Rules
2. Majority Voting Rule
3. A Bayesian Viewpoint
4. AdaBoost
5. Multi-Class AdaBoost
I.3 Unsupervised Learning
I.3.1 Clustering, the Classic Path [3, 17, 9, 13]
1. Introduction
2. Proximity Measures
3. K-Means
   (a) Vector Quantization
4. K-Centers
5. Mixture of Gaussians
6. K-Medoids
7. Hierarchical Clustering
8. Principal Components, Curves and Surfaces
   (a) Spectral Clustering
9. Self-Organizing Maps
10. Independent Component Analysis
   (a) Latent Variables and Factor Analysis
   (b) Independent Component Analysis
I.3.2 Association Rules [9, 15]
1. The Market Problem
2. The Apriori Algorithm
3. Unsupervised as Supervised Learning
4. Improvements
5. Near Neighbor Search in High Dimensional Data
   (a) Locality Sensitive Hashing
   (b) Near Neighbor Search
I.3.3 Structure of the Web [9, 15, 11]
1. Page Rank Algorithm
2. Hyperlink-Induced Topic Search (Hubs and Authorities)
3. Search of Communities
I.4 Semi-Supervised Learning [19, 5, 1]
1. Introduction to Semi-Supervised Learning
2. Using Expectation Maximization
   (a) Cluster then Label
3. Graph-Based Semi-Supervised Learning
4. Semi-Supervised Support Vector Machines
References
[1] Amparo Albalate and Wolfgang Minker. Semi-Supervised and Unsupervised Machine Learning: Novel Strategies. John Wiley & Sons, 2013.
[2] Alessio Benavoli, Giorgio Corani, Janez Demsar, and Marco Zaffalon. Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. arXiv preprint arXiv:1606.04316, 2016.
[3] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[4] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, pages 177–186. Springer, 2010.
[5] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. The MIT Press, 1st edition, 2010.
[6] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
[7] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience, 2000.
[8] Mário A. T. Figueiredo. Adaptive sparseness for supervised learning. IEEE Trans. Pattern Anal. Mach. Intell., 25(9):1150–1159, September 2003.
[9] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics. Springer New York, 2009.
[10] Simon Haykin. Neural Networks and Learning Machines. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2008.
[11] Amy N. Langville and Carl D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton, NJ, USA, 2006.
[12] Geoffrey McLachlan. Discriminant Analysis and Statistical Pattern Recognition, volume 544. John Wiley & Sons, 2004.
[13] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[14] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Readings in Speech Recognition, pages 267–296. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1990.
[15] Anand Rajaraman and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, New York, NY, USA, 2011.
[16] Raúl Rojas. AdaBoost and the Super Bowl of classifiers: a tutorial introduction to adaptive boosting. 2009.
[17] Sergios Theodoridis. Machine Learning: A Bayesian and Optimization Perspective. Academic Press, 1st edition, 2015.
[18] Ji Zhu, Saharon Rosset, Hui Zou, and Trevor Hastie. Multi-class AdaBoost. Ann Arbor, 1001(48109):1612, 2006.
[19] Xiaojin Zhu and Andrew B. Goldberg. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1):1–130, 2009.