Introduction to Machine Learning: Syllabus
April 30, 2018
Andres Mendez-Vazquez
Email: amendez@gdl.cinvestav.mx
1 Introduction
The first tool for attacking pattern-related data problems is Machine Learning, a Computer Science discipline concerned
with the design of algorithms that allow computers to evolve behaviors based on empirical data. These algorithms
can be organized in the following hierarchy: Supervised Learning, Unsupervised Learning, and Semi-supervised
Learning. Thus, Machine Learning is above all an interdisciplinary field dealing with the discovery
of patterns in large data sets, drawing on methods from Artificial Intelligence, Statistics, and
Database Systems. Nowadays, this intersection of subfields has flourished into what many people call Data Science,
making Machine Learning one of its cornerstones. However, as a cautionary note, a person who wants to be a
Data Scientist must be proficient in:
1. Analysis of Algorithms,
2. Data Structures,
3. Probability and Statistics,
4. Linear Algebra,
5. Linear and Non-Linear Optimization,
6. Software Engineering,
7. Machine Learning,
8. Data Mining,
9. Prototyping languages such as Python or R,
10. Parallel Programming.
This makes the journey into Data Science an arduous one that cannot be taken lightly.
Syllabus: Machine Learning for Data Mining
2 Course Objectives
This is a 65-hour theoretical and practical course that introduces students to concepts of machine learning
for processing and analyzing data from different sources. The emphasis is on various Machine Learning problems,
their solutions, and their use for Data Analysis. Students will develop an understanding of the Machine Learning
process and its issues, learn techniques for Machine Learning, and apply them in solving Machine Learning problems.
II Prerequisites
Linear Algebra, Probability, Convex Optimization, Artificial Intelligence and Analysis of Algorithms.
Note: A piece of advice: this is barely the beginning of the mathematics you should be able to handle in
order to be successful in this area.
III Class Grading
The grades in the class will be assigned as follows:

Task                      Percentage
Midterm I                 10%
Midterm II                10%
Final Exam                10%
Homework (8 assignments)  40%
Project                   30%
IV Class Structure
We have decided to build the following structure for the class:
• Theoretical class.
– Here, we discuss the basic theory of Machine Learning so that the student can understand the depth
of the problems and the possible solutions in Machine Learning.
• Lab
– Here, the student will use NumPy and SciPy to implement the basic algorithms. This will allow the
student to understand the inner workings of such algorithms.
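As a taste of what the lab sessions involve, here is a minimal NumPy sketch of batch gradient descent for least-squares regression; the function name and toy data are illustrative, not part of the official lab material:

```python
import numpy as np

def gd_least_squares(X, y, lr=0.1, n_iters=500):
    """Minimize the mean squared error (1/n)||Xw - y||^2 by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE cost
        w -= lr * grad
    return w

# Recover a known linear relation y = 2*x0 - 3*x1 from noiseless samples
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0])
w = gd_least_squares(X, y)
```

On noiseless data the iterates converge to the exact least-squares weights.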
V Projects
The project will be on a specific problem that each student wants to work on. Possible topics are:
• Oil exploration,
• Association Rule Systems,
Cinvestav GDL
• Social Network Analysis,
• Page Ranking,
• Web Word Relevance Measures,
• Recommendation Systems,
• Bio-Signal Processing,
• Large-Scale Object Recognition,
• Natural Language Processing,
• Something you are really interested in. Please come and talk to me.
There are more possibilities at:
• https://www.kaggle.com/competitions
• http://aws.amazon.com/datasets/
VI About the Reading Material
Because of the scope of machine learning, different subjects will be drawn from different textbooks and papers.
Therefore:
1. The recommended books are at the end in the bibliography.
2. In addition, several articles will be used as we progress through the class.
VII Course Topics
I.1 Introduction [3, 17, 9, 13]
1. What is a Classifier?
2. Two approaches to Prediction
3. A little bit of Statistical Decision Theory
4. Classes of Estimators
I.2 Supervised Learning
I.2.1 Linear Classifiers [3, 17, 9]
1. Introduction to Linear Regression
2. Mean-Square Error Linear Estimation
(a) Canonical form
(b) Gradient Descent
3. Regularization
4. Logistic Regression
5. Fisher Linear Discriminant
6. Subset Selection
7. Shrinkage Methods
(a) Ridge Regression
(b) The Lasso
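To illustrate the shrinkage methods listed above, here is a minimal sketch of ridge regression in closed form; the helper name and toy data are assumptions for illustration only:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Shrinkage in action: a larger penalty yields a smaller-norm solution
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=50)
w_small, w_large = ridge(X, y, 0.01), ridge(X, y, 100.0)
```

Solving the regularized normal equations directly avoids iterating, which is feasible whenever the feature dimension is modest.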
I.2.2 Stochastic Gradient Descent [6, 17, 4]
1. Introduction
2. The Steepest Descent Method
3. Application to the Mean-Square Error Cost Function
4. Stochastic Approximation
(a) Iterative Method
5. The Least-Mean-Squares Adaptive Algorithm
6. The Momentum Method
(a) ADAM - Adaptive Moment Estimation
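The momentum method above can be sketched as follows; the gradient oracle, step sizes, and toy objective are illustrative choices, not prescribed by the course:

```python
import numpy as np

def sgd_momentum(grad, w0, lr=0.01, beta=0.9, n_steps=1000, seed=0):
    """Stochastic gradient descent with momentum, given a noisy gradient oracle."""
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        v = beta * v + grad(w, rng)  # accumulate a velocity term
        w = w - lr * v
    return w

# Noisy gradient of f(w) = ||w||^2 / 2, whose true gradient is w itself
noisy_grad = lambda w, rng: w + 0.01 * rng.normal(size=w.shape)
w = sgd_momentum(noisy_grad, [5.0, -3.0])
```

The velocity term damps the oscillations that plain stochastic steps would produce on ill-conditioned cost surfaces.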
I.2.3 Probability Classifiers [3, 12, 2, 17, 8, 13]
1. Discriminant Functions
2. Naive Bayes
3. Maximum Likelihood
4. Expectation Maximization and Mixtures of Gaussians
5. Maximum a Posteriori Probability Estimation
6. Generative Models vs Discriminative Models
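A naive Bayes classifier with Gaussian class-conditional densities, one of the generative models above, can be sketched as follows; the class name and toy data are assumptions for illustration:

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: independent normal features per class."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        self.logprior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self
    def predict(self, X):
        # log p(x|c) + log p(c) under the feature-independence assumption
        ll = -0.5 * (((X[:, None, :] - self.mu_) ** 2) / self.var_
                     + np.log(2 * np.pi * self.var_)).sum(axis=2)
        return self.classes_[np.argmax(ll + self.logprior_, axis=1)]

# Two well-separated Gaussian blobs as a toy problem
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
pred = GaussianNB().fit(X, y).predict(X)
```

The independence assumption is rarely true, yet the resulting classifier is a strong baseline precisely because it estimates so few parameters.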
I.2.4 Kernel Based Classifiers [10, 17]
1. Introduction
2. Support Vector Machines
3. Learning in Reproducing Kernel Hilbert Spaces
I.2.5 Random Trees [17, 9]
1. Decision Trees
2. Random Forest
(a) Tree Bagging
3. Variable Importance
4. Variants
I.2.6 Hidden Markov Models [14]
1. Introduction
2. The Forward Algorithm
3. The Viterbi Algorithm
4. Baum-Welch Algorithm
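The Viterbi algorithm listed above admits a compact dynamic-programming sketch in the log domain; the two-state model below is an illustrative toy, not an example from the course:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state path for an HMM.
    pi: initial probs (S,), A: transition probs (S, S),
    B: emission probs (S, n_symbols), obs: list of observed symbols."""
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = logd[:, None] + np.log(A)   # scores[i, j]: best path ending in i, then i -> j
        back.append(np.argmax(scores, axis=0))
        logd = scores.max(axis=0) + np.log(B[:, o])
    path = [int(np.argmax(logd))]
    for bp in reversed(back):                # backtrack through the stored argmaxes
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Two states with uniform transitions; each state strongly prefers one symbol
pi = np.array([0.5, 0.5])
A = np.array([[0.5, 0.5], [0.5, 0.5]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
path = viterbi(pi, A, B, [0, 1, 1, 0])
```

With uniform transitions the decoded path simply tracks the most likely emitter of each symbol, which makes the toy easy to check by hand.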
I.2.7 Neural Networks [10]
1. Perceptron
2. Multilayer Perceptron
3. Universal Approximation Theorem
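The perceptron can be sketched in a few lines; the toy AND-style data below is chosen for illustration and is not part of the syllabus:

```python
import numpy as np

def perceptron(X, y, n_epochs=20):
    """Rosenblatt perceptron for labels y in {-1, +1}; returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified: nudge the boundary toward xi
                w += yi * xi
                b += yi
    return w, b

# A linearly separable toy problem (logical AND with +/-1 labels)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = perceptron(X, y)
```

By the perceptron convergence theorem, the update loop stops making mistakes after finitely many passes on any linearly separable data set.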
I.2.8 Important Issues [3, 9, 13, 17]
1. Bias-Variance Dilemma
2. The Confusion Matrix
3. K-Fold Cross-Validation
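K-fold cross-validation splits can be sketched with an illustrative helper (the name and interface are assumptions, not a prescribed API):

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint (train, test) folds after shuffling."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(k)]

splits = k_fold_indices(10, 5)
```

Every sample appears in exactly one test fold, so averaging the k test errors uses all the data for validation without ever testing on training points.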
I.2.9 Data Preparation [3, 7, 17]
1. Feature selection
(a) Preprocessing
(b) Statistical Methods
(c) Class Separability
(d) Feature Subset Selection.
2. Feature Generation
(a) Introduction
(b) Fisher Linear Discriminant
(c) Dimensionality Reduction
i. Principal Component Analysis
ii. The Singular Value Decomposition
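Principal Component Analysis via the Singular Value Decomposition of the centered data matrix, combining the two items above, can be sketched as:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data; returns (projected data, components)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal axes
    return Xc @ Vt[:n_components].T, Vt[:n_components]

# Toy data lying exactly on a line, so one component reconstructs it perfectly
t = np.linspace(0.0, 1.0, 50)
X = np.column_stack([t, 2 * t])
Z, V = pca(X, 1)
```

Using the SVD avoids forming the covariance matrix explicitly, which is numerically more stable than an eigendecomposition of X^T X.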
I.2.10 Combining Classifiers [3, 16, 18]
1. Average Rules
2. Majority Voting Rule
3. A Bayesian viewpoint
4. AdaBoosting
5. Multi-Class AdaBoost
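The majority voting rule above can be sketched as follows, assuming non-negative integer class labels (the helper name is an illustrative choice):

```python
import numpy as np

def majority_vote(predictions):
    """Combine classifiers by majority vote.
    predictions: array of shape (n_classifiers, n_samples) with integer labels."""
    P = np.asarray(predictions)
    # For each sample (column), pick the label predicted most often
    return np.array([np.bincount(col).argmax() for col in P.T])

# Three classifiers voting on three samples
votes = np.array([[0, 1, 1], [0, 0, 1], [1, 1, 1]])
combined = majority_vote(votes)
```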
I.3 Unsupervised Learning
I.3.1 Clustering, the Classic Path [3, 17, 9, 13]
1. Introduction
2. Proximity Measures.
3. K-Means
(a) Vector Quantization
4. K-Centers
5. Mixtures of Gaussians
6. K-Medoids
7. Hierarchical Clustering
8. Principal Components, Curves and Surfaces
(a) Spectral Clustering
9. Self-Organizing Maps
10. Independent Component Analysis
(a) Latent Variables and Factor Analysis
(b) Independent Component Analysis
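K-Means, the workhorse of the clustering topics above, can be sketched as Lloyd's algorithm; the random initialization and toy blobs are illustrative choices:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged to a fixed point
            break
        centers = new
    return centers, labels

# Two well-separated blobs that k-means should recover cleanly
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
centers, labels = kmeans(X, 2)
```

Each iteration can only decrease the within-cluster sum of squares, which is why the loop is guaranteed to reach a (local) fixed point.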
I.3.2 Association Rules [9, 15]
1. The Market Problem
2. The Apriori Algorithm
3. Unsupervised as Supervised Learning
4. Improvements
5. Near Neighbor Search in High Dimensional Data
(a) Locality Sensitive Hashing
(b) Near Neighbor Search
I.3.3 Structure of the Web [9, 15, 11]
1. Page Rank Algorithm
2. Hyperlink-Induced Topic Search (Hubs and Authorities)
3. Search of Communities
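The PageRank algorithm can be sketched as power iteration with damping; the damping factor 0.85 follows the usual convention, while the function name and the three-node toy graph are illustrative:

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    """PageRank by power iteration on a row-normalized adjacency matrix."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Row-normalize; dangling nodes (no out-links) jump uniformly
    P = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    while True:
        r_new = (1 - d) / n + d * (P.T @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# A 3-node cycle 0 -> 1 -> 2 -> 0: by symmetry, all ranks must be equal
adj = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
r = pagerank(adj)
```

Damping (d < 1) makes the iteration matrix strictly positive, so convergence to a unique stationary distribution is guaranteed by the Perron-Frobenius theorem.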
I.4 Semi-Supervised Learning [19, 5, 1]
1. Introduction to Semi-Supervised Learning
2. Using Expectation Maximization
(a) Cluster then Label
3. Graph-Based Semi-Supervised Learning
4. Semi-supervised Support Vector Machines
References
[1] Amparo Albalate and Wolfgang Minker. Semi-Supervised and Unsupervised Machine Learning: Novel Strate-
gies. John Wiley & Sons, 2013.
[2] Alessio Benavoli, Giorgio Corani, Janez Demsar, and Marco Zaffalon. Time for a change: a tutorial for
comparing multiple classifiers through bayesian analysis. arXiv preprint arXiv:1606.04316, 2016.
[3] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics).
Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[4] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMP-
STAT’2010, pages 177–186. Springer, 2010.
[5] Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. The MIT Press, 1st
edition, 2010.
[6] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic
optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
[7] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience,
2000.
[8] Mário A. T. Figueiredo. Adaptive sparseness for supervised learning. IEEE Trans. Pattern Anal. Mach. Intell.,
25(9):1150–1159, September 2003.
[9] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Second Edition. Springer Series in Statistics. Springer New York, 2009.
[10] Simon Haykin. Neural Networks and Learning Machines. Prentice-Hall, Inc., Upper Saddle River, NJ, USA,
2008.
[11] Amy N. Langville and Carl D. Meyer. Google’s PageRank and Beyond: The Science of Search Engine Rankings.
Princeton University Press, Princeton, NJ, USA, 2006.
[12] Geoffrey McLachlan. Discriminant analysis and statistical pattern recognition, volume 544. John Wiley &
Sons, 2004.
[13] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[14] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.
In Readings in Speech Recognition, pages 267–296. Morgan Kaufmann Publishers Inc., San Francisco,
CA, USA, 1990.
[15] Anand Rajaraman and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, New
York, NY, USA, 2011.
[16] Raúl Rojas. AdaBoost and the Super Bowl of Classifiers: A Tutorial Introduction to Adaptive Boosting. 2009.
[17] Sergios Theodoridis. Machine Learning: A Bayesian and Optimization Perspective. Academic Press, 1st
edition, 2015.
[18] Ji Zhu, Saharon Rosset, Hui Zou, and Trevor Hastie. Multi-Class AdaBoost. Ann Arbor, 1001(48109):1612,
2006.
[19] Xiaojin Zhu and Andrew B Goldberg. Introduction to semi-supervised learning. Synthesis lectures on artificial
intelligence and machine learning, 3(1):1–130, 2009.