9. Why Probability & Statistics?
• To judge whether data is meaningful, using methods such as:
• optimization,
• inference,
• testing,
• and others.
• This lets us analyze patterns in data and use them to predict, understand, and improve results.
Probability: to predict the likelihood of future events.
Statistics: to analyze the frequency of past events.
22. Neural Network
• Purpose: to find the output of a complex input by modeling the relationship between inputs and outputs
• To find patterns in data (pattern
recognition)
• To predict sampled functions when no closed form of the function is given (function estimation)
• Mathematically speaking
Let the input be x = (x_1, x_2, x_3, …, x_n) and the weights w = (w_1, w_2, w_3, …, w_n); then the activation is
a = ∑_{i=1}^{n} x_i w_i = x_1 w_1 + x_2 w_2 + x_3 w_3 + ⋯ + x_n w_n
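The weighted sum above can be sketched in a few lines of Python. The input and weight values here are hypothetical; a real network would also add a bias term and pass the sum through a nonlinearity:

```python
def weighted_sum(x, w):
    """Neuron pre-activation: a = sum of x_i * w_i over paired inputs and weights."""
    return sum(xi * wi for xi, wi in zip(x, w))

# Hypothetical example values (weights would normally be learned):
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
a = weighted_sum(x, w)  # 1*0.5 + 2*(-0.25) + 3*0.1 ≈ 0.3
```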
23. Classification
• Decision Tree
• Random Forest
• Naïve Bayes
• Support Vector
Machine
• k-Nearest Neighbor
• Neural Network
(Diagram: all of the above are supervised learning methods, where a task is given; they build on probability & statistics, linear algebra, and calculus.)
25. Decision Tree
• Purpose: to model the relationships among the features and the possible outcomes as a tree structure
• Mathematically speaking
• Gini impurity
• Information gain
• Variance reduction
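Of the splitting criteria above, Gini impurity is the simplest to sketch: it measures how mixed the class labels at a node are, and a split is chosen to reduce it. A minimal Python version:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_c^2) over the class proportions p_c at a node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

gini(["a", "a", "b", "b"])  # 0.5 (maximally mixed for two classes)
gini(["a", "a", "a"])       # 0.0 (a pure node)
```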
27. Random Forest
• Purpose: to model the relationships among the features and the possible outcomes with an ensemble of decision trees, each trained on a bootstrap sample of the data with a random subset of the features (bootstrap aggregation, i.e. BAGGING, not boosting)
• Mathematically speaking
• Prediction (an average over the m trees):
ŷ = (1/m) ∑_{j=1}^{m} ∑_{i=1}^{n} W_j(x_i, x′) y_i
with W(x_i, x′) = 1/k′ if x_i is one of the k′ points in the same leaf as x′, and 0 otherwise.
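The two ensemble ingredients, bootstrap sampling and vote averaging, can be sketched in Python. This is a toy illustration, not a full forest; the per-tree predictions at the end are hypothetical:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) points with replacement: one tree's training set."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """The forest's class prediction: the most common vote across trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)  # some items repeat, some are left out
majority_vote(["cat", "dog", "cat"])             # hypothetical votes from 3 trees -> "cat"
```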
29. Naïve Bayes
• Purpose: to describe the probability of events and how probabilities should be revised in light of additional information
• Mathematically speaking
Bayes’ theorem: P(A | B) = P(B | A) P(A) / P(B)
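Bayes' theorem is a one-line computation. The prior, likelihood, and evidence values below are hypothetical numbers chosen only to illustrate the update:

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Hypothetical values: P(H) = 0.1, P(E | H) = 0.8, P(E) = 0.2
posterior(0.1, 0.8, 0.2)  # ≈ 0.4: the evidence raised the probability of H
```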
31. Support Vector Machine
• Purpose: to create a flat boundary called a hyperplane, which divides the space into fairly homogeneous partitions on either side
• Mathematically speaking
Polynomial kernel of degree 2: K(x_i, x_j) = (x_i · x_j)²
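The degree-2 polynomial kernel above is just a dot product squared; a minimal sketch:

```python
def poly_kernel(xi, xj, degree=2):
    """Polynomial kernel K(xi, xj) = (xi . xj)^degree (no constant term here)."""
    dot = sum(a * b for a, b in zip(xi, xj))
    return dot ** degree

poly_kernel([1, 2], [3, 4])  # (1*3 + 2*4)^2 = 121
```

The kernel lets the SVM behave as if the data were mapped into a higher-dimensional space without ever computing that mapping explicitly (the "kernel trick").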
33. k-Nearest Neighbor
• Purpose: to classify a data point into a class based on its similarity to its nearest neighbors
• Mathematically speaking
ŷ = argmax_y p(y | x, D)
ŷ = majority vote (the prediction)
D = the set of points inside the circle
p(y | x, D) = the proportion of each class among the k nearest points
We want to find a class for the question-tagged object. HOW?
(Figure: scatter plot over axes x and y showing classes A and B and the query object.)
1. Draw a circle that captures the k nearest neighbors of the object (the prediction is a majority vote of those k points; refer to the figure).
2. Repeat the process.
3. Calculate the distance between the object and its nearest neighbors.
4. Find the probability of each class as the proportion of points among the k nearest (refer to the figure: the share of A and B inside the circles), or write it mathematically as above.
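The steps above fit in a few lines of Python. The training points and labels here are hypothetical:

```python
import math
from collections import Counter

def knn_predict(query, data, k):
    """data: list of (point, label) pairs. Vote among the k closest points."""
    by_dist = sorted(data, key=lambda pl: math.dist(query, pl[0]))
    labels = [label for _, label in by_dist[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical 2-D training set with classes "A" and "B":
train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
knn_predict((1, 1), train, k=3)  # two of the 3 nearest are "A" -> "A"
```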
36. k-Means Clustering
• Purpose: to minimize the differences within each cluster and maximize the differences between the clusters
• Mathematically speaking
• Data assignment step: assign each point to the cluster of its nearest centroid,
argmin_{c_i ∈ C} dist(c_i, x)²
C = the set of centroids c_i
x = a data point
• Centroid update step:
c_i = (1 / |S_i|) ∑_{x_i ∈ S_i} x_i
S_i = the set of data points assigned to the i-th cluster centroid
x_i = a data point
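One iteration of the two steps can be sketched in Python. This toy version assumes 2-D points and that no cluster ends up empty; the data and starting centroids are hypothetical:

```python
import math

def assign(points, centroids):
    """Assignment step: each point goes to the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        clusters[i].append(p)
    return clusters

def update(clusters):
    """Update step: each centroid moves to the mean of its assigned points."""
    return [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in clusters]

points = [(0.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 9.0)]
clusters = assign(points, [(0.0, 0.0), (10.0, 10.0)])
update(clusters)  # [(0.0, 0.5), (9.5, 9.0)]
```

In practice the two steps are repeated until the assignments stop changing.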
38. Association Rules
• Purpose: to suggest possible outcomes by making predictions from historical input (e.g. recommending a product to buy)
• Mathematically speaking
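A minimal sketch of the two standard rule metrics, support and confidence, on hypothetical basket data:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs ∪ rhs) / support(lhs)."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)

# Hypothetical shopping baskets:
baskets = [{"milk", "bread"}, {"milk", "eggs"}, {"bread", "eggs"}, {"milk", "bread", "eggs"}]
support(baskets, {"milk", "bread"})        # 2 of 4 baskets -> 0.5
confidence(baskets, {"milk"}, {"bread"})   # 0.5 / 0.75 ≈ 0.67
```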
41. It’s not about equations
BUT
mathematical intuitions, which help you to:
• Choose the right algorithm for the problem
• Make good choices on parameter settings and validation strategies
• Recognize underfitting or overfitting
• Troubleshoot poor or ambiguous results
• Put appropriate bounds of confidence or uncertainty on results
• Do a better job of coding algorithms or incorporating them into
more complex analysis pipelines
42. And the most important thing is…
Understand the concept
NOT
The packages
43. Want to deep dive into a particular algorithm?
Simply request! :D