9. Why Probability & Statistics?
• To understand whether data is meaningful, using methods including:
• optimization,
• inference,
• testing,
• and other methods.
• Therefore, we are able to analyze patterns in data and use them to predict, understand, and improve results.
• Probability: to predict the likelihood of future events
• Statistics: to analyze the frequency of past events
22. Neural Network
• Purpose → to find the output of a complex input by modeling the relationship
• To find patterns in data (pattern recognition)
• To predict sampled functions given no form of the functions (function estimation)
• Mathematically speaking
Let input x = (x₁, x₂, x₃, …, xₙ) and weight w = (w₁, w₂, w₃, …, wₙ); then the activation is
a = Σᵢ₌₁ⁿ xᵢwᵢ = x₁w₁ + x₂w₂ + x₃w₃ + ⋯ + xₙwₙ
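The weighted sum above is a one-liner in code; this minimal sketch uses illustrative input and weight values:

```python
# Activation of a single neuron: a = sum over i of x_i * w_i
x = [1.0, 2.0, 3.0]    # inputs x1..xn (illustrative values)
w = [0.5, -1.0, 0.25]  # weights w1..wn (illustrative values)

a = sum(xi * wi for xi, wi in zip(x, w))
print(a)  # 0.5 - 2.0 + 0.75 = -0.75
```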
23. Classification
• Decision Tree
• Random Forest
• Naïve Bayes
• Support Vector Machine
• k-Nearest Neighbor
• Neural Network
Supervised learning (the task is given), building on Probability & Statistics, Linear Algebra, and Calculus.
25. Decision Tree
• Purpose → to model the relationships among the features and the possible outcomes in tree structures
• Mathematically speaking
• Gini impurity
• Information gain
• Variance reduction
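The first splitting criterion, Gini impurity, measures how mixed a node's labels are: 1 − Σ p_c² over the classes c. A minimal sketch with toy labels:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["A", "A", "B", "B"]))  # 0.5 -> maximally mixed for two classes
print(gini(["A", "A", "A", "A"]))  # 0.0 -> pure node, ideal leaf
```

A decision tree chooses the split that most reduces this impurity.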
27. Random Forest
• Purpose → to model the relationships among the features and the possible outcomes in tree structures, like a decision tree, but as an ensemble of many trees, each grown on a bootstrap resample of the data with a random subset of the features (bagging)
• Mathematically speaking
• Predictions: ŷ = Σᵢ W(xᵢ, x′) yᵢ
With W(xᵢ, x′) = 1/k′ if xᵢ is one of the k′ points in the same leaf as x′, and 0 otherwise.
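The ensemble idea can be sketched in plain Python: grow toy one-split "trees" on bootstrap resamples and take a majority vote. The threshold-stump "tree" and all data are illustrative simplifications, not a full random forest:

```python
import random

def stump_fit(X, y):
    """Fit a toy 1-feature threshold stump (stand-in for a full tree)."""
    t = sum(X) / len(X)  # threshold = mean of the 1-D feature
    left = [yi for xi, yi in zip(X, y) if xi <= t]
    right = [yi for xi, yi in zip(X, y) if xi > t]
    maj = lambda ys: max(set(ys), key=ys.count) if ys else 0
    return t, maj(left), maj(right)

def forest_fit(X, y, n_trees=25, seed=0):
    """Grow each stump on a bootstrap resample of the data (bagging)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(stump_fit([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def forest_predict(trees, x):
    """Majority vote across the ensemble."""
    votes = [l if x <= t else r for t, l, r in trees]
    return max(set(votes), key=votes.count)

X = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]  # toy 1-D feature
y = [0, 0, 0, 1, 1, 1]               # toy labels
trees = forest_fit(X, y)
print(forest_predict(trees, 1.0), forest_predict(trees, 10.0))
```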
29. Naïve Bayes
• Purpose → to describe the probability of events and how probabilities should be revised in the light of additional information.
• Mathematically speaking (Bayes' theorem)
P(A|B) = P(B|A) P(A) / P(B)
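This revision of probabilities can be sketched numerically. The scenario (spam filtering on a single word) and all numbers below are purely illustrative:

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_spam = 0.4             # prior P(spam), assumed for illustration
p_word_given_spam = 0.7  # likelihood P(word | spam), assumed
p_word_given_ham = 0.1   # likelihood P(word | not spam), assumed

# Law of total probability: P(word) = P(word|spam)P(spam) + P(word|ham)P(ham)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: the prior 0.4 is revised upward after observing the word
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.28 / 0.34 = 0.824
```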
31. Support Vector Machine
• Purpose → to create a flat boundary called a hyperplane, which divides the space to create fairly homogeneous partitions on either side.
• Mathematically speaking (a degree-2 polynomial kernel)
K(xᵢ, xⱼ) = ⟨xᵢ, xⱼ⟩²
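The kernel is just the squared inner product of two vectors; a minimal sketch with illustrative vectors:

```python
def poly2_kernel(xi, xj):
    """Degree-2 polynomial kernel: K(xi, xj) = <xi, xj>^2."""
    return sum(a * b for a, b in zip(xi, xj)) ** 2

print(poly2_kernel([1.0, 2.0], [3.0, 1.0]))  # (1*3 + 2*1)^2 = 25.0
```

Such kernels let the SVM fit a boundary that is non-linear in the original space while the optimization stays linear in the kernel values.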
33. k-Nearest Neighbor
• Purpose → to classify a data point into a class based on its similarity to its nearest neighbors
• Mathematically speaking
ŷ = argmaxᵧ p(y|x, D)
ŷ = majority vote (predictor)
D = a set of points in the circle
p(y|x, D) = portion of points among the k nearest points
We intend to find a class for the question-tagged object. HOW?
1. Make a circle to get the k nearest neighbors of the object (the prediction is a majority vote of the k nearest points; refer to the figure).
2. Repeat the process.
3. Calculate the distance between the object and its nearest neighbors.
4. Find the probability of each class for the object as the portion of points among the k nearest points (refer to the figure: the portions of A and B in the circles), or write it mathematically as above.
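The steps above fit in a few lines of Python; the points, labels, and query below are illustrative:

```python
import math
from collections import Counter

def knn_predict(D, x, k=3):
    """Majority vote among the k nearest points: y = argmax_y p(y|x, D)."""
    # D is a list of (point, label) pairs; distance is Euclidean
    neighbors = sorted(D, key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

D = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
     ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_predict(D, (1, 1)))  # "A": all 3 nearest neighbors are class A
```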
36. k-Means Clustering
• Purpose → to minimize the differences within each cluster and maximize the differences between the clusters.
• Mathematically speaking
• Data assignment: each point goes to its nearest centroid,
argmin over cᵢ ∈ C of dist(cᵢ, x)²
cᵢ = a centroid, C = the collection of centroids
x = data point
• Centroid update step: each centroid becomes the mean of its assigned points,
cᵢ = (1/|Sᵢ|) Σ over xⱼ ∈ Sᵢ of xⱼ
Sᵢ = set of data point assignments for the i-th cluster centroid
xⱼ = data point
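Alternating the two steps above gives the whole algorithm; this minimal sketch uses illustrative 2-D points and starting centroids:

```python
import math

def kmeans(points, centroids, iters=10):
    """Alternate data assignment and centroid update."""
    for _ in range(iters):
        # Data assignment: each point joins its nearest centroid's cluster
        clusters = [[] for _ in centroids]
        for x in points:
            i = min(range(len(centroids)),
                    key=lambda j: math.dist(x, centroids[j]))
            clusters[i].append(x)
        # Centroid update: c_i = mean of points in S_i (unchanged if empty)
        centroids = [
            tuple(sum(coord) / len(S) for coord in zip(*S)) if S else c
            for S, c in zip(clusters, centroids)
        ]
    return centroids

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(kmeans(pts, [(0, 0), (5, 5)]))  # converges to the two cluster means
```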
38. Association Rules
• Purpose → to give suggestions of possible outcomes by learning from historical input (e.g. recommendation of a product to buy)
• Mathematically speaking
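The standard measures for a rule A → B are support and confidence; assuming those are the measures the slide intended, a minimal sketch over an illustrative transaction set:

```python
# Toy transaction data, purely illustrative
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) estimated from the transactions."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # 2 of 4 transactions -> 0.5
print(confidence({"bread"}, {"milk"}))  # 0.5 / 0.75
```

Rules whose support and confidence clear chosen thresholds become the recommendations.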
41. It's not about equations
BUT
Mathematical intuitions
• Choose the right algorithms for the problem
• Make good choices on parameter settings and validation strategies
• Recognize underfitting or overfitting
• Troubleshoot poor or ambiguous results
• Put appropriate bounds of confidence or uncertainty on results
• Do a better job of coding algorithms or incorporating them into more complex analysis pipelines
42. And the most important thing is…
Understand the concepts,
NOT
the packages
43. Want to deep dive into a particular algorithm?
Simply request! :D