Machine Learning 101
Masa Bina Cinta, HME ITB 2021
● Data Scientist at Mekari (Jun 2021-Present)
● Data Science Intern at Mekari (Jan 2021-Apr 2021)
● Part-time AI Researcher at Bisa AI (Aug 2020-Nov 2020)
● AI Engineer Intern at Bisa AI (Apr 2020-Aug 2020)
Ammar Chalifah
Biomedical Engineering (Teknik Biomedis) ITB 2017, @ammarchalifah
ammarchalifah.com
1. Data and Value Extraction from Data
Career?
Okay, let’s talk about this first.
Top 15 Emerging Jobs in the US (LinkedIn Emerging Jobs Report, 2020)
1. Artificial Intelligence Specialist (74% annual growth)
2. Robotics Engineer
3. Data Scientist (37% annual growth)
4. Full Stack Engineer
5. Site Reliability Engineer
6. Customer Success Specialist
7. Sales Development Representative
8. Data Engineer (33% annual growth)
9. Behavioral Health Technician
10. Cybersecurity Specialist
11. Back End Developer
12. Chief Revenue Officer
13. Cloud Engineer
14. JavaScript Developer
15. Product Owner
LinkedIn 2020 Emerging Jobs Report. https://business.linkedin.com/content/dam/me/business/en-us/talent-solutions/emerging-jobs-report/Emerging_Jobs_Report_U.S._FINAL.pdf
Demand for Data Science Skills
Demand is growing quickly, with a big opportunity in Asia Pacific.
Between 2013 and 2015, demand for data-related skills increased by 59%, 50%, 69%, and 88% in the ICT, Media and Entertainment, Professional Services, and Financial Services industries, respectively.
However, Asia Pacific's proficiency in key data science skills lags behind other regions. High demand, low supply.
World Economic Forum (2019). Data Science in the New Economy, Insight Report. http://www3.weforum.org/docs/WEF_Data_Science_In_the_New_Economy.pdf
What Drives Demand in Data-related Jobs?
Competition. Companies that use data to make decisions will win and take market share from the laggards.
Almost every industry shows growing demand for data-related skills (WEF Report), and this demand is expected to keep growing over the next several years.
Demand for data-related skills is growing because these skills are what extract value from data.
“Wait, what is data?”
It is just a collection of meaningless, raw facts.
DIKW Pyramid
● Data: raw, unprocessed, unorganized facts
● Information: organized, processed, meaningful data
● Knowledge: contextual; a mix of values and experiences
● Wisdom: evaluated understanding; integrated knowledge
Data is only valuable if we can extract value from it by processing it into information, knowledge, and wisdom.
"Yeah, ok. But this concept is too abstract. What is data? What value can we get from exploiting it? How can we extract value from it?"
Types of Data
● Structured vs. unstructured
● Quantitative vs. qualitative
● Discrete vs. continuous
● Nominal vs. ordinal
● Binary vs. multi-class
Data is just random facts. To get value from it, data must be processed.
Wait, but how do we get the data that we need?
UNSW Sydney (2020). Types of data and scales of measurement. https://studyonline.unsw.edu.au/blog/types-of-data
Allen, Richard. What are the types of big data? https://www.selecthub.com/big-data-analytics/types-of-big-data-analytics/
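The nominal vs. ordinal distinction also shows up in code. A minimal sketch, assuming pandas and two hypothetical toy columns (`blood_type`, `pain_level`, not part of the heart dataset): an ordered categorical supports comparisons and min/max by level, while a nominal one has no order at all.

```python
import pandas as pd

# Hypothetical toy columns (not from heart.csv)
df = pd.DataFrame({
    "blood_type": ["A", "O", "B", "O"],              # nominal: no natural order
    "pain_level": ["low", "high", "medium", "low"],  # ordinal: ordered levels
})

# Nominal: an unordered categorical
df["blood_type"] = pd.Categorical(df["blood_type"])

# Ordinal: an ordered categorical with an explicit level order
df["pain_level"] = pd.Categorical(
    df["pain_level"], categories=["low", "medium", "high"], ordered=True
)

# Ordered categoricals support comparisons and min/max by level
more_than_low = df["pain_level"] > "low"
print(more_than_low.tolist())
print(df["pain_level"].max())
print(df["blood_type"].cat.ordered)
```

Comparing `blood_type` with `>` would raise an error, which is the point: treating nominal data as if it had an order is a modelling mistake.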
Data Science Hierarchy of Needs
Top-of-the-pyramid products (AI, deep learning, A/B testing, etc.) can only be built on top of a strong foundation.
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
Analytic Ascendancy Model
Data analytics has four levels, ordered by difficulty and value: descriptive, diagnostic, predictive, and prescriptive. These are the kinds of value that we want to get from data.
Recap
● Demand for data-related skills is growing, and supply is inadequate. Opportunities everywhere!
● Demand is growing because value can be extracted from data, giving the upper hand to those who believe in data-driven decision making.
● Data is just a collection of meaningless, raw facts. Data needs to be processed to yield useful information.
● Data comes in different types, which require different approaches to process them.
● Data science has a hierarchy of needs: a strong foundation in the data environment is needed before value can be extracted.
● Value from data has different levels based on impact and difficulty.
For the sake of efficiency, we will jump directly to predictive analysis. Suppose we already have a collected dataset; three steps remain before we can extract predictive value from it: (1) exploratory data analysis; (2) feature engineering; and (3) modelling.
2. Exploratory Data Analysis
Goals of EDA
Look at data before making any assumptions.
● Size, number of columns, data types
● Understand context
● Look at data distribution and identify outliers
● Have a descriptive understanding (centrality, variability)
● Analyze correlations
● And many more!
Heart Disease Data
Heart Disease Classification
Source: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Download the dataset from Kaggle: https://www.kaggle.com/ronitf/heart-disease-uci?select=heart.csv
EDA 1: Load CSV to DataFrame
Unzip the downloaded data. Open Google Colaboratory, then upload the heart.csv file to the session storage. Execute the code snippet below to load the CSV file into a pandas DataFrame.
import pandas as pd
# Load the uploaded CSV file into a DataFrame
file_name = "heart.csv"
df = pd.read_csv(file_name)
# Display the first 5 rows
df.head()
EDA tip #1: Read the relevant information from the data source (readme files, column descriptions) and display your data. You can refer to the UCI archive page for the full documentation of the data (https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The df.head() line displays the first 5 rows of your data.
Observing Table and Reading Docs
Display the data. Look at the column names and the data types.
Browse the documentation. The heart disease docs can be found on the university's archive page: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
After reading the docs and looking at the table, you will see that this dataset has 13 feature columns and 1 target column. The objective of this predictive analysis is to predict the target value from the feature values.
EDA 2: Data Shape, Types, and Non-null Count
Next, you want to know the size of your data, the exact data type of each column, and whether there are empty data points in your dataset. pandas provides easy-to-use functions to do just that in a few lines of code. If you have null values, you need an extra step to impute or otherwise handle them.
Check the data size, the columns' data types, and the existence of null values:
df.shape
df.info()
df.shape shows 303 rows and 14 columns. All columns are numerical, with no null values.
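To see how a missing value would show up, here is a minimal sketch on a hypothetical three-row frame (values are illustrative, not read from heart.csv) with one gap in `chol`. `df.isnull().sum()` gives the per-column count that `df.info()` reports indirectly as the non-null count.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for heart.csv with one missing cholesterol value
df = pd.DataFrame({
    "age": [63, 37, 41],
    "chol": [233.0, np.nan, 204.0],
    "target": [1, 1, 0],
})

print(df.shape)     # (rows, columns)
print(df.dtypes)    # data type of each column

null_counts = df.isnull().sum()   # missing values per column
print(null_counts)
print(df.isnull().values.any())   # True if any value is missing
```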
EDA 3: Descriptive Statistics
Next, descriptive statistics will help you understand the centrality and variability of each numerical feature.
df.describe()
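A small sketch of reading `describe()` output, assuming a hypothetical toy `chol` column with one large value. The mean and the median (the `50%` row) describe centrality; `std` and the interquartile range (`75%` minus `25%`) describe variability. A mean well above the median hints at a right tail.

```python
import pandas as pd

# Hypothetical toy column; one large value pulls the mean above the median
df = pd.DataFrame({"chol": [200, 210, 220, 230, 560]})
summary = df.describe()
print(summary)

# Centrality: mean vs. median ("50%"). Variability: the interquartile range.
iqr = summary.loc["75%", "chol"] - summary.loc["25%", "chol"]
print(summary.loc["mean", "chol"], summary.loc["50%", "chol"], iqr)
```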
EDA 4: Data Visualization
Freely explore the data. Use data visualization to make your exploration more intuitive.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 6))
# One color per target class
colors = {1: 'red', 0: 'blue'}
grouped = df.groupby('target')
# Scatter plot of age vs. maximum heart rate (thalach), one series per class
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='age', y='thalach', label=key, color=colors[key])
plt.show()
EDA 5: Correlation Analysis
Besides visualizing the data, we also need to check the correlations between the features and the target. A single feature may be so heavily correlated with the target that an ML approach is overkill. Several features may also be heavily correlated with each other, making it unnecessary to use all of them together.
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 10))
# Heatmap of pairwise correlations, annotated with the coefficients
sns.heatmap(df.corr(), annot=True, ax=ax)
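The heatmap can also be turned into lists. A minimal sketch on hypothetical synthetic data (`f1`, `f2`, `f3` and the 0.9 threshold are illustrative choices): features are ranked by absolute correlation with the target, and feature pairs above the threshold are flagged as candidates for dropping one of the two.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: f2 is almost a copy of f1, the target depends on f1
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
df = pd.DataFrame({
    "f1": f1,
    "f2": f1 + rng.normal(scale=0.05, size=200),
    "f3": rng.normal(size=200),
})
df["target"] = (f1 + rng.normal(scale=0.5, size=200) > 0).astype(int)

corr = df.corr()

# Features ranked by absolute correlation with the target
ranking = corr["target"].drop("target").abs().sort_values(ascending=False)
print(ranking)

# Feature pairs whose absolute correlation exceeds an arbitrary threshold
threshold = 0.9
features = corr.columns.drop("target")
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(features)
    for b in features[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(pairs)
```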
EDA 6: Outlier Checking
Check for outliers. Outliers may bias the model.
import matplotlib.pyplot as plt
import seaborn as sns
# One box-and-whisker plot per column
fig, ax = plt.subplots(nrows=1, ncols=len(df.columns), figsize=(20, 5))
for i, c in enumerate(df.columns):
    sns.boxplot(data=df, y=c, ax=ax[i])
fig.tight_layout()
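Box plots conventionally mark points lying more than 1.5 × IQR beyond the quartiles. A minimal sketch that counts such points per column, assuming a hypothetical toy frame; the helper `iqr_outliers` is illustrative, not a pandas or seaborn function.

```python
import pandas as pd

# Hypothetical toy data; 'chol' contains one unusually high value
df = pd.DataFrame({
    "age": [45, 50, 52, 55, 58, 60],
    "chol": [200, 210, 220, 230, 240, 560],
})

def iqr_outliers(s: pd.Series, whis: float = 1.5) -> pd.Series:
    """Boolean mask of values more than `whis` * IQR below Q1 or above Q3."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - whis * iqr) | (s > q3 + whis * iqr)

counts = df.apply(iqr_outliers).sum()  # outlier count per column
print(counts)
```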
EDA 7: Linearity and Distribution Check
Histograms give a better understanding of how the data points in each feature are distributed and how skewed each feature is.
import matplotlib.pyplot as plt
import seaborn as sns
# One histogram per column (y=c draws horizontal bars)
fig, ax = plt.subplots(nrows=1, ncols=len(df.columns), figsize=(20, 5))
for i, c in enumerate(df.columns):
    sns.histplot(data=df, y=c, ax=ax[i])
fig.tight_layout()
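Skewness can also be quantified rather than eyeballed. A minimal sketch, assuming two hypothetical toy columns: pandas `DataFrame.skew()` returns the sample skewness, which is about 0 for symmetric data, positive for a long right tail, and negative for a long left tail.

```python
import pandas as pd

# Hypothetical toy columns: one symmetric, one with a long right tail
df = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
    "right_skewed": [1, 1, 1, 2, 2, 3, 20],
})

# Sample skewness per column
skewness = df.skew()
print(skewness)
```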
3. Feature Engineering
Goals of Feature Engineering
Clean and process the data to help analysis/modelling.
● Rescale numeric values
● Clean missing values (by dropping or imputing data)
● Combine multiple features
● Convert data (categorical to numerical, numerical to ordinal, etc.)
● Handle outliers
● And many more!
Predictive Machine Learning Model
The EDA we have done so far only gives descriptive or inferential statistics. To extract more value from data, we move to the next level of the analytic ascendancy model: predictive analysis. One popular way to do predictive analysis is machine learning, i.e., letting the machine learn from examples of inputs (features) and outputs (labels), with the goal of finding the underlying rules that transform inputs into outputs.
In our hands-on session, we have 13 features (inputs), and we want to know the output (whether the patient has heart disease or not) based on those features. Most of the time, we need to process our inputs using feature engineering.
Input (features) → ML model → Output (labels)
Why?
We are lucky to have clean, non-null, all-numeric data. Sometimes you will need to analyze not-so-ideal datasets, which have null values, extreme outliers, or nominal data (e.g., strings). We can't pump such data directly into a machine learning model, so feature engineering becomes an important part of the data science process.
Besides handling missing values or nominal data, we sometimes also need to process numerical data: standardize, normalize, threshold, etc. Different machine learning models require different input characteristics.
Now, we will explore several important feature engineering techniques, and later on we will apply some of them to our data.
FE 1: Handling Missing Values
● Drop: drop rows or columns with missing values. Easy to do, but may cause significant data loss.
● Numerical imputation: fill with another numerical value, such as 0 or the median (depending on the case).
● Categorical imputation: fill with another categorical value, such as the most frequent value or a new category (e.g., 'Others').
# Drop rows with missing values
df = df.dropna()
# Drop columns with missing values
df = df.dropna(axis=1)
# Impute with 0
df = df.fillna(0)
# Impute with the median of each numerical column
df = df.fillna(df.median(numeric_only=True))
# Impute with a new category
df = df.fillna('Others')
# Impute with the most frequent value
df['column_name'] = df['column_name'].fillna(df['column_name'].value_counts().idxmax())
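A minimal end-to-end sketch of these options on a hypothetical toy frame (the `chol` and `city` columns are illustrative): the median fills the numerical gap, and the most frequent value (`mode()`) fills the categorical one. Each column is imputed separately so that numerical and categorical strategies don't mix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a gap in a numerical and a categorical column
df = pd.DataFrame({
    "chol": [200.0, np.nan, 240.0, 260.0],
    "city": ["Bandung", "Jakarta", None, "Jakarta"],
})

dropped_rows = df.dropna()        # keeps only rows with no missing values
dropped_cols = df.dropna(axis=1)  # keeps only columns with no missing values

imputed = df.copy()
imputed["chol"] = imputed["chol"].fillna(imputed["chol"].median())   # numerical: median
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])  # categorical: most frequent

print(dropped_rows.shape, dropped_cols.shape)
print(imputed)
```

Note how dropping loses half the rows (or every column) even in this tiny example, which is the data-loss risk mentioned above.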
FE 2: Handling Outliers
Outlier detection: standard deviation vs. percentile thresholds.
Outliers can be handled by:
- Dropping outliers
- Capping outliers
# Name of the column to process (placeholder)
col = 'column'
# Dropping the outlier rows with standard deviation
factor = 3
upper_lim = df[col].mean() + df[col].std() * factor
lower_lim = df[col].mean() - df[col].std() * factor
df = df[(df[col] < upper_lim) & (df[col] > lower_lim)]
# Dropping the outlier rows with percentiles
upper_lim = df[col].quantile(.95)
lower_lim = df[col].quantile(.05)
df = df[(df[col] < upper_lim) & (df[col] > lower_lim)]
# Capping the outlier rows with percentiles
upper_lim = df[col].quantile(.95)
lower_lim = df[col].quantile(.05)
df.loc[(df[col] > upper_lim), col] = upper_lim
df.loc[(df[col] < lower_lim), col] = lower_lim
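Capping can also be written with pandas `Series.clip`, which sets values outside the limits to the limits. A minimal sketch on a hypothetical toy series (1 to 19 plus one extreme value), comparing capping with dropping:

```python
import pandas as pd

# Hypothetical toy column: 1..19 plus one extreme value
s = pd.Series(list(range(1, 20)) + [1000], dtype=float)

lower_lim, upper_lim = s.quantile(0.05), s.quantile(0.95)

# Capping: values outside the limits are replaced by the limits
capped = s.clip(lower=lower_lim, upper=upper_lim)

# Dropping: keep only values inside the limits (inclusive here)
kept = s[s.between(lower_lim, upper_lim)]

print(lower_lim, upper_lim)
print(capped.max(), len(kept))
```

Percentile limits always touch the most extreme values at both ends by construction, so the smallest value (1) is modified too, even though it is not a real outlier. The standard-deviation rule does not have this property.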
FE 3: Binning
Binning makes a model more robust by sacrificing information to create more general (or regularized) categories. It can help prevent overfitting, at the cost of some performance.
Rençberoğlu, Emre (2019). Fundamental Techniques of Feature Engineering for Machine Learning. https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114
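A minimal binning sketch with `pandas.cut`, assuming hypothetical ages; the bin edges and labels are illustrative choices, not taken from the slides. `pandas.qcut` is the equal-frequency alternative when you prefer quantile-based bins.

```python
import pandas as pd

# Hypothetical ages; bin edges and labels are illustrative
ages = pd.Series([29, 41, 54, 63, 77])

age_group = pd.cut(
    ages,
    bins=[0, 40, 60, 120],  # intervals (0, 40], (40, 60], (60, 120]
    labels=["young", "middle-aged", "senior"],
)
print(age_group.tolist())
```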
FE 4: One-hot Encoding
One-hot encoding turns a categorical column into multiple binary numerical columns, one per category.
Before:
User ID | Major
1 | Biomedical Engineering
2 | Electrical Engineering
3 | Electrical Engineering
After:
User ID | Biomedical Engineering | Electrical Engineering
1 | 1 | 0
2 | 0 | 1
3 | 0 | 1
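The Major example can be reproduced with `pandas.get_dummies`. A minimal sketch: encoding the `Major` Series on its own yields columns named after the categories, and `dtype=int` gives 0/1 instead of booleans.

```python
import pandas as pd

df = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Major": ["Biomedical Engineering", "Electrical Engineering", "Electrical Engineering"],
})

# One binary column per category, then re-attach the ID column
dummies = pd.get_dummies(df["Major"], dtype=int)
encoded = pd.concat([df[["User ID"]], dummies], axis=1)
print(encoded)
```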
FE 5: Scaling
Scaling rescales numerical data. The two most popular scaling methods are min-max normalization and standardization. Min-max normalization scales all values to the range between 0 and 1. Standardization shifts and scales all values so that they have a mean of 0 and a standard deviation of 1.
Min-max normalization:
# Min-max normalization
df['normalized'] = (df['value'] - df['value'].min()) / (df['value'].max() - df['value'].min())
Standardization:
# Standardization
df['standardized'] = (df['value'] - df['value'].mean()) / df['value'].std()
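The same formulas are available as scikit-learn transformers. A minimal sketch on a hypothetical toy column that checks the manual formulas against `MinMaxScaler` and `StandardScaler`. One detail: pandas `.std()` defaults to the sample standard deviation (`ddof=1`), while `StandardScaler` uses the population standard deviation, so `ddof=0` is passed here to make them match.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy column to rescale
df = pd.DataFrame({"value": [2.0, 4.0, 6.0, 8.0]})

# Manual formulas (ddof=0: population standard deviation, as StandardScaler uses)
df["normalized"] = (df["value"] - df["value"].min()) / (df["value"].max() - df["value"].min())
df["standardized"] = (df["value"] - df["value"].mean()) / df["value"].std(ddof=0)

# scikit-learn equivalents expect 2-D input, hence df[["value"]]
sk_norm = MinMaxScaler().fit_transform(df[["value"]]).ravel()
sk_std = StandardScaler().fit_transform(df[["value"]]).ravel()

print(df)
print(np.allclose(df["normalized"], sk_norm), np.allclose(df["standardized"], sk_std))
```

In a real project, fit the scaler on the training split only and reuse it on the test split, so that test statistics don't leak into training.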
4. Machine Learning Model
EDA → Feature Engineering → Machine Learning Model
Generally, there are two kinds of prediction:
Regression
The predicted value is a continuous numerical value. Performance is measured by error (e.g., mean squared error).
Classification
The predicted value is categorical. Performance is measured by metrics such as accuracy.
https://www.javatpoint.com/regression-vs-classification-in-machine-learning
Which feature engineering methods suit our needs?
Machine learning model development is an iterative process of trial and error. We may end up trying many different feature engineering methods, but we can make an educated guess for our first trial.
● First, binary numerical columns don't need processing.
● Second, based on the histograms in the distribution check, there are no extreme outliers that need special handling.
● Third, several numerical columns are neither normalized nor standardized. We may need to rescale these columns.
● Lastly, there are no missing values and no string-valued categorical columns in the data.
The choice of feature engineering depends heavily on which machine learning algorithm we use. So, let's jump to the last phase of this workshop: picking our machine learning model!
Trivia 101
[Figure: graph of a function]
What is the function on the graph above?
Regression Prediction
Regression maps inputs to a continuous output variable.
Main idea: given the regression function $h_\theta(x) = \theta_1 x + \theta_0$, choose $\theta_0$ and $\theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.
Questions that can be answered by regression:
● How expensive is this house?
● How many tonnes of product will be delivered next month?
Example of a machine learning regression algorithm:
● Linear regression
Interestingly, an ordinal classification problem can be framed as a regression problem (for example, three classes with ordered severity can be treated as a regression target).
Src: Machine Learning, Andrew Ng, Stanford University
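One common way to make "close" precise is least squares: choose $\theta_0$ and $\theta_1$ that minimize the mean squared error over the training examples. A minimal sketch with NumPy, assuming hypothetical noise-free data generated from $y = 2x + 1$, using the closed-form solution for a single feature.

```python
import numpy as np

# Hypothetical training examples (x, y) generated from y = 2x + 1, no noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Closed-form least squares for one feature:
# theta_1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
# theta_0 = mean_y - theta_1 * mean_x
theta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
theta_0 = y.mean() - theta_1 * x.mean()

def h(x_new):
    """Hypothesis h_theta(x) = theta_1 * x + theta_0."""
    return theta_1 * x_new + theta_0

mse = np.mean((h(x) - y) ** 2)  # mean squared error on the training examples
print(theta_0, theta_1, mse)
```

With noisy real data the error will not be zero; gradient descent on the same squared-error cost reaches the same parameters iteratively.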
Classification Prediction
Classification maps input variables to probabilities of output classes. Classification may be binary or multi-class.
Questions that can be answered by classification:
● What animal is this?
● What kind of disease is this?
Examples of machine learning classification algorithms:
● Logistic regression
● Naive Bayes classification
● k-Nearest Neighbours
● Decision Tree
● Random Forest
Interestingly, a classification algorithm can be used to solve a regression problem by binning the target and framing it as a multi-class classification problem with many classes!
Src: Machine Learning, Andrew Ng, Stanford University
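So, how about our heart disease data?
It's up to you! Experiment to find the optimal model. For now, let's frame it as a classification problem.
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Split features (X) and target (y), then hold out 20% of the rows for testing
X = df.drop(columns='target')
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=17)
# max_iter is raised because the default solver may not converge on unscaled features
logit = LogisticRegression(random_state=17, max_iter=1000)
logit.fit(X_train, y_train)
# Accuracy on the test set
print(accuracy_score(y_test, logit.predict(X_test)))
importance = logit.coef_
# Summarize feature importance (the model's coefficients)
for x, v in zip(X_train.columns, importance[0]):
    print('Feature: {}, Score: {:.5f}'.format(x, v))
# Plot feature importance
plt.bar([x for x in range(len(importance[0]))], importance[0])
plt.show()
The print statement reports the accuracy; the loop and the bar plot show the feature importance. Coefficients are only directly comparable across features that share a similar scale.
Hands on!
Open Google Colaboratory.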
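Since heart.csv is not bundled here, the following sketch uses a synthetic stand-in from scikit-learn's `make_classification` (303 samples, 13 features, binary target; an assumption chosen only to mirror the dataset's shape, not the real data). It wraps `StandardScaler` and `LogisticRegression` in a pipeline, so scaling is learned from the training split only and the coefficients become comparable across features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data: 303 samples, 13 features, binary target
X, y = make_classification(n_samples=303, n_features=13, n_informative=5, random_state=17)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=17
)

# The scaler is fitted on the training split only, then applied to both splits
model = make_pipeline(StandardScaler(), LogisticRegression(random_state=17))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
acc = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {acc:.3f}")

# Coefficients on standardized features are comparable in scale
coefs = model.named_steps["logisticregression"].coef_[0]
print(coefs.round(3))
```

To use the real data, replace `X, y` with `df.drop(columns='target')` and `df['target']` after loading heart.csv.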
References
[1] Patil, Prasad (2018). What is Exploratory Data Analysis? https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15
[2] Rençberoğlu, Emre (2019). Fundamental Techniques of Feature Engineering for Machine Learning. https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114
Contributors
● Data Analyst Intern at Moving Walls (Apr - Jul 2021)
● Researcher Intern at NCIRI (Jul - Sep 2020)
● Backend Developer Intern at Bangunindo (Dec 2019-Jan 2020)
Ramadhita Umitaibatin
Biomedical Engineering (Teknik Biomedis) ITB 2017
@ramadhitau
Ramadhita Umitaibatin (LinkedIn)
CREDITS: This presentation template was created by
Slidesgo, including icons by Flaticon, and infographics
& images by Freepik.
Thank you~
For further inquiries, please don’t
hesitate to contact me :)
48

Weitere ähnliche Inhalte

Was ist angesagt?

Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
kevinlan
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and Techniques
Rui Pedro Paiva
 

Was ist angesagt? (20)

Bayesian reasoning
Bayesian reasoningBayesian reasoning
Bayesian reasoning
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Data Science Full Course | Edureka
Data Science Full Course | EdurekaData Science Full Course | Edureka
Data Science Full Course | Edureka
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in python
 
Machine learning in action at Pipedrive
Machine learning in action at PipedriveMachine learning in action at Pipedrive
Machine learning in action at Pipedrive
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
Statistics And Probability Tutorial | Statistics And Probability for Data Sci...
 
Data science syllabus
Data science syllabusData science syllabus
Data science syllabus
 
L11. The Future of Machine Learning
L11. The Future of Machine LearningL11. The Future of Machine Learning
L11. The Future of Machine Learning
 
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
Data Scientist Roles and Responsibilities | Data Scientist Career | Data Scie...
 
Popular Text Analytics Algorithms
Popular Text Analytics AlgorithmsPopular Text Analytics Algorithms
Popular Text Analytics Algorithms
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Data analysis
Data analysisData analysis
Data analysis
 
Data analytics
Data analyticsData analytics
Data analytics
 
Image Analytics: Caption Generation/Image Descriptions
Image Analytics: Caption Generation/Image DescriptionsImage Analytics: Caption Generation/Image Descriptions
Image Analytics: Caption Generation/Image Descriptions
 
Data Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should knowData Science for Business Managers - The bare minimum a manager should know
Data Science for Business Managers - The bare minimum a manager should know
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
 
Machine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and TechniquesMachine Learning: Applications, Process and Techniques
Machine Learning: Applications, Process and Techniques
 

Ähnlich wie Machine learning 101

Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
sumit621
 
Untitled document.pdf
Untitled document.pdfUntitled document.pdf
Untitled document.pdf
MuhammadTahiriqbal13
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
ArmyTrilidiaDevegaSK
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Rohit Dubey
 

Ähnlich wie Machine learning 101 (20)

Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Bigdataanalytics
BigdataanalyticsBigdataanalytics
Bigdataanalytics
 
Ch1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxCh1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptx
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
 
Data science Nagarajan and madhav.pptx
Data science Nagarajan and madhav.pptxData science Nagarajan and madhav.pptx
Data science Nagarajan and madhav.pptx
 
Machine learning for sensor Data Analytics
Machine learning for sensor Data AnalyticsMachine learning for sensor Data Analytics
Machine learning for sensor Data Analytics
 
Data analytcis-first-steps
Data analytcis-first-stepsData analytcis-first-steps
Data analytcis-first-steps
 
Data science
Data science Data science
Data science
 
From Rocket Science to Data Science
From Rocket Science to Data ScienceFrom Rocket Science to Data Science
From Rocket Science to Data Science
 
Data Science.pptx
Data Science.pptxData Science.pptx
Data Science.pptx
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 
Untitled document.pdf
Untitled document.pdfUntitled document.pdf
Untitled document.pdf
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Python for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive GuidePython for Data Analysis: A Comprehensive Guide
Python for Data Analysis: A Comprehensive Guide
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
 
Regression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms ExcelRegression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms Excel
 
Welcome to CS310!
Welcome to CS310!Welcome to CS310!
Welcome to CS310!
 
IIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data ScienceIIPGH Webinar 1: Getting Started With Data Science
IIPGH Webinar 1: Getting Started With Data Science
 
Data science and Machine learning Booklet
Data science and Machine learning BookletData science and Machine learning Booklet
Data science and Machine learning Booklet
 

Kürzlich hochgeladen

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Kürzlich hochgeladen (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha

Machine learning 101

  • 2. Ammar Chalifah, Biomedical Engineering (Teknik Biomedis) ITB 2017, @ammarchalifah, ammarchalifah.com
    ● Data Scientist at Mekari (Jun 2021-Present)
    ● Data Science Intern at Mekari (Jan 2021-Apr 2021)
    ● Part-time AI Researcher at Bisa AI (Aug 2020-Nov 2020)
    ● AI Engineer Intern at Bisa AI (Apr 2020-Aug 2020)
  • 4. Career? Okay, let's talk about this first.
  • 5. Top 15 Emerging Jobs in the US (LinkedIn Emerging Jobs Report, 2020)
    1. Artificial Intelligence Specialist (74% annual growth)
    2. Robotics Engineer
    3. Data Scientist (37% annual growth)
    4. Full Stack Engineer
    5. Site Reliability Engineer
    6. Customer Success Specialist
    7. Sales Development Representative
    8. Data Engineer (33% annual growth)
    9. Behavioral Health Technician
    10. Cybersecurity Specialist
    11. Back End Developer
    12. Chief Revenue Officer
    13. Cloud Engineer
    14. JavaScript Developer
    15. Product Owner
    Source: LinkedIn 2020 Emerging Jobs Report. https://business.linkedin.com/content/dam/me/business/en-us/talent-solutions/emerging-jobs-report/Emerging_Jobs_Report_U.S._FINAL.pdf
  • 6. Demand for Data Science Skills
    Between 2013 and 2015, demand for data-related skills increased by 59%, 50%, 69%, and 88% in the ICT, Media and Entertainment, Professional Services, and Financial Services industries, respectively. However, Asia Pacific's proficiency in key data science skills lags behind other regions.
    High demand, low supply: demand is growing quickly, with a big opportunity in Asia Pacific.
    World Economic Forum. (2019). Data Science in the New Economy, Insight Report. http://www3.weforum.org/docs/WEF_Data_Science_In_the_New_Economy.pdf
  • 7. What Drives Demand in Data-related Jobs?
    Competition. Companies that use data to make data-driven decisions will win and take market share from the laggards.
    Almost every industry shows growing demand for data-related skills (WEF report), and this demand is expected to keep growing over the next several years. Demand is growing because data can be used to extract value.
  • 8. "Wait, what is data?" It is just a collection of meaningless, raw facts.
  • 9. DIKW Pyramid
    ● Data: raw facts, unprocessed, unorganized
    ● Information: organized, processed data; meaningful
    ● Knowledge: contextual; a mix of values and experiences
    ● Wisdom: evaluated understanding; integrated knowledge
    Data is only valuable if we can extract value from it, by processing it to create information, knowledge, and wisdom.
    "Yeah, ok. But this concept is too abstract. What is data? What value can we get from exploiting it? How can we extract value from it?"
  • 10. Types of Data
    ● Structured vs. unstructured
    ● Quantitative vs. qualitative
    ● Discrete vs. continuous
    ● Nominal vs. ordinal
    ● Binary vs. multi-class
    Data is just random facts. To get value, data must be processed. Wait, but how do we get the data that we need?
    UNSW Sydney. (2020). Types of data and scales of measurement. https://studyonline.unsw.edu.au/blog/types-of-data
    Allen, Richard. What are the types of big data? https://www.selecthub.com/big-data-analytics/types-of-big-data-analytics/
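Here is a minimal pandas sketch of the nominal vs. ordinal distinction. It assumes pandas is installed. The column names and values are made-up illustrations and are not part of the workshop dataset.

```python
import pandas as pd

# Hypothetical toy data: 'major' is nominal (no natural order),
# 'severity' is ordinal (ordered categories), 'age' is quantitative.
df = pd.DataFrame({
    "major": ["Biomedical Engineering", "Electrical Engineering", "Electrical Engineering"],
    "severity": ["mild", "severe", "moderate"],
    "age": [21, 22, 23],
})

# Nominal: an unordered categorical
df["major"] = pd.Categorical(df["major"])

# Ordinal: a categorical with an explicit order
df["severity"] = pd.Categorical(
    df["severity"], categories=["mild", "moderate", "severe"], ordered=True
)

print(df.dtypes)
# Ordered categories support comparisons such as "greater than mild"
print(df["severity"] > "mild")
# Integer codes follow the declared order: mild=0, moderate=1, severe=2
print(df["severity"].cat.codes.tolist())
```

Declaring the order explicitly keeps it from being inferred alphabetically. That matters later, when ordinal data is turned into numbers.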
  • 11. Data Science Hierarchy of Needs
    Top-of-the-pyramid products (AI, deep learning, A/B testing, etc.) can only be built on top of a strong foundation.
    https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
  • 12. Analytic Ascendancy Model
    Data analytics has different levels based on difficulty and value: descriptive, diagnostic, predictive, and prescriptive. These are the kinds of value that we want to get from data.
  • 13. Recap
    ● Data-related skills are growing in demand, and supply is inadequate. Opportunities everywhere!
    ● Demand is growing because value can be extracted from data, giving an upper hand to those who practice data-driven decision making.
    ● Data is just a collection of meaningless, raw facts. Data needs to be processed to yield useful information.
    ● Data comes in different types, which require different approaches to process.
    ● Data science has a hierarchy of needs: a strong foundation in the data environment is needed before value can be extracted.
    ● Value from data has different levels based on impact and difficulty.
    For the sake of efficiency, we will jump directly to predictive analysis. Suppose we have already collected a dataset. There are three steps left before we can extract predictive value from it: (1) exploratory data analysis; (2) feature engineering; and (3) modelling.
  • 16. Goals of EDA
    ● Look at the data before making any assumptions: size, number of columns, data types
    ● Understand the context
    ● Look at the data distribution and identify outliers
    ● Get a descriptive understanding (centrality, variability)
    ● Analyze correlations
    ● And many more!
  • 18. Heart Disease Classification
    Download the dataset from Kaggle: https://www.kaggle.com/ronitf/heart-disease-uci?select=heart.csv
  • 19. Load CSV to DataFrame (EDA 1)
    Unzip the downloaded data. Open Google Colaboratory, then upload the heart.csv file to session storage. Execute the code snippet below to load the CSV file into a pandas DataFrame.
    EDA tip number 1: Read the relevant information from the data source (readme files, column descriptions) and display your data. You can refer to the UCI archive page for the full documentation of the data (https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The df.head() line displays the first 5 rows of your data.

        import pandas as pd

        file_name = "heart.csv"
        df = pd.read_csv(file_name)
        df.head()  # show the first 5 rows
  • 20. Observing the Table and Reading the Docs
    Display the data, then look at the column names and data types. Browse the documentation. For this heart disease dataset it can be found on the university's archive page: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
    After reading the docs and viewing the table, you will see that this dataset has 13 feature columns and 1 target column. The objective of this predictive analysis is to predict the target value from the feature values.
  • 21. Data Shape, Types, and Non-null Count (EDA 2)
    Next, check three things: the size of your data, the exact data type of each column, and whether the dataset contains empty (null) data points. pandas provides easy-to-use functions to do this in a few lines of code. If you have null values, you need an extra step to impute or otherwise handle them.

        print(df.shape)  # (303, 14): 303 rows, 14 columns
        df.info()        # data type and non-null count of each column

    All your data is numerical, with no null values.
  • 22. Descriptive Statistics (EDA 3)
    Next, descriptive statistics help you understand the centrality and variability of each numerical feature.

        df.describe()  # count, mean, std, min, quartiles, and max per column
  • 23. Data Visualization (EDA 4)
    Freely explore the data. Use data visualization to make your exploration more intuitive. The example below plots age against thalach (maximum heart rate achieved), colored by target.

        import matplotlib.pyplot as plt

        fig, ax = plt.subplots(figsize=(6, 6))
        colors = {1: 'red', 0: 'blue'}
        # Draw each target class as its own scatter series
        for key, group in df.groupby('target'):
            group.plot(ax=ax, kind='scatter', x='age', y='thalach',
                       label=key, color=colors[key])
        plt.show()
  • 24. Correlation Analysis (EDA 5)
    Besides visualizing the data, we also need to check the correlation between the features and the target. If a single feature is very strongly correlated with the target, a simple rule on that feature may be enough, making an ML approach inefficient. Several features may also be strongly correlated with each other. In that case, using all of them together adds little information.

        import matplotlib.pyplot as plt
        import seaborn as sns

        # plt.subplots returns (figure, axes), in that order
        fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 10))
        sns.heatmap(df.corr(), annot=True, ax=ax)
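A heatmap gets hard to read as the number of columns grows. The hypothetical helper `correlated_pairs` below is a sketch that lists the feature pairs whose absolute Pearson correlation is at least a threshold. It assumes pandas 1.5 or newer, which added `numeric_only` to `DataFrame.corr`. The 0.8 cut-off and the toy data are arbitrary assumptions and do not come from the workshop.

```python
import numpy as np
import pandas as pd

def correlated_pairs(df: pd.DataFrame, threshold: float = 0.8):
    """Return (col_a, col_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = df.corr(numeric_only=True)
    cols = corr.columns
    pairs = []
    # Scan only the upper triangle so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs

# Hypothetical toy data: 'b' is an exact multiple of 'a', 'c' is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=100)
toy = pd.DataFrame({"a": a, "b": 2 * a, "c": rng.normal(size=100)})
pairs = correlated_pairs(toy)
print(pairs)
```

On heart.csv you would call `correlated_pairs(df.drop(columns="target"))` to check the features against each other only.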
  • 25. Outlier Checking (EDA 6)
    Check for outliers, which may bias the model. A box-and-whisker plot per column makes them easy to spot.

        import matplotlib.pyplot as plt
        import seaborn as sns

        fig, ax = plt.subplots(nrows=1, ncols=len(df.columns), figsize=(20, 5))
        # One box-and-whisker plot per column
        for i, c in enumerate(df.columns):
            sns.boxplot(data=df, y=c, ax=ax[i])
        fig.tight_layout()
  • 26. Linearity and Distribution Check (EDA 7)
    A histogram of each feature shows how its data points are distributed and how skewed the feature is.

        import matplotlib.pyplot as plt
        import seaborn as sns

        fig, ax = plt.subplots(nrows=1, ncols=len(df.columns), figsize=(20, 5))
        # One histogram per column (y=c draws the bars horizontally)
        for i, c in enumerate(df.columns):
            sns.histplot(data=df, y=c, ax=ax[i])
        fig.tight_layout()
  • 28. Goals of Feature Engineering
    Clean and process the data to help analysis/modelling:
    ● Rescale numeric values
    ● Clean missing values (by dropping or imputing data)
    ● Combine multiple features
    ● Encode data (categorical to numerical, numerical to ordinal, etc.)
    ● Handle outliers
    ● And many more!
  • 29. Predictive Machine Learning Model
    The EDA we have done so far only gives descriptive or inferential statistics. To extract more value from data, we move to a higher level of the analytic ascendancy model: predictive analysis. One popular way to do predictive analysis is a machine learning approach. We let the machine learn from the inputs (features) and outputs (labels) we provide, with the goal of finding the underlying rules that transform inputs into outputs.
    Input (features) → ML model → Output (labels)
    In our hands-on exercise, we have 13 features (inputs). From them, we want to predict the output: whether or not the patient has heart disease. Most of the time, we need to process our inputs using feature engineering.
  • 30. Why?
    We are lucky to have clean, non-null, all-numeric data. Sometimes you will need to analyze data from not-so-ideal datasets, which have null values, extreme outliers, or nominal data (e.g. strings). We can't feed such data directly into a machine learning model, so feature engineering becomes an important part of the data science process. Missing values and nominal data are not the only cases. Sometimes we also need to process numerical data: standardize, normalize, threshold, etc. Different machine learning models require different input characteristics. Now we will explore several important feature engineering techniques, and later we will apply some of them to our data.
  • 31. Handling Missing Values (FE 1)
    - Drop: drop rows or columns with missing values. Easy to do, but may cause significant data loss.
    - Numerical imputation: fill with another numerical value, such as 0 or the median (depending on the case).
    - Categorical imputation: fill with another categorical value, such as the most frequent value or a new category (e.g. 'Others').

        # Each line below is an alternative; pick the one that fits your column.
        # Drop rows that contain any missing value
        df = df.dropna()
        # Drop columns that contain any missing value
        df = df[df.columns[df.isnull().mean() == 0]]
        # Impute with 0
        df = df.fillna(0)
        # Impute numeric columns with their median
        df = df.fillna(df.median(numeric_only=True))
        # Impute with a new category
        df = df.fillna('Others')
        # Impute one column with its most frequent value (assign back instead of inplace=True)
        df['column_name'] = df['column_name'].fillna(df['column_name'].value_counts().idxmax())
  • 32. Handling Outliers (FE 2)
    Outlier detection: standard deviation vs. percentile. Outliers can be handled by:
    - Dropping outliers
    - Capping outliers

        column = 'column'  # name of the column to clean

        # Option 1: drop outlier rows using a standard-deviation factor
        factor = 3
        upper_lim = df[column].mean() + df[column].std() * factor
        lower_lim = df[column].mean() - df[column].std() * factor
        df = df[(df[column] < upper_lim) & (df[column] > lower_lim)]

        # Option 2: drop outlier rows using percentiles
        upper_lim = df[column].quantile(.95)
        lower_lim = df[column].quantile(.05)
        df = df[(df[column] < upper_lim) & (df[column] > lower_lim)]

        # Option 3: cap outlier values at the percentiles
        upper_lim = df[column].quantile(.95)
        lower_lim = df[column].quantile(.05)
        df.loc[(df[column] > upper_lim), column] = upper_lim
        df.loc[(df[column] < lower_lim), column] = lower_lim
  • 33. Binning (FE 3)
    Binning makes the model more robust by sacrificing information to create more general (regularized) categories. It can help prevent overfitting, but at the cost of some performance.
    Rençberoğlu, Emre (2019). Fundamental Techniques of Feature Engineering for Machine Learning. https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114
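The slide shows no code for binning, so here is a minimal sketch using `pandas.cut`. The ages, bin edges, and labels are hypothetical choices for illustration. By default, each bin includes its right edge.

```python
import pandas as pd

# Hypothetical ages; the bin edges and labels are illustrative choices
ages = pd.Series([25, 37, 45, 52, 61, 70])
age_group = pd.cut(
    ages,
    bins=[0, 40, 60, 120],  # bins (0, 40], (40, 60], (60, 120]
    labels=["young", "middle-aged", "senior"],
)
print(age_group.tolist())
```

`pandas.qcut` is the quantile-based alternative. Use it when you want bins holding roughly equal numbers of rows instead of fixed edges.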
  • 34. One-hot Encoding (FE 4)
    One-hot encoding turns categorical data into multiple binary numerical columns, one per category.
    Before:
    User ID | Major
    1 | Biomedical Engineering
    2 | Electrical Engineering
    3 | Electrical Engineering
    After:
    User ID | Biomedical Engineering | Electrical Engineering
    1 | 1 | 0
    2 | 0 | 1
    3 | 0 | 1
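The one-hot table above can be reproduced with `pandas.get_dummies`. This sketch encodes only the `Major` column and keeps `User ID` beside it. `dtype=int` asks for 0/1 integers instead of booleans.

```python
import pandas as pd

# The User ID / Major table from the slide
users = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Major": ["Biomedical Engineering", "Electrical Engineering", "Electrical Engineering"],
})

# One binary column per distinct major; on a Series the category names become the headers
dummies = pd.get_dummies(users["Major"], dtype=int)
encoded = pd.concat([users[["User ID"]], dummies], axis=1)
print(encoded)
```
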
  • 35. Scaling (FE 5)
    Scaling rescales numerical data. The two most popular scaling methods are min-max normalization and standardization. Min-max normalization scales all values to the range [0, 1]. Standardization shifts and rescales the values so they have a mean of 0 and a standard deviation of 1. The shape of the distribution does not change.

        # Min-max normalization
        df['normalized'] = (df['value'] - df['value'].min()) / (df['value'].max() - df['value'].min())

        # Standardization
        df['standardized'] = (df['value'] - df['value'].mean()) / df['value'].std()
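For comparison, scikit-learn ships `MinMaxScaler` and `StandardScaler`. This sketch checks them against the manual formulas on a hypothetical four-value column. Note one detail: pandas `.std()` defaults to the sample standard deviation (`ddof=1`), while `StandardScaler` uses the population standard deviation (`ddof=0`), so the two standardized results differ slightly. In practice, fit a scaler on the training split only and reuse it on the test split, so no test information leaks into training.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({"value": [2.0, 4.0, 6.0, 8.0]})  # hypothetical toy column

# Manual formulas from the slide
df["normalized"] = (df["value"] - df["value"].min()) / (df["value"].max() - df["value"].min())
df["standardized"] = (df["value"] - df["value"].mean()) / df["value"].std()

# scikit-learn equivalents, fitted on the same column (they expect 2-D input)
mm = MinMaxScaler().fit_transform(df[["value"]]).ravel()
ss = StandardScaler().fit_transform(df[["value"]]).ravel()

# Min-max results match the manual formula
print(np.allclose(df["normalized"], mm))
# Standardization differs: pandas .std() is ddof=1, StandardScaler is ddof=0
print(np.allclose(df["standardized"], ss))
# Using the population std (ddof=0) in pandas reproduces StandardScaler
pop_standardized = (df["value"] - df["value"].mean()) / df["value"].std(ddof=0)
print(np.allclose(pop_standardized, ss))
```
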
  • 38. Generally, there are two kinds of prediction:
    - Regression: the predicted value is a continuous numerical value. Performance is measured by error.
    - Classification: the predicted value is categorical. Performance is measured by accuracy.
    https://www.javatpoint.com/regression-vs-classification-in-machine-learning
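Here is a small NumPy sketch that makes "error" and "accuracy" concrete, using hypothetical predictions. Mean absolute error and mean squared error are two common regression error measures. Accuracy is the fraction of correctly predicted classes.

```python
import numpy as np

# Regression: continuous targets, scored by an error measure (lower is better)
y_true_reg = np.array([3.0, 5.0, 2.5])
y_pred_reg = np.array([2.5, 5.0, 3.0])
mae = np.mean(np.abs(y_true_reg - y_pred_reg))  # mean absolute error
mse = np.mean((y_true_reg - y_pred_reg) ** 2)   # mean squared error

# Classification: categorical targets, scored by accuracy (higher is better)
y_true_clf = np.array([1, 0, 1, 1])
y_pred_clf = np.array([1, 0, 0, 1])
accuracy = np.mean(y_true_clf == y_pred_clf)    # fraction of correct labels

print(mae, mse, accuracy)
```
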
  • 39. Which feature engineering methods suit our needs?
    Machine learning model development is an iterative process of trial and error. We may end up needing to try several different feature engineering methods, but we can make an educated guess for our first trial.
    ● First, we don't need to process binary numerical data.
    ● Second, based on the histograms in the linearity and distribution check, we judge that there are no outliers.
    ● Third, several numerical columns are neither normalized nor standardized. We may need to rescale these columns.
    ● Lastly, there are no missing values or categorical values in the data.
    The choice of feature engineering depends heavily on which machine learning algorithm we'll use. So, let's jump to the last phase of this workshop: picking our machine learning model!
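One possible first trial is sketched below with a hypothetical helper. It standardizes only the non-binary feature columns and leaves binary columns and the target untouched. The toy rows reuse a few heart.csv column names, but the values are made up. Note that the "more than two distinct values" rule would also rescale small ordinal codes, such as a chest-pain-type column. Whether that is desirable is a judgment call.

```python
import pandas as pd

def standardize_non_binary(df: pd.DataFrame, target: str = "target") -> pd.DataFrame:
    """Standardize feature columns with more than two distinct values;
    leave binary columns and the target column untouched."""
    out = df.copy()
    for col in out.columns:
        if col == target:
            continue
        if out[col].nunique() > 2:  # binary (0/1) columns are skipped
            out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out

# Hypothetical rows: column names follow heart.csv, values are made up
toy = pd.DataFrame({
    "age": [40, 50, 60, 70],
    "sex": [1, 0, 1, 0],
    "chol": [200, 220, 240, 260],
    "target": [0, 1, 0, 1],
})
scaled = standardize_non_binary(toy)
print(scaled)
```
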
  • 40. Trivia 101: What is the function on the graph above?
  • 41. Regression Prediction
    Regression maps input to a continuous output variable.
    Main idea: given the regression function $h_\theta(x) = \theta_1 x + \theta_0$, choose $\theta_0$ and $\theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.
    Questions that can be answered by regression:
    ● How expensive is this house?
    ● How many tonnes of product will be delivered next month?
    Example of machine learning regression algorithms:
    ● Linear regression
    Interestingly, an ordinal classification problem can be framed as a regression problem (for example, three classes with ordered severity can be treated as a regression target).
    Src: Machine Learning, Andrew Ng, Stanford
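The "choose θ0 and θ1" step can be done with ordinary least squares. It picks the parameters that minimize the sum of squared differences between hθ(x) and y. The sketch below uses the closed-form solution for one feature. Gradient descent on the same squared-error cost is another common approach. The toy data points are hypothetical.

```python
import numpy as np

def fit_line(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Ordinary least squares for h(x) = theta1 * x + theta0.
    Minimizes sum((h(x_i) - y_i)^2) over the training examples."""
    x_mean, y_mean = x.mean(), y.mean()
    theta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    theta0 = y_mean - theta1 * x_mean
    return float(theta0), float(theta1)

# Hypothetical training examples lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
theta0, theta1 = fit_line(x, y)
print(theta0, theta1)  # recovers the intercept 1 and slope 2
```
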
  • 42. Classification Prediction
    Classification maps input variables to the probabilities of output classes. Classification may be binary or multi-class.
    Questions that can be answered by classification:
    ● What animal is this?
    ● What kind of disease is this?
    Examples of machine learning classification algorithms:
    ● Logistic regression
    ● Naive Bayes classification
    ● k-Nearest Neighbours
    ● Decision Tree
    ● Random Forest
    Interestingly, a classification algorithm can be used to solve a regression problem by framing it as a multi-class classification problem with many classes, one per range of the target value!
    Src: Machine Learning, Andrew Ng, Stanford
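The sketch below shows how logistic regression maps inputs to class probabilities for a single feature. A linear score is passed through the sigmoid function to get a probability, and the probability is then thresholded at 0.5 to get a class. The parameters are hand-picked, hypothetical values and are not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, theta0, theta1):
    """Probability of the positive class for a one-feature logistic model."""
    return sigmoid(theta1 * x + theta0)

# Hypothetical, hand-picked parameters (not fitted to any data)
theta0, theta1 = -4.0, 2.0
x = np.array([0.0, 2.0, 4.0])
p = predict_proba(x, theta0, theta1)
labels = (p >= 0.5).astype(int)  # threshold the probability to get a class
print(p, labels)
```
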
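  • 43. Logistic regression: accuracy and feature importance

        import matplotlib.pyplot as plt
        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split

        # Split features/target and hold out a test set (test_size=0.2 is an assumed choice)
        df = pd.read_csv("heart.csv")
        X, y = df.drop(columns="target"), df["target"]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=17)

        # max_iter raised because the default may not converge on unscaled features
        logit = LogisticRegression(random_state=17, max_iter=1000)
        logit.fit(X_train, y_train)
        print(accuracy_score(y_test, logit.predict(X_test)))  # accuracy

        # Summarize feature importance via coefficients
        # (only comparable across features on the same scale)
        importance = logit.coef_
        for x, v in zip(X_train.columns, importance[0]):
            print('Feature: {}, Score: {:.5f}'.format(x, v))

        # Plot feature importance
        plt.bar(range(len(importance[0])), importance[0])
        plt.show()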
  • 44. So, how about our heart disease data? It's up to you! Just experiment to find the optimal model. For now, let's frame it as a classification problem.
  • 45. Hands on! Open Google Colaboratory.
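One way to "just experiment" is to compare several of the classifiers listed earlier using cross-validation. This is a sketch, not the workshop's notebook. So that it runs on its own, it uses a synthetic `make_classification` dataset with the same shape as heart.csv (303 rows, 13 features). Model settings are scikit-learn defaults except where commented. Swap in `df.drop(columns="target")` and `df["target"]` to use it on the heart data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as heart.csv (303 rows, 13 features)
X, y = make_classification(n_samples=303, n_features=13, random_state=17)

models = {
    # Scale-sensitive models get a StandardScaler in front of them
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive_bayes": GaussianNB(),
    # Tree-based models do not need feature scaling
    "decision_tree": DecisionTreeClassifier(random_state=17),
    "random_forest": RandomForestClassifier(random_state=17),
}

scores = {}
for name, model in models.items():
    # Mean 5-fold cross-validated accuracy: less noisy than a single train/test split
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Putting the scaler inside a pipeline means it is refit on each training fold, so the held-out fold never influences the scaling.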
  • 46. References
    [1] Patil, Prasad (2018). What is Exploratory Data Analysis? https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15
    [2] Rençberoğlu, Emre (2019). Fundamental Techniques of Feature Engineering for Machine Learning. https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114
  • 47. Contributors
    Ramadhita Umitaibatin, Biomedical Engineering (Teknik Biomedis) ITB 2017, @ramadhitau, Ramadhita Umitaibatin (LinkedIn)
    ● Data Analyst Intern at Moving Walls (Apr-Jul 2021)
    ● Researcher Intern at NCIRI (Jul-Sep 2020)
    ● Backend Developer Intern at Bangunindo (Dec 2019-Jan 2020)
  • 48. Thank you~ For further inquiries, please don't hesitate to contact me :)
    CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik.