2. Wait, who am I?
If you don't know someone, you can't care about them;
if you don't care, you won't ask;
if you don't ask, you won't know.
-- Vincent Tatan --
3. Proprietary + Confidential
Meet Vincent
Safe Browsing Analyst (Machine Learning)
Google Trust & Safety
Medium: towardsdatascience.com/@vincentkernn
LinkedIn: linkedin.com/in/vincenttatan/
Data Podcast: https://datacast.simplecast.com/
4. Path to Google
● B.Sc., Management Information Systems and Services (Aug 2013 - Jul 2017)
● Lazada Group: Data Scientist Intern (Dec 2016 - Apr 2017)
● Visa: Data & Architecture Engineer (Jun 2017 - Aug 2019)
● Google: Data Analyst, Machine Learning (Aug 2019 - Present)
7. Trust and Safety
Protect more than 3 billion devices worldwide
1. Google notifies your browser to protect you from phishing and malware.
2. Using machine learning-based detection, we contributed to 99.9% accuracy in spam detection.
3. So if you see this, beware!
13. [Diagram: Unlabeled Training Data; Labeled Training Data; Unseen Test Data]
Unsupervised Learning: No labeled data. The goal is to find patterns and insights.
Supervised Learning: The most common learning scenario. The model learns from labeled training data and is checked on unseen test data.
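To make the contrast between the two learning types concrete, here is a minimal sketch in plain Python. It is not from the talk: the toy data (number of suspicious links per message), the function names, and the choice of a nearest-centroid classifier and a two-cluster k-means are all assumptions made for illustration.

```python
from statistics import mean

def fit_centroids(points, labels):
    """Supervised: use the labels to compute one mean (centroid) per class."""
    return {c: mean(p for p, l in zip(points, labels) if l == c)
            for c in sorted(set(labels))}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two groups without any labels."""
    lo, hi = min(points), max(points)  # initial cluster centers
    for _ in range(iterations):
        near_lo = [p for p in points if abs(p - lo) <= abs(p - hi)]
        near_hi = [p for p in points if abs(p - lo) > abs(p - hi)]
        lo, hi = mean(near_lo), mean(near_hi)
    return lo, hi

# Made-up feature: number of suspicious links in a message.
train_x = [0, 1, 1, 8, 9, 10]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

centroids = fit_centroids(train_x, train_y)  # learns from labeled data
prediction = predict(centroids, 7)           # applied to an unseen test point
centers = two_means(train_x)                 # same data, labels ignored
```

The supervised part needs `train_y` to learn anything, while `two_means` only finds structure (two groups) and cannot name them "ham" or "spam" on its own.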
14. [Diagram: Labeled Training Data + Unlabeled Training Data; Unseen Test Data]
Semi-Supervised Learning: Uses both labeled and unlabeled training data.
Why? The unlabeled training data is assumed to follow the same distribution as the labeled data, so it still carries useful information.
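One common way to use unlabeled data is self-training (pseudo-labeling). The sketch below is an illustration under assumptions, not the method used in the talk: the data, the function names, and the nearest-centroid model are hypothetical. It relies on the idea stated above, that labeled and unlabeled points come from the same distribution.

```python
from statistics import mean

def fit(data):
    """Compute one centroid per class from (value, label) pairs."""
    classes = sorted({y for _, y in data})
    return {c: mean(x for x, y in data if y == c) for c in classes}

def nearest(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def self_train(labeled, unlabeled, rounds=3):
    """Self-training: pseudo-label the unlabeled points with the current
    model, then refit on labeled + pseudo-labeled data."""
    centroids = fit(labeled)
    for _ in range(rounds):
        pseudo = [(x, nearest(centroids, x)) for x in unlabeled]
        centroids = fit(labeled + pseudo)
    return centroids

labeled = [(1, "ham"), (9, "spam")]       # only two labeled examples
unlabeled = [0, 2, 3, 4, 8, 10, 11, 12]   # cheap, unlabeled examples

supervised_only = fit(labeled)
semi_supervised = self_train(labeled, unlabeled)

# The unlabeled points shift the centroids, which changes the decision
# for a borderline test point.
before = nearest(supervised_only, 5.5)
after = nearest(semi_supervised, 5.5)
```

With only two labeled points the boundary sits at 5; after self-training the centroids move to 2 and 10, so the boundary moves to 6 and the borderline point changes class.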
23. Focus Work: The Cycle of a Data Project
Policy Making
● Generate insights from escalations
● Conduct EDA
● Create preliminary un/supervised models
Escalation
● Take action in case of phishing/SE attacks
● Analyse reports and detect causes
● Create data dashboards to understand impacts
Automation (ML, DNN)
● Create deep machine learning models
● Research and analyse effectiveness
● Deployment & governance
26. ML Pipeline
1. Data Collection + Preprocessing
2. Model Training and Evaluation
3. Machine Learning Operations (MLOps)
27. Data Collection
More data beats smarter algorithms
1. But collecting more data is not always practical.
2. Data is expensive: collecting labels costs money and time.
3. Big data might be overkill.
28. Model Training
Based on different use cases
1. Regression: n-dim polynomial?
2. Classification: decision tree, logistic regression, SVM
3. Each algorithm has its own characteristics:
a. Susceptibility to outliers
b. Explainability
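The point about outliers can be illustrated with a small regression sketch. This is an assumption-laden example rather than anything shown in the talk: the data is made up, and ordinary least squares is used simply because it is a well-known model that is sensitive to outliers while staying easy to explain (one slope, one intercept).

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]   # exactly y = 2x
noisy_ys = [2, 4, 6, 8, 40]   # same data with one outlier at x = 5

clean_slope, clean_intercept = fit_line(xs, clean_ys)
noisy_slope, noisy_intercept = fit_line(xs, noisy_ys)

print(f"clean fit: y = {clean_slope}x + {clean_intercept}")
print(f"with one outlier: y = {noisy_slope}x + {noisy_intercept}")
```

A single bad point moves the slope from 2 to 8, which is the kind of behavior to check for when choosing an algorithm for a given data set.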
29. Model Evaluation
Is it useful?
1. Regression: Root Mean Squared Error (RMSE)
2. Classification: confusion matrix, AUC, precision, recall, F1
3. Complexity, explainability, latency (time and space)
4. Eager/lazy learners
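The metrics named above can be computed by hand in a few lines. The following is a minimal sketch with made-up labels and predictions; the function names are hypothetical, and in practice a library such as scikit-learn would normally be used instead.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root Mean Squared Error for regression."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Regression example: every prediction is off by 2.
regression_error = rmse([10, 20, 30, 40], [12, 18, 32, 38])

# Classification example: 1 = spam, 0 = not spam (made-up data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, fp, fn, tn = confusion(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Precision asks how many flagged items were truly spam, recall asks how much of the spam was caught, and F1 balances the two.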
30. ML Ops
Operating real ML for real use cases
1. Model Push
2. Model Validation
3. Monitoring/Anomaly Detection
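As a rough sketch of what validation and monitoring can look like in code, the example below gates a model push on a score comparison and flags a monitored metric with a simple z-score rule. The function names, thresholds, and accuracy values are hypothetical; real MLOps setups are more involved than this.

```python
from statistics import mean, stdev

def validate_before_push(candidate_score, production_score, min_gain=0.0):
    """Model validation: only push a candidate that scores at least
    as well as the current production model (plus an optional margin)."""
    return candidate_score >= production_score + min_gain

def is_anomaly(history, latest, threshold=3.0):
    """Monitoring: flag a metric that lies more than `threshold`
    standard deviations away from its recent mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Made-up daily accuracy of a deployed model.
daily_accuracy = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91]

push_ok = validate_before_push(candidate_score=0.93, production_score=0.91)
push_blocked = validate_before_push(candidate_score=0.90, production_score=0.91)
normal_day = is_anomaly(daily_accuracy, 0.91)
bad_day = is_anomaly(daily_accuracy, 0.70)
```

A sudden drop to 0.70 is far outside the usual range and is flagged, while 0.91 is treated as normal variation.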
33. Analytics / ML Trend
How does analytics enter (and seep into) our lives?
34. Analytics Development in Indonesia
2014: 4G technology is out in Indonesia
First time 4G is out in Indonesia. High smartphone adoption with a large digital market.
2015: Large growth of “unicorn” tech from now
Gojek launches Android and iOS apps for 4 services: transportation, courier, and shopping. Traveloka becomes the leading choice for online flight and hotel bookings.
2016: UI opens the first big data curriculum
The first analytics curriculum opened by tertiary institutions. The expansion of Jakarta Smart City, supported by Ahok, for government and startup collaboration.
2018-2019: Strong political support for Indonesia's tech-education movement
Nadiem was appointed as the Minister of Education. Merdeka Hackathon and Merdeka Belajar are held. Sandiaga Uno and Anies greatly support these initiatives.
2020 (now): Coronavirus
The coronavirus affects livelihoods while promoting the attractiveness of the data science / IT sector.
2021/next: Data is the new electricity
The unity of political movements, businesses, startups, and capital and labour to boost the data analytics movement in Indonesia. The coming of 5G in Indonesia.
Themes: Infrastructure & Business Growth; Education & Political Support; Crisis & Innovation Movement
39. How can you excel in ML?
I’m super excited! What’s next?
41. Contribute more!
Data science keeps expanding, so share your knowledge
1. Read more: Keep experimenting with your learning styles (kinesthetic, auditory, visual).
2. Write more: Write articles and share them!
3. Speak more: Teach your peers or speak at conferences!
42. Generalize and Specialize
1. Strength: Use your biggest strengths.
2. Communicate: Communicate your strengths and impact more.
3. Learn in T: SQL & Python/R are the breadth, then domain knowledge is your depth.
Read Deep Work
Follow it to the T
43. Smile!
Data Science is Fun
1. Play: Data science is tough, so have fun.
2. Hack: Use Saturdays to learn with friends.
3. Celebrate impact: Data science is about building impact. Start small and celebrate!
Welcome the audience
Introduce yourself
Tell them broadly what you are going to talk about
Transition to video
5 real-world examples
4 Google products
The material does not need to go too deep; an overview is enough. Because this is an intro to machine learning and the participants are beginners, the content should cover roughly:
1. What is machine learning?
2. Its uses
3. Types (supervised, unsupervised)
4. Supervised and unsupervised algorithms
5. Example applications of each algorithm
6. Workflow of a machine learning project (e.g. data preprocessing, modelling, tuning, deployment, monitoring)
7. Which libraries are used most often
8. Which basic skills need to be prepared
ML has already made a huge impact in the world, especially in science and health care. ML is impacting almost every industry, from manufacturing to sales and marketing, and from agriculture to astronomy.
The simple code I am going to talk about uses this material from Google Colab.
In case you don’t know what Google Colab is, it is an impressive tool that lets you use a GPU for free in an interactive notebook environment.
So if you want to run your machine learning model quickly using TensorFlow, Keras, and more, but you don’t want to invest a lot, you can come to this environment. It is easy.
If you are still unsure, let me know. For now, just know that we are using this training tutorial as our simple intro to CNNs.
In agriculture: In dairy farming, a cow’s health is vital to the survival of the business. Connecterra, a company in the Netherlands, wondered whether it could use machine learning to keep cows healthy by tracking their behavior and giving farmers and veterinarians insights on actions to take to ensure happy, healthy cows with higher yields.
So now, happy cows come not only from California but also from the Netherlands.
Google Maps has created Street View-style visual guides for step-by-step directions overlaid onto the real world, as viewed through the smartphone camera. Further, Google plans to integrate its Assistant, equipped with the computer vision platform Google Lens, into Maps. That way, you’ll be able to pan over a city street and see pop-ups highlighting restaurants and other locations in real time.
Google now offers offline downloads for its AI-powered translator. So if you don’t have unlimited data, or your plan doesn’t work internationally, you can now download neural machine translation in Google’s Android and iOS apps.
Google Translate’s offline AI translations will first be available in 59 languages, including English, Arabic, Chinese, German, and Hindi, to name a few. They’ll take about 35MB per language, so they won’t use up too much of your device’s storage. Lower-specced phones should also be able to support the new update, as Google says it wants users in all markets to have access to the feature.
Now that we are aware of all the resources let’s understand the framework for building ML models.