HOW TO BECOME A
“REAL” DATA SCIENTIST
(TH, 2017)
1+1=?
1+1=2
?
25÷5=?
25÷5=14
Long division:
   14
5 / 25
     5 -
    20
    20 -
     0
Check by multiplication:
   14
    5 ×
   20
    5 +
   25
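One reading of the trick in these slides, offered as an assumption since the deck does not spell it out: both the division and the check treat the digit 1 in 14 as one unit instead of one ten. A minimal Python sketch of that place-value slip:

```python
# Split the claimed quotient 14 into its digits.
tens, ones = divmod(14, 10)              # tens = 1, ones = 4

# The slide's check multiplies digit by digit and ignores place value.
slide_check = 5 * ones + 5 * tens        # 20 + 5

# Respecting place value, the tens digit contributes 5 * 10.
real_product = 5 * ones + 5 * tens * 10  # 20 + 50

print(slide_check, real_product, 25 // 5)  # prints: 25 70 5
```

Every step looks plausible on its own, which is why the next slides keep asking "Why?".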
THE EARTH IS…
ROUND VS FLAT
???
Why? Why? Why?
A SCIENTIST IS "KAYPOH" (NOSY, ALWAYS ASKING WHY)!!
WIKIPEDIA:
A scientist is a person engaging in a systematic
activity to acquire knowledge that describes and
predicts the natural world.
A DATA scientist: natural world → data.
Data: a set of values of qualitative or
quantitative variables.
DATA SCIENTIST
REAL WORLD (DATA ANALYST), ex: Text Mining
  → Formulation →
MATHEMATICAL MODEL (MODELER), ex: Latent Dirichlet Allocation
  → Mathematical Analysis →
MATHEMATICAL RESULTS (PROGRAMMER), ex: Topic Model
  → Interpretation →
REAL WORLD (DATA ANALYST)
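To make the text mining example concrete, here is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, in plain Python. The toy corpus, the function name `lda_gibbs`, and the hyperparameter values are all assumptions for illustration; the deck names the model but does not say how it was fitted.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA; returns (doc-topic mix, topic-word counts)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # z[d][i] is the topic currently assigned to word i of document d.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the word's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word

# Hypothetical toy corpus: two documents about motorcycles, two about financing.
docs = [
    "honda vario scooter engine".split(),
    "scooter engine honda dealer".split(),
    "credit installment leasing payment".split(),
    "leasing credit payment installment".split(),
]
theta, topic_word = lda_gibbs(docs)
for k, counts in enumerate(topic_word):
    print("topic", k, [w for w, _ in counts.most_common(3)])
```

The real world step is the corpus, the mathematical model is LDA, the mathematical result is the fitted topic model, and interpreting the top words per topic closes the loop.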
RESEARCH METHODOLOGY
PROBLEMS
SOLVE
QUALITATIVE METHOD
QUANTITATIVE METHOD
TRIGGERS
DATA
DATA SCIENTIST
√
×
APPROACHES
STATISTICS
- Population VS Sample
- Confidence
MACHINE LEARNING
- Training VS Testing
- Accuracy
(SAMPLE) DATA ≠ (BIG) DATA
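The contrast between the two approaches can be sketched side by side. This is an illustration on synthetic numbers, not the author's material: the population, the sample size, and the threshold "model" are all assumptions.

```python
import random
import statistics

rng = random.Random(42)

# STATISTICS: we see only a sample and state our confidence about the population.
population = [rng.gauss(100, 15) for _ in range(10_000)]  # hypothetical population
sample = rng.sample(population, 200)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / len(sample) ** 0.5       # standard error of the mean
z = statistics.NormalDist().inv_cdf(0.975)                # about 1.96 for 95 %
ci = (mean - z * sem, mean + z * sem)

# MACHINE LEARNING: we fit on training data and report accuracy on testing data.
data = [(x, int(x > 100)) for x in population[:1000]]
rng.shuffle(data)
train, test = data[:800], data[800:]
# Toy "model": a cut-off halfway between the two class means of the training set.
m0 = statistics.mean(x for x, y in train if y == 0)
m1 = statistics.mean(x for x, y in train if y == 1)
threshold = (m0 + m1) / 2
accuracy = sum(int(x > threshold) == y for x, y in test) / len(test)

print(f"95% CI for the mean: {ci[0]:.1f} to {ci[1]:.1f}")
print(f"test accuracy: {accuracy:.2f}")
```

The first half answers "how sure are we about the population?", the second "how well does the model do on data it has not seen?".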
VARIABLES
- Measurable
  - Categorical: Nominal, Ordinal
  - Numerical: Interval, Ratio
- Latent: Likert, Thurstone, Semantic Differential
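One practical consequence of the measurement scale is which summary statistic makes sense. A minimal sketch, assuming the common textbook pairing (mode for nominal, median for ordinal, mean for interval and ratio); the mapping `SUMMARY_BY_SCALE` and the sample values are hypothetical.

```python
import statistics

# Assumed pairing of measurement scale and a summary that is meaningful for it.
SUMMARY_BY_SCALE = {
    "nominal": statistics.mode,         # categories without order
    "ordinal": statistics.median_low,   # ordered categories
    "interval": statistics.mean,        # equal spacing, arbitrary zero
    "ratio": statistics.mean,           # equal spacing, true zero
}

def summarize(values, scale):
    """Summarize values with a statistic suited to their measurement scale."""
    return SUMMARY_BY_SCALE[scale](values)

colors = ["HITAM", "MERAH", "HITAM", "PUTIH"]  # nominal; hypothetical values
spending_band = [1, 3, 3, 5, 7]                # ordinal; hypothetical band codes
installment = [500_000, 600_000, 700_000]      # ratio; hypothetical amounts

print(summarize(colors, "nominal"))            # prints: HITAM
print(summarize(spending_band, "ordinal"))     # prints: 3
print(summarize(installment, "ratio"))         # prints: 600000
```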
NOTES:
- Big data analytics needs a CLEAR definition of variables.
- Data cleansing is a MUST!!
- Garbage in, garbage out.
Now assume that you have a
cleansed big data set...
- Describe the data using visualization or other appropriate
  measures.
- Define the problem.
  - Supervised VS Unsupervised
  - Balanced VS Unbalanced
  - Cross-section VS Time-Series VS Panel
  - Prediction: Estimation VS Forecasting
  - Improvement: Accuracy VS Insight
- Modeling.
  - Expertise
  - Econometric
  - AI
  - Hybrid
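The "Balanced VS Unbalanced" question in the checklist above can be answered with a simple count before any modeling. A minimal sketch; the function `class_balance` and the label values are assumptions for illustration.

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share and the ratio of largest to smallest class."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio

# Hypothetical labels: nine credit sales for every cash sale.
labels = ["CREDIT"] * 9 + ["CASH"] * 1
shares, ratio = class_balance(labels)
print(shares, ratio)  # prints: {'CREDIT': 0.9, 'CASH': 0.1} 9.0
```

With a ratio this far from 1, plain accuracy is misleading: always predicting the majority class would already score 90 %.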
STATISTICAL LEARNING FLOWCHART
[Figure: flowchart. Boxes: Data; Training set, Validation set, Test set; Features Selection; Clustering; Individual classification algorithm and Homogeneous ensemble algorithm (Train classifier); Classification models (Apply model); Validation set predictions; Heterogeneous ensemble algorithm (Train classifier); Ensemble model (Apply model); Test set prediction; Estimated Value.]
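The split into training, validation and test sets in the flowchart can be sketched in plain Python. This is a simplified stand-in, not the flowchart's exact procedure: one-feature threshold classifiers play the individual algorithms, a majority vote plays the ensemble, and the data are synthetic. All names are hypothetical.

```python
import random
from collections import Counter

def split(rows, rng, frac=(0.6, 0.2)):
    """Shuffle and cut rows into training, validation and test sets."""
    rows = rows[:]
    rng.shuffle(rows)
    a = int(len(rows) * frac[0])
    b = a + int(len(rows) * frac[1])
    return rows[:a], rows[a:b], rows[b:]

def train_stump(train, feature):
    """Individual classifier: cut one feature halfway between the class means."""
    def mean(xs):
        return sum(xs) / len(xs)
    m0 = mean([x[feature] for x, y in train if y == 0])
    m1 = mean([x[feature] for x, y in train if y == 1])
    cut, high = (m0 + m1) / 2, int(m1 > m0)
    return lambda x: high if x[feature] > cut else 1 - high

def majority_vote(models):
    """Ensemble: predict the class most of the member models agree on."""
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
# Synthetic data: three noisy features, label 1 when their sum is positive.
rows = []
for _ in range(500):
    x = [rng.gauss(0, 1) for _ in range(3)]
    rows.append((x, int(sum(x) > 0)))

train, valid, test = split(rows, rng)
stumps = [train_stump(train, f) for f in range(3)]
candidates = {f"stump_{f}": s for f, s in enumerate(stumps)}
candidates["ensemble"] = majority_vote(stumps)

# Validation set: choose between the candidates. Test set: report once.
best = max(candidates, key=lambda name: accuracy(candidates[name], valid))
test_accuracy = accuracy(candidates[best], test)
print(best, round(test_accuracy, 2))
```

The test set is touched only once, after the choice has been made on the validation set, so the reported accuracy is not flattered by the selection step.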
PLEASE, DON'T JUST "NGE-LIB" (RELY ON LIBRARIES)!!
OPTIMAL INDIVIDUAL SALES
ALLOCATION & FORECASTING
ASTRA HONDA MOTOR
- METRA DIGITAL MEDIA -
Variable | Description | Value
ROW_ID | Row ID | NUMERIC
MAIN_PARTNER | Referral ID number from Astra World (AWO) | NUMERIC
FRAME_NO | Frame number of the customer's motorcycle | TEXT
CUST_ID | Customer ID number taken from the KTP/SIM (ID card/driving licence) | TEXT
SALES_DATE | Date the Honda motorcycle was purchased | DATE (YYYY-MM-DD HH:MM:SS)
KODE_MESIN | Engine code; each motorcycle type has its own engine code | 75 NOMINAL {JF81E, ...}
SEQUENCE_MESIN | Sequence of the engine code | NUMERIC
VARIAN_MOTOR | Variant of the customer's motorcycle | 76 NOMINAL {ALL NEW VARIO, …}
COLOR | Color of the customer's motorcycle | 73 NOMINAL {HITAM, …}
KODE_CUSTOMER | Customer type | {INDIVIDUAL, COLLECTIVE, GROUP, JOINT PROMO}
JENIS_KELAMIN | Customer's gender | {LAKI-LAKI, PEREMPUAN}
TANGGAL_LAHIR | Customer's month and year of birth | DATE (MM/YYYY)
KELURAHAN_SURAT | Kelurahan (urban village) of the customer's mailing address | 1251 NOMINAL {KETEWEL, …}
KECAMATAN_SURAT | Kecamatan (district) of the customer's mailing address | 120 NOMINAL {SUKAWATI, …}
KOTA_SURAT | City of the customer's mailing address | 30 NOMINAL {KAB. GIANYAR, …}
KODE_POS | Postal code of the customer's mailing address | NUMERIC
PROPINSI | Province of the customer's mailing address | 8 NOMINAL {BALI, …}
STATUS_RUMAH | Customer's housing status | {RUMAH SENDIRI, RUMAH SEWA, RUMAH ORANG TUA/KELUARGA}
JENIS_PENJUALAN_STNK | Sale type when the invoice is issued (actually sold) | {CASH, CREDIT}
JENIS_PENJUALAN_SSU | Sale type at the time of the deal; may change at the transaction | {CASH, CREDIT}
NAMA_LEASING_COMPANY | Name of the leasing company handling the customer's installments | TEXT
BESAR_DP | Down payment (DP) amount paid by the customer | TEXT
BESAR_CICILAN | Monthly installment amount | NUMERIC
LAMA_CICILAN | Installment term until paid off (months) | NUMERIC
AGAMA | Customer's religion | {HINDU, KRISTEN, ISLAM, KATOLIK, LAIN-LAIN, BUDHA}
PEKERJAAN | Customer's occupation | 16 NOMINAL {PEGAWAI SWASTA, …}
PENGELUARAN | Customer's monthly expenditure | {1,2,3,4,5,6,7}
PENDIDIKAN | Customer's highest level of education | {SLTA/SMU, AKADEMI/DIPLOMA, TIDAK TAMAT SD, SD, SLTP/SMP, SARJANA, PASCA SARJANA}
NO_HP | Customer's mobile phone number | TEXT
STATUS_NOMOR_HP | Customer's mobile SIM card type | {PRABAYAR, PASCABAYAR}
NO_TLP | Customer's telephone number | TEXT
KEBERSEDIAAN DIHUBUNGI | Customer's willingness to be contacted again in the future | {YES, NO}
MERK_MOTOR_SBLMNYA | Brand of the customer's previous motorcycle | {HONDA, YAMAHA, SUZUKI, BELUM PERNAH MEMILIKI, KAWASAKI, MOTOR LAIN}
TYPE_MOTOR_SBLMNYA | Type of the customer's previous motorcycle | {AT AUTOMATIC, CUB BEBEK, SPORT, BELUM PERNAH MEMILIKI}
SMH_DIGUNAKAN_UNTUK | Purpose for which the motorcycle was bought | {LAIN-LAIN, KEBUTUHAN KELUARGA, KE SEKOLAH/ KE KAMPUS, BERDAGANG, PEMAKAIAN JARAK DEKAT, REKREASI / OLAH RAGA, BEKERJA}
YG_MENGGUNAKAN_SMH | Person who will use the purchased motorcycle | {ANAK, LAIN-LAIN, PASANGAN SUAMI ATAU ISTRI, SAYA SENDIRI}
MD | Code of the Main Dealer overseeing the dealer where the customer bought the Honda motorcycle | {N01}
DEALER_CODE | Code of the dealer where the customer bought the Honda motorcycle | 77 NOMINAL {06877, …}
KODE_SALES_PERSON | Code of the salesperson who sold the Honda motorcycle to the customer | 1718 NOMINAL {218595, …}
TGL_MASUK_DATA | Date the data entered AHM from the MD | DATE (YYYY-MM-DD HH:MM:SS)
STATUS_VALIDASI | Validation flag from the MD indicating whether the related CDB data row has been validated | {1,2}
UPLOADED_ON | Date the data entered AWO from AHM | DATE (YYYY-MM-DD HH:MM:SS)
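The declared types in the data dictionary above suggest what a first cleansing pass has to do, for example parsing SALES_DATE and turning the TEXT field BESAR_DP into a number. A minimal sketch under stated assumptions: the function `clean_row`, the sample rows, and the guess that BESAR_DP holds digits with optional thousands separators are all hypothetical.

```python
from datetime import datetime

SALE_TYPES = {"CASH", "CREDIT"}  # value set listed for JENIS_PENJUALAN_STNK

def clean_row(raw):
    """Return a typed copy of one raw record, or None when it cannot be parsed."""
    try:
        sale_type = raw["JENIS_PENJUALAN_STNK"].strip().upper()
        if sale_type not in SALE_TYPES:
            return None
        return {
            "SALES_DATE": datetime.strptime(
                raw["SALES_DATE"].strip(), "%Y-%m-%d %H:%M:%S"
            ),
            "JENIS_PENJUALAN_STNK": sale_type,
            # ASSUMPTION: digits with optional "," or "." thousands separators.
            "BESAR_DP": int(raw["BESAR_DP"].replace(",", "").replace(".", "").strip()),
            "DEALER_CODE": raw["DEALER_CODE"].strip(),
        }
    except (KeyError, ValueError):
        return None

# Hypothetical raw rows: one repairable, one with a date in the wrong format.
good = {"SALES_DATE": "2016-03-14 10:30:00", "JENIS_PENJUALAN_STNK": " credit ",
        "BESAR_DP": "2,000,000", "DEALER_CODE": "06877"}
bad = {"SALES_DATE": "14/03/2016", "JENIS_PENJUALAN_STNK": "CASH",
       "BESAR_DP": "0", "DEALER_CODE": "06877"}

cleaned = [row for row in map(clean_row, [good, bad]) if row is not None]
print(len(cleaned), cleaned[0]["BESAR_DP"])  # prints: 1 2000000
```

Rows that fail are set aside rather than silently repaired, which keeps the "garbage in, garbage out" risk visible.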
METHODOLOGY
[Diagram: AHM connected to three DEALER nodes. Two paths are marked: ❶ FORECASTING with ALLOCATION, and ❷ FORECASTING with TOTAL.]
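One possible reading of the two numbered paths in the METHODOLOGY diagram, stated here as an assumption because the slides carry only labels: path ❶ forecasts sales at AHM level and then allocates the figure to dealers, while path ❷ forecasts each dealer and then totals the results. The sketch below uses a moving average as a placeholder forecaster and last-period shares as a placeholder allocation rule; neither is taken from the deck, and the dealer names and figures are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Placeholder forecaster: the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Hypothetical sales history per dealer, one value per period.
history = {
    "DEALER_A": [10, 12, 14, 16],
    "DEALER_B": [30, 30, 30, 30],
    "DEALER_C": [5, 7, 6, 8],
}

# Path 1 (assumed): forecast the AHM total, then allocate by last-period share.
total_series = [sum(values) for values in zip(*history.values())]
total_forecast = moving_average_forecast(total_series)
last_total = total_series[-1]
top_down = {
    dealer: total_forecast * series[-1] / last_total
    for dealer, series in history.items()
}

# Path 2 (assumed): forecast every dealer, then add the forecasts up.
bottom_up = {
    dealer: moving_average_forecast(series) for dealer, series in history.items()
}
bottom_up_total = sum(bottom_up.values())

print(total_forecast, bottom_up_total)
print(top_down)
print(bottom_up)
```

Both totals agree here only because a moving average is linear; the dealer-level numbers already differ, and with a nonlinear forecaster the totals would generally differ too.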
DATA PREPARATION
All 42 variables:
NO  VARIABLE                 NO  VARIABLE
1   ROW_ID                   22  BESAR_DP
2   MAIN_PARTNER             23  BESAR_CICILAN
3   FRAME_NO                 24  LAMA_CICILAN
4   CUST_ID                  25  AGAMA
5   SALES_DATE               26  PEKERJAAN
6   KODE_MESIN               27  PENGELUARAN
7   SEQUENCE_MESIN           28  PENDIDIKAN
8   VARIAN_MOTOR             29  NO_HP
9   COLOR                    30  STATUS_NOMOR_HP
10  KODE_CUSTOMER            31  NO_TLP
11  JENIS_KELAMIN            32  KEBERSEDIAAN DIHUBUNGI
12  TANGGAL_LAHIR            33  MERK_MOTOR_SBLMNYA
13  KELURAHAN_SURAT          34  TYPE_MOTOR_SBLMNYA
14  KECAMATAN_SURAT          35  SMH_DIGUNAKAN_UNTUK
15  KOTA_SURAT               36  YG_MENGGUNAKAN_SMH
16  KODE_POS                 37  MD
17  PROPINSI                 38  DEALER_CODE
18  STATUS_RUMAH             39  KODE_SALES_PERSON
19  JENIS_PENJUALAN_STNK     40  TGL_MASUK_DATA
20  JENIS_PENJUALAN_SSU      41  STATUS_VALIDASI
21  NAMA_LEASING_COMPANY     42  UPLOADED_ON
Selected variables (7):
NO  VARIABLE
1   SALES_DATE
2   JENIS_PENJUALAN_STNK
3   KODE_CUSTOMER
4   BESAR_DP
5   BESAR_CICILAN
6   LAMA_CICILAN
7   DEALER_CODE
Final variables (5):
NO  VARIABLE
1   SALES_DATE
2   JENIS_PENJUALAN_STNK
3   KODE_CUSTOMER
4   HARGA_MOTOR
5   DEALER_CODE
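The final variable list swaps BESAR_DP, BESAR_CICILAN and LAMA_CICILAN for a single HARGA_MOTOR (motorcycle price). The deck does not show the formula, so the sketch below assumes the simplest one: down payment plus monthly installment times the number of months. The function `prepare` and the sample row are hypothetical.

```python
FINAL_COLUMNS = [
    "SALES_DATE", "JENIS_PENJUALAN_STNK", "KODE_CUSTOMER",
    "HARGA_MOTOR", "DEALER_CODE",
]

def prepare(row):
    """Reduce one 7-variable row to the 5 final variables."""
    # ASSUMPTION: price = down payment + installment x term (not from the deck).
    price = row["BESAR_DP"] + row["BESAR_CICILAN"] * row["LAMA_CICILAN"]
    enriched = dict(row, HARGA_MOTOR=price)
    return {column: enriched[column] for column in FINAL_COLUMNS}

# Hypothetical credit sale: Rp2,000,000 down, Rp700,000 a month for 24 months.
row = {
    "SALES_DATE": "2016-03-14 10:30:00",
    "JENIS_PENJUALAN_STNK": "CREDIT",
    "KODE_CUSTOMER": "INDIVIDUAL",
    "BESAR_DP": 2_000_000,
    "BESAR_CICILAN": 700_000,
    "LAMA_CICILAN": 24,
    "DEALER_CODE": "06877",
}
result = prepare(row)
print(result["HARGA_MOTOR"])  # prints: 18800000
```

How cash sales should be priced is left open here; the sketch only works if their installment fields are zero and BESAR_DP carries the full amount, which is a further assumption.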
OPTIMAL INDIVIDUAL SALES FORECASTING
OPTIMAL INDIVIDUAL SALES ALLOCATION
[Chart: vertical axis in rupiah, Rp0 to Rp1,600,000,000.]
EXISTING VS PROPOSED ALLOCATION METHODOLOGY
[Chart: Proposed vs Existing; vertical axis in rupiah, Rp0 to Rp3,500,000,000; horizontal axis lists 66 numeric codes (e.g. 6877), presumably DEALER_CODE.]

 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 

DATA SCIENTIST 1.pdf

  • 1. HOW TO BECOME A “REAL” DATA SCIENTIST (TH, 2017)
  • 20. THE EARTH IS… ROUND VS FLAT ???
  • 23. ?
  • 24. ?
  • 26. A SCIENTIST IS KAYPOH (NOSY)!! Why? Why? Why? Why? Why? Why?
  • 27. WIKIPEDIA: A scientist is a person engaging in a systematic activity to acquire knowledge that describes and predicts the natural world.
  • 30. WIKIPEDIA: A [data] scientist is a person engaging in a systematic activity to acquire knowledge that describes and predicts the natural world [→ data]. Data: a set of values of qualitative or quantitative variables.
  • 36. DATA SCIENTIST: REAL WORLD (DATA ANALYST) → Formulation → MATHEMATICAL MODEL (MODELER) → Mathematical Analysis → MATHEMATICAL RESULTS (PROGRAMMER) → Interpretation → REAL WORLD. Ex: Text Mining (real world) → Latent Dirichlet Allocation (model) → Topic Model (results).
  • 49. APPROACHES. STATISTICS: Population VS Sample; Confidence. MACHINE LEARNING: Training VS Testing; Accuracy. (SAMPLE) DATA ≠ (BIG) DATA
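The two approaches on this slide can be contrasted in a short Python sketch. It is only an illustration, not the presenter's code: the simulated data, the normal-approximation interval (z = 1.96), and the midpoint-threshold classifier are all assumptions chosen to keep the example small.

```python
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for a population mean from one sample."""
    centre = statistics.mean(sample)
    std_error = statistics.stdev(sample) / len(sample) ** 0.5
    return centre - z * std_error, centre + z * std_error

def split_train_test(rows, n_test, seed=0):
    """Shuffle the rows and hold out the last n_test rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:-n_test], shuffled[-n_test:]

def fit_threshold(train_rows):
    """Toy classifier: the midpoint between the two class means."""
    mean0 = statistics.mean([x for x, label in train_rows if label == 0])
    mean1 = statistics.mean([x for x, label in train_rows if label == 1])
    return (mean0 + mean1) / 2

def accuracy(rows, threshold):
    """Share of rows whose label matches the rule 'x above threshold'."""
    hits = sum((x > threshold) == bool(label) for x, label in rows)
    return hits / len(rows)

# Hypothetical data: two classes drawn from well-separated normal distributions.
rng = random.Random(42)
rows = [(rng.gauss(0.0, 1.0), 0) for _ in range(200)]
rows += [(rng.gauss(3.0, 1.0), 1) for _ in range(200)]

# Statistics view: infer a population quantity from a sample, with confidence.
low, high = mean_confidence_interval([x for x, _ in rows[:200]])

# Machine-learning view: fit on training rows, report accuracy on unseen rows.
train_rows, test_rows = split_train_test(rows, n_test=100)
threshold = fit_threshold(train_rows)
test_accuracy = accuracy(test_rows, threshold)

print(f"95% CI for the class-0 mean: ({low:.2f}, {high:.2f})")
print(f"test accuracy: {test_accuracy:.2f}")
```

The interval answers "how sure are we about the population?", while the held-out accuracy answers "how well does the rule work on rows it has not seen?".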
  • 54. Variables. Measurable: Categorical (Nominal, Ordinal); Numerical (Interval, Ratio). Latent: Likert, Thurstone, Semantic Differential.
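Why the measurement level matters can be shown with a small Python sketch: the level decides which summary of "typical value" is meaningful. The mapping below follows the usual textbook convention (mode for nominal, median for ordinal, mean for interval and ratio); the function name and the sample values are illustrative assumptions, not taken from the slides.

```python
import statistics

# Which measure of central tendency is meaningful at each level.
CENTRAL_TENDENCY = {
    "nominal": statistics.mode,         # unordered categories: most frequent value
    "ordinal": statistics.median_low,   # ordered categories: middle observed rank
    "interval": statistics.mean,        # equal spacing: arithmetic mean
    "ratio": statistics.mean,           # equal spacing and a true zero
}

def typical_value(level, values):
    """Summarise values with a statistic suited to their measurement level."""
    return CENTRAL_TENDENCY[level](values)

# Hypothetical observations for illustration only.
colour = typical_value("nominal", ["HITAM", "MERAH", "HITAM"])
spending_band = typical_value("ordinal", [1, 2, 2, 5, 7])
instalment_months = typical_value("ratio", [10, 20, 30])
```

Taking a mean of nominal codes would run without error if the codes happened to be numbers, which is exactly why the variable definitions need to be explicit.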
  • 57. NOTES: - Big data analytics needs a CLEAR definition of variables. - Data cleansing is a MUST!! - Garbage in, garbage out.
  • 70. Now assume that you have a cleansed big data set... (1) Describe the data using visualization or other appropriate measurements. (2) Define the problem: Supervised VS Unsupervised; Balanced VS Unbalanced; Cross-section VS Time-Series VS Panel; Prediction: Estimation VS Forecasting; Improvement: Accuracy VS Insight. (3) Modeling: Expertise; Econometric; AI; Hybrid.
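Two of the problem-definition questions in this checklist can be checked directly on the data. The Python sketch below is a hedged illustration: `imbalance_ratio` and `chronological_split` are hypothetical helper names, and the records are made up. It measures how unbalanced a label is, and it splits time-ordered data so that a forecasting model is always tested on dates later than the ones it was fitted on.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def chronological_split(records, n_test, date_key):
    """Hold out the n_test most recent records instead of a random sample."""
    ordered = sorted(records, key=date_key)
    return ordered[:-n_test], ordered[-n_test:]

# Hypothetical labels: 8 credit sales for every 2 cash sales.
labels = ["CREDIT"] * 8 + ["CASH"] * 2
ratio = imbalance_ratio(labels)

# Hypothetical sales records; ISO-style date strings sort chronologically.
records = [
    {"SALES_DATE": "2016-03-01 10:00:00", "units": 3},
    {"SALES_DATE": "2016-01-01 09:30:00", "units": 1},
    {"SALES_DATE": "2016-04-01 14:15:00", "units": 4},
    {"SALES_DATE": "2016-02-01 11:45:00", "units": 2},
]
past, future = chronological_split(records, n_test=1,
                                   date_key=lambda r: r["SALES_DATE"])
```

A random split is fine for cross-section estimation; for forecasting it would leak future information into the training rows.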
  • 71. STATISTICAL LEARNING FLOWCHART [diagram]. Nodes: Data; Features Selection; Clustering; Training set; Validation set; Test set; Train classifier (individual classification algorithm; homogeneous ensemble algorithm); Classification models; Apply model; Validation set predictions; Train classifier (heterogeneous ensemble algorithm); Ensemble model; Apply model; Test set prediction; Estimated Value. PLEASE, DON'T JUST RELY ON LIBRARIES!!
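In the spirit of the slide's closing remark, the core of the flowchart can be written without any machine-learning library. The Python sketch below is a simplification under stated assumptions: the three threshold rules are hypothetical stand-ins for trained individual classifiers, and plain majority voting replaces the flowchart's trained heterogeneous ensemble. Only the three-way split and the "score on validation, report on test" discipline are taken from the diagram.

```python
import random
from collections import Counter

def three_way_split(rows, n_validation, n_test, seed=0):
    """Shuffle once, then cut into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) - n_validation - n_test
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validation],
            shuffled[n_train + n_validation:])

def accuracy(model, rows):
    """Share of rows for which the model predicts the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def majority_vote(models):
    """Combine classifiers by returning the most common prediction."""
    def ensemble(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]
    return ensemble

# Hypothetical data: the true label is 1 when x is at least 5.
rows = [(x, int(x >= 5)) for x in range(10)] * 3
train, validation, test = three_way_split(rows, n_validation=6, n_test=6)

# Hypothetical individual classifiers; in practice each is fitted on `train`.
models = [lambda x: int(x >= 4), lambda x: int(x >= 5), lambda x: int(x >= 6)]

# Individual models are compared on the validation set only.
validation_scores = [accuracy(model, validation) for model in models]

# The test set is touched once, for the final ensemble.
ensemble = majority_vote(models)
test_accuracy = accuracy(ensemble, test)
```

Here the two imperfect rules err in opposite directions, so the vote recovers the correct boundary even though only one member is right everywhere.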
  • 72. OPTIMAL INDIVIDUAL SALES ALLOCATION & FORECASTING ASTRA HONDA MOTOR - METRA DIGITAL MEDIA -
  • 73. Data dictionary (Variable | Description | Value)
      ROW_ID | Row ID | NUMERIC
      MAIN_PARTNER | Referral ID number from Astra World (AWO) | NUMERIC
      FRAME_NO | Frame number of the customer's motorcycle | TEXT
      CUST_ID | Customer ID number taken from the KTP/SIM (national ID card / driver's license) | TEXT
      SALES_DATE | Date the Honda motorcycle was purchased | DATE (YYYY-MM-DD HH:MM:SS)
      KODE_MESIN | Engine code; each motorcycle type has its own engine code | 75 NOMINAL {JF81E, ...}
      SEQUENCE_MESIN | Sequence of the engine code | NUMERIC
      VARIAN_MOTOR | Motorcycle variant owned by the customer | 76 NOMINAL {ALL NEW VARIO, …}
      COLOR | Color of the customer's motorcycle | 73 NOMINAL {HITAM, …}
      KODE_CUSTOMER | Customer type | {INDIVIDUAL, COLLECTIVE, GROUP, JOINT PROMO}
      JENIS_KELAMIN | Customer's gender | {LAKI-LAKI, PEREMPUAN}
      TANGGAL_LAHIR | Customer's month and year of birth | DATE (MM/YYYY)
      KELURAHAN_SURAT | Kelurahan (urban village) of the customer's mailing address | 1251 NOMINAL {KETEWEL, …}
      KECAMATAN_SURAT | Kecamatan (district) of the customer's mailing address | 120 NOMINAL {SUKAWATI, …}
      KOTA_SURAT | City of the customer's mailing address | 30 NOMINAL {KAB. GIANYAR, …}
      KODE_POS | Postal code of the customer's mailing address | NUMERIC
      PROPINSI | Province of the customer's mailing address | 8 NOMINAL {BALI, …}
      STATUS_RUMAH | Customer's housing status | {RUMAH SENDIRI, RUMAH SEWA, RUMAH ORANG TUA/KELUARGA}
      JENIS_PENJUALAN_STNK | Sale type when the invoice is issued (actually sold) | {CASH, CREDIT}
      JENIS_PENJUALAN_SSU | Sale type at the time of the deal; may change at the transaction | {CASH, CREDIT}
      NAMA_LEASING_COMPANY | Name of the leasing company handling the customer's installments | TEXT
      BESAR_DP | Down payment amount paid by the customer | TEXT
      BESAR_CICILAN | Monthly installment amount | NUMERIC
      LAMA_CICILAN | Installment period until paid off (months) | NUMERIC
      AGAMA | Customer's religion | {HINDU, KRISTEN, ISLAM, KATOLIK, LAIN-LAIN, BUDHA}
      PEKERJAAN | Customer's occupation | 16 NOMINAL {PEGAWAI SWASTA, …}
      PENGELUARAN | Customer's monthly expenditure | {1,2,3,4,5,6,7}
      PENDIDIKAN | Customer's highest education level | {SLTA/SMU, AKADEMI/DIPLOMA, TIDAK TAMAT SD, SD, SLTP/SMP, SARJANA, PASCA SARJANA}
      NO_HP | Customer's mobile phone number | TEXT
      STATUS_NOMOR_HP | Customer's mobile SIM card type | {PRABAYAR, PASCABAYAR}
      NO_TLP | Customer's telephone number | TEXT
      KEBERSEDIAAN DIHUBUNGI | Customer's willingness to be contacted again in the future | {YES, NO}
      MERK_MOTOR_SBLMNYA | Brand of the customer's previous motorcycle | {HONDA, YAMAHA, SUZUKI, BELUM PERNAH MEMILIKI, KAWASAKI, MOTOR LAIN}
      TYPE_MOTOR_SBLMNYA | Type of the customer's previous motorcycle | {AT AUTOMATIC, CUB BEBEK, SPORT, BELUM PERNAH MEMILIKI}
      SMH_DIGUNAKAN_UNTUK | Purpose for which the motorcycle was bought | {LAIN-LAIN, KEBUTUHAN KELUARGA, KE SEKOLAH/ KE KAMPUS, BERDAGANG, PEMAKAIAN JARAK DEKAT, REKREASI / OLAH RAGA, BEKERJA}
      YG_MENGGUNAKAN_SMH | Person who will use the purchased motorcycle | {ANAK, LAIN-LAIN, PASANGAN SUAMI ATAU ISTRI, SAYA SENDIRI}
      MD | Code of the Main Dealer overseeing the dealer where the customer bought the Honda motorcycle | {N01}
      DEALER_CODE | Code of the dealer where the customer bought the Honda motorcycle | 77 NOMINAL {06877, …}
      KODE_SALES_PERSON | Code of the sales person who sold the Honda motorcycle to the customer | 1718 NOMINAL {218595, …}
      TGL_MASUK_DATA | Date the data entered AHM from the MD | DATE (YYYY-MM-DD HH:MM:SS)
      STATUS_VALIDASI | Validation flag from the MD indicating whether the related CDB data row has been verified | {1,2}
      UPLOADED_ON | Date the data entered AWO from AHM | DATE (YYYY-MM-DD HH:MM:SS)
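A data dictionary like this one is what makes data cleansing checkable. The Python sketch below is a minimal illustration, not the project's actual pipeline: the function name `find_problems` and both sample records are hypothetical, and only a handful of fields are covered. The date formats and allowed category values are copied from the dictionary above.

```python
from datetime import datetime

# Date formats as declared in the data dictionary.
DATE_FORMATS = {
    "SALES_DATE": "%Y-%m-%d %H:%M:%S",   # DATE (YYYY-MM-DD HH:MM:SS)
    "TANGGAL_LAHIR": "%m/%Y",            # DATE (MM/YYYY)
}

# Closed category sets as declared in the data dictionary.
ALLOWED_VALUES = {
    "KODE_CUSTOMER": {"INDIVIDUAL", "COLLECTIVE", "GROUP", "JOINT PROMO"},
    "JENIS_KELAMIN": {"LAKI-LAKI", "PEREMPUAN"},
    "JENIS_PENJUALAN_STNK": {"CASH", "CREDIT"},
}

NUMERIC_FIELDS = ("BESAR_CICILAN", "LAMA_CICILAN")

def find_problems(record):
    """Return a list of human-readable schema violations for one record."""
    problems = []
    for field, fmt in DATE_FORMATS.items():
        try:
            datetime.strptime(record.get(field), fmt)
        except (TypeError, ValueError):
            problems.append(f"{field}: expected a date in format {fmt}")
    for field, allowed in ALLOWED_VALUES.items():
        if record.get(field) not in allowed:
            problems.append(f"{field}: value not in the declared category set")
    for field in NUMERIC_FIELDS:
        try:
            float(record.get(field))
        except (TypeError, ValueError):
            problems.append(f"{field}: expected a number")
    return problems

# Hypothetical records for illustration only.
clean = {
    "SALES_DATE": "2016-05-17 10:30:00", "TANGGAL_LAHIR": "03/1985",
    "KODE_CUSTOMER": "INDIVIDUAL", "JENIS_KELAMIN": "PEREMPUAN",
    "JENIS_PENJUALAN_STNK": "CREDIT",
    "BESAR_CICILAN": 800000, "LAMA_CICILAN": 24,
}
dirty = dict(clean, SALES_DATE="2016-13-01 00:00:00", JENIS_KELAMIN="L")

clean_problems = find_problems(clean)
dirty_problems = find_problems(dirty)
```

Rows that return a non-empty list can be corrected or dropped before modeling, which is the "garbage in, garbage out" rule applied in practice.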
  • 92. DATA PREPARATION
      Variable list (42): 1 ROW_ID, 2 MAIN_PARTNER, 3 FRAME_NO, 4 CUST_ID, 5 SALES_DATE, 6 KODE_MESIN, 7 SEQUENCE_MESIN, 8 VARIAN_MOTOR, 9 COLOR, 10 KODE_CUSTOMER, 11 JENIS_KELAMIN, 12 TANGGAL_LAHIR, 13 KELURAHAN_SURAT, 14 KECAMATAN_SURAT, 15 KOTA_SURAT, 16 KODE_POS, 17 PROPINSI, 18 STATUS_RUMAH, 19 JENIS_PENJUALAN_STNK, 20 JENIS_PENJUALAN_SSU, 21 NAMA_LEASING_COMPANY, 22 BESAR_DP, 23 BESAR_CICILAN, 24 LAMA_CICILAN, 25 AGAMA, 26 PEKERJAAN, 27 PENGELUARAN, 28 PENDIDIKAN, 29 NO_HP, 30 STATUS_NOMOR_HP, 31 NO_TLP, 32 KEBERSEDIAAN DIHUBUNGI, 33 MERK_MOTOR_SBLMNYA, 34 TYPE_MOTOR_SBLMNYA, 35 SMH_DIGUNAKAN_UNTUK, 36 YG_MENGGUNAKAN_SMH, 37 MD, 38 DEALER_CODE, 39 KODE_SALES_PERSON, 40 TGL_MASUK_DATA, 41 STATUS_VALIDASI, 42 UPLOADED_ON
      Variable list (7): 1 SALES_DATE, 2 JENIS_PENJUALAN_STNK, 3 KODE_CUSTOMER, 4 BESAR_DP, 5 BESAR_CICILAN, 6 LAMA_CICILAN, 7 DEALER_CODE
      Variable list (5): 1 SALES_DATE, 2 JENIS_PENJUALAN_STNK, 3 KODE_CUSTOMER, 4 HARGA_MOTOR, 5 DEALER_CODE
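The last variable list replaces BESAR_DP, BESAR_CICILAN and LAMA_CICILAN with a single HARGA_MOTOR (motorcycle price) column. The slides do not state how that column was computed. The Python sketch below shows one plausible derivation and should be read as an assumption only: down payment plus monthly installment times the number of months. The function name and the sample figures are hypothetical.

```python
def motorcycle_price(besar_dp, besar_cicilan, lama_cicilan):
    """Assumed formula: down payment + monthly installment x months.

    BESAR_DP is declared as TEXT in the data dictionary, so it is
    converted to a number first.
    """
    return float(besar_dp) + besar_cicilan * lama_cicilan

# Hypothetical credit sale: Rp2,000,000 down, Rp800,000 a month for 24 months.
harga_motor = motorcycle_price("2000000", 800000, 24)
```

Under this assumption a cash sale would need its own rule, since it has no installments to sum.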
  • 94. OPTIMAL INDIVIDUAL SALES ALLOCATION [chart; value axis in rupiah, from Rp0 to Rp1,600,000,000]
  • 95. EXISTING VS PROPOSED ALLOCATION METHODOLOGY [chart; value axis in rupiah, from Rp0 to Rp3,500,000,000; category axis labelled with numeric codes; series: Proposed, Existing]