This slide deck shares what I've learned from the Kaggle competition. There are three topics: 1) overview of the competition, 2) introduction to decision trees, and 3) the R package XGBoost.
5. What is ROC?
• ROC : receiver operating characteristic
• The ROC curve was first developed by electrical and radar engineers during World War II to detect enemy objects on battlefields.
• The ROC curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied.
• The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
6. Sensitivity and Specificity
https://www.youtube.com/watch?v=Z5TtopYX1Gc
• True Positive (tp) – Detection
• False Positive (fp) – False alarm
• True Negative (tn) – Correct rejection
• False Negative (fn) – Miss
• Sensitivity = Probability of Detection
• Specificity = Probability of True Negative
• 1-Specificity = Probability of False alarm
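The definitions above can be sketched numerically from raw confusion-matrix counts. This is a minimal Python sketch (the deck itself uses R), and the counts are invented for illustration:

```python
# Made-up confusion-matrix counts for a binary classifier
tp, fp, tn, fn = 80, 10, 90, 20

sensitivity = tp / (tp + fn)   # probability of detection (TPR)
specificity = tn / (tn + fp)   # probability of a true negative
false_alarm = 1 - specificity  # equals fp / (fp + tn), the FPR

print(sensitivity)  # 0.8
print(specificity)  # 0.9
```

Note that sensitivity ignores the negatives entirely, and specificity ignores the positives; the ROC curve ties the two together by sweeping the threshold.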
8. Receiver Operating Characteristic (ROC)
https://www.youtube.com/watch?v=gYIlKUP2hk0
The ROC curve can be generated by plotting the cumulative distribution function of the detection probability on the y-axis versus the cumulative distribution function of the false-alarm probability on the x-axis.
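The threshold-sweep construction described above can be sketched in a few lines of Python (the deck uses R; the scores and labels below are invented toy data):

```python
# Toy classifier scores and true labels (1 = positive, 0 = negative)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,    0,   0,   1,   0,   0]

P = sum(labels)          # number of positives
N = len(labels) - P      # number of negatives

# Sweep the discrimination threshold from high to low; at each
# threshold, everything scoring >= t is predicted positive.
roc = []
for t in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    roc.append((fp / N, tp / P))   # one (FPR, TPR) point per threshold
```

As the threshold drops, both cumulative counts can only grow, which is why the resulting curve moves monotonically from (0, 0) toward (1, 1).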
26. XGBoost: Extreme Gradient Boosting
• An optimized distributed gradient boosting library
• XGBoost works only with numeric vectors, so you need to convert all other forms of data into numeric vectors.
• XGBoost provides a convenient function for cross-validation (an important method for measuring a model's predictive power).
• XGBoost can handle missing values in the data
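The "numeric vectors" requirement usually means one-hot encoding categorical columns. Here is a minimal language-agnostic sketch in Python (in R this is typically done with `Matrix::sparse.model.matrix`); the toy rows are invented for illustration:

```python
# Toy records with one categorical column ("group") and one numeric
# column ("age"); None stands in for a missing value.
rows = [{"group": "A", "age": 34},
        {"group": "B", "age": 28},
        {"group": "A", "age": None}]

categories = sorted({r["group"] for r in rows})   # ["A", "B"]

def encode(row):
    # One 0/1 indicator column per category level
    one_hot = [1.0 if row["group"] == c else 0.0 for c in categories]
    # XGBoost can handle missing values, so encode them as NaN
    age = float("nan") if row["age"] is None else float(row["age"])
    return one_hot + [age]

matrix = [encode(r) for r in rows]   # all-numeric rows, NaN where missing
```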
27. XGBoost: Extreme Gradient Boosting
https://www.youtube.com/watch?v=ufHo8vbk6g4
http://blog.nycdatascience.com/faculty/kaggle-winning-solution-xgboost-algorithm-let-us-learn-from-its-author-3/
The minimum information we need to provide is
28. XGBoost: Extreme Gradient Boosting
• Step 1 Load all the libraries
• Step 2 Load the dataset
• Step 3 Data Cleaning & Feature Engineering
• Step 4 Tune and Run the model
• Step 5 Score the Test Population
https://www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/
Which customers have the most potential business value?
• Prediction model
• Classification algorithm
• Data:
• Characteristics (People)
• Activities (act_train, act_test)