Statistics and big data for justice and fairness
1. Statistics for justice and fairness
Asst. Prof. Dr. Arnond Sakworawich
Program Director
Ph.D. and M.Sc. in Business Analytics and Data Science
Lecturer, Actuarial Science and Risk Management
Graduate School of Applied Statistics, National Institute of Development Administration (NIDA)
6. Roles of statistics in fairness and justice
• Facilitate fairness
• Detect anomalies and fraud
• Prevent crime and anomalies
• Regulatory Impact Assessment
15. There is no crime without a trace!
-Large deviation from the norm, the average person, or the cluster.
-Large deviation from past behavior.
-Inconsistency with oneself and with one's surroundings.
-Repeated anomaly patterns.
-Caution on the statistical detection of cheating and anomalies
Anomaly Detection
17. Large deviation from the norm, the average person, or the cluster.
[Figure: scatter plot of claim Severity against claim Frequency, year 58]
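One common way to measure a "large deviation from the average person or cluster" on two variables such as claim frequency and severity is the Mahalanobis distance from the group centroid, which accounts for the scale and correlation of both variables. This is a sketch under stated assumptions: the deck does not say which distance it uses, and the data, the helper name `mahalanobis_2d`, and the cut-off of 2.0 are all hypothetical.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each (frequency, severity) pair from the
    centroid, using the sample covariance (n - 1 denominator)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse of the 2x2 covariance [[sxx, sxy], [sxy, syy]]
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        dists.append(math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy))
    return dists

# Hypothetical (claim frequency, mean claim severity) for eight policyholders;
# the last one is deliberately extreme.
claims = [(2, 1000), (3, 1200), (1, 900), (2, 1100),
          (3, 1000), (2, 950), (1, 1050), (12, 9000)]
d = mahalanobis_2d(claims)
flagged = [i for i, di in enumerate(d) if di > 2.0]  # hypothetical cut-off
```

With only eight points, no single observation's distance can exceed (n - 1)/√n ≈ 2.47, so any cut-off has to be chosen with the sample size in mind.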
18. $\text{Loss}_{58} = f(\text{Frequency}_{57}, \text{Severity}_{57}, \text{ICD-10}_{57}, \text{ICD-9}_{57}, \text{ICD-10}_{58}, \text{ICD-9}_{58}, \text{age}, \text{gender})$
[Figure: Loss58 against the predictors; points the model under-predicts are labeled "Under Predict (Fraud or abuse)"]
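The model on slide 18 can be read as a regression of this year's loss on last year's claim history and member characteristics, with cases whose actual loss sits far above the prediction ("under-predicted") treated as red flags. The sketch below assumes ordinary least squares with a linear form, uses simulated data, omits the ICD-10/ICD-9 codes for brevity, and flags standardized residuals above 3; none of these choices come from the deck.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors (ICD code indicators omitted for brevity)
freq57 = rng.poisson(2, n)
sev57 = rng.gamma(2.0, 500.0, n)
age = rng.integers(20, 70, n)
gender = rng.integers(0, 2, n)
loss58 = (500 + 300 * freq57 + 0.5 * sev57 + 10 * age + 50 * gender
          + rng.normal(0, 100, n))
planted = [5, 50, 150]      # simulated abusive claims
loss58[planted] += 2000

# Fit Loss58 = X @ beta by ordinary least squares
X = np.column_stack([np.ones(n), freq57, sev57, age, gender])
beta, *_ = np.linalg.lstsq(X, loss58, rcond=None)
resid = loss58 - X @ beta                  # actual minus predicted
z = resid / resid.std(ddof=X.shape[1])     # standardized residuals
flagged = np.flatnonzero(z > 3.0)          # under-predicted cases only
```

Only positive residuals are flagged because the slide's concern is losses the model under-predicts.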
19. Large deviation from past behavior.
[Figure: TOEFL time 2 against TOEFL time 1; points the model under-predicts are labeled "Under Predict (Fraud or abuse)"]
20. Inconsistency with oneself and with one's surroundings.
-A low-ability test taker answers difficult items correctly.
-K-index for copying! Eight dimensions
-Scoring a test with a contaminated response vector
-Influence function + robust estimators
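The K-index itself is not reproduced here. As a simpler, well-known illustration of "a low-ability test taker answers difficult items correctly," the sketch below counts Guttman errors: pairs of items where the examinee misses the easier item but gets the harder one right. The item difficulties and response vectors are hypothetical.

```python
def guttman_errors(responses, difficulty):
    """Count item pairs (easier, harder) with the easier item wrong (0)
    and the harder item right (1)."""
    order = sorted(range(len(difficulty)), key=lambda j: difficulty[j])
    x = [responses[j] for j in order]  # responses from easiest to hardest
    errors = 0
    for a in range(len(x)):
        for b in range(a + 1, len(x)):
            if x[a] == 0 and x[b] == 1:
                errors += 1
    return errors

difficulty = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = guttman_errors([1, 1, 1, 0, 0], difficulty)   # no errors
suspicious = guttman_errors([0, 0, 0, 1, 1], difficulty)   # only hard items right
mild = guttman_errors([1, 1, 0, 1, 0], difficulty)
```

A high count relative to the examinee's total score suggests a response pattern that is inconsistent with the examinee's own ability.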
22. [Figure: Pseudovalue Distribution for an Optima Examinee. x-axis: Proficiency Estimate (-5 to 3); y-axis: Frequency; separate distributions for pseudovalues from incorrect responses and from correct responses]
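Pseudovalues of this kind can be produced with the jackknife: re-estimate proficiency with each item left out in turn and form n·θ_full − (n − 1)·θ_without_item, which measures each response's influence on the estimate. The sketch assumes a Rasch model with known item difficulties and a damped Newton-Raphson maximum-likelihood estimator; the deck does not say which IRT model or estimator it used, and the names `rasch_theta` and `jackknife_pseudovalues` are hypothetical.

```python
import math

def rasch_theta(responses, difficulty, iters=100, tol=1e-10):
    """ML ability estimate under the Rasch model, assuming known item
    difficulties and at least one correct and one incorrect response."""
    theta = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulty]
        score = sum(x - pj for x, pj in zip(responses, p))  # dlogL/dtheta
        info = sum(pj * (1.0 - pj) for pj in p)             # Fisher information
        step = max(-1.0, min(1.0, score / info))            # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    return theta

def jackknife_pseudovalues(responses, difficulty):
    """Full-test estimate plus one pseudovalue per item:
    n * theta_full - (n - 1) * theta_without_item_i."""
    n = len(responses)
    full = rasch_theta(responses, difficulty)
    pseudo = []
    for i in range(n):
        rest_x = responses[:i] + responses[i + 1:]
        rest_b = difficulty[:i] + difficulty[i + 1:]
        pseudo.append(n * full - (n - 1) * rasch_theta(rest_x, rest_b))
    return full, pseudo

# Hypothetical 10-item test and one examinee's responses
difficulty = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
responses = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
full, pseudo = jackknife_pseudovalues(responses, difficulty)
from_correct = [pv for pv, x in zip(pseudo, responses) if x == 1]
from_incorrect = [pv for pv, x in zip(pseudo, responses) if x == 0]
```

Under this model, dropping a correct response always lowers the estimate and dropping an incorrect one always raises it, so pseudovalues from correct responses lie above the full-test estimate and those from incorrect responses lie below it. This matches the two groups labeled in the figure.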
28. Caution on statistical detection of cheating
• Positive Predictive Value (PPV)
64.76%   99.30%
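The slide does not say how its two percentages were obtained. As an illustration only, PPV and its counterpart, the negative predictive value (NPV), can be computed from a detector's confusion-matrix counts; the counts below are invented.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from a detector's confusion-matrix counts."""
    ppv = tp / (tp + fp)  # P(Cheating=Yes | Detection=Yes)
    npv = tn / (tn + fn)  # P(Cheating=No | Detection=No)
    return ppv, npv

# Hypothetical counts for 1,000 examinees
ppv, npv = predictive_values(tp=90, fp=10, tn=880, fn=20)
```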
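29. Caution on statistical detection of cheating
• Statistical evidence serves only as a red flag or warning.
• Physical evidence is always needed.
• Aim for early detection, protection, and prevention.
• A Bayesian flip is needed: the detection rate among cheaters is not the probability that a flagged person cheated.
P(Cheating=Yes | Detection=Yes)   (positive predictive value)
P(Detection=Yes | Cheating=Yes)   (sensitivity)
P(Cheating=No | Detection=No)   (negative predictive value)
P(Detection=No | Cheating=No)   (specificity)
$$P(\text{Cheating}=\text{Yes} \mid \text{Detection}=\text{Yes}) = \frac{P(\text{Detection}=\text{Yes} \mid \text{Cheating}=\text{Yes})\, P(\text{Cheating}=\text{Yes})}{P(\text{Detection}=\text{Yes})}$$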
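The Bayesian flip can be checked numerically. This sketch expands P(Detection=Yes) with the law of total probability; the sensitivity, specificity, and prevalence values are hypothetical. It shows that even a detector that is right 99% of the time yields a coin-flip PPV when cheating is rare.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """P(Cheating=Yes | Detection=Yes) via Bayes' theorem."""
    # P(Detection=Yes) by the law of total probability
    p_detect = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_detect

rare = ppv_from_rates(0.99, 0.99, 0.01)    # 1% cheat: PPV is about 0.5
common = ppv_from_rates(0.99, 0.99, 0.20)  # 20% cheat: PPV is about 0.96
```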
35. LOF (Local Outlier Factor)
• LOF = average local density of the point's k nearest neighbors / local density of the point itself.
• The higher the LOF, the more extreme the local outlier.
• Determine σ (the radius, or reachability distance, around each point) so that it contains the point's k nearest neighbors.
• Local density of a point = number of points within the reachability distance / sum of the distances between the point and all of its k neighbors.
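The slide's definitions can be turned into a small from-scratch sketch. It assumes the standard LOF formulation, in which each point's radius is its k-distance (the distance to its k-th nearest neighbor), one reading of the slide's σ, and the reachability distance of p from neighbor o is max(k-distance(o), d(p, o)). The data and the helper name `lof_scores` are hypothetical.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor of each point (standard k-distance formulation).
    Assumes distinct points and k < len(points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbors of each point, excluding itself (ties broken by index)
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: the radius around each point that just reaches its k-th neighbor
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def reach(i, j):
        # reachability distance of point i from its neighbor j
        return max(k_dist[j], dist[i][j])

    # local reachability density: k / sum of reachability distances to the neighbors
    lrd = [k / sum(reach(i, j) for j in knn[i]) for i in range(n)]
    # LOF: mean density of the neighbors divided by the point's own density
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(points, k=2)
```

Points inside the cluster score close to 1, while the isolated point at (5, 5) scores several times higher. For real data, scikit-learn's `sklearn.neighbors.LocalOutlierFactor` provides a tested implementation.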