
Multiplicative Decompositions of Stochastic Distributions and Their Applications - 2021-01-22

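Slides for the Kakenhi (Grants-in-Aid for Scientific Research) symposium "Toward Innovation in Statistical Science," held at Kanazawa University on January 22, 2021.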
http://stat.w3.kanazawa-u.ac.jp/ksympo20.html


  1. Multiplicative Decompositions of Stochastic Distributions and Their Applications. Toshiyuki Shimono, The Institute of Statistical Mathematics. Kakenhi Symposium, Kanazawa University, 2021-01-22. Sections: Theorems; Random sampling on DB records.
  2. Keywords: Bradley-Terry model (1952); Snedecor-Fisher distributions (1934), known as the F-distribution; Student's t-distribution (1908, Gosset); {Uniform, Geometric, Gamma} distributions; "multiplicative" operations upon distributions; new usage of symbols.
  3-13. A proposition about $|T(2)|$

     Theorem. For any $v_1, v_2 > 0$:

     $$\underbrace{\mathrm{Prob}[\,v_1 x_1 > v_2 x_2\,]}_{=\,v_1/(v_1+v_2)} : \underbrace{\mathrm{Prob}[\,v_1 x_1 < v_2 x_2\,]}_{=\,v_2/(v_1+v_2)} = v_1 : v_2, \qquad \text{where } x_1, x_2 \overset{\text{iid}}{\sim} |T(2)|.$$

     Cf. the Bradley-Terry model (1952): $\mathrm{Prob}[\,\text{player } i \text{ is stronger than player } j\,] = \dfrac{v_i}{v_i + v_j}$, applied to food preferences and sports team strengths.

     Equivalently, the statement holds where

     $$\begin{aligned}
     x_1/x_2 &\sim |T(2)| \overset{\perp}{\times} 1/|T(2)| \\
     &\equiv |T(1)| \overset{\perp}{\times} F(2,2)^{1/2} \\
     &\equiv |T(1)| \overset{\perp}{\times} |T(1)|^{1/2} \overset{\perp}{\times} F(2,2)^{1/4} \\
     &\equiv |T(1)| \overset{\perp}{\times} |T(1)|^{1/2} \overset{\perp}{\times} |T(1)|^{1/4} \overset{\perp}{\times} F(2,2)^{1/8} \\
     &\equiv \cdots \equiv F(2,2).
     \end{aligned}$$

     Notation: $\overset{\perp}{\times}$ denotes multiplication of independent variates, and $1/\square$ denotes the reciprocal of the variate. A superscript is an exponent: $^{1/2}$ means taking the square root of the variate, and $^{1/4}$, $^{1/8}$ are the 4th and 8th roots.

     Note: $T(1) \equiv 1/T(1)$, $F(2,2) \equiv 1/F(2,2)$, $T(2) \not\equiv 1/T(2)$.
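The proposition about $|T(2)|$ can be checked numerically. The following is a minimal Monte Carlo sketch in Python (the figures in the slides use R's `rt`); the helper names `abs_t` and `win_rate` are made up for this illustration, and $|T(2)|$ is drawn here as a standard normal divided by the square root of a scaled chi-square variate.

```python
import math
import random

def abs_t(df, rng):
    """Draw |T(df)|: |standard normal| over sqrt(chi-square(df) / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return abs(z) / math.sqrt(chi2 / df)

def win_rate(v1, v2, df=2, trials=200_000, seed=0):
    """Estimate Prob[v1*x1 > v2*x2] for x1, x2 iid |T(df)|."""
    rng = random.Random(seed)
    wins = sum(v1 * abs_t(df, rng) > v2 * abs_t(df, rng) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    for v1, v2 in [(1, 1), (1, 3), (2, 5)]:
        # Second number: simulated rate; third number: v1 / (v1 + v2).
        print((v1, v2), round(win_rate(v1, v2), 3), round(v1 / (v1 + v2), 3))
```

With enough trials the two printed numbers should agree closely for `df=2`; changing `df` to 3 is an easy way to see that the ratio property is special to particular distributions.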
  14-18. Propositions about $U[0, 1]$

     Theorem. For any $0 < \delta < 1$:

     $$U[0,1] \equiv U[\delta,1] \overset{\perp}{\times} \delta^{\lfloor \log_\delta U[0,1] \rfloor}$$

     $$1/U[0,1] \equiv 1/U[\delta,1] \overset{\perp}{\times} 1/\delta^{\lfloor \log_\delta U[0,1] \rfloor}$$

     $$1/U[0,1] \equiv \exp \Gamma_\delta \overset{\perp}{\times} \exp \Gamma_{1-\delta}; \qquad \Gamma_{1/2} \equiv N(0, \tfrac{1}{2})^2$$

     Theorem. $1/U[0,1] \equiv F(2,2) + 1$. A good approximation relation.
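The uniform decomposition and the relation $1/U[0,1] \equiv F(2,2) + 1$ can both be sanity-checked by simulation. The sketch below is illustrative only: the function names are invented here, $\delta = 0.3$ is an arbitrary choice, and $F(2,2)$ is drawn as the ratio of two independent unit exponentials.

```python
import math
import random

def decomposed_uniform(delta, rng):
    """U[delta, 1] times delta**K, with K = floor(log_delta U[0, 1]) drawn independently."""
    base = rng.uniform(delta, 1.0)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    k = math.floor(math.log(u) / math.log(delta))
    return base * delta ** k

def f22_plus_one_reciprocal(rng):
    """1 / (F(2, 2) + 1), with F(2, 2) drawn as a ratio of two unit exponentials."""
    f = rng.expovariate(1.0) / rng.expovariate(1.0)
    return 1.0 / (f + 1.0)

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(s <= x for s in samples) / len(samples)

if __name__ == "__main__":
    rng = random.Random(0)
    a = [decomposed_uniform(0.3, rng) for _ in range(100_000)]
    b = [f22_plus_one_reciprocal(rng) for _ in range(100_000)]
    for x in (0.1, 0.25, 0.5, 0.9):
        # Both empirical CDF values should be close to x if the variates are U[0, 1].
        print(x, round(empirical_cdf(a, x), 3), round(empirical_cdf(b, x), 3))
```

Comparing empirical CDF values against $x$ is a crude check; a formal goodness-of-fit test would be the natural next step.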
  19. Proof outline: Relations such as

     $$\Gamma(2z) = 2^{2z-1}\,\pi^{-1/2}\,\Gamma(z)\,\Gamma(z + \tfrac{1}{2}) \quad\text{and}\quad B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} = 2\int_0^{\pi/2} \sin^{2x-1} t \,\cos^{2y-1} t \, dt$$

     are used. $E[X^m]$ for $m \in \mathbb{R}$ is calculated for each distribution $X$ such as $F(2,2)$, $|T(1)|$, $|T(2)|$, which are

     $$\Gamma(1+m)\,\Gamma(1-m), \qquad \Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{1-m}{2}\right)/\pi, \qquad \frac{\sqrt{2}^{\,m}}{\sqrt{\pi}}\,\Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{2-m}{2}\right),$$

     respectively.
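As a worked instance of the proof outline, the moment formulas can be combined with the duplication formula to recover the $F(2,2)$ moments from two independent copies of $|T(2)|$. This is a sketch of one step only, and it assumes $|m| < 1$ so that every moment involved is finite.

```latex
% Assumes |m| < 1. The factors sqrt(2)^m and sqrt(2)^{-m} cancel in the first line.
\begin{align*}
E\big[|T(2)|^{m}\big]\,E\big[|T(2)|^{-m}\big]
  &= \frac{1}{\pi}\,
     \Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{2-m}{2}\right)\,
     \Gamma\!\left(\tfrac{1-m}{2}\right)\Gamma\!\left(\tfrac{2+m}{2}\right) \\
  % Duplication formula with z = (1+m)/2 and with z = (1-m)/2:
  &= \frac{1}{\pi}\,
     \Big[\,2^{-m}\sqrt{\pi}\;\Gamma(1+m)\Big]\,
     \Big[\,2^{m}\sqrt{\pi}\;\Gamma(1-m)\Big] \\
  &= \Gamma(1+m)\,\Gamma(1-m)
   \;=\; E\big[F(2,2)^{m}\big].
\end{align*}
```

The same pairing of Gamma factors, applied to $|T(1)|$ and $F(2,2)^{1/2}$, gives the second line of the decomposition chain for $x_1/x_2$.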
  20. [Figure: histogram of 1/(1 + abs(rt(n, 1)/rt(n, 1))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
  21. [Figure: histogram of 1/(1 + abs(rt(n, 2)/rt(n, 2))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
  22. [Figure: histogram of 1/(1 + abs(rt(n, 3)/rt(n, 3))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
  23. [Figure: histogram of 1/(1 + abs(rnorm(n)/rnorm(n))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
  24-27. Logarithmic Variances

     Theorem.

     $$\mathrm{Var}(F(2,2)) = \infty, \qquad \mathrm{Var}(|T(1)|) = \infty, \qquad \mathrm{Var}(|T(2)|) = \infty, \qquad \mathrm{Var}(1/U[0,1]) = \infty$$
  28. Logarithmic Variances

     Theorem.

     $$\mathrm{Var}(\log F(2,2)) = \pi^2/3, \qquad \mathrm{Var}(\log |T(1)|) = \pi^2/4, \qquad \mathrm{Var}(\log |T(2)|) = \pi^2/6, \qquad \mathrm{Var}(\log 1/U[0,1]) = 1$$

     Here $\log$ means taking the logarithm of the variate, which forms a new distribution. Note: consistent with the previous theorem.
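The four logarithmic variances can be estimated by simulation. The following Python sketch is an illustration under stated assumptions rather than the method used for the slides: the samplers `abs_t`, `f22` and `inv_uniform` are hypothetical helpers built from the standard library, and the sample size is arbitrary.

```python
import math
import random

def abs_t(df, rng):
    """Draw |T(df)| as |standard normal| over sqrt(chi-square(df) / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return abs(z) / math.sqrt(chi2 / df)

def f22(rng):
    """Draw F(2, 2) as a ratio of two independent unit exponentials."""
    return rng.expovariate(1.0) / rng.expovariate(1.0)

def inv_uniform(rng):
    """Draw 1 / U[0, 1]; 1 - random() lies in (0, 1], so no division by zero."""
    return 1.0 / (1.0 - rng.random())

def log_variance(draw, n=200_000, seed=1):
    """Sample variance of log(X) for X produced by draw(rng)."""
    rng = random.Random(seed)
    logs = [math.log(draw(rng)) for _ in range(n)]
    mean = math.fsum(logs) / n
    return math.fsum((y - mean) ** 2 for y in logs) / n

if __name__ == "__main__":
    cases = [
        ("F(2,2)", f22, math.pi ** 2 / 3),
        ("|T(1)|", lambda rng: abs_t(1, rng), math.pi ** 2 / 4),
        ("|T(2)|", lambda rng: abs_t(2, rng), math.pi ** 2 / 6),
        ("1/U[0,1]", inv_uniform, 1.0),
    ]
    for name, draw, stated in cases:
        # Simulated value next to the value stated in the theorem.
        print(name, round(log_variance(draw), 3), round(stated, 3))
```

The values also add up along the decomposition chain: $\pi^2/3 = \pi^2/4 + \tfrac{1}{4}\cdot\pi^2/3$, matching $F(2,2) \equiv |T(1)| \overset{\perp}{\times} F(2,2)^{1/2}$.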
  29. [Figure: histogram of log(abs(rf(n, 2, 2))); x-axis ticks from log .001 to log 1000, y-axis from 0.0 to 0.4.]
  30. [Figure: histogram of log(abs(rt(n, 1))); x-axis ticks from log .001 to log 1000, y-axis from 0.0 to 0.4.]
  31. [Figure: histogram of log(abs(rt(n, 2))); x-axis ticks from log .001 to log 1000, y-axis from 0.0 to 0.4.]
  32-39. Application to Statistical Disclosure Control

     When analyzing data in a company:
     1. Transaction data are accumulated.
     2. External experts often handle the data.
     3. How can the data be kept sufficiently confidential?

     Statistical Disclosure Control is necessary. Multiplicative noise is often useful on numerical data. Possibly only $N(\mu, \sigma^2)$ or $U[a, b]$ has been used so far.

     Useful distributions for weighted random sampling:
     1. $U[\delta, 1]$, $\delta^{\lfloor \log_\delta U[0,1] \rfloor}$ and $\exp \Gamma_\delta$ ($\delta \in [0, 1]$).
     2. $|T(2)|$, $|T(1)|$ and $F(2, 2)$.
  40-43. Why Weighted Random Sampling on a Table?

     1. Human eyes can only see sampled records of a big table.
        ▶ A table may contain thousands, millions, or billions of records: too many for human eyes.
     2. Without randomness, sampling leads only to a biased view.
        ▶ Without randomness, one catches only the first records or the eye-catching ones.
     3. Weighting by prices or other numeric values greatly helps the comprehension of a table.
        ▶ Simple random sampling still has a defect: it retrieves too many records of low importance. The importance indicators of the records should therefore be used in the sampling.
        ▶ Weighted random sampling retrieves each record with probability proportional to an auxiliary variable such as price.
  44. Table: Word frequency table of "Hamlet". Simple random sampling.

     word        count   word        count   word        count   word        count   word        count
     OPHELIA        67   stuff           3   lament          2   Looking         1   Mourners        1
     doth           23   chief           3   translate       2   ’take           1   strokes         1
     use            15   ambassadors     3   Excellent       2   frowningly      1   drains          1
     devil           9   puff’d          2   revolution      1   east            1   scent           1
     home            6   plague          2   Pinch           1   profanely       1   warning         1
     touch           6   venom           2   access          1   struggling      1   betimes         1
     season          5   spokes          2   bravery         1   nerve           1   hent            1
     get             5   lunacy          2   quietly         1   amities         1   assure          1
     ha              4   Lady            2   counterfeit     1   Know            1   Stay’d          1
     neck            3   Drown’d         2   consider’d      1   toys            1   moods           1

     50 (= 10 × 5) words are randomly chosen from 5,318 distinct words. The total of the counts is 32,446, i.e. "Hamlet" by Shakespeare contains 32,446 words. The words with count ≥ 2, 3, 4, ..., 10, ... occupy only about 1/2, 1/3, 1/4, ..., 1/10, ... of this entire (50-row) table, respectively. This pattern, in which the rows with count ≥ c occupy about 1/c of an entire frequency table for every c, appears quite often, empirically!
  45. Table: Word frequency table of "Hamlet". Simple vs. Weighted.

     Upper table (simple random sampling): identical to the table on slide 44.

     Lower table (weighted random sampling):

     word        count   word        count   word        count   word        count   word        count
     the           995   And           263   more           90   many           18   parts           3
     and           706   this          248   at             75   command        10   ways            3
     to            635   me            234   well           65   hell           10   antique         2
     of            630   him           197   let            60   honour         10   yesternight     1
     I             546   he            178   speak          55   Reads           5   constantly      1
     my            441   HORATIO       128   go             52   Follow          5   emulate         1
     HAMLET        407   do            127   night          47   stir            5   honour’s        1
     it            361   what          116   into           27   knew            5   really          1
     not           299   all           108   Good           25   ourself         3   revolution      1
     that          266   our           107   Ghost          25   white           3   riotous         1
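The contrast between the two sampling schemes can be reproduced on a small scale. The sketch below is an assumption-laden illustration, not the procedure used to build the slide tables: it draws rows one at a time without replacement, each time with probability proportional to the count of the remaining rows, and the function names are invented here. The example rows are a handful of (word, count) pairs copied from the tables above.

```python
import random

def simple_sample(table, n, rng):
    """Draw n distinct rows, every row equally likely."""
    return rng.sample(table, n)

def weighted_sample(table, n, rng):
    """Draw n distinct rows; each draw picks a remaining row with
    probability proportional to its count (the second tuple element)."""
    remaining = list(table)
    chosen = []
    for _ in range(min(n, len(remaining))):
        weights = [count for _, count in remaining]
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(idx))
    return chosen

if __name__ == "__main__":
    rows = [("the", 995), ("and", 706), ("HAMLET", 407), ("OPHELIA", 67),
            ("Ghost", 25), ("antique", 2), ("hent", 1), ("riotous", 1)]
    rng = random.Random(0)
    print("simple:  ", simple_sample(rows, 3, rng))
    print("weighted:", weighted_sample(rows, 3, rng))
```

Repeated runs of the weighted version should show high-count words far more often than the simple version does, which is the effect visible in the lower table.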
  46-53. The Analyzing Procedure

     1. Prepare a table $T$ to be analyzed.
     2. Apply noise to a sensitive variable (column) $v$ of $T$.
     3. The "expert" gets the transformed table $T'$ with $v'$.
     4. Various analyses are applied to $T'$:
        1. The expert performs several analyses on $T'$ as usual.
        2. The numerical sum of $v'$ may well reflect the sum of $v$.
        3. Random sampling of $T'$ by the weight $v$ is also possible! Note: $v$ is hidden; only $v'$ can be seen by the expert.
     5. The data provider can judge the ability of the expert without showing the precise numerical values of $v$.
  54-57. Why is WRS (weighted random sampling) of $T$ by $v$-weight possible?

     If $x_1/x_2 \sim F(2,2)$ and $0 \le v_1 \ll v_2$:

     $$\mathrm{Prob}\!\left[\, v_1 \frac{x_1}{x_2} > v_2 \,\right] = \frac{v_1}{v_1 + v_2} \approx \frac{v_1}{v_2}.$$

     Thus, under the condition $X \overset{\perp}{\times} X' = F(2,2)$, along with $x(i) \overset{\text{iid}}{\sim} X$ and $x'(i) \overset{\text{iid}}{\sim} X'$, calculate $v''(i) := v'(i)/x'(i)$ where $v'(i) := v(i) \times x(i)$.

     Define $S_n$ by $\#S_n = n$ and $v''(i) \ge v''(j)$ for all $i \in S_n$ and all $j \notin S_n$.

     Then $\{T(i) \mid i \in S_n\}$ is a sample from $T$ approximately by $v$-weight.
  58. Is WRS of $T$ by exactly $v$-weight possible?

     For fixed $0 \le v_1 \le v_2$, if random variables $x_1, x_2 \ge 0$ satisfy $x_1/x_2 \sim U[0,1]^{-1}$:

     $$\mathrm{Prob}\!\left[\, v_1 \frac{x_1}{x_2} > v_2 \,\right] = \frac{v_1}{v_2}.$$

     Thus, under the condition $X \overset{\perp}{\times} X' = U[0,1]^{-1}$, along with $x(i) \overset{\text{iid}}{\sim} X$ and $x'(i) \overset{\text{iid}}{\sim} X'$, calculate $v''(i) := v'(i)/x'(i)$ where $v'(i) := v(i) \times x(i)$.

     Define $S_n$ by $\#S_n = n$ and $v''(i) \ge v''(j)$ for all $i \in S_n$ and all $j \notin S_n$.

     Then $\{T(i) \mid i \in S_n\}$ is a sample from $T$ exactly by $v$-weight.
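The two exceedance probabilities, $v_1/(v_1+v_2)$ for $F(2,2)$ noise and $v_1/v_2$ for $U[0,1]^{-1}$ noise, and the top-$n$ selection of $S_n$ can be sketched as follows. This is a simplified illustration with loud assumptions: it draws the combined factor $r$ that plays the role of $x_1/x_2$ directly, instead of splitting it into the provider's $x$ and the expert's $x'$, and all function names are hypothetical.

```python
import random

def f22(rng):
    """F(2, 2) drawn as a ratio of two independent unit exponentials."""
    return rng.expovariate(1.0) / rng.expovariate(1.0)

def inv_uniform(rng):
    """1 / U[0, 1]; 1 - random() lies in (0, 1]."""
    return 1.0 / (1.0 - rng.random())

def exceed_rate(v1, v2, noise, trials=200_000, seed=0):
    """Estimate Prob[v1 * r > v2], where r = noise(rng) stands in for x1/x2."""
    rng = random.Random(seed)
    return sum(v1 * noise(rng) > v2 for _ in range(trials)) / trials

def top_n_by_noisy_weight(weights, n, noise, rng):
    """Indices of the n records with the largest v'' = v * r, mirroring S_n."""
    keyed = [(v * noise(rng), i) for i, v in enumerate(weights)]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:n]]

if __name__ == "__main__":
    v1, v2 = 1.0, 4.0
    # Simulated rate next to the closed form from the slides.
    print("F(2,2):", round(exceed_rate(v1, v2, f22), 3), v1 / (v1 + v2))
    print("1/U:   ", round(exceed_rate(v1, v2, inv_uniform), 3), v1 / v2)
    rng = random.Random(42)
    print(top_n_by_noisy_weight([1.0, 2.0, 50.0, 3.0, 80.0], 2, inv_uniform, rng))
```

In practice the noise would be applied in two stages as the slides describe, so that the expert never sees $v$ itself; the single-factor shortcut here only checks the probability statements.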
  59. Combinations of the distributions to enable WRS and to enable $\mathrm{Pr}[v_1 x_1 > v_2 x_2] : \mathrm{Pr}[v_1 x_1 < v_2 x_2] = v_1 : v_2$

     $$\begin{array}{lll}
     \text{Dist. of } x_1 & \text{Dist. of } x_2 & \mathrm{var}[\log x_1] \\ \hline
     |T(2)| & |T(2)| & \pi^2/6 \\
     F(2,2)^{1/2} & |T(1)| & \pi^2/12 \\
     F(2,2)^{1/4} & |T(1)| \overset{\perp}{\times} |T(1)|^{1/2} & \pi^2/48 \\
     F(2,2)^{1/8} & |T(1)| \overset{\perp}{\times} |T(1)|^{1/2} \overset{\perp}{\times} |T(1)|^{1/4} & \pi^2/192 \\
     \cdots & \cdots & \\
     |T(1)| & F(2,2)^{1/2} & \pi^2/4 \\
     |T(1)|^{1/2} & |T(1)| \overset{\perp}{\times} F(2,2)^{1/4} & \pi^2/16 \\
     |T(1)|^{1/4} & |T(1)|^{1/2} \overset{\perp}{\times} |T(1)| \overset{\perp}{\times} F(2,2)^{1/8} & \pi^2/64 \\
     \cdots & \cdots &
     \end{array}$$
