Visualization of NoSQL Transformations using Sampling
Johannes Schildgen
2015-10-16
schildgen@cs.uni-kl.de

Visualization of NotaQL Transformations using Sampling

BRAUN, Stefan; SCHILDGEN, Johannes; DEßLOCH, Stefan. Visualisierung von NoSQL-Transformationen unter der Verwendung von Sampling-Techniken. 2015. LWA 2015: 427-438

  1. Visualization of NoSQL Transformations using Sampling. Johannes Schildgen, 2015-10-16, schildgen@cs.uni-kl.de
  5. NoSQL Transformations
  6. NoSQL Transformations
  7. NoSQL Transformations
  8. Table with RowId and the column families info and prices
  9. Example table with the column families info and prices:
     RowId 6511/2014-11-07 12:32:00: info {chain: FillItUp, street: XStreet, bonus: Paybag}; prices {Diesel: 1.279, SuperE10: 1.459}
     RowId 6218/2014-11-07 12:32:00: info {chain: FillItUp, street: Rotweg}; prices {LPG: 0.589}
     Example transformations and their results:
     Average price for each type of fuel for FillItUp stations: RowId Diesel, avg 1.2589552
     List of streets and the lowest price for each type of fuel: RowId Diesel, XStreet 1.279, YStreet 1.289
     For each log record, the average price over all types of fuel: RowId 6511/2014-…, avg 1.376
  12. Filter
  13. Group & Aggregate
  14. Data ↔ Metadata
  15. Supports Flexible Schema
  16. Average price for each type of fuel for FillItUp stations
      Input (column families info and prices):
      RowId 6511/2014-11-07 12:32:00: info {chain: FillItUp, street: XStreet, bonus: Paybag}; prices {Diesel: 1.279, SuperE10: 1.459}
      RowId 6218/2014-11-07 12:32:00: info {chain: FillItUp, street: Rotweg}; prices {LPG: 0.589}
      Script: IN-FILTER: chain='FillItUp', OUT._r <- IN.prices:_c, OUT.avg <- AVG(IN._v)
      Result (RowId: avg): Diesel: 1.2589552, SuperE10: 1.4922919, LPG: 0.5890000, SuperE5: 1.5192811
      Visualizations
  18. Sampling. Sampling factor: 0.2 (= 1/5)
  20. Sampling: Accuracy vs. Time
  21. Sampling: Sampling Factor vs. Time
  22. Sampling. Sampling factor: 0.2 (= 1/5). How accurate is this?
  23. Sampling. Sampling factor: 0.2 (= 1/5). Which factor for x% accuracy?
  24. How accurate is this? Confidence intervals, e.g., 95%-CI: "95% of all CIs contain the true value".
  25. Chebyshev formula: $\left[\bar{x} - \frac{1}{\sqrt{\alpha}} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + \frac{1}{\sqrt{\alpha}} \cdot \frac{\sigma}{\sqrt{n}}\right]$, with $\bar{x}$: average value, $\alpha$: 1-ci (e.g. 0.05), $\sigma$: standard deviation, $n$: sampling size.
  26. With $\alpha = 0.05$: $\left[\bar{x} - \frac{1}{\sqrt{0.05}} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + \frac{1}{\sqrt{0.05}} \cdot \frac{\sigma}{\sqrt{n}}\right]$
  27. Sample values 10, 20, 45, so $n = 3$ and $\bar{x} = (10+20+45)/3 = 25$: $\left[\bar{x} - \frac{1}{\sqrt{0.05}} \cdot \frac{\sigma}{\sqrt{3}},\ \bar{x} + \frac{1}{\sqrt{0.05}} \cdot \frac{\sigma}{\sqrt{3}}\right]$
  28. $\left[25 - \frac{1}{\sqrt{0.05}} \cdot \frac{\sigma}{\sqrt{3}},\ 25 + \frac{1}{\sqrt{0.05}} \cdot \frac{\sigma}{\sqrt{3}}\right]$
  29. $\sum_i i = 75$, $\sum_i i^2 = 2525$, $\sigma = \sqrt{\frac{3}{3-1} \cdot \frac{1}{3} \cdot (2525 - 25^2)}$
  30. $\left[25 - \frac{1}{\sqrt{0.05}} \cdot \frac{31}{\sqrt{3}},\ 25 + \frac{1}{\sqrt{0.05}} \cdot \frac{31}{\sqrt{3}}\right]$
  31. [25 − 8, 25 + 8]
  32. [17, 33] (relative whisker height: 32%)
  33. $\sum_i i = 75$, $\sum_i i^2 = 2525$ (32% "error")
  35. $\sum_i i = 139$, $\sum_i i^2 = 5128$ (32% "error")
  36. $\sum_i i = 139$, $\sum_i i^2 = 5128$ (10% "error")
  44. Evaluation: 1% sampling needs 10% of the time of a full computation (plot axes: sampling size, time)
  45. Evaluation: Iterative Sampling (plot axes: sampling size, time [s])
  47. Thank you for your attention!
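
As a rough illustration of the transformation on slide 16 and the sampling on slide 18, the following Python sketch mimics the script (filter on chain, one output row per column name in prices, average of the values) on plain dictionaries, optionally on a row sample. It is neither NotaQL nor the authors' implementation. The names `ROWS`, `sample_rows`, and `average_price_per_fuel`, the dictionary layout, and the assumption that each row is kept independently with a probability equal to the sampling factor are all made up for this example.

```python
import random
from collections import defaultdict

# The two example rows shown on the slides, modelled as nested dicts
# (hypothetical layout: row id -> column family -> column -> value).
ROWS = {
    "6511/2014-11-07 12:32:00": {
        "info": {"chain": "FillItUp", "street": "XStreet", "bonus": "Paybag"},
        "prices": {"Diesel": 1.279, "SuperE10": 1.459},
    },
    "6218/2014-11-07 12:32:00": {
        "info": {"chain": "FillItUp", "street": "Rotweg"},
        "prices": {"LPG": 0.589},
    },
}


def sample_rows(rows, factor, seed=None):
    """Keep each row independently with probability `factor` (assumed scheme)."""
    rng = random.Random(seed)
    return {rid: row for rid, row in rows.items() if rng.random() < factor}


def average_price_per_fuel(rows, chain="FillItUp"):
    """Filter by chain, group by column name in `prices`, average the values."""
    values = defaultdict(list)
    for row in rows.values():
        if row["info"].get("chain") != chain:
            continue  # corresponds to the IN-FILTER on chain
        for fuel, price in row["prices"].items():
            values[fuel].append(price)  # column name becomes the output row id
    return {fuel: sum(v) / len(v) for fuel, v in values.items()}


if __name__ == "__main__":
    # Only two rows are available here, so the averages differ from the
    # slide's result, which was computed on a larger table.
    print(average_price_per_fuel(ROWS))
    print(average_price_per_fuel(sample_rows(ROWS, 0.2, seed=1)))
```

Running the aggregation on `sample_rows(...)` instead of the full table is the trade-off the deck discusses: less input to scan, at the price of an approximate result.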
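
The confidence interval from slides 24 to 36 can be sketched in Python as follows. This is a minimal sketch under assumptions, not the authors' code: it uses the textbook Chebyshev bound, half-width = σ / (sqrt(α) · sqrt(n)), together with the usual sample standard deviation computed from the running sums of the values and of their squares, so its numbers are not meant to reproduce the rounded figures on the slides. The names `RunningStats` and `required_sample_size` are hypothetical, and solving the bound for n is one possible reading of the question "Which factor for x% accuracy?".

```python
import math


class RunningStats:
    """Keeps only n, sum(x) and sum(x^2), so samples can be added one by one."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def sample_std(self):
        # Unbiased sample variance expressed through the running sums.
        if self.n < 2:
            raise ValueError("need at least two samples")
        var = (self.total_sq - self.n * self.mean() ** 2) / (self.n - 1)
        return math.sqrt(max(var, 0.0))

    def chebyshev_interval(self, alpha=0.05):
        """Interval that contains the true mean with probability >= 1 - alpha."""
        half = self.sample_std() / (math.sqrt(alpha) * math.sqrt(self.n))
        return self.mean() - half, self.mean() + half

    def relative_half_width(self, alpha=0.05):
        """Half-width of the interval relative to the mean (the 'error')."""
        low, high = self.chebyshev_interval(alpha)
        return (high - low) / 2 / abs(self.mean())


def required_sample_size(sigma, mean, target_rel, alpha=0.05):
    """Smallest n whose Chebyshev half-width is at most target_rel * |mean|."""
    eps = target_rel * abs(mean)
    return math.ceil(sigma ** 2 / (alpha * eps ** 2))


if __name__ == "__main__":
    stats = RunningStats()
    for value in (10, 20, 45):  # sample values from slide 27
        stats.add(value)
    print(stats.total, stats.total_sq, stats.mean())
    print(stats.chebyshev_interval(alpha=0.05))
```

Because only three numbers are stored, the interval can be recomputed cheaply each time more sampled values arrive, which fits the shrinking "error" on slides 33 to 36.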
