2. problem definition
- goal: simulate fake logs that follow a given log count per hour
- data to fit: thumbor.buzzni.com
4. log modeling
- count of logs per hour
== frequency of logs appearing in a fixed interval of time
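As a rough illustration of the quantity being modeled, the hourly count can be derived from raw log timestamps by bucketing. This is a minimal sketch: the function name `count_per_hour` and the assumption that timestamps are epoch seconds are illustrative, not taken from the slides.

```python
from collections import Counter

def count_per_hour(timestamps):
    """Count events per hour bucket.

    Assumes timestamps are epoch seconds (an illustrative assumption).
    The key of each bucket is the number of whole hours since the epoch.
    """
    return Counter(int(ts) // 3600 for ts in timestamps)

# Three events in hour 0, one in hour 1, one in hour 2.
counts = count_per_hour([0, 10, 3599, 3600, 7300])
```

The resulting per-hour counts play the role of the rate that the simulated logs should follow.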
5. log modeling: poisson distribution
- wikipedia: poisson distribution expresses the probability of a given number
of events occurring in a fixed interval of time or space if these events
occur with a known constant rate λ and independently of the time since the
last event.
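To make the definition concrete, the probability of seeing exactly k events in one interval with rate λ is λ^k e^(-λ) / k!. A small sketch of that formula follows; the helper name `poisson_pmf` is hypothetical and only the standard library is used.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for a Poisson distribution with mean lam (lam > 0)."""
    # Evaluate lam**k * exp(-lam) / k! in log space so that large k or lam
    # do not overflow the power or the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Probability of exactly 3 logs in an hour when the average is 2 per hour.
p = poisson_pmf(3, 2.0)
```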
6. log modeling: poisson process
- wikipedia: poisson point process is a type of random mathematical object
that consists of points randomly located on a mathematical space.
7. implementation: homogeneous case
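import random

def get_points_homogeneous(min_t, max_t, occurrence):
    # Given the number of events, the event times of a homogeneous
    # Poisson process are uniformly distributed over the interval.
    # randint includes both endpoints and returns integer timestamps.
    points = [random.randint(min_t, max_t) for _ in range(occurrence)]
    points.sort()
    yield from points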
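The function above fixes the number of points in advance. A different way to simulate the homogeneous case, shown here only as a sketch and not as the method from the slides, is to let the count vary by drawing exponentially distributed gaps between consecutive events. The name `sample_homogeneous` and the `rate` and `rng` parameters are assumptions made for this example.

```python
import random

def sample_homogeneous(min_t, max_t, rate, rng=random):
    """Yield event times in [min_t, max_t) for a constant rate.

    rate is the expected number of events per unit of time (assumed > 0).
    Gaps between events of a homogeneous Poisson process are exponential
    with mean 1 / rate, so the times come out already sorted.
    """
    t = min_t
    while True:
        t += rng.expovariate(rate)
        if t >= max_t:
            return
        yield t

# About 0.5 events per time unit over 100 units, with a seeded generator.
points = list(sample_homogeneous(0, 100, 0.5, random.Random(0)))
```

With this variant the number of generated points is itself Poisson distributed, at the cost of producing float rather than integer timestamps.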
10. log modeling: inhomogeneous poisson process
- [figure: λ(t) plotted over t beneath the maximum integer bound λmax, with "keep" and "discard" regions; two example points are annotated with discard probabilities of 65% and 7%]
11. implementation: nonhomogeneous case
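import random

def get_points_nonhomogeneous(min_t, max_t, occurrence):
    # As used below, occurrence provides get(t), the rate at time t, and
    # get_max_bound(min_t, max_t), an integer upper bound on that rate.
    max_bound = occurrence.get_max_bound(min_t, max_t)
    # Draw max_bound candidate points uniformly, as in the homogeneous case.
    points = [random.randint(min_t, max_t) for _ in range(max_bound)]
    points.sort()
    for point in points:
        # Thinning: keep each candidate with probability rate / bound.
        keep_probability = occurrence.get(point) / max_bound
        if keep_probability > random.random():
            yield point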
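The keep/discard step can also be written against a plain rate function instead of an `occurrence` object. The following is a self-contained sketch of thinning with exponentially spaced candidates, which is a variant of the code above rather than the slide's implementation. The names `sample_by_thinning`, `rate_fn`, `rate_max`, and `rng` are hypothetical, and `rate_max` is assumed to be at least as large as `rate_fn(t)` everywhere in the interval.

```python
import random

def sample_by_thinning(min_t, max_t, rate_fn, rate_max, rng=random):
    """Yield event times in [min_t, max_t) for a time-varying rate.

    rate_fn(t) is the rate at time t; rate_max must satisfy
    rate_fn(t) <= rate_max on the whole interval (assumed, not checked).
    """
    t = min_t
    while True:
        # Candidate events arrive at the constant bounding rate rate_max.
        t += rng.expovariate(rate_max)
        if t >= max_t:
            return
        # Keep the candidate with probability rate_fn(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            yield t

# Example rate: silent for the first half of the window, busy afterwards.
def example_rate(t):
    return 0.0 if t < 5 else 2.0

points = list(sample_by_thinning(0, 10, example_rate, 2.0, random.Random(1)))
```

Where the rate is zero every candidate is discarded, and where it equals `rate_max` every candidate is kept, which matches the keep and discard regions of the figure.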
13. reference
- Chiu, S. N., Stoyan, D., Kendall, W. S., & Mecke, J. (2013). Stochastic
geometry and its applications (3rd ed.). The Atrium, Southern Gate,
Chichester, West Sussex, United Kingdom: John Wiley & Sons.
- Poisson distribution. (2019, February 16). Retrieved February 25, 2019,
from https://en.wikipedia.org/wiki/Poisson_distribution
- Poisson point process. (2019, February 20). Retrieved February 25, 2019,
from https://en.wikipedia.org/wiki/Poisson_point_process