Time Series Meetup: Virtual Edition | July 2020

July 21, 2020
Welcome To
Time Series Virtual Meetup

Agenda
● Introductions
● Our talk today
● Q&A
● Open Jobs
● Be a speaker

November 10 - 11, 2020
North America Virtual Experience
www.influxdays.com/virtual-experience-2020/
Call for Papers is now open!
We’re looking for great speakers – submit
your speaker application today.

Anomaly Detection with
Median Absolute Deviation
Anais Dotis-Georgiou
Developer Advocate | InfluxData

Median Absolute Deviation with Flux for Anomaly Detection
& Contributing Custom Flux Packages
Getting MAD

© 2020 InfluxData. All rights reserved.
Hello!
• Developer Advocate
• Anais Jackie Dotis on LinkedIn
• @AnaisDotis
• http://community.influxdata.com/

What is Median Absolute Deviation?
• A “deviation from the pack” algorithm
• Use Median Absolute Deviation to spot containers, virtual machines (VMs),
servers, or sensors that are behaving differently from the others
• Helps reduce incident times and MTTR to uphold SLAs

How Does MAD work?
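
The figure for this slide did not survive in the text. As a hedged reconstruction based only on the comments in the mad.flux listing later in the deck (the symbols x_i, k and the subscripts are notation chosen here, not taken from the slide), the quantity being computed can be written as:

```latex
\mathrm{MAD} = k \cdot \operatorname{median}_i \bigl( \lvert x_i - \operatorname{median}_j (x_j) \rvert \bigr),
\qquad k = 1.4826
```

Here the x_i are the values reported at one timestamp, and a point is labeled an anomaly when its scaled deviation reaches the chosen threshold (3.0 by default in the listing).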

Numerical Example of MAD

Step One

Step Two

Step Three

Step Four

Step Five
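
The figures for the numerical example and its five steps were lost in extraction. As a stand-in, here is a minimal Python sketch of the conventional MAD calculation. The input values are made up for illustration, the names `values`, `mad_scores` and `THRESHOLD` are hypothetical, and the ordering of the comments below may not match the step numbering on the original slides.

```python
from statistics import median

# Hypothetical readings from seven hosts at one timestamp (illustrative only).
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0]

K = 1.4826        # scale factor quoted in the mad.flux listing
THRESHOLD = 3.0   # default threshold used in the mad.flux listing


def mad_scores(xs, k=K):
    """Return (median, absolute deviations, MAD, per-point scores)."""
    # Median of the raw values.
    med = median(xs)
    # Absolute deviation of each value from that median.
    deviations = [abs(x - med) for x in xs]
    # Median of the deviations, scaled by k.
    mad = k * median(deviations)
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined")
    # Each deviation expressed as a multiple of the MAD.
    scores = [d / mad for d in deviations]
    return med, deviations, mad, scores


med, deviations, mad, scores = mad_scores(values)
# Points whose score reaches the threshold are treated as anomalies.
flagged = [x for x, s in zip(values, scores) if s >= THRESHOLD]
print(med, deviations, mad, flagged)
```

With this input the median is 4.0, the median deviation is 2.0, and only the reading of 100.0 is flagged. Because both medians ignore the size of the extreme value, a single outlier barely moves the baseline, which is the reason for preferring MAD over a mean and standard deviation here.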

Flux Functions Used
• group()
• drop()
• median()
• map()
• join()

group()
The group() function groups records based on their values for
specific columns. It produces tables with new group keys based
on provided properties. Specify an empty array of columns to
ungroup data or merge all input tables into a single output table.
group(columns: ["host", "_measurement"], mode:"by")

drop()
The drop() function removes specified columns from a table.
Columns are specified either through a list or a predicate
function. When a dropped column is part of the group key, it will
be removed from the key. If a specified column is not present in a
table, it will return an error.
drop(columns: ["col1", "col2"])

median()
The median() function is a special application of the
quantile() function that returns the median _value of an
input table or all non-null records in the input table with values
that fall within the 0.5 quantile (50th percentile) depending on
the method used.
median(
    column: "_value",
    method: "estimate_tdigest",
    compression: 0.0
)

map()
The map() function applies a function to each record in the input tables. The modified
records are assigned to new tables based on the group key of the input table. The
output tables are the result of applying the map function to each record of the input
tables.
When the output record contains a different value for the group key, the record is
regrouped into the appropriate table. When the output record drops a column that
was part of the group key, that column is removed from the group key.
map(fn: (r) => ({ _value: r._value * r._value }))

join()
The join() function merges two or more input streams whose
values are equal on a set of common columns into a single
output stream. Null values are not considered equal when
comparing column values. The resulting schema is the union of
the input schemas. The resulting group key is the union of the
input group keys.
join(tables: {key1: table1, key2: table2}, on: ["_time", "_field"], method: "inner")

Custom Flux Function: Basic Syntax
// Basic function definition structure
functionName = (functionParameters) => functionOperations
// Function definition
square = (n) => n * n
// Function usage
> square(n:3)
9

Custom Flux Function: pipe-forward data
In the example below, the tables parameter is assigned to the <- expression,
which represents all data piped-forward into the function. tables is then
piped-forward into other operations in the function definition.
// Basic pipe-forward function structure
functionName = (tables=<-) => tables |> functionOperations

// Function definition
multByX = (tables=<-, x) =>
    tables
        |> map(fn: (r) => ({ r with _value: r._value * x }))

// Function usage
from(bucket: "example-bucket")
    |> range(start: -1m)
    |> filter(fn: (r) => r._measurement == "mem" and r._field == "used_percent")
    |> multByX(x: 2.0)

Contributing a User Defined Flux Package
1. Write your function
2. Write a test
3. Compile
4. Submit a PR

mad.flux
package anomalydetection

import "math"
import "experimental"

mad = (table=<-, threshold=3.0) => {
    // med(X): median of _value across all series at each timestamp
    data = table |> group(columns: ["_time"], mode: "by")
    med = data |> median(column: "_value")

    // diff = |Xi - med(X)| = math.abs(x: Xi - med(X))
    diff = join(tables: {data: data, med: med}, on: ["_time"], method: "inner")
        |> map(fn: (r) => ({ r with _value: math.abs(x: r._value_data - r._value_med) }))
        |> drop(columns: ["_start", "_stop", "_value_med", "_value_data"])

    // The constant k is needed to make the estimator consistent
    // for the parameter of interest.
    k = 1.4826

    // MAD = k * med(|Xi - med(X)|)
    diff_med = diff
        |> median(column: "_value")
        |> map(fn: (r) => ({ r with MAD: k * r._value }))
        |> filter(fn: (r) => r.MAD > 0.0)

    output = join(tables: {diff: diff, diff_med: diff_med}, on: ["_time"], method: "inner")
        |> map(fn: (r) => ({ r with _value: r._value_diff / r._value_diff_med }))
        |> map(fn: (r) => ({ r with level: if r._value >= threshold then "anomaly" else "normal" }))

    return output
}
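
To make the data flow of the listing easier to follow, here is a rough Python sketch of the same steps: group by timestamp, take the median, compute absolute deviations, take their median, then score and label each point. It is an approximation under stated assumptions, not a port. The function name `mad_by_time`, the `(time, host, value)` row shape and the sample rows are hypothetical, and Python's exact `statistics.median` stands in for Flux's `median()`, whose default method is an estimate. As the listing reads, the score divides each deviation by the unscaled median deviation (`_value_diff_med`), while `k` only enters the `MAD` column and the `MAD > 0.0` filter; the sketch follows that behavior.

```python
from collections import defaultdict
from statistics import median

K = 1.4826  # same constant as in the listing


def mad_by_time(rows, threshold=3.0):
    """Label each (time, host, value) row as "anomaly" or "normal".

    Hypothetical helper mirroring the steps of the Flux listing.
    """
    # group(columns: ["_time"]): collect all series values per timestamp.
    by_time = defaultdict(list)
    for t, host, value in rows:
        by_time[t].append((host, value))

    out = []
    for t, group in sorted(by_time.items()):
        # Median across series at this timestamp.
        med = median(v for _, v in group)
        # Absolute deviation of each series from that median.
        diffs = [(host, abs(v - med)) for host, v in group]
        # Median of the deviations; MAD is k times this value.
        diff_med = median(d for _, d in diffs)
        if K * diff_med <= 0.0:
            # Mirrors filter(MAD > 0.0) followed by the inner join:
            # timestamps with zero MAD drop out of the result.
            continue
        for host, d in diffs:
            score = d / diff_med
            level = "anomaly" if score >= threshold else "normal"
            out.append({"time": t, "host": host, "score": score, "level": level})
    return out


# Made-up sample: host "d" strays from the pack at time 1; time 2 is flat.
rows = [
    (1, "a", 10.0), (1, "b", 11.0), (1, "c", 12.0), (1, "d", 50.0),
    (2, "a", 5.0), (2, "b", 5.0), (2, "c", 5.0), (2, "d", 5.0),
]
result = mad_by_time(rows)
anomalies = [r["host"] for r in result if r["level"] == "anomaly"]
print(anomalies)
```

One consequence worth noticing: a timestamp where at least half of the series report the same value has a MAD of zero and disappears from the output entirely, rather than being labeled "normal".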

mad_test.flux
package anomalydetection_test

import "testing"
import "contrib/anaisdg/anomalydetection"

inData = "<your annotated csv>"
outData = "<your annotated csv>"

t_mad = (table=<-) =>
    table
        |> range(start: 2020-04-27T00:00:00Z, stop: 2020-05-01T00:00:00Z)
        |> anomalydetection.mad(threshold: 3.0)

test t_mad = () =>
    ({input: testing.loadStorage(csv: inData), want: testing.loadMem(csv: outData), fn: t_mad})

Compile Flux
1. Install the pkg-config utility:
    brew install pkg-config
2. Install the pkg-config wrapper utility:
    go get github.com/influxdata/pkg-config
3. Ensure the GOBIN directory is on your PATH:
    export PATH=${GOPATH}/bin:${PATH}
4. Navigate to the Flux repository and run the following commands to build Flux:
    go generate ./libflux/go/libflux
    go generate ./stdlib
    go build ./cmd/flux

Open Jobs?
InfluxData: https://www.influxdata.com/careers/

NEXT MEETUP - August 12, 2020:
Obtaining the Perfect Smoke By Monitoring Your BBQ
with InfluxDB and Telegraf
Thanks for coming!
