Scott Anderson [InfluxData] | Map & Reduce – The Powerhouses of Custom Flux Functions | InfluxDays Virtual Experience London 2020

Scott Anderson
Technical Writer @ InfluxData
Map & Reduce
The Powerhouses of
Custom Flux Functions

© 2020 InfluxData. All rights reserved.
Disclaimer
This presentation is not about Hadoop MapReduce-like
functionality in Flux. This is about building custom
functions with map() and reduce().

Introduction to Flux
www.influxdata.com/resources/introduction-to-flux-and-functional-data-scripting/

“Flux is a functional data scripting language
that lets you query, process, write, and
alert on data in a single syntax.”

Design Goals
● Turing-complete
● Usable
● Readable
● Flexible
● Extensible
● Testable
● Contributable
● Shareable

Extensibility
● InfluxQL was and is hard to extend.
● TICKscript was and is easier to extend, but depends on
external languages for true customization.

Extensibility
● Early strategy for Flux has been to create “primitive” functions
that can be used to create other functions.
● Flux recently introduced architecture and guidelines for
community contributions.

If it doesn’t exist, create it
Custom Functions

Custom Function Definition
funcName = (parameters) => funcOperations
Function
square = (x) => x * x
Transformation
somethingCool = (tables=<-) => tables |> ...

Functions & Transformations
Function
bool(v: 1)
// Returns true
Transformation
toBool = (tables=<-) =>
    tables
        |> map(fn: (r) => ({
            r with _value:
                bool(v: r._value)
        }))

Functions & Transformations
from(bucket: "myData")
|> range(start: -1h)
|> filter(fn: (r) => r._field == "enabled")
|> toBool()

map() & reduce()

_time sensorID _field _value
2020-06-01T00:12:00Z TM02001 tempC 21.29
2020-06-01T00:12:01Z TM02001 tempC 21.36
2020-06-01T00:12:02Z TM02001 tempC 21.34

2020-06-01T00:12:00Z TM02001 tempC 21.29
{
_time: 2020-06-01T00:12:00Z,
sensorID: "TM02001",
_field: "tempC",
_value: 21.29
}

map(
    fn: (r) => ({ ... })
)

(r) => ({ ... })
r = {
_time: 2020-06-01T00:12:00Z,
_field: "tempC",
_value: 21.29
}

r = {
_time: 2020-06-01T00:12:00Z,
_field: "tempC",
_value: 21.29
}
(r) => ({ _value: r._value })

r = {
_time: 2020-06-01T00:12:00Z,
_field: "tempC",
_value: 21.29
}
map(fn: (r) => ({
_field: "tempF", _value: (r._value * 9.0 / 5.0) + 32.0
}))

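Without the with operator, the output row keeps only the columns defined in fn:
map(fn: (r) => ({
    _field: "tempF", _value: (r._value * 9.0 / 5.0) + 32.0
}))
{
    _field: "tempF",
    _value: 70.322
}
_field _value
tempF 70.322

With the with operator, the output row extends r, so existing columns are kept:
map(fn: (r) => ({ r with
    _field: "tempF", _value: (r._value * 9.0 / 5.0) + 32.0
}))
{
    _time: 2020-06-01T00:12:00Z,
    _field: "tempF",
    _value: 70.322
}
2020-06-01T00:12:00Z TM02001 tempF 70.322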

|> map(fn: (r) => ({ r with _field: "tempF", _value: (r._value * 9.0 / 5.0) + 32.0 }))
2020-06-01T00:12:00Z TM02001 tempC 21.29
2020-06-01T00:12:01Z TM02001 tempC 21.36
2020-06-01T00:12:02Z TM02001 tempC 21.34
2020-06-01T00:12:00Z TM02001 tempF 70.322
2020-06-01T00:12:01Z TM02001 tempF 70.448
2020-06-01T00:12:02Z TM02001 tempF 70.412

toF
Create a custom transformation
toF = (tables=<-) =>
    tables
        |> map(
            fn: (r) => ({ r with
                _field: "tempF",
                _value: (r._value * 9.0 / 5.0) + 32.0
            })
        )

|> map(fn: (r) => ({ r with _field: "tempF", _value: (r._value * 9.0 / 5.0) + 32.0 }))
2020-06-01T00:12:00Z TM02001 tempC 21.29
2020-06-01T00:12:01Z TM02001 tempC 21.36
2020-06-01T00:12:02Z TM02001 tempC 21.34
2020-06-01T00:12:00Z TM02001 tempF 70.322
2020-06-01T00:12:01Z TM02001 tempF 70.448
2020-06-01T00:12:02Z TM02001 tempF 70.412
|> toF()
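
To try toF() without an InfluxDB bucket, the rows from the slide can be built in memory. This is a minimal sketch that assumes a Flux version that ships the array package; array.from() is not part of the original slides, and the sample rows are copied from the table above.

```flux
import "array"

// Custom transformation from the slides: convert Celsius values to Fahrenheit.
toF = (tables=<-) =>
    tables
        |> map(
            fn: (r) => ({r with
                _field: "tempF",
                _value: (r._value * 9.0 / 5.0) + 32.0
            })
        )

// array.from() builds an in-memory table from an array of records.
// Every record must have the same columns and column types.
array.from(
    rows: [
        {_time: 2020-06-01T00:12:00Z, sensorID: "TM02001", _field: "tempC", _value: 21.29},
        {_time: 2020-06-01T00:12:01Z, sensorID: "TM02001", _field: "tempC", _value: 21.36},
        {_time: 2020-06-01T00:12:02Z, sensorID: "TM02001", _field: "tempC", _value: 21.34}
    ]
)
    |> toF()
```

Because the function uses the with operator, each output row should keep _time and sensorID and carry the converted value under the tempF field name.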

Add text based on temperature range?
Temperature Range     Temperature Text
below 0°C             freezing
0°C - 15.5°C          cold
15.5°C - 26.5°C       mild
26.5°C - 37.5°C       hot
37.5°C or higher      very hot

(r) => ({
column: if r._value == "foo" then "bar" else "baz"
})
Use conditional logic!

tempToText = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with
            temp_text:
                if r._value <= 0.0 then "freezing"
                else if r._value > 0.0 and r._value <= 15.5 then "cold"
                else if r._value > 15.5 and r._value <= 26.5 then "mild"
                else if r._value > 26.5 and r._value <= 37.5 then "hot"
                else if r._value > 37.5 then "very hot"
                else "unknown"
            })
        )

|> tempToText()
2020-06-01T00:12:00Z TM02001 tempC -1.2
2020-06-01T00:12:01Z TM02001 tempC 22.1
2020-06-01T00:12:02Z TM02001 tempC 31.8
_time sensorID _field _value temp_text
2020-06-01T00:12:00Z TM02001 tempC -1.2 freezing
2020-06-01T00:12:01Z TM02001 tempC 22.1 mild
2020-06-01T00:12:02Z TM02001 tempC 31.8 hot

Example custom functions
using map()

Basic mathematical operations
multiplyByX = (tables=<-, x) =>
    tables
        |> map(fn: (r) => ({ r with _value: r._value * x }))
square = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with _value: r._value * r._value }))
quarter = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with _value: r._value / 4.0 }))

Other mathematical operations
import "math"
sin = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with _value: math.sin(x: r._value) }))
circumference = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with circumference: 2.0 * math.pi * r._value }))
circularArea = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with area: math.pi * r._value ^ 2.0 }))

String manipulation
import "strings"
rmHostPrefix = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with
            host: strings.trimPrefix(v: r.host, prefix: "host_") }))
replaceSpaces = (tables=<-, char="_") =>
    tables
        |> map(fn: (r) => ({ r with
            _value: strings.replaceAll(v: r._value, t: " ", u: char) }))
humanReadableMessage = (tables=<-) =>
    tables
        |> map(fn: (r) => ({ r with
            humanReadable: "The current value is ${string(v: r._value)}." }))

Common gotchas
● The with operator is your friend.
● Mathematical operands must be of the same data type.
● map() does not add new columns to group keys.
● map() only operates on a single row at a time.
○ Can’t use values from previous or subsequent rows.
○ If your operation requires values from separate fields, use pivot()
or join() to align field values in rows.
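
The last two gotchas can be sketched together. The example below assumes hypothetical voltage and current fields stored as integers, builds them with array.from(), aligns them into one row per timestamp with pivot(), and converts types before mixing an integer with a float. The field names, values, and the kW column are illustrative and not from the original slides.

```flux
import "array"

// Hypothetical sample data: two integer fields recorded at the same timestamps.
data = array.from(
    rows: [
        {_time: 2020-06-01T00:12:00Z, _field: "voltage", _value: 12},
        {_time: 2020-06-01T00:12:00Z, _field: "current", _value: 3},
        {_time: 2020-06-01T00:12:01Z, _field: "voltage", _value: 12},
        {_time: 2020-06-01T00:12:01Z, _field: "current", _value: 4}
    ]
)

data
    // Turn each field into its own column so both values sit in the same row.
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    // voltage * current is an integer; convert it before dividing by a float.
    |> map(fn: (r) => ({r with kW: float(v: r.voltage * r.current) / 1000.0}))
```

Without the float() conversion, dividing the integer product by 1000.0 would fail because the operands have different types.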

Reduce all the things!!
reduce()

|> reduce(...)

2020-06-01T00:12:00Z TM02001 tempC 21.29
2020-06-01T00:12:01Z TM02001 tempC 21.36
2020-06-01T00:12:02Z TM02001 tempC 21.34
sensorID _field _value
TM02001 tempC 21.33


reduce(
fn: (r, accumulator) => ({ ... }),
identity: { ... }
)

_time _field _value
0001 example 1.3
0002 example 12.2
0003 example 4.9
0004 example 3.2
0005 example 15.8
r = {_time: 0001, _field: "example", _value: 1.3}
group_key = [ _field ]
fn: (r, accumulator) => ({ sum: r._value + accumulator.sum })
accumulator = {sum: 0.0}
identity: {sum: 0.0}

{ sum: r._value + accumulator.sum }
_time _field _value
0001 example 1.3
0002 example 12.2
0003 example 4.9
0004 example 3.2
0005 example 15.8
Row 0001: { sum: 1.3 + 0.0 }     Returns: { sum: 1.3 }
Row 0002: { sum: 12.2 + 1.3 }    Returns: { sum: 13.5 }
Row 0003: { sum: 4.9 + 13.5 }    Returns: { sum: 18.4 }
Row 0004: { sum: 3.2 + 18.4 }    Returns: { sum: 21.6 }
Row 0005: { sum: 15.8 + 21.6 }   Returns: { sum: 37.4 }
Final accumulator:
{ sum: 37.4 }

Output table
● Final accumulator
● Columns in the group key
● Drops all other columns
_time _field _value
0001 example 1.3
0002 example 12.2
0003 example 4.9
0004 example 3.2
0005 example 15.8
_field sum
example 37.4
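
The walk-through above can be reproduced with an in-memory table. This is a minimal sketch, assuming the array package is available; the timestamps are placeholders standing in for the abbreviated 0001 to 0005 values on the slide.

```flux
import "array"

array.from(
    rows: [
        {_time: 2020-06-01T00:00:01Z, _field: "example", _value: 1.3},
        {_time: 2020-06-01T00:00:02Z, _field: "example", _value: 12.2},
        {_time: 2020-06-01T00:00:03Z, _field: "example", _value: 4.9},
        {_time: 2020-06-01T00:00:04Z, _field: "example", _value: 3.2},
        {_time: 2020-06-01T00:00:05Z, _field: "example", _value: 15.8}
    ]
)
    // Put _field in the group key so it is kept in the output table.
    |> group(columns: ["_field"])
    // identity seeds the accumulator; fn returns the next accumulator for each row.
    |> reduce(
        fn: (r, accumulator) => ({sum: r._value + accumulator.sum}),
        identity: {sum: 0.0}
    )
```

The result should be a single row with _field and sum columns. Floating-point rounding can make the printed sum differ slightly from 37.4.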

_start _stop _time _measurement tag _field _value
2020-06-01T00:00:00Z 2020-06-05T23:59:59Z 2020-06-01T00:00:00Z foo bar example 1.3
group_key = [ _start,_stop,_measurement,tag,_field ]

_start _stop _measurement tag _field sum
2020-06-01T00:00:00Z 2020-06-05T23:59:59Z foo bar example 37.4
identity: {_value: 0.0}

2020-06-01T00:00:00Z 2020-06-05T23:59:59Z 2020-06-01T00:00:00Z foo baz example 8.1

2020-06-01T00:00:00Z 2020-06-05T23:59:59Z foo bar example 37.4
2020-06-01T00:00:00Z 2020-06-05T23:59:59Z foo baz example 45.5

average

average = (tables=<-) =>
    tables
        |> reduce(
            fn: (r, accumulator) => ({
                sum: r._value + accumulator.sum,
                count: accumulator.count + 1.0,
                avg: (r._value + accumulator.sum) / accumulator.count
            }),
            identity: { sum: 0.0, count: 1.0, avg: 0.0 }
        )

identity: { sum: 0.0, count: 1.0, avg: 0.0 }
accumulator = {sum: 0.0, count: 1.0, avg: 0.0}
_time _field _value
0001 example 1.3
0002 example 12.2
0003 example 4.9
0004 example 3.2
0005 example 15.8
Row 0001: { sum: 1.3 + 0.0, count: 1.0 + 1.0, avg: (1.3 + 0.0) / 1.0 }
Returns: { sum: 1.3, count: 2.0, avg: 1.3 }
Row 0002: { sum: 12.2 + 1.3, count: 2.0 + 1.0, avg: (12.2 + 1.3) / 2.0 }
Returns: { sum: 13.5, count: 3.0, avg: 6.75 }
Row 0003 returns: { sum: 18.4, count: 4.0, avg: 6.13 }
Row 0004 returns: { sum: 21.6, count: 5.0, avg: 5.4 }
Row 0005 returns: { sum: 37.4, count: 6.0, avg: 7.48 }
count starts at 1.0 so that avg divides by the number of rows processed so far,
including the current row. The final count (6.0) is therefore one more than the
number of rows.
Final accumulator:
{ sum: 37.4, count: 6.0, avg: 7.48 }
Output table:
_field sum count avg
example 37.4 6.0 7.48

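average = (tables=<-) =>
    tables
        |> reduce(
            fn: (r, accumulator) => ({
                sum: r._value + accumulator.sum,
                count: accumulator.count + 1.0,
                avg: (r._value + accumulator.sum) / accumulator.count
            }),
            identity: { sum: 0.0, count: 1.0, avg: 0.0 }
        )
        |> drop(columns: ["sum","count"])
        |> rename(columns: {avg: "_value"})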

_time tag _field _value
0001 foo example 98.3
_time tag _field _value
0001 bar example 56.1
tag _field _value
foo example 114.26
tag _field _value
bar example 74.8
|> average()
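
One way to sanity-check a custom aggregate is to run it next to the built-in equivalent. The sketch below assumes the array package is available and uses placeholder timestamps with the five example values from the earlier walk-through; comparing against the built-in mean() is a suggestion here, not something the slides do.

```flux
import "array"

// Custom aggregate from the slides.
average = (tables=<-) =>
    tables
        |> reduce(
            fn: (r, accumulator) => ({
                sum: r._value + accumulator.sum,
                count: accumulator.count + 1.0,
                avg: (r._value + accumulator.sum) / accumulator.count
            }),
            identity: {sum: 0.0, count: 1.0, avg: 0.0}
        )
        |> drop(columns: ["sum", "count"])
        |> rename(columns: {avg: "_value"})

data = array.from(
    rows: [
        {_time: 2020-06-01T00:00:01Z, _field: "example", _value: 1.3},
        {_time: 2020-06-01T00:00:02Z, _field: "example", _value: 12.2},
        {_time: 2020-06-01T00:00:03Z, _field: "example", _value: 4.9},
        {_time: 2020-06-01T00:00:04Z, _field: "example", _value: 3.2},
        {_time: 2020-06-01T00:00:05Z, _field: "example", _value: 15.8}
    ]
)
    |> group(columns: ["_field"])

// Two named results: the custom aggregate and the built-in mean().
data |> average() |> yield(name: "custom")
data |> mean() |> yield(name: "builtin")
```

Both results should contain one row per input table with the mean in _value, apart from possible floating-point rounding differences.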

Example custom functions
using reduce()

minMaxMean = (tables=<-) =>
    tables
        |> reduce(
            identity: {count: 0, sum: 0.0, min: 0.0, max: 0.0, mean: 0.0},
            fn: (r, accumulator) => ({
                count: accumulator.count + 1,
                sum: r._value + accumulator.sum,
                min: if accumulator.count == 0 then r._value
                    else if r._value < accumulator.min then r._value
                    else accumulator.min,
                max: if accumulator.count == 0 then r._value
                    else if r._value > accumulator.max then r._value
                    else accumulator.max,
                mean: if accumulator.count == 0 then r._value
                    else (r._value + accumulator.sum) / float(v: accumulator.count + 1)
            })
        )
        |> drop(columns: ["count", "sum"])

candlestick = (tables=<-) =>
    tables
        |> reduce(
            identity: { index: 0, first: 0.0, last: 0.0, max: 0.0, min: 0.0 },
            fn: (r, accumulator) => ({
                index: accumulator.index + 1,
                first: if accumulator.index == 0 then r._value
                    else accumulator.first,
                last: r._value,
                max: if accumulator.index == 0 then r._value
                    else if r._value > accumulator.max then r._value
                    else accumulator.max,
                min: if accumulator.index == 0 then r._value
                    else if r._value < accumulator.min then r._value
                    else accumulator.min
            })
        )
        |> drop(columns: ["index"])

import "strings"
commaSeparatedList = (tables=<-) =>
    tables
        |> reduce(
            identity: {values: ""},
            fn: (r, accumulator) => ({
                values: accumulator.values + "${string(v: r._value)},"
            })
        )
        |> map(fn: (r) => ({ r with
            values: strings.trimRight(v: r.values, cutset: ",")
        }))

Common gotchas
● reduce() does not support the with operator.
● Columns not in the group key or explicitly mapped in the
reduce operation are dropped.
● Mathematical operands must be of the same data type.
● reduce() is “destructive” and will only output a single row
per table.
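
The second and third gotchas can be seen in one short pipeline. This is a sketch under stated assumptions: the array package is available, and the second sensor ID (TM02002) and its values are made up for illustration.

```flux
import "array"

array.from(
    rows: [
        {sensorID: "TM02001", _value: 21.29},
        {sensorID: "TM02001", _value: 21.36},
        {sensorID: "TM02002", _value: 19.80},
        {sensorID: "TM02002", _value: 19.95}
    ]
)
    // sensorID becomes part of the group key, so it survives reduce()
    // and each sensor gets its own output row.
    |> group(columns: ["sensorID"])
    |> reduce(
        identity: {count: 0, sum: 0.0},
        fn: (r, accumulator) => ({
            count: accumulator.count + 1,
            sum: accumulator.sum + r._value
        })
    )
    // count is an integer; convert it before dividing a float by it.
    |> map(fn: (r) => ({r with mean: r.sum / float(v: r.count)}))
```

Without the group() call, the input is a single table with an empty group key, so reduce() would return one row and the sensorID column would be dropped.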

Contribute custom functions to Flux
Contribute & Share

Contribute to Flux
1. Fork and clone the Flux repo – github.com/influxdata/flux
2. Navigate to flux/stdlib/contrib and create a new directory with
your GitHub username.
flux/stdlib/contrib/sanderson
3. Create directories for each sub-package.
flux/stdlib/contrib/sanderson/finance
4. Add a package definition file.
flux/stdlib/contrib/sanderson/finance/finance.flux

package finance

package finance
import "math"
import "regexp"

package finance
import "math"
import "regexp"
option currency = "$"

package finance
import "math"
import "regexp"
nyse = ["A","AA","AACG","AAL","AAMC","AAME" ... ]

package finance
import "math"
import "regexp"
nyse = ["A","AA","AACG","AAL","AAMC","AAME" ... ]
// Comments that explain what functions do and how they work
candlestick = (tables=<-) => tables |> ...
// Calculate profit and loss
profitAndLoss = (tables=<-) => tables |> ...
// Calculate the days until retirement
retireIn = (tables=<-, goal) => tables |> ...

Contribute to Flux
5. Add tests for each function.
flux/stdlib/contrib/sanderson/finance/retireIn_test.flux

package retireIn_test
import "testing"
import "contrib/sanderson/finance"
inData = "...some annotated CSV..."
outData = "...some more annotated CSV..."
t_retireIn = (table=<-) =>
    table
        |> range(start: 2020-06-01T00:00:00Z, stop: 2020-06-05T00:00:00Z)
        |> finance.retireIn(goal: 5000000)
test _retireIn = () => ({
    input: testing.loadStorage(csv: inData),
    want: testing.loadMem(csv: outData),
    fn: t_retireIn
})

Contribute to Flux
6. Generate the standard library.
make or go generate ./stdlib
7. Push your changes to GitHub.
8. Submit a pull request on the Flux repo.
9. Use your contributed package.
10. Share your contribution.
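
Once a contribution is merged and released, using it looks like using any other package. The sketch below assumes the example finance package from these slides has been contributed under contrib/sanderson/finance; it is the illustrative package from the slides and is not assumed to exist in the published standard library. The bucket name comes from the earlier slides, and the price field is hypothetical.

```flux
// Hypothetical import path: the example package defined in the slides above.
import "contrib/sanderson/finance"

from(bucket: "myData")
    |> range(start: -1h)
    // "price" is an illustrative field name, not from the original slides.
    |> filter(fn: (r) => r._field == "price")
    // Call the contributed transformation through its package name.
    |> finance.candlestick()
```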

Thank you!
Email: scott@influxdata.com
Github: @sanderson
InfluxData Community: @scott
InfluxDB Community Slack: @Scott Anderson
