In this presentation for Gophercon India 2016, we discuss a generalized idea for easily composing any stream of data by implementing only a one-method interface for data collections.
2. Streams are the Norm
● Need for business analytics generates endless streams of data
● Horizontal scaling adds to the number of streams
● Stream variety is on the rise
● Streams need to be composed and co-processed
4. Stream Elements
No generics in Go, so stream elements are boxed objects: interface{}
● There is no type safety for generic stream processing.
● Not a big deal really: schemaless data sources return interfaces anyway.
● It can be easily managed by runtime type checking in the first step of the pipeline.
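A minimal sketch of the last point, assuming a stream of boxed ints. The names toInt and sumInts are hypothetical and not from the talk; the idea is only that the dynamic type is checked once, at the first stage, so later stages can work with typed values.

```go
package main

import "fmt"

// toInt is a hypothetical first pipeline stage: it checks the dynamic type of
// a boxed element once, so that later stages can assume they receive ints.
func toInt(value interface{}) (int, error) {
	n, ok := value.(int)
	if !ok {
		return 0, fmt.Errorf("expected int, got %T", value)
	}
	return n, nil
}

// sumInts validates every element at the front of the pipeline and then
// processes the typed values. It stops at the first element of the wrong type.
func sumInts(stream []interface{}) (int, error) {
	total := 0
	for _, v := range stream {
		n, err := toInt(v)
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}

func main() {
	total, err := sumInts([]interface{}{1, 2, 3})
	fmt.Println(total, err)

	_, err = sumInts([]interface{}{1, "two", 3})
	fmt.Println(err)
}
```

The trade-off is that a badly typed element is only discovered when it reaches the pipeline, not at compile time.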
8. Problem
● Don’t want to code¹ unless absolutely necessary
● Don’t want to repeat ourselves
● More code leads to more maintenance and testing
¹ Not on company hours at least! YMMV.
9. Abstraction Goals
● Data processing should be decoupled from data structures.
● Compositions should happen on data, not data structures.
Note: <T> denotes type. This is not valid Go code.
Note: f and m are functions, e.g.:
f(value interface{}) bool
m(value interface{}) interface{}
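One way these goals could look in code is sketched below. Only the signatures of f and m come from the slides; Reducer, Reducible, Slice, Filter and Map are hypothetical names chosen for illustration and are not the talk's actual API. The collection implements a single method, and all processing logic lives in functions that never see the data structure.

```go
package main

import "fmt"

// Reducer folds one stream element into an accumulator (assumed shape).
type Reducer func(acc, value interface{}) interface{}

// Reducible is a hypothetical one-method interface for data collections.
type Reducible interface {
	Reduce(step Reducer, init interface{}) interface{}
}

// Slice adapts a plain slice of boxed values to Reducible.
type Slice []interface{}

// Reduce feeds every element of the slice to step, in order.
func (s Slice) Reduce(step Reducer, init interface{}) interface{} {
	acc := init
	for _, v := range s {
		acc = step(acc, v)
	}
	return acc
}

// Filter wraps a reducer so that only values accepted by f reach it.
func Filter(f func(value interface{}) bool, next Reducer) Reducer {
	return func(acc, value interface{}) interface{} {
		if f(value) {
			return next(acc, value)
		}
		return acc
	}
}

// Map wraps a reducer so that each value is transformed by m first.
func Map(m func(value interface{}) interface{}, next Reducer) Reducer {
	return func(acc, value interface{}) interface{} {
		return next(acc, m(value))
	}
}

func isEven(value interface{}) bool { return value.(int)%2 == 0 }

func square(value interface{}) interface{} {
	n := value.(int)
	return n * n
}

func sum(acc, value interface{}) interface{} { return acc.(int) + value.(int) }

func main() {
	var data Reducible = Slice{1, 2, 3, 4}
	// Keep the even numbers, square them, and add them up: 4 + 16.
	fmt.Println(data.Reduce(Filter(isEven, Map(square, sum)), 0)) // prints 20
}
```

Any other collection (a channel, a file reader, a database cursor) would only need its own Reduce method to reuse the same Filter and Map compositions.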
15. Transduction
● Flush is used when some function in the chain would like to end the operation early.
● When all the data in the stream has been processed, or a flush has been requested, the method Complete() is called to capture the state held in the stateful reducers.
The functions in the chain call each other:
f, m => m(f(val))
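The flush and Complete() behaviour described above can be sketched as follows. Only the name Complete() comes from the slides; the Transducer interface, the Step method, the boolean flush signal and the takeSum reducer are assumptions made for illustration. Each stage calls the next one directly, so no goroutines or channels are needed between stages.

```go
package main

import "fmt"

// Transducer is a hypothetical stage in a chain. Step returns false to
// request a flush, meaning that no further input should be fed in.
type Transducer interface {
	Step(value interface{}) bool
	Complete()
}

// filter forwards only the values accepted by f.
type filter struct {
	f    func(value interface{}) bool
	next Transducer
}

func (s filter) Step(value interface{}) bool {
	if !s.f(value) {
		return true // value dropped, keep going
	}
	return s.next.Step(value)
}

func (s filter) Complete() { s.next.Complete() }

// mapper transforms each value with m before passing it on.
type mapper struct {
	m    func(value interface{}) interface{}
	next Transducer
}

func (s mapper) Step(value interface{}) bool { return s.next.Step(s.m(value)) }
func (s mapper) Complete()                   { s.next.Complete() }

// takeSum is a stateful reducer: it sums at most limit values and then
// requests a flush. Result is only published by Complete.
type takeSum struct {
	limit, seen, running int
	Result               int
}

func (s *takeSum) Step(value interface{}) bool {
	s.running += value.(int)
	s.seen++
	return s.seen < s.limit
}

func (s *takeSum) Complete() { s.Result = s.running }

// run drives the chain until the stream ends or a flush is requested,
// then calls Complete so stateful reducers can capture their state.
func run(stream []interface{}, chain Transducer) {
	for _, v := range stream {
		if !chain.Step(v) {
			break
		}
	}
	chain.Complete()
}

func isOdd(value interface{}) bool        { return value.(int)%2 == 1 }
func double(value interface{}) interface{} { return value.(int) * 2 }

func main() {
	sink := &takeSum{limit: 2}
	run([]interface{}{1, 2, 3, 4, 5}, filter{f: isOdd, next: mapper{m: double, next: sink}})
	fmt.Println(sink.Result) // prints 8: (1*2) + (3*2), flushed before 5
}
```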
17. Observations
● Cons
– No compile-time type safety
– Tricky to parallelize
● Pros
– Fewer goroutines for long pipelines
– Fewer channel synchronizations
– Potentially uses less memory
– Processing logic decoupled from data structures
– Better compositions
– More readable