Streams of social consciousness
Real-time data transformation
Who am I?
2000: Psycholinguist. Research/data analysis.
2008: Flex programmer. OO, enterprise.
2013: Interactive developer. Browser + server.
Marielle Lange @widged
Stream expertise
Fairly recent and rather limited:
๏ Gulp -> custom modules written by adapting other modules.
๏ Data analysis -> using streams to process large data sets.
➡ I will attempt to provide the minimal orientation needed to get started, steering clear of complex topics like back-pressure handling.
Streams for data analysis
Garden data: aggregating data scraped from a large number of websites, parsing them, normalizing them (Fahrenheit vs Celsius, March in the Northern vs Southern Hemisphere), reducing them (converting [55-65] to 55 #1, 60 #1, 65 #1), and rendering them (average vs visualisation).
Streams manage a data flow.
‣ Sources. Where data pours from (ReadStream).
‣ Sinks. Where results pour to (WriteStream).
‣ Throughs. Where data gets manipulated and transformed.
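In Node, that flow is expressed by piping a readable source through one or more transform steps into a writable sink. A minimal sketch, assuming plain text files (the file names and the upper-casing step are placeholders of mine):

var fs = require('fs'),
    stream = require('stream');

// A "through" stream that upper-cases each text chunk.
var upperCase = new stream.Transform();
upperCase._transform = function (chunk, encoding, done) {
  this.push(chunk.toString().toUpperCase());
  done();
};

fs.createReadStream('source.txt')            // source
  .pipe(upperCase)                           // through
  .pipe(fs.createWriteStream('sink.txt'));   // sink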
What are they good for?
๏ Gulp - writing your own modules.
๏ Real-time data obtained from remote servers that would be too impractical to buffer on a device with limited memory.
๏ Map-reduce types of computations - a programming model for processing and generating large data sets. A map function generates a set of intermediate key/value pairs ({word: 'hello', length: 5}) and a reduce function merges all intermediate values associated with the same intermediate key (['agile', 'greet', 'hello'] - the list of words of length 5). Great if you want to run computations on distributed systems. A stream-based sketch of this shape follows below.
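A toy illustration of that map/reduce shape with core Node object streams (the word list and the grouping logic are mine, not from the talk):

var stream = require('stream');

var words = ['agile', 'greet', 'hello', 'hi'];
var byLength = {};

// Source: emits one word at a time.
var source = new stream.Readable({ objectMode: true });
source._read = function () {
  this.push(words.length ? words.shift() : null); // null ends the stream
};

// Map: word -> { word: word, length: word.length }
var mapper = new stream.Transform({ objectMode: true });
mapper._transform = function (word, encoding, done) {
  done(null, { word: word, length: word.length });
};

// Reduce: merge all words sharing the same intermediate key (the length).
var reducer = new stream.Writable({ objectMode: true });
reducer._write = function (pair, encoding, done) {
  (byLength[pair.length] = byLength[pair.length] || []).push(pair.word);
  done();
};
reducer.on('finish', function () {
  console.log(byLength); // { '2': ['hi'], '5': ['agile', 'greet', 'hello'] }
});

source.pipe(mapper).pipe(reducer);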
Streams 101
Readable Streams
Abstraction for a source that you are reading data from:
‣ http responses, on the client
‣ http requests, on the server
‣ fs read streams
‣ zlib streams
‣ crypto streams
‣ tcp sockets
‣ child process stdout and stderr
‣ process.stdin
Notes
๏ A readable stream will not start emitting data until you indicate that you are ready to receive it.
๏ Readable streams have two "modes": a flowing mode and a non-flowing (paused) mode.

var flappyStream = readable.read();
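A small sketch contrasting the two modes ('foo.txt' is a placeholder file):

var fs = require('fs');

// Flowing mode: attaching a 'data' handler starts the flow.
fs.createReadStream('foo.txt')
  .on('data', function (chunk) { console.log('flowing:', chunk.length, 'bytes'); });

// Non-flowing (paused) mode: pull chunks explicitly with read().
var readable = fs.createReadStream('foo.txt');
readable.on('readable', function () {
  var chunk;
  while ((chunk = readable.read()) !== null) {
    console.log('pulled:', chunk.length, 'bytes');
  }
});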
Writable Streams
Abstraction for a destination that you are writing data to:
‣ http requests, on the client
‣ http responses, on the server
‣ fs write streams
‣ zlib streams
‣ crypto streams
‣ tcp sockets
‣ child process stdin
‣ process.stdout, process.stderr

writable.write(flappyBird);
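A minimal usage sketch ('out.txt' is a placeholder file):

var fs = require('fs');

var writable = fs.createWriteStream('out.txt');
writable.write('first line\n');  // queue a chunk
writable.end('last line\n');     // write a final chunk, then close
writable.on('finish', function () {
  console.log('all data flushed');
});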
Transforms
Abstraction for a stream that is both readable and writable, where the input is related to the output (a map or filter step). Conceptually, a transform exposes a writable input side and a readable output side:

transform.input.write(flappyBird);
var evilStream = transform.output.read();

Compressing a file using gzip:

var fs   = require('fs'),
    zlib = require('zlib');

var readable = fs.createReadStream('foo.txt'),
    writable = fs.createWriteStream('foo.txt.gz');

readable
  .pipe(zlib.createGzip())
  .pipe(writable);

Dominic Tarr's `through` module provides similar functionality.
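A sketch of the same idea with the `through` module (the upper-casing step is my example, not from the talk):

var through = require('through');

// through(write, end) builds a stream that is both writable and readable.
var upperCase = through(function write(data) {
  this.queue(data.toString().toUpperCase()); // push transformed data downstream
}, function end() {
  this.queue(null); // signal end of stream
});

process.stdin.pipe(upperCase).pipe(process.stdout);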
Basic API
Readable stream

var fs = require('fs');
var readable = fs.createReadStream('foo.txt');
// this is the classic api
readable
  .on('data', function (data) { console.log('Data!', data); })
  .on('error', function (err) { console.error('Error', err); })
  .on('end', function () { console.log('All done!'); });

Writable stream

var fs = require('fs');
var readable = fs.createReadStream('foo.txt'),
    writable = fs.createWriteStream('copy.txt');

// { end: false } keeps the writable open after the pipe completes,
// so the extra write is not a write-after-end error.
readable.pipe(writable, { end: false });
readable.on('end', function () {
  writable.end('an extra line');
});
Toolbox
event-stream (D. Tarr)

var fs         = require('fs'),
    JSONStream = require('JSONStream'),
    map        = require('map-stream');

var input  = fs.createReadStream('twitter-feed.json'),
    output = fs.createWriteStream('twitter-sentiments.json');

input
  .pipe(JSONStream.parse('*'))   // emit one parsed object per tweet
  .pipe(map(computeSentiments))  // computeSentiments(data, callback), see below
  .pipe(output);
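computeSentiments is left undefined on the slide; here is a sketch following map-stream's (data, callback) contract, assuming the callable-function API of thisandagain/sentiment:

var sentiment = require('sentiment');

// Hypothetical implementation: score each parsed tweet and pass a JSON
// line downstream, so the result can be written to the plain file stream.
function computeSentiments(tweet, asyncReturn) {
  if (!tweet.text) { return asyncReturn(); } // drop non-tweet objects
  var scored = { text: tweet.text, score: sentiment(tweet.text).score };
  asyncReturn(null, JSON.stringify(scored) + '\n');
}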
Stream playground (J. Resig)
Stream handbook (@Substack)
Vinyl
Rapidly define a list of files to read from with glob strings:

var vinyl      = require('vinyl-fs'),
    map        = require('map-stream'),
    JSONStream = require('JSONStream');

vinyl.src('./data/*/quad/*.comp.json', { buffer: false })
  .pipe(map(mapSource));

function mapSource(file, asyncReturn) {
  var srcStream = file.contents; // with { buffer: false }, contents is a stream
  srcStream
    .pipe(JSONStream.parse('*'))
    .pipe(SomeAnalysis)          // placeholder transform stream
    .pipe(vinyl.dest('./out'));
  asyncReturn(null, file);
}
Example
Twitter Sentiments
Register an application with the Twitter API – https://dev.twitter.com/
Create an access token.
In your project, add a file "secret_keys.js" with:

consumer_secret: "YOUR_CONSUMER_SECRET", access_token_key: "USER_ACCESS_TOKEN", access_token_secret: "USE

Takes advantage of the sentiment module: https://github.com/thisandagain/sentiment
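A sketch of what secret_keys.js could look like as a module (key names follow the snippet above; the consumer_key entry and all values are placeholder assumptions of mine):

// secret_keys.js - keep this file out of version control.
module.exports = {
  consumer_key:        "YOUR_CONSUMER_KEY",
  consumer_secret:     "YOUR_CONSUMER_SECRET",
  access_token_key:    "USER_ACCESS_TOKEN",
  access_token_secret: "USER_ACCESS_TOKEN_SECRET"
};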
Programming Style
Separation of concerns
For me, the #1 reason to use streams is that the piping structure encourages writing programs as bite-size, highly interchangeable modules.
In the early stages of writing the example program, I had:

tweets
  .pipe(map(englishOnly))
  .pipe(map(addSentiment))

Then I found out that the API gives you the option to specify a language filter. All I had to do was drop one line of code.
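The resulting change, sketched (the request-side language parameter is illustrative):

// Before: filter tweets in our own pipeline.
tweets
  .pipe(map(englishOnly))
  .pipe(map(addSentiment));

// After: ask the API for English tweets up front
// (e.g. a language parameter on the streaming request);
// the englishOnly stage simply disappears.
tweets
  .pipe(map(addSentiment));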
Functional Programming
A more functional style of programming encourages the avoidance of side effects and state mutation.

var fs  = require('fs'),
    map = require('map-stream');

var readable = fs.createReadStream('foo.txt');

readable
  .pipe(map(filterEnglish));

// Assumes the chunks are parsed objects with a `language` field
// (e.g. after a JSONStream.parse step).
function filterEnglish(data, asyncReturn) {
  if (data.language === 'en') {
    // write these data to the output stream
    asyncReturn(null, data);
  } else {
    // but don't write these.
    asyncReturn();
  }
}
๏ Single Responsibility Principle: "A function should do one thing, and do it well."
๏ Pure functions. No knowledge of the external world whatsoever; every bit of information required to run the function is explicitly passed as a parameter.
๏ Immutable data. A function returns new data that captures the transformation rather than a reference to the old data.
๏ Higher Order Functions. Functions that return functions (partials, currying). A way to capture local state. A sketch follows below.
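As an illustration of that last point (the generalised helper is my example, not from the slides): currying the language lets one factory replace many near-identical filters.

var map = require('map-stream');

// Higher-order function: returns a map-stream filter for any language.
// The language is captured in the returned function's closure.
function filterLanguage(lang) {
  return function (data, asyncReturn) {
    if (data.language === lang) {
      asyncReturn(null, data); // keep
    } else {
      asyncReturn();           // drop
    }
  };
}

// filterEnglish from the code above becomes a one-liner:
var filterEnglish = filterLanguage('en');
// readable.pipe(map(filterEnglish));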
