Message:Passing - lpw 2012

Messaging,
interoperability and log
aggregation - a new
framework

Tomas Doran (t0m) <bobtﬁsh@bobtﬁsh.net>

Sponsored by
• state51
• Pb of mogilefs, 100+ boxes.
• > 4 million tracks on-demand via API
• > 400 reqs/s per server, >1Gb peak from backhaul
• Suretec VOIP Systems
• UK voice over IP provider
• Extensive API, including WebHooks for notiﬁcations
• TIM Group
• “Alpha capture” applications
• Java / Scala / Clojure / ruby / puppet / python / perl

What?
• This talk is about my new perl library:
Message::Passing

Why?
• I’d better stop, and explain a speciﬁc
problem.

Why?
problem.
• The solution that grew out of this is more
generic.

Why?
problem.
generic.
• But it illustrates my concerns and design
choices well.

Why?
problem.
generic.
• But it illustrates my concerns and design
choices well.
• And everyone likes a story, right?

Once upon a time...

• I was bored of tailing log ﬁles across dozens
of servers

Once upon a time...

• I was bored of tailing log ﬁles across dozens
of servers
• splunk was amazing, but unaffordable

Centralised logging
• Syslog isn’t good enough

Centralised logging
• UDP is lossy, TCP not much better

Centralised logging
• Limited ﬁelds

Centralised logging
• Limited ﬁelds
• No structure to actual message

Centralised logging
• Limited ﬁelds
• No structure to actual message
• RFC3164 - “This document describes the
observed behaviour of the syslog protocol”

Centralised logging
• Structured app logging

Centralised logging
• We want to log data, rather than text
from our application

Centralised logging
• We want to log data, rather than text
from our application
• E.g. HTTP request - vhost, path, time to
generate, N db queries etc..

Centralised logging
• Post-process log ﬁles to re-structure

Centralised logging
• Cases we do not control (e.g. apache)

Centralised logging
• Cases we do not control (e.g. apache)
• SO MANY DATE FORMATS. ARGHH!!

Apache

[27/Jun/2012:23:57:03
+0000]

ElasticSearch

[2012-06-26
02:08:26,879]

RabbitMQ

26-Jun-2012::16:18:30

.Net ‘tick’
634763158360000000

100 ns from 1st Jan 1AD
(Except those that are from 3rd Jan)

Aaaaaaannnnyyyway...

• Please use ISO8601
• or epochseconds
• or epochmicroseconds
• In UTC!

Centralised logging
• Publish logs as JSON to a message queue

Centralised logging
• JSON is fast, and widely supported

Centralised logging
• JSON is fast, and widely supported
• Great for arbitrary structured data!

Message queue
• Flattens load spikes!

Message queue
• Only have to keep up with average message
volume, not peak volume.

Message queue
• Logs are bursty! (Peak rate 1000x average.)

Message queue
• Easy to scale - just add more consumers

Message queue
• Allows smart routing

Message queue
• Allows smart routing
• Great as a common integration point.

elasticsearch
• Just tip JSON documents into it

elasticsearch
• Figures out type for each ﬁeld, indexes
appropriately.

elasticsearch
appropriately.
• Free sharding and replication

elasticsearch
appropriately.
• Free sharding and replication
• Histograms!

Logstash
In JRuby, by Jordan Sissel

Input
Simple: Filter
Output

Flexible
Extensible
Plays well with others
Nice web interface

Logstash on each host
is totally out...

is totally out...
• Running it on elasticsearch servers which
are already dedicated to this is ﬁne..

is totally out...
• I’d still like to reuse all of it’s parsing

is totally out...
• How about I just log to AMQP from my
app?

is totally out...
• How about I just log to AMQP from my
app?
• Doooom!

ZeroMQ has the
correct semantics

ZeroMQ has the
correct semantics
• Pub/Sub sockets

ZeroMQ has the
correct semantics
• Pub/Sub sockets
• Never, ever blocking

ZeroMQ has the
correct semantics
• Pub/Sub sockets
• Lossy! (If needed)

ZeroMQ has the
correct semantics
• Pub/Sub sockets
• Buffer sizes / locations conﬁgureable

ZeroMQ has the
correct semantics
• Pub/Sub sockets
• Arbitrary message size

ZeroMQ has the
correct semantics
• Pub/Sub sockets
• Arbitrary message size
• IO done in a background thread

On host log collector
• ZeroMQ SUB socket
• App logs - pre structured
• Syslog listener
• Forward rsyslogd
• Log ﬁle tailer
• Ship to AMQP

This talk
• Is about my new library: Message::Passing
• The clue is in the name...

This talk
• Hopefully really simple

This talk
• Maybe even useful!

This talk
• Maybe even useful!
• Deﬁnitely small - you can replace / rewrite
it easily.

Lets make it generic!
• So, I wanted a log shipper

• I ended up with a framework for messaging
interoperability

interoperability
• Whoops!

interoperability
• Whoops!
• Got sick of writing scripts..

Does this actually
work?
• YES - In production at four sites for me.

Does this actually
work?
• Some of the adaptors are partially
complete

Does this actually
work?
complete
• Dumber than logstash - no multiple
threads/cores

Does this actually
work?
complete
• Dumber than logstash - no multiple
threads/cores
• ZeroMQ is insanely fast

Other people are using
it in production!

Two people I know of already writing have
already written adaptors!

Events - my model for
message passing

message passing
• a hash {}

message passing
• a hash {}
• Output consumes events:
• method consume ($event) { ...

message passing
• a hash {}
• Input produces events:
• has output_to => (..

message passing
• a hash {}
• Input produces events:
• has output_to => (..
• Filter does both

Simplifying assumption

$self->output_to->consume($message)

That’s it.
• No, really - that’s all the complexity you
have to care about!

That’s it.
have to care about!
• Except for the complexity introduced by
the inputs and outputs you use.

That’s it.
have to care about!
• Except for the complexity introduced by
the inputs and outputs you use.
• Uniﬁed attribute names / reconnection
model, etc.. This helps, somewhat..

Inputs and outputs
• ZeroMQ In / Out
• AMQP (RabbitMQ) In / Out
• STOMP (ActiveMQ) In / Out
• elasticsearch Out
• Redis PubSub In/Out
• Syslog In
• MongoDB Out
• Collectd In/Out
• HTTP POST (“WebHooks”) Out
• UDP packets In/Out (e.g. statsd)

DSL
• Building more complex chains
easy!
• Multiple inputs
• Multiple outputs
• Multiple independent chains

CLI

• 1 Input
• 1 Output
• 1 Filter (default Null)

• For simple use, or testing.

CLI

• Encode / Decode step is just a Filter
• JSON by default
• Supply command line, or conﬁg ﬁle
• Daemon features

The dist:
Message::Passing
• Core dist supplies CLI, DSL, roles for reuse.

The dist:
Message::Passing
• Adaptors for most protocols in other
modules.

The dist:
Message::Passing
modules.
• Moo based - small footprint, can be
fatpacked (no XS dependencies).

The dist:
Message::Passing
modules.
• Moo based - small footprint, can be
fatpacked (no XS dependencies).
• Moose compatible.

Example?

message-pass --input STDIN --output STDOUT
{}
{}

Less trivial example
message-pass --input ZeroMQ --input_options
‘{“socket_bind”:”tcp://*:5222”}’
--output STDOUT

message-pass --output ZeroMQ --output_options
‘{“connect”:”tcp://127.0.0.1:5222”}’
--input STDIN

Jenga:
message-pass --input STDIN --output STOMP --output_options
'{"destination":"/queue/foo","hostname":"localhost", "port":"6163", "username":"guest",
"password":"guest"}'

message-pass --input STOMP --output Redis --input_options '{"destination":"/queue/
foo", "hostname":"localhost","port":"6163","username":"guest","password":"guest"}'
--output_options '{"topic":"foo","hostname":"127.0.0.1","port":"6379"}'

message-pass --input Redis --output AMQP --input_options '{"topics":
["foo"],"hostname":"127.0.0.1","port":"6379"}' --output_options
'{"hostname":"127.0.0.1","username":"guest","password":"guest",
"exchange_name":"foo"}'

message-pass --input AMQP --output STDOUT --input_options
'{"hostname":"127.0.0.1", "username":"guest", "password":"guest",
"exchange_name":"foo","queue_name":"foo"}'

Example 4?
• The last example wasn’t silly enough!

Example 4?
• How could I top that?

Example 4?
• Plan - Re-invent mongrel2

Example 4?
• Plan - Re-invent mongrel2
• Badly

PSGI
• PSGI $env is basically just a hash.

PSGI
• (With a little ﬁddling), you can serialize it as
JSON

PSGI
JSON
• PSGI response is just an array.

PSGI
JSON
• PSGI response is just an array.
• Ignore streaming responses!

Demo?
plackup -E production -s Twiggy -MPlack::App::Message::Passing
-e'Plack::App::Message::Passing->new(return_address =>
"tcp://127.0.0.1:5555", send_address =>
"tcp://127.0.0.1:5556")->to_app'

plackup -E production -s Message::Passing testapp.psgi --host
127.0.0.1 --port 5556

PUSH socket does fan
out between multiple
handlers.

Reply to address
embedded in request

Run multiple ‘handler’
processes. Hot
restarts, hot add /
remove workers

Other applications

• Anywhere an asynchronous event stream is
useful!
• Monitoring
• Metrics transport
• Queued jobs - worker pool

Other applications
(Web stuff)

• User activity (ajax ‘what are your users
doing’)
• WebSockets / MXHR
• HTTP Push notiﬁcations - “WebHooks”

WebHooks

• HTTP PUSH notiﬁcation
• E.g. Paypal IPN
• Shopify API

What about logstash?

• Use my lightweight code on end nodes.
• Use logstash for parsing/ﬁltering on the
dedicated hardware (elasticsearch boxes)
• Filter to change my hashes to logstash
compatible hashes
• For use with MooseX::Storage and/or
Log::Message::Structured

Interoperating - a real
example

example
• Log JSON events out of apps (in multiple
languages) to ZMQ

example
languages) to ZMQ
• Collect and munge with Message::Passing
script ‘logcollector’

example
languages) to ZMQ
• Send to central logstash

example
languages) to ZMQ
• Send onto statsd to aggregate

example
languages) to ZMQ
• Send onto statsd to aggregate
• Graphs in graphite

TimedWebRequest
• A standard event
• Page generation time, URI, HTTP status

statsd
• Rolls up counters and timers into metrics

statsd
• One bucket per stat, emits values every 10
seconds

statsd
seconds
• Counters: Request rate, HTTP status rate

statsd
seconds
• Counters: Request rate, HTTP status rate
• Timers: Total page time, mean page time,
min/max page times

Code

• https://metacpan.org/module/
Message::Passing
• https://github.com/suretec/Message-Passing
• #message-passing on irc.perl.org
• Examples: git://github.com/2941747.git

Message:Passing - lpw 2012

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Andere mochten auch

Andere mochten auch (20)

Ähnlich wie Message:Passing - lpw 2012

Ähnlich wie Message:Passing - lpw 2012 (20)

Mehr von Tomas Doran

Mehr von Tomas Doran (20)

Message:Passing - lpw 2012

Hinweis der Redaktion