2.
Who we are.
Log everything
Mike Lohmann
Architecture
Author (PHPMagazin, IX, heise.de)
Dr. Stefan Schadwinkel
Analytics
Author (heise.de, Cereb.Cortex, EJN, J.Neurophysiol.)
3.
Agenda.
What we did. What we do.
Log everything! - Our way from Requirement to Solution
Infrastructure and technologies: Simple, Scalable, Open Source
Happy business users.
4.
What we did.
Creating & operating education communities
Web applications
Multi-language
Different market rules in different countries
Consolidating the technological basis for multiple (new) products
5.
DECK36 GmbH & Co. KG
DECK36 is a young spin-off from ICANS
7 core engineers with longstanding expertise
(operate, scale, automate, analyze)
Consulting and engineering services for the
etruvian group and external customers
6.
Facts and figures: PokerStrategy.com
Education since 2005
6,000,000 registered users
19 languages
2,800,000 page impressions (PI) per day
700,000 posts per day
7.
Moving on…
Build more Education communities like PokerStrategy…
Assume PokerStrategy KPIs(?)
Other Business models
Add mobile and the social web…
Our requirement: Log everything!
8.
Logging Tools / Technologies
Producer: Web/mobile apps, JS frontend, servers, databases
9/22/2013
Transport
Now: RabbitMQ + Erlang consumer, OR Kafka + any other consumer
Was: Flume
Storage
Now: S3 storage + Hadoop with EMR, OR any other storage
Was: Virtualized in-house Hadoop
Analytics
MapReduce with Hive/Pig
Results in any format: Excel, QlikView, RDBMS, ...
Realtime Datastream Analytics
Storm / Trident
11.
Producer JS (in progress)
[Diagram components: JS Client, DataCollector (NodeJS), Validator, Local Storage, Local RabbitMQ, Shovel]
[Diagram labels: Tracks Event /Home, Trigger, WebSocket]
12.
Producer
LoggingComponent: Provides interfaces, filters and handlers
LoggingBundle: Glues everything together with Symfony2
Drupal Logging Module: Using the LoggingComponent
JS Frontend Client: LogClient for Browsers (in progress)
https://github.com/ICANS/IcansLoggingComponent
https://github.com/ICANS/IcansLoggingBundle
https://github.com/ICANS/drupal-logging-module
https://github.com/DECK36/starlog-js-frontend-client
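The "interfaces, filters and handlers" idea can be illustrated with a small sketch. This is not the API of the IcansLoggingComponent (which is PHP); the class `Logger`, the functions `drop_debug` and `mask_email`, and the event fields are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Filter = Callable[[Event], Optional[Event]]  # return None to drop the event
Handler = Callable[[Event], None]            # e.g. write to a queue or file


class Logger:
    """Passes each event through all filters, then to every handler."""

    def __init__(self, filters: List[Filter], handlers: List[Handler]) -> None:
        self.filters = filters
        self.handlers = handlers

    def log(self, event: Event) -> bool:
        current: Optional[Event] = dict(event)  # work on a copy
        for f in self.filters:
            current = f(current)
            if current is None:
                return False  # a filter rejected the event
        for h in self.handlers:
            h(current)
        return True


def drop_debug(event: Event) -> Optional[Event]:
    """Hypothetical filter: discard debug-level events."""
    return None if event.get("level") == "debug" else event


def mask_email(event: Event) -> Optional[Event]:
    """Hypothetical filter: hide personal data before it leaves the app."""
    if "email" in event:
        event = {**event, "email": "***"}
    return event


collected: List[Event] = []  # stands in for a real transport handler
logger = Logger(filters=[drop_debug, mask_email], handlers=[collected.append])
logger.log({"level": "info", "type": "login", "email": "a@example.com"})
logger.log({"level": "debug", "type": "noise"})
```

Keeping filters and handlers as plain callables means a framework integration only has to wire them together, which is roughly the role the slide assigns to the bundle and the Drupal module.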
13.
Transport
1st Solution: Flume
+ Part of the Hadoop Ecosystem
+ Flexible Central config, Extensible via Plugins
- Not mature software (flume, flume-ng, plugin interfaces, ..)
- Central config has problems with Puppet
2nd Solution: RabbitMQ
+ Local RabbitMQ Cluster
+ Decentralized config (producers & consumers simply connect)
- HDFS Sink not pre-packaged
14.
Storage
1st Solution: Self-hosted Hadoop
- Virtualized Infrastructure makes HDFS redundant
- High costs (cluster always running, admin work)
2nd Solution: Cloud Storage
+ Amazon S3
+ Elastic MapReduce: Hadoop on demand
+ Cost-effective (pay only for what you use)
15.
Compaction
RabbitMQ consumer (Erlang) stores data to cloud
However, we have a mixed message stream, but want partitioned files such as:
s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo
MapReduce:
Streaming (stdin/stdout to any tool)
Computation (Hive, Pig, Cascalog, etc.)
Amazon Redshift
PostgreSQL-compatible Data Warehouse
Hive Partitioning!
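The compaction step can be sketched as a streaming-style mapper that derives a Hive partition prefix for each message. This is a minimal sketch, not the actual job: the event fields `website`, `type` and `timestamp` (Unix seconds, UTC) are assumptions, as are the function names; only the path layout comes from the slide.

```python
import json
from datetime import datetime, timezone


def partition_prefix(event, bucket="[BUCKET]"):
    """Build the Hive-style partition prefix for one event.

    Assumes hypothetical fields: website, type, timestamp (Unix seconds, UTC).
    """
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return (
        f"s3://{bucket}/icanslog/{event['website']}/{event['type']}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
    )


def map_lines(lines):
    """Mapper: emit 'partition<TAB>original line' for each JSON message."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the mixed stream
        event = json.loads(line)
        yield f"{partition_prefix(event)}\t{line}"


# Hypothetical sample message from the mixed stream.
sample = json.dumps({
    "website": "example",
    "type": "icans.content",
    "timestamp": datetime(2012, 10, 1, 12, tzinfo=timezone.utc).timestamp(),
})
mapped = list(map_lines([sample, ""]))
```

In a streaming job the same `map_lines` would be fed from standard input and printed to standard output, so that all messages for one partition end up grouped under the same key.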
16.
Analytics
Cascalog is Clojure, Clojure is Lisp
(?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))
?<- : query operator
(stdout) : Cascading output tap
[?person] : columns of the dataset generated by the query
(age ?person ?age) : "generator"
(< ?age 30) : "predicate"
Use as many generators and predicates as you want; both can be any Clojure function.
Clojure can call anything that is available within a JVM.
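For readers who do not know Lisp, the generator/predicate idea can be mirrored in a few lines of Python. This is only an analogy, not Cascalog: the sample data `AGE` and the helper `query` are hypothetical, and the elided part of the original query is left out.

```python
# Hypothetical sample data standing in for the `age` generator.
AGE = [("alice", 28), ("bob", 33), ("carol", 25)]


def query(generator, predicates):
    """Yield the person column for every tuple that satisfies all predicates."""
    for person, age in generator:
        if all(pred(person, age) for pred in predicates):
            yield person


# Counterpart of the (< ?age 30) predicate; any function can serve as one.
young = list(query(AGE, [lambda person, age: age < 30]))
print(young)  # ['alice', 'carol']
```

As in the query on the slide, the generator supplies tuples, the predicates constrain them, and only the requested column reaches the output.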
20.
Analytics
Now, get the stats by executing a query:
We can now simply copy the data from S3 and import it into any local analytical tool:
Excel, Redshift, QlikView, R, etc.
21.
Realtime Datastream Analytics
• Storm: Hadoop for realtime analytics
• Rock solid HA concept
• Highly scalable
• Can:
Process streams (and trigger events)
Provide DRPC functionality
Handle enormous data loads
• Fancy names for modules
(spouts/bolts/tuple/topology)
• Easy to use
Small and easy to understand API
DevMode
• Add new topologies at run time
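The spout/bolt/tuple/topology vocabulary can be made concrete with plain Python generators. This is a conceptual sketch only and does not use the Storm API: a real topology runs distributed over unbounded streams, and the function names and event fields here are hypothetical.

```python
from collections import Counter


def event_spout(events):
    """Spout: the source that emits tuples into the topology."""
    for e in events:
        yield (e["website"], e["type"])


def filter_bolt(stream, wanted_type):
    """Bolt: pass on only tuples of the wanted event type."""
    for website, type_ in stream:
        if type_ == wanted_type:
            yield (website,)


def count_bolt(stream):
    """Bolt: keep a running count per website and emit it after each tuple."""
    counts = Counter()
    for (website,) in stream:
        counts[website] += 1
        yield dict(counts)


# Hypothetical events; in production these would arrive continuously.
events = [
    {"website": "a", "type": "pageview"},
    {"website": "b", "type": "pageview"},
    {"website": "a", "type": "post"},
    {"website": "a", "type": "pageview"},
]

# The "topology" is the wiring of spout and bolts.
topology = count_bolt(filter_bolt(event_spout(events), "pageview"))
results = list(topology)
```

Each emitted running count could serve as the trigger for a downstream event, which is the "process streams (and trigger events)" use case named on the slide.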
23.
Happy business users!
Questions they ask often can be automated (ETL, Reports)
New questions can be explored (Ad-hoc, Search)
Insights can be used as feedback into the system (Decisions, Websockets)
Data-driven applications can be created that are shared by multiple websites or
tailored to individual needs.