SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Downloaden Sie, um offline zu lesen
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 1
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Real-time analytics with
Storm, NoSQL, and Hadoop
Brian Bulkowski
Founder and CTO
Aerospike
What is Storm?
➤  Storm is an Apache open-source project
(Hadoop of real time)
§  Created by Twitter
§  No “Cloudera”-like vendor
§  Return of “Message Oriented Middleware”
➤  Overview
§  “Spouts”
connect with data sources
(example: web log tail ingest)
§  “Bolts”
analysis and data manipulation of any sort
§  “Nimbus”
control entity re-balances and re-configures
without downtime, add servers seamlessly
§  “Topology”
Ordering of bolts and spouts
§  “Tuple”
A Storm message
§  “Trident”
Higher level abstraction supporting real-time
joins, reliability
© 2014 Aerospike. All rights reserved. Confidential | Pg. 2
Why Storm?
➤  Same management framework as
Hadoop
➤  Many Spouts and Bolts available
https://github.com/nathanmarz/storm-
contrib
➤  Simple database integration
➤  High performance & reliability
(1MM/S/S ++)
➤  Great documentation
➤  Multiple languages
➤  Excellent framework for event
oriented implementation
© 2014 Aerospike. All rights reserved. Confidential | Pg. 3
But ---
➤  Storm has problems
➤  0mq -> Netty TCP transition
§  So much easier!
➤  Trident not good enough
§  But closer than most things
➤  Auto-rebalance framework
➤  Good news:
Twitter doubling down
© 2014 Aerospike. All rights reserved. Confidential | Pg. 4
Streaming architecture
© 2014 Aerospike. All rights reserved. | Pg. 5
Data
Warehouse,
Hadoop
Cluster
Real-time
Interactions
Server
Batch Analytics
•  User segmentation
•  Location patterns
•  Similar audience
Real-time Interactions
•  Frequency caps
•  Recent ads served
•  Recent search terms
User
Data
Streaming
(Storm)
Hadoop
Integration with Hadoop
➤  Example Bolt connectors for
§  Hdfs
§  Hadoop
➤  Example pattern
§  Bolt writes to HDFS
§  Bolt reads from HDFS
➤  Problem:
§  Storm is AT LEAST ONCE
➤  Solution:
§  write from your app to Hadoop
§  Emit from Hadoop to an edge database
§  Or read from Hdfs directly
© 2014 Aerospike. All rights reserved. Confidential | Pg. 6
Examples
➤  TrendingWords
§  Slice time into “slots” (1 minute / 5 minute)
for each word
§  On seeing a word, increment bucket
© 2014 Aerospike. All rights reserved. Confidential | Pg. 7
http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
§  Use Storm Ticks (0.8) for emitting
top events
§  Determine most popular words in
a category
§  Clean up old data on Ticks or on
reads
Medium easy !
( Google
‘trending topics storm’ )
Examples
➤  TrendingWords – Database analysis – single words
§  Twitter at … 5k tps?
§  20 indexable words per tweet
§  10k writes per second
§  50,000 words in English
§  A reasonable RAM problem
Requires queries & kvs
➤  Tuples
§  Perhaps 30 tuples per tweet
§  600K tps writes
§  10M tuples ? Flash might be better
Problem is much harder now!
© 2014 Aerospike. All rights reserved. Confidential | Pg. 8
http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
Examples
➤  Recommendations
§  Multiple recommendation systems
§  Multi-arm bandit
§  https://github.com/tdunning/
storm-counts/wiki/Bayesian-Bandit
➤  Simple fraud counts
§  Store recent requests for payment
§  Store recent users
§  Calculate fraud scores, drop events
if past threshold
© 2014 Aerospike. All rights reserved. Confidential | Pg. 9
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
#HashTags
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 10
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Storm overview
Development installs (easy)
➤  Everything on one server
( no ZeroMQ )
➤  Zookeeper, Nimbus, Supervisors (slaves)
all on one machine
➤  Write your Spouts and Bolts,
but what next?
➤  Resources:
➤  https://github.com/nathanmarz/storm/wiki/
Setting-up-development-environment
© 2014 Aerospike. All rights reserved. Confidential | Pg. 11
Real-world install
( distributed or go home )
➤  Zookeeper
§  A small (1 to 5) set of nodes to hold configuration. It’s like DNS. Hopefully
you have this running for Hadoop.
➤  Nimbus
§  Single server where “Topologies” are managed
§  Communicates via Zookeeper, “storm” namespace
➤  Supervisor ( slave )
§  Needs the addresses of zookeeper nodes
§  Needs the Nimbus address
§  Then simply waits for work
➤  Best guide – this is really good !
http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/
© 2012 Aerospike. All rights reserved. Confidential | Pg. 12
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
#HashTags
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 13
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Let’s Begin
Real-world install
➤  Ubuntu 12.04 LTS
➤  In Amazon, there’s a variety of scripts and
helpers
§  ( now you can build your own )
§  You can scale back to 1 machine (zoo & nimbus)
➤  In a laptop VM environment, use NAT
§  Share your /etc/hosts file
© 2014 Aerospike. All rights reserved. Confidential | Pg. 14
Prerequisites
➤  Packages
§  sudo apt-get install git-core libtool
autoconf g++ build-essential uuid-dev
maven
➤  Java
§  OpenJDK 6 “is unfortunate”, OracleJDK 6 ok
§  7 can be OpenJDK or Oracle
§  Can’t mix and match 5, 6, 7
§  Will be building & submitting “fat jars”
§  JAVA_HOME everywhere
© 2014 Aerospike. All rights reserved. Confidential | Pg. 15
Prerequisites
➤  0mq – only required in 0.8
§  Native, high performance messaging
§  Storm 0.9.0 avoids 0mq, uses HTTP (!)
§  2.1.x (2.1.6 -> 2.1.11) ONLY
§  sudo apt-get install libzmq-dev
➤  Storm’s Java 0mq client (ugly
http://stackoverflow.com/questions/12115160/compiling-jzmq-on-ubuntu
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
# overcoming flaw in marz’s repo
cd src
touch classdist_noinst.stamp
CLASSPATH=.:./.:$CLASSPATH javac -d . org/zeromq/ZMQ.java org/
zeromq/ZMQException.java org/zeromq/ZMQQueue.java org/zeromq/
ZMQForwarder.java org/zeromq/ZMQStreamer.java
cd ..
make
sudo make install
© 2014 Aerospike. All rights reserved. Confidential | Pg. 16
Zookeeper server
➤  Packages
§  sudo apt-get install zookeeperd
‘d’ is for servers, without is for clients
§  export PATH=$PATH:/user/share/zookeeper/bin
§  sudo update-rc.d zookeeper defaults
➤  Config
§  All zookeepers know about all other zookeepers
§  /etc/zookeeper/conf/myid
-> 1 through 255
§  /etc/zookeeper/conf/zoo.cfg
list IP or DNS for all other zookeepers
§  Service is up only if a majority is up
© 2014 Aerospike. All rights reserved. Confidential | Pg. 17
Zookeeper server
➤  Validate
§  zkCli.sh -à ( whines a bit, ends up [CONNECTED] )
§  ls /
create /foo mydata
get /foo
set /foo otherdata
get /foo
➤  Notes
Data directory is /var/lib/zookeeper
Config file is /etc/zookeeper/conf/zoo.cfg
“client port” is 2181
© 2014 Aerospike. All rights reserved. Confidential | Pg. 18
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
#HashTags
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 19
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Let’s Install Storm
Storm
➤  No packages in Ubuntu
§  Download http://storm-project.net/downloads
§  Into /opt/storm, unzip, add to path
§  export PATH=$PATH:/opt/storm/storm-0.8.2/bin
➤  Create user “storm” ; remember Path + Java
sudo chown –R storm:storm /opt/storm
sudo chown –R storm:storm /var/data/storm
➤  Update config file
# update the storm configuration file - create a data directory, point storm
at it
sudo mkdir -p /var/data/storm
sudo chmod a+w /var/data/storm
sudo vi /opt/storm/storm-0.8.2/conf/storm.yaml
storm.local.dir: "/var/data/storm"
:
storm.zookeeper.servers:
- “192.168.62.149”
➤  YAML is touchy. Make sure to have a space after the ‘:’
© 2014 Aerospike. All rights reserved. Confidential | Pg. 20
Storm
➤  Try running components stand alone
sudo su - storm
cd /opt/storm/storm-0.8.2/
bin/storm nimbus
bin/storm supervisor
bin/storm ui
➤  Reliability through low state, quick restart – supervisor !
sudo apt-get install supervisord
cp storm.conf /etc/supervisor/conf.d
➤  Extra hint
sudo unlink /var/run/supervisor.sock
➤  Storm.conf
[program:storm-supervisor]
command=/opt/storm/storm-0.8.2/bin/storm supervisor
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/supervisor.out
logfile_maxbytes=20MB
logfile_backups=10
© 2014 Aerospike. All rights reserved. Confidential | Pg. 21
Storm
➤  storm-master.conf
[program:storm-nimbus]
command=/opt/storm/storm-0.8.2/bin/storm nimbus
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/nimbus.out
logfile_maxbytes=20MB
logfile_backups=10
[program:storm-ui]
command=/opt/storm/storm-0.8.2/bin/storm ui
user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/ui.out
logfile_maxbytes=20MB
logfile_backups=10
© 2014 Aerospike. All rights reserved. Confidential | Pg. 22
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
#HashTags
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 23
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Let’s Run Storm
Run some storm !
➤  Copy storm.yaml to a local user’s directory
mkdir ~/.storm
sudo cp /opt/storm/storm-0.8.2/conf/storm.yaml ~/.storm/
storm.yaml
sudo chown brian:brian ~/.storm/storm.yaml
➤  Install ‘leiningen’ (clojure tools)
Package version too old
http://leiningen.org/ download ‘lein’
into /usr/local/bin – chmod +x /usr/local/bin/lein
➤  storm-starter
git clone http://github.com/nathanmarz/storm-starter.git
cd storm-starter
lein deps ; lein compile ; lein uberjar
© 2014 Aerospike. All rights reserved. Confidential | Pg. 24
Run some storm !
➤  Submit the topology
storm jar target/storm-starter-0.0.1-SNAPSHOT-standalone.jar
storm.starter.ExclamationTopology exclamation-topology
➤  Watch the output
tail –f /opt/storm/storm-0.8.2/logs/worker-xxxx.log
➤  View the job
$ storm list
Topology_name Status Num_tasks Num_workers Uptime_secs
------------------------------------------------------------
exclamation-topology ACTIVE 16 3 798
© 2014 Aerospike. All rights reserved. Confidential | Pg. 25
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
#HashTags
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 26
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Coding to Storm
Creating topologies
➤  Say it with code
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("word");
builder.setBolt("exclaim2", new ExclamationBolt(), 2).shuffleGrouping("exclaim1");
Config conf = new Config();
conf.setDebug(true);
if (args != null && args.length > 0) {
conf.setNumWorkers(3);
StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
}
else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
Utils.sleep(10000);
cluster.killTopology("test");
cluster.shutdown();
}
}
From the ExclaimationTopology
© 2014 Aerospike. All rights reserved. Confidential | Pg. 27
Spouts
➤  RandomSentenceSpout
  public void nextTuple() {
Utils.sleep(100);
String[] sentences = new String[]{
"the cow jumped over the moon",
"an apple a day keeps the doctor away",
"four score and seven years ago",
"snow white and the seven dwarfs",
"i am at two with nature" };
String sentence = sentences[_rand.nextInt(sentences.length)];
_collector.emit(new Values(sentence));
}
© 2014 Aerospike. All rights reserved. Confidential | Pg. 28
Spouts
➤  Easy interaction with queues
§  https://github.com/nathanmarz/storm-contrib
➤  storm-kafka
§  Popular persistent, topic-based queue
➤  storm-jms
§  Java message service
➤  storm-sqs
§  EC2’s queue service
➤  more!
© 2014 Aerospike. All rights reserved. Confidential | Pg. 29
Bolts
➤  Simple exclamation bolt
public static class ExclamationBolt extends BaseRichBolt {
OutputCollector _collector;
@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
_collector = collector;
}
@Override
public void execute(Tuple tuple) {
_collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
_collector.ack(tuple);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
© 2014 Aerospike. All rights reserved. Confidential | Pg. 30
Persisting Bolts
➤  Persistence is good ! Scale the data layer
§  Shared required for counters like TrendingWords
§  Stateless Bolts are better
➤  Easy persistence through examples
§  https://github.com/nathanmarz/storm-contrib
➤  storm-rdbms
§  Stores to a database table
➤  storm-cassandra
§  Stores to a cassandra cluster
➤  storm-redis
➤  storm-memcache
© 2014 Aerospike. All rights reserved. Confidential | Pg. 31
Aerospike Bolts
➤  Bolts available on github
§  https://github.com/aerospike/storm-aerospike
➤  EnrichBolt
§  Add fields from column after looking up a key
➤  PersistBolt
§  Store fields based on a key
© 2014 Aerospike. All rights reserved. Confidential | Pg. 32
Consider Aerospike
➤  If you need the speed of Storm, you need the speed
of Aerospike
§  That’s why we talk about Storm!
➤  Free version at http://aerospike.com/
➤  Internap – free high performance SSD servers for
trial
➤  Benefits
§  In memory with FLASH
§  Clustered for high performance
§  HA state matches Storm’s stateless model
© 2014 Aerospike. All rights reserved. Confidential | Pg. 33
Interesting Bolts
➤  MovingAverageWithSpikeDetection
§  https://github.com/stormprocessor/storm-examples
➤  TrendingTopics
§  http://www.michael-noll.com/blog/2013/01/18/
implementing-real-time-trending-topics-in-storm/
➤  Baysian Bandit
§  https://github.com/tdunning/storm-counts
© 2014 Aerospike. All rights reserved. Confidential | Pg. 34
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
#HashTags
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 35
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Why Aerospike ?
Why not other databases?
➤  Low latency - database requests in bolts
➤  Proven in production with large-scale deployments
➤  Flash optimized
§  Do you need more than 30G ?
➤  Read / write optimized
➤  Faster & more reliable than
than Kafka (Cassandra based)
➤  Faster than Mongo
➤  More scale than Redis
© 2012 Aerospike. All rights reserved. Confidential | Pg. 36
Aerospike: the gold standard for high throughput,
low latency, high reliability transactions
Performance
• Over ten trillion transactions per
month
• 99% of transactions faster than 2
ms
• 150K TPS per server
Scalability
• Billions of Internet users
• Clustered Software
• Automatic Data Rebalancing
Reliability
• 50 customers; zero service down-
time
• Immediate Consistency
• Rapid Failover; Data Center
Replication
Price/Performance
• Makes impossible projects
affordable
• Flash-optimized
• 1/10 the servers required
2 Trillion Transactions per month
➤  100,000 attributes each for 160 Million users
§  Data used by recommendation engines, e-commerce,
RTB, mobile, video and display ad platforms
➤  16 TB of data on 400 Million users
§  60 Billion Transactions per month
§  4 data centers for geographic proximity
& high availability
§  “Scale.
Real-time performance.
Real-time replication at 4 datacenters.
Aerospike delivered.”
 - Elad Efraim, CTO
© 2013 Aerospike. All rights reserved. Proprietary Pg. 39
➤ Hard to
Maintain
➤ Performance
Better than the Competition
➤ Latency
➤ Number of
Servers
➤ Stability
➤ Cost of
RAM
➤ Cost of
RAM
➤ Scalability
OHIO
1)  No Hotspots
– DHT with RIPEMD160
simplifies data
partitioning
2)  Smart Client – 1 hop to
data, no load balancers
3)  Shared Nothing
Architecture,
every node identical
7) XDR – asynch replication
across data centers ensures
Zero Downtime
4)  Single row ACID
– synch replication in cluster
5)  Smart Cluster, Zero Touch
– auto-failover, rebalancing,
rolling upgrades..
6)  Transactions and long running
tasks prioritized real-time
Simpler Scaling: Fewer Servers, ACID, Zero Touch
DISTRIBUTED QUERIES
1.  “Scatter” requests to all nodes
2.  Indexes in DRAM for fast map of secondary à primary keys
3.  Indexes co-located with data to guarantee ACID,
manage migrations
4.  Records read in parallel from all SSDs
using lock free concurrency control
5.  Aggregate results on each node
6.  “Gather” results from all nodes on client
STREAM AGGREGATIONS
1.  Push Code/ Security Policies/ Rules to Data with UDFs
2.  Pipe Query results through UDFs to
Filter, Transform, Aggregate.. Map, Reduce
REAL-TIME ANALYTICS on OPERATIONAL DATA (No ETL)
➤  In Database, within the same Cluster
➤  On the same Data, on XDR Replicated Clusters
Real-time Analytics on Operational Data
Follow
Join Us!
< meta description=“Aerospike is a
Rule CaptainWord {
strings:
$header = {D0 CF 11 E0 A1 B1 1A E1}
$author = {00 00 00 63 61 70 74 61 69 6E 00}
condition:
$header at 0 and $author
Content=“malware,Exec Code, Overflow, ExecCode Bypass”
Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027”
Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili”
#HashTags
tag
<head>
<meta name=“”>
<meta desc=“”>
</head>
© 2014 Aerospike. All rights reserved. Pg. 43
“keywords”
<TITLE>
✓ I’M ATTENDING
@Aerospikedb
Questions ?

Weitere ähnliche Inhalte

Was ist angesagt?

Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Community
 
Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016StackIQ
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in productionParis Data Engineers !
 
(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep DiveAmazon Web Services
 
Setting up mongo replica set
Setting up mongo replica setSetting up mongo replica set
Setting up mongo replica setSudheer Kondla
 
Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016StackIQ
 
Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Naseem Khoodoruth
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph Ceph Community
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensionsChiou-Nan Chen
 
Intro ProxySQL
Intro ProxySQLIntro ProxySQL
Intro ProxySQLI Goo Lee
 
Globus toolkit4installationguide
Globus toolkit4installationguideGlobus toolkit4installationguide
Globus toolkit4installationguideAdarsh Patil
 
Load Balancing with Nginx
Load Balancing with NginxLoad Balancing with Nginx
Load Balancing with NginxMarian Marinov
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Serversupertom
 
Rails Caching Secrets from the Edge
Rails Caching Secrets from the EdgeRails Caching Secrets from the Edge
Rails Caching Secrets from the EdgeMichael May
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Denish Patel
 
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...ronwarshawsky
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Community
 

Was ist angesagt? (19)

Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
Unity Makes Strength
Unity Makes StrengthUnity Makes Strength
Unity Makes Strength
 
Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production
 
(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive(CMP402) Amazon EC2 Instances Deep Dive
(CMP402) Amazon EC2 Instances Deep Dive
 
Setting up mongo replica set
Setting up mongo replica setSetting up mongo replica set
Setting up mongo replica set
 
Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016
 
Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)Azure File Share and File Sync guide (Beginners Edition)
Azure File Share and File Sync guide (Beginners Edition)
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensions
 
Intro ProxySQL
Intro ProxySQLIntro ProxySQL
Intro ProxySQL
 
Globus toolkit4installationguide
Globus toolkit4installationguideGlobus toolkit4installationguide
Globus toolkit4installationguide
 
Load Balancing with Nginx
Load Balancing with NginxLoad Balancing with Nginx
Load Balancing with Nginx
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Server
 
Rails Caching Secrets from the Edge
Rails Caching Secrets from the EdgeRails Caching Secrets from the Edge
Rails Caching Secrets from the Edge
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
 
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
MongoDB performance tuning and load testing, NOSQL Now! 2013 Conference prese...
 
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
Ceph Day Shanghai - Hyper Converged PLCloud with Ceph
 

Ähnlich wie Combining Real-time and Batch Analytics with NoSQL, Storm and Hadoop - NoSQL Matters, Cologne, 2014

Storm Persistence and Real-Time Analytics
Storm Persistence and Real-Time AnalyticsStorm Persistence and Real-Time Analytics
Storm Persistence and Real-Time AnalyticsAerospike, Inc.
 
Play framework productivity formula
Play framework   productivity formula Play framework   productivity formula
Play framework productivity formula Sorin Chiprian
 
Site Performance - From Pinto to Ferrari
Site Performance - From Pinto to FerrariSite Performance - From Pinto to Ferrari
Site Performance - From Pinto to FerrariJoseph Scott
 
Apache and PHP: Why httpd.conf is your new BFF!
Apache and PHP: Why httpd.conf is your new BFF!Apache and PHP: Why httpd.conf is your new BFF!
Apache and PHP: Why httpd.conf is your new BFF!Jeff Jones
 
Browser Serving Your Web Application Security - NorthEast PHP 2017
Browser Serving Your Web Application Security - NorthEast PHP 2017Browser Serving Your Web Application Security - NorthEast PHP 2017
Browser Serving Your Web Application Security - NorthEast PHP 2017Philippe Gamache
 
Browser Serving Your Web Application Security - Madison PHP 2017
Browser Serving Your Web Application Security - Madison PHP 2017Browser Serving Your Web Application Security - Madison PHP 2017
Browser Serving Your Web Application Security - Madison PHP 2017Philippe Gamache
 
Browser Serving Your We Application Security - ZendCon 2017
Browser Serving Your We Application Security - ZendCon 2017Browser Serving Your We Application Security - ZendCon 2017
Browser Serving Your We Application Security - ZendCon 2017Philippe Gamache
 
Real Time Big Data (w/ NoSQL)
Real Time Big Data (w/ NoSQL)Real Time Big Data (w/ NoSQL)
Real Time Big Data (w/ NoSQL)Stein Writes Inc.
 
ApacheConNA 2015: What's new in Apache httpd 2.4
ApacheConNA 2015: What's new in Apache httpd 2.4ApacheConNA 2015: What's new in Apache httpd 2.4
ApacheConNA 2015: What's new in Apache httpd 2.4Jim Jagielski
 
OSGi Enterprise R6 specs are out! - David Bosschaert & Carsten Ziegeler
OSGi Enterprise R6 specs are out! - David Bosschaert & Carsten ZiegelerOSGi Enterprise R6 specs are out! - David Bosschaert & Carsten Ziegeler
OSGi Enterprise R6 specs are out! - David Bosschaert & Carsten Ziegelermfrancis
 
Into The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and dockerInto The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and dockerOrtus Solutions, Corp
 
Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018Ortus Solutions, Corp
 
Nodejs a-practical-introduction-oredev
Nodejs a-practical-introduction-oredevNodejs a-practical-introduction-oredev
Nodejs a-practical-introduction-oredevFelix Geisendörfer
 
HTML5 vs Silverlight
HTML5 vs SilverlightHTML5 vs Silverlight
HTML5 vs SilverlightMatt Casto
 
PHP from soup to nuts Course Deck
PHP from soup to nuts Course DeckPHP from soup to nuts Course Deck
PHP from soup to nuts Course DeckrICh morrow
 

Ähnlich wie Combining Real-time and Batch Analytics with NoSQL, Storm and Hadoop - NoSQL Matters, Cologne, 2014 (20)

Storm Persistence and Real-Time Analytics
Storm Persistence and Real-Time AnalyticsStorm Persistence and Real-Time Analytics
Storm Persistence and Real-Time Analytics
 
Play framework productivity formula
Play framework   productivity formula Play framework   productivity formula
Play framework productivity formula
 
Site Performance - From Pinto to Ferrari
Site Performance - From Pinto to FerrariSite Performance - From Pinto to Ferrari
Site Performance - From Pinto to Ferrari
 
Apache and PHP: Why httpd.conf is your new BFF!
Apache and PHP: Why httpd.conf is your new BFF!Apache and PHP: Why httpd.conf is your new BFF!
Apache and PHP: Why httpd.conf is your new BFF!
 
Api Design
Api DesignApi Design
Api Design
 
Node.js 1, 2, 3
Node.js 1, 2, 3Node.js 1, 2, 3
Node.js 1, 2, 3
 
Browser Serving Your Web Application Security - NorthEast PHP 2017
Browser Serving Your Web Application Security - NorthEast PHP 2017Browser Serving Your Web Application Security - NorthEast PHP 2017
Browser Serving Your Web Application Security - NorthEast PHP 2017
 
Browser Serving Your Web Application Security - Madison PHP 2017
Browser Serving Your Web Application Security - Madison PHP 2017Browser Serving Your Web Application Security - Madison PHP 2017
Browser Serving Your Web Application Security - Madison PHP 2017
 
Browser Serving Your We Application Security - ZendCon 2017
Browser Serving Your We Application Security - ZendCon 2017Browser Serving Your We Application Security - ZendCon 2017
Browser Serving Your We Application Security - ZendCon 2017
 
Attacking HTML5
Attacking HTML5Attacking HTML5
Attacking HTML5
 
Real Time Big Data (w/ NoSQL)
Real Time Big Data (w/ NoSQL)Real Time Big Data (w/ NoSQL)
Real Time Big Data (w/ NoSQL)
 
ApacheConNA 2015: What's new in Apache httpd 2.4
ApacheConNA 2015: What's new in Apache httpd 2.4ApacheConNA 2015: What's new in Apache httpd 2.4
ApacheConNA 2015: What's new in Apache httpd 2.4
 
OSGi Enterprise R6 specs are out! - David Bosschaert & Carsten Ziegeler
OSGi Enterprise R6 specs are out! - David Bosschaert & Carsten ZiegelerOSGi Enterprise R6 specs are out! - David Bosschaert & Carsten Ziegeler
OSGi Enterprise R6 specs are out! - David Bosschaert & Carsten Ziegeler
 
Into The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and dockerInto The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and docker
 
Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018
 
Sanjeev ghai 12
Sanjeev ghai 12Sanjeev ghai 12
Sanjeev ghai 12
 
Nodejs a-practical-introduction-oredev
Nodejs a-practical-introduction-oredevNodejs a-practical-introduction-oredev
Nodejs a-practical-introduction-oredev
 
Pecl Picks
Pecl PicksPecl Picks
Pecl Picks
 
HTML5 vs Silverlight
HTML5 vs SilverlightHTML5 vs Silverlight
HTML5 vs Silverlight
 
PHP from soup to nuts Course Deck
PHP from soup to nuts Course DeckPHP from soup to nuts Course Deck
PHP from soup to nuts Course Deck
 

Kürzlich hochgeladen

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Kürzlich hochgeladen (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Combining Real-time and Batch Analytics with NoSQL, Storm and Hadoop - NoSQL Matters, Cologne, 2014

  • 1. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 1 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Real-time analytics with Storm, NoSQL, and Hadoop Brian Bulkowski Founder and CTO Aerospike
  • 2. What is Storm? ➤  Storm is an Apache open-source project (Hadoop of real time) §  Created by Twitter §  No “Cloudera”-like vendor §  Return of “Message Oriented Middleware” ➤  Overview §  “Spouts” connect with data sources (example: web log tail ingest) §  “Bolts” analysis and data manipulation of any sort §  “Nimbus” control entity re-balances and re-configures without downtime, add servers seamlessly §  “Topology” Ordering of bolts and spouts §  “Tuple” A Storm message §  “Trident” Higher level abstraction supporting real-time joins, reliability © 2014 Aerospike. All rights reserved. Confidential | Pg. 2
  • 3. Why Storm? ➤  Same management framework as Hadoop ➤  Many Spouts and Bolts available https://github.com/nathanmarz/storm- contrib ➤  Simple database integration ➤  High performance & reliability (1MM/S/S ++) ➤  Great documentation ➤  Multiple languages ➤  Excellent framework for event oriented implementation © 2014 Aerospike. All rights reserved. Confidential | Pg. 3
  • 4. But --- ➤  Storm has problems ➤  0mq -> Netty TCP transition §  So much easier! ➤  Trident not good enough §  But closer than most things ➤  Auto-rebalance framework ➤  Good news: Twitter doubling down © 2014 Aerospike. All rights reserved. Confidential | Pg. 4
  • 5. Streaming architecture © 2014 Aerospike. All rights reserved. | Pg. 5 Data Warehouse, Hadoop Cluster Real-time Interactions Server Batch Analytics •  User segmentation •  Location patterns •  Similar audience Real-time Interactions •  Frequency caps •  Recent ads served •  Recent search terms User Data Streaming (Storm) Hadoop
  • 6. Integration with Hadoop ➤  Example Bolt connectors for §  Hdfs §  Hadoop ➤  Example pattern §  Bolt writes to HDFS §  Bolt reads from HDFS ➤  Problem: §  Storm is AT LEAST ONCE ➤  Solution: §  write from your app to Hadoop §  Emit from Hadoop to an edge database §  Or read from Hdfs directly © 2014 Aerospike. All rights reserved. Confidential | Pg. 6
  • 7. Examples ➤  TrendingWords §  Slice time into “slots” (1 minute / 5 minute) for each word §  On seeing a word, increment bucket © 2014 Aerospike. All rights reserved. Confidential | Pg. 7 http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/ §  Use Storm Ticks (0.8) for emitting top events §  Determine most popular words in a category §  Clean up old data on Ticks or on reads Medium easy ! ( Google ‘trending topics storm’ )
  • 8. Examples ➤  TrendingWords – Database analysis – single words §  Twitter at … 5k tps? §  20 indexable words per tweet §  10k writes per second §  50,000 words in English §  A reasonable RAM problem Requires queries & kvs ➤  Tuples §  Perhaps 30 tuples per tweet §  600K tps writes §  10M tuples ? Flash might be better Problem is much harder now! © 2014 Aerospike. All rights reserved. Confidential | Pg. 8 http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
  • 9. Examples ➤  Recommendations §  Multiple recommendation systems §  Multi-arm bandit §  https://github.com/tdunning/ storm-counts/wiki/Bayesian-Bandit ➤  Simple fraud counts §  Store recent requests for payment §  Store recent users §  Calculate fraud scores, drop events if past threshold © 2014 Aerospike. All rights reserved. Confidential | Pg. 9
  • 10. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” #HashTags tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 10 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Storm overview
  • 11. Development installs (easy) ➤  Everything on one server ( no ZeroMQ ) ➤  Zookeeper, Nimbus, Supervisors (slaves) all on one machine ➤  Write your Spouts and Bolts, but what next? ➤  Resources: ➤  https://github.com/nathanmarz/storm/wiki/ Setting-up-development-environment © 2014 Aerospike. All rights reserved. Confidential | Pg. 11
  • 12. Real-world install ( distributed or go home ) ➤  Zookeeper §  A small (1 to 5) set of nodes to hold configuration. It’s like DNS. Hopefully you have this running for Hadoop. ➤  Nimbus §  Single server where “Topologies” are managed §  Communicates via Zookeeper, “storm” namespace ➤  Supervisor ( slave ) §  Needs the addresses of zookeeper nodes §  Needs the Nimbus address §  Then simply waits for work ➤  Best guide – this is really good ! http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/ © 2012 Aerospike. All rights reserved. Confidential | Pg. 12
  • 13. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” #HashTags tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 13 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Let’s Begin
  • 14. Real-world install ➤  Ubuntu 12.04 LTS ➤  In Amazon, there’s a variety of scripts and helpers §  ( now you can build your own ) §  You can scale back to 1 machine (zoo & nimbus) ➤  In a laptop VM environment, use NAT §  Share your /etc/hosts file © 2014 Aerospike. All rights reserved. Confidential | Pg. 14
  • 15. Prerequisites ➤  Packages §  sudo apt-get install git-core libtool autoconf g++ build-essential uuid-dev maven ➤  Java §  OpenJDK 6 “is unfortunate”, OracleJDK 6 ok §  7 can be OpenJDK or Oracle §  Can’t mix and match 5, 6, 7 §  Will be building & submitting “fat jars” §  JAVA_HOME everywhere © 2014 Aerospike. All rights reserved. Confidential | Pg. 15
  • 16. Prerequisites ➤  0mq – only required in 0.8 §  Native, high performance messaging §  Storm 0.9.0 avoids 0mq, uses HTTP (!) §  2.1.x (2.1.6 -> 2.1.11) ONLY §  sudo apt-get install libzmq-dev ➤  Storm’s Java 0mq client (ugly http://stackoverflow.com/questions/12115160/compiling-jzmq-on-ubuntu git clone https://github.com/nathanmarz/jzmq.git cd jzmq ./autogen.sh ./configure # overcoming flaw in marz’s repo cd src touch classdist_noinst.stamp CLASSPATH=.:./.:$CLASSPATH javac -d . org/zeromq/ZMQ.java org/ zeromq/ZMQException.java org/zeromq/ZMQQueue.java org/zeromq/ ZMQForwarder.java org/zeromq/ZMQStreamer.java cd .. make sudo make install © 2014 Aerospike. All rights reserved. Confidential | Pg. 16
  • 17. Zookeeper server ➤  Packages §  sudo apt-get install zookeeperd ‘d’ is for servers, without is for clients §  export PATH=$PATH:/user/share/zookeeper/bin §  sudo update-rc.d zookeeper defaults ➤  Config §  All zookeepers know about all other zookeepers §  /etc/zookeeper/conf/myid -> 1 through 255 §  /etc/zookeeper/conf/zoo.cfg list IP or DNS for all other zookeepers §  Service is up only if a majority is up © 2014 Aerospike. All rights reserved. Confidential | Pg. 17
  • 18. Zookeeper server ➤  Validate §  zkCli.sh -à ( whines a bit, ends up [CONNECTED] ) §  ls / create /foo mydata get /foo set /foo otherdata get /foo ➤  Notes Data directory is /var/lib/zookeeper Config file is /etc/zookeeper/conf/zoo.cfg “client port” is 2181 © 2014 Aerospike. All rights reserved. Confidential | Pg. 18
  • 19. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” #HashTags tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 19 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Let’s Install Storm
  • 20. Storm ➤  No packages in Ubuntu §  Download http://storm-project.net/downloads §  Into /opt/storm, unzip, add to path §  export PATH=$PATH:/opt/storm/storm-0.8.2/bin ➤  Create user “storm” ; remember Path + Java sudo chown –R storm:storm /opt/storm sudo chown –R storm:storm /var/data/storm ➤  Update config file # update the storm configuration file - create a data directory, point storm at it sudo mkdir -p /var/data/storm sudo chmod a+w /var/data/storm sudo vi /opt/storm/storm-0.8.2/conf/storm.yaml storm.local.dir: "/var/data/storm" : storm.zookeeper.servers: - “192.168.62.149” ➤  YAML is touchy. Make sure to have a space after the ‘:’ © 2014 Aerospike. All rights reserved. Confidential | Pg. 20
  • 21. Storm ➤  Try running components stand alone sudo su - storm cd /opt/storm/storm-0.8.2/ bin/storm nimbus bin/storm supervisor bin/storm ui ➤  Reliability through low state, quick restart – supervisor ! sudo apt-get install supervisord cp storm.conf /etc/supervisor/conf.d ➤  Extra hint sudo unlink /var/run/supervisor.sock ➤  Storm.conf [program:storm-supervisor] command=/opt/storm/storm-0.8.2/bin/storm supervisor user=storm autostart=true autorestart=true startsecs=10 startretries=999 log_stdout=true log_stderr=true logfile=/var/log/storm/supervisor.out logfile_maxbytes=20MB logfile_backups=10 © 2014 Aerospike. All rights reserved. Confidential | Pg. 21
  • 22. Storm ➤  storm-master.conf [program:storm-nimbus] command=/opt/storm/storm-0.8.2/bin/storm nimbus user=storm autostart=true autorestart=true startsecs=10 startretries=999 log_stdout=true log_stderr=true logfile=/var/log/storm/nimbus.out logfile_maxbytes=20MB logfile_backups=10 [program:storm-ui] command=/opt/storm/storm-0.8.2/bin/storm ui user=storm autostart=true autorestart=true startsecs=10 startretries=999 log_stdout=true log_stderr=true logfile=/var/log/storm/ui.out logfile_maxbytes=20MB logfile_backups=10 © 2014 Aerospike. All rights reserved. Confidential | Pg. 22
  • 23. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” #HashTags tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 23 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Let’s Run Storm
  • 24. Run some storm ! ➤  Copy storm.yaml to a local user’s directory mkdir ~/.storm sudo cp /opt/storm/storm-0.8.2/conf/storm.yaml ~/.storm/ storm.yaml sudo chown brian:brian ~/.storm/storm.yaml ➤  Install ‘leiningen’ (clojure tools) Package version too old http://leiningen.org/ download ‘lein’ into /usr/local/bin – chmod +x /usr/local/bin/lein ➤  storm-starter git clone http://github.com/nathanmarz/storm-starter.git cd storm-starter lein deps ; lein compile ; lein uberjar © 2014 Aerospike. All rights reserved. Confidential | Pg. 24
  • 25. Run some storm ! ➤  Submit the topology storm jar target/storm-starter-0.0.1-SNAPSHOT-standalone.jar storm.starter.ExclamationTopology exclamation-topology ➤  Watch the output tail –f /opt/storm/storm-0.8.2/logs/worker-xxxx.log ➤  View the job $ storm list Topology_name Status Num_tasks Num_workers Uptime_secs ------------------------------------------------------------ exclamation-topology ACTIVE 16 3 798 © 2014 Aerospike. All rights reserved. Confidential | Pg. 25
  • 26. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” #HashTags tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 26 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Coding to Storm
  • 27. Creating topologies ➤  Say it with code public static void main(String[] args) throws Exception { TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("word", new TestWordSpout(), 10); builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("word"); builder.setBolt("exclaim2", new ExclamationBolt(), 2).shuffleGrouping("exclaim1"); Config conf = new Config(); conf.setDebug(true); if (args != null && args.length > 0) { conf.setNumWorkers(3); StormSubmitter.submitTopology(args[0], conf, builder.createTopology()); } else { LocalCluster cluster = new LocalCluster(); cluster.submitTopology("test", conf, builder.createTopology()); Utils.sleep(10000); cluster.killTopology("test"); cluster.shutdown(); } } From the ExclaimationTopology © 2014 Aerospike. All rights reserved. Confidential | Pg. 27
  • 28. Spouts ➤  RandomSentenceSpout   public void nextTuple() { Utils.sleep(100); String[] sentences = new String[]{ "the cow jumped over the moon", "an apple a day keeps the doctor away", "four score and seven years ago", "snow white and the seven dwarfs", "i am at two with nature" }; String sentence = sentences[_rand.nextInt(sentences.length)]; _collector.emit(new Values(sentence)); } © 2014 Aerospike. All rights reserved. Confidential | Pg. 28
  • 29. Spouts ➤  Easy interaction with queues §  https://github.com/nathanmarz/storm-contrib ➤  storm-kafka §  Popular persistent, topic-based queue ➤  storm-jms §  Java message service ➤  storm-sqs §  EC2’s queue service ➤  more! © 2014 Aerospike. All rights reserved. Confidential | Pg. 29
  • 30. Bolts ➤  Simple exclamation bolt public static class ExclamationBolt extends BaseRichBolt { OutputCollector _collector; @Override public void prepare(Map conf, TopologyContext context, OutputCollector collector) { _collector = collector; } @Override public void execute(Tuple tuple) { _collector.emit(tuple, new Values(tuple.getString(0) + "!!!")); _collector.ack(tuple); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word")); } } © 2014 Aerospike. All rights reserved. Confidential | Pg. 30
  • 31. Persisting Bolts ➤  Persistence is good ! Scale the data layer §  Shared required for counters like TrendingWords §  Stateless Bolts are better ➤  Easy persistence through examples §  https://github.com/nathanmarz/storm-contrib ➤  storm-rdbms §  Stores to a database table ➤  storm-cassandra §  Stores to a cassandra cluster ➤  storm-redis ➤  storm-memcache © 2014 Aerospike. All rights reserved. Confidential | Pg. 31
  • 32. Aerospike Bolts ➤  Bolts available on github §  https://github.com/aerospike/storm-aerospike ➤  EnrichBolt §  Add fields from column after looking up a key ➤  PersistBolt §  Store fields based on a key © 2014 Aerospike. All rights reserved. Confidential | Pg. 32
  • 33. Consider Aerospike ➤  If you need the speed of Storm, you need the speed of Aerospike §  That’s why we talk about Storm! ➤  Free version at http://aerospike.com/ ➤  Internap – free high performance SSD servers for trial ➤  Benefits §  In memory with FLASH §  Clustered for high performance §  HA state matches Storm’s stateless model © 2014 Aerospike. All rights reserved. Confidential | Pg. 33
  • 34. Interesting Bolts ➤  MovingAverageWithSpikeDetection §  https://github.com/stormprocessor/storm-examples ➤  TrendingTopics §  http://www.michael-noll.com/blog/2013/01/18/ implementing-real-time-trending-topics-in-storm/ ➤  Baysian Bandit §  https://github.com/tdunning/storm-counts © 2014 Aerospike. All rights reserved. Confidential | Pg. 34
  • 35. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” #HashTags tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 35 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Why Aerospike ?
  • 36. Why not other databases? ➤  Low latency - database requests in bolts ➤  Proven in production with large-scale deployments ➤  Flash optimized §  Do you need more than 30G ? ➤  Read / write optimized ➤  Faster & more reliable than than Kafka (Cassandra based) ➤  Faster than Mongo ➤  More scale than Redis © 2012 Aerospike. All rights reserved. Confidential | Pg. 36
  • 37. Aerospike: the gold standard for high throughput, low latency, high reliability transactions Performance • Over ten trillion transactions per month • 99% of transactions faster than 2 ms • 150K TPS per server Scalability • Billions of Internet users • Clustered Software • Automatic Data Rebalancing Reliability • 50 customers; zero service down- time • Immediate Consistency • Rapid Failover; Data Center Replication Price/Performance • Makes impossible projects affordable • Flash-optimized • 1/10 the servers required
  • 38.
  • 39. 2 Trillion Transactions per month ➤  100,000 attributes each for 160 Million users §  Data used by recommendation engines, e-commerce, RTB, mobile, video and display ad platforms ➤  16 TB of data on 400 Million users §  60 Billion Transactions per month §  4 data centers for geographic proximity & high availability §  “Scale. Real-time performance. Real-time replication at 4 datacenters. Aerospike delivered.”  - Elad Efraim, CTO © 2013 Aerospike. All rights reserved. Proprietary Pg. 39
  • 40. ➤ Hard to Maintain ➤ Performance Better than the Competition ➤ Latency ➤ Number of Servers ➤ Stability ➤ Cost of RAM ➤ Cost of RAM ➤ Scalability
  • 41. OHIO 1)  No Hotspots – DHT with RIPEMD160 simplifies data partitioning 2)  Smart Client – 1 hop to data, no load balancers 3)  Shared Nothing Architecture, every node identical 7) XDR – asynch replication across data centers ensures Zero Downtime 4)  Single row ACID – synch replication in cluster 5)  Smart Cluster, Zero Touch – auto-failover, rebalancing, rolling upgrades.. 6)  Transactions and long running tasks prioritized real-time Simpler Scaling: Fewer Servers, ACID, Zero Touch
  • 42. DISTRIBUTED QUERIES 1.  “Scatter” requests to all nodes 2.  Indexes in DRAM for fast map of secondary à primary keys 3.  Indexes co-located with data to guarantee ACID, manage migrations 4.  Records read in parallel from all SSDs using lock free concurrency control 5.  Aggregate results on each node 6.  “Gather” results from all nodes on client STREAM AGGREGATIONS 1.  Push Code/ Security Policies/ Rules to Data with UDFs 2.  Pipe Query results through UDFs to Filter, Transform, Aggregate.. Map, Reduce REAL-TIME ANALYTICS on OPERATIONAL DATA (No ETL) ➤  In Database, within the same Cluster ➤  On the same Data, on XDR Replicated Clusters Real-time Analytics on Operational Data
  • 43. Follow Join Us! < meta description=“Aerospike is a Rule CaptainWord { strings: $header = {D0 CF 11 E0 A1 B1 1A E1} $author = {00 00 00 63 61 70 74 61 69 6E 00} condition: $header at 0 and $author Content=“malware,Exec Code, Overflow, ExecCode Bypass” Content’=“ Java 0day. Gh0st RAT, Mac Control RAT, MS09-027” Content=“Hosh Hewer, Jenwediki yighingha iltimas qilish Jediwili” #HashTags tag <head> <meta name=“”> <meta desc=“”> </head> © 2014 Aerospike. All rights reserved. Pg. 43 “keywords” <TITLE> ✓ I’M ATTENDING @Aerospikedb Questions ?