Dr. James Stanier | Head of Analytics | Brandwatch.com | jamess@brandwatch.com
Coming Up/ 
• Me, Brandwatch and new problems 
• Apache Kafka 
• Processing data in Java 
• Distributing work with Zookeeper 
• Managing state 
© 2014 Brandwatch | www.brandwatch.com 2
Who? 
Dr. James Stanier 
Head of Analytics, Brandwatch 
@jstanier | jamess@brandwatch.com 
Brandwatch 
Where we are/ 
• Brighton 
• New York 
• San Francisco 
• Berlin 
• Stuttgart 
What Brandwatch does/ 
1. Crawl   2. Store and Index   3. Analyse   4. Present
• Crawl 70M+ sites including key social networks
• 27 languages
• Powerful search operators
• 20Bn+ indexed URLs
• Years of historical data
• Automated topic & sentiment analysis in all 27 languages
• Automate common tasks including alerts
• Advanced analytics modules
• Automatic categorisation with rules
• Custom dashboards
• Reporting & alerts
Brandwatch Analytics
Data/ Presentation
Data/ Aggregation 
Data/ Classification 
Data/ Not just top level metrics 
Development/ What do we use? 
Data/ The numbers 
• 50+ Java Web Crawlers 
• 10+ Historical crawlers for new queries 
• Twitter via GNIP (now Twitter) 
• 70M+ query matches per day 
The speed of social
A new challenge 
The challenge/ The signal from the noise
The challenge/ at scale 
• 100K+ user queries 
• 70M+ mentions per day 
• Polling the database for mentions would take 8 hours for one pass
The Problem/ How we handled it… 
[Diagram: Crawler 1 … Crawler N send mentions to the Kafka cluster; the processing cluster reads mentions and writes signals back to Kafka; a signals handler JVM reads the signals and writes to the DB]
Kafka 
Step 1/ Kafka 
[Diagram: Crawler 1 … Crawler N send mentions to the Kafka cluster]
Kafka/ What is it? 
• Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log
• Apache top-level project since October 2012
• Started at LinkedIn
Kafka/ is… 
• Fast: hundreds of MBs read/write per second from thousands of clients
• Scalable: clustered, partitioned over many machines, expanded without downtime
• Durable: messages persisted to disk and replicated in cluster
Kafka/ Terminology 
• Kafka maintains feeds of messages called topics
• Programs that publish messages are called producers
• Programs that subscribe to messages are called consumers
• Kafka is a cluster of servers called brokers
Kafka/ How it’s used... 
[Diagram: three producers publish to the Kafka cluster; three consumers read from it]
Kafka/ Written to disk? 
http://q.acm.org/detail.cfm?id=1563874 
Kafka/ Bending, not breaking 
http://engineering.gnip.com/tag/kafka/ 
Kafka/ The anatomy of a topic 
[Diagram: a topic split into three partitions, each a numbered sequence running from old to new; writes are appended at the new end]
Partition 0: 0 1 2 3 4 5 6
Partition 1: 0 1 2 3 4
Partition 2: 0 1 2 3 4 5
Kafka/ Warning: ordering 
Kafka guarantees a total ordering per partition, not per whole topic
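A minimal in-memory sketch of what per-partition ordering means. It does not use the Kafka client: `PartitionedLog`, `append` and `read` are hypothetical names, and routing by key hash is an assumption made for illustration. Messages that share a key land in one partition and keep their relative order; nothing is promised about order across partitions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a topic: each partition is its own append-only list. */
final class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int numberOfPartitions) {
        for (int i = 0; i < numberOfPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Routes a message by key, so one key always lands in one partition. */
    int append(String key, String message) {
        int partition = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(partition).add(message);
        return partition;
    }

    /** Messages of one partition, in the order they were appended. */
    List<String> read(int partition) {
        return new ArrayList<>(partitions.get(partition));
    }
}
```

If ordering matters for a stream of related messages, giving them the same key is the usual way to keep them in one partition.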
Kafka/ Try it out! 
> tar -xzf kafka_2.9.2-0.8.1.tgz
> cd kafka_2.9.2-0.8.1
> bin/zookeeper-server-start.sh config/zookeeper.properties
> bin/kafka-server-start.sh config/server.properties
Kafka/ Try it out! 
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Hello JAX!
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Hello JAX!
Kafka/ With Java 
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1</version>
</dependency>
Kafka/ With Java 
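import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
// Brokers used to bootstrap cluster metadata
props.put("metadata.broker.list", "broker1:9092,broker2:9092");
// Messages and keys are sent as strings
props.put("serializer.class", "kafka.serializer.StringEncoder");
// Custom partitioner that picks a partition from the message key
props.put("partitioner.class", "example.producer.SimplePartitioner");
// Wait for the partition leader to acknowledge each write
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);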
Kafka/ Partitioning 
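import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

public class SimplePartitioner implements Partitioner<String> {
    // Kafka creates the partitioner itself and passes in the producer properties
    public SimplePartitioner(VerifiableProperties props) {
    }

    // Same key, same partition (md5hash is a helper not shown on the slide;
    // it must return a non-negative value for the result to be a valid index)
    public int partition(String key, int numberOfPartitions) {
        return md5hash(key) % numberOfPartitions;
    }
}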
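The `md5hash` helper is not shown on the slide. A minimal sketch of one way to write the same key-to-partition mapping with only the JDK follows; `Md5Partitioning` is a hypothetical name, and using the whole digest as a non-negative integer is an assumption, not necessarily what the talk's code did.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Partitioning {
    private Md5Partitioning() {}

    /** Maps a key to a partition in [0, numberOfPartitions) via its MD5 digest. */
    static int partition(String key, int numberOfPartitions) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Signum 1 reads the 16 digest bytes as a non-negative integer,
            // so the remainder below can never be negative.
            BigInteger hash = new BigInteger(1, digest);
            return hash.mod(BigInteger.valueOf(numberOfPartitions)).intValue();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the mapping depends only on the key, every mention for one query ID goes to the same partition, whichever crawler produced it.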
Kafka/ Sending from the crawlers 
import kafka.producer.KeyedMessage;

// Serialise the mention; the key (queryId) decides the partition
String json = toJson(...);
KeyedMessage<String, String> message =
    new KeyedMessage<String, String>("query.mentions", queryId, json);
producer.send(message);
Step 1/ Done 
[Diagram: Crawler 1 … Crawler N send mentions to the Kafka cluster]
Processing 
Processing/ What’s happening now?
Step 2.1/ One processing JVM 
[Diagram: Crawler 1 … Crawler N send mentions to the Kafka cluster; a single signals processor reads the mentions]
Processing/ A wild tweet appears! 
Mention 
date: 14/10/2014 6:10PM 
pageType: twitter 
author: @javadude 
hashtags: [#jaxlondon, #amazingtalk, #greatshoes] 
mentionedTweeters: [@jstanier] 
text: “@jstanier is at #jaxlondon tonight #amazingtalk #greatshoes”
Processing/ Storing hashtags 
Map<Date, Multiset<String>>

Initialise with the last 24 hours
Processing/ Storing hashtags 
Map<Date, Multiset<String>>

Mention
date: 14/10/2014 6:10PM
hashtags: [#jaxlondon, #amazingtalk, #greatshoes]
Processing/ Storing hashtags 
Map<Date, Multiset<String>>

Mention
date: 14/10/2014 6:10PM
hashtags: [#jaxlondon, #amazingtalk, #greatshoes]
add("#jaxlondon")
add("#amazingtalk")
add("#greatshoes")
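A minimal JDK-only sketch of the bucket structure. Here a plain `Map<String, Integer>` stands in for the `Multiset<String>` on the slide and `Instant` stands in for `Date`; `HashtagBuckets`, `add` and `count` are hypothetical names. Unlike the slide, which initialises the last 24 hours up front, this sketch creates each hourly bucket on first use.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HashtagBuckets {
    // Hour start -> (hashtag -> count); the inner map plays the role of a multiset.
    private final TreeMap<Instant, Map<String, Integer>> buckets = new TreeMap<>();

    /** Counts each hashtag of a mention in the bucket for the mention's hour. */
    void add(Instant mentionDate, List<String> hashtags) {
        Instant hour = mentionDate.truncatedTo(ChronoUnit.HOURS);
        Map<String, Integer> counts = buckets.computeIfAbsent(hour, h -> new HashMap<>());
        for (String hashtag : hashtags) {
            counts.merge(hashtag, 1, Integer::sum);
        }
    }

    /** Number of times a hashtag was seen in the hour containing the given instant. */
    int count(Instant when, String hashtag) {
        Map<String, Integer> counts = buckets.get(when.truncatedTo(ChronoUnit.HOURS));
        return counts == null ? 0 : counts.getOrDefault(hashtag, 0);
    }
}
```

Reading one hashtag's count from every bucket in key order gives the hourly series for that hashtag.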
Processing/ Cycling the buckets 
// Runs at the top of every hour
@Scheduled(cron = "0 0 * * * *")
public void cycleBuckets() {
    // buckets is assumed to be sorted newest first, so lastKey() is the oldest bucket
    Date oldest = buckets.lastKey();
    removeBucket(oldest);
    DateTime newest = new DateTime(buckets.firstKey());
    addBucket(newest.plusHours(1).toDate());
}
Processing/ Detecting spikes 
• At regular intervals 
• For each hashtag 
• Convert to a timeseries [5, …, 1002, 5499]
• Use our super secret detection algorithm 
• Give a score to it 
• If score > threshold, it’s interesting 
• Send it on a new Kafka topic 
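The detection algorithm itself is not disclosed in the talk. As a stand-in only, the sketch below scores a series by comparing its latest value with the average of the earlier values and applies the threshold test from the list above. `SpikeScore`, the scoring formula and the threshold value are all assumptions made for illustration.

```java
final class SpikeScore {
    private SpikeScore() {}

    /** Latest count relative to the average of the earlier counts (+1 avoids dividing by zero). */
    static double score(int[] timeseries) {
        if (timeseries.length < 2) {
            return 0.0; // not enough history to compare against
        }
        double sum = 0;
        for (int i = 0; i < timeseries.length - 1; i++) {
            sum += timeseries[i];
        }
        double baseline = sum / (timeseries.length - 1);
        return timeseries[timeseries.length - 1] / (baseline + 1.0);
    }

    /** A series is interesting when its score exceeds the threshold. */
    static boolean isInteresting(int[] timeseries, double threshold) {
        return score(timeseries) > threshold;
    }
}
```

A real detector would likely account for seasonality and noise; the point here is only the shape of the pipeline: series in, score out, threshold decides.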
Processing/ What we just did 
#hashtag data model
Processing/ But we also track… 
• author data model
• sentiment data model
• page type data model
• link share data model
• #hashtag data model
• country data model
• volume data model
Processing/ …for one query 
“JAX London” query 
• author data model
• sentiment data model
• page type data model
• link share data model
• #hashtag data model
• country data model
• volume data model
Processing/ 100K+ queries and rising 
Processing/ We need more JVMs 
But how do we share the workload? 
Distribution of work
Step 2.2/ A cluster of processing JVMs 
[Diagram: Crawler 1 … Crawler N send mentions to the Kafka cluster; a processing cluster reads mentions from Kafka and writes signals back to it]
Distribution/ An atomic unit of work 
[Diagram: the processing cluster producing signals, with a question mark over how its work is divided]
Distribution/ Leader election 
A way of deciding which node in a group of distributed nodes is the leader for a task
Distribution/ Zookeeper 
A way of coordinating and managing distributed applications
Zookeeper/ It’s like a file system 
/brandwatch
    /feature_1
    /feature_2
Zookeeper/ At the command line 
Distribution/ In Java 
http://curator.apache.org/curator-framework 
Distribution/ Recipes 
Distribution/ Instantiating Curator 
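CuratorFramework client = CuratorFrameworkFactory
    .builder()
    .connectString(zkQuorum)
    .namespace(namespace)
    // Curator requires a retry policy; these values are illustrative, not from the talk
    .retryPolicy(new ExponentialBackoffRetry(1000, 3))
    .build();
client.start();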
Distribution/ Offering jobs 
/brandwatch
    /signals
        /queries
            /15846
            /1268589
[Diagram: a manager JVM reads queries from the DB and offers each one as a znode]
Distribution/ Adding nodes 
public void createZNode(String queryId) throws Exception {
    try {
        client.create().forPath(ZK_NODE_PREFIX + queryId);
    } catch (NodeExistsException e) {
        // The node is already there, so there is nothing to do
        log.debug("Node {} was already created.", queryId);
    }
}
Distribution/ Deleting nodes 
public void removeZNode(String queryId) throws Exception {
    try {
        client.delete().forPath(ZK_NODE_PREFIX + queryId);
    } catch (NoNodeException e) {
        // The node is already gone, so there is nothing to do
        log.debug("Node {} was already deleted.", queryId);
    }
}
Distribution/ Leader election 101 
[Diagram: znodes /brandwatch/signals/queries/15846 and /brandwatch/signals/queries/1268589, alongside Processing JVM 1, Processing JVM 2 and Processing JVM 3]
Distribution/ Leader election 101 
[Diagram: the same znodes and JVMs, now with election entries numbered 1, 2 and 3, one per processing JVM]
Distribution/ The leader dies 
[Diagram: the same znodes with entries 2 and 3 remaining and two processing JVMs left]
Distribution/ The dead rises again 
[Diagram: the same znodes with entries 2, 3 and 4 and three processing JVMs again]
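The numbering on the election slides can be imitated in memory. The sketch below is not ZooKeeper or Curator code: `ElectionGroup` and its methods are hypothetical, and it assumes the common rule that the participant holding the lowest surviving sequence number is the leader. It reproduces the 1 2 3, then 2 3, then 2 3 4 progression.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory imitation of the sequence numbers shown on the election slides. */
final class ElectionGroup {
    private final TreeMap<Integer, String> members = new TreeMap<>();
    private int nextSequence = 1;

    /** A joining participant gets the next sequence number, never a reused one. */
    int join(String participant) {
        int sequence = nextSequence++;
        members.put(sequence, participant);
        return sequence;
    }

    /** Called when a participant dies or gives up; its number disappears. */
    void leave(int sequence) {
        members.remove(sequence);
    }

    /** The holder of the lowest surviving number leads, or null if nobody is left. */
    String leader() {
        Map.Entry<Integer, String> first = members.firstEntry();
        return first == null ? null : first.getValue();
    }
}
```

Note that a participant that comes back joins at the end of the line rather than reclaiming leadership.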
Distribution/ Curator: LeaderLatch 
public LeaderLatch(CuratorFramework client, String latchPath)
Parameters:
    client - the client
    latchPath - the path for this leadership group
Distribution/ LeaderLatch recipe 
public class WorkerManager implements PathChildrenCacheListener {

    // One latch per query ID this JVM is competing to lead
    private Map<Integer, LeaderLatch> leaderLatches = newHashMap();

    @Override
    public void childEvent(CuratorFramework client,
            PathChildrenCacheEvent event) {
        // Handle adds and removes here!
    }
}
Distribution/ Curator: Starting up 
@PostConstruct
public void initialise() throws Exception {
    List<ChildData> currentData = newArrayList(initialisePathChildrenCache());
    log.info("Pre creating workers for {} existing queries", currentData.size());
    // Join the election for every query that already has a znode
    for (ChildData childData : currentData) {
        int queryId = parseQueryIdFromPath(childData.getPath());
        startLeaderElection(queryId);
    }
}
Distribution/ Curator: PathChildrenCache 
private List<ChildData> initialisePathChildrenCache() throws Exception {
    // Load the existing children before start() returns
    pathChildrenCache.start(StartMode.BUILD_INITIAL_CACHE);
    pathChildrenCache.getListenable().addListener(this);
    return pathChildrenCache.getCurrentData();
}
Distribution/ Curator: Adding a node 
@Override
public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) {
    ChildData childData = event.getData();
    int queryId;
    switch (event.getType()) {
        case CHILD_ADDED:
            // A new query was offered: compete for it unless we already do
            queryId = parseQueryIdFromPath(childData.getPath());
            if (!haveLeaderLatchForQuery(queryId)) {
                startLeaderElection(queryId);
            }
            break;
Distribution/ Curator: Deleting a node 
        // Continued...
        case CHILD_REMOVED:
            queryId = parseQueryIdFromPath(childData.getPath());
            removeLeaderLatchForQuery(queryId);
            break;
        default:
            break;
    }
}
Distribution/ Almost there? 
We are processing long running jobs 
What about workers getting overloaded? 
Distribution/ After leader election 
1. Take leadership 
2. Hit max queries? 
a. No – go to 3 
b. Yes – give up leadership, try again 
3. Start working 
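The three steps above can be sketched as a capacity check that runs once a worker has won an election. This is plain Java with no Curator calls; `CapacityLimitedWorker`, `onElected` and `onQueryRemoved` are hypothetical names, and counting active queries against a fixed maximum is an assumption about how "max queries" is measured.

```java
import java.util.HashSet;
import java.util.Set;

final class CapacityLimitedWorker {
    private final int maxQueries;
    private final Set<Integer> activeQueries = new HashSet<>();

    CapacityLimitedWorker(int maxQueries) {
        this.maxQueries = maxQueries;
    }

    /**
     * Called once this worker has taken leadership of a query (step 1).
     * Returns true if it starts working, false if it should give leadership up.
     */
    boolean onElected(int queryId) {
        if (activeQueries.contains(queryId)) {
            return true; // already working on this query
        }
        if (activeQueries.size() >= maxQueries) {
            return false; // step 2b: at capacity, hand leadership back
        }
        activeQueries.add(queryId); // step 3: start working
        return true;
    }

    /** Frees a slot when a query is removed or handed over. */
    void onQueryRemoved(int queryId) {
        activeQueries.remove(queryId);
    }
}
```

In a real worker, a `false` result would be followed by closing the latch for that query so another JVM can be elected.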
Distribution/ Now we’re almost there? 
Actually, no… 
Distribution/ Infinite election 
[Diagram: three processing JVMs, all at capacity, are each elected for query 1328 in turn]
Distribution/ Solution 
[Diagram: three processing JVMs; the one at capacity is elected for query 1328 and records "Refused 1328"; another JVM is then elected for 1328]
Distribution/ Solution 
[Diagram: all three processing JVMs are at capacity; each is elected for query 1328 and each records "Refused 1328"]
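The slides do not spell out how a refusal is stored or when it expires. One possible reading, sketched below in plain Java, is that each JVM remembers the queries it has refused and stays out of their elections until it has spare capacity again. `RefusalMemory` and all of its methods are hypothetical, as is the choice to forget every refusal when capacity is freed.

```java
import java.util.HashSet;
import java.util.Set;

final class RefusalMemory {
    private final Set<Integer> refused = new HashSet<>();

    /** Record that this JVM gave up leadership of a query because it was full. */
    void refuse(int queryId) {
        refused.add(queryId);
    }

    /** A JVM only re-enters the election for queries it has not refused. */
    boolean shouldJoinElection(int queryId) {
        return !refused.contains(queryId);
    }

    /** Assumption: once capacity frees up, refused queries become candidates again. */
    void capacityFreed() {
        refused.clear();
    }
}
```

With this in place a full cluster stops re-electing the same query in a loop, at the cost of that query waiting until some JVM has room.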
State 
State/ CAP theorem 
[Diagram: CAP triangle; surviving labels: Availability, CA, AP, CP]
State/ Snapshotting of worker data 
If one worker dies, we want another to pick up where it left off
Regular snapshotting to HBase
State/ Serialisation and compression 
Serialise and compress using Kryo
~0.5 MB per query, though many are much smaller
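A minimal round-trip sketch of the snapshot idea: turn a worker's state into compressed bytes, then restore it. The talk uses Kryo and stores snapshots in HBase; this sketch swaps in JDK serialisation and GZIP so that it runs without extra libraries, and it leaves storage out entirely. `Snapshots`, `toBytes` and `fromBytes` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class Snapshots {
    private Snapshots() {}

    /** Serialises and gzips a worker's state into bytes that could be stored as a snapshot. */
    static byte[] toBytes(Serializable state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the object stream also finishes the GZIP stream underneath it.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    /** Restores the state a replacement worker would resume from. */
    static Object fromBytes(byte[] snapshot) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(snapshot)))) {
            return in.readObject();
        }
    }
}
```

A newly elected worker would call something like `fromBytes` on the latest stored snapshot before consuming further mentions.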
Step 2.2/ Done! 
[Diagram: Crawler 1 … Crawler N send mentions to the Kafka cluster; the processing cluster reads mentions and writes signals back to Kafka; a signals handler JVM reads the signals and writes to the DB]
Monitoring 
Monitoring/ Statsd and Graphite 
Closing remarks 
Summary/ Using this architecture 
Now 
• Smarter alerts (email, push, in-browser) 
• Monitoring crises/events as they happen 
Underway 
• Automatic clustering of spikes into events 
• Historical analysis of trends 
Say hello/ 
jamess@brandwatch.com 
UK: +44 (0)1273 358 635 
@brandwatch | @jstanier 
www.brandwatch.com 
Q&A 
© 2014 Brandwatch | www.brandwatch.com 95

Weitere ähnliche Inhalte

Was ist angesagt?

Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloJoe Stein
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache KafkaJoe Stein
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSpark Summit
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaJay Kreps
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseWill Gardella
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)W2O Group
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings MeetupGwen (Chen) Shapira
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-CamusDeep Shah
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wilddatamantra
 

Was ist angesagt? (20)

Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Cooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython NotebookCooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython Notebook
 
spark-kafka_mod
spark-kafka_modspark-kafka_mod
spark-kafka_mod
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Zoo keeper in the wild
Zoo keeper in the wildZoo keeper in the wild
Zoo keeper in the wild
 

Andere mochten auch

Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes Weaveworks
 
Deep-dive into Microservice Outer Architecture
Deep-dive into Microservice Outer ArchitectureDeep-dive into Microservice Outer Architecture
Deep-dive into Microservice Outer ArchitectureWSO2
 
Business use of Social Media and Impact on Enterprise Architecture
Business use of Social Media and Impact on Enterprise ArchitectureBusiness use of Social Media and Impact on Enterprise Architecture
Business use of Social Media and Impact on Enterprise ArchitectureNUS-ISS
 
StormCrawler in the wild
StormCrawler in the wildStormCrawler in the wild
StormCrawler in the wildJulien Nioche
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...Brian Grant
 
Kubernetes and bluemix
Kubernetes  and  bluemixKubernetes  and  bluemix
Kubernetes and bluemixDuckDuckGo
 
Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...
Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...
Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...Ambassador Labs
 
A brief study on Kubernetes and its components
A brief study on Kubernetes and its componentsA brief study on Kubernetes and its components
A brief study on Kubernetes and its componentsRamit Surana
 
Frontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling frameworkFrontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling frameworkScrapinghub
 
Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017
Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017
Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017Bob Cotton
 

Andere mochten auch (10)

Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes Orchestrating Microservices with Kubernetes
Orchestrating Microservices with Kubernetes
 
Deep-dive into Microservice Outer Architecture
Deep-dive into Microservice Outer ArchitectureDeep-dive into Microservice Outer Architecture
Deep-dive into Microservice Outer Architecture
 
Business use of Social Media and Impact on Enterprise Architecture
Business use of Social Media and Impact on Enterprise ArchitectureBusiness use of Social Media and Impact on Enterprise Architecture
Business use of Social Media and Impact on Enterprise Architecture
 
StormCrawler in the wild
StormCrawler in the wildStormCrawler in the wild
StormCrawler in the wild
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
 
Kubernetes and bluemix
Kubernetes  and  bluemixKubernetes  and  bluemix
Kubernetes and bluemix
 
Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...
Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...
Velocity NYC 2017: Building Resilient Microservices with Kubernetes, Docker, ...
 
A brief study on Kubernetes and its components
A brief study on Kubernetes and its componentsA brief study on Kubernetes and its components
A brief study on Kubernetes and its components
 
Frontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling frameworkFrontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling framework
 
Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017
Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017
Kubernetes Colorado - Kubernetes metrics deep dive 10/25/2017
 

Ähnlich wie Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

A proven path for migrating from clearcase to git and or subversion
A proven path for migrating from clearcase to git and or subversionA proven path for migrating from clearcase to git and or subversion
A proven path for migrating from clearcase to git and or subversionCollabNet
 
Varnish e caching di applicazioni Rails
Varnish e caching di applicazioni RailsVarnish e caching di applicazioni Rails
Varnish e caching di applicazioni RailsAntonio Carpentieri
 
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipCloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipVMware Tanzu
 
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipCloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipMatt Stine
 
Docker Containers for Continuous Delivery
Docker Containers for Continuous DeliveryDocker Containers for Continuous Delivery
Docker Containers for Continuous DeliverySynerzip
 
Using the SDACK Architecture on Security Event Inspection
Using the SDACK Architecture on Security Event InspectionUsing the SDACK Architecture on Security Event Inspection
Using the SDACK Architecture on Security Event InspectionYu-Lun Chen
 
ThatConference 2016 - Highly Available Node.js
ThatConference 2016 - Highly Available Node.jsThatConference 2016 - Highly Available Node.js
ThatConference 2016 - Highly Available Node.jsBrad Williams
 
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...
Using the SDACK Architecture on Security Event Inspection by Yu-Lun Chen and ...Docker, Inc.
 
Delivering Applications Continuously to Cloud
Delivering Applications Continuously to CloudDelivering Applications Continuously to Cloud
Delivering Applications Continuously to CloudIBM UrbanCode Products
 
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayBuilding A Diverse Geo-Architecture For Cloud Native Applications In One Day
Building A Diverse Geo-Architecture For Cloud Native Applications In One DayVMware Tanzu
 
Pivotal Cloud Foundry: Building a diverse geo-architecture for Cloud Native A...
Pivotal Cloud Foundry: Building a diverse geo-architecture for Cloud Native A...Pivotal Cloud Foundry: Building a diverse geo-architecture for Cloud Native A...
Pivotal Cloud Foundry: Building a diverse geo-architecture for Cloud Native A...DataStax Academy
 
Improving Your Company’s Health with Middleware Takeout
Improving Your Company’s Health with Middleware TakeoutImproving Your Company’s Health with Middleware Takeout
Improving Your Company’s Health with Middleware TakeoutVMware Tanzu
 
Containers and Microservices for Realists
Containers and Microservices for RealistsContainers and Microservices for Realists
Containers and Microservices for RealistsOracle Developers
 
Containers and microservices for realists
Containers and microservices for realistsContainers and microservices for realists
Containers and microservices for realistsKarthik Gaekwad
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopDataWorks Summit
 
Kubecon 2019 - Promoting Kubernetes CI/CD to the Next Level
Kubecon 2019 - Promoting Kubernetes CI/CD to the Next LevelKubecon 2019 - Promoting Kubernetes CI/CD to the Next Level
Kubecon 2019 - Promoting Kubernetes CI/CD to the Next LevelTim Pouyer
 
DevOps Unleashed: Strategies that Speed Deployments
DevOps Unleashed: Strategies that Speed DeploymentsDevOps Unleashed: Strategies that Speed Deployments
DevOps Unleashed: Strategies that Speed DeploymentsForgeRock
 
To Microservices and Beyond
To Microservices and BeyondTo Microservices and Beyond
To Microservices and BeyondSimon Elisha
 

Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

  • 1. Dr. James Stanier | Head of Analytics | Brandwatch.com | jamess@brandwatch.com
  • 2. Coming Up/ • Me, Brandwatch and new problems • Apache Kafka • Processing data in Java • Distributing work with Zookeeper • Managing state © 2014 Brandwatch | www.brandwatch.com 2
  • 3. Who?
  • 4. Dr. James Stanier, Head of Analytics, Brandwatch. @jstanier | jamess@brandwatch.com
  • 5. Brandwatch
  • 6. Where we are/ • Brighton • New York • San Francisco • Berlin • Stuttgart
  • 8. What Brandwatch does/ Crawl, Store and Index, Analyse, Present • Crawl 70M+ sites including key social networks • 27 languages • Powerful search operators • 20Bn+ indexed URLs • Years of historical data • Automated topic & sentiment analysis in all 27 languages • Automate common tasks including alerts • Advanced analytics modules • Automatic categorisation with rules • Custom dashboards • Reporting & alerts
  • 9. Brandwatch Analytics
  • 11. Data/ Aggregation
  • 12. Data/ Classification
  • 13. Data/ Not just top level metrics
  • 14. Development/ What do we use?
  • 15. Data/ The numbers • 50+ Java Web Crawlers • 10+ Historical crawlers for new queries • Twitter via GNIP (now Twitter) • 70M+ query matches per day
  • 16. The speed of social
  • 20. A new challenge
  • 21. The challenge/ The signal from the noise
  • 22. The challenge/ at scale • 100K+ user queries • 70M+ mentions per day • Polling the database for mentions would take 8 hours for one pass
  • 23. The Problem/ How we handled it… [Diagram: Crawler 1 … Crawler N, Kafka Cluster, Signals Processing cluster, Signals handler JVM, DB; arrows labelled Mentions and Signals]
  • 24. Kafka
  • 25. Step 1/ Kafka [Diagram: Crawler 1 … Crawler N send Mentions to the Kafka Cluster]
  • 26. Kafka/ What is it? • Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log • Apache top level project November 2013 • Started at LinkedIn
  • 27. Kafka/ is… • Fast: hundreds of MBs read/write per second from thousands of clients • Scalable: clustered, partitioned over many machines, expanded without downtime • Durable: messages persisted to disk and replicated in cluster
  • 28. Kafka/ Terminology • Kafka maintains feeds of messages called topics • Programs that publish messages are called producers • Programs that subscribe to messages are called consumers • Kafka is a cluster of servers called brokers
  • 29. Kafka/ How it’s used... [Diagram: three Producers, the Kafka Cluster, three Consumers]
  • 30. Kafka/ Written to disk? http://q.acm.org/detail.cfm?id=1563874
  • 31. Kafka/ Bending, not breaking http://engineering.gnip.com/tag/kafka/
  • 32. Kafka/ The anatomy of a topic [Diagram: Partition 0 holds offsets 0 to 6, Partition 1 offsets 0 to 4, Partition 2 offsets 0 to 5; each partition runs from old to new, and writes are appended at the new end]
  • 33. Kafka/ Warning: ordering. Kafka guarantees a total ordering per partition, not per whole topic
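The ordering warning on slide 33 can be illustrated with a toy in-memory model. This is only a sketch: `ToyTopic` and its methods are hypothetical names and have nothing to do with the Kafka client API. It assumes messages are routed to a partition by hashing their key, so two messages with the same key land in the same partition and keep their relative order, while nothing is promised about order across partitions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of a partitioned topic; not the Kafka client API. */
final class ToyTopic {
    private final List<List<String>> partitions = new ArrayList<>();

    ToyTopic(int numberOfPartitions) {
        for (int i = 0; i < numberOfPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends to the partition chosen by the key's hash and returns that partition's index. */
    int append(String key, String message) {
        int partition = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(partition).add(message);
        return partition;
    }

    /** Returns a copy of one partition's messages in append order. */
    List<String> read(int partition) {
        return new ArrayList<>(partitions.get(partition));
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic(3);
        int partition = topic.append("query-42", "first");
        topic.append("query-7", "unrelated");
        topic.append("query-42", "second");
        // "first" always precedes "second" here because both share a key, hence a partition.
        System.out.println(topic.read(partition));
    }
}
```

If a consumer needs every message for a given query in order, keying by query ID is therefore enough; ordering across different queries would need a single partition.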
  • 34. Kafka/ Try it out!
        > tar -xzf kafka_2.9.2-0.8.1.tgz
        > cd kafka_2.9.2-0.8.1
        > bin/zookeeper-server-start.sh config/zookeeper.properties
        > bin/kafka-server-start.sh config/server.properties
  • 35. Kafka/ Try it out!
        > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
        > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
        Hello JAX!
        > bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
        Hello JAX!
  • 36. Kafka/ With Java
        <dependency>
          <groupId>org.apache.kafka</groupId>
          <artifactId>kafka_2.10</artifactId>
          <version>0.8.1</version>
        </dependency>
  • 37. Kafka/ With Java
        import java.util.Properties;
        import kafka.javaapi.producer.Producer;
        import kafka.producer.ProducerConfig;

        Properties props = new Properties();
        // Brokers used to bootstrap cluster metadata
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "example.producer.SimplePartitioner");
        // Wait for the partition leader to acknowledge each write
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        Producer<String, String> producer = new Producer<String, String>(config);
  • 38. Kafka/ Partitioning
        public class SimplePartitioner implements Partitioner<String> {
            // Kafka instantiates the partitioner reflectively and passes in its configuration.
            public SimplePartitioner(VerifiableProperties props) {
            }

            // md5hash is not shown on the slide; it is assumed to return a non-negative int.
            public int partition(String key, int numberOfPartitions) {
                return md5hash(key) % numberOfPartitions;
            }
        }
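The slide does not show `md5hash`, so the following is one possible way to do the hashing step, using only the JDK. `Md5Partitioning` is a hypothetical name. The digest is read as a non-negative `BigInteger` before taking the remainder, which avoids the negative results that a plain `int % n` produces for negative hashes.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of MD5-based key partitioning; a stand-in for the slide's md5hash helper. */
final class Md5Partitioning {

    /** Maps a key to a partition index in the range [0, numberOfPartitions). */
    static int partition(String key, int numberOfPartitions) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Signum 1 interprets the 16 digest bytes as a non-negative number.
            BigInteger hash = new BigInteger(1, digest);
            return hash.mod(BigInteger.valueOf(numberOfPartitions)).intValue();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The same query ID always maps to the same partition.
        System.out.println(partition("15846", 8) == partition("15846", 8));
    }
}
```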
  • 39. Kafka/ Sending from the crawlers
        String json = toJson(...);
        // Keyed by query ID, so all mentions for one query go to the same partition.
        KeyedMessage<String, String> message =
            new KeyedMessage<String, String>("query.mentions", queryId, json);
        producer.send(message);
  • 40. Step 1/ Done [Diagram: Crawler 1 … Crawler N send Mentions to the Kafka Cluster]
  • 41. Processing
  • 42. Processing/ What’s happening now?
  • 43. Step 2.1/ One processing JVM [Diagram: Crawler 1 … Crawler N, Kafka Cluster, Mentions, Signals processor]
  • 44. Processing/ A wild tweet appears!
        Mention
        date: 14/10/2014 6:10PM
        pageType: twitter
        author: @javadude
        hashtags: [#jaxlondon, #amazingtalk, #greatshoes]
        mentionedTweeters: [@jstanier]
        text: "@jstanier is at #jaxlondon tonight #amazingtalk #greatshoes"
  • 45. Processing/ Storing hashtags
        Map<Date, Multiset<String>>
        Initialise with the last 24 hours
  • 46. Processing/ Storing hashtags
        Map<Date, Multiset<String>>
        Mention date: 14/10/2014 6:10PM, hashtags: [#jaxlondon, #amazingtalk, #greatshoes]
  • 47. Processing/ Storing hashtags
        Map<Date, Multiset<String>>
        Mention date: 14/10/2014 6:10PM, hashtags: [#jaxlondon, #amazingtalk, #greatshoes]
        add("#jaxlondon")
        add("#amazingtalk")
        add("#greatshoes")
  • 48. Processing/ Cycling the buckets
        // Runs at the top of every hour (Spring cron fields: second minute hour day month weekday).
        @Scheduled(cron = "0 0 * * * *")
        public void cycleBuckets() {
            // As written, lastKey() is the oldest bucket, so the map is presumably sorted newest-first.
            Date oldest = buckets.lastKey();
            removeBucket(oldest);
            DateTime newest = new DateTime(buckets.firstKey());
            addBucket(newest.plusHours(1).toDate());
        }
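The bucket scheme from slides 45 to 48 can be sketched with the JDK alone. This is an illustration under assumptions, not the talk's code: `HourlyHashtagBuckets` is a hypothetical name, `Map<String, Integer>` stands in for Guava's `Multiset<String>`, `java.time.Instant` replaces `Date` and Joda's `DateTime`, and the map is sorted oldest-first (natural ordering), which is the reverse of what the slide's `firstKey`/`lastKey` usage implies.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of a rolling window of hourly hashtag counts. */
final class HourlyHashtagBuckets {
    // Key: start of the hour. Value: hashtag -> count within that hour.
    private final TreeMap<Instant, Map<String, Integer>> buckets = new TreeMap<>();

    /** Creates empty buckets for the given number of hours, ending at the current hour. */
    HourlyHashtagBuckets(Instant now, int hours) {
        Instant newest = now.truncatedTo(ChronoUnit.HOURS);
        for (int i = 0; i < hours; i++) {
            buckets.put(newest.minus(Duration.ofHours(i)), new HashMap<>());
        }
    }

    /** Counts one hashtag occurrence; returns false if the mention falls outside the window. */
    boolean add(Instant mentionDate, String hashtag) {
        Map<String, Integer> bucket = buckets.get(mentionDate.truncatedTo(ChronoUnit.HOURS));
        if (bucket == null) {
            return false;
        }
        bucket.merge(hashtag, 1, Integer::sum);
        return true;
    }

    /** Drops the oldest hour and opens a new, empty one after the newest. */
    void cycle() {
        Instant newest = buckets.lastKey();
        buckets.remove(buckets.firstKey());
        buckets.put(newest.plus(Duration.ofHours(1)), new HashMap<>());
    }

    /** Returns the counts for one hashtag, ordered from oldest to newest hour. */
    int[] timeSeries(String hashtag) {
        int[] series = new int[buckets.size()];
        int i = 0;
        for (Map<String, Integer> bucket : buckets.values()) {
            series[i++] = bucket.getOrDefault(hashtag, 0);
        }
        return series;
    }
}
```

Calling `cycle()` once per hour keeps the window at a fixed size, and `timeSeries` yields the per-hashtag series that a spike detector can consume.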
  • 49. Processing/ Detecting spikes • At regular intervals • For each hashtag • Convert to a timeseries [5, …, 1002, 5499] • Use our super secret detection algorithm • Give a score to it • If score > threshold, it’s interesting • Send it on a new Kafka topic
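The talk keeps its detection algorithm secret, so the scoring step can only be illustrated with a substitute. The sketch below uses a plain z-score of the latest bucket against the earlier ones; it is explicitly not Brandwatch's algorithm, and `SpikeScore`, `score` and `isInteresting` are hypothetical names.

```java
/** Stand-in spike scoring: z-score of the newest value against the preceding history. */
final class SpikeScore {

    /** Returns how many standard deviations the last value lies above the mean of the rest. */
    static double score(int[] series) {
        if (series.length < 3) {
            return 0.0; // too little history to judge
        }
        int historyLength = series.length - 1;
        double mean = 0.0;
        for (int i = 0; i < historyLength; i++) {
            mean += series[i];
        }
        mean /= historyLength;

        double variance = 0.0;
        for (int i = 0; i < historyLength; i++) {
            variance += (series[i] - mean) * (series[i] - mean);
        }
        double stdDev = Math.sqrt(variance / historyLength);

        double latest = series[historyLength];
        if (stdDev == 0.0) {
            // Flat history: any rise is treated as an unbounded spike.
            return latest > mean ? Double.POSITIVE_INFINITY : 0.0;
        }
        return (latest - mean) / stdDev;
    }

    /** Applies the "score > threshold" rule from the slide. */
    static boolean isInteresting(int[] series, double threshold) {
        return score(series) > threshold;
    }
}
```

A series that passes `isInteresting` would then be published to a separate Kafka topic, as the last bullet on the slide describes.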
  • 50. Processing/ What we just did: #hashtag data model
  • 51. Processing/ But we also track… author data model, sentiment data model, page type data model, link share data model, #hashtag data model, country data model, volume data model
  • 52. Processing/ …for one query. “JAX London” query: author data model, sentiment data model, page type data model, link share data model, #hashtag data model, country data model, volume data model
  • 53. Processing/ 100K+ queries and rising
  • 54. Processing/ We need more JVMs. But how do we share the workload?
  • 55. Distribution of work
  • 56. Step 2.2/ A cluster of processing JVMs [Diagram: Crawler 1 … Crawler N, Kafka Cluster, Signals Processing cluster; flows labelled Mentions and Signals]
  • 57. Distribution/ An atomic unit of work [Diagram: Signals Processing cluster, with a question mark]
  • 58. Distribution/ Leader election. A way of deciding who is the leader for a task in a group of distributed nodes
  • 59. Distribution/ Zookeeper. A way of coordinating and managing distributed applications
  • 60. Zookeeper/ It’s like a file system: /brandwatch /feature_1 /feature_2
  • 61. Zookeeper/ At the command line
  • 62. Distribution/ In Java http://curator.apache.org/curator-framework
  • 63. Distribution/ Recipes
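  • 64. Distribution/ Instantiating Curator
        // A retry policy (.retryPolicy(...)) is normally set too; it is not shown on the slide.
        CuratorFramework client = CuratorFrameworkFactory
            .builder()
            .connectString(zkQuorum)
            .namespace(namespace)
            .build();
        // start() returns void, so the client is kept in a variable first.
        client.start();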
  • 65. Distribution/ Offering jobs [Diagram labels: Manager JVM; DB; znode paths /brandwatch, /signals, /queries, /15846, /1268589]
  • 66. Distribution/ Adding nodes
        // forPath declares a checked Exception; only "already exists" is expected here.
        public void createZNode(String queryId) throws Exception {
            try {
                client.create().forPath(ZK_NODE_PREFIX + queryId);
            } catch (NodeExistsException e) {
                log.debug("Node {} was already created.", queryId);
            }
        }
  • 67. Distribution/ Deleting nodes
        // forPath declares a checked Exception; only "already gone" is expected here.
        public void removeZNode(String queryId) throws Exception {
            try {
                client.delete().forPath(ZK_NODE_PREFIX + queryId);
            } catch (NoNodeException e) {
                log.debug("Node {} was already deleted.", queryId);
            }
        }
  • 68. Distribution/ Leader election 101 [Diagram labels: znode paths /brandwatch, /signals, /queries, /15846, /1268589; Processing JVM 1, Processing JVM 2, Processing JVM 3]
  • 69. Distribution/ Leader election 101 [Diagram labels: same znode paths; entries 1, 2, 3; Processing JVM 1, Processing JVM 2, Processing JVM 3]
  • 70. Distribution/ The leader dies [Diagram labels: same znode paths; entries 2, 3; two Processing JVMs]
  • 71. Distribution/ The dead rises again [Diagram labels: same znode paths; entries 2, 3, 4; three Processing JVMs]
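The sequence on slides 68 to 71 can be sketched without ZooKeeper at all. The following is a minimal in-memory model, assuming the numbers in the diagrams are sequence numbers and that the lowest live number leads; `ElectionGroup` and its methods are hypothetical names for illustration, not Curator or ZooKeeper APIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** In-memory stand-in for one leadership group; not ZooKeeper or Curator. */
class ElectionGroup {
    // Sequence number -> participant name, sorted so the lowest number comes first.
    private final TreeMap<Integer, String> members = new TreeMap<>();
    private int nextSequence = 1;

    /** A participant joins and is handed the next sequence number. */
    int join(String participant) {
        int sequence = nextSequence++;
        members.put(sequence, participant);
        return sequence;
    }

    /** A participant leaves, for example because its JVM died. */
    void leave(int sequence) {
        members.remove(sequence);
    }

    /** The participant holding the lowest live sequence number is the leader. */
    Optional<String> leader() {
        Map.Entry<Integer, String> first = members.firstEntry();
        return first == null ? Optional.empty() : Optional.of(first.getValue());
    }
}
```

In this model, three JVMs join as 1, 2 and 3, so the first leads. When it leaves, the holder of 2 takes over. If the dead JVM comes back it joins as 4 and waits its turn rather than displacing the current leader.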
  • 72. Distribution/ Curator: LeaderLatch
        public LeaderLatch(CuratorFramework client, String latchPath)
        Parameters:
            client - the client
            latchPath - the path for this leadership group
  • 73. Distribution/ LeaderLatch recipe
        public class WorkerManager implements PathChildrenCacheListener {

            private Map<Integer, LeaderLatch> leaderLatches = newHashMap();

            @Override
            public void childEvent(CuratorFramework client,
                                   PathChildrenCacheEvent event) {
                // Handle adds and removes here!
            }
        }
  • 74. Distribution/ Curator: Starting up
        @PostConstruct
        public void initialise() throws Exception {
            List<ChildData> currentData = newArrayList(initialisePathChildrenCache());
            log.info("Pre creating workers for {} existing queries", currentData.size());
            for (ChildData childData : currentData) {
                int queryId = parseQueryIdFromPath(childData.getPath());
                startLeaderElection(queryId);
            }
        }
  • 75. Distribution/ Curator: PathChildrenCache
        private List<ChildData> initialisePathChildrenCache() throws Exception {
            pathChildrenCache.start(StartMode.BUILD_INITIAL_CACHE);
            pathChildrenCache.getListenable().addListener(this);
            List<ChildData> currentData = pathChildrenCache.getCurrentData();
            return currentData;
        }
  • 76. Distribution/ Curator: Adding a node
        @Override
        public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) {
            ChildData childData = event.getData();
            int queryId; // declared here so every case below can assign it
            switch (event.getType()) {
                case CHILD_ADDED:
                    queryId = parseQueryIdFromPath(childData.getPath());
                    if (!haveLeaderLatchForQuery(queryId)) {
                        startLeaderElection(queryId);
                    }
                    break;
  • 77. Distribution/ Curator: Deleting a node
                // Continued...
                case CHILD_REMOVED:
                    queryId = parseQueryIdFromPath(childData.getPath());
                    removeLeaderLatchForQuery(queryId);
                    break;
                default:
                    break;
            }
        }
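The slides call `parseQueryIdFromPath` but never show its body. One plausible sketch, assuming the query ID is always the last segment of a path such as /brandwatch/signals/queries/15846, is below; the class name `QueryPaths` and the implementation are guesses for illustration, not the code from the talk.

```java
/** Hypothetical helper; the slides use parseQueryIdFromPath without showing it. */
class QueryPaths {
    /**
     * Returns the numeric query ID held in the last segment of a znode path.
     * Throws NumberFormatException if that segment is not a number.
     */
    static int parseQueryIdFromPath(String path) {
        int lastSlash = path.lastIndexOf('/');
        // Everything after the final slash is assumed to be the query ID.
        return Integer.parseInt(path.substring(lastSlash + 1));
    }
}
```

Letting the `NumberFormatException` propagate is a deliberate choice in this sketch: a non-numeric child under the queries path would point to a bug in whatever offers the jobs, so failing loudly seems safer than skipping it.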
  • 78. Distribution/ Almost there? We are processing long-running jobs. What about workers getting overloaded?
  • 79. Distribution/ After leader election
        1. Take leadership
        2. Hit max queries?
           a. No: go to 3
           b. Yes: give up leadership, try again
        3. Start working
  • 80. Distribution/ Now we’re almost there? Actually, no…
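The three steps on slide 79 can be sketched as a small gate that a worker consults after it wins an election. This is a minimal sketch under assumptions: `CapacityGate`, `acceptLeadership` and `release` are hypothetical names, and the real system would also have to close the corresponding `LeaderLatch` when the answer is "give up".

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch of the post-election capacity check; all names are illustrative. */
class CapacityGate {
    private final int maxQueries;
    private final Set<Integer> activeQueries = new HashSet<>();

    CapacityGate(int maxQueries) {
        this.maxQueries = maxQueries;
    }

    /**
     * Called after winning the election for a query (step 1).
     * Returns true if the worker should start working (step 3),
     * false if it is at capacity and should give up leadership (step 2b).
     */
    boolean acceptLeadership(int queryId) {
        if (activeQueries.contains(queryId)) {
            return true; // already working on this query
        }
        if (activeQueries.size() >= maxQueries) {
            return false; // hit max queries
        }
        activeQueries.add(queryId);
        return true;
    }

    /** Frees a slot when the worker stops processing a query. */
    void release(int queryId) {
        activeQueries.remove(queryId);
    }
}
```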
  • 81. Distribution/ Infinite election [Diagram: three Processing JVMs, each labelled “Elected for 1328” and “At capacity!”]
  • 82. Distribution/ Solution [Diagram: three Processing JVMs; labels: “Elected for 1328” (twice), “At capacity!”, “Refused 1328”]
  • 83. Distribution/ Solution [Diagram: three Processing JVMs, each labelled “Elected for 1328”, “At capacity!” and “Refused 1328”]
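One reading of slides 81 to 83 is that a worker which gives up a query while at capacity records a refusal and does not stand for that query again, so leadership stops circulating. The simulation below is a sketch under that assumption; the slides do not say how refusals are stored or when they are cleared, and `RefusalElection` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simulates elections for one query among workers that are all at capacity. */
class RefusalElection {
    /**
     * Returns how many elections are held before the process stops,
     * capped at maxRounds so the "infinite" case still terminates here.
     */
    static int electionsHeld(List<String> workers, boolean rememberRefusals, int maxRounds) {
        Deque<String> candidates = new ArrayDeque<>(workers);
        Set<String> refused = new HashSet<>();
        int rounds = 0;
        while (!candidates.isEmpty() && rounds < maxRounds) {
            String elected = candidates.pollFirst(); // head of the queue wins
            rounds++;
            // The elected worker is at capacity, so it gives up leadership.
            if (rememberRefusals) {
                refused.add(elected);
            }
            // Only workers that have not refused this query rejoin the election.
            if (!refused.contains(elected)) {
                candidates.addLast(elected);
            }
        }
        return rounds;
    }
}
```

Without refusals, three full workers keep electing each other until the cap is hit. With refusals, each worker is elected once and the election ends after three rounds, leaving the query unassigned until capacity appears.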
  • 84. State
  • 85. State/ CAP theorem [Diagram labels: Availability; CA; AP; CP]
  • 86. State/ Snapshotting of worker data. If one worker dies, we want the other to pick up where it left off. Regular snapshotting to HBase
  • 87. State/ Serialisation and compression. Serialise and compress using Kryo. ~0.5MB per query, but a lot are very small
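The snapshot round trip can be sketched with the JDK alone. Note the substitution: the talk uses Kryo and writes to HBase, whereas this sketch uses built-in Java serialisation with GZIP and stops at a byte array, purely to stay self-contained. `SnapshotCodec` and the `HashMap<String, Long>` state shape are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Serialise-and-compress sketch using JDK classes in place of Kryo. */
class SnapshotCodec {
    /** Turns worker state into compressed bytes that could be stored as a snapshot. */
    static byte[] snapshot(HashMap<String, Long> state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds worker state from a snapshot so another worker can carry on. */
    @SuppressWarnings("unchecked")
    static HashMap<String, Long> restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(snapshot)))) {
            return (HashMap<String, Long>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Snapshot refers to an unknown class", e);
        }
    }
}
```

A worker taking over a query would call `restore` on the latest stored bytes and continue counting from there; anything that arrived after that snapshot would need to be replayed or accepted as lost.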
  • 88. Step 2.2/ Done! [Diagram labels: Crawler 1, Crawler 2, Crawler n-1, Crawler N; Mentions; Kafka Cluster; Signals; Processing cluster; Signals handler JVM; DB]
  • 89. Monitoring
  • 90. Monitoring/ Statsd and Graphite
  • 91. Closing remarks
  • 93. Summary/ Using this architecture
        Now
        • Smarter alerts (email, push, in-browser)
        • Monitoring crises/events as they happen
        Underway
        • Automatic clustering of spikes into events
        • Historical analysis of trends
  • 94. Say hello/ jamess@brandwatch.com | UK: +44 (0)1273 358 635 | @brandwatch | @jstanier | www.brandwatch.com
  • 95. Q&A