SlideShare ist ein Scribd-Unternehmen logo
1 von 39
Downloaden Sie, um offline zu lesen
INTEGRATE SOLR WITH REAL-TIME STREAM
PROCESSING APPLICATIONS

Timothy Potter
@thelabdude
linkedin.com/thelabdude
whoami
independent consultant search / big data projects
soon to be joining engineering team @LucidWorks
co-author Solr In Action
previously big data architect Dachis Group
my storm story
re-designed a complex batch-oriented indexing
pipeline based on Hadoop (Oozie, Pig, Hive, Sqoop)
to real-time storm topology
agenda
walk through how to develop a storm topology
common integration points with Solr
(near real-time indexing, percolator, real-time get)
example
listen to click events from 1.usa.gov URL shortener
(bit.ly) to determine trending US government sites
stream of click events:
http://developer.usa.gov/1usagov
http://www.smartgrid.gov -> http://1.usa.gov/ayu0Ru
beyond word count
tackle real challenges you’ll encounter when
developing a storm topology
and what about ... unit testing, dependency injection,
measure runtime behavior of your components, separation
of concerns, reducing boilerplate, hiding complexity ...
storm
open source distributed computation system
scalability, fault-tolerance, guaranteed message
processing (optional)
storm primitives
• 
• 
• 
• 
• 

tuple: ordered list of values
stream: unbounded sequence of tuples
spout: emit a stream of tuples (source)
bolt: performs some operation on each tuple
topology: dag of spouts and tuples
solution requirements
• 
• 
• 
• 
• 

receive click events from 1.usa.gov stream
count frequency of pages in a time window
rank top N sites per time window
extract title, body text, image for each link
persist rankings and metadata for visualization
trending snapshot (sept 12, 2013)
embed.ly
API

bolt
spout

field
grouping
bit.ly hash

global
grouping

EnrichLink
Bolt

1.usa.gov
Spout

field
grouping
bit.ly hash

provided by in the
storm-starter project

Solr
Indexing
Bolt

field
grouping
obj

Rolling
Count
Bolt

Intermediate
Rankings
Bolt

Total
Rankings
Bolt

stream
data store

Solr

grouping

global
grouping

Persist
Rankings
Bolt

Metrics
DB
stream grouping
• 
• 
• 
• 

shuffle: random distribution of tuples to all instances of a
bolt
field(s): group tuples by one or more fields in common
global: reduce down to one
all: replicate stream to all instances of a bolt
source: https://github.com/nathanmarz/storm/wiki/Concepts
useful storm concepts
bolts can receive input from many spouts
tuples in a stream can be grouped together
streams can be split and joined
bolts can inject new tuples into the stream
components can be distributed across a cluster at a
configurable parallelism level
•  optionally, storm keeps track of each tuple emitted by a
spout (ack or fail)
• 
• 
• 
• 
• 
tools
• 
• 
• 
• 
• 

Spring framework – dependency injection, configuration,
unit testing, mature, etc.
Groovy – keeps your code tidy and elegant
Mockito – ignore stuff your test doesn’t care about
Netty – fast & powerful NIO networking library
Coda Hale metrics – get visibility into how your bolts and
spouts are performing (at a very low-level)
spout
easy! just produce a stream of tuples ...
and ... avoid blocking when waiting for more data, ease off throttle if
topology is not processing fast enough, deal with failed tuples, choose if it
should use message Ids for each tuple emitted, data model / schema, etc ...
Hide complexity
of implementing
Storm contract

SpringSpout

Streaming
DataAction
(POJO)

Streaming
DataProvider
(POJO)

Spring
Dependency
Injection

SpringBolt

developer
focuses on
business
logic

Spring container (1 per topology per JVM)

JDBC

WebService
streaming data provider
class	
  OneUsaGovStreamingDataProvider	
  implements	
  StreamingDataProvider,	
  MessageHandler	
  {	
  
	
  
Spring Dependency Injection
	
  	
  	
  	
  MessageStream	
  messageStream	
  
	
  
	
  	
  	
  	
  ...	
  
	
  
	
  	
  	
  	
  void	
  open(Map	
  stormConf)	
  {	
  messageStream.receive(this)	
  }	
  
	
  
non-blocking call to get the
	
  	
  	
  	
  boolean	
  next(NamedValues	
  nv)	
  {	
  
next message from 1.usa.gov
	
  	
  	
  	
  	
  	
  	
  	
  String	
  msg	
  =	
  queue.poll()	
  
	
  	
  	
  	
  	
  	
  	
  	
  if	
  (msg)	
  {	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  OneUsaGovRequest	
  req	
  =	
  objectMapper.readValue(msg,	
  OneUsaGovRequest)	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  if	
  (req	
  !=	
  null	
  &&	
  req.globalBitlyHash	
  !=	
  null)	
  {	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  nv.set(OneUsaGovTopology.GLOBAL_BITLY_HASH,	
  req.globalBitlyHash)	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  nv.set(OneUsaGovTopology.JSON_PAYLOAD,	
  req)	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  return	
  true	
  
use Jackson JSON parser
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  }	
  
to create an object from the
	
  	
  	
  	
  	
  	
  	
  	
  }	
  
	
  
raw incoming data
	
  	
  	
  	
  	
  	
  	
  	
  return	
  false	
  
	
  	
  	
  	
  }	
  
	
  
	
  	
  	
  	
  void	
  handleMessage(String	
  msg)	
  {	
  queue.offer(msg)	
  }	
  
jackson json to java
@JsonIgnoreProperties(ignoreUnknown	
  =	
  true)	
  
class	
  OneUsaGovRequest	
  implements	
  Serializable	
  {	
  
	
  
	
  	
  	
  	
  @JsonProperty("a")	
  
	
  	
  	
  	
  String	
  userAgent;	
  
	
  
	
  	
  	
  	
  @JsonProperty("c")	
  
Spring converts json to java object for you:
	
  	
  	
  	
  String	
  countryCode;	
  
	
  
	
  <bean	
  id="restTemplate"	
  	
  
	
  	
  	
  	
  @JsonProperty("nk")	
  
	
  	
  	
  	
  class="org.springframework.web.client.RestTemplate">	
  
	
  	
  	
  	
  int	
  knownUser;	
  
	
  	
  	
  <property	
  name="messageConverters">	
  
	
  
	
  	
  	
  	
  	
  <list>	
  
	
  	
  	
  	
  @JsonProperty("g")	
  
	
  	
  	
  	
  	
  	
  	
  <bean	
  id="messageConverter”	
  
	
  	
  	
  	
  String	
  globalBitlyHash;	
  
	
  	
  	
  	
  class="...json.MappingJackson2HttpMessageConverter">	
  
	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  </bean>	
  
	
  	
  	
  	
  @JsonProperty("h")	
  
	
  	
  	
  	
  	
  	
  	
  	
  </list>	
  
	
  	
  	
  	
  String	
  encodingUserBitlyHash;	
  
	
  	
  	
  	
  	
  	
  </property>	
  
	
  
	
  	
  	
  	
  </bean>	
  
	
  	
  	
  	
  @JsonProperty("l")	
  
	
  	
  	
  	
  String	
  encodingUserLogin;	
  
	
  
	
  	
  	
  	
  ...	
  
}	
  
spout data provider spring-managed bean
<bean	
  id="oneUsaGovStreamingDataProvider"	
  	
  
	
  	
  	
  	
  	
  	
  class="com.bigdatajumpstart.storm.OneUsaGovStreamingDataProvider">	
  
	
  	
  	
  	
  <property	
  name="messageStream">	
  
	
  	
  	
  	
  	
  	
  	
  	
  <bean	
  class="com.bigdatajumpstart.netty.HttpClient">	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  <constructor-­‐arg	
  index="0"	
  value="${streamUrl}"/>	
  
	
  	
  	
  	
  	
  	
  	
  	
  </bean>	
  
	
  	
  	
  	
  </property>	
  
</bean>	
  

Note: when building the StormTopology to submit to Storm, you do:
	
  builder.setSpout("1.usa.gov-­‐spout",	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  new	
  SpringSpout("oneUsaGovStreamingDataProvider",	
  spoutFields),	
  1)	
  
spout data provider unit test
class	
  OneUsaGovStreamingDataProviderTest	
  extends	
  StreamingDataProviderTestBase	
  {	
  
	
  
	
  	
  	
  	
  @Test	
  
	
  	
  	
  	
  void	
  testDataProvider()	
  {	
  
	
  
	
  	
  	
  	
  	
  	
  	
  	
  String	
  jsonStr	
  =	
  '''{	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  "a":	
  "user-­‐agent",	
  "c":	
  "US",	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  "nk":	
  0,	
  "tz":	
  "America/Los_Angeles",	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  "gr":	
  "OR",	
  "g":	
  "2BktiW",	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  "h":	
  "12Me4B2",	
  "l":	
  "usairforce",	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  "al":	
  "en-­‐us",	
  "hh":	
  "1.usa.gov",	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  "r":	
  "http://example.com/foo",	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ...	
  
	
  	
  	
  	
  	
  	
  	
  	
  }'''	
  
	
  
	
  	
  	
  	
  	
  	
  	
  	
  OneUsaGovStreamingDataProvider	
  dataProvider	
  =	
  new	
  OneUsaGovStreamingDataProvider()	
  
	
  	
  	
  	
  	
  	
  	
  	
  dataProvider.setMessageStream(mock(MessageStream))	
  
	
  	
  	
  	
  	
  	
  	
  	
  dataProvider.open(stormConf)	
  //	
  Config	
  setup	
  in	
  base	
  class	
  
	
  	
  	
  	
  	
  	
  	
  	
  dataProvider.handleMessage(jsonStr)	
  
	
  
	
  	
  	
  	
  	
  	
  	
  	
  NamedValues	
  record	
  =	
  new	
  NamedValues(OneUsaGovTopology.spoutFields)	
  
	
  	
  	
  	
  	
  	
  	
  	
  assertTrue	
  dataProvider.next(record)	
  
	
  	
  	
  	
  	
  	
  	
  	
  ...	
  
	
  	
  	
  	
  }	
  
}	
  

mock json to simulate
data from 1.usa.gov feed
use Mockito to satisfy
dependencies not needed
for this test

asserts to verify
data provider
works correctly
rolling count bolt
• 
• 
• 
• 

counts frequency of links in a sliding time window
emits topN in current window every M seconds
uses tick tuple trick provided by Storm to emit
every M seconds (configurable)
provided with the storm-starter project

http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
enrich link metadata bolt
• 
• 
• 
• 
• 

calls out to embed.ly API
caches results locally in the bolt instance
relies on field grouping (incoming tuples)
outputs data to be indexed in Solr
benefits from parallelism to enrich more
links concurrently (watch those rate limits)
embed.ly service

class	
  EmbedlyService	
  {	
  
	
  
	
  	
  	
  	
  @Autowired	
  
	
  	
  	
  	
  RestTemplate	
  restTemplate	
  
integrate coda hale metrics
	
  
	
  	
  	
  	
  String	
  apiKey	
  
	
  
	
  	
  	
  	
  private	
  Timer	
  apiTimer	
  =	
  MetricsSupport.timer(EmbedlyService,	
  "apiCall")	
  
	
  
	
  	
  	
  	
  Embedly	
  getLinkMetadata(String	
  link)	
  {	
  
	
  	
  	
  	
  	
  	
  	
  	
  String	
  urlEncoded	
  =	
  URLEncoder.encode(link,"UTF-­‐8")	
  
	
  	
  	
  	
  	
  	
  	
  	
  URI	
  uri	
  =	
  new	
  URI("https://api.embed.ly/1/oembed?key=${apiKey}&url=${urlEncoded}")	
  
	
  
	
  	
  	
  	
  	
  	
  	
  	
  Embedly	
  embedly	
  =	
  null	
  
	
  	
  	
  	
  	
  	
  	
  	
  MetricsSupport.withTimer(apiTimer,	
  {	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  embedly	
  =	
  restTemplate.getForObject(uri,	
  Embedly)	
  
	
  	
  	
  	
  	
  	
  	
  	
  })	
  
	
  	
  	
  	
  	
  	
  	
  	
  return	
  embedly	
  
simple closure to time our
	
  	
  	
  	
  }	
  
requests to the Web service
metrics
• 
• 
• 
• 

capture runtime behavior of the components in
your topology
Coda Hale metrics - http://metrics.codahale.com/
output metrics every N minutes
report metrics to JMX, ganglia, graphite, etc
-­‐-­‐	
  Meters	
  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	
  
EnrichLinkBoltLogic.solrQueries	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  count	
  =	
  97	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  mean	
  rate	
  =	
  0.81	
  events/second	
  
	
  	
  	
  	
  	
  1-­‐minute	
  rate	
  =	
  0.89	
  events/second	
  
	
  	
  	
  	
  	
  5-­‐minute	
  rate	
  =	
  1.62	
  events/second	
  
	
  	
  	
  	
  15-­‐minute	
  rate	
  =	
  1.86	
  events/second	
  
	
  
SolrBoltLogic.linksIndexed	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  count	
  =	
  60	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  mean	
  rate	
  =	
  0.50	
  events/second	
  
	
  	
  	
  	
  	
  1-­‐minute	
  rate	
  =	
  0.41	
  events/second	
  
	
  	
  	
  	
  	
  5-­‐minute	
  rate	
  =	
  0.16	
  events/second	
  
	
  	
  	
  	
  15-­‐minute	
  rate	
  =	
  0.06	
  events/second	
  
	
  
-­‐-­‐	
  Timers	
  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐	
  
EmbedlyService.apiCall	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  count	
  =	
  60	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  mean	
  rate	
  =	
  0.50	
  calls/second	
  
	
  	
  	
  	
  	
  1-­‐minute	
  rate	
  =	
  0.40	
  calls/second	
  
	
  	
  	
  	
  	
  5-­‐minute	
  rate	
  =	
  0.16	
  calls/second	
  
	
  	
  	
  	
  15-­‐minute	
  rate	
  =	
  0.06	
  calls/second	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  min	
  =	
  138.70	
  milliseconds	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  max	
  =	
  7642.92	
  milliseconds	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  mean	
  =	
  1148.29	
  milliseconds	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  stddev	
  =	
  1281.40	
  milliseconds	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  median	
  =	
  652.83	
  milliseconds	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  75%	
  <=	
  1620.96	
  milliseconds	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  ...	
  
storm cluster concepts
• 
• 
• 
• 
• 
• 

nimbus: master node (~job tracker in Hadoop)
zookeeper: cluster management / coordination
supervisor: one per node in the cluster to manage worker
processes
worker: one or more per supervisor (JVM process)
executor: thread in worker
task: work performed by a spout or bolt
Topology	
  
JAR	
  

Nimbus	
  

Node	
  1	
  

Zookeeper	
  

Supervisor	
  (1	
  per	
  node)	
  

Each component (spout or bolt)
is distributed across a cluster of
workers based on a configurable
parallelism

Worker	
  1	
  (port	
  6701)	
  
executor	
  
(thread)	
  

JVM	
  process	
  

...	
  N	
  workers	
  
...	
  M	
  nodes	
  
 @Override	
  
	
  StormTopology	
  build(StreamingApp	
  app)	
  throws	
  Exception	
  {	
  
parallelism hint to
	
  	
  	
  	
  	
  	
  	
  	
  	
  
the framework
	
  	
  	
  	
  	
  ...	
  	
  
(can be rebalanced)
	
  	
  	
  	
  	
  TopologyBuilder	
  builder	
  =	
  new	
  TopologyBuilder()	
  
	
  
	
  	
  	
  	
  	
  builder.setSpout("1.usa.gov-­‐spout",	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  new	
  SpringSpout("oneUsaGovStreamingDataProvider",	
  spoutFields),	
  1)	
  
	
  
	
  	
  	
  	
  	
  builder.setBolt("enrich-­‐link-­‐bolt",	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  new	
  SpringBolt("enrichLinkAction",	
  enrichedLinkFields),	
  3)	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  .fieldsGrouping("1.usa.gov-­‐spout",	
  globalBitlyHashGrouping)	
  
	
  
	
  	
  	
  	
  	
  ...	
  
solr integration points
• 
• 
• 

real-time get
near real-time indexing (NRT)
percolate (match incoming docs to pre-existing
queries)
real-time get
use Solr for fast lookups by document ID
class	
  SolrClient	
  {	
  
	
  
	
  	
  	
  	
  @Autowired	
  
	
  	
  	
  	
  SolrServer	
  solrServer	
  
	
  
	
  	
  	
  	
  SolrDocument	
  get(String	
  docId,	
  String...	
  fields)	
  {	
  
	
  	
  	
  	
  	
  	
  	
  	
  SolrQuery	
  q	
  =	
  new	
  SolrQuery()	
  
	
  	
  	
  	
  	
  	
  	
  	
  q.setRequestHandler("/get")	
  
	
  	
  	
  	
  	
  	
  	
  	
  q.set("id",	
  docId)	
  
	
  	
  	
  	
  	
  	
  	
  	
  q.setFields(fields)	
  
	
  	
  	
  	
  	
  	
  	
  	
  QueryRequest	
  req	
  =	
  new	
  QueryRequest(q)	
  
	
  	
  	
  	
  	
  	
  	
  	
  req.setResponseParser(new	
  BinaryResponseParser())	
  
	
  	
  	
  	
  	
  	
  	
  	
  QueryResponse	
  rsp	
  =	
  req.process(solrServer)	
  
	
  	
  	
  	
  	
  	
  	
  	
  return	
  (SolrDocument)rsp.getResponse().get("doc")	
  
	
  	
  	
  	
  }	
  
}	
  

send the request to the
“get” request handler
near real-time indexing
•  If possible, use CloudSolrServer to route documents
directly to the correct shard leaders (SOLR-4816)
•  Use <openSearcher>false</openSearcher> for auto “hard”
commits
•  Use auto soft commits as needed
•  Use parallelism of Storm bolt to distribute indexing work to
N nodes
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
percolate
•  match incoming documents to pre-configured
queries (inverted search)
–  example: Is this tweet related to campaign Y for brand X?

•  use storm’s distributed computation support to
evaluate M pre-configured queries per doc
two possible approaches
•  Lucene-only solution using MemoryIndex
–  See presentation by Charlie Hull and Alan Woodward

•  EmbeddedSolrServer
–  Full solrconfig.xml / schema.xml
–  RAMDirectory
–  Relies on Storm to scale up documents / second
–  Easy solution for up to a few thousand queries
PercolatorBolt	
  1	
  
Embedded	
  
SolrServer	
  
Twi"er	
  
Spout	
  
random	
  
stream	
  
grouping	
  

...	
  

Pre-­‐configured	
  
queries	
  stored	
  in	
  	
  
a	
  database	
  
Could be 100’s of these

PercolatorBolt	
  N	
  
Embedded	
  
SolrServer	
  

ZeroMQ	
  
pub/sub	
  to	
  push	
  
query	
  changes	
  
to	
  percolator	
  
tick tuples
•  send a special kind of tuple to a bolt every N
seconds
	
  if	
  (TupleHelpers.isTickTuple(input))	
  {	
  
	
  	
  	
  	
  	
  //	
  do	
  special	
  work	
  
	
  }	
  

used in percolator to delete accumulated documents every minute or so ...
references
• 

Storm Wiki

•  https://github.com/nathanmarz/storm/wiki/Documentation

• 

Overview: Krishna Gade
•  http://www.slideshare.net/KrishnaGade2/storm-at-twitter

• 

Trending Topics: Michael Knoll

•  http://www.michael-noll.com/blog/2013/01/18/implementing-real-timetrending-topics-in-storm/

• 

Understanding Parallelism: Michael Knoll

•  http://www.michael-noll.com/blog/2012/10/16/understanding-theparallelism-of-a-storm-topology/
Q&A
get the code: https://github.com/thelabdude/lsrdublin

Manning	
  coupon	
  codes	
  for	
  conference	
  related	
  books:	
  
	
  
h"p://deals.manningpublica8ons.com/Revolu8onsEU2013.html	
  

Weitere ähnliche Inhalte

Was ist angesagt?

Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and OptimizationMongoDB
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubySATOSHI TAGOMORI
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼NAVER D2
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...thelabdude
 
Python in the database
Python in the databasePython in the database
Python in the databasepybcn
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
DataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise SearchDataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise SearchDataStax Academy
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsFlink Forward
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?Miklos Christine
 
Angular JS deep dive
Angular JS deep diveAngular JS deep dive
Angular JS deep diveAxilis
 
Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3DataStax
 
Mongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesMongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesronwarshawsky
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Lucidworks
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Lucidworks
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Holden Karau
 
Real time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormReal time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormAndrea Iacono
 

Was ist angesagt? (20)

Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
Python in the database
Python in the databasePython in the database
Python in the database
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
DataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise SearchDataStax: An Introduction to DataStax Enterprise Search
DataStax: An Introduction to DataStax Enterprise Search
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
 
Angular JS deep dive
Angular JS deep diveAngular JS deep dive
Angular JS deep dive
 
Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3
 
Mongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategiesMongo db pefrormance optimization strategies
Mongo db pefrormance optimization strategies
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
 
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
Tuning Solr and its Pipeline for Logs: Presented by Rafał Kuć & Radu Gheorghe...
 
Apache Zookeeper
Apache ZookeeperApache Zookeeper
Apache Zookeeper
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
 
Real time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormReal time and reliable processing with Apache Storm
Real time and reliable processing with Apache Storm
 

Andere mochten auch

Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Shalin Shekhar Mangar
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Lucidworks
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Spark Summit
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6DEEPAK KHETAWAT
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...Lucidworks
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Hortonworks
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaLucidworks
 
Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...
Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...
Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...Lucidworks
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...Lucidworks
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Lucidworks
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...confluent
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...Lucidworks
 

Andere mochten auch (20)

Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick E...
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
 
Tuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for LogsTuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for Logs
 
Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6Basics of Solr and Solr Integration with AEM6
Basics of Solr and Solr Integration with AEM6
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
 
Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...
Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...
Solr Highlighting at Full Speed: Presented by Timothy Rodriguez, Bloomberg & ...
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
 

Ähnlich wie INTEGRATE SOLR WITH REAL-TIME STREAM PROCESSING

Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsthelabdude
 
GDG Jakarta Meetup - Streaming Analytics With Apache Beam
GDG Jakarta Meetup - Streaming Analytics With Apache BeamGDG Jakarta Meetup - Streaming Analytics With Apache Beam
GDG Jakarta Meetup - Streaming Analytics With Apache BeamImre Nagi
 
Comet with node.js and V8
Comet with node.js and V8Comet with node.js and V8
Comet with node.js and V8amix3k
 
Real time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyReal time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyVarun Vijayaraghavan
 
Data visualization in python/Django
Data visualization in python/DjangoData visualization in python/Django
Data visualization in python/Djangokenluck2001
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWAREFIWARE
 
Building Scalable Stateless Applications with RxJava
Building Scalable Stateless Applications with RxJavaBuilding Scalable Stateless Applications with RxJava
Building Scalable Stateless Applications with RxJavaRick Warren
 
Mobile Software Engineering Crash Course - C06 WindowsPhone
Mobile Software Engineering Crash Course - C06 WindowsPhoneMobile Software Engineering Crash Course - C06 WindowsPhone
Mobile Software Engineering Crash Course - C06 WindowsPhoneMohammad Shaker
 
比XML更好用的Java Annotation
比XML更好用的Java Annotation比XML更好用的Java Annotation
比XML更好用的Java Annotationjavatwo2011
 
3 Mobile App Dev Problems - Monospace
3 Mobile App Dev Problems - Monospace3 Mobile App Dev Problems - Monospace
3 Mobile App Dev Problems - MonospaceFrank Krueger
 
Adding a modern twist to legacy web applications
Adding a modern twist to legacy web applicationsAdding a modern twist to legacy web applications
Adding a modern twist to legacy web applicationsJeff Durta
 
Modern Android app library stack
Modern Android app library stackModern Android app library stack
Modern Android app library stackTomáš Kypta
 
Hazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSHazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSuzquiano
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every dayVadym Khondar
 
Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!Luca Lusso
 
Implementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDBImplementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDBMongoDB
 
Event-driven IO server-side JavaScript environment based on V8 Engine
Event-driven IO server-side JavaScript environment based on V8 EngineEvent-driven IO server-side JavaScript environment based on V8 Engine
Event-driven IO server-side JavaScript environment based on V8 EngineRicardo Silva
 
What’s New in Spring Data MongoDB
What’s New in Spring Data MongoDBWhat’s New in Spring Data MongoDB
What’s New in Spring Data MongoDBVMware Tanzu
 

Ähnlich wie INTEGRATE SOLR WITH REAL-TIME STREAM PROCESSING (20)

Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
GDG Jakarta Meetup - Streaming Analytics With Apache Beam
GDG Jakarta Meetup - Streaming Analytics With Apache BeamGDG Jakarta Meetup - Streaming Analytics With Apache Beam
GDG Jakarta Meetup - Streaming Analytics With Apache Beam
 
Comet with node.js and V8
Comet with node.js and V8Comet with node.js and V8
Comet with node.js and V8
 
AI Development with H2O.ai
AI Development with H2O.aiAI Development with H2O.ai
AI Development with H2O.ai
 
Real time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyReal time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.ly
 
huhu
huhuhuhu
huhu
 
Data visualization in python/Django
Data visualization in python/DjangoData visualization in python/Django
Data visualization in python/Django
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWARE
 
Building Scalable Stateless Applications with RxJava
Building Scalable Stateless Applications with RxJavaBuilding Scalable Stateless Applications with RxJava
Building Scalable Stateless Applications with RxJava
 
Mobile Software Engineering Crash Course - C06 WindowsPhone
Mobile Software Engineering Crash Course - C06 WindowsPhoneMobile Software Engineering Crash Course - C06 WindowsPhone
Mobile Software Engineering Crash Course - C06 WindowsPhone
 
比XML更好用的Java Annotation
比XML更好用的Java Annotation比XML更好用的Java Annotation
比XML更好用的Java Annotation
 
3 Mobile App Dev Problems - Monospace
3 Mobile App Dev Problems - Monospace3 Mobile App Dev Problems - Monospace
3 Mobile App Dev Problems - Monospace
 
Adding a modern twist to legacy web applications
Adding a modern twist to legacy web applicationsAdding a modern twist to legacy web applications
Adding a modern twist to legacy web applications
 
Modern Android app library stack
Modern Android app library stackModern Android app library stack
Modern Android app library stack
 
Hazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMSHazelcast and MongoDB at Cloud CMS
Hazelcast and MongoDB at Cloud CMS
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every day
 
Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!Do you know what your drupal is doing? Observe it!
Do you know what your drupal is doing? Observe it!
 
Implementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDBImplementing and Visualizing Clickstream data with MongoDB
Implementing and Visualizing Clickstream data with MongoDB
 
Event-driven IO server-side JavaScript environment based on V8 Engine
Event-driven IO server-side JavaScript environment based on V8 EngineEvent-driven IO server-side JavaScript environment based on V8 Engine
Event-driven IO server-side JavaScript environment based on V8 Engine
 
What’s New in Spring Data MongoDB
What’s New in Spring Data MongoDBWhat’s New in Spring Data MongoDB
What’s New in Spring Data MongoDB
 

Mehr von lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...lucenerevolution
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platformlucenerevolution
 
Query Latency Optimization with Lucene
Query Latency Optimization with LuceneQuery Latency Optimization with Lucene
Query Latency Optimization with Lucenelucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 
A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...A Novel methodology for handling Document Level Security in Search Based Appl...
A Novel methodology for handling Document Level Security in Search Based Appl...
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 
Query Latency Optimization with Lucene
Query Latency Optimization with LuceneQuery Latency Optimization with Lucene
Query Latency Optimization with Lucene
 

Kürzlich hochgeladen

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Kürzlich hochgeladen (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

INTEGRATE SOLR WITH REAL-TIME STREAM PROCESSING

  • 1.
  • 2. INTEGRATE SOLR WITH REAL-TIME STREAM PROCESSING APPLICATIONS Timothy Potter @thelabdude linkedin.com/thelabdude
  • 3. whoami independent consultant search / big data projects soon to be joining engineering team @LucidWorks co-author Solr In Action previously big data architect Dachis Group
  • 4. my storm story re-designed a complex batch-oriented indexing pipeline based on Hadoop (Oozie, Pig, Hive, Sqoop) to real-time storm topology
  • 5. agenda walk through how to develop a storm topology common integration points with Solr (near real-time indexing, percolator, real-time get)
  • 6. example listen to click events from 1.usa.gov URL shortener (bit.ly) to determine trending US government sites stream of click events: http://developer.usa.gov/1usagov http://www.smartgrid.gov -> http://1.usa.gov/ayu0Ru
  • 7. beyond word count tackle real challenges you’ll encounter when developing a storm topology and what about ... unit testing, dependency injection, measure runtime behavior of your components, separation of concerns, reducing boilerplate, hiding complexity ...
  • 8. storm open source distributed computation system scalability, fault-tolerance, guaranteed message processing (optional)
  • 9. storm primitives •  •  •  •  •  tuple: ordered list of values stream: unbounded sequence of tuples spout: emit a stream of tuples (source) bolt: performs some operation on each tuple topology: dag of spouts and tuples
  • 10. solution requirements •  •  •  •  •  receive click events from 1.usa.gov stream count frequency of pages in a time window rank top N sites per time window extract title, body text, image for each link persist rankings and metadata for visualization
  • 12.
  • 13. embed.ly API bolt spout field grouping bit.ly hash global grouping EnrichLink Bolt 1.usa.gov Spout field grouping bit.ly hash provided by in the storm-starter project Solr Indexing Bolt field grouping obj Rolling Count Bolt Intermediate Rankings Bolt Total Rankings Bolt stream data store Solr grouping global grouping Persist Rankings Bolt Metrics DB
  • 14. stream grouping •  •  •  •  shuffle: random distribution of tuples to all instances of a bolt field(s): group tuples by one or more fields in common global: reduce down to one all: replicate stream to all instances of a bolt source: https://github.com/nathanmarz/storm/wiki/Concepts
  • 15. useful storm concepts bolts can receive input from many spouts tuples in a stream can be grouped together streams can be split and joined bolts can inject new tuples into the stream components can be distributed across a cluster at a configurable parallelism level •  optionally, storm keeps track of each tuple emitted by a spout (ack or fail) •  •  •  •  • 
  • 16. tools •  •  •  •  •  Spring framework – dependency injection, configuration, unit testing, mature, etc. Groovy – keeps your code tidy and elegant Mockito – ignore stuff your test doesn’t care about Netty – fast & powerful NIO networking library Coda Hale metrics – get visibility into how your bolts and spouts are performing (at a very low-level)
  • 17. spout easy! just produce a stream of tuples ... and ... avoid blocking when waiting for more data, ease off throttle if topology is not processing fast enough, deal with failed tuples, choose if it should use message Ids for each tuple emitted, data model / schema, etc ...
  • 18. Hide complexity of implementing Storm contract SpringSpout Streaming DataAction (POJO) Streaming DataProvider (POJO) Spring Dependency Injection SpringBolt developer focuses on business logic Spring container (1 per topology per JVM) JDBC WebService
  • 19. streaming data provider class  OneUsaGovStreamingDataProvider  implements  StreamingDataProvider,  MessageHandler  {     Spring Dependency Injection        MessageStream  messageStream            ...            void  open(Map  stormConf)  {  messageStream.receive(this)  }     non-blocking call to get the        boolean  next(NamedValues  nv)  {   next message from 1.usa.gov                String  msg  =  queue.poll()                  if  (msg)  {                          OneUsaGovRequest  req  =  objectMapper.readValue(msg,  OneUsaGovRequest)                          if  (req  !=  null  &&  req.globalBitlyHash  !=  null)  {                                  nv.set(OneUsaGovTopology.GLOBAL_BITLY_HASH,  req.globalBitlyHash)                                  nv.set(OneUsaGovTopology.JSON_PAYLOAD,  req)                                  return  true   use Jackson JSON parser                        }   to create an object from the                }     raw incoming data                return  false          }            void  handleMessage(String  msg)  {  queue.offer(msg)  }  
  • 20. jackson json to java @JsonIgnoreProperties(ignoreUnknown  =  true)   class  OneUsaGovRequest  implements  Serializable  {            @JsonProperty("a")          String  userAgent;            @JsonProperty("c")   Spring converts json to java object for you:        String  countryCode;      <bean  id="restTemplate"            @JsonProperty("nk")          class="org.springframework.web.client.RestTemplate">          int  knownUser;        <property  name="messageConverters">              <list>          @JsonProperty("g")                <bean  id="messageConverter”          String  globalBitlyHash;          class="...json.MappingJackson2HttpMessageConverter">                        </bean>          @JsonProperty("h")                  </list>          String  encodingUserBitlyHash;              </property>            </bean>          @JsonProperty("l")          String  encodingUserLogin;            ...   }  
  • 21. spout data provider spring-managed bean <bean  id="oneUsaGovStreamingDataProvider"                class="com.bigdatajumpstart.storm.OneUsaGovStreamingDataProvider">          <property  name="messageStream">                  <bean  class="com.bigdatajumpstart.netty.HttpClient">                          <constructor-­‐arg  index="0"  value="${streamUrl}"/>                  </bean>          </property>   </bean>   Note: when building the StormTopology to submit to Storm, you do:  builder.setSpout("1.usa.gov-­‐spout",                                          new  SpringSpout("oneUsaGovStreamingDataProvider",  spoutFields),  1)  
  • 22. spout data provider unit test class  OneUsaGovStreamingDataProviderTest  extends  StreamingDataProviderTestBase  {            @Test          void  testDataProvider()  {                    String  jsonStr  =  '''{                          "a":  "user-­‐agent",  "c":  "US",                          "nk":  0,  "tz":  "America/Los_Angeles",                          "gr":  "OR",  "g":  "2BktiW",                          "h":  "12Me4B2",  "l":  "usairforce",                          "al":  "en-­‐us",  "hh":  "1.usa.gov",                          "r":  "http://example.com/foo",                          ...                  }'''                    OneUsaGovStreamingDataProvider  dataProvider  =  new  OneUsaGovStreamingDataProvider()                  dataProvider.setMessageStream(mock(MessageStream))                  dataProvider.open(stormConf)  //  Config  setup  in  base  class                  dataProvider.handleMessage(jsonStr)                    NamedValues  record  =  new  NamedValues(OneUsaGovTopology.spoutFields)                  assertTrue  dataProvider.next(record)                  ...          }   }   mock json to simulate data from 1.usa.gov feed use Mockito to satisfy dependencies not needed for this test asserts to verify data provider works correctly
  • 23. rolling count bolt •  •  •  •  counts frequency of links in a sliding time window emits topN in current window every M seconds uses tick tuple trick provided by Storm to emit every M seconds (configurable) provided with the storm-starter project http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
  • 24. enrich link metadata bolt •  •  •  •  •  calls out to embed.ly API caches results locally in the bolt instance relies on field grouping (incoming tuples) outputs data to be indexed in Solr benefits from parallelism to enrich more links concurrently (watch those rate limits)
  • 25. embed.ly service class  EmbedlyService  {            @Autowired          RestTemplate  restTemplate   integrate coda hale metrics          String  apiKey            private  Timer  apiTimer  =  MetricsSupport.timer(EmbedlyService,  "apiCall")            Embedly  getLinkMetadata(String  link)  {                  String  urlEncoded  =  URLEncoder.encode(link,"UTF-­‐8")                  URI  uri  =  new  URI("https://api.embed.ly/1/oembed?key=${apiKey}&url=${urlEncoded}")                    Embedly  embedly  =  null                  MetricsSupport.withTimer(apiTimer,  {                          embedly  =  restTemplate.getForObject(uri,  Embedly)                  })                  return  embedly   simple closure to time our        }   requests to the Web service
  • 26. metrics •  •  •  •  capture runtime behavior of the components in your topology Coda Hale metrics - http://metrics.codahale.com/ output metrics every N minutes report metrics to JMX, ganglia, graphite, etc
  • 27. -­‐-­‐  Meters  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐   EnrichLinkBoltLogic.solrQueries                            count  =  97                    mean  rate  =  0.81  events/second            1-­‐minute  rate  =  0.89  events/second            5-­‐minute  rate  =  1.62  events/second          15-­‐minute  rate  =  1.86  events/second     SolrBoltLogic.linksIndexed                            count  =  60                    mean  rate  =  0.50  events/second            1-­‐minute  rate  =  0.41  events/second            5-­‐minute  rate  =  0.16  events/second          15-­‐minute  rate  =  0.06  events/second     -­‐-­‐  Timers  -­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐   EmbedlyService.apiCall                            count  =  60                    mean  rate  =  0.50  calls/second            1-­‐minute  rate  =  0.40  calls/second            5-­‐minute  rate  =  0.16  calls/second          15-­‐minute  rate  =  0.06  calls/second                                min  =  138.70  milliseconds                                max  =  7642.92  milliseconds                              mean  =  1148.29  milliseconds                          stddev  =  1281.40  milliseconds                          median  =  652.83  milliseconds                              75%  <=  1620.96  milliseconds                              ...  
  • 28. storm cluster concepts •  •  •  •  •  •  nimbus: master node (~job tracker in Hadoop) zookeeper: cluster management / coordination supervisor: one per node in the cluster to manage worker processes worker: one or more per supervisor (JVM process) executor: thread in worker task: work performed by a spout or bolt
  • 29. Topology   JAR   Nimbus   Node  1   Zookeeper   Supervisor  (1  per  node)   Each component (spout or bolt) is distributed across a cluster of workers based on a configurable parallelism Worker  1  (port  6701)   executor   (thread)   JVM  process   ...  N  workers   ...  M  nodes  
  • 30.  @Override    StormTopology  build(StreamingApp  app)  throws  Exception  {   parallelism hint to                   the framework          ...     (can be rebalanced)          TopologyBuilder  builder  =  new  TopologyBuilder()              builder.setSpout("1.usa.gov-­‐spout",                      new  SpringSpout("oneUsaGovStreamingDataProvider",  spoutFields),  1)              builder.setBolt("enrich-­‐link-­‐bolt",                      new  SpringBolt("enrichLinkAction",  enrichedLinkFields),  3)                                .fieldsGrouping("1.usa.gov-­‐spout",  globalBitlyHashGrouping)              ...  
  • 31. solr integration points •  •  •  real-time get near real-time indexing (NRT) percolate (match incoming docs to pre-existing queries)
  • 32. real-time get use Solr for fast lookups by document ID class  SolrClient  {            @Autowired          SolrServer  solrServer            SolrDocument  get(String  docId,  String...  fields)  {                  SolrQuery  q  =  new  SolrQuery()                  q.setRequestHandler("/get")                  q.set("id",  docId)                  q.setFields(fields)                  QueryRequest  req  =  new  QueryRequest(q)                  req.setResponseParser(new  BinaryResponseParser())                  QueryResponse  rsp  =  req.process(solrServer)                  return  (SolrDocument)rsp.getResponse().get("doc")          }   }   send the request to the “get” request handler
  • 33. near real-time indexing •  If possible, use CloudSolrServer to route documents directly to the correct shard leaders (SOLR-4816) •  Use <openSearcher>false</openSearcher> for auto “hard” commits •  Use auto soft commits as needed •  Use parallelism of Storm bolt to distribute indexing work to N nodes http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
  • 34. percolate •  match incoming documents to pre-configured queries (inverted search) –  example: Is this tweet related to campaign Y for brand X? •  use storm’s distributed computation support to evaluate M pre-configured queries per doc
  • 35. two possible approaches •  Lucene-only solution using MemoryIndex –  See presentation by Charlie Hull and Alan Woodward •  EmbeddedSolrServer –  Full solrconfig.xml / schema.xml –  RAMDirectory –  Relies on Storm to scale up documents / second –  Easy solution for up to a few thousand queries
  • 36. PercolatorBolt  1   Embedded   SolrServer   Twi"er   Spout   random   stream   grouping   ...   Pre-­‐configured   queries  stored  in     a  database   Could be 100’s of these PercolatorBolt  N   Embedded   SolrServer   ZeroMQ   pub/sub  to  push   query  changes   to  percolator  
  • 37. tick tuples •  send a special kind of tuple to a bolt every N seconds  if  (TupleHelpers.isTickTuple(input))  {            //  do  special  work    }   used in percolator to delete accumulated documents every minute or so ...
  • 38. references •  Storm Wiki •  https://github.com/nathanmarz/storm/wiki/Documentation •  Overview: Krishna Gade •  http://www.slideshare.net/KrishnaGade2/storm-at-twitter •  Trending Topics: Michael Knoll •  http://www.michael-noll.com/blog/2013/01/18/implementing-real-timetrending-topics-in-storm/ •  Understanding Parallelism: Michael Knoll •  http://www.michael-noll.com/blog/2012/10/16/understanding-theparallelism-of-a-storm-topology/
  • 39. Q&A get the code: https://github.com/thelabdude/lsrdublin Manning  coupon  codes  for  conference  related  books:     h"p://deals.manningpublica8ons.com/Revolu8onsEU2013.html