Real Time Big Data With Storm, Cassandra, and In-Memory Computing
DeWayne Filppi
@dfilppi
  
Big Data Predictions

“Over the next few years we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing. In the same way that Hadoop has been borne out of large-scale web applications, these platforms will be driven by the needs of large-scale location-aware mobile, social and sensor use.”
Edd Dumbill, O’REILLY
® Copyright 2013 Gigaspaces Ltd. All Rights Reserved
The Two Vs of Big Data
•  Velocity
•  Volume
  
We’re Living in a Real Time World…
•  Homeland Security
•  Real Time Search
•  Social
•  eCommerce
•  User Tracking & Engagement
•  Financial Services
The Flavors of Big Data Analytics
•  Counting
•  Correlating
•  Research
  
Analytics @ Twitter – Counting
•  How many signups, tweets, retweets for a topic?
•  What’s the average latency?
•  Demographics
   –  Countries and cities
   –  Gender
   –  Age groups
   –  Device types
   –  …
  	
  	
  
Analytics @ Twitter – Correlating
•  What devices fail at the same time?
•  What features get users hooked?
•  What places on the globe are “happening”?
  
Analytics @ Twitter – Research
•  Sentiment analysis
   –  “Obama is popular”
•  Trends
   –  “People like to tweet after watching American Idol”
•  Spam patterns
   –  How can you tell when a user spams?
  
It’s All about Timing
•  “Real time” (< few seconds)
•  Reasonably quick (seconds to minutes)
•  Batch (hours/days)
  	
  
It’s All about Timing
•  Event driven / stream processing
   –  High resolution: every tweet gets counted
•  Ad-hoc querying
   –  Medium resolution (aggregations)
•  Long running batch jobs (ETL, map/reduce)
   –  Low resolution (trends & patterns)
  	
  
This is what we’re here to discuss :)

VELOCITY + VAST VOLUME = IN MEMORY + BIG DATA
  
In Memory Data Grid Review
•  RAM is the new disk
•  Data partitioned across a cluster
•  Large “virtual” memory space
•  Transactional
•  Highly available
•  Code collocated with data.
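
The partitioning and collocation ideas in this review can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the XAP API: the `Grid` class and its `partition_for`, `execute_on` and `map_reduce` methods are hypothetical names, and each partition is a plain dictionary in one process rather than a node in a cluster.

```python
import zlib


class Grid:
    """Toy in-memory data grid: keys are hash-partitioned across N dicts."""

    def __init__(self, partitions):
        # Each dict stands in for the RAM of one cluster node (assumption).
        self.partitions = [dict() for _ in range(partitions)]

    def partition_for(self, key):
        # Stable hash so the same key always routes to the same partition.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.partitions)

    def write(self, key, value):
        self.partitions[self.partition_for(key)][key] = value

    def read(self, key):
        return self.partitions[self.partition_for(key)].get(key)

    def execute_on(self, key, func):
        # "Code collocated with data": run func against the partition
        # that owns the key instead of copying the data to the caller.
        return func(self.partitions[self.partition_for(key)])

    def map_reduce(self, mapper, reducer):
        # Run mapper on every partition, then combine the partial results.
        return reducer([mapper(p) for p in self.partitions])


grid = Grid(partitions=4)
for user_id in range(100):
    grid.write(user_id, {"tweets": user_id % 5})

# Each partition sums its own entries; only the partial sums are combined.
total_tweets = grid.map_reduce(
    mapper=lambda part: sum(v["tweets"] for v in part.values()),
    reducer=sum,
)
tweets_for_7 = grid.execute_on(7, lambda part: part[7]["tweets"])
```

Together the partitions behave like one large "virtual" memory space, while each computation only touches the data held locally by a partition.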
  
Data Grid + Cassandra: A Complete Solution
•  Data flows through the in-memory cluster async to Cassandra
•  Side effects calculated
•  Filtering an option
•  Enrichment an option
•  Results instantly available
•  Internal and external event listeners notified
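
A minimal sketch of the flow described above, assuming a write-behind design: writes are filtered and enriched, land in memory, notify listeners, and are persisted later by a background worker. The `WriteBehindStore` class is hypothetical, and a plain dictionary stands in for Cassandra; none of this is the XAP or Cassandra API.

```python
import queue
import threading


class WriteBehindStore:
    """Toy write-behind: reads/writes hit memory; persistence happens later."""

    def __init__(self, backing_store, enrich=None, accept=None):
        self.memory = {}
        self.backing_store = backing_store      # stand-in for Cassandra
        self.enrich = enrich or (lambda k, v: v)
        self.accept = accept or (lambda k, v: True)
        self.listeners = []
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, key, value):
        if not self.accept(key, value):          # optional filtering
            return False
        value = self.enrich(key, value)          # optional enrichment
        self.memory[key] = value                 # result instantly available
        for listener in self.listeners:          # event listeners notified
            listener(key, value)
        self._pending.put((key, value))          # persisted asynchronously
        return True

    def _drain(self):
        while True:
            item = self._pending.get()
            if item is None:                     # sentinel: stop the worker
                break
            key, value = item
            self.backing_store[key] = value

    def close(self):
        self._pending.put(None)
        self._worker.join()


disk = {}
seen = []
store = WriteBehindStore(
    disk,
    enrich=lambda k, v: {**v, "length": len(v["text"])},
    accept=lambda k, v: bool(v["text"]),
)
store.listeners.append(lambda k, v: seen.append(k))
store.write("t1", {"text": "hello storm"})
store.write("t2", {"text": ""})          # filtered out
in_memory = store.memory["t1"]           # readable without waiting for the flush
store.close()                            # wait for the background writer
```

The caller never waits on the slow store: the write returns once the value is in memory, and durability follows shortly after.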
  
Simplified Event Flow
  
Grid – Cassandra Interface
•  Hector and CQL based interface
•  In memory data must be mapped to column families.
   –  Configurable class to column family mapping
•  Must serialize individual fields
   –  Fixed fields can use defined types
   –  Variable fields (for schemaless in-memory mode) need serializers
•  Object model flattening
   –  By default, nested fields are flattened.
   –  Can be overridden by custom serializer.
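
To make the flattening step concrete, here is a small sketch of mapping a nested object to a row key plus flat columns, with a per-field serializer override. The `flatten` and `to_row` helpers, the dotted column names, and the use of `str` as the default serializer are all assumptions made for illustration; the actual XAP mapping rules are documented on the wiki page listed in the references.

```python
import json


def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts into a single level of column-name -> value."""
    columns = {}
    for name, value in obj.items():
        column = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(value, dict):
            columns.update(flatten(value, column, sep))
        else:
            columns[column] = value
    return columns


def to_row(obj, key_field, serializers=None):
    """Map one object to (row_key, columns), serializing each field."""
    serializers = serializers or {}
    flat = flatten(obj)
    row_key = flat.pop(key_field)
    columns = {
        name: serializers.get(name, str)(value)   # default serializer: str()
        for name, value in flat.items()
    }
    return row_key, columns


tweet = {
    "id": "42",
    "text": "storm + cassandra",
    "user": {"name": "dfilppi", "location": {"country": "US"}},
    "tags": ["storm", "cassandra"],
}
# The "tags" field gets a custom serializer; every other field uses str().
row_key, columns = to_row(tweet, key_field="id", serializers={"tags": json.dumps})
```

Here the nested `user` object becomes the columns `user.name` and `user.location.country`, while `tags` is stored as a JSON string.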
  
Virtues and Limitations
•  Could be faster: high availability has a cost
•  Complex flows not easy to assemble or understand with simple event handlers
•  Complete stack, not just two tools of many
•  Fast.
   –  Microsecond latencies for in memory operations
   –  Fast enough for almost anybody
•  Highly available/self healing
•  Elastic
  
•  Popular open source, real time, in-memory, streaming computation platform.
•  Includes distributed runtime and intuitive API for defining distributed processing flows.
•  Scalable and fault tolerant.
•  Developed at BackType, and open sourced by Twitter
  
Storm Background
•  Streams
   –  Unbounded sequence of tuples
•  Spouts
   –  Source of streams (Queues)
•  Bolts
   –  Functions, Filters, Joins, Aggregations
•  Topologies
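
The four abstractions can be modelled with Python generators. This is only a conceptual sketch, not the Storm API: the function names are invented, the spout reads a finite list where a real stream is unbounded, and the topology here is a simple chain where a real one is a graph of spouts and bolts running across a cluster.

```python
def sentence_spout():
    """Spout: a source of tuples. Here a finite list stands in for a queue."""
    for sentence in ["the cow jumped", "the moon", ""]:
        yield (sentence,)


def non_empty(stream):
    """Bolt acting as a filter: drops tuples with an empty sentence."""
    for (sentence,) in stream:
        if sentence:
            yield (sentence,)


def split_words(stream):
    """Bolt acting as a function: emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def topology(spout, *bolts):
    """Topology: a spout wired through a chain of bolts."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return stream


words = [word for (word,) in topology(sentence_spout, non_empty, split_words)]
```

Each bolt consumes one stream and emits another, so processing steps compose without any step knowing where its input came from.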
  
Storm Abstractions
•  Spout
•  Bolt
•  Topologies
  
Streaming word count with Storm
•  Storm has a simple builder interface for creating stream processing topologies
•  Storm delegates persistence to external providers
•  Cassandra, because of its write performance, is commonly used
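
A rough Python sketch of the streaming word count, with persistence delegated to an external store. It does not use the Storm builder API: `tweet_spout`, `split_bolt`, `CountBolt` and `run_word_count` are hypothetical names, a dictionary stands in for Cassandra, and the hash routing imitates the idea of sending the same word to the same counting task.

```python
import zlib


def tweet_spout():
    # Finite stand-in for an unbounded tweet stream.
    yield from ["storm is fast", "cassandra is fast", "storm scales"]


def split_bolt(tweets):
    # Emit one lower-cased word per tuple.
    for tweet in tweets:
        yield from tweet.lower().split()


class CountBolt:
    """One counting task; persistence is delegated to an external store."""

    def __init__(self, store):
        self.store = store              # dict standing in for Cassandra

    def execute(self, word):
        self.store[word] = self.store.get(word, 0) + 1


def run_word_count(store, tasks=2):
    counters = [CountBolt(store) for _ in range(tasks)]
    for word in split_bolt(tweet_spout()):
        # Route each word to a fixed task so one task owns each counter.
        task = zlib.crc32(word.encode("utf-8")) % tasks
        counters[task].execute(word)
    return store


counts = run_word_count(store={})
```

Because a given word always reaches the same task, no two tasks ever update the same counter, which is what lets the counting step run in parallel.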
  
Storm: Optimistic Processing
•  Storm (quite rationally) assumes success is normal
•  Storm uses batching and pipelining for performance
•  Therefore the spout must be able to replay tuples on demand in case of error.
•  Any kind of quasi-queue-like data source can be fashioned into a spout.
•  No persistence is ever required, and speed is attained by minimizing network hops during topology processing.
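
The replay requirement can be illustrated with a toy spout that remembers every emitted tuple until it is acknowledged. The method names echo the roles of the `nextTuple`, `ack` and `fail` callbacks on a Storm spout, but this is a standalone sketch under simplifying assumptions (one process, one tuple in flight, in-memory source), not Storm code.

```python
from collections import deque


class ReplayableSpout:
    """Keeps emitted tuples until they are acked, so failures can be replayed."""

    def __init__(self, source):
        self.queue = deque(enumerate(source))   # (message id, tuple)
        self.pending = {}                       # emitted but not yet acked

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, value = self.queue.popleft()
        self.pending[msg_id] = value
        return msg_id, value

    def ack(self, msg_id):
        del self.pending[msg_id]                # fully processed, forget it

    def fail(self, msg_id):
        # Put the tuple back at the end of the queue for another attempt.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def run(spout, process):
    while True:
        emitted = spout.next_tuple()
        if emitted is None:
            break
        msg_id, value = emitted
        try:
            process(value)
            spout.ack(msg_id)
        except Exception:
            spout.fail(msg_id)


attempts = {}
results = []


def flaky(value):
    # Fails the first time it sees "b", succeeds on the replay.
    attempts[value] = attempts.get(value, 0) + 1
    if value == "b" and attempts[value] == 1:
        raise RuntimeError("transient failure")
    results.append(value)


spout = ReplayableSpout(["a", "b", "c"])
run(spout, flaky)
```

The happy path costs only a dictionary insert and delete per tuple; the extra work is paid only when something fails, which is the optimistic bet described above. Replayed tuples may arrive out of order, as `b` does here.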
  
Fast.  Want to go faster?
•  Eliminate non-memory components
   –  Replace disk based queue with reliable in-memory queue
   –  Replace disk based state persistence with in-memory persistence
   –  Asynchronously update disk based state (C*)
  
Sample Architecture
  
References
•  Try the Cloudify recipe
   –  Download Cloudify: http://www.cloudifysource.org/
   –  Download the Recipe (apps/xapstream, services/xapstream):
      https://github.com/CloudifySource/cloudify-recipes
•  XAP – Cassandra Interface Details:
   –  http://wiki.gigaspaces.com/wiki/display/XAP95/Cassandra+Space+Persistency
•  Check out the source for the XAP Spout and a sample state implementation backed by XAP, and a Storm friendly streaming implementation on github:
   –  https://github.com/Gigaspaces/storm-integration
•  For more background on the effort, check out my recent blog posts at http://blog.gigaspaces.com/
   –  http://blog.gigaspaces.com/gigaspaces-and-storm-part-1-storm-clouds/
   –  http://blog.gigaspaces.com/gigaspaces-and-storm-part-2-xap-integration/
   –  Part 3 coming soon.
  
Twitter Storm With Cassandra
  
Storm Overview
•  Streams
   –  Unbounded sequence of tuples
•  Spouts
   –  Source of streams (Queues)
•  Bolts
   –  Functions, Filters, Joins, Aggregations
•  Topologies
  
Storm Concepts
•  Spouts
•  Bolt
•  Topologies

Challenge – Word Count
[Diagram labels: Tweets, Count, Word:Count]
  
•  Hottest topics
•  URL mentions
•  etc.
  
Streaming word count with Storm
  
Supercharging Storm
•  Storm doesn’t supply persistence, but provides for it
•  Storm optimizes IO to slow persistence (e.g. databases) using batching.
•  Storm processes streams. The stream provider itself needs to support persistency, batching, and reliability.
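
Why batching helps with slow persistence can be shown with a small buffer that turns many updates into few round trips. `BatchingWriter` and `slow_write` are hypothetical names for this sketch, and a dictionary stands in for the database; Storm's own batching machinery is more involved than this.

```python
class BatchingWriter:
    """Buffers updates and writes them to a slow store one batch at a time."""

    def __init__(self, write_batch, batch_size):
        self.write_batch = write_batch      # callable taking a dict of updates
        self.batch_size = batch_size
        self.buffer = {}

    def put(self, key, value):
        self.buffer[key] = value            # later updates to a key overwrite
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(dict(self.buffer))
            self.buffer.clear()


database = {}
round_trips = []


def slow_write(batch):
    round_trips.append(len(batch))          # one entry per simulated round trip
    database.update(batch)


writer = BatchingWriter(slow_write, batch_size=3)
counts = {}
for word in ["a", "b", "a", "c", "a", "d"]:
    counts[word] = counts.get(word, 0) + 1
    writer.put(word, counts[word])
writer.flush()                              # write whatever is left over
```

Six updates reach the store in two round trips, and repeated updates to the same key within a batch collapse into one write.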
  
Tweets, events, whatever…
XAP Real Time Analytics
  
Two Layer Approach
•  Advantage: Minimal “impedance mismatch” between layers.
   –  Both NoSQL cluster technologies, with similar advantages
•  Grid layer serves as an in memory cache for interactive requests.
•  Grid layer serves as a real time computation fabric for CEP, and limited (to allocated memory) real time distributed query capability.
  
[Diagram labels: Raw Event Stream (x3), In Memory Compute Cluster, Real Time Events, Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE]
Simplified Architecture
•  Flowing event streams through memory for side effects
   –  Event driven architecture executing in-memory
   –  Raw events flushed, aggregations/derivations retained
•  All layers horizontally scalable
•  All layers highly available
•  Real-time analytics & cached batch analytics on same scalable layer
•  Data grid provides a transactional/consistent façade on NoSQL store (in this case eliminating SQL database entirely)
  
Key Concepts
Keep Things In Memory
•  Facebook keeps 80% of its data in Memory (Stanford research)
•  RAM is 100-1000x faster than Disk (Random seek)
   –  Disk: 5-10 ms
   –  RAM: ~0.001 ms
  	
  
Take Aways
•  A data grid can serve different needs for big data analytics:
•  Supercharge a dedicated stream processing cluster like Storm.
   –  Provide fast, reliable, transactional tuple streams and state
•  Provide a general purpose analytics platform
   –  Roll your own
•  Simplify overall architecture while enhancing scalability
   –  Ultra high performance/low latency
   –  Dynamically scalable processing and in-memory storage
   –  Eliminate messaging tier
   –  Eliminate or minimize need for RDBMS
  
References
•  Realtime Analytics with Storm and Hadoop
   –  http://www.slideshare.net/Hadoop_Summit/realtime-analytics-with-storm
•  Learn and fork the code on github:
   –  https://github.com/Gigaspaces/storm-integration
•  Twitter Storm:
   –  http://storm-project.net
•  XAP + Storm Detailed Blog Post
   –  http://blog.gigaspaces.com/gigaspaces-and-storm-part-2-xap-integration/
  

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA confHadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA confSujee Maniyam
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Cloudera, Inc.
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopDataWorks Summit
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
Fraud Detection with Hadoop
Fraud Detection with HadoopFraud Detection with Hadoop
Fraud Detection with Hadoopmarkgrover
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...Chris Huang
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...DataStax
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidYahoo Developer Network
 
Druid at Hadoop Ecosystem
Druid at Hadoop EcosystemDruid at Hadoop Ecosystem
Druid at Hadoop EcosystemSlim Bouguerra
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataRostislav Pashuto
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerMichael Spector
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorialmarkgrover
 
Hadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant birdHadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant birdKevin Weil
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsDataWorks Summit
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Architectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoopArchitectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoopAnu Ravindranath
 

Was ist angesagt? (20)

Intro To Hadoop
Intro To HadoopIntro To Hadoop
Intro To Hadoop
 
Hadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA confHadoop2 new and noteworthy SNIA conf
Hadoop2 new and noteworthy SNIA conf
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
Architecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with HadoopArchitecting a Fraud Detection Application with Hadoop
Architecting a Fraud Detection Application with Hadoop
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
Fraud Detection with Hadoop
Fraud Detection with HadoopFraud Detection with Hadoop
Fraud Detection with Hadoop
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
Powering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big DataPowering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big Data
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
Building a Distributed Reservation System with Cassandra (Andrew Baker & Jeff...
 
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
 
Druid at Hadoop Ecosystem
Druid at Hadoop EcosystemDruid at Hadoop Ecosystem
Druid at Hadoop Ecosystem
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of data
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
 
Hadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant birdHadoop summit 2010 frameworks panel elephant bird
Hadoop summit 2010 frameworks panel elephant bird
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Architectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoopArchitectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoop
 

Ähnlich wie Real Time Big Data Analytics with Storm, Cassandra and In-Memory Computing

Predictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timePredictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timeAerospike, Inc.
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detectionhadooparchbook
 
What's Next for Google's BigTable
What's Next for Google's BigTableWhat's Next for Google's BigTable
What's Next for Google's BigTableSqrrl
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupGianmario Spacagna
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Vinay Kumar Chella
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph✔ Eric David Benari, PMP
 
In-Memory Stream Processing with Hazelcast Jet @JEEConf
In-Memory Stream Processing with Hazelcast Jet @JEEConfIn-Memory Stream Processing with Hazelcast Jet @JEEConf
In-Memory Stream Processing with Hazelcast Jet @JEEConfNazarii Cherkas
 
Leaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real WorldLeaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real WorldArmonDadgar
 
Big Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveBig Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveAerospike, Inc.
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data gridBogdan Dina
 
Data Streaming Technology Overview
Data Streaming Technology OverviewData Streaming Technology Overview
Data Streaming Technology OverviewDan Lynn
 
Dask and Machine Learning Models in Production - PyColorado 2019
Dask and Machine Learning Models in Production - PyColorado 2019Dask and Machine Learning Models in Production - PyColorado 2019
Dask and Machine Learning Models in Production - PyColorado 2019William Cox
 
Java Flight Recorder Behind the Scenes
Java Flight Recorder Behind the ScenesJava Flight Recorder Behind the Scenes
Java Flight Recorder Behind the ScenesStaffan Larsen
 
Buzz Words Dunning Real-Time Learning
Buzz Words Dunning Real-Time LearningBuzz Words Dunning Real-Time Learning
Buzz Words Dunning Real-Time LearningMapR Technologies
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialhadooparchbook
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 

Ähnlich wie Real Time Big Data Analytics with Storm, Cassandra and In-Memory Computing (20)

Predictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-timePredictable Big Data Performance in Real-time
Predictable Big Data Performance in Real-time
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detection
 
What's Next for Google's BigTable
What's Next for Google's BigTableWhat's Next for Google's BigTable
What's Next for Google's BigTable
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
Amazon Aurora: Database Week SF
Amazon Aurora: Database Week SFAmazon Aurora: Database Week SF
Amazon Aurora: Database Week SF
 
In-Memory Stream Processing with Hazelcast Jet @JEEConf
In-Memory Stream Processing with Hazelcast Jet @JEEConfIn-Memory Stream Processing with Hazelcast Jet @JEEConf
In-Memory Stream Processing with Hazelcast Jet @JEEConf
 
Leaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real WorldLeaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real World
 
Big Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's PerspectiveBig Data Learnings from a Vendor's Perspective
Big Data Learnings from a Vendor's Perspective
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data grid
 
Data Streaming Technology Overview
Data Streaming Technology OverviewData Streaming Technology Overview
Data Streaming Technology Overview
 
Dask and Machine Learning Models in Production - PyColorado 2019
Dask and Machine Learning Models in Production - PyColorado 2019Dask and Machine Learning Models in Production - PyColorado 2019
Dask and Machine Learning Models in Production - PyColorado 2019
 
Java Flight Recorder Behind the Scenes
Java Flight Recorder Behind the ScenesJava Flight Recorder Behind the Scenes
Java Flight Recorder Behind the Scenes
 
Buzz Words Dunning Real-Time Learning
Buzz Words Dunning Real-Time LearningBuzz Words Dunning Real-Time Learning
Buzz Words Dunning Real-Time Learning
 
Hadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorialHadoop application architectures - Fraud detection tutorial
Hadoop application architectures - Fraud detection tutorial
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 

Real Time Big Data Analytics with Storm, Cassandra and In-Memory Computing

  • 1. Real Time Big Data With Storm, Cassandra, and In-Memory Computing. DeWayne Filppi, @dfilppi
  • 2. Big Data Predictions: “Over the next few years we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing. In the same way that Hadoop has been borne out of large-scale web applications, these platforms will be driven by the needs of large-scale location-aware mobile, social and sensor use.” Edd Dumbill, O’Reilly. ® Copyright 2013 Gigaspaces Ltd. All Rights Reserved
  • 3. The Two Vs of Big Data: Velocity, Volume
  • 4. We’re Living in a Real Time World…
      • Homeland Security
      • Real Time Search
      • Social
      • eCommerce
      • User Tracking & Engagement
      • Financial Services
  • 5. The Flavors of Big Data Analytics: Counting, Correlating, Research
  • 6. Analytics @ Twitter – Counting
      • How many signups, tweets, retweets for a topic?
      • What’s the average latency?
      • Demographics
          • Countries and cities
          • Gender
          • Age groups
          • Device types
          • …
  • 7. Analytics @ Twitter – Correlating
      • What devices fail at the same time?
      • What features get users hooked?
      • What places on the globe are “happening”?
  • 8. Analytics @ Twitter – Research
      • Sentiment analysis
          • “Obama is popular”
      • Trends
          • “People like to tweet after watching American Idol”
      • Spam patterns
          • How can you tell when a user spams?
  • 9. It’s All about Timing
      • “Real time” (< a few seconds)
      • Reasonably quick (seconds to minutes)
      • Batch (hours/days)
  • 10. It’s All about Timing
      • Event driven / stream processing
      • High resolution: every tweet gets counted
      • Ad-hoc querying
      • Medium resolution (aggregations)
      • Long running batch jobs (ETL, map/reduce)
      • Low resolution (trends & patterns)
      • Callout: This is what we’re here to discuss :)
  • 11. VELOCITY + VAST VOLUME = IN MEMORY + BIG DATA
  • 12. In Memory Data Grid Review
      • RAM is the new disk
      • Data partitioned across a cluster
      • Large “virtual” memory space
      • Transactional
      • Highly available
      • Code collocated with data
  • 13. Data Grid + Cassandra: A Complete Solution
      • Data flows through the in-memory cluster asynchronously to Cassandra
      • Side effects calculated
      • Filtering an option
      • Enrichment an option
      • Results instantly available
      • Internal and external event listeners notified
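The flow on slide 13 can be illustrated with a small write-behind sketch. This is not the XAP or Cassandra API: `WriteBehindGrid`, its `accept` and `enrich` hooks, and the plain dict standing in for Cassandra are all hypothetical names chosen for illustration. It shows a write becoming readable in memory at once, listeners being notified, and persistence happening later in a separate flush step.

```python
from collections import deque


class WriteBehindGrid:
    """Toy in-memory grid that persists asynchronously (write-behind).

    Hypothetical illustration only; not the XAP or Cassandra API.
    """

    def __init__(self, backing_store, enrich=None, accept=None):
        self.memory = {}                    # results are readable from here immediately
        self.pending = deque()              # writes not yet persisted
        self.backing_store = backing_store  # stand-in for Cassandra (a plain dict)
        self.enrich = enrich or (lambda event: event)
        self.accept = accept or (lambda event: True)
        self.listeners = []

    def write(self, key, event):
        if not self.accept(event):          # optional filtering
            return False
        event = self.enrich(event)          # optional enrichment
        self.memory[key] = event
        self.pending.append((key, event))
        for listener in self.listeners:     # notify event listeners
            listener(key, event)
        return True

    def flush(self, max_items=None):
        """Drain queued writes to the backing store; return how many were written."""
        written = 0
        while self.pending and (max_items is None or written < max_items):
            key, event = self.pending.popleft()
            self.backing_store[key] = event
            written += 1
        return written


store = {}
grid = WriteBehindGrid(
    store,
    enrich=lambda e: {**e, "length": len(e["text"])},
    accept=lambda e: bool(e["text"]),
)
notified = []
grid.listeners.append(lambda key, event: notified.append(key))

grid.write("t1", {"text": "hello storm"})   # accepted, enriched, held in memory
grid.write("t2", {"text": ""})              # filtered out
persisted_before_flush = dict(store)        # still empty: nothing persisted yet
flushed = grid.flush()                      # the asynchronous step, run by hand here
```

In a real deployment the flush would run on a background thread or timer; it is called explicitly here so the example stays deterministic.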
  • 14. Simplified Event Flow
  • 15. Grid – Cassandra Interface
      • Hector and CQL based interface
      • In-memory data must be mapped to column families
          • Configurable class to column family mapping
      • Must serialize individual fields
          • Fixed fields can use defined types
          • Variable fields (for schemaless in-memory mode) need serializers
      • Object model flattening
          • By default, nested fields are flattened
          • Can be overridden by a custom serializer
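As a rough picture of the object model flattening mentioned on slide 15, the sketch below turns a nested record into a single level of column names. The dotted naming scheme and the `flatten` helper are assumptions made for this example; the slide does not say how XAP names flattened columns.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into one level of dotted column names.

    The dotted naming is an assumption for illustration, not XAP's scheme.
    """
    columns = {}
    for name, value in obj.items():
        column = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far as the prefix.
            columns.update(flatten(value, column))
        else:
            columns[column] = value
    return columns


tweet = {"id": 42, "user": {"name": "dfilppi", "location": {"country": "US"}}}
row = flatten(tweet)
# row maps "id", "user.name" and "user.location.country" to their values
```

A custom serializer, as the slide notes, would replace this default by storing a nested object as a single column value instead.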
  • 16. Virtues and Limitations
      • Could be faster: high availability has a cost
      • Complex flows are not easy to assemble or understand with simple event handlers
      • Complete stack, not just two tools of many
      • Fast
          • Microsecond latencies for in-memory operations
          • Fast enough for almost anybody
      • Highly available/self healing
      • Elastic
  • 17. Storm Background
      • Popular open source, real-time, in-memory, streaming computation platform
      • Includes a distributed runtime and an intuitive API for defining distributed processing flows
      • Scalable and fault tolerant
      • Developed at BackType, and open sourced by Twitter
  • 18. Storm Abstractions
      • Streams
          • Unbounded sequence of tuples
      • Spouts
          • Source of streams (queues)
      • Bolts
          • Functions, filters, joins, aggregations
      • Topologies
  • 19. Streaming word count with Storm
      • Storm has a simple builder interface for creating stream processing topologies
      • Storm delegates persistence to external providers
      • Cassandra, because of its write performance, is commonly used
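The streaming word count can be modelled in a few lines without Storm itself. The sketch below is plain Python, not the Storm API: `sentence_spout`, `split_bolt`, `CountBolt` and `run_topology` are hypothetical names. It mimics a spout feeding a splitting bolt, whose output is routed by word to one of several counting tasks, so that the same word always reaches the same counter.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout stand-in: emits one single-field tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt stand-in: splits each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


class CountBolt:
    """Bolt stand-in: keeps a running count per word."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        (word,) = tup
        self.counts[word] += 1
        return (word, self.counts[word])


def run_topology(sentences, parallelism=2):
    """Wire spout -> split -> count, routing each word to a fixed counting task."""
    tasks = [CountBolt() for _ in range(parallelism)]
    for tup in split_bolt(sentence_spout(sentences)):
        # Grouping by the word field: equal words always land on the same task.
        task = tasks[hash(tup[0]) % parallelism]
        task.execute(tup)
    totals = Counter()
    for task in tasks:
        totals.update(task.counts)
    return totals


totals = run_topology(["storm counts words", "storm is fast"])
```

Because every occurrence of a word goes to one task, the per-task counts never overlap and can simply be summed. In the architecture the slides describe, the counts would then be handed to an external store such as Cassandra.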
  • 20. Storm: Optimistic Processing
      • Storm (quite rationally) assumes success is normal
      • Storm uses batching and pipelining for performance
      • Therefore the spout must be able to replay tuples on demand in case of error
      • Any kind of quasi-queue-like data source can be fashioned into a spout
      • No persistence is ever required, and speed is attained by minimizing network hops during topology processing
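The replay requirement can be sketched as follows. The method names `next_tuple`, `ack` and `fail` echo the callbacks a Storm spout receives, but the class is a simplified, hypothetical stand-in written in plain Python rather than a Storm component: it keeps every emitted tuple until it is acknowledged and puts failed tuples back in line.

```python
class ReplayableSpout:
    """Toy spout that can replay tuples whose processing failed.

    Hypothetical illustration; not the Storm spout interface itself.
    """

    def __init__(self, source):
        self.queue = list(source)   # tuples waiting to be emitted
        self.in_flight = {}         # msg_id -> tuple, emitted but not yet acked
        self.next_id = 0

    def next_tuple(self):
        """Emit the next tuple with a fresh message id, or None if idle."""
        if not self.queue:
            return None
        tup = self.queue.pop(0)
        msg_id = self.next_id
        self.next_id += 1
        self.in_flight[msg_id] = tup
        return msg_id, tup

    def ack(self, msg_id):
        """Processing succeeded: the tuple can be forgotten."""
        self.in_flight.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: queue the tuple again so it is replayed."""
        tup = self.in_flight.pop(msg_id, None)
        if tup is not None:
            self.queue.append(tup)


spout = ReplayableSpout(["a", "b"])
first_id, first = spout.next_tuple()
spout.fail(first_id)                  # "a" goes back to the end of the queue
second_id, second = spout.next_tuple()
spout.ack(second_id)                  # "b" is done
replay_id, replayed = spout.next_tuple()
```

The success path costs only a dictionary insert and delete, which fits the slide's point that the design optimizes for the normal case and pays for replay only on error.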
  • 21. Fast. Want to go faster?
      • Eliminate non-memory components
      • Replace the disk-based queue with a reliable in-memory queue
      • Replace disk-based state persistence with in-memory persistence
      • Asynchronously update disk-based state (C*)
  • 22. Sample Architecture
  • 23. References
      • Try the Cloudify recipe
          • Download Cloudify: http://www.cloudifysource.org/
          • Download the recipe (apps/xapstream, services/xapstream): https://github.com/CloudifySource/cloudify-recipes
      • XAP – Cassandra interface details:
          • http://wiki.gigaspaces.com/wiki/display/XAP95/Cassandra+Space+Persistency
      • Check out the source for the XAP spout, a sample state implementation backed by XAP, and a Storm-friendly streaming implementation on GitHub:
          • https://github.com/Gigaspaces/storm-integration
      • For more background on the effort, check out my recent blog posts at http://blog.gigaspaces.com/
          • http://blog.gigaspaces.com/gigaspaces-and-storm-part-1-storm-clouds/
          • http://blog.gigaspaces.com/gigaspaces-and-storm-part-2-xap-integration/
          • Part 3 coming soon.
  • 25. Twitter Storm With Cassandra
  • 26. Storm Overview
  • 27. Storm Concepts
      • Streams
          • Unbounded sequence of tuples
      • Spouts
          • Source of streams (queues)
      • Bolts
          • Functions, filters, joins, aggregations
      • Topologies
  • 28. Challenge – Word Count
      • Diagram labels: Tweets, Count, Word:Count
      • Hottest topics
      • URL mentions
      • etc.
  • 29. Streaming word count with Storm
  • 30. Supercharging Storm
      • Storm doesn’t supply persistence, but provides for it
      • Storm optimizes IO to slow persistence (e.g. databases) using batching
      • Storm processes streams. The stream provider itself needs to support persistency, batching, and reliability
      • Diagram label: Tweets, events, whatever…
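Batching toward a slow store can be pictured with the sketch below. `BatchingWriter` and its `sink` callable are hypothetical names, and this is not how Storm implements batching internally. It only shows the idea that several records share one round trip to the persistence layer.

```python
class BatchingWriter:
    """Buffers records and hands them to a slow sink in groups.

    Hypothetical illustration of batching; not Storm's own mechanism.
    """

    def __init__(self, sink, batch_size):
        self.sink = sink              # callable taking a list: one call = one round trip
        self.batch_size = batch_size
        self.buffer = []

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, if anything, as a single batch."""
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()


round_trips = []                      # each entry records one call to the sink
writer = BatchingWriter(round_trips.append, batch_size=3)
for record in range(7):
    writer.write(record)
writer.flush()                        # push the final partial batch
```

Seven records reach the sink in three calls instead of seven; the trade-off is that buffered records are lost on a crash unless the stream source can replay them.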
  • 31. XAP Real Time Analytics
  • 32. Two Layer Approach
      • Advantage: minimal “impedance mismatch” between layers
          • Both are NoSQL cluster technologies, with similar advantages
      • Grid layer serves as an in-memory cache for interactive requests
      • Grid layer serves as a real-time computation fabric for CEP, and a limited (to allocated memory) real-time distributed query capability
      • Diagram labels: Raw Event Stream, In Memory Compute Cluster, Real Time Events, Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE
  • 33. Simplified Architecture
  • 34. Key Concepts
      • Flowing event streams through memory for side effects
      • Event driven architecture executing in-memory
      • Raw events flushed, aggregations/derivations retained
      • All layers horizontally scalable
      • All layers highly available
      • Real-time analytics & cached batch analytics on the same scalable layer
      • Data grid provides a transactional/consistent façade on the NoSQL store (in this case eliminating the SQL database entirely)
  • 35. Keep Things In Memory
      • Facebook keeps 80% of its data in memory (Stanford research)
      • RAM is 100-1000x faster than disk (random seek)
          • Disk: 5-10 ms
          • RAM: ~0.001 ms
  • 36. Take Aways
      • A data grid can serve different needs for big data analytics:
          • Supercharge a dedicated stream processing cluster like Storm
              • Provide fast, reliable, transactional tuple streams and state
          • Provide a general purpose analytics platform
              • Roll your own
          • Simplify overall architecture while enhancing scalability
              • Ultra high performance/low latency
              • Dynamically scalable processing and in-memory storage
              • Eliminate messaging tier
              • Eliminate or minimize need for RDBMS
  • 37. References
      • Realtime Analytics with Storm and Hadoop: http://www.slideshare.net/Hadoop_Summit/realtime-analytics-with-storm
      • Learn and fork the code on GitHub: https://github.com/Gigaspaces/storm-integration
      • Twitter Storm: http://storm-project.net
      • XAP + Storm detailed blog post: http://blog.gigaspaces.com/gigaspaces-and-storm-part-2-xap-integration/