©MapR Technologies - Confidential
  
Real-time Learning for Fun and Profit
  
§ Contact:
  – tdunning@maprtech.com
  – @ted_dunning
§ Slides and such (available late tonight):
  – http://slideshare.net/tdunning
§ Hash tags: #mapr #storm #bbuzz
  
The Challenge

§ Hadoop is great at processing vats of data
  – But sucks for real-time (by design!)
§ Storm is great for real-time processing
  – But lacks any way to deal with batch processing
§ It sounds like there isn’t a solution
  – Neither fashionable solution handles everything
  
This is not a problem. It’s an opportunity!
  
Hadoop is Not Very Real-time

[Figure: timeline along t ending at "now". Regions: fully processed data, latest full period, unprocessed data. Annotation: a Hadoop job takes this long for this data.]
  
Real-time and Long-time together

[Figure: timeline along t ending at "now". Annotations: Hadoop works great back here; Storm works here; a blended view spans both.]
  
One Alternative

[Diagram labels: Real-time, Long-time, Search Engine, NoSQL du jour, Consumer, "?".]
  
Problems

§ Simply dumping everything into a NoSQL engine doesn’t quite work
§ Insert rate is limited
§ No load isolation
  – Big retrospective jobs kill real-time
§ Low scan performance
  – HBase is pretty good, but not stellar
§ Difficult to set boundaries
  – Where does real-time end and long-time begin?
  
Almost a Solution

§ Lambda architecture talks about function of long-time state
  – Real-time approximate accelerator adjusts previous result to current state
§ Sounds good, but …
  – How does the real-time accelerator combine with long-time?
  – What algorithms can do this?
  – How can we avoid gaps and overlaps and other errors?
§ Needs more work
  
A Simple Example

§ Let’s start with the simplest case … counting
§ Counting = addition
  – Addition is associative
  – Addition is on-line
  – We can generalize these results to all associative, on-line functions
  – But let’s start simple
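
The role of associativity can be illustrated with a few lines of R. This is only a sketch: the helper names `count_events` and `merge_counts` and the sample labels are assumptions made for illustration, not part of the system described in the talk. It shows that counting everything at once gives the same totals as counting pieces separately and adding the partial counts, however the pieces are grouped.

```r
# Count events per label in one batch (hypothetical helper).
count_events <- function(labels) {
  tab <- table(labels)
  counts <- as.vector(tab)
  names(counts) <- names(tab)
  counts
}

# Merge two named count vectors by adding counts label by label.
merge_counts <- function(a, b) {
  keys <- union(names(a), names(b))
  out <- setNames(numeric(length(keys)), keys)
  out[names(a)] <- out[names(a)] + a
  out[names(b)] <- out[names(b)] + b
  out
}

events <- c("a", "b", "a", "c", "b", "a")

# Counting everything at once ...
all_at_once <- count_events(events)

# ... matches counting three pieces and merging them, grouped either way.
left  <- merge_counts(merge_counts(count_events(events[1:2]),
                                   count_events(events[3:4])),
                      count_events(events[5:6]))
right <- merge_counts(count_events(events[1:2]),
                      merge_counts(count_events(events[3:4]),
                                   count_events(events[5:6])))
```

Because the grouping does not matter, partial counts produced by different workers or at different times can be combined later without changing the result.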
  
Rough Design – Data Flow

[Diagram: data flow among Data Sources, Catcher Cluster, Query Event Spout, ProtoSpout, Logger Bolts, Counter Bolts, Raw Logs, Semi Agg, Hadoop Aggregator, Snap, and Long agg.]
  
Closer Look – Catcher Protocol

[Diagram: Data Sources connected to the Catcher Cluster.]

The data sources and catchers communicate with a very simple protocol.

Hello() => list of catchers
Log(topic, message) => (OK|FAIL, redirect-to-catcher)
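
The two calls can be mimicked with an in-memory stand-in. This is a rough sketch only: the catcher names, the hash-based ownership rule in `owner_of`, and the function names `hello` and `log_message` are all assumptions, since the slide specifies only the shape of the requests and replies. The message itself is not stored anywhere in this sketch.

```r
# Hypothetical in-memory stand-in for a catcher cluster.
catchers <- c("catcher-1", "catcher-2", "catcher-3")

# Assumed ownership rule: each topic maps to one catcher by a simple hash.
owner_of <- function(topic) {
  catchers[sum(utf8ToInt(topic)) %% length(catchers) + 1]
}

# Hello() => list of catchers
hello <- function() catchers

# Log(topic, message) => (OK|FAIL, redirect-to-catcher)
# The reply names the catcher that the client should contact for this topic.
log_message <- function(contacted, topic, message) {
  if (!(contacted %in% catchers)) {
    return(list(status = "FAIL", redirect = owner_of(topic)))
  }
  list(status = "OK", redirect = owner_of(topic))
}

first <- hello()[1]
reply <- log_message(first, "clicks", "user=42 page=/home")
# A client would send later messages for this topic straight to reply$redirect.
```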
  
Closer Look – Catcher Queues

The catchers forward log requests to the correct catcher and return that host in the reply to allow the client to avoid the extra hop.

Each topic file is appended by exactly one catcher.

Topic files are kept in shared file storage.

[Diagram: Catcher Cluster writing to Topic Files.]
  
Closer Look – ProtoSpout

The ProtoSpout tails the topic files, parses log records into tuples and injects them into the Storm topology.

The last fully acked position is stored in a shared, transactionally correct file system.

[Diagram: Topic Files read by the ProtoSpout.]
  
Closer Look – Counter Bolt

§ Critical design goals:
  – fast ack for all tuples
  – fast restart of counter
§ Ack happens when tuple hits the replay log (tens of milliseconds, group commit)
§ Restart involves replaying semi-aggs + replay log (very fast)
§ Replay log only lasts until next semi-aggregate goes out

[Diagram: Incoming records, Counter Bolt, Replay Log, Semi-aggregated records; Real-time on one side, Long-time on the other.]
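
The interplay of replay log and semi-aggregates can be sketched as follows. Everything here is hypothetical: the function names are invented for illustration, and plain in-memory vectors stand in for the durable files a real counter bolt would use. The sketch shows why a restart only needs the emitted semi-aggregates plus the short replay log.

```r
new_counter <- function() {
  list(replay_log = character(0),   # tuples logged since the last semi-aggregate
       semi_aggs  = list())         # semi-aggregates already emitted
}

# Append the tuple to the replay log; in the design above the ack would be
# sent only once this write is durable.
on_tuple <- function(state, label) {
  state$replay_log <- c(state$replay_log, label)
  state
}

# Emit a semi-aggregate covering the replay log, then truncate the log.
emit_semi_agg <- function(state) {
  if (length(state$replay_log) > 0) {
    state$semi_aggs[[length(state$semi_aggs) + 1]] <- table(state$replay_log)
    state$replay_log <- character(0)
  }
  state
}

# Restart: rebuild counts from semi-aggregates plus the current replay log.
recover_counts <- function(state) {
  labels <- state$replay_log
  for (agg in state$semi_aggs) {
    labels <- c(labels, rep(names(agg), times = as.vector(agg)))
  }
  table(labels)
}

s <- new_counter()
for (x in c("a", "b", "a")) s <- on_tuple(s, x)
s <- emit_semi_agg(s)
for (x in c("b", "a")) s <- on_tuple(s, x)
counts <- recover_counts(s)
```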
  
A Frozen Moment in Time

§ Snapshot defines the dividing line
§ All data in the snap is long-time, all after is real-time
§ Semi-agg strategy allows clean combination of both kinds of data
§ A data-synchronized snapshot is not needed

[Diagram: Semi Agg, Hadoop Aggregator, Snap, Long agg.]
  
Guarantees

§ Counter output volume is small-ish
  – the greater of k tuples per 100K inputs or k tuple/s
  – 1 tuple/s/label/bolt for this exercise
§ Persistence layer must provide guarantees
  – distributed against node failure
  – must have either readable flush or closed-append
§ HDFS is distributed, but provides no guarantees and strange semantics
§ MapRfs is distributed, provides all necessary guarantees
  
Presentation Layer

§ Presentation must
  – read recent output of Logger bolt
  – read relevant output of Hadoop jobs
  – combine semi-aggregated records
§ User will see
  – counts that increment within 0-2 s of events
  – seamless and accurate meld of short and long-term data
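
A minimal sketch of the combining step, under stated assumptions: the numbers are made up, the column names `time`, `label` and `count` and the function name `blended_view` are hypothetical, and the snapshot is represented by a single position on a shared time axis. The point is that semi-aggregates at or before the snapshot are skipped, because the long-time aggregate already includes them.

```r
# Long-time aggregate produced from data up to the snapshot (made-up numbers).
long_agg <- c(a = 1000, b = 250)
snapshot_time <- 100   # hypothetical snapshot position

# Semi-aggregated records from the real-time path: time, label, count.
semi_aggs <- data.frame(
  time  = c(98, 99, 101, 102, 103),
  label = c("a", "b", "a", "c", "a"),
  count = c(5, 2, 3, 1, 4),
  stringsAsFactors = FALSE
)

blended_view <- function(long_agg, semi_aggs, snapshot_time) {
  # Keep only records after the snapshot so nothing is counted twice.
  recent <- semi_aggs[semi_aggs$time > snapshot_time, ]
  recent_totals <- tapply(recent$count, recent$label, sum)
  keys <- union(names(long_agg), names(recent_totals))
  out <- setNames(numeric(length(keys)), keys)
  out[names(long_agg)] <- long_agg
  out[names(recent_totals)] <- out[names(recent_totals)] + as.vector(recent_totals)
  out
}

view <- blended_view(long_agg, semi_aggs, snapshot_time)
```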
  
The Basic Idea

§ Online algorithms generally have relatively small state (like counting)
§ Online algorithms generally have a simple update (like counting)
§ If we can do this with counting, we can do it with all kinds of algorithms
  
Summary – Part 1

§ Semi-agg strategy + snapshots allows correct real-time counts
  – because addition is on-line and associative
§ Other on-line associative operations include:
  – k-means clustering (see Dan Filimon’s talk at 16.)
  – count distinct (see hyper-log-log counters from streamlib or kmv from Brickhouse)
  – top-k values
  – top-k (count(*)) (see streamlib)
  – contextual Bayesian bandits (see part 2 of this talk)
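
As one small illustration of the list above, top-k values can be merged the same way counts are added. The sketch below is an assumption-laden toy, with hypothetical function names `top_k` and `merge_top_k`; it does not use streamlib or Brickhouse. It shows that the top-k of the whole data set can be recovered from the top-k summaries of its parts.

```r
# Keep the k largest values seen in one partition of the data.
top_k <- function(values, k) head(sort(values, decreasing = TRUE), k)

# Combine two top-k summaries into the top-k of their union.
merge_top_k <- function(a, b, k) top_k(c(a, b), k)

x <- c(5, 1, 9, 3, 7, 2, 8)
k <- 3

direct <- top_k(x, k)
merged <- merge_top_k(top_k(x[1:4], k), top_k(x[5:7], k), k)
```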
  
Example 2 – A/B testing in real-time

§ I have 15 versions of my landing page
§ Each visitor is assigned to a version
  – Which version?
§ A conversion or sale or whatever can happen
  – How long to wait?
§ Some versions of the landing page are horrible
  – Don’t want to give them traffic
  
A Quick Diversion

§ You see a coin
  – What is the probability of heads?
  – Could it be larger or smaller than that?
§ I flip the coin and while it is in the air ask again
§ I catch the coin and ask again
§ I look at the coin (and you don’t) and ask again
§ Why does the answer change?
  – And did it ever have a single value?
  
A Philosophical Conclusion

§ Probability as expressed by humans is subjective and depends on information and experience
  
I Dunno

[Figure: distribution Prob(p) plotted against p from 0 to 1.]

5 heads out of 10 throws

[Figure: distribution Prob(p) plotted against p from 0 to 1.]

2 heads out of 12 throws

[Figure: distribution Prob(p) plotted against p from 0 to 1, with the mean marked.]

Using any single number as a “best” estimate denies the uncertain nature of a distribution.

Adding confidence bounds still loses most of the information in the distribution and prevents good modeling of the tails.
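
The three levels of summary can be compared directly in R. This sketch assumes a uniform Beta(1, 1) prior, so that h heads in n throws gives a Beta(h + 1, n - h + 1) posterior; the slides do not state the prior, and the function name `posterior` is hypothetical.

```r
# Posterior for the heads probability p after `heads` heads in `throws` throws,
# assuming a uniform Beta(1, 1) prior.
posterior <- function(heads, throws) {
  c(alpha = heads + 1, beta = throws - heads + 1)
}

post <- posterior(2, 12)

# A single-number summary: the posterior mean.
post_mean <- unname(post["alpha"] / (post["alpha"] + post["beta"]))

# An interval summary: the central 90% credible interval.
interval <- qbeta(c(0.05, 0.95), post["alpha"], post["beta"])

# The whole distribution, evaluated on a grid of p values.
p <- seq(0, 1, by = 0.01)
dens <- dbeta(p, post["alpha"], post["beta"])
```

The mean and the interval each compress `dens` into one or two numbers; sampling from the full distribution, as the bandit does, keeps the information in the tails.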
  
Bayesian Bandit

§ Compute distributions based on data
§ Sample p1 and p2 from these distributions
§ Put a coin in bandit 1 if p1 > p2
§ Else, put the coin in bandit 2
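
The four steps can be simulated for two bandits in a few lines. This is a sketch under assumptions: the payoff probabilities in `true_rate` are invented, a uniform prior is assumed so that the distributions are Beta(wins + 1, losses + 1), and the number of steps is arbitrary.

```r
set.seed(1)
true_rate <- c(0.04, 0.06)   # hypothetical payoff probabilities, unknown to the player
wins   <- c(0, 0)
losses <- c(0, 0)

for (step in 1:2000) {
  # Sample p1 and p2 from the current Beta posteriors (uniform prior assumed).
  p <- rbeta(2, wins + 1, losses + 1)
  # Put the coin in bandit 1 if p1 > p2, else in bandit 2.
  i <- if (p[1] > p[2]) 1 else 2
  paid <- runif(1) < true_rate[i]
  if (paid) wins[i] <- wins[i] + 1 else losses[i] <- losses[i] + 1
}

plays <- wins + losses   # how often each bandit was chosen
```

The only state is the pair of counters per bandit, which is why the counting machinery from part 1 applies here.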
  
And it works!

[Figure: regret (0 to 0.12) plotted against n (0 to 1100) for two strategies: ε-greedy with ε = 0.05, and Bayesian Bandit with Gamma-Normal.]

Video Demo
  
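The Code

§ Select an alternative

    # k: n x 2 matrix of counts, one row per alternative.
    # The rbeta() call below implies column 2 counts successes
    # and column 1 counts failures.
    select <- function(k) {
      n <- dim(k)[1]
      p0 <- rep(0, length.out = n)
      for (i in 1:n) {
        # draw one plausible success rate from each alternative's Beta posterior
        p0[i] <- rbeta(1, k[i, 2] + 1, k[i, 1] + 1)
      }
      return(which(p0 == max(p0)))
    }

§ Select and learn

    # test(i) is not defined on the slide; it is assumed to try alternative i and
    # return the column of k to increment. The wrapper name learn is also assumed.
    learn <- function(k, steps) {
      for (z in 1:steps) {
        i <- select(k)
        j <- test(i)
        k[i, j] <- k[i, j] + 1
      }
      return(k)
    }

§ But we already know how to count!
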
The Basic Idea

§ We can encode a distribution by sampling
§ Sampling allows unification of exploration and exploitation
§ Can be extended to more general response models
§ Note that learning here = counting = on-line algorithm
  
Generalized Banditry

§ Suppose we have an infinite number of bandits
  – suppose they are each labeled by two real numbers x and y in [0,1]
  – also that expected payoff is a parameterized function of x and y: $E[z] = f(x, y \mid \theta)$
  – now assume a distribution for θ that we can learn online
§ Selection works by sampling θ, then computing f
§ Learning works by propagating updates back to θ
  – If f is linear, this is very easy
§ Don’t just have to have two labels, could have labels and context

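
For the linear case, one possible realization is a Gaussian posterior over θ that is updated after every payoff. The sketch below makes several assumptions that are not in the slides: f(x, y | θ) = θ1 + θ2·x + θ3·y, a N(0, I) prior on θ, Gaussian payoff noise with known standard deviation, a finite grid standing in for the infinite set of bandits, and an invented `true_theta` used only to simulate payoffs.

```r
set.seed(2)

# Posterior over theta kept as a precision matrix A and vector b
# (Gaussian prior N(0, I) and known noise sd; both assumed).
A <- diag(3)
b <- rep(0, 3)
noise_sd <- 0.1

# Draw one theta from the current Gaussian posterior.
sample_theta <- function(A, b) {
  cov <- solve(A)
  mu <- cov %*% b
  as.vector(mu + t(chol(cov)) %*% rnorm(length(b)))
}

# A finite grid of candidate bandits stands in for the infinite set.
grid <- expand.grid(x = seq(0, 1, by = 0.1), y = seq(0, 1, by = 0.1))
phi <- cbind(1, grid$x, grid$y)   # one feature row (1, x, y) per bandit

true_theta <- c(0.2, 0.5, -0.3)   # hypothetical, used only to simulate payoffs

for (step in 1:200) {
  theta <- sample_theta(A, b)           # selection: sample theta ...
  i <- which.max(phi %*% theta)         # ... then compute f and pick the best bandit
  f_i <- phi[i, ]
  z <- sum(f_i * true_theta) + rnorm(1, sd = noise_sd)
  A <- A + outer(f_i, f_i) / noise_sd^2 # learning: on-line update of the posterior
  b <- b + f_i * z / noise_sd^2
}

theta_hat <- as.vector(solve(A, b))     # posterior mean after 200 plays
```

The update touches only `A` and `b`, which are running sums, so learning here is again a form of counting.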
Caveats

§ Original Bayesian Bandit only requires real-time
§ Generalized Bandit may require access to long history for learning
  – Pseudo online learning may be easier than true online
§ Bandit variables can include content, time of day, day of week
§ Context variables can include user id, user features
§ Bandit × context variables provide the real power
  
Thank You
  

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Adrianos Dadis
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeFlink Forward
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesDataWorks Summit/Hadoop Summit
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...Spark Summit
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaTed Dunning
 
Apache Storm
Apache StormApache Storm
Apache StormEdureka!
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridDataWorks Summit
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RRadek Maciaszek
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Alexey Kharlamov
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series DatabaseDataWorks Summit
 
Cloud-based Data Stream Processing
Cloud-based Data Stream ProcessingCloud-based Data Stream Processing
Cloud-based Data Stream ProcessingZbigniew Jerzak
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemC4Media
 
Databricks clusters in autopilot mode
Databricks clusters in autopilot modeDatabricks clusters in autopilot mode
Databricks clusters in autopilot modePrakash Chockalingam
 

Was ist angesagt? (19)

Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with Dependencies
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop Grid
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
Spark Streaming into context
Spark Streaming into contextSpark Streaming into context
Spark Streaming into context
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
 
Dealing with an Upside Down Internet With High Performance Time Series Database
Dealing with an Upside Down Internet  With High Performance Time Series DatabaseDealing with an Upside Down Internet  With High Performance Time Series Database
Dealing with an Upside Down Internet With High Performance Time Series Database
 
Cloud-based Data Stream Processing
Cloud-based Data Stream ProcessingCloud-based Data Stream Processing
Cloud-based Data Stream Processing
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
Databricks clusters in autopilot mode
Databricks clusters in autopilot modeDatabricks clusters in autopilot mode
Databricks clusters in autopilot mode
 

Ähnlich wie Buzz Words Dunning Real-Time Learning

Storm Users Group Real Time Hadoop
Storm Users Group Real Time HadoopStorm Users Group Real Time Hadoop
Storm Users Group Real Time HadoopMapR Technologies
 
Cmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop PerformanceCmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop PerformanceTed Dunning
 
New Directions for Mahout
New Directions for MahoutNew Directions for Mahout
New Directions for MahoutTed Dunning
 
Real-time and Long-time Together
Real-time and Long-time TogetherReal-time and Long-time Together
Real-time and Long-time TogetherMapR Technologies
 
Graphlab dunning-clustering
Graphlab dunning-clusteringGraphlab dunning-clustering
Graphlab dunning-clusteringTed Dunning
 
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...DataStax Academy
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 NotesRoss Lawley
 
Data stream with cruise control
Data stream with cruise controlData stream with cruise control
Data stream with cruise controlBill Liu
 
Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07Ted Dunning
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programmingkhstandrews
 
Xen_and_Rails_deployment
Xen_and_Rails_deploymentXen_and_Rails_deployment
Xen_and_Rails_deploymentAbhishek Singh
 
CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceMapR Technologies
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseYang Li
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey J On The Beach
 
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLEQ2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLELinaro
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipcPeter Lawrey
 
Deploying Large Spark Models to production and model scoring in near real time
Deploying Large Spark Models to production and model scoring in near real timeDeploying Large Spark Models to production and model scoring in near real time
Deploying Large Spark Models to production and model scoring in near real timesubhojit banerjee
 
Real-time and long-time together
Real-time and long-time togetherReal-time and long-time together
Real-time and long-time togetherTed Dunning
 

Ähnlich wie Buzz Words Dunning Real-Time Learning (20)

Storm Users Group Real Time Hadoop
Storm Users Group Real Time HadoopStorm Users Group Real Time Hadoop
Storm Users Group Real Time Hadoop
 
Cmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop PerformanceCmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop Performance
 
New Directions for Mahout
New Directions for MahoutNew Directions for Mahout
New Directions for Mahout
 
Real-time and Long-time Together
Real-time and Long-time TogetherReal-time and Long-time Together
Real-time and Long-time Together
 
Graphlab dunning-clustering
Graphlab dunning-clusteringGraphlab dunning-clustering
Graphlab dunning-clustering
 
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 Notes
 
Data stream with cruise control
Data stream with cruise controlData stream with cruise control
Data stream with cruise control
 
Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
ParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel ProgrammingParaForming - Patterns and Refactoring for Parallel Programming
ParaForming - Patterns and Refactoring for Parallel Programming
 
Xen_and_Rails_deployment
Xen_and_Rails_deploymentXen_and_Rails_deployment
Xen_and_Rails_deployment
 
CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop Performance
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey
 
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLEQ2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
 
London hug
London hugLondon hug
London hug
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipc
 
Deploying Large Spark Models to production and model scoring in near real time
Deploying Large Spark Models to production and model scoring in near real timeDeploying Large Spark Models to production and model scoring in near real time
Deploying Large Spark Models to production and model scoring in near real time
 
Real-time and long-time together
Real-time and long-time togetherReal-time and long-time together
Real-time and long-time together
 

Mehr von MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Buzz Words Dunning Real-Time Learning

  • 1. Real-time Learning for Fun and Profit
      ©MapR Technologies - Confidential
  • 2.
      §  Contact:
          –  tdunning@maprtech.com
          –  @ted_dunning
      §  Slides and such (available late tonight):
          –  http://slideshare.net/tdunning
      §  Hash tags: #mapr #storm #bbuzz
  • 3. The Challenge
      §  Hadoop is great for processing vats of data
          –  But sucks for real-time (by design!)
      §  Storm is great for real-time processing
          –  But lacks any way to deal with batch processing
      §  It sounds like there isn’t a solution
          –  Neither fashionable solution handles everything
  • 4. This is not a problem. It’s an opportunity!
  • 5. Hadoop is Not Very Real-time
      [Timeline diagram, time axis t ending at "now". Labels: Fully processed; Latest full period; Unprocessed data; "Hadoop job takes this long for this data"]
  • 6. Real-time and Long-time together
      [Timeline diagram, time axis t ending at "now". Labels: "Hadoop works great back here"; "Storm works here"; Blended view (shown three times)]
  • 7. One Alternative
      [Diagram labels: Real-time; Long-time; Search Engine; NoSQL du Jour; Consumer; ?]
  • 8. Problems
      §  Simply dumping into a NoSQL engine doesn’t quite work
      §  Insert rate is limited
      §  No load isolation
          –  Big retrospective jobs kill real-time
      §  Low scan performance
          –  HBase pretty good, but not stellar
      §  Difficult to set boundaries
          –  Where does real-time end and long-time begin?
  • 9. Almost a Solution
      §  Lambda architecture talks about a function of long-time state
          –  Real-time approximate accelerator adjusts previous result to current state
      §  Sounds good, but …
          –  How does the real-time accelerator combine with long-time?
          –  What algorithms can do this?
          –  How can we avoid gaps and overlaps and other errors?
      §  Needs more work
  • 10. A Simple Example
      §  Let’s start with the simplest case … counting
      §  Counting = addition
          –  Addition is associative
          –  Addition is on-line
          –  We can generalize these results to all associative, on-line functions
          –  But let’s start simple
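Why associativity matters here can be shown with a minimal sketch. The event labels and the three-way split into "long-time", "semi-aggregate" and "recent" batches are made up for illustration; the point is only that partial counts can be merged in any grouping and still agree with a single pass over all the data.

```python
from collections import Counter

def count_events(events):
    """Count one batch of event labels (a partial aggregate)."""
    return Counter(events)

def merge(a, b):
    """Combine two partial aggregates; adding counters is associative."""
    return a + b

# Hypothetical batches: an old batch result, a semi-aggregate, the newest events.
old = count_events(["click", "view", "view"])
mid = count_events(["view", "click"])
new = count_events(["click"])

left = merge(merge(old, mid), new)    # (old + mid) + new
right = merge(old, merge(mid, new))   # old + (mid + new)
total = count_events(["click", "view", "view", "view", "click", "click"])
```

Because the grouping does not matter, the boundary between batch and real-time results can sit anywhere without changing the answer.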
  • 11. Rough Design – Data Flow
      [Data-flow diagram. Components: Data Sources; Catcher Cluster; Query Event Spout; ProtoSpout; Counter Bolt; Logger Bolt; Raw Logs; Semi Agg; Snap; Hadoop Aggregator; Long agg]
  • 12. Closer Look – Catcher Protocol
      The data sources and catchers communicate with a very simple protocol.
          Hello() => list of catchers
          Log(topic, message) => (OK|FAIL, redirect-to-catcher)
      [Diagram: Data Sources connected to Catcher Cluster]
  • 13. Closer Look – Catcher Queues
      The catchers forward log requests to the correct catcher and return that host in the reply to allow the client to avoid the extra hop.
      Each topic file is appended by exactly one catcher.
      Topic files are kept in shared file storage.
      [Diagram: Catcher Cluster writing to Topic Files]
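The protocol and forwarding behaviour on the last two slides can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the `CatcherCluster` class, the CRC-based choice of which catcher owns a topic, and the use of Python lists in place of topic files on shared storage. The slides only specify the two calls and their replies.

```python
import zlib

class CatcherCluster:
    """Toy stand-in for a catcher cluster; lists play the role of topic files."""

    def __init__(self, names):
        self.names = list(names)
        # One dict of topic -> messages per catcher.
        self.topic_files = {name: {} for name in self.names}

    def owner_of(self, topic):
        # Assumption: a stable hash picks the single catcher that appends to a topic.
        return self.names[zlib.crc32(topic.encode("utf-8")) % len(self.names)]

    def hello(self):
        """Hello() => list of catchers."""
        return list(self.names)

    def log(self, contacted, topic, message):
        """Log(topic, message) => (OK|FAIL, redirect-to-catcher)."""
        if contacted not in self.topic_files:
            return ("FAIL", None)
        owner = self.owner_of(topic)
        # Forward to the owner if the client contacted a different catcher.
        self.topic_files[owner].setdefault(topic, []).append(message)
        redirect = owner if owner != contacted else None
        return ("OK", redirect)

cluster = CatcherCluster(["c1", "c2", "c3"])
owner = cluster.owner_of("clicks")
other = next(name for name in cluster.hello() if name != owner)

first = cluster.log(other, "clicks", "m1")    # wrong catcher: forwarded, redirect returned
second = cluster.log(owner, "clicks", "m2")   # client follows the redirect: no extra hop
```

Returning the owner in the reply lets a client cache the right catcher per topic, so the forwarding hop is paid only once.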
  • 14. Closer Look – ProtoSpout
      The ProtoSpout tails the topic files, parses log records into tuples and injects them into the Storm topology.
      Last fully acked position stored in shared, transactionally correct file system.
      [Diagram: Topic Files feeding the ProtoSpout]
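A rough sketch of the tail-and-checkpoint idea, not the actual ProtoSpout. It assumes newline-delimited records, a local temporary directory standing in for the shared file system, and a write-then-rename step standing in for a transactional position update; the function names are hypothetical.

```python
import os
import tempfile

def read_new_records(topic_path, offset):
    """Return (records, new_offset) for complete lines appended after offset."""
    with open(topic_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1            # leave a trailing partial record for later
    lines = data[:end].splitlines()
    return [line.decode("utf-8") for line in lines], offset + end

def commit_offset(offset_path, offset):
    """Persist the last fully acked position atomically (write, then rename)."""
    tmp_path = offset_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(offset))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, offset_path)

def load_offset(offset_path):
    """Read the committed position, or 0 if nothing has been committed yet."""
    try:
        with open(offset_path) as f:
            return int(f.read())
    except FileNotFoundError:
        return 0

workdir = tempfile.mkdtemp()
topic = os.path.join(workdir, "clicks.log")
offsets = os.path.join(workdir, "clicks.offset")

with open(topic, "ab") as f:
    f.write(b"a\nb\npartial")              # the last record is not finished yet
records, pos = read_new_records(topic, load_offset(offsets))
commit_offset(offsets, pos)                # only once the topology has acked them

with open(topic, "ab") as f:
    f.write(b"-done\nc\n")
more, pos2 = read_new_records(topic, load_offset(offsets))
```

After a restart the spout resumes from the committed position, so records that were read but never acked are simply read again.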
  • 15. Closer Look – Counter Bolt
      §  Critical design goals:
          –  fast ack for all tuples
          –  fast restart of counter
      §  Ack happens when tuple hits the replay log (tens of milliseconds, group commit)
      §  Restart involves replaying semi-aggs + replay log (very fast)
      §  Replay log only lasts until next semi-aggregate goes out
      [Diagram: Incoming records → Counter Bolt → Semi-aggregated records, with a Replay Log beside the bolt; Real-time and Long-time regions marked]
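The ack, flush and restart rules above can be sketched as follows. This is a simplified illustration, not the bolt from the talk: plain Python lists stand in for the durable replay log and the semi-aggregate output, there is no group commit, and the `CounterBolt` class and its method names are assumptions.

```python
from collections import Counter

class CounterBolt:
    def __init__(self, replay_log, emitted):
        self.replay_log = replay_log        # stand-in for a durable append-only log
        self.emitted = emitted              # stand-in for durable semi-aggregated records
        self.counts = Counter(replay_log)   # restart: rebuild state from the replay log

    def on_tuple(self, label):
        self.replay_log.append(label)       # once this write is durable, the tuple is acked
        self.counts[label] += 1
        return "ack"

    def flush(self):
        """Send out a semi-aggregate; the replay log is no longer needed after that."""
        if self.counts:
            self.emitted.append(dict(self.counts))
        self.counts.clear()
        self.replay_log.clear()

    def total(self):
        """Semi-aggregates emitted so far plus whatever is still in memory."""
        result = Counter()
        for semi_agg in self.emitted:
            result.update(semi_agg)
        result.update(self.counts)
        return result

log, out = [], []
bolt = CounterBolt(log, out)
for label in ["a", "b", "a"]:
    bolt.on_tuple(label)
bolt.flush()
bolt.on_tuple("a")

restarted = CounterBolt(log, out)           # simulate a crash and restart
```

A real implementation would also have to make the emit-then-truncate step in `flush` atomic; the sketch ignores a crash between those two lines.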
  • 16. A Frozen Moment in Time
      §  Snapshot defines the dividing line
      §  All data in the snap is long-time, all after is real-time
      §  Semi-agg strategy allows clean combination of both kinds of data
      §  A data-synchronized snapshot is not needed
      [Diagram: Semi Agg → Snap → Hadoop Aggregator → Long agg]
  • 17. Guarantees
      §  Counter output volume is small-ish
          –  the greater of k tuples per 100K inputs or k tuple/s
          –  1 tuple/s/label/bolt for this exercise
      §  Persistence layer must provide guarantees
          –  distributed against node failure
          –  must have either readable flush or closed-append
      §  HDFS is distributed, but provides no guarantees and strange semantics
      §  MapRfs is distributed, provides all necessary guarantees
  • 18. Presentation Layer
      §  Presentation must
          –  read recent output of Logger bolt
          –  read relevant output of Hadoop jobs
          –  combine semi-aggregated records
      §  User will see
          –  counts that increment within 0-2 s of events
          –  seamless and accurate meld of short and long-term data
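One way the combining step might look, as a sketch only. It assumes each semi-aggregated record carries a sequence number and that the snapshot records the last sequence number it contains; the slides do not say how the dividing line is encoded, so both are hypothetical, as is the function name.

```python
from collections import Counter

def blended_counts(long_agg, snapshot_seq, semi_aggs):
    """Meld the Hadoop result with semi-aggregates newer than the snapshot.

    long_agg:     counts computed by the Hadoop job over data inside the snapshot
    snapshot_seq: last sequence number included in the snapshot (assumed field)
    semi_aggs:    iterable of (sequence_number, counts) from the Logger bolt
    """
    total = Counter(long_agg)
    for seq, counts in semi_aggs:
        if seq > snapshot_seq:      # already covered by long_agg otherwise
            total.update(counts)
    return total

view = blended_counts(
    {"a": 10},
    5,
    [(4, {"a": 2}), (6, {"a": 1, "b": 3})],
)
```

Filtering on the snapshot boundary is what avoids both gaps and double counting: every record is either inside the snapshot or after it, never both.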
  • 19. The Basic Idea
      §  Online algorithms generally have relatively small state (like counting)
      §  Online algorithms generally have a simple update (like counting)
      §  If we can do this with counting, we can do it with all kinds of algorithms
  • 20. Summary – Part 1
      §  Semi-agg strategy + snapshots allows correct real-time counts
          –  because addition is on-line and associative
      §  Other on-line associative operations include:
          –  k-means clustering (see Dan Filimon’s talk at 16.)
          –  count distinct (see hyper-log-log counters from streamlib or kmv from Brickhouse)
          –  top-k values
          –  top-k (count(*)) (see streamlib)
          –  contextual Bayesian bandits (see part 2 of this talk)
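Of the operations listed, "top-k values" is the easiest to show in a few lines. The sketch below uses made-up numbers and Python's standard `heapq`, not streamlib or Brickhouse: the top k of two batches combined can be recovered from the top k of each batch alone, which is what lets a semi-aggregate stand in for the raw data.

```python
import heapq

def top_k(values, k):
    """The k largest values in one batch, largest first (the semi-aggregate)."""
    return heapq.nlargest(k, values)

def merge_top_k(a, b, k):
    """Combine two top-k summaries without going back to the raw values."""
    return heapq.nlargest(k, a + b)

batch_1 = [5, 1, 9, 3, 7]
batch_2 = [8, 2, 6, 10, 4]

merged = merge_top_k(top_k(batch_1, 3), top_k(batch_2, 3), 3)
direct = top_k(batch_1 + batch_2, 3)
```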
  • 21. Example 2 – AB testing in real-time
      §  I have 15 versions of my landing page
      §  Each visitor is assigned to a version
          –  Which version?
      §  A conversion or sale or whatever can happen
          –  How long to wait?
      §  Some versions of the landing page are horrible
          –  Don’t want to give them traffic
  • 22. A Quick Diversion
      §  You see a coin
          –  What is the probability of heads?
          –  Could it be larger or smaller than that?
      §  I flip the coin and while it is in the air ask again
      §  I catch the coin and ask again
      §  I look at the coin (and you don’t) and ask again
      §  Why does the answer change?
          –  And did it ever have a single value?
  • 23. A Philosophical Conclusion
      §  Probability as expressed by humans is subjective and depends on information and experience
  • 24. I Dunno
      [Plot: Prob(p) versus p, for p from 0 to 1]
  • 25. 5 heads out of 10 throws
      [Plot: Prob(p) versus p, for p from 0 to 1]
  • 26. 2 heads out of 12 throws
      [Plot: Prob(p) versus p, for p from 0 to 1, with the mean marked]
      Using any single number as a “best” estimate denies the uncertain nature of a distribution.
      Adding confidence bounds still loses most of the information in the distribution and prevents good modeling of the tails.
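For reference, the curves on the last three slides can be written down explicitly if the "I Dunno" prior is taken to be uniform on [0, 1]. That is an assumption; the slides only show the plots. Under it, h heads in n throws gives a Beta posterior:

```latex
p \mid h, n \;\sim\; \mathrm{Beta}(h + 1,\; n - h + 1),
\qquad
\mathbb{E}[p \mid h, n] = \frac{h + 1}{n + 2}

% 5 heads out of 10 throws: Beta(6, 6),  mean 6/12 = 0.5
% 2 heads out of 12 throws: Beta(3, 11), mean 3/14 \approx 0.214
```

The mean is only one summary of the curve, which is the point the slide makes about single-number estimates.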
  • 27. Bayesian Bandit
      §  Compute distributions based on data
      §  Sample p1 and p2 from these distributions
      §  Put a coin in bandit 1 if p1 > p2
      §  Else, put the coin in bandit 2
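The four steps above can be sketched with Python's standard library. The payoff rates, the step count and the uniform prior (the +1 terms) are assumptions chosen for the illustration, and the function names are hypothetical.

```python
import random

def choose_bandit(stats, rng):
    """stats[i] = [losses, wins]; sample one p per bandit and play the largest."""
    samples = [rng.betavariate(wins + 1, losses + 1) for losses, wins in stats]
    return samples.index(max(samples))

def run(true_rates, steps, seed=0):
    """Simulate repeated plays against bandits with the given hidden win rates."""
    rng = random.Random(seed)
    stats = [[0, 0] for _ in true_rates]
    for _ in range(steps):
        i = choose_bandit(stats, rng)
        won = rng.random() < true_rates[i]
        stats[i][1 if won else 0] += 1      # learning is just counting
    return stats

# Two hypothetical bandits: the second one pays off three times as often.
stats = run([0.2, 0.6], steps=2000)
pulls = [losses + wins for losses, wins in stats]
```

Early on both distributions are wide, so both bandits get played; as counts accumulate the samples from the worse bandit rarely win and it receives little further traffic.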
  • 28. And it works!
      [Plot: regret versus n, for n from 0 to 1100 and regret from 0 to 0.12. Series: ε-greedy, ε = 0.05; Bayesian Bandit with Gamma-Normal]
  • 29. Video Demo
  • 30. The Code
      §  Select an alternative
          select = function(k) {
            # k[i,1] and k[i,2] are the counts of the two outcomes for alternative i;
            # column 2 is treated as the success count
            p0 = rbeta(nrow(k), k[,2] + 1, k[,1] + 1)   # one sample per alternative
            which.max(p0)
          }
      §  Select and learn
          learn = function(k, steps) {
            for (z in 1:steps) {
              i = select(k)
              j = test(i)   # test() is not defined on the slide; it must return column 1 or 2
              k[i,j] = k[i,j] + 1
            }
            k
          }
      §  But we already know how to count!
  • 31. The Basic Idea
      §  We can encode a distribution by sampling
      §  Sampling allows unification of exploration and exploitation
      §  Can be extended to more general response models
      §  Note that learning here = counting = on-line algorithm
  • 32. Generalized Banditry
      §  Suppose we have an infinite number of bandits
          –  suppose they are each labeled by two real numbers x and y in [0,1]
          –  also that expected payoff is a parameterized function of x and y: E[z] = f(x, y | θ)
          –  now assume a distribution for θ that we can learn online
      §  Selection works by sampling θ, then computing f
      §  Learning works by propagating updates back to θ
          –  If f is linear, this is very easy
      §  Don’t just have to have two labels, could have labels and context
  • 33. Caveats
      §  Original Bayesian Bandit only requires real-time
      §  Generalized Bandit may require access to long history for learning
          –  Pseudo online learning may be easier than true online
      §  Bandit variables can include content, time of day, day of week
      §  Context variables can include user id, user features
      §  Bandit × context variables provide the real power
  • 34.
      §  Contact:
          –  tdunning@maprtech.com
          –  @ted_dunning
      §  Slides and such (available late tonight):
          –  http://slideshare.net/tdunning
      §  Hash tags: #mapr #storm #bbuzz
  • 35. Thank You