Experience with Kafka & Storm

1. Target and Connect Intelligently
   Experience with Kafka & Storm
   Otto Mok, Solution Architect, AcuityAds
   April 30, 2014 – Toronto Hadoop User Group
2. Agenda
   • Background – What does AcuityAds do?
   • Use case – What are we trying to do?
   • High-level System Architecture – How does the data flow?
   • Kafka & Storm – What did we do wrong?
3. Background
   Source: https://www.google.ca/search?q=banner+ads&tbm=isch&tbo=u
4. Background
   • Digital advertising – website banners, pre-roll video, free mobile apps
   • Buy ad impressions in real time – respond to each auction within 50 ms
   • Find the best match between people and ads – show ads that you care about
   • Use machine learning algorithms to 'learn' – data, data, data
5. Use case
   • 10+ billion daily impressions
   • 30,000+ new sites daily
   • How many daily impressions by site?
   • How are the impressions distributed?
     – Country, province, gender, age range, etc.
6. High-level System Architecture
   • 10+ billion daily bid requests
   • Make up to 4 billion daily bids
   • Serve millions of daily impressions
   • 10+ TB of messages daily
   • 300k+ messages / second
   [Diagram: Bidder & Adserver → Kafka → Storm → HBase/Hadoop]
7. Kafka
   Source: http://kafka.apache.org/documentation.html
8. Kafka - Spec
   • Kafka v0.8.0
   • Servers – 10 x 2U, each with 10 x 3 TB disks as JBOD
   • Total storage – 300 TB
   • Replication – 3x
   • Unique data – 100 TB
   • Capacity – a few days
   • Producer acknowledgment – never waits
   • Topic – BIDREQUEST
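The "never waits" acknowledgment corresponds to request.required.acks=0 in the Kafka 0.8 producer API. Below is a minimal sketch of such a fire-and-forget producer for the BIDREQUEST topic; the broker hostnames, key, and payload are illustrative assumptions, not values from the talk.

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class BidRequestProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "kafka01:9092,kafka02:9092"); // assumed hosts
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "0"); // fire and forget: never wait for an ack

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            // Keying by site keeps one site's bid requests on one partition.
            producer.send(new KeyedMessage<String, String>(
                    "BIDREQUEST", "example.com", "bid-request-payload"));
            producer.close();
        }
    }

With acks=0 the producer maximizes throughput but can silently lose messages on broker failure, which is part of why the count-based monitoring on the next slide matters.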
9. Kafka - Monitoring
   • Nagios – ping, CPU, memory, network I/O, disk space
   • Producer-consumer group message counting
     – Hourly consumption rate check

   Topic       Consumer Group ID        Producer Count  Consumer Count  Error            Ratio
   BIDREQUEST  InventoryTopology        122,450,812     122,444,294     None             1.00
   BIDREQUEST  SearchTargetingTopology  122,450,812     107,755,295     Ratio below 98%  0.88
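A minimal sketch of the hourly check behind that table: compare what each consumer group read against what producers wrote, and flag any group whose ratio drops below 98%. The class name, method name, and threshold constant are my assumptions; the counts mirror the slide.

    public class ConsumptionCheck {
        static final double THRESHOLD = 0.98; // alert when consumption falls below 98%

        static String check(String topic, String groupId,
                            long producerCount, long consumerCount) {
            double ratio = (double) consumerCount / producerCount;
            String error = ratio < THRESHOLD ? "Ratio below 98%" : "None";
            return String.format("%s %s %,d %,d %s %.2f",
                    topic, groupId, producerCount, consumerCount, error, ratio);
        }

        public static void main(String[] args) {
            System.out.println(check("BIDREQUEST", "InventoryTopology",
                    122450812L, 122444294L));
            System.out.println(check("BIDREQUEST", "SearchTargetingTopology",
                    122450812L, 107755295L));
        }
    }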
10. Kafka - Monitoring
   • Kafka Web Console – partition offset for each consumer group
11. Kafka - Issues
   • Issue 1 – Partitions
     – 10 partitions
     – Each partition > 1 TB a day
     – 100 TB / 1 TB – no problem!
   • But each partition is stored in a single directory, so it must fit on one disk
     – /disk05/kafka-logs/BIDREQUEST-09
     – /disk09/kafka-logs/BIDREQUEST-03
   • At > 1 TB a day per partition, a 3 TB JBOD disk fills in about three days
12. Kafka - Issues
   • Issue 2 – Unbalanced partition distribution
     – Some servers running out of space
     – Some servers are not the "leader" for any partition
   • A network glitch causes a server to drop out of the cluster; after rejoining it is no longer a leader
   • Fix: auto.leader.rebalance.enable=true
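A hedged broker-config sketch of that fix; only the flag the slide names comes from the talk, everything else is assumed default. With it enabled, the controller periodically moves leadership back to each partition's preferred replica, so a broker that rejoins regains its leaders.

    # server.properties (excerpt) - assumed defaults everywhere else
    # Periodically re-elect the preferred replica as partition leader so a
    # broker that rejoins the cluster gets its leadership back.
    auto.leader.rebalance.enable=true

The manual alternative in 0.8 is the preferred-replica election tool (bin/kafka-preferred-replica-election.sh).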
13. Lots of data – now what?
   Source: http://bookriotcom.c.presscdn.com/wp-content/uploads/2013/03/server-farm-shot.jpg
14. Use case – again
   • 10+ billion daily impressions
   • 30,000+ new sites daily
   • How many daily impressions by site?
   • How are the impressions distributed?
     – Country, province, gender, age range, etc.
15. Storm
   Source: http://storm.incubator.apache.org/documentation/Tutorial.html
16. Storm - Spec
   • Storm v0.8.2
   • Servers – 13 x dual quad-core Xeon (16 logical CPUs with Hyper-Threading), 36 GB RAM each
     – 4 worker slots per server
   • Total logical CPUs – 208
   • Total memory – 468 GB
   • Total slots – 52 worker slots (JVMs)
17. Storm - Monitor
18. Storm - Topology
   • Spout reads each BidRequest from the Kafka topic
   • Determines whether the inventory is new or existing, emits tuples to different "streams"
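The deck shows no code for this split; here is a minimal sketch of the routing step as a Storm 0.8 bolt that declares one output stream per case and emits on it. Whether the logic lives in the spout or in a first bolt isn't stated, and everything beyond the NewInventory/ExistingInventory stream names and the grouping fields from the next slides is an assumption.

    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class InventoryRouterBolt extends BaseRichBolt {
        private OutputCollector collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        public void execute(Tuple tuple) {
            String sourceId = tuple.getStringByField("sourceId");       // assumed fields
            String domainName = tuple.getStringByField("domainName");
            if (isNew(sourceId, domainName)) {
                collector.emit("NewInventory", tuple, new Values(sourceId, domainName));
            } else {
                collector.emit("ExistingInventory", tuple,
                        new Values(lookupInventoryId(sourceId, domainName)));
            }
            collector.ack(tuple);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declareStream("NewInventory", new Fields("sourceId", "domainName"));
            declarer.declareStream("ExistingInventory", new Fields("inventoryId"));
        }

        // Placeholder lookups; a real implementation would consult a cache or HBase.
        private boolean isNew(String sourceId, String domainName) { return false; }
        private String lookupInventoryId(String sourceId, String domainName) {
            return sourceId + "/" + domainName;
        }
    }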
19. Storm - Topology
   • InsertInventoryBolt
     – Process tuples from the NewInventory stream
     – Field grouping on sourceId, domainName
     – Tick tuple every 1 second
   • UpdateInventoryBolt
     – Process tuples from the ExistingInventory stream
     – Field grouping on inventoryId
     – Tick tuple every 1 second
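A minimal sketch of how those groupings wire up in a Storm 0.8 TopologyBuilder, reusing the hypothetical router bolt above; component names, the spout class, and parallelism figures are assumptions, while the stream names, grouping fields, and the 10-worker count (from the issues slide) come from the deck.

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class InventoryTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("bidRequests", new BidRequestSpout(), 10);  // assumed Kafka spout
            builder.setBolt("router", new InventoryRouterBolt(), 20)
                   .shuffleGrouping("bidRequests");
            // Field grouping: the same key always reaches the same bolt task.
            builder.setBolt("insert", new InsertInventoryBolt(), 20)
                   .fieldsGrouping("router", "NewInventory", new Fields("sourceId", "domainName"));
            builder.setBolt("update", new UpdateInventoryBolt(), 20)
                   .fieldsGrouping("router", "ExistingInventory", new Fields("inventoryId"));
            builder.setBolt("log", new LogInventoryBolt(), 20)
                   .fieldsGrouping("router", "ExistingInventory", new Fields("inventoryId"));

            Config conf = new Config();
            conf.setNumWorkers(10);
            StormSubmitter.submitTopology("InventoryTopology", conf, builder.createTopology());
        }
    }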
20. Storm - Topology
   • LogInventoryBolt
     – Process tuples from the ExistingInventory stream
     – Field grouping on inventoryId
     – Tick tuple every 10 seconds
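The per-bolt tick intervals come from each bolt's component configuration. A hedged sketch of how a bolt such as LogInventoryBolt could request a tick every 10 seconds in Storm 0.8 and tell ticks apart from data tuples; the buffer-and-flush logic is an assumed pattern, not code from the talk.

    import java.util.HashMap;
    import java.util.Map;
    import backtype.storm.Config;
    import backtype.storm.Constants;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class LogInventoryBolt extends BaseRichBolt {
        private OutputCollector collector;

        public Map<String, Object> getComponentConfiguration() {
            Map<String, Object> conf = new HashMap<String, Object>();
            conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10); // tick every 10 seconds
            return conf;
        }

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        public void execute(Tuple tuple) {
            if (isTick(tuple)) {
                flushCounts();          // assumed: write buffered per-inventory counts
            } else {
                bufferCount(tuple);     // assumed: accumulate counts per inventoryId
            }
            collector.ack(tuple);
        }

        private boolean isTick(Tuple tuple) {
            return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                    && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
        }

        private void flushCounts() { /* placeholder */ }
        private void bufferCount(Tuple tuple) { /* placeholder */ }

        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }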
21. Storm - Issues
   • Issue – Low uptime
     – 10 workers, 100 executors
     – Not processing many tuples
     – Process latency < 10 ms
   • Bolts restart due to uncaught exceptions
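One common remedy, sketched here as an assumption since the deck doesn't show the fix: catch exceptions inside execute() so a bad tuple is failed and replayed rather than killing the whole worker JVM.

    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public abstract class GuardedBolt extends BaseRichBolt {
        protected OutputCollector collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        public void execute(Tuple tuple) {
            try {
                process(tuple);             // subclass business logic
                collector.ack(tuple);
            } catch (Exception e) {
                collector.reportError(e);   // surfaces in the Storm UI
                collector.fail(tuple);      // replay instead of crashing the worker
            }
        }

        protected abstract void process(Tuple tuple) throws Exception;

        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

Whether to fail or ack a poison tuple is a design choice: failing lets the spout replay it, acking drops it for good.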
22. Conclusion
   • Cost
     – Bleeding-edge technology → bugs
     – Support → mailing lists
     – Monitoring → roll your own
     – Operation → dedicated personnel
   • Benefit
     – Near real-time data on site impression volume & distribution by geo, demo, etc.
23. Forward Looking
   • Kafka v0.8.1.1
     – Allows specifying the broker hostname for producers & consumers
     – Allows changing the number of partitions of a topic online
   • Storm v0.9.1
     – Faster pure-Java Netty transport
     – View logs from each server in the Storm UI
     – Tick tuples using floating-point seconds
     – Storm on Hadoop (HDP 2.1)
24. Thank you
   Otto Mok
   otto.mok@acuityads.com
   Source: http://jamesgieordano.files.wordpress.com/2011/05/babyelephant.jpg
