Real-time systems at Twitter (Velocity 2012)

Our talk covers the migration of the Twitter architecture from primarily Ruby on Rails (RoR) to a JVM-based SOA system with emphasis on high performance, scalability, and resilience to failure. General lessons include the advantages of asynchronous, real-time architectures over synchronous, process / thread-oriented systems, as well as caching and data store patterns.

  1. real-time systems @twitter @raffi & @a_a velocity 2012
  2. ROUTING PRESENTATION LOGIC STORAGE & RETRIEVAL T-Bird T-Flock + Haplo Monorail Darkwing Flock(s)
  3. what are the big problems? ⇢ monolithic application ⇢ lack of self-service infrastructure ⇢ painful to add new services & features
  4. what did we want to achieve? ⇢ big infrastructure wins in speed, efficiency, reliability ⇢ separation of concerns ⇢ team independence
  5. stats.timeFuture("request_latency_ms") { /* dispatch to do work */ }
  6. ts(AVG, timelineservice, audubon.role.timelineservice, service/client/woodstar.prod/getStatusTimeline/request_latency_ms.p50)
  7. TFE (HTTP Proxy) Woodstar
  8. TFE (HTTP Proxy) Monorail Woodstar
  9. 100% Monorail TFE (HTTP Proxy) 0% Woodstar Monorail Woodstar
  10. 0% Monorail TFE (HTTP Proxy) 100% Woodstar Monorail Woodstar
  11. TFE (HTTP Proxy) Woodstar
  12. TFE (HTTP Proxy) Woodstar Timeline Service Tweetypie (Tweet Service) Gizmoduck (User Service)
  13. network substrate ⇢ connection management ⇢ protocol codecs ⇢ transient error handling ⇢ service discovery ⇢ observability
  14. ServerBuilder().name("ServiceName").reportTo(statsReceiver).tracer(ZipkinTracer()).codec(Http()).maxConcurrentRequests(1000).requestTimeout(500.milliseconds).build(Service[Request, Response])
  21. TFE (HTTP Proxy) Woodstar Timeline Service Tweetypie (Tweet Service) Gizmoduck (User Service)
  22. ServerBuilder().name("ServiceName").reportTo(statsReceiver).tracer(ZipkinTracer()).codec(Http()).maxConcurrentRequests(1000).requestTimeout(500.milliseconds).build(Service[Request, Response])
  23. TFE (HTTP Proxy) Monorail Woodstar
  24. TFE (HTTP Proxy) Macaw Macaw Macaw Woodstar +Activity +Search +Logging
  25. class EchoLoadTest(service: ParrotThriftService) extends RecordProcessor { val client = new EchoService.ServiceToClient(service, new TBinaryProtocol.Factory()); def processLines(job: ParrotJob, lines: Seq[String]) { lines.map(client.echo(_)) } }
  26. TFE (HTTP Proxy) Monorail
  27. TFE (HTTP Proxy) Monorail Woodstar
  28. ROUTING PRESENTATION LOGIC STORAGE & RETRIEVAL T-Bird T-Flock + Haplo Monorail Darkwing Flock(s)
  29. ROUTING PRESENTATION LOGIC STORAGE & RETRIEVAL Tweetypie Monorail T-Bird Gizmoduck T-Flock + Woodstar Haplo TFE TLS Macaw Darkwing +Swift Social Graph Service Macaw Flock(s) +Disco Story Service
  30. where are we? ⇢ team organization that mimics the software stack ⇢ able to launch massive features in parallel
  31. [Three charts: mentions (y-axis 0 to 3000), statuses/show (y-axis 0 to 500), users/show (y-axis 0 to 400); each plots p50, p95, p999]
  32. some more statistics ⇢ 45% of traffic on the JVM stack ⇢ we’re a lot faster ⇢ we’re a lot more reliable ⇢ we fix bugs faster ⇢ 12 deploys, yesterday
  33. #JoinTheFlock
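
Slide 5 wraps an asynchronous call in stats.timeFuture so that its latency is reported under a metric name. The sketch below shows one way such a helper could work, using only the Scala standard library. The LatencyStats class, its in-memory storage, and the recorded accessor are hypothetical stand-ins for illustration, not Twitter's stats library.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical in-memory stats receiver (an assumption, not a real library class).
class LatencyStats {
  private val samples = mutable.Map.empty[String, Vector[Long]]

  private def record(name: String, millis: Long): Unit = synchronized {
    samples(name) = samples.getOrElse(name, Vector.empty) :+ millis
  }

  // Latencies recorded so far for a metric, in milliseconds.
  def recorded(name: String): Vector[Long] = synchronized {
    samples.getOrElse(name, Vector.empty)
  }

  // Starts the clock, runs `work`, and records the elapsed time when the
  // Future completes, whether it succeeded or failed. The calling thread
  // is never blocked while the work is in flight.
  def timeFuture[A](name: String)(work: => Future[A])(
      implicit ec: ExecutionContext): Future[A] = {
    val start = System.nanoTime()
    work.andThen { case _ =>
      record(name, (System.nanoTime() - start) / 1000000L)
    }
  }
}

object TimeFutureDemo {
  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val stats = new LatencyStats
    val timed = stats.timeFuture("request_latency_ms") {
      Future { Thread.sleep(20); "timeline" } // stands in for real work
    }
    println(Await.result(timed, 2.seconds))
    println(stats.recorded("request_latency_ms"))
  }
}
```

Because the measurement is attached as a callback on the Future, the timer covers the whole asynchronous operation rather than only the time taken to schedule it.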
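
Slides 9 and 10 show traffic moving from 100% Monorail / 0% Woodstar to 0% Monorail / 100% Woodstar behind TFE. The slides do not say how TFE divides requests, so the following is only a minimal sketch of one common approach, a per-request random split by percentage. The Backend type and the TrafficShifter class are hypothetical names introduced for this example.

```scala
import scala.util.Random

sealed trait Backend
case object Monorail extends Backend
case object Woodstar extends Backend

// Hypothetical router (an assumption, not TFE's actual logic): sends roughly
// `woodstarPercent` percent of requests to the new stack and the rest to the old.
final class TrafficShifter(woodstarPercent: Int, rng: Random = new Random()) {
  require(woodstarPercent >= 0 && woodstarPercent <= 100,
    "woodstarPercent must be between 0 and 100")

  // nextInt(100) is uniform over 0..99, so 0 never selects Woodstar
  // and 100 always selects it.
  def route(): Backend =
    if (rng.nextInt(100) < woodstarPercent) Woodstar else Monorail
}

object TrafficShiftDemo {
  def main(args: Array[String]): Unit = {
    val shifter = new TrafficShifter(woodstarPercent = 25)
    val toWoodstar = (1 to 10000).count(_ => shifter.route() == Woodstar)
    println(s"$toWoodstar of 10000 requests went to Woodstar")
  }
}
```

Raising the percentage in small steps lets the new stack take load gradually, and setting it back to 0 returns all traffic to the old stack.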
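
Slides 6 and 31 report latency as percentiles (p50, p95, p999) rather than averages. As a rough illustration of what those labels mean, the sketch below computes nearest-rank percentiles over a set of samples. The LatencyPercentiles object and the choice of the nearest-rank method are assumptions made for this example; the slides do not describe how the production metrics are aggregated.

```scala
object LatencyPercentiles {
  // Nearest-rank percentile. `permille` is the percentile times ten, so
  // p50 is 500, p95 is 950, and p999 (the 99.9th percentile) is 999.
  // Integer arithmetic avoids floating-point rounding at the boundaries.
  def percentile(samples: Seq[Long], permille: Int): Long = {
    require(samples.nonEmpty, "need at least one sample")
    require(permille > 0 && permille <= 1000, "permille must be in 1..1000")
    val sorted = samples.sorted.toVector
    // Ceiling of permille * n / 1000, which is always between 1 and n.
    val rank = (permille.toLong * sorted.size + 999) / 1000
    sorted((rank - 1).toInt)
  }

  def main(args: Array[String]): Unit = {
    val latenciesMs: Seq[Long] = 1L to 1000L // synthetic data for the demo
    println(percentile(latenciesMs, 500)) // p50
    println(percentile(latenciesMs, 950)) // p95
    println(percentile(latenciesMs, 999)) // p999
  }
}
```

The p999 value describes the slowest one request in a thousand, which is why tail percentiles are useful for spotting problems that an average would hide.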