27. WebSocket Load Balancing
Permission and security model (Admin, Mods, ...)
[Diagram: UI → Frontend Servers (auto scaling, with Long Polling Fallback Server) → Backend Servers (auto scaling) → Redis Cluster (data storage) & Smashcast REST-API]
28. Servers
• Small, cheap machines
• Frontend handles the connections, no logic
• Backend stateless, can be restarted/upgraded any time
• When a frontend breaks it affects only a few users
• Socket.io for handling WebSockets
• Up- & downscale as needed
50. Examples
• Some services don't need a login
• There is always the need to schedule things
• Services need to send information to and get information from the API
• What happens when a frontend server dies?
That's me in 1980/81 with my first computer. Anyone know the computer? I studied arts, lived in New York & Berlin, have made startups and have crashed startups.
Smashcast was, until April, Hitbox. But Hitbox got bought by Azubu, a competitor, and now we are Smashcast.
What is Smashcast? This is the front page.
And that's a stream page: you see the live stream, chat, etc.
Real time is important. When something happens on the stream, viewers want to react as fast as possible.
That's why we have built a real-time infrastructure based on WebSockets.
Now a few explanations of the elements on the site that use this infrastructure.
The chat can do everything a chat needs to do, including posting images, GIFs, selfies, etc.
Sounds easy, but it isn't when you want to scale it!
And there is a big difference between a small WhatsApp group and a huge real-time chat!!!
Take for example images in chat, sounds easy!
Look at the person in the top left corner
Just link to the source and you are done!
We had this for two years…. Until we realized: there is a problem!
For example: someone posts a 100+ MB GIF, then all viewers will start to download it. And when their internet is slow (don't forget, there is a 3 Mbit stream running next to the chat), the stream will lag, giving a bad user experience!
But there are bigger problems with these images in the chat!
Imagine you hate one of the smaller streamers (like 5-20 viewers) on Smashcast. You set up a small server with a nice GIF on it and you post this GIF, when the streamer is live, in his chat.
And now you have the IP addresses of all his viewers and the IP of the streamer in your log files!
So the next step is:
Googling DDoS and bringing down this streamer you hate! So easy!
But there is even a third problem!
Imagine a stream has 50k viewers and someone posts a GIF. The server where the GIF is hosted must be strong, because it will now get 50k hits at the same time!
It's like a DDoS…
So we need to get the image, check it, and save it in the CDN.
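Roughly, such a proxy step could look like this in Node.js (a sketch, not our actual code; the bucket, CDN host, and size limit are made up):

// Sketch: fetch a user-posted image, sanity-check it and store it in S3 behind the CDN.
// Bucket name, CDN host and MAX_IMAGE_BYTES are hypothetical.
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({ region: 'us-east-1' });
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // reject the 100+ MB GIFs

async function mirrorImage(sourceUrl, key) {
  const res = await fetch(sourceUrl); // Node 18+ global fetch
  const type = res.headers.get('content-type') || '';
  if (!type.startsWith('image/')) throw new Error('not an image');
  if (Number(res.headers.get('content-length') || 0) > MAX_IMAGE_BYTES) throw new Error('too large');

  const body = Buffer.from(await res.arrayBuffer());
  if (body.length > MAX_IMAGE_BYTES) throw new Error('too large');

  // (this is also where a porn/abuse check could run before publishing)
  await s3.send(new PutObjectCommand({
    Bucket: 'chat-images-cdn', // hypothetical bucket behind the CDN
    Key: key,
    Body: body,
    ContentType: type,
  }));
  return `https://cdn.example.com/${key}`; // the chat links only to this CDN copy
}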
And we are testing AWS machine learning for porn detection
Back to the real-time features. Cheering is another one on the site. It allows people to cheer for a team or a stream.
Lots of things to do! And that's just the beginning! Again, here you run quite fast into problems with sending too many messages to the users, etc.
Another feature on the site is the feed below the stream. All viewers get updates via the WebSocket.
Last but not least, the viewcounter.
That's this number here. Maybe the most important thing on the site.
So this number is the main KPI for all streams; the bigger the better, so a lot of people try to influence this number. Updates are sent out in real time or every 10 seconds (depending on how big the stream is) to all viewers.
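As a rough illustration of that realtime-vs-every-10-seconds rule (a sketch; the threshold and broadcastToChannel() are made up, not our real code):

// Sketch: push the viewcount immediately for small streams,
// batch it every 10 seconds for big ones.
const REALTIME_THRESHOLD = 1000; // hypothetical cut-off
const pending = new Map();

function onViewcountChanged(channel, count) {
  if (count < REALTIME_THRESHOLD) {
    broadcastToChannel(channel, { type: 'viewcount', count }); // small stream: realtime
  } else {
    pending.set(channel, count); // big stream: remember only the latest value
  }
}

setInterval(() => {
  for (const [channel, count] of pending) {
    broadcastToChannel(channel, { type: 'viewcount', count });
  }
  pending.clear();
}, 10000); // flush the big streams every 10 seconds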
So, let's explain how we started the real-time system a few years ago: the famous first version.
We went with Node.js & Redis. Redis because it is a great product for storing data that you need fast; when we have a lot of users, the Redis servers handle thousands of requests per second without any problems, and AWS offers a very good managed version of Redis.
Node.js because of its fast I/O; nowadays I would maybe move to Go.
So we went with a typical frontend/backend setup: the frontend handles the WebSocket connection and is quite dumb, the backend handles all the chat logic.
The fallback server is for the less than 1% that don't support WebSockets; some providers block them, and so do some older Android versions.
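To make the split concrete, here is a minimal sketch of such a dumb frontend (socket.io only; forwardToBackend() is just a placeholder for the transport to the backend, not our real code):

// Sketch: v1 frontend server. It only terminates the WebSocket
// (socket.io also gives us the long-polling fallback) and forwards
// everything to the backend; all chat logic lives there.
const { Server } = require('socket.io');

const io = new Server(3000, {
  transports: ['websocket', 'polling'], // polling covers the <1% without WebSocket support
});

io.on('connection', (socket) => {
  socket.on('message', (msg) => {
    forwardToBackend(socket.id, msg); // hypothetical helper: hand the raw command to a backend
  });
  socket.on('disconnect', () => {
    forwardToBackend(socket.id, { cmd: 'disconnect' });
  });
});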
We use AWS
Single core machines
This 1st version worked fine, but the problem is:
So we had a similar infrastructure for the viewcounter, and when we worked on cheering & the feed we would have had to build a similar infrastructure for those too.
So we needed a different approach.
Sounds easy, right? It has existed for 30 years.
That's when we decided to use RabbitMQ. We could have used Kafka too, but I have quite some experience with RabbitMQ and it fits perfectly. Anyone using it too?
As I said, it is easy to use, easy to maintain, and the best thing is the web interface; I will show it to you later.
So, what does the new server structure look like? This is Mike Pence, Vice President of the USA, while visiting NASA…
We still have the frontend servers (with fallback, of course) and in between the RabbitMQ cluster, which distributes the messages to the services and then back to the frontend.
So how does this work now? How is a command from the frontend sent to the backend, processed, and sent back to the frontend?
First, you need to log in to every service. OK, this flow is not that complicated, but wait for it!
This is a login message the user interface sends to the frontend server it is connected to. In this case it wants to log in (joinchannel) to the chat service for the channel „karlus“.
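The exact message is on the slide; roughly it is a small JSON command along these lines (the field names here are only illustrative, not our real protocol):

// Illustrative shape of the joinchannel command the UI sends over the WebSocket
{
  "service": "chat",
  "command": "joinchannel",
  "channel": "karlus",
  "params": { "user": "someviewer", "token": "…" }
}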
The frontend server then sends this message to the RabbitMQ cluster.
In RabbitMQ there are two exchanges defined: fromFrontend & toFrontend. The frontend servers are connected to both; on one they are listening and on the other they are sending messages.
So this message is sent to the fromFrontend exchange because it comes from the frontend server.
Here you can see this. The frontend server sets the routing, and the routing key is chat.joinchannel.karlus because it is a joinchannel command for the chat service for the channel karlus.
The fromFrontend exchange now routes all messages where the routing key starts with chat. to the chat queue.
The chat backend servers (in this case two servers) are connected to this chat queue, and RabbitMQ distributes the messages via round robin to the chat backend servers.
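In amqplib terms, that setup could look roughly like this (a sketch assuming topic exchanges and a chat.# binding; the real names, options and handleChatCommand() are not our actual code):

// Sketch: declare the two exchanges and bind the chat queue so that
// every routing key starting with "chat." ends up in it. RabbitMQ then
// round-robins that queue across all connected chat backend servers.
const amqp = require('amqplib');

async function setupChatBackend() {
  const ch = await (await amqp.connect('amqp://localhost')).createChannel();

  await ch.assertExchange('fromFrontend', 'topic', { durable: false });
  await ch.assertExchange('toFrontend', 'topic', { durable: false });

  await ch.assertQueue('chat', { durable: false });
  await ch.bindQueue('chat', 'fromFrontend', 'chat.#'); // chat.joinchannel.karlus, chat.msg.*, ...

  ch.prefetch(1); // fair round robin between the backend servers
  await ch.consume('chat', (msg) => {
    handleChatCommand(JSON.parse(msg.content.toString())); // hypothetical handler
    ch.ack(msg);
  });
}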
This is how the backend servers then get their messages. They have no clue that the message is coming from a frontend server; it could come from other services, etc.
The chat backend servers then process the message (in this case do the login, etc.) and then send a message back to RabbitMQ, to the „toFrontend“ exchange.
Each frontend server has its own queue that is connected directly to the „toFrontend“ exchange. With this setup it is possible for the backend to send a message directly to one frontend server (for example, the login msg, because this is only for one user) or to all frontend servers (for example, a chat msg).
After processing the message, the backend server sends the message back to RabbitMQ. For a normal chat message this would look like this; again, the routing key is the service, command & channel.
The login msg is a bit different because it is sent directly to the frontend server that sent the original message; the other frontend servers don't need to see it, so there is no routing key, only the target queue is specified.
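The two reply paths could be sketched like this (again amqplib, with made-up payloads; frontendQueue stands for whatever the per-frontend-server queue is actually called):

// Sketch: backend replies. A chat message is published to the toFrontend
// exchange with a routing key (service.command.channel) so every frontend
// server sees it; a login reply goes straight into one frontend's own queue.
function broadcastChatMessage(ch, channel, payload) {
  ch.publish('toFrontend', `chat.msg.${channel}`, Buffer.from(JSON.stringify(payload)));
}

function replyLogin(ch, frontendQueue, payload) {
  // no routing key: only the queue of the frontend server
  // that originally forwarded the joinchannel command gets it
  ch.sendToQueue(frontendQueue, Buffer.from(JSON.stringify(payload)));
}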
And this is how it looks in the user interface. Green are the messages from the user interface, white the messages from the frontend server.
Here you see the login msg we just saw
And here the message coming back from the chat backend server.
When we built this system we realized that we needed some generic services that can be used by other services.
For example, some services don't need a login.
A cron job is needed, for example, for the cheering service to send status updates back to the viewers, or for the viewcount server.
The connection to & from the API is handled by a service, so other services can send a message to this service and it will then interact with the REST API.
The cleanup service is there to log out viewers from other services when a frontend server goes down.
So we added these needed services to the same RabbitMQ cluster.
Some services are really simple; this is the main function for the login service, for example.
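The slide shows the real code; the idea boils down to roughly this (a sketch with invented names like verifyToken and replyTo, not the actual service):

// Sketch of what a tiny login service can boil down to: consume login
// commands from its queue, check the token, and reply to the frontend's queue.
async function main(ch) {
  await ch.consume('login', async (msg) => {
    const cmd = JSON.parse(msg.content.toString());
    const ok = await verifyToken(cmd.params.token); // hypothetical check, e.g. via the API service
    ch.sendToQueue(cmd.replyTo, Buffer.from(JSON.stringify({
      command: 'loginresult',
      success: ok,
    })));
    ch.ack(msg);
  });
}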
This can lead to quite complicated flows.
To handle this we have our own library that we use to connect to & work with RabbitMQ.
Well, and in the end, is the chat system working? Does it scale?
Well, I don't have a screenshot of our latest record, which was close to 200k, but this one shows you a channel with 100k people.
All 154k connections were handled by 16 frontend servers and 8 backend servers, costing us around $20 for the evening.
Just one more thing:
It was during the, at that time, biggest event ever: 60k people on one stream, and suddenly all of them saw this.
And we did this!
I know, this sounds stupid, but I will give you two examples:
Imagine you have a stream with 100k viewers. Every time a new viewer comes to this stream he/she gets the info about how to get the stream from our server.
Now imagine the streamer has a problem; let's say his computer crashes and the stream drops, meaning it goes black or gets stuck.
What do 100k people do?
This.
And let's hope that your API can handle this!
And they won't stop until they have a stream again!