This is my presentation for a "Streaming Service" like Netflix or Amazon Prime.
This was part of an interview I did with a company, so there is a lot of text explaining all the components in detail.
3. [Diagram: zone-level load balancers (LB) routing traffic to Zone 1, Zone 2, and Zone 3]
• A request first arrives at the zone-level load balancer
• The zone load balancer is responsible for routing requests to the correct zone
• The zone load balancer uses a geolocation-based routing policy
• Using a geolocation-based routing policy allows for
⁻ Low latency
⁻ Region-based distribution/availability of content and/or features
• In addition to geolocation-based routing, the system also uses health checks to determine the health of each zone and re-routes traffic to another zone when the geolocation-chosen zone is unhealthy
• Using a multi-value routing policy enables the system to remain highly available, at the cost of some performance, even during zone outages
• Routing traffic to a non-preferred zone still needs to be constrained, since connecting to an arbitrary zone could cause issues with region-based content delivery
Zone Load Balancer
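The zone-routing behaviour described above can be sketched in a few lines of Python. This is a minimal sketch: the zone names, the client-region mapping, and the fallback order are illustrative assumptions, not part of the original design.

```python
# Hypothetical geolocation-based zone routing with health-check fallback.
# PREFERRED_ZONE maps a client's region to its geolocation-preferred zone.
PREFERRED_ZONE = {"us-east": "zone1", "eu-west": "zone2", "ap-south": "zone3"}
FALLBACK_ORDER = ["zone1", "zone2", "zone3"]

def route_request(client_region, healthy_zones):
    """Pick the geolocation-preferred zone; fall back to a healthy zone."""
    preferred = PREFERRED_ZONE.get(client_region)
    if preferred in healthy_zones:
        return preferred
    # Preferred zone is unhealthy (or region unknown): pick the first healthy
    # fallback so the system stays available, accepting higher latency and
    # possible region-based content constraints during the outage.
    for zone in FALLBACK_ORDER:
        if zone in healthy_zones:
            return zone
    raise RuntimeError("no healthy zones available")
```

This keeps the trade-off from the slide explicit: availability is preserved during a zone outage, at the cost of serving from a non-preferred zone.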
4. • Once a request is routed to a zone, each zone has its own set of load balancers
• These load balancers are responsible for routing traffic to the servers
• The routing policy for these second-level load balancers is round-robin
• Choosing round-robin as the routing policy ensures that no server/instance is overused to the point of failure
• This load balancer also checks the health of its instances/servers so that it does not route requests to a server in a bad state
Regular Load Balancer
[Diagram: per-zone load balancers (LB) in Zones 1–3, each distributing traffic across machines M1, M2, M3, …]
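A minimal sketch of the round-robin policy combined with health checks; the class and method names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Round-robin over servers, skipping any marked unhealthy (sketch)."""

    def __init__(self, servers):
        self.servers = servers
        self._cycle = itertools.cycle(range(len(servers)))
        self.healthy = set(servers)

    def mark_unhealthy(self, server):
        # In a real system this would be driven by periodic health checks.
        self.healthy.discard(server)

    def next_server(self):
        # Advance the cycle until a healthy server is found; checking
        # len(servers) consecutive positions covers every server once.
        for _ in range(len(self.servers)):
            server = self.servers[next(self._cycle)]
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers in this zone")
```

Plain round-robin spreads load evenly; the healthy-set check is what keeps requests away from a server in a bad state.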
5. • Once the request reaches the instance, it is first greeted by the API Gateway.
• When using AWS, we could use the managed API Gateway service AWS provides, but for flexibility we chose a third-party / home-grown API gateway in this case
• The API Gateway can then route a request, based on its endpoint, either to services hosted on other instances (micro-services) or to logic present in the instance itself
• In this particular architecture we choose to keep the business logic / API endpoint functions within the instance; for example, the logic for authentication or billing, or the logic for fetching recommendations / watch history from the in-memory database
• This could be extended to instead use micro-services for each of the logic / API endpoint functions
• These instances are configured to be dynamically scalable
₋ Horizontal scaling for AWS/Azure
₋ A combination of horizontal and vertical scaling if using custom cloud instances/servers
[Diagram: API Gateway dispatching API calls — Login, Get Library, Set Watched, Get Info — with an Auth DB (SQL) and a Billing DB (SQL) behind it]
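The endpoint-based dispatch described above could look roughly like this. The endpoint paths and handler bodies are illustrative stubs that mirror the API calls in the diagram, not real implementations.

```python
# Stub handlers standing in for the in-instance business logic.
def login(request):
    return {"status": "logged-in"}          # would hit the Auth DB (SQL)

def get_library(request):
    return {"titles": []}                   # would hit the Library DB / cache

def set_watched(request):
    return {"status": "ok"}                 # would update watch history

def get_info(request):
    return {"title": request.get("title")}  # would fetch title metadata

# The gateway's routing table: endpoint -> handler.
ROUTES = {
    "/login": login,
    "/library": get_library,
    "/watched": set_watched,
    "/info": get_info,
}

def handle(endpoint, request):
    handler = ROUTES.get(endpoint)
    if handler is None:
        return {"error": 404}
    return handler(request)
```

Swapping an in-instance handler for a call to a remote micro-service would only change the function bodies, which is why the slide notes the architecture can be extended that way.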
6. • The main database architecture is designed to be fast and reliable with a combination of NoSQL and in-memory database
• The Library DB, as seen in the architecture diagram, is used to store data such as
⁻ Names and metadata of video titles (such as movies and TV series)
⁻ Each user's watch and search history
⁻ Other per-user metadata that can be used in analytics, such as watch times, which titles the user engaged with more, etc.
⁻ Some of this metadata is written/updated by user actions (such as adding a title to the user's watch history as soon as they play it for x seconds/minutes), while some is updated by services such as the recommendation engine (for example, adding titles to the recommended field/key after running the recommendation algorithm)
Database
[Diagram: Library DB (NoSQL) alongside an In-Memory DB]
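As an illustration of the consolidation argument, a single user's data might be shaped like the hypothetical document below. All field names and values are assumptions for the sketch.

```python
# Illustrative shape of one consolidated user document in the Library DB.
user_doc = {
    "user_id": "u123",
    "watch_history": [
        {"title_id": "t42", "paused_at_s": 1310, "watched_on": "2021-05-01"},
    ],
    "search_history": ["space documentaries", "comedy"],
    "recommended": ["t7", "t99"],            # written by the recommendation engine
    "analytics": {"total_watch_time_s": 86400},
}

def fetch_user(doc_store, user_id):
    # One lookup returns everything about the user -- no joins needed,
    # which is the point of consolidating per-user data in one document.
    return doc_store[user_id]
```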
7. • This database is chosen to be NoSQL because of some key characteristics of the data stored:
⁻ The number of writes to this database is not very high, due to the nature of the data
⁻ NoSQL allows us to shard the data better, so the database cluster can scale horizontally
⁻ Data for each user can be consolidated within one document and fetched with one call, instead of spreading it across multiple tables and/or performing joins as in relational databases
⁻ A similar case applies to data about each title: all metadata can be consolidated within one document
• In addition to the NoSQL database, we have added an in-memory database (such as Redis) to allow for faster read times
• This allows us not only to deliver initial content quickly to the client, but also helps with features such as fast search
• The in-memory database can be configured to persist data either at regular intervals or on every write. In this architecture, regular-interval persistence makes more sense because none of the data we are storing is time-critical. For example, suppose we are writing the timestamp at which the user/client paused a video: if the in-memory data is lost for some reason and the pause timestamp is never updated in persistent storage, the usability of the system is not affected. With this decision we are prioritizing performance over consistency for non-time-critical data
• To enable fast searches, we can use custom Redis solutions that represent a prefix tree / trie in the Redis key-value format. Prefix trees allow for quick string searches. Prefixy is one implementation that provides such a Redis-mapped prefix tree, along with popularity-based ranking of search results
Database
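To illustrate the prefix-tree idea, here is a toy in-memory sketch; a Redis-backed solution such as Prefixy would store an equivalent prefix-to-completions mapping as Redis keys. The class name and the popularity-score scheme are illustrative assumptions.

```python
from collections import defaultdict

class PrefixIndex:
    """Toy prefix index: maps every prefix of a title to ranked completions."""

    def __init__(self):
        self.index = defaultdict(list)  # prefix -> [(popularity, title)]

    def add(self, title, popularity):
        t = title.lower()
        # Index the title under each of its prefixes ("s", "st", "str", ...).
        for i in range(1, len(t) + 1):
            self.index[t[:i]].append((popularity, title))

    def search(self, prefix, limit=5):
        # Most popular completions first, like Prefixy's rank-based results.
        matches = sorted(self.index[prefix.lower()], reverse=True)
        return [title for _, title in matches[:limit]]
```

Each keystroke becomes a single key lookup, which is what makes trie-backed autocomplete fast compared with scanning titles.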
8. • When a new title needs to be uploaded and served via the video streaming platform, we go through a series of steps to ensure seamless delivery of the content
• A newly uploaded title first goes through transcoding (converting the video into multiple formats and resolutions)
• The processed videos are pushed to object storage for persistence
• To facilitate fast and reliable streaming, these videos/titles are then pushed to CDN servers around the world, making them available as close to the client as possible
New Content onboarding and delivery
[Diagram: Video Upload → Video Transcoding → Object Storage → CDN]
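The onboarding pipeline above can be sketched end-to-end. Every function here is a stub standing in for the real transcoding, storage, and CDN services; names and zone labels are assumptions.

```python
def transcode(video, targets):
    # Stub: pretend to produce one rendition per (format, resolution) target.
    return [f"{video}.{fmt}.{res}" for fmt, res in targets]

def onboard_title(video):
    # 1. Transcode into multiple formats/resolutions.
    renditions = transcode(video, [("mp4", "1080p"), ("mp4", "720p"), ("mov", "1080p")])
    # 2. Push renditions to object storage (the persistent copy).
    object_storage = list(renditions)
    # 3. Push to CDN edge caches around the world, close to clients.
    cdn_caches = {zone: list(renditions) for zone in ("us", "eu", "ap")}
    return object_storage, cdn_caches
```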
9. • Video transcoding is the process of converting a video into multiple formats
and multiple resolutions for each format
• This is required to provide seamless service even when the client's internet connection is slow
• In our system the transcoding process is done with a Hadoop cluster
• When a title is uploaded, it gets queued in HDFS to be processed
• The title is broken down into multiple chunks to be processed by the
mappers
Video Transcoding
[Diagram: Hadoop cluster — an HDFS queue feeding MapReduce mappers (M1, M2) and reducers (R1, R2), which output mp4 1080p, mp4 720p, and mov 1080p renditions]
• Mappers convert the chunks of video into different video formats and resolutions
• These chunks are then re-stitched into a single title by the reducers and written out along with extra metadata
• The converted titles are then pushed to an object storage for persistent storage as mentioned before
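The mapper/reducer flow described above can be sketched as follows. The chunking and actual video conversion are stubbed; the target renditions match the ones in the diagram, and all function names are illustrative.

```python
# Target renditions, matching the diagram's outputs.
TARGETS = [("mp4", "1080p"), ("mp4", "720p"), ("mov", "1080p")]

def map_chunk(chunk_id, chunk_data):
    # Mapper: emit one converted chunk per target rendition (conversion stubbed).
    for fmt, res in TARGETS:
        yield (fmt, res), (chunk_id, f"{chunk_data}->{fmt}/{res}")

def reduce_rendition(key, converted_chunks):
    # Reducer: re-stitch this rendition's chunks, in order, into one title.
    fmt, res = key
    ordered = [data for _, data in sorted(converted_chunks)]
    return {"format": fmt, "resolution": res, "chunks": ordered}

def transcode_title(chunks):
    # Shuffle phase: group mapper output by (format, resolution) key.
    grouped = {}
    for chunk_id, data in enumerate(chunks):
        for key, value in map_chunk(chunk_id, data):
            grouped.setdefault(key, []).append(value)
    return [reduce_rendition(k, v) for k, v in grouped.items()]
```

Because each chunk is converted independently, the mappers can run in parallel across the Hadoop cluster; the reducers only need the chunk order to stitch the title back together.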
10. • The recommendation system in our service uses AI/ML for recommendations
• The recommendation system uses two types of ML techniques:
‾ Content-based
‾ Collaborative
• In the content-based approach, recommendations are calculated from each user's watch and search history
• In the collaborative approach, recommendations are calculated from the behavior of multiple users and broader trends
Recommendation Engine
• The recommendation system in our service would also take into consideration the time of day, week, or year, etc.
• For example, which types of shows are usually watched more on weekends (drama vs. family shows) or during different times of year (Halloween vs. Christmas)
• The recommendation system then updates the in-memory database which holds the “landing page” for each user with the
personalized recommendations at regular intervals
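As a toy illustration of the collaborative idea, recommendations could come from the watch history of the most similar user. The Jaccard-overlap similarity here is an illustrative choice for the sketch; real recommendation engines use learned models over far richer signals.

```python
def similarity(a, b):
    # Jaccard overlap between two users' watched-title sets.
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user, histories):
    """Recommend titles the most similar peer watched that `user` has not."""
    target = histories[user]
    best_peer = max(
        (u for u in histories if u != user),
        key=lambda u: similarity(target, histories[u]),
    )
    return sorted(histories[best_peer] - target)
```

The output of a job like this is what would be written, at regular intervals, into the recommended field of each user's in-memory "landing page" document.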