This was presented at AWS Summit Bahrain 2019.
Holidayme's journey to microservices and serverless in under 18 months, built entirely on AWS using API Gateway, Lambda, X-Ray, CloudWatch logs and metrics, and many more.
3. Efficiency: Dev & QA reduced by 2X
Visibility: More metrics and dashboards
Automation: CI/CD without 3rd party tools
Ownership: Single responsibility from Dev to Production
Predictability: Out of Box Events and Alarms
5. Our Architecture in 2017
[Architecture diagram. Components shown: Clients; Apps/Affiliates; CloudFront; S3 (Static Content); AWS Cloud; VPC; Availability Zone 1 and Availability Zone 2; two ALBs; UI Servers (EC2 instances in an Auto Scaling group); API Servers (EC2 instances in an Auto Scaling group); ElastiCache; RDS; DynamoDB]
6. OUR SDLC
We went deeper into understanding the development to deployment cycle
7. Scattered Focus: Monolithic Approach
Inversed QA Efficiency: More time was spent testing functionality that had not been modified
Siloed Responsibilities: Developer: "My code works fine, Admin must have done the wrong deployment"
Jinx: "Do not touch those files/settings. Server will stop working"
12. HOLIDAYS PLATFORM REVAMP
Tight delivery schedule of 3 months from scratch
Decided to go ahead with Lambda and API Gateway
Used AWS X-Ray for performance bottlenecks
CloudWatch Metrics and CloudWatch Logs
14. FINAL DELIVERY
Delivered Entire Product 8 days early
50+ Microservices
Latency under 950 ms
Inbuilt Monitoring and APM
15. IT WAS JUST THE BEGINNING
We were not only able to reduce our go-to-market time, but also improved efficiencies on multiple fronts
17. USE CASE: IMAGE OPTIMIZATION
80M+ IMAGES
50+ TB DATA
DIFFERENT DEVICES
4 DIFFERENT RESOLUTIONS
COMPUTE AND TIME INTENSIVE
18. IMAGE OPTIMIZATION WORKFLOW
[Workflow diagram, inside AWS Cloud: Client, Amazon CloudFront, API Gateway, Lambda, S3 Bucket. Labels: Original High Resolution Image; Optimized and Resized Image]
19. USE CASE: RETRY BOOKING
LARGE CODE BASE
CUSTOM BUSINESS LOGIC
SCHEDULED JOBS
BUGS
CUSTOM NOTIFICATIONS
20. RETRY BOOKING WORKFLOW
[Workflow diagram, inside AWS Cloud: Client, SQS, Lambda, SQS DL (dead-letter queue), SNS, SES. Labels: Push booking transaction; Retry n times; Push notification on success; Push after n retries; Notify on failure]
21. API Gateway, Lambda, Systems Manager, SQS, SNS, SES, Kinesis, Kinesis Firehose, S3, Cognito, IAM, DynamoDB, CloudTrail, EC2, ElastiCache, CloudWatch, CloudFormation, Route 53, X-Ray, Step Functions, ALB, Budgets, EventBridge, CloudFront
22. May 2017: AWS Summit Mumbai
Aug 2017: Migrated all internal services to Lambda, CloudWatch Events and DynamoDB
Jan 2018: Initiated entire product on Lambda + API Gateway
Mar 2018: Delivered Holidays product 8 days early; integrated with X-Ray
Jun 2018: CloudFormation templates
Oct 2018: Fully integrated CI/CD pipeline
Feb 2019: Event Driven Design pattern (SQS, SES, Lambda Authorizer, S3)
May 2019: 500+ Microservices, 100% serverless
23. WHAT'S NEXT
We are planning to leverage the AWS Bahrain region to offer low latency for all our Middle East customers.
Today I am going to share our serverless journey with you: how it all started, along with some of the use cases that transformed the way we think and build.
Now, when we look back and compare how we did things before serverless with how we do them now, there is a clear distinction in these 5 areas.
We are able to release features more quickly.
We have more visibility without the need for third-party integrations.
We have a fully automated CI/CD pipeline.
Developers have end-to-end ownership from development to production.
And we get better proactive insights, so we can take action before things turn ugly.
So, why did it all start?
For a very simple reason: the business wants to run much faster than any engineering team can deliver.
And this got us thinking about how we do things, and how we could improve our architecture and increase our speed of delivery.
This was our architecture in 2017. Pretty standard!
Separate EC2 instances for UI and API, all inside a VPC, in Auto Scaling groups sitting behind Application Load Balancers.
Honestly, we were not able to identify any problem with this architecture. It was fine!
So we decided to go deeper and look into our Software Development Life Cycle.
We started analysing our day-to-day activities, from development to QA to deployment on production, and the kinds of challenges we faced.
In no time, we were able to identify 4 areas where most of the time was being wasted. And I am sure you will be able to relate to these problems.
Since there were too many moving parts, developers had to juggle between creating new features and making sure existing ones kept working.
Even for a small change, QA effort kept increasing, because testers had to test not only the new features BUT also other features that nobody had touched, since they were part of a single deployment package.
And this is what we called Inversed QA Efficiency. If no bugs were found while testing those untouched features, it was wasted effort, as we could have gone live without testing them.
And I am sure you have heard this many times: for any bug in production, the developer would say, "My code is running fine on my system. The deployment guy must have done something wrong."
So we decided to address these problems one by one, and microservices looked like the right approach to at least solve our first two problems. It would enable a focused development approach for developers, and QA would be required to test only the services that were modified, removing the need to test untouched features.
I am sure you are all aware of the difference between the monolithic and microservice approaches. But still, for reference, I pulled this up.
And this is the exact reference I used to explain to my engineering team why we needed to shift to a microservices architecture.
Now the challenge was: which framework to use? How do we create our microservices framework?
I was attending AWS Summit Mumbai back in 2017, and Mr. Werner Vogels was presenting the keynote. There was a eureka moment when I realized that Lambda now had support for .NET Core. Since .NET is our primary development stack, we could start testing Lambda out. This would not only solve our problem of which microservice framework to choose, but also open the door to the serverless world!
We took our initial architecture and replaced just two things: API Gateway instead of the load balancer, and an individual Lambda function for each feature instead of EC2 instances.
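To make the "one Lambda function per feature" idea concrete, here is a minimal sketch of a single-feature function behind an API Gateway Lambda proxy integration. It is written in Python for brevity, whereas the team describes working in .NET. The route, the handler name `get_package_handler`, and the `_PACKAGES` data are hypothetical and not taken from the talk.

```python
import json

# Hypothetical in-memory data standing in for a real data store.
_PACKAGES = {"dxb-3n": {"id": "dxb-3n", "title": "Dubai, 3 nights"}}


def get_package_handler(event, context):
    """Handle GET /packages/{id} (hypothetical route) for a proxy integration."""
    # pathParameters can be None when the route has no path variables.
    package_id = (event.get("pathParameters") or {}).get("id")
    package = _PACKAGES.get(package_id)
    if package is None:
        return _response(404, {"message": "package not found"})
    return _response(200, package)


def _response(status_code, body):
    # A proxy integration expects a status code, headers, and a string body.
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Each feature gets its own small function like this one, so a change to one feature can be deployed and tested without touching the others.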
BUT all this was still theory. Why? Just imagine going to the business guys and saying:
"Hey look, we have come up with a fancy new architecture, and for the next few months we won't be able to take on any new features." What do you think the answer would be?
It was around Christmas of 2017 when we were asked to revamp our entire Holidays platform, with an aggressive timeline of 3 months.
So the engineering team decided that this was the best opportunity to try out what we believed in, and to deliver this project using Lambda and API Gateway for microservices, X-Ray for application monitoring, and CloudWatch for logs and metrics.
And we delivered it ahead of schedule.
Now we had a production application using microservices and serverless.
And proof for the business that this new architecture would help increase our efficiency.
But this was just the beginning of what we could achieve.
And within 18 months we were able to completely transform all our monolithic applications to serverless and also improve on these 6 verticals.
In these 18 months, we were also able to take advantage of the event-driven design pattern using AWS services, and made our applications more robust and easier to maintain.
There are a few use cases that I'll share with you.
Online travel relies heavily on good-quality images to convince customers. In our case, at the time, we had more than 80 million images taking up over 50 TB of storage.
We had to strike a balance between when to show a high-quality image and when to show a small thumbnail or a device-specific image.
This required resizing and optimizing every single image to 4 different sizes, which would mean over 320 million additional images in different resolutions. That would require a lot of time and compute.
So we leveraged out-of-the-box features provided by AWS.
We solved the problem using this architecture. We did not need to resize every image beforehand; instead, we resized images on demand.
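The on-demand idea can be sketched as "resize on first request, then reuse the stored result". The snippet below is an illustration under stated assumptions, not the team's implementation: a plain dict stands in for the S3 bucket, the `resize` callable stands in for a real image library, and the preset names, dimensions, and key layout are all hypothetical.

```python
# Hypothetical size presets; the talk only says four resolutions were needed.
PRESETS = {
    "thumb": (150, 100),
    "small": (480, 320),
    "medium": (960, 640),
    "large": (1920, 1280),
}


def resized_key(original_key, preset):
    # Hypothetical key layout for storing resized variants next to originals.
    return f"resized/{preset}/{original_key}"


def get_image(store, original_key, preset, resize):
    """Return image bytes for a preset, resizing only on the first request.

    store  -- dict-like object standing in for an S3 bucket (key -> bytes)
    resize -- callable (image_bytes, (width, height)) -> bytes
    """
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    key = resized_key(original_key, preset)
    if key in store:
        # A previous request already produced this variant.
        return store[key]
    resized = resize(store[original_key], PRESETS[preset])
    store[key] = resized  # Save so later requests skip the resize step.
    return resized
```

With this shape, only images that are actually requested are ever resized, which avoids generating every variant of every image up front.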
In online travel, there are times when a customer's booking fails for various reasons. So we had created a flow to retry the booking a certain number of times and act accordingly.
Trust me, it is not simple, and we had written thousands of lines of code, which obviously had bugs. We had to configure schedulers to constantly monitor these scenarios.
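A queue-driven version of this retry flow can be sketched as an SQS-triggered Lambda handler. This is one possible shape and not necessarily how the team built it. It assumes the event source mapping has partial batch responses (`ReportBatchItemFailures`) enabled and that the queue has a redrive policy, so that SQS moves a message to the dead-letter queue once it has been received `maxReceiveCount` times. The `attempt_booking` and `notify_success` callables are hypothetical.

```python
import json


def make_retry_handler(attempt_booking, notify_success):
    """Build an SQS-triggered Lambda handler.

    attempt_booking -- hypothetical callable(booking: dict) -> bool
    notify_success  -- hypothetical callable(booking: dict) -> None,
                       for example a function that publishes to SNS
    """

    def handler(event, context):
        failures = []
        for record in event.get("Records", []):
            booking = json.loads(record["body"])
            try:
                succeeded = attempt_booking(booking)
            except Exception:
                succeeded = False
            if succeeded:
                notify_success(booking)
            else:
                # Reporting the message as failed leaves it on the queue, so
                # SQS redelivers it after the visibility timeout.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

The retry count and the hand-off to the dead-letter queue then live in queue configuration instead of custom scheduler code, and a separate consumer on the dead-letter queue can send the failure notification.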
This is the list of AWS services that we are currently using, and counting.