2. Principal Data Architect at Home24
Data services: Search, Recommendations, Ranking
Worked on: Here Maps, Sapo.pt, DataJet, Xing, …
Scala, Perl, Prolog, Java, SQL, R, …
AWS: Step Functions, Lambda, EMR, EC2, Batch, SQS, SNS, Firehose, Athena, API Gateway, …
5. ● 15 people of 12 nationalities
● Serverless lovers. For data ingestion we use:
● AWS technologies: Step Functions, CloudFormation, Lambda, Athena, EMR, Redshift, S3, …

                              Production                 Development
Number of Lambdas             625                        2,311
Number of Step Functions      113                        490
Consumed time (a month)       3,383,525 sec (39 days)    5,371,037 sec (62 days)
Number of requests (a month)  2,014,203                  3,300,118
6. ● The majority of our streams carry low-rate messages
● The big stream doesn't have an easily predictable message rate and can peak at 100 messages/sec
● We will have many more low-rate streams
7. Main requirements
● Store new stream data in the Raw S3 bucket
● Refine Raw S3 bucket data into a Refined S3 bucket
● Wrongly formatted messages shall not stop the flow
● A notification shall be sent on bad data
● Data must be refined in less than 10 minutes
Other
● Able to replay many days of data fast
● For development, every developer shall be able to deploy their version independently
8-11. Requirements
● Collect data from SNS
● The data must be stored as received in S3
● File sizes must be easy to process on Lambda (< 10 MB)
● At least 1 file per minute must be created
Architecture
● An SQS queue collects all data from the SNS topic
● A Lambda function copies the data from SQS to a Firehose delivery stream
● The Lambda function is invoked once a minute via a CloudWatch Event
● Firehose merges the data and creates files in the Raw S3 bucket
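The copy Lambda has to respect Firehose's PutRecordBatch limits (at most 500 records and 4 MB per call). A minimal sketch of just that batching logic, with the AWS calls left out (the function name and constants are ours, not from the slides):

```python
# Sketch: split SQS message bodies into batches that fit Firehose
# PutRecordBatch limits (500 records, 4 MB per call). The boto3 calls
# are omitted; this is only the pure grouping logic the Lambda needs.
MAX_RECORDS = 500              # PutRecordBatch record limit
MAX_BYTES = 4 * 1024 * 1024    # PutRecordBatch size limit

def batch_for_firehose(messages):
    """Group raw message bodies (bytes) into Firehose-sized batches."""
    batches, current, current_size = [], [], 0
    for body in messages:
        size = len(body)
        # Start a new batch when either limit would be exceeded.
        if current and (len(current) >= MAX_RECORDS
                        or current_size + size > MAX_BYTES):
            batches.append(current)
            current, current_size = [], 0
        current.append(body)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch would then be one `put_record_batch` call, and the corresponding SQS messages are deleted only after the call succeeds.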
13-16. Requirement
● When some messages are not processable, send a notification.
Architecture
● The data is deleted from the SQS queue after a successful copy to Firehose
● In case of error, the messages end up in the Dead-Letter Queue
● A non-empty Dead-Letter Queue means there is an error in the data
● After fixing the Lambda function, one can always copy the messages back to the raw SQS queue
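Copying messages back from the Dead-Letter Queue is a receive/send/delete loop; the invariant that matters is deleting from the DLQ only after the re-send succeeded, so a failed send never loses data. A toy sketch of that invariant over in-memory lists (the helper name and `send_ok` hook are illustrative, not from the slides):

```python
def redrive(dead_letter, raw, send_ok=lambda msg: True):
    """Move messages from the DLQ back to the raw queue.

    A message is removed from the DLQ only after it was successfully
    re-sent; messages whose send fails stay for the next attempt.
    """
    remaining = []
    for msg in dead_letter:
        if send_ok(msg):
            raw.append(msg)        # re-sent to the raw queue
        else:
            remaining.append(msg)  # stays in the DLQ
    dead_letter[:] = remaining
    return len(raw)
```

Because the refiner writes by key, re-sending a message that was already processed is harmless: the refined file is simply overwritten.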
17-21. Requirements
● Decompress data (zip, deflate, gz, base64, …)
● Normalize fields (dates, for example)
● Add metadata
● Convert everything to JSON
● Store it on S3
Architecture
● When a new file is created in the Raw S3 bucket, a message is sent to SQS via SNS
● The Refiner Lambda function is invoked once a minute via a CloudWatch Event and processes all unprocessed files
● A file with the same key as the raw file is created in the Refined S3 bucket
● Messages that fail to process end up in the Dead-Letter Queue
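The per-record refine step (decompress, normalize dates, add metadata, emit JSON) can be sketched in a few lines. The field names (`ts`, `refined_at`) and the two encodings handled here are our assumptions for illustration; the real refiner supports more formats:

```python
import base64
import gzip
import json
from datetime import datetime, timezone

def refine(record: bytes, encoding: str = "gz") -> str:
    """Decompress one raw record, normalize its date field, attach
    metadata, and return it as a JSON line."""
    if encoding == "gz":
        payload = gzip.decompress(record)
    elif encoding == "base64":
        payload = base64.b64decode(record)
    else:  # already plain
        payload = record
    doc = json.loads(payload)
    # Normalize the timestamp to ISO-8601 UTC (assuming epoch seconds in).
    doc["ts"] = datetime.fromtimestamp(doc["ts"], tz=timezone.utc).isoformat()
    # Metadata: when this record was refined.
    doc["refined_at"] = datetime.now(tz=timezone.utc).isoformat()
    return json.dumps(doc)
```

A record that raises here would not be acknowledged, so its message ends up in the Dead-Letter Queue, satisfying the "bad data must not stop the flow" requirement.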
23-26. Requirements
● Replay multiple days of data
Architecture
● A Lambda function lists the files in the Raw S3 bucket and sends messages to SQS
● Since the files in Raw and Refined have the same key, the replayed files always overwrite the existing ones
● The execution time of the Refiner Lambda rises, and the Refiner Lambdas work in parallel
Parallelism:
● Our Lambda goes up to ~190 sec, with 3 Lambdas running in parallel
● 9,198 S3 objects
● 30 GB of gzip data, at 10 GB/hour
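Replaying "multiple days" boils down to listing every object under the per-day key prefixes and re-queuing them. A sketch of the prefix generation, assuming a `yyyy/mm/dd/` key layout (the layout is our assumption; the slides don't show the key scheme):

```python
from datetime import date, timedelta

def replay_prefixes(start: date, end: date) -> list[str]:
    """Return the S3 key prefixes (one per day, inclusive) that a replay
    Lambda would list and re-queue, assuming a yyyy/mm/dd/ layout."""
    days = (end - start).days
    return [
        (start + timedelta(days=i)).strftime("%Y/%m/%d/")
        for i in range(days + 1)
    ]
```

The replay Lambda would then list each prefix in the Raw bucket and send one SQS message per object, and the overwrite-by-key property makes the whole replay idempotent.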
27-30. Requirement
● Developers shall be able to deploy their stream processors
● No interaction with an external team shall be required
Architecture
● We created an internal SNS topic where we clone the external messages
● SNS can write to multiple SQS queues
● With the same CloudFormation magic, every developer can deploy their own environment
31-34. EC2 vs Lambda

            EC2                                   Lambda
CPU/Price   1 t2.nano (5% vCPU, 500 MB):          3 seconds a minute at the highest
            0.0063 * 24 * 30 = $4.536/month       memory (2 vCPU, 1536 MB):
                                                  3*60*24*30*10 * (0.000002501 +
                                                  0.0000002) = $3.5/month
DevOps      Higher                                Low
Scale       Scales while it has credits, up to    Out of the box, up to a certain
            1 vCPU. For more vCPUs you need       level: 2 vCPU * 5 Lambdas = 10 vCPUs
            more expensive instance types or
            autoscaling

Price-wise, Lambda seems a good solution. For our problem, 10 vCPUs is clearly more than enough.
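The cost comparison above is plain arithmetic and can be checked directly, using the table's own per-100 ms Lambda price and per-request charge (actual AWS prices change over time, so treat the constants as the talk's snapshot):

```python
# EC2: one t2.nano running the whole month at the table's hourly price.
ec2_monthly = 0.0063 * 24 * 30            # dollars/month

# Lambda: 3 seconds every minute, all month, at 1536 MB.
# The *10 converts seconds into 100 ms billing units; the table's
# formula folds the per-request charge into the same factor.
seconds_per_month = 3 * 60 * 24 * 30
lambda_monthly = seconds_per_month * 10 * (0.000002501 + 0.0000002)
```

This reproduces the table's $4.536/month for EC2 and roughly $3.50/month for Lambda.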
35-37. Kinesis vs SQS

              Kinesis                             SQS
Slow stream   2 shards: $24.5/month               Requests: $2.07/month
              Puts: $0.042/month
Fast stream   3 shards: $36.7/month               Requests: $51.8/month
              Puts: $1.1/month
Errors        Errors have to be handled           Errors go to the Dead-Letter Queue
              externally

We analyzed our two types of data streams:
● Slow stream: 1 message/sec (2.6 million requests/month)
● Fast stream: 25 messages/sec (64.8 million requests/month), with spikes of 100 messages/sec
On SQS you pay for PUTs and GETs; on Kinesis you pay for PUTs (plus shard hours).
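The SQS request figures follow directly from the message rates: each message is one PUT plus (at least) one GET, at roughly $0.40 per million requests (the price we assume here; DeleteMessage calls and batching are ignored in this rough estimate, and current pricing may differ):

```python
SQS_PRICE_PER_MILLION = 0.40   # assumed dollars per 1M requests

def sqs_monthly_cost(messages_per_month: float) -> float:
    """Each message costs one SendMessage plus one ReceiveMessage."""
    requests = 2 * messages_per_month
    return requests / 1_000_000 * SQS_PRICE_PER_MILLION

slow = sqs_monthly_cost(2.6e6)    # close to the table's $2.07/month
fast = sqs_monthly_cost(64.8e6)   # close to the table's $51.8/month
```

The Kinesis side is dominated by the fixed shard-hour cost, which is why it loses to SQS at the slow stream's rate even though its per-PUT price is lower.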
38. ● You only pay for what you use
● Scalability is not an issue at our message volume (peaking at 100 messages/second)
○ SQS and Firehose can easily handle that volume of messages
○ Multiple Lambdas can work in parallel in case of high traffic or replay
● Separate Lambdas per stream make the logs easier to understand
● Separate environments simplify the developers' work
● Data is on S3, where it can be queried via Athena, EMR, Redshift Spectrum, …