The Role of AWS in the Data Landscape of a Fast-Growing Startup
September 2020
2. Cluno is “your car” without the hassle of really owning it
CLUNO IS YOUR ONE-STOP SHOP.
Your monthly subscription fee includes everything.*
Just drive – Cluno takes care of everything else.
* The following services are included in the monthly package price: car registration, liability insurance, partial and fully comprehensive insurance with deductible, car tax, GEZ fees, maintenance, winter-ready tires, inspections and general inspections. The monthly package price does not include: fuel, electricity, AdBlue, windscreen washer fluid, motor oil.
3. Cluno gives you the freedom to walk away anytime
ALWAYS DRIVE THE CAR THAT FITS YOUR DAY-TO-DAY LIFE.
Cluno is highly flexible: Drive as long as you want with a 6-month minimum term per car.
4. Cluno turns car ownership into a superior, digital user experience
CLUNO IS YOUR LIFETIME MOBILITY COMPANION.
Subscribe in 3 minutes: Get approved once (ID and solvency check) and sign via the app or web.
5. Importance of a data platform at an early stage of the company
Challenge
High complexity through
- Many services
- Flexibility for the customer
- Digital experience
Requirements for growing fast and successfully
- Making the right decisions
- Acting and reacting fast
- Learning fast
Solution
Data culture
- Data-driven decisions
- A data platform that gives insights, supports data-driven decisions and automates complex processes through data products
Requirements for the data platform
- Scalable: more data points, data sources and consumers
- Flexible: additional use cases and requirements
- Low initial costs in the early startup phase
- Quick value
6. Initial thoughts on Architecture
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: topic, queue, data lake, BI service, endpoint, AI, self service.]
7. Step 0: Most pragmatic spike
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: SQL DB, CSV files, manual download, manual import, Excel tables, PowerPoint slides, observing.]
8. Step 1: Automate ingest and storage of data
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: SQL DB, AWS Lambda, Amazon S3, CSV files, manual import, PowerPoint slides, observing.]
9. Step 1: Automate ingest and storage of data
Ingest, Store, Serve
WHAT
- Lambdas import a full snapshot of the data source once per day
- S3 as storage for raw CSV files and as the source for further analysis
WHY
- Lambda
  - Managed service (fast start)
  - Pay as you use (free tier)
- S3
  - Low cost
  - Batch storage
  - File-based external tables (enabled by the Glue metastore)
- CSV
  - Lowest development time
RESULT
- Automated import
- Single data source
- Faster analysis
- More time for the analyst, which can be used for building reports
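The daily snapshot import described above could look roughly like the following sketch. It is not Cluno's implementation: the key layout (`raw/<source>/snapshot_date=...`), the function names and the way rows are obtained are all assumptions made for illustration. The only AWS call assumed is `put_object(Bucket=..., Key=..., Body=...)`, as offered by the boto3 S3 client, and the client is passed in so the logic can be tried without AWS access.

```python
import csv
import io
from datetime import date


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows into CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def snapshot_key(source, day):
    """Build a date-partitioned object key (hypothetical layout) per snapshot."""
    return f"raw/{source}/snapshot_date={day.isoformat()}/{source}.csv"


def import_snapshot(s3_client, bucket, source, rows, fieldnames, day=None):
    """Write one full snapshot of a data source to S3 as a raw CSV file.

    s3_client is any object exposing put_object(Bucket=..., Key=..., Body=...),
    for example boto3.client("s3") when running inside a Lambda function.
    """
    day = day or date.today()
    key = snapshot_key(source, day)
    body = rows_to_csv(rows, fieldnames).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

A scheduled trigger that fires once per day would call `import_snapshot` with the rows read from the source database. Keeping one object per day means earlier snapshots stay available for later analysis.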
[Diagram legend: newly added services, data platform]
10. Step 2: Accessibility for other users
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: SQL DB, CSV files, manual import, AWS Lambda, Amazon S3, AWS Glue, Amazon Athena, Amazon QuickSight, observing.]
11. Step 2: Accessibility for other users
Ingest, Store, Serve
WHAT
- Parquet files saved in S3 instead of raw CSV files
- Glue used as metadata storage
- Athena as SQL interface
- QuickSight as dashboarding tool
WHY
- Parquet
  - Structured format, quick to develop with
  - Memory efficient (columnar storage)
- Glue + Athena
  - Automatic schema definition
  - Fast setup (serverless, SQL connector)
- QuickSight
  - Pay as you use
  - Fast setup (connector to Athena)
  - In-memory optimized calculation engine (SPICE)
RESULT
- Further automation (schema, dashboards)
- Faster results (dashboards update every morning)
- More time for the analyst, which can be used for deeper analysis
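To make the Glue + Athena point concrete, the sketch below builds the kind of table definition that makes Parquet files in S3 queryable through Athena. The slides describe this schema as being defined automatically, so a hand-written statement like this is only an illustration of what ends up in the metastore. The database, table, column names and bucket path are hypothetical.

```python
def parquet_table_ddl(database, table, columns, location):
    """Return Athena DDL for an external table over Parquet files in S3.

    columns maps column name -> Athena type, e.g. {"id": "bigint"}.
    Identifiers are assumed to be plain names that need no quoting.
    """
    # Athena expects the LOCATION to be a folder-style prefix.
    if not location.endswith("/"):
        location += "/"
    column_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(parquet_table_ddl(
        "analytics",
        "cars",
        {"id": "bigint", "model": "string"},
        "s3://example-bucket/parquet/cars",
    ))
```

Because the table is external, dropping or recreating it leaves the Parquet files in S3 untouched. Only the metadata changes.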
12. Step 3: Improve accessibility for other users
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: SQL DB, AWS Lambda, Amazon S3, AWS Glue, Amazon Athena, Amazon Athena view, Amazon API Gateway (automated request), Tableau (main dashboards), observing.]
13. Step 3: Improve accessibility for other users
Ingest, Store, Serve
WHAT
- Views created on top of Athena tables
- Tableau instead of QuickSight
- API Gateway as interface for data consumers
WHY
- Views
  - Analysts can own views
  - Analysts know the required business logic better
- Tableau
  - Split of metadata and data (replacing datasets, sharing calculated fields)
  - Visualization possibilities (personalization to the CI, wider range of diagrams)
  - Bigger community
- API Gateway
  - Machine-readable insights from the external network to the network of data sources
RESULT
- Datasets and insights generated closer to the teams where the business knowledge is
- Scalable architecture for creating dashboards
- Automated processes (for the website or internal tools)
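The API Gateway interface for automated requests can be pictured as a small Lambda function behind a proxy integration. The following is a minimal sketch under assumptions: the `metric` query parameter, the `make_handler` helper and the injected `lookup` callable are hypothetical, and in a real setup `lookup` would query an Athena view. The event field `queryStringParameters` and the response shape (`statusCode`, `headers`, `body` as a string) follow the Lambda proxy integration format.

```python
import json


def _response(status, payload):
    """Shape a dict the way API Gateway's Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def make_handler(lookup):
    """Build a Lambda handler that serves one metric per request.

    lookup is a callable metric_name -> value or None (hypothetical); it
    stands in for a query against a view owned by the analysts.
    """
    def handler(event, context):
        # API Gateway sends None here when the request has no query string.
        params = event.get("queryStringParameters") or {}
        metric = params.get("metric")
        if not metric:
            return _response(400, {"error": "missing 'metric' query parameter"})
        value = lookup(metric)
        if value is None:
            return _response(404, {"error": f"unknown metric '{metric}'"})
        return _response(200, {"metric": metric, "value": value})

    return handler
```

Keeping the business logic in the view and the handler thin matches the ownership split on this slide: analysts change the view, while the endpoint contract stays stable for consumers.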
14. Step 4: Getting realtime input
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: SQL DB, Amazon SNS, Amazon SQS, AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Firehose, Amazon S3, AWS Glue, Amazon Athena, Amazon Athena view, Amazon API Gateway (automated request), Tableau (main dashboards), observing.]
15. Step 4: Getting realtime input
Ingest, Store, Serve
WHAT
- SNS/SQS as event hub
- Lambda as node and for deduplication
- DynamoDB with most recent records
- Kinesis Data Firehose as batch stream to save change events in S3
WHY
- SNS/SQS
  - One producer to many consumers
  - Serverless
  - Scalable
- Lambda
  - One node instead of multiple queues
  - Logic can be included (deduplication)
- Kinesis Data Firehose
  - Collects data until it is written to an S3 file
- DynamoDB
  - No major adjustments to the data structure needed (similar to Athena)
RESULT
- Realtime DB can be used by existing APIs for realtime data products
- Basis for more granular data points (changes of dimensions)
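The deduplication step in the Lambda node can be sketched as follows. This is an illustration under assumptions, not the original logic: the event fields `id` and `version` are hypothetical, and a plain dict stands in for the DynamoDB table holding the most recent records. The unwrapping step relies on the standard behaviour that an SQS queue subscribed to an SNS topic (without raw message delivery) receives a JSON envelope whose `Message` field carries the original payload as a string.

```python
import json


def unwrap_sns_from_sqs(record):
    """Extract the producer's payload from an SQS record fed by an SNS topic."""
    envelope = json.loads(record["body"])
    return json.loads(envelope["Message"])


def apply_events(latest, events):
    """Keep only the most recent version of each entity and drop duplicates.

    latest maps entity id -> event dict and stands in for the DynamoDB table.
    Returns the events that changed the state, i.e. the ones worth
    forwarding to the batch stream that lands in S3.
    """
    accepted = []
    for event in events:
        current = latest.get(event["id"])
        # Duplicate or outdated delivery: the stored version is as new or newer.
        if current is not None and current["version"] >= event["version"]:
            continue
        latest[event["id"]] = event
        accepted.append(event)
    return accepted


def handler(event, context, latest):
    """Lambda-style entry point: unwrap a batch of SQS records and deduplicate."""
    payloads = [unwrap_sns_from_sqs(record) for record in event["Records"]]
    return apply_events(latest, payloads)
```

Comparing versions makes the handler idempotent, which matters because standard SQS queues deliver messages at least once, so the same event can arrive more than once.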
16. Step 5: Redefine storing and serving layer
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: SQL DB, Amazon SNS, Amazon SQS, AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Firehose, Amazon S3, AWS Glue, Amazon Athena, Amazon Athena view, Amazon API Gateway, packaged code, Tableau (main dashboards), Tableau (self service).]
17. Step 5: Redefine storing and serving layer
Ingest, Store, Serve
WHAT
- Metrics are calculated iteratively and dimensions are historized with realtime events
- Other teams are enabled to create their own Tableau dashboards
- Packaged code provided instead of an API with logic
WHY
- Events as source for metrics
  - Scalable
  - Realtime history
  - More granular information about changes
- Tableau enablement
  - Data resources are no longer a bottleneck
- Packaged code
  - Ownership of the expert domain only
RESULT
- More granular insights
- Faster decisions (realtime and self service)
- More stable data product environment
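Calculating metrics iteratively and historizing dimensions from events can be illustrated with a small state object. This is a sketch under assumptions: the event fields `car_id`, `status` and `timestamp`, the status value `"active"` and the metric itself are hypothetical examples, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriptionState:
    """Running metric plus a history of dimension changes, built from events."""

    active_count: int = 0
    # car_id -> list of (timestamp, status) entries, oldest first
    status_history: dict = field(default_factory=dict)

    def apply(self, event):
        """Update the state with one change event instead of a full recount."""
        history = self.status_history.setdefault(event["car_id"], [])
        previous = history[-1][1] if history else None
        new = event["status"]
        if new == previous:
            return  # no change, nothing to historize
        if new == "active":
            self.active_count += 1
        elif previous == "active":
            self.active_count -= 1
        history.append((event["timestamp"], new))
```

Each event adjusts the metric by a constant amount of work, so the cost no longer grows with the size of a daily snapshot, and the per-entity history records when a dimension changed rather than only its latest value.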
18. Next step: Data platform as service platform
[Diagram: Data Producers → Data Platform (Ingest, Store, Serve) → Technical Consumers and Business Consumers. Components shown: SQL DB, Amazon SNS, Amazon SQS, AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Firehose, Amazon S3, AWS Glue, Amazon Athena, Amazon Athena view, Amazon API Gateway, packaged code, Tableau (main dashboards), Tableau (self service). Additional labels: owner, business logic, analysts, independency, data engineers.]
19. Lessons learned
Priority
- The earlier the stage in an area, the more important quick value is
- The later the stage, the more important a clean architecture is to stay flexible and scalable
→ Constantly switch between generating new features and structuring the architecture around them
Toolset
- You constantly need to reevaluate your chosen toolset
- It can make sense to implement a tool knowing that you will decommission it later
- Keep in mind that tools will change
→ Don't commit too much to one tool
Team setup
- Move from a centralised team of experts to cross-functional teams as soon as team size and maturity are big enough
- Move ownership to business teams as soon as possible
→ Ownership needs to be where the business knowledge is
20. JOIN US FOR THE RIDE
Max Ehrlich
Head of Data
max.ehrlich@cluno.com
Cluno GmbH
www.cluno.com