See how the Otto.de team built a scalable and resilient logging solution and how they’re scaling Logstash, addressing housekeeping for Elasticsearch, and collecting usage metrics for analytics and billing.
3. Elastic{ON} Tour: Frankfurt 2018
OTTO – Number 1 in Fashion & Lifestyle*
07.11.18 3
*B2C mail order, GfK figures 2014
6,800 brands (in-house and many premium third-party brands)
Over 2.8 million items online
Huge product portfolio from fashion and lifestyle to household appliances and multimedia, DIY, kitchens, furniture and toys
120 specialist catalogues
The only ‘big book’ company to make the jump into the digital world
6 specialist online shops
4. Elastic{ON} Tour: Frankfurt 2018
Business Domains Mirror the System Architecture
Page Assembly
Tesla
ShopOffice
AfterSales
Search
P13N
Order
User
Authentication
Tracking
and many more
Code ownership results in high code quality.
Continuous Delivery permits more than 800 deployments per week.
Verticals develop and test features fast and independently.
(schematic representation)
The business phases of the order process frame the distributed, parallel development. Mirroring this phase model in the technical architecture allows the highest possible flexibility in developing business concepts.
5. Elastic{ON} Tour: Frankfurt 2018
Step by Step Fragmentation of otto.de
Dedicated Monolith Vertical Architecture Micro Services & „Cloud Readiness“
[Figure residue: the label "Index" repeated six times]
Serverless & Cloud
2011: df /var/log
2013: ~3TB data
2015: ~17TB data
2018: up to 42TB data
& splunk>
6. Elastic{ON} Tour: Frankfurt 2018
DISTRIBUTED LOGGING @ AWS
a.k.a. „How to enable tenants to log data“
7. Elastic{ON} Tour: Frankfurt 2018
Requirements
• Security
• Encryption (at rest, in transit)
• Authentication & Authorization
• Isolation of resources („multi-tenancy“)
• Accessing other verticals' logs
=> Rethink classic operations model, become a service provider
8. Elastic{ON} Tour: Frankfurt 2018
Core Principles
• Multi-tenancy
• Shared responsibility
• Security by design
• Automation (Goal: Provision the logging platform during lunch break)
10. Elastic{ON} Tour: Frankfurt 2018
Challenges
• AWS Cross Account actions
• (Near-)Realtime processing of logs
• Processing multiple data formats (JSON, Syslog ...)
• Queueing input data for failure scenarios
• Autoscaling Logstash
• Automation of Elasticsearch cluster management
• Keeping up with new features in the Elastic stack
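The "multiple data formats" challenge can be illustrated with a minimal sketch that tries JSON first and falls back to a simplified syslog pattern. This is not the team's Logstash pipeline; the function name `parse_line`, the output field names, and the simplified RFC 3164-style regular expression are assumptions made for illustration.

```python
import json
import re

# Simplified RFC 3164-style pattern (assumption): "<PRI>Mmm dd hh:mm:ss host message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> dict:
    """Classify one raw log line as JSON, syslog, or unknown (hypothetical helper)."""
    line = line.strip()

    # 1) JSON objects are detected by attempting to decode them.
    if line.startswith("{"):
        try:
            doc = json.loads(line)
            if isinstance(doc, dict):
                return {"format": "json", "fields": doc}
        except json.JSONDecodeError:
            pass  # not valid JSON, fall through to the next format

    # 2) Syslog: the priority value encodes facility * 8 + severity.
    match = SYSLOG_RE.match(line)
    if match:
        pri = int(match.group("pri"))
        return {
            "format": "syslog",
            "facility": pri // 8,
            "severity": pri % 8,
            "timestamp": match.group("timestamp"),
            "host": match.group("host"),
            "message": match.group("message"),
        }

    # 3) Anything else is kept as raw text so no event is dropped.
    return {"format": "unknown", "message": line}
```

Keeping an explicit "unknown" branch means unparseable lines are still forwarded and can be inspected later instead of being lost.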
11. Elastic{ON} Tour: Frankfurt 2018
ELASTIC CLOUD ENTERPRISE
a.k.a. „How to provision tons of Elasticsearch and Kibana clusters“
12. Elastic{ON} Tour: Frankfurt 2018
How Elastic Cloud Enterprise Helped Us
• No need to build a custom provisioning service for Elasticsearch and Kibana clusters
• Provides security via Elastic features (authentication, authorization, integration with LDAP)
• Supports multiple Elastic stack versions
• Easy to set up a basic installation
• Customizable (stack packs, underlying EC2 instance)
• Extensive API
• Updates do not require downtime
• Multi-tenancy
18. Elastic{ON} Tour: Frankfurt 2018
Autoscaling Logstash
• Autoscale between 30 and 110 Logstash containers in total
• Scaling based on demand and cluster size
• Manual intervention possible
• Absorbing unexpected peaks
• Stopping ingest in failure scenarios
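The scaling rules above can be sketched as a small decision function. Only the bounds of 30 and 110 containers come from the slide; how demand and cluster size are measured is not stated, so the parameters `events_per_second`, `events_per_container` and `cluster_count`, and the rule "at least one container per cluster", are assumptions for illustration.

```python
import math
from typing import Optional

MIN_CONTAINERS = 30   # lower bound from the slide
MAX_CONTAINERS = 110  # upper bound from the slide


def desired_containers(
    events_per_second: float,
    events_per_container: float,
    cluster_count: int,
    manual_override: Optional[int] = None,
    ingest_stopped: bool = False,
) -> int:
    """Return the target number of Logstash containers (hypothetical policy)."""
    # Manual intervention: an operator stops ingest entirely in a failure scenario.
    if ingest_stopped:
        return 0

    if manual_override is not None:
        # Manual intervention: an operator pins a size to absorb an unexpected peak.
        target = manual_override
    else:
        # Demand-based size, but never fewer containers than clusters (assumption).
        by_demand = math.ceil(events_per_second / events_per_container)
        target = max(by_demand, cluster_count)

    # Clamp to the autoscaling range.
    return max(MIN_CONTAINERS, min(MAX_CONTAINERS, target))
```

In this sketch a manual override is still clamped to the autoscaling range, while stopping ingest is the only path that goes below the minimum.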
20. Elastic{ON} Tour: Frankfurt 2018
Curator as a Service
• Verticals provide configuration in Git
• Synchronized to S3
• Master-worker architecture based on AWS Lambda, CloudWatch and SQS
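The master-worker flow can be sketched locally without any AWS dependency. In this sketch a plain dictionary stands in for the S3 bucket holding the verticals' configurations, `queue.Queue` stands in for SQS, and the functions `master`, `worker` and the `run_curator` callback are hypothetical names; the real implementation runs as scheduled Lambda functions and is not shown in the slides.

```python
import queue
from typing import Callable, Dict, List, Tuple


def master(config_store: Dict[str, dict], jobs: "queue.Queue[dict]") -> int:
    """Enqueue one housekeeping job per stored configuration; return the job count."""
    count = 0
    for key in sorted(config_store):
        jobs.put({"config_key": key})
        count += 1
    return count


def worker(
    config_store: Dict[str, dict],
    jobs: "queue.Queue[dict]",
    run_curator: Callable[[dict], str],
) -> List[Tuple[str, str]]:
    """Drain the queue and run the housekeeping callback for each job."""
    results = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break  # no work left for this worker invocation
        key = job["config_key"]
        results.append((key, run_curator(config_store[key])))
    return results
```

Splitting the work into one job per configuration keeps each worker run short and lets a failed job be retried without repeating the others.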
21. Elastic{ON} Tour: Frankfurt 2018
Challenges
• Verticals are responsible for Curator configuration
• Lack of knowledge (both Curator and Elasticsearch)
• Monitoring Curator
• Tracking origins of failed runs (misconfiguration, internal failure)
• Transparency for verticals
• AWS Lambda limits (max. 3 minutes runtime)
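One way to track the origin of a failed run is to validate the configuration before executing it, so that configuration problems and internal failures are reported separately. The sketch below is an assumption about how this could look, not the team's implementation; `validate_config`, `run_and_classify` and the status strings are hypothetical, and the only Curator detail relied on is that an action file has a top-level `actions` mapping.

```python
from typing import Callable, Tuple


class ConfigError(Exception):
    """Raised when a vertical's Curator configuration is unusable."""


def validate_config(config: object) -> None:
    """Minimal structural check: a mapping with a non-empty 'actions' mapping."""
    if not isinstance(config, dict):
        raise ConfigError("configuration must be a mapping")
    actions = config.get("actions")
    if not isinstance(actions, dict) or not actions:
        raise ConfigError("configuration needs a non-empty 'actions' mapping")


def run_and_classify(config: object, execute: Callable[[dict], None]) -> Tuple[str, str]:
    """Run one housekeeping job and return (status, detail)."""
    try:
        validate_config(config)
    except ConfigError as exc:
        # The vertical owns the configuration, so this is reported back to them.
        return ("misconfiguration", str(exc))
    try:
        execute(config)
    except Exception as exc:  # anything after validation counts as a platform failure
        return ("internal_failure", f"{type(exc).__name__}: {exc}")
    return ("success", "")
```

Reporting the status per run gives verticals the transparency the slide asks for, since they can see whether a failure is theirs to fix.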
23. Elastic{ON} Tour: Frankfurt 2018
Learnings
• Share Elasticsearch knowledge with the teams
• Create a sustainable knowledge base
• Automation is essential
• Reduce operational overhead
• Have time to develop and introduce new features
• Know your I/O limits and requirements
• Scaling Logstash is not a trivial task
25. Elastic{ON} Tour: Frankfurt 2018
Next steps
• Develop custom Logstash pipeline management solution
• Move Housekeeping workers from AWS Lambda to AWS Fargate
• Evaluate Index Lifecycle Management via Elasticsearch
• Upgrade to ECE 2.0
• Leverage potential of ECE & Elastic features
• Tenants use and know about Machine Learning & APM
• ECE 2.0 features
• When available: cross-cluster search
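For the Index Lifecycle Management evaluation, the sketch below builds a policy body of the shape accepted by Elasticsearch's `PUT _ilm/policy/<name>` API: roll over the write index in the hot phase and delete indices after a retention period. The helper name `build_ilm_policy` and the example values are assumptions; the slides do not state any retention settings.

```python
import json


def build_ilm_policy(rollover_max_age: str, rollover_max_size: str, delete_after: str) -> dict:
    """Build an ILM policy body (hypothetical helper, illustrative values only)."""
    return {
        "policy": {
            "phases": {
                # Hot phase: roll over to a new index when either limit is reached.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_size": rollover_max_size,
                        }
                    }
                },
                # Delete phase: remove the index once it is older than delete_after.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Example values are placeholders, not settings from the talk.
    print(json.dumps(build_ilm_policy("1d", "50gb", "30d"), indent=2))
```

A policy like this would cover the delete-by-age housekeeping that verticals otherwise express in their Curator configuration.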
30. K2 - Cloud Readiness 08.05.2018 30
Autonomous and team-specific utilization of technology.
Decoupling of interfaces and system components reduces architecture complexity and interdependences, especially with service components.
Perimeter protection is replaced by an integrated security concept, reducing vulnerability and allowing customised implementations.
Cloud migration allows new sourcing and scaling models.