In a brisk presentation, we introduce Zenko, the multi-cloud data controller. Zenko is a stack of microservices (Docker containers) deployed via either Kubernetes or Docker Swarm.
The presentation included a live demo deployment, so be sure to check out the video.
There is plenty of extra material at the end of the deck that goes deeper into topics only touched on briefly during the talk.
Docker Meetup Tokyo #23 - Zenko Open Source Multi-Cloud Data Controller - Laure Vergeron
1. DOCKER TOKYO - 2018-05-15
Zenko: the Multi-Cloud Data Controller
We enable people who create value with data by making AWS storage cheaper
Laure Vergeron
Technology Evangelist, R&D Engineer
3. Agenda
1 - What is multi-cloud?
2 - Zenko: introduction to the multi-cloud data controller
3 - Zenko Orbit: introduction to the multi-cloud management UI
4 - CloudServer standalone: a simple local S3 endpoint
5 - Zenko S3 API: basically just AWS S3 API
6 - Zenko Enterprise Edition: coming later in 2018
7 - Zenkommunity
8 - Demos!
6. ▪Content Distribution
▪ Media companies have tens of thousands of movies, which they store on a private cloud for control. When it is time to publish a movie, it makes sense to copy it to a public cloud to use its transcoding and CDN services.
▪Compute Bursting
▪ Banks have to run risk analysis leveraging thousands of CPUs every night. These intense computations only run for a few hours. Rather than keeping servers idle for the rest of the day, it makes sense to use public cloud services for the computation.
▪Analytics
▪ E-commerce companies do more and more machine learning on their very large data lakes. Rather than setting up a Hadoop infrastructure in-house, a company can copy just a data set to a Hadoop cloud, run the appropriate algorithm, retrieve the result, and destroy the cloud copy of the data to save on storage costs.
▪Long-term Archival / Cold Storage
▪ While storing regularly accessed data is cheaper in a private cloud, long-term archival of rarely accessed data is cheaper in a cloud cold-storage offering. Automatic archival of never-accessed data would save a lot of money.
Examples of Use-Cases for Multi-Cloud
12. • Single Interface to any Cloud
▪ S3 API as a single API set to any cloud
• Allow reuse in the Cloud
▪ Maintain the native cloud format
• Always know your data and where it is
▪ Metadata search
• Trigger actions based on data
▪ Data Workflow to manage replication, location
The Zenko Multi-Cloud Data Controller
14. Zenko: one endpoint, one API, one namespace, any cloud in native format
● One API: the S3 API
● One endpoint: your Zenko endpoint
● Any cloud currently supported: Google Cloud Platform, Microsoft Azure, Amazon S3, Wasabi, DigitalOcean Spaces, Scality RING, local storage
● Data stored in native format: use services native to your cloud; Zenko does not lock you in!
15. Zenko Open Source: a stack of microservices
[Architecture diagram: apps make S3 calls to the S3 API front end; metadata is aggregated in MongoDB (metadata search, bucket location); Backbeat runs the data policy engine (CRR/data movement); data lands on shared local storage or a target cloud via DMD REST/sproxyd, the AWS S3 API, or other cloud APIs.]
S3 API — single API set and 360° access to any cloud
Native format — data written through Zenko is stored in the native format of the target cloud storage and can be read directly, without going through Zenko.
Backbeat, your data workflow manager — policy-based data management engine
Leveraging MongoDB for metadata search — aggregated metadata across all clouds in a replicated MongoDB gives a search tool for optimal data insight
HA/Failover — deployed as multiple containers for resilience via metal-k8s
Simple security — single-tenant credentials managed locally
17. Zenko: Orbit management UI
● https://admin.zenko.io
● Free to use up to 1 TB of managed data
● Can either get a sandbox or register your own instance
● Available as a pay-as-you-go service for larger capacities
● No support with the free edition
20. CloudServer: What?
• S3 API served in a Docker container
• Written in Node.js
• 100% open source (Apache 2.0)
• S3 stands for Simple Storage Service. The S3 API provides a simple interface used to store objects in buckets.
• A single AWS S3 API interface to access multiple backend data stores, both on-premises and in the public cloud.
21. Open Source Scality CloudServer Adoption
▪ Launched June 2016
▪ Open-source implementation of the AWS S3 API
▪ Code available on GitHub under the Apache 2.0 license
▪ Packaged in a Docker container for easy deployment
▪ Seamless upgrade to S3 Connector for the RING
Now over 1,000,000
25. • Zenko EE is not yet available
– We are doing our first beta deployments at large American customers
– We are looking at Q1 2019 for GA
• Zenko CE is readily available, with community support
– https://forum.zenko.io
– Plenty of documentation on Read the Docs
– Send an email to your SE or to zenko@scality.com
• Zenko Orbit is available pay-as-you-go
– First TB of data managed is free
Zenko EE: COMING SOON! Just not yet...
27. Community Meetups
• Initiated prior to our S3 Server launch
• At Docker Tokyo on May 15th
• At Scality Tokyo Open Source Night on May 16th
• Participating at open source events for Docker, Node.js, etc.
Developer Hackathons
• Paris and San Francisco in 2015-2018; maybe Tokyo next?
• Co-sponsoring with partners – focused on a specific project goal (e.g., IP Drives, Backblaze integrations)
• Great for building visibility & community participation
Building a Developer Community
28. How can I get involved with Zenko?
• Let us know what you do with the Zenko stack!
▪ zenko@scality.com
▪ Get your project/company featured on the website in a quote
• Contribute tutorials
▪ Get a blog post featuring your tutorial
▪ Become part of our Read the Docs-hosted documentation
• Contribute code
▪ It's an opportunity to drive the roadmap with us!
▪ Join the team and be part of the Zenko craze!
▪ We have contributing guidelines on the GitHub repos, and we'll answer your questions via GitHub issues or our forum, forum.zenko.io
• Meet us at AWS re:Invent, DockerCon, meetups...
▪ All info is on www.zenko.io
30. Ready to join us?
• Create an account on our Forum
• http://forum.zenko.io/
• Clone Zenko and its microservices
• https://github.com/scality/Zenko/
• https://github.com/scality/S3/
• https://github.com/scality/backbeat
• Install s3cmd and AWS CLI
• Read the docs ;)
• Start with a minikube deployment
• https://github.com/scality/Zenko/blob/master/charts/minikube.md
• Reach out on the Forum as questions come up
• Try a full bare-metal-kubespray deployment
• https://github.com/scality/Zenko/tree/master/charts
31. DIY and Demo
Get your own Zenko sandbox
Deploy Zenko on Minikube and register it on Orbit
36. Zenko: one namespace, any cloud

                     Zenko local   Zenko to Public Cloud            AWS
Region / Location    Region        Public Cloud "bucket"            Region
Bucket               Bucket        Prefix in public cloud bucket    Bucket

$ aws --profile zenko s3 mb s3://remote-bucket --region aws-zenkobucket
make_bucket: s3://remote-bucket
$ aws --profile zenko s3 cp /etc/hosts s3://remote-bucket/test
upload: /etc/hosts to s3://remote-bucket/test
$ aws --profile zenko s3 ls
2018-05-14 17:08:50 remote-bucket
$ aws --profile zenko s3 ls s3://remote-bucket
2018-05-14 17:09:18 235 test
$ aws --profile aws s3 ls
2018-05-14 16:00:53 zenkobucket
$ aws --profile aws s3 ls s3://zenkobucket
PRE remote-bucket/
2018-05-14 17:09:18 235 test
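The last two listings show the namespace mapping in action: the Zenko bucket appears as a key prefix inside the single public-cloud bucket backing the location. A minimal sketch of that prefix-style mapping (an illustration of the layout shown above, not Zenko's actual code):

```python
# Illustrative only: a Zenko bucket/key pair becomes a prefixed key
# inside the public-cloud bucket that backs the Zenko location.
def to_cloud_key(zenko_bucket: str, object_key: str) -> str:
    """Map a Zenko (bucket, key) pair to the key used in the backing cloud bucket."""
    return f"{zenko_bucket}/{object_key}"

# Matches what `aws --profile aws s3 ls s3://zenkobucket` showed above:
# the Zenko bucket "remote-bucket" shows up as the prefix "remote-bucket/".
print(to_cloud_key("remote-bucket", "test"))  # -> remote-bucket/test
```

This is why the object stays readable directly in the cloud, without going through Zenko: it is a plain object under a plain prefix.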
37. Zenko: a stack of microservices
$ docker stack services zenko-prod
ID NAME MODE REPLICAS IMAGE PORTS
1j8jb41llhtm zenko-prod_s3-data replicated 1/1 zenko/cloudserver:pensieve-3 *:30010->9991/tcp
3y7vayna97bt zenko-prod_s3-front replicated 1/1 zenko/cloudserver:pensieve-3 *:30009->8000/tcp
957xksl0cbge zenko-prod_mongodb-init replicated 0/1 mongo:3.6.3-jessie
cn0v7cf2jxkb zenko-prod_queue replicated 1/1 wurstmeister/kafka:1.0.0 *:30008->9092/tcp
jjx9oabeugx1 zenko-prod_mongodb replicated 1/1 mongo:3.6.3-jessie *:30007->27017/tcp
o530bkuognu5 zenko-prod_lb global 1/1 zenko/loadbalancer:latest *:80->80/tcp
r69lgbue0o3o zenko-prod_backbeat-api replicated 1/1 zenko/backbeat:pensieve-4
ut0ssvmi10tx zenko-prod_backbeat-consumer replicated 1/1 zenko/backbeat:pensieve-4
vj2fr90qviho zenko-prod_cache replicated 1/1 redis:alpine *:30011->6379/tcp
vqmkxu7yo859 zenko-prod_quorum replicated 1/1 zookeeper:3.4.11 *:30006->2181/tcp
y7tt98x7jdl9 zenko-prod_backbeat-producer replicated 1/1 zenko/backbeat:pensieve-4
[...]
Zenko: Multi-Cloud Data Controller
• CloudServer – S3 API, multi-cloud API translation; custom Node.js
• Backbeat – event-driven data manager, replication engine; Kafka- and ZooKeeper-based
• Utapi – usage stats; Redis-based
• Bare-Metal Kubespray – custom deployment of Kubespray
39. CloudServer implements the AWS S3 Bucket Versioning API
• Create a versioned bucket (PUT Bucket Versioning) – enables the bucket to maintain object versions
• If an object with the same key is PUT or DELETEd, it becomes the current version
– A DELETE marker is used to indicate that the current version is deleted, as per AWS semantics
– Version IDs are assigned to older versions
Enables data restores
• Access to previous states/versions of an object (GET Object with a specified version ID)
Required for both the CRR & Lifecycle Management APIs
• As specified in the S3 API
• When writing to AWS S3, the target bucket must have versioning enabled!
Zenko S3 API: Bucket Versioning
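As a sketch, enabling versioning boils down to a PUT Bucket Versioning request whose XML body carries the desired status (this is what `aws s3api put-bucket-versioning --bucket B --versioning-configuration Status=Enabled` sends; the helper below is illustrative, and its name is ours):

```python
def versioning_config_body(status: str = "Enabled") -> str:
    """Build the XML body of a PUT Bucket Versioning request."""
    # Per the S3 API, Status is "Enabled" or "Suspended";
    # a bucket starts out unversioned until this call is made.
    assert status in ("Enabled", "Suspended")
    return (
        '<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
        f"<Status>{status}</Status>"
        "</VersioningConfiguration>"
    )

print(versioning_config_body("Enabled"))
```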
40. ● When versioning is enabled on a bucket:
● CREATE NEW VERSIONS:
○ Put Object, Complete Multipart Upload and Object Copy (to a versioning-enabled bucket) will
return a version id in the ‘x-amz-version-id’ response header.
○ No special syntax necessary.
● When versioning is enabled or suspended:
● TARGETING SPECIFIC VERSIONS:
○ Include the version id in the request query for GET/HEAD Object or PUT/GET Object ACL
■ Example: `GET [bucket]/[object]?versionId=[versionId]`
○ For Object Copy or Upload Copy Part, to copy a specific version from a version-enabled
bucket, add the version id to the ‘x-amz-copy-source’ header:
■ Example value: `[sourcebucket]/[sourceobject]?versionId=[versionId]`
○ Omitting the version ID returns the result for the latest (current) version.
Zenko S3 API: Bucket Versioning
41. ● When versioning is enabled or suspended (cont.):
● NULL VERSIONS:
○ Null versions are created when putting an object before versioning is configured or when
versioning is suspended.
■ Only one null version is maintained in version history.
New null versions will overwrite previous null versions.
○ Target the null version in version-specific actions by specifying the version ID ‘null’.
● DELETING OBJECTS:
○ Regular deletion of objects will create delete markers and return ‘x-amz-delete-marker’: ‘true’
and the version ID of the delete marker in ‘x-amz-version-id’ response headers.
○ Objects with delete markers as the latest version will behave as if they have been deleted when
performing non-version specific actions.
○ Permanently remove delete markers or specific versions by specifying the version ID in the
request query. Example: `DELETE [bucket]/[object]?versionId=[versionId]`
Zenko S3 API: Bucket Versioning
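The PUT/DELETE/GET semantics above can be sketched as a toy in-memory model (an illustration of the described behavior only, not CloudServer code; class and ID names are invented):

```python
import itertools

_ids = itertools.count(1)  # stand-in for real opaque version IDs

class VersionedBucket:
    """Toy model of a versioning-enabled bucket."""
    def __init__(self):
        # key -> list of (version_id, value, is_delete_marker), oldest first
        self.versions = {}

    def put(self, key, value):
        vid = f"v{next(_ids)}"
        self.versions.setdefault(key, []).append((vid, value, False))
        return vid  # surfaced as the x-amz-version-id response header

    def delete(self, key):
        # A regular DELETE adds a delete marker as the current version.
        vid = f"v{next(_ids)}"
        self.versions.setdefault(key, []).append((vid, None, True))
        return vid  # x-amz-delete-marker: true, x-amz-version-id: vid

    def get(self, key, version_id=None):
        history = self.versions.get(key, [])
        if version_id is None:
            # Non-version-specific GET: behaves as deleted if the
            # latest version is a delete marker.
            if not history or history[-1][2]:
                raise KeyError(key)
            return history[-1][1]
        for vid, value, marker in history:
            if vid == version_id and not marker:
                return value
        raise KeyError((key, version_id))

b = VersionedBucket()
v1 = b.put("movie.mp4", b"take 1")
v2 = b.put("movie.mp4", b"take 2")
b.delete("movie.mp4")                  # current version is now a delete marker
# b.get("movie.mp4") would raise, but older versions stay reachable:
print(b.get("movie.mp4", version_id=v1))  # -> b'take 1'
```

The null-version rule fits the same model: a pre-versioning PUT would occupy a single slot with the ID `null`, overwritten by later null versions.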
42. ● When versioning is enabled or suspended (cont.):
● MULTI-OBJECT DELETE:
○ Specify the version of an object to delete in the XML body of the multi-object delete request.
Example: http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html
● At any time:
● LISTING OBJECTS:
○ A regular listing will list the most recent versions of an object and ignore objects with delete
markers as their latest version.
○ To list all object versions and delete markers in a bucket, specify ‘versions’ in request query:
■ Example: `GET [bucket]?versions`
○ For more information about the output, consult the S3 Connector documentation
● GET BUCKET VERSIONING STATUS: use Get Bucket Versioning API.
Zenko S3 API: Bucket Versioning
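The version-aware multi-object delete body mentioned above can be sketched as follows (the XML shape follows the AWS document linked in the slide; the helper itself is illustrative):

```python
def multi_delete_body(objects):
    """Build the XML body of a Multi-Object Delete (POST /?delete) request.

    objects: iterable of (key, version_id) pairs; use None for version_id
    to target the latest version (which creates a delete marker).
    """
    parts = ["<Delete>"]
    for key, version_id in objects:
        parts.append(f"<Object><Key>{key}</Key>")
        if version_id is not None:
            # Naming a version ID permanently removes that version.
            parts.append(f"<VersionId>{version_id}</VersionId>")
        parts.append("</Object>")
    parts.append("</Delete>")
    return "".join(parts)

body = multi_delete_body([("movie.mp4", "v1"), ("trailer.mp4", None)])
print(body)
```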
43. ● Utapi can be accessed through a REST API, with the service available on a dedicated port
● API routes use AWS Signature Version 4 for authentication
● API calls for listing metrics can use account credentials, or an IAM user created with a policy allowing it to list metrics
● Metrics are collected in 15-minute intervals (not configurable)
● Requests for listing metrics use POST routes and require at least a start time and a list of resources (accounts/buckets/users)
● Refer to the wiki for listing metrics: https://github.com/scality/utapi#listing-metrics-with-utapi
Extended S3 API: Utapi, the UTilization API
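As an illustration of the 15-minute intervals and the POST body for listing metrics, here is a hedged sketch: the `{"buckets": [...], "timeRange": [...]}` shape is based on the utapi README linked above, so treat the exact field names as assumptions and check the wiki before relying on them.

```python
import json

INTERVAL_MS = 15 * 60 * 1000  # Utapi collects metrics in 15-minute intervals

def align_to_interval(ts_ms: int) -> int:
    """Floor a millisecond timestamp to the start of its 15-minute interval."""
    return ts_ms - ts_ms % INTERVAL_MS

def list_metrics_body(buckets, start_ms, end_ms=None):
    """Build the JSON body of a Utapi ListMetrics POST (shape assumed from the README)."""
    time_range = [align_to_interval(start_ms)]
    if end_ms is not None:
        # End of range: last millisecond of the interval containing end_ms.
        time_range.append(align_to_interval(end_ms) + INTERVAL_MS - 1)
    return json.dumps({"buckets": list(buckets), "timeRange": time_range})

print(list_metrics_body(["zenkobucket"], 1526371730000))
```

The request itself would be a SigV4-signed POST to the Utapi port, which is out of scope for this sketch.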
44. S3 buckets with an associated location
• Assigned as an optional request parameter, "LocationConstraint"
• In the PUT Bucket API command, the application can specify a location for each bucket
• Enables S3 Connector to manage buckets across multiple RINGs, for scaling or access to multiple DCs
• Enables Zenko to access multiple public clouds
Location mapping
- A configuration file manages mappings from multiple locations to multiple backends
- Defines the default location for object PUTs
- Object GET access is transparent
Zenko S3 API: Bucket Location Control
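A minimal sketch of that configuration file, assuming the CloudServer locationConfig format of this era (field names follow the CloudServer docs; the location names, endpoint, bucket, and credentials profile are placeholder values):

```json
{
  "us-east-1": {
    "type": "file",
    "legacyAwsBehavior": true,
    "details": {}
  },
  "aws-location": {
    "type": "aws_s3",
    "legacyAwsBehavior": true,
    "details": {
      "awsEndpoint": "s3.amazonaws.com",
      "bucketName": "zenkobucket",
      "bucketMatch": false,
      "credentialsProfile": "aws"
    }
  }
}
```

With a mapping like this, a bucket created with `LocationConstraint: aws-location` is backed by the named AWS bucket, while GETs remain transparent to the application.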
46. Zenko Multi-Cloud Async Replication - CRR
Remote disaster recovery (DR) for WAN environments
- Follows the AWS S3 "cross-region replication" (CRR) API
- Async bucket replication: source bucket -> target bucket
- Versioning must be enabled on both source & target
Target bucket in S3/RING
- CRR to a remote S3/RING – in the current release
CRR features:
- Full site sync
- Bucket-to-bucket sync
- Monitoring statistics (throughput, backlog, RTO/RPO)
- Failback
- CRR from one region to many others (one public cloud to several others)
50. File or object?
Why we do file:
- We know it
- Easy hierarchy
- fopen() and fclose()
- Lots of best practices
- Performance of NAS over LAN
Why we do object:
- Billions of entries
- Storage accessed over WAN
- For modern apps (REST)
- Listing large volumes
52. CloudServer tree structure cheat sheet
S3/locationConfig.json and S3/config.json - set up your own endpoints
S3/lib/server.js - your entry point into the service
Arsenal/lib/s3routes/routes/*.js - S3 route calls
S3/lib/api/{{yourAPIcommand}} - S3 API calls
S3/lib/data/wrapper.js & multipleBackendGateway.js - gateway to external clients
S3/lib/data/external/*Client.js - current clients
S3/conf/authData.json - set up your own credentials
Arsenal/lib/storage/metadata/* - check how metadata works
53. These commands assume you have S3 cloned locally, s3cmd configured for your S3 server, the AWS CLI configured for a real AWS bucket, and your locationConfig set up
- START SERVER:
S3BACKEND=mem S3DATA=multiple npm start
- MAKE BUCKET:
s3cmd mb s3://[bucket-name]
- PUT OBJECT TO A SPECIFIC LOCATION:
s3cmd put [/path/to/file] s3://[bucket-name]/[object-name] --add-header x-amz-meta-scal-location-constraint:'[location-name]'
- LIST OBJECTS IN BUCKET:
s3cmd ls s3://[bucket-name]
- GET S3 OBJECT METADATA:
s3cmd info s3://[bucket-name]/[object-name]
- IF PUT TO AWS, LIST OBJECTS ON AWS:
aws s3api list-objects --bucket [bucket-name]
Start S3 Server & Put Object Commands