DataStax Enterprise - The Database Platform for the Cloud
1. DataStax Enterprise
The Database Platform for the Cloud
Victor Coustenoble Matinée Cloud / Google, WeScale, DataStax, WattGo
Solutions Engineer 29/05/15
victor.coustenoble@datastax.com
@vizanalytics
2. Agenda
Confidential
• Introduction
• What is a “Cloud” database?
• Apache Cassandra and DataStax Enterprise
• Google Cloud Platform and DataStax
• Use cases
3. DataStax
Founded in April 2010
Santa Clara, Austin, New York, London, Paris, Sydney
400+ Employees
~35 Percent
500+ Customers
19. But my RDBMS can do that, can't it?
More scale-up than scale-out
Master/slave replication
Complexity of data distribution and administration
…
20. Which “Cloud” database?
Two main choices:
• Use a database offered by the cloud provider
Advantage: black box, no administration (really?)
Drawback: black box, cost, performance, strong lock-in to the provider
• Deploy your own database (Cassandra, for example)
Advantage: no lock-in to the provider, performance tuning, lower cost, hybrid installation, wider adoption
Drawback: administration
25. Cloud & Hybrid Cloud
• DataStax Enterprise and Cassandra are available in multi-data center configurations and in the cloud (Amazon AWS, Google Cloud and Microsoft Azure)
• Data written to any node is also automatically and transparently replicated to the nodes in the other data centers, without ETL
Data Centre 1
Data Centre 2
Public Cloud
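A minimal sketch of how the multi-data center replication on this slide is typically declared: a keyspace that uses Cassandra's NetworkTopologyStrategy with one replica count per data center. The helper name create_keyspace_cql, the keyspace name portfolio, and the data center names dc1, dc2 and cloud are assumptions for illustration; real data center names must match the ones reported by the cluster's snitch.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with per-data-center replica counts.

    Hypothetical helper: it only formats a CQL string, it does not talk
    to a cluster.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        if replica_count < 1:
            raise ValueError("replica count must be at least 1")
        options.append("'%s': %d" % (dc_name, replica_count))

    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + ", ".join(options) + "};"
    )


# Two on-premises data centers plus one public cloud data center (assumed names).
statement = create_keyspace_cql("portfolio", {"dc1": 3, "dc2": 3, "cloud": 2})
print(statement)
```

Once such a keyspace exists, every write to a table in it is replicated to the listed data centers by Cassandra itself, which is what makes a separate ETL step unnecessary.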
31. DataStax Enterprise
Robustness and Support for Production
Power for Development
24/7 support
Certified Cassandra
Advanced administration
Advanced security
Search
Analytics
32. Analytics with DataStax Enterprise
• Resource isolation for different use cases: OLTP, Search, Analytics
Cassandra is made for this!
• You can create isolated, virtual data centers optimized for your needs: different workloads, hardware, availability, etc.
• Cassandra replicates the data for you, without ETL
Replication
Cassandra
Operational Application
Analytics
33. Reminder: Key Attributes of a Cloud Database
• Transparent Elasticity
• Transparent Scalability
• High Availability
• Simple Data Distribution
• Data Redundancy
• Support for Multiple Data Types
• Simple to Administer
• Support for Multiple Infrastructures
• Security
39. Support for Multiple Data Types
• Cassandra's data model (based on Google Bigtable) brings storage flexibility
• DataStax multi-model strategy: support for JSON and a graph data model coming soon (via the Aurelius/TitanDB acquisition)
ID Name SSN DOB
Portfolio Keyspace
Customer Table
40. Simple to Administer
• OpsCenter and the tools supplied by the various cloud providers allow fast installation and configuration.
• Everything can be administered and monitored through a web application or through REST APIs.
• Automated administration and monitoring (performance) services, alerts, backup/restore, PITR …
• Secure access
41. Support for Multiple Infrastructures
Cassandra is supported by the major cloud providers and operating systems.
42. Security
• Standard security features: authentication, authorization, encryption in flight
• Support for advanced features: LDAP, Kerberos, on-disk encryption, audit trails
47. DataStax & Google Cloud Platform
= Ferrari & The Autobahn
Blazing Fast Performance
"In the past we've had high performance and high throughput options ... for our
network attached persistent disk (a great offering). But, sometimes you need to
take it up a notch ... and you want to have access to local flash for your
application, especially if you are doing something like a high
performance Cassandra Cluster. And the way that works on Cloud
Platform is somewhat unique. You can take any standard VM and attach flash to that."
- Navneet Joneja, Product Manager, Google Cloud Platform at Google Cloud Platform
Live 2014, November 2014, San Francisco
52. The Smart Way to Manage Sensors for Energy and Cost Savings
Thousands of sensors on rooftop machines in commercial buildings
Poor performance and limited scale with legacy technologies
Live in production in ~2 months ingesting, normalizing, and analyzing time-series sensor data
Reduced TCO by over 67% compared to relational systems
Linear scale, 100% uptime with DataStax Enterprise and Google Cloud Platform
Use Case: Internet of Things
54. More information
• http://www.datastax.com/deliver-blazingly-fast-online-applications-with-apache-cassandra-on-google-cloud-platform
• Cassandra Performance Benchmark
http://planetcassandra.org/blog/post/cassandra-performance-benchmark-aws-google-compute-engine-rackspace-cloud/
• Cassandra Hits One Million Writes Per Second on Google Compute Engine
http://googlecloudplatform.blogspot.co.uk/2014/03/cassandra-hits-one-million-writes-per-second-on-google-compute-engine.html
• DataStax http://www.datastax.com
• Getting Started http://www.datastax.com/documentation/gettingstarted/
• Training http://www.datastax.com/what-we-offer/products-services/training
• Downloads http://www.datastax.com/download
• Documentation http://www.datastax.com/docs
• Developer Blog http://www.datastax.com/dev/blog
• Academy https://academy.datastax.com
• Community Site http://planetcassandra.org
Key Takeaway-
Introduce the company, our incredible growth and global presence, that we are in about 25% of the FORTUNE 100, and the fact that many of the online and mobile applications you already use every day are actually built on DataStax.
Talk Track-
DataStax, the leading distributed database technology, delivers Apache Cassandra to the world’s most innovative companies such as Netflix, Rackspace, Pearson Education and Constant Contact. DataStax is built to be agile, always-on, and predictably scalable to any size.
We were founded in April 2010, so we are a little over 4 years old. We are headquartered in Santa Clara, California, and have offices in Austin, Texas; New York; London, England; and Sydney, Australia. We now have over 330 employees; this number will reach well over 400 by the end of our fiscal year (Jan 31 2015) and double by the end of FY16.
Currently 25% of the Fortune 100 use us. Our success has been built on our customers' success, and today we have over 500 customers worldwide, in over 40 countries. The logos you see here are ones that you are already using every day.
These applications are all built on DataStax and Apache Cassandra.
So how have we come so far in such a short time?
In addition, a clustered database configuration should allow for some sort of easy load balancing so that an even distribution of the total workload is experienced.
Cassandra is designed to handle big data workloads across multiple data centers with no single point of failure, providing enterprises with continuous availability without compromising performance.
It uses aspects of Dynamo's partitioning and replication and a log-structured data model similar to Bigtable’s.
It takes its distribution algorithm from Dynamo and its data model from Bigtable.
Cassandra is a reinvented database that is lightning fast and always on, ideal for today's online applications where relational databases like Oracle can’t keep up.
This means that, in today's world, Cassandra stores and processes real-time information with fast, predictable performance and built-in fault tolerance.
Replacing nodes, upgrading nodes
Talk about consistency levels
e.g. ALL, LOCAL_QUORUM
Automatically drop down to a weaker consistency, etc.
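A small sketch of the arithmetic behind these consistency levels. A quorum is a majority of the replicas, and LOCAL_QUORUM applies the same majority rule to the replicas in the coordinator's own data center only. The function names below are hypothetical and the replication factors are example values, not figures from the deck.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a quorum (a strict majority)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while quorum operations still succeed."""
    return replication_factor - quorum(replication_factor)


rf = 3  # assumed replication factor in the local data center
print("LOCAL_QUORUM needs", quorum(rf), "of", rf, "local replicas")
print("quorum reads + quorum writes overlap:",
      reads_see_latest_write(quorum(rf), quorum(rf), rf))
print("replicas that may be down:", tolerated_replica_failures(rf))
```

With ALL, every replica must answer, so a single unavailable replica fails the request; that is the trade-off that makes falling back to a weaker level attractive when nodes are down.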
Masterless with no single point of failure - all nodes are equal and can perform all functions.
Out of the box Cassandra is datacenter and rack aware. It will attempt to have replica data placed on nodes in different racks.
Out of the box replication of data between data centres.
Out of the box active/active across multiple data centres.
Out of the box support for hybrid cloud deployments.
Cassandra clusters can be set up and used to achieve zero RPO, i.e. zero data loss on failure.
No outage required for upgrades
No outage required for capacity expansion/reduction.
Cassandra was architected from the outset to be completely masterless, rack aware and deployed across multiple data centres.
High availability in Cassandra is a core part of its design and architecture and is one of the most compelling reasons to use Cassandra.
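To make the rack awareness mentioned in these notes concrete, here is a deliberately simplified sketch of rack-aware replica selection: walk the token ring from the node that owns the data and prefer nodes on racks not used yet. This is an illustration of the idea only, not Cassandra's actual placement code; the function name place_replicas and the node and rack names are assumptions.

```python
from typing import List, Tuple


def place_replicas(ring: List[Tuple[str, str]], start: int, rf: int) -> List[str]:
    """Pick rf nodes walking the ring from `start`, preferring distinct racks.

    `ring` is a list of (node, rack) pairs in token order. Nodes on an
    already-used rack are skipped at first and only used as a fallback
    when there are fewer racks than replicas.
    """
    if rf > len(ring):
        raise ValueError("not enough nodes for the requested replica count")

    chosen: List[str] = []
    skipped: List[str] = []
    racks_used = set()
    for offset in range(len(ring)):
        node, rack = ring[(start + offset) % len(ring)]
        if rack in racks_used:
            skipped.append(node)
        else:
            chosen.append(node)
            racks_used.add(rack)
        if len(chosen) == rf:
            return chosen

    # Fewer racks than replicas: fill up with the skipped nodes in ring order.
    chosen.extend(skipped[: rf - len(chosen)])
    return chosen


ring = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"),
        ("n4", "rack2"), ("n5", "rack3"), ("n6", "rack3")]
print(place_replicas(ring, start=0, rf=3))  # one replica per rack
```

Spreading replicas over racks this way means the loss of a whole rack still leaves copies of the data available elsewhere.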
The Google Compute Engine benchmark showed that it is possible to:
• sustain one million writes per second to Cassandra with a median latency of 10.3 ms and 95% completing under 23 ms
• sustain a loss of ⅓ of the instances and volumes and still maintain the 1 million writes per second (though with higher latency)
• scale up and down linearly so that the configuration described can be used to create a cost-effective solution
• go from nothing in existence to fully configured and deployed instances hitting 1 million writes per second in just 70 minutes. A configured environment can achieve the same throughput in 20 minutes.
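For readers unfamiliar with the latency figures quoted above, a short sketch of how a median and a 95th percentile can be derived from raw write latencies, using the nearest-rank method. The function name percentile and the sample values are made up for illustration; they are not data from the benchmark.

```python
from typing import List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Illustrative write latencies in milliseconds (invented values).
latencies_ms = [8.0, 9.5, 10.3, 12.0, 30.0]
print("median:", percentile(latencies_ms, 50), "ms")
print("p95:", percentile(latencies_ms, 95), "ms")
```

The median describes the typical write, while the 95th percentile shows how slow the slowest 5% of writes are, which is why the benchmark reports both.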
Key Takeaway-
Riptide IO delivered an IoT (Internet of Things) application ingesting vast amounts of time-series data from thousands of building sensors, entirely with DataStax on Google Cloud Platform, within 2 months and with a very lean team.
Talk Track-
Riptide IO helps large enterprises navigate the transition to an internet-based, data-driven world of integrated device management. Their team turns small commercial buildings into “smart buildings” to save the world’s energy resources and retailers’ operating expenses. They connect sensors on rooftop machines in commercial buildings that house retailers small and large. By ingesting, organizing, tagging, normalizing, and analyzing time-series sensor data from machines, Riptide IO helps retailers optimize their customers’ experience, improve operations, reduce energy footprints and save millions of dollars. Data points from these sensors are captured every few minutes, and legacy relational systems could not ingest that tremendous amount of data. They needed an always-on system that could be set up quickly by a lean team to capture, analyze and optimize the time-series sensor data. Total cost of ownership was also an issue, as retailers needed to achieve high performance at the lowest possible cost.
DataStax Enterprise provides data management for time-series data, scalability and 100% uptime. Large community support was also available for them to tap into.
When deployed on Google Cloud Platform, they were able to bring their application to market within 2 months with a very lean team and huge cost savings.
Saved retailers millions of dollars in energy usage and operational costs. Reduced TCO by over 67% compared to traditional relational systems.
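As an illustration of how time-series sensor data is commonly laid out in Cassandra, the sketch below groups readings into partitions keyed by sensor and UTC day, so that no single partition grows without bound. This is a widely used modelling pattern and an assumption on our part, not Riptide IO's actual schema; the names partition_key, bucket_readings and the sensor IDs are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, datetime, float]  # (sensor_id, reading_time, value)


def partition_key(sensor_id: str, reading_time: datetime) -> Tuple[str, str]:
    """Partition by sensor and UTC day; expects a timezone-aware datetime."""
    day = reading_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


def bucket_readings(readings: Iterable[Reading]) -> Dict[Tuple[str, str], List[float]]:
    """Group readings the way they would be grouped into partitions."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for sensor_id, reading_time, value in readings:
        buckets[partition_key(sensor_id, reading_time)].append(value)
    return dict(buckets)


readings = [
    ("rooftop-42", datetime(2015, 5, 29, 23, 55, tzinfo=timezone.utc), 21.5),
    ("rooftop-42", datetime(2015, 5, 30, 0, 0, tzinfo=timezone.utc), 21.7),
    ("rooftop-07", datetime(2015, 5, 29, 12, 0, tzinfo=timezone.utc), 19.0),
]
for key, values in bucket_readings(readings).items():
    print(key, values)
```

With readings arriving every few minutes, a daily bucket keeps each partition to a few hundred rows per sensor, and a query for one sensor's day touches a single partition.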