Guide to NoSQL with MySQL
Delivering the Best of Both Worlds for Web Scalability




A MySQL® White Paper
February 2012




Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Table of Contents
                     Introduction
                     New Technologies Driving New Database Demands
                     NoSQL Key-Value Access to MySQL with Memcached
                           Memcached Implementation for InnoDB
                           Memcached Implementation for MySQL Cluster
                           Using Memcached for Schema-less Data
                     Scaling MySQL with Replication and Sharding
                           MySQL Replication
                           Sharding
                     MySQL Cluster: Auto-Sharding, HA & Online Schema Maintenance
                           MySQL Cluster Architecture
                           Auto-Sharding for High Write Scalability
                           Scaling Across Data Centers
                           Benchmarking Scalability & Performance on Commodity Hardware
                           Scaling Operational Agility to Support Rapidly Evolving Services
                           On-Line Schema Evolution
                           Delivering 99.999% Availability
                           SQL and NoSQL Interfaces in MySQL Cluster
                     Integrating MySQL with Data Stores and Frameworks
                           Binlog API
                           Apache Sqoop for Hadoop Integration
                     Operational Best Practices
                     Conclusion
                     Additional Resources




Introduction
       The explosive growth in data volumes is driving new levels of innovation in database technologies, reflected
       in the rise of Big Data and NoSQL data stores, coupled with the continued evolution of Relational Database
       Management Systems (RDBMS).

       For the first time in many years, software architects, developers and administrators are faced with a new set of
       challenges in managing this data growth, and a bewildering array of potential solutions.

       MySQL is the leading open source database for web and cloud services. Oracle’s MySQL team has therefore
       been at the forefront in understanding these challenges, and innovating with capabilities that are designed to
       blend both relational and NoSQL technologies.

       This Guide introduces these capabilities and demonstrates how organizations can achieve high scalability and
       availability with flexible development, rapid iteration and ease-of-use, while maintaining the benefits of data
       integrity and the ability to support rich queries.

       The result is solutions that deliver the best of both technologies, enabling organizations to harness the
       benefits of data growth while leveraging established and proven industry standards and skill-sets across their
       development and operational environments to capture, enrich and gain new insight from big data.



       New Technologies Driving New Database Demands
       It should come as no surprise that data volumes are growing at a much faster rate than anyone previously
       predicted. The McKinsey Global Institute estimates this growth at 40% per annum.[1]




                                         Figure 1: Key Trends Driving Growth in Data Volumes

       Increasing internet penetration rates, social networking, high speed wireless broadband connecting smart
       mobile devices, and new Machine to Machine (M2M) interactions are just some technologies fueling this
       growth, and reflected in the following research:
          • Internet penetration rates growing to over 30% of the global population;[2]

       [1] http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf
       [2] http://www.internetworldstats.com/emarketing.htm


          • By 2015, global IP networks will deliver 7.3 petabytes of data every 5 minutes – that’s the gigabyte
               equivalent of all movies ever made crossing the network, in just 5 minutes;[3]
          • 5.9bn mobile device subscriptions at the end of 2011;[4]
          • Global eCommerce revenues forecast to reach $1 trillion by 2014.[5]

       The databases needed to support such a massive growth in data have to meet new challenges, including:
         • Automatically scaling database operations, both reads and writes, across commodity hardware;
         • High availability for 24x7 service uptime;
         • Choice of data access patterns and APIs introducing new levels of flexibility and convenience for
             developers;
          • Rapid iteration and evolution of databases and schemas to meet user demands for new functionality,
              and dynamic scaling for growth;
         • Ease-of-Use with reduced barriers to entry enabling developers to quickly innovate in the delivery
             of differentiated services.

       Of course it can be argued that databases have had these requirements for years – but they have rarely had
       to address all of these requirements at once, in a single application. For example, some databases have
       been optimized for low latency and high availability, but then don’t have a means to scale write operations
       and can be difficult to use. Other databases may be very simple to start with, but lack the capabilities that make
       them suitable for applications with demanding uptime requirements.

       It is also important to recognize that not all data is created equal. It is not catastrophic to lose individual
       elements of log data or status updates – the value of this data is more in its aggregated analysis. But some
       data is critical, such as ecommerce transactions, financial trades, customer updates, billing operations, order
       placement and processing, access and authorization, service entitlements, etc.

       Services generating and consuming this type of data still need the back-end data-store to meet the scalability
       and flexibility challenges discussed above, while still:
          • Preserving transactional integrity with ACID[6] compliance;
         • Enabling deeper insight by running complex queries against the data;
         • Leveraging the proven benefits of industry standards and skillsets to reduce cost, risk and complexity;
         • Maintaining a well-structured data model, allowing simple data re-use between applications.

       With these types of requirements, it becomes clear that there is no single solution to meet all needs, and it is
       therefore essential to mix and match technologies in order to deliver functionality demanded by the business.




                                            Figure 2: Combining the best of NoSQL & RDBMS


       [3] http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360_ns827_Networking_Solutions_White_Paper.html
       [4] http://royal.pingdom.com/2012/01/17/internet-2011-in-numbers/
       [5] http://techcrunch.com/2011/01/03/j-p-morgan-global-e-commerce-revenue-to-grow-by-19-percent-in-2011-to-680b/
       [6] http://en.wikipedia.org/wiki/ACID


Through a combination of NoSQL access methods, technologies for scaling out on commodity hardware,
       integrated HA and support for on-line operations such as schema changes, MySQL is able to offer solutions
       that blend the best of both worlds.



       NoSQL Key-Value Access to MySQL with Memcached
       Providing NoSQL interfaces directly on top of a MySQL storage engine offers several major advantages:
         • Delivers high performance for simple queries by bypassing the SQL layer;
         • Enables developers to work in their native programming languages and choose if and when to apply
              schemas when handling data management within the application. Faster development cycles result in
              accelerated time to market for new services.

       Using NoSQL interfaces to MySQL, users can maintain all of the advantages of their existing relational
       database infrastructure including the ability to run complex queries, while providing blazing fast performance
       for simple Key/Value operations, using an API to complement regular SQL access to their data.

       MySQL Cluster 7.2 includes support for the Memcached API, providing a native Key-Value interface to the
       database. This functionality has also been previewed as part of MySQL 5.6 for native access to the InnoDB
       storage engine.

       Using the Memcached API, web services can directly access InnoDB and MySQL Cluster without
       transformations to SQL, ensuring low latency and high throughput for read/write queries. Operations such as
       SQL parsing are eliminated and more of the server’s hardware resources (CPU, memory and I/O) are
       dedicated to servicing the query within the storage engine itself.
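       To illustrate these dual access paths, the following sketch shows a key-value write through the Memcached
       endpoint and the same data being read back over SQL. It is illustrative only: the host names, port, credentials
       and the towns table mapping are assumptions, and it relies on the python-memcached and PyMySQL client
       libraries rather than anything shipped with MySQL.

            # Sketch only: assumes the Memcached endpoint on the MySQL server (port 11211)
            # has been mapped to a demo.towns table, and that the python-memcached and
            # PyMySQL client libraries are installed. Names and credentials are illustrative.
            import memcache
            import pymysql

            kv = memcache.Client(["mysql-host:11211"])          # NoSQL path: Memcached protocol
            db = pymysql.connect(host="mysql-host", user="app",
                                 password="secret", database="demo")

            # Fast key-value write, bypassing the SQL layer entirely
            kv.set("maidenhead", "SL6")

            # The same data remains visible to SQL for complex queries
            with db.cursor() as cur:
                cur.execute("SELECT town, postcode FROM towns WHERE town = %s", ("maidenhead",))
                print(cur.fetchone())

            # And a key-value read sees rows written via SQL
            print(kv.get("maidenhead"))

       The point is simply that both paths operate on a single copy of the data, so there is no separate cache to keep
       synchronized.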

       Over and above higher performance and faster development cycles, there are a number of additional
       benefits:
         • Preserves investments in Memcached infrastructure by re-using existing Memcached clients and
              eliminates the need for application changes;
          • Flexibility to concurrently access the same data set with SQL, allowing complex queries to be run while
               simultaneously supporting Key-Value operations from Memcached;
          • Extends Memcached functionality by integrating persistent, crash-safe, transactional database back-
               ends offering ACID compliance, rich query support and extensive management and monitoring tools;
          • Reduces service disruption caused by cache re-population after an outage;
          • Simplifies web infrastructure by optionally consolidating the caching and database layers into a single
               data tier, managed by MySQL;
         • Reduces development and administration effort by eliminating the cache invalidation and data
              consistency checking required to ensure synchronization between the database and cache when
              updates are committed;
         • Eliminates duplication of data between the cache and database, enabling simpler re-use of data
              across multiple applications, and reducing memory footprint;
         • Simple to develop and deploy as many web properties are already powered by a MySQL database
              with caching provided by Memcached.

       The following sections of the Guide discuss the Memcached implementation for both InnoDB and MySQL
       Cluster.

       Memcached Implementation for InnoDB
       The Memcached API for InnoDB has been previewed as part of the MySQL 5.6 Development Milestone
       Release.

       As illustrated in the following figure, Memcached for InnoDB is implemented via a Memcached daemon plug-
       in to the mysqld process, with the Memcached protocol mapped to the native InnoDB API.


Figure 3: Memcached API Implementation for InnoDB

       With the Memcached daemon running in the same process space, users get very low latency access to their
       data while also leveraging the scalability enhancements delivered with InnoDB and a simple deployment and
       management model. Multiple web / application servers can remotely access the Memcached / InnoDB server
       to get direct access to a shared data set.

       With simultaneous SQL access, users can maintain all the advanced functionality offered by InnoDB
       including support for Foreign Keys, XA transactions and complex JOIN operations.

       Additional Memcached API features provided in the preview release include:
         • Support for both the Memcached text-based and binary protocols; the Memcached API for InnoDB
              passes all of the 55 memcapable tests, ensuring compatibility with existing applications and clients;
          • Support for mapping multiple columns into a single “value”, delimited by a pre-defined and fully
               configurable separator (see the configuration sketch below);
          • Deployment flexibility with three options for local caching of data: “cache-only”, “innodb-only”, and
               “caching” (both “cache” and “innodb store”). These local options apply to each of the four Memcached
               operations (set, get, delete and flush);
         • All Memcached commands are captured in the MySQL Binary Log (Binlog), used for replication. This
              makes it very easy to scale database operations out across multiple commodity nodes, as well as
              provide a foundation for high availability.

       You can learn how to configure the Memcached API for InnoDB here:
       http://blogs.innodb.com/wp/2011/04/get-started-with-innodb-memcached-daemon-plugin/
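       As a rough illustration of the configuration steps referenced above, the sketch below loads the Memcached
       daemon plug-in and maps a key to two InnoDB columns. It assumes the innodb_memcache configuration
       schema as it later shipped in MySQL 5.6 (the preview release described here may differ in detail), and all
       database, table and column names are hypothetical.

            # Illustrative sketch only. Assumes the innodb_memcache configuration schema as
            # shipped in MySQL 5.6; the configuration tables are created beforehand by the
            # innodb_memcached_config.sql script bundled with the server. All names are
            # hypothetical.
            import pymysql

            conn = pymysql.connect(host="mysql-host", user="root", password="secret")
            with conn.cursor() as cur:
                # Load the Memcached daemon plug-in into the mysqld process
                cur.execute('INSTALL PLUGIN daemon_memcached SONAME "libmemcached.so"')

                # Map Memcached keys onto demo.orders: the key is order_id and the "value"
                # concatenates two columns, returned with the configured separator
                cur.execute("""
                    INSERT INTO innodb_memcache.containers
                      (name, db_schema, db_table, key_columns, value_columns,
                       flags, cas_column, expire_time_column, unique_idx_name_on_key)
                    VALUES
                      ('orders_map', 'demo', 'orders', 'order_id', 'status,total',
                       'flags', 'cas_col', 'exp_col', 'PRIMARY')
                """)
            conn.commit()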


       Memcached Implementation for MySQL Cluster
       Memcached API support for MySQL Cluster was introduced with General Availability (GA) of the 7.2 release,
       and joins an extensive range of NoSQL interfaces that are already available for MySQL Cluster (discussed
       later in this Guide).

       Like Memcached, MySQL Cluster provides a distributed hash table with in-memory performance. MySQL
       Cluster extends Memcached functionality by adding support for write-intensive workloads, a full relational
       model with ACID compliance (including persistence), rich query support, auto-sharding and 99.999%
       availability, with extensive management and monitoring capabilities.



All writes are committed directly to MySQL Cluster, eliminating cache invalidation and the overhead of data
       consistency checking to ensure complete synchronization between the database and cache.




                                   Figure 4: Memcached API Implementation with MySQL Cluster

       Implementation is simple - the application sends reads and writes to the Memcached process (using the
       standard Memcached API). This in turn invokes the Memcached Driver for NDB (which is part of the same
       process), which in turn calls the NDB API for very quick access to the data held in MySQL Cluster’s data
       nodes.

       The solution has been designed to be very flexible, allowing the application architect to find a configuration
       that best fits their needs. It is possible to co-locate the Memcached API in either the data nodes or application
       nodes, or alternatively within a dedicated Memcached layer.

       Developers can still have some or all of the data cached within the Memcached server (and specify whether
       that data should also be persisted in MySQL Cluster) – so it is possible to choose how to treat different
       pieces of data, for example:
           • Storing the data purely in MySQL Cluster is best for data that is volatile, i.e. written to and read from
               frequently;
           • Storing the data both in MySQL Cluster and in Memcached is often the best option for data that is
               rarely updated but frequently read;
           • Data that has a short lifetime, is read frequently and does not need to be persistent could be stored
               only in Memcached.

       The benefit of this flexible approach to deployment is that users can configure behavior on a per-key-prefix
       basis (through tables in MySQL Cluster) and the application doesn’t have to care – it just uses the
       Memcached API and relies on the software to store data in the right place(s) and to keep everything
       synchronized.

       Any changes made to the key-value data stored in MySQL Cluster will be recorded in the binary log, and so
       will be applied during replication and point-in-time recovery.




Using Memcached for Schema-less Data
       By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row –
       thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each
       value is linked to a pre-defined column in a specific table.

       Of course if the application needs to access the same data through SQL then developers can map key
       prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in
       MySQL Cluster.
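       As a simple illustration of the two approaches, the sketch below writes a schema-less value under an arbitrary
       key and a schema-mapped value under a key-prefix. The host name, the keys and the “towns/” prefix mapping
       are assumptions – the prefix-to-column mapping itself is configured on the server side – and the
       python-memcached client library is used.

            # Sketch only: the same Memcached client API covers both styles. The "towns/"
            # prefix-to-table mapping is assumed to have been configured on the server;
            # host name and keys are illustrative.
            import memcache

            mc = memcache.Client(["cluster-api-node:11211"])

            # Schema-less: arbitrary keys land in the default generic key/value table
            mc.set("session:9f2c", '{"user": 42, "cart": ["sku1", "sku7"]}')

            # Schema-aware: keys with a configured prefix map onto the columns of an
            # existing table, so the same rows stay queryable through SQL
            mc.set("towns/maidenhead", "SL6")
            print(mc.get("towns/maidenhead"))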



       Scaling MySQL with Replication and Sharding
       When it comes to scaling MySQL for highly demanding web applications, replication and sharding (or
       horizontal partitioning) are typically part of the solution. As discussed below, when using the MySQL Server
       with InnoDB, replication can be configured asynchronously or semi-synchronously, and sharding is
       implemented at the application layer.

       Using MySQL Cluster, discussed in the next section of this Guide, sharding is implemented automatically at
       the database layer with support for active-active, multi-master replication, thereby providing an alternative
       development and deployment model for applications requiring the highest levels of write throughput and
       availability.


       MySQL Replication
       One of the ways many new applications achieve scalability is via replicating databases across multiple
       commodity nodes in order to distribute the workload.

       From very early releases, replication has been a standard and integrated component of MySQL, providing the
       foundation for extreme levels of database scalability and high availability in leading web properties, including
       some of the world’s most highly trafficked sites such as Facebook, YouTube, Google, Yahoo!, Flickr
       and Wikipedia.

       Using MySQL replication, it is simple for users to rapidly create multiple replicas of their database to
       elastically scale-out beyond the capacity constraints of a single instance, enabling them to serve rapidly
       growing database workloads as well as withstand failures.

       MySQL Replication works by configuring one server as a master, with one or more additional servers
       configured as slaves. The master server will log the changes to the database. Once these changes have
       been logged, they are then sent and applied to the slave(s) immediately or after a set time interval.[8]
       As illustrated in the figure below, the master server writes updates to its binary log files that serve as a record
       of updates to be sent to slave servers. When a slave connects to the master, it determines the last position
       read in the logs on its most recent successful connection to the master. The slave then receives any updates
       which have taken place since that time and applies them, before continuing to poll the master for new
       updates. The slave continues to service read requests during these processes.




       [8] Time Delayed Replication is a feature of the MySQL 5.6 Development Milestone Release


Figure 5: MySQL Replication Supports HA and Read Scalability “Out of the Box”

       Replication is often employed in a scale-out implementation so that requests which simply “read” data can be
       directed to slave servers. This allows transactions involving writes to be exclusively executed on the master
       server. This typically leads to not only a performance boost but a more efficient use of resources, and is
       implemented either within the application code or by using the appropriate MySQL connector (e.g. the
       Connector/J JDBC or mysqlnd PHP drivers).
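       As an illustration of read/write splitting in application code, the sketch below sends writes to the master and
       reads to a slave. Host names, credentials and the schema are placeholders; in practice the connectors
       mentioned above can perform this routing transparently.

            # Sketch only: routing reads to a slave and writes to the master in application
            # code. Host names and credentials are placeholders.
            import pymysql

            master  = pymysql.connect(host="db-master",  user="app", password="secret", database="shop")
            replica = pymysql.connect(host="db-slave-1", user="app", password="secret", database="shop")

            def save_order(user_id, item):
                # Writes always go to the master so they reach the binary log
                with master.cursor() as cur:
                    cur.execute("INSERT INTO orders (user_id, item) VALUES (%s, %s)", (user_id, item))
                master.commit()

            def recent_orders(user_id):
                # Reads can be served by any slave, spreading the SELECT load
                with replica.cursor() as cur:
                    cur.execute("SELECT item FROM orders WHERE user_id = %s ORDER BY id DESC LIMIT 10",
                                (user_id,))
                    return cur.fetchall()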




                                             Figure 6: Scaling Reads with MySQL Replication

       MySQL replication provides the ability to scale out large web farms where reads (SELECTs) represent the
       majority of operations performed on the database.




       The MySQL 5.6 Development Milestone[9] and Early Access Releases[10] preview a host of new capabilities that
       directly enhance replication performance, data integrity and availability, including:
          • Global Transaction Identifiers[11] which tag each replication event with a unique ID, making it simple to
               failover from a master to the most up-to-date slave, as well as create multi-master circular (ring)
               replication topologies;
          • Multi-Threaded Slaves that improve performance by using multiple execution threads to apply
               replication events in parallel to slave servers;
          • Crash-Safe Slaves which enable slaves to automatically recover from failures;
          • Replication Event Checksums that protect the integrity of slave databases by automatically detecting
               corruption whether caused by memory, disk or network failures, or by the database itself;

       Other key features include the preview of optimized row-based replication, binlog group commit, time-delayed
       replication and remote binlog backup, collectively enhancing performance, flexibility and ease of use.

       MySQL replication can be deployed in a range of topologies to support diverse scaling and HA requirements.
       To learn more about these options and how to configure MySQL replication, refer to the following whitepaper:
       http://mysql.com/why-mysql/white-papers/mysql-wp-replication.php

       Sharding
       Sharding is commonly used by large web properties to increase the scalability of writes across their
       database. In high traffic Web 2.0 environments such as social networking, a growing percentage of content is
       generated by the users themselves, demanding higher levels of write scalability to handle rapidly
       changing data.

       Despite growing write volumes, the reality is that read operations still predominate in Web 2.0-type
       environments and that millions of users can still be serviced with web applications, without requiring any
       database sharding at all.

       Sharding – also called application partitioning - involves the application dividing the database into smaller
       data sets and distributing them across multiple servers. This approach enables very cost effective scalability
       as low cost commodity servers can be easily deployed when capacity requirements grow. In addition, query
       processing accesses just a sub-set of the data, thereby boosting performance.

       Applications need to become “shard-aware” so that writes can be directed to the correct shard, and cross-
       shard JOIN operations are avoided. If best practices are applied, then sharding becomes an effective way to
       scale relational databases supporting web-based write-intensive workloads.
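       The sketch below illustrates one way an application can become “shard-aware”: a hash of the sharding key
       (here, a user id) selects the server holding that user’s data. The shard map, schema and hashing scheme are
       purely illustrative.

            # Sketch only: a shard-aware lookup that hashes the user id to pick one of
            # several MySQL servers. The shard map, key choice and schema are illustrative;
            # cross-shard JOINs are avoided by keeping a user's data on one shard.
            import hashlib
            import pymysql

            SHARDS = ["shard0.db:3306", "shard1.db:3306", "shard2.db:3306", "shard3.db:3306"]

            def shard_for(user_id):
                digest = hashlib.md5(str(user_id).encode()).hexdigest()
                host, port = SHARDS[int(digest, 16) % len(SHARDS)].split(":")
                return pymysql.connect(host=host, port=int(port), user="app",
                                       password="secret", database="social")

            def post_status(user_id, text):
                conn = shard_for(user_id)          # writes go to this user's shard only
                with conn.cursor() as cur:
                    cur.execute("INSERT INTO status_updates (user_id, body) VALUES (%s, %s)",
                                (user_id, text))
                conn.commit()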

       As sharding is implemented at the application layer, it is important to utilize best practices in application and
       database design. The Architecture and Design Consulting Engagement offered by the MySQL Consulting
       group in Oracle has enabled many customers to reduce the complexity of deployments where sharding is
       used for scalability.

       By deploying MySQL with sharding and replication, leading social networking sites such as Facebook, Flickr,
       LinkedIn and Twitter have been able to scale their requirements for relational data management
       exponentially.

       If applications involve high volumes of write operations (i.e. INSERTs and UPDATES), coupled with the need
       for continuous operation, it is worthwhile considering using MySQL Cluster, discussed below.




       [9] http://blogs.oracle.com/MySQL/entry/mysql_development_milestone_562_taking_mysql_replication_to_the_next_level
       [10] http://blogs.oracle.com/MySQL/entry/mysql_5_6_replication_new
       [11] http://d2-systems.blogspot.com/2011/10/global-transaction-identifiers-feature.html


MySQL Cluster: Auto-Sharding, HA & Online Schema
       Maintenance
       MySQL Cluster has many attributes that make it ideal for new generations of highly dynamic, highly scalable
       applications, including:
             • Auto-sharding for write-scalability;
             • Cross-data center geographic synchronous and asynchronous replication;
             • Online scaling and schema upgrades;
             • SQL and NoSQL interfaces;
             • Integrated HA for 99.999% uptime.

       MySQL Cluster is a scalable, real-time, ACID-compliant transactional database, combining 99.999%
       availability with the low TCO of open source. Designed around a distributed, multi-master architecture with no
       single point of failure, MySQL Cluster scales horizontally on commodity hardware with auto-sharding to serve
       read and write intensive workloads, accessed via SQL and NoSQL interfaces.

       MySQL Cluster is deployed in some of the largest web, telecoms and embedded applications on the
       planet, serving high-scale, business-critical applications[12], including:
           • High volume OLTP;
           • Real time analytics;
           • Ecommerce, inventory management, shopping carts, payment processing, fulfillment tracking, etc.;
           • Financial trading with fraud detection;
           • Mobile and micro-payments;
           • Session management & caching;
           • Feed streaming, analysis and recommendations;
           • Content management and delivery;
           • Massively Multiplayer Online Games;
           • Communications and presence services;
           • Subscriber / user profile management and entitlements.


       MySQL Cluster Architecture
       As illustrated in the figure below, MySQL Cluster comprises three types of node which collectively provide
       service to the application:

                 •    Data nodes manage the storage and access to data. Tables are automatically sharded across the
                      data nodes which also transparently handle load balancing, replication, failover and self-healing.

                 •    Application nodes provide connectivity from the application logic to the data nodes. Multiple APIs
                      are presented to the application. MySQL provides a standard SQL interface, including connectivity
                      to all of the leading web development languages and frameworks. There are also a whole range of
                      NoSQL interfaces including memcached, REST/HTTP, C++ (NDB-API), Java and JPA.

                 •    Management nodes are used to configure the cluster and provide arbitration in the event of
                      network partitioning.




       [12] http://mysql.com/customers/cluster/


     Figure 7: The MySQL Cluster architecture provides high write scalability across multiple SQL & NoSQL APIs

       The following section of the Guide discusses how MySQL Cluster auto-shards for write scalability, delivers
       HA and on-line operations, enabling developers and administrators to support new and rapidly evolving
       services.

       Auto-Sharding for High Write Scalability
       MySQL Cluster is designed to support write-intensive web workloads with the ability to scale from several to
       several hundred nodes – allowing services to start small and scale quickly as demand takes off.

       MySQL Cluster is implemented as an active/active, multi-master database ensuring updates can be handled
       by any data node, and are instantly available to all of the other applications / clients accessing the cluster.

       Tables are automatically sharded across a pool of low cost commodity data nodes, enabling the database to
       scale horizontally to serve read and write-intensive workloads, accessed both from SQL and directly via
       NoSQL APIs.

       By automatically sharding tables at the database layer, MySQL Cluster eliminates the need to shard at the
       application layer, greatly simplifying application development and maintenance. Sharding is entirely
       transparent to the application which is able to connect to any node in the cluster and have queries
       automatically access the correct shards needed to satisfy a query or commit a transaction.

       By default, sharding is based on the hashing of the whole primary key, which generally leads to a more even
       distribution of data and queries across the cluster than alternative approaches such as range partitioning.
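       By contrast with the application-level sharding sketch shown earlier, the hypothetical example below shows
       that with MySQL Cluster the application carries no routing logic at all: the table is simply created with the
       NDB storage engine, the cluster distributes rows by hashing the primary key, and any SQL node can accept
       reads and writes. Host names and the schema are illustrative.

            # Sketch only: no routing code in the application. Creating the table with
            # ENGINE=NDBCLUSTER is enough for it to be sharded across the data nodes;
            # host names and the schema are illustrative.
            import pymysql

            conn = pymysql.connect(host="any-sql-node", user="app", password="secret", database="social")
            with conn.cursor() as cur:
                cur.execute("""
                    CREATE TABLE status_updates (
                      user_id BIGINT NOT NULL,
                      posted  TIMESTAMP NOT NULL,
                      body    VARCHAR(500),
                      PRIMARY KEY (user_id, posted)
                    ) ENGINE=NDBCLUSTER
                """)
                # Any SQL node can accept this write; the cluster places it on the right shard
                cur.execute("INSERT INTO status_updates (user_id, posted, body) VALUES (42, NOW(), 'hello')")
            conn.commit()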

       It is important to note that unlike other distributed databases, users do not lose the ability to perform JOIN
       operations or sacrifice ACID-guarantees when performing queries and transactions across shards. The
       Adaptive Query Localization feature of MySQL Cluster 7.2 pushes JOIN operations down to the data nodes
       where they are executed locally and in parallel, significantly reducing network hops and delivering 70x higher
       throughput and lower latency.[14]

       This enables users to perform complex queries, and so MySQL Cluster is better able to serve those use-
       cases that need to run real-time analytics across live data sets, along with high throughput OLTP operations.

       [14] http://dev.mysql.com/tech-resources/articles/mysql-cluster-7.2.html


       Examples include recommendations engines and clickstream analysis in web applications, pre-pay billing
       promotions in mobile telecoms networks, and fraud detection in payment systems.




                                                 Figure 8: Auto-Sharding in MySQL Cluster

       The figure above demonstrates how MySQL Cluster shards tables across the data nodes of the cluster.

       From the figure below, you will also see that MySQL Cluster automatically creates “node groups” from the
       number of replicas and data nodes specified by the user. Updates are synchronously replicated between
       members of the node group to protect against data loss and enable sub-second failover in the event of a
       node failure.




                                       Figure 9: Automatic Creation of Node Groups & Replicas


       Scaling Across Data Centers
       Web services are global and so developers will want to ensure their databases can scale-out across regions.
       MySQL Cluster offers Geographic Replication that distributes clusters to remote data centers, serving to
       reduce the effects of geographic latency by pushing data closer to the user, as well as supporting disaster
       recovery.

       Geographic Replication is implemented via standard asynchronous MySQL replication, with one important
       difference: support for active / active geographically distributed clusters. Therefore, if your applications are


attempting to update the same row on different clusters at the same time, MySQL Cluster’s Geographic
       Replication can detect and resolve the conflict, ensuring each site can actively serve read and write requests
       while maintaining data consistency across the clusters.

       Geographic Replication also enables data to be replicated in real-time to other MySQL storage engines. A
       typical use-case is to replicate tables from MySQL Cluster to the InnoDB storage engine in order to generate
       complex reports from near real-time data, with full performance isolation from the live data store.

       New in MySQL Cluster 7.2: Multi-Site Clustering
       MySQL Cluster 7.2 includes Multi-Site Clustering, providing a new option for cross data center scalability. For
       the first time, splitting data nodes across data centers is a supported deployment option. With this deployment
       model, tables are automatically sharded across sites with synchronous replication between them. Failovers
       between sites are handled automatically by MySQL Cluster.

       Whether you are replicating within the cluster, between Clusters or between storage engines, all replication
       activities are performed concurrently – so users do not need to trade-off one scaling strategy for another.

       Benchmarking Scalability & Performance on Commodity Hardware
       All of the technologies discussed above are designed to provide read and write scalability for your most
       demanding transactional web applications – but what do they mean in terms of delivered performance?
                                                                                              15
       The MySQL Cluster development team recently ran a series of benchmarks that characterized performance
       across eight commodity dual socket (2.93GHz), 6-core Intel servers, each equipped with 48GB of RAM,
       running Linux and connected via Infiniband.

       As seen in the figure below, MySQL Cluster delivered just over 1 Billion Reads per minute and 110m Writes
       per minute. Synchronous replication between node groups was configured, enabling both high performance
       and high availability – without compromise.


       [Chart: SELECT queries per minute and UPDATE queries per minute (in millions) plotted against the number
       of data nodes.]

                             Figure 10: MySQL Cluster performance scaling-out on commodity nodes.

       These results demonstrate how users can build a highly performing, highly scalable MySQL Cluster from low-
       cost commodity hardware to power their most mission-critical web services.

       Scaling Operational Agility to Support Rapidly Evolving Services
       Scaling performance is just one dimension – albeit a very important one – of scaling today’s databases. As a
       new service gains in popularity, it is important to be able to evolve the underlying infrastructure seamlessly,
       without incurring downtime or having to add significant additional DBA or developer resources.

       [15] http://mysql.com/why-mysql/benchmarks/mysql-cluster/


Users may need to increase the capacity and performance of the database, enhance their application (and
       therefore their database schema) to deliver new capabilities, and upgrade their underlying platforms.

       MySQL Cluster can perform all of these operations and more on-line – without interrupting service to the
       application or clients.

       In the following example, the cluster on the left is configured with two application and data nodes and a single
       management server. As the service grows, the users are able to scale the database and add management
       redundancy – all of which can be performed as online operations. An added advantage of scaling the
       Application Nodes is that they provide elasticity, so they can be scaled back down if demand on the
       database decreases.




                      Figure 11: Doubling Database Capacity & Performance On-Demand and On-Line

       With its shared-nothing architecture, it is possible to avoid database outages by using rolling restarts to not
       only add but also upgrade nodes within the cluster. Using this approach, users can:
              • Upgrade or patch the underlying hardware and operating system;
              • Upgrade or patch MySQL Cluster, with full online upgrades between releases.

       MySQL Cluster supports on-line, non-blocking backups, ensuring service interruptions are again avoided
       during this critical database maintenance task.


       On-Line Schema Evolution
       As services evolve, developers often want to add new functionality, which in many instances may demand
       updating the database schema.

       This operation can be very disruptive for many databases, with ALTER TABLE commands taking the
       database offline for the duration of the operation. When users have large tables with many millions of rows,
       downtime can stretch into hours or even days.

       MySQL Cluster supports on-line schema changes, enabling users to add new columns and tables and add
       and remove indexes – all while continuing to serve read and write requests, and without affecting response
       times.
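       A minimal sketch of such an online change is shown below: a column is added while the cluster continues to
       serve traffic. The ALTER ONLINE TABLE form reflects the NDB syntax of this era (later versions use
       ALGORITHM=INPLACE), and the table and column names are illustrative; the added column is kept nullable,
       which NDB typically requires for a fully online addition.

            # Sketch only: adding a column while the cluster keeps serving reads and writes.
            # Syntax and names are illustrative for the MySQL Cluster 7.2 era.
            import pymysql

            conn = pymysql.connect(host="any-sql-node", user="app", password="secret", database="social")
            with conn.cursor() as cur:
                cur.execute("ALTER ONLINE TABLE status_updates ADD COLUMN geo_tag VARCHAR(64)")
            conn.commit()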




Of course, if using the Memcached API, the application can avoid the need for data schemas altogether.

       Unlike other on-line schema update solutions, MySQL Cluster does not need to create temporary tables,
       therefore avoiding the need to provision double the usual memory or disk space in order to complete the
       operation.

       Delivering 99.999% Availability
       No matter how well a new service is architected for performance, scalability and agility, if it is not up and
       available, it will never be successful.

       The architecture of MySQL Cluster is designed to deliver 99.999% availability, encompassing both regularly
       scheduled maintenance operations, discussed above, and system failures.

       The distributed, shared-nothing architecture of MySQL Cluster has been carefully designed to ensure
       resilience to failures, with self-healing recovery:
              • The heartbeat mechanism within MySQL Cluster detects any failure instantly, and control is
                   automatically failed over to other active nodes in the cluster, without interrupting service
                   to the clients.
              • In the event of a failure, the MySQL Cluster nodes are able to self-heal by automatically restarting
                   and resynchronizing with other nodes before re-joining the cluster, all of which is completely
                   transparent to the application.
              • The data within a data node is synchronously replicated to all nodes within the Node Group. If a
                   data node fails, then there is always at least one other data node storing the same information.
              • With its shared-nothing architecture, each data node has its own disk and memory storage, so the
                   risk of a complete cluster failure caused by a shared component, such as storage, is eliminated.

       Designing the cluster in this way makes the system reliable and highly available since single points of failure
       have been eliminated. Any node can be lost without it affecting the system as a whole. As illustrated in the
       figure below, an application can, for example, continue executing even though multiple management, API and
       Data Nodes are down, provided that there are one or more surviving nodes in its node group.




            Figure 12: With no single point of failure, MySQL Cluster delivers extreme resilience to failures

       As already discussed, in addition to site-level high-availability achieved through the redundant architecture of
       MySQL Cluster, geographic redundancy can be achieved using replication between two or more Clusters, or
       splitting data nodes of a single cluster across sites.




SQL and NoSQL Interfaces in MySQL Cluster
       As MySQL Cluster stores tables in data nodes, rather than in the MySQL Server, there are multiple interfaces
       available to access the database. We have already discussed the Memcached API, but MySQL Cluster
       offers other interfaces as well for maximum developer flexibility.

       Developers have a choice between:
         • SQL for complex queries and access to a rich ecosystem of applications and expertise;
         • Simple Key-Value interfaces bypassing the SQL layer for blazing fast reads & writes;
         • Real-time interfaces for microsecond latency.

       The following chart shows all of the access methods available to the database. The native API for MySQL
       Cluster is the C++ based NDB API. All other interfaces access the data through the NDB API.




                                   Figure 13: Ultimate Developer Flexibility – MySQL Cluster APIs

       At the extreme right hand side of the chart, an application has embedded the NDB API library enabling it to
       make native C++ calls to the database, and therefore delivering the lowest possible latency.

       On the extreme left hand side of the chart, MySQL presents a standard SQL interface to the data nodes, and
       provides connectivity to all of the standard MySQL connectors including:
             • Common web development languages and frameworks, i.e. PHP, Perl, Python, Ruby, Ruby on
                 Rails, Spring, Django, etc;
             • JDBC (for additional connectivity into ORMs including EclipseLink, Hibernate, etc);
             • .NET, ODBC, etc.

       Whichever API is chosen for an application, it is important to emphasize that all of these SQL and NoSQL
       access methods can be used simultaneously, across the same data set, to provide the ultimate in developer
       flexibility. Therefore, MySQL Cluster may be supporting any combination of the following services, in real-
       time:
              • Relational queries using the SQL API;
              • Key-Value-based web services using the Memcached and REST/HTTP APIs;
              • Enterprise applications with the ClusterJ and JPA APIs;
              • Real-time web services (i.e. presence and location based) using the NDB API.

       With auto-sharding, cross-data center geographic replication, online operations, SQL and NoSQL interfaces
       and 99.999% availability, MySQL Cluster is already serving some of the most demanding web and mobile
       telecoms services on the planet.




Integrating MySQL with Data Stores and Frameworks
       Increases in data variety and volume are driving more holistic approaches to managing the data lifecycle –
       from data acquisition to organization and then analysis. In the past, one type of database may have been
       sufficient for most needs – now data management is becoming increasingly heterogeneous, with a
       requirement for simple integration between different data stores and frameworks, each deployed for a specific
       task.

       As the leading open source database for web and cloud environments, MySQL has long offered
       comprehensive integration with applications and development environments. This integration is growing with
       new Oracle and community developed initiatives, including:
             - The Binary Log (Binlog) API;
             - Apache Sqoop - for MySQL integration with Hadoop / HDFS.

       Binlog API
       The Binlog API enables developers to reduce the complexity of integration between different stores by
       standardizing SQL data management operations on MySQL, while enabling replication to other services
       within their data management infrastructure.

       The Binlog API exposes a programmatic C++ interface to the binary log, implemented as a standalone
       library. Using the API, developers can read and parse binary log events both from existing binlog files as well
       as from running servers and replicate the changes into other data stores, including Hadoop, search engines,
       columnar stores, BI tools, etc.

       You can download the code as part of the mysql-5.6-labs-binary-log-api MySQL Server snapshot
       from http://labs.mysql.com

       To demonstrate the possibilities of the Binlog API, an example application for replicating changes from
       MySQL Server to Apache Solr (a full text search server) has been developed. The example is available with
       the source download on Launchpad.


       Apache Sqoop for Hadoop Integration
       Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured
       datastores such as relational databases. Users can use Sqoop to import data from MySQL into the Hadoop
       Distributed File System (HDFS) or related systems such as Hive and HBase. Conversely, users can use
       Sqoop to extract data from Hadoop and export it to external structured datastores such as relational
       databases and enterprise data warehouses.

       Sqoop integrates with Oozie, allowing users to schedule and automate import and export tasks. Sqoop uses
       a connector-based architecture which supports plugins that provide connectivity to new external systems.

       When using Sqoop, the dataset being transferred is sliced up into different partitions and a map-only job is
       launched with individual mappers responsible for transferring a slice of this dataset. Each record of the data
       is handled in a type safe manner since Sqoop uses the database metadata to infer the data types.

       By default Sqoop includes connectors for most leading databases including MySQL and Oracle, in addition to
       a generic JDBC connector that can be used to connect to any database that is accessible via JDBC.
       Sqoop also includes a specialized fast-path connector for MySQL that uses MySQL-specific batch tools to
       transfer data with high throughput.
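       As a hypothetical example of such a transfer, the snippet below launches a Sqoop 1.x import of a MySQL
       table into HDFS. The connection string, credentials, paths and mapper count are illustrative; the --direct flag
       selects Sqoop’s MySQL fast-path connector mentioned above.

            # Sketch only: launching a Sqoop 1.x import of a MySQL table into HDFS.
            # Connection string, credentials, paths and mapper count are illustrative.
            import subprocess

            subprocess.check_call([
                "sqoop", "import",
                "--connect", "jdbc:mysql://db-master/shop",
                "--username", "etl",
                "--password", "secret",
                "--table", "orders",
                "--target-dir", "/user/etl/orders",
                "--num-mappers", "4",
                "--direct",
            ])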

       You can see practical examples of importing and exporting data with Sqoop here:
       https://blogs.apache.org/sqoop/entry/apache_sqoop_overview



Operational Best Practices
       For business critical applications, it is recommended to rely on Oracle’s MySQL Enterprise Edition or MySQL
       Cluster Carrier Grade (CGE) Edition.

       MySQL Enterprise Edition and MySQL Cluster CGE include the most comprehensive set of advanced features,
       management tools and technical support to help you achieve the highest levels of MySQL scalability, security
       and uptime. They include:

               •     MySQL Database
               •     Oracle Premier Support for MySQL: Staffed with MySQL Experts, Oracle's dedicated MySQL
                     technical support team can help you solve your most complex issues rapidly, and make the most of
                     your MySQL deployments. Oracle MySQL Support Engineers are available 24/7/365 either online
                     or by phone and have direct access to the MySQL developers. Oracle Premier Support does not
                     include any limit on the number of support incidents, and you get support in 29 languages.
               •     MySQL Enterprise Backup: Reduces the risk of data loss by enabling online "Hot" backups of
                     your databases.
               •     New! MySQL Enterprise Scalability: Enables you to meet the sustained performance and
                     scalability requirements of ever increasing user, query and data loads. MySQL Thread Pool
                     provides an efficient, thread-handling model designed to reduce overhead in managing client
                     connections, and statement execution threads.
               •     New! MySQL Enterprise Security: Provides ready to use external authentication modules to
                     easily integrate MySQL with existing security infrastructures including PAM and Windows Active
                     Directory.
               •     New! MySQL Enterprise High Availability: Provides you with certified and supported solutions,
                     including MySQL Replication, Oracle VM Templates for MySQL and Windows Failover Clustering
                     for MySQL.
               •     MySQL Enterprise Monitor: The MySQL Enterprise Monitor and the MySQL Query Analyzer
                     continuously monitor your databases and alert you to potential problems before they impact your
                     system. It's like having a "Virtual DBA Assistant" at your side to recommend best practices to
                     eliminate security vulnerabilities, improve replication, optimize performance and more. As a result,
                     the productivity of your developers, DBAs and System Administrators is improved significantly.
               •     MySQL Workbench: Provides data modeling, SQL development, and comprehensive
                     administration tools for server configuration, user administration, and much more.
               •     MySQL Enterprise Oracle Certifications: MySQL Enterprise Edition is certified with Oracle
                     Linux, Oracle VM, Oracle Fusion middleware, Oracle Secure Backup and Oracle GoldenGate.

       Additional MySQL Services from Oracle to help you develop, deploy and manage highly scalable & available
       MySQL applications include:

       Oracle University
       Oracle University offers an extensive range of MySQL training from introductory courses (i.e. MySQL
       Essentials, MySQL DBA, etc.) through to advanced certifications such as MySQL High Availability and
       MySQL Cluster Administration. It is also possible to define custom training plans for delivery on-site.
       You can learn more about MySQL training from Oracle University here: http://www.mysql.com/training/

       MySQL Consulting
       To ensure best practices are leveraged from the initial design phase of a project through to implementation
       and ongoing operation, users can engage Professional Services consultants. Delivered remotely or onsite, these
       engagements help optimize the architecture for scalability, high availability and adaptability.

       Again Oracle offers a full range of consulting services, from Architecture and Design through to High
       Availability, Replication and Clustering. You can learn more at http://www.mysql.com/consulting/



Conclusion
       The massive growth in data volumes is placing unprecedented demands on databases and frameworks.
       There are many new approaches that promise high scalability and availability with flexible development, rapid
       iteration and ease-of-use. As discussed, many applications also need to maintain transactional integrity of
       data, as well as the ability to run complex queries against their data to ensure business insight and
       competitive advantage.

       Through developments in NoSQL interfaces, replication and MySQL Cluster, users can blend the best of
       NoSQL and relational data stores to provide a complete solution to application requirements, and enjoy the
       benefits of both technologies.



       Additional Resources
        MySQL 5.6 Development Milestone Release Article:
       https://dev.mysql.com/tech-resources/articles/

        MySQL 5.6 and MySQL Cluster 7.2 Downloads:
       http://dev.mysql.com/downloads/

       MySQL Cluster 7.2 New Features (including Memcached API configuration):
       http://mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php

       Configuring Memcached API for InnoDB:
       http://blogs.innodb.com/wp/2011/04/get-started-with-innodb-memcached-daemon-plugin/
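
        For orientation, the steps covered in the article above look broadly like the following. This is a minimal
        sketch only, assuming the MySQL 5.6 preview packaging; the file path shown and the default demo container
        created by the setup script are illustrative and may differ between releases:

            # Load the configuration tables shipped with the plugin (creates the
            # innodb_memcache schema and a sample container mapped to test.demo_test)
            mysql> SOURCE /usr/share/mysql/innodb_memcached_config.sql;

            # Activate the Memcached daemon plugin inside the mysqld process
            mysql> INSTALL PLUGIN daemon_memcached SONAME "libmemcached.so";

            # From any standard Memcached client (a telnet session is shown here),
            # read and write key-value pairs that are stored directly in InnoDB
            $ telnet 127.0.0.1 11211
            set greeting 0 0 5
            hello
            get greeting

        Because the data lives in InnoDB, the same rows remain visible to SQL, so a SELECT against the mapped
        table will return values written through the Memcached protocol.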

       MySQL Replication Whitepaper:
       http://mysql.com/why-mysql/white-papers/mysql-wp-replication.php




Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
                                                                                                                    Page 20

Guide to NoSQL with MySQL: Delivering the Best of Both Worlds for Web Scalability

  • 1. Guide to NoSQL with MySQL Delivering the Best of Both Worlds for Web Scalability A MySQL® White Paper February 2012 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • 2. Table of Contents Introduction ................................................................................................... 3! New Technologies Driving New Database Demands ................................ 3! NoSQL Key-Value Access to MySQL with Memcached ............................ 5! Memcached Implementation for InnoDB .................................................. 5! Memcached Implementation for MySQL Cluster ...................................... 6! Using Memcached for Schema-less Data ................................................ 8! Scaling MySQL with Replication and Sharding ......................................... 8! MySQL Replication ................................................................................... 8! Sharding ................................................................................................. 10! MySQL Cluster: Auto-Sharding, HA & Online Schema Maintenance .... 11! MySQL Cluster Architecture ................................................................... 11! Auto-Sharding for High Write Scalability ................................................ 12! Scaling Across Data Centers ................................................................. 13! Benchmarking Scalability & Performance on Commodity Hardware ..... 14! Scaling Operational Agility to Support Rapidly Evolving Services ......... 14! On-Line Schema Evolution ..................................................................... 15! Delivering 99.999% Availability .............................................................. 16! SQL and NoSQL Interfaces in MySQL Cluster ...................................... 17! Integrating MySQL with Data Stores and Frameworks ........................... 18! Binlog API ............................................................................................... 18! Apache Sqoop for Hadoop Integration ................................................... 18! Operational Best Practices ........................................................................ 19! Conclusion .................................................................................................. 20! Additional Resources ................................................................................. 20! Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 2
  • 3. Introduction The explosive growth in data volumes is driving new levels of innovation in database technologies, reflected in the rise of Big Data and NoSQL data stores, coupled with the continued evolution of Relational Database Management Systems (RDBMS). For the first time in many years, software architects, developers and administrators are faced with an array of new challenges in managing this data growth, and a bewildering array of potential solutions. MySQL is the leading open source database for web and cloud services. Oracle’s MySQL team has therefore been at the forefront in understanding these challenges, and innovating with capabilities that are designed to blend both relational and NoSQL technologies. This Guide introduces these capabilities and demonstrates how organizations can achieve: 1. High scalability and availability with flexible development, rapid iteration and ease-of-use; 2. While maintaining the benefits of data integrity and the ability to support rich queries. The result is solutions that deliver the best of both technologies, enabling organizations to harness the benefits of data growth while leveraging established and proven industry standards and skill-sets across their development and operational environments to capture, enrich and gain new insight from big data. New Technologies Driving New Database Demands It should come as no surprise that data volumes are growing at a much faster rate than anyone previously 1 predicted. The McKinsey Global Institute estimates this growth at 40% per annum . Figure 1: Key Trends Driving Growth in Data Volumes Increasing internet penetration rates, social networking, high speed wireless broadband connecting smart mobile devices, and new Machine to Machine (M2M) interactions are just some technologies fueling this growth, and reflected in the following research: 2 • Internet penetration rates growing to over 30% of the global population ; 1 http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf 2 http://www.internetworldstats.com/emarketing.htm Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 3
  • 4. By 2015, global IP networks will deliver 7.3 petabytes of data every 5 minutes – that’s the gigabyte 3 equivalent of all movies ever made crossing the network, in just 5 minutes ; 4 • 5.9bn mobile device subscriptions at the end of 2011; 5 • Global eCommerce revenues forecast to reach $1 trillion by 2014 . The databases needed to support such a massive growth in data have to meet new challenges, including: • Automatically scaling database operations, both reads and writes, across commodity hardware; • High availability for 24x7 service uptime; • Choice of data access patterns and APIs introducing new levels of flexibility and convenience for developers; • Rapidly iterate and evolve databases and schemas to meet user demands for new functionality and dynamically scale for growth; • Ease-of-Use with reduced barriers to entry enabling developers to quickly innovate in the delivery of differentiated services. Of course it can be argued that databases have had these requirements for years – but they have rarely had to address all of these requirements at once, in a single application. For example, some databases have been optimized for low latency and high availability, but then don’t have a means to scale write operations and can be difficult to use. Other databases maybe very simple to start with, but lack capabilities that make them suitable for applications with demanding uptime requirements. It is also important to recognize that not all data is created equal. It is not catastrophic to lose individual elements of log data or status updates – the value of this data is more in its aggregated analysis. But some data is critical, such as ecommerce transactions, financial trades, customer updates, billing operations, order placement and processing, access and authorization, service entitlements, etc. Services generating and consuming this type of data still need the back-end data-store to meet the scalability and flexibility challenges discussed above, while still: 6 • Preserving transactional integrity with ACID compliance; • Enabling deeper insight by running complex queries against the data; • Leveraging the proven benefits of industry standards and skillsets to reduce cost, risk and complexity; • Maintaining a well-structured data model, allowing simple data re-use between applications. With these types of requirements, it becomes clear that there is no single solution to meet all needs, and it is therefore essential to mix and match technologies in order to deliver functionality demanded by the business. Figure 2: Combining the best of NoSQL & RDBMS 3 http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11- 481360_ns827_Networking_Solutions_White_Paper.html 4 http://royal.pingdom.com/2012/01/17/internet-2011-in-numbers/ 5 http://techcrunch.com/2011/01/03/j-p-morgan-global-e-commerce-revenue-to-grow-by-19-percent-in-2011-to-680b/ 6 http://en.wikipedia.org/wiki/ACID Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 4
  • 5. Through a combination of NoSQL access methods, technologies for scaling out on commodity hardware, integrated HA and support for on-line operations such as schema changes, MySQL is able to offer solutions that blend the best of both worlds. NoSQL Key-Value Access to MySQL with Memcached Providing NoSQL interfaces directly on top of a MySQL storage engine offers several major advantages: • Delivers high performance for simple queries by bypassing the SQL layer; • Enables developers to work in their native programming languages and choose if and when to apply schemas when handling data management within the application. Faster development cycles result in accelerated time to market for new services. Using NoSQL interfaces to MySQL, users can maintain all of the advantages of their existing relational database infrastructure including the ability to run complex queries, while providing blazing fast performance for simple Key/Value operations, using an API to complement regular SQL access to their data. MySQL Cluster 7.2 includes support for the Memcached API, providing a native Key-Value interface to the database. This functionality has also been previewed as part of MySQL 5.6 for native access to the InnoDB storage engine. Using the Memcached API, web services can directly access InnoDB and MySQL Cluster without transformations to SQL, ensuring low latency and high throughput for read/write queries. Operations such as SQL parsing are eliminated and more of the server’s hardware resources (CPU, memory and I/O) are dedicated to servicing the query within the storage engine itself. Over and above higher performance and faster development cycles, there are a number of additional benefits: • Preserves investments in Memcached infrastructure by re-using existing Memcached clients and eliminates the need for application changes; • Flexibility to concurrently access the same data set with SQL, allowing complex queries to be run while simultaneously supporting Key-Value operations from Memcached. • Extends Memcached functionality by integrating persistent, crash-safe, transactional database back- ends offering ACID compliance, rich query support and extensive management and monitoring tools; • Reduces service disruption caused by cache re-population after an outage; • Simplifies web infrastructure by optionally compressing the caching and database layers into a single data tier, managed by MySQL; • Reduces development and administration effort by eliminating the cache invalidation and data consistency checking required to ensure synchronization between the database and cache when updates are committed; • Eliminates duplication of data between the cache and database, enabling simpler re-use of data across multiple applications, and reducing memory footprint; • Simple to develop and deploy as many web properties are already powered by a MySQL database with caching provided by Memcached. The following sections of the Guide discuss the Memcached implementation for both InnoDB and MySQL Cluster. Memcached Implementation for InnoDB The Memcached API for InnoDB has been previewed as part of the MySQL 5.6 Development Milestone Release. As illustrated in the following figure, Memcached for InnoDB is implemented via a Memcached daemon plug- in to the mysqld process, with the Memcached protocol mapped to the native InnoDB API. Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 5
  • 6. Figure 3: Memcached API Implementation for InnoDB With the Memcached daemon running in the same process space, users get very low latency access to their data while also leveraging the scalability enhancements delivered with InnoDB and a simple deployment and management model. Multiple web / application servers can remotely access the Memcached / InnoDB server to get direct access to a shared data set. With simultaneous SQL access, users can maintain all the advanced functionality offered by InnoDB including support for Foreign Keys, XA transactions and complex JOIN operations. Additional Memcached API features provided in the preview release include: • Support for both the Memcached text-based and binary protocols; the Memcached API for InnoDB passes all of the 55 memcapable tests, ensuring compatibility with existing applications and clients; • Supports users mapping multiple columns into a “value”. A pre-defined “separator” which is fully configurable separates the value. • Deployment flexibility with three options for local caching of data: “cache-only”, “innodb-only”, and “caching” (both “cache” and “innodb store”). These local options apply to each of four Memcached operations (set, get, delete and flush). • All Memcached commands are captured in the MySQL Binary Log (Binlog), used for replication. This makes it very easy to scale database operations out across multiple commodity nodes, as well as provide a foundation for high availability. You can learn how to configure the Memcached API for InnoDB here: http://blogs.innodb.com/wp/2011/04/get-started-with-innodb-memcached-daemon-plugin/ Memcached Implementation for MySQL Cluster Memcached API support for MySQL Cluster was introduced with General Availability (GA) of the 7.2 release, and joins an extensive range of NoSQL interfaces that are already available for MySQL Cluster (discussed later in this Guide). Like Memcached, MySQL Cluster provides a distributed hash table with in-memory performance. MySQL Cluster extends Memcached functionality by adding support for write-intensive workloads, a full relational model with ACID compliance (including persistence), rich query support, auto-sharding and 99.999% availability, with extensive management and monitoring capabilities. Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 6
  • 7. All writes are committed directly to MySQL Cluster, eliminating cache invalidation and the overhead of data consistency checking to ensure complete synchronization between the database and cache. Figure 4: Memcached API Implementation with MySQL Cluster Implementation is simple - the application sends reads and writes to the Memcached process (using the standard Memcached API). This in turn invokes the Memcached Driver for NDB (which is part of the same process), which in turn calls the NDB API for very quick access to the data held in MySQL Cluster’s data nodes. The solution has been designed to be very flexible, allowing the application architect to find a configuration that best fits their needs. It is possible to co-locate the Memcached API in either the data nodes or application nodes, or alternatively within a dedicated Memcached layer. Developers can still have some or all of the data cached within the Memcached server (and specify whether that data should also be persisted in MySQL Cluster) – so it is possible to choose how to treat different pieces of data, for example: • Storing the data purely in MySQL Cluster is best for data that is volatile, i.e. written to and read from frequently; • Storing the data both in MySQL Cluster and in Memcached is often the best option for data that is rarely updated but frequently read; • Data that has a short lifetime, is read frequently and does not need to be persistent could be stored only in Memcached. The benefit of this flexible approach to deployment is that users can configure behavior on a per-key-prefix basis (through tables in MySQL Cluster) and the application doesn’t have to care – it just uses the Memcached API and relies on the software to store data in the right place(s) and to keep everything synchronized. Any changes made to the key-value data stored in MySQL Cluster will be recorded in the binary log, and so will be applied during replication and point-in-time recovery. Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 7
  • 8. Using Memcached for Schema-less Data By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row – thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table. Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster. Scaling MySQL with Replication and Sharding When it comes to scaling MySQL for highly demanding web applications, replication and sharding (or horizontal partitioning) are typically part of the solution. As discussed below, when using the MySQL Server with InnoDB, replication can be configured asynchronously or semi-synchronously, and sharding is implemented at the application layer. Using MySQL Cluster, discussed in the next section of this Guide, sharding is implemented automatically at the database layer with support for active-active, multi-master replication, thereby providing an alternative development and deployment model for applications requiring the highest levels of write throughput and availability. MySQL Replication One of the ways many new applications achieve scalability is via replicating databases across multiple commodity nodes in order to distribute the workload. From very early releases, replication has been a standard and integrated component of MySQL, providing the foundation for extreme levels of database scalability and high availability in leading web properties including some of the world’s most highly trafficked Web sites including Facebook, YouTube, Google, Yahoo!, Flickr and Wikipedia. Using MySQL replication, it is simple for users to rapidly create multiple replicas of their database to elastically scale-out beyond the capacity constraints of a single instance, enabling them to serve rapidly growing database workloads as well as withstand failures. MySQL Replication works by configuring one server as a master, with one or more additional servers configured as slaves. The master server will log the changes to the database. Once these changes have 8 been logged, they are then sent and applied to the slave(s) immediately or after a set time interval . As illustrated in the figure below, the master server writes updates to its binary log files that serve as a record of updates to be sent to slave servers. When a slave connects to the master, it determines the last position read in the logs on its most recent successful connection to the master. The slave then receives any updates which have taken place since that time and applies them, before continuing to poll the master for new updates. The slave continues to service read requests during these processes. 8 Time Delayed Replication is a feature of the MySQL 5.6 Development Milestone Release Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 8
  • 9. Figure 5: MySQL Replication Supports HA and Read Scalability “Out of the Box” Replication is often employed in a scale-out implementation so that requests which simply “read” data can be directed to slave servers. This allows transactions involving writes to be exclusively executed on the master server. This typically leads to not only a performance boost but a more efficient use of resources, and is implemented either within the application code or by using the appropriate MySQL connector (e.g. the Connector/J JDBC or mysqlnd PHP drivers). Figure 6: Scaling Reads with MySQL Replication MySQL replication provides the ability to scale out large web farms where reads (SELECTs) represent the majority of operations performed on the database. Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 9
  • 10. 9 10 The MySQL 5.6 Development Milestone and Early Access Releases preview a host of new capabilities that directly enhance replication performance, data integrity and availability, including: 11 • Global Transaction Identifiers which tag each replication event with a unique ID, making it simple to failover from a master to the most up-to-date slave, as well as create multi-master circular (ring) replication topologies; • Multi-Threaded Slaves that improve performance by using multiple execution threads to apply replication events in parallel to slave servers; • Crash-Safe Slaves which enable slaves to automatically recover from failures; • Replication Event Checksums that protect the integrity of slave databases by automatically detecting corruption whether caused by memory, disk or network failures, or by the database itself; Other key features include the preview of optimized row-based replication, binlog group commit, time-delayed replication and remote binlog backup, collectively enhancing performance, flexibility and ease of use. MySQL replication can be deployed in a range of topologies to support diverse scaling and HA requirements. To learn more about these options and how to configure MySQL replication, refer to the following whitepaper: http://mysql.com/why-mysql/white-papers/mysql-wp-replication.php Sharding Sharding is commonly used by large web properties to increase the scalability of writes across their database. In high traffic Web 2.0 environments such as social networking a growing percentage of content is generated by users themselves, thereby demanding higher levels of write scalability, handling rapidly changing data. Despite growing write volumes, the reality is that read operations still predominate in Web 2.0-type environments and that millions of users can still be serviced with web applications, without requiring any database sharding at all. Sharding – also called application partitioning - involves the application dividing the database into smaller data sets and distributing them across multiple servers. This approach enables very cost effective scalability as low cost commodity servers can be easily deployed when capacity requirements grow. In addition, query processing accesses just a sub-set of the data, thereby boosting performance. Applications need to become “shard-aware” so that writes can be directed to the correct shard, and cross- shard JOIN operations are avoided. If best practices are applied, then sharding becomes an effective way to scale relational databases supporting web-based write-intensive workloads. As sharding is implemented at the application layer, it is important to utilize best practices in application and database design. The Architecture and Design Consulting Engagement offered by the MySQL Consulting group in Oracle has enabled many customers to reduce the complexity of deployments where sharding is used for scalability. By deploying MySQL with sharding and replication, leading social networking sites such as Facebook, Flickr, LinkedIn and Twitter have been able to scale their requirements for relational data management exponentially. If applications involve high volumes of write operations (i.e. INSERTs and UPDATES), coupled with the need for continuous operation, it is worthwhile considering using MySQL Cluster, discussed below. 
9 http://blogs.oracle.com/MySQL/entry/mysql_development_milestone_562_taking_mysql_replication_to_the_next_level 10 http://blogs.oracle.com/MySQL/entry/mysql_5_6_replication_new 11 http://d2-systems.blogspot.com/2011/10/global-transaction-identifiers-feature.html Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 10
  • 11. MySQL Cluster: Auto-Sharding, HA & Online Schema Maintenance MySQL Cluster has many attributes that make it ideal for new generations of highly dynamic, highly scalable applications, including: • Auto-sharding for write-scalability; • Cross-data center geographic synchronous and asynchronous replication; • Online scaling and schema upgrades; • SQL and NoSQL interfaces; • Integrated HA for 99.999% uptime. MySQL Cluster is a scalable, real-time, ACID-compliant transactional database, combining 99.999% availability with the low TCO of open source. Designed around a distributed, multi-master architecture with no single point of failure, MySQL Cluster scales horizontally on commodity hardware with auto-sharding to serve read and write intensive workloads, accessed via SQL and NoSQL interfaces. MySQL Cluster is deployed in the some of the largest web, telecoms and embedded applications on the 12 planet, serving high-scale, business-critical applications , including: ! High volume OLTP; ! Real time analytics; ! Ecommerce, inventory management, shopping carts, payment processing, fulfillment tracking, etc.; ! Financial trading with fraud detection; ! Mobile and micro-payments; ! Session management & caching; ! Feed streaming, analysis and recommendations; ! Content management and delivery; ! Massively Multiplayer Online Games; ! Communications and presence services; ! Subscriber / user profile management and entitlements MySQL Cluster Architecture As illustrated in the figure below, MySQL Cluster comprises three types of node which collectively provide service to the application: • Data nodes manage the storage and access to data. Tables are automatically sharded across the data nodes which also transparently handle load balancing, replication, failover and self-healing. • Application nodes provide connectivity from the application logic to the data nodes. Multiple APIs are presented to the application. MySQL provides a standard SQL interface, including connectivity to all of the leading web development languages and frameworks. There are also a whole range of NoSQL interfaces including memcached, REST/HTTP, C++ (NDB-API), Java and JPA. • Management nodes are used to configure the cluster and provide arbitration in the event of network partitioning. 12 http://mysql.com/customers/cluster/ Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 11
  • 12. Figure 7: The MySQL Cluster architecture provides high write scalability across multiple SQL & NoSQL APIs The following section of the Guide discusses how MySQL Cluster auto-shards for write scalability, delivers HA and on-line operations, enabling developers and administrators to support new and rapidly evolving services. Auto-Sharding for High Write Scalability MySQL Cluster is designed to support write-intensive web workloads with the ability to scale from several to several hundred nodes – allowing services to start small and scale quickly as demand takes-off. MySQL Cluster is implemented as an active/active, multi-master database ensuring updates can be handled by any data node, and are instantly available to all of the other applications / clients accessing the cluster. Tables are automatically sharded across a pool of low cost commodity data nodes, enabling the database to scale horizontally to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs. By automatically sharding tables at the database layer, MySQL Cluster eliminates the need to shard at the application layer, greatly simplifying application development and maintenance. Sharding is entirely transparent to the application which is able to connect to any node in the cluster and have queries automatically access the correct shards needed to satisfy a query or commit a transaction. By default, sharding is based on the hashing of the whole primary key, which generally leads to a more even distribution of data and queries across the cluster than alternative approaches such as range partitioning. It is important to note that unlike other distributed databases, users do not lose the ability to perform JOIN operations or sacrifice ACID-guarantees when performing queries and transactions across shards. The Adaptive Query Localization feature of MySQL Cluster 7.2 pushes JOIN operations down to the data nodes where they are executed locally and in parallel, significantly reducing network hops and delivering 70x higher 14 throughput and lower latency . This enables users to perform complex queries, and so MySQL Cluster is better able to serve those use- cases that have the need to run real-time analytics across live data sets, along with high throughput OLTP 14 http://dev.mysql.com/tech-resources/articles/mysql-cluster-7.2.html Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Page 12
The figure below demonstrates how MySQL Cluster shards tables across the data nodes of the cluster.

Figure 8: Auto-Sharding in MySQL Cluster

The figure below also shows that MySQL Cluster automatically creates "node groups" from the number of replicas and data nodes specified by the user. Updates are synchronously replicated between the members of a node group to protect against data loss and to enable sub-second failover in the event of a node failure.

Figure 9: Automatic Creation of Node Groups & Replicas

Scaling Across Data Centers

Web services are global, and so developers will want to ensure their databases can scale out across regions. MySQL Cluster offers Geographic Replication, which distributes clusters to remote data centers, serving to reduce the effects of geographic latency by pushing data closer to the user, as well as supporting disaster recovery.

Geographic Replication is implemented via standard asynchronous MySQL replication, with one important difference: support for active / active geographically distributed clusters. If applications attempt to update the same row on different clusters at the same time, MySQL Cluster's Geographic Replication can detect and resolve the conflict, ensuring each site can actively serve read and write requests while maintaining data consistency across the clusters. A sketch of how this conflict detection is configured is shown below.
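As an illustration, conflict detection and resolution is configured per table through the mysql.ndb_replication control table. The sketch below is not taken from this paper: the database and table names are hypothetical, and the binlog_type value of 7 (full row images, with updates logged as updates) is an assumption that should be verified against the MySQL Cluster documentation for your release. It registers the NDB$EPOCH() conflict function introduced in MySQL Cluster 7.2, under which the cluster designated as primary wins any conflicting update.

-- One-time setup: create the replication control table.
CREATE TABLE IF NOT EXISTS mysql.ndb_replication (
  db          VARBINARY(63),
  table_name  VARBINARY(63),
  server_id   INT UNSIGNED,
  binlog_type INT UNSIGNED,
  conflict_fn VARBINARY(128),
  PRIMARY KEY USING HASH (db, table_name, server_id)
) ENGINE=NDBCLUSTER;

-- Register conflict detection for a hypothetical table mydb.user_profile.
-- server_id = 0 applies the rule to all servers; binlog_type = 7 is assumed
-- here (check the documentation for the value required by NDB$EPOCH()).
INSERT INTO mysql.ndb_replication (db, table_name, server_id, binlog_type, conflict_fn)
VALUES ('mydb', 'user_profile', 0, 7, 'NDB$EPOCH()');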
Geographic Replication also enables data to be replicated in real time to other MySQL storage engines. A typical use case is to replicate tables from MySQL Cluster to the InnoDB storage engine in order to generate complex reports from near-real-time data, with full performance isolation from the live data store.

New in MySQL Cluster 7.2: Multi-Site Clustering

MySQL Cluster 7.2 includes Multi-Site Clustering, providing a new option for cross-data center scalability. For the first time, splitting the data nodes of a single cluster across data centers is a supported deployment option. With this deployment model, tables are automatically sharded across sites, with synchronous replication between them, and failovers between sites are handled automatically by MySQL Cluster.

Whether replicating within a cluster, between clusters or between storage engines, all replication activities are performed concurrently – so users do not need to trade off one scaling strategy for another.

Benchmarking Scalability & Performance on Commodity Hardware

All of the technologies discussed above are designed to provide read and write scalability for the most demanding transactional web applications – but what do they mean in terms of delivered performance?

The MySQL Cluster development team recently ran a series of benchmarks (http://mysql.com/why-mysql/benchmarks/mysql-cluster/) that characterized performance across eight commodity dual-socket (2.93GHz), 6-core Intel servers, each equipped with 48GB of RAM, running Linux and connected via InfiniBand. As seen in the figure below, MySQL Cluster delivered just over 1 billion reads per minute and 110 million writes per minute. Synchronous replication between node groups was configured, enabling both high performance and high availability – without compromise.

Figure 10: MySQL Cluster performance scaling out on commodity nodes (SELECT and UPDATE queries per minute, in millions, plotted against the number of data nodes).

These results demonstrate how users can build a highly performing, highly scalable MySQL Cluster from low-cost commodity hardware to power their most mission-critical web services.

Scaling Operational Agility to Support Rapidly Evolving Services

Scaling performance is just one dimension – albeit a very important one – of scaling today's databases. As a new service gains in popularity, it is important to be able to evolve the underlying infrastructure seamlessly, without incurring downtime or having to add lots of additional DBA or developer resources.
Users may need to increase the capacity and performance of the database, enhance their application (and therefore their database schema) to deliver new capabilities, and upgrade their underlying platforms. MySQL Cluster can perform all of these operations and more on-line, without interrupting service to the application or its clients.

In the following example, the cluster on the left is configured with two application nodes, two data nodes and a single management server. As the service grows, the user is able to scale the database and add management redundancy, all of which can be performed as online operations. An added advantage of scaling the application nodes is elasticity: they can be scaled back down if demand on the database decreases.

Figure 11: Doubling Database Capacity & Performance On-Demand and On-Line

With its shared-nothing architecture, it is possible to avoid database outages by using rolling restarts to not only add but also upgrade nodes within the cluster. Using this approach, users can:
• Upgrade or patch the underlying hardware and operating system;
• Upgrade or patch MySQL Cluster, with full online upgrades between releases.

MySQL Cluster also supports on-line, non-blocking backups, ensuring service interruptions are avoided during this critical database maintenance task.

On-Line Schema Evolution

As services evolve, developers often want to add new functionality, which in many instances demands updating the database schema. This operation can be very disruptive for many databases, with ALTER TABLE commands taking the database offline for the duration of the operation. When users have large tables with many millions of rows, downtime can stretch into hours or even days.

MySQL Cluster supports on-line schema changes, enabling users to add new columns and tables, and to add and remove indexes – all while continuing to serve read and write requests, and without affecting response times. A sketch of these online operations is shown below.
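The following minimal sketch (hypothetical table and column names, reusing the user_profile table from the earlier sketches) illustrates the kind of online operations described above. It assumes the new column is nullable and uses dynamic column format, which the MySQL Cluster documentation requires for in-place column addition; the exact set of operations supported online should be checked against the documentation for your release.

-- Add a column to a live table without blocking reads or writes.
ALTER ONLINE TABLE user_profile
  ADD COLUMN loyalty_tier TINYINT UNSIGNED NULL COLUMN_FORMAT DYNAMIC;

-- Add and drop indexes online.
ALTER ONLINE TABLE user_profile ADD INDEX idx_email (email);
ALTER ONLINE TABLE user_profile DROP INDEX idx_email;

-- After new data nodes have been added to a running cluster, repartition
-- existing tables across them, also as an online operation.
ALTER ONLINE TABLE user_profile REORGANIZE PARTITION;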
Of course, if using the Memcached API, the application can avoid the need for data schemas altogether.

Unlike other on-line schema update solutions, MySQL Cluster does not need to create temporary tables, thereby avoiding the need to provision double the usual memory or disk space in order to complete the operation.

Delivering 99.999% Availability

No matter how well a new service is architected for performance, scalability and agility, if it is not up and available, it will never be successful. The architecture of MySQL Cluster is designed to deliver 99.999% availability, a figure that accounts for both the regularly scheduled maintenance operations discussed above and system failures.

The distributed, shared-nothing architecture of MySQL Cluster has been carefully designed to ensure resilience to failures, with self-healing recovery:
• The heartbeating mechanism within MySQL Cluster detects any failure instantly, and control is automatically failed over to other active nodes in the cluster, without interrupting service to the clients.
• In the event of a failure, MySQL Cluster nodes are able to self-heal by automatically restarting and resynchronizing with other nodes before re-joining the cluster, all of which is completely transparent to the application.
• The data within a data node is synchronously replicated to all nodes within the same node group. If a data node fails, there is always at least one other data node storing the same information.
• With its shared-nothing architecture, each data node has its own disk and memory storage, so the risk of a complete cluster failure caused by a shared component, such as storage, is eliminated.

Designing the cluster in this way makes the system reliable and highly available, since single points of failure have been eliminated. Any node can be lost without affecting the system as a whole. As illustrated in the figure below, an application can, for example, continue executing even though multiple management, API and data nodes are down, provided that at least one data node in each node group survives.

Figure 12: With no single point of failure, MySQL Cluster delivers extreme resilience to failures

As already discussed, in addition to the site-level high availability achieved through the redundant architecture of MySQL Cluster, geographic redundancy can be achieved by replicating between two or more clusters, or by splitting the data nodes of a single cluster across sites.
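For day-to-day operations, the health of the data nodes can also be observed directly over SQL through the ndbinfo information database available since MySQL Cluster 7.1. The sketch below is illustrative only; the exact columns exposed by ndbinfo vary between releases and should be checked against the documentation for your version.

-- Check which data nodes are started and how far through startup they are.
SELECT node_id, status, start_phase
FROM   ndbinfo.nodes;

-- Memory usage per data node, useful for capacity planning before
-- scaling the cluster online.
SELECT node_id, memory_type, used, total
FROM   ndbinfo.memoryusage;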
SQL and NoSQL Interfaces in MySQL Cluster

As MySQL Cluster stores tables in the data nodes, rather than in the MySQL Server, multiple interfaces are available to access the database. The Memcached API has already been discussed, but MySQL Cluster offers other interfaces as well for maximum developer flexibility. Developers have a choice between:
• SQL for complex queries and access to a rich ecosystem of applications and expertise;
• Simple key-value interfaces bypassing the SQL layer for blazing-fast reads and writes;
• Real-time interfaces for microsecond latency.

The following chart shows all of the access methods available to the database. The native API for MySQL Cluster is the C++ based NDB API; all other interfaces access the data through the NDB API.

Figure 13: Ultimate Developer Flexibility – MySQL Cluster APIs

At the extreme right-hand side of the chart, an application has embedded the NDB API library, enabling it to make native C++ calls to the database and therefore delivering the lowest possible latency. At the extreme left-hand side of the chart, MySQL presents a standard SQL interface to the data nodes and provides connectivity to all of the standard MySQL connectors, including:
• Common web development languages and frameworks, i.e. PHP, Perl, Python, Ruby, Ruby on Rails, Spring, Django, etc.;
• JDBC (for additional connectivity into ORMs including EclipseLink, Hibernate, etc.);
• .NET, ODBC, etc.

Whichever API is chosen for an application, it is important to emphasize that all of these SQL and NoSQL access methods can be used simultaneously, across the same data set, to provide the ultimate in developer flexibility. A single MySQL Cluster may therefore be supporting any combination of the following services in real time:
• Relational queries using the SQL API;
• Key-value-based web services using the Memcached and REST/HTTP APIs;
• Enterprise applications with the ClusterJ and JPA APIs;
• Real-time web services (i.e. presence and location based) using the NDB API.

With auto-sharding, cross-data center geographic replication, online operations, SQL and NoSQL interfaces and 99.999% availability, MySQL Cluster is already serving some of the most demanding web and mobile telecoms services on the planet.
Integrating MySQL with Data Stores and Frameworks

Increases in data variety and volume are driving more holistic approaches to managing the data lifecycle – from data acquisition to organization and then analysis. In the past, one type of database may have been sufficient for most needs; data management is now becoming increasingly heterogeneous, with a requirement for simple integration between different data stores and frameworks, each deployed for a specific task.

As the leading open source database for web and cloud environments, MySQL has long offered comprehensive integration with applications and development environments. This integration is growing with new Oracle and community developed initiatives, including:
- The Binary Log (Binlog) API;
- Apache Sqoop, for MySQL integration with Hadoop / HDFS.

Binlog API

The Binlog API enables developers to reduce the complexity of integration between different stores by standardizing SQL data management operations on MySQL, while enabling replication to other services within their data management infrastructure.

The Binlog API exposes a programmatic C++ interface to the binary log, implemented as a standalone library. Using the API, developers can read and parse binary log events, both from existing binlog files and from running servers, and replicate the changes into other data stores, including Hadoop, search engines, columnar stores, BI tools, etc. The code can be downloaded as part of the MySQL Server snapshot mysql-5.6-labs-binary-log-api from http://labs.mysql.com.

To demonstrate the possibilities of the Binlog API, an example application for replicating changes from MySQL Server to Apache Solr (a full text search server) has been developed. The example is available with the source download on Launchpad.

Apache Sqoop for Hadoop Integration

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases. Users can use Sqoop to import data from MySQL into the Hadoop Distributed File System (HDFS) or related systems such as Hive and HBase. Conversely, Sqoop can extract data from Hadoop and export it to external structured data stores such as relational databases and enterprise data warehouses. Sqoop integrates with Oozie, allowing users to schedule and automate import and export tasks. Sqoop uses a connector-based architecture that supports plugins providing connectivity to new external systems.

When using Sqoop, the dataset being transferred is sliced into partitions and a map-only job is launched, with individual mappers responsible for transferring a slice of the dataset. Each record is handled in a type-safe manner, since Sqoop uses the database metadata to infer the data types.

By default, Sqoop includes connectors for most leading databases, including MySQL and Oracle, in addition to a generic JDBC connector that can be used to connect to any database accessible via JDBC. Sqoop also includes a specialized fast-path connector for MySQL that uses MySQL-specific batch tools to transfer data with high throughput. Practical examples of importing and exporting data with Sqoop are available at: https://blogs.apache.org/sqoop/entry/apache_sqoop_overview
Operational Best Practices

For business-critical applications, it is recommended to rely on Oracle's MySQL Enterprise Edition or MySQL Cluster Carrier Grade Edition (CGE). MySQL Enterprise Edition and MySQL Cluster CGE include the most comprehensive set of advanced features, management tools and technical support to help you achieve the highest levels of MySQL scalability, security and uptime. They include:

• MySQL Database.
• Oracle Premier Support for MySQL: Staffed with MySQL experts, Oracle's dedicated MySQL technical support team can help you solve your most complex issues rapidly and make the most of your MySQL deployments. Oracle MySQL Support Engineers are available 24/7/365, either online or by phone, and have direct access to the MySQL developers. Oracle Premier Support does not place any limit on the number of support incidents, and support is available in 29 languages.
• MySQL Enterprise Backup: Reduces the risk of data loss by enabling online "hot" backups of your databases.
• New! MySQL Enterprise Scalability: Enables you to meet the sustained performance and scalability requirements of ever-increasing user, query and data loads. MySQL Thread Pool provides an efficient thread-handling model designed to reduce the overhead of managing client connections and statement execution threads.
• New! MySQL Enterprise Security: Provides ready-to-use external authentication modules to easily integrate MySQL with existing security infrastructures, including PAM and Windows Active Directory.
• New! MySQL Enterprise High Availability: Provides certified and supported solutions, including MySQL Replication, Oracle VM Templates for MySQL and Windows Failover Clustering for MySQL.
• MySQL Enterprise Monitor: The MySQL Enterprise Monitor and the MySQL Query Analyzer continuously monitor your databases and alert you to potential problems before they impact your system. It is like having a "virtual DBA assistant" at your side to recommend best practices to eliminate security vulnerabilities, improve replication, optimize performance and more. As a result, the productivity of your developers, DBAs and system administrators is improved significantly.
• MySQL Workbench: Provides data modeling, SQL development, and comprehensive administration tools for server configuration, user administration and much more.
• MySQL Enterprise Oracle Certifications: MySQL Enterprise Edition is certified with Oracle Linux, Oracle VM, Oracle Fusion Middleware, Oracle Secure Backup and Oracle GoldenGate.

Additional MySQL services from Oracle to help you develop, deploy and manage highly scalable and available MySQL applications include:

Oracle University
Oracle University offers an extensive range of MySQL training, from introductory courses (i.e. MySQL Essentials, MySQL DBA, etc.) through to advanced certifications such as MySQL High Availability and MySQL Cluster Administration. It is also possible to define custom training plans for delivery on-site. You can learn more about MySQL training from Oracle University here: http://www.mysql.com/training/

MySQL Consulting
To ensure best practices are leveraged from the initial design phase of a project through to implementation and sustaining, users can engage Oracle's Professional Services consultants. Delivered remotely or onsite, these engagements help in optimizing the architecture for scalability, high availability and adaptability. Oracle offers a full range of consulting services, from architecture and design through to high availability, replication and clustering.
You can learn more at http://www.mysql.com/consulting/
Conclusion

The massive growth in data volumes is placing unprecedented demands on databases and frameworks. There are many new approaches that promise high scalability and availability with flexible development, rapid iteration and ease of use. As discussed, many applications also need to maintain the transactional integrity of their data, as well as the ability to run complex queries against that data, to ensure business insight and competitive advantage.

Through developments in NoSQL interfaces, replication and MySQL Cluster, users can blend the best of NoSQL and relational data stores to provide a complete solution to application requirements, and enjoy the benefits of both technologies.

Additional Resources

MySQL 5.6 Development Milestone Release Article: https://dev.mysql.com/tech-resources/articles/

MySQL 5.6 and MySQL Cluster 7.2 Downloads: http://dev.mysql.com/downloads/

MySQL Cluster 7.2 New Features (including Memcached API configuration): http://mysql.com/why-mysql/white-papers/mysql_wp_cluster7_architecture.php

Configuring the Memcached API for InnoDB: http://blogs.innodb.com/wp/2011/04/get-started-with-innodb-memcached-daemon-plugin/

MySQL Replication Whitepaper: http://mysql.com/why-mysql/white-papers/mysql-wp-replication.php