4. tier (memcache, redis, etc) and database not expected to do heavy processing. Whenever there
is a tradeoff between speed and capabilities, MySQL tends to keep its database engine fast.
MySQL Advantages:
● Free, fast, light weight, open source
● High on developer productivity, easy and quick to setup.
● Active open source development community, more capabilities being added with every
release.
MySQL Limitations:
● Single threaded replication, unable to support heavy writes.
● Limited scalability on single node, unable to handle large data size.
● Limited data consistent high availability under heavy writes.
MySQL Scalability Options:
● Scale out reads by adding slaves. MySQL 5.6 offers GTID based parallel replication
which is still per schema. MySQL 5.7 offers true parallel replication.
● Scale out reads and writes by sharding. This should be part of architecture runway
discussion. Carefully review the sharding key, sharding unit, process to handle data
merges and choose it only when your application growth absolutely warrants need for
application based partitioning. Currently there is no mature auto sharding technology, so
application logic is required to deal with the complexity of sharding, from key lookups,
result set merges and additional aggregation logic. Few open source sharding
technologies to consider: jetpants from Yahoo tumblr, vitess from Google, MySQL fabric
from Oracle, Tesora virtualization
When to choose Oracle:
Use Oracle when you need ACID compliant RDBMS for storing revenue critical medium to large
footprint transactional and persistent data. Oracle is preferred for large scale enterprise data
marts and data warehouses where you need lots of functionality within database like
partitioning, parallelism, resource manager, very large joins and analytic functions and the DB
storage requirement is in the range of hundreds of terabytes. RAC is also a good choice for
5. large OLTP database with heavy read/write traffic, especially when sharding is not an option for
application requirement.
Oracle Advantages:
● Feature rich to support VLDBs
● HA with data consistency
● Well instrumented with detailed data dictionary for better performance monitoring and
tuning
● Well integrated with Hadoop for reporting and analytics
Oracle Limitations:
● H/W setup is rack dependent. All nodes within Oracle cluster should be setup within
same cabinet and connected to same switch.
● Cluster installation dependent on infrastructure expertise. Experienced DBAs, storage
admins, sys admins needed to draw cabling diagram and provide custom install
instructions to data center team.
● Annual Licensing cost
Oracle Scalability Options:
● Consider block storage SAN like EMC. Compared to file based NFS storage like Netapp,
SAN provides heavy throughput for IO intensive workloads.
● Implement Oracle dNFS (it’s free and not a licensed feature) for Netapp based storage
to provide multi path to NFS and increase the IO throughput.
● Add more nodes to Oracle cluster.
● Consider engineered appliances like Exadata if you can afford.
Feature Oracle MySQL
ACID ✔ ✔
Standard SQL Support ✔ ✔
Open Source ✖ ✔
6. Easier to install and manage ✖ ✔
Independent of Infrastructure expertise ✖ ✔
Popular with Web Apps ✖ ✔
VLDB ✔ ✖
Reporting Analytics ✔ ✖
Enterprise DW ✔ ✖
HA with data consistency ✔ ✖
Resource Manager ✔ ✖
Instrumentation ✔ ✖
Low Cost ✖ ✔
Big Data:
As the world becomes more instrumented, the volumes of data available to the enterprise are
growing by orders of magnitude exceeding the processing capacity of traditional data
warehouses based on relational data stores like Oracle. There are enterprise solutions like
Teradata, Netezza, Vertica, Exadata, Greenplum to mention a few but they are largely sold at
high price points.
Big Data Analytics:
As data sources became more diverse: structured, unstructured (images, documents, videos),
semistructured (XML, JSON, log files), Hadoop gained prominence as a platform designed to
use commodity hardware to build a massively parallel and highly scalable cost effective data
processing cluster. Inspired by GFS (Google File System), Hadoop is today deployed in many
organizations to store and analyze large amounts of log data. These low value event streams
are converted into high value aggregates for BI (business intelligence) applications and drive
actionable business insights, that’s what big data analytics is all about.
Hadoop Limitations: