Welcome to the Trafodion briefing series. With Trafodion, the future is now! Trafodion delivers on the promise of a real-time, transactional SQL-on-HBase DBMS. The purpose of this segment is to introduce Trafodion and to discuss the business and technical cases for deploying it.
To begin with, the name Trafodion has special significance: it is the Welsh word for “transactions”. Trafodion is an open-source project that was incubated at HP Labs to develop a transactional SQL-on-HBase DBMS engine. It combines HBase with transactional SQL technologies that HP has engineered, representing more than 20 years’ worth of HP investment in database technologies and solutions. Trafodion specifically targets transactional or operational workloads, as opposed to data warehousing or analytic workloads. Transactional workloads are those previously identified as OLTP (online transaction processing) workloads, but the definition expands beyond the broad range of enterprise-level transactional applications (ERP, CRM, etc.) to include the new transactions generated from social and mobile data interactions and observations, which often incorporate a mixture of structured and semi-structured data.
From an HBase perspective, Trafodion is designed to:
Ride the unstoppable Hadoop wave that is currently underway. Hadoop is transforming how companies store, process, and share big data. Companies are now able to process unprecedented volumes of data using low-cost server and storage technologies.
Most Hadoop software is open source and can be downloaded for free! This offers companies new economic and technology infrastructure models that free businesses from vendor lock-in and prohibitive software licensing fees.
Community development holds the promise of leveraging a large pool of talented development resources that can speed time to market for new features and capabilities.
Provide schema flexibility and support for all forms of business data, including structured, unstructured, and semi-structured data formats. Trafodion-hosted applications can seamlessly access and join data from Trafodion, native HBase, and Hive tables without expensive replication or data movement overhead.
From a Transactional SQL perspective, Trafodion offers:
A comprehensive and full-featured ANSI SQL DBMS, which allows companies to reuse and leverage existing SQL skills to improve developer productivity.
Support for ACID transaction protection, which extends HBase by guaranteeing data consistency across multiple rows, tables, and SQL statements.
Many optimizations for low-latency read and write transactions, in support of the fast response time requirements of the operational SQL workloads Trafodion is targeting.
Hadoop workloads can be broadly categorized into four workload types: Operational, Interactive, Non-Interactive, and Batch. These categories vary greatly in terms of their response time expectations as well as the amount of data that is typically processed (see the chart). The bracketed categories as shown are where the marketplace (vendors and customers) has predominantly focused its attention, and they are therefore the most mature in terms of development efforts and solution offerings. For the most part these categories represent efforts centered around “analytics” and business intelligence processing on “big data” problems. These workloads are well positioned to leverage Hadoop strengths and capabilities, MapReduce in particular.
In contrast, the workloads defined as “Operational” or “Transactional SQL” form an emerging Hadoop market category, and therefore the least mature one. In part this is a direct result of Hadoop being perceived as having a number of weaknesses (or gaps) in addressing the requirements of transactional workloads. These workloads typically have very stringent requirements in terms of sub-second response time expectations, transactional data integrity, number of users, concurrency, availability, and data volumes. In combination, these requirements can expose Hadoop limitations in terms of transaction support, bulletproof data integrity, real-time performance, operational optimizations, and managing workloads composed of a complex mix of concurrently executing transactions, all with varying priorities.
Traditionally these workloads have been relegated to the domain of relational databases, but there is growing interest and pressure to embrace these workloads in Hadoop due to Hadoop’s perceived benefits of significantly reduced costs, reduced vendor lock-in, and its ability to seamlessly scale to larger workloads and data. This is exactly the workload that Trafodion is targeting. Trafodion addresses each of these limitations and as a result provides a differentiated DBMS capable of hosting these applications and their data.
The following quotes confirm the growing interest in hosting transactional/operational workloads in the Hadoop environment.
In this quote from GigaOM, it is noted that “there are other key features for an operational database: concurrency, interactive write speed, and distributed transactional support”. At the time of his writing, Mr Turian notes “Currently no existing SQL-on-Hadoop solution satisfies these requirements.”
In this quote from Forrester Research, it is noted that the #5 reason that Hadoop is Kicking Can and Taking Names is that “the future of Hadoop is real-time and transactional”. Also noted is that “the groundwork is being laid for an eruption in the data management technologies as Hadoop sneaks its way into the transactional database market”.
In this quote from Doug Cutting, a co-founder of the Apache Hadoop project, he notes “So I think the prediction we can make here is that it is inevitable that we will see just about every kind of workload being moved to this platform – even Online Transaction Processing.”
Taken together, these quotes show that the interest is there. The only impediment is the lack of a database engine capable of supporting these transactional workload requirements. It is precisely this marketplace that Trafodion is intended to address.
Let’s look at those application areas where Trafodion is a good fit. Transactional workloads are deemed mission critical because they typically help companies make money, touch their customers or prospects, or help them run and operate their business. These transactional applications are commonplace in every industry sector. So what are some common use cases for Trafodion?
The first and most obvious are those applications being deployed on HBase today. Trafodion greatly simplifies these development efforts by providing a SQL application interface and ODBC/JDBC connectivity for existing third-party tools and applications.
The second use case is for new or existing applications where the scalability or licensing cost of hosting on an existing RDBMS is deemed prohibitive. The combination of Hadoop scalability, reduced infrastructure costs, and Trafodion’s open-source distribution addresses these cases.
The third use case comes with the advent of the “growing internet of things”. The number and types of access devices have driven tremendous transaction and data growth, and have also broadened the types of data that need to be captured and used as part of the transactions. These next-generation transactional applications often require multi-structured data types, which means that operational data is evolving rapidly to include a variety of data formats and types, for example transactional structured data combined with visual images.
Application characteristics that are a good fit for Trafodion:
Needs low latency access resulting in sub-second response time
Demands high concurrency
Requires distributed transaction support and ACID protection
Data integrity is of paramount importance
Requires compute and storage resources scalable beyond a single node
Uses an open platform comprising Linux and the Hadoop ecosystem
Trafodion delivers on the promise of a full-featured and optimized transactional SQL-on-HBase DBMS solution with full transactional data protection. This combination of HBase and an enterprise-class transactional SQL engine overcomes Hadoop’s weaknesses in terms of supporting operational workloads. Customers gain the following recognized benefits:
Ability to leverage their in-house SQL knowledge and expertise instead of having to learn complex MapReduce programming.
Seamless support for existing and new customer-written or ISV operational applications, which drives investment protection and improved development productivity.
Workload optimizations that provide the foundation for the delivery of next-generation real-time transaction processing applications.
Guaranteed transactional consistency across multiple SQL statements, tables, and rows.
All while gaining or complementing the promised benefits associated with Hadoop!
Open source sponsorship and investment from HP
Finally, you are encouraged to listen to additional segments of the Trafodion briefing series. Additional information can be found on the Trafodion wiki at www.Trafodion.org