The business analytics marketplace is experiencing a challenge as classic BI tools meet evolving big data technologies, in particular Hadoop. We explore how IBM works to meet this challenge, providing a big-picture perspective on its big data offerings around Hadoop, its open data platform, and BigInsights.
The Big Picture on Big Data and Cognos
www.senturus.com/blog/big-picture-big-data-cognos/
August 1, 2016 Business Strategy & Perspectives
IBM has a long history of supporting major open source projects and the most widely adopted open standards. Their
enterprise customers have benefited from the flexibility, choice, and innovation that come with the open source
philosophy. Major projects include SOA (Service-Oriented Architecture), Linux, Eclipse, and now Hadoop. The big
data analytics open source offering is known as the IBM Open Platform with Apache Hadoop. The commercial side
of this platform, announced in early 2015, is a suite of products for the enterprise branded as BigInsights.
To better understand IBM's big data offerings around Hadoop and its open data platform, it is helpful to put this in
context of the overall vision for the platform and the three phases of the IBM Big Data Analytics lifecycle:
1. Pull in all types of data from disparate sources
2. Put the data into a business context
3. Produce intelligent, data-driven business outcomes, for example, operational efficiency, customer
engagement, or risk management
IBM endeavors to cover a lot of business territory with its analytics platform. For the enterprise IT department, the
technology enables data integration, governance, security, and regulatory compliance. For line of business
managers, the analytics environment is the home of customer and operational intelligence. While analytics play an
important role in increasing operational efficiency and eliminating business process bottlenecks, it is the customer-
centric analytics that have captured the imagination of business executives. Big data analytics offers many
opportunities for improving customer relationships and increasing engagement across marketing channels.
A common big data use case is delivering relevant promotions to customers. We all share the experience of
receiving credit card offers in the mail from the bank and tossing the envelope directly into the recycling bin without
even thinking about it. Despite the dismal response rate, it was cost effective for the bank to send the same direct
mail piece to everyone. With a big data platform, it is possible to develop customer profiles and create targeted
offers for each segment. For example, customers that have a single account and a short customer history would be
candidates for a different array of promotions than someone who has been a customer for decades. The cost of
amassing enough data and having the processing power to crunch the numbers in a timely fashion has dropped
enough to make it profitable to do so.
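The segmentation idea above can be sketched in a few lines of Python. This is a minimal illustration only: the `Customer` fields, the tenure thresholds, the segment names, and the promotions are all hypothetical and are not taken from any IBM product or real bank. In practice, segments would be derived from far richer data on a big data platform.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    num_accounts: int
    tenure_years: float


def segment(customer, new_customer_years=2.0, loyal_years=10.0):
    """Assign a coarse segment. The thresholds are illustrative assumptions."""
    if customer.num_accounts == 1 and customer.tenure_years < new_customer_years:
        return "new_single_account"
    if customer.tenure_years >= loyal_years:
        return "long_term"
    return "established"


# Hypothetical promotions per segment, replacing one mailing for everyone.
PROMOTIONS = {
    "new_single_account": ["savings account bonus", "starter credit card"],
    "long_term": ["premium rewards card", "loyalty rate discount"],
    "established": ["balance transfer offer"],
}


def offers_for(customer):
    """Look up the promotions that match the customer's segment."""
    return PROMOTIONS[segment(customer)]


customers = [
    Customer("Ana", 1, 0.5),
    Customer("Ben", 3, 25.0),
    Customer("Cho", 2, 4.0),
]
for c in customers:
    print(c.name, segment(c), offers_for(c))
```

A customer with one account and a short history lands in a different segment, and receives different offers, than one who has been with the bank for decades.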
With digital advertising and social media data, analysis is required on huge amounts of unstructured data. A couple
of years ago this was experimental at best, but now Hadoop software enables capturing and processing
unprecedented amounts of data. It complements the enterprise data warehouse and is an integral part of the
business intelligence ecosystem.
Open Data Platform ODPi
ODPi, the Open Data Platform initiative, is a consortium of IBM and 18 other enterprise software vendors working together to
maximize the adoption of technologies based on Apache Hadoop. The goal of ODPi is to accelerate software
development by providing a standard Hadoop solution on which applications can be run, whether they are
commercial software, open source, or custom code developed in-house. This gives enterprise customers assurance
that they are not locking themselves into a single vendor's Hadoop solution. It also permits using a Hadoop
implementation with products from multiple vendors. For Hadoop to fulfill its role as an enterprise data source, it
must accommodate a broad audience who will be using many different applications.
To that end, the ODPi provides a core platform of agreed-upon and tested big data Apache Hadoop modules. This is
the ODPi standard, on which the vendors build their applications. For example, Hortonworks, IBM Open Platform
4.0 with Apache Hadoop, EMC Pivotal HD 3.0, and Infosys IIP all adhere to the ODPi standard. Analytics software
vendors or in-house development shops can concentrate on developing applications further up the stack, knowing
that the Hadoop core adheres to a standard and their applications will interoperate with any compliant Hadoop system.
This accelerates development, promotes code re-use, and simplifies the technical architecture. Implementing a
Hadoop distribution that adheres to the ODPi standard means not being locked into a proprietary technology.
Only time will tell whether the ODPi will have a lasting impact as a standard. The organization has been criticized as being
nothing more than a joint marketing effort for vendors pushing their own commercial flavors of Hadoop. Also notable
are the big data vendors that are conspicuous by their absence: Cloudera, MapR, and Amazon (AWS Elastic
MapReduce, or EMR).
IBM BigInsights and Cognos
On top of Hadoop, IBM has developed a suite of big data and analytics tools under the BigInsights brand. There are
tools for scaling and managing the platform (BigInsights Enterprise Management), a machine learning engine
(BigInsights Data Scientist – Decision Trees, PageRank, Clustering) and a data exploration and discovery tool
(BigSheets). Of particular interest to Cognos customers is BigSQL, which runs SQL queries against Hadoop. In
other words, BigSQL permits Cognos to use Hadoop as a data source.
This is interesting as data stored in Hadoop only becomes useful when it is put into a business context. Cognos
Analytics (V11) is well suited for this role. It is a powerful tool for BI developers and business power users, enabling
the presentation of Hadoop data in a visually appealing format for executives, managers, and line of business
staffers. Big data becomes much more valuable when it can be interpreted and understood by non-technical users.
Cognos supports connecting to Hadoop using Hive, which translates code from SQL to MapReduce to get results
from Hadoop. There will always be some latency as Hive cannot change the nature of MapReduce, which
distributes processing work across Hadoop nodes. The query is split into discrete chunks of work and the results are
assembled as they are returned. SQL join conditions, which are commonplace in Cognos generated SQL, create an
additional layer of complexity for MapReduce. This further increases the query processing time and will prevent
some queries from running at all.
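To see why joins add work for MapReduce, consider a simplified, single-process sketch of a reduce-side join in Python. It is an illustration under stated assumptions, not Hive's actual execution plan: the `customers` and `orders` tables and all function names are hypothetical, and a real cluster would run the map and reduce phases in parallel across nodes with the shuffle moving data over the network. The query it imitates is roughly `SELECT c.name, SUM(o.amount) FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.name`.

```python
from collections import defaultdict

# Hypothetical tables: (customer_id, name) and (order_id, customer_id, amount).
customers = [(1, "Ana"), (2, "Ben"), (3, "Cho")]
orders = [(101, 1, 25.0), (102, 2, 40.0), (103, 1, 15.0)]


def map_phase(customers, orders):
    """Tag each record with its source table and emit it under the join key."""
    for customer_id, name in customers:
        yield customer_id, ("customer", name)
    for _order_id, customer_id, amount in orders:
        yield customer_id, ("order", amount)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Join within each key group and total the order amounts per customer."""
    results = {}
    for _key, values in groups.items():
        names = [v for tag, v in values if tag == "customer"]
        amounts = [v for tag, v in values if tag == "order"]
        # Inner join: emit only when both tables have rows for this key.
        if names and amounts:
            results[names[0]] = sum(amounts)
    return results


totals = reduce_phase(shuffle(map_phase(customers, orders)))
print(totals)  # {'Ana': 40.0, 'Ben': 40.0}
```

Even this small join requires every row from both tables to be tagged, shuffled by key, and regrouped before any result can be produced, which hints at why multi-join queries generated by a BI tool can be slow when translated to MapReduce.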
IBM addresses these problems with BigSQL. It works on the same Hive metastore, but produces faster and more
reliable results. BigSQL is not just about performance, but also about ensuring that the SQL query will run. It optimizes
SQL for MapReduce so that it runs faster and avoids the need to modify the Cognos Framework Manager model
or hand-code SQL inside of Cognos. An alternative to Hive and BigSQL is Impala, which makes similar
performance claims.
Success with Big Data requires getting key pieces to work together. With BigInsights and BigSQL, IBM is providing
tools for facilitating Hadoop adoption, including interoperability with existing Cognos infrastructure and functionality.
To stay on top of business intelligence topics, read other Senturus blogs at: http://www.senturus.com/blog/.
Resources
Senturus webinar Running Cognos on Hadoop:
http://www.senturus.com/resources/running-cognos-on-hadoop/
Video of Hive and BigSQL performance test results:
https://developer.ibm.com/hadoop/blog/2015/10/23/hive-and-big-sql-performance-test-update/
IBM BigSQL technology sandbox demo cloud environment for Hadoop and BigSQL:
https://my.imdemocloud.com/projects/3467
Thanks to David Currie for contributing this article. David is a long-time business analytics consultant. He blogs
about business intelligence and big data at davidpcurrie.com.
Big Data / IBM Cognos