29. OPAL (Open Platform for Analytics)
Vendor-neutral: Integration of disparate technologies, curated and supported via an ecosystem of partners
Flexible: Flexibility built into the architecture at multiple levels, from infrastructure to data services to analytics libraries
Modular: Pluggable, lightweight architecture that allows the appropriate tool for each data management task
Solution-focused: Focus on the full stack, from the physical infrastructure to analytics modules, to help with solution building
30. Hitachi Big Data Analytics Ecosystem
[Layered stack diagram:]
Visualization (Extracts, Dashboards, Reports)
Analytic Libraries (R, Weka, …)
Data Services (Pentaho Tools for Integration, Orchestration, ETL, Indexing)
Data Connectors (HSDP for Streaming Data)
Platform Services (Hadoop, Storm, Spark, Cassandra, MongoDB)
Infrastructure (Hyper-Converged Scale-Out Platform)
Cloud-Enabled: Public, Private, Hybrid
Big Data Lake sources: Video, Image & Audio; Email & Documents; Transactional Data; IT, Sensor & Machine Logs; Social Media
Common Analytics Framework (developed jointly with Hitachi R&D Labs)
System Integration & Customization Services
Solutions: Telecom, Healthcare, Public Safety, Data Center, Finance; Hitachi Solutions; Custom Solutions
31. Big Data Solution Development Challenges
A big part of solution development for analytics involves orchestration of
various moving parts in the infrastructure:
‒ Domain experts need access to appropriate data sets, backed by a suitable technology stack, to gather insights based on their intuitions.
‒ Time to value for a solution can be accelerated by providing an agile mechanism to move from prototype to production application quickly.
Infrastructure Installation → Component Deployment → Application Logic Implementation → Testing → Configure Cluster & Monitoring
32. OPAL – Big Data Analytics Made Easy & Robust
Easier to Provision Infrastructure
Easier to Deploy Services
Easier to Monitor, Troubleshoot, and Validate
Easier to Develop & Integrate New Services
[Diagram contrasting the workflow Before (Infrastructure Installation → Component Deployment → Application Logic Implementation → Testing → Configure Cluster & Monitoring) with the simplified workflow With OPAL.]
Analysts appear to be fairly unanimous on many of the mega trends in banking:
2. FinTech companies
3. Ditto, as 1 takes its toll
4. Different expectations of millennials
5. From boring to “do I want to manage all those regs?”
6. Specialist providers, margins
More of a technology focus
Visa study on sources of financial advice: parents 78%, friends 45%, internet 40%
I’m going to introduce you to some of the features and technologies of data analytics.
To do this, you can't look at only your own data.
Let’s take a look at why you need to blend big data, and how Pentaho provides the best approach to doing so.
How Hitachi announced it
How analysts saw it
Pentaho 5.0 reinforces Pentaho's mission of delivering the future of analytics. Pentaho has continued to invest in BI and DI together, with over 100 new features in PDI and over 250 in the platform overall.
Continued investments in big data bring new integrations, specifically with MongoDB and Cassandra, and continue to shield customers from changes in the market.
An open core and pluggable platform allows Pentaho to innovate quickly.
Pentaho is battle-tested, with over 1,200 commercial customers.
-A simplified way to utilize analytics, so that anyone within an organization can meet their personal needs and specifications
-Utilization of ANY type of data from ANY source to deliver powerful insight. This also means a company can easily scale its data integration and pull in new resources, as well as blend data from its existing data infrastructure.
-A simplified deployment and management system to easily embed Pentaho
-You need to be able to handle ANY DATA. No one used to think about Twitter or Facebook data, so we are positioned to handle that data.
-You need to be able to handle ANY ENVIRONMENT. You can utilize existing architecture, but be ready for new tools and growth.
-You need to be able to handle ANY TYPE OF ANALYTIC REPORTING. Whether for a data scientist, a senior manager, or anyone else, you can deliver powerful analytical insight.
Let's look at an example of blending at the source to better understand these points. Here we are looking at Telco customer experience analytics. Customer experience analytics has the same goal in every industry: preventing customer churn and creating better loyalty in order to protect and grow revenue. After all, in this age of commoditization, service and fast response to product requests become the new differentiators driving loyalty in most industries. Telco customer allegiance comes mostly from satisfaction with calling plans and the quality and availability of service. Call detail records have long been created and derived from the operational systems for access by BI and reporting systems via warehousing, but they make up only part of the picture.
(Build click 2) Quality of service changes in real time depending on the network: was the customer able to connect, to hear, to remain connected without being dropped, etc.? This network-based data is usually captured in a big data source capable of handling the volume and unstructured nature of the data, and it must be blended with the call detail record information to give the complete picture of a customer's experience.
(Build click 3) With Pentaho, you can easily create architected, blended views across both the traditional call detail records in the warehouse and the network data streaming into the big data/NoSQL store (MongoDB in this example), without sacrificing the governance or performance you expect. These blended views allow your analysts and customer call centers to get accurate, up-to-the-minute information in real time to determine the best action to take for each customer, maximizing their satisfaction and retaining them as loyal customers even when outages or other service quality issues occur.
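To make this concrete, here is a minimal sketch of what a source-level blend could look like, assuming a SQL warehouse of call detail records and a MongoDB collection of network-quality events. This is an illustration of the pattern, not Pentaho's actual implementation; the table, collection, and field names (call_detail_records, network_events, customer_id, dropped) are hypothetical.

```python
# Sketch: blend warehouse CDRs with MongoDB network-quality events at the source.
# All table/collection/field names are hypothetical placeholders.
import sqlite3                    # stand-in for the warehouse connection
from pymongo import MongoClient

warehouse = sqlite3.connect("warehouse.db")
mongo = MongoClient("mongodb://localhost:27017")
events = mongo["telco"]["network_events"]

def blended_customer_view(customer_id: str) -> dict:
    """Join call detail records with real-time network quality for one customer."""
    # Structured side: call detail records from the warehouse.
    cur = warehouse.execute(
        "SELECT COUNT(*), SUM(duration_sec) FROM call_detail_records "
        "WHERE customer_id = ?",
        (customer_id,),
    )
    call_count, total_duration = cur.fetchone()

    # Unstructured side: network-quality events streaming into MongoDB.
    dropped = events.count_documents({"customer_id": customer_id, "dropped": True})

    # The blend: one governed, up-to-the-minute view of the customer's experience.
    return {
        "customer_id": customer_id,
        "calls": call_count,
        "total_duration_sec": total_duration,
        "dropped_calls": dropped,
        "drop_rate": dropped / call_count if call_count else 0.0,
    }
```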
Other solutions in the market talk about blending - but it’s not apples to apples. Blending “at the glass”, i.e. blending done by end users or analysts away from the source with no knowledge of the underlying semantics, often delivers inaccurate or even completely incorrect results, as there is no way to ensure that the chosen fields being matched truly do match.
For instance, think what happens when someone matches two fields both named “revenue” in records that match on “customer”, but one is a monthly sum total and the other is a daily total – this won’t be apparent to that analyst since they are blending based on similar names. The analyst then runs a summation that adds the two together as the day’s total revenue from that customer. He/she will have unwittingly added the monthly figure into each day’s total, distorting the actual revenue generated from that customer dramatically. Your business then targets that customer as highly profitable and offers significant discounts to maintain their interest. Not only have you targeted the wrong customer and potentially ignored the real profitable customers in favor of him, but you’ve also now given him undeserved discounts. The net result lowers your revenue from this customer, and potentially loses you profitable others who were more deserving but left you in favor of competitors offering them discounts. You’ve made the wrong decision because the analytics themselves were inaccurate and incorrect.
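A tiny worked example makes the distortion concrete; the figures are invented for illustration:

```python
# Worked example of the "blend at the glass" revenue mistake (made-up numbers).
daily_revenue = [100.0] * 30          # true revenue: 100/day over a 30-day month
monthly_revenue = sum(daily_revenue)  # 3000.0, stored in a second "revenue" field

# Correct total for the month:
print(sum(daily_revenue))             # 3000.0

# Naive blend that adds the monthly figure into each day's total:
blended_daily = [day + monthly_revenue for day in daily_revenue]
print(sum(blended_daily))             # 93000.0, a 31x overstatement
```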
With tools that blend at the glass, your only way to avoid this is to train every user and analyst on the semantics of the data to ensure reliable results, a solution that's largely infeasible for most organizations because it would take far too much time and expense while impacting productivity.
Even if you can take on this level of investment in training, you still face issues with the timeliness of the data, since these tools do not pull from the source systems. How do you know the data pulled is indeed the latest and therefore the most accurate on that level as well?
This “just in time”, architected blending delivers accurate big data analytics based on the blended data. You can connect to, combine, and even transform data from any of the multiple data stores in your hybrid data ecosystem into these blended views, then query the data directly via that view using the full spectrum of analytics in the Pentaho Analytics platform, including predictive analytics.

Most importantly, since these blends are architected on the source data, you maintain all the rules of governance and security over the data while providing the ease of use and real-time access needed for today's agile analytics requirements. Sensitive data is kept from those who are not allowed to use or view it. You maintain full lifecycle and change management and control, so you can ensure the blends being used meet changing requirements. You preserve auditability. Your blends are designed with full knowledge of the underlying data volumes and source system capabilities and constraints, preserving throughput and performance during analytic access and preventing the “query from hell”/“runaway query” problems prevalent in many federation tools.

Combining the power of design via drag-and-drop across all data sources, including schemas generated on read from big data sources, with knowledge of the full data semantics (the real meaning, cardinality, and match-ability of fields and values in the data) means your business gets accurate results in its analytics, leading to optimized decisions and actions that can really impact your business positively and improve your results.
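To illustrate the contrast with blending at the glass, here is a hedged sketch of the architected pattern: the join key, semantics, and access rules are defined once by someone who knows the data, and analysts consume the view without ever choosing match fields. This is an illustrative pattern only, not Pentaho's API; all class, field, and role names are hypothetical.

```python
# Sketch of an architected blended view: matching semantics and access rules are
# defined once, centrally, rather than by each analyst "at the glass".
# All class/field/role names are hypothetical, not Pentaho's actual API.
from dataclasses import dataclass, field

@dataclass
class BlendedView:
    name: str
    join_key: str                      # field with verified, matching semantics
    allowed_roles: set = field(default_factory=set)
    row_limit: int = 10_000            # guard against "runaway" queries

    def query(self, role: str, warehouse_rows: list, nosql_docs: list) -> list:
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name}")
        # Join on the governed key; the analyst never picks the match fields.
        by_key = {doc[self.join_key]: doc for doc in nosql_docs}
        blended = [
            {**row, **by_key.get(row[self.join_key], {})}
            for row in warehouse_rows
        ]
        return blended[: self.row_limit]

# Usage: the view is defined by a data steward, then consumed by an analyst.
view = BlendedView("customer_experience", "customer_id",
                   allowed_roles={"analyst", "call_center"})
rows = view.query("analyst",
                  warehouse_rows=[{"customer_id": 1, "monthly_revenue": 3000}],
                  nosql_docs=[{"customer_id": 1, "dropped_calls": 4}])
```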
No longer about the DBA.