This document discusses considerations for big data analytics strategies. It covers how big data analytics have evolved from focusing on structured data and batch processing to also including real-time, multi-structured data from various sources. It emphasizes that discovery is key and requires visual exploration of granular data details. Native big data analytics platforms are needed that can handle real-time streaming data and provide self-service capabilities through customizable applications. The document provides examples of how various companies are using big data analytics for applications like cybersecurity, customer analytics, and supply chain optimization.
Four Key Considerations for your Big Data Analytics Strategy
1. Arcadia Data. Proprietary and Confidential
Four Key Considerations for Your Big Data Analytics Strategy
February 28, 2018
2. Featured Speakers
Steve Wooledge, VP Marketing
Steve Wooledge is responsible for overall go-to-market strategy and marketing for Arcadia Data.
He is a 15-year veteran of enterprise software in both large public companies and early-stage
start-ups and has a passion for bringing innovative technology to market. Previously, Steve was
with MapR Technologies where he ran all product, solution, and digital marketing for their
converged data platform. He previously held senior management positions in marketing at
Teradata, Aster Data (acquired by Teradata), Interwoven (acquired by HP), and Business Objects
(acquired by SAP).
John Myers, Managing Research Director, EMA
John has nearly 20 years of experience in areas related to business analytics and business
intelligence in professional services, sales consulting, product management, industry analysis,
and research. He helped organizations solve their analytics problems, whether they related to
operational platforms like customer care, billing, or applied analytical applications, such as
revenue assurance or fraud management. John established thought leadership in emerging data
management paradigms such as big data (combination of multistructured and relational data
sets) applications and NoSQL access data stores.
3. Logistics for Today’s Webinar
EVENT RECORDING
• An archived version of the event recording will be available at www.enterprisemanagement.com
• After the webinar, an email with a link to the recording will be sent to you
QUESTIONS
• Log questions in the chat panel located on the right side of your screen
• Questions will be addressed during the Q&A session of the event
5. Four Key Considerations for Your Big Data Analytics Strategy
§ Big Data Analytics Have Evolved
§ Discovery is in the Details
§ Real Time is a Real Thing
§ Self-Service Comes in an App
§ Implementation Examples
§ Question and Answer
6. Consideration #1: Big Data Analytics Have Evolved
11. “Data” and “Platforms” Have Changed. Why Haven’t BI Tools?
Data
  From: rows and columns; batch; smaller data volumes; limited # sources; mainly internal
  To: rows and columns and complex multi-structured; batch and interactive and real-time; small and large volumes; many sources; internal and external
Platforms
  From: tables; schema on write; super computers; ETL; RDBMS
  To: tables and documents, search indexes, events; schema on write and schema on read; commodity hardware; ETL and ELT and ELDT; data lakes
BI Tools
  From: SQL queries; extracts; cubes; BI servers; small/med scale
  To: ? (Why haven’t BI tools evolved?)
12. BI Built for Data Warehouses Fails us in Data Lakes, Because…
Agile only in name
Pathway to production is slow, requires multiple steps, data duplication and pre-summarization. Time-to-insight is delayed.
[Diagram: Extract to EDW? Summarize on BI Server? Replicate Security? Acquire New Hardware?]
Inefficient scale
Scaling to large data comes at reduced concurrent access for users.
[Chart: # users vs. data volume, with regions labeled "good here" and "bad here"]
Cannot handle data variety
Big data is structured + real time and streaming + complex + unstructured
[Diagram: structured vs. multistructured, small vs. big, batch vs. streaming, external vs. internal, with ✓ and ✘ marks]
13. Data Warehouse BI Tools Treat Data Lakes Like Any Other Database
With traditional BI tools, the analytics process is the same for data warehouses and data lakes.
1. Land / secure data
2. Semantic Modeling
3. Extract to BI Server
4. Secure
5. Performance Modeling
6. Analytic / Visual Discovery
7. Production
Iterate on steps 2 - 6 in feedback loop (2nd iteration … Nth iteration).
Callouts:
- Too early: use cases not fully defined yet.
- Slow, repetitive feedback loop to refine models.
- Too late: need to re-model based on use cases.
Drawbacks:
1) High cost to deploy, govern, and manage
2) Doesn’t take advantage of distributed power and open integration of Hadoop
[Diagram labels: Data Warehouse and Data Lakes; BI Server]
14. Native BI within Data Lakes Provides Faster Time to Value
Native BI within Data Lake
1. Land / secure data
2. Analytic / Visual Discovery
3. Semantic Modeling
4. Optimize Performance
5. Production
Faster time to value, quick feedback loops:
- One security model
- No movement of data
- Discover first, take action second. Performance modeling for production deployment is optional.
Data Warehouse or Data Lake BI Server
1. Land / secure data
2. Semantic Modeling
3. Extract to BI Server
4. Secure
5. Performance Modeling
6. Analytic / Visual Discovery
7. Production
Iterate on steps 2 - 6 in feedback loop (Nth iteration).
High cost to deploy, govern, and manage
15. Consideration #2: Discovery is in the Details
19. Cybersecurity Demo App
[Screenshot callouts:]
• Net flow data over time
• Machine learning output
• Network graph analysis
• Drill to detailed log files
20. BI for Data Lakes Must be Architected for Scale and Performance
Data Warehouse BI Architecture
[Diagram: Browser, BI Server, JDBC, Edge Node, Data Lake Cluster (DataNodes)]
• BI server can’t scale out
• Significant data movement, modeling, security management
“Big Data” BI Architecture
[Diagram: Browser, Edge Node BI Server, Data Lake Cluster (DataNodes)]
• Edge node BI server only scales via long planning
• Performance optimizations require heavy IT intervention
• Only passing SQL with no semantic information (e.g., filters)
Native BI within Data Lake Architecture
[Diagram: Browser, Visualization Server, Data Lake Cluster (DataNodes + Arcadia)]
• Scales linearly with DataNodes while retaining agility
• Semantic model is “pushed down” and distributed
• Highly optimized “based on usage” physical model
• No data movement; single security model
Native BI = “Lossless”, high-definition analytics
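The contrast between an edge-node BI server and a semantic model that is "pushed down" to the DataNodes can be pictured with a toy example. The sketch below is only an illustration under stated assumptions, not Arcadia's implementation: the in-memory partitions stand in for DataNodes, and the `Filter` type and both function names are hypothetical. It shows that filtering and pre-aggregating on each node gives the same answer as central processing while far fewer values cross the network.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, e.g. {"region": "EU", "bytes": 120}


@dataclass(frozen=True)
class Filter:
    """Hypothetical piece of semantic information: a named row predicate."""
    name: str
    predicate: Callable[[Row], bool]


def central_sum(partitions, flt, measure):
    """Edge-node style: ship every row to one server, then filter and sum."""
    moved = [row for part in partitions for row in part]
    total = sum(r[measure] for r in moved if flt.predicate(r))
    return total, len(moved)


def pushed_down_sum(partitions, flt, measure):
    """Native style: each node filters and pre-aggregates its own partition."""
    partials = [
        sum(r[measure] for r in part if flt.predicate(r))
        for part in partitions
    ]
    # Only one partial result per node crosses the network.
    return sum(partials), len(partials)


# Three hypothetical DataNodes holding net flow records.
nodes = [
    [{"region": "EU", "bytes": 120}, {"region": "US", "bytes": 300}],
    [{"region": "EU", "bytes": 80}, {"region": "EU", "bytes": 50}],
    [{"region": "US", "bytes": 10}],
]
eu_only = Filter("region = EU", lambda r: r["region"] == "EU")

central_total, rows_moved = central_sum(nodes, eu_only, "bytes")
pushed_total, partials_moved = pushed_down_sum(nodes, eu_only, "bytes")
```

Both calls return the same total; the difference is in the second value of each pair, which counts how many items had to leave the nodes.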
21. Consideration #3: Real Time is a Real Thing
25. Native Access for Streaming Visualizations: Real Time + Historical
Data Drives Market Disruption
[Screenshot: Arcadia Data Streaming Visualizations, with Real Time and Historical views]
26. No Flattening: Native BI Handles the Complex Data in Real Time
27. Consideration #4: Self-Service Comes in an App
31. Native BI is Built from the Ground Up for Data Lakes
• In-cluster for high performance, high concurrency.
• Distributed BI on every node
• No data movement
• Unified security
• Single semantic layer
[Diagram: Advanced Visualizations and Semantic Layer; Data Lake on Hadoop Cluster with multiple Data Nodes]
32. Native BI Leverages Intelligence Learned During Data Discovery
Query acceleration for scale, performance, and concurrency
Native BI tools make recommendations; build these with a click.
• Fast query responses
• Minimal modeling
• Live acceleration (no downtime)
[Diagram labels: Native BI Platform; Data Lake; Ad hoc queries; All granular data; Analytical views; Accelerated application queries]
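One way to picture how usage seen during discovery could drive acceleration is to count which grouping columns ad hoc queries use most often and then pre-aggregate the granular data for the popular combinations. The sketch below is a rough illustration only and does not describe Arcadia's actual recommendation logic: the query-log format, the `min_hits` threshold, and the functions `recommend_views` and `build_view` are all hypothetical.

```python
from collections import Counter, defaultdict


def recommend_views(query_log, min_hits=2):
    """Return dimension sets queried at least `min_hits` times, most used first."""
    counts = Counter(frozenset(q["group_by"]) for q in query_log)
    return [dims for dims, hits in counts.most_common() if hits >= min_hits]


def build_view(rows, dims, measure):
    """Pre-aggregate granular rows into a {dimension values: sum} lookup."""
    ordered = sorted(dims)  # fixed column order for the lookup key
    view = defaultdict(int)
    for row in rows:
        view[tuple(row[d] for d in ordered)] += row[measure]
    return dict(view)


# Hypothetical granular data and a log of ad hoc queries run against it.
granular = [
    {"store": "A", "day": "Mon", "sales": 10},
    {"store": "A", "day": "Tue", "sales": 5},
    {"store": "B", "day": "Mon", "sales": 7},
    {"store": "A", "day": "Mon", "sales": 3},
]
log = [
    {"group_by": ["store"]},
    {"group_by": ["store", "day"]},
    {"group_by": ["store"]},
    {"group_by": ["day"]},
]

recommended = recommend_views(log)
views = {dims: build_view(granular, dims, "sales") for dims in recommended}
```

Here only the per-store grouping is used often enough to be recommended, so later queries on that grouping could read the small pre-aggregated lookup instead of scanning all granular rows.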
33. The Result: Faster BI Analytics and Higher User Concurrency
Customer Benchmark of a Legacy BI Tool Accelerated on a Data Lake
[Chart: Query 1 Performance Testing - Heavy Query. Completion Time (seconds) vs. # of Concurrent Jobs (1, 2, 5, 10, 15, 30) for Arcadia, Hive, Impala, and Spark]
34. Real World Examples
35. Customer Value of Native Visual Analytics on Big Data
Categories: INNOVATION; REDUCE RISK
Customers shown: Ad tech; Government; Fortune 100 Online Retailer; Fortune 50 CPG Company
• Trade surveillance for high velocity trade volume across exchanges to identify and prevent abusive trade behavior
• Cybersecurity app to capture investigative workflows, real-time incident response, and guided data exploration
• Developed a new SaaS self-service analytics platform to give their customers better marketing attribution
• Gives global brand managers digital campaign intelligence across 100+ brands
• Improve patient outcomes on 10+ million members by predicting and controlling re-admission risk.
• Turn IoT data from enterprise data servers into meaningful lifecycle analytics data service
36. Campaign Analysis Application
Understand high-level metrics with the ability to drill down to details
Augment analysis with a variety of data types & sources such as actual display ad images
37. Retail Store Drill Down
Interactive maps allow for easy visualization of spatial data, zooming into details
38. Faster Supply Chain Optimization
“Supply chain optimization with visual
analytics has been transformative for us.”
— Director of BI & Analytics
Use Cases
• Integrate financial and physical flow data
• Self-service visual analytics
Challenges
• One-off consulting project typically costs
hundreds of thousands of dollars and lasts 6-8 months.
Results
• Business analysts have instant access to all data –
no data movement necessary
• Visualizations make it easy to highlight anomalies and
potential issues
• Analysts, engineers, and data scientists all can
create stories directly on the data
39. Arcadia Enterprise Fully Integrated With Leading Hadoop Platforms
§ Deployment, Management, and Configuration
• Cloudera Manager, Apache Ambari, MCS
• Integration & parcel-based installation
§ Authentication
• Kerberos, LDAPS/AD, PAM and SAML
• Single sign-on for end users
§ Authorization
• Apache Sentry, Apache Ranger, MapR integration with delegation
• Arcadia role-based privilege model (RBAC)
§ SSL for internal and external connections
§ Encryption at rest (HDFS encryption zones)
43. Thank You
Learn More - Resource Center:
https://www.arcadiadata.com/resources
Try Arcadia Instant - Free Download:
www.arcadiadata.com/Instant
Read our Blog:
https://www.arcadiadata.com/blog/
Follow Arcadia on Social:
@arcadiadata
New! EMA and Arcadia InfoBrief:
Plotting the Course of Your Big Data Analytics Strategy